SIMD alignment and fftw_malloc - FFTW 3.2.1

3.1.1 SIMD alignment and fftw_malloc

SIMD, which stands for “Single Instruction Multiple Data,” is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2 (single/double precision) on Pentium III and higher and on AMD64, AltiVec (single precision) on some PowerPCs (Apple G4 and higher), and MIPS Paired Single. FFTW can be compiled to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can obtain a non-negligible speedup for most complex and r2c/c2r transforms. To obtain this speedup, however, the arrays of complex (or real) data passed to FFTW must be specially aligned in memory (typically 16-byte aligned), and this alignment is often more stringent than the one provided by malloc and similar allocation routines.
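
To make "16-byte aligned" concrete: a pointer is 16-byte aligned when its address is a multiple of 16. The helper below is a minimal sketch that tests this with integer arithmetic on the address; the name `is_aligned` is hypothetical and not part of FFTW's API. It also illustrates a subtle point: with 8-byte `double`s, an array that starts on a 16-byte boundary does not keep that property at every element, so a pointer into the middle of a buffer can lose the alignment its start had.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true if pointer p is aligned to `alignment`
   bytes. `alignment` must be a nonzero power of two (e.g. 16, the typical
   SIMD requirement mentioned above). */
static bool is_aligned(const void *p, size_t alignment)
{
    /* Precondition: alignment is a power of two, so alignment - 1 is a bit mask. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* The address is aligned when all bits below `alignment` are zero. */
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

For example, given `_Alignas(16) double a[4];`, `is_aligned(a, 16)` is true, while `is_aligned(a + 1, 16)` is false because `a + 1` lies 8 bytes past a 16-byte boundary.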

To guarantee proper alignment for SIMD, therefore, in case your program is ever linked against a SIMD-enabled FFTW, we recommend allocating your transform data with fftw_malloc and deallocating it with fftw_free. These have exactly the same interface and behavior as malloc/free, except that for a SIMD FFTW they ensure that the returned pointer has the necessary alignment (by calling memalign or its equivalent on your OS).
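
The paragraph above says fftw_malloc behaves like malloc but returns suitably aligned memory by calling memalign or an OS equivalent. The following is a minimal sketch of that idea on a POSIX system using `posix_memalign`; it is not FFTW's actual implementation, and the names `aligned_malloc_sketch`, `aligned_free_sketch` and `SIMD_ALIGNMENT` are hypothetical. The 16-byte value is an assumption based on the typical requirement stated earlier. In real programs, calling fftw_malloc and fftw_free directly keeps the alignment matched to whichever FFTW build you link against.

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign in <stdlib.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed SIMD alignment in bytes (hypothetical constant). */
#define SIMD_ALIGNMENT 16

/* Hypothetical malloc-compatible allocator: same interface as malloc, but
   the returned pointer is SIMD_ALIGNMENT-byte aligned. Returns NULL on failure. */
void *aligned_malloc_sketch(size_t n)
{
    void *p = NULL;
    /* posix_memalign returns 0 on success and an error number otherwise;
       the alignment must be a power of two and a multiple of sizeof(void *). */
    if (posix_memalign(&p, SIMD_ALIGNMENT, n) != 0)
        return NULL;
    /* Sanity check: the address is a multiple of SIMD_ALIGNMENT. */
    assert((uintptr_t)p % SIMD_ALIGNMENT == 0);
    return p;
}

/* Memory obtained from posix_memalign is released with the ordinary free(). */
void aligned_free_sketch(void *p)
{
    free(p);
}
```

As with malloc/free, every pointer obtained from the allocator should be released by its matching deallocator; the same pairing rule applies to fftw_malloc and fftw_free.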

You are not required to use fftw_malloc. You can allocate your data in any way that you like, from malloc to new (in C++) to a fixed-size array declaration. If the array happens not to be properly aligned, FFTW will not use the SIMD extensions.
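
If you allocate the data yourself, as the paragraph above allows, you can still request 16-byte alignment explicitly so that a SIMD-enabled FFTW is not forced off its SIMD path. A minimal sketch using only standard C11 follows: `alignas` on a fixed-size array, and `aligned_alloc` for heap memory. The names `fixed_buffer`, `round_up` and `alloc_doubles_aligned16` are hypothetical, and 16 bytes is assumed to be the required alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size buffer: alignas requests 16-byte alignment
   explicitly instead of relying on the default alignment of double. */
static alignas(16) double fixed_buffer[256];

/* C11 aligned_alloc requires the size to be a multiple of the alignment,
   so round the byte count up (overflow is not checked in this sketch). */
static size_t round_up(size_t n, size_t alignment)
{
    return (n + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: allocate n doubles on a 16-byte boundary with C11
   aligned_alloc. Returns NULL on failure; release the result with free(). */
double *alloc_doubles_aligned16(size_t n)
{
    double *p = aligned_alloc(16, round_up(n * sizeof(double), 16));
    /* Sanity check: a successful allocation is 16-byte aligned. */
    assert(p == NULL || (uintptr_t)p % 16 == 0);
    return p;
}
```

Unlike the fftw_malloc route, these choices hard-code the alignment in your program, so they only help if 16 bytes really is what your FFTW build needs.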