SIMD, which stands for “Single Instruction Multiple Data,” is a set of
special operations supported by some processors to perform a single
operation on several numbers (usually 2 or 4) simultaneously. SIMD
floating-point instructions are available on several popular CPUs:
SSE/SSE2 (single/double precision) on Pentium III and higher and on
AMD64, AltiVec (single precision) on some PowerPCs (Apple G4 and
higher), and MIPS Paired Single. FFTW can be compiled to support the
SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can
obtain a nonnegligible speedup for most complex and r2c/c2r
transforms. In order to obtain this speedup, however, the arrays of
complex (or real) data passed to FFTW must be specially aligned in
memory (typically 16-byte aligned), and often this alignment is more
stringent than that provided by the usual malloc (etc.)
allocation routines.
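
To make the alignment requirement concrete, here is a minimal sketch of how a program might test whether a pointer meets a given alignment. The helper name is_aligned is hypothetical and is not part of FFTW; the value 16 used with it below is only the typical figure quoted above, not a guarantee for every SIMD build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFTW function): returns 1 if the address p
   is a multiple of `alignment` bytes, 0 otherwise.
   `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power-of-two alignment, the low bits of an aligned
       address are all zero. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling is_aligned(ptr, 16) on a pointer returned by malloc shows whether that particular allocation happens to satisfy the typical SIMD requirement; the C standard does not promise that it will.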

To guarantee proper alignment in case your program is ever linked
against a SIMD-using FFTW, we therefore recommend allocating your
transform data with fftw_malloc and de-allocating it with fftw_free.
These have exactly the same interface and behavior as malloc/free,
except that for a SIMD FFTW they ensure that the returned pointer has
the necessary alignment (by calling memalign or its equivalent on
your OS).
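
As a rough illustration of the idea behind an aligned allocator, the sketch below wraps the C11 aligned_alloc function. It is not FFTW's implementation: the names aligned_malloc16 and aligned_free16 and the constant SIMD_ALIGNMENT are hypothetical, and the 16-byte figure is an assumption taken from the "typically 16-byte aligned" remark above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGNMENT 16  /* assumed; "typically 16-byte aligned" */

/* Hypothetical stand-in for an aligned allocator (NOT FFTW's code).
   Same calling convention as malloc: takes a byte count, returns a
   pointer or NULL on failure. */
static void *aligned_malloc16(size_t n)
{
    /* C11 aligned_alloc expects the size to be a multiple of the
       alignment, so round the request up. */
    size_t rounded = (n + SIMD_ALIGNMENT - 1) / SIMD_ALIGNMENT * SIMD_ALIGNMENT;
    if (rounded < n)
        return NULL;               /* size computation overflowed */
    if (rounded == 0)
        rounded = SIMD_ALIGNMENT;  /* avoid a zero-size request */
    return aligned_alloc(SIMD_ALIGNMENT, rounded);
}

/* Memory obtained from aligned_alloc is released with plain free. */
static void aligned_free16(void *p)
{
    free(p);
}
```

In a program that links against FFTW there is no need for such a wrapper: call fftw_malloc(n) where you would call malloc(n), and fftw_free(p) where you would call free(p).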

You are not required to use fftw_malloc. You can allocate your data
in any way that you like, from malloc to new (in C++) to a fixed-size
array declaration. If the array happens not to be properly aligned,
FFTW will not use the SIMD extensions.
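
If you do use a fixed-size array declaration, a C11 compiler lets you request the alignment explicitly. The following is a sketch under stated assumptions: the type complex_pair is a hypothetical stand-in for an interleaved real/imaginary pair, N_SAMPLES is an arbitrary length, and 16 bytes is again only the typical requirement.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define N_SAMPLES 256  /* hypothetical transform length */

/* Hypothetical stand-in for a complex sample stored as an
   interleaved (real, imaginary) pair of doubles. */
typedef double complex_pair[2];

/* Ask the compiler to place the array on a 16-byte boundary. */
static alignas(16) complex_pair samples[N_SAMPLES];

/* Returns the array; in debug builds, first checks that the
   requested alignment was actually honored. */
static complex_pair *aligned_samples(void)
{
    assert(((uintptr_t)samples & 15) == 0);
    return samples;
}
```

Without the alignas specifier, the array would only be guaranteed the natural alignment of double, which may or may not satisfy a SIMD build.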