SIMD, which stands for “Single Instruction Multiple Data,” is a set of
special operations supported by some processors to perform a single
operation on several numbers (usually 2 or 4) simultaneously. SIMD
floating-point instructions are available on several popular CPUs:
SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision)
on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and
MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be compiled
to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can
obtain a nonnegligible speedup for most complex and r2c/c2r
transforms. In order to obtain this speedup, however, the arrays of
complex (or real) data passed to FFTW must be specially aligned in
memory (typically 16-byte aligned), and often this alignment is more
stringent than that provided by the usual malloc (etc.) allocation
routines.

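To make the phrase "16-byte aligned" concrete, here is a minimal sketch in standard C11 that tests whether a pointer sits on a given boundary and allocates a buffer on a 16-byte boundary. The helper names is_aligned and alloc_doubles_16 are hypothetical and are not part of FFTW. The sketch uses C11 aligned_alloc only to illustrate the idea and does not claim to show how FFTW itself allocates memory.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if p lies on a multiple of `alignment`
   bytes, 0 otherwise. `alignment` is assumed to be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocate room for n doubles on a 16-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up to the next multiple of 16.
   Returns NULL on failure; release the buffer with free(). */
static double *alloc_doubles_16(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + 15) & ~(size_t)15;
    return aligned_alloc(16, rounded);
}
```

A pointer returned by alloc_doubles_16 passes is_aligned(p, 16), whereas the address 8 bytes past it is only 8-byte aligned.
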
In order to guarantee proper alignment for SIMD, therefore, in case
your program is ever linked against a SIMD-using FFTW, we recommend
allocating your transform data with fftw_malloc and de-allocating it
with fftw_free. These have exactly the same interface and behavior as
malloc/free, except that for a SIMD FFTW they ensure that the returned
pointer has the necessary alignment (by calling memalign or its
equivalent on your OS).

You are not required to use fftw_malloc. You can allocate your data in
any way that you like, from malloc to new (in C++) to a fixed-size
array declaration. If the array happens not to be properly aligned,
FFTW will not use the SIMD extensions.

Since fftw_malloc only ever needs to be used for real and complex
arrays, we provide two convenient wrapper routines fftw_alloc_real(N)
and fftw_alloc_complex(N) that are equivalent to
(double*)fftw_malloc(sizeof(double) * N) and
(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N), respectively (or
their equivalents in other precisions).