Next: Multi-dimensional Array Format, Previous: Other Important Topics, Up: Other Important Topics [Contents][Index]

SIMD, which stands for “Single Instruction Multiple Data,” is a set of
special operations supported by some processors to perform a single
operation on several numbers (usually 2 or 4) simultaneously. SIMD
floating-point instructions are available on several popular CPUs:
SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
VSX on some POWER/PowerPCs, NEON on some ARM models. FFTW can be
compiled to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can
obtain a non-negligible speedup for most complex and r2c/c2r
transforms. To obtain this speedup, however, the arrays of
complex (or real) data passed to FFTW must be specially aligned in
memory (typically 16-byte aligned), and this alignment is often more
stringent than that provided by the usual malloc (etc.)
allocation routines.

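As a rough illustration of what "16-byte aligned" means, the following C
sketch tests whether an address is a multiple of a given alignment. The
helper name is_aligned is hypothetical and is not part of FFTW, and 16 is
only the typical figure quoted above; the alignment a particular SIMD
build actually needs may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFTW): returns 1 if the address p is
   a multiple of `alignment` bytes, 0 otherwise.
   `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power-of-two alignment, the low bits of an aligned address
       are all zero. */
    return ((uintptr_t) p & (alignment - 1)) == 0;
}
```

Calling is_aligned(ptr, 16) on a pointer returned by malloc may or may not
return 1, depending on the platform, which is why the text above does not
rely on malloc for SIMD alignment.
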
To guarantee proper alignment for SIMD, therefore, in case
your program is ever linked against a SIMD-using FFTW, we recommend
allocating your transform data with fftw_malloc and
de-allocating it with fftw_free.
These have exactly the same interface and behavior as
malloc/free, except that for a SIMD FFTW they ensure
that the returned pointer has the necessary alignment (by calling
memalign or its equivalent on your OS).

You are not required to use fftw_malloc. You can
allocate your data in any way that you like, from malloc to
new (in C++) to a fixed-size array declaration. If the array
happens not to be properly aligned, FFTW will not use the SIMD
extensions.

Since fftw_malloc only ever needs to be used for real and
complex arrays, we provide two convenient wrapper routines,
fftw_alloc_real(N) and fftw_alloc_complex(N), that are
equivalent to (double*)fftw_malloc(sizeof(double) * N) and
(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N),
respectively (or their equivalents in other precisions).
