FFTW 3.3.5: SIMD alignment and fftw_malloc

3.1 SIMD alignment and fftw_malloc

SIMD, which stands for “Single Instruction Multiple Data,” is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and VSX on some POWER/PowerPCs, NEON on some ARM models. FFTW can be compiled to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can obtain a nonnegligible speedup for most complex and r2c/c2r transforms. In order to obtain this speedup, however, the arrays of complex (or real) data passed to FFTW must be specially aligned in memory (typically 16-byte aligned), and often this alignment is more stringent than that provided by the usual malloc (etc.) allocation routines.
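
As a rough illustration of what "16-byte aligned" means, the following sketch checks whether a pointer's address is a multiple of a given power-of-two alignment. The helper name is_aligned is hypothetical and is not part of the FFTW API, and 16 bytes is only the typical figure quoted above; the actual requirement depends on the SIMD instruction set FFTW was built for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFTW function): returns 1 if the address
   held in p is a multiple of `alignment`, which must be a power of two,
   and 0 otherwise. */
static int is_aligned(const void *p, size_t alignment)
{
    /* A power of two has exactly one bit set. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* The low bits of an aligned address are all zero. */
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```

Calling is_aligned(ptr, 16) on a pointer obtained from plain malloc may or may not return 1, depending on the platform's allocator, which is why the result should not be relied upon for transform data.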

In order to guarantee proper alignment for SIMD, therefore, in case your program is ever linked against a SIMD-using FFTW, we recommend allocating your transform data with fftw_malloc and de-allocating it with fftw_free. These have exactly the same interface and behavior as malloc/free, except that for a SIMD FFTW they ensure that the returned pointer has the necessary alignment (by calling memalign or its equivalent on your OS).

You are not required to use fftw_malloc. You can allocate your data in any way that you like, from malloc to new (in C++) to a fixed-size array declaration. If the array happens not to be properly aligned, FFTW will not use the SIMD extensions.

Since fftw_malloc only ever needs to be used for real and complex arrays, we provide two convenient wrapper routines fftw_alloc_real(N) and fftw_alloc_complex(N) that are equivalent to (double*)fftw_malloc(sizeof(double) * N) and (fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N), respectively (or their equivalents in other precisions).
