SIMD alignment and fftw_malloc

3.1 SIMD alignment and fftw_malloc

SIMD, which stands for “Single Instruction Multiple Data,” is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision) on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be compiled to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can obtain a nonnegligible speedup for most complex and r2c/c2r transforms. In order to obtain this speedup, however, the arrays of complex (or real) data passed to FFTW must be specially aligned in memory (typically 16-byte aligned), and often this alignment is more stringent than that provided by the usual malloc (etc.) allocation routines.
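To make "16-byte aligned" concrete, here is a minimal sketch in standard C of how one might test whether a pointer sits on a given boundary. The helper name is_aligned is hypothetical and is not part of FFTW; it only illustrates the idea that an aligned address has its low bits equal to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an FFTW function: returns 1 if p lies on an
   `alignment`-byte boundary and 0 otherwise.
   `alignment` must be a nonzero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* An address is aligned when its low bits are all zero. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling is_aligned(buf, 16) on a buffer reports whether it meets the typical 16-byte requirement; the exact requirement depends on the SIMD instruction set that FFTW was built for.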

In order to guarantee proper alignment for SIMD, therefore, in case your program is ever linked against a SIMD-using FFTW, we recommend allocating your transform data with fftw_malloc and de-allocating it with fftw_free. These have exactly the same interface and behavior as malloc/free, except that for a SIMD FFTW they ensure that the returned pointer has the necessary alignment (by calling memalign or its equivalent on your OS).
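The following sketch shows what an aligned allocator of this general kind can look like. It is an illustration only and is not FFTW's actual implementation. It assumes a C11 library that provides aligned_alloc (not every platform does) and a fixed 16-byte boundary; the helper name alloc_aligned16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an aligned allocator (NOT FFTW's code).
   Returns a 16-byte-aligned block of at least nbytes bytes, or NULL on
   failure. Memory obtained here is released with free(). */
void *alloc_aligned16(size_t nbytes)
{
    const size_t alignment = 16;

    /* Guard against overflow when rounding the size up. */
    if (nbytes > SIZE_MAX - (alignment - 1))
        return NULL;

    /* C11 aligned_alloc expects the size to be a multiple of the
       alignment, so round up (and never request zero bytes). */
    size_t rounded = (nbytes + alignment - 1) / alignment * alignment;
    if (rounded == 0)
        rounded = alignment;

    void *p = aligned_alloc(alignment, rounded);
    assert(p == NULL || (uintptr_t)p % alignment == 0);
    return p;
}
```

In a program that uses FFTW there is no need to write such a routine: fftw_malloc picks the alignment appropriate for the library build, and its result must be released with fftw_free rather than free.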

You are not required to use fftw_malloc. You can allocate your data in any way that you like, from malloc to new (in C++) to a fixed-size array declaration. If the array happens not to be properly aligned, FFTW will not use the SIMD extensions.

Since fftw_malloc only ever needs to be used for real and complex arrays, we provide two convenient wrapper routines fftw_alloc_real(N) and fftw_alloc_complex(N) that are equivalent to (double*)fftw_malloc(sizeof(double) * N) and (fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N), respectively (or their equivalents in other precisions).