cannam@167: @node Other Important Topics, FFTW Reference, Tutorial, Top cannam@167: @chapter Other Important Topics cannam@167: @menu cannam@167: * SIMD alignment and fftw_malloc:: cannam@167: * Multi-dimensional Array Format:: cannam@167: * Words of Wisdom-Saving Plans:: cannam@167: * Caveats in Using Wisdom:: cannam@167: @end menu cannam@167: cannam@167: @c ------------------------------------------------------------ cannam@167: @node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics cannam@167: @section SIMD alignment and fftw_malloc cannam@167: cannam@167: SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of cannam@167: special operations supported by some processors to perform a single cannam@167: operation on several numbers (usually 2 or 4) simultaneously. SIMD cannam@167: floating-point instructions are available on several popular CPUs: cannam@167: SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and cannam@167: VSX on some POWER/PowerPCs, NEON on some ARM models. FFTW can be cannam@167: compiled to support the SIMD instructions on any of these systems. cannam@167: @cindex SIMD cannam@167: @cindex SSE cannam@167: @cindex SSE2 cannam@167: @cindex AVX cannam@167: @cindex AVX2 cannam@167: @cindex AVX512 cannam@167: @cindex AltiVec cannam@167: @cindex VSX cannam@167: @cindex precision cannam@167: cannam@167: cannam@167: A program linking to an FFTW library compiled with SIMD support can cannam@167: obtain a nonnegligible speedup for most complex and r2c/c2r cannam@167: transforms. In order to obtain this speedup, however, the arrays of cannam@167: complex (or real) data passed to FFTW must be specially aligned in cannam@167: memory (typically 16-byte aligned), and often this alignment is more cannam@167: stringent than that provided by the usual @code{malloc} (etc.) cannam@167: allocation routines. cannam@167: cannam@167: @cindex portability cannam@167: In order to guarantee proper alignment for SIMD, therefore, in case cannam@167: your program is ever linked against a SIMD-using FFTW, we recommend cannam@167: allocating your transform data with @code{fftw_malloc} and cannam@167: de-allocating it with @code{fftw_free}. cannam@167: @findex fftw_malloc cannam@167: @findex fftw_free cannam@167: These have exactly the same interface and behavior as cannam@167: @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure cannam@167: that the returned pointer has the necessary alignment (by calling cannam@167: @code{memalign} or its equivalent on your OS). cannam@167: cannam@167: You are not @emph{required} to use @code{fftw_malloc}. You can cannam@167: allocate your data in any way that you like, from @code{malloc} to cannam@167: @code{new} (in C++) to a fixed-size array declaration. If the array cannam@167: happens not to be properly aligned, FFTW will not use the SIMD cannam@167: extensions. cannam@167: @cindex C++ cannam@167: cannam@167: @findex fftw_alloc_real cannam@167: @findex fftw_alloc_complex cannam@167: Since @code{fftw_malloc} only ever needs to be used for real and cannam@167: complex arrays, we provide two convenient wrapper routines cannam@167: @code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are cannam@167: equivalent to @code{(double*)fftw_malloc(sizeof(double) * N)} and cannam@167: @code{(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)}, cannam@167: respectively (or their equivalents in other precisions). cannam@167: cannam@167: @c ------------------------------------------------------------ cannam@167: @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics cannam@167: @section Multi-dimensional Array Format cannam@167: cannam@167: This section describes the format in which multi-dimensional arrays cannam@167: are stored in FFTW. We felt that a detailed discussion of this topic cannam@167: was necessary. Since several different formats are common, this topic cannam@167: is often a source of confusion. cannam@167: cannam@167: @menu cannam@167: * Row-major Format:: cannam@167: * Column-major Format:: cannam@167: * Fixed-size Arrays in C:: cannam@167: * Dynamic Arrays in C:: cannam@167: * Dynamic Arrays in C-The Wrong Way:: cannam@167: @end menu cannam@167: cannam@167: @c =========> cannam@167: @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format cannam@167: @subsection Row-major Format cannam@167: @cindex row-major cannam@167: cannam@167: The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera cannam@167: are expected to be stored as a single contiguous block in cannam@167: @dfn{row-major} order (sometimes called ``C order''). Basically, this cannam@167: means that as you step through adjacent memory locations, the first cannam@167: dimension's index varies most slowly and the last dimension's index cannam@167: varies most quickly. cannam@167: cannam@167: To be more explicit, let us consider an array of rank @math{d} whose cannam@167: dimensions are @ndims{}. Now, we specify a location in the array by a cannam@167: sequence of @math{d} (zero-based) indices, one for each dimension: cannam@167: @tex cannam@167: $(i_0, i_1, i_2, \ldots, i_{d-1})$. cannam@167: @end tex cannam@167: @ifinfo cannam@167: (i[0], i[1], ..., i[d-1]). cannam@167: @end ifinfo cannam@167: @html cannam@167: (i0, i1, i2,..., id-1). cannam@167: @end html cannam@167: If the array is stored in row-major cannam@167: order, then this element is located at the position cannam@167: @tex cannam@167: $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$. cannam@167: @end tex cannam@167: @ifinfo cannam@167: i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])). cannam@167: @end ifinfo cannam@167: @html cannam@167: id-1 + nd-1 * (id-2 + nd-2 * (... + n1 * i0)). cannam@167: @end html cannam@167: cannam@167: Note that, for the ordinary complex DFT, each element of the array cannam@167: must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of cannam@167: (double-precision) numbers. cannam@167: cannam@167: In the advanced FFTW interface, the physical dimensions @math{n} from cannam@167: which the indices are computed can be different from (larger than) cannam@167: the logical dimensions of the transform to be computed, in order to cannam@167: transform a subset of a larger array. cannam@167: @cindex advanced interface cannam@167: Note also that, in the advanced interface, the expression above is cannam@167: multiplied by a @dfn{stride} to get the actual array index---this is cannam@167: useful in situations where each element of the multi-dimensional array cannam@167: is actually a data structure (or another array), and you just want to cannam@167: transform a single field. In the basic interface, however, the stride cannam@167: is 1. cannam@167: @cindex stride cannam@167: cannam@167: @c =========> cannam@167: @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format cannam@167: @subsection Column-major Format cannam@167: @cindex column-major cannam@167: cannam@167: Readers from the Fortran world are used to arrays stored in cannam@167: @dfn{column-major} order (sometimes called ``Fortran order''). This is cannam@167: essentially the exact opposite of row-major order in that, here, the cannam@167: @emph{first} dimension's index varies most quickly. cannam@167: cannam@167: If you have an array stored in column-major order and wish to cannam@167: transform it using FFTW, it is quite easy to do. When creating the cannam@167: plan, simply pass the dimensions of the array to the planner in cannam@167: @emph{reverse order}. For example, if your array is a rank three cannam@167: @code{N x M x L} matrix in column-major order, you should pass the cannam@167: dimensions of the array as if it were an @code{L x M x N} matrix cannam@167: (which it is, from the perspective of FFTW). This is done for you cannam@167: @emph{automatically} by the FFTW legacy-Fortran interface cannam@167: (@pxref{Calling FFTW from Legacy Fortran}), but you must do it cannam@167: manually with the modern Fortran interface (@pxref{Reversing array cannam@167: dimensions}). cannam@167: @cindex Fortran interface cannam@167: cannam@167: @c =========> cannam@167: @node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format cannam@167: @subsection Fixed-size Arrays in C cannam@167: @cindex C multi-dimensional arrays cannam@167: cannam@167: A multi-dimensional array whose size is declared at compile time in C cannam@167: is @emph{already} in row-major order. You don't have to do anything cannam@167: special to transform it. For example: cannam@167: cannam@167: @example cannam@167: @{ cannam@167: fftw_complex data[N0][N1][N2]; cannam@167: fftw_plan plan; cannam@167: ... cannam@167: plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0], cannam@167: FFTW_FORWARD, FFTW_ESTIMATE); cannam@167: ... cannam@167: @} cannam@167: @end example cannam@167: cannam@167: This will plan a 3d in-place transform of size @code{N0 x N1 x N2}. cannam@167: Notice how we took the address of the zero-th element to pass to the cannam@167: planner (we could also have used a typecast). cannam@167: cannam@167: However, we tend to @emph{discourage} users from declaring their cannam@167: arrays in this way, for two reasons. First, this allocates the array cannam@167: on the stack (``automatic'' storage), which has a very limited size on cannam@167: most operating systems (declaring an array with more than a few cannam@167: thousand elements will often cause a crash). (You can get around this cannam@167: limitation on many systems by declaring the array as cannam@167: @code{static} and/or global, but that has its own drawbacks.) cannam@167: Second, it may not optimally align the array for use with a SIMD cannam@167: FFTW (@pxref{SIMD alignment and fftw_malloc}). Instead, we recommend cannam@167: using @code{fftw_malloc}, as described below. cannam@167: cannam@167: @c =========> cannam@167: @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format cannam@167: @subsection Dynamic Arrays in C cannam@167: cannam@167: We recommend allocating most arrays dynamically, with cannam@167: @code{fftw_malloc}. This isn't too hard to do, although it is not as cannam@167: straightforward for multi-dimensional arrays as it is for cannam@167: one-dimensional arrays. cannam@167: cannam@167: Creating the array is simple: using a dynamic-allocation routine like cannam@167: @code{fftw_malloc}, allocate an array big enough to store N cannam@167: @code{fftw_complex} values (for a complex DFT), where N is the product cannam@167: of the sizes of the array dimensions (i.e. the total number of complex cannam@167: values in the array). For example, here is code to allocate a cannam@167: @threedims{5,12,27} rank-3 array: cannam@167: @findex fftw_malloc cannam@167: cannam@167: @example cannam@167: fftw_complex *an_array; cannam@167: an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex)); cannam@167: @end example cannam@167: cannam@167: Accessing the array elements, however, is more tricky---you can't cannam@167: simply use multiple applications of the @samp{[]} operator like you cannam@167: could for fixed-size arrays. Instead, you have to explicitly compute cannam@167: the offset into the array using the formula given earlier for cannam@167: row-major arrays. For example, to reference the @math{(i,j,k)}-th cannam@167: element of the array allocated above, you would use the expression cannam@167: @code{an_array[k + 27 * (j + 12 * i)]}. cannam@167: cannam@167: This pain can be alleviated somewhat by defining appropriate macros, cannam@167: or, in C++, creating a class and overloading the @samp{()} operator. cannam@167: The recent C99 standard provides a way to reinterpret the dynamic cannam@167: array as a ``variable-length'' multi-dimensional array amenable to cannam@167: @samp{[]}, but this feature is not yet widely supported by compilers. cannam@167: @cindex C99 cannam@167: @cindex C++ cannam@167: cannam@167: @c =========> cannam@167: @node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format cannam@167: @subsection Dynamic Arrays in C---The Wrong Way cannam@167: cannam@167: A different method for allocating multi-dimensional arrays in C is cannam@167: often suggested that is incompatible with FFTW: @emph{using it will cannam@167: cause FFTW to die a painful death}. We discuss the technique here, cannam@167: however, because it is so commonly known and used. This method is to cannam@167: create arrays of pointers of arrays of pointers of @dots{}etcetera. cannam@167: For example, the analogue in this method to the example above is: cannam@167: cannam@167: @example cannam@167: int i,j; cannam@167: fftw_complex ***a_bad_array; /* @r{another way to make a 5x12x27 array} */ cannam@167: cannam@167: a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **)); cannam@167: for (i = 0; i < 5; ++i) @{ cannam@167: a_bad_array[i] = cannam@167: (fftw_complex **) malloc(12 * sizeof(fftw_complex *)); cannam@167: for (j = 0; j < 12; ++j) cannam@167: a_bad_array[i][j] = cannam@167: (fftw_complex *) malloc(27 * sizeof(fftw_complex)); cannam@167: @} cannam@167: @end example cannam@167: cannam@167: As you can see, this sort of array is inconvenient to allocate (and cannam@167: deallocate). On the other hand, it has the advantage that the cannam@167: @math{(i,j,k)}-th element can be referenced simply by cannam@167: @code{a_bad_array[i][j][k]}. cannam@167: cannam@167: If you like this technique and want to maximize convenience in accessing cannam@167: the array, but still want to pass the array to FFTW, you can use a cannam@167: hybrid method. Allocate the array as one contiguous block, but also cannam@167: declare an array of arrays of pointers that point to appropriate places cannam@167: in the block. That sort of trick is beyond the scope of this cannam@167: documentation; for more information on multi-dimensional arrays in C, cannam@167: see the @code{comp.lang.c} cannam@167: @uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}. cannam@167: cannam@167: @c ------------------------------------------------------------ cannam@167: @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics cannam@167: @section Words of Wisdom---Saving Plans cannam@167: @cindex wisdom cannam@167: @cindex saving plans to disk cannam@167: cannam@167: FFTW implements a method for saving plans to disk and restoring them. cannam@167: In fact, what FFTW does is more general than just saving and loading cannam@167: plans. The mechanism is called @dfn{wisdom}. Here, we describe cannam@167: this feature at a high level. @xref{FFTW Reference}, for a less casual cannam@167: but more complete discussion of how to use wisdom in FFTW. cannam@167: cannam@167: Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or cannam@167: @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance, cannam@167: but may require a long time to compute because FFTW must measure the cannam@167: runtime of many possible plans and select the best one. This setup is cannam@167: designed for the situations where so many transforms of the same size cannam@167: must be computed that the start-up time is irrelevant. For short cannam@167: initialization times, but slower transforms, we have provided cannam@167: @code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the cannam@167: best of both worlds: you compute a good plan once, save it to cannam@167: disk, and later reload it as many times as necessary. The wisdom cannam@167: mechanism can actually save and reload many plans at once, not just cannam@167: one. cannam@167: @ctindex FFTW_MEASURE cannam@167: @ctindex FFTW_PATIENT cannam@167: @ctindex FFTW_EXHAUSTIVE cannam@167: @ctindex FFTW_ESTIMATE cannam@167: cannam@167: cannam@167: Whenever you create a plan, the FFTW planner accumulates wisdom, which cannam@167: is information sufficient to reconstruct the plan. After planning, cannam@167: you can save this information to disk by means of the function: cannam@167: @example cannam@167: int fftw_export_wisdom_to_filename(const char *filename); cannam@167: @end example cannam@167: @findex fftw_export_wisdom_to_filename cannam@167: (This function returns non-zero on success.) cannam@167: cannam@167: The next time you run the program, you can restore the wisdom with cannam@167: @code{fftw_import_wisdom_from_filename} (which also returns non-zero on success), cannam@167: and then recreate the plan using the same flags as before. cannam@167: @example cannam@167: int fftw_import_wisdom_from_filename(const char *filename); cannam@167: @end example cannam@167: @findex fftw_import_wisdom_from_filename cannam@167: cannam@167: Wisdom is automatically used for any size to which it is applicable, as cannam@167: long as the planner flags are not more ``patient'' than those with which cannam@167: the wisdom was created. For example, wisdom created with cannam@167: @code{FFTW_MEASURE} can be used if you later plan with cannam@167: @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with cannam@167: @code{FFTW_PATIENT}. cannam@167: cannam@167: The @code{wisdom} is cumulative, and is stored in a global, private cannam@167: data structure managed internally by FFTW. The storage space required cannam@167: is minimal, proportional to the logarithm of the sizes the wisdom was cannam@167: generated from. If memory usage is a concern, however, the wisdom can cannam@167: be forgotten and its associated memory freed by calling: cannam@167: @example cannam@167: void fftw_forget_wisdom(void); cannam@167: @end example cannam@167: @findex fftw_forget_wisdom cannam@167: cannam@167: Wisdom can be exported to a file, a string, or any other medium. cannam@167: For details, see @ref{Wisdom}. cannam@167: cannam@167: @node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics cannam@167: @section Caveats in Using Wisdom cannam@167: @cindex wisdom, problems with cannam@167: cannam@167: @quotation cannam@167: @html cannam@167: cannam@167: @end html cannam@167: For in much wisdom is much grief, and he that increaseth knowledge cannam@167: increaseth sorrow. cannam@167: @html cannam@167: cannam@167: @end html cannam@167: [Ecclesiastes 1:18] cannam@167: @cindex Ecclesiastes cannam@167: @end quotation cannam@167: @iftex cannam@167: @medskip cannam@167: @end iftex cannam@167: cannam@167: @cindex portability cannam@167: There are pitfalls to using wisdom, in that it can negate FFTW's cannam@167: ability to adapt to changing hardware and other conditions. For cannam@167: example, it would be perfectly possible to export wisdom from a cannam@167: program running on one processor and import it into a program running cannam@167: on another processor. Doing so, however, would mean that the second cannam@167: program would use plans optimized for the first processor, instead of cannam@167: the one it is running on. cannam@167: cannam@167: It should be safe to reuse wisdom as long as the hardware and program cannam@167: binaries remain unchanged. (Actually, the optimal plan may change even cannam@167: between runs of the same binary on identical hardware, due to cannam@167: differences in the virtual memory environment, etcetera. Users cannam@167: seriously interested in performance should worry about this problem, cannam@167: too.) It is likely that, if the same wisdom is used for two cannam@167: different program binaries, even running on the same machine, the cannam@167: plans may be sub-optimal because of differing code alignments. It is cannam@167: therefore wise to recreate wisdom every time an application is cannam@167: recompiled. The more the underlying hardware and software changes cannam@167: between the creation of wisdom and its use, the greater grows cannam@167: the risk of sub-optimal plans. cannam@167: cannam@167: Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or cannam@167: using possibly-suboptimal wisdom (created on the same machine, but for a cannam@167: different binary), the wisdom is likely to be better. For this reason, cannam@167: we provide a function to import wisdom from a standard system-wide cannam@167: location (@code{/etc/fftw/wisdom} on Unix): cannam@167: @cindex wisdom, system-wide cannam@167: cannam@167: @example cannam@167: int fftw_import_system_wisdom(void); cannam@167: @end example cannam@167: @findex fftw_import_system_wisdom cannam@167: cannam@167: FFTW also provides a standalone program, @code{fftw-wisdom} (described cannam@167: by its own @code{man} page on Unix) with which users can create wisdom, cannam@167: e.g. for a canonical set of sizes to store in the system wisdom file. cannam@167: @xref{Wisdom Utilities}. cannam@167: @cindex fftw-wisdom utility cannam@167: