Mercurial > hg > sv-dependency-builds
diff src/fftw-3.3.3/doc/other.texi @ 95:89f5e221ed7b
Add FFTW3
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Wed, 20 Mar 2013 15:35:50 +0000 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/fftw-3.3.3/doc/other.texi Wed Mar 20 15:35:50 2013 +0000 @@ -0,0 +1,398 @@ +@node Other Important Topics, FFTW Reference, Tutorial, Top +@chapter Other Important Topics +@menu +* SIMD alignment and fftw_malloc:: +* Multi-dimensional Array Format:: +* Words of Wisdom-Saving Plans:: +* Caveats in Using Wisdom:: +@end menu + +@c ------------------------------------------------------------ +@node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics +@section SIMD alignment and fftw_malloc + +SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of +special operations supported by some processors to perform a single +operation on several numbers (usually 2 or 4) simultaneously. SIMD +floating-point instructions are available on several popular CPUs: +SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision) +on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and MIPS Paired Single +(currently only in FFTW 3.2.x). FFTW can be compiled to support the +SIMD instructions on any of these systems. +@cindex SIMD +@cindex SSE +@cindex SSE2 +@cindex AVX +@cindex AltiVec +@cindex MIPS PS +@cindex precision + + +A program linking to an FFTW library compiled with SIMD support can +obtain a nonnegligible speedup for most complex and r2c/c2r +transforms. In order to obtain this speedup, however, the arrays of +complex (or real) data passed to FFTW must be specially aligned in +memory (typically 16-byte aligned), and often this alignment is more +stringent than that provided by the usual @code{malloc} (etc.) +allocation routines. + +@cindex portability +In order to guarantee proper alignment for SIMD, therefore, in case +your program is ever linked against a SIMD-using FFTW, we recommend +allocating your transform data with @code{fftw_malloc} and +de-allocating it with @code{fftw_free}. +@findex fftw_malloc +@findex fftw_free +These have exactly the same interface and behavior as +@code{malloc}/@code{free}, except that for a SIMD FFTW they ensure +that the returned pointer has the necessary alignment (by calling +@code{memalign} or its equivalent on your OS). + +You are not @emph{required} to use @code{fftw_malloc}. You can +allocate your data in any way that you like, from @code{malloc} to +@code{new} (in C++) to a fixed-size array declaration. If the array +happens not to be properly aligned, FFTW will not use the SIMD +extensions. +@cindex C++ + +@findex fftw_alloc_real +@findex fftw_alloc_complex +Since @code{fftw_malloc} only ever needs to be used for real and +complex arrays, we provide two convenient wrapper routines +@code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are +equivalent to @code{(double*)fftw_malloc(sizeof(double) * N)} and +@code{(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)}, +respectively (or their equivalents in other precisions). + +@c ------------------------------------------------------------ +@node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics +@section Multi-dimensional Array Format + +This section describes the format in which multi-dimensional arrays +are stored in FFTW. We felt that a detailed discussion of this topic +was necessary. Since several different formats are common, this topic +is often a source of confusion. + +@menu +* Row-major Format:: +* Column-major Format:: +* Fixed-size Arrays in C:: +* Dynamic Arrays in C:: +* Dynamic Arrays in C-The Wrong Way:: +@end menu + +@c =========> +@node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format +@subsection Row-major Format +@cindex row-major + +The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera +are expected to be stored as a single contiguous block in +@dfn{row-major} order (sometimes called ``C order''). Basically, this +means that as you step through adjacent memory locations, the first +dimension's index varies most slowly and the last dimension's index +varies most quickly. + +To be more explicit, let us consider an array of rank @math{d} whose +dimensions are @ndims{}. Now, we specify a location in the array by a +sequence of @math{d} (zero-based) indices, one for each dimension: +@tex +$(i_0, i_1, i_2, \ldots, i_{d-1})$. +@end tex +@ifinfo +(i[0], i[1], ..., i[d-1]). +@end ifinfo +@html +(i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>). +@end html +If the array is stored in row-major +order, then this element is located at the position +@tex +$i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$. +@end tex +@ifinfo +i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])). +@end ifinfo +@html +i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)). +@end html + +Note that, for the ordinary complex DFT, each element of the array +must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of +(double-precision) numbers. + +In the advanced FFTW interface, the physical dimensions @math{n} from +which the indices are computed can be different from (larger than) +the logical dimensions of the transform to be computed, in order to +transform a subset of a larger array. +@cindex advanced interface +Note also that, in the advanced interface, the expression above is +multiplied by a @dfn{stride} to get the actual array index---this is +useful in situations where each element of the multi-dimensional array +is actually a data structure (or another array), and you just want to +transform a single field. In the basic interface, however, the stride +is 1. +@cindex stride + +@c =========> +@node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format +@subsection Column-major Format +@cindex column-major + +Readers from the Fortran world are used to arrays stored in +@dfn{column-major} order (sometimes called ``Fortran order''). This is +essentially the exact opposite of row-major order in that, here, the +@emph{first} dimension's index varies most quickly. + +If you have an array stored in column-major order and wish to +transform it using FFTW, it is quite easy to do. When creating the +plan, simply pass the dimensions of the array to the planner in +@emph{reverse order}. For example, if your array is a rank three +@code{N x M x L} matrix in column-major order, you should pass the +dimensions of the array as if it were an @code{L x M x N} matrix +(which it is, from the perspective of FFTW). This is done for you +@emph{automatically} by the FFTW legacy-Fortran interface +(@pxref{Calling FFTW from Legacy Fortran}), but you must do it +manually with the modern Fortran interface (@pxref{Reversing array +dimensions}). +@cindex Fortran interface + +@c =========> +@node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format +@subsection Fixed-size Arrays in C +@cindex C multi-dimensional arrays + +A multi-dimensional array whose size is declared at compile time in C +is @emph{already} in row-major order. You don't have to do anything +special to transform it. For example: + +@example +@{ + fftw_complex data[N0][N1][N2]; + fftw_plan plan; + ... + plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0], + FFTW_FORWARD, FFTW_ESTIMATE); + ... +@} +@end example + +This will plan a 3d in-place transform of size @code{N0 x N1 x N2}. +Notice how we took the address of the zero-th element to pass to the +planner (we could also have used a typecast). + +However, we tend to @emph{discourage} users from declaring their +arrays in this way, for two reasons. First, this allocates the array +on the stack (``automatic'' storage), which has a very limited size on +most operating systems (declaring an array with more than a few +thousand elements will often cause a crash). (You can get around this +limitation on many systems by declaring the array as +@code{static} and/or global, but that has its own drawbacks.) +Second, it may not optimally align the array for use with a SIMD +FFTW (@pxref{SIMD alignment and fftw_malloc}). Instead, we recommend +using @code{fftw_malloc}, as described below. + +@c =========> +@node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format +@subsection Dynamic Arrays in C + +We recommend allocating most arrays dynamically, with +@code{fftw_malloc}. This isn't too hard to do, although it is not as +straightforward for multi-dimensional arrays as it is for +one-dimensional arrays. + +Creating the array is simple: using a dynamic-allocation routine like +@code{fftw_malloc}, allocate an array big enough to store N +@code{fftw_complex} values (for a complex DFT), where N is the product +of the sizes of the array dimensions (i.e. the total number of complex +values in the array). For example, here is code to allocate a +@threedims{5,12,27} rank-3 array: +@findex fftw_malloc + +@example +fftw_complex *an_array; +an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex)); +@end example + +Accessing the array elements, however, is more tricky---you can't +simply use multiple applications of the @samp{[]} operator like you +could for fixed-size arrays. Instead, you have to explicitly compute +the offset into the array using the formula given earlier for +row-major arrays. For example, to reference the @math{(i,j,k)}-th +element of the array allocated above, you would use the expression +@code{an_array[k + 27 * (j + 12 * i)]}. + +This pain can be alleviated somewhat by defining appropriate macros, +or, in C++, creating a class and overloading the @samp{()} operator. +The recent C99 standard provides a way to reinterpret the dynamic +array as a ``variable-length'' multi-dimensional array amenable to +@samp{[]}, but this feature is not yet widely supported by compilers. +@cindex C99 +@cindex C++ + +@c =========> +@node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format +@subsection Dynamic Arrays in C---The Wrong Way + +A different method for allocating multi-dimensional arrays in C is +often suggested that is incompatible with FFTW: @emph{using it will +cause FFTW to die a painful death}. We discuss the technique here, +however, because it is so commonly known and used. This method is to +create arrays of pointers of arrays of pointers of @dots{}etcetera. +For example, the analogue in this method to the example above is: + +@example +int i,j; +fftw_complex ***a_bad_array; /* @r{another way to make a 5x12x27 array} */ + +a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **)); +for (i = 0; i < 5; ++i) @{ + a_bad_array[i] = + (fftw_complex **) malloc(12 * sizeof(fftw_complex *)); + for (j = 0; j < 12; ++j) + a_bad_array[i][j] = + (fftw_complex *) malloc(27 * sizeof(fftw_complex)); +@} +@end example + +As you can see, this sort of array is inconvenient to allocate (and +deallocate). On the other hand, it has the advantage that the +@math{(i,j,k)}-th element can be referenced simply by +@code{a_bad_array[i][j][k]}. + +If you like this technique and want to maximize convenience in accessing +the array, but still want to pass the array to FFTW, you can use a +hybrid method. Allocate the array as one contiguous block, but also +declare an array of arrays of pointers that point to appropriate places +in the block. That sort of trick is beyond the scope of this +documentation; for more information on multi-dimensional arrays in C, +see the @code{comp.lang.c} +@uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}. + +@c ------------------------------------------------------------ +@node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics +@section Words of Wisdom---Saving Plans +@cindex wisdom +@cindex saving plans to disk + +FFTW implements a method for saving plans to disk and restoring them. +In fact, what FFTW does is more general than just saving and loading +plans. The mechanism is called @dfn{wisdom}. Here, we describe +this feature at a high level. @xref{FFTW Reference}, for a less casual +but more complete discussion of how to use wisdom in FFTW. + +Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or +@code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance, +but may require a long time to compute because FFTW must measure the +runtime of many possible plans and select the best one. This setup is +designed for the situations where so many transforms of the same size +must be computed that the start-up time is irrelevant. For short +initialization times, but slower transforms, we have provided +@code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the +best of both worlds: you compute a good plan once, save it to +disk, and later reload it as many times as necessary. The wisdom +mechanism can actually save and reload many plans at once, not just +one. +@ctindex FFTW_MEASURE +@ctindex FFTW_PATIENT +@ctindex FFTW_EXHAUSTIVE +@ctindex FFTW_ESTIMATE + + +Whenever you create a plan, the FFTW planner accumulates wisdom, which +is information sufficient to reconstruct the plan. After planning, +you can save this information to disk by means of the function: +@example +int fftw_export_wisdom_to_filename(const char *filename); +@end example +@findex fftw_export_wisdom_to_filename +(This function returns non-zero on success.) + +The next time you run the program, you can restore the wisdom with +@code{fftw_import_wisdom_from_filename} (which also returns non-zero on success), +and then recreate the plan using the same flags as before. +@example +int fftw_import_wisdom_from_filename(const char *filename); +@end example +@findex fftw_import_wisdom_from_filename + +Wisdom is automatically used for any size to which it is applicable, as +long as the planner flags are not more ``patient'' than those with which +the wisdom was created. For example, wisdom created with +@code{FFTW_MEASURE} can be used if you later plan with +@code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with +@code{FFTW_PATIENT}. + +The @code{wisdom} is cumulative, and is stored in a global, private +data structure managed internally by FFTW. The storage space required +is minimal, proportional to the logarithm of the sizes the wisdom was +generated from. If memory usage is a concern, however, the wisdom can +be forgotten and its associated memory freed by calling: +@example +void fftw_forget_wisdom(void); +@end example +@findex fftw_forget_wisdom + +Wisdom can be exported to a file, a string, or any other medium. +For details, see @ref{Wisdom}. + +@node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics +@section Caveats in Using Wisdom +@cindex wisdom, problems with + +@quotation +@html +<i> +@end html +For in much wisdom is much grief, and he that increaseth knowledge +increaseth sorrow. +@html +</i> +@end html +[Ecclesiastes 1:18] +@cindex Ecclesiastes +@end quotation +@iftex +@medskip +@end iftex + +@cindex portability +There are pitfalls to using wisdom, in that it can negate FFTW's +ability to adapt to changing hardware and other conditions. For +example, it would be perfectly possible to export wisdom from a +program running on one processor and import it into a program running +on another processor. Doing so, however, would mean that the second +program would use plans optimized for the first processor, instead of +the one it is running on. + +It should be safe to reuse wisdom as long as the hardware and program +binaries remain unchanged. (Actually, the optimal plan may change even +between runs of the same binary on identical hardware, due to +differences in the virtual memory environment, etcetera. Users +seriously interested in performance should worry about this problem, +too.) It is likely that, if the same wisdom is used for two +different program binaries, even running on the same machine, the +plans may be sub-optimal because of differing code alignments. It is +therefore wise to recreate wisdom every time an application is +recompiled. The more the underlying hardware and software changes +between the creation of wisdom and its use, the greater grows +the risk of sub-optimal plans. + +Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or +using possibly-suboptimal wisdom (created on the same machine, but for a +different binary), the wisdom is likely to be better. For this reason, +we provide a function to import wisdom from a standard system-wide +location (@code{/etc/fftw/wisdom} on Unix): +@cindex wisdom, system-wide + +@example +int fftw_import_system_wisdom(void); +@end example +@findex fftw_import_system_wisdom + +FFTW also provides a standalone program, @code{fftw-wisdom} (described +by its own @code{man} page on Unix) with which users can create wisdom, +e.g. for a canonical set of sizes to store in the system wisdom file. +@xref{Wisdom Utilities}. +@cindex fftw-wisdom utility +