comparison src/fftw-3.3.3/doc/other.texi @ 95:89f5e221ed7b

Add FFTW3
author Chris Cannam <cannam@all-day-breakfast.com>
date Wed, 20 Mar 2013 15:35:50 +0000
parents
children
comparison
equal deleted inserted replaced
94:d278df1123f9 95:89f5e221ed7b
1 @node Other Important Topics, FFTW Reference, Tutorial, Top
2 @chapter Other Important Topics
3 @menu
4 * SIMD alignment and fftw_malloc::
5 * Multi-dimensional Array Format::
6 * Words of Wisdom-Saving Plans::
7 * Caveats in Using Wisdom::
8 @end menu
9
10 @c ------------------------------------------------------------
11 @node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics
12 @section SIMD alignment and fftw_malloc
13
14 SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of
15 special operations supported by some processors to perform a single
16 operation on several numbers (usually 2 or 4) simultaneously. SIMD
17 floating-point instructions are available on several popular CPUs:
18 SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision)
19 on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and MIPS Paired Single
20 (currently only in FFTW 3.2.x). FFTW can be compiled to support the
21 SIMD instructions on any of these systems.
22 @cindex SIMD
23 @cindex SSE
24 @cindex SSE2
25 @cindex AVX
26 @cindex AltiVec
27 @cindex MIPS PS
28 @cindex precision
29
30
31 A program linking to an FFTW library compiled with SIMD support can
32 obtain a nonnegligible speedup for most complex and r2c/c2r
33 transforms. In order to obtain this speedup, however, the arrays of
34 complex (or real) data passed to FFTW must be specially aligned in
35 memory (typically 16-byte aligned), and often this alignment is more
36 stringent than that provided by the usual @code{malloc} (etc.)
37 allocation routines.
38
39 @cindex portability
40 In order to guarantee proper alignment for SIMD, therefore, in case
41 your program is ever linked against a SIMD-using FFTW, we recommend
42 allocating your transform data with @code{fftw_malloc} and
43 de-allocating it with @code{fftw_free}.
44 @findex fftw_malloc
45 @findex fftw_free
46 These have exactly the same interface and behavior as
47 @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure
48 that the returned pointer has the necessary alignment (by calling
49 @code{memalign} or its equivalent on your OS).
50
51 You are not @emph{required} to use @code{fftw_malloc}. You can
52 allocate your data in any way that you like, from @code{malloc} to
53 @code{new} (in C++) to a fixed-size array declaration. If the array
54 happens not to be properly aligned, FFTW will not use the SIMD
55 extensions.
56 @cindex C++
57
58 @findex fftw_alloc_real
59 @findex fftw_alloc_complex
60 Since @code{fftw_malloc} only ever needs to be used for real and
61 complex arrays, we provide two convenient wrapper routines
62 @code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are
63 equivalent to @code{(double*)fftw_malloc(sizeof(double) * N)} and
64 @code{(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)},
65 respectively (or their equivalents in other precisions).
66
67 @c ------------------------------------------------------------
68 @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics
69 @section Multi-dimensional Array Format
70
71 This section describes the format in which multi-dimensional arrays
72 are stored in FFTW. We felt that a detailed discussion of this topic
73 was necessary. Since several different formats are common, this topic
74 is often a source of confusion.
75
76 @menu
77 * Row-major Format::
78 * Column-major Format::
79 * Fixed-size Arrays in C::
80 * Dynamic Arrays in C::
81 * Dynamic Arrays in C-The Wrong Way::
82 @end menu
83
84 @c =========>
85 @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format
86 @subsection Row-major Format
87 @cindex row-major
88
89 The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera
90 are expected to be stored as a single contiguous block in
91 @dfn{row-major} order (sometimes called ``C order''). Basically, this
92 means that as you step through adjacent memory locations, the first
93 dimension's index varies most slowly and the last dimension's index
94 varies most quickly.
95
96 To be more explicit, let us consider an array of rank @math{d} whose
97 dimensions are @ndims{}. Now, we specify a location in the array by a
98 sequence of @math{d} (zero-based) indices, one for each dimension:
99 @tex
100 $(i_0, i_1, i_2, \ldots, i_{d-1})$.
101 @end tex
102 @ifinfo
103 (i[0], i[1], ..., i[d-1]).
104 @end ifinfo
105 @html
106 (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>).
107 @end html
108 If the array is stored in row-major
109 order, then this element is located at the position
110 @tex
111 $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$.
112 @end tex
113 @ifinfo
114 i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).
115 @end ifinfo
116 @html
117 i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)).
118 @end html
119
120 Note that, for the ordinary complex DFT, each element of the array
121 must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of
122 (double-precision) numbers.
123
124 In the advanced FFTW interface, the physical dimensions @math{n} from
125 which the indices are computed can be different from (larger than)
126 the logical dimensions of the transform to be computed, in order to
127 transform a subset of a larger array.
128 @cindex advanced interface
129 Note also that, in the advanced interface, the expression above is
130 multiplied by a @dfn{stride} to get the actual array index---this is
131 useful in situations where each element of the multi-dimensional array
132 is actually a data structure (or another array), and you just want to
133 transform a single field. In the basic interface, however, the stride
134 is 1.
135 @cindex stride
136
137 @c =========>
138 @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format
139 @subsection Column-major Format
140 @cindex column-major
141
142 Readers from the Fortran world are used to arrays stored in
143 @dfn{column-major} order (sometimes called ``Fortran order''). This is
144 essentially the exact opposite of row-major order in that, here, the
145 @emph{first} dimension's index varies most quickly.
146
147 If you have an array stored in column-major order and wish to
148 transform it using FFTW, it is quite easy to do. When creating the
149 plan, simply pass the dimensions of the array to the planner in
150 @emph{reverse order}. For example, if your array is a rank three
151 @code{N x M x L} matrix in column-major order, you should pass the
152 dimensions of the array as if it were an @code{L x M x N} matrix
153 (which it is, from the perspective of FFTW). This is done for you
154 @emph{automatically} by the FFTW legacy-Fortran interface
155 (@pxref{Calling FFTW from Legacy Fortran}), but you must do it
156 manually with the modern Fortran interface (@pxref{Reversing array
157 dimensions}).
158 @cindex Fortran interface
159
160 @c =========>
161 @node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
162 @subsection Fixed-size Arrays in C
163 @cindex C multi-dimensional arrays
164
165 A multi-dimensional array whose size is declared at compile time in C
166 is @emph{already} in row-major order. You don't have to do anything
167 special to transform it. For example:
168
169 @example
170 @{
171 fftw_complex data[N0][N1][N2];
172 fftw_plan plan;
173 ...
174 plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
175 FFTW_FORWARD, FFTW_ESTIMATE);
176 ...
177 @}
178 @end example
179
180 This will plan a 3d in-place transform of size @code{N0 x N1 x N2}.
181 Notice how we took the address of the zero-th element to pass to the
182 planner (we could also have used a typecast).
183
184 However, we tend to @emph{discourage} users from declaring their
185 arrays in this way, for two reasons. First, this allocates the array
186 on the stack (``automatic'' storage), which has a very limited size on
187 most operating systems (declaring an array with more than a few
188 thousand elements will often cause a crash). (You can get around this
189 limitation on many systems by declaring the array as
190 @code{static} and/or global, but that has its own drawbacks.)
191 Second, it may not optimally align the array for use with a SIMD
192 FFTW (@pxref{SIMD alignment and fftw_malloc}). Instead, we recommend
193 using @code{fftw_malloc}, as described below.
194
195 @c =========>
196 @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format
197 @subsection Dynamic Arrays in C
198
199 We recommend allocating most arrays dynamically, with
200 @code{fftw_malloc}. This isn't too hard to do, although it is not as
201 straightforward for multi-dimensional arrays as it is for
202 one-dimensional arrays.
203
204 Creating the array is simple: using a dynamic-allocation routine like
205 @code{fftw_malloc}, allocate an array big enough to store N
206 @code{fftw_complex} values (for a complex DFT), where N is the product
207 of the sizes of the array dimensions (i.e. the total number of complex
208 values in the array). For example, here is code to allocate a
209 @threedims{5,12,27} rank-3 array:
210 @findex fftw_malloc
211
212 @example
213 fftw_complex *an_array;
214 an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
215 @end example
216
217 Accessing the array elements, however, is more tricky---you can't
218 simply use multiple applications of the @samp{[]} operator like you
219 could for fixed-size arrays. Instead, you have to explicitly compute
220 the offset into the array using the formula given earlier for
221 row-major arrays. For example, to reference the @math{(i,j,k)}-th
222 element of the array allocated above, you would use the expression
223 @code{an_array[k + 27 * (j + 12 * i)]}.
224
225 This pain can be alleviated somewhat by defining appropriate macros,
226 or, in C++, creating a class and overloading the @samp{()} operator.
227 The recent C99 standard provides a way to reinterpret the dynamic
228 array as a ``variable-length'' multi-dimensional array amenable to
229 @samp{[]}, but this feature is not yet widely supported by compilers.
230 @cindex C99
231 @cindex C++
232
233 @c =========>
234 @node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format
235 @subsection Dynamic Arrays in C---The Wrong Way
236
237 A different method for allocating multi-dimensional arrays in C is
238 often suggested that is incompatible with FFTW: @emph{using it will
239 cause FFTW to die a painful death}. We discuss the technique here,
240 however, because it is so commonly known and used. This method is to
241 create arrays of pointers of arrays of pointers of @dots{}etcetera.
242 For example, the analogue in this method to the example above is:
243
244 @example
245 int i,j;
246 fftw_complex ***a_bad_array; /* @r{another way to make a 5x12x27 array} */
247
248 a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
249 for (i = 0; i < 5; ++i) @{
250 a_bad_array[i] =
251 (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
252 for (j = 0; j < 12; ++j)
253 a_bad_array[i][j] =
254 (fftw_complex *) malloc(27 * sizeof(fftw_complex));
255 @}
256 @end example
257
258 As you can see, this sort of array is inconvenient to allocate (and
259 deallocate). On the other hand, it has the advantage that the
260 @math{(i,j,k)}-th element can be referenced simply by
261 @code{a_bad_array[i][j][k]}.
262
263 If you like this technique and want to maximize convenience in accessing
264 the array, but still want to pass the array to FFTW, you can use a
265 hybrid method. Allocate the array as one contiguous block, but also
266 declare an array of arrays of pointers that point to appropriate places
267 in the block. That sort of trick is beyond the scope of this
268 documentation; for more information on multi-dimensional arrays in C,
269 see the @code{comp.lang.c}
270 @uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}.
271
272 @c ------------------------------------------------------------
273 @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics
274 @section Words of Wisdom---Saving Plans
275 @cindex wisdom
276 @cindex saving plans to disk
277
278 FFTW implements a method for saving plans to disk and restoring them.
279 In fact, what FFTW does is more general than just saving and loading
280 plans. The mechanism is called @dfn{wisdom}. Here, we describe
281 this feature at a high level. @xref{FFTW Reference}, for a less casual
282 but more complete discussion of how to use wisdom in FFTW.
283
284 Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or
285 @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance,
286 but may require a long time to compute because FFTW must measure the
287 runtime of many possible plans and select the best one. This setup is
288 designed for the situations where so many transforms of the same size
289 must be computed that the start-up time is irrelevant. For short
290 initialization times, but slower transforms, we have provided
291 @code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the
292 best of both worlds: you compute a good plan once, save it to
293 disk, and later reload it as many times as necessary. The wisdom
294 mechanism can actually save and reload many plans at once, not just
295 one.
296 @ctindex FFTW_MEASURE
297 @ctindex FFTW_PATIENT
298 @ctindex FFTW_EXHAUSTIVE
299 @ctindex FFTW_ESTIMATE
300
301
302 Whenever you create a plan, the FFTW planner accumulates wisdom, which
303 is information sufficient to reconstruct the plan. After planning,
304 you can save this information to disk by means of the function:
305 @example
306 int fftw_export_wisdom_to_filename(const char *filename);
307 @end example
308 @findex fftw_export_wisdom_to_filename
309 (This function returns non-zero on success.)
310
311 The next time you run the program, you can restore the wisdom with
312 @code{fftw_import_wisdom_from_filename} (which also returns non-zero on success),
313 and then recreate the plan using the same flags as before.
314 @example
315 int fftw_import_wisdom_from_filename(const char *filename);
316 @end example
317 @findex fftw_import_wisdom_from_filename
318
319 Wisdom is automatically used for any size to which it is applicable, as
320 long as the planner flags are not more ``patient'' than those with which
321 the wisdom was created. For example, wisdom created with
322 @code{FFTW_MEASURE} can be used if you later plan with
323 @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with
324 @code{FFTW_PATIENT}.
325
326 The @code{wisdom} is cumulative, and is stored in a global, private
327 data structure managed internally by FFTW. The storage space required
328 is minimal, proportional to the logarithm of the sizes the wisdom was
329 generated from. If memory usage is a concern, however, the wisdom can
330 be forgotten and its associated memory freed by calling:
331 @example
332 void fftw_forget_wisdom(void);
333 @end example
334 @findex fftw_forget_wisdom
335
336 Wisdom can be exported to a file, a string, or any other medium.
337 For details, see @ref{Wisdom}.
338
339 @node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics
340 @section Caveats in Using Wisdom
341 @cindex wisdom, problems with
342
343 @quotation
344 @html
345 <i>
346 @end html
347 For in much wisdom is much grief, and he that increaseth knowledge
348 increaseth sorrow.
349 @html
350 </i>
351 @end html
352 [Ecclesiastes 1:18]
353 @cindex Ecclesiastes
354 @end quotation
355 @iftex
356 @medskip
357 @end iftex
358
359 @cindex portability
360 There are pitfalls to using wisdom, in that it can negate FFTW's
361 ability to adapt to changing hardware and other conditions. For
362 example, it would be perfectly possible to export wisdom from a
363 program running on one processor and import it into a program running
364 on another processor. Doing so, however, would mean that the second
365 program would use plans optimized for the first processor, instead of
366 the one it is running on.
367
368 It should be safe to reuse wisdom as long as the hardware and program
369 binaries remain unchanged. (Actually, the optimal plan may change even
370 between runs of the same binary on identical hardware, due to
371 differences in the virtual memory environment, etcetera. Users
372 seriously interested in performance should worry about this problem,
373 too.) It is likely that, if the same wisdom is used for two
374 different program binaries, even running on the same machine, the
375 plans may be sub-optimal because of differing code alignments. It is
376 therefore wise to recreate wisdom every time an application is
377 recompiled. The more the underlying hardware and software changes
378 between the creation of wisdom and its use, the greater grows
379 the risk of sub-optimal plans.
380
381 Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or
382 using possibly-suboptimal wisdom (created on the same machine, but for a
383 different binary), the wisdom is likely to be better. For this reason,
384 we provide a function to import wisdom from a standard system-wide
385 location (@code{/etc/fftw/wisdom} on Unix):
386 @cindex wisdom, system-wide
387
388 @example
389 int fftw_import_system_wisdom(void);
390 @end example
391 @findex fftw_import_system_wisdom
392
393 FFTW also provides a standalone program, @code{fftw-wisdom} (described
394 by its own @code{man} page on Unix) with which users can create wisdom,
395 e.g. for a canonical set of sizes to store in the system wisdom file.
396 @xref{Wisdom Utilities}.
397 @cindex fftw-wisdom utility
398