comparison src/fftw-3.3.8/doc/other.texi @ 167:bd3cc4d1df30

Add FFTW 3.3.8 source, and a Linux build
author Chris Cannam <cannam@all-day-breakfast.com>
date Tue, 19 Nov 2019 14:52:55 +0000
parents
children
comparison
equal deleted inserted replaced
166:cbd6d7e562c7 167:bd3cc4d1df30
1 @node Other Important Topics, FFTW Reference, Tutorial, Top
2 @chapter Other Important Topics
3 @menu
4 * SIMD alignment and fftw_malloc::
5 * Multi-dimensional Array Format::
6 * Words of Wisdom-Saving Plans::
7 * Caveats in Using Wisdom::
8 @end menu
9
10 @c ------------------------------------------------------------
11 @node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics
12 @section SIMD alignment and fftw_malloc
13
14 SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of
15 special operations supported by some processors to perform a single
16 operation on several numbers (usually 2 or 4) simultaneously. SIMD
17 floating-point instructions are available on several popular CPUs:
18 SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and
19 VSX on some POWER/PowerPCs, NEON on some ARM models. FFTW can be
20 compiled to support the SIMD instructions on any of these systems.
21 @cindex SIMD
22 @cindex SSE
23 @cindex SSE2
24 @cindex AVX
25 @cindex AVX2
26 @cindex AVX512
27 @cindex AltiVec
28 @cindex VSX
29 @cindex precision
30
31
32 A program linking to an FFTW library compiled with SIMD support can
33 obtain a nonnegligible speedup for most complex and r2c/c2r
34 transforms. In order to obtain this speedup, however, the arrays of
35 complex (or real) data passed to FFTW must be specially aligned in
36 memory (typically 16-byte aligned), and often this alignment is more
37 stringent than that provided by the usual @code{malloc} (etc.)
38 allocation routines.
39
40 @cindex portability
41 In order to guarantee proper alignment for SIMD, therefore, in case
42 your program is ever linked against a SIMD-using FFTW, we recommend
43 allocating your transform data with @code{fftw_malloc} and
44 de-allocating it with @code{fftw_free}.
45 @findex fftw_malloc
46 @findex fftw_free
47 These have exactly the same interface and behavior as
48 @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure
49 that the returned pointer has the necessary alignment (by calling
50 @code{memalign} or its equivalent on your OS).
51
52 You are not @emph{required} to use @code{fftw_malloc}. You can
53 allocate your data in any way that you like, from @code{malloc} to
54 @code{new} (in C++) to a fixed-size array declaration. If the array
55 happens not to be properly aligned, FFTW will not use the SIMD
56 extensions.
57 @cindex C++
58
59 @findex fftw_alloc_real
60 @findex fftw_alloc_complex
61 Since @code{fftw_malloc} only ever needs to be used for real and
62 complex arrays, we provide two convenient wrapper routines
63 @code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are
64 equivalent to @code{(double*)fftw_malloc(sizeof(double) * N)} and
65 @code{(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)},
66 respectively (or their equivalents in other precisions).
67
68 @c ------------------------------------------------------------
69 @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics
70 @section Multi-dimensional Array Format
71
72 This section describes the format in which multi-dimensional arrays
73 are stored in FFTW. We felt that a detailed discussion of this topic
74 was necessary. Since several different formats are common, this topic
75 is often a source of confusion.
76
77 @menu
78 * Row-major Format::
79 * Column-major Format::
80 * Fixed-size Arrays in C::
81 * Dynamic Arrays in C::
82 * Dynamic Arrays in C-The Wrong Way::
83 @end menu
84
85 @c =========>
86 @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format
87 @subsection Row-major Format
88 @cindex row-major
89
90 The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera
91 are expected to be stored as a single contiguous block in
92 @dfn{row-major} order (sometimes called ``C order''). Basically, this
93 means that as you step through adjacent memory locations, the first
94 dimension's index varies most slowly and the last dimension's index
95 varies most quickly.
96
97 To be more explicit, let us consider an array of rank @math{d} whose
98 dimensions are @ndims{}. Now, we specify a location in the array by a
99 sequence of @math{d} (zero-based) indices, one for each dimension:
100 @tex
101 $(i_0, i_1, i_2, \ldots, i_{d-1})$.
102 @end tex
103 @ifinfo
104 (i[0], i[1], ..., i[d-1]).
105 @end ifinfo
106 @html
107 (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>).
108 @end html
109 If the array is stored in row-major
110 order, then this element is located at the position
111 @tex
112 $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$.
113 @end tex
114 @ifinfo
115 i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).
116 @end ifinfo
117 @html
118 i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)).
119 @end html
120
121 Note that, for the ordinary complex DFT, each element of the array
122 must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of
123 (double-precision) numbers.
124
125 In the advanced FFTW interface, the physical dimensions @math{n} from
126 which the indices are computed can be different from (larger than)
127 the logical dimensions of the transform to be computed, in order to
128 transform a subset of a larger array.
129 @cindex advanced interface
130 Note also that, in the advanced interface, the expression above is
131 multiplied by a @dfn{stride} to get the actual array index---this is
132 useful in situations where each element of the multi-dimensional array
133 is actually a data structure (or another array), and you just want to
134 transform a single field. In the basic interface, however, the stride
135 is 1.
136 @cindex stride
137
138 @c =========>
139 @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format
140 @subsection Column-major Format
141 @cindex column-major
142
143 Readers from the Fortran world are used to arrays stored in
144 @dfn{column-major} order (sometimes called ``Fortran order''). This is
145 essentially the exact opposite of row-major order in that, here, the
146 @emph{first} dimension's index varies most quickly.
147
148 If you have an array stored in column-major order and wish to
149 transform it using FFTW, it is quite easy to do. When creating the
150 plan, simply pass the dimensions of the array to the planner in
151 @emph{reverse order}. For example, if your array is a rank three
152 @code{N x M x L} matrix in column-major order, you should pass the
153 dimensions of the array as if it were an @code{L x M x N} matrix
154 (which it is, from the perspective of FFTW). This is done for you
155 @emph{automatically} by the FFTW legacy-Fortran interface
156 (@pxref{Calling FFTW from Legacy Fortran}), but you must do it
157 manually with the modern Fortran interface (@pxref{Reversing array
158 dimensions}).
159 @cindex Fortran interface
160
161 @c =========>
162 @node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
163 @subsection Fixed-size Arrays in C
164 @cindex C multi-dimensional arrays
165
166 A multi-dimensional array whose size is declared at compile time in C
167 is @emph{already} in row-major order. You don't have to do anything
168 special to transform it. For example:
169
170 @example
171 @{
172 fftw_complex data[N0][N1][N2];
173 fftw_plan plan;
174 ...
175 plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
176 FFTW_FORWARD, FFTW_ESTIMATE);
177 ...
178 @}
179 @end example
180
181 This will plan a 3d in-place transform of size @code{N0 x N1 x N2}.
182 Notice how we took the address of the zero-th element to pass to the
183 planner (we could also have used a typecast).
184
185 However, we tend to @emph{discourage} users from declaring their
186 arrays in this way, for two reasons. First, this allocates the array
187 on the stack (``automatic'' storage), which has a very limited size on
188 most operating systems (declaring an array with more than a few
189 thousand elements will often cause a crash). (You can get around this
190 limitation on many systems by declaring the array as
191 @code{static} and/or global, but that has its own drawbacks.)
192 Second, it may not optimally align the array for use with a SIMD
193 FFTW (@pxref{SIMD alignment and fftw_malloc}). Instead, we recommend
194 using @code{fftw_malloc}, as described below.
195
196 @c =========>
197 @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format
198 @subsection Dynamic Arrays in C
199
200 We recommend allocating most arrays dynamically, with
201 @code{fftw_malloc}. This isn't too hard to do, although it is not as
202 straightforward for multi-dimensional arrays as it is for
203 one-dimensional arrays.
204
205 Creating the array is simple: using a dynamic-allocation routine like
206 @code{fftw_malloc}, allocate an array big enough to store N
207 @code{fftw_complex} values (for a complex DFT), where N is the product
208 of the sizes of the array dimensions (i.e. the total number of complex
209 values in the array). For example, here is code to allocate a
210 @threedims{5,12,27} rank-3 array:
211 @findex fftw_malloc
212
213 @example
214 fftw_complex *an_array;
215 an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
216 @end example
217
218 Accessing the array elements, however, is more tricky---you can't
219 simply use multiple applications of the @samp{[]} operator like you
220 could for fixed-size arrays. Instead, you have to explicitly compute
221 the offset into the array using the formula given earlier for
222 row-major arrays. For example, to reference the @math{(i,j,k)}-th
223 element of the array allocated above, you would use the expression
224 @code{an_array[k + 27 * (j + 12 * i)]}.
225
226 This pain can be alleviated somewhat by defining appropriate macros,
227 or, in C++, creating a class and overloading the @samp{()} operator.
228 The recent C99 standard provides a way to reinterpret the dynamic
229 array as a ``variable-length'' multi-dimensional array amenable to
230 @samp{[]}, but this feature is not yet widely supported by compilers.
231 @cindex C99
232 @cindex C++
233
234 @c =========>
235 @node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format
236 @subsection Dynamic Arrays in C---The Wrong Way
237
238 A different method for allocating multi-dimensional arrays in C is
239 often suggested that is incompatible with FFTW: @emph{using it will
240 cause FFTW to die a painful death}. We discuss the technique here,
241 however, because it is so commonly known and used. This method is to
242 create arrays of pointers of arrays of pointers of @dots{}etcetera.
243 For example, the analogue in this method to the example above is:
244
245 @example
246 int i,j;
247 fftw_complex ***a_bad_array; /* @r{another way to make a 5x12x27 array} */
248
249 a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
250 for (i = 0; i < 5; ++i) @{
251 a_bad_array[i] =
252 (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
253 for (j = 0; j < 12; ++j)
254 a_bad_array[i][j] =
255 (fftw_complex *) malloc(27 * sizeof(fftw_complex));
256 @}
257 @end example
258
259 As you can see, this sort of array is inconvenient to allocate (and
260 deallocate). On the other hand, it has the advantage that the
261 @math{(i,j,k)}-th element can be referenced simply by
262 @code{a_bad_array[i][j][k]}.
263
264 If you like this technique and want to maximize convenience in accessing
265 the array, but still want to pass the array to FFTW, you can use a
266 hybrid method. Allocate the array as one contiguous block, but also
267 declare an array of arrays of pointers that point to appropriate places
268 in the block. That sort of trick is beyond the scope of this
269 documentation; for more information on multi-dimensional arrays in C,
270 see the @code{comp.lang.c}
271 @uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}.
272
273 @c ------------------------------------------------------------
274 @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics
275 @section Words of Wisdom---Saving Plans
276 @cindex wisdom
277 @cindex saving plans to disk
278
279 FFTW implements a method for saving plans to disk and restoring them.
280 In fact, what FFTW does is more general than just saving and loading
281 plans. The mechanism is called @dfn{wisdom}. Here, we describe
282 this feature at a high level. @xref{FFTW Reference}, for a less casual
283 but more complete discussion of how to use wisdom in FFTW.
284
285 Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or
286 @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance,
287 but may require a long time to compute because FFTW must measure the
288 runtime of many possible plans and select the best one. This setup is
289 designed for the situations where so many transforms of the same size
290 must be computed that the start-up time is irrelevant. For short
291 initialization times, but slower transforms, we have provided
292 @code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the
293 best of both worlds: you compute a good plan once, save it to
294 disk, and later reload it as many times as necessary. The wisdom
295 mechanism can actually save and reload many plans at once, not just
296 one.
297 @ctindex FFTW_MEASURE
298 @ctindex FFTW_PATIENT
299 @ctindex FFTW_EXHAUSTIVE
300 @ctindex FFTW_ESTIMATE
301
302
303 Whenever you create a plan, the FFTW planner accumulates wisdom, which
304 is information sufficient to reconstruct the plan. After planning,
305 you can save this information to disk by means of the function:
306 @example
307 int fftw_export_wisdom_to_filename(const char *filename);
308 @end example
309 @findex fftw_export_wisdom_to_filename
310 (This function returns non-zero on success.)
311
312 The next time you run the program, you can restore the wisdom with
313 @code{fftw_import_wisdom_from_filename} (which also returns non-zero on success),
314 and then recreate the plan using the same flags as before.
315 @example
316 int fftw_import_wisdom_from_filename(const char *filename);
317 @end example
318 @findex fftw_import_wisdom_from_filename
319
320 Wisdom is automatically used for any size to which it is applicable, as
321 long as the planner flags are not more ``patient'' than those with which
322 the wisdom was created. For example, wisdom created with
323 @code{FFTW_MEASURE} can be used if you later plan with
324 @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with
325 @code{FFTW_PATIENT}.
326
327 The @code{wisdom} is cumulative, and is stored in a global, private
328 data structure managed internally by FFTW. The storage space required
329 is minimal, proportional to the logarithm of the sizes the wisdom was
330 generated from. If memory usage is a concern, however, the wisdom can
331 be forgotten and its associated memory freed by calling:
332 @example
333 void fftw_forget_wisdom(void);
334 @end example
335 @findex fftw_forget_wisdom
336
337 Wisdom can be exported to a file, a string, or any other medium.
338 For details, see @ref{Wisdom}.
339
340 @node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics
341 @section Caveats in Using Wisdom
342 @cindex wisdom, problems with
343
344 @quotation
345 @html
346 <i>
347 @end html
348 For in much wisdom is much grief, and he that increaseth knowledge
349 increaseth sorrow.
350 @html
351 </i>
352 @end html
353 [Ecclesiastes 1:18]
354 @cindex Ecclesiastes
355 @end quotation
356 @iftex
357 @medskip
358 @end iftex
359
360 @cindex portability
361 There are pitfalls to using wisdom, in that it can negate FFTW's
362 ability to adapt to changing hardware and other conditions. For
363 example, it would be perfectly possible to export wisdom from a
364 program running on one processor and import it into a program running
365 on another processor. Doing so, however, would mean that the second
366 program would use plans optimized for the first processor, instead of
367 the one it is running on.
368
369 It should be safe to reuse wisdom as long as the hardware and program
370 binaries remain unchanged. (Actually, the optimal plan may change even
371 between runs of the same binary on identical hardware, due to
372 differences in the virtual memory environment, etcetera. Users
373 seriously interested in performance should worry about this problem,
374 too.) It is likely that, if the same wisdom is used for two
375 different program binaries, even running on the same machine, the
376 plans may be sub-optimal because of differing code alignments. It is
377 therefore wise to recreate wisdom every time an application is
378 recompiled. The more the underlying hardware and software changes
379 between the creation of wisdom and its use, the greater grows
380 the risk of sub-optimal plans.
381
382 Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or
383 using possibly-suboptimal wisdom (created on the same machine, but for a
384 different binary), the wisdom is likely to be better. For this reason,
385 we provide a function to import wisdom from a standard system-wide
386 location (@code{/etc/fftw/wisdom} on Unix):
387 @cindex wisdom, system-wide
388
389 @example
390 int fftw_import_system_wisdom(void);
391 @end example
392 @findex fftw_import_system_wisdom
393
394 FFTW also provides a standalone program, @code{fftw-wisdom} (described
395 by its own @code{man} page on Unix) with which users can create wisdom,
396 e.g. for a canonical set of sizes to store in the system wisdom file.
397 @xref{Wisdom Utilities}.
398 @cindex fftw-wisdom utility
399