Mercurial > hg > sv-dependency-builds
comparison src/fftw-3.3.5/doc/other.texi @ 127:7867fa7e1b6b
Current fftw source
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Tue, 18 Oct 2016 13:40:26 +0100 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
126:4a7071416412 | 127:7867fa7e1b6b |
---|---|
1 @node Other Important Topics, FFTW Reference, Tutorial, Top | |
2 @chapter Other Important Topics | |
3 @menu | |
4 * SIMD alignment and fftw_malloc:: | |
5 * Multi-dimensional Array Format:: | |
6 * Words of Wisdom-Saving Plans:: | |
7 * Caveats in Using Wisdom:: | |
8 @end menu | |
9 | |
10 @c ------------------------------------------------------------ | |
11 @node SIMD alignment and fftw_malloc, Multi-dimensional Array Format, Other Important Topics, Other Important Topics | |
12 @section SIMD alignment and fftw_malloc | |
13 | |
14 SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of | |
15 special operations supported by some processors to perform a single | |
16 operation on several numbers (usually 2 or 4) simultaneously. SIMD | |
17 floating-point instructions are available on several popular CPUs: | |
18 SSE/SSE2/AVX/AVX2/AVX512/KCVI on some x86/x86-64 processors, AltiVec and | |
19 VSX on some POWER/PowerPCs, NEON on some ARM models. FFTW can be | |
20 compiled to support the SIMD instructions on any of these systems. | |
21 @cindex SIMD | |
22 @cindex SSE | |
23 @cindex SSE2 | |
24 @cindex AVX | |
25 @cindex AVX2 | |
26 @cindex AVX512 | |
27 @cindex AltiVec | |
28 @cindex VSX | |
29 @cindex precision | |
30 | |
31 | |
32 A program linking to an FFTW library compiled with SIMD support can | |
33 obtain a nonnegligible speedup for most complex and r2c/c2r | |
34 transforms. In order to obtain this speedup, however, the arrays of | |
35 complex (or real) data passed to FFTW must be specially aligned in | |
36 memory (typically 16-byte aligned), and often this alignment is more | |
37 stringent than that provided by the usual @code{malloc} (etc.) | |
38 allocation routines. | |
39 | |
40 @cindex portability | |
41 In order to guarantee proper alignment for SIMD, therefore, in case | |
42 your program is ever linked against a SIMD-using FFTW, we recommend | |
43 allocating your transform data with @code{fftw_malloc} and | |
44 de-allocating it with @code{fftw_free}. | |
45 @findex fftw_malloc | |
46 @findex fftw_free | |
47 These have exactly the same interface and behavior as | |
48 @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure | |
49 that the returned pointer has the necessary alignment (by calling | |
50 @code{memalign} or its equivalent on your OS). | |
51 | |
52 You are not @emph{required} to use @code{fftw_malloc}. You can | |
53 allocate your data in any way that you like, from @code{malloc} to | |
54 @code{new} (in C++) to a fixed-size array declaration. If the array | |
55 happens not to be properly aligned, FFTW will not use the SIMD | |
56 extensions. | |
57 @cindex C++ | |
58 | |
59 @findex fftw_alloc_real | |
60 @findex fftw_alloc_complex | |
61 Since @code{fftw_malloc} only ever needs to be used for real and | |
62 complex arrays, we provide two convenient wrapper routines | |
63 @code{fftw_alloc_real(N)} and @code{fftw_alloc_complex(N)} that are | |
64 equivalent to @code{(double*)fftw_malloc(sizeof(double) * N)} and | |
65 @code{(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)}, | |
66 respectively (or their equivalents in other precisions). | |
67 | |
68 @c ------------------------------------------------------------ | |
69 @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, SIMD alignment and fftw_malloc, Other Important Topics | |
70 @section Multi-dimensional Array Format | |
71 | |
72 This section describes the format in which multi-dimensional arrays | |
73 are stored in FFTW. We felt that a detailed discussion of this topic | |
74 was necessary. Since several different formats are common, this topic | |
75 is often a source of confusion. | |
76 | |
77 @menu | |
78 * Row-major Format:: | |
79 * Column-major Format:: | |
80 * Fixed-size Arrays in C:: | |
81 * Dynamic Arrays in C:: | |
82 * Dynamic Arrays in C-The Wrong Way:: | |
83 @end menu | |
84 | |
85 @c =========> | |
86 @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format | |
87 @subsection Row-major Format | |
88 @cindex row-major | |
89 | |
90 The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera | |
91 are expected to be stored as a single contiguous block in | |
92 @dfn{row-major} order (sometimes called ``C order''). Basically, this | |
93 means that as you step through adjacent memory locations, the first | |
94 dimension's index varies most slowly and the last dimension's index | |
95 varies most quickly. | |
96 | |
97 To be more explicit, let us consider an array of rank @math{d} whose | |
98 dimensions are @ndims{}. Now, we specify a location in the array by a | |
99 sequence of @math{d} (zero-based) indices, one for each dimension: | |
100 @tex | |
101 $(i_0, i_1, i_2, \ldots, i_{d-1})$. | |
102 @end tex | |
103 @ifinfo | |
104 (i[0], i[1], ..., i[d-1]). | |
105 @end ifinfo | |
106 @html | |
107 (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>). | |
108 @end html | |
109 If the array is stored in row-major | |
110 order, then this element is located at the position | |
111 @tex | |
112 $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$. | |
113 @end tex | |
114 @ifinfo | |
115 i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])). | |
116 @end ifinfo | |
117 @html | |
118 i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)). | |
119 @end html | |
120 | |
121 Note that, for the ordinary complex DFT, each element of the array | |
122 must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of | |
123 (double-precision) numbers. | |
124 | |
125 In the advanced FFTW interface, the physical dimensions @math{n} from | |
126 which the indices are computed can be different from (larger than) | |
127 the logical dimensions of the transform to be computed, in order to | |
128 transform a subset of a larger array. | |
129 @cindex advanced interface | |
130 Note also that, in the advanced interface, the expression above is | |
131 multiplied by a @dfn{stride} to get the actual array index---this is | |
132 useful in situations where each element of the multi-dimensional array | |
133 is actually a data structure (or another array), and you just want to | |
134 transform a single field. In the basic interface, however, the stride | |
135 is 1. | |
136 @cindex stride | |
137 | |
138 @c =========> | |
139 @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format | |
140 @subsection Column-major Format | |
141 @cindex column-major | |
142 | |
143 Readers from the Fortran world are used to arrays stored in | |
144 @dfn{column-major} order (sometimes called ``Fortran order''). This is | |
145 essentially the exact opposite of row-major order in that, here, the | |
146 @emph{first} dimension's index varies most quickly. | |
147 | |
148 If you have an array stored in column-major order and wish to | |
149 transform it using FFTW, it is quite easy to do. When creating the | |
150 plan, simply pass the dimensions of the array to the planner in | |
151 @emph{reverse order}. For example, if your array is a rank three | |
152 @code{N x M x L} matrix in column-major order, you should pass the | |
153 dimensions of the array as if it were an @code{L x M x N} matrix | |
154 (which it is, from the perspective of FFTW). This is done for you | |
155 @emph{automatically} by the FFTW legacy-Fortran interface | |
156 (@pxref{Calling FFTW from Legacy Fortran}), but you must do it | |
157 manually with the modern Fortran interface (@pxref{Reversing array | |
158 dimensions}). | |
159 @cindex Fortran interface | |
160 | |
161 @c =========> | |
162 @node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format | |
163 @subsection Fixed-size Arrays in C | |
164 @cindex C multi-dimensional arrays | |
165 | |
166 A multi-dimensional array whose size is declared at compile time in C | |
167 is @emph{already} in row-major order. You don't have to do anything | |
168 special to transform it. For example: | |
169 | |
170 @example | |
171 @{ | |
172 fftw_complex data[N0][N1][N2]; | |
173 fftw_plan plan; | |
174 ... | |
175 plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0], | |
176 FFTW_FORWARD, FFTW_ESTIMATE); | |
177 ... | |
178 @} | |
179 @end example | |
180 | |
181 This will plan a 3d in-place transform of size @code{N0 x N1 x N2}. | |
182 Notice how we took the address of the zero-th element to pass to the | |
183 planner (we could also have used a typecast). | |
184 | |
185 However, we tend to @emph{discourage} users from declaring their | |
186 arrays in this way, for two reasons. First, this allocates the array | |
187 on the stack (``automatic'' storage), which has a very limited size on | |
188 most operating systems (declaring an array with more than a few | |
189 thousand elements will often cause a crash). (You can get around this | |
190 limitation on many systems by declaring the array as | |
191 @code{static} and/or global, but that has its own drawbacks.) | |
192 Second, it may not optimally align the array for use with a SIMD | |
193 FFTW (@pxref{SIMD alignment and fftw_malloc}). Instead, we recommend | |
194 using @code{fftw_malloc}, as described below. | |
195 | |
196 @c =========> | |
197 @node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format | |
198 @subsection Dynamic Arrays in C | |
199 | |
200 We recommend allocating most arrays dynamically, with | |
201 @code{fftw_malloc}. This isn't too hard to do, although it is not as | |
202 straightforward for multi-dimensional arrays as it is for | |
203 one-dimensional arrays. | |
204 | |
205 Creating the array is simple: using a dynamic-allocation routine like | |
206 @code{fftw_malloc}, allocate an array big enough to store N | |
207 @code{fftw_complex} values (for a complex DFT), where N is the product | |
208 of the sizes of the array dimensions (i.e. the total number of complex | |
209 values in the array). For example, here is code to allocate a | |
210 @threedims{5,12,27} rank-3 array: | |
211 @findex fftw_malloc | |
212 | |
213 @example | |
214 fftw_complex *an_array; | |
215 an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex)); | |
216 @end example | |
217 | |
218 Accessing the array elements, however, is more tricky---you can't | |
219 simply use multiple applications of the @samp{[]} operator like you | |
220 could for fixed-size arrays. Instead, you have to explicitly compute | |
221 the offset into the array using the formula given earlier for | |
222 row-major arrays. For example, to reference the @math{(i,j,k)}-th | |
223 element of the array allocated above, you would use the expression | |
224 @code{an_array[k + 27 * (j + 12 * i)]}. | |
225 | |
226 This pain can be alleviated somewhat by defining appropriate macros, | |
227 or, in C++, creating a class and overloading the @samp{()} operator. | |
228 The recent C99 standard provides a way to reinterpret the dynamic | |
229 array as a ``variable-length'' multi-dimensional array amenable to | |
230 @samp{[]}, but this feature is not yet widely supported by compilers. | |
231 @cindex C99 | |
232 @cindex C++ | |
233 | |
234 @c =========> | |
235 @node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format | |
236 @subsection Dynamic Arrays in C---The Wrong Way | |
237 | |
238 A different method for allocating multi-dimensional arrays in C is | |
239 often suggested that is incompatible with FFTW: @emph{using it will | |
240 cause FFTW to die a painful death}. We discuss the technique here, | |
241 however, because it is so commonly known and used. This method is to | |
242 create arrays of pointers of arrays of pointers of @dots{}etcetera. | |
243 For example, the analogue in this method to the example above is: | |
244 | |
245 @example | |
246 int i,j; | |
247 fftw_complex ***a_bad_array; /* @r{another way to make a 5x12x27 array} */ | |
248 | |
249 a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **)); | |
250 for (i = 0; i < 5; ++i) @{ | |
251 a_bad_array[i] = | |
252 (fftw_complex **) malloc(12 * sizeof(fftw_complex *)); | |
253 for (j = 0; j < 12; ++j) | |
254 a_bad_array[i][j] = | |
255 (fftw_complex *) malloc(27 * sizeof(fftw_complex)); | |
256 @} | |
257 @end example | |
258 | |
259 As you can see, this sort of array is inconvenient to allocate (and | |
260 deallocate). On the other hand, it has the advantage that the | |
261 @math{(i,j,k)}-th element can be referenced simply by | |
262 @code{a_bad_array[i][j][k]}. | |
263 | |
264 If you like this technique and want to maximize convenience in accessing | |
265 the array, but still want to pass the array to FFTW, you can use a | |
266 hybrid method. Allocate the array as one contiguous block, but also | |
267 declare an array of arrays of pointers that point to appropriate places | |
268 in the block. That sort of trick is beyond the scope of this | |
269 documentation; for more information on multi-dimensional arrays in C, | |
270 see the @code{comp.lang.c} | |
271 @uref{http://c-faq.com/aryptr/dynmuldimary.html, FAQ}. | |
272 | |
273 @c ------------------------------------------------------------ | |
274 @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics | |
275 @section Words of Wisdom---Saving Plans | |
276 @cindex wisdom | |
277 @cindex saving plans to disk | |
278 | |
279 FFTW implements a method for saving plans to disk and restoring them. | |
280 In fact, what FFTW does is more general than just saving and loading | |
281 plans. The mechanism is called @dfn{wisdom}. Here, we describe | |
282 this feature at a high level. @xref{FFTW Reference}, for a less casual | |
283 but more complete discussion of how to use wisdom in FFTW. | |
284 | |
285 Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or | |
286 @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance, | |
287 but may require a long time to compute because FFTW must measure the | |
288 runtime of many possible plans and select the best one. This setup is | |
289 designed for the situations where so many transforms of the same size | |
290 must be computed that the start-up time is irrelevant. For short | |
291 initialization times, but slower transforms, we have provided | |
292 @code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the | |
293 best of both worlds: you compute a good plan once, save it to | |
294 disk, and later reload it as many times as necessary. The wisdom | |
295 mechanism can actually save and reload many plans at once, not just | |
296 one. | |
297 @ctindex FFTW_MEASURE | |
298 @ctindex FFTW_PATIENT | |
299 @ctindex FFTW_EXHAUSTIVE | |
300 @ctindex FFTW_ESTIMATE | |
301 | |
302 | |
303 Whenever you create a plan, the FFTW planner accumulates wisdom, which | |
304 is information sufficient to reconstruct the plan. After planning, | |
305 you can save this information to disk by means of the function: | |
306 @example | |
307 int fftw_export_wisdom_to_filename(const char *filename); | |
308 @end example | |
309 @findex fftw_export_wisdom_to_filename | |
310 (This function returns non-zero on success.) | |
311 | |
312 The next time you run the program, you can restore the wisdom with | |
313 @code{fftw_import_wisdom_from_filename} (which also returns non-zero on success), | |
314 and then recreate the plan using the same flags as before. | |
315 @example | |
316 int fftw_import_wisdom_from_filename(const char *filename); | |
317 @end example | |
318 @findex fftw_import_wisdom_from_filename | |
319 | |
320 Wisdom is automatically used for any size to which it is applicable, as | |
321 long as the planner flags are not more ``patient'' than those with which | |
322 the wisdom was created. For example, wisdom created with | |
323 @code{FFTW_MEASURE} can be used if you later plan with | |
324 @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with | |
325 @code{FFTW_PATIENT}. | |
326 | |
327 The @code{wisdom} is cumulative, and is stored in a global, private | |
328 data structure managed internally by FFTW. The storage space required | |
329 is minimal, proportional to the logarithm of the sizes the wisdom was | |
330 generated from. If memory usage is a concern, however, the wisdom can | |
331 be forgotten and its associated memory freed by calling: | |
332 @example | |
333 void fftw_forget_wisdom(void); | |
334 @end example | |
335 @findex fftw_forget_wisdom | |
336 | |
337 Wisdom can be exported to a file, a string, or any other medium. | |
338 For details, see @ref{Wisdom}. | |
339 | |
340 @node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics | |
341 @section Caveats in Using Wisdom | |
342 @cindex wisdom, problems with | |
343 | |
344 @quotation | |
345 @html | |
346 <i> | |
347 @end html | |
348 For in much wisdom is much grief, and he that increaseth knowledge | |
349 increaseth sorrow. | |
350 @html | |
351 </i> | |
352 @end html | |
353 [Ecclesiastes 1:18] | |
354 @cindex Ecclesiastes | |
355 @end quotation | |
356 @iftex | |
357 @medskip | |
358 @end iftex | |
359 | |
360 @cindex portability | |
361 There are pitfalls to using wisdom, in that it can negate FFTW's | |
362 ability to adapt to changing hardware and other conditions. For | |
363 example, it would be perfectly possible to export wisdom from a | |
364 program running on one processor and import it into a program running | |
365 on another processor. Doing so, however, would mean that the second | |
366 program would use plans optimized for the first processor, instead of | |
367 the one it is running on. | |
368 | |
369 It should be safe to reuse wisdom as long as the hardware and program | |
370 binaries remain unchanged. (Actually, the optimal plan may change even | |
371 between runs of the same binary on identical hardware, due to | |
372 differences in the virtual memory environment, etcetera. Users | |
373 seriously interested in performance should worry about this problem, | |
374 too.) It is likely that, if the same wisdom is used for two | |
375 different program binaries, even running on the same machine, the | |
376 plans may be sub-optimal because of differing code alignments. It is | |
377 therefore wise to recreate wisdom every time an application is | |
378 recompiled. The more the underlying hardware and software changes | |
379 between the creation of wisdom and its use, the greater grows | |
380 the risk of sub-optimal plans. | |
381 | |
382 Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or | |
383 using possibly-suboptimal wisdom (created on the same machine, but for a | |
384 different binary), the wisdom is likely to be better. For this reason, | |
385 we provide a function to import wisdom from a standard system-wide | |
386 location (@code{/etc/fftw/wisdom} on Unix): | |
387 @cindex wisdom, system-wide | |
388 | |
389 @example | |
390 int fftw_import_system_wisdom(void); | |
391 @end example | |
392 @findex fftw_import_system_wisdom | |
393 | |
394 FFTW also provides a standalone program, @code{fftw-wisdom} (described | |
395 by its own @code{man} page on Unix) with which users can create wisdom, | |
396 e.g. for a canonical set of sizes to store in the system wisdom file. | |
397 @xref{Wisdom Utilities}. | |
398 @cindex fftw-wisdom utility | |
399 |