Mercurial > hg > sv-dependency-builds
comparison src/fftw-3.3.3/doc/tutorial.texi @ 10:37bf6b4a2645
Add FFTW3
author | Chris Cannam |
---|---|
date | Wed, 20 Mar 2013 15:35:50 +0000 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
9:c0fb53affa76 | 10:37bf6b4a2645 |
---|---|
1 @node Tutorial, Other Important Topics, Introduction, Top | |
2 @chapter Tutorial | |
3 @menu | |
4 * Complex One-Dimensional DFTs:: | |
5 * Complex Multi-Dimensional DFTs:: | |
6 * One-Dimensional DFTs of Real Data:: | |
7 * Multi-Dimensional DFTs of Real Data:: | |
8 * More DFTs of Real Data:: | |
9 @end menu | |
10 | |
11 This chapter describes the basic usage of FFTW, i.e., how to compute | |
12 @cindex basic interface | |
13 the Fourier transform of a single array. This chapter tells the | |
14 truth, but not the @emph{whole} truth. Specifically, FFTW implements | |
15 additional routines and flags that are not documented here, although | |
16 in many cases we try to indicate where added capabilities exist. For | |
17 more complete information, see @ref{FFTW Reference}. (Note that you | |
18 need to compile and install FFTW before you can use it in a program. | |
19 For the details of the installation, see @ref{Installation and | |
20 Customization}.) | |
21 | |
22 We recommend that you read this tutorial in order.@footnote{You can | |
23 read the tutorial in bit-reversed order after computing your first | |
24 transform.} At the least, read the first section (@pxref{Complex | |
25 One-Dimensional DFTs}) before reading any of the others, even if your | |
26 main interest lies in one of the other transform types. | |
27 | |
28 Users of FFTW version 2 and earlier may also want to read @ref{Upgrading | |
29 from FFTW version 2}. | |
30 | |
31 @c ------------------------------------------------------------ | |
32 @node Complex One-Dimensional DFTs, Complex Multi-Dimensional DFTs, Tutorial, Tutorial | |
33 @section Complex One-Dimensional DFTs | |
34 | |
35 @quotation | |
36 Plan: To bother about the best method of accomplishing an accidental result. | |
37 [Ambrose Bierce, @cite{The Enlarged Devil's Dictionary}.] | |
38 @cindex Devil | |
39 @end quotation | |
40 | |
41 @iftex | |
42 @medskip | |
43 @end iftex | |
44 | |
45 The basic usage of FFTW to compute a one-dimensional DFT of size | |
46 @code{N} is simple, and it typically looks something like this code: | |
47 | |
48 @example | |
49 #include <fftw3.h> | |
50 ... | |
51 @{ | |
52 fftw_complex *in, *out; | |
53 fftw_plan p; | |
54 ... | |
55 in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N); | |
56 out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N); | |
57 p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE); | |
58 ... | |
59 fftw_execute(p); /* @r{repeat as needed} */ | |
60 ... | |
61 fftw_destroy_plan(p); | |
62 fftw_free(in); fftw_free(out); | |
63 @} | |
64 @end example | |
65 | |
66 You must link this code with the @code{fftw3} library. On Unix systems, | |
67 link with @code{-lfftw3 -lm}. | |
68 | |
69 The example code first allocates the input and output arrays. You can | |
70 allocate them in any way that you like, but we recommend using | |
71 @code{fftw_malloc}, which behaves like | |
72 @findex fftw_malloc | |
73 @code{malloc} except that it properly aligns the array when SIMD | |
74 instructions (such as SSE and Altivec) are available (@pxref{SIMD | |
75 alignment and fftw_malloc}). [Alternatively, we provide a convenient wrapper function @code{fftw_alloc_complex(N)} which has the same effect.] | |
76 @findex fftw_alloc_complex | |
77 @cindex SIMD | |
78 | |
79 | |
80 The data is an array of type @code{fftw_complex}, which is by default a | |
81 @code{double[2]} composed of the real (@code{in[i][0]}) and imaginary | |
82 (@code{in[i][1]}) parts of a complex number. | |
83 @tindex fftw_complex | |
84 | |
85 The next step is to create a @dfn{plan}, which is an object | |
86 @cindex plan | |
87 that contains all the data that FFTW needs to compute the FFT. | |
88 This function creates the plan: | |
89 | |
90 @example | |
91 fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, | |
92 int sign, unsigned flags); | |
93 @end example | |
94 @findex fftw_plan_dft_1d | |
95 @tindex fftw_plan | |
96 | |
97 The first argument, @code{n}, is the size of the transform you are | |
98 trying to compute. The size @code{n} can be any positive integer, but | |
99 sizes that are products of small factors are transformed most | |
100 efficiently (although prime sizes still use an @Onlogn{} algorithm). | |
101 | |
102 The next two arguments are pointers to the input and output arrays of | |
103 the transform. These pointers can be equal, indicating an | |
104 @dfn{in-place} transform. | |
105 @cindex in-place | |
106 | |
107 | |
108 The fourth argument, @code{sign}, can be either @code{FFTW_FORWARD} | |
109 (@code{-1}) or @code{FFTW_BACKWARD} (@code{+1}), | |
110 @ctindex FFTW_FORWARD | |
111 @ctindex FFTW_BACKWARD | |
112 and indicates the direction of the transform you are interested in; | |
113 technically, it is the sign of the exponent in the transform. | |
114 | |
115 The @code{flags} argument is usually either @code{FFTW_MEASURE} or | |
116 @cindex flags | |
117 @code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} instructs FFTW to run | |
118 @ctindex FFTW_MEASURE | |
119 and measure the execution time of several FFTs in order to find the | |
120 best way to compute the transform of size @code{n}. This process takes | |
121 some time (usually a few seconds), depending on your machine and on | |
122 the size of the transform. @code{FFTW_ESTIMATE}, on the contrary, | |
123 does not run any computation and just builds a | |
124 @ctindex FFTW_ESTIMATE | |
125 reasonable plan that is probably sub-optimal. In short, if your | |
126 program performs many transforms of the same size and initialization | |
127 time is not important, use @code{FFTW_MEASURE}; otherwise use the | |
128 estimate. | |
129 | |
130 @emph{You must create the plan before initializing the input}, because | |
131 @code{FFTW_MEASURE} overwrites the @code{in}/@code{out} arrays. | |
132 (Technically, @code{FFTW_ESTIMATE} does not touch your arrays, but you | |
133 should always create plans first just to be sure.) | |
134 | |
135 Once the plan has been created, you can use it as many times as you | |
136 like for transforms on the specified @code{in}/@code{out} arrays, | |
137 computing the actual transforms via @code{fftw_execute(plan)}: | |
138 @example | |
139 void fftw_execute(const fftw_plan plan); | |
140 @end example | |
141 @findex fftw_execute | |
142 | |
143 The DFT results are stored in-order in the array @code{out}, with the | |
144 zero-frequency (DC) component in @code{out[0]}. | |
145 @cindex frequency | |
146 If @code{in != out}, the transform is @dfn{out-of-place} and the input | |
147 array @code{in} is not modified. Otherwise, the input array is | |
148 overwritten with the transform. | |
149 | |
150 @cindex execute | |
151 If you want to transform a @emph{different} array of the same size, you | |
152 can create a new plan with @code{fftw_plan_dft_1d} and FFTW | |
153 automatically reuses the information from the previous plan, if | |
154 possible. Alternatively, with the ``guru'' interface you can apply a | |
155 given plan to a different array, if you are careful. | |
156 @xref{FFTW Reference}. | |
157 | |
158 When you are done with the plan, you deallocate it by calling | |
159 @code{fftw_destroy_plan(plan)}: | |
160 @example | |
161 void fftw_destroy_plan(fftw_plan plan); | |
162 @end example | |
163 @findex fftw_destroy_plan | |
164 If you allocate an array with @code{fftw_malloc()} you must deallocate | |
165 it with @code{fftw_free()}. Do not use @code{free()} or, heaven | |
166 forbid, @code{delete}. | |
167 @findex fftw_free | |
168 | |
169 FFTW computes an @emph{unnormalized} DFT. Thus, computing a forward | |
170 followed by a backward transform (or vice versa) results in the original | |
171 array scaled by @code{n}. For the definition of the DFT, see @ref{What | |
172 FFTW Really Computes}. | |
173 @cindex DFT | |
174 @cindex normalization | |
175 | |
176 | |
177 If you have a C compiler, such as @code{gcc}, that supports the | |
178 C99 standard, and you @code{#include <complex.h>} @emph{before} | |
179 @code{<fftw3.h>}, then @code{fftw_complex} is the native | |
180 double-precision complex type and you can manipulate it with ordinary | |
181 arithmetic. Otherwise, FFTW defines its own complex type, which is | |
182 bit-compatible with the C99 complex type. @xref{Complex numbers}. | |
183 (The C++ @code{<complex>} template class may also be usable via a | |
184 typecast.) | |
185 @cindex C++ | |
186 | |
187 To use single or long-double precision versions of FFTW, replace the | |
188 @code{fftw_} prefix by @code{fftwf_} or @code{fftwl_} and link with | |
189 @code{-lfftw3f} or @code{-lfftw3l}, but use the @emph{same} | |
190 @code{<fftw3.h>} header file. | |
191 @cindex precision | |
192 | |
193 | |
194 Many more flags exist besides @code{FFTW_MEASURE} and | |
195 @code{FFTW_ESTIMATE}. For example, use @code{FFTW_PATIENT} if you're | |
196 willing to wait even longer for a possibly even faster plan (@pxref{FFTW | |
197 Reference}). | |
198 @ctindex FFTW_PATIENT | |
199 You can also save plans for future use, as described by @ref{Words of | |
200 Wisdom-Saving Plans}. | |
201 | |
202 @c ------------------------------------------------------------ | |
203 @node Complex Multi-Dimensional DFTs, One-Dimensional DFTs of Real Data, Complex One-Dimensional DFTs, Tutorial | |
204 @section Complex Multi-Dimensional DFTs | |
205 | |
206 Multi-dimensional transforms work much the same way as one-dimensional | |
207 transforms: you allocate arrays of @code{fftw_complex} (preferably | |
208 using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as | |
209 many times as you want with @code{fftw_execute(plan)}, and clean up | |
210 with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). | |
211 | |
212 FFTW provides two routines for creating plans for 2d and 3d transforms, | |
213 and one routine for creating plans of arbitrary dimensionality. | |
214 The 2d and 3d routines have the following signature: | |
215 @example | |
216 fftw_plan fftw_plan_dft_2d(int n0, int n1, | |
217 fftw_complex *in, fftw_complex *out, | |
218 int sign, unsigned flags); | |
219 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2, | |
220 fftw_complex *in, fftw_complex *out, | |
221 int sign, unsigned flags); | |
222 @end example | |
223 @findex fftw_plan_dft_2d | |
224 @findex fftw_plan_dft_3d | |
225 | |
226 These routines create plans for @code{n0} by @code{n1} two-dimensional | |
227 (2d) transforms and @code{n0} by @code{n1} by @code{n2} 3d transforms, | |
228 respectively. All of these transforms operate on contiguous arrays in | |
229 the C-standard @dfn{row-major} order, so that the last dimension has the | |
230 fastest-varying index in the array. This layout is described further in | |
231 @ref{Multi-dimensional Array Format}. | |
232 | |
233 FFTW can also compute transforms of higher dimensionality. In order to | |
234 avoid confusion between the various meanings of the the word | |
235 ``dimension'', we use the term @emph{rank} | |
236 @cindex rank | |
237 to denote the number of independent indices in an array.@footnote{The | |
238 term ``rank'' is commonly used in the APL, FORTRAN, and Common Lisp | |
239 traditions, although it is not so common in the C@tie{}world.} For | |
240 example, we say that a 2d transform has rank@tie{}2, a 3d transform has | |
241 rank@tie{}3, and so on. You can plan transforms of arbitrary rank by | |
242 means of the following function: | |
243 | |
244 @example | |
245 fftw_plan fftw_plan_dft(int rank, const int *n, | |
246 fftw_complex *in, fftw_complex *out, | |
247 int sign, unsigned flags); | |
248 @end example | |
249 @findex fftw_plan_dft | |
250 | |
251 Here, @code{n} is a pointer to an array @code{n[rank]} denoting an | |
252 @code{n[0]} by @code{n[1]} by @dots{} by @code{n[rank-1]} transform. | |
253 Thus, for example, the call | |
254 @example | |
255 fftw_plan_dft_2d(n0, n1, in, out, sign, flags); | |
256 @end example | |
257 is equivalent to the following code fragment: | |
258 @example | |
259 int n[2]; | |
260 n[0] = n0; | |
261 n[1] = n1; | |
262 fftw_plan_dft(2, n, in, out, sign, flags); | |
263 @end example | |
264 @code{fftw_plan_dft} is not restricted to 2d and 3d transforms, | |
265 however, but it can plan transforms of arbitrary rank. | |
266 | |
267 You may have noticed that all the planner routines described so far | |
268 have overlapping functionality. For example, you can plan a 1d or 2d | |
269 transform by using @code{fftw_plan_dft} with a @code{rank} of @code{1} | |
270 or @code{2}, or even by calling @code{fftw_plan_dft_3d} with @code{n0} | |
271 and/or @code{n1} equal to @code{1} (with no loss in efficiency). This | |
272 pattern continues, and FFTW's planning routines in general form a | |
273 ``partial order,'' sequences of | |
274 @cindex partial order | |
275 interfaces with strictly increasing generality but correspondingly | |
276 greater complexity. | |
277 | |
278 @code{fftw_plan_dft} is the most general complex-DFT routine that we | |
279 describe in this tutorial, but there are also the advanced and guru interfaces, | |
280 @cindex advanced interface | |
281 @cindex guru interface | |
282 which allow one to efficiently combine multiple/strided transforms | |
283 into a single FFTW plan, transform a subset of a larger | |
284 multi-dimensional array, and/or to handle more general complex-number | |
285 formats. For more information, see @ref{FFTW Reference}. | |
286 | |
287 @c ------------------------------------------------------------ | |
288 @node One-Dimensional DFTs of Real Data, Multi-Dimensional DFTs of Real Data, Complex Multi-Dimensional DFTs, Tutorial | |
289 @section One-Dimensional DFTs of Real Data | |
290 | |
291 In many practical applications, the input data @code{in[i]} are purely | |
292 real numbers, in which case the DFT output satisfies the ``Hermitian'' | |
293 @cindex Hermitian | |
294 redundancy: @code{out[i]} is the conjugate of @code{out[n-i]}. It is | |
295 possible to take advantage of these circumstances in order to achieve | |
296 roughly a factor of two improvement in both speed and memory usage. | |
297 | |
298 In exchange for these speed and space advantages, the user sacrifices | |
299 some of the simplicity of FFTW's complex transforms. First of all, the | |
300 input and output arrays are of @emph{different sizes and types}: the | |
301 input is @code{n} real numbers, while the output is @code{n/2+1} | |
302 complex numbers (the non-redundant outputs); this also requires slight | |
303 ``padding'' of the input array for | |
304 @cindex padding | |
305 in-place transforms. Second, the inverse transform (complex to real) | |
306 has the side-effect of @emph{overwriting its input array}, by default. | |
307 Neither of these inconveniences should pose a serious problem for | |
308 users, but it is important to be aware of them. | |
309 | |
310 The routines to perform real-data transforms are almost the same as | |
311 those for complex transforms: you allocate arrays of @code{double} | |
312 and/or @code{fftw_complex} (preferably using @code{fftw_malloc} or | |
313 @code{fftw_alloc_complex}), create an @code{fftw_plan}, execute it as | |
314 many times as you want with @code{fftw_execute(plan)}, and clean up | |
315 with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only | |
316 differences are that the input (or output) is of type @code{double} | |
317 and there are new routines to create the plan. In one dimension: | |
318 | |
319 @example | |
320 fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, | |
321 unsigned flags); | |
322 fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, | |
323 unsigned flags); | |
324 @end example | |
325 @findex fftw_plan_dft_r2c_1d | |
326 @findex fftw_plan_dft_c2r_1d | |
327 | |
328 for the real input to complex-Hermitian output (@dfn{r2c}) and | |
329 complex-Hermitian input to real output (@dfn{c2r}) transforms. | |
330 @cindex r2c | |
331 @cindex c2r | |
332 Unlike the complex DFT planner, there is no @code{sign} argument. | |
333 Instead, r2c DFTs are always @code{FFTW_FORWARD} and c2r DFTs are | |
334 always @code{FFTW_BACKWARD}. | |
335 @ctindex FFTW_FORWARD | |
336 @ctindex FFTW_BACKWARD | |
337 (For single/long-double precision | |
338 @code{fftwf} and @code{fftwl}, @code{double} should be replaced by | |
339 @code{float} and @code{long double}, respectively.) | |
340 @cindex precision | |
341 | |
342 | |
343 Here, @code{n} is the ``logical'' size of the DFT, not necessarily the | |
344 physical size of the array. In particular, the real (@code{double}) | |
345 array has @code{n} elements, while the complex (@code{fftw_complex}) | |
346 array has @code{n/2+1} elements (where the division is rounded down). | |
347 For an in-place transform, | |
348 @cindex in-place | |
349 @code{in} and @code{out} are aliased to the same array, which must be | |
350 big enough to hold both; so, the real array would actually have | |
351 @code{2*(n/2+1)} elements, where the elements beyond the first | |
352 @code{n} are unused padding. (Note that this is very different from | |
353 the concept of ``zero-padding'' a transform to a larger length, which | |
354 changes the logical size of the DFT by actually adding new input | |
355 data.) The @math{k}th element of the complex array is exactly the | |
356 same as the @math{k}th element of the corresponding complex DFT. All | |
357 positive @code{n} are supported; products of small factors are most | |
358 efficient, but an @Onlogn algorithm is used even for prime sizes. | |
359 | |
360 As noted above, the c2r transform destroys its input array even for | |
361 out-of-place transforms. This can be prevented, if necessary, by | |
362 including @code{FFTW_PRESERVE_INPUT} in the @code{flags}, with | |
363 unfortunately some sacrifice in performance. | |
364 @cindex flags | |
365 @ctindex FFTW_PRESERVE_INPUT | |
366 This flag is also not currently supported for multi-dimensional real | |
367 DFTs (next section). | |
368 | |
369 Readers familiar with DFTs of real data will recall that the 0th (the | |
370 ``DC'') and @code{n/2}-th (the ``Nyquist'' frequency, when @code{n} is | |
371 even) elements of the complex output are purely real. Some | |
372 implementations therefore store the Nyquist element where the DC | |
373 imaginary part would go, in order to make the input and output arrays | |
374 the same size. Such packing, however, does not generalize well to | |
375 multi-dimensional transforms, and the space savings are miniscule in | |
376 any case; FFTW does not support it. | |
377 | |
378 An alternative interface for one-dimensional r2c and c2r DFTs can be | |
379 found in the @samp{r2r} interface (@pxref{The Halfcomplex-format | |
380 DFT}), with ``halfcomplex''-format output that @emph{is} the same size | |
381 (and type) as the input array. | |
382 @cindex halfcomplex format | |
383 That interface, although it is not very useful for multi-dimensional | |
384 transforms, may sometimes yield better performance. | |
385 | |
386 @c ------------------------------------------------------------ | |
387 @node Multi-Dimensional DFTs of Real Data, More DFTs of Real Data, One-Dimensional DFTs of Real Data, Tutorial | |
388 @section Multi-Dimensional DFTs of Real Data | |
389 | |
390 Multi-dimensional DFTs of real data use the following planner routines: | |
391 | |
392 @example | |
393 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1, | |
394 double *in, fftw_complex *out, | |
395 unsigned flags); | |
396 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2, | |
397 double *in, fftw_complex *out, | |
398 unsigned flags); | |
399 fftw_plan fftw_plan_dft_r2c(int rank, const int *n, | |
400 double *in, fftw_complex *out, | |
401 unsigned flags); | |
402 @end example | |
403 @findex fftw_plan_dft_r2c_2d | |
404 @findex fftw_plan_dft_r2c_3d | |
405 @findex fftw_plan_dft_r2c | |
406 | |
407 as well as the corresponding @code{c2r} routines with the input/output | |
408 types swapped. These routines work similarly to their complex | |
409 analogues, except for the fact that here the complex output array is cut | |
410 roughly in half and the real array requires padding for in-place | |
411 transforms (as in 1d, above). | |
412 | |
413 As before, @code{n} is the logical size of the array, and the | |
414 consequences of this on the the format of the complex arrays deserve | |
415 careful attention. | |
416 @cindex r2c/c2r multi-dimensional array format | |
417 Suppose that the real data has dimensions @ndims (in row-major order). | |
418 Then, after an r2c transform, the output is an @ndimshalf array of | |
419 @code{fftw_complex} values in row-major order, corresponding to slightly | |
420 over half of the output of the corresponding complex DFT. (The division | |
421 is rounded down.) The ordering of the data is otherwise exactly the | |
422 same as in the complex-DFT case. | |
423 | |
424 For out-of-place transforms, this is the end of the story: the real | |
425 data is stored as a row-major array of size @ndims and the complex | |
426 data is stored as a row-major array of size @ndimshalf{}. | |
427 | |
428 For in-place transforms, however, extra padding of the real-data array | |
429 is necessary because the complex array is larger than the real array, | |
430 and the two arrays share the same memory locations. Thus, for | |
431 in-place transforms, the final dimension of the real-data array must | |
432 be padded with extra values to accommodate the size of the complex | |
433 data---two values if the last dimension is even and one if it is odd. | |
434 @cindex padding | |
435 That is, the last dimension of the real data must physically contain | |
436 @tex | |
437 $2 (n_{d-1}/2+1)$ | |
438 @end tex | |
439 @ifinfo | |
440 2 * (n[d-1]/2+1) | |
441 @end ifinfo | |
442 @html | |
443 2 * (n<sub>d-1</sub>/2+1) | |
444 @end html | |
445 @code{double} values (exactly enough to hold the complex data). | |
446 This physical array size does not, however, change the @emph{logical} | |
447 array size---only | |
448 @tex | |
449 $n_{d-1}$ | |
450 @end tex | |
451 @ifinfo | |
452 n[d-1] | |
453 @end ifinfo | |
454 @html | |
455 n<sub>d-1</sub> | |
456 @end html | |
457 values are actually stored in the last dimension, and | |
458 @tex | |
459 $n_{d-1}$ | |
460 @end tex | |
461 @ifinfo | |
462 n[d-1] | |
463 @end ifinfo | |
464 @html | |
465 n<sub>d-1</sub> | |
466 @end html | |
467 is the last dimension passed to the plan-creation routine. | |
468 | |
469 For example, consider the transform of a two-dimensional real array of | |
470 size @code{n0} by @code{n1}. The output of the r2c transform is a | |
471 two-dimensional complex array of size @code{n0} by @code{n1/2+1}, where | |
472 the @code{y} dimension has been cut nearly in half because of | |
473 redundancies in the output. Because @code{fftw_complex} is twice the | |
474 size of @code{double}, the output array is slightly bigger than the | |
475 input array. Thus, if we want to compute the transform in place, we | |
476 must @emph{pad} the input array so that it is of size @code{n0} by | |
477 @code{2*(n1/2+1)}. If @code{n1} is even, then there are two padding | |
478 elements at the end of each row (which need not be initialized, as they | |
479 are only used for output). | |
480 | |
481 @ifhtml | |
482 The following illustration depicts the input and output arrays just | |
483 described, for both the out-of-place and in-place transforms (with the | |
484 arrows indicating consecutive memory locations): | |
485 @image{rfftwnd-for-html} | |
486 @end ifhtml | |
487 @ifnotinfo | |
488 @ifnothtml | |
489 @float Figure,fig:rfftwnd | |
490 @center @image{rfftwnd} | |
491 @caption{Illustration of the data layout for a 2d @code{nx} by @code{ny} | |
492 real-to-complex transform.} | |
493 @end float | |
494 @ref{fig:rfftwnd} depicts the input and output arrays just | |
495 described, for both the out-of-place and in-place transforms (with the | |
496 arrows indicating consecutive memory locations): | |
497 @end ifnothtml | |
498 @end ifnotinfo | |
499 | |
500 These transforms are unnormalized, so an r2c followed by a c2r | |
501 transform (or vice versa) will result in the original data scaled by | |
502 the number of real data elements---that is, the product of the | |
503 (logical) dimensions of the real data. | |
504 @cindex normalization | |
505 | |
506 | |
507 (Because the last dimension is treated specially, if it is equal to | |
508 @code{1} the transform is @emph{not} equivalent to a lower-dimensional | |
509 r2c/c2r transform. In that case, the last complex dimension also has | |
510 size @code{1} (@code{=1/2+1}), and no advantage is gained over the | |
511 complex transforms.) | |
512 | |
513 @c ------------------------------------------------------------ | |
514 @node More DFTs of Real Data, , Multi-Dimensional DFTs of Real Data, Tutorial | |
515 @section More DFTs of Real Data | |
516 @menu | |
517 * The Halfcomplex-format DFT:: | |
518 * Real even/odd DFTs (cosine/sine transforms):: | |
519 * The Discrete Hartley Transform:: | |
520 @end menu | |
521 | |
522 FFTW supports several other transform types via a unified @dfn{r2r} | |
523 (real-to-real) interface, | |
524 @cindex r2r | |
525 so called because it takes a real (@code{double}) array and outputs a | |
526 real array of the same size. These r2r transforms currently fall into | |
527 three categories: DFTs of real input and complex-Hermitian output in | |
528 halfcomplex format, DFTs of real input with even/odd symmetry | |
529 (a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete | |
530 Hartley transforms (DHTs), all described in more detail by the | |
531 following sections. | |
532 | |
533 The r2r transforms follow the by now familiar interface of creating an | |
534 @code{fftw_plan}, executing it with @code{fftw_execute(plan)}, and | |
535 destroying it with @code{fftw_destroy_plan(plan)}. Furthermore, all | |
536 r2r transforms share the same planner interface: | |
537 | |
538 @example | |
539 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, | |
540 fftw_r2r_kind kind, unsigned flags); | |
541 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out, | |
542 fftw_r2r_kind kind0, fftw_r2r_kind kind1, | |
543 unsigned flags); | |
544 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2, | |
545 double *in, double *out, | |
546 fftw_r2r_kind kind0, | |
547 fftw_r2r_kind kind1, | |
548 fftw_r2r_kind kind2, | |
549 unsigned flags); | |
550 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, | |
551 const fftw_r2r_kind *kind, unsigned flags); | |
552 @end example | |
553 @findex fftw_plan_r2r_1d | |
554 @findex fftw_plan_r2r_2d | |
555 @findex fftw_plan_r2r_3d | |
556 @findex fftw_plan_r2r | |
557 | |
558 Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional | |
559 transforms for contiguous arrays in row-major order, transforming (real) | |
560 input to output of the same size, where @code{n} specifies the | |
561 @emph{physical} dimensions of the arrays. All positive @code{n} are | |
562 supported (with the exception of @code{n=1} for the @code{FFTW_REDFT00} | |
563 kind, noted in the real-even subsection below); products of small | |
564 factors are most efficient (factorizing @code{n-1} and @code{n+1} for | |
565 @code{FFTW_REDFT00} and @code{FFTW_RODFT00} kinds, described below), but | |
566 an @Onlogn algorithm is used even for prime sizes. | |
567 | |
568 Each dimension has a @dfn{kind} parameter, of type | |
569 @code{fftw_r2r_kind}, specifying the kind of r2r transform to be used | |
570 for that dimension. | |
571 @cindex kind (r2r) | |
572 @tindex fftw_r2r_kind | |
573 (In the case of @code{fftw_plan_r2r}, this is an array @code{kind[rank]} | |
574 where @code{kind[i]} is the transform kind for the dimension | |
575 @code{n[i]}.) The kind can be one of a set of predefined constants, | |
576 defined in the following subsections. | |
577 | |
578 In other words, FFTW computes the separable product of the specified | |
579 r2r transforms over each dimension, which can be used e.g. for partial | |
580 differential equations with mixed boundary conditions. (For some r2r | |
581 kinds, notably the halfcomplex DFT and the DHT, such a separable | |
582 product is somewhat problematic in more than one dimension, however, | |
583 as is described below.) | |
584 | |
585 In the current version of FFTW, all r2r transforms except for the | |
586 halfcomplex type are computed via pre- or post-processing of | |
587 halfcomplex transforms, and they are therefore not as fast as they | |
588 could be. Since most other general DCT/DST codes employ a similar | |
589 algorithm, however, FFTW's implementation should provide at least | |
590 competitive performance. | |
591 | |
592 @c =========> | |
593 @node The Halfcomplex-format DFT, Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data, More DFTs of Real Data | |
594 @subsection The Halfcomplex-format DFT | |
595 | |
596 An r2r kind of @code{FFTW_R2HC} (@dfn{r2hc}) corresponds to an r2c DFT | |
597 @ctindex FFTW_R2HC | |
598 @cindex r2c | |
599 @cindex r2hc | |
600 (@pxref{One-Dimensional DFTs of Real Data}) but with ``halfcomplex'' | |
601 format output, and may sometimes be faster and/or more convenient than | |
602 the latter. | |
603 @cindex halfcomplex format | |
604 The inverse @dfn{hc2r} transform is of kind @code{FFTW_HC2R}. | |
605 @ctindex FFTW_HC2R | |
606 @cindex hc2r | |
607 This consists of the non-redundant half of the complex output for a 1d | |
608 real-input DFT of size @code{n}, stored as a sequence of @code{n} real | |
609 numbers (@code{double}) in the format: | |
610 | |
611 @tex | |
612 $$ | |
613 r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1 | |
614 $$ | |
615 @end tex | |
616 @ifinfo | |
617 r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1 | |
618 @end ifinfo | |
619 @html | |
620 <p align=center> | |
621 r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub> | |
622 </p> | |
623 @end html | |
624 | |
625 Here, | |
626 @ifinfo | |
627 rk | |
628 @end ifinfo | |
629 @tex | |
630 $r_k$ | |
631 @end tex | |
632 @html | |
633 r<sub>k</sub> | |
634 @end html | |
635 is the real part of the @math{k}th output, and | |
636 @ifinfo | |
637 ik | |
638 @end ifinfo | |
639 @tex | |
640 $i_k$ | |
641 @end tex | |
642 @html | |
643 i<sub>k</sub> | |
644 @end html | |
645 is the imaginary part. (Division by 2 is rounded down.) For a | |
646 halfcomplex array @code{hc[n]}, the @math{k}th component thus has its | |
647 real part in @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with | |
648 the exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter | |
649 only if @code{n} is even)---in these two cases, the imaginary part is | |
650 zero due to symmetries of the real-input DFT, and is not stored. | |
651 Thus, the r2hc transform of @code{n} real values is a halfcomplex array of | |
652 length @code{n}, and vice versa for hc2r. | |
653 @cindex normalization | |
654 | |
655 | |
656 Aside from the differing format, the output of | |
657 @code{FFTW_R2HC}/@code{FFTW_HC2R} is otherwise exactly the same as for | |
658 the corresponding 1d r2c/c2r transform | |
659 (i.e. @code{FFTW_FORWARD}/@code{FFTW_BACKWARD} transforms, respectively). | |
660 Recall that these transforms are unnormalized, so r2hc followed by hc2r | |
661 will result in the original data multiplied by @code{n}. Furthermore, | |
662 like the c2r transform, an out-of-place hc2r transform will | |
663 @emph{destroy its input} array. | |
664 | |
665 Although these halfcomplex transforms can be used with the | |
666 multi-dimensional r2r interface, the interpretation of such a separable | |
667 product of transforms along each dimension is problematic. For example, | |
668 consider a two-dimensional @code{n0} by @code{n1}, r2hc by r2hc | |
669 transform planned by @code{fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, | |
670 FFTW_R2HC, FFTW_MEASURE)}. Conceptually, FFTW first transforms the rows | |
671 (of size @code{n1}) to produce halfcomplex rows, and then transforms the | |
672 columns (of size @code{n0}). Half of these column transforms, however, | |
673 are of imaginary parts, and should therefore be multiplied by @math{i} | |
674 and combined with the r2hc transforms of the real columns to produce the | |
675 2d DFT amplitudes; FFTW's r2r transform does @emph{not} perform this | |
676 combination for you. Thus, if a multi-dimensional real-input/output DFT | |
677 is required, we recommend using the ordinary r2c/c2r | |
678 interface (@pxref{Multi-Dimensional DFTs of Real Data}). | |
679 | |
680 @c =========> | |
681 @node Real even/odd DFTs (cosine/sine transforms), The Discrete Hartley Transform, The Halfcomplex-format DFT, More DFTs of Real Data | |
682 @subsection Real even/odd DFTs (cosine/sine transforms) | |
683 | |
684 The Fourier transform of a real-even function @math{f(-x) = f(x)} is | |
685 real-even, and @math{i} times the Fourier transform of a real-odd | |
686 function @math{f(-x) = -f(x)} is real-odd. Similar results hold for a | |
687 discrete Fourier transform, and thus for these symmetries the need for | |
688 complex inputs/outputs is entirely eliminated. Moreover, one gains a | |
689 factor of two in speed/space from the fact that the data are real, and | |
690 an additional factor of two from the even/odd symmetry: only the | |
691 non-redundant (first) half of the array need be stored. The result is | |
692 the real-even DFT (@dfn{REDFT}) and the real-odd DFT (@dfn{RODFT}), also | |
693 known as the discrete cosine and sine transforms (@dfn{DCT} and | |
694 @dfn{DST}), respectively. | |
695 @cindex real-even DFT | |
696 @cindex REDFT | |
697 @cindex real-odd DFT | |
698 @cindex RODFT | |
699 @cindex discrete cosine transform | |
700 @cindex DCT | |
701 @cindex discrete sine transform | |
702 @cindex DST | |
703 | |
704 | |
705 (In this section, we describe the 1d transforms; multi-dimensional | |
706 transforms are just a separable product of these transforms operating | |
707 along each dimension.) | |
708 | |
709 Because of the discrete sampling, one has an additional choice: is the | |
710 data even/odd around a sampling point, or around the point halfway | |
711 between two samples? The latter corresponds to @emph{shifting} the | |
712 samples by @emph{half} an interval, and gives rise to several transform | |
713 variants denoted by REDFT@math{ab} and RODFT@math{ab}: @math{a} and | |
714 @math{b} are @math{0} or @math{1}, and indicate whether the input | |
715 (@math{a}) and/or output (@math{b}) are shifted by half a sample | |
716 (@math{1} means it is shifted). These are also known as types I-IV of | |
717 the DCT and DST, and all four types are supported by FFTW's r2r | |
718 interface.@footnote{There are also type V-VIII transforms, which | |
719 correspond to a logical DFT of @emph{odd} size @math{N}, independent of | |
720 whether the physical size @code{n} is odd, but we do not support these | |
721 variants.} | |
722 | |
723 The r2r kinds for the various REDFT and RODFT types supported by FFTW, | |
724 along with the boundary conditions at both ends of the @emph{input} | |
725 array (@code{n} real numbers @code{in[j=0..n-1]}), are: | |
726 | |
727 @itemize @bullet | |
728 | |
729 @item | |
730 @code{FFTW_REDFT00} (DCT-I): even around @math{j=0} and even around @math{j=n-1}. | |
731 @ctindex FFTW_REDFT00 | |
732 | |
733 @item | |
734 @code{FFTW_REDFT10} (DCT-II, ``the'' DCT): even around @math{j=-0.5} and even around @math{j=n-0.5}. | |
735 @ctindex FFTW_REDFT10 | |
736 | |
737 @item | |
738 @code{FFTW_REDFT01} (DCT-III, ``the'' IDCT): even around @math{j=0} and odd around @math{j=n}. | |
739 @ctindex FFTW_REDFT01 | |
740 @cindex IDCT | |
741 | |
742 @item | |
743 @code{FFTW_REDFT11} (DCT-IV): even around @math{j=-0.5} and odd around @math{j=n-0.5}. | |
744 @ctindex FFTW_REDFT11 | |
745 | |
746 @item | |
747 @code{FFTW_RODFT00} (DST-I): odd around @math{j=-1} and odd around @math{j=n}. | |
748 @ctindex FFTW_RODFT00 | |
749 | |
750 @item | |
751 @code{FFTW_RODFT10} (DST-II): odd around @math{j=-0.5} and odd around @math{j=n-0.5}. | |
752 @ctindex FFTW_RODFT10 | |
753 | |
754 @item | |
755 @code{FFTW_RODFT01} (DST-III): odd around @math{j=-1} and even around @math{j=n-1}. | |
756 @ctindex FFTW_RODFT01 | |
757 | |
758 @item | |
759 @code{FFTW_RODFT11} (DST-IV): odd around @math{j=-0.5} and even around @math{j=n-0.5}. | |
760 @ctindex FFTW_RODFT11 | |
761 | |
762 @end itemize | |
763 | |
764 Note that these symmetries apply to the ``logical'' array being | |
765 transformed; @strong{there are no constraints on your physical input | |
766 data}. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the | |
767 data @math{abcde}, it corresponds to the DFT of the logical even array | |
768 @math{abcdedcb} of size 8. A size-4 REDFT10 (DCT-II) of the data | |
769 @math{abcd} corresponds to the size-8 logical DFT of the even array | |
770 @math{abcddcba}, shifted by half a sample. | |
771 | |
772 All of these transforms are invertible. The inverse of R*DFT00 is | |
773 R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called | |
774 simply ``the'' DCT and IDCT, respectively); and of R*DFT11 is R*DFT11. | |
775 However, the transforms computed by FFTW are unnormalized, exactly | |
776 like the corresponding real and complex DFTs, so computing a transform | |
777 followed by its inverse yields the original array scaled by @math{N}, | |
778 where @math{N} is the @emph{logical} DFT size. For REDFT00, | |
779 @math{N=2(n-1)}; for RODFT00, @math{N=2(n+1)}; otherwise, @math{N=2n}. | |
780 @cindex normalization | |
781 @cindex IDCT | |
782 | |
783 | |
784 Note that the boundary conditions of the transform output array are | |
785 given by the input boundary conditions of the inverse transform. | |
786 Thus, the above transforms are all inequivalent in terms of | |
787 input/output boundary conditions, even neglecting the 0.5 shift | |
788 difference. | |
789 | |
790 FFTW is most efficient when @math{N} is a product of small factors; note | |
791 that this @emph{differs} from the factorization of the physical size | |
792 @code{n} for REDFT00 and RODFT00! There is another oddity: @code{n=1} | |
793 REDFT00 transforms correspond to @math{N=0}, and so are @emph{not | |
794 defined} (the planner will return @code{NULL}). Otherwise, any positive | |
795 @code{n} is supported. | |
796 | |
797 For the precise mathematical definitions of these transforms as used by | |
798 FFTW, see @ref{What FFTW Really Computes}. (For people accustomed to | |
799 the DCT/DST, FFTW's definitions have a coefficient of @math{2} in front | |
800 of the cos/sin functions so that they correspond precisely to an | |
801 even/odd DFT of size @math{N}. Some authors also include additional | |
802 multiplicative factors of | |
803 @ifinfo | |
804 sqrt(2) | |
805 @end ifinfo | |
806 @html | |
807 √2 | |
808 @end html | |
809 @tex | |
810 $\sqrt{2}$ | |
811 @end tex | |
812 for selected inputs and outputs; this makes | |
813 the transform orthogonal, but sacrifices the direct equivalence to a | |
814 symmetric DFT.) | |
815 | |
816 @subsubheading Which type do you need? | |
817 | |
818 Since the required flavor of even/odd DFT depends upon your problem, | |
819 you are the best judge of this choice, but we can make a few comments | |
820 on relative efficiency to help you in your selection. In particular, | |
821 R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 | |
822 (especially for odd sizes), while the R*DFT00 transforms are sometimes | |
823 significantly slower (especially for even sizes).@footnote{R*DFT00 is | |
824 sometimes slower in FFTW because we discovered that the standard | |
825 algorithm for computing this by a pre/post-processed real DFT---the | |
826 algorithm used in FFTPACK, Numerical Recipes, and other sources for | |
827 decades now---has serious numerical problems: it already loses several | |
828 decimal places of accuracy for 16k sizes. There seem to be only two | |
829 alternatives in the literature that do not suffer similarly: a | |
830 recursive decomposition into smaller DCTs, which would require a large | |
831 set of codelets for efficiency and generality, or sacrificing a factor of | |
832 @tex | |
833 $\sim 2$ | |
834 @end tex | |
835 @ifnottex | |
836 2 | |
837 @end ifnottex | |
838 in speed to use a real DFT of twice the size. We currently | |
839 employ the latter technique for general @math{n}, as well as a limited | |
840 form of the former method: a split-radix decomposition when @math{n} | |
841 is odd (@math{N} a multiple of 4). For @math{N} containing many | |
842 factors of 2, the split-radix method seems to recover most of the | |
843 speed of the standard algorithm without the accuracy tradeoff.} | |
844 | |
845 Thus, if only the boundary conditions on the transform inputs are | |
846 specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over | |
847 R*DFT11 (unless the half-sample shift or the self-inverse property is | |
848 significant for your problem). | |
849 | |
850 If performance is important to you and you are using only small sizes | |
851 (say @math{n<200}), e.g. for multi-dimensional transforms, then you | |
852 might consider generating hard-coded transforms of those sizes and types | |
853 that you are interested in (@pxref{Generating your own code}). | |
854 | |
855 We are interested in hearing what types of symmetric transforms you find | |
856 most useful. | |
857 | |
858 @c =========> | |
859 @node The Discrete Hartley Transform, , Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data | |
860 @subsection The Discrete Hartley Transform | |
861 | |
862 If you are planning to use the DHT because you've heard that it is | |
863 ``faster'' than the DFT (FFT), @strong{stop here}. The DHT is not | |
864 faster than the DFT. That story is an old but enduring misconception | |
865 that was debunked in 1987. | |
866 | |
867 The discrete Hartley transform (DHT) is an invertible linear transform | |
868 closely related to the DFT. In the DFT, one multiplies each input by | |
869 @math{cos - i * sin} (a complex exponential), whereas in the DHT each | |
870 input is multiplied by simply @math{cos + sin}. Thus, the DHT | |
871 transforms @code{n} real numbers to @code{n} real numbers, and has the | |
872 convenient property of being its own inverse. In FFTW, a DHT (of any | |
873 positive @code{n}) can be specified by an r2r kind of @code{FFTW_DHT}. | |
874 @ctindex FFTW_DHT | |
875 @cindex discrete Hartley transform | |
876 @cindex DHT | |
877 | |
878 Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of | |
879 size @code{n} followed by another DHT of the same size will result in | |
880 the original array multiplied by @code{n}. | |
881 @cindex normalization | |
882 | |
883 The DHT was originally proposed as a more efficient alternative to the | |
884 DFT for real data, but it was subsequently shown that a specialized DFT | |
885 (such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW, | |
886 the DHT is actually computed by post-processing an r2hc transform, so | |
887 there is ordinarily no reason to prefer it from a performance | |
888 perspective.@footnote{We provide the DHT mainly as a byproduct of some | |
889 internal algorithms. FFTW computes a real input/output DFT of | |
890 @emph{prime} size by re-expressing it as a DHT plus post/pre-processing | |
891 and then using Rader's prime-DFT algorithm adapted to the DHT.} | |
892 However, we have heard rumors that the DHT might be the most appropriate | |
893 transform in its own right for certain applications, and we would be | |
894 very interested to hear from anyone who finds it useful. | |
895 | |
896 If @code{FFTW_DHT} is specified for multiple dimensions of a | |
897 multi-dimensional transform, FFTW computes the separable product of 1d | |
898 DHTs along each dimension. Unfortunately, this is not quite the same | |
899 thing as a true multi-dimensional DHT; you can compute the latter, if | |
900 necessary, with at most @code{rank-1} post-processing passes | |
901 [see e.g. H. Hao and R. N. Bracewell, @i{Proc. IEEE} @b{75}, 264--266 (1987)]. | |
902 | |
903 For the precise mathematical definition of the DHT as used by FFTW, see | |
904 @ref{What FFTW Really Computes}. | |
905 |