comparison src/fftw-3.3.3/doc/tutorial.texi @ 10:37bf6b4a2645

Add FFTW3
author Chris Cannam
date Wed, 20 Mar 2013 15:35:50 +0000
parents
children
comparison
equal deleted inserted replaced
9:c0fb53affa76 10:37bf6b4a2645
1 @node Tutorial, Other Important Topics, Introduction, Top
2 @chapter Tutorial
3 @menu
4 * Complex One-Dimensional DFTs::
5 * Complex Multi-Dimensional DFTs::
6 * One-Dimensional DFTs of Real Data::
7 * Multi-Dimensional DFTs of Real Data::
8 * More DFTs of Real Data::
9 @end menu
10
11 This chapter describes the basic usage of FFTW, i.e., how to compute
12 @cindex basic interface
13 the Fourier transform of a single array. This chapter tells the
14 truth, but not the @emph{whole} truth. Specifically, FFTW implements
15 additional routines and flags that are not documented here, although
16 in many cases we try to indicate where added capabilities exist. For
17 more complete information, see @ref{FFTW Reference}. (Note that you
18 need to compile and install FFTW before you can use it in a program.
19 For the details of the installation, see @ref{Installation and
20 Customization}.)
21
22 We recommend that you read this tutorial in order.@footnote{You can
23 read the tutorial in bit-reversed order after computing your first
24 transform.} At the least, read the first section (@pxref{Complex
25 One-Dimensional DFTs}) before reading any of the others, even if your
26 main interest lies in one of the other transform types.
27
28 Users of FFTW version 2 and earlier may also want to read @ref{Upgrading
29 from FFTW version 2}.
30
31 @c ------------------------------------------------------------
32 @node Complex One-Dimensional DFTs, Complex Multi-Dimensional DFTs, Tutorial, Tutorial
33 @section Complex One-Dimensional DFTs
34
35 @quotation
36 Plan: To bother about the best method of accomplishing an accidental result.
37 [Ambrose Bierce, @cite{The Enlarged Devil's Dictionary}.]
38 @cindex Devil
39 @end quotation
40
41 @iftex
42 @medskip
43 @end iftex
44
45 The basic usage of FFTW to compute a one-dimensional DFT of size
46 @code{N} is simple, and it typically looks something like this code:
47
48 @example
49 #include <fftw3.h>
50 ...
51 @{
52 fftw_complex *in, *out;
53 fftw_plan p;
54 ...
55 in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
56 out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
57 p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
58 ...
59 fftw_execute(p); /* @r{repeat as needed} */
60 ...
61 fftw_destroy_plan(p);
62 fftw_free(in); fftw_free(out);
63 @}
64 @end example
65
66 You must link this code with the @code{fftw3} library. On Unix systems,
67 link with @code{-lfftw3 -lm}.
68
69 The example code first allocates the input and output arrays. You can
70 allocate them in any way that you like, but we recommend using
71 @code{fftw_malloc}, which behaves like
72 @findex fftw_malloc
73 @code{malloc} except that it properly aligns the array when SIMD
74 instructions (such as SSE and Altivec) are available (@pxref{SIMD
75 alignment and fftw_malloc}). [Alternatively, we provide a convenient wrapper function @code{fftw_alloc_complex(N)} which has the same effect.]
76 @findex fftw_alloc_complex
77 @cindex SIMD
78
79
80 The data is an array of type @code{fftw_complex}, which is by default a
81 @code{double[2]} composed of the real (@code{in[i][0]}) and imaginary
82 (@code{in[i][1]}) parts of a complex number.
83 @tindex fftw_complex
84
85 The next step is to create a @dfn{plan}, which is an object
86 @cindex plan
87 that contains all the data that FFTW needs to compute the FFT.
88 This function creates the plan:
89
90 @example
91 fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
92 int sign, unsigned flags);
93 @end example
94 @findex fftw_plan_dft_1d
95 @tindex fftw_plan
96
97 The first argument, @code{n}, is the size of the transform you are
98 trying to compute. The size @code{n} can be any positive integer, but
99 sizes that are products of small factors are transformed most
100 efficiently (although prime sizes still use an @Onlogn{} algorithm).
101
102 The next two arguments are pointers to the input and output arrays of
103 the transform. These pointers can be equal, indicating an
104 @dfn{in-place} transform.
105 @cindex in-place
106
107
108 The fourth argument, @code{sign}, can be either @code{FFTW_FORWARD}
109 (@code{-1}) or @code{FFTW_BACKWARD} (@code{+1}),
110 @ctindex FFTW_FORWARD
111 @ctindex FFTW_BACKWARD
112 and indicates the direction of the transform you are interested in;
113 technically, it is the sign of the exponent in the transform.
114
115 The @code{flags} argument is usually either @code{FFTW_MEASURE} or
116 @cindex flags
117 @code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} instructs FFTW to run
118 @ctindex FFTW_MEASURE
119 and measure the execution time of several FFTs in order to find the
120 best way to compute the transform of size @code{n}. This process takes
121 some time (usually a few seconds), depending on your machine and on
122 the size of the transform. @code{FFTW_ESTIMATE}, on the contrary,
123 does not run any computation and just builds a
124 @ctindex FFTW_ESTIMATE
125 reasonable plan that is probably sub-optimal. In short, if your
126 program performs many transforms of the same size and initialization
127 time is not important, use @code{FFTW_MEASURE}; otherwise use the
128 estimate.
129
130 @emph{You must create the plan before initializing the input}, because
131 @code{FFTW_MEASURE} overwrites the @code{in}/@code{out} arrays.
132 (Technically, @code{FFTW_ESTIMATE} does not touch your arrays, but you
133 should always create plans first just to be sure.)
134
135 Once the plan has been created, you can use it as many times as you
136 like for transforms on the specified @code{in}/@code{out} arrays,
137 computing the actual transforms via @code{fftw_execute(plan)}:
138 @example
139 void fftw_execute(const fftw_plan plan);
140 @end example
141 @findex fftw_execute
142
143 The DFT results are stored in-order in the array @code{out}, with the
144 zero-frequency (DC) component in @code{out[0]}.
145 @cindex frequency
146 If @code{in != out}, the transform is @dfn{out-of-place} and the input
147 array @code{in} is not modified. Otherwise, the input array is
148 overwritten with the transform.
149
150 @cindex execute
151 If you want to transform a @emph{different} array of the same size, you
152 can create a new plan with @code{fftw_plan_dft_1d} and FFTW
153 automatically reuses the information from the previous plan, if
154 possible. Alternatively, with the ``guru'' interface you can apply a
155 given plan to a different array, if you are careful.
156 @xref{FFTW Reference}.
157
158 When you are done with the plan, you deallocate it by calling
159 @code{fftw_destroy_plan(plan)}:
160 @example
161 void fftw_destroy_plan(fftw_plan plan);
162 @end example
163 @findex fftw_destroy_plan
164 If you allocate an array with @code{fftw_malloc()} you must deallocate
165 it with @code{fftw_free()}. Do not use @code{free()} or, heaven
166 forbid, @code{delete}.
167 @findex fftw_free
168
169 FFTW computes an @emph{unnormalized} DFT. Thus, computing a forward
170 followed by a backward transform (or vice versa) results in the original
171 array scaled by @code{n}. For the definition of the DFT, see @ref{What
172 FFTW Really Computes}.
173 @cindex DFT
174 @cindex normalization
175
176
177 If you have a C compiler, such as @code{gcc}, that supports the
178 C99 standard, and you @code{#include <complex.h>} @emph{before}
179 @code{<fftw3.h>}, then @code{fftw_complex} is the native
180 double-precision complex type and you can manipulate it with ordinary
181 arithmetic. Otherwise, FFTW defines its own complex type, which is
182 bit-compatible with the C99 complex type. @xref{Complex numbers}.
183 (The C++ @code{<complex>} template class may also be usable via a
184 typecast.)
185 @cindex C++
186
187 To use single or long-double precision versions of FFTW, replace the
188 @code{fftw_} prefix by @code{fftwf_} or @code{fftwl_} and link with
189 @code{-lfftw3f} or @code{-lfftw3l}, but use the @emph{same}
190 @code{<fftw3.h>} header file.
191 @cindex precision
192
193
194 Many more flags exist besides @code{FFTW_MEASURE} and
195 @code{FFTW_ESTIMATE}. For example, use @code{FFTW_PATIENT} if you're
196 willing to wait even longer for a possibly even faster plan (@pxref{FFTW
197 Reference}).
198 @ctindex FFTW_PATIENT
199 You can also save plans for future use, as described by @ref{Words of
200 Wisdom-Saving Plans}.
201
202 @c ------------------------------------------------------------
203 @node Complex Multi-Dimensional DFTs, One-Dimensional DFTs of Real Data, Complex One-Dimensional DFTs, Tutorial
204 @section Complex Multi-Dimensional DFTs
205
206 Multi-dimensional transforms work much the same way as one-dimensional
207 transforms: you allocate arrays of @code{fftw_complex} (preferably
208 using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as
209 many times as you want with @code{fftw_execute(plan)}, and clean up
210 with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}).
211
212 FFTW provides two routines for creating plans for 2d and 3d transforms,
213 and one routine for creating plans of arbitrary dimensionality.
214 The 2d and 3d routines have the following signature:
215 @example
216 fftw_plan fftw_plan_dft_2d(int n0, int n1,
217 fftw_complex *in, fftw_complex *out,
218 int sign, unsigned flags);
219 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
220 fftw_complex *in, fftw_complex *out,
221 int sign, unsigned flags);
222 @end example
223 @findex fftw_plan_dft_2d
224 @findex fftw_plan_dft_3d
225
226 These routines create plans for @code{n0} by @code{n1} two-dimensional
227 (2d) transforms and @code{n0} by @code{n1} by @code{n2} 3d transforms,
228 respectively. All of these transforms operate on contiguous arrays in
229 the C-standard @dfn{row-major} order, so that the last dimension has the
230 fastest-varying index in the array. This layout is described further in
231 @ref{Multi-dimensional Array Format}.
232
233 FFTW can also compute transforms of higher dimensionality. In order to
234 avoid confusion between the various meanings of the the word
235 ``dimension'', we use the term @emph{rank}
236 @cindex rank
237 to denote the number of independent indices in an array.@footnote{The
238 term ``rank'' is commonly used in the APL, FORTRAN, and Common Lisp
239 traditions, although it is not so common in the C@tie{}world.} For
240 example, we say that a 2d transform has rank@tie{}2, a 3d transform has
241 rank@tie{}3, and so on. You can plan transforms of arbitrary rank by
242 means of the following function:
243
244 @example
245 fftw_plan fftw_plan_dft(int rank, const int *n,
246 fftw_complex *in, fftw_complex *out,
247 int sign, unsigned flags);
248 @end example
249 @findex fftw_plan_dft
250
251 Here, @code{n} is a pointer to an array @code{n[rank]} denoting an
252 @code{n[0]} by @code{n[1]} by @dots{} by @code{n[rank-1]} transform.
253 Thus, for example, the call
254 @example
255 fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
256 @end example
257 is equivalent to the following code fragment:
258 @example
259 int n[2];
260 n[0] = n0;
261 n[1] = n1;
262 fftw_plan_dft(2, n, in, out, sign, flags);
263 @end example
264 @code{fftw_plan_dft} is not restricted to 2d and 3d transforms,
265 however, but it can plan transforms of arbitrary rank.
266
267 You may have noticed that all the planner routines described so far
268 have overlapping functionality. For example, you can plan a 1d or 2d
269 transform by using @code{fftw_plan_dft} with a @code{rank} of @code{1}
270 or @code{2}, or even by calling @code{fftw_plan_dft_3d} with @code{n0}
271 and/or @code{n1} equal to @code{1} (with no loss in efficiency). This
272 pattern continues, and FFTW's planning routines in general form a
273 ``partial order,'' sequences of
274 @cindex partial order
275 interfaces with strictly increasing generality but correspondingly
276 greater complexity.
277
278 @code{fftw_plan_dft} is the most general complex-DFT routine that we
279 describe in this tutorial, but there are also the advanced and guru interfaces,
280 @cindex advanced interface
281 @cindex guru interface
282 which allow one to efficiently combine multiple/strided transforms
283 into a single FFTW plan, transform a subset of a larger
284 multi-dimensional array, and/or to handle more general complex-number
285 formats. For more information, see @ref{FFTW Reference}.
286
287 @c ------------------------------------------------------------
288 @node One-Dimensional DFTs of Real Data, Multi-Dimensional DFTs of Real Data, Complex Multi-Dimensional DFTs, Tutorial
289 @section One-Dimensional DFTs of Real Data
290
291 In many practical applications, the input data @code{in[i]} are purely
292 real numbers, in which case the DFT output satisfies the ``Hermitian''
293 @cindex Hermitian
294 redundancy: @code{out[i]} is the conjugate of @code{out[n-i]}. It is
295 possible to take advantage of these circumstances in order to achieve
296 roughly a factor of two improvement in both speed and memory usage.
297
298 In exchange for these speed and space advantages, the user sacrifices
299 some of the simplicity of FFTW's complex transforms. First of all, the
300 input and output arrays are of @emph{different sizes and types}: the
301 input is @code{n} real numbers, while the output is @code{n/2+1}
302 complex numbers (the non-redundant outputs); this also requires slight
303 ``padding'' of the input array for
304 @cindex padding
305 in-place transforms. Second, the inverse transform (complex to real)
306 has the side-effect of @emph{overwriting its input array}, by default.
307 Neither of these inconveniences should pose a serious problem for
308 users, but it is important to be aware of them.
309
310 The routines to perform real-data transforms are almost the same as
311 those for complex transforms: you allocate arrays of @code{double}
312 and/or @code{fftw_complex} (preferably using @code{fftw_malloc} or
313 @code{fftw_alloc_complex}), create an @code{fftw_plan}, execute it as
314 many times as you want with @code{fftw_execute(plan)}, and clean up
315 with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only
316 differences are that the input (or output) is of type @code{double}
317 and there are new routines to create the plan. In one dimension:
318
319 @example
320 fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
321 unsigned flags);
322 fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
323 unsigned flags);
324 @end example
325 @findex fftw_plan_dft_r2c_1d
326 @findex fftw_plan_dft_c2r_1d
327
328 for the real input to complex-Hermitian output (@dfn{r2c}) and
329 complex-Hermitian input to real output (@dfn{c2r}) transforms.
330 @cindex r2c
331 @cindex c2r
332 Unlike the complex DFT planner, there is no @code{sign} argument.
333 Instead, r2c DFTs are always @code{FFTW_FORWARD} and c2r DFTs are
334 always @code{FFTW_BACKWARD}.
335 @ctindex FFTW_FORWARD
336 @ctindex FFTW_BACKWARD
337 (For single/long-double precision
338 @code{fftwf} and @code{fftwl}, @code{double} should be replaced by
339 @code{float} and @code{long double}, respectively.)
340 @cindex precision
341
342
343 Here, @code{n} is the ``logical'' size of the DFT, not necessarily the
344 physical size of the array. In particular, the real (@code{double})
345 array has @code{n} elements, while the complex (@code{fftw_complex})
346 array has @code{n/2+1} elements (where the division is rounded down).
347 For an in-place transform,
348 @cindex in-place
349 @code{in} and @code{out} are aliased to the same array, which must be
350 big enough to hold both; so, the real array would actually have
351 @code{2*(n/2+1)} elements, where the elements beyond the first
352 @code{n} are unused padding. (Note that this is very different from
353 the concept of ``zero-padding'' a transform to a larger length, which
354 changes the logical size of the DFT by actually adding new input
355 data.) The @math{k}th element of the complex array is exactly the
356 same as the @math{k}th element of the corresponding complex DFT. All
357 positive @code{n} are supported; products of small factors are most
358 efficient, but an @Onlogn algorithm is used even for prime sizes.
359
360 As noted above, the c2r transform destroys its input array even for
361 out-of-place transforms. This can be prevented, if necessary, by
362 including @code{FFTW_PRESERVE_INPUT} in the @code{flags}, with
363 unfortunately some sacrifice in performance.
364 @cindex flags
365 @ctindex FFTW_PRESERVE_INPUT
366 This flag is also not currently supported for multi-dimensional real
367 DFTs (next section).
368
369 Readers familiar with DFTs of real data will recall that the 0th (the
370 ``DC'') and @code{n/2}-th (the ``Nyquist'' frequency, when @code{n} is
371 even) elements of the complex output are purely real. Some
372 implementations therefore store the Nyquist element where the DC
373 imaginary part would go, in order to make the input and output arrays
374 the same size. Such packing, however, does not generalize well to
375 multi-dimensional transforms, and the space savings are miniscule in
376 any case; FFTW does not support it.
377
378 An alternative interface for one-dimensional r2c and c2r DFTs can be
379 found in the @samp{r2r} interface (@pxref{The Halfcomplex-format
380 DFT}), with ``halfcomplex''-format output that @emph{is} the same size
381 (and type) as the input array.
382 @cindex halfcomplex format
383 That interface, although it is not very useful for multi-dimensional
384 transforms, may sometimes yield better performance.
385
386 @c ------------------------------------------------------------
387 @node Multi-Dimensional DFTs of Real Data, More DFTs of Real Data, One-Dimensional DFTs of Real Data, Tutorial
388 @section Multi-Dimensional DFTs of Real Data
389
390 Multi-dimensional DFTs of real data use the following planner routines:
391
392 @example
393 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
394 double *in, fftw_complex *out,
395 unsigned flags);
396 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
397 double *in, fftw_complex *out,
398 unsigned flags);
399 fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
400 double *in, fftw_complex *out,
401 unsigned flags);
402 @end example
403 @findex fftw_plan_dft_r2c_2d
404 @findex fftw_plan_dft_r2c_3d
405 @findex fftw_plan_dft_r2c
406
407 as well as the corresponding @code{c2r} routines with the input/output
408 types swapped. These routines work similarly to their complex
409 analogues, except for the fact that here the complex output array is cut
410 roughly in half and the real array requires padding for in-place
411 transforms (as in 1d, above).
412
413 As before, @code{n} is the logical size of the array, and the
414 consequences of this on the the format of the complex arrays deserve
415 careful attention.
416 @cindex r2c/c2r multi-dimensional array format
417 Suppose that the real data has dimensions @ndims (in row-major order).
418 Then, after an r2c transform, the output is an @ndimshalf array of
419 @code{fftw_complex} values in row-major order, corresponding to slightly
420 over half of the output of the corresponding complex DFT. (The division
421 is rounded down.) The ordering of the data is otherwise exactly the
422 same as in the complex-DFT case.
423
424 For out-of-place transforms, this is the end of the story: the real
425 data is stored as a row-major array of size @ndims and the complex
426 data is stored as a row-major array of size @ndimshalf{}.
427
428 For in-place transforms, however, extra padding of the real-data array
429 is necessary because the complex array is larger than the real array,
430 and the two arrays share the same memory locations. Thus, for
431 in-place transforms, the final dimension of the real-data array must
432 be padded with extra values to accommodate the size of the complex
433 data---two values if the last dimension is even and one if it is odd.
434 @cindex padding
435 That is, the last dimension of the real data must physically contain
436 @tex
437 $2 (n_{d-1}/2+1)$
438 @end tex
439 @ifinfo
440 2 * (n[d-1]/2+1)
441 @end ifinfo
442 @html
443 2 * (n<sub>d-1</sub>/2+1)
444 @end html
445 @code{double} values (exactly enough to hold the complex data).
446 This physical array size does not, however, change the @emph{logical}
447 array size---only
448 @tex
449 $n_{d-1}$
450 @end tex
451 @ifinfo
452 n[d-1]
453 @end ifinfo
454 @html
455 n<sub>d-1</sub>
456 @end html
457 values are actually stored in the last dimension, and
458 @tex
459 $n_{d-1}$
460 @end tex
461 @ifinfo
462 n[d-1]
463 @end ifinfo
464 @html
465 n<sub>d-1</sub>
466 @end html
467 is the last dimension passed to the plan-creation routine.
468
469 For example, consider the transform of a two-dimensional real array of
470 size @code{n0} by @code{n1}. The output of the r2c transform is a
471 two-dimensional complex array of size @code{n0} by @code{n1/2+1}, where
472 the @code{y} dimension has been cut nearly in half because of
473 redundancies in the output. Because @code{fftw_complex} is twice the
474 size of @code{double}, the output array is slightly bigger than the
475 input array. Thus, if we want to compute the transform in place, we
476 must @emph{pad} the input array so that it is of size @code{n0} by
477 @code{2*(n1/2+1)}. If @code{n1} is even, then there are two padding
478 elements at the end of each row (which need not be initialized, as they
479 are only used for output).
480
481 @ifhtml
482 The following illustration depicts the input and output arrays just
483 described, for both the out-of-place and in-place transforms (with the
484 arrows indicating consecutive memory locations):
485 @image{rfftwnd-for-html}
486 @end ifhtml
487 @ifnotinfo
488 @ifnothtml
489 @float Figure,fig:rfftwnd
490 @center @image{rfftwnd}
491 @caption{Illustration of the data layout for a 2d @code{nx} by @code{ny}
492 real-to-complex transform.}
493 @end float
494 @ref{fig:rfftwnd} depicts the input and output arrays just
495 described, for both the out-of-place and in-place transforms (with the
496 arrows indicating consecutive memory locations):
497 @end ifnothtml
498 @end ifnotinfo
499
500 These transforms are unnormalized, so an r2c followed by a c2r
501 transform (or vice versa) will result in the original data scaled by
502 the number of real data elements---that is, the product of the
503 (logical) dimensions of the real data.
504 @cindex normalization
505
506
507 (Because the last dimension is treated specially, if it is equal to
508 @code{1} the transform is @emph{not} equivalent to a lower-dimensional
509 r2c/c2r transform. In that case, the last complex dimension also has
510 size @code{1} (@code{=1/2+1}), and no advantage is gained over the
511 complex transforms.)
512
513 @c ------------------------------------------------------------
514 @node More DFTs of Real Data, , Multi-Dimensional DFTs of Real Data, Tutorial
515 @section More DFTs of Real Data
516 @menu
517 * The Halfcomplex-format DFT::
518 * Real even/odd DFTs (cosine/sine transforms)::
519 * The Discrete Hartley Transform::
520 @end menu
521
522 FFTW supports several other transform types via a unified @dfn{r2r}
523 (real-to-real) interface,
524 @cindex r2r
525 so called because it takes a real (@code{double}) array and outputs a
526 real array of the same size. These r2r transforms currently fall into
527 three categories: DFTs of real input and complex-Hermitian output in
528 halfcomplex format, DFTs of real input with even/odd symmetry
529 (a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete
530 Hartley transforms (DHTs), all described in more detail by the
531 following sections.
532
533 The r2r transforms follow the by now familiar interface of creating an
534 @code{fftw_plan}, executing it with @code{fftw_execute(plan)}, and
535 destroying it with @code{fftw_destroy_plan(plan)}. Furthermore, all
536 r2r transforms share the same planner interface:
537
538 @example
539 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
540 fftw_r2r_kind kind, unsigned flags);
541 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
542 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
543 unsigned flags);
544 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
545 double *in, double *out,
546 fftw_r2r_kind kind0,
547 fftw_r2r_kind kind1,
548 fftw_r2r_kind kind2,
549 unsigned flags);
550 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
551 const fftw_r2r_kind *kind, unsigned flags);
552 @end example
553 @findex fftw_plan_r2r_1d
554 @findex fftw_plan_r2r_2d
555 @findex fftw_plan_r2r_3d
556 @findex fftw_plan_r2r
557
558 Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
559 transforms for contiguous arrays in row-major order, transforming (real)
560 input to output of the same size, where @code{n} specifies the
561 @emph{physical} dimensions of the arrays. All positive @code{n} are
562 supported (with the exception of @code{n=1} for the @code{FFTW_REDFT00}
563 kind, noted in the real-even subsection below); products of small
564 factors are most efficient (factorizing @code{n-1} and @code{n+1} for
565 @code{FFTW_REDFT00} and @code{FFTW_RODFT00} kinds, described below), but
566 an @Onlogn algorithm is used even for prime sizes.
567
568 Each dimension has a @dfn{kind} parameter, of type
569 @code{fftw_r2r_kind}, specifying the kind of r2r transform to be used
570 for that dimension.
571 @cindex kind (r2r)
572 @tindex fftw_r2r_kind
573 (In the case of @code{fftw_plan_r2r}, this is an array @code{kind[rank]}
574 where @code{kind[i]} is the transform kind for the dimension
575 @code{n[i]}.) The kind can be one of a set of predefined constants,
576 defined in the following subsections.
577
578 In other words, FFTW computes the separable product of the specified
579 r2r transforms over each dimension, which can be used e.g. for partial
580 differential equations with mixed boundary conditions. (For some r2r
581 kinds, notably the halfcomplex DFT and the DHT, such a separable
582 product is somewhat problematic in more than one dimension, however,
583 as is described below.)
584
585 In the current version of FFTW, all r2r transforms except for the
586 halfcomplex type are computed via pre- or post-processing of
587 halfcomplex transforms, and they are therefore not as fast as they
588 could be. Since most other general DCT/DST codes employ a similar
589 algorithm, however, FFTW's implementation should provide at least
590 competitive performance.
591
592 @c =========>
593 @node The Halfcomplex-format DFT, Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data, More DFTs of Real Data
594 @subsection The Halfcomplex-format DFT
595
596 An r2r kind of @code{FFTW_R2HC} (@dfn{r2hc}) corresponds to an r2c DFT
597 @ctindex FFTW_R2HC
598 @cindex r2c
599 @cindex r2hc
600 (@pxref{One-Dimensional DFTs of Real Data}) but with ``halfcomplex''
601 format output, and may sometimes be faster and/or more convenient than
602 the latter.
603 @cindex halfcomplex format
604 The inverse @dfn{hc2r} transform is of kind @code{FFTW_HC2R}.
605 @ctindex FFTW_HC2R
606 @cindex hc2r
607 This consists of the non-redundant half of the complex output for a 1d
608 real-input DFT of size @code{n}, stored as a sequence of @code{n} real
609 numbers (@code{double}) in the format:
610
611 @tex
612 $$
613 r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1
614 $$
615 @end tex
616 @ifinfo
617 r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1
618 @end ifinfo
619 @html
620 <p align=center>
621 r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub>
622 </p>
623 @end html
624
625 Here,
626 @ifinfo
627 rk
628 @end ifinfo
629 @tex
630 $r_k$
631 @end tex
632 @html
633 r<sub>k</sub>
634 @end html
635 is the real part of the @math{k}th output, and
636 @ifinfo
637 ik
638 @end ifinfo
639 @tex
640 $i_k$
641 @end tex
642 @html
643 i<sub>k</sub>
644 @end html
645 is the imaginary part. (Division by 2 is rounded down.) For a
646 halfcomplex array @code{hc[n]}, the @math{k}th component thus has its
647 real part in @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with
648 the exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter
649 only if @code{n} is even)---in these two cases, the imaginary part is
650 zero due to symmetries of the real-input DFT, and is not stored.
651 Thus, the r2hc transform of @code{n} real values is a halfcomplex array of
652 length @code{n}, and vice versa for hc2r.
653 @cindex normalization
654
655
656 Aside from the differing format, the output of
657 @code{FFTW_R2HC}/@code{FFTW_HC2R} is otherwise exactly the same as for
658 the corresponding 1d r2c/c2r transform
659 (i.e. @code{FFTW_FORWARD}/@code{FFTW_BACKWARD} transforms, respectively).
660 Recall that these transforms are unnormalized, so r2hc followed by hc2r
661 will result in the original data multiplied by @code{n}. Furthermore,
662 like the c2r transform, an out-of-place hc2r transform will
663 @emph{destroy its input} array.
664
665 Although these halfcomplex transforms can be used with the
666 multi-dimensional r2r interface, the interpretation of such a separable
667 product of transforms along each dimension is problematic. For example,
668 consider a two-dimensional @code{n0} by @code{n1}, r2hc by r2hc
669 transform planned by @code{fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC,
670 FFTW_R2HC, FFTW_MEASURE)}. Conceptually, FFTW first transforms the rows
671 (of size @code{n1}) to produce halfcomplex rows, and then transforms the
672 columns (of size @code{n0}). Half of these column transforms, however,
673 are of imaginary parts, and should therefore be multiplied by @math{i}
674 and combined with the r2hc transforms of the real columns to produce the
675 2d DFT amplitudes; FFTW's r2r transform does @emph{not} perform this
676 combination for you. Thus, if a multi-dimensional real-input/output DFT
677 is required, we recommend using the ordinary r2c/c2r
678 interface (@pxref{Multi-Dimensional DFTs of Real Data}).
679
680 @c =========>
681 @node Real even/odd DFTs (cosine/sine transforms), The Discrete Hartley Transform, The Halfcomplex-format DFT, More DFTs of Real Data
682 @subsection Real even/odd DFTs (cosine/sine transforms)
683
684 The Fourier transform of a real-even function @math{f(-x) = f(x)} is
685 real-even, and @math{i} times the Fourier transform of a real-odd
686 function @math{f(-x) = -f(x)} is real-odd. Similar results hold for a
687 discrete Fourier transform, and thus for these symmetries the need for
688 complex inputs/outputs is entirely eliminated. Moreover, one gains a
689 factor of two in speed/space from the fact that the data are real, and
690 an additional factor of two from the even/odd symmetry: only the
691 non-redundant (first) half of the array need be stored. The result is
692 the real-even DFT (@dfn{REDFT}) and the real-odd DFT (@dfn{RODFT}), also
693 known as the discrete cosine and sine transforms (@dfn{DCT} and
694 @dfn{DST}), respectively.
695 @cindex real-even DFT
696 @cindex REDFT
697 @cindex real-odd DFT
698 @cindex RODFT
699 @cindex discrete cosine transform
700 @cindex DCT
701 @cindex discrete sine transform
702 @cindex DST
703
704
705 (In this section, we describe the 1d transforms; multi-dimensional
706 transforms are just a separable product of these transforms operating
707 along each dimension.)
708
709 Because of the discrete sampling, one has an additional choice: is the
710 data even/odd around a sampling point, or around the point halfway
711 between two samples? The latter corresponds to @emph{shifting} the
712 samples by @emph{half} an interval, and gives rise to several transform
713 variants denoted by REDFT@math{ab} and RODFT@math{ab}: @math{a} and
714 @math{b} are @math{0} or @math{1}, and indicate whether the input
715 (@math{a}) and/or output (@math{b}) are shifted by half a sample
716 (@math{1} means it is shifted). These are also known as types I-IV of
717 the DCT and DST, and all four types are supported by FFTW's r2r
718 interface.@footnote{There are also type V-VIII transforms, which
719 correspond to a logical DFT of @emph{odd} size @math{N}, independent of
720 whether the physical size @code{n} is odd, but we do not support these
721 variants.}
722
723 The r2r kinds for the various REDFT and RODFT types supported by FFTW,
724 along with the boundary conditions at both ends of the @emph{input}
725 array (@code{n} real numbers @code{in[j=0..n-1]}), are:
726
727 @itemize @bullet
728
729 @item
730 @code{FFTW_REDFT00} (DCT-I): even around @math{j=0} and even around @math{j=n-1}.
731 @ctindex FFTW_REDFT00
732
733 @item
734 @code{FFTW_REDFT10} (DCT-II, ``the'' DCT): even around @math{j=-0.5} and even around @math{j=n-0.5}.
735 @ctindex FFTW_REDFT10
736
737 @item
738 @code{FFTW_REDFT01} (DCT-III, ``the'' IDCT): even around @math{j=0} and odd around @math{j=n}.
739 @ctindex FFTW_REDFT01
740 @cindex IDCT
741
742 @item
743 @code{FFTW_REDFT11} (DCT-IV): even around @math{j=-0.5} and odd around @math{j=n-0.5}.
744 @ctindex FFTW_REDFT11
745
746 @item
747 @code{FFTW_RODFT00} (DST-I): odd around @math{j=-1} and odd around @math{j=n}.
748 @ctindex FFTW_RODFT00
749
750 @item
751 @code{FFTW_RODFT10} (DST-II): odd around @math{j=-0.5} and odd around @math{j=n-0.5}.
752 @ctindex FFTW_RODFT10
753
754 @item
755 @code{FFTW_RODFT01} (DST-III): odd around @math{j=-1} and even around @math{j=n-1}.
756 @ctindex FFTW_RODFT01
757
758 @item
759 @code{FFTW_RODFT11} (DST-IV): odd around @math{j=-0.5} and even around @math{j=n-0.5}.
760 @ctindex FFTW_RODFT11
761
762 @end itemize
763
764 Note that these symmetries apply to the ``logical'' array being
765 transformed; @strong{there are no constraints on your physical input
766 data}. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the
767 data @math{abcde}, it corresponds to the DFT of the logical even array
768 @math{abcdedcb} of size 8. A size-4 REDFT10 (DCT-II) of the data
769 @math{abcd} corresponds to the size-8 logical DFT of the even array
770 @math{abcddcba}, shifted by half a sample.
771
772 All of these transforms are invertible. The inverse of R*DFT00 is
773 R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
774 simply ``the'' DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
775 However, the transforms computed by FFTW are unnormalized, exactly
776 like the corresponding real and complex DFTs, so computing a transform
777 followed by its inverse yields the original array scaled by @math{N},
778 where @math{N} is the @emph{logical} DFT size. For REDFT00,
779 @math{N=2(n-1)}; for RODFT00, @math{N=2(n+1)}; otherwise, @math{N=2n}.
780 @cindex normalization
781 @cindex IDCT
782
783
784 Note that the boundary conditions of the transform output array are
785 given by the input boundary conditions of the inverse transform.
786 Thus, the above transforms are all inequivalent in terms of
787 input/output boundary conditions, even neglecting the 0.5 shift
788 difference.
789
790 FFTW is most efficient when @math{N} is a product of small factors; note
791 that this @emph{differs} from the factorization of the physical size
792 @code{n} for REDFT00 and RODFT00! There is another oddity: @code{n=1}
793 REDFT00 transforms correspond to @math{N=0}, and so are @emph{not
794 defined} (the planner will return @code{NULL}). Otherwise, any positive
795 @code{n} is supported.
796
797 For the precise mathematical definitions of these transforms as used by
798 FFTW, see @ref{What FFTW Really Computes}. (For people accustomed to
799 the DCT/DST, FFTW's definitions have a coefficient of @math{2} in front
800 of the cos/sin functions so that they correspond precisely to an
801 even/odd DFT of size @math{N}. Some authors also include additional
802 multiplicative factors of
803 @ifinfo
804 sqrt(2)
805 @end ifinfo
806 @html
807 &radic;2
808 @end html
809 @tex
810 $\sqrt{2}$
811 @end tex
812 for selected inputs and outputs; this makes
813 the transform orthogonal, but sacrifices the direct equivalence to a
814 symmetric DFT.)
815
816 @subsubheading Which type do you need?
817
818 Since the required flavor of even/odd DFT depends upon your problem,
819 you are the best judge of this choice, but we can make a few comments
820 on relative efficiency to help you in your selection. In particular,
821 R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11
822 (especially for odd sizes), while the R*DFT00 transforms are sometimes
823 significantly slower (especially for even sizes).@footnote{R*DFT00 is
824 sometimes slower in FFTW because we discovered that the standard
825 algorithm for computing this by a pre/post-processed real DFT---the
826 algorithm used in FFTPACK, Numerical Recipes, and other sources for
827 decades now---has serious numerical problems: it already loses several
828 decimal places of accuracy for 16k sizes. There seem to be only two
829 alternatives in the literature that do not suffer similarly: a
830 recursive decomposition into smaller DCTs, which would require a large
831 set of codelets for efficiency and generality, or sacrificing a factor of
832 @tex
833 $\sim 2$
834 @end tex
835 @ifnottex
836 2
837 @end ifnottex
838 in speed to use a real DFT of twice the size. We currently
839 employ the latter technique for general @math{n}, as well as a limited
840 form of the former method: a split-radix decomposition when @math{n}
841 is odd (@math{N} a multiple of 4). For @math{N} containing many
842 factors of 2, the split-radix method seems to recover most of the
843 speed of the standard algorithm without the accuracy tradeoff.}
844
845 Thus, if only the boundary conditions on the transform inputs are
846 specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
847 R*DFT11 (unless the half-sample shift or the self-inverse property is
848 significant for your problem).
849
850 If performance is important to you and you are using only small sizes
851 (say @math{n<200}), e.g. for multi-dimensional transforms, then you
852 might consider generating hard-coded transforms of those sizes and types
853 that you are interested in (@pxref{Generating your own code}).
854
855 We are interested in hearing what types of symmetric transforms you find
856 most useful.
857
858 @c =========>
859 @node The Discrete Hartley Transform, , Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data
860 @subsection The Discrete Hartley Transform
861
862 If you are planning to use the DHT because you've heard that it is
863 ``faster'' than the DFT (FFT), @strong{stop here}. The DHT is not
864 faster than the DFT. That story is an old but enduring misconception
865 that was debunked in 1987.
866
867 The discrete Hartley transform (DHT) is an invertible linear transform
868 closely related to the DFT. In the DFT, one multiplies each input by
869 @math{cos - i * sin} (a complex exponential), whereas in the DHT each
870 input is multiplied by simply @math{cos + sin}. Thus, the DHT
871 transforms @code{n} real numbers to @code{n} real numbers, and has the
872 convenient property of being its own inverse. In FFTW, a DHT (of any
873 positive @code{n}) can be specified by an r2r kind of @code{FFTW_DHT}.
874 @ctindex FFTW_DHT
875 @cindex discrete Hartley transform
876 @cindex DHT
877
878 Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
879 size @code{n} followed by another DHT of the same size will result in
880 the original array multiplied by @code{n}.
881 @cindex normalization
882
883 The DHT was originally proposed as a more efficient alternative to the
884 DFT for real data, but it was subsequently shown that a specialized DFT
885 (such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW,
886 the DHT is actually computed by post-processing an r2hc transform, so
887 there is ordinarily no reason to prefer it from a performance
888 perspective.@footnote{We provide the DHT mainly as a byproduct of some
889 internal algorithms. FFTW computes a real input/output DFT of
890 @emph{prime} size by re-expressing it as a DHT plus post/pre-processing
891 and then using Rader's prime-DFT algorithm adapted to the DHT.}
892 However, we have heard rumors that the DHT might be the most appropriate
893 transform in its own right for certain applications, and we would be
894 very interested to hear from anyone who finds it useful.
895
896 If @code{FFTW_DHT} is specified for multiple dimensions of a
897 multi-dimensional transform, FFTW computes the separable product of 1d
898 DHTs along each dimension. Unfortunately, this is not quite the same
899 thing as a true multi-dimensional DHT; you can compute the latter, if
900 necessary, with at most @code{rank-1} post-processing passes
901 [see e.g. H. Hao and R. N. Bracewell, @i{Proc. IEEE} @b{75}, 264--266 (1987)].
902
903 For the precise mathematical definition of the DHT as used by FFTW, see
904 @ref{What FFTW Really Computes}.
905