Mercurial > hg > sv-dependency-builds
comparison src/fftw-3.3.5/doc/reference.texi @ 127:7867fa7e1b6b
Current fftw source
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Tue, 18 Oct 2016 13:40:26 +0100 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
126:4a7071416412 | 127:7867fa7e1b6b |
---|---|
1 @node FFTW Reference, Multi-threaded FFTW, Other Important Topics, Top | |
2 @chapter FFTW Reference | |
3 | |
4 This chapter provides a complete reference for all sequential (i.e., | |
5 one-processor) FFTW functions. Parallel transforms are described in | |
6 later chapters. | |
7 | |
8 @menu | |
9 * Data Types and Files:: | |
10 * Using Plans:: | |
11 * Basic Interface:: | |
12 * Advanced Interface:: | |
13 * Guru Interface:: | |
14 * New-array Execute Functions:: | |
15 * Wisdom:: | |
16 * What FFTW Really Computes:: | |
17 @end menu | |
18 | |
19 @c ------------------------------------------------------------ | |
20 @node Data Types and Files, Using Plans, FFTW Reference, FFTW Reference | |
21 @section Data Types and Files | |
22 | |
23 All programs using FFTW should include its header file: | |
24 | |
25 @example | |
26 #include <fftw3.h> | |
27 @end example | |
28 | |
29 You must also link to the FFTW library. On Unix, this | |
30 means adding @code{-lfftw3 -lm} at the @emph{end} of the link command. | |
31 | |
32 @menu | |
33 * Complex numbers:: | |
34 * Precision:: | |
35 * Memory Allocation:: | |
36 @end menu | |
37 | |
38 @c =========> | |
39 @node Complex numbers, Precision, Data Types and Files, Data Types and Files | |
40 @subsection Complex numbers | |
41 | |
42 The default FFTW interface uses @code{double} precision for all | |
43 floating-point numbers, and defines a @code{fftw_complex} type to hold | |
44 complex numbers as: | |
45 | |
46 @example | |
47 typedef double fftw_complex[2]; | |
48 @end example | |
49 @tindex fftw_complex | |
50 | |
51 Here, the @code{[0]} element holds the real part and the @code{[1]} | |
52 element holds the imaginary part. | |
53 | |
54 Alternatively, if you have a C compiler (such as @code{gcc}) that | |
55 supports the C99 revision of the ANSI C standard, you can use C's new | |
56 native complex type (which is binary-compatible with the typedef above). | |
57 In particular, if you @code{#include <complex.h>} @emph{before} | |
58 @code{<fftw3.h>}, then @code{fftw_complex} is defined to be the native | |
59 complex type and you can manipulate it with ordinary arithmetic | |
60 (e.g. @code{x = y * (3+4*I)}, where @code{x} and @code{y} are | |
61 @code{fftw_complex} and @code{I} is the standard symbol for the | |
62 imaginary unit); | |
63 @cindex C99 | |
64 | |
65 | |
66 C++ has its own @code{complex<T>} template class, defined in the | |
67 standard @code{<complex>} header file. Reportedly, the C++ standards | |
68 committee has recently agreed to mandate that the storage format used | |
69 for this type be binary-compatible with the C99 type, i.e. an array | |
70 @code{T[2]} with consecutive real @code{[0]} and imaginary @code{[1]} | |
71 parts. (See report | |
72 @uref{http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf | |
73 WG21/N1388}.) Although not part of the official standard as of this | |
74 writing, the proposal stated that: ``This solution has been tested with | |
75 all current major implementations of the standard library and shown to | |
76 be working.'' To the extent that this is true, if you have a variable | |
77 @code{complex<double> *x}, you can pass it directly to FFTW via | |
78 @code{reinterpret_cast<fftw_complex*>(x)}. | |
79 @cindex C++ | |
80 @cindex portability | |
81 | |
82 @c =========> | |
83 @node Precision, Memory Allocation, Complex numbers, Data Types and Files | |
84 @subsection Precision | |
85 @cindex precision | |
86 | |
87 You can install single and long-double precision versions of FFTW, | |
88 which replace @code{double} with @code{float} and @code{long double}, | |
89 respectively (@pxref{Installation and Customization}). To use these | |
90 interfaces, you: | |
91 | |
92 @itemize @bullet | |
93 | |
94 @item | |
95 Link to the single/long-double libraries; on Unix, @code{-lfftw3f} or | |
96 @code{-lfftw3l} instead of (or in addition to) @code{-lfftw3}. (You | |
97 can link to the different-precision libraries simultaneously.) | |
98 | |
99 @item | |
100 Include the @emph{same} @code{<fftw3.h>} header file. | |
101 | |
102 @item | |
103 Replace all lowercase instances of @samp{fftw_} with @samp{fftwf_} or | |
104 @samp{fftwl_} for single or long-double precision, respectively. | |
105 (@code{fftw_complex} becomes @code{fftwf_complex}, @code{fftw_execute} | |
106 becomes @code{fftwf_execute}, etcetera.) | |
107 | |
108 @item | |
109 Uppercase names, i.e. names beginning with @samp{FFTW_}, remain the | |
110 same. | |
111 | |
112 @item | |
113 Replace @code{double} with @code{float} or @code{long double} for | |
114 subroutine parameters. | |
115 | |
116 @end itemize | |
117 | |
118 Depending upon your compiler and/or hardware, @code{long double} may not | |
119 be any more precise than @code{double} (or may not be supported at all, | |
120 although it is standard in C99). | |
121 @cindex C99 | |
122 | |
123 | |
124 We also support using the nonstandard @code{__float128} | |
125 quadruple-precision type provided by recent versions of @code{gcc} on | |
126 32- and 64-bit x86 hardware (@pxref{Installation and Customization}). | |
127 To use this type, link with @code{-lfftw3q -lquadmath -lm} (the | |
128 @code{libquadmath} library provided by @code{gcc} is needed for | |
129 quadruple-precision trigonometric functions) and use @samp{fftwq_} | |
130 identifiers. | |
131 | |
132 @c =========> | |
133 @node Memory Allocation, , Precision, Data Types and Files | |
134 @subsection Memory Allocation | |
135 | |
136 @example | |
137 void *fftw_malloc(size_t n); | |
138 void fftw_free(void *p); | |
139 @end example | |
140 @findex fftw_malloc | |
141 @findex fftw_free | |
142 | |
143 These are functions that behave identically to @code{malloc} and | |
144 @code{free}, except that they guarantee that the returned pointer obeys | |
145 any special alignment restrictions imposed by any algorithm in FFTW | |
146 (e.g. for SIMD acceleration). @xref{SIMD alignment and fftw_malloc}. | |
147 @cindex alignment | |
148 | |
149 | |
150 Data allocated by @code{fftw_malloc} @emph{must} be deallocated by | |
151 @code{fftw_free} and not by the ordinary @code{free}. | |
152 | |
153 These routines simply call through to your operating system's | |
154 @code{malloc} or, if necessary, its aligned equivalent | |
155 (e.g. @code{memalign}), so you normally need not worry about any | |
156 significant time or space overhead. You are @emph{not required} to use | |
157 them to allocate your data, but we strongly recommend it. | |
158 | |
159 Note: in C++, just as with ordinary @code{malloc}, you must typecast | |
160 the output of @code{fftw_malloc} to whatever pointer type you are | |
161 allocating. | |
162 @cindex C++ | |
163 | |
164 | |
165 We also provide the following two convenience functions to allocate | |
166 real and complex arrays with @code{n} elements, which are equivalent | |
167 to @code{(double *) fftw_malloc(sizeof(double) * n)} and | |
168 @code{(fftw_complex *) fftw_malloc(sizeof(fftw_complex) * n)}, | |
169 respectively: | |
170 | |
171 @example | |
172 double *fftw_alloc_real(size_t n); | |
173 fftw_complex *fftw_alloc_complex(size_t n); | |
174 @end example | |
175 @findex fftw_alloc_real | |
176 @findex fftw_alloc_complex | |
177 | |
178 The equivalent functions in other precisions allocate arrays of @code{n} | |
179 elements in that precision. e.g. @code{fftwf_alloc_real(n)} is | |
180 equivalent to @code{(float *) fftwf_malloc(sizeof(float) * n)}. | |
181 @cindex precision | |
182 | |
183 @c ------------------------------------------------------------ | |
184 @node Using Plans, Basic Interface, Data Types and Files, FFTW Reference | |
185 @section Using Plans | |
186 | |
187 Plans for all transform types in FFTW are stored as type | |
188 @code{fftw_plan} (an opaque pointer type), and are created by one of the | |
189 various planning routines described in the following sections. | |
190 @tindex fftw_plan | |
191 An @code{fftw_plan} contains all information necessary to compute the | |
192 transform, including the pointers to the input and output arrays. | |
193 | |
194 @example | |
195 void fftw_execute(const fftw_plan plan); | |
196 @end example | |
197 @findex fftw_execute | |
198 | |
199 This executes the @code{plan}, to compute the corresponding transform on | |
200 the arrays for which it was planned (which must still exist). The plan | |
201 is not modified, and @code{fftw_execute} can be called as many times as | |
202 desired. | |
203 | |
204 To apply a given plan to a different array, you can use the new-array execute | |
205 interface. @xref{New-array Execute Functions}. | |
206 | |
207 @code{fftw_execute} (and equivalents) is the only function in FFTW | |
208 guaranteed to be thread-safe; see @ref{Thread safety}. | |
209 | |
210 This function: | |
211 @example | |
212 void fftw_destroy_plan(fftw_plan plan); | |
213 @end example | |
214 @findex fftw_destroy_plan | |
215 deallocates the @code{plan} and all its associated data. | |
216 | |
217 FFTW's planner saves some other persistent data, such as the | |
218 accumulated wisdom and a list of algorithms available in the current | |
219 configuration. If you want to deallocate all of that and reset FFTW | |
220 to the pristine state it was in when you started your program, you can | |
221 call: | |
222 | |
223 @example | |
224 void fftw_cleanup(void); | |
225 @end example | |
226 @findex fftw_cleanup | |
227 | |
228 After calling @code{fftw_cleanup}, all existing plans become undefined, | |
229 and you should not attempt to execute them nor to destroy them. You can | |
230 however create and execute/destroy new plans, in which case FFTW starts | |
231 accumulating wisdom information again. | |
232 | |
233 @code{fftw_cleanup} does not deallocate your plans, however. To prevent | |
234 memory leaks, you must still call @code{fftw_destroy_plan} before | |
235 executing @code{fftw_cleanup}. | |
236 | |
237 Occasionally, it may useful to know FFTW's internal ``cost'' metric | |
238 that it uses to compare plans to one another; this cost is | |
239 proportional to an execution time of the plan, in undocumented units, | |
240 if the plan was created with the @code{FFTW_MEASURE} or other | |
241 timing-based options, or alternatively is a heuristic cost function | |
242 for @code{FFTW_ESTIMATE} plans. (The cost values of measured and | |
243 estimated plans are not comparable, being in different units. Also, | |
244 costs from different FFTW versions or the same version compiled | |
245 differently may not be in the same units. Plans created from wisdom | |
246 have a cost of 0 since no timing measurement is performed for them. | |
247 Finally, certain problems for which only one top-level algorithm was | |
248 possible may have required no measurements of the cost of the whole | |
249 plan, in which case @code{fftw_cost} will also return 0.) The cost | |
250 metric for a given plan is returned by: | |
251 | |
252 @example | |
253 double fftw_cost(const fftw_plan plan); | |
254 @end example | |
255 @findex fftw_cost | |
256 | |
257 The following two routines are provided purely for academic purposes | |
258 (that is, for entertainment). | |
259 | |
260 @example | |
261 void fftw_flops(const fftw_plan plan, | |
262 double *add, double *mul, double *fma); | |
263 @end example | |
264 @findex fftw_flops | |
265 | |
266 Given a @code{plan}, set @code{add}, @code{mul}, and @code{fma} to an | |
267 exact count of the number of floating-point additions, multiplications, | |
268 and fused multiply-add operations involved in the plan's execution. The | |
269 total number of floating-point operations (flops) is @code{add + mul + | |
270 2*fma}, or @code{add + mul + fma} if the hardware supports fused | |
271 multiply-add instructions (although the number of FMA operations is only | |
272 approximate because of compiler voodoo). (The number of operations | |
273 should be an integer, but we use @code{double} to avoid overflowing | |
274 @code{int} for large transforms; the arguments are of type @code{double} | |
275 even for single and long-double precision versions of FFTW.) | |
276 | |
277 @example | |
278 void fftw_fprint_plan(const fftw_plan plan, FILE *output_file); | |
279 void fftw_print_plan(const fftw_plan plan); | |
280 char *fftw_sprint_plan(const fftw_plan plan); | |
281 @end example | |
282 @findex fftw_fprint_plan | |
283 @findex fftw_print_plan | |
284 | |
285 This outputs a ``nerd-readable'' representation of the @code{plan} to | |
286 the given file, to @code{stdout}, or two a newly allocated | |
287 NUL-terminated string (which the caller is responsible for deallocating | |
288 with @code{free}), respectively. | |
289 | |
290 @c ------------------------------------------------------------ | |
291 @node Basic Interface, Advanced Interface, Using Plans, FFTW Reference | |
292 @section Basic Interface | |
293 @cindex basic interface | |
294 | |
295 Recall that the FFTW API is divided into three parts@footnote{@i{Gallia est | |
296 omnis divisa in partes tres} (Julius Caesar).}: the @dfn{basic interface} | |
297 computes a single transform of contiguous data, the @dfn{advanced | |
298 interface} computes transforms of multiple or strided arrays, and the | |
299 @dfn{guru interface} supports the most general data layouts, | |
300 multiplicities, and strides. This section describes the the basic | |
301 interface, which we expect to satisfy the needs of most users. | |
302 | |
303 @menu | |
304 * Complex DFTs:: | |
305 * Planner Flags:: | |
306 * Real-data DFTs:: | |
307 * Real-data DFT Array Format:: | |
308 * Real-to-Real Transforms:: | |
309 * Real-to-Real Transform Kinds:: | |
310 @end menu | |
311 | |
312 @c =========> | |
313 @node Complex DFTs, Planner Flags, Basic Interface, Basic Interface | |
314 @subsection Complex DFTs | |
315 | |
316 @example | |
317 fftw_plan fftw_plan_dft_1d(int n0, | |
318 fftw_complex *in, fftw_complex *out, | |
319 int sign, unsigned flags); | |
320 fftw_plan fftw_plan_dft_2d(int n0, int n1, | |
321 fftw_complex *in, fftw_complex *out, | |
322 int sign, unsigned flags); | |
323 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2, | |
324 fftw_complex *in, fftw_complex *out, | |
325 int sign, unsigned flags); | |
326 fftw_plan fftw_plan_dft(int rank, const int *n, | |
327 fftw_complex *in, fftw_complex *out, | |
328 int sign, unsigned flags); | |
329 @end example | |
330 @findex fftw_plan_dft_1d | |
331 @findex fftw_plan_dft_2d | |
332 @findex fftw_plan_dft_3d | |
333 @findex fftw_plan_dft | |
334 | |
335 Plan a complex input/output discrete Fourier transform (DFT) in zero or | |
336 more dimensions, returning an @code{fftw_plan} (@pxref{Using Plans}). | |
337 | |
338 Once you have created a plan for a certain transform type and | |
339 parameters, then creating another plan of the same type and parameters, | |
340 but for different arrays, is fast and shares constant data with the | |
341 first plan (if it still exists). | |
342 | |
343 The planner returns @code{NULL} if the plan cannot be created. In the | |
344 standard FFTW distribution, the basic interface is guaranteed to return | |
345 a non-@code{NULL} plan. A plan may be @code{NULL}, however, if you are | |
346 using a customized FFTW configuration supporting a restricted set of | |
347 transforms. | |
348 | |
349 @subsubheading Arguments | |
350 @itemize @bullet | |
351 | |
352 @item | |
353 @code{rank} is the rank of the transform (it should be the size of the | |
354 array @code{*n}), and can be any non-negative integer. (@xref{Complex | |
355 Multi-Dimensional DFTs}, for the definition of ``rank''.) The | |
356 @samp{_1d}, @samp{_2d}, and @samp{_3d} planners correspond to a | |
357 @code{rank} of @code{1}, @code{2}, and @code{3}, respectively. The rank | |
358 may be zero, which is equivalent to a rank-1 transform of size 1, i.e. a | |
359 copy of one number from input to output. | |
360 | |
361 @item | |
362 @code{n0}, @code{n1}, @code{n2}, or @code{n[0..rank-1]} (as appropriate | |
363 for each routine) specify the size of the transform dimensions. They | |
364 can be any positive integer. | |
365 | |
366 @itemize @minus | |
367 @item | |
368 @cindex row-major | |
369 Multi-dimensional arrays are stored in row-major order with dimensions: | |
370 @code{n0} x @code{n1}; or @code{n0} x @code{n1} x @code{n2}; or | |
371 @code{n[0]} x @code{n[1]} x ... x @code{n[rank-1]}. | |
372 @xref{Multi-dimensional Array Format}. | |
373 @item | |
374 FFTW is best at handling sizes of the form | |
375 @ifinfo | |
376 @math{2^a 3^b 5^c 7^d 11^e 13^f}, | |
377 @end ifinfo | |
378 @tex | |
379 $2^a 3^b 5^c 7^d 11^e 13^f$, | |
380 @end tex | |
381 @html | |
382 2<sup>a</sup> 3<sup>b</sup> 5<sup>c</sup> 7<sup>d</sup> | |
383 11<sup>e</sup> 13<sup>f</sup>, | |
384 @end html | |
385 where @math{e+f} is either @math{0} or @math{1}, and the other exponents | |
386 are arbitrary. Other sizes are computed by means of a slow, | |
387 general-purpose algorithm (which nevertheless retains @Onlogn{} performance even for prime sizes). It is possible to customize FFTW | |
388 for different array sizes; see @ref{Installation and Customization}. | |
389 Transforms whose sizes are powers of @math{2} are especially fast. | |
390 @end itemize | |
391 | |
392 @item | |
393 @code{in} and @code{out} point to the input and output arrays of the | |
394 transform, which may be the same (yielding an in-place transform). | |
395 @cindex in-place | |
396 These arrays are overwritten during planning, unless | |
397 @code{FFTW_ESTIMATE} is used in the flags. (The arrays need not be | |
398 initialized, but they must be allocated.) | |
399 | |
400 If @code{in == out}, the transform is @dfn{in-place} and the input | |
401 array is overwritten. If @code{in != out}, the two arrays must | |
402 not overlap (but FFTW does not check for this condition). | |
403 | |
404 @item | |
405 @ctindex FFTW_FORWARD | |
406 @ctindex FFTW_BACKWARD | |
407 @code{sign} is the sign of the exponent in the formula that defines the | |
408 Fourier transform. It can be @math{-1} (= @code{FFTW_FORWARD}) or | |
409 @math{+1} (= @code{FFTW_BACKWARD}). | |
410 | |
411 @item | |
412 @cindex flags | |
413 @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, | |
414 as defined in @ref{Planner Flags}. | |
415 | |
416 @end itemize | |
417 | |
418 FFTW computes an unnormalized transform: computing a forward followed by | |
419 a backward transform (or vice versa) will result in the original data | |
420 multiplied by the size of the transform (the product of the dimensions). | |
421 @cindex normalization | |
422 For more information, see @ref{What FFTW Really Computes}. | |
423 | |
424 @c =========> | |
425 @node Planner Flags, Real-data DFTs, Complex DFTs, Basic Interface | |
426 @subsection Planner Flags | |
427 | |
428 All of the planner routines in FFTW accept an integer @code{flags} | |
429 argument, which is a bitwise OR (@samp{|}) of zero or more of the flag | |
430 constants defined below. These flags control the rigor (and time) of | |
431 the planning process, and can also impose (or lift) restrictions on the | |
432 type of transform algorithm that is employed. | |
433 | |
434 @emph{Important:} the planner overwrites the input array during | |
435 planning unless a saved plan (@pxref{Wisdom}) is available for that | |
436 problem, so you should initialize your input data after creating the | |
437 plan. The only exceptions to this are the @code{FFTW_ESTIMATE} and | |
438 @code{FFTW_WISDOM_ONLY} flags, as mentioned below. | |
439 | |
440 In all cases, if wisdom is available for the given problem that was | |
441 created with equal-or-greater planning rigor, then the more rigorous | |
442 wisdom is used. For example, in @code{FFTW_ESTIMATE} mode any available | |
443 wisdom is used, whereas in @code{FFTW_PATIENT} mode only wisdom created | |
444 in patient or exhaustive mode can be used. @xref{Words of Wisdom-Saving | |
445 Plans}. | |
446 | |
447 @subsubheading Planning-rigor flags | |
448 @itemize @bullet | |
449 | |
450 @item | |
451 @ctindex FFTW_ESTIMATE | |
452 @code{FFTW_ESTIMATE} specifies that, instead of actual measurements of | |
453 different algorithms, a simple heuristic is used to pick a (probably | |
454 sub-optimal) plan quickly. With this flag, the input/output arrays are | |
455 not overwritten during planning. | |
456 | |
457 @item | |
458 @ctindex FFTW_MEASURE | |
459 @code{FFTW_MEASURE} tells FFTW to find an optimized plan by actually | |
460 @emph{computing} several FFTs and measuring their execution time. | |
461 Depending on your machine, this can take some time (often a few | |
462 seconds). @code{FFTW_MEASURE} is the default planning option. | |
463 | |
464 @item | |
465 @ctindex FFTW_PATIENT | |
466 @code{FFTW_PATIENT} is like @code{FFTW_MEASURE}, but considers a wider | |
467 range of algorithms and often produces a ``more optimal'' plan | |
468 (especially for large transforms), but at the expense of several times | |
469 longer planning time (especially for large transforms). | |
470 | |
471 @item | |
472 @ctindex FFTW_EXHAUSTIVE | |
473 @code{FFTW_EXHAUSTIVE} is like @code{FFTW_PATIENT}, but considers an | |
474 even wider range of algorithms, including many that we think are | |
475 unlikely to be fast, to produce the most optimal plan but with a | |
476 substantially increased planning time. | |
477 | |
478 @item | |
479 @ctindex FFTW_WISDOM_ONLY | |
480 @code{FFTW_WISDOM_ONLY} is a special planning mode in which the plan | |
481 is only created if wisdom is available for the given problem, and | |
482 otherwise a @code{NULL} plan is returned. This can be combined with | |
483 other flags, e.g. @samp{FFTW_WISDOM_ONLY | FFTW_PATIENT} creates a | |
484 plan only if wisdom is available that was created in | |
485 @code{FFTW_PATIENT} or @code{FFTW_EXHAUSTIVE} mode. The | |
486 @code{FFTW_WISDOM_ONLY} flag is intended for users who need to detect | |
487 whether wisdom is available; for example, if wisdom is not available | |
488 one may wish to allocate new arrays for planning so that user data is | |
489 not overwritten. | |
490 | |
491 @end itemize | |
492 | |
493 @subsubheading Algorithm-restriction flags | |
494 @itemize @bullet | |
495 | |
496 @item | |
497 @ctindex FFTW_DESTROY_INPUT | |
498 @code{FFTW_DESTROY_INPUT} specifies that an out-of-place transform is | |
499 allowed to @emph{overwrite its input} array with arbitrary data; this | |
500 can sometimes allow more efficient algorithms to be employed. | |
501 @cindex out-of-place | |
502 | |
503 @item | |
504 @ctindex FFTW_PRESERVE_INPUT | |
505 @code{FFTW_PRESERVE_INPUT} specifies that an out-of-place transform must | |
506 @emph{not change its input} array. This is ordinarily the | |
507 @emph{default}, except for c2r and hc2r (i.e. complex-to-real) | |
508 transforms for which @code{FFTW_DESTROY_INPUT} is the default. In the | |
509 latter cases, passing @code{FFTW_PRESERVE_INPUT} will attempt to use | |
510 algorithms that do not destroy the input, at the expense of worse | |
511 performance; for multi-dimensional c2r transforms, however, no | |
512 input-preserving algorithms are implemented and the planner will return | |
513 @code{NULL} if one is requested. | |
514 @cindex c2r | |
515 @cindex hc2r | |
516 | |
517 @item | |
518 @ctindex FFTW_UNALIGNED | |
519 @cindex alignment | |
520 @findex fftw_malloc | |
521 @findex fftw_alignment_of | |
522 @code{FFTW_UNALIGNED} specifies that the algorithm may not impose any | |
523 unusual alignment requirements on the input/output arrays (i.e. no | |
524 SIMD may be used). This flag is normally @emph{not necessary}, since | |
525 the planner automatically detects misaligned arrays. The only use for | |
526 this flag is if you want to use the new-array execute interface to | |
527 execute a given plan on a different array that may not be aligned like | |
528 the original. (Using @code{fftw_malloc} makes this flag unnecessary | |
529 even then. You can also use @code{fftw_alignment_of} to detect | |
530 whether two arrays are equivalently aligned.) | |
531 | |
532 @end itemize | |
533 | |
534 @subsubheading Limiting planning time | |
535 | |
536 @example | |
537 extern void fftw_set_timelimit(double seconds); | |
538 @end example | |
539 @findex fftw_set_timelimit | |
540 | |
541 This function instructs FFTW to spend at most @code{seconds} seconds | |
542 (approximately) in the planner. If @code{seconds == | |
543 FFTW_NO_TIMELIMIT} (the default value, which is negative), then | |
544 planning time is unbounded. Otherwise, FFTW plans with a | |
545 progressively wider range of algorithms until the the given time limit | |
546 is reached or the given range of algorithms is explored, returning the | |
547 best available plan. | |
548 @ctindex FFTW_NO_TIMELIMIT | |
549 | |
550 | |
551 For example, specifying @code{FFTW_PATIENT} first plans in | |
552 @code{FFTW_ESTIMATE} mode, then in @code{FFTW_MEASURE} mode, then | |
553 finally (time permitting) in @code{FFTW_PATIENT}. If | |
554 @code{FFTW_EXHAUSTIVE} is specified instead, the planner will further | |
555 progress to @code{FFTW_EXHAUSTIVE} mode. | |
556 | |
557 Note that the @code{seconds} argument specifies only a rough limit; in | |
558 practice, the planner may use somewhat more time if the time limit is | |
559 reached when the planner is in the middle of an operation that cannot | |
560 be interrupted. At the very least, the planner will complete planning | |
561 in @code{FFTW_ESTIMATE} mode (which is thus equivalent to a time limit | |
562 of 0). | |
563 | |
564 | |
565 @c =========> | |
566 @node Real-data DFTs, Real-data DFT Array Format, Planner Flags, Basic Interface | |
567 @subsection Real-data DFTs | |
568 | |
569 @example | |
570 fftw_plan fftw_plan_dft_r2c_1d(int n0, | |
571 double *in, fftw_complex *out, | |
572 unsigned flags); | |
573 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1, | |
574 double *in, fftw_complex *out, | |
575 unsigned flags); | |
576 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2, | |
577 double *in, fftw_complex *out, | |
578 unsigned flags); | |
579 fftw_plan fftw_plan_dft_r2c(int rank, const int *n, | |
580 double *in, fftw_complex *out, | |
581 unsigned flags); | |
582 @end example | |
583 @findex fftw_plan_dft_r2c_1d | |
584 @findex fftw_plan_dft_r2c_2d | |
585 @findex fftw_plan_dft_r2c_3d | |
586 @findex fftw_plan_dft_r2c | |
587 @cindex r2c | |
588 | |
589 Plan a real-input/complex-output discrete Fourier transform (DFT) in | |
590 zero or more dimensions, returning an @code{fftw_plan} (@pxref{Using | |
591 Plans}). | |
592 | |
593 Once you have created a plan for a certain transform type and | |
594 parameters, then creating another plan of the same type and parameters, | |
595 but for different arrays, is fast and shares constant data with the | |
596 first plan (if it still exists). | |
597 | |
598 The planner returns @code{NULL} if the plan cannot be created. A | |
599 non-@code{NULL} plan is always returned by the basic interface unless | |
600 you are using a customized FFTW configuration supporting a restricted | |
601 set of transforms, or if you use the @code{FFTW_PRESERVE_INPUT} flag | |
602 with a multi-dimensional out-of-place c2r transform (see below). | |
603 | |
604 @subsubheading Arguments | |
605 @itemize @bullet | |
606 | |
607 @item | |
608 @code{rank} is the rank of the transform (it should be the size of the | |
609 array @code{*n}), and can be any non-negative integer. (@xref{Complex | |
610 Multi-Dimensional DFTs}, for the definition of ``rank''.) The | |
611 @samp{_1d}, @samp{_2d}, and @samp{_3d} planners correspond to a | |
612 @code{rank} of @code{1}, @code{2}, and @code{3}, respectively. The rank | |
613 may be zero, which is equivalent to a rank-1 transform of size 1, i.e. a | |
614 copy of one real number (with zero imaginary part) from input to output. | |
615 | |
616 @item | |
617 @code{n0}, @code{n1}, @code{n2}, or @code{n[0..rank-1]}, (as appropriate | |
618 for each routine) specify the size of the transform dimensions. They | |
619 can be any positive integer. This is different in general from the | |
620 @emph{physical} array dimensions, which are described in @ref{Real-data | |
621 DFT Array Format}. | |
622 | |
623 @itemize @minus | |
624 @item | |
625 FFTW is best at handling sizes of the form | |
626 @ifinfo | |
627 @math{2^a 3^b 5^c 7^d 11^e 13^f}, | |
628 @end ifinfo | |
629 @tex | |
630 $2^a 3^b 5^c 7^d 11^e 13^f$, | |
631 @end tex | |
632 @html | |
633 2<sup>a</sup> 3<sup>b</sup> 5<sup>c</sup> 7<sup>d</sup> | |
634 11<sup>e</sup> 13<sup>f</sup>, | |
635 @end html | |
636 where @math{e+f} is either @math{0} or @math{1}, and the other exponents | |
637 are arbitrary. Other sizes are computed by means of a slow, | |
638 general-purpose algorithm (which nevertheless retains @Onlogn{} performance even for prime sizes). (It is possible to customize FFTW | |
639 for different array sizes; see @ref{Installation and Customization}.) | |
640 Transforms whose sizes are powers of @math{2} are especially fast, and | |
641 it is generally beneficial for the @emph{last} dimension of an r2c/c2r | |
642 transform to be @emph{even}. | |
643 @end itemize | |
644 | |
645 @item | |
646 @code{in} and @code{out} point to the input and output arrays of the | |
647 transform, which may be the same (yielding an in-place transform). | |
648 @cindex in-place | |
649 These arrays are overwritten during planning, unless | |
650 @code{FFTW_ESTIMATE} is used in the flags. (The arrays need not be | |
651 initialized, but they must be allocated.) For an in-place transform, it | |
652 is important to remember that the real array will require padding, | |
653 described in @ref{Real-data DFT Array Format}. | |
654 @cindex padding | |
655 | |
656 @item | |
657 @cindex flags | |
658 @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, | |
659 as defined in @ref{Planner Flags}. | |
660 | |
661 @end itemize | |
662 | |
663 The inverse transforms, taking complex input (storing the non-redundant | |
664 half of a logically Hermitian array) to real output, are given by: | |
665 | |
666 @example | |
667 fftw_plan fftw_plan_dft_c2r_1d(int n0, | |
668 fftw_complex *in, double *out, | |
669 unsigned flags); | |
670 fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1, | |
671 fftw_complex *in, double *out, | |
672 unsigned flags); | |
673 fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2, | |
674 fftw_complex *in, double *out, | |
675 unsigned flags); | |
676 fftw_plan fftw_plan_dft_c2r(int rank, const int *n, | |
677 fftw_complex *in, double *out, | |
678 unsigned flags); | |
679 @end example | |
680 @findex fftw_plan_dft_c2r_1d | |
681 @findex fftw_plan_dft_c2r_2d | |
682 @findex fftw_plan_dft_c2r_3d | |
683 @findex fftw_plan_dft_c2r | |
684 @cindex c2r | |
685 | |
686 The arguments are the same as for the r2c transforms, except that the | |
687 input and output data formats are reversed. | |
688 | |
689 FFTW computes an unnormalized transform: computing an r2c followed by a | |
690 c2r transform (or vice versa) will result in the original data | |
691 multiplied by the size of the transform (the product of the logical | |
692 dimensions). | |
693 @cindex normalization | |
694 An r2c transform produces the same output as a @code{FFTW_FORWARD} | |
695 complex DFT of the same input, and a c2r transform is correspondingly | |
696 equivalent to @code{FFTW_BACKWARD}. For more information, see @ref{What | |
697 FFTW Really Computes}. | |
698 | |
699 @c =========> | |
700 @node Real-data DFT Array Format, Real-to-Real Transforms, Real-data DFTs, Basic Interface | |
701 @subsection Real-data DFT Array Format | |
702 @cindex r2c/c2r multi-dimensional array format | |
703 | |
704 The output of a DFT of real data (r2c) contains symmetries that, in | |
705 principle, make half of the outputs redundant (@pxref{What FFTW Really | |
706 Computes}). (Similarly for the input of an inverse c2r transform.) In | |
707 practice, it is not possible to entirely realize these savings in an | |
708 efficient and understandable format that generalizes to | |
709 multi-dimensional transforms. Instead, the output of the r2c | |
710 transforms is @emph{slightly} over half of the output of the | |
711 corresponding complex transform. We do not ``pack'' the data in any | |
712 way, but store it as an ordinary array of @code{fftw_complex} values. | |
713 In fact, this data is simply a subsection of what would be the array in | |
714 the corresponding complex transform. | |
715 | |
716 Specifically, for a real transform of @math{d} (= @code{rank}) | |
717 dimensions @ndims{}, the complex data is an @ndimshalf array of | |
718 @code{fftw_complex} values in row-major order (with the division rounded | |
719 down). That is, we only store the @emph{lower} half (non-negative | |
720 frequencies), plus one element, of the last dimension of the data from | |
721 the ordinary complex transform. (We could have instead taken half of | |
722 any other dimension, but implementation turns out to be simpler if the | |
723 last, contiguous, dimension is used.) | |
724 | |
725 @cindex out-of-place | |
726 For an out-of-place transform, the real data is simply an array with | |
727 physical dimensions @ndims in row-major order. | |
728 | |
729 @cindex in-place | |
730 @cindex padding | |
731 For an in-place transform, some complications arise since the complex data | |
732 is slightly larger than the real data. In this case, the final | |
733 dimension of the real data must be @emph{padded} with extra values to | |
734 accommodate the size of the complex data---two extra if the last | |
735 dimension is even and one if it is odd. That is, the last dimension of | |
736 the real data must physically contain | |
737 @tex | |
738 $2 (n_{d-1}/2+1)$ | |
739 @end tex | |
740 @ifinfo | |
741 2 * (n[d-1]/2+1) | |
742 @end ifinfo | |
743 @html | |
744 2 * (n<sub>d-1</sub>/2+1) | |
745 @end html | |
746 @code{double} values (exactly enough to hold the complex data). This | |
747 physical array size does not, however, change the @emph{logical} array | |
748 size---only | |
749 @tex | |
750 $n_{d-1}$ | |
751 @end tex | |
752 @ifinfo | |
753 n[d-1] | |
754 @end ifinfo | |
755 @html | |
756 n<sub>d-1</sub> | |
757 @end html | |
758 values are actually stored in the last dimension, and | |
759 @tex | |
760 $n_{d-1}$ | |
761 @end tex | |
762 @ifinfo | |
763 n[d-1] | |
764 @end ifinfo | |
765 @html | |
766 n<sub>d-1</sub> | |
767 @end html | |
768 is the last dimension passed to the planner. | |
769 | |
770 @c =========> | |
771 @node Real-to-Real Transforms, Real-to-Real Transform Kinds, Real-data DFT Array Format, Basic Interface | |
772 @subsection Real-to-Real Transforms | |
773 @cindex r2r | |
774 | |
775 @example | |
776 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, | |
777 fftw_r2r_kind kind, unsigned flags); | |
778 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out, | |
779 fftw_r2r_kind kind0, fftw_r2r_kind kind1, | |
780 unsigned flags); | |
781 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2, | |
782 double *in, double *out, | |
783 fftw_r2r_kind kind0, | |
784 fftw_r2r_kind kind1, | |
785 fftw_r2r_kind kind2, | |
786 unsigned flags); | |
787 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, | |
788 const fftw_r2r_kind *kind, unsigned flags); | |
789 @end example | |
790 @findex fftw_plan_r2r_1d | |
791 @findex fftw_plan_r2r_2d | |
792 @findex fftw_plan_r2r_3d | |
793 @findex fftw_plan_r2r | |
794 | |
795 Plan a real input/output (r2r) transform of various kinds in zero or | |
796 more dimensions, returning an @code{fftw_plan} (@pxref{Using Plans}). | |
797 | |
798 Once you have created a plan for a certain transform type and | |
799 parameters, then creating another plan of the same type and parameters, | |
800 but for different arrays, is fast and shares constant data with the | |
801 first plan (if it still exists). | |
802 | |
803 The planner returns @code{NULL} if the plan cannot be created. A | |
804 non-@code{NULL} plan is always returned by the basic interface unless | |
805 you are using a customized FFTW configuration supporting a restricted | |
806 set of transforms, or for size-1 @code{FFTW_REDFT00} kinds (which are | |
807 not defined). | |
808 @ctindex FFTW_REDFT00 | |
809 | |
810 @subsubheading Arguments | |
811 @itemize @bullet | |
812 | |
813 @item | |
814 @code{rank} is the dimensionality of the transform (it should be the | |
815 size of the arrays @code{*n} and @code{*kind}), and can be any | |
816 non-negative integer. The @samp{_1d}, @samp{_2d}, and @samp{_3d} | |
817 planners correspond to a @code{rank} of @code{1}, @code{2}, and | |
818 @code{3}, respectively. A @code{rank} of zero is equivalent to a copy | |
819 of one number from input to output. | |
820 | |
821 @item | |
822 @code{n}, or @code{n0}/@code{n1}/@code{n2}, or @code{n[rank]}, | |
823 respectively, gives the (physical) size of the transform dimensions. | |
824 They can be any positive integer. | |
825 | |
826 @itemize @minus | |
827 @item | |
828 @cindex row-major | |
829 Multi-dimensional arrays are stored in row-major order with dimensions: | |
830 @code{n0} x @code{n1}; or @code{n0} x @code{n1} x @code{n2}; or | |
831 @code{n[0]} x @code{n[1]} x ... x @code{n[rank-1]}. | |
832 @xref{Multi-dimensional Array Format}. | |
833 @item | |
834 FFTW is generally best at handling sizes of the form | |
835 @ifinfo | |
836 @math{2^a 3^b 5^c 7^d 11^e 13^f}, | |
837 @end ifinfo | |
838 @tex | |
839 $2^a 3^b 5^c 7^d 11^e 13^f$, | |
840 @end tex | |
841 @html | |
842 2<sup>a</sup> 3<sup>b</sup> 5<sup>c</sup> 7<sup>d</sup> | |
843 11<sup>e</sup> 13<sup>f</sup>, | |
844 @end html | |
845 where @math{e+f} is either @math{0} or @math{1}, and the other exponents | |
846 are arbitrary. Other sizes are computed by means of a slow, | |
847 general-purpose algorithm (which nevertheless retains @Onlogn{} performance even for prime sizes). (It is possible to customize FFTW | |
848 for different array sizes; see @ref{Installation and Customization}.) | |
849 Transforms whose sizes are powers of @math{2} are especially fast. | |
850 @item | |
851 For a @code{REDFT00} or @code{RODFT00} transform kind in a dimension of | |
852 size @math{n}, it is @math{n-1} or @math{n+1}, respectively, that | |
853 should be factorizable in the above form. | |
854 @end itemize | |
855 | |
856 @item | |
857 @code{in} and @code{out} point to the input and output arrays of the | |
858 transform, which may be the same (yielding an in-place transform). | |
859 @cindex in-place | |
860 These arrays are overwritten during planning, unless | |
861 @code{FFTW_ESTIMATE} is used in the flags. (The arrays need not be | |
862 initialized, but they must be allocated.) | |
863 | |
864 @item | |
865 @code{kind}, or @code{kind0}/@code{kind1}/@code{kind2}, or | |
866 @code{kind[rank]}, is the kind of r2r transform used for the | |
867 corresponding dimension. The valid kind constants are described in | |
868 @ref{Real-to-Real Transform Kinds}. In a multi-dimensional transform, | |
869 what is computed is the separable product formed by taking each | |
870 transform kind along the corresponding dimension, one dimension after | |
871 another. | |
872 | |
873 @item | |
874 @cindex flags | |
875 @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, | |
876 as defined in @ref{Planner Flags}. | |
877 | |
878 @end itemize | |
879 | |
880 @c =========> | |
881 @node Real-to-Real Transform Kinds, , Real-to-Real Transforms, Basic Interface | |
882 @subsection Real-to-Real Transform Kinds | |
883 @cindex kind (r2r) | |
884 | |
885 FFTW currently supports 11 different r2r transform kinds, specified by | |
886 one of the constants below. For the precise definitions of these | |
887 transforms, see @ref{What FFTW Really Computes}. For a more colloquial | |
888 introduction to these transform kinds, see @ref{More DFTs of Real Data}. | |
889 | |
890 For dimension of size @code{n}, there is a corresponding ``logical'' | |
891 dimension @code{N} that determines the normalization (and the optimal | |
892 factorization); the formula for @code{N} is given for each kind below. | |
893 Also, with each transform kind is listed its corrsponding inverse | |
894 transform. FFTW computes unnormalized transforms: a transform followed | |
895 by its inverse will result in the original data multiplied by @code{N} | |
896 (or the product of the @code{N}'s for each dimension, in | |
897 multi-dimensions). | |
898 @cindex normalization | |
899 | |
900 @itemize @bullet | |
901 | |
902 @item | |
903 @ctindex FFTW_R2HC | |
904 @code{FFTW_R2HC} computes a real-input DFT with output in | |
905 ``halfcomplex'' format, i.e. real and imaginary parts for a transform of | |
906 size @code{n} stored as: | |
907 @tex | |
908 $$ | |
909 r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1 | |
910 $$ | |
911 @end tex | |
912 @ifinfo | |
913 r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1 | |
914 @end ifinfo | |
915 @html | |
916 <p align=center> | |
917 r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub> | |
918 </p> | |
919 @end html | |
920 (Logical @code{N=n}, inverse is @code{FFTW_HC2R}.) | |
921 | |
922 @item | |
923 @ctindex FFTW_HC2R | |
924 @code{FFTW_HC2R} computes the reverse of @code{FFTW_R2HC}, above. | |
925 (Logical @code{N=n}, inverse is @code{FFTW_R2HC}.) | |
926 | |
927 @item | |
928 @ctindex FFTW_DHT | |
929 @code{FFTW_DHT} computes a discrete Hartley transform. | |
930 (Logical @code{N=n}, inverse is @code{FFTW_DHT}.) | |
931 @cindex discrete Hartley transform | |
932 | |
933 @item | |
934 @ctindex FFTW_REDFT00 | |
935 @code{FFTW_REDFT00} computes an REDFT00 transform, i.e. a DCT-I. | |
936 (Logical @code{N=2*(n-1)}, inverse is @code{FFTW_REDFT00}.) | |
937 @cindex discrete cosine transform | |
938 @cindex DCT | |
939 | |
940 @item | |
941 @ctindex FFTW_REDFT10 | |
942 @code{FFTW_REDFT10} computes an REDFT10 transform, i.e. a DCT-II (sometimes called ``the'' DCT). | |
943 (Logical @code{N=2*n}, inverse is @code{FFTW_REDFT01}.) | |
944 | |
945 @item | |
946 @ctindex FFTW_REDFT01 | |
947 @code{FFTW_REDFT01} computes an REDFT01 transform, i.e. a DCT-III (sometimes called ``the'' IDCT, being the inverse of DCT-II). | |
948 (Logical @code{N=2*n}, inverse is @code{FFTW_REDFT=10}.) | |
949 @cindex IDCT | |
950 | |
951 @item | |
952 @ctindex FFTW_REDFT11 | |
953 @code{FFTW_REDFT11} computes an REDFT11 transform, i.e. a DCT-IV. | |
954 (Logical @code{N=2*n}, inverse is @code{FFTW_REDFT11}.) | |
955 | |
956 @item | |
957 @ctindex FFTW_RODFT00 | |
958 @code{FFTW_RODFT00} computes an RODFT00 transform, i.e. a DST-I. | |
959 (Logical @code{N=2*(n+1)}, inverse is @code{FFTW_RODFT00}.) | |
960 @cindex discrete sine transform | |
961 @cindex DST | |
962 | |
963 @item | |
964 @ctindex FFTW_RODFT10 | |
965 @code{FFTW_RODFT10} computes an RODFT10 transform, i.e. a DST-II. | |
966 (Logical @code{N=2*n}, inverse is @code{FFTW_RODFT01}.) | |
967 | |
968 @item | |
969 @ctindex FFTW_RODFT01 | |
970 @code{FFTW_RODFT01} computes an RODFT01 transform, i.e. a DST-III. | |
971 (Logical @code{N=2*n}, inverse is @code{FFTW_RODFT=10}.) | |
972 | |
973 @item | |
974 @ctindex FFTW_RODFT11 | |
975 @code{FFTW_RODFT11} computes an RODFT11 transform, i.e. a DST-IV. | |
976 (Logical @code{N=2*n}, inverse is @code{FFTW_RODFT11}.) | |
977 | |
978 @end itemize | |
979 | |
980 @c ------------------------------------------------------------ | |
981 @node Advanced Interface, Guru Interface, Basic Interface, FFTW Reference | |
982 @section Advanced Interface | |
983 @cindex advanced interface | |
984 | |
985 FFTW's ``advanced'' interface supplements the basic interface with four | |
986 new planner routines, providing a new level of flexibility: you can plan | |
987 a transform of multiple arrays simultaneously, operate on non-contiguous | |
988 (strided) data, and transform a subset of a larger multi-dimensional | |
989 array. Other than these additional features, the planner operates in | |
990 the same fashion as in the basic interface, and the resulting | |
991 @code{fftw_plan} is used in the same way (@pxref{Using Plans}). | |
992 | |
993 @menu | |
994 * Advanced Complex DFTs:: | |
995 * Advanced Real-data DFTs:: | |
996 * Advanced Real-to-real Transforms:: | |
997 @end menu | |
998 | |
999 @c =========> | |
1000 @node Advanced Complex DFTs, Advanced Real-data DFTs, Advanced Interface, Advanced Interface | |
1001 @subsection Advanced Complex DFTs | |
1002 | |
1003 @example | |
1004 fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany, | |
1005 fftw_complex *in, const int *inembed, | |
1006 int istride, int idist, | |
1007 fftw_complex *out, const int *onembed, | |
1008 int ostride, int odist, | |
1009 int sign, unsigned flags); | |
1010 @end example | |
1011 @findex fftw_plan_many_dft | |
1012 | |
1013 This routine plans multiple multidimensional complex DFTs, and it | |
1014 extends the @code{fftw_plan_dft} routine (@pxref{Complex DFTs}) to | |
1015 compute @code{howmany} transforms, each having rank @code{rank} and size | |
1016 @code{n}. In addition, the transform data need not be contiguous, but | |
1017 it may be laid out in memory with an arbitrary stride. To account for | |
1018 these possibilities, @code{fftw_plan_many_dft} adds the new parameters | |
1019 @code{howmany}, @{@code{i},@code{o}@}@code{nembed}, | |
1020 @{@code{i},@code{o}@}@code{stride}, and | |
1021 @{@code{i},@code{o}@}@code{dist}. The FFTW basic interface | |
1022 (@pxref{Complex DFTs}) provides routines specialized for ranks 1, 2, | |
1023 and@tie{}3, but the advanced interface handles only the general-rank | |
1024 case. | |
1025 | |
1026 @code{howmany} is the number of transforms to compute. The resulting | |
1027 plan computes @code{howmany} transforms, where the input of the | |
1028 @code{k}-th transform is at location @code{in+k*idist} (in C pointer | |
1029 arithmetic), and its output is at location @code{out+k*odist}. Plans | |
1030 obtained in this way can often be faster than calling FFTW multiple | |
1031 times for the individual transforms. The basic @code{fftw_plan_dft} | |
1032 interface corresponds to @code{howmany=1} (in which case the @code{dist} | |
1033 parameters are ignored). | |
1034 @cindex howmany parameter | |
1035 @cindex dist | |
1036 | |
1037 | |
1038 Each of the @code{howmany} transforms has rank @code{rank} and size | |
1039 @code{n}, as in the basic interface. In addition, the advanced | |
1040 interface allows the input and output arrays of each transform to be | |
1041 row-major subarrays of larger rank-@code{rank} arrays, described by | |
1042 @code{inembed} and @code{onembed} parameters, respectively. | |
1043 @{@code{i},@code{o}@}@code{nembed} must be arrays of length @code{rank}, | |
1044 and @code{n} should be elementwise less than or equal to | |
1045 @{@code{i},@code{o}@}@code{nembed}. Passing @code{NULL} for an | |
1046 @code{nembed} parameter is equivalent to passing @code{n} (i.e. same | |
1047 physical and logical dimensions, as in the basic interface.) | |
1048 | |
1049 The @code{stride} parameters indicate that the @code{j}-th element of | |
1050 the input or output arrays is located at @code{j*istride} or | |
1051 @code{j*ostride}, respectively. (For a multi-dimensional array, | |
1052 @code{j} is the ordinary row-major index.) When combined with the | |
1053 @code{k}-th transform in a @code{howmany} loop, from above, this means | |
1054 that the (@code{j},@code{k})-th element is at @code{j*stride+k*dist}. | |
1055 (The basic @code{fftw_plan_dft} interface corresponds to a stride of 1.) | |
1056 @cindex stride | |
1057 | |
1058 | |
1059 For in-place transforms, the input and output @code{stride} and | |
1060 @code{dist} parameters should be the same; otherwise, the planner may | |
1061 return @code{NULL}. | |
1062 | |
1063 Arrays @code{n}, @code{inembed}, and @code{onembed} are not used after | |
1064 this function returns. You can safely free or reuse them. | |
1065 | |
1066 @strong{Examples}: | |
1067 One transform of one 5 by 6 array contiguous in memory: | |
1068 @example | |
1069 int rank = 2; | |
1070 int n[] = @{5, 6@}; | |
1071 int howmany = 1; | |
1072 int idist = odist = 0; /* unused because howmany = 1 */ | |
1073 int istride = ostride = 1; /* array is contiguous in memory */ | |
1074 int *inembed = n, *onembed = n; | |
1075 @end example | |
1076 | |
1077 Transform of three 5 by 6 arrays, each contiguous in memory, | |
1078 stored in memory one after another: | |
1079 @example | |
1080 int rank = 2; | |
1081 int n[] = @{5, 6@}; | |
1082 int howmany = 3; | |
1083 int idist = odist = n[0]*n[1]; /* = 30, the distance in memory | |
1084 between the first element | |
1085 of the first array and the | |
1086 first element of the second array */ | |
1087 int istride = ostride = 1; /* array is contiguous in memory */ | |
1088 int *inembed = n, *onembed = n; | |
1089 @end example | |
1090 | |
1091 Transform each column of a 2d array with 10 rows and 3 columns: | |
1092 @example | |
1093 int rank = 1; /* not 2: we are computing 1d transforms */ | |
1094 int n[] = @{10@}; /* 1d transforms of length 10 */ | |
1095 int howmany = 3; | |
1096 int idist = odist = 1; | |
1097 int istride = ostride = 3; /* distance between two elements in | |
1098 the same column */ | |
1099 int *inembed = n, *onembed = n; | |
1100 @end example | |
1101 | |
1102 @c =========> | |
1103 @node Advanced Real-data DFTs, Advanced Real-to-real Transforms, Advanced Complex DFTs, Advanced Interface | |
1104 @subsection Advanced Real-data DFTs | |
1105 | |
1106 @example | |
1107 fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany, | |
1108 double *in, const int *inembed, | |
1109 int istride, int idist, | |
1110 fftw_complex *out, const int *onembed, | |
1111 int ostride, int odist, | |
1112 unsigned flags); | |
1113 fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany, | |
1114 fftw_complex *in, const int *inembed, | |
1115 int istride, int idist, | |
1116 double *out, const int *onembed, | |
1117 int ostride, int odist, | |
1118 unsigned flags); | |
1119 @end example | |
1120 @findex fftw_plan_many_dft_r2c | |
1121 @findex fftw_plan_many_dft_c2r | |
1122 | |
1123 Like @code{fftw_plan_many_dft}, these two functions add @code{howmany}, | |
1124 @code{nembed}, @code{stride}, and @code{dist} parameters to the | |
1125 @code{fftw_plan_dft_r2c} and @code{fftw_plan_dft_c2r} functions, but | |
1126 otherwise behave the same as the basic interface. | |
1127 | |
1128 The interpretation of @code{howmany}, @code{stride}, and @code{dist} are | |
1129 the same as for @code{fftw_plan_many_dft}, above. Note that the | |
1130 @code{stride} and @code{dist} for the real array are in units of | |
1131 @code{double}, and for the complex array are in units of | |
1132 @code{fftw_complex}. | |
1133 | |
1134 If an @code{nembed} parameter is @code{NULL}, it is interpreted as what | |
1135 it would be in the basic interface, as described in @ref{Real-data DFT | |
1136 Array Format}. That is, for the complex array the size is assumed to be | |
1137 the same as @code{n}, but with the last dimension cut roughly in half. | |
1138 For the real array, the size is assumed to be @code{n} if the transform | |
1139 is out-of-place, or @code{n} with the last dimension ``padded'' if the | |
1140 transform is in-place. | |
1141 | |
1142 If an @code{nembed} parameter is non-@code{NULL}, it is interpreted as | |
1143 the physical size of the corresponding array, in row-major order, just | |
1144 as for @code{fftw_plan_many_dft}. In this case, each dimension of | |
1145 @code{nembed} should be @code{>=} what it would be in the basic | |
1146 interface (e.g. the halved or padded @code{n}). | |
1147 | |
1148 Arrays @code{n}, @code{inembed}, and @code{onembed} are not used after | |
1149 this function returns. You can safely free or reuse them. | |
1150 | |
1151 @c =========> | |
1152 @node Advanced Real-to-real Transforms, , Advanced Real-data DFTs, Advanced Interface | |
1153 @subsection Advanced Real-to-real Transforms | |
1154 | |
1155 @example | |
1156 fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany, | |
1157 double *in, const int *inembed, | |
1158 int istride, int idist, | |
1159 double *out, const int *onembed, | |
1160 int ostride, int odist, | |
1161 const fftw_r2r_kind *kind, unsigned flags); | |
1162 @end example | |
1163 @findex fftw_plan_many_r2r | |
1164 | |
1165 Like @code{fftw_plan_many_dft}, this functions adds @code{howmany}, | |
1166 @code{nembed}, @code{stride}, and @code{dist} parameters to the | |
1167 @code{fftw_plan_r2r} function, but otherwise behave the same as the | |
1168 basic interface. The interpretation of those additional parameters are | |
1169 the same as for @code{fftw_plan_many_dft}. (Of course, the | |
1170 @code{stride} and @code{dist} parameters are now in units of | |
1171 @code{double}, not @code{fftw_complex}.) | |
1172 | |
1173 Arrays @code{n}, @code{inembed}, @code{onembed}, and @code{kind} are not | |
1174 used after this function returns. You can safely free or reuse them. | |
1175 | |
1176 @c ------------------------------------------------------------ | |
1177 @node Guru Interface, New-array Execute Functions, Advanced Interface, FFTW Reference | |
1178 @section Guru Interface | |
1179 @cindex guru interface | |
1180 | |
1181 The ``guru'' interface to FFTW is intended to expose as much as possible | |
1182 of the flexibility in the underlying FFTW architecture. It allows one | |
1183 to compute multi-dimensional ``vectors'' (loops) of multi-dimensional | |
1184 transforms, where each vector/transform dimension has an independent | |
1185 size and stride. | |
1186 @cindex vector | |
1187 One can also use more general complex-number formats, e.g. separate real | |
1188 and imaginary arrays. | |
1189 | |
1190 For those users who require the flexibility of the guru interface, it is | |
1191 important that they pay special attention to the documentation lest they | |
1192 shoot themselves in the foot. | |
1193 | |
1194 @menu | |
1195 * Interleaved and split arrays:: | |
1196 * Guru vector and transform sizes:: | |
1197 * Guru Complex DFTs:: | |
1198 * Guru Real-data DFTs:: | |
1199 * Guru Real-to-real Transforms:: | |
1200 * 64-bit Guru Interface:: | |
1201 @end menu | |
1202 | |
1203 @c =========> | |
1204 @node Interleaved and split arrays, Guru vector and transform sizes, Guru Interface, Guru Interface | |
1205 @subsection Interleaved and split arrays | |
1206 | |
1207 The guru interface supports two representations of complex numbers, | |
1208 which we call the interleaved and the split format. | |
1209 | |
1210 The @dfn{interleaved} format is the same one used by the basic and | |
1211 advanced interfaces, and it is documented in @ref{Complex numbers}. | |
1212 In the interleaved format, you provide pointers to the real part of a | |
1213 complex number, and the imaginary part understood to be stored in the | |
1214 next memory location. | |
1215 @cindex interleaved format | |
1216 | |
1217 | |
1218 The @dfn{split} format allows separate pointers to the real and | |
1219 imaginary parts of a complex array. | |
1220 @cindex split format | |
1221 | |
1222 | |
1223 Technically, the interleaved format is redundant, because you can | |
1224 always express an interleaved array in terms of a split array with | |
1225 appropriate pointers and strides. On the other hand, the interleaved | |
1226 format is simpler to use, and it is common in practice. Hence, FFTW | |
1227 supports it as a special case. | |
1228 | |
1229 @c =========> | |
1230 @node Guru vector and transform sizes, Guru Complex DFTs, Interleaved and split arrays, Guru Interface | |
1231 @subsection Guru vector and transform sizes | |
1232 | |
1233 The guru interface introduces one basic new data structure, | |
1234 @code{fftw_iodim}, that is used to specify sizes and strides for | |
1235 multi-dimensional transforms and vectors: | |
1236 | |
1237 @example | |
1238 typedef struct @{ | |
1239 int n; | |
1240 int is; | |
1241 int os; | |
1242 @} fftw_iodim; | |
1243 @end example | |
1244 @tindex fftw_iodim | |
1245 | |
1246 Here, @code{n} is the size of the dimension, and @code{is} and @code{os} | |
1247 are the strides of that dimension for the input and output arrays. (The | |
1248 stride is the separation of consecutive elements along this dimension.) | |
1249 | |
1250 The meaning of the stride parameter depends on the type of the array | |
1251 that the stride refers to. @emph{If the array is interleaved complex, | |
1252 strides are expressed in units of complex numbers | |
1253 (@code{fftw_complex}). If the array is split complex or real, strides | |
1254 are expressed in units of real numbers (@code{double}).} This | |
1255 convention is consistent with the usual pointer arithmetic in the C | |
1256 language. An interleaved array is denoted by a pointer @code{p} to | |
1257 @code{fftw_complex}, so that @code{p+1} points to the next complex | |
1258 number. Split arrays are denoted by pointers to @code{double}, in | |
1259 which case pointer arithmetic operates in units of | |
1260 @code{sizeof(double)}. | |
1261 @cindex stride | |
1262 | |
1263 | |
1264 The guru planner interfaces all take a (@code{rank}, @code{dims[rank]}) | |
1265 pair describing the transform size, and a (@code{howmany_rank}, | |
1266 @code{howmany_dims[howmany_rank]}) pair describing the ``vector'' size (a | |
1267 multi-dimensional loop of transforms to perform), where @code{dims} and | |
1268 @code{howmany_dims} are arrays of @code{fftw_iodim}. | |
1269 | |
1270 For example, the @code{howmany} parameter in the advanced complex-DFT | |
1271 interface corresponds to @code{howmany_rank} = 1, | |
1272 @code{howmany_dims[0].n} = @code{howmany}, @code{howmany_dims[0].is} = | |
1273 @code{idist}, and @code{howmany_dims[0].os} = @code{odist}. | |
1274 @cindex howmany loop | |
1275 @cindex dist | |
1276 (To compute a single transform, you can just use @code{howmany_rank} = 0.) | |
1277 | |
1278 | |
1279 A row-major multidimensional array with dimensions @code{n[rank]} | |
1280 (@pxref{Row-major Format}) corresponds to @code{dims[i].n} = | |
1281 @code{n[i]} and the recurrence @code{dims[i].is} = @code{n[i+1] * | |
1282 dims[i+1].is} (similarly for @code{os}). The stride of the last | |
1283 (@code{i=rank-1}) dimension is the overall stride of the array. | |
1284 e.g. to be equivalent to the advanced complex-DFT interface, you would | |
1285 have @code{dims[rank-1].is} = @code{istride} and | |
1286 @code{dims[rank-1].os} = @code{ostride}. | |
1287 @cindex row-major | |
1288 | |
1289 | |
1290 In general, we only guarantee FFTW to return a non-@code{NULL} plan if | |
1291 the vector and transform dimensions correspond to a set of distinct | |
1292 indices, and for in-place transforms the input/output strides should | |
1293 be the same. | |
1294 | |
1295 @c =========> | |
1296 @node Guru Complex DFTs, Guru Real-data DFTs, Guru vector and transform sizes, Guru Interface | |
1297 @subsection Guru Complex DFTs | |
1298 | |
1299 @example | |
1300 fftw_plan fftw_plan_guru_dft( | |
1301 int rank, const fftw_iodim *dims, | |
1302 int howmany_rank, const fftw_iodim *howmany_dims, | |
1303 fftw_complex *in, fftw_complex *out, | |
1304 int sign, unsigned flags); | |
1305 | |
1306 fftw_plan fftw_plan_guru_split_dft( | |
1307 int rank, const fftw_iodim *dims, | |
1308 int howmany_rank, const fftw_iodim *howmany_dims, | |
1309 double *ri, double *ii, double *ro, double *io, | |
1310 unsigned flags); | |
1311 @end example | |
1312 @findex fftw_plan_guru_dft | |
1313 @findex fftw_plan_guru_split_dft | |
1314 | |
1315 These two functions plan a complex-data, multi-dimensional DFT | |
1316 for the interleaved and split format, respectively. | |
1317 Transform dimensions are given by (@code{rank}, @code{dims}) over a | |
1318 multi-dimensional vector (loop) of dimensions (@code{howmany_rank}, | |
1319 @code{howmany_dims}). @code{dims} and @code{howmany_dims} should point | |
1320 to @code{fftw_iodim} arrays of length @code{rank} and | |
1321 @code{howmany_rank}, respectively. | |
1322 | |
1323 @cindex flags | |
1324 @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, | |
1325 as defined in @ref{Planner Flags}. | |
1326 | |
1327 In the @code{fftw_plan_guru_dft} function, the pointers @code{in} and | |
1328 @code{out} point to the interleaved input and output arrays, | |
1329 respectively. The sign can be either @math{-1} (= | |
1330 @code{FFTW_FORWARD}) or @math{+1} (= @code{FFTW_BACKWARD}). If the | |
1331 pointers are equal, the transform is in-place. | |
1332 | |
1333 In the @code{fftw_plan_guru_split_dft} function, | |
1334 @code{ri} and @code{ii} point to the real and imaginary input arrays, | |
1335 and @code{ro} and @code{io} point to the real and imaginary output | |
1336 arrays. The input and output pointers may be the same, indicating an | |
1337 in-place transform. For example, for @code{fftw_complex} pointers | |
1338 @code{in} and @code{out}, the corresponding parameters are: | |
1339 | |
1340 @example | |
1341 ri = (double *) in; | |
1342 ii = (double *) in + 1; | |
1343 ro = (double *) out; | |
1344 io = (double *) out + 1; | |
1345 @end example | |
1346 | |
1347 Because @code{fftw_plan_guru_split_dft} accepts split arrays, strides | |
1348 are expressed in units of @code{double}. For a contiguous | |
1349 @code{fftw_complex} array, the overall stride of the transform should | |
1350 be 2, the distance between consecutive real parts or between | |
1351 consecutive imaginary parts; see @ref{Guru vector and transform | |
1352 sizes}. Note that the dimension strides are applied equally to the | |
1353 real and imaginary parts; real and imaginary arrays with different | |
1354 strides are not supported. | |
1355 | |
1356 There is no @code{sign} parameter in @code{fftw_plan_guru_split_dft}. | |
1357 This function always plans for an @code{FFTW_FORWARD} transform. To | |
1358 plan for an @code{FFTW_BACKWARD} transform, you can exploit the | |
1359 identity that the backwards DFT is equal to the forwards DFT with the | |
1360 real and imaginary parts swapped. For example, in the case of the | |
1361 @code{fftw_complex} arrays above, the @code{FFTW_BACKWARD} transform | |
1362 is computed by the parameters: | |
1363 | |
1364 @example | |
1365 ri = (double *) in + 1; | |
1366 ii = (double *) in; | |
1367 ro = (double *) out + 1; | |
1368 io = (double *) out; | |
1369 @end example | |
1370 | |
1371 @c =========> | |
1372 @node Guru Real-data DFTs, Guru Real-to-real Transforms, Guru Complex DFTs, Guru Interface | |
1373 @subsection Guru Real-data DFTs | |
1374 | |
1375 @example | |
1376 fftw_plan fftw_plan_guru_dft_r2c( | |
1377 int rank, const fftw_iodim *dims, | |
1378 int howmany_rank, const fftw_iodim *howmany_dims, | |
1379 double *in, fftw_complex *out, | |
1380 unsigned flags); | |
1381 | |
1382 fftw_plan fftw_plan_guru_split_dft_r2c( | |
1383 int rank, const fftw_iodim *dims, | |
1384 int howmany_rank, const fftw_iodim *howmany_dims, | |
1385 double *in, double *ro, double *io, | |
1386 unsigned flags); | |
1387 | |
1388 fftw_plan fftw_plan_guru_dft_c2r( | |
1389 int rank, const fftw_iodim *dims, | |
1390 int howmany_rank, const fftw_iodim *howmany_dims, | |
1391 fftw_complex *in, double *out, | |
1392 unsigned flags); | |
1393 | |
1394 fftw_plan fftw_plan_guru_split_dft_c2r( | |
1395 int rank, const fftw_iodim *dims, | |
1396 int howmany_rank, const fftw_iodim *howmany_dims, | |
1397 double *ri, double *ii, double *out, | |
1398 unsigned flags); | |
1399 @end example | |
1400 @findex fftw_plan_guru_dft_r2c | |
1401 @findex fftw_plan_guru_split_dft_r2c | |
1402 @findex fftw_plan_guru_dft_c2r | |
1403 @findex fftw_plan_guru_split_dft_c2r | |
1404 | |
1405 Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT with | |
1406 transform dimensions given by (@code{rank}, @code{dims}) over a | |
1407 multi-dimensional vector (loop) of dimensions (@code{howmany_rank}, | |
1408 @code{howmany_dims}). @code{dims} and @code{howmany_dims} should point | |
1409 to @code{fftw_iodim} arrays of length @code{rank} and | |
1410 @code{howmany_rank}, respectively. As for the basic and advanced | |
1411 interfaces, an r2c transform is @code{FFTW_FORWARD} and a c2r transform | |
1412 is @code{FFTW_BACKWARD}. | |
1413 | |
1414 The @emph{last} dimension of @code{dims} is interpreted specially: | |
1415 that dimension of the real array has size @code{dims[rank-1].n}, but | |
1416 that dimension of the complex array has size @code{dims[rank-1].n/2+1} | |
1417 (division rounded down). The strides, on the other hand, are taken to | |
1418 be exactly as specified. It is up to the user to specify the strides | |
1419 appropriately for the peculiar dimensions of the data, and we do not | |
1420 guarantee that the planner will succeed (return non-@code{NULL}) for | |
1421 any dimensions other than those described in @ref{Real-data DFT Array | |
1422 Format} and generalized in @ref{Advanced Real-data DFTs}. (That is, | |
1423 for an in-place transform, each individual dimension should be able to | |
1424 operate in place.) | |
1425 @cindex in-place | |
1426 | |
1427 | |
1428 @code{in} and @code{out} point to the input and output arrays for r2c | |
1429 and c2r transforms, respectively. For split arrays, @code{ri} and | |
1430 @code{ii} point to the real and imaginary input arrays for a c2r | |
1431 transform, and @code{ro} and @code{io} point to the real and imaginary | |
1432 output arrays for an r2c transform. @code{in} and @code{ro} or | |
1433 @code{ri} and @code{out} may be the same, indicating an in-place | |
1434 transform. (In-place transforms where @code{in} and @code{io} or | |
1435 @code{ii} and @code{out} are the same are not currently supported.) | |
1436 | |
1437 @cindex flags | |
1438 @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, | |
1439 as defined in @ref{Planner Flags}. | |
1440 | |
1441 In-place transforms of rank greater than 1 are currently only | |
1442 supported for interleaved arrays. For split arrays, the planner will | |
1443 return @code{NULL}. | |
1444 @cindex in-place | |
1445 | |
1446 @c =========> | |
1447 @node Guru Real-to-real Transforms, 64-bit Guru Interface, Guru Real-data DFTs, Guru Interface | |
1448 @subsection Guru Real-to-real Transforms | |
1449 | |
1450 @example | |
1451 fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims, | |
1452 int howmany_rank, | |
1453 const fftw_iodim *howmany_dims, | |
1454 double *in, double *out, | |
1455 const fftw_r2r_kind *kind, | |
1456 unsigned flags); | |
1457 @end example | |
1458 @findex fftw_plan_guru_r2r | |
1459 | |
1460 Plan a real-to-real (r2r) multi-dimensional @code{FFTW_FORWARD} | |
1461 transform with transform dimensions given by (@code{rank}, @code{dims}) | |
1462 over a multi-dimensional vector (loop) of dimensions | |
1463 (@code{howmany_rank}, @code{howmany_dims}). @code{dims} and | |
1464 @code{howmany_dims} should point to @code{fftw_iodim} arrays of length | |
1465 @code{rank} and @code{howmany_rank}, respectively. | |
1466 | |
1467 The transform kind of each dimension is given by the @code{kind} | |
1468 parameter, which should point to an array of length @code{rank}. Valid | |
1469 @code{fftw_r2r_kind} constants are given in @ref{Real-to-Real Transform | |
1470 Kinds}. | |
1471 | |
1472 @code{in} and @code{out} point to the real input and output arrays; they | |
1473 may be the same, indicating an in-place transform. | |
1474 | |
1475 @cindex flags | |
1476 @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, | |
1477 as defined in @ref{Planner Flags}. | |
1478 | |
1479 @c =========> | |
1480 @node 64-bit Guru Interface, , Guru Real-to-real Transforms, Guru Interface | |
1481 @subsection 64-bit Guru Interface | |
1482 @cindex 64-bit architecture | |
1483 | |
1484 When compiled in 64-bit mode on a 64-bit architecture (where addresses | |
1485 are 64 bits wide), FFTW uses 64-bit quantities internally for all | |
1486 transform sizes, strides, and so on---you don't have to do anything | |
1487 special to exploit this. However, in the ordinary FFTW interfaces, | |
1488 you specify the transform size by an @code{int} quantity, which is | |
1489 normally only 32 bits wide. This means that, even though FFTW is | |
1490 using 64-bit sizes internally, you cannot specify a single transform | |
1491 dimension larger than | |
1492 @ifinfo | |
1493 2^31-1 | |
1494 @end ifinfo | |
1495 @html | |
1496 2<sup><small>31</small></sup>−1 | |
1497 @end html | |
1498 @tex | |
1499 $2^{31}-1$ | |
1500 @end tex | |
1501 numbers. | |
1502 | |
1503 We expect that few users will require transforms larger than this, but, | |
1504 for those who do, we provide a 64-bit version of the guru interface in | |
1505 which all sizes are specified as integers of type @code{ptrdiff_t} | |
1506 instead of @code{int}. (@code{ptrdiff_t} is a signed integer type | |
1507 defined by the C standard to be wide enough to represent address | |
1508 differences, and thus must be at least 64 bits wide on a 64-bit | |
1509 machine.) We stress that there is @emph{no performance advantage} to | |
1510 using this interface---the same internal FFTW code is employed | |
1511 regardless---and it is only necessary if you want to specify very | |
1512 large transform sizes. | |
1513 @tindex ptrdiff_t | |
1514 | |
1515 | |
1516 In particular, the 64-bit guru interface is a set of planner routines | |
1517 that are exactly the same as the guru planner routines, except that | |
1518 they are named with @samp{guru64} instead of @samp{guru} and they take | |
1519 arguments of type @code{fftw_iodim64} instead of @code{fftw_iodim}. | |
1520 For example, instead of @code{fftw_plan_guru_dft}, we have | |
1521 @code{fftw_plan_guru64_dft}. | |
1522 | |
1523 @example | |
1524 fftw_plan fftw_plan_guru64_dft( | |
1525 int rank, const fftw_iodim64 *dims, | |
1526 int howmany_rank, const fftw_iodim64 *howmany_dims, | |
1527 fftw_complex *in, fftw_complex *out, | |
1528 int sign, unsigned flags); | |
1529 @end example | |
1530 @findex fftw_plan_guru64_dft | |
1531 | |
1532 The @code{fftw_iodim64} type is similar to @code{fftw_iodim}, with the | |
1533 same interpretation, except that it uses type @code{ptrdiff_t} instead | |
1534 of type @code{int}. | |
1535 | |
1536 @example | |
1537 typedef struct @{ | |
1538 ptrdiff_t n; | |
1539 ptrdiff_t is; | |
1540 ptrdiff_t os; | |
1541 @} fftw_iodim64; | |
1542 @end example | |
1543 @tindex fftw_iodim64 | |
1544 | |
1545 Every other @samp{fftw_plan_guru} function also has a | |
1546 @samp{fftw_plan_guru64} equivalent, but we do not repeat their | |
1547 documentation here since they are identical to the 32-bit versions | |
1548 except as noted above. | |
1549 | |
1550 @c ----------------------------------------------------------- | |
1551 @node New-array Execute Functions, Wisdom, Guru Interface, FFTW Reference | |
1552 @section New-array Execute Functions | |
1553 @cindex execute | |
1554 @cindex new-array execution | |
1555 | |
1556 Normally, one executes a plan for the arrays with which the plan was | |
1557 created, by calling @code{fftw_execute(plan)} as described in @ref{Using | |
1558 Plans}. | |
1559 @findex fftw_execute | |
1560 However, it is possible for sophisticated users to apply a given plan | |
1561 to a @emph{different} array using the ``new-array execute'' functions | |
1562 detailed below, provided that the following conditions are met: | |
1563 | |
1564 @itemize @bullet | |
1565 | |
1566 @item | |
1567 The array size, strides, etcetera are the same (since those are set by | |
1568 the plan). | |
1569 | |
1570 @item | |
1571 The input and output arrays are the same (in-place) or different | |
1572 (out-of-place) if the plan was originally created to be in-place or | |
1573 out-of-place, respectively. | |
1574 | |
1575 @item | |
1576 For split arrays, the separations between the real and imaginary | |
1577 parts, @code{ii-ri} and @code{io-ro}, are the same as they were for | |
1578 the input and output arrays when the plan was created. (This | |
1579 condition is automatically satisfied for interleaved arrays.) | |
1580 | |
1581 @item | |
1582 The @dfn{alignment} of the new input/output arrays is the same as that | |
1583 of the input/output arrays when the plan was created, unless the plan | |
1584 was created with the @code{FFTW_UNALIGNED} flag. | |
1585 @ctindex FFTW_UNALIGNED | |
1586 Here, the alignment is a platform-dependent quantity (for example, it is | |
1587 the address modulo 16 if SSE SIMD instructions are used, but the address | |
1588 modulo 4 for non-SIMD single-precision FFTW on the same machine). In | |
1589 general, only arrays allocated with @code{fftw_malloc} are guaranteed to | |
1590 be equally aligned (@pxref{SIMD alignment and fftw_malloc}). | |
1591 | |
1592 @end itemize | |
1593 | |
1594 @cindex alignment | |
1595 The alignment issue is especially critical, because if you don't use | |
1596 @code{fftw_malloc} then you may have little control over the alignment | |
1597 of arrays in memory. For example, neither the C++ @code{new} function | |
1598 nor the Fortran @code{allocate} statement provide strong enough | |
1599 guarantees about data alignment. If you don't use @code{fftw_malloc}, | |
1600 therefore, you probably have to use @code{FFTW_UNALIGNED} (which | |
1601 disables most SIMD support). If possible, it is probably better for | |
1602 you to simply create multiple plans (creating a new plan is quick once | |
1603 one exists for a given size), or better yet re-use the same array for | |
1604 your transforms. | |
1605 | |
1606 @findex fftw_alignment_of | |
1607 For rare circumstances in which you cannot control the alignment of | |
1608 allocated memory, but wish to determine where a given array is | |
1609 aligned like the original array for which a plan was created, you can | |
1610 use the @code{fftw_alignment_of} function: | |
1611 @example | |
1612 int fftw_alignment_of(double *p); | |
1613 @end example | |
1614 Two arrays have equivalent alignment (for the purposes of applying a | |
1615 plan) if and only if @code{fftw_alignment_of} returns the same value | |
1616 for the corresponding pointers to their data (typecast to @code{double*} | |
1617 if necessary). | |
1618 | |
1619 If you are tempted to use the new-array execute interface because you | |
1620 want to transform a known bunch of arrays of the same size, you should | |
1621 probably go use the advanced interface instead (@pxref{Advanced | |
1622 Interface})). | |
1623 | |
1624 The new-array execute functions are: | |
1625 | |
1626 @example | |
1627 void fftw_execute_dft( | |
1628 const fftw_plan p, | |
1629 fftw_complex *in, fftw_complex *out); | |
1630 | |
1631 void fftw_execute_split_dft( | |
1632 const fftw_plan p, | |
1633 double *ri, double *ii, double *ro, double *io); | |
1634 | |
1635 void fftw_execute_dft_r2c( | |
1636 const fftw_plan p, | |
1637 double *in, fftw_complex *out); | |
1638 | |
1639 void fftw_execute_split_dft_r2c( | |
1640 const fftw_plan p, | |
1641 double *in, double *ro, double *io); | |
1642 | |
1643 void fftw_execute_dft_c2r( | |
1644 const fftw_plan p, | |
1645 fftw_complex *in, double *out); | |
1646 | |
1647 void fftw_execute_split_dft_c2r( | |
1648 const fftw_plan p, | |
1649 double *ri, double *ii, double *out); | |
1650 | |
1651 void fftw_execute_r2r( | |
1652 const fftw_plan p, | |
1653 double *in, double *out); | |
1654 @end example | |
1655 @findex fftw_execute_dft | |
1656 @findex fftw_execute_split_dft | |
1657 @findex fftw_execute_dft_r2c | |
1658 @findex fftw_execute_split_dft_r2c | |
1659 @findex fftw_execute_dft_c2r | |
1660 @findex fftw_execute_split_dft_c2r | |
1661 @findex fftw_execute_r2r | |
1662 | |
1663 These execute the @code{plan} to compute the corresponding transform on | |
1664 the input/output arrays specified by the subsequent arguments. The | |
1665 input/output array arguments have the same meanings as the ones passed | |
1666 to the guru planner routines in the preceding sections. The @code{plan} | |
1667 is not modified, and these routines can be called as many times as | |
1668 desired, or intermixed with calls to the ordinary @code{fftw_execute}. | |
1669 | |
1670 The @code{plan} @emph{must} have been created for the transform type | |
1671 corresponding to the execute function, e.g. it must be a complex-DFT | |
1672 plan for @code{fftw_execute_dft}. Any of the planner routines for that | |
1673 transform type, from the basic to the guru interface, could have been | |
1674 used to create the plan, however. | |
1675 | |
1676 @c ------------------------------------------------------------ | |
1677 @node Wisdom, What FFTW Really Computes, New-array Execute Functions, FFTW Reference | |
1678 @section Wisdom | |
1679 @cindex wisdom | |
1680 @cindex saving plans to disk | |
1681 | |
1682 This section documents the FFTW mechanism for saving and restoring | |
1683 plans from disk. This mechanism is called @dfn{wisdom}. | |
1684 | |
1685 @menu | |
1686 * Wisdom Export:: | |
1687 * Wisdom Import:: | |
1688 * Forgetting Wisdom:: | |
1689 * Wisdom Utilities:: | |
1690 @end menu | |
1691 | |
1692 @c =========> | |
1693 @node Wisdom Export, Wisdom Import, Wisdom, Wisdom | |
1694 @subsection Wisdom Export | |
1695 | |
1696 @example | |
1697 int fftw_export_wisdom_to_filename(const char *filename); | |
1698 void fftw_export_wisdom_to_file(FILE *output_file); | |
1699 char *fftw_export_wisdom_to_string(void); | |
1700 void fftw_export_wisdom(void (*write_char)(char c, void *), void *data); | |
1701 @end example | |
1702 @findex fftw_export_wisdom | |
1703 @findex fftw_export_wisdom_to_filename | |
1704 @findex fftw_export_wisdom_to_file | |
1705 @findex fftw_export_wisdom_to_string | |
1706 | |
1707 These functions allow you to export all currently accumulated wisdom | |
1708 in a form from which it can be later imported and restored, even | |
1709 during a separate run of the program. (@xref{Words of Wisdom-Saving | |
1710 Plans}.) The current store of wisdom is not affected by calling any | |
1711 of these routines. | |
1712 | |
1713 @code{fftw_export_wisdom} exports the wisdom to any output | |
1714 medium, as specified by the callback function | |
1715 @code{write_char}. @code{write_char} is a @code{putc}-like function that | |
1716 writes the character @code{c} to some output; its second parameter is | |
1717 the @code{data} pointer passed to @code{fftw_export_wisdom}. For | |
1718 convenience, the following three ``wrapper'' routines are provided: | |
1719 | |
1720 @code{fftw_export_wisdom_to_filename} writes wisdom to a file named | |
1721 @code{filename} (which is created or overwritten), returning @code{1} | |
1722 on success and @code{0} on failure. A lower-level function, which | |
1723 requires you to open and close the file yourself (e.g. if you want to | |
1724 write wisdom to a portion of a larger file) is | |
1725 @code{fftw_export_wisdom_to_file}. This writes the wisdom to the | |
1726 current position in @code{output_file}, which should be open with | |
1727 write permission; upon exit, the file remains open and is positioned | |
1728 at the end of the wisdom data. | |
1729 | |
1730 @code{fftw_export_wisdom_to_string} returns a pointer to a | |
1731 @code{NULL}-terminated string holding the wisdom data. This string is | |
1732 dynamically allocated, and it is the responsibility of the caller to | |
1733 deallocate it with @code{free} when it is no longer needed. | |
1734 | |
1735 All of these routines export the wisdom in the same format, which we | |
1736 will not document here except to say that it is LISP-like ASCII text | |
1737 that is insensitive to white space. | |
1738 | |
1739 @c =========> | |
1740 @node Wisdom Import, Forgetting Wisdom, Wisdom Export, Wisdom | |
1741 @subsection Wisdom Import | |
1742 | |
1743 @example | |
1744 int fftw_import_system_wisdom(void); | |
1745 int fftw_import_wisdom_from_filename(const char *filename); | |
1746 int fftw_import_wisdom_from_string(const char *input_string); | |
1747 int fftw_import_wisdom(int (*read_char)(void *), void *data); | |
1748 @end example | |
1749 @findex fftw_import_wisdom | |
1750 @findex fftw_import_system_wisdom | |
1751 @findex fftw_import_wisdom_from_filename | |
1752 @findex fftw_import_wisdom_from_file | |
1753 @findex fftw_import_wisdom_from_string | |
1754 | |
1755 These functions import wisdom into a program from data stored by the | |
1756 @code{fftw_export_wisdom} functions above. (@xref{Words of | |
1757 Wisdom-Saving Plans}.) The imported wisdom replaces any wisdom | |
1758 already accumulated by the running program. | |
1759 | |
1760 @code{fftw_import_wisdom} imports wisdom from any input medium, as | |
1761 specified by the callback function @code{read_char}. @code{read_char} is | |
1762 a @code{getc}-like function that returns the next character in the | |
1763 input; its parameter is the @code{data} pointer passed to | |
1764 @code{fftw_import_wisdom}. If the end of the input data is reached | |
1765 (which should never happen for valid data), @code{read_char} should | |
1766 return @code{EOF} (as defined in @code{<stdio.h>}). For convenience, | |
1767 the following three ``wrapper'' routines are provided: | |
1768 | |
1769 @code{fftw_import_wisdom_from_filename} reads wisdom from a file named | |
1770 @code{filename}. A lower-level function, which requires you to open | |
1771 and close the file yourself (e.g. if you want to read wisdom from a | |
1772 portion of a larger file) is @code{fftw_import_wisdom_from_file}. This | |
1773 reads wisdom from the current position in @code{input_file} (which | |
1774 should be open with read permission); upon exit, the file remains | |
1775 open, but the position of the read pointer is unspecified. | |
1776 | |
1777 @code{fftw_import_wisdom_from_string} reads wisdom from the | |
1778 @code{NULL}-terminated string @code{input_string}. | |
1779 | |
1780 @code{fftw_import_system_wisdom} reads wisdom from an | |
1781 implementation-defined standard file (@code{/etc/fftw/wisdom} on Unix | |
1782 and GNU systems). | |
1783 @cindex wisdom, system-wide | |
1784 | |
1785 | |
1786 The return value of these import routines is @code{1} if the wisdom was | |
1787 read successfully and @code{0} otherwise. Note that, in all of these | |
1788 functions, any data in the input stream past the end of the wisdom data | |
1789 is simply ignored. | |
1790 | |
1791 @c =========> | |
1792 @node Forgetting Wisdom, Wisdom Utilities, Wisdom Import, Wisdom | |
1793 @subsection Forgetting Wisdom | |
1794 | |
1795 @example | |
1796 void fftw_forget_wisdom(void); | |
1797 @end example | |
1798 @findex fftw_forget_wisdom | |
1799 | |
1800 Calling @code{fftw_forget_wisdom} causes all accumulated @code{wisdom} | |
1801 to be discarded and its associated memory to be freed. (New | |
1802 @code{wisdom} can still be gathered subsequently, however.) | |
1803 | |
1804 @c =========> | |
1805 @node Wisdom Utilities, , Forgetting Wisdom, Wisdom | |
1806 @subsection Wisdom Utilities | |
1807 | |
1808 FFTW includes two standalone utility programs that deal with wisdom. We | |
1809 merely summarize them here, since they come with their own @code{man} | |
1810 pages for Unix and GNU systems (with HTML versions on our web site). | |
1811 | |
1812 The first program is @code{fftw-wisdom} (or @code{fftwf-wisdom} in | |
1813 single precision, etcetera), which can be used to create a wisdom file | |
1814 containing plans for any of the transform sizes and types supported by | |
1815 FFTW. It is preferable to create wisdom directly from your executable | |
1816 (@pxref{Caveats in Using Wisdom}), but this program is useful for | |
1817 creating global wisdom files for @code{fftw_import_system_wisdom}. | |
1818 @cindex fftw-wisdom utility | |
1819 | |
1820 | |
1821 The second program is @code{fftw-wisdom-to-conf}, which takes a wisdom | |
1822 file as input and produces a @dfn{configuration routine} as output. The | |
1823 latter is a C subroutine that you can compile and link into your | |
1824 program, replacing a routine of the same name in the FFTW library, that | |
1825 determines which parts of FFTW are callable by your program. | |
1826 @code{fftw-wisdom-to-conf} produces a configuration routine that links | |
1827 to only those parts of FFTW needed by the saved plans in the wisdom, | |
1828 greatly reducing the size of statically linked executables (which should | |
1829 only attempt to create plans corresponding to those in the wisdom, | |
1830 however). | |
1831 @cindex fftw-wisdom-to-conf utility | |
1832 @cindex configuration routines | |
1833 | |
1834 @c ------------------------------------------------------------ | |
1835 @node What FFTW Really Computes, , Wisdom, FFTW Reference | |
1836 @section What FFTW Really Computes | |
1837 | |
1838 In this section, we provide precise mathematical definitions for the | |
1839 transforms that FFTW computes. These transform definitions are fairly | |
1840 standard, but some authors follow slightly different conventions for the | |
1841 normalization of the transform (the constant factor in front) and the | |
1842 sign of the complex exponent. We begin by presenting the | |
1843 one-dimensional (1d) transform definitions, and then give the | |
1844 straightforward extension to multi-dimensional transforms. | |
1845 | |
1846 @menu | |
1847 * The 1d Discrete Fourier Transform (DFT):: | |
1848 * The 1d Real-data DFT:: | |
1849 * 1d Real-even DFTs (DCTs):: | |
1850 * 1d Real-odd DFTs (DSTs):: | |
1851 * 1d Discrete Hartley Transforms (DHTs):: | |
1852 * Multi-dimensional Transforms:: | |
1853 @end menu | |
1854 | |
1855 @c =========> | |
1856 @node The 1d Discrete Fourier Transform (DFT), The 1d Real-data DFT, What FFTW Really Computes, What FFTW Really Computes | |
1857 @subsection The 1d Discrete Fourier Transform (DFT) | |
1858 | |
1859 @cindex discrete Fourier transform | |
1860 @cindex DFT | |
1861 The forward (@code{FFTW_FORWARD}) discrete Fourier transform (DFT) of a | |
1862 1d complex array @math{X} of size @math{n} computes an array @math{Y}, | |
1863 where: | |
1864 @tex | |
1865 $$ | |
1866 Y_k = \sum_{j = 0}^{n - 1} X_j e^{-2\pi j k \sqrt{-1}/n} \ . | |
1867 $$ | |
1868 @end tex | |
1869 @ifinfo | |
1870 @center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) . | |
1871 @end ifinfo | |
1872 @html | |
1873 <center><img src="equation-dft.png" align="top">.</center> | |
1874 @end html | |
1875 The backward (@code{FFTW_BACKWARD}) DFT computes: | |
1876 @tex | |
1877 $$ | |
1878 Y_k = \sum_{j = 0}^{n - 1} X_j e^{2\pi j k \sqrt{-1}/n} \ . | |
1879 $$ | |
1880 @end tex | |
1881 @ifinfo | |
1882 @center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) . | |
1883 @end ifinfo | |
1884 @html | |
1885 <center><img src="equation-idft.png" align="top">.</center> | |
1886 @end html | |
1887 | |
1888 @cindex normalization | |
1889 FFTW computes an unnormalized transform, in that there is no coefficient | |
1890 in front of the summation in the DFT. In other words, applying the | |
1891 forward and then the backward transform will multiply the input by | |
1892 @math{n}. | |
1893 | |
1894 @cindex frequency | |
1895 From above, an @code{FFTW_FORWARD} transform corresponds to a sign of | |
1896 @math{-1} in the exponent of the DFT. Note also that we use the | |
1897 standard ``in-order'' output ordering---the @math{k}-th output | |
1898 corresponds to the frequency @math{k/n} (or @math{k/T}, where @math{T} | |
1899 is your total sampling period). For those who like to think in terms of | |
1900 positive and negative frequencies, this means that the positive | |
1901 frequencies are stored in the first half of the output and the negative | |
1902 frequencies are stored in backwards order in the second half of the | |
1903 output. (The frequency @math{-k/n} is the same as the frequency | |
1904 @math{(n-k)/n}.) | |
1905 | |
1906 @c =========> | |
1907 @node The 1d Real-data DFT, 1d Real-even DFTs (DCTs), The 1d Discrete Fourier Transform (DFT), What FFTW Really Computes | |
1908 @subsection The 1d Real-data DFT | |
1909 | |
1910 The real-input (r2c) DFT in FFTW computes the @emph{forward} transform | |
1911 @math{Y} of the size @code{n} real array @math{X}, exactly as defined | |
1912 above, i.e. | |
1913 @tex | |
1914 $$ | |
1915 Y_k = \sum_{j = 0}^{n - 1} X_j e^{-2\pi j k \sqrt{-1}/n} \ . | |
1916 $$ | |
1917 @end tex | |
1918 @ifinfo | |
1919 @center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) . | |
1920 @end ifinfo | |
1921 @html | |
1922 <center><img src="equation-dft.png" align="top">.</center> | |
1923 @end html | |
1924 This output array @math{Y} can easily be shown to possess the | |
1925 ``Hermitian'' symmetry | |
1926 @cindex Hermitian | |
1927 @tex | |
1928 $Y_k = Y_{n-k}^*$, | |
1929 @end tex | |
1930 @ifinfo | |
1931 Y[k] = Y[n-k]*, | |
1932 @end ifinfo | |
1933 @html | |
1934 <i>Y<sub>k</sub> = Y<sub>n-k</sub></i><sup>*</sup>, | |
1935 @end html | |
1936 where we take @math{Y} to be periodic so that | |
1937 @tex | |
1938 $Y_n = Y_0$. | |
1939 @end tex | |
1940 @ifinfo | |
1941 Y[n] = Y[0]. | |
1942 @end ifinfo | |
1943 @html | |
1944 <i>Y<sub>n</sub> = Y</i><sub>0</sub>. | |
1945 @end html | |
1946 | |
1947 As a result of this symmetry, half of the output @math{Y} is redundant | |
1948 (being the complex conjugate of the other half), and so the 1d r2c | |
1949 transforms only output elements @math{0}@dots{}@math{n/2} of @math{Y} | |
1950 (@math{n/2+1} complex numbers), where the division by @math{2} is | |
1951 rounded down. | |
1952 | |
1953 Moreover, the Hermitian symmetry implies that | |
1954 @tex | |
1955 $Y_0$ | |
1956 @end tex | |
1957 @ifinfo | |
1958 Y[0] | |
1959 @end ifinfo | |
1960 @html | |
1961 <i>Y</i><sub>0</sub> | |
1962 @end html | |
1963 and, if @math{n} is even, the | |
1964 @tex | |
1965 $Y_{n/2}$ | |
1966 @end tex | |
1967 @ifinfo | |
1968 Y[n/2] | |
1969 @end ifinfo | |
1970 @html | |
1971 <i>Y</i><sub><i>n</i>/2</sub> | |
1972 @end html | |
1973 element, are purely real. So, for the @code{R2HC} r2r transform, the | |
1974 halfcomplex format does not store the imaginary parts of these elements. | |
1975 @cindex r2r | |
1976 @ctindex R2HC | |
1977 @cindex halfcomplex format | |
1978 | |
1979 | |
1980 The c2r and @code{H2RC} r2r transforms compute the backward DFT of the | |
1981 @emph{complex} array @math{X} with Hermitian symmetry, stored in the | |
1982 r2c/@code{R2HC} output formats, respectively, where the backward | |
1983 transform is defined exactly as for the complex case: | |
1984 @tex | |
1985 $$ | |
1986 Y_k = \sum_{j = 0}^{n - 1} X_j e^{2\pi j k \sqrt{-1}/n} \ . | |
1987 $$ | |
1988 @end tex | |
1989 @ifinfo | |
1990 @center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) . | |
1991 @end ifinfo | |
1992 @html | |
1993 <center><img src="equation-idft.png" align="top">.</center> | |
1994 @end html | |
1995 The outputs @code{Y} of this transform can easily be seen to be purely | |
1996 real, and are stored as an array of real numbers. | |
1997 | |
1998 @cindex normalization | |
1999 Like FFTW's complex DFT, these transforms are unnormalized. In other | |
2000 words, applying the real-to-complex (forward) and then the | |
2001 complex-to-real (backward) transform will multiply the input by | |
2002 @math{n}. | |
2003 | |
2004 @c =========> | |
2005 @node 1d Real-even DFTs (DCTs), 1d Real-odd DFTs (DSTs), The 1d Real-data DFT, What FFTW Really Computes | |
2006 @subsection 1d Real-even DFTs (DCTs) | |
2007 | |
2008 The Real-even symmetry DFTs in FFTW are exactly equivalent to the unnormalized | |
2009 forward (and backward) DFTs as defined above, where the input array | |
2010 @math{X} of length @math{N} is purely real and is also @dfn{even} symmetry. In | |
2011 this case, the output array is likewise real and even symmetry. | |
2012 @cindex real-even DFT | |
2013 @cindex REDFT | |
2014 | |
2015 | |
2016 @ctindex REDFT00 | |
2017 For the case of @code{REDFT00}, this even symmetry means that | |
2018 @tex | |
2019 $X_j = X_{N-j}$, | |
2020 @end tex | |
2021 @ifinfo | |
2022 X[j] = X[N-j], | |
2023 @end ifinfo | |
2024 @html | |
2025 <i>X<sub>j</sub> = X<sub>N-j</sub></i>, | |
2026 @end html | |
2027 where we take @math{X} to be periodic so that | |
2028 @tex | |
2029 $X_N = X_0$. | |
2030 @end tex | |
2031 @ifinfo | |
2032 X[N] = X[0]. | |
2033 @end ifinfo | |
2034 @html | |
2035 <i>X<sub>N</sub> = X</i><sub>0</sub>. | |
2036 @end html | |
2037 Because of this redundancy, only the first @math{n} real numbers are | |
2038 actually stored, where @math{N = 2(n-1)}. | |
2039 | |
2040 The proper definition of even symmetry for @code{REDFT10}, | |
2041 @code{REDFT01}, and @code{REDFT11} transforms is somewhat more intricate | |
2042 because of the shifts by @math{1/2} of the input and/or output, although | |
2043 the corresponding boundary conditions are given in @ref{Real even/odd | |
2044 DFTs (cosine/sine transforms)}. Because of the even symmetry, however, | |
2045 the sine terms in the DFT all cancel and the remaining cosine terms are | |
2046 written explicitly below. This formulation often leads people to call | |
2047 such a transform a @dfn{discrete cosine transform} (DCT), although it is | |
2048 really just a special case of the DFT. | |
2049 @cindex discrete cosine transform | |
2050 @cindex DCT | |
2051 | |
2052 | |
2053 In each of the definitions below, we transform a real array @math{X} of | |
2054 length @math{n} to a real array @math{Y} of length @math{n}: | |
2055 | |
2056 @subsubheading REDFT00 (DCT-I) | |
2057 @ctindex REDFT00 | |
2058 An @code{REDFT00} transform (type-I DCT) in FFTW is defined by: | |
2059 @tex | |
2060 $$ | |
2061 Y_k = X_0 + (-1)^k X_{n-1} | |
2062 + 2 \sum_{j=1}^{n-2} X_j \cos [ \pi j k / (n-1)]. | |
2063 $$ | |
2064 @end tex | |
2065 @ifinfo | |
2066 Y[k] = X[0] + (-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))). | |
2067 @end ifinfo | |
2068 @html | |
2069 <center><img src="equation-redft00.png" align="top">.</center> | |
2070 @end html | |
2071 Note that this transform is not defined for @math{n=1}. For @math{n=2}, | |
2072 the summation term above is dropped as you might expect. | |
2073 | |
2074 @subsubheading REDFT10 (DCT-II) | |
2075 @ctindex REDFT10 | |
2076 An @code{REDFT10} transform (type-II DCT, sometimes called ``the'' DCT) in FFTW is defined by: | |
2077 @tex | |
2078 $$ | |
2079 Y_k = 2 \sum_{j=0}^{n-1} X_j \cos [\pi (j+1/2) k / n]. | |
2080 $$ | |
2081 @end tex | |
2082 @ifinfo | |
2083 Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k / n)). | |
2084 @end ifinfo | |
2085 @html | |
2086 <center><img src="equation-redft10.png" align="top">.</center> | |
2087 @end html | |
2088 | |
2089 @subsubheading REDFT01 (DCT-III) | |
2090 @ctindex REDFT01 | |
2091 An @code{REDFT01} transform (type-III DCT) in FFTW is defined by: | |
2092 @tex | |
2093 $$ | |
2094 Y_k = X_0 + 2 \sum_{j=1}^{n-1} X_j \cos [\pi j (k+1/2) / n]. | |
2095 $$ | |
2096 @end tex | |
2097 @ifinfo | |
2098 Y[k] = X[0] + 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)). | |
2099 @end ifinfo | |
2100 @html | |
2101 <center><img src="equation-redft01.png" align="top">.</center> | |
2102 @end html | |
2103 In the case of @math{n=1}, this reduces to | |
2104 @tex | |
2105 $Y_0 = X_0$. | |
2106 @end tex | |
2107 @ifinfo | |
2108 Y[0] = X[0]. | |
2109 @end ifinfo | |
2110 @html | |
2111 <i>Y</i><sub>0</sub> = <i>X</i><sub>0</sub>. | |
2112 @end html | |
2113 Up to a scale factor (see below), this is the inverse of @code{REDFT10} (``the'' DCT), and so the @code{REDFT01} (DCT-III) is sometimes called the ``IDCT''. | |
2114 @cindex IDCT | |
2115 | |
2116 @subsubheading REDFT11 (DCT-IV) | |
2117 @ctindex REDFT11 | |
2118 An @code{REDFT11} transform (type-IV DCT) in FFTW is defined by: | |
2119 @tex | |
2120 $$ | |
2121 Y_k = 2 \sum_{j=0}^{n-1} X_j \cos [\pi (j+1/2) (k+1/2) / n]. | |
2122 $$ | |
2123 @end tex | |
2124 @ifinfo | |
2125 Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)). | |
2126 @end ifinfo | |
2127 @html | |
2128 <center><img src="equation-redft11.png" align="top">.</center> | |
2129 @end html | |
2130 | |
2131 @subsubheading Inverses and Normalization | |
2132 | |
2133 These definitions correspond directly to the unnormalized DFTs used | |
2134 elsewhere in FFTW (hence the factors of @math{2} in front of the | |
2135 summations). The unnormalized inverse of @code{REDFT00} is | |
2136 @code{REDFT00}, of @code{REDFT10} is @code{REDFT01} and vice versa, and | |
2137 of @code{REDFT11} is @code{REDFT11}. Each unnormalized inverse results | |
2138 in the original array multiplied by @math{N}, where @math{N} is the | |
2139 @emph{logical} DFT size. For @code{REDFT00}, @math{N=2(n-1)} (note that | |
2140 @math{n=1} is not defined); otherwise, @math{N=2n}. | |
2141 @cindex normalization | |
2142 | |
2143 | |
2144 In defining the discrete cosine transform, some authors also include | |
2145 additional factors of | |
2146 @ifinfo | |
2147 sqrt(2) | |
2148 @end ifinfo | |
2149 @html | |
2150 √2 | |
2151 @end html | |
2152 @tex | |
2153 $\sqrt{2}$ | |
2154 @end tex | |
2155 (or its inverse) multiplying selected inputs and/or outputs. This is a | |
2156 mostly cosmetic change that makes the transform orthogonal, but | |
2157 sacrifices the direct equivalence to a symmetric DFT. | |
2158 | |
2159 @c =========> | |
2160 @node 1d Real-odd DFTs (DSTs), 1d Discrete Hartley Transforms (DHTs), 1d Real-even DFTs (DCTs), What FFTW Really Computes | |
2161 @subsection 1d Real-odd DFTs (DSTs) | |
2162 | |
2163 The Real-odd symmetry DFTs in FFTW are exactly equivalent to the unnormalized | |
2164 forward (and backward) DFTs as defined above, where the input array | |
2165 @math{X} of length @math{N} is purely real and is also @dfn{odd} symmetry. In | |
2166 this case, the output is odd symmetry and purely imaginary. | |
2167 @cindex real-odd DFT | |
2168 @cindex RODFT | |
2169 | |
2170 | |
2171 @ctindex RODFT00 | |
2172 For the case of @code{RODFT00}, this odd symmetry means that | |
2173 @tex | |
2174 $X_j = -X_{N-j}$, | |
2175 @end tex | |
2176 @ifinfo | |
2177 X[j] = -X[N-j], | |
2178 @end ifinfo | |
2179 @html | |
2180 <i>X<sub>j</sub> = -X<sub>N-j</sub></i>, | |
2181 @end html | |
2182 where we take @math{X} to be periodic so that | |
2183 @tex | |
2184 $X_N = X_0$. | |
2185 @end tex | |
2186 @ifinfo | |
2187 X[N] = X[0]. | |
2188 @end ifinfo | |
2189 @html | |
2190 <i>X<sub>N</sub> = X</i><sub>0</sub>. | |
2191 @end html | |
2192 Because of this redundancy, only the first @math{n} real numbers | |
2193 starting at @math{j=1} are actually stored (the @math{j=0} element is | |
2194 zero), where @math{N = 2(n+1)}. | |
2195 | |
2196 The proper definition of odd symmetry for @code{RODFT10}, | |
2197 @code{RODFT01}, and @code{RODFT11} transforms is somewhat more intricate | |
2198 because of the shifts by @math{1/2} of the input and/or output, although | |
2199 the corresponding boundary conditions are given in @ref{Real even/odd | |
2200 DFTs (cosine/sine transforms)}. Because of the odd symmetry, however, | |
2201 the cosine terms in the DFT all cancel and the remaining sine terms are | |
2202 written explicitly below. This formulation often leads people to call | |
2203 such a transform a @dfn{discrete sine transform} (DST), although it is | |
2204 really just a special case of the DFT. | |
2205 @cindex discrete sine transform | |
2206 @cindex DST | |
2207 | |
2208 | |
2209 In each of the definitions below, we transform a real array @math{X} of | |
2210 length @math{n} to a real array @math{Y} of length @math{n}: | |
2211 | |
2212 @subsubheading RODFT00 (DST-I) | |
2213 @ctindex RODFT00 | |
2214 An @code{RODFT00} transform (type-I DST) in FFTW is defined by: | |
2215 @tex | |
2216 $$ | |
2217 Y_k = 2 \sum_{j=0}^{n-1} X_j \sin [ \pi (j+1) (k+1) / (n+1)]. | |
2218 $$ | |
2219 @end tex | |
2220 @ifinfo | |
2221 Y[k] = 2 (sum for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))). | |
2222 @end ifinfo | |
2223 @html | |
2224 <center><img src="equation-rodft00.png" align="top">.</center> | |
2225 @end html | |
2226 | |
2227 @subsubheading RODFT10 (DST-II) | |
2228 @ctindex RODFT10 | |
2229 An @code{RODFT10} transform (type-II DST) in FFTW is defined by: | |
2230 @tex | |
2231 $$ | |
2232 Y_k = 2 \sum_{j=0}^{n-1} X_j \sin [\pi (j+1/2) (k+1) / n]. | |
2233 $$ | |
2234 @end tex | |
2235 @ifinfo | |
2236 Y[k] = 2 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)). | |
2237 @end ifinfo | |
2238 @html | |
2239 <center><img src="equation-rodft10.png" align="top">.</center> | |
2240 @end html | |
2241 | |
2242 @subsubheading RODFT01 (DST-III) | |
2243 @ctindex RODFT01 | |
2244 An @code{RODFT01} transform (type-III DST) in FFTW is defined by: | |
2245 @tex | |
2246 $$ | |
2247 Y_k = (-1)^k X_{n-1} + 2 \sum_{j=0}^{n-2} X_j \sin [\pi (j+1) (k+1/2) / n]. | |
2248 $$ | |
2249 @end tex | |
2250 @ifinfo | |
2251 Y[k] = (-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) / n)). | |
2252 @end ifinfo | |
2253 @html | |
2254 <center><img src="equation-rodft01.png" align="top">.</center> | |
2255 @end html | |
2256 In the case of @math{n=1}, this reduces to | |
2257 @tex | |
2258 $Y_0 = X_0$. | |
2259 @end tex | |
2260 @ifinfo | |
2261 Y[0] = X[0]. | |
2262 @end ifinfo | |
2263 @html | |
2264 <i>Y</i><sub>0</sub> = <i>X</i><sub>0</sub>. | |
2265 @end html | |
2266 | |
2267 @subsubheading RODFT11 (DST-IV) | |
2268 @ctindex RODFT11 | |
2269 An @code{RODFT11} transform (type-IV DST) in FFTW is defined by: | |
2270 @tex | |
2271 $$ | |
2272 Y_k = 2 \sum_{j=0}^{n-1} X_j \sin [\pi (j+1/2) (k+1/2) / n]. | |
2273 $$ | |
2274 @end tex | |
2275 @ifinfo | |
2276 Y[k] = 2 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)). | |
2277 @end ifinfo | |
2278 @html | |
2279 <center><img src="equation-rodft11.png" align="top">.</center> | |
2280 @end html | |
2281 | |
2282 @subsubheading Inverses and Normalization | |
2283 | |
2284 These definitions correspond directly to the unnormalized DFTs used | |
2285 elsewhere in FFTW (hence the factors of @math{2} in front of the | |
2286 summations). The unnormalized inverse of @code{RODFT00} is | |
2287 @code{RODFT00}, of @code{RODFT10} is @code{RODFT01} and vice versa, and | |
2288 of @code{RODFT11} is @code{RODFT11}. Each unnormalized inverse results | |
2289 in the original array multiplied by @math{N}, where @math{N} is the | |
2290 @emph{logical} DFT size. For @code{RODFT00}, @math{N=2(n+1)}; | |
2291 otherwise, @math{N=2n}. | |
2292 @cindex normalization | |
2293 | |
2294 | |
2295 In defining the discrete sine transform, some authors also include | |
2296 additional factors of | |
2297 @ifinfo | |
2298 sqrt(2) | |
2299 @end ifinfo | |
2300 @html | |
2301 √2 | |
2302 @end html | |
2303 @tex | |
2304 $\sqrt{2}$ | |
2305 @end tex | |
2306 (or its inverse) multiplying selected inputs and/or outputs. This is a | |
2307 mostly cosmetic change that makes the transform orthogonal, but | |
2308 sacrifices the direct equivalence to an antisymmetric DFT. | |
2309 | |
2310 @c =========> | |
2311 @node 1d Discrete Hartley Transforms (DHTs), Multi-dimensional Transforms, 1d Real-odd DFTs (DSTs), What FFTW Really Computes | |
2312 @subsection 1d Discrete Hartley Transforms (DHTs) | |
2313 | |
2314 @cindex discrete Hartley transform | |
2315 @cindex DHT | |
2316 The discrete Hartley transform (DHT) of a 1d real array @math{X} of size | |
2317 @math{n} computes a real array @math{Y} of the same size, where: | |
2318 @tex | |
2319 $$ | |
2320 Y_k = \sum_{j = 0}^{n - 1} X_j [ \cos(2\pi j k / n) + \sin(2\pi j k / n)]. | |
2321 $$ | |
2322 @end tex | |
2323 @ifinfo | |
2324 @center Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)]. | |
2325 @end ifinfo | |
2326 @html | |
2327 <center><img src="equation-dht.png" align="top">.</center> | |
2328 @end html | |
2329 | |
2330 @cindex normalization | |
2331 FFTW computes an unnormalized transform, in that there is no coefficient | |
2332 in front of the summation in the DHT. In other words, applying the | |
2333 transform twice (the DHT is its own inverse) will multiply the input by | |
2334 @math{n}. | |
2335 | |
2336 @c =========> | |
2337 @node Multi-dimensional Transforms, , 1d Discrete Hartley Transforms (DHTs), What FFTW Really Computes | |
2338 @subsection Multi-dimensional Transforms | |
2339 | |
2340 The multi-dimensional transforms of FFTW, in general, compute simply the | |
2341 separable product of the given 1d transform along each dimension of the | |
2342 array. Since each of these transforms is unnormalized, computing the | |
2343 forward followed by the backward/inverse multi-dimensional transform | |
2344 will result in the original array scaled by the product of the | |
2345 normalization factors for each dimension (e.g. the product of the | |
2346 dimension sizes, for a multi-dimensional DFT). | |
2347 | |
2348 @tex | |
2349 As an explicit example, consider the following exact mathematical | |
2350 definition of our multi-dimensional DFT. Let $X$ be a $d$-dimensional | |
2351 complex array whose elements are $X[j_1, j_2, \ldots, j_d]$, where $0 | |
2352 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots, d \}$. Let also | |
2353 $\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all ~$s \in \{ 1, 2, \ldots, d | |
2354 \}$. | |
2355 | |
2356 The forward transform computes a complex array~$Y$, whose | |
2357 structure is the same as that of~$X$, defined by | |
2358 | |
2359 $$ | |
2360 Y[k_1, k_2, \ldots, k_d] = | |
2361 \sum_{j_1 = 0}^{n_1 - 1} | |
2362 \sum_{j_2 = 0}^{n_2 - 1} | |
2363 \cdots | |
2364 \sum_{j_d = 0}^{n_d - 1} | |
2365 X[j_1, j_2, \ldots, j_d] | |
2366 \omega_1^{-j_1 k_1} | |
2367 \omega_2^{-j_2 k_2} | |
2368 \cdots | |
2369 \omega_d^{-j_d k_d} \ . | |
2370 $$ | |
2371 | |
2372 The backward transform computes | |
2373 $$ | |
2374 Y[k_1, k_2, \ldots, k_d] = | |
2375 \sum_{j_1 = 0}^{n_1 - 1} | |
2376 \sum_{j_2 = 0}^{n_2 - 1} | |
2377 \cdots | |
2378 \sum_{j_d = 0}^{n_d - 1} | |
2379 X[j_1, j_2, \ldots, j_d] | |
2380 \omega_1^{j_1 k_1} | |
2381 \omega_2^{j_2 k_2} | |
2382 \cdots | |
2383 \omega_d^{j_d k_d} \ . | |
2384 $$ | |
2385 | |
2386 Computing the forward transform followed by the backward transform | |
2387 will multiply the array by $\prod_{s=1}^{d} n_d$. | |
2388 @end tex | |
2389 | |
2390 @cindex r2c | |
2391 The definition of FFTW's multi-dimensional DFT of real data (r2c) | |
2392 deserves special attention. In this case, we logically compute the full | |
2393 multi-dimensional DFT of the input data; since the input data are purely | |
2394 real, the output data have the Hermitian symmetry and therefore only one | |
2395 non-redundant half need be stored. More specifically, for an @ndims multi-dimensional real-input DFT, the full (logical) complex output array | |
2396 @tex | |
2397 $Y[k_0, k_1, \ldots, k_{d-1}]$ | |
2398 @end tex | |
2399 @html | |
2400 <i>Y</i>[<i>k</i><sub>0</sub>, <i>k</i><sub>1</sub>, ..., | |
2401 <i>k</i><sub><i>d-1</i></sub>] | |
2402 @end html | |
2403 @ifinfo | |
2404 Y[k[0], k[1], ..., k[d-1]] | |
2405 @end ifinfo | |
2406 has the symmetry: | |
2407 @tex | |
2408 $$ | |
2409 Y[k_0, k_1, \ldots, k_{d-1}] = Y[n_0 - k_0, n_1 - k_1, \ldots, n_{d-1} - k_{d-1}]^* | |
2410 $$ | |
2411 @end tex | |
2412 @html | |
2413 <i>Y</i>[<i>k</i><sub>0</sub>, <i>k</i><sub>1</sub>, ..., | |
2414 <i>k</i><sub><i>d-1</i></sub>] = <i>Y</i>[<i>n</i><sub>0</sub> - | |
2415 <i>k</i><sub>0</sub>, <i>n</i><sub>1</sub> - <i>k</i><sub>1</sub>, ..., | |
2416 <i>n</i><sub><i>d-1</i></sub> - <i>k</i><sub><i>d-1</i></sub>]<sup>*</sup> | |
2417 @end html | |
2418 @ifinfo | |
2419 Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ..., n[d-1] - k[d-1]]* | |
2420 @end ifinfo | |
2421 (where each dimension is periodic). Because of this symmetry, we only | |
2422 store the | |
2423 @tex | |
2424 $k_{d-1} = 0 \cdots n_{d-1}/2$ | |
2425 @end tex | |
2426 @html | |
2427 <i>k</i><sub><i>d-1</i></sub> = 0...<i>n</i><sub><i>d-1</i></sub>/2+1 | |
2428 @end html | |
2429 @ifinfo | |
2430 k[d-1] = 0...n[d-1]/2 | |
2431 @end ifinfo | |
2432 elements of the @emph{last} dimension (division by @math{2} is rounded | |
2433 down). (We could instead have cut any other dimension in half, but the | |
2434 last dimension proved computationally convenient.) This results in the | |
2435 peculiar array format described in more detail by @ref{Real-data DFT | |
2436 Array Format}. | |
2437 | |
2438 The multi-dimensional c2r transform is simply the unnormalized inverse | |
2439 of the r2c transform. i.e. it is the same as FFTW's complex backward | |
2440 multi-dimensional DFT, operating on a Hermitian input array in the | |
2441 peculiar format mentioned above and outputting a real array (since the | |
2442 DFT output is purely real). | |
2443 | |
2444 We should remind the user that the separable product of 1d transforms | |
2445 along each dimension, as computed by FFTW, is not always the same thing | |
2446 as the usual multi-dimensional transform. A multi-dimensional | |
2447 @code{R2HC} (or @code{HC2R}) transform is not identical to the | |
2448 multi-dimensional DFT, requiring some post-processing to combine the | |
2449 requisite real and imaginary parts, as was described in @ref{The | |
2450 Halfcomplex-format DFT}. Likewise, FFTW's multidimensional | |
2451 @code{FFTW_DHT} r2r transform is not the same thing as the logical | |
2452 multi-dimensional discrete Hartley transform defined in the literature, | |
2453 as discussed in @ref{The Discrete Hartley Transform}. | |
2454 |