comparison fft/fftw/fftw-3.3.4/doc/fftw3.info-1 @ 19:26056e866c29

Add FFTW to comparison table
author Chris Cannam
date Tue, 06 Oct 2015 13:08:39 +0100
18:8db794ca3e0b 19:26056e866c29
1 This is fftw3.info, produced by makeinfo version 4.13 from fftw3.texi.
2
3 This manual is for FFTW (version 3.3.4, 20 September 2013).
4
5 Copyright (C) 2003 Matteo Frigo.
6
7 Copyright (C) 2003 Massachusetts Institute of Technology.
8
9 Permission is granted to make and distribute verbatim copies of
10 this manual provided the copyright notice and this permission
11 notice are preserved on all copies.
12
13 Permission is granted to copy and distribute modified versions of
14 this manual under the conditions for verbatim copying, provided
15 that the entire resulting derived work is distributed under the
16 terms of a permission notice identical to this one.
17
18 Permission is granted to copy and distribute translations of this
19 manual into another language, under the above conditions for
20 modified versions, except that this permission notice may be
21 stated in a translation approved by the Free Software Foundation.
22
23 INFO-DIR-SECTION Development
24 START-INFO-DIR-ENTRY
25 * fftw3: (fftw3). FFTW User's Manual.
26 END-INFO-DIR-ENTRY
27
28 
29 File: fftw3.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir)
30
31 FFTW User Manual
32 ****************
33
34 Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a
35 collection of fast C routines to compute the discrete Fourier transform.
36 This manual documents FFTW version 3.3.4.
37
38 * Menu:
39
40 * Introduction::
41 * Tutorial::
42 * Other Important Topics::
43 * FFTW Reference::
44 * Multi-threaded FFTW::
45 * Distributed-memory FFTW with MPI::
46 * Calling FFTW from Modern Fortran::
47 * Calling FFTW from Legacy Fortran::
48 * Upgrading from FFTW version 2::
49 * Installation and Customization::
50 * Acknowledgments::
51 * License and Copyright::
52 * Concept Index::
53 * Library Index::
54
55 
56 File: fftw3.info, Node: Introduction, Next: Tutorial, Prev: Top, Up: Top
57
58 1 Introduction
59 **************
60
61 This manual documents version 3.3.4 of FFTW, the _Fastest Fourier
62 Transform in the West_. FFTW is a comprehensive collection of fast C
63 routines for computing the discrete Fourier transform (DFT) and various
64 special cases thereof.
65 * FFTW computes the DFT of complex data, real data, even- or
66 odd-symmetric real data (these symmetric transforms are usually
67 known as the discrete cosine or sine transform, respectively), and
68 the discrete Hartley transform (DHT) of real data.
69
70 * The input data can have arbitrary length. FFTW employs O(n
71 log n) algorithms for all lengths, including prime numbers.
72
73 * FFTW supports arbitrary multi-dimensional data.
74
75 * FFTW supports the SSE, SSE2, AVX, Altivec, and MIPS PS instruction
76 sets.
77
78 * FFTW includes parallel (multi-threaded) transforms for
79 shared-memory systems.
80
81 * Starting with version 3.3, FFTW includes distributed-memory
82 parallel transforms using MPI.
83
84 We assume herein that you are familiar with the properties and uses
85 of the DFT that are relevant to your application. Otherwise, see e.g.
86 `The Fast Fourier Transform and Its Applications' by E. O. Brigham
87 (Prentice-Hall, Englewood Cliffs, NJ, 1988). Our web page
88 (http://www.fftw.org) also has links to FFT-related information online.
89
90 In order to use FFTW effectively, you need to learn one basic concept
91 of FFTW's internal structure: FFTW does not use a fixed algorithm for
92 computing the transform, but instead it adapts the DFT algorithm to
93 details of the underlying hardware in order to maximize performance.
94 Hence, the computation of the transform is split into two phases.
95 First, FFTW's "planner" "learns" the fastest way to compute the
96 transform on your machine. The planner produces a data structure
97 called a "plan" that contains this information. Subsequently, the plan
98 is "executed" to transform the array of input data as dictated by the
99 plan. The plan can be reused as many times as needed. In typical
100 high-performance applications, many transforms of the same size are
101 computed and, consequently, a relatively expensive initialization of
102 this sort is acceptable. On the other hand, if you need a single
103 transform of a given size, the one-time cost of the planner becomes
104 significant. For this case, FFTW provides fast planners based on
105 heuristics or on previously computed plans.
106
107 FFTW supports transforms of data with arbitrary length, rank,
108 multiplicity, and a general memory layout. In simple cases, however,
109 this generality may be unnecessary and confusing. Consequently, we
110 organized the interface to FFTW into three levels of increasing
111 generality.
112 * The "basic interface" computes a single transform of
113 contiguous data.
114
115 * The "advanced interface" computes transforms of multiple or
116 strided arrays.
117
118 * The "guru interface" supports the most general data layouts,
119 multiplicities, and strides.
120 We expect that most users will be best served by the basic interface,
121 whereas the guru interface requires careful attention to the
122 documentation to avoid problems.
123
124 Besides the automatic performance adaptation performed by the
125 planner, it is also possible for advanced users to customize FFTW
126 manually. For example, if code space is a concern, we provide a tool
127 that links only the subset of FFTW needed by your application.
128 Conversely, you may need to extend FFTW because the standard
129 distribution is not sufficient for your needs. For example, the
130 standard FFTW distribution works most efficiently for arrays whose size
131 can be factored into small primes (2, 3, 5, and 7), and otherwise it
132 uses a slower general-purpose routine. If you need efficient
133 transforms of other sizes, you can use FFTW's code generator, which
134 produces fast C programs ("codelets") for any particular array size you
135 may care about. For example, if you need transforms of size 513 = 19 x
136 3^3, you can customize FFTW to support the factor 19 efficiently.
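
As a rough illustration of the size criterion above, the following sketch
tests whether a size factors entirely into 2, 3, 5, and 7, and finds the
next such size. The function names are hypothetical and are not part of
FFTW; the planner makes its own decisions, so treat this only as a way to
reason about candidate sizes.

```c
#include <assert.h>

/* Returns 1 if n factors entirely into the primes 2, 3, 5, and 7,
   and 0 otherwise. (Hypothetical helper, not an FFTW routine.) */
int has_only_small_factors(int n)
{
    static const int primes[] = {2, 3, 5, 7};
    int i;
    assert(n > 0);
    for (i = 0; i < 4; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest size >= n that factors into 2, 3, 5, and 7. */
int next_small_factor_size(int n)
{
    assert(n > 0);
    while (!has_only_small_factors(n))
        ++n;
    return n;
}
```

For the size 513 = 19 x 3^3 mentioned above, `has_only_small_factors'
returns 0 because the factor 19 remains, and the next size that passes the
test is 525 = 3 x 5^2 x 7.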
137
138 For more information regarding FFTW, see the paper, "The Design and
139 Implementation of FFTW3," by M. Frigo and S. G. Johnson, which was an
140 invited paper in `Proc. IEEE' 93 (2), p. 216 (2005). The code
141 generator is described in the paper "A fast Fourier transform compiler", by
142 M. Frigo, in the `Proceedings of the 1999 ACM SIGPLAN Conference on
143 Programming Language Design and Implementation (PLDI), Atlanta,
144 Georgia, May 1999'. These papers, along with the latest version of
145 FFTW, the FAQ, benchmarks, and other links, are available at the FFTW
146 home page (http://www.fftw.org).
147
148 The current version of FFTW incorporates many good ideas from the
149 past thirty years of FFT literature. In one way or another, FFTW uses
150 the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
151 algorithm for prime sizes, and a split-radix algorithm (with a
152 "conjugate-pair" variation pointed out to us by Dan Bernstein). FFTW's
153 code generator also produces new algorithms that we do not completely
154 understand. The reader is referred to the cited papers for the
155 appropriate references.
156
157 The rest of this manual is organized as follows. We first discuss
158 the sequential (single-processor) implementation. We start by
159 describing the basic interface/features of FFTW in *note Tutorial::.
160 Next, *note Other Important Topics:: discusses data alignment (*note
161 SIMD alignment and fftw_malloc::), the storage scheme of
162 multi-dimensional arrays (*note Multi-dimensional Array Format::), and
163 FFTW's mechanism for storing plans on disk (*note Words of
164 Wisdom-Saving Plans::). Next, *note FFTW Reference:: provides
165 comprehensive documentation of all FFTW's features. Parallel
166 transforms are discussed in their own chapters: *note Multi-threaded
167 FFTW:: and *note Distributed-memory FFTW with MPI::. Fortran
168 programmers can also use FFTW, as described in *note Calling FFTW from
169 Legacy Fortran:: and *note Calling FFTW from Modern Fortran::. *note
170 Installation and Customization:: explains how to install FFTW in your
171 computer system and how to adapt FFTW to your needs. License and
172 copyright information is given in *note License and Copyright::.
173 Finally, we thank all the people who helped us in *note
174 Acknowledgments::.
175
176 
177 File: fftw3.info, Node: Tutorial, Next: Other Important Topics, Prev: Introduction, Up: Top
178
179 2 Tutorial
180 **********
181
182 * Menu:
183
184 * Complex One-Dimensional DFTs::
185 * Complex Multi-Dimensional DFTs::
186 * One-Dimensional DFTs of Real Data::
187 * Multi-Dimensional DFTs of Real Data::
188 * More DFTs of Real Data::
189
190 This chapter describes the basic usage of FFTW, i.e., how to compute the
191 Fourier transform of a single array. This chapter tells the truth, but
192 not the _whole_ truth. Specifically, FFTW implements additional
193 routines and flags that are not documented here, although in many cases
194 we try to indicate where added capabilities exist. For more complete
195 information, see *note FFTW Reference::. (Note that you need to
196 compile and install FFTW before you can use it in a program. For the
197 details of the installation, see *note Installation and
198 Customization::.)
199
200 We recommend that you read this tutorial in order.(1) At the least,
201 read the first section (*note Complex One-Dimensional DFTs::) before
202 reading any of the others, even if your main interest lies in one of
203 the other transform types.
204
205 Users of FFTW version 2 and earlier may also want to read *note
206 Upgrading from FFTW version 2::.
207
208 ---------- Footnotes ----------
209
210 (1) You can read the tutorial in bit-reversed order after computing
211 your first transform.
212
213 
214 File: fftw3.info, Node: Complex One-Dimensional DFTs, Next: Complex Multi-Dimensional DFTs, Prev: Tutorial, Up: Tutorial
215
216 2.1 Complex One-Dimensional DFTs
217 ================================
218
219 Plan: To bother about the best method of accomplishing an
220 accidental result. [Ambrose Bierce, `The Enlarged Devil's
221 Dictionary'.]
222
223 The basic usage of FFTW to compute a one-dimensional DFT of size `N'
224 is simple, and it typically looks something like this code:
225
226 #include <fftw3.h>
227 ...
228 {
229 fftw_complex *in, *out;
230 fftw_plan p;
231 ...
232 in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
233 out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
234 p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
235 ...
236 fftw_execute(p); /* repeat as needed */
237 ...
238 fftw_destroy_plan(p);
239 fftw_free(in); fftw_free(out);
240 }
241
242 You must link this code with the `fftw3' library. On Unix systems,
243 link with `-lfftw3 -lm'.
244
245 The example code first allocates the input and output arrays. You
246 can allocate them in any way that you like, but we recommend using
247 `fftw_malloc', which behaves like `malloc' except that it properly
248 aligns the array when SIMD instructions (such as SSE and Altivec) are
249 available (*note SIMD alignment and fftw_malloc::). [Alternatively, we
250 provide a convenient wrapper function `fftw_alloc_complex(N)' which has
251 the same effect.]
252
253 The data is an array of type `fftw_complex', which is by default a
254 `double[2]' composed of the real (`in[i][0]') and imaginary
255 (`in[i][1]') parts of a complex number.
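
To make the `double[2]' layout concrete, here is a minimal sketch of a
complex multiplication written against that layout. The type name `cplx'
is a stand-in we introduce for illustration so that the example compiles
without `<fftw3.h>'; it is assumed to mirror the default `fftw_complex'.

```c
#include <assert.h>

/* Stand-in for the default fftw_complex layout (hypothetical name):
   element [0] is the real part, element [1] the imaginary part. */
typedef double cplx[2];

/* out = a * b. The output must not alias either input, because both
   parts of a and b are read after out[0] has been written. */
void cplx_mul(const cplx a, const cplx b, cplx out)
{
    assert(out != a && out != b);
    out[0] = a[0] * b[0] - a[1] * b[1];
    out[1] = a[0] * b[1] + a[1] * b[0];
}
```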
256
257 The next step is to create a "plan", which is an object that
258 contains all the data that FFTW needs to compute the FFT. This
259 function creates the plan:
260
261 fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
262 int sign, unsigned flags);
263
264 The first argument, `n', is the size of the transform you are trying
265 to compute. The size `n' can be any positive integer, but sizes that
266 are products of small factors are transformed most efficiently
267 (although prime sizes still use an O(n log n) algorithm).
268
269 The next two arguments are pointers to the input and output arrays of
270 the transform. These pointers can be equal, indicating an "in-place"
271 transform.
272
273 The fourth argument, `sign', can be either `FFTW_FORWARD' (`-1') or
274 `FFTW_BACKWARD' (`+1'), and indicates the direction of the transform
275 you are interested in; technically, it is the sign of the exponent in
276 the transform.
277
278 The `flags' argument is usually either `FFTW_MEASURE' or `FFTW_ESTIMATE'.
279 `FFTW_MEASURE' instructs FFTW to run and measure the execution time of
280 several FFTs in order to find the best way to compute the transform of
281 size `n'. This process takes some time (usually a few seconds),
282 depending on your machine and on the size of the transform.
283 `FFTW_ESTIMATE', on the contrary, does not run any computation and just
284 builds a reasonable plan that is probably sub-optimal. In short, if
285 your program performs many transforms of the same size and
286 initialization time is not important, use `FFTW_MEASURE'; otherwise use
287 the estimate.
288
289 _You must create the plan before initializing the input_, because
290 `FFTW_MEASURE' overwrites the `in'/`out' arrays. (Technically,
291 `FFTW_ESTIMATE' does not touch your arrays, but you should always
292 create plans first just to be sure.)
293
294 Once the plan has been created, you can use it as many times as you
295 like for transforms on the specified `in'/`out' arrays, computing the
296 actual transforms via `fftw_execute(plan)':
297 void fftw_execute(const fftw_plan plan);
298
299 The DFT results are stored in-order in the array `out', with the
300 zero-frequency (DC) component in `out[0]'. If `in != out', the
301 transform is "out-of-place" and the input array `in' is not modified.
302 Otherwise, the input array is overwritten with the transform.
303
304 If you want to transform a _different_ array of the same size, you
305 can create a new plan with `fftw_plan_dft_1d' and FFTW automatically
306 reuses the information from the previous plan, if possible.
307 Alternatively, with the "guru" interface you can apply a given plan to
308 a different array, if you are careful. *Note FFTW Reference::.
309
310 When you are done with the plan, you deallocate it by calling
311 `fftw_destroy_plan(plan)':
312 void fftw_destroy_plan(fftw_plan plan);
313 If you allocate an array with `fftw_malloc()' you must deallocate it
314 with `fftw_free()'. Do not use `free()' or, heaven forbid, `delete'.
315
316 FFTW computes an _unnormalized_ DFT. Thus, computing a forward
317 followed by a backward transform (or vice versa) results in the original
318 array scaled by `n'. For the definition of the DFT, see *note What
319 FFTW Really Computes::.
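
The scaling by `n' can be checked with a naive transform. The sketch below
is not FFTW code: it is a direct O(n^2) evaluation of a size-4 DFT, using
the same sign convention as the `sign' argument (-1 forward, +1 backward),
and the names `dft4' and `roundtrip_scales_by_n' are ours. Size 4 is chosen
because its twiddle factors are exactly 0 and +-1.

```c
#include <assert.h>

#define N4 4

/* Naive size-4 DFT: out[k] = sum over j of in[j] * exp(sign*2*pi*i*j*k/4),
   with sign = -1 (forward) or +1 (backward). Out-of-place only. */
void dft4(double in[N4][2], double out[N4][2], int sign)
{
    static const double c[N4] = {1.0, 0.0, -1.0, 0.0}; /* cos(2*pi*m/4) */
    static const double s[N4] = {0.0, 1.0, 0.0, -1.0}; /* sin(2*pi*m/4) */
    int j, k;
    assert(sign == -1 || sign == 1);
    assert(in != out);
    for (k = 0; k < N4; ++k) {
        out[k][0] = out[k][1] = 0.0;
        for (j = 0; j < N4; ++j) {
            int m = (j * k) % N4;
            double wr = c[m], wi = sign * s[m];
            out[k][0] += in[j][0] * wr - in[j][1] * wi;
            out[k][1] += in[j][0] * wi + in[j][1] * wr;
        }
    }
}

/* Returns 1 if a forward transform followed by a backward transform
   yields the sample input multiplied by 4, and 0 otherwise. */
int roundtrip_scales_by_n(void)
{
    double x[N4][2] = {{1, 0}, {2, -1}, {0, 3}, {-4, 5}};
    double X[N4][2], y[N4][2];
    int j;
    dft4(x, X, -1);
    dft4(X, y, +1);
    for (j = 0; j < N4; ++j)
        if (y[j][0] != N4 * x[j][0] || y[j][1] != N4 * x[j][1])
            return 0;
    return 1;
}
```

To recover the original data after such a round trip, divide every element
by `n'.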
320
321 If you have a C compiler, such as `gcc', that supports the C99
322 standard, and you `#include <complex.h>' _before_ `<fftw3.h>', then
323 `fftw_complex' is the native double-precision complex type and you can
324 manipulate it with ordinary arithmetic. Otherwise, FFTW defines its
325 own complex type, which is bit-compatible with the C99 complex type.
326 *Note Complex numbers::. (The C++ `<complex>' template class may also
327 be usable via a typecast.)
328
329 To use single or long-double precision versions of FFTW, replace the
330 `fftw_' prefix by `fftwf_' or `fftwl_' and link with `-lfftw3f' or
331 `-lfftw3l', but use the _same_ `<fftw3.h>' header file.
332
333 Many more flags exist besides `FFTW_MEASURE' and `FFTW_ESTIMATE'.
334 For example, use `FFTW_PATIENT' if you're willing to wait even longer
335 for a possibly even faster plan (*note FFTW Reference::). You can also
336 save plans for future use, as described by *note Words of Wisdom-Saving
337 Plans::.
338
339 
340 File: fftw3.info, Node: Complex Multi-Dimensional DFTs, Next: One-Dimensional DFTs of Real Data, Prev: Complex One-Dimensional DFTs, Up: Tutorial
341
342 2.2 Complex Multi-Dimensional DFTs
343 ==================================
344
345 Multi-dimensional transforms work much the same way as one-dimensional
346 transforms: you allocate arrays of `fftw_complex' (preferably using
347 `fftw_malloc'), create an `fftw_plan', execute it as many times as you
348 want with `fftw_execute(plan)', and clean up with
349 `fftw_destroy_plan(plan)' (and `fftw_free').
350
351 FFTW provides two routines for creating plans for 2d and 3d
352 transforms, and one routine for creating plans of arbitrary
353 dimensionality. The 2d and 3d routines have the following signature:
354 fftw_plan fftw_plan_dft_2d(int n0, int n1,
355 fftw_complex *in, fftw_complex *out,
356 int sign, unsigned flags);
357 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
358 fftw_complex *in, fftw_complex *out,
359 int sign, unsigned flags);
360
361 These routines create plans for `n0' by `n1' two-dimensional (2d)
362 transforms and `n0' by `n1' by `n2' 3d transforms, respectively. All
363 of these transforms operate on contiguous arrays in the C-standard
364 "row-major" order, so that the last dimension has the fastest-varying
365 index in the array. This layout is described further in *note
366 Multi-dimensional Array Format::.
367
368 FFTW can also compute transforms of higher dimensionality. In order
369 to avoid confusion between the various meanings of the word
370 "dimension", we use the term _rank_ to denote the number of independent
371 indices in an array.(1) For example, we say that a 2d transform has
372 rank 2, a 3d transform has rank 3, and so on. You can plan transforms
373 of arbitrary rank by means of the following function:
374
375 fftw_plan fftw_plan_dft(int rank, const int *n,
376 fftw_complex *in, fftw_complex *out,
377 int sign, unsigned flags);
378
379 Here, `n' is a pointer to an array `n[rank]' denoting an `n[0]' by
380 `n[1]' by ... by `n[rank-1]' transform. Thus, for example, the call
381 fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
382 is equivalent to the following code fragment:
383 int n[2];
384 n[0] = n0;
385 n[1] = n1;
386 fftw_plan_dft(2, n, in, out, sign, flags);
387 `fftw_plan_dft' is not restricted to 2d and 3d transforms, however;
388 it can plan transforms of arbitrary rank.
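
Because the arrays are contiguous and in row-major order, the position of
an element in an array of any rank follows from the same `n[rank]' array
that is passed to the planner. The helper below is a sketch with a
hypothetical name, `flat_index'; it is not an FFTW routine.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (idx[0], ..., idx[rank-1]) in a contiguous
   n[0] x n[1] x ... x n[rank-1] row-major array. The last index
   varies fastest, so the offset is accumulated Horner-style. */
size_t flat_index(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    int d;
    assert(rank >= 0);
    for (d = 0; d < rank; ++d) {
        assert(idx[d] >= 0 && idx[d] < n[d]);
        off = off * (size_t)n[d] + (size_t)idx[d];
    }
    return off;
}
```

For a 2d array this reduces to `i0*n1 + i1', and for a 3d array to
`(i0*n1 + i1)*n2 + i2'. With `fftw_complex' data, the resulting offset
indexes whole complex elements, e.g. `in[off][0]' and `in[off][1]'.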
389
390 You may have noticed that all the planner routines described so far
391 have overlapping functionality. For example, you can plan a 1d or 2d
392 transform by using `fftw_plan_dft' with a `rank' of `1' or `2', or even
393 by calling `fftw_plan_dft_3d' with `n0' and/or `n1' equal to `1' (with
394 no loss in efficiency). This pattern continues, and FFTW's planning
395 routines in general form a "partial order," sequences of interfaces
396 with strictly increasing generality but correspondingly greater
397 complexity.
398
399 `fftw_plan_dft' is the most general complex-DFT routine that we
400 describe in this tutorial, but there are also the advanced and guru
401 interfaces, which allow one to efficiently combine multiple/strided
402 transforms into a single FFTW plan, transform a subset of a larger
403 multi-dimensional array, and/or to handle more general complex-number
404 formats. For more information, see *note FFTW Reference::.
405
406 ---------- Footnotes ----------
407
408 (1) The term "rank" is commonly used in the APL, FORTRAN, and Common
409 Lisp traditions, although it is not so common in the C world.
410
411 
412 File: fftw3.info, Node: One-Dimensional DFTs of Real Data, Next: Multi-Dimensional DFTs of Real Data, Prev: Complex Multi-Dimensional DFTs, Up: Tutorial
413
414 2.3 One-Dimensional DFTs of Real Data
415 =====================================
416
417 In many practical applications, the input data `in[i]' are purely real
418 numbers, in which case the DFT output satisfies the "Hermitian" redundancy:
419 `out[i]' is the conjugate of `out[n-i]'. It is possible to take
420 advantage of these circumstances in order to achieve roughly a factor
421 of two improvement in both speed and memory usage.
422
423 In exchange for these speed and space advantages, the user sacrifices
424 some of the simplicity of FFTW's complex transforms. First of all, the
425 input and output arrays are of _different sizes and types_: the input
426 is `n' real numbers, while the output is `n/2+1' complex numbers (the
427 non-redundant outputs); this also requires slight "padding" of the
428 input array for in-place transforms. Second, the inverse transform
429 (complex to real) has the side-effect of _overwriting its input array_,
430 by default. Neither of these inconveniences should pose a serious
431 problem for users, but it is important to be aware of them.
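
If an application does need all `n' complex outputs, the redundant ones
can be rebuilt from the `n/2+1' stored values by conjugation. The following
is a sketch only; `expand_hermitian' is a name we made up, and the arrays
use the `double[2]' layout instead of `fftw_complex' so that the example
compiles on its own.

```c
#include <assert.h>

/* Rebuild all n outputs of a real-input DFT from the n/2+1
   non-redundant ones, using out[k] = conj(out[n-k]). */
void expand_hermitian(int n, double half[][2], double full[][2])
{
    int k;
    assert(n > 0);
    for (k = 0; k <= n / 2; ++k) {      /* stored outputs: copy */
        full[k][0] = half[k][0];
        full[k][1] = half[k][1];
    }
    for (k = n / 2 + 1; k < n; ++k) {   /* redundant outputs: conjugate */
        full[k][0] = half[n - k][0];
        full[k][1] = -half[n - k][1];
    }
}
```

For the real input 1, 2, 3, 4 the stored outputs are 10, -2+2i, and -2;
the function fills in the fourth output as -2-2i.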
432
433 The routines to perform real-data transforms are almost the same as
434 those for complex transforms: you allocate arrays of `double' and/or
435 `fftw_complex' (preferably using `fftw_malloc' or
436 `fftw_alloc_complex'), create an `fftw_plan', execute it as many times
437 as you want with `fftw_execute(plan)', and clean up with
438 `fftw_destroy_plan(plan)' (and `fftw_free'). The only differences are
439 that the input (or output) is of type `double' and there are new
440 routines to create the plan. In one dimension:
441
442 fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
443 unsigned flags);
444 fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
445 unsigned flags);
446
447 for the real input to complex-Hermitian output ("r2c") and
448 complex-Hermitian input to real output ("c2r") transforms. Unlike the
449 complex DFT planner, there is no `sign' argument. Instead, r2c DFTs
450 are always `FFTW_FORWARD' and c2r DFTs are always `FFTW_BACKWARD'. (For
451 single/long-double precision `fftwf' and `fftwl', `double' should be
452 replaced by `float' and `long double', respectively.)
453
454 Here, `n' is the "logical" size of the DFT, not necessarily the
455 physical size of the array. In particular, the real (`double') array
456 has `n' elements, while the complex (`fftw_complex') array has `n/2+1'
457 elements (where the division is rounded down). For an in-place
458 transform, `in' and `out' are aliased to the same array, which must be
459 big enough to hold both; so, the real array would actually have
460 `2*(n/2+1)' elements, where the elements beyond the first `n' are
461 unused padding. (Note that this is very different from the concept of
462 "zero-padding" a transform to a larger length, which changes the
463 logical size of the DFT by actually adding new input data.) The kth
464 element of the complex array is exactly the same as the kth element of
465 the corresponding complex DFT. All positive `n' are supported;
466 products of small factors are most efficient, but an O(n log n)
467 algorithm is used even for prime sizes.
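
The array sizes described above are easy to get wrong by one, so it can
help to compute them in one place. The helpers below are a sketch with
hypothetical names; they only restate the arithmetic of this section.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex elements in the r2c output (or c2r input)
   for a logical size n: n/2+1, with the division rounded down. */
size_t r2c_complex_len(int n)
{
    assert(n > 0);
    return (size_t)(n / 2 + 1);
}

/* Number of doubles the shared array needs for an in-place transform. */
size_t r2c_inplace_real_len(int n)
{
    return 2 * r2c_complex_len(n);
}

/* Number of unused padding doubles beyond the first n. */
size_t r2c_padding(int n)
{
    return r2c_inplace_real_len(n) - (size_t)n;
}
```

For `n = 8' this gives 5 complex elements, 10 doubles in place, and 2
doubles of padding; for `n = 7' it gives 4, 8, and 1.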
468
469 As noted above, the c2r transform destroys its input array even for
470 out-of-place transforms. This can be prevented, if necessary, by
471 including `FFTW_PRESERVE_INPUT' in the `flags', with unfortunately some
472 sacrifice in performance. This flag is also not currently supported
473 for multi-dimensional real DFTs (next section).
474
475 Readers familiar with DFTs of real data will recall that the 0th (the
476 "DC") and `n/2'-th (the "Nyquist" frequency, when `n' is even) elements
477 of the complex output are purely real. Some implementations therefore
478 store the Nyquist element where the DC imaginary part would go, in
479 order to make the input and output arrays the same size. Such packing,
480 however, does not generalize well to multi-dimensional transforms, and
481 the space savings are minuscule in any case; FFTW does not support it.
482
483 An alternative interface for one-dimensional r2c and c2r DFTs can be
484 found in the `r2r' interface (*note The Halfcomplex-format DFT::), with
485 "halfcomplex"-format output that _is_ the same size (and type) as the
486 input array. That interface, although it is not very useful for
487 multi-dimensional transforms, may sometimes yield better performance.
488
489 
490 File: fftw3.info, Node: Multi-Dimensional DFTs of Real Data, Next: More DFTs of Real Data, Prev: One-Dimensional DFTs of Real Data, Up: Tutorial
491
492 2.4 Multi-Dimensional DFTs of Real Data
493 =======================================
494
495 Multi-dimensional DFTs of real data use the following planner routines:
496
497 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
498 double *in, fftw_complex *out,
499 unsigned flags);
500 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
501 double *in, fftw_complex *out,
502 unsigned flags);
503 fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
504 double *in, fftw_complex *out,
505 unsigned flags);
506
507 as well as the corresponding `c2r' routines with the input/output
508 types swapped. These routines work similarly to their complex
509 analogues, except for the fact that here the complex output array is cut
510 roughly in half and the real array requires padding for in-place
511 transforms (as in 1d, above).
512
513 As before, `n' is the logical size of the array, and the
514 consequences of this on the format of the complex arrays deserve
515 careful attention. Suppose that the real data has dimensions n[0] x
516 n[1] x n[2] x ... x n[d-1] (in row-major order). Then, after an r2c
517 transform, the output is an n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1)
518 array of `fftw_complex' values in row-major order, corresponding to
519 slightly over half of the output of the corresponding complex DFT.
520 (The division is rounded down.) The ordering of the data is otherwise
521 exactly the same as in the complex-DFT case.
522
523 For out-of-place transforms, this is the end of the story: the real
524 data is stored as a row-major array of size n[0] x n[1] x n[2] x ... x
525 n[d-1] and the complex data is stored as a row-major array of size
526 n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) .
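
For out-of-place transforms, the two element counts can be computed from
the same `n[rank]' array that is passed to the planner. This is a minimal
sketch; the function names are ours and are not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Number of doubles in the real array: n[0] x n[1] x ... x n[rank-1]. */
size_t r2c_real_count(int rank, const int *n)
{
    size_t total = 1;
    int d;
    assert(rank >= 1);
    for (d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        total *= (size_t)n[d];
    }
    return total;
}

/* Number of complex values in the r2c output: only the last
   dimension is cut, to n[rank-1]/2 + 1 (division rounded down). */
size_t r2c_complex_count(int rank, const int *n)
{
    size_t total = 1;
    int d;
    assert(rank >= 1);
    for (d = 0; d < rank - 1; ++d) {
        assert(n[d] > 0);
        total *= (size_t)n[d];
    }
    assert(n[rank - 1] > 0);
    return total * (size_t)(n[rank - 1] / 2 + 1);
}
```

A 4 x 6 real array thus has 24 doubles and a 4 x 4 complex output of 16
values.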
527
528 For in-place transforms, however, extra padding of the real-data
529 array is necessary because the complex array is larger than the real
530 array, and the two arrays share the same memory locations. Thus, for
531 in-place transforms, the final dimension of the real-data array must be
532 padded with extra values to accommodate the size of the complex
533 data--two values if the last dimension is even and one if it is odd. That
534 is, the last dimension of the real data must physically contain 2 *
535 (n[d-1]/2+1) `double' values (exactly enough to hold the complex data).
536 This physical array size does not, however, change the _logical_ array
537 size--only n[d-1] values are actually stored in the last dimension, and
538 n[d-1] is the last dimension passed to the plan-creation routine.
539
540 For example, consider the transform of a two-dimensional real array
541 of size `n0' by `n1'. The output of the r2c transform is a
542 two-dimensional complex array of size `n0' by `n1/2+1', where the `y'
543 dimension has been cut nearly in half because of redundancies in the
544 output. Because `fftw_complex' is twice the size of `double', the
545 output array is slightly bigger than the input array. Thus, if we want
546 to compute the transform in place, we must _pad_ the input array so
547 that it is of size `n0' by `2*(n1/2+1)'. If `n1' is even, then there
548 are two padding elements at the end of each row (which need not be
549 initialized, as they are only used for output).
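
In practice, the padding means that row `i' of the in-place real array
starts at offset `i * 2*(n1/2+1)' and not at `i * n1'. The sketch below
copies a tightly packed `n0' by `n1' array into such a padded buffer; the
function names are hypothetical, and the caller is assumed to have
allocated `n0 * 2*(n1/2+1)' doubles for the padded buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Physical length, in doubles, of one row of the padded real array. */
size_t padded_row_len(int n1)
{
    assert(n1 > 0);
    return 2 * (size_t)(n1 / 2 + 1);
}

/* Copy a packed n0 x n1 row-major array into the padded layout.
   The padding elements at the end of each row are left untouched. */
void pack_into_padded(int n0, int n1, const double *packed, double *padded)
{
    size_t stride = padded_row_len(n1);
    int i, j;
    assert(n0 > 0);
    for (i = 0; i < n0; ++i)
        for (j = 0; j < n1; ++j)
            padded[(size_t)i * stride + (size_t)j] =
                packed[(size_t)i * (size_t)n1 + (size_t)j];
}
```

Remember that the plan should be created before the padded buffer is
filled, since planning with `FFTW_MEASURE' overwrites the arrays.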
550
551 These transforms are unnormalized, so an r2c followed by a c2r
552 transform (or vice versa) will result in the original data scaled by
553 the number of real data elements--that is, the product of the (logical)
554 dimensions of the real data.
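
To undo this scaling, divide by the product of the logical dimensions. The
sketch below assumes an unpadded (out-of-place) real array, so that every
element of the buffer is data; `normalize_roundtrip' is a hypothetical
name, not an FFTW routine.

```c
#include <assert.h>
#include <stddef.h>

/* Divide each element of an unpadded n[0] x ... x n[rank-1] real
   array by the product of the logical dimensions. */
void normalize_roundtrip(int rank, const int *n, double *data)
{
    size_t total = 1, i;
    int d;
    assert(rank >= 1);
    for (d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        total *= (size_t)n[d];
    }
    for (i = 0; i < total; ++i)
        data[i] /= (double)total;
}
```

For a padded in-place array, the same factor applies, but the loop should
skip the padding elements at the end of each row.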
555
556 (Because the last dimension is treated specially, if it is equal to
557 `1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
558 transform. In that case, the last complex dimension also has size `1'
559 (`=1/2+1'), and no advantage is gained over the complex transforms.)
560
561 
562 File: fftw3.info, Node: More DFTs of Real Data, Prev: Multi-Dimensional DFTs of Real Data, Up: Tutorial
563
564 2.5 More DFTs of Real Data
565 ==========================
566
567 * Menu:
568
569 * The Halfcomplex-format DFT::
570 * Real even/odd DFTs (cosine/sine transforms)::
571 * The Discrete Hartley Transform::
572
573 FFTW supports several other transform types via a unified "r2r"
574 (real-to-real) interface, so called because it takes a real (`double')
575 array and outputs a real array of the same size. These r2r transforms
576 currently fall into three categories: DFTs of real input and
577 complex-Hermitian output in halfcomplex format, DFTs of real input with
578 even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs),
579 and discrete Hartley transforms (DHTs), all described in more detail by
580 the following sections.
581
582 The r2r transforms follow the by now familiar interface of creating
583 an `fftw_plan', executing it with `fftw_execute(plan)', and destroying
584 it with `fftw_destroy_plan(plan)'. Furthermore, all r2r transforms
585 share the same planner interface:
586
587 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
588 fftw_r2r_kind kind, unsigned flags);
589 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
590 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
591 unsigned flags);
592 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
593 double *in, double *out,
594 fftw_r2r_kind kind0,
595 fftw_r2r_kind kind1,
596 fftw_r2r_kind kind2,
597 unsigned flags);
598 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
599 const fftw_r2r_kind *kind, unsigned flags);
600
601 Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
602 transforms for contiguous arrays in row-major order, transforming (real)
603 input to output of the same size, where `n' specifies the _physical_
604 dimensions of the arrays. All positive `n' are supported (with the
605 exception of `n=1' for the `FFTW_REDFT00' kind, noted in the real-even
606 subsection below); products of small factors are most efficient
607 (factorizing `n-1' and `n+1' for `FFTW_REDFT00' and `FFTW_RODFT00'
608 kinds, described below), but an O(n log n) algorithm is used even for
609 prime sizes.
610
611 Each dimension has a "kind" parameter, of type `fftw_r2r_kind',
612 specifying the kind of r2r transform to be used for that dimension. (In
613 the case of `fftw_plan_r2r', this is an array `kind[rank]' where
614 `kind[i]' is the transform kind for the dimension `n[i]'.) The kind
615 can be one of a set of predefined constants, defined in the following
616 subsections.
617
618 In other words, FFTW computes the separable product of the specified
619 r2r transforms over each dimension, which can be used e.g. for partial
620 differential equations with mixed boundary conditions. (For some r2r
621 kinds, notably the halfcomplex DFT and the DHT, such a separable
622 product is somewhat problematic in more than one dimension, however, as
623 is described below.)
624
625 In the current version of FFTW, all r2r transforms except for the
626 halfcomplex type are computed via pre- or post-processing of
627 halfcomplex transforms, and they are therefore not as fast as they
628 could be. Since most other general DCT/DST codes employ a similar
629 algorithm, however, FFTW's implementation should provide at least
630 competitive performance.
631
632 
633 File: fftw3.info, Node: The Halfcomplex-format DFT, Next: Real even/odd DFTs (cosine/sine transforms), Prev: More DFTs of Real Data, Up: More DFTs of Real Data
634
635 2.5.1 The Halfcomplex-format DFT
636 --------------------------------
637
638 An r2r kind of `FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
639 One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
640 output, and may sometimes be faster and/or more convenient than the
641 latter. The inverse "hc2r" transform is of kind `FFTW_HC2R'. The
642 halfcomplex format consists of the non-redundant half of the complex
643 output for a 1d real-input DFT of size `n', stored as a sequence of `n'
644 real numbers (`double') in the format:
645
646 r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1
647
648 Here, rk is the real part of the kth output, and ik is the imaginary
649 part. (Division by 2 is rounded down.) For a halfcomplex array
650 `hc[n]', the kth component thus has its real part in `hc[k]' and its
651 imaginary part in `hc[n-k]', with the exception of `k == 0' or
652 `k == n/2' (the latter only if `n' is even)--in these two cases, the
653 imaginary part is zero due to symmetries of the real-input DFT, and is
654 not stored. Thus, the r2hc transform of `n' real values is a
655 halfcomplex array of length `n', and vice versa for hc2r.
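
As an illustration of the indexing rule above, the following sketch reads
the real and imaginary parts of the kth output from a halfcomplex array.
The helper names `hc_real' and `hc_imag' are hypothetical and are not part
of FFTW; the sketch only assumes the layout described in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Real part of the kth DFT output (0 <= k <= n/2), read from a
   halfcomplex array hc[n]. */
double hc_real(const double *hc, size_t n, size_t k)
{
    assert(n > 0 && k <= n / 2);
    return hc[k];
}

/* Imaginary part of the kth output.  For k == 0, and for k == n/2 when
   n is even, the imaginary part is zero and is not stored. */
double hc_imag(const double *hc, size_t n, size_t k)
{
    assert(n > 0 && k <= n / 2);
    if (k == 0 || (n % 2 == 0 && k == n / 2))
        return 0.0;
    return hc[n - k];
}
```

For `n = 5' the stored order is r0, r1, r2, i2, i1, so `hc_imag(hc, 5, 2)'
reads `hc[3]'.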
656
657 Aside from the differing format, the output of
658 `FFTW_R2HC'/`FFTW_HC2R' is otherwise exactly the same as for the
659 corresponding 1d r2c/c2r transform (i.e. `FFTW_FORWARD'/`FFTW_BACKWARD'
660 transforms, respectively). Recall that these transforms are
661 unnormalized, so r2hc followed by hc2r will result in the original data
662 multiplied by `n'. Furthermore, like the c2r transform, an
663 out-of-place hc2r transform will _destroy its input_ array.
664
665 Although these halfcomplex transforms can be used with the
666 multi-dimensional r2r interface, the interpretation of such a separable
667 product of transforms along each dimension is problematic. For example,
668 consider a two-dimensional `n0' by `n1', r2hc by r2hc transform planned
669 by `fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC,
670 FFTW_MEASURE)'. Conceptually, FFTW first transforms the rows (of size
671 `n1') to produce halfcomplex rows, and then transforms the columns (of
672 size `n0'). Half of these column transforms, however, are of imaginary
673 parts, and should therefore be multiplied by i and combined with the
674 r2hc transforms of the real columns to produce the 2d DFT amplitudes;
675 FFTW's r2r transform does _not_ perform this combination for you.
676 Thus, if a multi-dimensional real-input/output DFT is required, we
677 recommend using the ordinary r2c/c2r interface (*note Multi-Dimensional
678 DFTs of Real Data::).
679
680 
681 File: fftw3.info, Node: Real even/odd DFTs (cosine/sine transforms), Next: The Discrete Hartley Transform, Prev: The Halfcomplex-format DFT, Up: More DFTs of Real Data
682
683 2.5.2 Real even/odd DFTs (cosine/sine transforms)
684 -------------------------------------------------
685
686 The Fourier transform of a real-even function f(-x) = f(x) is
687 real-even, and i times the Fourier transform of a real-odd function
688 f(-x) = -f(x) is real-odd. Similar results hold for a discrete Fourier
689 transform, and thus for these symmetries the need for complex
690 inputs/outputs is entirely eliminated. Moreover, one gains a factor of
691 two in speed/space from the fact that the data are real, and an
692 additional factor of two from the even/odd symmetry: only the
693 non-redundant (first) half of the array need be stored. The result is
694 the real-even DFT ("REDFT") and the real-odd DFT ("RODFT"), also known
695 as the discrete cosine and sine transforms ("DCT" and "DST"),
696 respectively.
697
698 (In this section, we describe the 1d transforms; multi-dimensional
699 transforms are just a separable product of these transforms operating
700 along each dimension.)
701
702 Because of the discrete sampling, one has an additional choice: is
703 the data even/odd around a sampling point, or around the point halfway
704 between two samples? The latter corresponds to _shifting_ the samples
705 by _half_ an interval, and gives rise to several transform variants
706 denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate
707 whether the input (a) and/or output (b) are shifted by half a sample (1
708 means it is shifted). These are also known as types I-IV of the DCT
709 and DST, and all four types are supported by FFTW's r2r interface.(1)
710
711 The r2r kinds for the various REDFT and RODFT types supported by
712 FFTW, along with the boundary conditions at both ends of the _input_
713 array (`n' real numbers `in[j=0..n-1]'), are:
714
715 * `FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.
716
717 * `FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
718 around j=n-0.5.
719
720 * `FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
721 around j=n.
722
723 * `FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.
724
725 * `FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.
726
727 * `FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.
728
729 * `FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.
730
731 * `FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.
732
733
734 Note that these symmetries apply to the "logical" array being
735 transformed; *there are no constraints on your physical input data*.
736 So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
737 abcde, it corresponds to the DFT of the logical even array abcdedcb of
738 size 8. A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
739 size-8 logical DFT of the even array abcddcba, shifted by half a sample.
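
The two examples above can be reproduced with a short sketch that builds
the logical even array from the physical input. It uses `char' elements
only so that the letters from the text can be checked directly; real data
would be `double'. The function names are hypothetical, and FFTW itself
never materializes this array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Logical even array for a size-n REDFT00 (DCT-I) input.
   out needs room for 2*(n-1) elements; requires n >= 2.
   Returns the logical size N = 2*(n-1). */
size_t logical_redft00(const char *in, size_t n, char *out)
{
    size_t j, N;
    assert(n >= 2);
    N = 2 * (n - 1);
    memcpy(out, in, n);
    for (j = n; j < N; ++j)
        out[j] = in[N - j];          /* mirror about j = n-1 */
    return N;
}

/* Logical even array for a size-n REDFT10 (DCT-II) input.
   out needs room for 2*n elements.  Returns N = 2*n. */
size_t logical_redft10(const char *in, size_t n, char *out)
{
    size_t j, N = 2 * n;
    assert(n >= 1);
    memcpy(out, in, n);
    for (j = 0; j < n; ++j)
        out[N - 1 - j] = in[j];      /* mirror about j = n-0.5 */
    return N;
}
```

The half-sample shift of REDFT10 is a property of the transform
definition and is not represented in the mirrored array itself.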
740
741 All of these transforms are invertible. The inverse of R*DFT00 is
742 R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
743 simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
744 However, the transforms computed by FFTW are unnormalized, exactly like
745 the corresponding real and complex DFTs, so computing a transform
746 followed by its inverse yields the original array scaled by N, where N
747 is the _logical_ DFT size. For REDFT00, N=2(n-1); for RODFT00,
748 N=2(n+1); otherwise, N=2n.
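
A minimal sketch of the scaling rule, assuming you want to recover the
original data after a transform followed by its inverse. The enum
`sym_kind' and both function names are hypothetical stand-ins invented for
this example; they are not FFTW types or functions.

```c
#include <assert.h>

/* Hypothetical tag for the two special cases; not an FFTW type. */
enum sym_kind { SYM_REDFT00, SYM_RODFT00, SYM_OTHER };

/* Logical DFT size N for physical size n, following the rule above. */
long logical_size(enum sym_kind kind, long n)
{
    assert(n >= 1);
    switch (kind) {
    case SYM_REDFT00: return 2 * (n - 1);
    case SYM_RODFT00: return 2 * (n + 1);
    default:          return 2 * n;
    }
}

/* Divide out the factor N left by a transform and its inverse. */
void normalize(double *x, long n, long N)
{
    long j;
    assert(N > 0);
    for (j = 0; j < n; ++j)
        x[j] /= (double) N;
}
```

Note that `logical_size(SYM_REDFT00, 1)' yields 0, so `normalize' must not
be called for that case.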
749
750 Note that the boundary conditions of the transform output array are
751 given by the input boundary conditions of the inverse transform. Thus,
752 the above transforms are all inequivalent in terms of input/output
753 boundary conditions, even neglecting the 0.5 shift difference.
754
755 FFTW is most efficient when N is a product of small factors; note
756 that this _differs_ from the factorization of the physical size `n' for
757 REDFT00 and RODFT00! There is another oddity: `n=1' REDFT00 transforms
758 correspond to N=0, and so are _not defined_ (the planner will return
759 `NULL'). Otherwise, any positive `n' is supported.
760
761 For the precise mathematical definitions of these transforms as used
762 by FFTW, see *note What FFTW Really Computes::. (For people accustomed
763 to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
764 the cos/sin functions so that they correspond precisely to an even/odd
765 DFT of size N. Some authors also include additional multiplicative
766 factors of sqrt(2) for selected inputs and outputs; this makes the
767 transform orthogonal, but sacrifices the direct equivalence to a
768 symmetric DFT.)
769
770 Which type do you need?
771 .......................
772
773 Since the required flavor of even/odd DFT depends upon your problem,
774 you are the best judge of this choice, but we can make a few comments
775 on relative efficiency to help you in your selection. In particular,
776 R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
777 for odd sizes), while the R*DFT00 transforms are sometimes
778 significantly slower (especially for even sizes).(2)
779
780 Thus, if only the boundary conditions on the transform inputs are
781 specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
782 R*DFT11 (unless the half-sample shift or the self-inverse property is
783 significant for your problem).
784
785 If performance is important to you and you are using only small sizes
786 (say n<200), e.g. for multi-dimensional transforms, then you might
787 consider generating hard-coded transforms of those sizes and types that
788 you are interested in (*note Generating your own code::).
789
790 We are interested in hearing what types of symmetric transforms you
791 find most useful.
792
793 ---------- Footnotes ----------
794
795 (1) There are also type V-VIII transforms, which correspond to a
796 logical DFT of _odd_ size N, independent of whether the physical size
797 `n' is odd, but we do not support these variants.
798
799 (2) R*DFT00 is sometimes slower in FFTW because we discovered that
800 the standard algorithm for computing this by a pre/post-processed real
801 DFT--the algorithm used in FFTPACK, Numerical Recipes, and other
802 sources for decades now--has serious numerical problems: it already
803 loses several decimal places of accuracy for 16k sizes. There seem to
804 be only two alternatives in the literature that do not suffer
805 similarly: a recursive decomposition into smaller DCTs, which would
806 require a large set of codelets for efficiency and generality, or
807 sacrificing a factor of 2 in speed to use a real DFT of twice the size.
808 We currently employ the latter technique for general n, as well as a
809 limited form of the former method: a split-radix decomposition when n
810 is odd (N a multiple of 4). For N containing many factors of 2, the
811 split-radix method seems to recover most of the speed of the standard
812 algorithm without the accuracy tradeoff.
813
814 
815 File: fftw3.info, Node: The Discrete Hartley Transform, Prev: Real even/odd DFTs (cosine/sine transforms), Up: More DFTs of Real Data
816
817 2.5.3 The Discrete Hartley Transform
818 ------------------------------------
819
820 If you are planning to use the DHT because you've heard that it is
821 "faster" than the DFT (FFT), *stop here*. The DHT is not faster than
822 the DFT. That story is an old but enduring misconception that was
823 debunked in 1987.
824
825 The discrete Hartley transform (DHT) is an invertible linear
826 transform closely related to the DFT. In the DFT, one multiplies each
827 input by cos - i * sin (a complex exponential), whereas in the DHT each
828 input is multiplied by simply cos + sin. Thus, the DHT transforms `n'
829 real numbers to `n' real numbers, and has the convenient property of
830 being its own inverse. In FFTW, a DHT (of any positive `n') can be
831 specified by an r2r kind of `FFTW_DHT'.
832
833 Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
834 size `n' followed by another DHT of the same size will result in the
835 original array multiplied by `n'.
836
837 The DHT was originally proposed as a more efficient alternative to
838 the DFT for real data, but it was subsequently shown that a specialized
839 DFT (such as FFTW's r2hc or r2c transforms) could be just as fast. In
840 FFTW, the DHT is actually computed by post-processing an r2hc
841 transform, so there is ordinarily no reason to prefer it from a
842 performance perspective.(1) However, we have heard rumors that the DHT
843 might be the most appropriate transform in its own right for certain
844 applications, and we would be very interested to hear from anyone who
845 finds it useful.
846
847 If `FFTW_DHT' is specified for multiple dimensions of a
848 multi-dimensional transform, FFTW computes the separable product of 1d
849 DHTs along each dimension. Unfortunately, this is not quite the same
850 thing as a true multi-dimensional DHT; you can compute the latter, if
851 necessary, with at most `rank-1' post-processing passes [see e.g. H.
852 Hao and R. N. Bracewell, Proc. IEEE 75, 264-266 (1987)].
853
854 For the precise mathematical definition of the DHT as used by FFTW,
855 see *note What FFTW Really Computes::.
856
857 ---------- Footnotes ----------
858
859 (1) We provide the DHT mainly as a byproduct of some internal
860 algorithms. FFTW computes a real input/output DFT of _prime_ size by
861 re-expressing it as a DHT plus post/pre-processing and then using
862 Rader's prime-DFT algorithm adapted to the DHT.
863
864 
865 File: fftw3.info, Node: Other Important Topics, Next: FFTW Reference, Prev: Tutorial, Up: Top
866
867 3 Other Important Topics
868 ************************
869
870 * Menu:
871
872 * SIMD alignment and fftw_malloc::
873 * Multi-dimensional Array Format::
874 * Words of Wisdom-Saving Plans::
875 * Caveats in Using Wisdom::
876
877 
878 File: fftw3.info, Node: SIMD alignment and fftw_malloc, Next: Multi-dimensional Array Format, Prev: Other Important Topics, Up: Other Important Topics
879
880 3.1 SIMD alignment and fftw_malloc
881 ==================================
882
883 SIMD, which stands for "Single Instruction Multiple Data," is a set of
884 special operations supported by some processors to perform a single
885 operation on several numbers (usually 2 or 4) simultaneously. SIMD
886 floating-point instructions are available on several popular CPUs:
887 SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision)
888 on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and
889 MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be
890 compiled to support the SIMD instructions on any of these systems.
891
892 A program linking to an FFTW library compiled with SIMD support can
893 obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
894 In order to obtain this speedup, however, the arrays of complex (or
895 real) data passed to FFTW must be specially aligned in memory
896 (typically 16-byte aligned), and often this alignment is more stringent
897 than that provided by the usual `malloc' (etc.) allocation routines.
898
899 In order to guarantee proper alignment for SIMD, therefore, in case
900 your program is ever linked against a SIMD-using FFTW, we recommend
901 allocating your transform data with `fftw_malloc' and de-allocating it
902 with `fftw_free'. These have exactly the same interface and behavior as
903 `malloc'/`free', except that for a SIMD FFTW they ensure that the
904 returned pointer has the necessary alignment (by calling `memalign' or
905 its equivalent on your OS).
906
907 You are not _required_ to use `fftw_malloc'. You can allocate your
908 data in any way that you like, from `malloc' to `new' (in C++) to a
909 fixed-size array declaration. If the array happens not to be properly
910 aligned, FFTW will not use the SIMD extensions.
911
912 Since `fftw_malloc' only ever needs to be used for real and complex
913 arrays, we provide two convenient wrapper routines `fftw_alloc_real(N)'
914 and `fftw_alloc_complex(N)' that are equivalent to
915 `(double*)fftw_malloc(sizeof(double) * N)' and
916 `(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively
917 (or their equivalents in other precisions).
918
919 
920 File: fftw3.info, Node: Multi-dimensional Array Format, Next: Words of Wisdom-Saving Plans, Prev: SIMD alignment and fftw_malloc, Up: Other Important Topics
921
922 3.2 Multi-dimensional Array Format
923 ==================================
924
925 This section describes the format in which multi-dimensional arrays are
926 stored in FFTW. A detailed discussion is warranted because several
927 different formats are in common use, which makes this topic a frequent
928 source of confusion.
929
930 * Menu:
931
932 * Row-major Format::
933 * Column-major Format::
934 * Fixed-size Arrays in C::
935 * Dynamic Arrays in C::
936 * Dynamic Arrays in C-The Wrong Way::
937
938 
939 File: fftw3.info, Node: Row-major Format, Next: Column-major Format, Prev: Multi-dimensional Array Format, Up: Multi-dimensional Array Format
940
941 3.2.1 Row-major Format
942 ----------------------
943
944 The multi-dimensional arrays passed to `fftw_plan_dft' etcetera are
945 expected to be stored as a single contiguous block in "row-major" order
946 (sometimes called "C order"). Basically, this means that as you step
947 through adjacent memory locations, the first dimension's index varies
948 most slowly and the last dimension's index varies most quickly.
949
950 To be more explicit, let us consider an array of rank d whose
951 dimensions are n[0] x n[1] x n[2] x ... x n[d-1] . Now, we specify a
952 location in the array by a sequence of d (zero-based) indices, one for
953 each dimension: (i[0], i[1], ..., i[d-1]). If the array is stored in
954 row-major order, then this element is located at the position i[d-1] +
955 n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).
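
The nested expression above can be evaluated for any rank with a simple
loop. The following is a sketch; `row_major_offset' is a hypothetical
helper name, not an FFTW function.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (i[0], ..., i[d-1]) in a row-major array with
   dimensions n[0] x ... x n[d-1]. */
size_t row_major_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    int r;
    assert(d > 0);
    for (r = 0; r < d; ++r) {
        assert(i[r] < n[r]);
        off = off * n[r] + i[r];     /* last index ends up varying fastest */
    }
    return off;
}
```

For rank 3 the loop expands to `i[2] + n[2] * (i[1] + n[1] * i[0])'.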
956
957 Note that, for the ordinary complex DFT, each element of the array
958 must be of type `fftw_complex'; i.e. a (real, imaginary) pair of
959 (double-precision) numbers.
960
961 In the advanced FFTW interface, the physical dimensions n from which
962 the indices are computed can be different from (larger than) the
963 logical dimensions of the transform to be computed, in order to
964 transform a subset of a larger array. Note also that, in the advanced
965 interface, the expression above is multiplied by a "stride" to get the
966 actual array index--this is useful in situations where each element of
967 the multi-dimensional array is actually a data structure (or another
968 array), and you just want to transform a single field. In the basic
969 interface, however, the stride is 1.
970
971 
972 File: fftw3.info, Node: Column-major Format, Next: Fixed-size Arrays in C, Prev: Row-major Format, Up: Multi-dimensional Array Format
973
974 3.2.2 Column-major Format
975 -------------------------
976
977 Readers from the Fortran world are used to arrays stored in
978 "column-major" order (sometimes called "Fortran order"). This is
979 essentially the exact opposite of row-major order in that, here, the
980 _first_ dimension's index varies most quickly.
981
982 If you have an array stored in column-major order and wish to
983 transform it using FFTW, it is quite easy to do. When creating the
984 plan, simply pass the dimensions of the array to the planner in
985 _reverse order_. For example, if your array is a rank three `N x M x
986 L' matrix in column-major order, you should pass the dimensions of the
987 array as if it were an `L x M x N' matrix (which it is, from the
988 perspective of FFTW). This is done for you _automatically_ by the FFTW
989 legacy-Fortran interface (*note Calling FFTW from Legacy Fortran::),
990 but you must do it manually with the modern Fortran interface (*note
991 Reversing array dimensions::).
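
The equivalence can be checked with a small sketch: a column-major offset
computed from the original dimensions and indices equals the row-major
offset computed from the reversed ones. All three function names are
hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Offset when the FIRST index varies most quickly (column-major). */
size_t cm_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    int r;
    assert(d > 0);
    for (r = d - 1; r >= 0; --r)
        off = off * n[r] + i[r];
    return off;
}

/* Offset when the LAST index varies most quickly (row-major). */
size_t rm_offset(int d, const size_t *n, const size_t *i)
{
    size_t off = 0;
    int r;
    assert(d > 0);
    for (r = 0; r < d; ++r)
        off = off * n[r] + i[r];
    return off;
}

/* Copy src[0..d-1] into dst in reverse order. */
void reversed(int d, const size_t *src, size_t *dst)
{
    int r;
    for (r = 0; r < d; ++r)
        dst[r] = src[d - 1 - r];
}
```

For a column-major `3 x 4 x 5' array, element (1,2,3) sits at the same
offset as element (3,2,1) of a row-major `5 x 4 x 3' array.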
992
993 
994 File: fftw3.info, Node: Fixed-size Arrays in C, Next: Dynamic Arrays in C, Prev: Column-major Format, Up: Multi-dimensional Array Format
995
996 3.2.3 Fixed-size Arrays in C
997 ----------------------------
998
999 A multi-dimensional array whose size is declared at compile time in C
1000 is _already_ in row-major order. You don't have to do anything special
1001 to transform it. For example:
1002
1003 {
1004 fftw_complex data[N0][N1][N2];
1005 fftw_plan plan;
1006 ...
1007 plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
1008 FFTW_FORWARD, FFTW_ESTIMATE);
1009 ...
1010 }
1011
1012 This will plan a 3d in-place transform of size `N0 x N1 x N2'.
1013 Notice how we took the address of the zero-th element to pass to the
1014 planner (we could also have used a typecast).
1015
1016 However, we tend to _discourage_ users from declaring their arrays
1017 in this way, for two reasons. First, this allocates the array on the
1018 stack ("automatic" storage), which has a very limited size on most
1019 operating systems (declaring an array with more than a few thousand
1020 elements will often cause a crash). (You can get around this
1021 limitation on many systems by declaring the array as `static' and/or
1022 global, but that has its own drawbacks.) Second, it may not optimally
1023 align the array for use with a SIMD FFTW (*note SIMD alignment and
1024 fftw_malloc::). Instead, we recommend using `fftw_malloc', as
1025 described below.
1026
1027 
1028 File: fftw3.info, Node: Dynamic Arrays in C, Next: Dynamic Arrays in C-The Wrong Way, Prev: Fixed-size Arrays in C, Up: Multi-dimensional Array Format
1029
1030 3.2.4 Dynamic Arrays in C
1031 -------------------------
1032
1033 We recommend allocating most arrays dynamically, with `fftw_malloc'.
1034 This isn't too hard to do, although it is not as straightforward for
1035 multi-dimensional arrays as it is for one-dimensional arrays.
1036
1037 Creating the array is simple: using a dynamic-allocation routine like
1038 `fftw_malloc', allocate an array big enough to store N `fftw_complex'
1039 values (for a complex DFT), where N is the product of the sizes of the
1040 array dimensions (i.e. the total number of complex values in the
1041 array). For example, here is code to allocate a 5 x 12 x 27 rank-3
1042 array:
1043
1044 fftw_complex *an_array;
1045 an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
1046
1047 Accessing the array elements, however, is trickier--you can't
1048 simply use multiple applications of the `[]' operator like you could
1049 for fixed-size arrays. Instead, you have to explicitly compute the
1050 offset into the array using the formula given earlier for row-major
1051 arrays. For example, to reference the (i,j,k)-th element of the array
1052 allocated above, you would use the expression `an_array[k + 27 * (j +
1053 12 * i)]'.
1054
1055 This pain can be alleviated somewhat by defining appropriate macros,
1056 or, in C++, creating a class and overloading the `()' operator. The
1057 recent C99 standard provides a way to reinterpret the dynamic array as
1058 a "variable-length" multi-dimensional array amenable to `[]', but this
1059 feature is not yet widely supported by compilers.
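
As a sketch of the C99 approach, assuming a compiler that supports
variably modified array types, a flat block can be viewed through a
pointer to a multi-dimensional array. The typedef `cplx' is a stand-in
with the same layout as `fftw_complex' so that the example needs no FFTW
header, and `set_elem' is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef double cplx[2];      /* stand-in for fftw_complex */

/* Store (re, im) at element (i,j,k) of a flat n0 x n1 x n2 block,
   indexing through a pointer to a variably modified array type. */
void set_elem(cplx *flat, size_t n0, size_t n1, size_t n2,
              size_t i, size_t j, size_t k, double re, double im)
{
    cplx (*a)[n1][n2] = (cplx (*)[n1][n2]) flat;
    assert(i < n0 && j < n1 && k < n2);
    a[i][j][k][0] = re;      /* real part */
    a[i][j][k][1] = im;      /* imaginary part */
}
```

The element written by `set_elem' is the same one reached through the
explicit offset `flat[k + n2 * (j + n1 * i)]'.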
1060
1061 
1062 File: fftw3.info, Node: Dynamic Arrays in C-The Wrong Way, Prev: Dynamic Arrays in C, Up: Multi-dimensional Array Format
1063
1064 3.2.5 Dynamic Arrays in C--The Wrong Way
1065 ----------------------------------------
1066
1067 A different method for allocating multi-dimensional arrays in C is
1068 often suggested that is incompatible with FFTW: _using it will cause
1069 FFTW to die a painful death_. We discuss the technique here, however,
1070 because it is so commonly known and used. This method is to create
1071 arrays of pointers to arrays of pointers to ... etcetera. For example,
1072 the analogue in this method to the example above is:
1073
1074 int i,j;
1075 fftw_complex ***a_bad_array; /* another way to make a 5x12x27 array */
1076
1077 a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
1078 for (i = 0; i < 5; ++i) {
1079 a_bad_array[i] =
1080 (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
1081 for (j = 0; j < 12; ++j)
1082 a_bad_array[i][j] =
1083 (fftw_complex *) malloc(27 * sizeof(fftw_complex));
1084 }
1085
1086 As you can see, this sort of array is inconvenient to allocate (and
1087 deallocate). On the other hand, it has the advantage that the
1088 (i,j,k)-th element can be referenced simply by `a_bad_array[i][j][k]'.
1089
1090 If you like this technique and want to maximize convenience in
1091 accessing the array, but still want to pass the array to FFTW, you can
1092 use a hybrid method. Allocate the array as one contiguous block, but
1093 also declare an array of arrays of pointers that point to appropriate
1094 places in the block. That sort of trick is beyond the scope of this
1095 documentation; for more information on multi-dimensional arrays in C,
1096 see the `comp.lang.c' FAQ (http://c-faq.com/aryptr/dynmuldimary.html).
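
For readers who want a starting point, here is one possible sketch of the
hybrid method for rank 3. It uses plain `malloc' and a stand-in typedef so
that it compiles without FFTW; in a real program the contiguous block
would more likely come from `fftw_malloc' and be released with
`fftw_free'. The type `hybrid3d' and its functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef double cplx[2];      /* stand-in for fftw_complex */

typedef struct {
    cplx *block;             /* contiguous data: pass this to the planner */
    cplx **rows;             /* n0*n1 pointers into block */
    cplx ***a;               /* n0 pointers into rows; use a[i][j][k] */
} hybrid3d;

/* Returns 1 on success, 0 if any allocation fails. */
int hybrid3d_alloc(hybrid3d *h, size_t n0, size_t n1, size_t n2)
{
    size_t i, j;
    assert(n0 > 0 && n1 > 0 && n2 > 0);
    h->block = (cplx *) malloc(n0 * n1 * n2 * sizeof(cplx));
    h->rows = (cplx **) malloc(n0 * n1 * sizeof(cplx *));
    h->a = (cplx ***) malloc(n0 * sizeof(cplx **));
    if (!h->block || !h->rows || !h->a) {
        free(h->block);
        free(h->rows);
        free(h->a);
        return 0;
    }
    for (i = 0; i < n0; ++i) {
        h->a[i] = h->rows + i * n1;
        for (j = 0; j < n1; ++j)
            h->rows[i * n1 + j] = h->block + (i * n1 + j) * n2;
    }
    return 1;
}

void hybrid3d_free(hybrid3d *h)
{
    free(h->a);
    free(h->rows);
    free(h->block);
}
```

Only `block' holds transform data; the two pointer tables exist purely
for `a[i][j][k]' indexing and are never seen by FFTW.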
1097
1098 
1099 File: fftw3.info, Node: Words of Wisdom-Saving Plans, Next: Caveats in Using Wisdom, Prev: Multi-dimensional Array Format, Up: Other Important Topics
1100
1101 3.3 Words of Wisdom--Saving Plans
1102 =================================
1103
1104 FFTW implements a method for saving plans to disk and restoring them.
1105 In fact, what FFTW does is more general than just saving and loading
1106 plans. The mechanism is called "wisdom". Here, we describe this
1107 feature at a high level. *Note FFTW Reference::, for a less casual but
1108 more complete discussion of how to use wisdom in FFTW.
1109
1110 Plans created with the `FFTW_MEASURE', `FFTW_PATIENT', or
1111 `FFTW_EXHAUSTIVE' options produce near-optimal FFT performance, but may
1112 require a long time to compute because FFTW must measure the runtime of
1113 many possible plans and select the best one. This setup is designed
1114 for the situations where so many transforms of the same size must be
1115 computed that the start-up time is irrelevant. For short
1116 initialization times, but slower transforms, we have provided
1117 `FFTW_ESTIMATE'. The `wisdom' mechanism is a way to get the best of
1118 both worlds: you compute a good plan once, save it to disk, and later
1119 reload it as many times as necessary. The wisdom mechanism can
1120 actually save and reload many plans at once, not just one.
1121
1122 Whenever you create a plan, the FFTW planner accumulates wisdom,
1123 which is information sufficient to reconstruct the plan. After
1124 planning, you can save this information to disk by means of the
1125 function:
1126 int fftw_export_wisdom_to_filename(const char *filename);
1127 (This function returns non-zero on success.)
1128
1129 The next time you run the program, you can restore the wisdom with
1130 `fftw_import_wisdom_from_filename' (which also returns non-zero on
1131 success), and then recreate the plan using the same flags as before.
1132 int fftw_import_wisdom_from_filename(const char *filename);
1133
1134 Wisdom is automatically used for any size to which it is applicable,
1135 as long as the planner flags are not more "patient" than those with
1136 which the wisdom was created. For example, wisdom created with
1137 `FFTW_MEASURE' can be used if you later plan with `FFTW_ESTIMATE' or
1138 `FFTW_MEASURE', but not with `FFTW_PATIENT'.
1139
1140 The `wisdom' is cumulative, and is stored in a global, private data
1141 structure managed internally by FFTW. The storage space required is
1142 minimal, proportional to the logarithm of the sizes the wisdom was
1143 generated from. If memory usage is a concern, however, the wisdom can
1144 be forgotten and its associated memory freed by calling:
1145 void fftw_forget_wisdom(void);
1146
1147 Wisdom can be exported to a file, a string, or any other medium.
1148 For details, see *note Wisdom::.
1149
1150 
1151 File: fftw3.info, Node: Caveats in Using Wisdom, Prev: Words of Wisdom-Saving Plans, Up: Other Important Topics
1152
1153 3.4 Caveats in Using Wisdom
1154 ===========================
1155
1156 For in much wisdom is much grief, and he that increaseth knowledge
1157 increaseth sorrow. [Ecclesiastes 1:18]
1158
1159 There are pitfalls to using wisdom, in that it can negate FFTW's
1160 ability to adapt to changing hardware and other conditions. For
1161 example, it would be perfectly possible to export wisdom from a program
1162 running on one processor and import it into a program running on
1163 another processor. Doing so, however, would mean that the second
1164 program would use plans optimized for the first processor, instead of
1165 the one it is running on.
1166
1167 It should be safe to reuse wisdom as long as the hardware and program
1168 binaries remain unchanged. (Actually, the optimal plan may change even
1169 between runs of the same binary on identical hardware, due to
1170 differences in the virtual memory environment, etcetera. Users
1171 seriously interested in performance should worry about this problem,
1172 too.) If the same wisdom is used for two different program binaries,
1173 even running on the same machine, the plans are likely to be
1174 sub-optimal because of differing code alignments. It is therefore wise
1175 to recreate wisdom every time an application is recompiled. The more
1176 the underlying hardware and software change between the creation of
1177 wisdom and its use, the greater grows the risk of sub-optimal plans.
1178
1179 Nevertheless, if the choice is between using `FFTW_ESTIMATE' or
1180 using possibly-suboptimal wisdom (created on the same machine, but for a
1181 different binary), the wisdom is likely to be better. For this reason,
1182 we provide a function to import wisdom from a standard system-wide
1183 location (`/etc/fftw/wisdom' on Unix):
1184
1185 int fftw_import_system_wisdom(void);
1186
1187 FFTW also provides a standalone program, `fftw-wisdom' (described by
1188 its own `man' page on Unix) with which users can create wisdom, e.g.
1189 for a canonical set of sizes to store in the system wisdom file. *Note
1190 Wisdom Utilities::.
1191
1192 
1193 File: fftw3.info, Node: FFTW Reference, Next: Multi-threaded FFTW, Prev: Other Important Topics, Up: Top
1194
1195 4 FFTW Reference
1196 ****************
1197
1198 This chapter provides a complete reference for all sequential (i.e.,
1199 one-processor) FFTW functions. Parallel transforms are described in
1200 later chapters.
1201
1202 * Menu:
1203
1204 * Data Types and Files::
1205 * Using Plans::
1206 * Basic Interface::
1207 * Advanced Interface::
1208 * Guru Interface::
1209 * New-array Execute Functions::
1210 * Wisdom::
1211 * What FFTW Really Computes::
1212
1213 
1214 File: fftw3.info, Node: Data Types and Files, Next: Using Plans, Prev: FFTW Reference, Up: FFTW Reference
1215
1216 4.1 Data Types and Files
1217 ========================
1218
1219 All programs using FFTW should include its header file:
1220
1221 #include <fftw3.h>
1222
1223 You must also link to the FFTW library. On Unix, this means adding
1224 `-lfftw3 -lm' at the _end_ of the link command.
1225
1226 * Menu:
1227
1228 * Complex numbers::
1229 * Precision::
1230 * Memory Allocation::
1231
1232 
1233 File: fftw3.info, Node: Complex numbers, Next: Precision, Prev: Data Types and Files, Up: Data Types and Files
1234
1235 4.1.1 Complex numbers
1236 ---------------------
1237
1238 The default FFTW interface uses `double' precision for all
1239 floating-point numbers, and defines a `fftw_complex' type to hold
1240 complex numbers as:
1241
1242 typedef double fftw_complex[2];
1243
1244 Here, the `[0]' element holds the real part and the `[1]' element
1245 holds the imaginary part.
1246
1247 Alternatively, if you have a C compiler (such as `gcc') that
1248 supports the C99 revision of the ANSI C standard, you can use C's new
1249 native complex type (which is binary-compatible with the typedef above).
1250 In particular, if you `#include <complex.h>' _before_ `<fftw3.h>', then
1251 `fftw_complex' is defined to be the native complex type and you can
1252 manipulate it with ordinary arithmetic (e.g. `x = y * (3+4*I)', where
1253 `x' and `y' are `fftw_complex' and `I' is the standard symbol for the
1254 imaginary unit).
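
The binary compatibility can be illustrated without FFTW at all. In the
sketch below, `pair' is a stand-in with the same layout as the
`double[2]' typedef shown above, and both function names are
hypothetical. One routine fills the data through the array-of-pairs
view, the other operates on the same memory with native C99 complex
arithmetic.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

typedef double pair[2];      /* same layout as the typedef shown above */

/* Fill z[0..n-1] through the array-of-pairs view of the same memory. */
void fill_as_pairs(double complex *z, size_t n)
{
    pair *p = (pair *) z;
    size_t j;
    assert(z != NULL);
    for (j = 0; j < n; ++j) {
        p[j][0] = (double) j;        /* real part */
        p[j][1] = -(double) j;       /* imaginary part */
    }
}

/* Multiply every element by (3+4i) using native complex arithmetic. */
void scale_native(double complex *z, size_t n)
{
    size_t j;
    assert(z != NULL);
    for (j = 0; j < n; ++j)
        z[j] = z[j] * (3 + 4 * I);
}
```

With `<complex.h>' included before `<fftw3.h>', an array of `fftw_complex'
could be passed to a routine like `scale_native' directly.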
1255
1256 C++ has its own `complex<T>' template class, defined in the standard
1257 `<complex>' header file. Reportedly, the C++ standards committee has
1258 recently agreed to mandate that the storage format used for this type
1259 be binary-compatible with the C99 type, i.e. an array `T[2]' with
1260 consecutive real `[0]' and imaginary `[1]' parts. (See report
1261 WG21/N1388, `http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf'.)
1262 Although not part of the official standard as of this
1263 writing, the proposal stated that: "This solution has been tested with
1264 all current major implementations of the standard library and shown to
1265 be working." To the extent that this is true, if you have a variable
1266 `complex<double> *x', you can pass it directly to FFTW via
1267 `reinterpret_cast<fftw_complex*>(x)'.
1268
1269 
1270 File: fftw3.info, Node: Precision, Next: Memory Allocation, Prev: Complex numbers, Up: Data Types and Files
1271
1272 4.1.2 Precision
1273 ---------------
1274
1275 You can install single and long-double precision versions of FFTW,
1276 which replace `double' with `float' and `long double', respectively
1277 (*note Installation and Customization::). To use these interfaces, you:
1278
1279 * Link to the single/long-double libraries; on Unix, `-lfftw3f' or
1280 `-lfftw3l' instead of (or in addition to) `-lfftw3'. (You can
1281 link to the different-precision libraries simultaneously.)
1282
1283 * Include the _same_ `<fftw3.h>' header file.
1284
1285 * Replace all lowercase instances of `fftw_' with `fftwf_' or
1286 `fftwl_' for single or long-double precision, respectively.
1287 (`fftw_complex' becomes `fftwf_complex', `fftw_execute' becomes
1288 `fftwf_execute', etcetera.)
1289
1290 * Uppercase names, i.e. names beginning with `FFTW_', remain the
1291 same.
1292
1293 * Replace `double' with `float' or `long double' for subroutine
1294 parameters.
1295
1296
1297 Depending upon your compiler and/or hardware, `long double' may not
1298 be any more precise than `double' (or may not be supported at all,
1299 although it is standard in C99).
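One way to check whether `long double' gains anything over `double' on a given platform is to compare the significand widths reported by `<float.h>'. This is only a sketch: the helper name is hypothetical, and the result says nothing about whether the long-double FFTW library is installed.

```c
#include <assert.h>
#include <float.h>

/* Extra significand digits (in base FLT_RADIX, normally 2) that
   `long double' has over `double' on this platform.
   A result of 0 means `long double' is no more precise. */
int long_double_extra_digits(void)
{
    /* C requires long double to represent every double value. */
    assert(LDBL_MANT_DIG >= DBL_MANT_DIG);
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}
```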
1300
1301 We also support using the nonstandard `__float128'
1302 quadruple-precision type provided by recent versions of `gcc' on 32-
1303 and 64-bit x86 hardware (*note Installation and Customization::). To
1304 use this type, link with `-lfftw3q -lquadmath -lm' (the `libquadmath'
1305 library provided by `gcc' is needed for quadruple-precision
1306 trigonometric functions) and use `fftwq_' identifiers.
1307
1308 
1309 File: fftw3.info, Node: Memory Allocation, Prev: Precision, Up: Data Types and Files
1310
1311 4.1.3 Memory Allocation
1312 -----------------------
1313
1314 void *fftw_malloc(size_t n);
1315 void fftw_free(void *p);
1316
1317 These are functions that behave identically to `malloc' and `free',
1318 except that they guarantee that the returned pointer obeys any special
1319 alignment restrictions imposed by any algorithm in FFTW (e.g. for SIMD
1320 acceleration). *Note SIMD alignment and fftw_malloc::.
1321
1322 Data allocated by `fftw_malloc' _must_ be deallocated by `fftw_free'
1323 and not by the ordinary `free'.
1324
1325 These routines simply call through to your operating system's
1326 `malloc' or, if necessary, its aligned equivalent (e.g. `memalign'), so
1327 you normally need not worry about any significant time or space
1328 overhead. You are _not required_ to use them to allocate your data,
1329 but we strongly recommend it.
1330
1331 Note: in C++, just as with ordinary `malloc', you must typecast the
1332 output of `fftw_malloc' to whatever pointer type you are allocating.
1333
1334 We also provide the following two convenience functions to allocate
1335 real and complex arrays with `n' elements, which are equivalent to
1336 `(double *) fftw_malloc(sizeof(double) * n)' and `(fftw_complex *)
1337 fftw_malloc(sizeof(fftw_complex) * n)', respectively:
1338
1339 double *fftw_alloc_real(size_t n);
1340 fftw_complex *fftw_alloc_complex(size_t n);
1341
1342 The equivalent functions in other precisions allocate arrays of `n'
1343 elements in that precision; e.g. `fftwf_alloc_real(n)' is equivalent
1344 to `(float *) fftwf_malloc(sizeof(float) * n)'.
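The size arithmetic behind these convenience functions can be sketched as follows. Note the assumptions: plain `malloc' stands in for `fftw_malloc' here, so this version gives none of FFTW's alignment guarantees, and the overflow check is an addition of this sketch rather than documented FFTW behavior. The name `alloc_real_sketch' is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n doubles, or return NULL on failure.
   malloc() is a stand-in for fftw_malloc(); memory obtained this
   way must be released with free(), not fftw_free(). */
double *alloc_real_sketch(size_t n)
{
    if (n > SIZE_MAX / sizeof(double))
        return NULL;    /* sizeof(double) * n would overflow */
    return (double *) malloc(sizeof(double) * n);
}
```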
1345
1346 
1347 File: fftw3.info, Node: Using Plans, Next: Basic Interface, Prev: Data Types and Files, Up: FFTW Reference
1348
1349 4.2 Using Plans
1350 ===============
1351
1352 Plans for all transform types in FFTW are stored as type `fftw_plan'
1353 (an opaque pointer type), and are created by one of the various
1354 planning routines described in the following sections. An `fftw_plan'
1355 contains all information necessary to compute the transform, including
1356 the pointers to the input and output arrays.
1357
1358 void fftw_execute(const fftw_plan plan);
1359
1360 This executes the `plan', to compute the corresponding transform on
1361 the arrays for which it was planned (which must still exist). The plan
1362 is not modified, and `fftw_execute' can be called as many times as
1363 desired.
1364
1365 To apply a given plan to a different array, you can use the
1366 new-array execute interface. *Note New-array Execute Functions::.
1367
1368 `fftw_execute' (and equivalents) is the only function in FFTW
1369 guaranteed to be thread-safe; see *note Thread safety::.
1370
1371 This function:
1372 void fftw_destroy_plan(fftw_plan plan);
1373 deallocates the `plan' and all its associated data.
1374
1375 FFTW's planner saves some other persistent data, such as the
1376 accumulated wisdom and a list of algorithms available in the current
1377 configuration. If you want to deallocate all of that and reset FFTW to
1378 the pristine state it was in when you started your program, you can
1379 call:
1380
1381 void fftw_cleanup(void);
1382
1383 After calling `fftw_cleanup', all existing plans become undefined,
1384 and you should not attempt to execute them nor to destroy them. You can
1385 however create and execute/destroy new plans, in which case FFTW starts
1386 accumulating wisdom information again.
1387
1388 `fftw_cleanup' does not deallocate your plans, however. To prevent
1389 memory leaks, you must still call `fftw_destroy_plan' before executing
1390 `fftw_cleanup'.
1391
1392 Occasionally, it may be useful to know FFTW's internal "cost" metric
1393 that it uses to compare plans to one another; this cost is proportional
1394 to an execution time of the plan, in undocumented units, if the plan
1395 was created with the `FFTW_MEASURE' or other timing-based options, or
1396 alternatively is a heuristic cost function for `FFTW_ESTIMATE' plans.
1397 (The cost values of measured and estimated plans are not comparable,
1398 being in different units. Also, costs from different FFTW versions or
1399 the same version compiled differently may not be in the same units.
1400 Plans created from wisdom have a cost of 0 since no timing measurement
1401 is performed for them. Finally, certain problems for which only one
1402 top-level algorithm was possible may have required no measurements of
1403 the cost of the whole plan, in which case `fftw_cost' will also return
1404 0.) The cost metric for a given plan is returned by:
1405
1406 double fftw_cost(const fftw_plan plan);
1407
1408 The following two routines are provided purely for academic purposes
1409 (that is, for entertainment).
1410
1411 void fftw_flops(const fftw_plan plan,
1412 double *add, double *mul, double *fma);
1413
1414 Given a `plan', set `add', `mul', and `fma' to an exact count of the
1415 number of floating-point additions, multiplications, and fused
1416 multiply-add operations involved in the plan's execution. The total
1417 number of floating-point operations (flops) is `add + mul + 2*fma', or
1418 `add + mul + fma' if the hardware supports fused multiply-add
1419 instructions (although the number of FMA operations is only approximate
1420 because of compiler voodoo). (The number of operations should be an
1421 integer, but we use `double' to avoid overflowing `int' for large
1422 transforms; the arguments are of type `double' even for single and
1423 long-double precision versions of FFTW.)
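The flop total described above can be written as a small helper. This is a sketch only: `total_flops' and its `hardware_has_fma' argument are hypothetical names, and the three counts are assumed to have been obtained from `fftw_flops' beforehand.

```c
/* Combine the three counts into a total operation count.
   Without fused multiply-add hardware each FMA costs two
   operations (one multiply, one add); with it, one. */
double total_flops(double n_add, double n_mul, double n_fma,
                   int hardware_has_fma)
{
    return n_add + n_mul + (hardware_has_fma ? n_fma : 2.0 * n_fma);
}
```

For counts of 10 additions, 4 multiplications, and 3 FMAs, this gives 20 without FMA hardware and 17 with it.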
1424
1425 void fftw_fprint_plan(const fftw_plan plan, FILE *output_file);
1426 void fftw_print_plan(const fftw_plan plan);
1427 char *fftw_sprint_plan(const fftw_plan plan);
1428
1429 This outputs a "nerd-readable" representation of the `plan' to the
1430 given file, to `stdout', or to a newly allocated NUL-terminated string
1431 (which the caller is responsible for deallocating with `free'),
1432 respectively.
1433
1434 
1435 File: fftw3.info, Node: Basic Interface, Next: Advanced Interface, Prev: Using Plans, Up: FFTW Reference
1436
1437 4.3 Basic Interface
1438 ===================
1439
1440 Recall that the FFTW API is divided into three parts(1): the "basic
1441 interface" computes a single transform of contiguous data, the "advanced
1442 interface" computes transforms of multiple or strided arrays, and the
1443 "guru interface" supports the most general data layouts,
1444 multiplicities, and strides. This section describes the basic
1445 interface, which we expect to satisfy the needs of most users.
1446
1447 * Menu:
1448
1449 * Complex DFTs::
1450 * Planner Flags::
1451 * Real-data DFTs::
1452 * Real-data DFT Array Format::
1453 * Real-to-Real Transforms::
1454 * Real-to-Real Transform Kinds::
1455
1456 ---------- Footnotes ----------
1457
1458 (1) Gallia est omnis divisa in partes tres (Julius Caesar).
1459
1460 
1461 File: fftw3.info, Node: Complex DFTs, Next: Planner Flags, Prev: Basic Interface, Up: Basic Interface
1462
1463 4.3.1 Complex DFTs
1464 ------------------
1465
1466 fftw_plan fftw_plan_dft_1d(int n0,
1467 fftw_complex *in, fftw_complex *out,
1468 int sign, unsigned flags);
1469 fftw_plan fftw_plan_dft_2d(int n0, int n1,
1470 fftw_complex *in, fftw_complex *out,
1471 int sign, unsigned flags);
1472 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
1473 fftw_complex *in, fftw_complex *out,
1474 int sign, unsigned flags);
1475 fftw_plan fftw_plan_dft(int rank, const int *n,
1476 fftw_complex *in, fftw_complex *out,
1477 int sign, unsigned flags);
1478
1479 Plan a complex input/output discrete Fourier transform (DFT) in zero
1480 or more dimensions, returning an `fftw_plan' (*note Using Plans::).
1481
1482 Once you have created a plan for a certain transform type and
1483 parameters, then creating another plan of the same type and parameters,
1484 but for different arrays, is fast and shares constant data with the
1485 first plan (if it still exists).
1486
1487 The planner returns `NULL' if the plan cannot be created. In the
1488 standard FFTW distribution, the basic interface is guaranteed to return
1489 a non-`NULL' plan. A plan may be `NULL', however, if you are using a
1490 customized FFTW configuration supporting a restricted set of transforms.
1491
1492 Arguments
1493 .........
1494
1495 * `rank' is the rank of the transform (it should be the size of the
1496 array `*n'), and can be any non-negative integer. (*Note Complex
1497 Multi-Dimensional DFTs::, for the definition of "rank".) The
1498 `_1d', `_2d', and `_3d' planners correspond to a `rank' of `1',
1499 `2', and `3', respectively. The rank may be zero, which is
1500 equivalent to a rank-1 transform of size 1, i.e. a copy of one
1501 number from input to output.
1502
1503 * `n0', `n1', `n2', or `n[0..rank-1]' (as appropriate for each
1504 routine) specify the size of the transform dimensions. They can
1505 be any positive integer.
1506
1507 - Multi-dimensional arrays are stored in row-major order with
1508 dimensions: `n0' x `n1'; or `n0' x `n1' x `n2'; or `n[0]' x
1509 `n[1]' x ... x `n[rank-1]'. *Note Multi-dimensional Array
1510 Format::.
1511
1512 - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
1513 11^e 13^f, where e+f is either 0 or 1, and the other exponents
1514 are arbitrary. Other sizes are computed by means of a slow,
1515 general-purpose algorithm (which nevertheless retains O(n log
1516 n) performance even for prime sizes). It is possible to
1517 customize FFTW for different array sizes; see *note
1518 Installation and Customization::. Transforms whose sizes are
1519 powers of 2 are especially fast.
1520
1521 * `in' and `out' point to the input and output arrays of the
1522 transform, which may be the same (yielding an in-place transform). These
1523 arrays are overwritten during planning, unless `FFTW_ESTIMATE' is
1524 used in the flags. (The arrays need not be initialized, but they
1525 must be allocated.)
1526
1527 If `in == out', the transform is "in-place" and the input array is
1528 overwritten. If `in != out', the two arrays must not overlap (but
1529 FFTW does not check for this condition).
1530
1531 * `sign' is the sign of the exponent in the formula that defines the
1532 Fourier transform. It can be -1 (= `FFTW_FORWARD') or +1 (=
1533 `FFTW_BACKWARD').
1534
1535 * `flags' is a bitwise OR (`|') of zero or more planner flags, as
1536 defined in *note Planner Flags::.
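The size condition given in the argument list above (2^a 3^b 5^c 7^d 11^e 13^f with e+f equal to 0 or 1) can be tested by repeated division. The helper below is a hypothetical illustration, not an FFTW routine; it reports only whether a size has the stated form and makes no claim about actual timings.

```c
#include <assert.h>

/* Return 1 if n = 2^a 3^b 5^c 7^d 11^e 13^f with e + f <= 1,
   and 0 otherwise. */
int is_preferred_size(int n)
{
    static const int small_primes[4] = {2, 3, 5, 7};
    int i;
    int large_factors = 0;    /* counts factors of 11 and 13 */

    assert(n > 0);
    for (i = 0; i < 4; ++i)
        while (n % small_primes[i] == 0)
            n /= small_primes[i];
    while (n % 11 == 0) { n /= 11; ++large_factors; }
    while (n % 13 == 0) { n /= 13; ++large_factors; }

    /* Anything left over is a prime factor outside the list. */
    return n == 1 && large_factors <= 1;
}
```

For instance, 360 and 88 qualify, whereas 121 (two factors of 11), 143 (11 x 13), and 17 do not.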
1537
1538
1539 FFTW computes an unnormalized transform: computing a forward
1540 followed by a backward transform (or vice versa) will result in the
1541 original data multiplied by the size of the transform (the product of
1542 the dimensions). For more information, see *note What FFTW Really
1543 Computes::.
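The scaling can be seen with a naive size-4 DFT written directly from the definition. This sketch does not call FFTW; `dft4_sketch' is a hypothetical helper that uses the same sign convention (-1 forward, +1 backward) and stores each value as a real/imaginary pair. Size 4 is chosen because its twiddle factors are exactly 1, i, -1, and -i, so no trigonometric functions are needed.

```c
#include <assert.h>

/* Naive size-4 DFT: out[k] = sum over j of in[j] * w^(j*k),
   where w = exp(sign * 2*pi*i / 4), i.e. w = -i for sign == -1
   and w = +i for sign == +1.  in and out must be distinct. */
void dft4_sketch(double in[4][2], double out[4][2], int sign)
{
    /* Powers of +i: 1, i, -1, -i. */
    static const double pow_re[4] = {1.0, 0.0, -1.0, 0.0};
    static const double pow_im[4] = {0.0, 1.0, 0.0, -1.0};
    int j, k;

    assert(sign == -1 || sign == 1);
    for (k = 0; k < 4; ++k) {
        out[k][0] = 0.0;
        out[k][1] = 0.0;
        for (j = 0; j < 4; ++j) {
            int p = (j * k) % 4;
            double wr = pow_re[p];
            double wi = sign * pow_im[p];  /* conjugate when sign is -1 */
            out[k][0] += in[j][0] * wr - in[j][1] * wi;
            out[k][1] += in[j][0] * wi + in[j][1] * wr;
        }
    }
}
```

Applying `dft4_sketch' with sign -1 and then with sign +1 returns every input value multiplied by 4, the transform size; dividing by 4 recovers the original data.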
1544
1545 
1546 File: fftw3.info, Node: Planner Flags, Next: Real-data DFTs, Prev: Complex DFTs, Up: Basic Interface
1547
1548 4.3.2 Planner Flags
1549 -------------------
1550
1551 All of the planner routines in FFTW accept an integer `flags' argument,
1552 which is a bitwise OR (`|') of zero or more of the flag constants
1553 defined below. These flags control the rigor (and time) of the
1554 planning process, and can also impose (or lift) restrictions on the
1555 type of transform algorithm that is employed.
1556
1557 _Important:_ the planner overwrites the input array during planning
1558 unless a saved plan (*note Wisdom::) is available for that problem, so
1559 you should initialize your input data after creating the plan. The
1560 only exceptions to this are the `FFTW_ESTIMATE' and `FFTW_WISDOM_ONLY'
1561 flags, as mentioned below.
1562
1563 In all cases, if wisdom is available for the given problem that
1564 was created with equal-or-greater planning rigor, then the more
1565 rigorous wisdom is used. For example, in `FFTW_ESTIMATE' mode any
1566 available wisdom is used, whereas in `FFTW_PATIENT' mode only wisdom
1567 created in patient or exhaustive mode can be used. *Note Words of
1568 Wisdom-Saving Plans::.
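The equal-or-greater rule can be modelled as an ordering on rigor levels. The enum below is purely hypothetical: its values are chosen for this illustration and are not the values of FFTW's flag constants, and `wisdom_is_usable' is not an FFTW function.

```c
/* Hypothetical rigor levels, weakest first.  These are NOT the
   numeric values of FFTW_ESTIMATE, FFTW_MEASURE, and so on. */
typedef enum {
    RIGOR_ESTIMATE,
    RIGOR_MEASURE,
    RIGOR_PATIENT,
    RIGOR_EXHAUSTIVE
} rigor_sketch;

/* Wisdom is usable when it was created with at least the rigor
   that the current planner call requests. */
int wisdom_is_usable(rigor_sketch requested, rigor_sketch wisdom)
{
    return wisdom >= requested;
}
```

Under this model, wisdom created in measure mode satisfies an estimate-mode request but not a patient-mode one.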
1569
1570 Planning-rigor flags
1571 ....................
1572
1573 * `FFTW_ESTIMATE' specifies that, instead of actual measurements of
1574 different algorithms, a simple heuristic is used to pick a
1575 (probably sub-optimal) plan quickly. With this flag, the
1576 input/output arrays are not overwritten during planning.
1577
1578 * `FFTW_MEASURE' tells FFTW to find an optimized plan by actually
1579 _computing_ several FFTs and measuring their execution time.
1580 Depending on your machine, this can take some time (often a few
1581 seconds). `FFTW_MEASURE' is the default planning option.
1582
1583 * `FFTW_PATIENT' is like `FFTW_MEASURE', but considers a wider range
1584 of algorithms and often produces a "more optimal" plan (especially
1585 for large transforms), but at the expense of several times longer
1586 planning time (especially for large transforms).
1587
1588 * `FFTW_EXHAUSTIVE' is like `FFTW_PATIENT', but considers an even
1589 wider range of algorithms, including many that we think are
1590 unlikely to be fast, to produce the most optimal plan but with a
1591 substantially increased planning time.
1592
1593 * `FFTW_WISDOM_ONLY' is a special planning mode in which the plan is
1594 only created if wisdom is available for the given problem, and
1595 otherwise a `NULL' plan is returned. This can be combined with
1596 other flags, e.g. `FFTW_WISDOM_ONLY | FFTW_PATIENT' creates a plan
1597 only if wisdom is available that was created in `FFTW_PATIENT' or
1598 `FFTW_EXHAUSTIVE' mode. The `FFTW_WISDOM_ONLY' flag is intended
1599 for users who need to detect whether wisdom is available; for
1600 example, if wisdom is not available one may wish to allocate new
1601 arrays for planning so that user data is not overwritten.
1602
1603
1604 Algorithm-restriction flags
1605 ...........................
1606
1607 * `FFTW_DESTROY_INPUT' specifies that an out-of-place transform is
1608 allowed to _overwrite its input_ array with arbitrary data; this
1609 can sometimes allow more efficient algorithms to be employed.
1610
1611 * `FFTW_PRESERVE_INPUT' specifies that an out-of-place transform must
1612 _not change its input_ array. This is ordinarily the _default_,
1613 except for c2r and hc2r (i.e. complex-to-real) transforms for
1614 which `FFTW_DESTROY_INPUT' is the default. In the latter cases,
1615 passing `FFTW_PRESERVE_INPUT' will attempt to use algorithms that
1616 do not destroy the input, at the expense of worse performance; for
1617 multi-dimensional c2r transforms, however, no input-preserving
1618 algorithms are implemented and the planner will return `NULL' if
1619 one is requested.
1620
1621 * `FFTW_UNALIGNED' specifies that the algorithm may not impose any
1622 unusual alignment requirements on the input/output arrays (i.e. no
1623 SIMD may be used). This flag is normally _not necessary_, since
1624 the planner automatically detects misaligned arrays. The only use
1625 for this flag is if you want to use the new-array execute
1626 interface to execute a given plan on a different array that may
1627 not be aligned like the original. (Using `fftw_malloc' makes this
1628 flag unnecessary even then. You can also use `fftw_alignment_of'
1629 to detect whether two arrays are equivalently aligned.)
1630
1631
1632 Limiting planning time
1633 ......................
1634
1635 extern void fftw_set_timelimit(double seconds);
1636
1637 This function instructs FFTW to spend at most `seconds' seconds
1638 (approximately) in the planner. If `seconds == FFTW_NO_TIMELIMIT' (the
1639 default value, which is negative), then planning time is unbounded.
1640 Otherwise, FFTW plans with a progressively wider range of algorithms
1641 until the given time limit is reached or the given range of
1642 algorithms is explored, returning the best available plan.
1643
1644 For example, specifying `FFTW_PATIENT' first plans in
1645 `FFTW_ESTIMATE' mode, then in `FFTW_MEASURE' mode, then finally (time
1646 permitting) in `FFTW_PATIENT'. If `FFTW_EXHAUSTIVE' is specified
1647 instead, the planner will further progress to `FFTW_EXHAUSTIVE' mode.
1648
1649 Note that the `seconds' argument specifies only a rough limit; in
1650 practice, the planner may use somewhat more time if the time limit is
1651 reached when the planner is in the middle of an operation that cannot
1652 be interrupted. At the very least, the planner will complete planning
1653 in `FFTW_ESTIMATE' mode (which is thus equivalent to a time limit of 0).
1654
1655 
1656 File: fftw3.info, Node: Real-data DFTs, Next: Real-data DFT Array Format, Prev: Planner Flags, Up: Basic Interface
1657
1658 4.3.3 Real-data DFTs
1659 --------------------
1660
1661 fftw_plan fftw_plan_dft_r2c_1d(int n0,
1662 double *in, fftw_complex *out,
1663 unsigned flags);
1664 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
1665 double *in, fftw_complex *out,
1666 unsigned flags);
1667 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
1668 double *in, fftw_complex *out,
1669 unsigned flags);
1670 fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
1671 double *in, fftw_complex *out,
1672 unsigned flags);
1673
1674 Plan a real-input/complex-output discrete Fourier transform (DFT) in
1675 zero or more dimensions, returning an `fftw_plan' (*note Using Plans::).
1676
1677 Once you have created a plan for a certain transform type and
1678 parameters, then creating another plan of the same type and parameters,
1679 but for different arrays, is fast and shares constant data with the
1680 first plan (if it still exists).
1681
1682 The planner returns `NULL' if the plan cannot be created. A
1683 non-`NULL' plan is always returned by the basic interface unless you
1684 are using a customized FFTW configuration supporting a restricted set
1685 of transforms, or if you use the `FFTW_PRESERVE_INPUT' flag with a
1686 multi-dimensional out-of-place c2r transform (see below).
1687
1688 Arguments
1689 .........
1690
1691 * `rank' is the rank of the transform (it should be the size of the
1692 array `*n'), and can be any non-negative integer. (*Note Complex
1693 Multi-Dimensional DFTs::, for the definition of "rank".) The
1694 `_1d', `_2d', and `_3d' planners correspond to a `rank' of `1',
1695 `2', and `3', respectively. The rank may be zero, which is
1696 equivalent to a rank-1 transform of size 1, i.e. a copy of one
1697 real number (with zero imaginary part) from input to output.
1698
1699 * `n0', `n1', `n2', or `n[0..rank-1]' (as appropriate for each
1700 routine) specify the size of the transform dimensions. They can
1701 be any positive integer. This is different in general from the
1702 _physical_ array dimensions, which are described in *note
1703 Real-data DFT Array Format::.
1704
1705 - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
1706 11^e 13^f, where e+f is either 0 or 1, and the other exponents
1707 are arbitrary. Other sizes are computed by means of a slow,
1708 general-purpose algorithm (which nevertheless retains O(n log
1709 n) performance even for prime sizes). (It is possible to
1710 customize FFTW for different array sizes; see *note
1711 Installation and Customization::.) Transforms whose sizes
1712 are powers of 2 are especially fast, and it is generally
1713 beneficial for the _last_ dimension of an r2c/c2r transform
1714 to be _even_.
1715
1716 * `in' and `out' point to the input and output arrays of the
1717 transform, which may be the same (yielding an in-place transform). These
1718 arrays are overwritten during planning, unless `FFTW_ESTIMATE' is
1719 used in the flags. (The arrays need not be initialized, but they
1720 must be allocated.) For an in-place transform, it is important to
1721 remember that the real array will require padding, described in
1722 *note Real-data DFT Array Format::.
1723
1724 * `flags' is a bitwise OR (`|') of zero or more planner flags, as
1725 defined in *note Planner Flags::.
1726
1727
1728 The inverse transforms, taking complex input (storing the
1729 non-redundant half of a logically Hermitian array) to real output, are
1730 given by:
1731
1732 fftw_plan fftw_plan_dft_c2r_1d(int n0,
1733 fftw_complex *in, double *out,
1734 unsigned flags);
1735 fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1,
1736 fftw_complex *in, double *out,
1737 unsigned flags);
1738 fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2,
1739 fftw_complex *in, double *out,
1740 unsigned flags);
1741 fftw_plan fftw_plan_dft_c2r(int rank, const int *n,
1742 fftw_complex *in, double *out,
1743 unsigned flags);
1744
1745 The arguments are the same as for the r2c transforms, except that the
1746 input and output data formats are reversed.
1747
1748 FFTW computes an unnormalized transform: computing an r2c followed
1749 by a c2r transform (or vice versa) will result in the original data
1750 multiplied by the size of the transform (the product of the logical
1751 dimensions). An r2c transform produces the same output as a
1752 `FFTW_FORWARD' complex DFT of the same input, and a c2r transform is
1753 correspondingly equivalent to `FFTW_BACKWARD'. For more information,
1754 see *note What FFTW Really Computes::.
1755
1756 
1757 File: fftw3.info, Node: Real-data DFT Array Format, Next: Real-to-Real Transforms, Prev: Real-data DFTs, Up: Basic Interface
1758
1759 4.3.4 Real-data DFT Array Format
1760 --------------------------------
1761
1762 The output of a DFT of real data (r2c) contains symmetries that, in
1763 principle, make half of the outputs redundant (*note What FFTW Really
1764 Computes::). (Similarly for the input of an inverse c2r transform.) In
1765 practice, it is not possible to entirely realize these savings in an
1766 efficient and understandable format that generalizes to
1767 multi-dimensional transforms. Instead, the output of the r2c
1768 transforms is _slightly_ over half of the output of the corresponding
1769 complex transform. We do not "pack" the data in any way, but store it
1770 as an ordinary array of `fftw_complex' values. In fact, this data is
1771 simply a subsection of what would be the array in the corresponding
1772 complex transform.
1773
1774 Specifically, for a real transform of d (= `rank') dimensions n[0] x
1775 n[1] x n[2] x ... x n[d-1] , the complex data is an n[0] x n[1] x n[2]
1776 x ... x (n[d-1]/2 + 1) array of `fftw_complex' values in row-major
1777 order (with the division rounded down). That is, we only store the
1778 _lower_ half (non-negative frequencies), plus one element, of the last
1779 dimension of the data from the ordinary complex transform. (We could
1780 have instead taken half of any other dimension, but implementation
1781 turns out to be simpler if the last, contiguous, dimension is used.)
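The complex array size and its row-major indexing can be sketched as below. Both helpers are hypothetical names for illustration; the sketch assumes `rank' is at least 1 and, for the index helper, a two-dimensional transform.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex values in the output of an r2c transform with
   logical dimensions n[0] x ... x n[rank-1]: the last dimension is
   cut to n[rank-1]/2 + 1 (integer division rounds down). */
size_t r2c_complex_count(int rank, const int *n)
{
    size_t count = 1;
    int i;

    assert(rank >= 1);
    for (i = 0; i < rank - 1; ++i)
        count *= (size_t) n[i];
    return count * (size_t) (n[rank - 1] / 2 + 1);
}

/* Row-major position of output element (i0, i1) of a 2d r2c
   transform whose logical last dimension is n1. */
size_t r2c_2d_index(int n1, int i0, int i1)
{
    int last = n1 / 2 + 1;    /* stored length of the last dimension */

    assert(i0 >= 0 && i1 >= 0 && i1 < last);
    return (size_t) i0 * (size_t) last + (size_t) i1;
}
```

A 4 x 6 real transform therefore yields 4 x 4 = 16 complex values, and a one-dimensional transform of size 5 yields 3.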
1782
1783 For an out-of-place transform, the real data is simply an array with
1784 physical dimensions n[0] x n[1] x n[2] x ... x n[d-1] in row-major
1785 order.
1786
1787 For an in-place transform, some complications arise since the
1788 complex data is slightly larger than the real data. In this case, the
1789 final dimension of the real data must be _padded_ with extra values to
1790 accommodate the size of the complex data--two extra if the last
1791 dimension is even and one if it is odd. That is, the last dimension of
1792 the real data must physically contain 2 * (n[d-1]/2+1) `double' values
1793 (exactly enough to hold the complex data). This physical array size
1794 does not, however, change the _logical_ array size--only n[d-1] values
1795 are actually stored in the last dimension, and n[d-1] is the last
1796 dimension passed to the planner.
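The padding rule translates into the following index arithmetic for a two-dimensional in-place transform. This is a sketch with hypothetical helper names, not FFTW code.

```c
#include <assert.h>
#include <stddef.h>

/* Number of doubles physically stored in the last dimension of an
   in-place r2c array whose logical last dimension is n_last. */
int r2c_padded_last_dim(int n_last)
{
    return 2 * (n_last / 2 + 1);
}

/* Position of logical real element (i0, i1) in the padded array.
   Rows are spaced by the padded length, but only the first n1
   entries of each row hold real input data. */
size_t r2c_inplace_real_index(int n1, int i0, int i1)
{
    assert(i0 >= 0 && i1 >= 0 && i1 < n1);
    return (size_t) i0 * (size_t) r2c_padded_last_dim(n1) + (size_t) i1;
}
```

A logical last dimension of 6 is stored as 8 doubles (two extra), and one of 5 is stored as 6 doubles (one extra).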
1797
1798 
1799 File: fftw3.info, Node: Real-to-Real Transforms, Next: Real-to-Real Transform Kinds, Prev: Real-data DFT Array Format, Up: Basic Interface
1800
1801 4.3.5 Real-to-Real Transforms
1802 -----------------------------
1803
1804 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
1805 fftw_r2r_kind kind, unsigned flags);
1806 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
1807 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
1808 unsigned flags);
1809 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
1810 double *in, double *out,
1811 fftw_r2r_kind kind0,
1812 fftw_r2r_kind kind1,
1813 fftw_r2r_kind kind2,
1814 unsigned flags);
1815 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
1816 const fftw_r2r_kind *kind, unsigned flags);
1817
1818 Plan a real input/output (r2r) transform of various kinds in zero or
1819 more dimensions, returning an `fftw_plan' (*note Using Plans::).
1820
1821 Once you have created a plan for a certain transform type and
1822 parameters, then creating another plan of the same type and parameters,
1823 but for different arrays, is fast and shares constant data with the
1824 first plan (if it still exists).
1825
1826 The planner returns `NULL' if the plan cannot be created. A
1827 non-`NULL' plan is always returned by the basic interface unless you
1828 are using a customized FFTW configuration supporting a restricted set
1829 of transforms, or for size-1 `FFTW_REDFT00' kinds (which are not
1830 defined).
1831
1832 Arguments
1833 .........
1834
1835 * `rank' is the dimensionality of the transform (it should be the
1836 size of the arrays `*n' and `*kind'), and can be any non-negative
1837 integer. The `_1d', `_2d', and `_3d' planners correspond to a
1838 `rank' of `1', `2', and `3', respectively. A `rank' of zero is
1839 equivalent to a copy of one number from input to output.
1840
1841 * `n', or `n0'/`n1'/`n2', or `n[rank]', respectively, gives the
1842 (physical) size of the transform dimensions. They can be any
1843 positive integer.
1844
1845 - Multi-dimensional arrays are stored in row-major order with
1846 dimensions: `n0' x `n1'; or `n0' x `n1' x `n2'; or `n[0]' x
1847 `n[1]' x ... x `n[rank-1]'. *Note Multi-dimensional Array
1848 Format::.
1849
1850 - FFTW is generally best at handling sizes of the form 2^a 3^b
1851 5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other
1852 exponents are arbitrary. Other sizes are computed by means
1853 of a slow, general-purpose algorithm (which nevertheless
1854 retains O(n log n) performance even for prime sizes). (It
1855 is possible to customize FFTW for different array sizes; see
1856 *note Installation and Customization::.) Transforms whose
1857 sizes are powers of 2 are especially fast.
1858
1859 - For a `REDFT00' or `RODFT00' transform kind in a dimension of
1860 size n, it is n-1 or n+1, respectively, that should be
1861 factorizable in the above form.
1862
1863 * `in' and `out' point to the input and output arrays of the
1864 transform, which may be the same (yielding an in-place transform). These
1865 arrays are overwritten during planning, unless `FFTW_ESTIMATE' is
1866 used in the flags. (The arrays need not be initialized, but they
1867 must be allocated.)
1868
1869 * `kind', or `kind0'/`kind1'/`kind2', or `kind[rank]', is the kind
1870 of r2r transform used for the corresponding dimension. The valid
1871 kind constants are described in *note Real-to-Real Transform
1872 Kinds::. In a multi-dimensional transform, what is computed is
1873 the separable product formed by taking each transform kind along
1874 the corresponding dimension, one dimension after another.
1875
1876 * `flags' is a bitwise OR (`|') of zero or more planner flags, as
1877 defined in *note Planner Flags::.
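The size rule for the `REDFT00' and `RODFT00' kinds in the list above can be captured in a small helper. The enum is a hypothetical stand-in used so that the fragment compiles without `<fftw3.h>'; it does not reproduce FFTW's `fftw_r2r_kind' values.

```c
#include <assert.h>

/* Hypothetical stand-ins for r2r kinds; not FFTW's constants. */
typedef enum {
    SKETCH_REDFT00,
    SKETCH_RODFT00,
    SKETCH_OTHER_KIND
} kind_sketch;

/* For a dimension of physical size n, return the number whose
   factorization should have the preferred small-prime form. */
int size_to_factorize(kind_sketch kind, int n)
{
    assert(n > 0);
    switch (kind) {
    case SKETCH_REDFT00: return n - 1;
    case SKETCH_RODFT00: return n + 1;
    default:             return n;
    }
}
```

So a `REDFT00' dimension of size 1025, or a `RODFT00' dimension of size 1023, leads to the power of two 1024.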
1878
1879
1880 
1881 File: fftw3.info, Node: Real-to-Real Transform Kinds, Prev: Real-to-Real Transforms, Up: Basic Interface
1882
1883 4.3.6 Real-to-Real Transform Kinds
1884 ----------------------------------
1885
1886 FFTW currently supports 11 different r2r transform kinds, specified by
1887 one of the constants below. For the precise definitions of these
1888 transforms, see *note What FFTW Really Computes::. For a more
1889 colloquial introduction to these transform kinds, see *note More DFTs
1890 of Real Data::.
1891
1892 For a dimension of size `n', there is a corresponding "logical"
1893 dimension `N' that determines the normalization (and the optimal
1894 factorization); the formula for `N' is given for each kind below.
1895 Also, with each transform kind is listed its corresponding inverse
1896 transform. FFTW computes unnormalized transforms: a transform followed
1897 by its inverse will result in the original data multiplied by `N' (or
1898 the product of the `N''s for each dimension, in multi-dimensions).
1899
1900 * `FFTW_R2HC' computes a real-input DFT with output in "halfcomplex"
1901 format, i.e. real and imaginary parts for a transform of size `n'
1902 stored as: r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1. (Logical
1903 `N=n', inverse is `FFTW_HC2R'.)
1904
1905 * `FFTW_HC2R' computes the reverse of `FFTW_R2HC', above. (Logical
1906 `N=n', inverse is `FFTW_R2HC'.)
1907
1908 * `FFTW_DHT' computes a discrete Hartley transform. (Logical `N=n',
1909 inverse is `FFTW_DHT'.)
1910
1911 * `FFTW_REDFT00' computes an REDFT00 transform, i.e. a DCT-I.
1912 (Logical `N=2*(n-1)', inverse is `FFTW_REDFT00'.)
1913
1914 * `FFTW_REDFT10' computes an REDFT10 transform, i.e. a DCT-II
1915 (sometimes called "the" DCT). (Logical `N=2*n', inverse is
1916 `FFTW_REDFT01'.)
1917
1918 * `FFTW_REDFT01' computes an REDFT01 transform, i.e. a DCT-III
1919 (sometimes called "the" IDCT, being the inverse of DCT-II).
1920 (Logical `N=2*n', inverse is `FFTW_REDFT10'.)
1921
1922 * `FFTW_REDFT11' computes an REDFT11 transform, i.e. a DCT-IV.
1923 (Logical `N=2*n', inverse is `FFTW_REDFT11'.)
1924
1925 * `FFTW_RODFT00' computes an RODFT00 transform, i.e. a DST-I.
1926 (Logical `N=2*(n+1)', inverse is `FFTW_RODFT00'.)
1927
1928 * `FFTW_RODFT10' computes an RODFT10 transform, i.e. a DST-II.
1929 (Logical `N=2*n', inverse is `FFTW_RODFT01'.)
1930
1931 * `FFTW_RODFT01' computes an RODFT01 transform, i.e. a DST-III.
1932 (Logical `N=2*n', inverse is `FFTW_RODFT10'.)
1933
1934 * `FFTW_RODFT11' computes an RODFT11 transform, i.e. a DST-IV.
1935 (Logical `N=2*n', inverse is `FFTW_RODFT11'.)
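The logical sizes listed above, and the halfcomplex storage order, can be summarized in code. Everything named here is hypothetical: the enum merely mirrors the kind names and does not reproduce the values of FFTW's own constants.

```c
#include <assert.h>

/* Hypothetical mirror of the r2r kinds; not FFTW's constants. */
typedef enum {
    K_R2HC, K_HC2R, K_DHT,
    K_REDFT00, K_REDFT10, K_REDFT01, K_REDFT11,
    K_RODFT00, K_RODFT10, K_RODFT01, K_RODFT11
} r2r_kind_sketch;

/* Logical size N for a dimension of physical size n.  A transform
   followed by its inverse scales the data by N. */
int logical_size(r2r_kind_sketch kind, int n)
{
    switch (kind) {
    case K_R2HC:
    case K_HC2R:
    case K_DHT:     return n;
    case K_REDFT00: return 2 * (n - 1);
    case K_RODFT00: return 2 * (n + 1);
    default:        return 2 * n;   /* remaining REDFT/RODFT kinds */
    }
}

/* Halfcomplex layout for size n: real part k sits at index k
   for 0 <= k <= n/2. */
int halfcomplex_real_index(int n, int k)
{
    assert(k >= 0 && k <= n / 2);
    return k;
}

/* Imaginary part k sits at index n - k for
   1 <= k <= (n+1)/2 - 1. */
int halfcomplex_imag_index(int n, int k)
{
    assert(k >= 1 && k <= (n + 1) / 2 - 1);
    return n - k;
}
```

For `n = 8' the array holds r0 to r4 at indices 0 to 4, followed by i3, i2, i1 at indices 5, 6, 7.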
1936
1937
1938 
1939 File: fftw3.info, Node: Advanced Interface, Next: Guru Interface, Prev: Basic Interface, Up: FFTW Reference
1940
1941 4.4 Advanced Interface
1942 ======================
1943
1944 FFTW's "advanced" interface supplements the basic interface with four
1945 new planner routines, providing a new level of flexibility: you can plan
1946 a transform of multiple arrays simultaneously, operate on non-contiguous
1947 (strided) data, and transform a subset of a larger multi-dimensional
1948 array. Other than these additional features, the planner operates in
1949 the same fashion as in the basic interface, and the resulting
1950 `fftw_plan' is used in the same way (*note Using Plans::).
1951
1952 * Menu:
1953
1954 * Advanced Complex DFTs::
1955 * Advanced Real-data DFTs::
1956 * Advanced Real-to-real Transforms::
1957
1958 
1959 File: fftw3.info, Node: Advanced Complex DFTs, Next: Advanced Real-data DFTs, Prev: Advanced Interface, Up: Advanced Interface
1960
1961 4.4.1 Advanced Complex DFTs
1962 ---------------------------
1963
1964 fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
1965 fftw_complex *in, const int *inembed,
1966 int istride, int idist,
1967 fftw_complex *out, const int *onembed,
1968 int ostride, int odist,
1969 int sign, unsigned flags);
1970
1971 This routine plans multiple multidimensional complex DFTs, and it
1972 extends the `fftw_plan_dft' routine (*note Complex DFTs::) to compute
1973 `howmany' transforms, each having rank `rank' and size `n'. In
1974 addition, the transform data need not be contiguous, but it may be laid
1975 out in memory with an arbitrary stride. To account for these
1976 possibilities, `fftw_plan_many_dft' adds the new parameters `howmany',
1977 {`i',`o'}`nembed', {`i',`o'}`stride', and {`i',`o'}`dist'. The FFTW
1978 basic interface (*note Complex DFTs::) provides routines specialized
1979 for ranks 1, 2, and 3, but the advanced interface handles only the
1980 general-rank case.
1981
1982 `howmany' is the number of transforms to compute. The resulting
1983 plan computes `howmany' transforms, where the input of the `k'-th
1984 transform is at location `in+k*idist' (in C pointer arithmetic), and
1985 its output is at location `out+k*odist'. Plans obtained in this way
1986 can often be faster than calling FFTW multiple times for the individual
1987 transforms. The basic `fftw_plan_dft' interface corresponds to
1988 `howmany=1' (in which case the `dist' parameters are ignored).
1989
1990 Each of the `howmany' transforms has rank `rank' and size `n', as in
1991 the basic interface. In addition, the advanced interface allows the
1992 input and output arrays of each transform to be row-major subarrays of
1993 larger rank-`rank' arrays, described by `inembed' and `onembed'
1994 parameters, respectively. {`i',`o'}`nembed' must be arrays of length
1995 `rank', and `n' should be elementwise less than or equal to
1996 {`i',`o'}`nembed'. Passing `NULL' for an `nembed' parameter is
1997 equivalent to passing `n' (i.e. the same physical and logical dimensions,
1998 as in the basic interface).
1999
2000 The `stride' parameters indicate that the `j'-th element of the
2001 input or output arrays is located at `j*istride' or `j*ostride',
2002 respectively. (For a multi-dimensional array, `j' is the ordinary
2003 row-major index.) When combined with the `k'-th transform in a
2004 `howmany' loop, from above, this means that the (`j',`k')-th element is
2005 at `j*stride+k*dist'. (The basic `fftw_plan_dft' interface corresponds
2006 to a stride of 1.)
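
As a minimal sketch of the addressing rule above, the following helper computes where one element of one transform lives, in units of array elements. It assumes that the row-major index `j' is taken within the enclosing `nembed' array, which is how we read the description of {`i',`o'}`nembed' above. The name `many_offset' is hypothetical and is not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements, of entry idx[0..rank-1] of the k-th transform.
   Assumption: j is the row-major index inside the enclosing array whose
   physical dimensions are nembed[0..rank-1]; the element then sits at
   j*stride + k*dist.  Hypothetical helper, not an FFTW routine. */
static ptrdiff_t many_offset(int rank, const int *idx, const int *nembed,
                             int stride, int dist, int k)
{
    ptrdiff_t j = 0;
    int d;

    assert(rank >= 1);
    for (d = 0; d < rank; ++d) {
        assert(idx[d] >= 0 && idx[d] < nembed[d]);
        j = j * nembed[d] + idx[d];      /* ordinary row-major index */
    }
    return j * stride + (ptrdiff_t) k * dist;
}
```

With `stride = 3' and `dist = 1', for instance, element 4 of transform 2 is at offset `4*3 + 2 = 14', which is row 4, column 2 of a row-major array with 3 columns.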
2007
2008 For in-place transforms, the input and output `stride' and `dist'
2009 parameters should be the same; otherwise, the planner may return `NULL'.
2010
2011 Arrays `n', `inembed', and `onembed' are not used after this
2012 function returns. You can safely free or reuse them.
2013
2014 *Examples*: One transform of one 5 by 6 array contiguous in memory:
2015 int rank = 2;
2016 int n[] = {5, 6};
2017 int howmany = 1;
2018 int idist = 0, odist = 0; /* unused because howmany = 1 */
2019 int istride = 1, ostride = 1; /* array is contiguous in memory */
2020 int *inembed = n, *onembed = n;
2021
2022 Transform of three 5 by 6 arrays, each contiguous in memory, stored
2023 in memory one after another:
2024 int rank = 2;
2025 int n[] = {5, 6};
2026 int howmany = 3;
2027 int idist = n[0]*n[1], odist = idist; /* = 30, the distance in memory
2028 between the first element
2029 of the first array and the
2030 first element of the second array */
2031 int istride = 1, ostride = 1; /* array is contiguous in memory */
2032 int *inembed = n, *onembed = n;
2033
2034 Transform each column of a 2d array with 10 rows and 3 columns:
2035 int rank = 1; /* not 2: we are computing 1d transforms */
2036 int n[] = {10}; /* 1d transforms of length 10 */
2037 int howmany = 3;
2038 int idist = 1, odist = 1; /* distance between the first elements of adjacent columns */
2039 int istride = 3, ostride = 3; /* distance between two elements in
2040 the same column */
2041 int *inembed = n, *onembed = n;
2042
2043 
2044 File: fftw3.info, Node: Advanced Real-data DFTs, Next: Advanced Real-to-real Transforms, Prev: Advanced Complex DFTs, Up: Advanced Interface
2045
2046 4.4.2 Advanced Real-data DFTs
2047 -----------------------------
2048
2049 fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany,
2050 double *in, const int *inembed,
2051 int istride, int idist,
2052 fftw_complex *out, const int *onembed,
2053 int ostride, int odist,
2054 unsigned flags);
2055 fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany,
2056 fftw_complex *in, const int *inembed,
2057 int istride, int idist,
2058 double *out, const int *onembed,
2059 int ostride, int odist,
2060 unsigned flags);
2061
2062 Like `fftw_plan_many_dft', these two functions add `howmany',
2063 `nembed', `stride', and `dist' parameters to the `fftw_plan_dft_r2c'
2064 and `fftw_plan_dft_c2r' functions, but otherwise behave the same as the
2065 basic interface.
2066
2067 The interpretation of `howmany', `stride', and `dist' is the same
2068 as for `fftw_plan_many_dft', above. Note that the `stride' and `dist'
2069 for the real array are in units of `double', and for the complex array
2070 are in units of `fftw_complex'.
2071
2072 If an `nembed' parameter is `NULL', it is interpreted as what it
2073 would be in the basic interface, as described in *note Real-data DFT
2074 Array Format::. That is, for the complex array the size is assumed to
2075 be the same as `n', but with the last dimension cut roughly in half.
2076 For the real array, the size is assumed to be `n' if the transform is
2077 out-of-place, or `n' with the last dimension "padded" if the transform
2078 is in-place.
2079
2080 If an `nembed' parameter is non-`NULL', it is interpreted as the
2081 physical size of the corresponding array, in row-major order, just as
2082 for `fftw_plan_many_dft'. In this case, each dimension of `nembed'
2083 should be `>=' what it would be in the basic interface (e.g. the halved
2084 or padded `n').
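
The sizes implied by a `NULL' `nembed' can be sketched as follows. The helper name `r2c_default_embed' is hypothetical. The value `n/2+1' for the halved complex dimension and `2*(n/2+1)' for the padded in-place real dimension are taken from our reading of *note Real-data DFT Array Format::, and should be checked against that section.

```c
#include <assert.h>

/* Fill in the physical sizes that a NULL nembed is taken to mean for a
   real-data transform of logical size n[0..rank-1].
   Hypothetical helper, not an FFTW routine. */
static void r2c_default_embed(int rank, const int *n, int in_place,
                              int *real_embed, int *complex_embed)
{
    int d;

    assert(rank >= 1);
    for (d = 0; d < rank; ++d) {
        real_embed[d] = n[d];
        complex_embed[d] = n[d];
    }
    /* complex array: last dimension "cut roughly in half" */
    complex_embed[rank - 1] = n[rank - 1] / 2 + 1;
    /* real array: last dimension "padded" only for in-place transforms */
    if (in_place)
        real_embed[rank - 1] = 2 * (n[rank - 1] / 2 + 1);
}
```

A non-`NULL' `nembed' passed to the planner should then be at least these values in every dimension.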
2085
2086 Arrays `n', `inembed', and `onembed' are not used after this
2087 function returns. You can safely free or reuse them.
2088
2089 
2090 File: fftw3.info, Node: Advanced Real-to-real Transforms, Prev: Advanced Real-data DFTs, Up: Advanced Interface
2091
2092 4.4.3 Advanced Real-to-real Transforms
2093 --------------------------------------
2094
2095 fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
2096 double *in, const int *inembed,
2097 int istride, int idist,
2098 double *out, const int *onembed,
2099 int ostride, int odist,
2100 const fftw_r2r_kind *kind, unsigned flags);
2101
2102 Like `fftw_plan_many_dft', this function adds `howmany', `nembed',
2103 `stride', and `dist' parameters to the `fftw_plan_r2r' function, but
2104 otherwise behaves the same as the basic interface. The interpretation
2105 of those additional parameters is the same as for
2106 `fftw_plan_many_dft'. (Of course, the `stride' and `dist' parameters
2107 are now in units of `double', not `fftw_complex'.)
2108
2109 Arrays `n', `inembed', `onembed', and `kind' are not used after this
2110 function returns. You can safely free or reuse them.
2111
2112 
2113 File: fftw3.info, Node: Guru Interface, Next: New-array Execute Functions, Prev: Advanced Interface, Up: FFTW Reference
2114
2115 4.5 Guru Interface
2116 ==================
2117
2118 The "guru" interface to FFTW is intended to expose as much as possible
2119 of the flexibility in the underlying FFTW architecture. It allows one
2120 to compute multi-dimensional "vectors" (loops) of multi-dimensional
2121 transforms, where each vector/transform dimension has an independent
2122 size and stride. One can also use more general complex-number formats,
2123 e.g. separate real and imaginary arrays.
2124
2125 For those users who require the flexibility of the guru interface,
2126 it is important that they pay special attention to the documentation
2127 lest they shoot themselves in the foot.
2128
2129 * Menu:
2130
2131 * Interleaved and split arrays::
2132 * Guru vector and transform sizes::
2133 * Guru Complex DFTs::
2134 * Guru Real-data DFTs::
2135 * Guru Real-to-real Transforms::
2136 * 64-bit Guru Interface::
2137
2138 
2139 File: fftw3.info, Node: Interleaved and split arrays, Next: Guru vector and transform sizes, Prev: Guru Interface, Up: Guru Interface
2140
2141 4.5.1 Interleaved and split arrays
2142 ----------------------------------
2143
2144 The guru interface supports two representations of complex numbers,
2145 which we call the interleaved and the split format.
2146
2147 The "interleaved" format is the same one used by the basic and
2148 advanced interfaces, and it is documented in *note Complex numbers::.
2149 In the interleaved format, you provide a pointer to the real part of a
2150 complex number, and the imaginary part is understood to be stored in the
2151 next memory location.
2152
2153 The "split" format allows separate pointers to the real and
2154 imaginary parts of a complex array.
2155
2156 Technically, the interleaved format is redundant, because you can
2157 always express an interleaved array in terms of a split array with
2158 appropriate pointers and strides. On the other hand, the interleaved
2159 format is simpler to use, and it is common in practice. Hence, FFTW
2160 supports it as a special case.
2161
2162 
2163 File: fftw3.info, Node: Guru vector and transform sizes, Next: Guru Complex DFTs, Prev: Interleaved and split arrays, Up: Guru Interface
2164
2165 4.5.2 Guru vector and transform sizes
2166 -------------------------------------
2167
2168 The guru interface introduces one basic new data structure,
2169 `fftw_iodim', that is used to specify sizes and strides for
2170 multi-dimensional transforms and vectors:
2171
2172 typedef struct {
2173 int n;
2174 int is;
2175 int os;
2176 } fftw_iodim;
2177
2178 Here, `n' is the size of the dimension, and `is' and `os' are the
2179 strides of that dimension for the input and output arrays. (The stride
2180 is the separation of consecutive elements along this dimension.)
2181
2182 The meaning of the stride parameter depends on the type of the array
2183 that the stride refers to. _If the array is interleaved complex,
2184 strides are expressed in units of complex numbers (`fftw_complex'). If
2185 the array is split complex or real, strides are expressed in units of
2186 real numbers (`double')._ This convention is consistent with the usual
2187 pointer arithmetic in the C language. An interleaved array is denoted
2188 by a pointer `p' to `fftw_complex', so that `p+1' points to the next
2189 complex number. Split arrays are denoted by pointers to `double', in
2190 which case pointer arithmetic operates in units of `sizeof(double)'.
2191
2192 The guru planner interfaces all take a (`rank', `dims[rank]') pair
2193 describing the transform size, and a (`howmany_rank',
2194 `howmany_dims[howmany_rank]') pair describing the "vector" size (a
2195 multi-dimensional loop of transforms to perform), where `dims' and
2196 `howmany_dims' are arrays of `fftw_iodim'.
2197
2198 For example, the `howmany' parameter in the advanced complex-DFT
2199 interface corresponds to `howmany_rank' = 1, `howmany_dims[0].n' =
2200 `howmany', `howmany_dims[0].is' = `idist', and `howmany_dims[0].os' =
2201 `odist'. (To compute a single transform, you can just use
2202 `howmany_rank' = 0.)
2203
2204 A row-major multidimensional array with dimensions `n[rank]' (*note
2205 Row-major Format::) corresponds to `dims[i].n' = `n[i]' and the
2206 recurrence `dims[i].is' = `n[i+1] * dims[i+1].is' (similarly for `os').
2207 The stride of the last (`i=rank-1') dimension is the overall stride of
2208 the array. e.g. to be equivalent to the advanced complex-DFT
2209 interface, you would have `dims[rank-1].is' = `istride' and
2210 `dims[rank-1].os' = `ostride'.
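
The recurrence above can be sketched as a short loop. To keep the sketch compilable without `fftw3.h', it uses a local stand-in type, `iodim_sketch', with the same three fields as `fftw_iodim'. Both that type and the helper `row_major_dims' are hypothetical names.

```c
#include <assert.h>

/* Local stand-in with the same fields as fftw_iodim (assumption: used
   only so that this sketch compiles on its own). */
typedef struct {
    int n;
    int is;
    int os;
} iodim_sketch;

/* Fill dims[0..rank-1] for a row-major array of dimensions n[0..rank-1]
   whose overall input/output strides are istride/ostride. */
static void row_major_dims(int rank, const int *n,
                           int istride, int ostride, iodim_sketch *dims)
{
    int i;

    assert(rank >= 1);
    dims[rank - 1].n = n[rank - 1];
    dims[rank - 1].is = istride;      /* last dimension: overall stride */
    dims[rank - 1].os = ostride;
    for (i = rank - 2; i >= 0; --i) {
        dims[i].n = n[i];
        dims[i].is = n[i + 1] * dims[i + 1].is;
        dims[i].os = n[i + 1] * dims[i + 1].os;
    }
}
```

For a contiguous 5 by 6 array (`istride = ostride = 1') this gives strides 6 and 1 for the two dimensions.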
2211
2212 In general, we only guarantee FFTW to return a non-`NULL' plan if
2213 the vector and transform dimensions correspond to a set of distinct
2214 indices, and for in-place transforms the input/output strides should be
2215 the same.
2216
2217 
2218 File: fftw3.info, Node: Guru Complex DFTs, Next: Guru Real-data DFTs, Prev: Guru vector and transform sizes, Up: Guru Interface
2219
2220 4.5.3 Guru Complex DFTs
2221 -----------------------
2222
2223 fftw_plan fftw_plan_guru_dft(
2224 int rank, const fftw_iodim *dims,
2225 int howmany_rank, const fftw_iodim *howmany_dims,
2226 fftw_complex *in, fftw_complex *out,
2227 int sign, unsigned flags);
2228
2229 fftw_plan fftw_plan_guru_split_dft(
2230 int rank, const fftw_iodim *dims,
2231 int howmany_rank, const fftw_iodim *howmany_dims,
2232 double *ri, double *ii, double *ro, double *io,
2233 unsigned flags);
2234
2235 These two functions plan a complex-data, multi-dimensional DFT for
2236 the interleaved and split format, respectively. Transform dimensions
2237 are given by (`rank', `dims') over a multi-dimensional vector (loop) of
2238 dimensions (`howmany_rank', `howmany_dims'). `dims' and `howmany_dims'
2239 should point to `fftw_iodim' arrays of length `rank' and
2240 `howmany_rank', respectively.
2241
2242 `flags' is a bitwise OR (`|') of zero or more planner flags, as
2243 defined in *note Planner Flags::.
2244
2245 In the `fftw_plan_guru_dft' function, the pointers `in' and `out'
2246 point to the interleaved input and output arrays, respectively. The
2247 sign can be either -1 (= `FFTW_FORWARD') or +1 (= `FFTW_BACKWARD'). If
2248 the pointers are equal, the transform is in-place.
2249
2250 In the `fftw_plan_guru_split_dft' function, `ri' and `ii' point to
2251 the real and imaginary input arrays, and `ro' and `io' point to the
2252 real and imaginary output arrays. The input and output pointers may be
2253 the same, indicating an in-place transform. For example, to apply this
2254 function to interleaved data held in `fftw_complex' arrays `in' and
2255 `out', the corresponding parameters are:
2256
2257 ri = (double *) in;
2258 ii = (double *) in + 1;
2259 ro = (double *) out;
2260 io = (double *) out + 1;
2261
2262 Because `fftw_plan_guru_split_dft' accepts split arrays, strides are
2263 expressed in units of `double'. For a contiguous `fftw_complex' array,
2264 the overall stride of the transform should be 2, the distance between
2265 consecutive real parts or between consecutive imaginary parts; see
2266 *note Guru vector and transform sizes::. Note that the dimension
2267 strides are applied equally to the real and imaginary parts; real and
2268 imaginary arrays with different strides are not supported.
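
A minimal sketch of viewing an interleaved array through split pointers follows. It assumes the default definition of `fftw_complex' as an array of two `double's (real part first), for which it substitutes a local typedef, `cplx_sketch'. The `split_view' struct and the helper name are hypothetical.

```c
#include <assert.h>

/* Assumption: same layout as the default fftw_complex, {re, im}. */
typedef double cplx_sketch[2];

/* Hypothetical bundle of the pointers and stride one would pass on. */
typedef struct {
    double *re;     /* plays the role of ri (or ro) */
    double *im;     /* plays the role of ii (or io) */
    int stride;     /* in units of double */
} split_view;

static split_view split_from_interleaved(cplx_sketch *a)
{
    split_view v;

    assert(a != 0);
    v.re = (double *) a;
    v.im = (double *) a + 1;
    v.stride = 2;   /* consecutive real parts are 2 doubles apart */
    return v;
}
```

Element `j' of the complex array is then `v.re[j*v.stride]' and `v.im[j*v.stride]'.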
2269
2270 There is no `sign' parameter in `fftw_plan_guru_split_dft'. This
2271 function always plans for an `FFTW_FORWARD' transform. To plan for an
2272 `FFTW_BACKWARD' transform, you can exploit the identity that the
2273 backwards DFT is equal to the forwards DFT with the real and imaginary
2274 parts swapped. For example, in the case of the `fftw_complex' arrays
2275 above, the `FFTW_BACKWARD' transform is computed by the parameters:
2276
2277 ri = (double *) in + 1;
2278 ii = (double *) in;
2279 ro = (double *) out + 1;
2280 io = (double *) out;
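
The swap identity can be checked independently of FFTW. The sketch below is not how FFTW computes anything. It is a naive length-4 forward DFT on split data (length 4 is chosen so that all twiddle factors are exactly 0 or +-1), plus a backward transform obtained purely by swapping the real and imaginary pointers. Both function names are hypothetical, and the sketch handles out-of-place data only.

```c
#include <assert.h>

/* Naive forward DFT of length 4 on split-format data, out-of-place.
   is/os are strides in units of double.  Illustration only. */
static void split_dft4_forward(const double *ri, const double *ii,
                               double *ro, double *io, int is, int os)
{
    /* powers of w = exp(-2*pi*i/4) = -i */
    static const double wr[4] = { 1.0, 0.0, -1.0, 0.0 };
    static const double wi[4] = { 0.0, -1.0, 0.0, 1.0 };
    int j, k;

    assert(is > 0 && os > 0);
    for (k = 0; k < 4; ++k) {
        double sr = 0.0, si = 0.0;
        for (j = 0; j < 4; ++j) {
            int m = (j * k) % 4;
            sr += ri[j * is] * wr[m] - ii[j * is] * wi[m];
            si += ri[j * is] * wi[m] + ii[j * is] * wr[m];
        }
        ro[k * os] = sr;
        io[k * os] = si;
    }
}

/* Backward DFT: the forward routine with real and imaginary pointers
   swapped on both input and output. */
static void split_dft4_backward(const double *ri, const double *ii,
                                double *ro, double *io, int is, int os)
{
    split_dft4_forward(ii, ri, io, ro, is, os);
}
```

Applying the forward and then the backward routine to an interleaved array (strides of 2) returns the input scaled by 4, as expected for an unnormalized transform pair.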
2281
2282 
2283 File: fftw3.info, Node: Guru Real-data DFTs, Next: Guru Real-to-real Transforms, Prev: Guru Complex DFTs, Up: Guru Interface
2284
2285 4.5.4 Guru Real-data DFTs
2286 -------------------------
2287
2288 fftw_plan fftw_plan_guru_dft_r2c(
2289 int rank, const fftw_iodim *dims,
2290 int howmany_rank, const fftw_iodim *howmany_dims,
2291 double *in, fftw_complex *out,
2292 unsigned flags);
2293
2294 fftw_plan fftw_plan_guru_split_dft_r2c(
2295 int rank, const fftw_iodim *dims,
2296 int howmany_rank, const fftw_iodim *howmany_dims,
2297 double *in, double *ro, double *io,
2298 unsigned flags);
2299
2300 fftw_plan fftw_plan_guru_dft_c2r(
2301 int rank, const fftw_iodim *dims,
2302 int howmany_rank, const fftw_iodim *howmany_dims,
2303 fftw_complex *in, double *out,
2304 unsigned flags);
2305
2306 fftw_plan fftw_plan_guru_split_dft_c2r(
2307 int rank, const fftw_iodim *dims,
2308 int howmany_rank, const fftw_iodim *howmany_dims,
2309 double *ri, double *ii, double *out,
2310 unsigned flags);
2311
2312 Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT
2313 with transform dimensions given by (`rank', `dims') over a
2314 multi-dimensional vector (loop) of dimensions (`howmany_rank',
2315 `howmany_dims'). `dims' and `howmany_dims' should point to
2316 `fftw_iodim' arrays of length `rank' and `howmany_rank', respectively.
2317 As for the basic and advanced interfaces, an r2c transform is
2318 `FFTW_FORWARD' and a c2r transform is `FFTW_BACKWARD'.
2319
2320 The _last_ dimension of `dims' is interpreted specially: that
2321 dimension of the real array has size `dims[rank-1].n', but that
2322 dimension of the complex array has size `dims[rank-1].n/2+1' (division
2323 rounded down). The strides, on the other hand, are taken to be exactly
2324 as specified. It is up to the user to specify the strides
2325 appropriately for the peculiar dimensions of the data, and we do not
2326 guarantee that the planner will succeed (return non-`NULL') for any
2327 dimensions other than those described in *note Real-data DFT Array
2328 Format:: and generalized in *note Advanced Real-data DFTs::. (That is,
2329 for an in-place transform, each individual dimension should be able to
2330 operate in place.)
2331
2332 `in' and `out' point to the input and output arrays for r2c and c2r
2333 transforms, respectively. For split arrays, `ri' and `ii' point to the
2334 real and imaginary input arrays for a c2r transform, and `ro' and `io'
2335 point to the real and imaginary output arrays for an r2c transform.
2336 `in' and `ro' or `ri' and `out' may be the same, indicating an in-place
2337 transform. (In-place transforms where `in' and `io' or `ii' and `out'
2338 are the same are not currently supported.)
2339
2340 `flags' is a bitwise OR (`|') of zero or more planner flags, as
2341 defined in *note Planner Flags::.
2342
2343 In-place transforms of rank greater than 1 are currently only
2344 supported for interleaved arrays. For split arrays, the planner will
2345 return `NULL'.
2346
2347 
2348 File: fftw3.info, Node: Guru Real-to-real Transforms, Next: 64-bit Guru Interface, Prev: Guru Real-data DFTs, Up: Guru Interface
2349
2350 4.5.5 Guru Real-to-real Transforms
2351 ----------------------------------
2352
2353 fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
2354 int howmany_rank,
2355 const fftw_iodim *howmany_dims,
2356 double *in, double *out,
2357 const fftw_r2r_kind *kind,
2358 unsigned flags);
2359
2360 Plan a real-to-real (r2r) multi-dimensional `FFTW_FORWARD' transform
2361 with transform dimensions given by (`rank', `dims') over a
2362 multi-dimensional vector (loop) of dimensions (`howmany_rank',
2363 `howmany_dims'). `dims' and `howmany_dims' should point to
2364 `fftw_iodim' arrays of length `rank' and `howmany_rank', respectively.
2365
2366 The transform kind of each dimension is given by the `kind'
2367 parameter, which should point to an array of length `rank'. Valid
2368 `fftw_r2r_kind' constants are given in *note Real-to-Real Transform
2369 Kinds::.
2370
2371 `in' and `out' point to the real input and output arrays; they may
2372 be the same, indicating an in-place transform.
2373
2374 `flags' is a bitwise OR (`|') of zero or more planner flags, as
2375 defined in *note Planner Flags::.
2376
2377 
2378 File: fftw3.info, Node: 64-bit Guru Interface, Prev: Guru Real-to-real Transforms, Up: Guru Interface
2379
2380 4.5.6 64-bit Guru Interface
2381 ---------------------------
2382
2383 When compiled in 64-bit mode on a 64-bit architecture (where addresses
2384 are 64 bits wide), FFTW uses 64-bit quantities internally for all
2385 transform sizes, strides, and so on--you don't have to do anything
2386 special to exploit this. However, in the ordinary FFTW interfaces, you
2387 specify the transform size by an `int' quantity, which is normally only
2388 32 bits wide. This means that, even though FFTW is using 64-bit sizes
2389 internally, you cannot specify a single transform dimension larger than
2390 2^31-1 numbers.
2391
2392 We expect that few users will require transforms larger than this,
2393 but, for those who do, we provide a 64-bit version of the guru
2394 interface in which all sizes are specified as integers of type
2395 `ptrdiff_t' instead of `int'. (`ptrdiff_t' is a signed integer type
2396 defined by the C standard to be wide enough to represent address
2397 differences, and thus must be at least 64 bits wide on a 64-bit
2398 machine.) We stress that there is _no performance advantage_ to using
2399 this interface--the same internal FFTW code is employed regardless--and
2400 it is only necessary if you want to specify very large transform sizes.
2401
2402 In particular, the 64-bit guru interface is a set of planner routines
2403 that are exactly the same as the guru planner routines, except that
2404 they are named with `guru64' instead of `guru' and they take arguments
2405 of type `fftw_iodim64' instead of `fftw_iodim'. For example, instead
2406 of `fftw_plan_guru_dft', we have `fftw_plan_guru64_dft'.
2407
2408 fftw_plan fftw_plan_guru64_dft(
2409 int rank, const fftw_iodim64 *dims,
2410 int howmany_rank, const fftw_iodim64 *howmany_dims,
2411 fftw_complex *in, fftw_complex *out,
2412 int sign, unsigned flags);
2413
2414 The `fftw_iodim64' type is similar to `fftw_iodim', with the same
2415 interpretation, except that it uses type `ptrdiff_t' instead of type
2416 `int'.
2417
2418 typedef struct {
2419 ptrdiff_t n;
2420 ptrdiff_t is;
2421 ptrdiff_t os;
2422 } fftw_iodim64;
2423
2424 Every other `fftw_plan_guru' function also has a `fftw_plan_guru64'
2425 equivalent, but we do not repeat their documentation here since they
2426 are identical to the 32-bit versions except as noted above.
2427
2428 
2429 File: fftw3.info, Node: New-array Execute Functions, Next: Wisdom, Prev: Guru Interface, Up: FFTW Reference
2430
2431 4.6 New-array Execute Functions
2432 ===============================
2433
2434 Normally, one executes a plan for the arrays with which the plan was
2435 created, by calling `fftw_execute(plan)' as described in *note Using
2436 Plans::. However, it is possible for sophisticated users to apply a
2437 given plan to a _different_ array using the "new-array execute"
2438 functions detailed below, provided that the following conditions are
2439 met:
2440
2441 * The array size, strides, etcetera are the same (since those are
2442 set by the plan).
2443
2444 * The input and output arrays are the same (in-place) or different
2445 (out-of-place) if the plan was originally created to be in-place or
2446 out-of-place, respectively.
2447
2448 * For split arrays, the separations between the real and imaginary
2449 parts, `ii-ri' and `io-ro', are the same as they were for the
2450 input and output arrays when the plan was created. (This
2451 condition is automatically satisfied for interleaved arrays.)
2452
2453 * The "alignment" of the new input/output arrays is the same as that
2454 of the input/output arrays when the plan was created, unless the
2455 plan was created with the `FFTW_UNALIGNED' flag. Here, the
2456 alignment is a platform-dependent quantity (for example, it is the
2457 address modulo 16 if SSE SIMD instructions are used, but the
2458 address modulo 4 for non-SIMD single-precision FFTW on the same
2459 machine). In general, only arrays allocated with `fftw_malloc'
2460 are guaranteed to be equally aligned (*note SIMD alignment and
2461 fftw_malloc::).
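
Two of the conditions in the list above concern only pointer relationships and can be sketched as simple checks made before calling a new-array execute function. The helper names are hypothetical. The alignment condition is not covered here, since it depends on FFTW's own notion of alignment. Addresses are compared as integers because the arrays may come from separate allocations.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the new arrays are in-place exactly when the arrays used
   at planning time were in-place.  Hypothetical helper. */
static int same_placement(const void *in_plan, const void *out_plan,
                          const void *in_new, const void *out_new)
{
    return (in_plan == out_plan) == (in_new == out_new);
}

/* Nonzero if the separation ii-ri (or io-ro) of a new split array
   matches the separation used at planning time.  Hypothetical helper. */
static int same_split_separation(const double *re_plan, const double *im_plan,
                                 const double *re_new, const double *im_new)
{
    assert(re_plan && im_plan && re_new && im_new);
    return ((uintptr_t) im_plan - (uintptr_t) re_plan)
        == ((uintptr_t) im_new - (uintptr_t) re_new);
}
```

For split transforms, `same_split_separation' would be called once for the input pair and once for the output pair.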
2462
2463
2464 The alignment issue is especially critical, because if you don't use
2465 `fftw_malloc' then you may have little control over the alignment of
2466 arrays in memory. For example, neither the C++ `new' operator nor the
2467 Fortran `allocate' statement provides strong enough guarantees about
2468 data alignment. If you don't use `fftw_malloc', therefore, you
2469 probably have to use `FFTW_UNALIGNED' (which disables most SIMD
2470 support). If possible, it is probably better for you to simply create
2471 multiple plans (creating a new plan is quick once one exists for a
2472 given size), or better yet re-use the same array for your transforms.
2473
2474 For rare circumstances in which you cannot control the alignment of
2475 allocated memory, but wish to determine whether a given array is aligned
2476 like the original array for which a plan was created, you can use the
2477 `fftw_alignment_of' function:
2478 int fftw_alignment_of(double *p);
2479 Two arrays have equivalent alignment (for the purposes of applying a
2480 plan) if and only if `fftw_alignment_of' returns the same value for the
2481 corresponding pointers to their data (typecast to `double*' if
2482 necessary).
2483
2484 If you are tempted to use the new-array execute interface because you
2485 want to transform a known bunch of arrays of the same size, you should
2486 probably use the advanced interface instead (*note Advanced
2487 Interface::).
2488
2489 The new-array execute functions are:
2490
2491 void fftw_execute_dft(
2492 const fftw_plan p,
2493 fftw_complex *in, fftw_complex *out);
2494
2495 void fftw_execute_split_dft(
2496 const fftw_plan p,
2497 double *ri, double *ii, double *ro, double *io);
2498
2499 void fftw_execute_dft_r2c(
2500 const fftw_plan p,
2501 double *in, fftw_complex *out);
2502
2503 void fftw_execute_split_dft_r2c(
2504 const fftw_plan p,
2505 double *in, double *ro, double *io);
2506
2507 void fftw_execute_dft_c2r(
2508 const fftw_plan p,
2509 fftw_complex *in, double *out);
2510
2511 void fftw_execute_split_dft_c2r(
2512 const fftw_plan p,
2513 double *ri, double *ii, double *out);
2514
2515 void fftw_execute_r2r(
2516 const fftw_plan p,
2517 double *in, double *out);
2518
2519 These execute the `plan' to compute the corresponding transform on
2520 the input/output arrays specified by the subsequent arguments. The
2521 input/output array arguments have the same meanings as the ones passed
2522 to the guru planner routines in the preceding sections. The `plan' is
2523 not modified, and these routines can be called as many times as
2524 desired, or intermixed with calls to the ordinary `fftw_execute'.
2525
2526 The `plan' _must_ have been created for the transform type
2527 corresponding to the execute function, e.g. it must be a complex-DFT
2528 plan for `fftw_execute_dft'. Any of the planner routines for that
2529 transform type, from the basic to the guru interface, could have been
2530 used to create the plan, however.
2531
2532 
2533 File: fftw3.info, Node: Wisdom, Next: What FFTW Really Computes, Prev: New-array Execute Functions, Up: FFTW Reference
2534
2535 4.7 Wisdom
2536 ==========
2537
2538 This section documents the FFTW mechanism for saving and restoring
2539 plans from disk. This mechanism is called "wisdom".
2540
2541 * Menu:
2542
2543 * Wisdom Export::
2544 * Wisdom Import::
2545 * Forgetting Wisdom::
2546 * Wisdom Utilities::
2547
2548 
2549 File: fftw3.info, Node: Wisdom Export, Next: Wisdom Import, Prev: Wisdom, Up: Wisdom
2550
2551 4.7.1 Wisdom Export
2552 -------------------
2553
2554 int fftw_export_wisdom_to_filename(const char *filename);
2555 void fftw_export_wisdom_to_file(FILE *output_file);
2556 char *fftw_export_wisdom_to_string(void);
2557 void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);
2558
2559 These functions allow you to export all currently accumulated wisdom
2560 in a form from which it can be later imported and restored, even during
2561 a separate run of the program. (*Note Words of Wisdom-Saving Plans::.)
2562 The current store of wisdom is not affected by calling any of these
2563 routines.
2564
2565 `fftw_export_wisdom' exports the wisdom to any output medium, as
2566 specified by the callback function `write_char'. `write_char' is a
2567 `putc'-like function that writes the character `c' to some output; its
2568 second parameter is the `data' pointer passed to `fftw_export_wisdom'.
2569 For convenience, the following three "wrapper" routines are provided:
2570
2571 `fftw_export_wisdom_to_filename' writes wisdom to a file named
2572 `filename' (which is created or overwritten), returning `1' on success
2573 and `0' on failure. A lower-level function, which requires you to open
2574 and close the file yourself (e.g. if you want to write wisdom to a
2575 portion of a larger file) is `fftw_export_wisdom_to_file'. This writes
2576 the wisdom to the current position in `output_file', which should be
2577 open with write permission; upon exit, the file remains open and is
2578 positioned at the end of the wisdom data.
2579
2580 `fftw_export_wisdom_to_string' returns a pointer to a
2581 NUL-terminated string holding the wisdom data. This string is
2582 dynamically allocated, and it is the responsibility of the caller to
2583 deallocate it with `free' when it is no longer needed.
2584
2585 All of these routines export the wisdom in the same format, which we
2586 will not document here except to say that it is LISP-like ASCII text
2587 that is insensitive to white space.
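
As a sketch of a `write_char' callback, the following function appends each character to a growable in-memory buffer. The `char_buf' type and the name `buf_write_char' are hypothetical. Only the callback signature is taken from the prototype of `fftw_export_wisdom' above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer, kept '\0'-terminated at all times. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
    int failed;     /* set if an allocation failed */
} char_buf;

/* Has the signature expected of write_char: void (*)(char, void *). */
static void buf_write_char(char c, void *data)
{
    char_buf *b = (char_buf *) data;

    assert(b != NULL);
    if (b->failed)
        return;
    if (b->len + 1 >= b->cap) {          /* keep room for the '\0' */
        size_t ncap = b->cap ? 2 * b->cap : 64;
        char *p = (char *) realloc(b->data, ncap);
        if (p == NULL) {
            b->failed = 1;
            return;
        }
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
}
```

With a zero-initialized `char_buf b', the call would be `fftw_export_wisdom(buf_write_char, &b)', after which `b.data' holds the wisdom text and should eventually be released with `free'.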
2588
2589 
2590 File: fftw3.info, Node: Wisdom Import, Next: Forgetting Wisdom, Prev: Wisdom Export, Up: Wisdom
2591
2592 4.7.2 Wisdom Import
2593 -------------------
2594
2595 int fftw_import_system_wisdom(void);
2596 int fftw_import_wisdom_from_filename(const char *filename);
2597 int fftw_import_wisdom_from_string(const char *input_string);
2598 int fftw_import_wisdom(int (*read_char)(void *), void *data);
2599
2600 These functions import wisdom into a program from data stored by the
2601 `fftw_export_wisdom' functions above. (*Note Words of Wisdom-Saving
2602 Plans::.) The imported wisdom replaces any wisdom already accumulated
2603 by the running program.
2604
2605 `fftw_import_wisdom' imports wisdom from any input medium, as
2606 specified by the callback function `read_char'. `read_char' is a
2607 `getc'-like function that returns the next character in the input; its
2608 parameter is the `data' pointer passed to `fftw_import_wisdom'. If the
2609 end of the input data is reached (which should never happen for valid
2610 data), `read_char' should return `EOF' (as defined in `<stdio.h>').
2611 For convenience, the following three "wrapper" routines are provided:
2612
2613 `fftw_import_wisdom_from_filename' reads wisdom from a file named
2614 `filename'. A lower-level function, which requires you to open and
2615 close the file yourself (e.g. if you want to read wisdom from a portion
2616 of a larger file) is `fftw_import_wisdom_from_file'. This reads wisdom
2617 from the current position in `input_file' (which should be open with
2618 read permission); upon exit, the file remains open, but the position of
2619 the read pointer is unspecified.
2620
2621 `fftw_import_wisdom_from_string' reads wisdom from the
2622 NUL-terminated string `input_string'.
2623
2624 `fftw_import_system_wisdom' reads wisdom from an
2625 implementation-defined standard file (`/etc/fftw/wisdom' on Unix and
2626 GNU systems).
2627
2628 The return value of these import routines is `1' if the wisdom was
2629 read successfully and `0' otherwise. Note that, in all of these
2630 functions, any data in the input stream past the end of the wisdom data
2631 is simply ignored.
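
A `read_char' callback can be sketched in the same spirit. The one below reads from a block of memory and returns `EOF' once the data is exhausted. The `str_reader' type and the name `str_read_char' are hypothetical. Only the callback signature comes from the prototype of `fftw_import_wisdom' above.

```c
#include <assert.h>
#include <stdio.h>      /* EOF, size_t */

/* Hypothetical cursor over a block of characters. */
typedef struct {
    const char *s;
    size_t pos;
    size_t len;
} str_reader;

/* Has the signature expected of read_char: int (*)(void *). */
static int str_read_char(void *data)
{
    str_reader *r = (str_reader *) data;

    assert(r != NULL);
    if (r->pos >= r->len)
        return EOF;
    /* go through unsigned char, as getc does, so that no valid
       character can be mistaken for EOF */
    return (unsigned char) r->s[r->pos++];
}
```

Given a `str_reader r' describing previously exported wisdom text, the call would be `fftw_import_wisdom(str_read_char, &r)', with the return value checked as described above.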
2632
2633 
2634 File: fftw3.info, Node: Forgetting Wisdom, Next: Wisdom Utilities, Prev: Wisdom Import, Up: Wisdom
2635
2636 4.7.3 Forgetting Wisdom
2637 -----------------------
2638
2639 void fftw_forget_wisdom(void);
2640
2641 Calling `fftw_forget_wisdom' causes all accumulated `wisdom' to be
2642 discarded and its associated memory to be freed. (New `wisdom' can
2643 still be gathered subsequently, however.)
2644
2645 
2646 File: fftw3.info, Node: Wisdom Utilities, Prev: Forgetting Wisdom, Up: Wisdom
2647
2648 4.7.4 Wisdom Utilities
2649 ----------------------
2650
2651 FFTW includes two standalone utility programs that deal with wisdom. We
2652 merely summarize them here, since they come with their own `man' pages
2653 for Unix and GNU systems (with HTML versions on our web site).
2654
2655 The first program is `fftw-wisdom' (or `fftwf-wisdom' in single
2656 precision, etcetera), which can be used to create a wisdom file
2657 containing plans for any of the transform sizes and types supported by
2658 FFTW. It is preferable to create wisdom directly from your executable
2659 (*note Caveats in Using Wisdom::), but this program is useful for
2660 creating global wisdom files for `fftw_import_system_wisdom'.
2661
2662 The second program is `fftw-wisdom-to-conf', which takes a wisdom
2663 file as input and produces a "configuration routine" as output. The
2664 latter is a C subroutine that you can compile and link into your
2665 program, replacing a routine of the same name in the FFTW library, that
2666 determines which parts of FFTW are callable by your program.
2667 `fftw-wisdom-to-conf' produces a configuration routine that links to
2668 only those parts of FFTW needed by the saved plans in the wisdom,
2669 greatly reducing the size of statically linked executables (which should
2670 only attempt to create plans corresponding to those in the wisdom,
2671 however).
2672
2673 
2674 File: fftw3.info, Node: What FFTW Really Computes, Prev: Wisdom, Up: FFTW Reference
2675
2676 4.8 What FFTW Really Computes
2677 =============================
2678
2679 In this section, we provide precise mathematical definitions for the
2680 transforms that FFTW computes. These transform definitions are fairly
2681 standard, but some authors follow slightly different conventions for the
2682 normalization of the transform (the constant factor in front) and the
2683 sign of the complex exponent. We begin by presenting the
2684 one-dimensional (1d) transform definitions, and then give the
2685 straightforward extension to multi-dimensional transforms.
2686
2687 * Menu:
2688
2689 * The 1d Discrete Fourier Transform (DFT)::
2690 * The 1d Real-data DFT::
2691 * 1d Real-even DFTs (DCTs)::
2692 * 1d Real-odd DFTs (DSTs)::
2693 * 1d Discrete Hartley Transforms (DHTs)::
2694 * Multi-dimensional Transforms::
2695
2696 
2697 File: fftw3.info, Node: The 1d Discrete Fourier Transform (DFT), Next: The 1d Real-data DFT, Prev: What FFTW Really Computes, Up: What FFTW Really Computes
2698
2699 4.8.1 The 1d Discrete Fourier Transform (DFT)
2700 ---------------------------------------------
2701
2702 The forward (`FFTW_FORWARD') discrete Fourier transform (DFT) of a 1d
2703 complex array X of size n computes an array Y, where: Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n).
2704 The backward (`FFTW_BACKWARD') DFT computes: Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n).
2705 FFTW computes an unnormalized transform, in that there is no
2706 coefficient in front of the summation in the DFT. In other words,
2707 applying the forward and then the backward transform will multiply the
2708 input by n.
2709
2710 From above, an `FFTW_FORWARD' transform corresponds to a sign of -1
2711 in the exponent of the DFT. Note also that we use the standard
2712 "in-order" output ordering--the k-th output corresponds to the
2713 frequency k/n (or k/T, where T is your total sampling period). For
2714 those who like to think in terms of positive and negative frequencies,
2715 this means that the positive frequencies are stored in the first half
2716 of the output and the negative frequencies are stored in backwards
2717 order in the second half of the output. (The frequency -k/n is the
2718 same as the frequency (n-k)/n.)
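
For readers who prefer signed frequencies, the in-order output layout
can be translated with a one-line helper. This is a sketch only:
`signed_frequency_index' is a hypothetical name, and treating the
k = n/2 output (for even n) as a positive frequency is a convention
chosen for this example, since the two signs coincide there.

```c
/* Map an output index k (0 <= k < n) to a signed frequency index f,
   so that the frequency is f/n.  Indices above n/2 are reported as
   negative frequencies, using the identity -k/n == (n-k)/n.  For even
   n the index k = n/2 is kept positive (a choice made for this sketch). */
int signed_frequency_index(int k, int n)
{
    return (k <= n / 2) ? k : k - n;
}
```

For n = 8, for instance, outputs 0 through 4 map to themselves and
outputs 5, 6, 7 map to -3, -2, -1.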
2719
2720 
2721 File: fftw3.info, Node: The 1d Real-data DFT, Next: 1d Real-even DFTs (DCTs), Prev: The 1d Discrete Fourier Transform (DFT), Up: What FFTW Really Computes
2722
2723 4.8.2 The 1d Real-data DFT
2724 --------------------------
2725
2726 The real-input (r2c) DFT in FFTW computes the _forward_ transform Y of
2727 the size `n' real array X, exactly as defined above, i.e. Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n).
2728 This output array Y can easily be shown to possess the "Hermitian"
2729 symmetry Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] =
2730 Y[0].
2731
2732 As a result of this symmetry, half of the output Y is redundant
2733 (being the complex conjugate of the other half), and so the 1d r2c
2734 transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
2735 where the division by 2 is rounded down.
2736
2737 Moreover, the Hermitian symmetry implies that Y[0] and, if n is
2738 even, the Y[n/2] element, are purely real. So, for the `R2HC' r2r
2739 transform, the (zero) imaginary parts of these elements are not stored
2740 in the halfcomplex output format.
2741
2742 The c2r and `HC2R' r2r transforms compute the backward DFT of the
2743 _complex_ array X with Hermitian symmetry, stored in the r2c/`R2HC'
2744 output formats, respectively, where the backward transform is defined
2745 exactly as for the complex case: Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n).
2746 The outputs `Y' of this transform can easily be seen to be purely
2747 real, and are stored as an array of real numbers.
2748
2749 Like FFTW's complex DFT, these transforms are unnormalized. In other
2750 words, applying the real-to-complex (forward) and then the
2751 complex-to-real (backward) transform will multiply the input by n.
2752
2753 
2754 File: fftw3.info, Node: 1d Real-even DFTs (DCTs), Next: 1d Real-odd DFTs (DSTs), Prev: The 1d Real-data DFT, Up: What FFTW Really Computes
2755
2756 4.8.3 1d Real-even DFTs (DCTs)
2757 ------------------------------
2758
2759 The real-even symmetry DFTs in FFTW are exactly equivalent to the
2760 unnormalized forward (and backward) DFTs as defined above, where the
2761 input array X of length N is purely real and also has "even" symmetry.
2762 In this case, the output array is likewise real and even.
2763
2764 For the case of `REDFT00', this even symmetry means that X[j] =
2765 X[N-j], where we take X to be periodic so that X[N] = X[0]. Because of
2766 this redundancy, only the first n real numbers are actually stored,
2767 where N = 2(n-1).
2768
2769 The proper definition of even symmetry for `REDFT10', `REDFT01', and
2770 `REDFT11' transforms is somewhat more intricate because of the shifts
2771 by 1/2 of the input and/or output, although the corresponding boundary
2772 conditions are given in *note Real even/odd DFTs (cosine/sine
2773 transforms)::. Because of the even symmetry, however, the sine terms
2774 in the DFT all cancel and the remaining cosine terms are written
2775 explicitly below. This formulation often leads people to call such a
2776 transform a "discrete cosine transform" (DCT), although it is really
2777 just a special case of the DFT.
2778
2779 In each of the definitions below, we transform a real array X of
2780 length n to a real array Y of length n:
2781
2782 REDFT00 (DCT-I)
2783 ...............
2784
2785 An `REDFT00' transform (type-I DCT) in FFTW is defined by: Y[k] = X[0]
2786 + (-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))).
2787 Note that this transform is not defined for n=1. For n=2, the
2788 summation term above is dropped as you might expect.
2789
2790 REDFT10 (DCT-II)
2791 ................
2792
2793 An `REDFT10' transform (type-II DCT, sometimes called "the" DCT) in
2794 FFTW is defined by: Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi
2795 (j+1/2) k / n)).
2796
2797 REDFT01 (DCT-III)
2798 .................
2799
2800 An `REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] =
2801 X[0] + 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)). In the
2802 case of n=1, this reduces to Y[0] = X[0]. Up to a scale factor (see
2803 below), this is the inverse of `REDFT10' ("the" DCT), and so the
2804 `REDFT01' (DCT-III) is sometimes called the "IDCT".
2805
2806 REDFT11 (DCT-IV)
2807 ................
2808
2809 An `REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
2810 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).
2811
2812 Inverses and Normalization
2813 ..........................
2814
2815 These definitions correspond directly to the unnormalized DFTs used
2816 elsewhere in FFTW (hence the factors of 2 in front of the summations).
2817 The unnormalized inverse of `REDFT00' is `REDFT00', of `REDFT10' is
2818 `REDFT01' and vice versa, and of `REDFT11' is `REDFT11'. Each
2819 unnormalized inverse results in the original array multiplied by N,
2820 where N is the _logical_ DFT size. For `REDFT00', N=2(n-1) (note that
2821 n=1 is not defined); otherwise, N=2n.
2822
2823 In defining the discrete cosine transform, some authors also include
2824 additional factors of sqrt(2) (or its inverse) multiplying selected
2825 inputs and/or outputs. This is a mostly cosmetic change that makes the
2826 transform orthogonal, but sacrifices the direct equivalence to a
2827 symmetric DFT.
2828
2829 
2830 File: fftw3.info, Node: 1d Real-odd DFTs (DSTs), Next: 1d Discrete Hartley Transforms (DHTs), Prev: 1d Real-even DFTs (DCTs), Up: What FFTW Really Computes
2831
2832 4.8.4 1d Real-odd DFTs (DSTs)
2833 -----------------------------
2834
2835 The real-odd symmetry DFTs in FFTW are exactly equivalent to the
2836 unnormalized forward (and backward) DFTs as defined above, where the
2837 input array X of length N is purely real and also has "odd" symmetry. In
2838 this case, the output is odd and purely imaginary.
2839
2840 For the case of `RODFT00', this odd symmetry means that X[j] =
2841 -X[N-j], where we take X to be periodic so that X[N] = X[0]. Because
2842 of this redundancy, only the first n real numbers starting at j=1 are
2843 actually stored (the j=0 element is zero), where N = 2(n+1).
2844
2845 The proper definition of odd symmetry for `RODFT10', `RODFT01', and
2846 `RODFT11' transforms is somewhat more intricate because of the shifts
2847 by 1/2 of the input and/or output, although the corresponding boundary
2848 conditions are given in *note Real even/odd DFTs (cosine/sine
2849 transforms)::. Because of the odd symmetry, however, the cosine terms
2850 in the DFT all cancel and the remaining sine terms are written
2851 explicitly below. This formulation often leads people to call such a
2852 transform a "discrete sine transform" (DST), although it is really just
2853 a special case of the DFT.
2854
2855 In each of the definitions below, we transform a real array X of
2856 length n to a real array Y of length n:
2857
2858 RODFT00 (DST-I)
2859 ...............
2860
2861 An `RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2
2862 (sum for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).
2863
2864 RODFT10 (DST-II)
2865 ................
2866
2867 An `RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
2868 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).
2869
2870 RODFT01 (DST-III)
2871 .................
2872
2873 An `RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
2874 (-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
2875 n)). In the case of n=1, this reduces to Y[0] = X[0].
2876
2877 RODFT11 (DST-IV)
2878 ................
2879
2880 An `RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
2881 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).
2882
2883 Inverses and Normalization
2884 ..........................
2885
2886 These definitions correspond directly to the unnormalized DFTs used
2887 elsewhere in FFTW (hence the factors of 2 in front of the summations).
2888 The unnormalized inverse of `RODFT00' is `RODFT00', of `RODFT10' is
2889 `RODFT01' and vice versa, and of `RODFT11' is `RODFT11'. Each
2890 unnormalized inverse results in the original array multiplied by N,
2891 where N is the _logical_ DFT size. For `RODFT00', N=2(n+1); otherwise,
2892 N=2n.
2893
2894 In defining the discrete sine transform, some authors also include
2895 additional factors of sqrt(2) (or its inverse) multiplying selected
2896 inputs and/or outputs. This is a mostly cosmetic change that makes the
2897 transform orthogonal, but sacrifices the direct equivalence to an
2898 antisymmetric DFT.
2899
2900 
2901 File: fftw3.info, Node: 1d Discrete Hartley Transforms (DHTs), Next: Multi-dimensional Transforms, Prev: 1d Real-odd DFTs (DSTs), Up: What FFTW Really Computes
2902
2903 4.8.5 1d Discrete Hartley Transforms (DHTs)
2904 -------------------------------------------
2905
2906 The discrete Hartley transform (DHT) of a 1d real array X of size n
2907 computes a real array Y of the same size, where: Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].
2908 FFTW computes an unnormalized transform, in that there is no
2909 coefficient in front of the summation in the DHT. In other words,
2910 applying the transform twice (the DHT is its own inverse) will multiply
2911 the input by n.
2912
2913 
2914 File: fftw3.info, Node: Multi-dimensional Transforms, Prev: 1d Discrete Hartley Transforms (DHTs), Up: What FFTW Really Computes
2915
2916 4.8.6 Multi-dimensional Transforms
2917 ----------------------------------
2918
2919 The multi-dimensional transforms of FFTW, in general, compute simply the
2920 separable product of the given 1d transform along each dimension of the
2921 array. Since each of these transforms is unnormalized, computing the
2922 forward followed by the backward/inverse multi-dimensional transform
2923 will result in the original array scaled by the product of the
2924 normalization factors for each dimension (e.g. the product of the
2925 dimension sizes, for a multi-dimensional DFT).
2926
2927 The definition of FFTW's multi-dimensional DFT of real data (r2c)
2928 deserves special attention. In this case, we logically compute the full
2929 multi-dimensional DFT of the input data; since the input data are purely
2930 real, the output data have the Hermitian symmetry and therefore only one
2931 non-redundant half need be stored. More specifically, for an n[0] x
2932 n[1] x n[2] x ... x n[d-1] multi-dimensional real-input DFT, the full
2933 (logical) complex output array Y[k[0], k[1], ..., k[d-1]] has the
2934 symmetry: Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ...,
2935 n[d-1] - k[d-1]]* (where each dimension is periodic). Because of this
2936 symmetry, we only store the k[d-1] = 0...n[d-1]/2 elements of the
2937 _last_ dimension (division by 2 is rounded down). (We could instead
2938 have cut any other dimension in half, but the last dimension proved
2939 computationally convenient.) This results in the peculiar array format
2940 described in more detail by *note Real-data DFT Array Format::.
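
The storage rule for the multi-dimensional r2c output can be written as
a small helper that counts the stored complex numbers. This is a sketch
only; `r2c_output_count' is a hypothetical name, and `rank' plays the
role of d in the text.

```c
#include <stddef.h>

/* Number of complex outputs stored for an n[0] x ... x n[rank-1]
   real-input DFT: every dimension keeps its full size except the last,
   which is cut to n[rank-1]/2 + 1 (division rounded down). */
size_t r2c_output_count(int rank, const int *n)
{
    size_t count = 1;
    for (int i = 0; i < rank; ++i)
        count *= (size_t) ((i == rank - 1) ? n[i] / 2 + 1 : n[i]);
    return count;
}
```

For a 4 x 6 real input, for instance, this gives 4 * (6/2 + 1) = 16
complex outputs instead of the 24 of the full logical array.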
2941
2942 The multi-dimensional c2r transform is simply the unnormalized
2943 inverse of the r2c transform, i.e. it is the same as FFTW's complex
2944 backward multi-dimensional DFT, operating on a Hermitian input array in
2945 the peculiar format mentioned above and outputting a real array (since
2946 the DFT output is purely real).
2947
2948 We should remind the user that the separable product of 1d transforms
2949 along each dimension, as computed by FFTW, is not always the same thing
2950 as the usual multi-dimensional transform. A multi-dimensional `R2HC'
2951 (or `HC2R') transform is not identical to the multi-dimensional DFT,
2952 requiring some post-processing to combine the requisite real and
2953 imaginary parts, as was described in *note The Halfcomplex-format
2954 DFT::. Likewise, FFTW's multidimensional `FFTW_DHT' r2r transform is
2955 not the same thing as the logical multi-dimensional discrete Hartley
2956 transform defined in the literature, as discussed in *note The Discrete
2957 Hartley Transform::.
2958
2959 
2960 File: fftw3.info, Node: Multi-threaded FFTW, Next: Distributed-memory FFTW with MPI, Prev: FFTW Reference, Up: Top
2961
2962 5 Multi-threaded FFTW
2963 *********************
2964
2965 In this chapter we document the parallel FFTW routines for
2966 shared-memory parallel hardware. These routines, which support
2967 parallel one- and multi-dimensional transforms of both real and complex
2968 data, are the easiest way to take advantage of multiple processors with
2969 FFTW. They work just like the corresponding uniprocessor transform
2970 routines, except that you have an extra initialization routine to call,
2971 and there is a routine to set the number of threads to employ. Any
2972 program that uses the uniprocessor FFTW can therefore be trivially
2973 modified to use the multi-threaded FFTW.
2974
2975 A shared-memory machine is one in which all CPUs can directly access
2976 the same main memory, and such machines are now common due to the
2977 ubiquity of multi-core CPUs. FFTW's multi-threading support allows you
2978 to utilize these additional CPUs transparently from a single program.
2979 However, this does not necessarily translate into performance
2980 gains--when multiple threads/CPUs are employed, there is an overhead
2981 required for synchronization that may outweigh the computational
2982 parallelism. Therefore, you can only benefit from threads if your
2983 problem is sufficiently large.
2984
2985 * Menu:
2986
2987 * Installation and Supported Hardware/Software::
2988 * Usage of Multi-threaded FFTW::
2989 * How Many Threads to Use?::
2990 * Thread safety::
2991
2992 
2993 File: fftw3.info, Node: Installation and Supported Hardware/Software, Next: Usage of Multi-threaded FFTW, Prev: Multi-threaded FFTW, Up: Multi-threaded FFTW
2994
2995 5.1 Installation and Supported Hardware/Software
2996 ================================================
2997
2998 All of the FFTW threads code is located in the `threads' subdirectory
2999 of the FFTW package. On Unix systems, the FFTW threads libraries and
3000 header files can be automatically configured, compiled, and installed
3001 along with the uniprocessor FFTW libraries simply by including
3002 `--enable-threads' in the flags to the `configure' script (*note
3003 Installation on Unix::), or `--enable-openmp' to use OpenMP
3004 (http://www.openmp.org) threads.
3005
3006 The threads routines require your operating system to have some sort
3007 of shared-memory threads support. Specifically, the FFTW threads
3008 package works with POSIX threads (available on most Unix variants, from
3009 GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which are
3010 supported in many common compilers (e.g. gcc) are also supported, and
3011 may give better performance on some systems. (OpenMP threads are also
3012 useful if you are employing OpenMP in your own code, in order to
3013 minimize conflicts between threading models.) If you have a
3014 shared-memory machine that uses a different threads API, it should be a
3015 simple matter of programming to include support for it; see the file
3016 `threads/threads.c' for more detail.
3017
3018 You can compile FFTW with _both_ `--enable-threads' and
3019 `--enable-openmp' at the same time, since they install libraries with
3020 different names (`fftw3_threads' and `fftw3_omp', as described below).
3021 However, your programs may only link to _one_ of these two libraries at
3022 a time.
3023
3024 Ideally, of course, you should also have multiple processors in
3025 order to get any benefit from the threaded transforms.
3026
3027 
3028 File: fftw3.info, Node: Usage of Multi-threaded FFTW, Next: How Many Threads to Use?, Prev: Installation and Supported Hardware/Software, Up: Multi-threaded FFTW
3029
3030 5.2 Usage of Multi-threaded FFTW
3031 ================================
3032
3033 Here, it is assumed that the reader is already familiar with the usage
3034 of the uniprocessor FFTW routines, described elsewhere in this manual.
3035 We only describe what one has to change in order to use the
3036 multi-threaded routines.
3037
3038 First, programs using the parallel transforms should be
3039 linked with `-lfftw3_threads -lfftw3 -lm' on Unix, or `-lfftw3_omp
3040 -lfftw3 -lm' if you compiled with OpenMP. You will also need to link
3041 with whatever library is responsible for threads on your system (e.g.
3042 `-lpthread' on GNU/Linux) or include whatever compiler flag enables
3043 OpenMP (e.g. `-fopenmp' with gcc).
3044
3045 Second, before calling _any_ FFTW routines, you should call the
3046 function:
3047
3048 int fftw_init_threads(void);
3049
3050 This function, which need only be called once, performs any one-time
3051 initialization required to use threads on your system. It returns zero
3052 if there was some error (which should not happen under normal
3053 circumstances) and a non-zero value otherwise.
3054
3055 Third, before creating a plan that you want to parallelize, you
3056 should call:
3057
3058 void fftw_plan_with_nthreads(int nthreads);
3059
3060 The `nthreads' argument indicates the number of threads you want
3061 FFTW to use (or actually, the maximum number). All plans subsequently
3062 created with any planner routine will use that many threads. You can
3063 call `fftw_plan_with_nthreads', create some plans, call
3064 `fftw_plan_with_nthreads' again with a different argument, and create
3065 some more plans for a new number of threads. Plans already created
3066 before a call to `fftw_plan_with_nthreads' are unaffected. If you pass
3067 an `nthreads' argument of `1' (the default), threads are disabled for
3068 subsequent plans.
3069
3070 With OpenMP, to configure FFTW to use all of the currently running
3071 OpenMP threads (set by `omp_set_num_threads(nthreads)' or by the
3072 `OMP_NUM_THREADS' environment variable), you can do:
3073 `fftw_plan_with_nthreads(omp_get_max_threads())'. (The `omp_' OpenMP
3074 functions are declared via `#include <omp.h>'.)
3075
3076 Given a plan, you then execute it as usual with
3077 `fftw_execute(plan)', and the execution will use the number of threads
3078 specified when the plan was created. When done, you destroy it as
3079 usual with `fftw_destroy_plan'. As described in *note Thread safety::,
3080 plan _execution_ is thread-safe, but plan creation and destruction are
3081 _not_: you should create/destroy plans only from a single thread, but
3082 can safely execute multiple plans in parallel.
3083
3084 There is one additional routine: if you want to get rid of all memory
3085 and other resources allocated internally by FFTW, you can call:
3086
3087 void fftw_cleanup_threads(void);
3088
3089 which is much like the `fftw_cleanup()' function except that it also
3090 gets rid of threads-related data. You must _not_ execute any
3091 previously created plans after calling this function.
3092
3093 We should also mention one other restriction: if you save wisdom
3094 from a program using the multi-threaded FFTW, that wisdom _cannot be
3095 used_ by a program using only the single-threaded FFTW (i.e. not calling
3096 `fftw_init_threads'). *Note Words of Wisdom-Saving Plans::.
3097
3098 
3099 File: fftw3.info, Node: How Many Threads to Use?, Next: Thread safety, Prev: Usage of Multi-threaded FFTW, Up: Multi-threaded FFTW
3100
3101 5.3 How Many Threads to Use?
3102 ============================
3103
3104 There is a fair amount of overhead involved in synchronizing threads,
3105 so the optimal number of threads to use depends upon the size of the
3106 transform as well as on the number of processors you have.
3107
3108 As a general rule, you don't want to use more threads than you have
3109 processors. (Using more threads will work, but there will be extra
3110 overhead with no benefit.) In fact, if the problem size is too small,
3111 you may want to use fewer threads than you have processors.
3112
3113 You will have to experiment with your system to see what level of
3114 parallelization is best for your problem size. Typically, the problem
3115 will have to involve at least a few thousand data points before threads
3116 become beneficial. If you plan with `FFTW_PATIENT', it will
3117 automatically disable threads for sizes that don't benefit from
3118 parallelization.
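
As a starting point for the experiments suggested above, the two rules
of thumb in this section can be written down as a small helper. This is
a hypothetical sketch, not an FFTW routine: the name `choose_nthreads'
and the 4096-point threshold are assumptions standing in for "a few
thousand data points", and the right cutoff for a given machine has to
be measured.

```c
/* Hypothetical helper, not part of FFTW: pick a thread count from the
   problem size and the number of processors. */
int choose_nthreads(long npoints, int nprocessors)
{
    /* Illustrative placeholder for "a few thousand data points";
       this is not a constant defined by FFTW. */
    const long min_points_for_threads = 4096;

    /* Small problems (or an unknown processor count): stay serial. */
    if (nprocessors < 1 || npoints < min_points_for_threads)
        return 1;

    /* Otherwise use at most one thread per processor. */
    return nprocessors;
}
```

The result could then be passed to `fftw_plan_with_nthreads' before
creating a plan, and adjusted after timing real transforms.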
3119
3120 
3121 File: fftw3.info, Node: Thread safety, Prev: How Many Threads to Use?, Up: Multi-threaded FFTW
3122
3123 5.4 Thread safety
3124 =================
3125
3126 Users writing multi-threaded programs (including OpenMP) must concern
3127 themselves with the "thread safety" of the libraries they use--that is,
3128 whether it is safe to call routines in parallel from multiple threads.
3129 FFTW can be used in such an environment, but some care must be taken
3130 because the planner routines share data (e.g. wisdom and trigonometric
3131 tables) between calls and plans.
3132
3133 The upshot is that the only thread-safe (re-entrant) routine in FFTW
3134 is `fftw_execute' (and the new-array variants thereof). All other
3135 routines (e.g. the planner) should only be called from one thread at a
3136 time. So, for example, you can wrap a semaphore lock around any calls
3137 to the planner; even more simply, you can just create all of your plans
3138 from one thread. We do not think this should be an important
3139 restriction (FFTW is designed for the situation where the only
3140 performance-sensitive code is the actual execution of the transform),
3141 and the benefits of shared data between plans are great.
3142
3143 Note also that, since the plan is not modified by `fftw_execute', it
3144 is safe to execute the _same plan_ in parallel by multiple threads.
3145 However, since a given plan operates by default on a fixed array, you
3146 need to use one of the new-array execute functions (*note New-array
3147 Execute Functions::) so that different threads compute the transform of
3148 different data.
3149
3150 (Users should note that these comments only apply to programs using
3151 shared-memory threads or OpenMP. Parallelism using MPI or forked
3152 processes involves a separate address-space and global variables for
3153 each process, and is not susceptible to problems of this sort.)
3154
3155 If you configured FFTW with the `--enable-debug' or
3156 `--enable-debug-malloc' flags (*note Installation on Unix::), then
3157 `fftw_execute' is not thread-safe. These flags are not documented
3158 because they are intended only for developing and debugging FFTW, but
3159 if you must use `--enable-debug' then you should also specifically pass
3160 `--disable-debug-malloc' for `fftw_execute' to be thread-safe.
3161
3162 
3163 File: fftw3.info, Node: Distributed-memory FFTW with MPI, Next: Calling FFTW from Modern Fortran, Prev: Multi-threaded FFTW, Up: Top
3164
3165 6 Distributed-memory FFTW with MPI
3166 **********************************
3167
3168 In this chapter we document the parallel FFTW routines for parallel
3169 systems supporting the MPI message-passing interface. Unlike the
3170 shared-memory threads described in the previous chapter, MPI allows you
3171 to use _distributed-memory_ parallelism, where each CPU has its own
3172 separate memory, and which can scale up to clusters of many thousands
3173 of processors. This capability comes at a price, however: each process
3174 only stores a _portion_ of the data to be transformed, which means that
3175 the data structures and programming interface are quite different from
3176 the serial or threads versions of FFTW.
3177
3178 Distributed-memory parallelism is especially useful when you are
3179 transforming arrays so large that they do not fit into the memory of a
3180 single processor. The storage per-process required by FFTW's MPI
3181 routines is proportional to the total array size divided by the number
3182 of processes. Conversely, distributed-memory parallelism can easily
3183 pose an unacceptably high communications overhead for small problems;
3184 the threshold problem size for which parallelism becomes advantageous
3185 will depend on the precise problem you are interested in, your
3186 hardware, and your MPI implementation.
3187
3188 A note on terminology: in MPI, you divide the data among a set of
3189 "processes" which each run in their own memory address space.
3190 Generally, each process runs on a different physical processor, but
3191 this is not required. A set of processes in MPI is described by an
3192 opaque data structure called a "communicator," the most common of which
3193 is the predefined communicator `MPI_COMM_WORLD' which refers to _all_
3194 processes. For more information on these and other concepts common to
3195 all MPI programs, we refer the reader to the documentation at the MPI
3196 home page (http://www.mcs.anl.gov/research/projects/mpi/).
3197
3198 We assume in this chapter that the reader is familiar with the usage
3199 of the serial (uniprocessor) FFTW, and focus only on the concepts new
3200 to the MPI interface.
3201
3202 * Menu:
3203
3204 * FFTW MPI Installation::
3205 * Linking and Initializing MPI FFTW::
3206 * 2d MPI example::
3207 * MPI Data Distribution::
3208 * Multi-dimensional MPI DFTs of Real Data::
3209 * Other Multi-dimensional Real-data MPI Transforms::
3210 * FFTW MPI Transposes::
3211 * FFTW MPI Wisdom::
3212 * Avoiding MPI Deadlocks::
3213 * FFTW MPI Performance Tips::
3214 * Combining MPI and Threads::
3215 * FFTW MPI Reference::
3216 * FFTW MPI Fortran Interface::
3217
3218 
3219 File: fftw3.info, Node: FFTW MPI Installation, Next: Linking and Initializing MPI FFTW, Prev: Distributed-memory FFTW with MPI, Up: Distributed-memory FFTW with MPI
3220
3221 6.1 FFTW MPI Installation
3222 =========================
3223
3224 All of the FFTW MPI code is located in the `mpi' subdirectory of the
3225 FFTW package. On Unix systems, the FFTW MPI libraries and header files
3226 are automatically configured, compiled, and installed along with the
3227 uniprocessor FFTW libraries simply by including `--enable-mpi' in the
3228 flags to the `configure' script (*note Installation on Unix::).
3229
3230 Any implementation of the MPI standard, version 1 or later, should
3231 work with FFTW. The `configure' script will attempt to automatically
3232 detect how to compile and link code using your MPI implementation. In
3233 some cases, especially if you have multiple different MPI
3234 implementations installed or have an unusual MPI software package, you
3235 may need to provide this information explicitly.
3236
3237 Most commonly, one compiles MPI code by invoking a special compiler
3238 command, typically `mpicc' for C code. The `configure' script knows
3239 the most common names for this command, but you can specify the MPI
3240 compilation command explicitly by setting the `MPICC' variable, as in
3241 `./configure MPICC=mpicc ...'.
3242
3243 If, instead of a special compiler command, you need to link a certain
3244 library, you can specify the link command via the `MPILIBS' variable,
3245 as in `./configure MPILIBS=-lmpi ...'. Note that if your MPI library
3246 is installed in a non-standard location (one the compiler does not know
3247 about by default), you may also have to specify the location of the
3248 library and header files via `LDFLAGS' and `CPPFLAGS' variables,
3249 respectively, as in `./configure LDFLAGS=-L/path/to/mpi/libs
3250 CPPFLAGS=-I/path/to/mpi/include ...'.
3251
3252 
3253 File: fftw3.info, Node: Linking and Initializing MPI FFTW, Next: 2d MPI example, Prev: FFTW MPI Installation, Up: Distributed-memory FFTW with MPI
3254
3255 6.2 Linking and Initializing MPI FFTW
3256 =====================================
3257
3258 Programs using the MPI FFTW routines should be linked with `-lfftw3_mpi
3259 -lfftw3 -lm' on Unix in double precision, `-lfftw3f_mpi -lfftw3f -lm'
3260 in single precision, and so on (*note Precision::). You will also need
3261 to link with whatever library is responsible for MPI on your system; in
3262 most MPI implementations, there is a special compiler alias named
3263 `mpicc' to compile and link MPI code.
3264
3265 Before calling any FFTW routines except possibly `fftw_init_threads'
3266 (*note Combining MPI and Threads::), but after calling `MPI_Init', you
3267 should call the function:
3268
3269 void fftw_mpi_init(void);
3270
3271 If, at the end of your program, you want to get rid of all memory and
3272 other resources allocated internally by FFTW, for both the serial and
3273 MPI routines, you can call:
3274
3275 void fftw_mpi_cleanup(void);
3276
3277 which is much like the `fftw_cleanup()' function except that it also
3278 gets rid of FFTW's MPI-related data. You must _not_ execute any
3279 previously created plans after calling this function.
3280
3281 
3282 File: fftw3.info, Node: 2d MPI example, Next: MPI Data Distribution, Prev: Linking and Initializing MPI FFTW, Up: Distributed-memory FFTW with MPI
3283
3284 6.3 2d MPI example
3285 ==================
3286
3287 Before we document the FFTW MPI interface in detail, we begin with a
3288 simple example outlining how one would perform a two-dimensional `N0'
3289 by `N1' complex DFT.
3290
3291 #include <fftw3-mpi.h>
3292
3293 int main(int argc, char **argv)
3294 {
3295     const ptrdiff_t N0 = ..., N1 = ...;
3296     fftw_plan plan;
3297     fftw_complex *data;
3298     ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
3299
3300     MPI_Init(&argc, &argv);
3301     fftw_mpi_init();
3302
3303     /* get local data size and allocate */
3304     alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
3305                                          &local_n0, &local_0_start);
3306     data = fftw_alloc_complex(alloc_local);
3307
3308     /* create plan for in-place forward DFT */
3309     plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
3310                                 FFTW_FORWARD, FFTW_ESTIMATE);
3311
3312     /* initialize data to some function my_function(x,y) */
3313     for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
3314         data[i*N1 + j] = my_function(local_0_start + i, j);
3315
3316     /* compute transforms, in-place, as many times as desired */
3317     fftw_execute(plan);
3318
3319     fftw_destroy_plan(plan);
3320
3321     MPI_Finalize();
3322 }
3323
3324 As can be seen above, the MPI interface follows the same basic style
3325 of allocate/plan/execute/destroy as the serial FFTW routines. All of
3326 the MPI-specific routines are prefixed with `fftw_mpi_' instead of
3327 `fftw_'. There are a few important differences, however:
3328
3329 First, we must call `fftw_mpi_init()' after calling `MPI_Init'
3330 (required in all MPI programs) and before calling any other `fftw_mpi_'
3331 routine.
3332
3333 Second, when we create the plan with `fftw_mpi_plan_dft_2d',
3334 analogous to `fftw_plan_dft_2d', we pass an additional argument: the
3335 communicator, indicating which processes will participate in the
3336 transform (here `MPI_COMM_WORLD', indicating all processes). Whenever
3337 you create, execute, or destroy a plan for an MPI transform, you must
3338 call the corresponding FFTW routine on _all_ processes in the
3339 communicator for that transform. (That is, these are _collective_
3340 calls.) Note that the plan for the MPI transform uses the standard
3341 `fftw_execute' and `fftw_destroy_plan' routines (on the other hand,
3342 there are MPI-specific new-array execute functions documented below).
3343
3344 Third, all of the FFTW MPI routines take `ptrdiff_t' arguments
3345 instead of `int' as for the serial FFTW. `ptrdiff_t' is a standard C
3346 integer type which is (at least) 32 bits wide on a 32-bit machine and
3347 64 bits wide on a 64-bit machine. This is to make it easy to specify
3348 very large parallel transforms on a 64-bit machine. (You can specify
3349 64-bit transform sizes in the serial FFTW, too, but only by using the
3350 `guru64' planner interface. *Note 64-bit Guru Interface::.)
3351
3352 Fourth, and most importantly, you don't allocate the entire
3353 two-dimensional array on each process. Instead, you call
3354 `fftw_mpi_local_size_2d' to find out what _portion_ of the array
3355 resides on each process, and how much space to allocate. Here, the
3356 portion of the array on each process is a `local_n0' by `N1' slice of
3357 the total array, starting at index `local_0_start'. The total number
3358 of `fftw_complex' numbers to allocate is given by the `alloc_local'
3359 return value, which _may_ be greater than `local_n0 * N1' (in case some
3360 intermediate calculations require additional storage). The data
3361 distribution in FFTW's MPI interface is described in more detail by the
3362 next section.
3363
3364 Given the portion of the array that resides on the local process, it
3365 is straightforward to initialize the data (here to a function
3366 `my_function') and otherwise manipulate it. Of course, at the end of
3367 the program you may want to output the data somehow, but synchronizing
3368 this output is up to you and is beyond the scope of this manual. (One
3369 good way to output a large multi-dimensional distributed array in MPI
3370 to a portable binary file is to use the free HDF5 library; see the HDF
3371 home page (http://www.hdfgroup.org/).)
3372
3373 
3374 File: fftw3.info, Node: MPI Data Distribution, Next: Multi-dimensional MPI DFTs of Real Data, Prev: 2d MPI example, Up: Distributed-memory FFTW with MPI
3375
3376 6.4 MPI Data Distribution
3377 =========================
3378
3379 The most important concept to understand in using FFTW's MPI interface
3380 is the data distribution. With a serial or multithreaded FFT, all of
3381 the inputs and outputs are stored as a single contiguous chunk of
3382 memory. With a distributed-memory FFT, the inputs and outputs are
3383 broken into disjoint blocks, one per process.
3384
3385 In particular, FFTW uses a _1d block distribution_ of the data,
3386 distributed along the _first dimension_. For example, if you want to
3387 perform a 100 x 200 complex DFT, distributed over 4 processes, each
3388 process will get a 25 x 200 slice of the data. That is, process 0
3389 will get rows 0 through 24, process 1 will get rows 25 through 49,
3390 process 2 will get rows 50 through 74, and process 3 will get rows 75
3391 through 99. If you take the same array but distribute it over 3
3392 processes, then it is not evenly divisible so the different processes
3393 will have unequal chunks. FFTW's default choice in this case is to
3394 assign 34 rows to processes 0 and 1, and 32 rows to process 2.
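
The arithmetic behind this example can be sketched in a few lines of C. This is only an illustration, assuming a block size of `n0 / P' rounded up, with blocks handed out in rank order. The helper names `default_block' and `rows_on_rank' are hypothetical and are not part of FFTW; real programs should query `fftw_mpi_local_size_2d' instead.

```c
#include <assert.h>
#include <stddef.h>

/* Block size assumed by this sketch: n0 / nprocs, rounded up. */
static ptrdiff_t default_block(ptrdiff_t n0, int nprocs)
{
    assert(n0 > 0 && nprocs > 0);
    return (n0 + nprocs - 1) / nprocs;
}

/* Number of rows held by process `rank` when rows are dealt out in
   blocks of `block` rows, in rank order. */
static ptrdiff_t rows_on_rank(ptrdiff_t n0, ptrdiff_t block, int rank)
{
    ptrdiff_t left;
    assert(n0 > 0 && block > 0 && rank >= 0);
    left = n0 - (ptrdiff_t)rank * block;   /* rows not yet assigned */
    if (left <= 0)
        return 0;                          /* nothing left for this rank */
    return left < block ? left : block;
}
```

For 100 rows this gives 25 rows on each of 4 processes, and 34, 34, and 32 rows on 3 processes, matching the figures above.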
3395
3396 FFTW provides several `fftw_mpi_local_size' routines that you can
3397 call to find out what portion of an array is stored on the current
3398 process. In most cases, you should use the default block sizes picked
3399 by FFTW, but it is also possible to specify your own block size. For
3400 example, with a 100 x 200 array on three processes, you can tell FFTW
3401 to use a block size of 40, which would assign 40 rows to processes 0
3402 and 1, and 20 rows to process 2. FFTW's default is to divide the data
3403 equally among the processes if possible, and as best it can otherwise.
3404 The rows are always assigned in "rank order," i.e. process 0 gets the
3405 first block of rows, then process 1, and so on. (You can change this
3406 by using `MPI_Comm_split' to create a new communicator with re-ordered
3407 processes.) However, you should always call the `fftw_mpi_local_size'
3408 routines, if possible, rather than trying to predict FFTW's
3409 distribution choices.
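
As a rough illustration of the rank-order rule, the following sketch works out which rows a given process holds for a chosen block size, and which process holds a given row. The type `row_slice' and the functions `slice_for_rank' and `owner_of_row' are hypothetical helpers written for this example, not FFTW routines, and the starting row reported for a process that holds no rows is a choice made by the sketch. In a real program the `fftw_mpi_local_size' routines remain the authority.

```c
#include <assert.h>
#include <stddef.h>

/* Which rows of the first dimension one process holds. */
typedef struct {
    ptrdiff_t local_n0;       /* number of rows on this process */
    ptrdiff_t local_0_start;  /* global index of its first row */
} row_slice;

/* Slice held by `rank` when n0 rows are dealt out in blocks of `block`
   rows in rank order. Illustrative arithmetic only, not an FFTW call. */
static row_slice slice_for_rank(ptrdiff_t n0, ptrdiff_t block, int rank)
{
    row_slice s;
    ptrdiff_t start;
    assert(n0 > 0 && block > 0 && rank >= 0);
    start = (ptrdiff_t)rank * block;
    if (start >= n0) {            /* no rows left for this process */
        s.local_n0 = 0;
        s.local_0_start = n0;     /* sketch's own convention */
    } else {
        s.local_n0 = (n0 - start < block) ? n0 - start : block;
        s.local_0_start = start;
    }
    return s;
}

/* Rank that holds global row `row` under the same distribution. */
static int owner_of_row(ptrdiff_t row, ptrdiff_t block)
{
    assert(row >= 0 && block > 0);
    return (int)(row / block);
}
```

With 100 rows and a block size of 40, ranks 0, 1 and 2 hold 40, 40 and 20 rows starting at rows 0, 40 and 80.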
3410
3411 In particular, it is critical that you allocate the storage size that
3412 is returned by `fftw_mpi_local_size', which is _not_ necessarily the
3413 size of the local slice of the array. The reason is that intermediate
3414 steps of FFTW's algorithms involve transposing the array and
3415 redistributing the data, so at these intermediate steps FFTW may
3416 require more local storage space (albeit always proportional to the
3417 total size divided by the number of processes). The
3418 `fftw_mpi_local_size' functions know how much storage is required for
3419 these intermediate steps and tell you the correct amount to allocate.
3420
3421 * Menu:
3422
3423 * Basic and advanced distribution interfaces::
3424 * Load balancing::
3425 * Transposed distributions::
3426 * One-dimensional distributions::
3427
3428 
3429 File: fftw3.info, Node: Basic and advanced distribution interfaces, Next: Load balancing, Prev: MPI Data Distribution, Up: MPI Data Distribution
3430
3431 6.4.1 Basic and advanced distribution interfaces
3432 ------------------------------------------------
3433
3434 As with the planner interface, the `fftw_mpi_local_size' distribution
3435 interface is broken into basic and advanced (`_many') interfaces, where
3436 the latter allows you to specify the block size manually and also to
3437 request block sizes when computing multiple transforms simultaneously.
3438 These functions are documented more exhaustively by the FFTW MPI
3439 Reference, but we summarize the basic ideas here using a couple of
3440 two-dimensional examples.
3441
3442 For the 100 x 200 complex-DFT example, above, we would find the
3443 distribution by calling the following function in the basic interface:
3444
3445 ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
3446 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
3447
3448 Given the total size of the data to be transformed (here, `n0 = 100'
3449 and `n1 = 200') and an MPI communicator (`comm'), this function
3450 provides three numbers.
3451
3452 First, it describes the shape of the local data: the current process
3453 should store a `local_n0' by `n1' slice of the overall dataset, in
3454 row-major order (`n1' dimension contiguous), starting at index
3455 `local_0_start'. That is, if the total dataset is viewed as a `n0' by
3456 `n1' matrix, the current process should store the rows `local_0_start'
3457 to `local_0_start+local_n0-1'. Obviously, if you are running with only
3458 a single MPI process, that process will store the entire array:
3459 `local_0_start' will be zero and `local_n0' will be `n0'. *Note
3460 Row-major Format::.
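
To make the indexing concrete, here is a small sketch of how a global element `(i, j)' maps to an offset in the local array, given the `local_n0' and `local_0_start' values reported for the current process. The helpers `row_is_local' and `local_offset' are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if global row i lies in this process's slice. */
static int row_is_local(ptrdiff_t i, ptrdiff_t local_n0,
                        ptrdiff_t local_0_start)
{
    return i >= local_0_start && i < local_0_start + local_n0;
}

/* Offset of global element (i, j) within the local row-major array,
   which holds rows local_0_start .. local_0_start + local_n0 - 1. */
static ptrdiff_t local_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t n1,
                              ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(row_is_local(i, local_n0, local_0_start));
    assert(j >= 0 && j < n1);
    return (i - local_0_start) * n1 + j;
}
```

For instance, on the process holding rows 25 through 49 of the 100 x 200 example (`local_n0 = 25', `local_0_start = 25'), element `(49, 199)' sits at offset 4999, the last of the 5000 local elements.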
3461
3462 Second, the return value is the total number of data elements (e.g.,
3463 complex numbers for a complex DFT) that should be allocated for the
3464 input and output arrays on the current process (ideally with
3465 `fftw_malloc' or an `fftw_alloc' function, to ensure optimal
3466 alignment). It might seem that this should always be equal to
3467 `local_n0 * n1', but this is _not_ the case. FFTW's distributed FFT
3468 algorithms require data redistributions at intermediate stages of the
3469 transform, and in some circumstances this may require slightly larger
3470 local storage. This is discussed in more detail below, under *note
3471 Load balancing::.
3472
3473 The advanced-interface `local_size' function for multidimensional
3474 transforms returns the same three things (`local_n0', `local_0_start',
3475 and the total number of elements to allocate), but takes more inputs:
3476
3477 ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
3478 ptrdiff_t howmany,
3479 ptrdiff_t block0,
3480 MPI_Comm comm,
3481 ptrdiff_t *local_n0,
3482 ptrdiff_t *local_0_start);
3483
3484 The two-dimensional case above corresponds to `rnk = 2' and an array
3485 `n' of length 2 with `n[0] = n0' and `n[1] = n1'. This routine is for
3486 any `rnk > 1'; one-dimensional transforms have their own interface
3487 because they work slightly differently, as discussed below.
3488
3489 First, the advanced interface allows you to perform multiple
3490 transforms at once, of interleaved data, as specified by the `howmany'
3491 parameter. (`howmany' is 1 for a single transform.)
3492
3493 Second, here you can specify your desired block size in the `n0'
3494 dimension, `block0'. To use FFTW's default block size, pass
3495 `FFTW_MPI_DEFAULT_BLOCK' (0) for `block0'. Otherwise, on `P'
3496 processes, FFTW will return `local_n0' equal to `block0' on the first
3497 `n0 / block0' processes (rounded down), return `local_n0' equal to `n0 -
3498 block0 * (n0 / block0)' on the next process, and `local_n0' equal to
3499 zero on any remaining processes. In general, we recommend using the
3500 default block size (which corresponds to `n0 / P', rounded up).
3501
3502 For example, suppose you have `P = 4' processes and `n0 = 21'. The
3503 default will be a block size of `6', which will give `local_n0 = 6' on
3504 the first three processes and `local_n0 = 3' on the last process.
3505 Instead, however, you could specify `block0 = 5' if you wanted, which
3506 would give `local_n0 = 5' on processes 0 to 2, `local_n0 = 6' on
3507 process 3. (This choice, while it may look superficially more
3508 "balanced," has the same critical path as FFTW's default but requires
3509 more communications.)
3510
3511 
3512 File: fftw3.info, Node: Load balancing, Next: Transposed distributions, Prev: Basic and advanced distribution interfaces, Up: MPI Data Distribution
3513
3514 6.4.2 Load balancing
3515 --------------------
3516
3517 Ideally, when you parallelize a transform over some P processes, each
3518 process should end up with work that takes equal time. Otherwise, all
3519 of the processes end up waiting on whichever process is slowest. This
3520 goal is known as "load balancing." In this section, we describe the
3521 circumstances under which FFTW is able to load-balance well, and in
3522 particular how you should choose your transform size in order to load
3523 balance.
3524
3525 Load balancing is especially difficult when you are parallelizing
3526 over heterogeneous machines; for example, if one of your processors is
3527 an old 486 and another is a Pentium IV, obviously you should give the
3528 Pentium more work to do than the 486 since the latter is much slower.
3529 FFTW does not deal with this problem, however--it assumes that your
3530 processes run on hardware of comparable speed, and that the goal is
3531 therefore to divide the problem as equally as possible.
3532
3533 For a multi-dimensional complex DFT, FFTW can divide the problem
3534 equally among the processes if: (i) the _first_ dimension `n0' is
3535 divisible by P; and (ii) the _product_ of the subsequent dimensions is
3536 divisible by P. (For the advanced interface, where you can specify
3537 multiple simultaneous transforms via some "vector" length `howmany', a
3538 factor of `howmany' is included in the product of the subsequent
3539 dimensions.)
3540
3541 For a one-dimensional complex DFT, the length `N' of the data should
3542 be divisible by P _squared_ to be able to divide the problem equally
3543 among the processes.
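
These divisibility conditions are easy to test before choosing a transform size. The sketch below simply restates them in C. The function names `divides_equally_nd' and `divides_equally_1d' are hypothetical, and a result of zero only means that the work cannot be split into exactly equal parts, not that the transform will fail.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-dimensional case: n[0] must be divisible by P, and so must
   the product n[1] * ... * n[rnk-1] * howmany. */
static int divides_equally_nd(int rnk, const ptrdiff_t *n,
                              ptrdiff_t howmany, int P)
{
    ptrdiff_t rest_mod_P;
    int d;
    assert(rnk > 1 && howmany > 0 && P > 0);
    /* Track the product modulo P to keep intermediate values small. */
    rest_mod_P = howmany % P;
    for (d = 1; d < rnk; ++d)
        rest_mod_P = (rest_mod_P * (n[d] % P)) % P;
    return n[0] % P == 0 && rest_mod_P == 0;
}

/* One-dimensional case: N must be divisible by P squared. */
static int divides_equally_1d(ptrdiff_t N, int P)
{
    assert(N > 0 && P > 0);
    return N % ((ptrdiff_t)P * P) == 0;
}
```

For the 100 x 200 example, the check succeeds for 4 processes and fails for 3, consistent with the unequal 34/34/32 split described earlier.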
3544
3545 
3546 File: fftw3.info, Node: Transposed distributions, Next: One-dimensional distributions, Prev: Load balancing, Up: MPI Data Distribution
3547
3548 6.4.3 Transposed distributions
3549 ------------------------------
3550
3551 Internally, FFTW's MPI transform algorithms work by first computing
3552 transforms of the data local to each process, then by globally
3553 _transposing_ the data in some fashion to redistribute the data among
3554 the processes, transforming the new data local to each process, and
3555 transposing back. For example, a two-dimensional `n0' by `n1' array,
3556 distributed across the `n0' dimension, is transformed by: (i)
3557 transforming the `n1' dimension, which is local to each process; (ii)
3558 transposing to an `n1' by `n0' array, distributed across the `n1'
3559 dimension; (iii) transforming the `n0' dimension, which is now local to
3560 each process; (iv) transposing back.
3561
3562 However, in many applications it is acceptable to compute a
3563 multidimensional DFT whose results are produced in transposed order
3564 (e.g., `n1' by `n0' in two dimensions). This provides a significant
3565 performance advantage, because it means that the final transposition
3566 step can be omitted. FFTW supports this optimization, which you
3567 specify by passing the flag `FFTW_MPI_TRANSPOSED_OUT' to the planner
3568 routines. To compute the inverse transform of transposed output, you
3569 specify `FFTW_MPI_TRANSPOSED_IN' to tell it that the input is
3570 transposed. In this section, we explain how to interpret the output
3571 format of such a transform.
3572
3573 Suppose you are transforming multi-dimensional data with (at
3574 least two) dimensions n[0] x n[1] x n[2] x ... x n[d-1] . As always,
3575 it is distributed along the first dimension n[0] . Now, if we compute
3576 its DFT with the `FFTW_MPI_TRANSPOSED_OUT' flag, the resulting output
3577 data are stored with the first _two_ dimensions transposed: n[1] x n[0]
3578 x n[2] x ... x n[d-1] , distributed along the n[1] dimension.
3579 Conversely, if we take the n[1] x n[0] x n[2] x ... x n[d-1] data and
3580 transform it with the `FFTW_MPI_TRANSPOSED_IN' flag, then the format
3581 goes back to the original n[0] x n[1] x n[2] x ... x n[d-1] array.
3582
3583 There are two ways to find the portion of the transposed array that
3584 resides on the current process. First, you can simply call the
3585 appropriate `local_size' function, passing n[1] x n[0] x n[2] x ... x
3586 n[d-1] (the transposed dimensions). This would mean calling the
3587 `local_size' function twice, once for the transposed and once for the
3588 non-transposed dimensions. Alternatively, you can call one of the
3589 `local_size_transposed' functions, which returns both the
3590 non-transposed and transposed data distribution from a single call.
3591 For example, for a 3d transform with transposed output (or input), you
3592 might call:
3593
3594 ptrdiff_t fftw_mpi_local_size_3d_transposed(
3595 ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
3596 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
3597 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
3598
3599 Here, `local_n0' and `local_0_start' give the size and starting
3600 index of the `n0' dimension for the _non_-transposed data, as in the
3601 previous sections. For _transposed_ data (e.g. the output for
3602 `FFTW_MPI_TRANSPOSED_OUT'), `local_n1' and `local_1_start' give the
3603 size and starting index of the `n1' dimension, which is the first
3604 dimension of the transposed data (`n1' by `n0' by `n2').
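
As an illustration of the two layouts, the following sketch computes where the element with global indices `(i0, i1, i2)' lives in the local array, first for the ordinary `n0' by `n1' by `n2' layout and then for the transposed `n1' by `n0' by `n2' layout. It assumes row-major storage of the local slice in both cases; `offset_normal' and `offset_transposed' are hypothetical helper names, not FFTW functions.

```c
#include <assert.h>
#include <stddef.h>

/* Non-transposed data: the local slice is local_n0 x n1 x n2. */
static ptrdiff_t offset_normal(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                               ptrdiff_t n1, ptrdiff_t n2,
                               ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(i0 >= local_0_start && i0 < local_0_start + local_n0);
    assert(i1 >= 0 && i1 < n1 && i2 >= 0 && i2 < n2);
    return ((i0 - local_0_start) * n1 + i1) * n2 + i2;
}

/* Transposed data: the local slice is local_n1 x n0 x n2. */
static ptrdiff_t offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                                   ptrdiff_t n0, ptrdiff_t n2,
                                   ptrdiff_t local_n1, ptrdiff_t local_1_start)
{
    assert(i1 >= local_1_start && i1 < local_1_start + local_n1);
    assert(i0 >= 0 && i0 < n0 && i2 >= 0 && i2 < n2);
    return ((i1 - local_1_start) * n0 + i0) * n2 + i2;
}
```

Note that only the first two indices trade places; the `n2' index remains the fastest-varying one in both layouts.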
3605
3606 (Note that `FFTW_MPI_TRANSPOSED_IN' is completely equivalent to
3607 performing `FFTW_MPI_TRANSPOSED_OUT' and passing the first two
3608 dimensions to the planner in reverse order, or vice versa. If you pass
3609 _both_ the `FFTW_MPI_TRANSPOSED_IN' and `FFTW_MPI_TRANSPOSED_OUT'
3610 flags, it is equivalent to swapping the first two dimensions passed to
3611 the planner and passing _neither_ flag.)
3612
3613 
3614 File: fftw3.info, Node: One-dimensional distributions, Prev: Transposed distributions, Up: MPI Data Distribution
3615
3616 6.4.4 One-dimensional distributions
3617 -----------------------------------
3618
3619 For one-dimensional distributed DFTs using FFTW, matters are slightly
3620 more complicated because the data distribution is more closely tied to
3621 how the algorithm works. In particular, you can no longer pass an
3622 arbitrary block size and must accept FFTW's default; also, the block
3623 sizes may be different for input and output. Furthermore, the data
3624 distribution depends on the flags and transform direction, in order for
3625 forward and backward transforms to work correctly.
3626
3627 ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
3628 int sign, unsigned flags,
3629 ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
3630 ptrdiff_t *local_no, ptrdiff_t *local_o_start);
3631
3632 This function computes the data distribution for a 1d transform of
3633 size `n0' with the given transform `sign' and `flags'. Both input and
3634 output data use block distributions. The input on the current process
3635 will consist of `local_ni' numbers starting at index `local_i_start';
3636 e.g. if only a single process is used, then `local_ni' will be `n0' and
3637 `local_i_start' will be `0'. Similarly for the output, with `local_no'
3638 numbers starting at index `local_o_start'. The return value of
3639 `fftw_mpi_local_size_1d' will be the total number of elements to
3640 allocate on the current process (which might be slightly larger than
3641 the local size due to intermediate steps in the algorithm).
3642
3643 As mentioned above (*note Load balancing::), the data will be divided
3644 equally among the processes if `n0' is divisible by the _square_ of the
3645 number of processes. In this case, `local_ni' will equal `local_no'.
3646 Otherwise, they may be different.
3647
3648 For some applications, such as convolutions, the order of the output
3649 data is irrelevant. In this case, performance can be improved by
3650 specifying that the output data be stored in an FFTW-defined
3651 "scrambled" format. (In particular, this is the analogue of transposed
3652 output in the multidimensional case: scrambled output saves a
3653 communications step.) If you pass `FFTW_MPI_SCRAMBLED_OUT' in the
3654 flags, then the output is stored in this (undocumented) scrambled
3655 order. Conversely, to perform the inverse transform of data in
3656 scrambled order, pass the `FFTW_MPI_SCRAMBLED_IN' flag.
3657
3658 In MPI FFTW, only composite sizes `n0' can be parallelized; we have
3659 not yet implemented a parallel algorithm for large prime sizes.
3660
3661 
3662 File: fftw3.info, Node: Multi-dimensional MPI DFTs of Real Data, Next: Other Multi-dimensional Real-data MPI Transforms, Prev: MPI Data Distribution, Up: Distributed-memory FFTW with MPI
3663
3664 6.5 Multi-dimensional MPI DFTs of Real Data
3665 ===========================================
3666
3667 FFTW's MPI interface also supports multi-dimensional DFTs of real data,
3668 similar to the serial r2c and c2r interfaces. (Parallel
3669 one-dimensional real-data DFTs are not currently supported; you must
3670 use a complex transform and set the imaginary parts of the inputs to
3671 zero.)
3672
3673 The key points to understand for r2c and c2r MPI transforms (compared
3674 to the MPI complex DFTs or the serial r2c/c2r transforms), are:
3675
3676 * Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1]
3677 x n[2] x ... x n[d-1] real data to/from n[0] x n[1] x n[2] x ...
3678 x (n[d-1]/2 + 1) complex data: the last dimension of the complex
3679 data is cut in half (rounded down), plus one. As for the serial
3680 transforms, the sizes you pass to the `plan_dft_r2c' and
3681 `plan_dft_c2r' are the n[0] x n[1] x n[2] x ... x n[d-1]
3682 dimensions of the real data.
3683
3684 * Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
3685 x n[d-1] , it is _physically_ stored as an n[0] x n[1] x n[2] x
3686 ... x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
3687 _padded_ to make it the same size as the complex output. This is
3688 much like the in-place serial r2c/c2r interface (*note
3689 Multi-Dimensional DFTs of Real Data::), except that in MPI the
3690 padding is required even for out-of-place data. The extra padding
3691 numbers are ignored by FFTW (they are _not_ like zero-padding the
3692 transform to a larger size); they are only used to determine the
3693 data layout.
3694
3695 * The data distribution in MPI for _both_ the real and complex data
3696 is determined by the shape of the _complex_ data. That is, you
3697 call the appropriate `local_size' function for the n[0] x n[1] x
3698 n[2] x ... x (n[d-1]/2 + 1)
3699
3700 complex data, and then use the _same_ distribution for the real
3701 data except that the last complex dimension is replaced by a
3702 (padded) real dimension of twice the length.
3703
3704
3705 For example, suppose we are performing an out-of-place r2c transform
3706 of L x M x N real data [padded to L x M x 2(N/2+1)], resulting in L x
3707 M x (N/2+1) complex data. Similar to the example in *note 2d MPI
3708 example::, we might do something like:
3709
3710 #include <fftw3-mpi.h>
3711
3712 int main(int argc, char **argv)
3713 {
3714     const ptrdiff_t L = ..., M = ..., N = ...;
3715     fftw_plan plan;
3716     double *rin;
3717     fftw_complex *cout;
3718     ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
3719
3720     MPI_Init(&argc, &argv);
3721     fftw_mpi_init();
3722
3723     /* get local data size and allocate */
3724     alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
3725                                          &local_n0, &local_0_start);
3726     rin = fftw_alloc_real(2 * alloc_local);
3727     cout = fftw_alloc_complex(alloc_local);
3728
3729     /* create plan for out-of-place r2c DFT */
3730     plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
3731                                     FFTW_MEASURE);
3732
3733     /* initialize rin to some function my_func(x,y,z) */
3734     for (i = 0; i < local_n0; ++i)
3735         for (j = 0; j < M; ++j)
3736             for (k = 0; k < N; ++k)
3737                 rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
3738
3739     /* compute transforms as many times as desired */
3740     fftw_execute(plan);
3741
3742     fftw_destroy_plan(plan);
3743
3744     MPI_Finalize();
3745 }
3746
3747 Note that we allocated `rin' using `fftw_alloc_real' with an
3748 argument of `2 * alloc_local': since `alloc_local' is the number of
3749 _complex_ values to allocate, the number of _real_ values is twice as
3750 many. The `rin' array is then local_n0 x M x 2(N/2+1) in row-major
3751 order, so its `(i,j,k)' element is at the index `(i*M + j) *
3752 (2*(N/2+1)) + k' (*note Multi-dimensional Array Format::).
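
The padded layout can be checked with a small sketch. The helper names below (`complex_last_dim', `padded_real_last_dim', `rin_index') are hypothetical and only restate the formulas used in the example; `i' is the local row index, running from 0 to `local_n0 - 1'.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the last dimension of the complex data: N/2 + 1. */
static ptrdiff_t complex_last_dim(ptrdiff_t N)
{
    assert(N > 0);
    return N / 2 + 1;
}

/* Length of the last dimension of the padded real data: 2(N/2 + 1). */
static ptrdiff_t padded_real_last_dim(ptrdiff_t N)
{
    return 2 * complex_last_dim(N);
}

/* Index of real element (i, j, k) in the local padded array, where
   k < N addresses actual data and the remaining slots are padding. */
static ptrdiff_t rin_index(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k,
                           ptrdiff_t M, ptrdiff_t N)
{
    assert(i >= 0 && j >= 0 && j < M && k >= 0 && k < N);
    return (i * M + j) * padded_real_last_dim(N) + k;
}
```

For `N = 8' each row of the real array has 10 slots, of which the last 2 are padding; for `N = 7' it has 8 slots with 1 of padding.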
3753
3754 As for the complex transforms, improved performance can be obtained
3755 by specifying that the output is the transpose of the input or vice
3756 versa (*note Transposed distributions::). In our L x M x N r2c
3757 example, including `FFTW_MPI_TRANSPOSED_OUT' in the flags means that
3758 the input would be a padded L x M x 2(N/2+1) real array distributed
3759 over the `L' dimension, while the output would be an M x L x (N/2+1)
3760 complex array distributed over the `M' dimension. To perform the
3761 inverse c2r transform with the same data distributions, you would use
3762 the `FFTW_MPI_TRANSPOSED_IN' flag.
3763
3764 
3765 File: fftw3.info, Node: Other Multi-dimensional Real-data MPI Transforms, Next: FFTW MPI Transposes, Prev: Multi-dimensional MPI DFTs of Real Data, Up: Distributed-memory FFTW with MPI
3766
3767 6.6 Other Multi-dimensional Real-data MPI Transforms
3768 ====================================================
3769
3770 FFTW's MPI interface also supports multi-dimensional `r2r' transforms
3771 of all kinds supported by the serial interface (e.g. discrete cosine
3772 and sine transforms, discrete Hartley transforms, etc.). Only
3773 multi-dimensional `r2r' transforms, not one-dimensional transforms, are
3774 currently parallelized.
3775
3776 These are used much like the multidimensional complex DFTs discussed
3777 above, except that the data is real rather than complex, and one needs
3778 to pass an r2r transform kind (`fftw_r2r_kind') for each dimension as
3779 in the serial FFTW (*note More DFTs of Real Data::).
3780
3781 For example, one might perform a two-dimensional L x M transform that
3782 is an REDFT10 (DCT-II) in the first dimension and an RODFT10 (DST-II)
3783 in the second dimension with code like:
3784
3785 const ptrdiff_t L = ..., M = ...;
3786 fftw_plan plan;
3787 double *data;
3788 ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
3789
3790 /* get local data size and allocate */
3791 alloc_local = fftw_mpi_local_size_2d(L, M, MPI_COMM_WORLD,
3792                                      &local_n0, &local_0_start);
3793 data = fftw_alloc_real(alloc_local);
3794
3795 /* create plan for in-place REDFT10 x RODFT10 */
3796 plan = fftw_mpi_plan_r2r_2d(L, M, data, data, MPI_COMM_WORLD,
3797                             FFTW_REDFT10, FFTW_RODFT10, FFTW_MEASURE);
3798
3799 /* initialize data to some function my_function(x,y) */
3800 for (i = 0; i < local_n0; ++i) for (j = 0; j < M; ++j)
3801     data[i*M + j] = my_function(local_0_start + i, j);
3802
3803 /* compute transforms, in-place, as many times as desired */
3804 fftw_execute(plan);
3805
3806 fftw_destroy_plan(plan);
3807
3808 Notice that we use the same `local_size' functions as we did for
3809 complex data, only now we interpret the sizes in terms of real rather
3810 than complex values, and correspondingly use `fftw_alloc_real'.
3811
3812 
3813 File: fftw3.info, Node: FFTW MPI Transposes, Next: FFTW MPI Wisdom, Prev: Other Multi-dimensional Real-data MPI Transforms, Up: Distributed-memory FFTW with MPI
3814
3815 6.7 FFTW MPI Transposes
3816 =======================
3817
3818 FFTW's MPI Fourier transforms rely on one or more _global
3819 transposition_ steps for their communications. For example, the
3820 multidimensional transforms work by transforming along some dimensions,
3821 then transposing to make the first dimension local and transforming
3822 that, then transposing back. Because global transposition of a
3823 block-distributed matrix has many other potential uses besides FFTs,
3824 FFTW's transpose routines can be called directly, as documented in this
3825 section.
3826
3827 * Menu:
3828
3829 * Basic distributed-transpose interface::
3830 * Advanced distributed-transpose interface::
3831 * An improved replacement for MPI_Alltoall::
3832
3833 
3834 File: fftw3.info, Node: Basic distributed-transpose interface, Next: Advanced distributed-transpose interface, Prev: FFTW MPI Transposes, Up: FFTW MPI Transposes
3835
3836 6.7.1 Basic distributed-transpose interface
3837 -------------------------------------------
3838
3839 In particular, suppose that we have an `n0' by `n1' array in row-major
3840 order, block-distributed across the `n0' dimension. To transpose this
3841 into an `n1' by `n0' array block-distributed across the `n1' dimension,
3842 we would create a plan by calling the following function:
3843
3844 fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
3845 double *in, double *out,
3846 MPI_Comm comm, unsigned flags);
3847
3848 The input and output arrays (`in' and `out') can be the same. The
3849 transpose is actually executed by calling `fftw_execute' on the plan,
3850 as usual.
3851
3852 The `flags' are the usual FFTW planner flags, but support two
3853 additional flags: `FFTW_MPI_TRANSPOSED_OUT' and/or
3854 `FFTW_MPI_TRANSPOSED_IN'. What these flags indicate, for transpose
3855 plans, is that the output and/or input, respectively, are _locally_
3856 transposed. That is, on each process input data is normally stored as
3857 a `local_n0' by `n1' array in row-major order, but for an
3858 `FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as `n1' by
3859 `local_n0' in row-major order. Similarly, `FFTW_MPI_TRANSPOSED_OUT'
3860 means that the output is `n0' by `local_n1' instead of `local_n1' by
3861 `n0'.
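
The four local layouts can be summarized by a pair of index functions. This is a sketch under the row-major assumption stated above, with hypothetical helper names `in_offset' and `out_offset'. Here `i' indexes the `n0' dimension and `j' the `n1' dimension, and the index along the distributed dimension is taken relative to the start of the local slice.

```c
#include <assert.h>
#include <stddef.h>

/* Input element with local row i in [0, local_n0) and column j in
   [0, n1). */
static ptrdiff_t in_offset(ptrdiff_t i, ptrdiff_t j,
                           ptrdiff_t n1, ptrdiff_t local_n0,
                           int transposed_in)
{
    assert(i >= 0 && i < local_n0 && j >= 0 && j < n1);
    return transposed_in ? j * local_n0 + i   /* n1 x local_n0 */
                         : i * n1 + j;        /* local_n0 x n1 */
}

/* Output element with local row j in [0, local_n1) of the transposed
   matrix and column i in [0, n0). */
static ptrdiff_t out_offset(ptrdiff_t j, ptrdiff_t i,
                            ptrdiff_t n0, ptrdiff_t local_n1,
                            int transposed_out)
{
    assert(j >= 0 && j < local_n1 && i >= 0 && i < n0);
    return transposed_out ? i * local_n1 + j  /* n0 x local_n1 */
                          : j * n0 + i;       /* local_n1 x n0 */
}
```

In both cases the flag changes only the order of the data within each process; which rows a process holds is unaffected.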
3862
3863 To determine the local size of the array on each process before and
3864 after the transpose, as well as the amount of storage that must be
3865 allocated, one should call `fftw_mpi_local_size_2d_transposed', just as
3866 for a 2d DFT as described in the previous section:
3867
3868 ptrdiff_t fftw_mpi_local_size_2d_transposed
3869 (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
3870 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
3871 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
3872
3873 Again, the return value is the local storage to allocate, which in
3874 this case is the number of _real_ (`double') values rather than complex
3875 numbers as in the previous examples.
3876
3877 
3878 File: fftw3.info, Node: Advanced distributed-transpose interface, Next: An improved replacement for MPI_Alltoall, Prev: Basic distributed-transpose interface, Up: FFTW MPI Transposes
3879
3880 6.7.2 Advanced distributed-transpose interface
3881 ----------------------------------------------
3882
3883 The above routines are for a transpose of a matrix of numbers (of type
3884 `double'), using FFTW's default block sizes. More generally, one can
3885 perform transposes of _tuples_ of numbers, with user-specified block
3886 sizes for the input and output:
3887
3888 fftw_plan fftw_mpi_plan_many_transpose
3889 (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
3890 ptrdiff_t block0, ptrdiff_t block1,
3891 double *in, double *out, MPI_Comm comm, unsigned flags);
3892
3893 In this case, one is transposing an `n0' by `n1' matrix of
3894 `howmany'-tuples (e.g. `howmany = 2' for complex numbers). The input
3895 is distributed along the `n0' dimension with block size `block0', and
3896 the `n1' by `n0' output is distributed along the `n1' dimension with
3897 block size `block1'. If `FFTW_MPI_DEFAULT_BLOCK' (0) is passed for a
3898 block size then FFTW uses its default block size. To get the local
3899 size of the data on each process, you should then call
3900 `fftw_mpi_local_size_many_transposed'.
3901
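To make the role of `block0' and `block1' concrete, the following is a minimal sketch of a 1d block distribution in plain C, with no FFTW or MPI calls. It assumes that process `rank' owns the indices `rank*block' through `(rank+1)*block - 1', clipped to the dimension length, and that the default block size is the dimension length divided by the number of processes, rounded up (*note Load balancing::). The helper names `default_block' and `block_slice' are hypothetical and are not part of FFTW; a real program should rely on the values returned by `fftw_mpi_local_size_many_transposed' instead. Under these assumptions, 100 rows on 3 processes are split into blocks of 34, 34, and 32 rows.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed default block size: n / nproc, rounded up. */
ptrdiff_t default_block(ptrdiff_t n, int nproc)
{
    assert(n > 0 && nproc > 0);
    return (n + nproc - 1) / nproc;
}

/* Compute the slice [*local_start, *local_start + *local_n) of a dimension
   of length n that lands on process `rank' when consecutive blocks of
   `block' indices are handed to ranks 0, 1, 2, ...  A rank that lies past
   the end of the data gets *local_n == 0; setting *local_start to 0 in
   that case is an arbitrary choice made by this sketch. */
void block_slice(ptrdiff_t n, ptrdiff_t block, int rank,
                 ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    assert(n > 0 && block > 0 && rank >= 0);
    ptrdiff_t start = (ptrdiff_t)rank * block;
    if (start >= n) {
        *local_n = 0;
        *local_start = 0;
        return;
    }
    *local_start = start;
    *local_n = (n - start < block) ? n - start : block;
}
```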
3902 
3903 File: fftw3.info, Node: An improved replacement for MPI_Alltoall, Prev: Advanced distributed-transpose interface, Up: FFTW MPI Transposes
3904
3905 6.7.3 An improved replacement for MPI_Alltoall
3906 ----------------------------------------------
3907
3908 We close this section by noting that FFTW's MPI transpose routines can
3909 be thought of as a generalization of the `MPI_Alltoall' function
3910 (albeit only for floating-point types), and in some circumstances can
3911 function as an improved replacement.
3912
3913 `MPI_Alltoall' is defined by the MPI standard as:
3914
3915 int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
3916 void *recvbuf, int recvcnt, MPI_Datatype recvtype,
3917 MPI_Comm comm);
3918
3919 In particular, for `double*' arrays `in' and `out', consider the
3920 call:
3921
3922 MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);
3923
3924 This is completely equivalent to:
3925
3926 MPI_Comm_size(comm, &P);
3927 plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
3928 fftw_execute(plan);
3929 fftw_destroy_plan(plan);
3930
3931 That is, computing a P x P transpose on `P' processes, with a block
3932 size of 1, is just a standard all-to-all communication.
3933
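This correspondence can be checked without MPI by simulating the `P' processes inside a single program. The sketch below is an illustration only: `simulated_alltoall', `transpose_tuples', and `alltoall_matches_transpose' are hypothetical helpers, not FFTW or MPI functions. It assumes the buffers of all processes are stored back to back in one array, so that process `r' holds row `r' of the P x P matrix of `howmany'-tuples (a block size of 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simulated all-to-all on P ranks.  Rank r's send (or receive) buffer
   starts at offset r * P * count and holds P chunks of `count' doubles.
   Chunk dst of rank src's send buffer becomes chunk src of rank dst's
   receive buffer. */
void simulated_alltoall(int P, int count, const double *send, double *recv)
{
    assert(P > 0 && count > 0);
    for (int src = 0; src < P; ++src) {
        const double *sendbuf = send + (size_t)src * P * count;
        for (int dst = 0; dst < P; ++dst) {
            double *recvbuf = recv + (size_t)dst * P * count;
            for (int k = 0; k < count; ++k)
                recvbuf[(size_t)src * count + k] =
                    sendbuf[(size_t)dst * count + k];
        }
    }
}

/* Serial transpose of an n0 x n1 row-major matrix of howmany-tuples
   into an n1 x n0 row-major matrix. */
void transpose_tuples(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      const double *in, double *out)
{
    for (ptrdiff_t i = 0; i < n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            for (ptrdiff_t k = 0; k < howmany; ++k)
                out[(j * n0 + i) * howmany + k] =
                    in[(i * n1 + j) * howmany + k];
}

/* Return 1 if the simulated all-to-all and the P x P transpose produce
   the same result on test data, 0 otherwise (or on allocation failure). */
int alltoall_matches_transpose(int P, int count)
{
    size_t total = (size_t)P * P * count;
    double *in = malloc(total * sizeof *in);
    double *a = malloc(total * sizeof *a);
    double *b = malloc(total * sizeof *b);
    int same = (in && a && b);
    if (same) {
        for (size_t i = 0; i < total; ++i)
            in[i] = (double)i;
        simulated_alltoall(P, count, in, a);
        transpose_tuples(P, P, count, in, b);
        for (size_t i = 0; i < total; ++i)
            if (a[i] != b[i])
                same = 0;
    }
    free(in);
    free(a);
    free(b);
    return same;
}
```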
3934 However, using the FFTW routine instead of `MPI_Alltoall' may have
3935 certain advantages. First of all, FFTW's routine can operate in-place
3936 (`in == out') whereas `MPI_Alltoall' can only operate out-of-place.
3937
3938 Second, even for out-of-place plans, FFTW's routine may be faster,
3939 especially if you need to perform the all-to-all communication many
3940 times and can afford to use `FFTW_MEASURE' or `FFTW_PATIENT'. It
3941 should certainly be no slower, not including the time to create the
3942 plan, since one of the possible algorithms that FFTW uses for an
3943 out-of-place transpose _is_ simply to call `MPI_Alltoall'. However,
3944 FFTW also considers several other possible algorithms that, depending
3945 on your MPI implementation and your hardware, may be faster.
3946
3947 
3948 File: fftw3.info, Node: FFTW MPI Wisdom, Next: Avoiding MPI Deadlocks, Prev: FFTW MPI Transposes, Up: Distributed-memory FFTW with MPI
3949
3950 6.8 FFTW MPI Wisdom
3951 ===================
3952
3953 FFTW's "wisdom" facility (*note Words of Wisdom-Saving Plans::) can be
3954 used to save MPI plans as well as to save uniprocessor plans. However,
3955 for MPI there are several unavoidable complications.
3956
3957 First, the MPI standard does not guarantee that every process can
3958 perform file I/O (at least, not using C stdio routines)--in general, we
3959 may only assume that process 0 is capable of I/O.(1) So, if we want to
3960 export the wisdom from a single process to a file, we must first export
3961 the wisdom to a string, then send it to process 0, then write it to a
3962 file.
3963
3964 Second, in principle we may want to have separate wisdom for every
3965 process, since in general the processes may run on different hardware
3966 even for a single MPI program. However, in practice FFTW's MPI code is
3967 designed for the case of homogeneous hardware (*note Load balancing::),
3968 and in this case it is convenient to use the same wisdom for every
3969 process. Thus, we need a mechanism to synchronize the wisdom.
3970
3971 To address both of these problems, FFTW provides the following two
3972 functions:
3973
3974 void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
3975 void fftw_mpi_gather_wisdom(MPI_Comm comm);
3976
3977 Given a communicator `comm', `fftw_mpi_broadcast_wisdom' will
3978 broadcast the wisdom from process 0 to all other processes.
3979 Conversely, `fftw_mpi_gather_wisdom' will collect wisdom from all
3980 processes onto process 0. (If the plans created for the same problem
3981 by different processes are not the same, `fftw_mpi_gather_wisdom' will
3982 arbitrarily choose one of the plans.) Both of these functions may
3983 result in suboptimal plans for different processes if the processes are
3984 running on non-identical hardware. Both of these functions are
3985 _collective_ calls, which means that they must be executed by all
3986 processes in the communicator.
3987
3988 So, for example, a typical code snippet to import wisdom from a file
3989 and use it on all processes would be:
3990
3991 {
3992 int rank;
3993
3994 fftw_mpi_init();
3995 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
3996 if (rank == 0) fftw_import_wisdom_from_filename("mywisdom");
3997 fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
3998 }
3999
4000 (Note that we must call `fftw_mpi_init' before importing any wisdom
4001 that might contain MPI plans.) Similarly, a typical code snippet to
4002 export wisdom from all processes to a file is:
4003
4004 {
4005 int rank;
4006
4007 fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
4008 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4009 if (rank == 0) fftw_export_wisdom_to_filename("mywisdom");
4010 }
4011
4012 ---------- Footnotes ----------
4013
4014 (1) In fact, even this assumption is not technically guaranteed by
4015 the standard, although it seems to be universal in actual MPI
4016 implementations and is widely assumed by MPI-using software.
4017 Technically, you need to query the `MPI_IO' attribute of
4018 `MPI_COMM_WORLD' with `MPI_Attr_get'. If this attribute is
4019 `MPI_PROC_NULL', no I/O is possible. If it is `MPI_ANY_SOURCE', any
4020 process can perform I/O. Otherwise, it is the rank of a process that
4021 can perform I/O ... but since it is not guaranteed to yield the _same_
4022 rank on all processes, you have to do an `MPI_Allreduce' of some kind
4023 if you want all processes to agree about which is going to do I/O. And
4024 even then, the standard only guarantees that this process can perform
4025 output, but not input. See e.g. `Parallel Programming with MPI' by P.
4026 S. Pacheco, section 8.1.3. Needless to say, in our experience
4027 virtually no MPI programmers worry about this.
4028
4029 
4030 File: fftw3.info, Node: Avoiding MPI Deadlocks, Next: FFTW MPI Performance Tips, Prev: FFTW MPI Wisdom, Up: Distributed-memory FFTW with MPI
4031
4032 6.9 Avoiding MPI Deadlocks
4033 ==========================
4034
4035 An MPI program can _deadlock_ if one process is waiting for a message
4036 from another process that never gets sent. To avoid deadlocks when
4037 using FFTW's MPI routines, it is important to know which functions are
4038 _collective_: that is, which functions must _always_ be called in the
4039 _same order_ from _every_ process in a given communicator.
4040 (`MPI_Barrier' is the canonical example of a collective function in
4041 the MPI standard.)
4042
4043 The functions in FFTW that are _always_ collective are: every
4044 function beginning with `fftw_mpi_plan', as well as
4045 `fftw_mpi_broadcast_wisdom' and `fftw_mpi_gather_wisdom'. Also, the
4046 following functions from the ordinary FFTW interface are collective
4047 when they are applied to a plan created by an `fftw_mpi_plan' function:
4048 `fftw_execute', `fftw_destroy_plan', and `fftw_flops'.
4049
4050 
4051 File: fftw3.info, Node: FFTW MPI Performance Tips, Next: Combining MPI and Threads, Prev: Avoiding MPI Deadlocks, Up: Distributed-memory FFTW with MPI
4052
4053 6.10 FFTW MPI Performance Tips
4054 ==============================
4055
4056 In this section, we collect a few tips on getting the best performance
4057 out of FFTW's MPI transforms.
4058
4059 First, because of the 1d block distribution, FFTW's parallelization
4060 is currently limited by the size of the first dimension.
4061 (Multidimensional block distributions may be supported by a future
4062 version.) More generally, you should ideally arrange the dimensions so
4063 that FFTW can divide them equally among the processes. *Note Load
4064 balancing::.
4065
4066 Second, if it is not too inconvenient, you should consider working
4067 with transposed output for multidimensional plans, as this saves a
4068 considerable amount of communications. *Note Transposed
4069 distributions::.
4070
4071 Third, the fastest choices are generally either an in-place transform
4072 or an out-of-place transform with the `FFTW_DESTROY_INPUT' flag (which
4073 allows the input array to be used as scratch space). In-place is
4074 especially beneficial if the amount of data per process is large.
4075
4076 Fourth, if you have multiple arrays to transform at once, rather than
4077 calling FFTW's MPI transforms several times it usually seems to be
4078 faster to interleave the data and use the advanced interface. (This
4079 groups the communications together instead of requiring separate
4080 messages for each transform.)
4081
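As an illustration of the interleaving mentioned in the fourth tip, the sketch below packs `howmany' separate arrays into contiguous `howmany'-tuples, which is the layout that the `howmany' parameter of the advanced interface describes. It is plain C on `double' data and assumes the separate arrays are stored one after another in a single buffer; `interleave' and `deinterleave' are hypothetical helper names, not FFTW functions. For complex data the same index arithmetic would apply to `fftw_complex' elements.

```c
#include <assert.h>
#include <stddef.h>

/* Pack `howmany' arrays of n elements each, stored back to back in
   `planar', into n contiguous howmany-tuples: element i of array k
   moves to tuples[i * howmany + k]. */
void interleave(ptrdiff_t n, ptrdiff_t howmany,
                const double *planar, double *tuples)
{
    assert(n >= 0 && howmany > 0);
    for (ptrdiff_t k = 0; k < howmany; ++k)
        for (ptrdiff_t i = 0; i < n; ++i)
            tuples[i * howmany + k] = planar[k * n + i];
}

/* Inverse of interleave: unpack tuples into back-to-back arrays. */
void deinterleave(ptrdiff_t n, ptrdiff_t howmany,
                  const double *tuples, double *planar)
{
    assert(n >= 0 && howmany > 0);
    for (ptrdiff_t k = 0; k < howmany; ++k)
        for (ptrdiff_t i = 0; i < n; ++i)
            planar[k * n + i] = tuples[i * howmany + k];
}
```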
4082 
4083 File: fftw3.info, Node: Combining MPI and Threads, Next: FFTW MPI Reference, Prev: FFTW MPI Performance Tips, Up: Distributed-memory FFTW with MPI
4084
4085 6.11 Combining MPI and Threads
4086 ==============================
4087
4088 In certain cases, it may be advantageous to combine MPI
4089 (distributed-memory) and threads (shared-memory) parallelization. FFTW
4090 supports this, with certain caveats. For example, if you have a
4091 cluster of 4-processor shared-memory nodes, you may want to use threads
4092 within the nodes and MPI between the nodes, instead of MPI for all
4093 parallelization.
4094
4095 In particular, it is possible to seamlessly combine the MPI FFTW
4096 routines with the multi-threaded FFTW routines (*note Multi-threaded
4097 FFTW::). However, some care must be taken in the initialization code,
4098 which should look something like this:
4099
4100 int threads_ok;
4101
4102 int main(int argc, char **argv)
4103 {
4104 int provided;
4105 MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
4106 threads_ok = provided >= MPI_THREAD_FUNNELED;
4107
4108 if (threads_ok) threads_ok = fftw_init_threads();
4109 fftw_mpi_init();
4110
4111 ...
4112 if (threads_ok) fftw_plan_with_nthreads(...);
4113 ...
4114
4115 MPI_Finalize();
4116 }
4117
4118 First, note that instead of calling `MPI_Init', you should call
4119 `MPI_Init_thread', which is the initialization routine defined by the
4120 MPI-2 standard to indicate to MPI that your program will be
4121 multithreaded. We pass `MPI_THREAD_FUNNELED', which indicates that we
4122 will only call MPI routines from the main thread. (FFTW will launch
4123 additional threads internally, but the extra threads will not call MPI
4124 code.) (You may also pass `MPI_THREAD_SERIALIZED' or
4125 `MPI_THREAD_MULTIPLE', which request additional multithreading support
4126 from the MPI implementation, but this is not required by FFTW.) The
4127 `provided' parameter returns the level of thread support actually
4128 provided by your MPI implementation; this _must_ be at least
4129 `MPI_THREAD_FUNNELED' if you want to call the FFTW threads routines, so
4130 we define a global variable `threads_ok' to record this. You should
4131 only call `fftw_init_threads' or `fftw_plan_with_nthreads' if
4132 `threads_ok' is true. For more information on thread safety in MPI,
4133 see the MPI and Threads
4134 (http://www.mpi-forum.org/docs/mpi-20-html/node162.htm) section of the
4135 MPI-2 standard.
4136
4137 Second, we must call `fftw_init_threads' _before_ `fftw_mpi_init'.
4138 This is critical for technical reasons having to do with how FFTW
4139 initializes its list of algorithms.
4140
4141 Then, if you call `fftw_plan_with_nthreads(N)', _every_ MPI process
4142 will launch (up to) `N' threads to parallelize its transforms.
4143
4144 For example, in the hypothetical cluster of 4-processor nodes, you
4145 might wish to launch only a single MPI process per node, and then call
4146 `fftw_plan_with_nthreads(4)' on each process to use all processors in
4147 the nodes.
4148
4149 This may or may not be faster than simply using as many MPI processes
4150 as you have processors, however. On the one hand, using threads within
4151 a node eliminates the need for explicit message passing within the
4152 node. On the other hand, FFTW's transpose routines are not
4153 multi-threaded, and this means that the communications that do take
4154 place will not benefit from parallelization within the node. Moreover,
4155 many MPI implementations already have optimizations to exploit shared
4156 memory when it is available, so adding the multithreaded FFTW on top of
4157 this may be superfluous.
4158
4159 
4160 File: fftw3.info, Node: FFTW MPI Reference, Next: FFTW MPI Fortran Interface, Prev: Combining MPI and Threads, Up: Distributed-memory FFTW with MPI
4161
4162 6.12 FFTW MPI Reference
4163 =======================
4164
4165 This section provides a complete reference to all FFTW MPI functions,
4166 datatypes, and constants. See also *note FFTW Reference:: for
4167 information on functions and types in common with the serial interface.
4168
4169 * Menu:
4170
4171 * MPI Files and Data Types::
4172 * MPI Initialization::
4173 * Using MPI Plans::
4174 * MPI Data Distribution Functions::
4175 * MPI Plan Creation::
4176 * MPI Wisdom Communication::
4177
4178 
4179 File: fftw3.info, Node: MPI Files and Data Types, Next: MPI Initialization, Prev: FFTW MPI Reference, Up: FFTW MPI Reference
4180
4181 6.12.1 MPI Files and Data Types
4182 -------------------------------
4183
4184 All programs using FFTW's MPI support should include its header file:
4185
4186 #include <fftw3-mpi.h>
4187
4188 Note that this header file includes the serial-FFTW `fftw3.h' header
4189 file, and also the `mpi.h' header file for MPI, so you need not include
4190 those files separately.
4191
4192 You must also link to _both_ the FFTW MPI library and to the serial
4193 FFTW library. On Unix, this means adding `-lfftw3_mpi -lfftw3 -lm' at
4194 the end of the link command.
4195
4196 Different precisions are handled as in the serial interface: *Note
4197 Precision::. That is, `fftw_' functions become `fftwf_' (in single
4198 precision) etcetera, and the libraries become `-lfftw3f_mpi -lfftw3f
4199 -lm' etcetera on Unix. Long-double precision is supported in MPI, but
4200 quad precision (`fftwq_') is not due to the lack of MPI support for
4201 this type.
4202
4203 
4204 File: fftw3.info, Node: MPI Initialization, Next: Using MPI Plans, Prev: MPI Files and Data Types, Up: FFTW MPI Reference
4205
4206 6.12.2 MPI Initialization
4207 -------------------------
4208
4209 Before calling any other FFTW MPI (`fftw_mpi_') function, and before
4210 importing any wisdom for MPI problems, you must call:
4211
4212 void fftw_mpi_init(void);
4213
4214 If FFTW threads support is used, however, `fftw_mpi_init' should be
4215 called _after_ `fftw_init_threads' (*note Combining MPI and Threads::).
4216 Calling `fftw_mpi_init' additional times (before `fftw_mpi_cleanup')
4217 has no effect.
4218
4219 If you want to deallocate all persistent data and reset FFTW to the
4220 pristine state it was in when you started your program, you can call:
4221
4222 void fftw_mpi_cleanup(void);
4223
4224 (This calls `fftw_cleanup', so you need not call the serial cleanup
4225 routine too, although it is safe to do so.) After calling
4226 `fftw_mpi_cleanup', all existing plans become undefined, and you should
4227 not attempt to execute or destroy them. You must call `fftw_mpi_init'
4228 again after `fftw_mpi_cleanup' if you want to resume using the MPI FFTW
4229 routines.
4230
4231 
4232 File: fftw3.info, Node: Using MPI Plans, Next: MPI Data Distribution Functions, Prev: MPI Initialization, Up: FFTW MPI Reference
4233
4234 6.12.3 Using MPI Plans
4235 ----------------------
4236
4237 Once an MPI plan is created, you can execute and destroy it using
4238 `fftw_execute', `fftw_destroy_plan', and the other functions in the
4239 serial interface that operate on generic plans (*note Using Plans::).
4240
4241 The `fftw_execute' and `fftw_destroy_plan' functions, applied to MPI
4242 plans, are _collective_ calls: they must be called for all processes in
4243 the communicator that was used to create the plan.
4244
4245 You must _not_ use the serial new-array plan-execution functions
4246 `fftw_execute_dft' and so on (*note New-array Execute Functions::) with
4247 MPI plans. Such functions are specialized to the problem type, and
4248 there are specific new-array execute functions for MPI plans:
4249
4250 void fftw_mpi_execute_dft(fftw_plan p, fftw_complex *in, fftw_complex *out);
4251 void fftw_mpi_execute_dft_r2c(fftw_plan p, double *in, fftw_complex *out);
4252 void fftw_mpi_execute_dft_c2r(fftw_plan p, fftw_complex *in, double *out);
4253 void fftw_mpi_execute_r2r(fftw_plan p, double *in, double *out);
4254
4255 These functions have the same restrictions as those of the serial
4256 new-array execute functions. They are _always_ safe to apply to the
4257 _same_ `in' and `out' arrays that were used to create the plan. They
4258 can only be applied to new arrays if those arrays have the same types,
4259 dimensions, in-placeness, and alignment as the original arrays, where
4260 the best way to ensure the same alignment is to use FFTW's
4261 `fftw_malloc' and related allocation functions for all arrays (*note
4262 Memory Allocation::). Note that distributed transposes (*note FFTW MPI
4263 Transposes::) use `fftw_mpi_execute_r2r', since they count as rank-zero
4264 r2r plans from FFTW's perspective.
4265
4266 
4267 File: fftw3.info, Node: MPI Data Distribution Functions, Next: MPI Plan Creation, Prev: Using MPI Plans, Up: FFTW MPI Reference
4268
4269 6.12.4 MPI Data Distribution Functions
4270 --------------------------------------
4271
4272 As described above (*note MPI Data Distribution::), in order to
4273 allocate your arrays, _before_ creating a plan, you must first call one
4274 of the following routines to determine the required allocation size and
4275 the portion of the array locally stored on a given process. The
4276 `MPI_Comm' communicator passed here must be equivalent to the
4277 communicator used below for plan creation.
4278
4279 The basic interface for multidimensional transforms consists of the
4280 functions:
4281
4282 ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
4283 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4284 ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4285 MPI_Comm comm,
4286 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4287 ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
4288 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4289
4290 ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
4291 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4292 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4293 ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4294 MPI_Comm comm,
4295 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4296 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4297 ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
4298 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4299 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4300
4301 These functions return the number of elements to allocate (complex
4302 numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
4303 `local_n0' and `local_0_start' return the portion (`local_0_start' to
4304 `local_0_start + local_n0 - 1') of the first dimension of an n[0] x
4305 n[1] x n[2] x ... x n[d-1] array that is stored on the local process.
4306 *Note Basic and advanced distribution interfaces::. For
4307 `FFTW_MPI_TRANSPOSED_OUT' plans, the `_transposed' variants are useful
4308 in order to also return the local portion of the first dimension in the
4309 n[1] x n[0] x n[2] x ... x n[d-1] transposed output. *Note Transposed
4310 distributions::. The advanced interface for multidimensional
4311 transforms is:
4312
4313 ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4314 ptrdiff_t block0, MPI_Comm comm,
4315 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
4316 ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4317 ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
4318 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
4319 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
4320
4321 These differ from the basic interface in only two ways. First, they
4322 allow you to specify block sizes `block0' and `block1' (the latter for
4323 the transposed output); you can pass `FFTW_MPI_DEFAULT_BLOCK' to use
4324 FFTW's default block size as in the basic interface. Second, you can
4325 pass a `howmany' parameter, corresponding to the advanced planning
4326 interface below: this is for transforms of contiguous `howmany'-tuples
4327 of numbers (`howmany = 1' in the basic interface).
4328
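As a sketch of how the returned `local_n0' and `local_0_start' are typically used, the following plain-C helper maps a global index (i0, i1) of an n0 x n1 row-major array to an offset in the local array. It assumes `howmany = 1' and the ordinary (non-transposed) layout; the name `local_offset_2d' is hypothetical and not part of FFTW. Offsets are counted in array elements (complex numbers for DFT plans), and -1 is returned for rows stored on another process.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, within this process's local_n0 x n1 row-major slab, of the
   global element (i0, i1); returns -1 if row i0 is not stored locally. */
ptrdiff_t local_offset_2d(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t n1,
                          ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(n1 > 0 && i1 >= 0 && i1 < n1);
    if (i0 < local_0_start || i0 >= local_0_start + local_n0)
        return -1;
    return (i0 - local_0_start) * n1 + i1;
}
```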
4329 The corresponding basic and advanced routines for one-dimensional
4330 transforms (currently only complex DFTs) are:
4331
4332 ptrdiff_t fftw_mpi_local_size_1d(
4333 ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
4334 ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
4335 ptrdiff_t *local_no, ptrdiff_t *local_o_start);
4336 ptrdiff_t fftw_mpi_local_size_many_1d(
4337 ptrdiff_t n0, ptrdiff_t howmany,
4338 MPI_Comm comm, int sign, unsigned flags,
4339 ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
4340 ptrdiff_t *local_no, ptrdiff_t *local_o_start);
4341
4342 As above, the return value is the number of elements to allocate
4343 (complex numbers, for complex DFTs). The `local_ni' and
4344 `local_i_start' arguments return the portion (`local_i_start' to
4345 `local_i_start + local_ni - 1') of the 1d array that is stored on this
4346 process for the transform _input_, and `local_no' and `local_o_start'
4347 are the corresponding quantities for the output. The `sign'
4348 (`FFTW_FORWARD' or `FFTW_BACKWARD') and `flags' must match the
4349 arguments passed when creating a plan. Although the inputs and outputs
4350 have different data distributions in general, it is guaranteed that the
4351 _output_ data distribution of an `FFTW_FORWARD' plan will match the
4352 _input_ data distribution of an `FFTW_BACKWARD' plan and vice versa;
4353 similarly for the `FFTW_MPI_SCRAMBLED_OUT' and `FFTW_MPI_SCRAMBLED_IN'
4354 flags. *Note One-dimensional distributions::.
4355
4356 
4357 File: fftw3.info, Node: MPI Plan Creation, Next: MPI Wisdom Communication, Prev: MPI Data Distribution Functions, Up: FFTW MPI Reference
4358
4359 6.12.5 MPI Plan Creation
4360 ------------------------
4361
4362 Complex-data MPI DFTs
4363 .....................
4364
4365 Plans for complex-data DFTs (*note 2d MPI example::) are created by:
4366
4367 fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
4368 MPI_Comm comm, int sign, unsigned flags);
4369 fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
4370 fftw_complex *in, fftw_complex *out,
4371 MPI_Comm comm, int sign, unsigned flags);
4372 fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4373 fftw_complex *in, fftw_complex *out,
4374 MPI_Comm comm, int sign, unsigned flags);
4375 fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
4376 fftw_complex *in, fftw_complex *out,
4377 MPI_Comm comm, int sign, unsigned flags);
4378 fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
4379 ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
4380 fftw_complex *in, fftw_complex *out,
4381 MPI_Comm comm, int sign, unsigned flags);
4382
4383 These are similar to their serial counterparts (*note Complex DFTs::)
4384 in specifying the dimensions, sign, and flags of the transform. The
4385 `comm' argument gives an MPI communicator that specifies the set of
4386 processes to participate in the transform; plan creation is a
4387 collective function that must be called for all processes in the
4388 communicator. The `in' and `out' pointers refer only to a portion of
4389 the overall transform data (*note MPI Data Distribution::) as specified
4390 by the `local_size' functions in the previous section. Unless `flags'
4391 contains `FFTW_ESTIMATE', these arrays are overwritten during plan
4392 creation as for the serial interface. For multi-dimensional
4393 transforms, any dimensions `> 1' are supported; for one-dimensional
4394 transforms, only composite (non-prime) `n0' are currently supported
4395 (unlike the serial FFTW). Requesting an unsupported transform size
4396 will yield a `NULL' plan. (As in the serial interface, highly
4397 composite sizes generally yield the best performance.)
4398
4399 The advanced-interface `fftw_mpi_plan_many_dft' additionally allows
4400 you to specify the block sizes for the first dimension (`block') of the
4401 n[0] x n[1] x n[2] x ... x n[d-1] input data and the first dimension
4402 (`tblock') of the n[1] x n[0] x n[2] x ... x n[d-1] transposed data
4403 (at intermediate steps of the transform, and for the output if
4404 `FFTW_MPI_TRANSPOSED_OUT' is specified in `flags'). These must be the same
4405 block sizes as were passed to the corresponding `local_size' function;
4406 you can pass `FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default block size
4407 as in the basic interface. Also, the `howmany' parameter specifies
4408 that the transform is of contiguous `howmany'-tuples rather than
4409 individual complex numbers; this corresponds to the same parameter in
4410 the serial advanced interface (*note Advanced Complex DFTs::) with
4411 `stride = howmany' and `dist = 1'.
4412
4413 MPI flags
4414 .........
4415
4416 The `flags' can be any of those for the serial FFTW (*note Planner
4417 Flags::), and in addition may include one or more of the following
4418 MPI-specific flags, which improve performance at the cost of changing
4419 the output or input data formats.
4420
4421 * `FFTW_MPI_SCRAMBLED_OUT', `FFTW_MPI_SCRAMBLED_IN': valid for 1d
4422 transforms only, these flags indicate that the output/input of the
4423 transform are in an undocumented "scrambled" order. A forward
4424 `FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
4425 `FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).
4426 *Note One-dimensional distributions::.
4427
4428 * `FFTW_MPI_TRANSPOSED_OUT', `FFTW_MPI_TRANSPOSED_IN': valid for
4429 multidimensional (`rnk > 1') transforms only, these flags specify
4430 that the output or input of an n[0] x n[1] x n[2] x ... x n[d-1]
4431 transform is transposed to n[1] x n[0] x n[2] x ... x n[d-1] .
4432 *Note Transposed distributions::.
4433
4434
4435 Real-data MPI DFTs
4436 ..................
4437
4438 Plans for real-input/output (r2c/c2r) DFTs (*note Multi-dimensional MPI
4439 DFTs of Real Data::) are created by:
4440
4441 fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
4442 double *in, fftw_complex *out,
4443 MPI_Comm comm, unsigned flags);
4447 fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4448 double *in, fftw_complex *out,
4449 MPI_Comm comm, unsigned flags);
4450 fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
4451 double *in, fftw_complex *out,
4452 MPI_Comm comm, unsigned flags);
4453 fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
4454 fftw_complex *in, double *out,
4455 MPI_Comm comm, unsigned flags);
4459 fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4460 fftw_complex *in, double *out,
4461 MPI_Comm comm, unsigned flags);
4462 fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
4463 fftw_complex *in, double *out,
4464 MPI_Comm comm, unsigned flags);
4465
4466 Similar to the serial interface (*note Real-data DFTs::), these
4467 transform logically n[0] x n[1] x n[2] x ... x n[d-1] real data
4468 to/from n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) complex data,
4469 representing the non-redundant half of the conjugate-symmetry output of
4470 a real-input DFT (*note Multi-dimensional Transforms::). However, the
4471 real array must be stored within a padded
4472 n[0] x n[1] x n[2] x ... x [2 (n[d-1]/2 + 1)]
4474 array (much like the in-place serial r2c transforms, but here for
4475 out-of-place transforms as well). Currently, only multi-dimensional
4476 (`rnk > 1') r2c/c2r transforms are supported (requesting a plan for
4477 `rnk = 1' will yield `NULL'). As explained above (*note
4478 Multi-dimensional MPI DFTs of Real Data::), the data distribution of
4479 both the real and complex arrays is given by the `local_size' function
4480 called for the dimensions of the _complex_ array. Similar to the other
4481 planning functions, the input and output arrays are overwritten when
4482 the plan is created except in `FFTW_ESTIMATE' mode.
4483
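The padded sizes described above can be sketched in plain C as follows, using integer division for `n_last/2'. The helper names `r2c_complex_last', `r2c_padded_last', and `r2c_real_offset_2d' are hypothetical and not part of FFTW, and `i0_local' is assumed to be a row index relative to `local_0_start'. For a last dimension of 8, each row holds 5 complex outputs, and the real row is padded to 10 doubles (8 values followed by 2 unused slots); for a last dimension of 7, the figures are 4 and 8.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the last dimension of the complex array: n_last/2 + 1. */
ptrdiff_t r2c_complex_last(ptrdiff_t n_last)
{
    assert(n_last > 0);
    return n_last / 2 + 1;
}

/* Padded length, in doubles, of the last dimension of the real array. */
ptrdiff_t r2c_padded_last(ptrdiff_t n_last)
{
    return 2 * r2c_complex_last(n_last);
}

/* Offset of real element (i0_local, i1) in the local padded real array
   of a 2d transform whose last dimension has logical length n1. */
ptrdiff_t r2c_real_offset_2d(ptrdiff_t i0_local, ptrdiff_t i1, ptrdiff_t n1)
{
    assert(i0_local >= 0 && i1 >= 0 && i1 < n1);
    return i0_local * r2c_padded_last(n1) + i1;
}
```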
4484 As for the complex DFTs above, there is an advanced interface that
4485 allows you to manually specify block sizes and to transform contiguous
4486 `howmany'-tuples of real/complex numbers:
4487
4488 fftw_plan fftw_mpi_plan_many_dft_r2c
4489 (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4490 ptrdiff_t iblock, ptrdiff_t oblock,
4491 double *in, fftw_complex *out,
4492 MPI_Comm comm, unsigned flags);
4493 fftw_plan fftw_mpi_plan_many_dft_c2r
4494 (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
4495 ptrdiff_t iblock, ptrdiff_t oblock,
4496 fftw_complex *in, double *out,
4497 MPI_Comm comm, unsigned flags);
4498
4499 MPI r2r transforms
4500 ..................
4501
4502 There are corresponding plan-creation routines for r2r transforms
4503 (*note More DFTs of Real Data::), currently supporting multidimensional
4504 (`rnk > 1') transforms only (`rnk = 1' will yield a `NULL' plan):
4505
4506 fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
4507 double *in, double *out,
4508 MPI_Comm comm,
4509 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
4510 unsigned flags);
4511 fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
4512 double *in, double *out,
4513 MPI_Comm comm,
4514 fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
4515 unsigned flags);
4516 fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
4517 double *in, double *out,
4518 MPI_Comm comm, const fftw_r2r_kind *kind,
4519 unsigned flags);
4520 fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
4521 ptrdiff_t howmany, ptrdiff_t iblock, ptrdiff_t oblock,
4522 double *in, double *out,
4523 MPI_Comm comm, const fftw_r2r_kind *kind,
4524 unsigned flags);
4525
4526 The parameters are much the same as for the complex DFTs above,
4527 except that the arrays are of real numbers (and hence the outputs of the
4528 `local_size' data-distribution functions should be interpreted as
4529 counts of real rather than complex numbers). Also, the `kind'
4530 parameters specify the r2r kinds along each dimension as for the serial
4531 interface (*note Real-to-Real Transform Kinds::). *Note Other
4532 Multi-dimensional Real-data MPI Transforms::.
4533
4534 MPI transposition
4535 .................
4536
4537 FFTW also provides routines to plan a transpose of a distributed `n0'
4538 by `n1' array of real numbers, or an array of `howmany'-tuples of real
4539 numbers with specified block sizes (*note FFTW MPI Transposes::):
4540
4541 fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
4542 double *in, double *out,
4543 MPI_Comm comm, unsigned flags);
4544 fftw_plan fftw_mpi_plan_many_transpose
4545 (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
4546 ptrdiff_t block0, ptrdiff_t block1,
4547 double *in, double *out, MPI_Comm comm, unsigned flags);
4548
4549 These plans are used with the `fftw_mpi_execute_r2r' new-array
4550 execute function (*note Using MPI Plans::), since they count as (rank
4551 zero) r2r plans from FFTW's perspective.
4552
4553 
4554 File: fftw3.info, Node: MPI Wisdom Communication, Prev: MPI Plan Creation, Up: FFTW MPI Reference
4555
4556 6.12.6 MPI Wisdom Communication
4557 -------------------------------
4558
4559 To facilitate synchronizing wisdom among the different MPI processes,
4560 we provide two functions:
4561
4562 void fftw_mpi_gather_wisdom(MPI_Comm comm);
4563 void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
4564
4565 The `fftw_mpi_gather_wisdom' function gathers all wisdom in the
4566 given communicator `comm' to the process of rank 0 in the communicator:
4567 that process obtains the union of all wisdom on all the processes. As
4568 a side effect, some other processes will gain additional wisdom from
4569 other processes, but only process 0 will gain the complete union.
4570
4571 The `fftw_mpi_broadcast_wisdom' function does the reverse: it exports wisdom
4572 from process 0 in `comm' to all other processes in the communicator,
4573 replacing any wisdom they currently have.
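
As a rough illustration of these semantics (and not of FFTW's
implementation), the following C sketch models each process's wisdom as a
set of bits. The names `toy_gather' and `toy_broadcast' are hypothetical
stand-ins, not FFTW or MPI calls, and the model ignores the side effect by
which other processes may pick up partial wisdom during a gather.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: each bit of wisdom[r] stands for one piece of wisdom
   held by rank r.  These helpers are hypothetical, not FFTW/MPI calls. */

/* Models fftw_mpi_gather_wisdom: rank 0 ends up with the union. */
void toy_gather(unsigned *wisdom, size_t nranks)
{
    assert(wisdom != NULL && nranks > 0);
    for (size_t r = 1; r < nranks; ++r)
        wisdom[0] |= wisdom[r];
}

/* Models fftw_mpi_broadcast_wisdom: every other rank's wisdom is
   replaced by rank 0's. */
void toy_broadcast(unsigned *wisdom, size_t nranks)
{
    assert(wisdom != NULL && nranks > 0);
    for (size_t r = 1; r < nranks; ++r)
        wisdom[r] = wisdom[0];
}
```

In this model, calling `toy_gather' and then `toy_broadcast' leaves every
rank holding the union of what the ranks started with.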
4574
4575 *Note FFTW MPI Wisdom::.
4576
4577 
4578 File: fftw3.info, Node: FFTW MPI Fortran Interface, Prev: FFTW MPI Reference, Up: Distributed-memory FFTW with MPI
4579
4580 6.13 FFTW MPI Fortran Interface
4581 ===============================
4582
4583 The FFTW MPI interface is callable from modern Fortran compilers
4584 supporting the Fortran 2003 `iso_c_binding' standard for calling C
4585 functions. As described in *note Calling FFTW from Modern Fortran::,
4586 this means that you can directly call FFTW's C interface from Fortran
4587 with only minor changes in syntax. There are, however, a few things
4588 specific to the MPI interface to keep in mind:
4589
4590 * Instead of including `fftw3.f03' as in *note Overview of Fortran
4591 interface::, you should `include 'fftw3-mpi.f03'' (after `use,
4592 intrinsic :: iso_c_binding' as before). The `fftw3-mpi.f03' file
4593 includes `fftw3.f03', so you should _not_ `include' them both
4594 yourself. (You will also want to include the MPI header file,
4595 usually via `include 'mpif.h'' or similar, although this is
4596 not needed by `fftw3-mpi.f03' per se.) (To use the `fftwl_' `long
4597 double' extended-precision routines in supporting compilers, you
4598 should include `fftw3l-mpi.f03' in _addition_ to `fftw3-mpi.f03'.
4599 *Note Extended and quadruple precision in Fortran::.)
4600
4601 * Because of the different storage conventions between C and Fortran,
4602 you reverse the order of your array dimensions when passing them to
4603 FFTW (*note Reversing array dimensions::). This is merely a
4604 difference in notation and incurs no performance overhead.
4605 However, it means that, whereas in C the _first_ dimension is
4606 distributed, in Fortran the _last_ dimension of your array is
4607 distributed.
4608
4609 * In Fortran, communicators are stored as `integer' types; there is
4610 no `MPI_Comm' type, nor is there any way to access a C `MPI_Comm'.
4611 Fortunately, this is taken care of for you by the FFTW Fortran
4612 interface: whenever the C interface expects an `MPI_Comm' type,
4613 you should pass the Fortran communicator as an `integer'.(1)
4614
4615 * Because you need to call the `local_size' function to find out how
4616 much space to allocate, and this may be _larger_ than the local
4617 portion of the array (*note MPI Data Distribution::), you should
4618 _always_ allocate your arrays dynamically using FFTW's allocation
4619 routines as described in *note Allocating aligned memory in
4620 Fortran::. (Coincidentally, this also provides the best
4621 performance by guaranteeing proper data alignment.)
4622
4623 * Because all sizes in the MPI FFTW interface are declared as
4624 `ptrdiff_t' in C, you should use `integer(C_INTPTR_T)' in Fortran
4625 (*note FFTW Fortran type reference::).
4626
4627 * In Fortran, because of the language semantics, we generally
4628 recommend using the new-array execute functions for all plans,
4629 even in the common case where you are executing the plan on the
4630 same arrays for which the plan was created (*note Plan execution
4631 in Fortran::). However, note that in the MPI interface these
4632 functions are changed: `fftw_execute_dft' becomes
4633 `fftw_mpi_execute_dft', etcetera. *Note Using MPI Plans::.
4634
4635
4636 For example, here is a Fortran code snippet to perform a distributed
4637 L x M complex DFT in-place. (This assumes you have already
4638 initialized MPI with `MPI_init' and have also performed `call
4639 fftw_mpi_init'.)
4640
4641 use, intrinsic :: iso_c_binding
4642 include 'fftw3-mpi.f03'
4643 integer(C_INTPTR_T), parameter :: L = ...
4644 integer(C_INTPTR_T), parameter :: M = ...
4645 type(C_PTR) :: plan, cdata
4646 complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
4647 integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset
4648
4649 ! get local data size and allocate (note dimension reversal)
4650 alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
4651 local_M, local_j_offset)
4652 cdata = fftw_alloc_complex(alloc_local)
4653 call c_f_pointer(cdata, data, [L,local_M])
4654
4655 ! create MPI plan for in-place forward DFT (note dimension reversal)
4656 plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
4657 FFTW_FORWARD, FFTW_MEASURE)
4658
4659 ! initialize data to some function my_function(i,j)
4660 do j = 1, local_M
4661 do i = 1, L
4662 data(i, j) = my_function(i, j + local_j_offset)
4663 end do
4664 end do
4665
4666 ! compute transform (as many times as desired)
4667 call fftw_mpi_execute_dft(plan, data, data)
4668
4669 call fftw_destroy_plan(plan)
4670 call fftw_free(cdata)
4671
4672 Note that we called `fftw_mpi_local_size_2d' and
4673 `fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an L
4674 x M Fortran array is viewed by FFTW in C as an M x L array. This
4675 means that the array was distributed over the `M' dimension, the local
4676 portion of which is an L x local_M array in Fortran. (You must _not_
4677 use an `allocate' statement to allocate an L x local_M array, however;
4678 you must allocate `alloc_local' complex numbers, which may be greater
4679 than `L * local_M', in order to reserve space for intermediate steps of
4680 the transform.) Finally, we mention that because C's array indices are
4681 zero-based, the `local_j_offset' argument can conveniently be
4682 interpreted as an offset in the 1-based `j' index (rather than as a
4683 starting index as in C).
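
To make the offset bookkeeping concrete, here is a small C sketch of how a
distributed dimension of size `n' might be split over `nprocs' processes.
It assumes equal blocks of size ceil(n/nprocs), which is an assumption
made for illustration; in real code you should always use the values
returned by `fftw_mpi_local_size_2d'. The helper names are hypothetical.
Here `local_n' plays the role of `local_M' and `local_start' the role of
`local_j_offset'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers; assume blocks of size ceil(n / nprocs). */
typedef struct {
    ptrdiff_t local_n;      /* number of indices owned by this rank */
    ptrdiff_t local_start;  /* zero-based global index of the first one */
} block_t;

block_t block_for_rank(ptrdiff_t n, int nprocs, int rank)
{
    assert(n > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    ptrdiff_t block = (n + nprocs - 1) / nprocs;  /* ceiling division */
    ptrdiff_t start = block * rank;
    block_t b = { 0, 0 };   /* ranks past the end of the data own nothing */
    if (start < n) {
        b.local_start = start;
        b.local_n = (n - start < block) ? n - start : block;
    }
    return b;
}

/* Global 1-based Fortran index corresponding to local 1-based index j:
   the zero-based C starting index acts directly as an offset. */
ptrdiff_t global_index_1based(ptrdiff_t j, ptrdiff_t local_start)
{
    return j + local_start;
}
```

Under this assumption, a dimension of size 10 split over 4 processes
gives the last process a single index starting at zero-based position 9,
so its local `j = 1' corresponds to global `j = 10'.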
4684
4685 If instead you had used the `ior(FFTW_MEASURE,
4686 FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
4687 transposed M x local_L array, associated with the _same_ `cdata'
4688 allocation (since the transform is in-place), and which you could
4689 declare with:
4690
4691 complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
4692 ...
4693 call c_f_pointer(cdata, tdata, [M,local_L])
4694
4695 where `local_L' would have been obtained by changing the
4696 `fftw_mpi_local_size_2d' call to:
4697
4698 alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
4699 local_M, local_j_offset, local_L, local_i_offset)
4700
4701 ---------- Footnotes ----------
4702
4703 (1) Technically, this is because you aren't actually calling the C
4704 functions directly. You are calling wrapper functions that translate
4705 the communicator with `MPI_Comm_f2c' before calling the ordinary C
4706 interface. This is all done transparently, however, since the
4707 `fftw3-mpi.f03' interface file renames the wrappers so that they are
4708 called in Fortran with the same names as the C interface functions.
4709
4710 
4711 File: fftw3.info, Node: Calling FFTW from Modern Fortran, Next: Calling FFTW from Legacy Fortran, Prev: Distributed-memory FFTW with MPI, Up: Top
4712
4713 7 Calling FFTW from Modern Fortran
4714 **********************************
4715
4716 Fortran 2003 standardized ways for Fortran code to call C libraries,
4717 and this allows us to support a direct translation of the FFTW C API
4718 into Fortran. Compared to the legacy Fortran 77 interface (*note
4719 Calling FFTW from Legacy Fortran::), this direct interface offers many
4720 advantages, especially compile-time type-checking and aligned memory
4721 allocation. As of this writing, support for these C interoperability
4722 features seems widespread, having been implemented in nearly all major
4723 Fortran compilers (e.g. GNU, Intel, IBM, Oracle/Solaris, Portland
4724 Group, NAG).
4725
4726 This chapter documents that interface. For the most part, since this
4727 interface allows Fortran to call the C interface directly, the usage is
4728 identical to C translated to Fortran syntax. However, there are a few
4729 subtle points such as memory allocation, wisdom, and data types that
4730 deserve closer attention.
4731
4732 * Menu:
4733
4734 * Overview of Fortran interface::
4735 * Reversing array dimensions::
4736 * FFTW Fortran type reference::
4737 * Plan execution in Fortran::
4738 * Allocating aligned memory in Fortran::
4739 * Accessing the wisdom API from Fortran::
4740 * Defining an FFTW module::
4741
4742 
4743 File: fftw3.info, Node: Overview of Fortran interface, Next: Reversing array dimensions, Prev: Calling FFTW from Modern Fortran, Up: Calling FFTW from Modern Fortran
4744
4745 7.1 Overview of Fortran interface
4746 =================================
4747
4748 FFTW provides a file `fftw3.f03' that defines Fortran 2003 interfaces
4749 for all of its C routines except the MPI routines, which are described
4750 elsewhere. It can be found in the same directory as `fftw3.h' (the C
4751 header file). In any Fortran subroutine where you want to use FFTW
4752 functions, you should begin with:
4753
4754 use, intrinsic :: iso_c_binding
4755 include 'fftw3.f03'
4756
4757 This includes the interface definitions and the standard
4758 `iso_c_binding' module (which defines the equivalents of C types). You
4759 can also put the FFTW functions into a module if you prefer (*note
4760 Defining an FFTW module::).
4761
4762 At this point, you can now call anything in the FFTW C interface
4763 directly, almost exactly as in C other than minor changes in syntax.
4764 For example:
4765
4766 type(C_PTR) :: plan
4767 complex(C_DOUBLE_COMPLEX), dimension(1024,1000) :: in, out
4768 plan = fftw_plan_dft_2d(1000,1024, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
4769 ...
4770 call fftw_execute_dft(plan, in, out)
4771 ...
4772 call fftw_destroy_plan(plan)
4773
4774 A few important things to keep in mind are:
4775
4776 * FFTW plans are `type(C_PTR)'. Other C types are mapped in the
4777 obvious way via the `iso_c_binding' standard: `int' turns into
4778 `integer(C_INT)', `fftw_complex' turns into
4779 `complex(C_DOUBLE_COMPLEX)', `double' turns into `real(C_DOUBLE)',
4780 and so on. *Note FFTW Fortran type reference::.
4781
4782 * Functions in C become functions in Fortran if they have a return
4783 value, and subroutines in Fortran otherwise.
4784
4785 * The ordering of the Fortran array dimensions must be _reversed_
4786 when they are passed to the FFTW plan creation, thanks to
4787 differences in array indexing conventions (*note Multi-dimensional
4788 Array Format::). This is _unlike_ the legacy Fortran interface
4789 (*note Fortran-interface routines::), which reversed the dimensions
4790 for you. *Note Reversing array dimensions::.
4791
4792 * Using ordinary Fortran array declarations like this works, but may
4793 yield suboptimal performance because the data may not be
4794 aligned to exploit SIMD instructions on modern processors (*note
4795 SIMD alignment and fftw_malloc::). Better performance will often
4796 be obtained by allocating with `fftw_alloc'. *Note Allocating
4797 aligned memory in Fortran::.
4798
4799 * Similar to the legacy Fortran interface (*note FFTW Execution in
4800 Fortran::), we currently recommend _not_ using `fftw_execute' but
4801 rather using the more specialized functions like
4802 `fftw_execute_dft' (*note New-array Execute Functions::).
4803 However, you should execute the plan on the `same arrays' as the
4804 ones for which you created the plan, unless you are especially
4805 careful. *Note Plan execution in Fortran::. To prevent you from
4806 using `fftw_execute' by mistake, the `fftw3.f03' file does not
4807 provide an `fftw_execute' interface declaration.
4808
4809 * Multiple planner flags are combined with `ior' (equivalent to `|'
4810 in C). e.g. `FFTW_MEASURE | FFTW_DESTROY_INPUT' becomes
4811 `ior(FFTW_MEASURE, FFTW_DESTROY_INPUT)'. (You can also use `+' as
4812 long as you don't try to include a given flag more than once.)
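
The remark about `+' in the last item can be checked with a short C
sketch. The flag values below are illustrative single-bit constants, not
FFTW's actual values, and the function names are hypothetical.

```c
#include <assert.h>

/* Illustrative single-bit flags; hypothetical values, not FFTW's. */
enum { FLAG_A = 1u << 0, FLAG_B = 1u << 1 };

/* What `|' in C (and `ior' in Fortran) computes. */
unsigned combine_or(unsigned x, unsigned y)
{
    return x | y;
}

/* Addition gives the same result only when no bit is shared, so this
   version checks that precondition. */
unsigned combine_sum(unsigned x, unsigned y)
{
    assert((x & y) == 0u);
    return x + y;
}
```

Including a flag twice is harmless with bitwise OR, whereas adding it
twice would carry into a different bit and silently change the meaning.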
4813
4814
4815 * Menu:
4816
4817 * Extended and quadruple precision in Fortran::
4818
4819 
4820 File: fftw3.info, Node: Extended and quadruple precision in Fortran, Prev: Overview of Fortran interface, Up: Overview of Fortran interface
4821
4822 7.1.1 Extended and quadruple precision in Fortran
4823 -------------------------------------------------
4824
4825 If FFTW is compiled in `long double' (extended) precision (*note
4826 Installation and Customization::), you may be able to call the
4827 resulting `fftwl_' routines (*note Precision::) from Fortran if your
4828 compiler supports the `C_LONG_DOUBLE_COMPLEX' type code.
4829
4830 Because some Fortran compilers do not support
4831 `C_LONG_DOUBLE_COMPLEX', the `fftwl_' declarations are segregated into
4832 a separate interface file `fftw3l.f03', which you should include _in
4833 addition_ to `fftw3.f03' (which declares precision-independent `FFTW_'
4834 constants):
4835
4836 use, intrinsic :: iso_c_binding
4837 include 'fftw3.f03'
4838 include 'fftw3l.f03'
4839
4840 We also support using the nonstandard `__float128'
4841 quadruple-precision type provided by recent versions of `gcc' on 32-
4842 and 64-bit x86 hardware (*note Installation and Customization::), using
4843 the corresponding `real(16)' and `complex(16)' types supported by
4844 `gfortran'. The quadruple-precision `fftwq_' functions (*note
4845 Precision::) are declared in a `fftw3q.f03' interface file, which
4846 should be included in addition to `fftw3.f03', as above. You should
4847 also link with `-lfftw3q -lquadmath -lm' as in C.
4848
4849 
4850 File: fftw3.info, Node: Reversing array dimensions, Next: FFTW Fortran type reference, Prev: Overview of Fortran interface, Up: Calling FFTW from Modern Fortran
4851
4852 7.2 Reversing array dimensions
4853 ==============================
4854
4855 A minor annoyance in calling FFTW from Fortran is that FFTW's array
4856 dimensions are defined in the C convention (row-major order), while
4857 Fortran's array dimensions are the opposite convention (column-major
4858 order). *Note Multi-dimensional Array Format::. This is just a
4859 bookkeeping difference, with no effect on performance. The only
4860 consequence of this is that, whenever you create an FFTW plan for a
4861 multi-dimensional transform, you must always _reverse the ordering of
4862 the dimensions_.
4863
4864 For example, consider the three-dimensional (L x M x N) arrays:
4865
4866 complex(C_DOUBLE_COMPLEX), dimension(L,M,N) :: in, out
4867
4868 To plan a DFT for these arrays using `fftw_plan_dft_3d', you could
4869 do:
4870
4871 plan = fftw_plan_dft_3d(N,M,L, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
4872
4873 That is, from FFTW's perspective this is an N x M x L array. _No
4874 data transposition need occur_, as this is _only notation_. Similarly,
4875 to use the more generic routine `fftw_plan_dft' with the same arrays,
4876 you could do:
4877
4878 integer(C_INT), dimension(3) :: n = [N,M,L]
4879 plan = fftw_plan_dft(3, n, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
4880
4881 Note, by the way, that this is different from the legacy Fortran
4882 interface (*note Fortran-interface routines::), which automatically
4883 reverses the order of the array dimension for you. Here, you are
4884 calling the C interface directly, so there is no "translation" layer.
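
The claim that the reversal is only notation can be verified directly.
The following C sketch computes the memory offset of an element in both
conventions; the function names are hypothetical and chosen for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of Fortran element a(i,j,k), 1-based, in an L x M x N
   column-major array (first index varies fastest). */
size_t fortran_offset(size_t i, size_t j, size_t k,
                      size_t L, size_t M, size_t N)
{
    assert(i >= 1 && i <= L && j >= 1 && j <= M && k >= 1 && k <= N);
    return (i - 1) + L * ((j - 1) + M * (k - 1));
}

/* Offset of C element a[k][j][i], 0-based, in an N x M x L
   row-major array (last index varies fastest). */
size_t c_offset(size_t k, size_t j, size_t i,
                size_t N, size_t M, size_t L)
{
    assert(k < N && j < M && i < L);
    return i + L * (j + M * k);
}
```

For every element, `fortran_offset(i+1, j+1, k+1, L, M, N)' equals
`c_offset(k, j, i, N, M, L)', so the same memory is simply described with
the dimensions listed in the opposite order.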
4885
4886 An important thing to keep in mind is the implication of this for
4887 multidimensional real-to-complex transforms (*note Multi-Dimensional
4888 DFTs of Real Data::). In C, a multidimensional real-to-complex DFT
4889 chops the last dimension roughly in half (N x M x L real input goes to
4890 N x M x L/2+1 complex output). In Fortran, because the array
4891 dimension notation is reversed, the _first_ dimension of the complex
4892 data is chopped roughly in half. For example, consider the `r2c'
4893 transform of L x M x N real input in Fortran:
4894
4895 type(C_PTR) :: plan
4896 real(C_DOUBLE), dimension(L,M,N) :: in
4897 complex(C_DOUBLE_COMPLEX), dimension(L/2+1,M,N) :: out
4898 plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
4899 ...
4900 call fftw_execute_dft_r2c(plan, in, out)
4901
4902 Alternatively, for an in-place r2c transform, as described in the C
4903 documentation we must _pad_ the _first_ dimension of the real input
4904 with an extra two entries (which are ignored by FFTW) so as to leave
4905 enough space for the complex output. The input is _allocated_ as a
4906 2[L/2+1] x M x N array, even though only L x M x N of it is actually
4907 used. In this example, we will allocate the array as a pointer type,
4908 using `fftw_alloc' to ensure aligned memory for maximum performance
4909 (*note Allocating aligned memory in Fortran::); this also makes it easy
4910 to reference the same memory as both a real array and a complex array.
4911
4912 real(C_DOUBLE), pointer :: in(:,:,:)
4913 complex(C_DOUBLE_COMPLEX), pointer :: out(:,:,:)
4914 type(C_PTR) :: plan, data
4915 data = fftw_alloc_complex(int((L/2+1) * M * N, C_SIZE_T))
4916 call c_f_pointer(data, in, [2*(L/2+1),M,N])
4917 call c_f_pointer(data, out, [L/2+1,M,N])
4918 plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
4919 ...
4920 call fftw_execute_dft_r2c(plan, in, out)
4921 ...
4922 call fftw_destroy_plan(plan)
4923 call fftw_free(data)
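
The size arithmetic used in the in-place example can be summarized in a
few lines of C. This is only a sketch of the bookkeeping; the function
names are hypothetical, and integer division is used as in the Fortran
expressions above.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the complex first (Fortran) dimension for an r2c transform
   whose real first dimension has length L. */
size_t r2c_complex_len(size_t L)
{
    assert(L > 0);
    return L / 2 + 1;
}

/* Padded length of the real first dimension for an in-place transform. */
size_t r2c_padded_real_len(size_t L)
{
    return 2 * r2c_complex_len(L);
}

/* Number of complex elements to request for an in-place
   L x M x N transform. */
size_t r2c_inplace_alloc(size_t L, size_t M, size_t N)
{
    return r2c_complex_len(L) * M * N;
}
```

With L = 8 the complex first dimension is 5 and the padded real first
dimension is 10 (two extra reals); with L = 7 they are 4 and 8 (one extra
real).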
4924
4925 
4926 File: fftw3.info, Node: FFTW Fortran type reference, Next: Plan execution in Fortran, Prev: Reversing array dimensions, Up: Calling FFTW from Modern Fortran
4927
4928 7.3 FFTW Fortran type reference
4929 ===============================
4930
4931 The following are the most important type correspondences between the C
4932 interface and Fortran:
4933
4934 * Plans (`fftw_plan' and variants) are `type(C_PTR)' (i.e. an opaque
4935 pointer).
4936
4937 * The C floating-point types `double', `float', and `long double'
4938 correspond to `real(C_DOUBLE)', `real(C_FLOAT)', and
4939 `real(C_LONG_DOUBLE)', respectively. The C complex types
4940 `fftw_complex', `fftwf_complex', and `fftwl_complex' correspond in
4941 Fortran to `complex(C_DOUBLE_COMPLEX)',
4942 `complex(C_FLOAT_COMPLEX)', and `complex(C_LONG_DOUBLE_COMPLEX)',
4943 respectively. Just as in C (*note Precision::), the FFTW
4944 subroutines and types are prefixed with `fftw_', `fftwf_', and
4945 `fftwl_' for the different precisions, and link to different
4946 libraries (`-lfftw3', `-lfftw3f', and `-lfftw3l' on Unix), but use
4947 the _same_ include file `fftw3.f03' and the _same_ constants (all
4948 of which begin with `FFTW_'). The exception is `long double'
4949 precision, for which you should _also_ include `fftw3l.f03' (*note
4950 Extended and quadruple precision in Fortran::).
4951
4952 * The C integer types `int' and `unsigned' (used for planner flags)
4953 become `integer(C_INT)'. The C integer type `ptrdiff_t' (e.g. in
4954 the *note 64-bit Guru Interface::) becomes `integer(C_INTPTR_T)',
4955 and `size_t' (in `fftw_malloc' etc.) becomes `integer(C_SIZE_T)'.
4956
4957 * The `fftw_r2r_kind' type (*note Real-to-Real Transform Kinds::)
4958 becomes `integer(C_FFTW_R2R_KIND)'. The various constant values
4959 of the C enumerated type (`FFTW_R2HC' etc.) become simply integer
4960 constants of the same names in Fortran.
4961
4962 * Numeric array pointer arguments (e.g. `double *') become
4963 `dimension(*), intent(out)' arrays of the same type, or
4964 `dimension(*), intent(in)' if they are pointers to constant data
4965 (e.g. `const int *'). There are a few exceptions where numeric
4966 pointers refer to scalar outputs (e.g. for `fftw_flops'), in which
4967 case they are `intent(out)' scalar arguments in Fortran too. For
4968 the new-array execute functions (*note New-array Execute
4969 Functions::), the input arrays are declared `dimension(*),
4970 intent(inout)', since they can be modified in the case of in-place
4971 or `FFTW_DESTROY_INPUT' transforms.
4972
4973 * Pointer _return_ values (e.g `double *') become `type(C_PTR)'.
4974 (If they are pointers to arrays, as for `fftw_alloc_real', you can
4975 convert them back to Fortran array pointers with the standard
4976 intrinsic function `c_f_pointer'.)
4977
4978 * The `fftw_iodim' type in the guru interface (*note Guru vector and
4979 transform sizes::) becomes `type(fftw_iodim)' in Fortran, a
4980 derived data type (the Fortran analogue of C's `struct') with
4981 three `integer(C_INT)' components: `n', `is', and `os', with the
4982 same meanings as in C. The `fftw_iodim64' type in the 64-bit guru
4983 interface (*note 64-bit Guru Interface::) is the same, except that
4984 its components are of type `integer(C_INTPTR_T)'.
4985
4986 * Using the wisdom import/export functions from Fortran is a bit
4987 tricky, and is discussed in *note Accessing the wisdom API from
4988 Fortran::. In brief, the `FILE *' arguments map to `type(C_PTR)',
4989 `const char *' to `character(C_CHAR), dimension(*), intent(in)'
4990 (null-terminated!), and the generic read-char/write-char functions
4991 map to `type(C_FUNPTR)'.
4992
4993
4994 You may be wondering if you need to search-and-replace
4995 `real(kind(0.0d0))' (or whatever your favorite Fortran spelling of
4996 "double precision" is) with `real(C_DOUBLE)' everywhere in your
4997 program, and similarly for `complex' and `integer' types. The answer
4998 is no; you can still use your existing types. As long as these types
4999 match their C counterparts, things should work without a hitch. The
5000 worst that can happen, e.g. in the (unlikely) event of a system where
5001 `real(kind(0.0d0))' is different from `real(C_DOUBLE)', is that the
5002 compiler will give you a type-mismatch error. That is, if you don't
5003 use the `iso_c_binding' kinds you need to accept at least the
5004 theoretical possibility of having to change your code in response to
5005 compiler errors on some future machine, but you don't need to worry
5006 about silently compiling incorrect code that yields runtime errors.
5007
5008 
5009 File: fftw3.info, Node: Plan execution in Fortran, Next: Allocating aligned memory in Fortran, Prev: FFTW Fortran type reference, Up: Calling FFTW from Modern Fortran
5010
5011 7.4 Plan execution in Fortran
5012 =============================
5013
5014 In C, in order to use a plan, one normally calls `fftw_execute', which
5015 executes the plan to perform the transform on the input/output arrays
5016 passed when the plan was created (*note Using Plans::). The
5017 corresponding subroutine call in modern Fortran is:
5018 call fftw_execute(plan)
5019
5020 However, we have had reports that this causes problems with some
5021 recent optimizing Fortran compilers. The problem is, because the
5022 input/output arrays are not passed as explicit arguments to
5023 `fftw_execute', the semantics of Fortran (unlike C) allow the compiler
5024 to assume that the input/output arrays are not changed by
5025 `fftw_execute'. As a consequence, certain compilers end up
5026 repositioning the call to `fftw_execute', assuming incorrectly that it
5027 does nothing to the arrays.
5028
5029 There are various workarounds to this, but the safest and simplest
5030 thing is to not use `fftw_execute' in Fortran. Instead, use the
5031 functions described in *note New-array Execute Functions::, which take
5032 the input/output arrays as explicit arguments. For example, if the
5033 plan is for a complex-data DFT and was created for the arrays `in' and
5034 `out', you would do:
5035 call fftw_execute_dft(plan, in, out)
5036
5037 There are a few things to be careful of, however:
5038
5039 * You must use the correct type of execute function, matching the way
5040 the plan was created. Complex DFT plans should use
5041 `fftw_execute_dft', real-input (r2c) DFT plans should use
5042 `fftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
5043 `fftw_execute_dft_c2r'. The various r2r plans should use
5044 `fftw_execute_r2r'. Fortunately, if you use the wrong one you
5045 will get a compile-time type-mismatch error (unlike legacy
5046 Fortran).
5047
5048 * You should normally pass the same input/output arrays that were
5049 used when creating the plan. This is always safe.
5050
5051 * _If_ you pass _different_ input/output arrays compared to those
5052 used when creating the plan, you must abide by all the
5053 restrictions of the new-array execute functions (*note New-array
5054 Execute Functions::). The most tricky of these is the requirement
5055 that the new arrays have the same alignment as the original
5056 arrays; the best (and possibly only) way to guarantee this is to
5057 use the `fftw_alloc' functions to allocate your arrays (*note
5058 Allocating aligned memory in Fortran::). Alternatively, you can
5059 use the `FFTW_UNALIGNED' flag when creating the plan, in which
5060 case the plan does not depend on the alignment, but this may
5061 sacrifice substantial performance on architectures (like x86) with
5062 SIMD instructions (*note SIMD alignment and fftw_malloc::).
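
To illustrate what "the same alignment" means in the last item, here is a
minimal C sketch that compares two addresses modulo a power-of-two
boundary. The 16-byte boundary used in the usage note is an assumption
for illustration, the function names are hypothetical, and the sketch
does not reproduce FFTW's internal check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of p past the previous multiple of `alignment'
   (which must be a power of two). */
size_t alignment_offset(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size_t)((uintptr_t)p & (uintptr_t)(alignment - 1));
}

/* Nonzero if p and q have the same offset modulo `alignment'. */
int same_alignment(const void *p, const void *q, size_t alignment)
{
    return alignment_offset(p, alignment) == alignment_offset(q, alignment);
}
```

For example, addresses 64 and 128 agree modulo 16, whereas 64 and 72 do
not. Two arrays obtained from the same aligned allocator will normally
pass such a check; an array section that starts a few elements into an
allocation may not.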
5063
5064
5065 
5066 File: fftw3.info, Node: Allocating aligned memory in Fortran, Next: Accessing the wisdom API from Fortran, Prev: Plan execution in Fortran, Up: Calling FFTW from Modern Fortran
5067
5068 7.5 Allocating aligned memory in Fortran
5069 ========================================
5070
5071 In order to obtain maximum performance in FFTW, you should store your
5072 data in arrays that have been specially aligned in memory (*note SIMD
5073 alignment and fftw_malloc::). Enforcing alignment also permits you to
5074 safely use the new-array execute functions (*note New-array Execute
5075 Functions::) to apply a given plan to more than one pair of in/out
5076 arrays. Unfortunately, standard Fortran arrays do _not_ provide any
5077 alignment guarantees. The _only_ way to allocate aligned memory in
5078 standard Fortran is to allocate it with an external C function, like
5079 the `fftw_alloc_real' and `fftw_alloc_complex' functions. Fortunately,
5080 Fortran 2003 provides a simple way to associate such allocated memory
5081 with a standard Fortran array pointer that you can then use normally.
5082
5083 We therefore recommend allocating all your input/output arrays using
5084 the following technique:
5085
5086 1. Declare a `pointer', `arr', to your array of the desired type and
5087 dimensions. For example, `real(C_DOUBLE), pointer :: arr(:,:)' for
5088 a 2d real array, or `complex(C_DOUBLE_COMPLEX), pointer ::
5089 arr(:,:,:)' for a 3d complex array.
5090
5091 2. The number of elements to allocate must be an `integer(C_SIZE_T)'.
5092 You can either declare a variable of this type, e.g.
5093 `integer(C_SIZE_T) :: sz', to store the number of elements to
5094 allocate, or you can use the `int(..., C_SIZE_T)' intrinsic
5095 function. e.g. set `sz = L * M * N' or use `int(L * M * N,
5096 C_SIZE_T)' for an L x M x N array.
5097
5098 3. Declare a `type(C_PTR) :: p' to hold the return value from FFTW's
5099 allocation routine. Set `p = fftw_alloc_real(sz)' for a real
5100 array, or `p = fftw_alloc_complex(sz)' for a complex array.
5101
5102 4. Associate your pointer `arr' with the allocated memory `p' using
5103 the standard `c_f_pointer' subroutine: `call c_f_pointer(p, arr,
5104 [...dimensions...])', where `[...dimensions...]' is an array of
5105 the dimensions of the array (in the usual Fortran order). e.g.
5106 `call c_f_pointer(p, arr, [L,M,N])' for an L x M x N array.
5107 (Alternatively, you can omit the dimensions argument if you
5108 specified the shape explicitly when declaring `arr'.) You can now
5109 use `arr' as a usual multidimensional array.
5110
5111 5. When you are done using the array, deallocate the memory by `call
5112 fftw_free(p)' on `p'.
5113
5114
5115 For example, here is how we would allocate an L x M 2d real array:
5116
5117 real(C_DOUBLE), pointer :: arr(:,:)
5118 type(C_PTR) :: p
5119 p = fftw_alloc_real(int(L * M, C_SIZE_T))
5120 call c_f_pointer(p, arr, [L,M])
5121 ! ...use arr and arr(i,j) as usual...
5122 call fftw_free(p)
5123
5124 and here is an L x M x N 3d complex array:
5125
5126 complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)
5127 type(C_PTR) :: p
5128 p = fftw_alloc_complex(int(L * M * N, C_SIZE_T))
5129 call c_f_pointer(p, arr, [L,M,N])
5130 ! ...use arr and arr(i,j,k) as usual...
5131 call fftw_free(p)
5132
5133 See *note Reversing array dimensions:: for an example allocating a
5134 single array and associating both real and complex array pointers with
5135 it, for in-place real-to-complex transforms.
5136
5137 
5138 File: fftw3.info, Node: Accessing the wisdom API from Fortran, Next: Defining an FFTW module, Prev: Allocating aligned memory in Fortran, Up: Calling FFTW from Modern Fortran
5139
5140 7.6 Accessing the wisdom API from Fortran
5141 =========================================
5142
5143 As explained in *note Words of Wisdom-Saving Plans::, FFTW provides a
5144 "wisdom" API for saving plans to disk so that they can be recreated
5145 quickly. The C API for exporting (*note Wisdom Export::) and importing
5146 (*note Wisdom Import::) wisdom is somewhat tricky to use from Fortran,
5147 however, because of differences in file I/O and string types between C
5148 and Fortran.
5149
5150 * Menu:
5151
5152 * Wisdom File Export/Import from Fortran::
5153 * Wisdom String Export/Import from Fortran::
5154 * Wisdom Generic Export/Import from Fortran::
5155
5156 
5157 File: fftw3.info, Node: Wisdom File Export/Import from Fortran, Next: Wisdom String Export/Import from Fortran, Prev: Accessing the wisdom API from Fortran, Up: Accessing the wisdom API from Fortran
5158
5159 7.6.1 Wisdom File Export/Import from Fortran
5160 --------------------------------------------
5161
5162 The easiest way to export and import wisdom is to do so using
5163 `fftw_export_wisdom_to_filename' and
5164 `fftw_import_wisdom_from_filename'. The only trick is that these require
5165 you to pass a C string, which is an array of type `CHARACTER(C_CHAR)'
5166 that is terminated by `C_NULL_CHAR'. You can call them like this:
5167
5168 integer(C_INT) :: ret
5169 ret = fftw_export_wisdom_to_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
5170 if (ret .eq. 0) stop 'error exporting wisdom to file'
5171 ret = fftw_import_wisdom_from_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
5172 if (ret .eq. 0) stop 'error importing wisdom from file'
5173
5174 Note that prepending `C_CHAR_' is needed to specify that the literal
5175 string is of kind `C_CHAR', and we null-terminate the string by
5176 appending `// C_NULL_CHAR'. These functions return an `integer(C_INT)'
5177 (`ret') which is `0' if an error occurred during export/import and
5178 nonzero otherwise.
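
Seen from the C side, the terminator matters because C functions read a
string until they reach a null character. The sketch below shows the kind
of conversion that would be needed for a fixed-length, blank-padded
buffer such as a Fortran `character' variable. The helper name is
hypothetical, and trimming trailing blanks is an assumption about what
the caller wants (a literal like the one above has none).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a fixed-length, blank-padded buffer into dst as a
   null-terminated C string, dropping trailing blanks.  Returns the
   C string length, or (size_t)-1 if dst is too small. */
size_t blank_padded_to_c_string(const char *src, size_t len,
                                char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    while (len > 0 && src[len - 1] == ' ')
        --len;                      /* trim trailing blanks */
    if (len + 1 > cap)
        return (size_t)-1;          /* no room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';                /* the role of "// C_NULL_CHAR" */
    return len;
}
```

Without the final null character, a C routine receiving the buffer would
have no way to tell where the file name ends.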
5179
5180 It is also possible to use the lower-level routines
5181 `fftw_export_wisdom_to_file' and `fftw_import_wisdom_from_file', which
5182 accept parameters of the C type `FILE*', expressed in Fortran as
5183 `type(C_PTR)'. However, you are then responsible for creating the
5184 `FILE*' yourself. You can do this by using `iso_c_binding' to define
5185 Fortran interfaces for the C library functions `fopen' and `fclose',
5186 which is a bit strange in Fortran but workable.
5187
5188 
5189 File: fftw3.info, Node: Wisdom String Export/Import from Fortran, Next: Wisdom Generic Export/Import from Fortran, Prev: Wisdom File Export/Import from Fortran, Up: Accessing the wisdom API from Fortran
5190
5191 7.6.2 Wisdom String Export/Import from Fortran
5192 ----------------------------------------------
5193
5194 Dealing with FFTW's C string export/import is a bit more painful. In
5195 particular, the `fftw_export_wisdom_to_string' function requires you to
5196 deal with a dynamically allocated C string. To get its length, you
5197 must define an interface to the C `strlen' function, and to deallocate
5198 it you must define an interface to C `free':
5199
5200 use, intrinsic :: iso_c_binding
5201 interface
5202 integer(C_SIZE_T) function strlen(s) bind(C, name='strlen')
5203 import
5204 type(C_PTR), value :: s
5205 end function strlen
5206 subroutine free(p) bind(C, name='free')
5207 import
5208 type(C_PTR), value :: p
5209 end subroutine free
5210 end interface
5211
5212 Given these definitions, you can then export wisdom to a Fortran
5213 character array:
5214
5215 character(C_CHAR), pointer :: s(:)
5216 integer(C_SIZE_T) :: slen
5217 type(C_PTR) :: p
5218 p = fftw_export_wisdom_to_string()
5219 if (.not. c_associated(p)) stop 'error exporting wisdom'
5220 slen = strlen(p)
5221 call c_f_pointer(p, s, [slen+1])
5222 ...
5223 call free(p)
5224
5225 Note that `slen' is the length of the C string, but the length of
5226 the array is `slen+1' because it includes the terminating null
5227 character. (You can omit the `+1' if you don't want Fortran to know
5228 about the null character.) The standard `c_associated' function checks
5229 whether `p' is a null pointer, which is returned by
5230 `fftw_export_wisdom_to_string' if there was an error.
5231
5232 To import wisdom from a string, use `fftw_import_wisdom_from_string'
5233 as usual; note that the argument of this function must be a
5234 `character(C_CHAR)' that is terminated by the `C_NULL_CHAR' character,
5235 like the `s' array above.
5236
5237 
5238 File: fftw3.info, Node: Wisdom Generic Export/Import from Fortran, Prev: Wisdom String Export/Import from Fortran, Up: Accessing the wisdom API from Fortran
5239
5240 7.6.3 Wisdom Generic Export/Import from Fortran
5241 -----------------------------------------------
5242
5243 The most generic wisdom export/import functions allow you to provide an
5244 arbitrary callback function to read/write one character at a time in
5245 any way you want. However, your callback function must be written in a
5246 special way, using the `bind(C)' attribute to be passed to a C
5247 interface.
5248
5249 In particular, to call the generic wisdom export function
5250 `fftw_export_wisdom', you would write a callback subroutine of the form:
5251
5252 subroutine my_write_char(c, p) bind(C)
5253 use, intrinsic :: iso_c_binding
5254 character(C_CHAR), value :: c
5255 type(C_PTR), value :: p
5256 _...write c..._
5257 end subroutine my_write_char
5258
5259 Given such a subroutine (along with the corresponding interface
5260 definition), you could then export wisdom using:
5261
5262 call fftw_export_wisdom(c_funloc(my_write_char), p)
5263
5264 The standard `c_funloc' intrinsic converts a Fortran `bind(C)'
5265 subroutine into a C function pointer. The parameter `p' is a
5266 `type(C_PTR)' to any arbitrary data that you want to pass to
5267 `my_write_char' (or `C_NULL_PTR' if none). (Note that you can get a C
5268 pointer to Fortran data using the intrinsic `c_loc', and convert it
5269 back to a Fortran pointer in `my_write_char' using `c_f_pointer'.)
5270
5271 Similarly, to use the generic `fftw_import_wisdom', you would define
5272 a callback function of the form:
5273
5274 integer(C_INT) function my_read_char(p) bind(C)
5275 use, intrinsic :: iso_c_binding
5276 type(C_PTR), value :: p
5277 character :: c
5278 _...read a character c..._
5279 my_read_char = ichar(c, C_INT)
5280 end function my_read_char
5281
5282 ....
5283
5284 integer(C_INT) :: ret
5285 ret = fftw_import_wisdom(c_funloc(my_read_char), p)
5286 if (ret .eq. 0) stop 'error importing wisdom'
5287
5288 Your function can return `-1' if the end of the input is reached.
5289 Again, `p' is an arbitrary `type(C_PTR)' that is passed through to your
5290 function. `fftw_import_wisdom' returns `0' if an error occurred and
5291 nonzero otherwise.
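
For orientation, here is a hedged C sketch of the two callback shapes
that the `bind(C)' procedures above have to match: a reader that returns
the next character as an `int' (or a negative end-of-input value), and a
writer that takes a `char' plus a user pointer. The `mem_reader' and
`mem_writer' types and both function names are hypothetical helpers made
up for this illustration; the sketch works on memory buffers and does
not call FFTW itself.

```c
#include <stddef.h>
#include <stdio.h>   /* EOF */

/* Hypothetical read cursor over an in-memory buffer (not part of FFTW). */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} mem_reader;

/* Shape of the import callback: int (*read_char)(void *).
   Returns the next character as a non-negative int, or EOF once the
   input is exhausted (the Fortran text above returns -1 for this). */
int mem_read_char(void *p)
{
    mem_reader *r = (mem_reader *)p;
    if (r->pos >= r->len)
        return EOF;
    return (unsigned char)r->data[r->pos++];
}

/* Hypothetical fixed-size sink for exported characters. */
typedef struct {
    char buf[64];
    size_t len;
} mem_writer;

/* Shape of the export callback: void (*write_char)(char c, void *).
   Characters that do not fit are silently dropped in this sketch. */
void mem_write_char(char c, void *p)
{
    mem_writer *w = (mem_writer *)p;
    if (w->len < sizeof w->buf)
        w->buf[w->len++] = c;
}
```

In C, functions of this shape would be handed to `fftw_import_wisdom'
or `fftw_export_wisdom' as the first argument, with a pointer to the
helper structure as the second argument; that pointer plays the role of
`p' in the Fortran code above.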
5292
5293 
5294 File: fftw3.info, Node: Defining an FFTW module, Prev: Accessing the wisdom API from Fortran, Up: Calling FFTW from Modern Fortran
5295
5296 7.7 Defining an FFTW module
5297 ===========================
5298
5299 Rather than using the `include' statement to include the `fftw3.f03'
5300 interface file in any subroutine where you want to use FFTW, you might
5301 prefer to define an FFTW Fortran module. FFTW does not install itself
5302 as a module, primarily because `fftw3.f03' can be shared between
5303 different Fortran compilers while modules (in general) cannot.
5304 However, it is trivial to define your own FFTW module if you want.
5305 Just create a file containing:
5306
5307 module FFTW3
5308 use, intrinsic :: iso_c_binding
5309 include 'fftw3.f03'
5310 end module
5311
5312 Compile this file into a module as usual for your compiler (e.g. with
5313 `gfortran -c' you will get a file `fftw3.mod'). Now, instead of
5314 `include 'fftw3.f03'', whenever you want to use FFTW routines you can
5315 just do:
5316
5317 use FFTW3
5318
5319 as usual for Fortran modules. (You still need to link to the FFTW
5320 library, of course.)
5321
5322 
5323 File: fftw3.info, Node: Calling FFTW from Legacy Fortran, Next: Upgrading from FFTW version 2, Prev: Calling FFTW from Modern Fortran, Up: Top
5324
5325 8 Calling FFTW from Legacy Fortran
5326 **********************************
5327
5328 This chapter describes the interface to FFTW callable by Fortran code
5329 in older compilers not supporting the Fortran 2003 C interoperability
5330 features (*note Calling FFTW from Modern Fortran::). This interface
5331 has the major disadvantage that it is not type-checked, so if you
5332 mistake the argument types or ordering then your program will not have
5333 any compiler errors, and will likely crash at runtime. So, greater
5334 care is needed. Also, technically interfacing older Fortran versions
5335 to C is nonstandard, but in practice we have found that the techniques
5336 used in this chapter have worked with all known Fortran compilers for
5337 many years.
5338
5339 The legacy Fortran interface differs from the C interface only in the
5340 prefix (`dfftw_' instead of `fftw_' in double precision) and a few
5341 other minor details. This Fortran interface is included in the FFTW
5342 libraries by default, unless a Fortran compiler isn't found on your
5343 system or `--disable-fortran' is included in the `configure' flags. We
5344 assume here that the reader is already familiar with the usage of FFTW
5345 in C, as described elsewhere in this manual.
5346
5347 The MPI parallel interface to FFTW is _not_ currently available to
5348 legacy Fortran.
5349
5350 * Menu:
5351
5352 * Fortran-interface routines::
5353 * FFTW Constants in Fortran::
5354 * FFTW Execution in Fortran::
5355 * Fortran Examples::
5356 * Wisdom of Fortran?::
5357
5358 
5359 File: fftw3.info, Node: Fortran-interface routines, Next: FFTW Constants in Fortran, Prev: Calling FFTW from Legacy Fortran, Up: Calling FFTW from Legacy Fortran
5360
5361 8.1 Fortran-interface routines
5362 ==============================
5363
5364 Nearly all of the FFTW functions have Fortran-callable equivalents.
5365 The name of the legacy Fortran routine is the same as that of the
5366 corresponding C routine, but with the `fftw_' prefix replaced by
5367 `dfftw_'.(1) The single and long-double precision versions use
5368 `sfftw_' and `lfftw_', respectively, instead of `fftwf_' and `fftwl_';
5369 quadruple precision (`real*16') is available on some systems as
5370 `fftwq_' (*note Precision::). (Note that `long double' on x86 hardware
5371 is usually at most 80-bit extended precision, _not_ quadruple
5372 precision.)
5373
5374 For the most part, all of the arguments to the functions are the
5375 same, with the following exceptions:
5376
5377 * `plan' variables (what would be of type `fftw_plan' in C) must be
5378 declared as a type that is at least as big as a pointer (address)
5379 on your machine. We recommend using `integer*8' everywhere, since
5380 this should always be big enough.
5381
5382 * Any function that returns a value (e.g. `fftw_plan_dft') is
5383 converted into a _subroutine_. The return value is converted into
5384 an additional _first_ parameter of this subroutine.(2)
5385
5386 * The Fortran routines expect multi-dimensional arrays to be in
5387 _column-major_ order, which is the ordinary format of Fortran
5388 arrays (*note Multi-dimensional Array Format::). They do this
5389 transparently and costlessly simply by reversing the order of the
5390 dimensions passed to FFTW, but this has one important consequence
5391 for multi-dimensional real-complex transforms, discussed below.
5392
5393 * Wisdom import and export is somewhat more tricky because one cannot
5394 easily pass files or strings between C and Fortran; see *note
5395 Wisdom of Fortran?::.
5396
5397 * Legacy Fortran cannot use the `fftw_malloc' dynamic-allocation
5398 routine. If you want to exploit the SIMD FFTW (*note SIMD
5399 alignment and fftw_malloc::), you'll need to figure out some other
5400 way to ensure that your arrays are at least 16-byte aligned.
5401
5402 * Since Fortran 77 does not have data structures, the `fftw_iodim'
5403 structure from the guru interface (*note Guru vector and transform
5404 sizes::) must be split into separate arguments. In particular, any
5405 `fftw_iodim' array arguments in the C guru interface become three
5406 integer array arguments (`n', `is', and `os') in the Fortran guru
5407 interface, all of whose lengths should be equal to the
5408 corresponding `rank' argument.
5409
5410 * The guru planner interface in Fortran does _not_ do any automatic
5411 translation between column-major and row-major; you are responsible
5412 for setting the strides etcetera to correspond to your Fortran
5413 arrays. However, as a slight bug that we are preserving for
5414 backwards compatibility, the `plan_guru_r2r' in Fortran _does_
5415 reverse the order of its `kind' array parameter, so the `kind'
5416 array of that routine should be in the reverse of the order of the
5417 iodim arrays (see above).
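
As a rough illustration of the stride bookkeeping that the guru items
above leave to the caller, the following C sketch fills the three
integer arrays (`n', `is', `os') for a contiguous column-major array. It
assumes that input and output share one layout with no padding, and that
strides are counted in array elements. The function name
`column_major_iodims' is hypothetical, and no FFTW routine is called.

```c
/* Fill n, is and os for a contiguous column-major (Fortran-order) array
   whose extents are dims[0..rank-1]. The first dimension gets stride 1,
   and each later stride is the product of all earlier extents. Input
   and output are assumed to use the same layout. */
void column_major_iodims(int rank, const int *dims,
                         int *n, int *is, int *os)
{
    int stride = 1;
    for (int d = 0; d < rank; ++d) {
        n[d]  = dims[d];
        is[d] = stride;
        os[d] = stride;
        stride *= dims[d];
    }
}
```

For an array declared as `arr(4,3,2)' this yields sizes 4, 3, 2 and
strides 1, 4, 12, which is the information a Fortran caller would pass
as three separate arrays of length `rank'.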
5418
5419
5420 In general, you should take care to use Fortran data types that
5421 correspond to (i.e. are the same size as) the C types used by FFTW. In
5422 practice, this correspondence is usually straightforward (i.e.
5423 `integer' corresponds to `int', `real' corresponds to `float',
5424 etcetera). The native Fortran double/single-precision complex type
5425 should be compatible with `fftw_complex'/`fftwf_complex'. Such simple
5426 correspondences are assumed in the examples below.
5427
5428 ---------- Footnotes ----------
5429
5430 (1) Technically, Fortran 77 identifiers are not allowed to have more
5431 than 6 characters, nor may they contain underscores. Any compiler that
5432 enforces this limitation doesn't deserve to link to FFTW.
5433
5434 (2) The reason for this is that some Fortran implementations seem to
5435 have trouble with C function return values, and vice versa.
5436
5437 
5438 File: fftw3.info, Node: FFTW Constants in Fortran, Next: FFTW Execution in Fortran, Prev: Fortran-interface routines, Up: Calling FFTW from Legacy Fortran
5439
5440 8.2 FFTW Constants in Fortran
5441 =============================
5442
5443 When creating plans in FFTW, a number of constants are used to specify
5444 options, such as `FFTW_MEASURE' or `FFTW_ESTIMATE'. The same constants
5445 must be used with the wrapper routines, but of course the C header
5446 files where the constants are defined can't be incorporated directly
5447 into Fortran code.
5448
5449 Instead, we have placed Fortran equivalents of the FFTW constant
5450 definitions in the file `fftw3.f', which can be found in the same
5451 directory as `fftw3.h'. If your Fortran compiler supports a
5452 preprocessor of some sort, you should be able to `include' or
5453 `#include' this file; otherwise, you can paste it directly into your
5454 code.
5455
5456 In C, you combine different flags (like `FFTW_PRESERVE_INPUT' and
5457 `FFTW_MEASURE') using the ``|'' operator; in Fortran you should just
5458 use ``+''. (Take care not to add in the same flag more than once,
5459 though. Alternatively, you can use the `ior' intrinsic function
5460 standardized in Fortran 90.)
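
The caveat about adding a flag twice can be seen with plain integer
arithmetic. The C sketch below uses two made-up single-bit values,
`DEMO_FLAG_A' and `DEMO_FLAG_B'; they are placeholders chosen for this
illustration and are not the values defined in `fftw3.h' or `fftw3.f'.

```c
/* Illustrative single-bit flags (hypothetical values, not FFTW's). */
#define DEMO_FLAG_A (1u << 4)
#define DEMO_FLAG_B (1u << 6)

/* Bitwise OR, as `|' in C or `ior' in Fortran: repeating a flag that is
   already set leaves the result unchanged. */
unsigned combine_or(unsigned x, unsigned y)
{
    return x | y;
}

/* Plain addition, as `+' in Fortran: agrees with OR only while the two
   operands have no bits in common. */
unsigned combine_add(unsigned x, unsigned y)
{
    return x + y;
}
```

With these values, OR and addition both give 80 for the two distinct
flags. Adding `DEMO_FLAG_A' a second time gives 96, which clears bit 4
and sets bit 5 instead, so the result no longer describes the intended
flags; OR still gives 80.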
5461
5462 
5463 File: fftw3.info, Node: FFTW Execution in Fortran, Next: Fortran Examples, Prev: FFTW Constants in Fortran, Up: Calling FFTW from Legacy Fortran
5464
5465 8.3 FFTW Execution in Fortran
5466 =============================
5467
5468 In C, in order to use a plan, one normally calls `fftw_execute', which
5469 executes the plan to perform the transform on the input/output arrays
5470 passed when the plan was created (*note Using Plans::). The
5471 corresponding subroutine call in legacy Fortran is:
5472 call dfftw_execute(plan)
5473
5474 However, we have had reports that this causes problems with some
5475 recent optimizing Fortran compilers. The problem is, because the
5476 input/output arrays are not passed as explicit arguments to
5477 `dfftw_execute', the semantics of Fortran (unlike C) allow the compiler
5478 to assume that the input/output arrays are not changed by
5479 `dfftw_execute'. As a consequence, certain compilers end up optimizing
5480 out or repositioning the call to `dfftw_execute', assuming incorrectly
5481 that it does nothing.
5482
5483 There are various workarounds to this, but the safest and simplest
5484 thing is to not use `dfftw_execute' in Fortran. Instead, use the
5485 functions described in *note New-array Execute Functions::, which take
5486 the input/output arrays as explicit arguments. For example, if the
5487 plan is for a complex-data DFT and was created for the arrays `in' and
5488 `out', you would do:
5489 call dfftw_execute_dft(plan, in, out)
5490
5491 There are a few things to be careful of, however:
5492
5493 * You must use the correct type of execute function, matching the way
5494 the plan was created. Complex DFT plans should use
5495 `dfftw_execute_dft', real-input (r2c) DFT plans should use
5496 `dfftw_execute_dft_r2c', and real-output (c2r) DFT plans should
5497 use `dfftw_execute_dft_c2r'. The various r2r plans should use
5498 `dfftw_execute_r2r'.
5499
5500 * You should normally pass the same input/output arrays that were
5501 used when creating the plan. This is always safe.
5502
5503 * _If_ you pass _different_ input/output arrays compared to those
5504 used when creating the plan, you must abide by all the
5505 restrictions of the new-array execute functions (*note New-array
5506 Execute Functions::). The most difficult of these, in Fortran, is
5507 the requirement that the new arrays have the same alignment as the
5508 original arrays, because there seems to be no way in legacy
5509 Fortran to obtain guaranteed-aligned arrays (analogous to
5510 `fftw_malloc' in C). You can, of course, use the `FFTW_UNALIGNED'
5511 flag when creating the plan, in which case the plan does not
5512 depend on the alignment, but this may sacrifice substantial
5513 performance on architectures (like x86) with SIMD instructions
5514 (*note SIMD alignment and fftw_malloc::).
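
To make the alignment requirement concrete, here is a small C sketch
that tests whether an address sits on a given byte boundary and whether
two addresses share the same offset from that boundary. The 16-byte
figure used in the usage note comes from the SIMD discussion earlier in
this chapter; the helper names `is_aligned' and `same_alignment' are
hypothetical, and the sketch only approximates the idea of "same
alignment" rather than reproducing FFTW's internal check.

```c
#include <stdint.h>

/* Nonzero if p lies on a `boundary'-byte boundary (boundary > 0). */
int is_aligned(const void *p, uintptr_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}

/* Nonzero if a and b have the same offset relative to a
   `boundary'-byte boundary. */
int same_alignment(const void *a, const void *b, uintptr_t boundary)
{
    return ((uintptr_t)a % boundary) == ((uintptr_t)b % boundary);
}
```

A helper like this could be called from a small C wrapper to check, for
example, `same_alignment(old_in, new_in, 16)' before reusing a plan on
new arrays; if the check fails, planning with `FFTW_UNALIGNED' is the
fallback described above.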
5515
5516
5517 
5518 File: fftw3.info, Node: Fortran Examples, Next: Wisdom of Fortran?, Prev: FFTW Execution in Fortran, Up: Calling FFTW from Legacy Fortran
5519
5520 8.4 Fortran Examples
5521 ====================
5522
5523 In C, you might have something like the following to transform a
5524 one-dimensional complex array:
5525
5526 fftw_complex in[N], out[N];
5527 fftw_plan plan;
5528
5529 plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
5530 fftw_execute(plan);
5531 fftw_destroy_plan(plan);
5532
5533 In Fortran, you would use the following to accomplish the same thing:
5534
5535 double complex in, out
5536 dimension in(N), out(N)
5537 integer*8 plan
5538
5539 call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
5540 call dfftw_execute_dft(plan, in, out)
5541 call dfftw_destroy_plan(plan)
5542
5543 Notice how all routines are called as Fortran subroutines, and the
5544 plan is returned via the first argument to `dfftw_plan_dft_1d'. Notice
5545 also that we changed `fftw_execute' to `dfftw_execute_dft' (*note FFTW
5546 Execution in Fortran::). To do the same thing, but using 8 threads in
5547 parallel (*note Multi-threaded FFTW::), you would simply prefix these
5548 calls with:
5549
5550 integer iret
5551 call dfftw_init_threads(iret)
5552 call dfftw_plan_with_nthreads(8)
5553
5554 (You might want to check the value of `iret': if it is zero, it
5555 indicates an unlikely error during thread initialization.)
5556
5557 To transform a three-dimensional array in-place with C, you might do:
5558
5559 fftw_complex arr[L][M][N];
5560 fftw_plan plan;
5561
5562 plan = fftw_plan_dft_3d(L,M,N, arr,arr,
5563 FFTW_FORWARD, FFTW_ESTIMATE);
5564 fftw_execute(plan);
5565 fftw_destroy_plan(plan);
5566
5567 In Fortran, you would use this instead:
5568
5569 double complex arr
5570 dimension arr(L,M,N)
5571 integer*8 plan
5572
5573 call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
5574 & FFTW_FORWARD, FFTW_ESTIMATE)
5575 call dfftw_execute_dft(plan, arr, arr)
5576 call dfftw_destroy_plan(plan)
5577
5578 Note that we pass the array dimensions in the "natural" order in
5579 both C and Fortran.
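
The reason the same dimension list works in both languages can be
checked with a little index arithmetic. The C sketch below (zero-based
indices, hypothetical function names) computes the memory offset of an
element in a column-major array of shape (L,M,N) and in a row-major
array of shape [A][B][C]. A column-major (L,M,N) array has the same
memory layout as a row-major [N][M][L] array, which is why the wrappers
can simply reverse the dimension list without moving any data.

```c
#include <stddef.h>

/* Offset of element (i,j,k) in a column-major array of shape (L,M,N):
   the first index varies fastest, as in Fortran. */
size_t colmajor_index(size_t i, size_t j, size_t k,
                      size_t L, size_t M, size_t N)
{
    (void)N;                      /* the last extent does not affect offsets */
    return i + L * (j + M * k);
}

/* Offset of element [a][b][c] in a row-major array of shape [A][B][C]:
   the last index varies fastest, as in C. */
size_t rowmajor_index(size_t a, size_t b, size_t c,
                      size_t A, size_t B, size_t C)
{
    (void)A;                      /* the first extent does not affect offsets */
    return c + C * (b + B * a);
}
```

For every valid (i,j,k), `colmajor_index(i,j,k, L,M,N)' equals
`rowmajor_index(k,j,i, N,M,L)'.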
5580
5581 To transform a one-dimensional real array in Fortran, you might do:
5582
5583 double precision in
5584 dimension in(N)
5585 double complex out
5586 dimension out(N/2 + 1)
5587 integer*8 plan
5588
5589 call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
5590 call dfftw_execute_dft_r2c(plan, in, out)
5591 call dfftw_destroy_plan(plan)
5592
5593 To transform a two-dimensional real array, out of place, you might
5594 use the following:
5595
5596 double precision in
5597 dimension in(M,N)
5598 double complex out
5599 dimension out(M/2 + 1, N)
5600 integer*8 plan
5601
5602 call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
5603 call dfftw_execute_dft_r2c(plan, in, out)
5604 call dfftw_destroy_plan(plan)
5605
5606 *Important:* Notice that it is the _first_ dimension of the complex
5607 output array that is cut in half in Fortran, rather than the last
5608 dimension as in C. This is a consequence of the interface routines
5609 reversing the order of the array dimensions passed to FFTW so that the
5610 Fortran program can use its ordinary column-major order.
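
The effect of the dimension reversal on the output shape can be sketched
in a few lines of C. Both function names are hypothetical, and the code
only computes array extents; it does not call FFTW. The Fortran variant
deliberately mimics the mechanism described above: reverse the
dimensions, halve the last one as C would, then reverse back.

```c
/* Reverse an integer array in place. */
static void reverse_ints(int rank, int *a)
{
    for (int lo = 0, hi = rank - 1; lo < hi; ++lo, --hi) {
        int tmp = a[lo];
        a[lo] = a[hi];
        a[hi] = tmp;
    }
}

/* C (row-major) convention: the complex output of an r2c transform has
   the same extents as the real input, except that the last one becomes
   n/2 + 1. */
void r2c_shape_c(int rank, const int *n, int *out)
{
    for (int d = 0; d < rank; ++d)
        out[d] = n[d];
    out[rank - 1] = n[rank - 1] / 2 + 1;
}

/* Legacy Fortran convention: the dimensions reach C in reverse order,
   so the extent that C halves is the first one as seen from Fortran. */
void r2c_shape_fortran(int rank, const int *n, int *out)
{
    for (int d = 0; d < rank; ++d)
        out[d] = n[rank - 1 - d];         /* order in which C sees them */
    out[rank - 1] = out[rank - 1] / 2 + 1; /* C halves its last extent */
    reverse_ints(rank, out);               /* back to Fortran order */
}
```

For a real input with extents 8 and 6, the C rule gives a complex
output of 8 by 4, while the Fortran rule gives 5 by 6, matching the
`out(M/2 + 1, N)' declaration in the example above.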
5611
5612 
5613 File: fftw3.info, Node: Wisdom of Fortran?, Prev: Fortran Examples, Up: Calling FFTW from Legacy Fortran
5614
5615 8.5 Wisdom of Fortran?
5616 ======================
5617
5618 In this section, we discuss how one can import/export FFTW wisdom
5619 (saved plans) to/from a Fortran program; we assume that the reader is
5620 already familiar with wisdom, as described in *note Words of
5621 Wisdom-Saving Plans::.
5622
5623 The basic problem is that it is difficult to (portably) pass files and
5624 strings between Fortran and C, so we cannot provide a direct Fortran
5625 equivalent to the `fftw_export_wisdom_to_file', etcetera, functions.
5626 Fortran interfaces _are_ provided for the functions that do not take
5627 file/string arguments, however: `dfftw_import_system_wisdom',
5628 `dfftw_import_wisdom', `dfftw_export_wisdom', and `dfftw_forget_wisdom'.
5629
5630 So, for example, to import the system-wide wisdom, you would do:
5631
5632 integer isuccess
5633 call dfftw_import_system_wisdom(isuccess)
5634
5635 As usual, the C return value is turned into a first parameter;
5636 `isuccess' is non-zero on success and zero on failure (e.g. if there is
5637 no system wisdom installed).
5638
5639 If you want to import/export wisdom from/to an arbitrary file or
5640 elsewhere, you can employ the generic `dfftw_import_wisdom' and
5641 `dfftw_export_wisdom' functions, for which you must supply a subroutine
5642 to read/write one character at a time. The FFTW package contains an
5643 example file `doc/f77_wisdom.f' demonstrating how to implement
5644 `import_wisdom_from_file' and `export_wisdom_to_file' subroutines in
5645 this way. (These routines cannot be compiled into the FFTW library
5646 itself, lest all FFTW-using programs be required to link with the
5647 Fortran I/O library.)
5648
5649 
5650 File: fftw3.info, Node: Upgrading from FFTW version 2, Next: Installation and Customization, Prev: Calling FFTW from Legacy Fortran, Up: Top
5651
5652 9 Upgrading from FFTW version 2
5653 *******************************
5654
5655 In this chapter, we outline the process for updating codes designed for
5656 the older FFTW 2 interface to work with FFTW 3. The interface for FFTW
5657 3 is not backwards-compatible with the interface for FFTW 2 and earlier
5658 versions; codes written to use those versions will fail to link with
5659 FFTW 3. Nor is it possible to write "compatibility wrappers" to bridge
5660 the gap (at least not efficiently), because FFTW 3 has different
5661 semantics from previous versions. However, upgrading should be a
5662 straightforward process because the data formats are identical and the
5663 overall style of planning/execution is essentially the same.
5664
5665 Unlike FFTW 2, there are no separate header files for real and
5666 complex transforms (or even for different precisions) in FFTW 3; all
5667 interfaces are defined in the `<fftw3.h>' header file.
5668
5669 Numeric Types
5670 =============
5671
5672 The main difference in data types is that `fftw_complex' in FFTW 2 was
5673 defined as a `struct' with macros `c_re' and `c_im' for accessing the
5674 real/imaginary parts. (This is binary-compatible with FFTW 3 on any
5675 machine except perhaps for some older Crays in single precision.) The
5676 equivalent macros for FFTW 3 are:
5677
5678 #define c_re(c) ((c)[0])
5679 #define c_im(c) ((c)[1])
5680
5681 This does not work if you are using the C99 complex type, however,
5682 unless you insert a `double*' typecast into the above macros (*note
5683 Complex numbers::).
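
One possible form of the cast-based macros is sketched below in C. It
takes the address of its argument and views it as two consecutive
doubles, so it applies to modifiable values of either the C99 complex
type or the two-element array layout. This is an assumption about how
the typecast might be inserted, not the only way to do it, and the
`abs2_' helper names are hypothetical.

```c
#include <complex.h>

/* Cast-based accessors: view the object as two consecutive doubles.
   The argument must be an lvalue (a variable or array element). */
#define c_re(c) (((double *)&(c))[0])
#define c_im(c) (((double *)&(c))[1])

/* Squared magnitude of a C99 complex value, read through the macros. */
double abs2_c99(double complex z)
{
    return c_re(z) * c_re(z) + c_im(z) * c_im(z);
}

/* The same computation for the two-double array layout. */
double abs2_pair(double (*p)[2])
{
    return c_re(*p) * c_re(*p) + c_im(*p) * c_im(*p);
}
```

Because the macros expand to lvalues, an assignment such as
`c_im(z) = 0.0' also works, which keeps FFTW 2 code that assigns
through `c_re' and `c_im' compiling unchanged.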
5684
5685 Also, FFTW 2 had an `fftw_real' typedef that was an alias for
5686 `double' (in double precision). In FFTW 3 you should just use `double'
5687 (or whatever precision you are employing).
5688
5689 Plans
5690 =====
5691
5692 The major difference between FFTW 2 and FFTW 3 is in the
5693 planning/execution division of labor. In FFTW 2, plans were found for a
5694 given transform size and type, and then could be applied to _any_
5695 arrays and for _any_ multiplicity/stride parameters. In FFTW 3, you
5696 specify the particular arrays, stride parameters, etcetera when
5697 creating the plan, and the plan is then executed for _those_ arrays
5698 (unless the guru interface is used) and _those_ parameters _only_.
5699 (FFTW 2 had "specific planner" routines that planned for a particular
5700 array and stride, but the plan could still be used for other arrays and
5701 strides.) That is, much of the information that was formerly specified
5702 at execution time is now specified at planning time.
5703
5704 Like FFTW 2's specific planner routines, the FFTW 3 planner
5705 overwrites the input/output arrays unless you use `FFTW_ESTIMATE'.
5706
5707 FFTW 2 had separate data types `fftw_plan', `fftwnd_plan',
5708 `rfftw_plan', and `rfftwnd_plan' for complex and real one- and
5709 multi-dimensional transforms, and each type had its own `destroy'
5710 function. In FFTW 3, all plans are of type `fftw_plan' and all are
5711 destroyed by `fftw_destroy_plan(plan)'.
5712
5713 Where you formerly used `fftw_create_plan' and `fftw_one' to plan
5714 and compute a single 1d transform, you would now use `fftw_plan_dft_1d'
5715 to plan the transform. If you used the generic `fftw' function to
5716 execute the transform with multiplicity (`howmany') and stride
5717 parameters, you would now use the advanced interface
5718 `fftw_plan_many_dft' to specify those parameters. The plans are now
5719 executed with `fftw_execute(plan)', which takes all of its parameters
5720 (including the input/output arrays) from the plan.
5721
5722 In-place transforms no longer interpret their output argument as
5723 scratch space, nor is there an `FFTW_IN_PLACE' flag. You simply pass
5724 the same pointer for both the input and output arguments. (Previously,
5725 the output `ostride' and `odist' parameters were ignored for in-place
5726 transforms; now, if they are specified via the advanced interface, they
5727 are significant even in the in-place case, although they should
5728 normally equal the corresponding input parameters.)
5729
5730 The `FFTW_ESTIMATE' and `FFTW_MEASURE' flags have the same meaning
5731 as before, although the planning time will differ. You may also
5732 consider using `FFTW_PATIENT', which is like `FFTW_MEASURE' except that
5733 it takes more time in order to consider a wider variety of algorithms.
5734
5735 For multi-dimensional complex DFTs, instead of `fftwnd_create_plan'
5736 (or `fftw2d_create_plan' or `fftw3d_create_plan'), followed by
5737 `fftwnd_one', you would use `fftw_plan_dft' (or `fftw_plan_dft_2d' or
5738 `fftw_plan_dft_3d'), followed by `fftw_execute'. If you used `fftwnd'
5739 to specify strides etcetera, you would instead specify these via
5740 `fftw_plan_many_dft'.
5741
5742 The analogues to `rfftw_create_plan' and `rfftw_one' with
5743 `FFTW_REAL_TO_COMPLEX' or `FFTW_COMPLEX_TO_REAL' directions are
5744 `fftw_plan_r2r_1d' with kind `FFTW_R2HC' or `FFTW_HC2R', followed by
5745 `fftw_execute'. The stride etcetera arguments of `rfftw' are now in
5746 `fftw_plan_many_r2r'.
5747
5748 Instead of `rfftwnd_create_plan' (or `rfftw2d_create_plan' or
5749 `rfftw3d_create_plan') followed by `rfftwnd_one_real_to_complex' or
5750 `rfftwnd_one_complex_to_real', you now use `fftw_plan_dft_r2c' (or
5751 `fftw_plan_dft_r2c_2d' or `fftw_plan_dft_r2c_3d') or
5752 `fftw_plan_dft_c2r' (or `fftw_plan_dft_c2r_2d' or
5753 `fftw_plan_dft_c2r_3d'), respectively, followed by `fftw_execute'. As
5754 usual, the strides etcetera of `rfftwnd_real_to_complex' or
5755 `rfftwnd_complex_to_real' are now specified in the advanced planner
5756 routines, `fftw_plan_many_dft_r2c' or `fftw_plan_many_dft_c2r'.
5757
5758 Wisdom
5759 ======
5760
5761 In FFTW 2, you had to supply the `FFTW_USE_WISDOM' flag in order to use
5762 wisdom; in FFTW 3, wisdom is always used. (You could simulate the FFTW
5763 2 wisdom-less behavior by calling `fftw_forget_wisdom' after every
5764 planner call.)
5765
5766 The FFTW 3 wisdom import/export routines are almost the same as
5767 before (although the storage format is entirely different). There is
5768 one significant difference, however. In FFTW 2, the import routines
5769 would never read past the end of the wisdom, so you could store extra
5770 data beyond the wisdom in the same file, for example. In FFTW 3, the
5771 file-import routine may read up to a few hundred bytes past the end of
5772 the wisdom, so you cannot store other data just beyond it.(1)
5773
5774 Wisdom has been enhanced by additional humility in FFTW 3: whereas
5775 FFTW 2 would re-use wisdom for a given transform size regardless of the
5776 stride etc., in FFTW 3 wisdom is only used with the strides etc. for
5777 which it was created. Unfortunately, this means FFTW 3 has to create
5778 new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g.
5779 one transform of size 1024 also created wisdom for all smaller powers
5780 of 2, but this no longer occurs).
5781
5782 FFTW 3 also has the new routine `fftw_import_system_wisdom' to
5783 import wisdom from a standard system-wide location.
5784
5785 Memory allocation
5786 =================
5787
5788 In FFTW 3, we recommend allocating your arrays with `fftw_malloc' and
5789 deallocating them with `fftw_free'; this is not required, but allows
5790 optimal performance when SIMD acceleration is used. (Those two
5791 functions actually existed in FFTW 2, and worked the same way, but were
5792 not documented.)
5793
5794 In FFTW 2, there were `fftw_malloc_hook' and `fftw_free_hook'
5795 functions that allowed the user to replace FFTW's memory-allocation
5796 routines (e.g. to implement different error-handling, since by default
5797 FFTW prints an error message and calls `exit' to abort the program if
5798 `malloc' returns `NULL'). These hooks are not supported in FFTW 3;
5799 those few users who require this functionality can just directly modify
5800 the memory-allocation routines in FFTW (they are defined in
5801 `kernel/alloc.c').
5802
5803 Fortran interface
5804 =================
5805
5806 In FFTW 2, the subroutine names were obtained by replacing `fftw_' with
5807 `fftw_f77_'; in FFTW 3, you replace `fftw_' with `dfftw_' (or `sfftw_'
5808 or `lfftw_', depending upon the precision).
5809
5810 In FFTW 3, we have begun recommending that you always declare the
5811 type used to store plans as `integer*8'. (Too many people didn't notice
5812 our instruction to switch from `integer' to `integer*8' for 64-bit
5813 machines.)
5814
5815 In FFTW 3, we provide a `fftw3.f' "header file" to include in your
5816 code (and which is officially installed on Unix systems). (In FFTW 2,
5817 we supplied a `fftw_f77.i' file, but it was not installed.)
5818
5819 Otherwise, the C-Fortran interface relationship is much the same as
5820 it was before (e.g. return values become initial parameters, and
5821 multi-dimensional arrays are in column-major order). Unlike FFTW 2, we
5822 do provide some support for wisdom import/export in Fortran (*note
5823 Wisdom of Fortran?::).
5824
5825 Threads
5826 =======
5827
5828 As in FFTW 2, only the execution routines are thread-safe. All planner
5829 routines, etcetera, should be called by only a single thread at a time
5830 (*note Thread safety::). _Unlike_ FFTW 2, there is no special
5831 `FFTW_THREADSAFE' flag for the planner to allow a given plan to be
5832 usable by multiple threads in parallel; this is now the case by default.
5833
5834 The multi-threaded version of FFTW 2 required you to pass the number
5835 of threads each time you execute the transform. The number of threads
5836 is now stored in the plan, and is specified before the planner is
5837 called by `fftw_plan_with_nthreads'. The threads initialization
5838 routine used to be called `fftw_threads_init' and would return zero on
5839 success; the new routine is called `fftw_init_threads' and returns zero
5840 on failure. *Note Multi-threaded FFTW::.
5841
5842 There is no separate threads header file in FFTW 3; all the function
5843 prototypes are in `<fftw3.h>'. However, you still have to link to a
5844 separate library (`-lfftw3_threads -lfftw3 -lm' on Unix), as well as to
5845 the threading library (e.g. POSIX threads on Unix).
5846
5847 ---------- Footnotes ----------
5848
5849 (1) We do our own buffering because GNU libc I/O routines are
5850 horribly slow for single-character I/O, apparently for thread-safety
5851 reasons (whether you are using threads or not).
5852
5853 
5854 File: fftw3.info, Node: Installation and Customization, Next: Acknowledgments, Prev: Upgrading from FFTW version 2, Up: Top
5855
5856 10 Installation and Customization
5857 *********************************
5858
5859 This chapter describes the installation and customization of FFTW, the
5860 latest version of which may be downloaded from the FFTW home page
5861 (http://www.fftw.org).
5862
5863 In principle, FFTW should work on any system with an ANSI C compiler
5864 (`gcc' is fine). However, planner time is drastically reduced if FFTW
5865 can exploit a hardware cycle counter; FFTW comes with cycle-counter
5866 support for all modern general-purpose CPUs, but you may need to add a
5867 couple of lines of code if your compiler is not yet supported (*note
5868 Cycle Counters::). (On Unix, there will be a warning at the end of the
5869 `configure' output if no cycle counter is found.)
5870
5871 Installation of FFTW is simplest if you have a Unix or a GNU system,
5872 such as GNU/Linux, and we describe this case in the first section below,
5873 including the use of special configuration options to e.g. install
5874 different precisions or exploit optimizations for particular
5875 architectures (e.g. SIMD). Compilation on non-Unix systems is a more
5876 manual process, but we outline the procedure in the second section. It
5877 is also likely that pre-compiled binaries will be available for popular
5878 systems.
5879
5880 Finally, we describe how you can customize FFTW for particular needs
5881 by generating _codelets_ for fast transforms of sizes not supported
5882 efficiently by the standard FFTW distribution.
5883
5884 * Menu:
5885
5886 * Installation on Unix::
5887 * Installation on non-Unix systems::
5888 * Cycle Counters::
5889 * Generating your own code::
5890
5891 
5892 File: fftw3.info, Node: Installation on Unix, Next: Installation on non-Unix systems, Prev: Installation and Customization, Up: Installation and Customization
5893
5894 10.1 Installation on Unix
5895 =========================
5896
5897 FFTW comes with a `configure' program in the GNU style. Installation
5898 can be as simple as:
5899
5900 ./configure
5901 make
5902 make install
5903
5904 This will build the uniprocessor FFTW library (complex and real
5905 transforms) along with the test programs. (We recommend that you use
5906 GNU `make' if it is available; on some systems it is called `gmake'.)
5907 The "`make install'" command installs the FFTW library in standard
5908 places, and typically requires root privileges (unless you specify a
5909 different install directory with the `--prefix' flag to `configure').
5910 You can also type "`make check'" to put the FFTW test programs through
5911 their paces. If you have problems during configuration or compilation,
5912 you may want to run "`make distclean'" before trying again; this
5913 ensures that you don't have any stale files left over from previous
5914 compilation attempts.
5915
5916 The `configure' script chooses the `gcc' compiler by default, if it
5917 is available; you can select some other compiler with:
5918 ./configure CC="<the name of your C compiler>"
5919
5920 The `configure' script knows good `CFLAGS' (C compiler flags) for a
5921 few systems. If your system is not known, the `configure' script will
5922 print out a warning. In this case, you should re-configure FFTW with
5923 the command
5924 ./configure CFLAGS="<write your CFLAGS here>"
5925 and then compile as usual. If you do find an optimal set of
5926 `CFLAGS' for your system, please let us know what they are (along with
5927 the output of `config.guess') so that we can include them in future
5928 releases.
5929
5930 `configure' supports all the standard flags defined by the GNU
5931 Coding Standards; see the `INSTALL' file in FFTW or the GNU web page
5932 (http://www.gnu.org/prep/standards/html_node/index.html). Note
5933 especially `--help' to list all flags and `--enable-shared' to create
5934 shared, rather than static, libraries. `configure' also accepts a few
5935 FFTW-specific flags, particularly:
5936
5937 * `--enable-float': Produces a single-precision version of FFTW
5938 (`float') instead of the default double-precision (`double').
5939 *Note Precision::.
5940
5941 * `--enable-long-double': Produces a long-double precision version of
5942 FFTW (`long double') instead of the default double-precision
5943 (`double'). The `configure' script will halt with an error
5944 message if `long double' is the same size as `double' on your
5945 machine/compiler. *Note Precision::.
5946
5947 * `--enable-quad-precision': Produces a quadruple-precision version
5948 of FFTW using the nonstandard `__float128' type provided by `gcc'
5949 4.6 or later on x86, x86-64, and Itanium architectures, instead of
5950 the default double-precision (`double'). The `configure' script
5951 will halt with an error message if the compiler is not `gcc'
5952 version 4.6 or later or if `gcc''s `libquadmath' library is not
5953 installed. *Note Precision::.
5954
5955 * `--enable-threads': Enables compilation and installation of the
5956 FFTW threads library (*note Multi-threaded FFTW::), which provides
5957 a simple interface to parallel transforms for SMP systems. By
5958 default, the threads routines are not compiled.
5959
5960 * `--enable-openmp': Like `--enable-threads', but using OpenMP
5961 compiler directives in order to induce parallelism rather than
5962 spawning its own threads directly, and installing an `fftw3_omp'
5963 library rather than an `fftw3_threads' library (*note
5964 Multi-threaded FFTW::). You can use both `--enable-openmp' and
5965 `--enable-threads' since they compile/install libraries with
5966 different names. By default, the OpenMP routines are not compiled.
5967
5968 * `--with-combined-threads': By default, if `--enable-threads' is
5969 used, the threads support is compiled into a separate library that
5970 must be linked in addition to the main FFTW library. This is so
5971 that users of the serial library do not need to link the system
5972 threads libraries. If `--with-combined-threads' is specified,
5973 however, then no separate threads library is created, and threads
5974 are included in the main FFTW library. This is mainly useful
5975 under Windows, where no system threads library is required and
5976 inter-library dependencies are problematic.
5977
5978 * `--enable-mpi': Enables compilation and installation of the FFTW
5979 MPI library (*note Distributed-memory FFTW with MPI::), which
5980 provides parallel transforms for distributed-memory systems with
5981 MPI. (By default, the MPI routines are not compiled.) *Note FFTW
5982 MPI Installation::.
5983
5984 * `--disable-fortran': Disables inclusion of legacy-Fortran wrapper
5985 routines (*note Calling FFTW from Legacy Fortran::) in the standard
5986 FFTW libraries. These wrapper routines increase the library size
5987 by only a negligible amount, so they are included by default as
5988 long as the `configure' script finds a Fortran compiler on your
5989 system. (To specify a particular Fortran compiler foo, pass
5990 `F77='foo to `configure'.)
5991
5992 * `--with-g77-wrappers': By default, when Fortran wrappers are
5993 included, the wrappers employ the linking conventions of the
5994 Fortran compiler detected by the `configure' script. If this
5995 compiler is GNU `g77', however, then _two_ versions of the
5996 wrappers are included: one with `g77''s idiosyncratic convention
5997 of appending two underscores to identifiers, and one with the more
5998 common convention of appending only a single underscore. This
5999 way, the same FFTW library will work with both `g77' and other
6000 Fortran compilers, such as GNU `gfortran'. However, the converse
6001 is not true: if you configure with a different compiler, then the
6002 `g77'-compatible wrappers are not included. By specifying
6003 `--with-g77-wrappers', the `g77'-compatible wrappers are included
6004 in addition to wrappers for whatever Fortran compiler `configure'
6005 finds.
6006
6007 * `--with-slow-timer': Disables the use of hardware cycle counters,
6008 and falls back on `gettimeofday' or `clock'. This makes planning
6009 much slower, and should generally not be used (unless you don't
6010 have a cycle counter but still really want an optimized plan
6011 regardless of the time). *Note Cycle Counters::.
6012
6013 * `--enable-sse', `--enable-sse2', `--enable-avx',
6014 `--enable-altivec', `--enable-neon': Enable the compilation of
6015 SIMD code for SSE (Pentium III+), SSE2 (Pentium IV+), AVX (Sandy
6016 Bridge, Interlagos), AltiVec (PowerPC G4+), and NEON (some ARM
6017 processors). SSE, AltiVec, and NEON only work with
6018 `--enable-float' (above). SSE2 works in both single and double
6019 precision (and is simply SSE in single precision). The resulting
6020 code will _still work_ on earlier CPUs lacking the SIMD extensions
6021 (SIMD is automatically disabled, although the FFTW library is
6022 still larger).
6023 - These options require a compiler supporting SIMD extensions,
6024 and compiler support is always a bit flaky: see the FFTW FAQ
6025 for a list of compiler versions that have problems compiling
6026 FFTW.
6027
6028 - With AltiVec and `gcc', you may have to use the
6029 `-mabi=altivec' option when compiling any code that links to
6030 FFTW, in order to properly align the stack; otherwise, FFTW
6031 could crash when it tries to use an AltiVec feature. (This
6032 is not necessary on MacOS X.)
6033
6034 - With SSE/SSE2 and `gcc', you should use a version of gcc that
6035 properly aligns the stack when compiling any code that links
6036 to FFTW. By default, `gcc' 2.95 and later versions align the
6037 stack as needed, but you should not compile FFTW with the
6038 `-Os' option or the `-mpreferred-stack-boundary' option with
6039 an argument less than 4.
6040
6041 - Because of the large variety of ARM processors and ABIs, FFTW
6042 does not attempt to guess the correct `gcc' flags for
6043 generating NEON code. In general, you will have to provide
6044 them on the command line. This command line is known to have
6045 worked at least once:
6046 ./configure --with-slow-timer --host=arm-linux-gnueabi \
6047 --enable-single --enable-neon \
6048 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
6049
6050
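The `--enable-long-double' entry in the list above says that `configure' stops when `long double' is the same size as `double'. The following is a minimal sketch of that kind of size comparison, useful for checking a compiler before configuring. Both function names are hypothetical and are not part of FFTW, and the sketch compares storage sizes only, which is the criterion the text describes; it does not reproduce the actual `configure' test.

```c
#include <stdio.h>

/* Hypothetical helper (not part of FFTW): returns 1 if `long double'
   occupies more storage than `double' with this compiler, else 0. */
int long_double_is_wider(void)
{
    return sizeof(long double) > sizeof(double);
}

/* Hypothetical helper: prints the sizes the comparison is based on. */
void report_precision_sizes(void)
{
    printf("float: %zu, double: %zu, long double: %zu bytes\n",
           sizeof(float), sizeof(double), sizeof(long double));
}
```

If `long_double_is_wider' returns 0 on your platform, a long-double build would offer nothing beyond the default double-precision library.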
6051 To force `configure' to use a particular C compiler foo (instead of
6052 the default, usually `gcc'), pass `CC='foo to the `configure' script;
6053 you may also need to set the flags via the variable `CFLAGS' as
6054 described above.
6055
6056 
6057 File: fftw3.info, Node: Installation on non-Unix systems, Next: Cycle Counters, Prev: Installation on Unix, Up: Installation and Customization
6058
6059 10.2 Installation on non-Unix systems
6060 =====================================
6061
6062 It should be relatively straightforward to compile FFTW even on non-Unix
6063 systems lacking the niceties of a `configure' script. Basically, you
6064 need to edit the `config.h' header (copy it from `config.h.in') to
6065 `#define' the various options and compiler characteristics, and then
6066 compile all the `.c' files in the relevant directories.
6067
6068 The `config.h' header contains about 100 options to set, each one
6069 initially an `#undef', each documented with a comment, and most of them
6070 fairly obvious. For most of the options, you should simply `#define'
6071 them to `1' if they are applicable, although a few options require a
6072 particular value (e.g. `SIZEOF_LONG_LONG' should be defined to the size
6073 of the `long long' type, in bytes, or zero if it is not supported). We
6074 will likely post some sample `config.h' files for various operating
6075 systems and compilers for you to use (at least as a starting point).
6076 Please let us know if you have to hand-create a configuration file
6077 (and/or a pre-compiled binary) that you want to share.
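As an illustration of the editing described above, a hand-written `config.h' might contain lines like the following. This is a sketch only: the three option names are the ones mentioned in this chapter, but the values shown (an 8-byte `long long', SSE2 enabled, slow timer left off) are assumptions about a hypothetical target, and the comments in your copy of `config.h.in' remain the authority on what each option means.

```c
/* config.h: hand-edited excerpt (illustrative values only) */

/* Size of `long long' in bytes, or 0 if the type is not supported.
   The value 8 is an assumption about the target compiler. */
#define SIZEOF_LONG_LONG 8

/* Applicable options are simply defined to 1.  Defining this one means
   the SIMD source directories must be compiled as well. */
#define HAVE_SSE2 1

/* Options that do not apply stay undefined, as in config.h.in. */
/* #undef WITH_SLOW_TIMER */
```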
6078
6079 To create the FFTW library, you will then need to compile all of the
6080 `.c' files in the `kernel', `dft', `dft/scalar', `dft/scalar/codelets',
6081 `rdft', `rdft/scalar', `rdft/scalar/r2cf', `rdft/scalar/r2cb',
6082 `rdft/scalar/r2r', `reodft', and `api' directories. If you are
6083 compiling with SIMD support (e.g. you defined `HAVE_SSE2' in
6084 `config.h'), then you also need to compile the `.c' files in the
6085 `simd-support', `{dft,rdft}/simd', `{dft,rdft}/simd/*' directories.
6086
6087 Once these files are all compiled, link them into a library, or a
6088 shared library, or directly into your program.
6089
6090 To compile the FFTW test program, additionally compile the code in
6091 the `libbench2/' directory, and link it into a library. Then compile
6092 the code in the `tests/' directory and link it to the `libbench2' and
6093 FFTW libraries. To compile the `fftw-wisdom' (command-line) tool
6094 (*note Wisdom Utilities::), compile `tools/fftw-wisdom.c' and link it
6095 to the `libbench2' and FFTW libraries.
6096
6097 
6098 File: fftw3.info, Node: Cycle Counters, Next: Generating your own code, Prev: Installation on non-Unix systems, Up: Installation and Customization
6099
6100 10.3 Cycle Counters
6101 ===================
6102
6103 FFTW's planner actually executes and times different possible FFT
6104 algorithms in order to pick the fastest plan for a given n. In order
6105 to do this in as short a time as possible, however, the timer must have
6106 a very high resolution, and to accomplish this we employ the hardware
6107 "cycle counters" that are available on most CPUs. Currently, FFTW
6108 supports the cycle counters on x86, PowerPC/POWER, Alpha, UltraSPARC
6109 (SPARC v9), IA64, PA-RISC, and MIPS processors.
6110
6111 Access to the cycle counters, unfortunately, is a compiler and/or
6112 operating-system dependent task, often requiring inline assembly
6113 language, and it may be that your compiler is not supported. If you are
6114 _not_ supported, FFTW will by default fall back on its estimator
6115 (effectively using `FFTW_ESTIMATE' for all plans).
6116
6117 You can add support by editing the file `kernel/cycle.h'; normally,
6118 this will involve adapting one of the examples already present in order
6119 to use the inline-assembler syntax for your C compiler, and will only
6120 require a couple of lines of code. Anyone adding support for a new
6121 system to `cycle.h' is encouraged to email us at <fftw@fftw.org>.
6122
6123 If a cycle counter is not available on your system (e.g. some
6124 embedded processor), and you don't want to use estimated plans, as a
6125 last resort you can use the `--with-slow-timer' option to `configure'
6126 (on Unix) or `#define WITH_SLOW_TIMER' in `config.h' (elsewhere). This
6127 will use the much lower-resolution `gettimeofday' function, or even
6128 `clock' if the former is unavailable, and planning will be extremely
6129 slow.
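To see why a low-resolution timer makes planning slow, consider timing a short routine with `gettimeofday' alone. A single call may finish well below the timer's resolution, so the measurement has to be repeated until the total elapsed time is large enough to trust. The sketch below shows this generic technique; `time_per_call' and `toy_workload' are hypothetical names, and this is not FFTW's own timing code, which may differ in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds from gettimeofday (microsecond units at best). */
static double seconds_now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Hypothetical helper (not FFTW's code): estimates the time of one call to
   fn(arg) by doubling the repetition count until the whole batch takes at
   least min_seconds, so that the timer's coarseness is averaged out. */
double time_per_call(void (*fn)(void *), void *arg, double min_seconds)
{
    unsigned long reps = 1;
    assert(fn != NULL && min_seconds > 0.0);
    for (;;) {
        double t0 = seconds_now();
        for (unsigned long i = 0; i < reps; ++i)
            fn(arg);
        double elapsed = seconds_now() - t0;
        if (elapsed >= min_seconds)
            return elapsed / (double)reps;
        reps *= 2;
    }
}

/* Toy workload standing in for one candidate transform. */
void toy_workload(void *arg)
{
    volatile double *acc = arg;
    for (int i = 0; i < 1000; ++i)
        *acc += 1.0;
}
```

Every candidate timed this way costs at least `min_seconds', whereas a cycle counter can resolve a single short run. That difference, multiplied over the many candidates a planner tries, is what the warning above refers to.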
6130
6131 
6132 File: fftw3.info, Node: Generating your own code, Prev: Cycle Counters, Up: Installation and Customization
6133
6134 10.4 Generating your own code
6135 =============================
6136
6137 The directory `genfft' contains the programs that were used to generate
6138 FFTW's "codelets," which are hard-coded transforms of small sizes. We
6139 do not expect casual users to employ the generator, which is a rather
6140 sophisticated program that generates directed acyclic graphs of FFT
6141 algorithms and performs algebraic simplifications on them. It was
6142 written in Objective Caml, a dialect of ML, which is available at
6143 `http://caml.inria.fr/ocaml/index.en.html'.
6144
6145 If you have Objective Caml installed (along with recent versions of
6146 GNU `autoconf', `automake', and `libtool'), then you can change the set
6147 of codelets that are generated or play with the generation options.
6148 The set of generated codelets is specified by the
6149 `{dft,rdft}/{scalar,simd}/*/Makefile.am' files. For example, you can
6150 add efficient REDFT codelets of small sizes by modifying
6151 `rdft/scalar/r2r/Makefile.am'. After you modify any `Makefile.am'
6152 files, you can type `sh bootstrap.sh' in the top-level directory
6153 followed by `make' to re-generate the files.
6154
6155 We do not provide more details about the code-generation process,
6156 since we do not expect that most users will need to generate their own
6157 code. However, feel free to contact us at <fftw@fftw.org> if you are
6158 interested in the subject.
6159
6160 You might find it interesting to learn Caml and/or some modern
6161 programming techniques that we used in the generator (including monadic
6162 programming), especially if you heard the rumor that Java and
6163 object-oriented programming are the latest advancement in the field.
6164 The internal operation of the codelet generator is described in the
6165 paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
6166 available from the FFTW home page (http://www.fftw.org) and also
6167 appeared in the `Proceedings of the 1999 ACM SIGPLAN Conference on
6168 Programming Language Design and Implementation (PLDI)'.
6169
6170 
6171 File: fftw3.info, Node: Acknowledgments, Next: License and Copyright, Prev: Installation and Customization, Up: Top
6172
6173 11 Acknowledgments
6174 ******************
6175
6176 Matteo Frigo was supported in part by the Special Research Program SFB
6177 F011 "AURORA" of the Austrian Science Fund FWF and by MIT Lincoln
6178 Laboratory. For previous versions of FFTW, he was supported in part by
6179 the Defense Advanced Research Projects Agency (DARPA), under Grants
6180 N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment
6181 Corporation Fellowship.
6182
6183 Steven G. Johnson was supported in part by a Dept. of Defense NDSEG
6184 Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials
6185 Research Science and Engineering Center program of the National Science
6186 Foundation under award DMR-9400334.
6187
6188 Code for the Cell Broadband Engine was graciously donated to the FFTW
6189 project by the IBM Austin Research Lab and included in fftw-3.2. (This
6190 code was removed in fftw-3.3.)
6191
6192 Code for the MIPS paired-single SIMD support was graciously donated
6193 to the FFTW project by CodeSourcery, Inc.
6194
6195 We are grateful to Sun Microsystems Inc. for its donation of a
6196 cluster of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These
6197 machines served as the primary platform for the development of early
6198 versions of FFTW.
6199
6200 We thank Intel Corporation for donating a four-processor Pentium Pro
6201 machine. We thank the GNU/Linux community for giving us a decent OS to
6202 run on that machine.
6203
6204 We are thankful to the AMD corporation for donating an AMD Athlon XP
6205 1700+ computer to the FFTW project.
6206
6207 We thank the Compaq/HP testdrive program and VA Software Corporation
6208 (SourceForge.net) for providing remote access to machines that were used
6209 to test FFTW.
6210
6211 The `genfft' suite of code generators was written using Objective
6212 Caml, a dialect of ML. Objective Caml is a small and elegant language
6213 developed by Xavier Leroy. The implementation is available from
6214 `http://caml.inria.fr/' (http://caml.inria.fr/). In previous releases
6215 of FFTW, `genfft' was written in Caml Light, by the same authors. An
6216 even earlier implementation of `genfft' was written in Scheme, but Caml
6217 is definitely better for this kind of application.
6218
6219 FFTW uses many tools from the GNU project, including `automake',
6220 `texinfo', and `libtool'.
6221
6222 Prof. Charles E. Leiserson of MIT provided continuous support and
6223 encouragement. This program would not exist without him. Charles also
6224 proposed the name "codelets" for the basic FFT blocks.
6225
6226 Prof. John D. Joannopoulos of MIT demonstrated continuing tolerance
6227 of Steven's "extra-curricular" computer-science activities, as well as
6228 remarkable creativity in working them into his grant proposals.
6229 Steven's physics degree would not exist without him.
6230
6231 Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually
6232 led to the SIMD support in FFTW 3.
6233
6234 Stefan Kral wrote most of the K7 code generator distributed with FFTW
6235 3.0.x and 3.1.x.
6236
6237 Andrew Sterian contributed the Windows timing code in FFTW 2.
6238
6239 Didier Miras reported a bug in the test procedure used in FFTW 1.2.
6240 We now use a completely different test algorithm by Funda Ergun that
6241 does not require a separate FFT program to compare against.
6242
6243 Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
6244 that help portability.
6245
6246 Ming-Chang Liu uncovered a well-hidden bug in the complex transforms
6247 of FFTW 2.0 and supplied a patch to correct it.
6248
6249 The FFTW FAQ was written in `bfnn' (Bizarre Format With No Name) and
6250 formatted using the tools developed by Ian Jackson for the Linux FAQ.
6251
6252 _We are especially thankful to all of our users for their continuing
6253 support, feedback, and interest during our development of FFTW._
6254
6255 
6256 File: fftw3.info, Node: License and Copyright, Next: Concept Index, Prev: Acknowledgments, Up: Top
6257
6258 12 License and Copyright
6259 ************************
6260
6261 FFTW is Copyright (C) 2003, 2007-11 Matteo Frigo, Copyright (C) 2003,
6262 2007-11 Massachusetts Institute of Technology.
6263
6264 FFTW is free software; you can redistribute it and/or modify it
6265 under the terms of the GNU General Public License as published by the
6266 Free Software Foundation; either version 2 of the License, or (at your
6267 option) any later version.
6268
6269 This program is distributed in the hope that it will be useful, but
6270 WITHOUT ANY WARRANTY; without even the implied warranty of
6271 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
6272 General Public License for more details.
6273
6274 You should have received a copy of the GNU General Public License
6275 along with this program; if not, write to the Free Software Foundation,
6276 Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. You
6277 can also find the GPL on the GNU web site
6278 (http://www.gnu.org/licenses/gpl-2.0.html).
6279
6280 In addition, we kindly ask you to acknowledge FFTW and its authors in
6281 any program or publication in which you use FFTW. (You are not
6282 _required_ to do so; it is up to your common sense to decide whether
6283 you want to comply with this request or not.) For general
6284 publications, we suggest referencing: Matteo Frigo and Steven G.
6285 Johnson, "The design and implementation of FFTW3," Proc. IEEE 93 (2),
6286 216-231 (2005).
6287
6288 Non-free versions of FFTW are available under terms different from
6289 those of the General Public License. (For example, they do not require you to
6290 accompany any object code using FFTW with the corresponding source
6291 code.) For these alternative terms you must purchase a license from
6292 MIT's Technology Licensing Office. Users interested in such a license
6293 should contact us (<fftw@fftw.org>) for more information.
6294