annotate fft/fftw/fftw-3.3.4/doc/fftw3.info-1 @ 40:223f770b5341 kissfft-double tip

Try a double-precision kissfft
author Chris Cannam
date Wed, 07 Sep 2016 10:40:32 +0100
parents 26056e866c29
children
This is fftw3.info, produced by makeinfo version 4.13 from fftw3.texi.

This manual is for FFTW (version 3.3.4, 20 September 2013).

Copyright (C) 2003 Matteo Frigo.

Copyright (C) 2003 Massachusetts Institute of Technology.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission
notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided
that the entire resulting derived work is distributed under the
terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for
modified versions, except that this permission notice may be
stated in a translation approved by the Free Software Foundation.

INFO-DIR-SECTION Development
START-INFO-DIR-ENTRY
* fftw3: (fftw3). FFTW User's Manual.
END-INFO-DIR-ENTRY


File: fftw3.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir)

FFTW User Manual
****************

Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a
collection of fast C routines to compute the discrete Fourier transform.
This manual documents FFTW version 3.3.4.

* Menu:

* Introduction::
* Tutorial::
* Other Important Topics::
* FFTW Reference::
* Multi-threaded FFTW::
* Distributed-memory FFTW with MPI::
* Calling FFTW from Modern Fortran::
* Calling FFTW from Legacy Fortran::
* Upgrading from FFTW version 2::
* Installation and Customization::
* Acknowledgments::
* License and Copyright::
* Concept Index::
* Library Index::


File: fftw3.info, Node: Introduction, Next: Tutorial, Prev: Top, Up: Top

1 Introduction
**************

This manual documents version 3.3.4 of FFTW, the _Fastest Fourier
Transform in the West_. FFTW is a comprehensive collection of fast C
routines for computing the discrete Fourier transform (DFT) and various
special cases thereof.
   * FFTW computes the DFT of complex data, real data, even- or
     odd-symmetric real data (these symmetric transforms are usually
     known as the discrete cosine or sine transform, respectively), and
     the discrete Hartley transform (DHT) of real data.

   * The input data can have arbitrary length. FFTW employs O(n log n)
     algorithms for all lengths, including prime numbers.

   * FFTW supports arbitrary multi-dimensional data.

   * FFTW supports the SSE, SSE2, AVX, Altivec, and MIPS PS instruction
     sets.

   * FFTW includes parallel (multi-threaded) transforms for
     shared-memory systems.

   * Starting with version 3.3, FFTW includes distributed-memory
     parallel transforms using MPI.

We assume herein that you are familiar with the properties and uses
of the DFT that are relevant to your application. Otherwise, see e.g.
`The Fast Fourier Transform and Its Applications' by E. O. Brigham
(Prentice-Hall, Englewood Cliffs, NJ, 1988). Our web page
(http://www.fftw.org) also has links to FFT-related information online.

In order to use FFTW effectively, you need to learn one basic concept
of FFTW's internal structure: FFTW does not use a fixed algorithm for
computing the transform, but instead it adapts the DFT algorithm to
details of the underlying hardware in order to maximize performance.
Hence, the computation of the transform is split into two phases.
First, FFTW's "planner" "learns" the fastest way to compute the
transform on your machine. The planner produces a data structure
called a "plan" that contains this information. Subsequently, the plan
is "executed" to transform the array of input data as dictated by the
plan. The plan can be reused as many times as needed. In typical
high-performance applications, many transforms of the same size are
computed and, consequently, a relatively expensive initialization of
this sort is acceptable. On the other hand, if you need a single
transform of a given size, the one-time cost of the planner becomes
significant. For this case, FFTW provides fast planners based on
heuristics or on previously computed plans.

FFTW supports transforms of data with arbitrary length, rank,
multiplicity, and a general memory layout. In simple cases, however,
this generality may be unnecessary and confusing. Consequently, we
organized the interface to FFTW into three levels of increasing
generality.
   * The "basic interface" computes a single transform of
     contiguous data.

   * The "advanced interface" computes transforms of multiple or
     strided arrays.

   * The "guru interface" supports the most general data layouts,
     multiplicities, and strides.
We expect that most users will be best served by the basic interface,
whereas the guru interface requires careful attention to the
documentation to avoid problems.

Besides the automatic performance adaptation performed by the
planner, it is also possible for advanced users to customize FFTW
manually. For example, if code space is a concern, we provide a tool
that links only the subset of FFTW needed by your application.
Conversely, you may need to extend FFTW because the standard
distribution is not sufficient for your needs. For example, the
standard FFTW distribution works most efficiently for arrays whose size
can be factored into small primes (2, 3, 5, and 7), and otherwise it
uses a slower general-purpose routine. If you need efficient
transforms of other sizes, you can use FFTW's code generator, which
produces fast C programs ("codelets") for any particular array size you
may care about. For example, if you need transforms of size
513 = 19 x 3^3, you can customize FFTW to support the factor 19 efficiently.
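
The size criterion above is easy to check in code. The sketch below uses a hypothetical helper, `small_prime_cofactor' (not an FFTW routine), that divides out the factors 2, 3, 5 and 7 and returns what is left; a result of 1 means the size factors entirely into those small primes. Treat it only as a rough guide, since the set of sizes handled efficiently depends on the codelets compiled into your FFTW build.

```c
/* Hypothetical helper, not an FFTW routine: divide the small primes
 * 2, 3, 5 and 7 out of a positive transform size n and return the
 * remaining cofactor. A result of 1 means n = 2^a * 3^b * 5^c * 7^d. */
long small_prime_cofactor(long n)
{
    static const long primes[] = {2, 3, 5, 7};
    for (int i = 0; i < 4; ++i)
        while (n > 1 && n % primes[i] == 0)
            n /= primes[i];
    return n;
}
```

For the example in the text, `small_prime_cofactor(513)' returns 19, the factor for which a custom codelet would help.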

For more information regarding FFTW, see the paper, "The Design and
Implementation of FFTW3," by M. Frigo and S. G. Johnson, which was an
invited paper in `Proc. IEEE' 93 (2), p. 216 (2005). The code
generator is described in the paper "A fast Fourier transform compiler",
by M. Frigo, in the `Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI), Atlanta,
Georgia, May 1999'. These papers, along with the latest version of
FFTW, the FAQ, benchmarks, and other links, are available at the FFTW
home page (http://www.fftw.org).

The current version of FFTW incorporates many good ideas from the
past thirty years of FFT literature. In one way or another, FFTW uses
the Cooley-Tukey algorithm, the prime factor algorithm, Rader's
algorithm for prime sizes, and a split-radix algorithm (with a
"conjugate-pair" variation pointed out to us by Dan Bernstein). FFTW's
code generator also produces new algorithms that we do not completely
understand. The reader is referred to the cited papers for the
appropriate references.

The rest of this manual is organized as follows. We first discuss
the sequential (single-processor) implementation. We start by
describing the basic interface/features of FFTW in *note Tutorial::.
Next, *note Other Important Topics:: discusses data alignment (*note
SIMD alignment and fftw_malloc::), the storage scheme of
multi-dimensional arrays (*note Multi-dimensional Array Format::), and
FFTW's mechanism for storing plans on disk (*note Words of
Wisdom-Saving Plans::). Next, *note FFTW Reference:: provides
comprehensive documentation of all FFTW's features. Parallel
transforms are discussed in their own chapters: *note Multi-threaded
FFTW:: and *note Distributed-memory FFTW with MPI::. Fortran
programmers can also use FFTW, as described in *note Calling FFTW from
Legacy Fortran:: and *note Calling FFTW from Modern Fortran::. *note
Installation and Customization:: explains how to install FFTW on your
computer system and how to adapt FFTW to your needs. License and
copyright information is given in *note License and Copyright::.
Finally, we thank all the people who helped us in *note
Acknowledgments::.


File: fftw3.info, Node: Tutorial, Next: Other Important Topics, Prev: Introduction, Up: Top

2 Tutorial
**********

* Menu:

* Complex One-Dimensional DFTs::
* Complex Multi-Dimensional DFTs::
* One-Dimensional DFTs of Real Data::
* Multi-Dimensional DFTs of Real Data::
* More DFTs of Real Data::

This chapter describes the basic usage of FFTW, i.e., how to compute the
Fourier transform of a single array. This chapter tells the truth, but
not the _whole_ truth. Specifically, FFTW implements additional
routines and flags that are not documented here, although in many cases
we try to indicate where added capabilities exist. For more complete
information, see *note FFTW Reference::. (Note that you need to
compile and install FFTW before you can use it in a program. For the
details of the installation, see *note Installation and
Customization::.)

We recommend that you read this tutorial in order.(1) At the least,
read the first section (*note Complex One-Dimensional DFTs::) before
reading any of the others, even if your main interest lies in one of
the other transform types.

Users of FFTW version 2 and earlier may also want to read *note
Upgrading from FFTW version 2::.

---------- Footnotes ----------

(1) You can read the tutorial in bit-reversed order after computing
your first transform.


File: fftw3.info, Node: Complex One-Dimensional DFTs, Next: Complex Multi-Dimensional DFTs, Prev: Tutorial, Up: Tutorial

2.1 Complex One-Dimensional DFTs
================================

     Plan: To bother about the best method of accomplishing an
     accidental result. [Ambrose Bierce, `The Enlarged Devil's
     Dictionary'.]

The basic usage of FFTW to compute a one-dimensional DFT of size `N'
is simple, and it typically looks something like this code:

     #include <fftw3.h>
     ...
     {
         fftw_complex *in, *out;
         fftw_plan p;
         ...
         in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
         out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
         p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
         ...
         fftw_execute(p); /* repeat as needed */
         ...
         fftw_destroy_plan(p);
         fftw_free(in); fftw_free(out);
     }

You must link this code with the `fftw3' library. On Unix systems,
link with `-lfftw3 -lm'.

The example code first allocates the input and output arrays. You
can allocate them in any way that you like, but we recommend using
`fftw_malloc', which behaves like `malloc' except that it properly
aligns the array when SIMD instructions (such as SSE and Altivec) are
available (*note SIMD alignment and fftw_malloc::). [Alternatively, we
provide a convenient wrapper function `fftw_alloc_complex(N)' which has
the same effect.]

The data is an array of type `fftw_complex', which is by default a
`double[2]' composed of the real (`in[i][0]') and imaginary
(`in[i][1]') parts of a complex number.
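
As an illustration of this layout, the following sketch computes the squared magnitude of each element. It declares a local typedef `cplx' as a stand-in for the default `fftw_complex' (an assumption made only so that the fragment compiles without `<fftw3.h>'); the helper name `power_spectrum' is likewise hypothetical.

```c
#include <stddef.h>

/* Local stand-in for FFTW's default fftw_complex (an assumption so the
 * fragment compiles without <fftw3.h>): [0] = real, [1] = imaginary. */
typedef double cplx[2];

/* Hypothetical helper: power[i] = re^2 + im^2 for i = 0..n-1. */
void power_spectrum(cplx *x, double *power, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        power[i] = x[i][0] * x[i][0] + x[i][1] * x[i][1];
}
```

With FFTW you would pass the `out' array of a transform directly, provided `<complex.h>' was not included before `<fftw3.h>'; in that case `fftw_complex' is a native complex type and the `[0]'/`[1]' indexing does not apply.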

The next step is to create a "plan", which is an object that
contains all the data that FFTW needs to compute the FFT. This
function creates the plan:

     fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

The first argument, `n', is the size of the transform you are trying
to compute. The size `n' can be any positive integer, but sizes that
are products of small factors are transformed most efficiently
(although prime sizes still use an O(n log n) algorithm).

The next two arguments are pointers to the input and output arrays of
the transform. These pointers can be equal, indicating an "in-place"
transform.

The fourth argument, `sign', can be either `FFTW_FORWARD' (`-1') or
`FFTW_BACKWARD' (`+1'), and indicates the direction of the transform
you are interested in; technically, it is the sign of the exponent in
the transform.

The `flags' argument is usually either `FFTW_MEASURE' or `FFTW_ESTIMATE'.
`FFTW_MEASURE' instructs FFTW to run and measure the execution time of
several FFTs in order to find the best way to compute the transform of
size `n'. This process takes some time (usually a few seconds),
depending on your machine and on the size of the transform.
`FFTW_ESTIMATE', by contrast, does not run any computation and just
builds a reasonable plan that is probably sub-optimal. In short, if
your program performs many transforms of the same size and
initialization time is not important, use `FFTW_MEASURE'; otherwise use
`FFTW_ESTIMATE'.

_You must create the plan before initializing the input_, because
`FFTW_MEASURE' overwrites the `in'/`out' arrays. (Technically,
`FFTW_ESTIMATE' does not touch your arrays, but you should always
create plans first just to be sure.)

Once the plan has been created, you can use it as many times as you
like for transforms on the specified `in'/`out' arrays, computing the
actual transforms via `fftw_execute(plan)':
     void fftw_execute(const fftw_plan plan);

The DFT results are stored in-order in the array `out', with the
zero-frequency (DC) component in `out[0]'. If `in != out', the
transform is "out-of-place" and the input array `in' is not modified.
Otherwise, the input array is overwritten with the transform.

If you want to transform a _different_ array of the same size, you
can create a new plan with `fftw_plan_dft_1d' and FFTW automatically
reuses the information from the previous plan, if possible.
Alternatively, with the "guru" interface you can apply a given plan to
a different array, if you are careful. *Note FFTW Reference::.

When you are done with the plan, you deallocate it by calling
`fftw_destroy_plan(plan)':
     void fftw_destroy_plan(fftw_plan plan);
If you allocate an array with `fftw_malloc()' you must deallocate it
with `fftw_free()'. Do not use `free()' or, heaven forbid, `delete'.

FFTW computes an _unnormalized_ DFT. Thus, computing a forward
followed by a backward transform (or vice versa) results in the original
array scaled by `n'. For the definition of the DFT, see *note What
FFTW Really Computes::.
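
If you need a normalized round trip, you can divide by `n' yourself after the backward transform. A minimal sketch, assuming the default `double[2]' layout of `fftw_complex' (declared locally as `cplx' so the fragment stands alone; `normalize_by_n' is a hypothetical name):

```c
#include <stddef.h>

typedef double cplx[2]; /* local stand-in for the default fftw_complex */

/* Hypothetical helper: divide every element by n, undoing the factor n
 * left by a forward transform followed by a backward one (or vice versa). */
void normalize_by_n(cplx *data, size_t n)
{
    const double scale = 1.0 / (double) n;
    for (size_t i = 0; i < n; ++i) {
        data[i][0] *= scale;
        data[i][1] *= scale;
    }
}
```

For example, after executing a backward plan on the output of a forward one, `normalize_by_n(out, n)' would recover the original input up to rounding. When a later step of your computation already scales the data, folding the factor 1/n into that step saves an extra pass.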

If you have a C compiler, such as `gcc', that supports the C99
standard, and you `#include <complex.h>' _before_ `<fftw3.h>', then
`fftw_complex' is the native double-precision complex type and you can
manipulate it with ordinary arithmetic. Otherwise, FFTW defines its
own complex type, which is bit-compatible with the C99 complex type.
*Note Complex numbers::. (The C++ `<complex>' template class may also
be usable via a typecast.)
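
The bit-compatibility rests on C99 requiring `double complex' to have the same representation and alignment as an array of two `double's, real part first. The sketch below relies on that to view a `double[2]' buffer (the local stand-in `cplx') as `double complex' and apply native arithmetic to it. The helper name `scale_complex' is hypothetical, and since this is a form of type punning you may want to confirm that your compiler and optimization settings treat it as intended.

```c
#include <complex.h>
#include <stddef.h>

typedef double cplx[2]; /* local stand-in for the non-C99 fftw_complex */

/* Hypothetical helper: multiply each of the n elements by w using native
 * C99 complex arithmetic. The cast relies on double complex having the
 * same representation as double[2] (real part first). */
void scale_complex(cplx *data, size_t n, double complex w)
{
    double complex *z = (double complex *) data;
    for (size_t i = 0; i < n; ++i)
        z[i] *= w;
}
```

For instance, `scale_complex(out, n, I)' would multiply every element of such an output array by i.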

To use single or long-double precision versions of FFTW, replace the
`fftw_' prefix by `fftwf_' or `fftwl_' and link with `-lfftw3f' or
`-lfftw3l', but use the _same_ `<fftw3.h>' header file.

Many more flags exist besides `FFTW_MEASURE' and `FFTW_ESTIMATE'.
For example, use `FFTW_PATIENT' if you're willing to wait even longer
for a possibly even faster plan (*note FFTW Reference::). You can also
save plans for future use, as described by *note Words of Wisdom-Saving
Plans::.


File: fftw3.info, Node: Complex Multi-Dimensional DFTs, Next: One-Dimensional DFTs of Real Data, Prev: Complex One-Dimensional DFTs, Up: Tutorial

2.2 Complex Multi-Dimensional DFTs
==================================

Multi-dimensional transforms work much the same way as one-dimensional
transforms: you allocate arrays of `fftw_complex' (preferably using
`fftw_malloc'), create an `fftw_plan', execute it as many times as you
want with `fftw_execute(plan)', and clean up with
`fftw_destroy_plan(plan)' (and `fftw_free').

FFTW provides two routines for creating plans for 2d and 3d
transforms, and one routine for creating plans of arbitrary
dimensionality. The 2d and 3d routines have the following signatures:
     fftw_plan fftw_plan_dft_2d(int n0, int n1,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);
     fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
                                fftw_complex *in, fftw_complex *out,
                                int sign, unsigned flags);

These routines create plans for `n0' by `n1' two-dimensional (2d)
transforms and `n0' by `n1' by `n2' 3d transforms, respectively. All
of these transforms operate on contiguous arrays in the C-standard
"row-major" order, so that the last dimension has the fastest-varying
index in the array. This layout is described further in *note
Multi-dimensional Array Format::.
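
Concretely, row-major order means that element (i, j) of an `n0' by `n1' array is stored at offset i*n1 + j, and element (i, j, k) of an `n0' by `n1' by `n2' array at (i*n1 + j)*n2 + k. A small sketch with hypothetical helper names:

```c
#include <stddef.h>

/* Offset of element (i, j) in a row-major n0 x n1 array
 * (n0 itself is not needed to compute it). */
size_t rm_index2(size_t i, size_t j, size_t n1)
{
    return i * n1 + j;
}

/* Offset of element (i, j, k) in a row-major n0 x n1 x n2 array;
 * the last index k varies fastest. */
size_t rm_index3(size_t i, size_t j, size_t k, size_t n1, size_t n2)
{
    return (i * n1 + j) * n2 + k;
}
```

So for a plan from `fftw_plan_dft_2d(n0, n1, ...)', element (i, j) of the input is `in[rm_index2(i, j, n1)]'.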

FFTW can also compute transforms of higher dimensionality. In order
to avoid confusion between the various meanings of the word
"dimension", we use the term _rank_ to denote the number of independent
indices in an array.(1) For example, we say that a 2d transform has
rank 2, a 3d transform has rank 3, and so on. You can plan transforms
of arbitrary rank by means of the following function:

     fftw_plan fftw_plan_dft(int rank, const int *n,
                             fftw_complex *in, fftw_complex *out,
                             int sign, unsigned flags);

Here, `n' is a pointer to an array `n[rank]' denoting an `n[0]' by
`n[1]' by ... by `n[rank-1]' transform. Thus, for example, the call
     fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
is equivalent to the following code fragment:
     int n[2];
     n[0] = n0;
     n[1] = n1;
     fftw_plan_dft(2, n, in, out, sign, flags);
`fftw_plan_dft' is not restricted to 2d and 3d transforms, however;
it can plan transforms of arbitrary rank.
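
For an arbitrary rank, the `n[rank]' array determines both the total number of elements (the product of the `n[i]') and, by Horner's rule, the row-major offset of an index tuple. The sketch below illustrates this with hypothetical helpers that are not part of FFTW's API:

```c
#include <stddef.h>

/* Total number of elements of an n[0] x ... x n[rank-1] array. */
size_t rank_total(int rank, const int *n)
{
    size_t total = 1;
    for (int d = 0; d < rank; ++d)
        total *= (size_t) n[d];
    return total;
}

/* Row-major offset of the index tuple idx[0..rank-1], evaluated by
 * Horner's rule: ((idx[0]*n[1] + idx[1])*n[2] + idx[2])*... */
size_t rank_offset(int rank, const int *n, const int *idx)
{
    size_t off = 0;
    for (int d = 0; d < rank; ++d)
        off = off * (size_t) n[d] + (size_t) idx[d];
    return off;
}
```

With `rank' = 3 and indices (i, j, k), the offset reduces to (i*n[1] + j)*n[2] + k.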

You may have noticed that all the planner routines described so far
have overlapping functionality. For example, you can plan a 1d or 2d
transform by using `fftw_plan_dft' with a `rank' of `1' or `2', or even
by calling `fftw_plan_dft_3d' with `n0' and/or `n1' equal to `1' (with
no loss in efficiency). This pattern continues, and FFTW's planning
routines in general form a "partial order": sequences of interfaces
with strictly increasing generality but correspondingly greater
complexity.

`fftw_plan_dft' is the most general complex-DFT routine that we
describe in this tutorial, but there are also the advanced and guru
interfaces, which allow one to efficiently combine multiple/strided
transforms into a single FFTW plan, transform a subset of a larger
multi-dimensional array, and/or handle more general complex-number
formats. For more information, see *note FFTW Reference::.

---------- Footnotes ----------

(1) The term "rank" is commonly used in the APL, FORTRAN, and Common
Lisp traditions, although it is not so common in the C world.


File: fftw3.info, Node: One-Dimensional DFTs of Real Data, Next: Multi-Dimensional DFTs of Real Data, Prev: Complex Multi-Dimensional DFTs, Up: Tutorial

2.3 One-Dimensional DFTs of Real Data
=====================================

In many practical applications, the input data `in[i]' are purely real
numbers, in which case the DFT output satisfies the "Hermitian" redundancy:
`out[i]' is the conjugate of `out[n-i]'. It is possible to take
advantage of these circumstances in order to achieve roughly a factor
of two improvement in both speed and memory usage.

In exchange for these speed and space advantages, the user sacrifices
some of the simplicity of FFTW's complex transforms. First of all, the
input and output arrays are of _different sizes and types_: the input
is `n' real numbers, while the output is `n/2+1' complex numbers (the
non-redundant outputs); this also requires slight "padding" of the
input array for in-place transforms. Second, the inverse transform
(complex to real) has the side-effect of _overwriting its input array_,
by default. Neither of these inconveniences should pose a serious
problem for users, but it is important to be aware of them.

The routines to perform real-data transforms are almost the same as
those for complex transforms: you allocate arrays of `double' and/or
`fftw_complex' (preferably using `fftw_malloc' or
`fftw_alloc_complex'), create an `fftw_plan', execute it as many times
as you want with `fftw_execute(plan)', and clean up with
`fftw_destroy_plan(plan)' (and `fftw_free'). The only differences are
that the input (or output) is of type `double' and there are new
routines to create the plan. In one dimension:

     fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
                                    unsigned flags);

for the real input to complex-Hermitian output ("r2c") and
complex-Hermitian input to real output ("c2r") transforms. Unlike the
complex DFT planner, there is no `sign' argument. Instead, r2c DFTs
are always `FFTW_FORWARD' and c2r DFTs are always `FFTW_BACKWARD'. (For
single/long-double precision `fftwf' and `fftwl', `double' should be
replaced by `float' and `long double', respectively.)

Here, `n' is the "logical" size of the DFT, not necessarily the
physical size of the array. In particular, the real (`double') array
has `n' elements, while the complex (`fftw_complex') array has `n/2+1'
elements (where the division is rounded down). For an in-place
transform, `in' and `out' are aliased to the same array, which must be
big enough to hold both; so, the real array would actually have
`2*(n/2+1)' elements, where the elements beyond the first `n' are
unused padding. (Note that this is very different from the concept of
"zero-padding" a transform to a larger length, which changes the
logical size of the DFT by actually adding new input data.) The kth
element of the complex array is exactly the same as the kth element of
the corresponding complex DFT. All positive `n' are supported;
products of small factors are most efficient, but an O(n log n)
algorithm is used even for prime sizes.
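
Because of the Hermitian redundancy, the `n/2+1' complex outputs of an r2c transform determine the full length-`n' complex DFT: for n/2 < k < n, `out[k]' is the conjugate of `out[n-k]'. The sketch below rebuilds the full spectrum for code that expects all `n' bins. It is a hypothetical helper using C99 `double complex'; FFTW neither needs nor provides it.

```c
#include <complex.h>

/* Hypothetical helper: rebuild the full n-point DFT of real data from
 * its n/2+1 non-redundant outputs half[0..n/2] (e.g. the output of an
 * r2c transform), using out[n-k] = conj(out[k]). */
void expand_hermitian(int n, const double complex *half, double complex *full)
{
    for (int k = 0; k <= n / 2; ++k)
        full[k] = half[k];
    for (int k = n / 2 + 1; k < n; ++k)
        full[k] = conj(half[n - k]);
}
```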

As noted above, the c2r transform destroys its input array even for
out-of-place transforms. This can be prevented, if necessary, by
including `FFTW_PRESERVE_INPUT' in the `flags', at the cost of some
performance. This flag is also not currently supported for
multi-dimensional real DFTs (next section).

Readers familiar with DFTs of real data will recall that the 0th (the
"DC") and `n/2'-th (the "Nyquist" frequency, when `n' is even) elements
of the complex output are purely real. Some implementations therefore
store the Nyquist element where the DC imaginary part would go, in
order to make the input and output arrays the same size. Such packing,
however, does not generalize well to multi-dimensional transforms, and
the space savings are minuscule in any case; FFTW does not support it.

An alternative interface for one-dimensional r2c and c2r DFTs can be
found in the `r2r' interface (*note The Halfcomplex-format DFT::), with
"halfcomplex"-format output that _is_ the same size (and type) as the
input array. That interface, although it is not very useful for
multi-dimensional transforms, may sometimes yield better performance.


File: fftw3.info, Node: Multi-Dimensional DFTs of Real Data, Next: More DFTs of Real Data, Prev: One-Dimensional DFTs of Real Data, Up: Tutorial

2.4 Multi-Dimensional DFTs of Real Data
=======================================

Multi-dimensional DFTs of real data use the following planner routines:

     fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
                                    double *in, fftw_complex *out,
                                    unsigned flags);
     fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
                                 double *in, fftw_complex *out,
                                 unsigned flags);

as well as the corresponding `c2r' routines with the input/output
types swapped. These routines work similarly to their complex
analogues, except for the fact that here the complex output array is cut
roughly in half and the real array requires padding for in-place
transforms (as in 1d, above).

As before, `n' is the logical size of the array, and the
consequences of this on the format of the complex arrays deserve
careful attention. Suppose that the real data has dimensions n[0] x
n[1] x n[2] x ... x n[d-1] (in row-major order). Then, after an r2c
transform, the output is an n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1)
array of `fftw_complex' values in row-major order, corresponding to
slightly over half of the output of the corresponding complex DFT.
(The division is rounded down.) The ordering of the data is otherwise
exactly the same as in the complex-DFT case.

For out-of-place transforms, this is the end of the story: the real
data is stored as a row-major array of size n[0] x n[1] x n[2] x ... x
n[d-1] and the complex data is stored as a row-major array of size
n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1).

For in-place transforms, however, extra padding of the real-data
array is necessary because the complex array is larger than the real
array, and the two arrays share the same memory locations. Thus, for
in-place transforms, the final dimension of the real-data array must be
padded with extra values to accommodate the size of the complex
data--two values if the last dimension is even and one if it is odd.
That is, the last dimension of the real data must physically contain
2 * (n[d-1]/2+1) `double' values (exactly enough to hold the complex data).
This physical array size does not, however, change the _logical_ array
size--only n[d-1] values are actually stored in the last dimension, and
n[d-1] is the last dimension passed to the plan-creation routine.
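
The size bookkeeping above fits in a few lines. This sketch uses hypothetical helper names to compute, for logical dimensions `n[0..rank-1]', the number of `fftw_complex' outputs of an r2c transform and the number of `double's to allocate for the padded real array of an in-place transform:

```c
#include <stddef.h>

/* Number of complex outputs of an r2c transform of logical size
 * n[0] x ... x n[rank-1] (rank >= 1): the last dimension becomes
 * n[rank-1]/2 + 1, with the division rounded down. */
size_t r2c_complex_count(int rank, const int *n)
{
    size_t count = (size_t) (n[rank - 1] / 2 + 1);
    for (int d = 0; d < rank - 1; ++d)
        count *= (size_t) n[d];
    return count;
}

/* Number of doubles in the padded real array of an in-place transform:
 * the last dimension is padded to 2*(n[rank-1]/2 + 1) values. */
size_t r2c_inplace_real_count(int rank, const int *n)
{
    return 2 * r2c_complex_count(rank, n);
}
```

These are the counts you would pass to `fftw_alloc_complex' for an out-of-place output array, or to `fftw_alloc_real' for an in-place buffer.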
Chris@19 539
Chris@19 540 For example, consider the transform of a two-dimensional real array
Chris@19 541 of size `n0' by `n1'. The output of the r2c transform is a
Chris@19 542 two-dimensional complex array of size `n0' by `n1/2+1', where the `y'
Chris@19 543 dimension has been cut nearly in half because of redundancies in the
Chris@19 544 output. Because `fftw_complex' is twice the size of `double', the
Chris@19 545 output array is slightly bigger than the input array. Thus, if we want
Chris@19 546 to compute the transform in place, we must _pad_ the input array so
Chris@19 547 that it is of size `n0' by `2*(n1/2+1)'. If `n1' is even, then there
Chris@19 548 are two padding elements at the end of each row (which need not be
Chris@19 549 initialized, as they are only used for output).
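The padding rule can be made concrete with a few lines of plain C. This sketch does not call FFTW, and the helper names (`r2c_complex_cols', `r2c_padded_cols', `r2c_real_index') are hypothetical. The helpers compute three things: the number of complex outputs per row, the padded row length 2*(n1/2+1) in `double's, and the offset of a logical real element (i,j) in an in-place `n0' x `n1' array.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex outputs along the last dimension: n1/2 + 1,
   with the division rounded down. */
static size_t r2c_complex_cols(size_t n1)
{
    return n1 / 2 + 1;
}

/* Physical row length, in doubles, of the real array for an in-place
   r2c transform: exactly enough room to hold one complex row. */
static size_t r2c_padded_cols(size_t n1)
{
    return 2 * r2c_complex_cols(n1);
}

/* Offset, in doubles, of logical real element (i, j) of an n0 x n1
   array whose rows are padded as described above. */
static size_t r2c_real_index(size_t n1, size_t i, size_t j)
{
    assert(j < n1); /* j must address real data, not padding */
    return i * r2c_padded_cols(n1) + j;
}
```

For example, with `n1' = 8 each row occupies 10 doubles (two padding values), and with `n1' = 7 it occupies 8 (one padding value). An in-place buffer would therefore hold `n0 * r2c_padded_cols(n1)' doubles.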

These transforms are unnormalized, so an r2c followed by a c2r
transform (or vice versa) will result in the original data scaled by
the number of real data elements--that is, the product of the (logical)
dimensions of the real data.

(Because the last dimension is treated specially, if it is equal to
`1' the transform is _not_ equivalent to a lower-dimensional r2c/c2r
transform. In that case, the last complex dimension also has size `1'
(`=1/2+1'), and no advantage is gained over the complex transforms.)

File: fftw3.info, Node: More DFTs of Real Data, Prev: Multi-Dimensional DFTs of Real Data, Up: Tutorial

2.5 More DFTs of Real Data
==========================

* Menu:

* The Halfcomplex-format DFT::
* Real even/odd DFTs (cosine/sine transforms)::
* The Discrete Hartley Transform::

FFTW supports several other transform types via a unified "r2r"
(real-to-real) interface, so called because it takes a real (`double')
array and outputs a real array of the same size. These r2r transforms
currently fall into three categories: DFTs of real input and
complex-Hermitian output in halfcomplex format, DFTs of real input with
even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs),
and discrete Hartley transforms (DHTs), all described in more detail by
the following sections.

The r2r transforms follow the by now familiar interface of creating
an `fftw_plan', executing it with `fftw_execute(plan)', and destroying
it with `fftw_destroy_plan(plan)'. Furthermore, all r2r transforms
share the same planner interface:

     fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
                                fftw_r2r_kind kind, unsigned flags);
     fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
                                fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                unsigned flags);
     fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
                                double *in, double *out,
                                fftw_r2r_kind kind0,
                                fftw_r2r_kind kind1,
                                fftw_r2r_kind kind2,
                                unsigned flags);
     fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
                             const fftw_r2r_kind *kind, unsigned flags);

Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
transforms for contiguous arrays in row-major order, transforming (real)
input to output of the same size, where `n' specifies the _physical_
dimensions of the arrays. All positive `n' are supported (with the
exception of `n=1' for the `FFTW_REDFT00' kind, noted in the real-even
subsection below); products of small factors are most efficient
(factorizing `n-1' and `n+1' for `FFTW_REDFT00' and `FFTW_RODFT00'
kinds, described below), but an O(n log n) algorithm is used even for
prime sizes.

Each dimension has a "kind" parameter, of type `fftw_r2r_kind',
specifying the kind of r2r transform to be used for that dimension. (In
the case of `fftw_plan_r2r', this is an array `kind[rank]' where
`kind[i]' is the transform kind for the dimension `n[i]'.) The kind
can be one of a set of predefined constants, defined in the following
subsections.

In other words, FFTW computes the separable product of the specified
r2r transforms over each dimension, which can be used e.g. for partial
differential equations with mixed boundary conditions. (For some r2r
kinds, notably the halfcomplex DFT and the DHT, such a separable
product is somewhat problematic in more than one dimension, however, as
is described below.)

In the current version of FFTW, all r2r transforms except for the
halfcomplex type are computed via pre- or post-processing of
halfcomplex transforms, and they are therefore not as fast as they
could be. Since most other general DCT/DST codes employ a similar
algorithm, however, FFTW's implementation should provide at least
competitive performance.

File: fftw3.info, Node: The Halfcomplex-format DFT, Next: Real even/odd DFTs (cosine/sine transforms), Prev: More DFTs of Real Data, Up: More DFTs of Real Data

2.5.1 The Halfcomplex-format DFT
--------------------------------

An r2r kind of `FFTW_R2HC' ("r2hc") corresponds to an r2c DFT (*note
One-Dimensional DFTs of Real Data::) but with "halfcomplex" format
output, and may sometimes be faster and/or more convenient than the
latter. The inverse "hc2r" transform is of kind `FFTW_HC2R'. The
halfcomplex format consists of the non-redundant half of the complex
output for a 1d real-input DFT of size `n', stored as a sequence of `n'
real numbers (`double') in the format:

     r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1

Here, rk is the real part of the kth output, and ik is the imaginary
part. (Division by 2 is rounded down.) For a halfcomplex array
`hc[n]', the kth component thus has its real part in `hc[k]' and its
imaginary part in `hc[n-k]', with the exception of `k == 0' or
`k == n/2' (the latter only if `n' is even)--in these two cases, the
imaginary part is zero due to symmetries of the real-input DFT, and is
not stored. Thus, the r2hc transform of `n' real values is a
halfcomplex array of length `n', and vice versa for hc2r.

Aside from the differing format, the output of
`FFTW_R2HC'/`FFTW_HC2R' is otherwise exactly the same as for the
corresponding 1d r2c/c2r transform (i.e. `FFTW_FORWARD'/`FFTW_BACKWARD'
transforms, respectively). Recall that these transforms are
unnormalized, so r2hc followed by hc2r will result in the original data
multiplied by `n'. Furthermore, like the c2r transform, an
out-of-place hc2r transform will _destroy its input_ array.

Although these halfcomplex transforms can be used with the
multi-dimensional r2r interface, the interpretation of such a separable
product of transforms along each dimension is problematic. For example,
consider a two-dimensional `n0' by `n1', r2hc by r2hc transform planned
by `fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC,
FFTW_MEASURE)'. Conceptually, FFTW first transforms the rows (of size
`n1') to produce halfcomplex rows, and then transforms the columns (of
size `n0'). Half of these column transforms, however, are of imaginary
parts, and should therefore be multiplied by i and combined with the
r2hc transforms of the real columns to produce the 2d DFT amplitudes;
FFTW's r2r transform does _not_ perform this combination for you.
Thus, if a multi-dimensional real-input/output DFT is required, we
recommend using the ordinary r2c/c2r interface (*note Multi-Dimensional
DFTs of Real Data::).

File: fftw3.info, Node: Real even/odd DFTs (cosine/sine transforms), Next: The Discrete Hartley Transform, Prev: The Halfcomplex-format DFT, Up: More DFTs of Real Data

2.5.2 Real even/odd DFTs (cosine/sine transforms)
-------------------------------------------------

The Fourier transform of a real-even function f(-x) = f(x) is
real-even, and i times the Fourier transform of a real-odd function
f(-x) = -f(x) is real-odd. Similar results hold for a discrete Fourier
transform, and thus for these symmetries the need for complex
inputs/outputs is entirely eliminated. Moreover, one gains a factor of
two in speed/space from the fact that the data are real, and an
additional factor of two from the even/odd symmetry: only the
non-redundant (first) half of the array need be stored. The result is
the real-even DFT ("REDFT") and the real-odd DFT ("RODFT"), also known
as the discrete cosine and sine transforms ("DCT" and "DST"),
respectively.

(In this section, we describe the 1d transforms; multi-dimensional
transforms are just a separable product of these transforms operating
along each dimension.)

Because of the discrete sampling, one has an additional choice: are
the data even/odd around a sampling point, or around the point halfway
between two samples? The latter corresponds to _shifting_ the samples
by _half_ an interval, and gives rise to several transform variants
denoted by REDFTab and RODFTab: a and b are 0 or 1, and indicate
whether the input (a) and/or output (b) are shifted by half a sample (1
means it is shifted). These are also known as types I-IV of the DCT
and DST, and all four types are supported by FFTW's r2r interface.(1)

The r2r kinds for the various REDFT and RODFT types supported by
FFTW, along with the boundary conditions at both ends of the _input_
array (`n' real numbers `in[j=0..n-1]'), are:

   * `FFTW_REDFT00' (DCT-I): even around j=0 and even around j=n-1.

   * `FFTW_REDFT10' (DCT-II, "the" DCT): even around j=-0.5 and even
     around j=n-0.5.

   * `FFTW_REDFT01' (DCT-III, "the" IDCT): even around j=0 and odd
     around j=n.

   * `FFTW_REDFT11' (DCT-IV): even around j=-0.5 and odd around j=n-0.5.

   * `FFTW_RODFT00' (DST-I): odd around j=-1 and odd around j=n.

   * `FFTW_RODFT10' (DST-II): odd around j=-0.5 and odd around j=n-0.5.

   * `FFTW_RODFT01' (DST-III): odd around j=-1 and even around j=n-1.

   * `FFTW_RODFT11' (DST-IV): odd around j=-0.5 and even around j=n-0.5.

Note that these symmetries apply to the "logical" array being
transformed; *there are no constraints on your physical input data*.
So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data
abcde, it corresponds to the DFT of the logical even array abcdedcb of
size 8. A size-4 REDFT10 (DCT-II) of the data abcd corresponds to the
size-8 logical DFT of the even array abcddcba, shifted by half a sample.
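The logical arrays in this example can be generated mechanically. This sketch uses hypothetical helper names and operates on characters so that the result can be compared directly with the strings above. It builds the REDFT00 extension of size 2(n-1) and the REDFT10 extension of size 2n. It only illustrates the symmetry and does not transform anything.

```c
#include <assert.h>
#include <stddef.h>

/* REDFT00 (DCT-I): even around j=0 and j=n-1, giving a logical array of
   size N = 2(n-1): in[0..n-1] followed by in[n-2], ..., in[1]. */
static size_t redft00_logical(size_t n, const char *in, char *out)
{
    assert(n >= 2); /* n = 1 would give N = 0 (undefined) */
    size_t N = 2 * (n - 1);
    for (size_t j = 0; j < N; ++j)
        out[j] = in[j < n ? j : N - j];
    return N;
}

/* REDFT10 (DCT-II): even around j=-0.5 and j=n-0.5, giving a logical
   array of size N = 2n: in[0..n-1] followed by in[n-1], ..., in[0]. */
static size_t redft10_logical(size_t n, const char *in, char *out)
{
    assert(n >= 1);
    size_t N = 2 * n;
    for (size_t j = 0; j < N; ++j)
        out[j] = in[j < n ? j : N - 1 - j];
    return N;
}
```

`redft00_logical(5, "abcde", out)' fills `out' with abcdedcb, and `redft10_logical(4, "abcd", out)' fills it with abcddcba, both of length 8.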

All of these transforms are invertible. The inverse of R*DFT00 is
R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
simply "the" DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
However, the transforms computed by FFTW are unnormalized, exactly like
the corresponding real and complex DFTs, so computing a transform
followed by its inverse yields the original array scaled by N, where N
is the _logical_ DFT size. For REDFT00, N=2(n-1); for RODFT00,
N=2(n+1); otherwise, N=2n.

Note that the boundary conditions of the transform output array are
given by the input boundary conditions of the inverse transform. Thus,
the above transforms are all inequivalent in terms of input/output
boundary conditions, even neglecting the 0.5 shift difference.

FFTW is most efficient when N is a product of small factors; note
that this _differs_ from the factorization of the physical size `n' for
REDFT00 and RODFT00! There is another oddity: `n=1' REDFT00 transforms
correspond to N=0, and so are _not defined_ (the planner will return
`NULL'). Otherwise, any positive `n' is supported.
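To see why the factorization of N rather than `n' matters for REDFT00 and RODFT00, the following sketch computes the logical size for each case and the largest prime factor of an integer. The enum and function names are hypothetical stand-ins and are not FFTW's `fftw_r2r_kind' constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three cases distinguished above. */
enum sym_kind { KIND_REDFT00, KIND_RODFT00, KIND_OTHER };

/* Logical DFT size N for physical size n; 0 means "not defined". */
static size_t logical_size(enum sym_kind kind, size_t n)
{
    assert(n >= 1);
    switch (kind) {
    case KIND_REDFT00: return 2 * (n - 1);   /* n = 1 gives N = 0 */
    case KIND_RODFT00: return 2 * (n + 1);
    default:           return 2 * n;
    }
}

/* Largest prime factor of m (m >= 2), by trial division. */
static size_t largest_prime_factor(size_t m)
{
    assert(m >= 2);
    size_t largest = 1;
    for (size_t p = 2; p * p <= m; ++p)
        while (m % p == 0) {
            largest = p;
            m /= p;
        }
    return m > 1 ? m : largest;   /* any leftover m > 1 is prime */
}
```

For instance, a REDFT00 of prime physical size n = 17 has N = 32 = 2^5, which is very favorable. Also, `logical_size(KIND_REDFT00, 1)' returns 0, matching the undefined n=1 case.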

For the precise mathematical definitions of these transforms as used
by FFTW, see *note What FFTW Really Computes::. (For people accustomed
to the DCT/DST, FFTW's definitions have a coefficient of 2 in front of
the cos/sin functions so that they correspond precisely to an even/odd
DFT of size N. Some authors also include additional multiplicative
factors of sqrt(2) for selected inputs and outputs; this makes the
transform orthogonal, but sacrifices the direct equivalence to a
symmetric DFT.)

Which type do you need?
.......................

Since the required flavor of even/odd DFT depends upon your problem,
you are the best judge of this choice, but we can make a few comments
on relative efficiency to help you in your selection. In particular,
R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially
for odd sizes), while the R*DFT00 transforms are sometimes
significantly slower (especially for even sizes).(2)

Thus, if only the boundary conditions on the transform inputs are
specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
R*DFT11 (unless the half-sample shift or the self-inverse property is
significant for your problem).

If performance is important to you and you are using only small sizes
(say n<200), e.g. for multi-dimensional transforms, then you might
consider generating hard-coded transforms of those sizes and types that
you are interested in (*note Generating your own code::).

We are interested in hearing what types of symmetric transforms you
find most useful.

---------- Footnotes ----------

(1) There are also type V-VIII transforms, which correspond to a
logical DFT of _odd_ size N, independent of whether the physical size
`n' is odd, but we do not support these variants.

(2) R*DFT00 is sometimes slower in FFTW because we discovered that
the standard algorithm for computing this by a pre/post-processed real
DFT--the algorithm used in FFTPACK, Numerical Recipes, and other
sources for decades now--has serious numerical problems: it already
loses several decimal places of accuracy for 16k sizes. There seem to
be only two alternatives in the literature that do not suffer
similarly: a recursive decomposition into smaller DCTs, which would
require a large set of codelets for efficiency and generality, or
sacrificing a factor of 2 in speed to use a real DFT of twice the size.
We currently employ the latter technique for general n, as well as a
limited form of the former method: a split-radix decomposition when n
is odd (N a multiple of 4). For N containing many factors of 2, the
split-radix method seems to recover most of the speed of the standard
algorithm without the accuracy tradeoff.

File: fftw3.info, Node: The Discrete Hartley Transform, Prev: Real even/odd DFTs (cosine/sine transforms), Up: More DFTs of Real Data

2.5.3 The Discrete Hartley Transform
------------------------------------

If you are planning to use the DHT because you've heard that it is
"faster" than the DFT (FFT), *stop here*. The DHT is not faster than
the DFT. That story is an old but enduring misconception that was
debunked in 1987.

The discrete Hartley transform (DHT) is an invertible linear
transform closely related to the DFT. In the DFT, one multiplies each
input by cos - i * sin (a complex exponential), whereas in the DHT each
input is multiplied by simply cos + sin. Thus, the DHT transforms `n'
real numbers to `n' real numbers, and has the convenient property of
being its own inverse. In FFTW, a DHT (of any positive `n') can be
specified by an r2r kind of `FFTW_DHT'.

Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
size `n' followed by another DHT of the same size will result in the
original array multiplied by `n'.

The DHT was originally proposed as a more efficient alternative to
the DFT for real data, but it was subsequently shown that a specialized
DFT (such as FFTW's r2hc or r2c transforms) could be just as fast. In
FFTW, the DHT is actually computed by post-processing an r2hc
transform, so there is ordinarily no reason to prefer it from a
performance perspective.(1) However, we have heard rumors that the DHT
might be the most appropriate transform in its own right for certain
applications, and we would be very interested to hear from anyone who
finds it useful.

If `FFTW_DHT' is specified for multiple dimensions of a
multi-dimensional transform, FFTW computes the separable product of 1d
DHTs along each dimension. Unfortunately, this is not quite the same
thing as a true multi-dimensional DHT; you can compute the latter, if
necessary, with at most `rank-1' post-processing passes [see e.g. H.
Hao and R. N. Bracewell, Proc. IEEE 75, 264-266 (1987)].

For the precise mathematical definition of the DHT as used by FFTW,
see *note What FFTW Really Computes::.

---------- Footnotes ----------

(1) We provide the DHT mainly as a byproduct of some internal
algorithms. FFTW computes a real input/output DFT of _prime_ size by
re-expressing it as a DHT plus post/pre-processing and then using
Rader's prime-DFT algorithm adapted to the DHT.

File: fftw3.info, Node: Other Important Topics, Next: FFTW Reference, Prev: Tutorial, Up: Top

3 Other Important Topics
************************

* Menu:

* SIMD alignment and fftw_malloc::
* Multi-dimensional Array Format::
* Words of Wisdom-Saving Plans::
* Caveats in Using Wisdom::

File: fftw3.info, Node: SIMD alignment and fftw_malloc, Next: Multi-dimensional Array Format, Prev: Other Important Topics, Up: Other Important Topics

3.1 SIMD alignment and fftw_malloc
==================================

SIMD, which stands for "Single Instruction Multiple Data," is a set of
special operations supported by some processors to perform a single
operation on several numbers (usually 2 or 4) simultaneously. SIMD
floating-point instructions are available on several popular CPUs:
SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision)
on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and
MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be
compiled to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can
obtain a nonnegligible speedup for most complex and r2c/c2r transforms.
In order to obtain this speedup, however, the arrays of complex (or
real) data passed to FFTW must be specially aligned in memory
(typically 16-byte aligned), and often this alignment is more stringent
than that provided by the usual `malloc' (etc.) allocation routines.

In order to guarantee proper alignment for SIMD, therefore, in case
your program is ever linked against a SIMD-using FFTW, we recommend
allocating your transform data with `fftw_malloc' and de-allocating it
with `fftw_free'. These have exactly the same interface and behavior as
`malloc'/`free', except that for a SIMD FFTW they ensure that the
returned pointer has the necessary alignment (by calling `memalign' or
its equivalent on your OS).

You are not _required_ to use `fftw_malloc'. You can allocate your
data in any way that you like, from `malloc' to `new' (in C++) to a
fixed-size array declaration. If the array happens not to be properly
aligned, FFTW will not use the SIMD extensions.

Since `fftw_malloc' only ever needs to be used for real and complex
arrays, we provide two convenient wrapper routines `fftw_alloc_real(N)'
and `fftw_alloc_complex(N)' that are equivalent to
`(double*)fftw_malloc(sizeof(double) * N)' and
`(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)', respectively
(or their equivalents in other precisions).
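Whether a pointer is suitably aligned comes down to checking the low-order bits of its address. The sketch below shows such a check together with an allocator built on C11 `aligned_alloc'. Both functions are hypothetical illustrations. In an FFTW program you should use `fftw_malloc' or `fftw_alloc_real' instead, since those routines provide whatever alignment the installed library needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p is a multiple of `alignment' bytes (a power of 2). */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical stand-in for fftw_alloc_real(n), using C11 aligned_alloc.
   The byte count is rounded up because aligned_alloc requires a size
   that is a multiple of the alignment.  Release the result with free(). */
static double *alloc_real_aligned(size_t n, size_t alignment)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, bytes);
}
```

A 32-byte-aligned block is also 16-byte aligned, but an address 8 bytes past it is not.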

File: fftw3.info, Node: Multi-dimensional Array Format, Next: Words of Wisdom-Saving Plans, Prev: SIMD alignment and fftw_malloc, Up: Other Important Topics

3.2 Multi-dimensional Array Format
==================================

This section describes the format in which multi-dimensional arrays are
stored in FFTW. We felt that a detailed discussion of this topic was
necessary. Since several different formats are common, this topic is
often a source of confusion.

* Menu:

* Row-major Format::
* Column-major Format::
* Fixed-size Arrays in C::
* Dynamic Arrays in C::
* Dynamic Arrays in C-The Wrong Way::

File: fftw3.info, Node: Row-major Format, Next: Column-major Format, Prev: Multi-dimensional Array Format, Up: Multi-dimensional Array Format

3.2.1 Row-major Format
----------------------

The multi-dimensional arrays passed to `fftw_plan_dft' etcetera are
expected to be stored as a single contiguous block in "row-major" order
(sometimes called "C order"). Basically, this means that as you step
through adjacent memory locations, the first dimension's index varies
most slowly and the last dimension's index varies most quickly.

To be more explicit, let us consider an array of rank d whose
dimensions are n[0] x n[1] x n[2] x ... x n[d-1]. Now, we specify a
location in the array by a sequence of d (zero-based) indices, one for
each dimension: (i[0], i[1], ..., i[d-1]). If the array is stored in
row-major order, then this element is located at the position

     i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])).
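The nested formula can be evaluated with a single loop in Horner fashion, working from the outermost index inward. This sketch uses a hypothetical function name. It takes the dimensions as `const int *', in the same style as the `n' argument of FFTW's `fftw_plan_dft', and returns the row-major offset of an index tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of index (i[0], ..., i[d-1]) in an array with
   dimensions n[0] x ... x n[d-1]; the last index varies fastest. */
static size_t row_major_offset(int d, const int *n, const int *i)
{
    size_t off = 0;
    for (int r = 0; r < d; ++r) {
        assert(i[r] >= 0 && i[r] < n[r]);
        off = off * (size_t)n[r] + (size_t)i[r];  /* Horner-style step */
    }
    return off;
}
```

For a 5 x 12 x 27 array, index (1,2,3) maps to offset 3 + 27*(2 + 12*1) = 381. This is the same expression used for `an_array' in *note Dynamic Arrays in C::.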

Note that, for the ordinary complex DFT, each element of the array
must be of type `fftw_complex'; i.e. a (real, imaginary) pair of
(double-precision) numbers.

In the advanced FFTW interface, the physical dimensions n from which
the indices are computed can be different from (larger than) the
logical dimensions of the transform to be computed, in order to
transform a subset of a larger array. Note also that, in the advanced
interface, the expression above is multiplied by a "stride" to get the
actual array index--this is useful in situations where each element of
the multi-dimensional array is actually a data structure (or another
array), and you just want to transform a single field. In the basic
interface, however, the stride is 1.

File: fftw3.info, Node: Column-major Format, Next: Fixed-size Arrays in C, Prev: Row-major Format, Up: Multi-dimensional Array Format

3.2.2 Column-major Format
-------------------------

Readers from the Fortran world are used to arrays stored in
"column-major" order (sometimes called "Fortran order"). This is
essentially the exact opposite of row-major order in that, here, the
_first_ dimension's index varies most quickly.

If you have an array stored in column-major order and wish to
transform it using FFTW, it is quite easy to do. When creating the
plan, simply pass the dimensions of the array to the planner in
_reverse order_. For example, if your array is a rank three `N x M x
L' matrix in column-major order, you should pass the dimensions of the
array as if it were an `L x M x N' matrix (which it is, from the
perspective of FFTW). This is done for you _automatically_ by the FFTW
legacy-Fortran interface (*note Calling FFTW from Legacy Fortran::),
but you must do it manually with the modern Fortran interface (*note
Reversing array dimensions::).
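The reversal trick works because of a simple identity. The column-major offset of (i[0], ..., i[d-1]) in an n[0] x ... x n[d-1] array equals the row-major offset of the reversed index tuple in the array with reversed dimensions. The following sketch, with hypothetical helper names, computes both offsets so that the equivalence can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major (C) offset: the last index varies fastest. */
static size_t rm_offset(int d, const int *n, const int *i)
{
    size_t off = 0;
    for (int r = 0; r < d; ++r)
        off = off * (size_t)n[r] + (size_t)i[r];
    return off;
}

/* Column-major (Fortran) offset: the first index varies fastest. */
static size_t cm_offset(int d, const int *n, const int *i)
{
    size_t off = 0;
    for (int r = d - 1; r >= 0; --r)
        off = off * (size_t)n[r] + (size_t)i[r];
    return off;
}

/* Copy the d entries of a into out in reverse order. */
static void reverse_ints(int d, const int *a, int *out)
{
    assert(d >= 1);
    for (int r = 0; r < d; ++r)
        out[r] = a[d - 1 - r];
}
```

For a column-major 4 x 3 x 2 array, element (1,2,1) lies at offset 1 + 4*(2 + 3*1) = 21. The row-major offset of (1,2,1) in the reversed 2 x 3 x 4 array is also 21.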

File: fftw3.info, Node: Fixed-size Arrays in C, Next: Dynamic Arrays in C, Prev: Column-major Format, Up: Multi-dimensional Array Format

3.2.3 Fixed-size Arrays in C
----------------------------

A multi-dimensional array whose size is declared at compile time in C
is _already_ in row-major order. You don't have to do anything special
to transform it. For example:

     {
          fftw_complex data[N0][N1][N2];
          fftw_plan plan;
          ...
          plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
                                  FFTW_FORWARD, FFTW_ESTIMATE);
          ...
     }

This will plan a 3d in-place transform of size `N0 x N1 x N2'.
Notice how we took the address of the zero-th element to pass to the
planner (we could also have used a typecast).
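One way to confirm that a fixed-size C array is already row-major is to measure how far each element lies from `&data[0][0][0]'. This sketch uses `double' elements and small hypothetical compile-time sizes so that it compiles without FFTW. The same layout holds for an array of `fftw_complex'.

```c
#include <assert.h>
#include <stddef.h>

#define N0 3
#define N1 4
#define N2 5

/* Distance, in elements, of data[i][j][k] from &data[0][0][0]; measured
   through char pointers, which may range over the whole array object. */
static size_t fixed_offset(double (*data)[N1][N2], int i, int j, int k)
{
    assert(i >= 0 && i < N0 && j >= 0 && j < N1 && k >= 0 && k < N2);
    return (size_t)((const char *)&data[i][j][k]
                    - (const char *)&data[0][0][0]) / sizeof(double);
}
```

For `double a[N0][N1][N2]', `fixed_offset(a, 1, 2, 3)' equals 3 + N2*(2 + N1*1) = 33, exactly the row-major formula.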

However, we tend to _discourage_ users from declaring their arrays
in this way, for two reasons. First, this allocates the array on the
stack ("automatic" storage), which has a very limited size on most
operating systems (declaring an array with more than a few thousand
elements will often cause a crash). (You can get around this
limitation on many systems by declaring the array as `static' and/or
global, but that has its own drawbacks.) Second, it may not optimally
align the array for use with a SIMD FFTW (*note SIMD alignment and
fftw_malloc::). Instead, we recommend using `fftw_malloc', as
described below.

File: fftw3.info, Node: Dynamic Arrays in C, Next: Dynamic Arrays in C-The Wrong Way, Prev: Fixed-size Arrays in C, Up: Multi-dimensional Array Format

3.2.4 Dynamic Arrays in C
-------------------------

We recommend allocating most arrays dynamically, with `fftw_malloc'.
This isn't too hard to do, although it is not as straightforward for
multi-dimensional arrays as it is for one-dimensional arrays.

Creating the array is simple: using a dynamic-allocation routine like
`fftw_malloc', allocate an array big enough to store N `fftw_complex'
values (for a complex DFT), where N is the product of the sizes of the
array dimensions (i.e. the total number of complex values in the
array). For example, here is code to allocate a 5 x 12 x 27 rank-3
array:

     fftw_complex *an_array;
     an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));

Accessing the array elements, however, is more tricky--you can't
simply use multiple applications of the `[]' operator like you could
for fixed-size arrays. Instead, you have to explicitly compute the
offset into the array using the formula given earlier for row-major
arrays. For example, to reference the (i,j,k)-th element of the array
allocated above, you would use the expression `an_array[k + 27 * (j +
12 * i)]'.

This pain can be alleviated somewhat by defining appropriate macros,
or, in C++, creating a class and overloading the `()' operator. The
recent C99 standard provides a way to reinterpret the dynamic array as
a "variable-length" multi-dimensional array amenable to `[]', but this
feature is not yet widely supported by compilers.
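Both workarounds mentioned above can be sketched briefly. The first is a hypothetical `AT3' macro that applies the row-major formula. The second is a C99 pointer to a variably modified array type, which gives `[][][]' syntax over the same contiguous block. The sketch uses `double' and `malloc' to stay self-contained; in an FFTW program the block would typically come from `fftw_malloc'. Variable-length array types became optional in C11, so the second approach relies on compiler support.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical accessor for element (i,j,k) of a contiguous, row-major
   n0 x n1 x n2 array a (n0 itself is not needed for the offset). */
#define AT3(a, n1, n2, i, j, k) ((a)[(k) + (n2) * ((j) + (n1) * (i))])

/* View the contiguous block through a C99 pointer to a variably
   modified array type, so that v[i][j][k] syntax works, and store
   each element's own row-major offset in it. */
static void fill_with_offsets(int n0, int n1, int n2, double *block)
{
    assert(n0 > 0 && n1 > 0 && n2 > 0);
    double (*v)[n1][n2] = (double (*)[n1][n2]) block;
    for (int i = 0; i < n0; ++i)
        for (int j = 0; j < n1; ++j)
            for (int k = 0; k < n2; ++k)
                v[i][j][k] = k + n2 * (j + n1 * i);
}
```

After `fill_with_offsets(5, 12, 27, b)', `AT3(b, 12, 27, 1, 2, 3)' reads back 381, confirming that both notations address the same element.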

File: fftw3.info, Node: Dynamic Arrays in C-The Wrong Way, Prev: Dynamic Arrays in C, Up: Multi-dimensional Array Format

3.2.5 Dynamic Arrays in C--The Wrong Way
----------------------------------------

A different method for allocating multi-dimensional arrays in C is
often suggested, but it is incompatible with FFTW: _using it will cause
FFTW to die a painful death_, because FFTW expects the whole array in a
single contiguous block, whereas this method scatters the rows across
separately allocated pieces of memory. We discuss the technique here,
however, because it is so commonly known and used. This method is to
create arrays of pointers to arrays of pointers to ...etcetera. For
example, the analogue in this method to the example above is:

     int i,j;
     fftw_complex ***a_bad_array;  /* another way to make a 5x12x27 array */

     a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **));
     for (i = 0; i < 5; ++i) {
          a_bad_array[i] =
             (fftw_complex **) malloc(12 * sizeof(fftw_complex *));
          for (j = 0; j < 12; ++j)
               a_bad_array[i][j] =
                     (fftw_complex *) malloc(27 * sizeof(fftw_complex));
     }
Chris@19 1085
Chris@19 1086 As you can see, this sort of array is inconvenient to allocate (and
Chris@19 1087 deallocate). On the other hand, it has the advantage that the
Chris@19 1088 (i,j,k)-th element can be referenced simply by `a_bad_array[i][j][k]'.
Chris@19 1089
Chris@19 1090 If you like this technique and want to maximize convenience in
Chris@19 1091 accessing the array, but still want to pass the array to FFTW, you can
Chris@19 1092 use a hybrid method. Allocate the array as one contiguous block, but
Chris@19 1093 also declare an array of arrays of pointers that point to appropriate
Chris@19 1094 places in the block. That sort of trick is beyond the scope of this
Chris@19 1095 documentation; for more information on multi-dimensional arrays in C,
Chris@19 1096 see the `comp.lang.c' FAQ (http://c-faq.com/aryptr/dynmuldimary.html).
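The hybrid method might look like the following sketch, assuming all dimensions are positive. The helper names `alloc_hybrid3' and `free_hybrid3' are hypothetical. Plain `malloc' again stands in for `fftw_malloc' for the data block so that the fragment compiles on its own. Because every element lives in one contiguous row-major block, `&a[0][0][0]' is a pointer that could be handed to FFTW, while `a[i][j][k]' still works for element access.

```c
#include <stdlib.h>

/* Stand-in for FFTW's default definition; a real program would
   #include <fftw3.h> and allocate the data block with fftw_malloc. */
typedef double fftw_complex[2];

/* Hypothetical helper: builds a[i][j][k] access on top of ONE contiguous,
   row-major n0 x n1 x n2 block (all dimensions assumed > 0). */
fftw_complex ***alloc_hybrid3(size_t n0, size_t n1, size_t n2)
{
    fftw_complex *block = malloc(sizeof(fftw_complex) * n0 * n1 * n2);
    fftw_complex **rows = malloc(sizeof(fftw_complex *) * n0 * n1);
    fftw_complex ***planes = malloc(sizeof(fftw_complex **) * n0);

    if (block == NULL || rows == NULL || planes == NULL) {
        free(block);
        free(rows);
        free(planes);
        return NULL;
    }
    for (size_t i = 0; i < n0; ++i) {
        planes[i] = rows + i * n1;                    /* n1 row pointers per plane */
        for (size_t j = 0; j < n1; ++j)
            planes[i][j] = block + (i * n1 + j) * n2; /* rows point into block */
    }
    return planes;
}

/* Releases everything allocated by alloc_hybrid3 (three calls, not n0*n1). */
void free_hybrid3(fftw_complex ***a)
{
    if (a == NULL)
        return;
    free(a[0][0]);   /* the contiguous data block */
    free(a[0]);      /* the table of row pointers */
    free(a);         /* the table of plane pointers */
}
```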
Chris@19 1097
Chris@19 1098 
Chris@19 1099 File: fftw3.info, Node: Words of Wisdom-Saving Plans, Next: Caveats in Using Wisdom, Prev: Multi-dimensional Array Format, Up: Other Important Topics
Chris@19 1100
Chris@19 1101 3.3 Words of Wisdom--Saving Plans
Chris@19 1102 =================================
Chris@19 1103
Chris@19 1104 FFTW implements a method for saving plans to disk and restoring them.
Chris@19 1105 In fact, what FFTW does is more general than just saving and loading
Chris@19 1106 plans. The mechanism is called "wisdom". Here, we describe this
Chris@19 1107 feature at a high level. *Note FFTW Reference::, for a less casual but
Chris@19 1108 more complete discussion of how to use wisdom in FFTW.
Chris@19 1109
Chris@19 1110 Plans created with the `FFTW_MEASURE', `FFTW_PATIENT', or
Chris@19 1111 `FFTW_EXHAUSTIVE' options produce near-optimal FFT performance, but may
Chris@19 1112 require a long time to compute because FFTW must measure the runtime of
Chris@19 1113 many possible plans and select the best one. This setup is designed
Chris@19 1114 for the situations where so many transforms of the same size must be
Chris@19 1115 computed that the start-up time is irrelevant. For short
Chris@19 1116 initialization times, but slower transforms, we have provided
Chris@19 1117 `FFTW_ESTIMATE'. The `wisdom' mechanism is a way to get the best of
Chris@19 1118 both worlds: you compute a good plan once, save it to disk, and later
Chris@19 1119 reload it as many times as necessary. The wisdom mechanism can
Chris@19 1120 actually save and reload many plans at once, not just one.
Chris@19 1121
Chris@19 1122 Whenever you create a plan, the FFTW planner accumulates wisdom,
Chris@19 1123 which is information sufficient to reconstruct the plan. After
Chris@19 1124 planning, you can save this information to disk by means of the
Chris@19 1125 function:
Chris@19 1126 int fftw_export_wisdom_to_filename(const char *filename);
Chris@19 1127 (This function returns non-zero on success.)
Chris@19 1128
Chris@19 1129 The next time you run the program, you can restore the wisdom with
Chris@19 1130 `fftw_import_wisdom_from_filename' (which also returns non-zero on
Chris@19 1131 success), and then recreate the plan using the same flags as before.
Chris@19 1132 int fftw_import_wisdom_from_filename(const char *filename);
Chris@19 1133
Chris@19 1134 Wisdom is automatically used for any size to which it is applicable,
Chris@19 1135 as long as the planner flags are not more "patient" than those with
Chris@19 1136 which the wisdom was created. For example, wisdom created with
Chris@19 1137 `FFTW_MEASURE' can be used if you later plan with `FFTW_ESTIMATE' or
Chris@19 1138 `FFTW_MEASURE', but not with `FFTW_PATIENT'.
Chris@19 1139
Chris@19 1140 The `wisdom' is cumulative, and is stored in a global, private data
Chris@19 1141 structure managed internally by FFTW. The storage space required is
Chris@19 1142 minimal, proportional to the logarithm of the sizes the wisdom was
Chris@19 1143 generated from. If memory usage is a concern, however, the wisdom can
Chris@19 1144 be forgotten and its associated memory freed by calling:
Chris@19 1145 void fftw_forget_wisdom(void);
Chris@19 1146
Chris@19 1147 Wisdom can be exported to a file, a string, or any other medium.
Chris@19 1148 For details, see *note Wisdom::.
Chris@19 1149
Chris@19 1150 
Chris@19 1151 File: fftw3.info, Node: Caveats in Using Wisdom, Prev: Words of Wisdom-Saving Plans, Up: Other Important Topics
Chris@19 1152
Chris@19 1153 3.4 Caveats in Using Wisdom
Chris@19 1154 ===========================
Chris@19 1155
Chris@19 1156 For in much wisdom is much grief, and he that increaseth knowledge
Chris@19 1157 increaseth sorrow. [Ecclesiastes 1:18]
Chris@19 1158
Chris@19 1159 There are pitfalls to using wisdom, in that it can negate FFTW's
Chris@19 1160 ability to adapt to changing hardware and other conditions. For
Chris@19 1161 example, it would be perfectly possible to export wisdom from a program
Chris@19 1162 running on one processor and import it into a program running on
Chris@19 1163 another processor. Doing so, however, would mean that the second
Chris@19 1164 program would use plans optimized for the first processor, instead of
Chris@19 1165 the one it is running on.
Chris@19 1166
Chris@19 1167 It should be safe to reuse wisdom as long as the hardware and program
Chris@19 1168 binaries remain unchanged. (Actually, the optimal plan may change even
Chris@19 1169 between runs of the same binary on identical hardware, due to
Chris@19 1170 differences in the virtual memory environment, etcetera. Users
Chris@19 1171 seriously interested in performance should worry about this problem,
Chris@19 1172 too.) If the same wisdom is used for two different program binaries,
Chris@19 1173 even running on the same machine, the plans may well be sub-optimal
Chris@19 1174 because of differing code alignments. It is therefore wise to
Chris@19 1175 recreate wisdom every time an application is recompiled. The more the
Chris@19 1176 underlying hardware and software change between the creation of
Chris@19 1177 wisdom and its use, the greater the risk of sub-optimal plans.
Chris@19 1178
Chris@19 1179 Nevertheless, if the choice is between using `FFTW_ESTIMATE' or
Chris@19 1180 using possibly-suboptimal wisdom (created on the same machine, but for a
Chris@19 1181 different binary), the wisdom is likely to be better. For this reason,
Chris@19 1182 we provide a function to import wisdom from a standard system-wide
Chris@19 1183 location (`/etc/fftw/wisdom' on Unix):
Chris@19 1184
Chris@19 1185 int fftw_import_system_wisdom(void);
Chris@19 1186
Chris@19 1187 FFTW also provides a standalone program, `fftw-wisdom' (described by
Chris@19 1188 its own `man' page on Unix) with which users can create wisdom, e.g.
Chris@19 1189 for a canonical set of sizes to store in the system wisdom file. *Note
Chris@19 1190 Wisdom Utilities::.
Chris@19 1191
Chris@19 1192 
Chris@19 1193 File: fftw3.info, Node: FFTW Reference, Next: Multi-threaded FFTW, Prev: Other Important Topics, Up: Top
Chris@19 1194
Chris@19 1195 4 FFTW Reference
Chris@19 1196 ****************
Chris@19 1197
Chris@19 1198 This chapter provides a complete reference for all sequential (i.e.,
Chris@19 1199 one-processor) FFTW functions. Parallel transforms are described in
Chris@19 1200 later chapters.
Chris@19 1201
Chris@19 1202 * Menu:
Chris@19 1203
Chris@19 1204 * Data Types and Files::
Chris@19 1205 * Using Plans::
Chris@19 1206 * Basic Interface::
Chris@19 1207 * Advanced Interface::
Chris@19 1208 * Guru Interface::
Chris@19 1209 * New-array Execute Functions::
Chris@19 1210 * Wisdom::
Chris@19 1211 * What FFTW Really Computes::
Chris@19 1212
Chris@19 1213 
Chris@19 1214 File: fftw3.info, Node: Data Types and Files, Next: Using Plans, Prev: FFTW Reference, Up: FFTW Reference
Chris@19 1215
Chris@19 1216 4.1 Data Types and Files
Chris@19 1217 ========================
Chris@19 1218
Chris@19 1219 All programs using FFTW should include its header file:
Chris@19 1220
Chris@19 1221 #include <fftw3.h>
Chris@19 1222
Chris@19 1223 You must also link to the FFTW library. On Unix, this means adding
Chris@19 1224 `-lfftw3 -lm' at the _end_ of the link command.
Chris@19 1225
Chris@19 1226 * Menu:
Chris@19 1227
Chris@19 1228 * Complex numbers::
Chris@19 1229 * Precision::
Chris@19 1230 * Memory Allocation::
Chris@19 1231
Chris@19 1232 
Chris@19 1233 File: fftw3.info, Node: Complex numbers, Next: Precision, Prev: Data Types and Files, Up: Data Types and Files
Chris@19 1234
Chris@19 1235 4.1.1 Complex numbers
Chris@19 1236 ---------------------
Chris@19 1237
Chris@19 1238 The default FFTW interface uses `double' precision for all
Chris@19 1239 floating-point numbers, and defines a `fftw_complex' type to hold
Chris@19 1240 complex numbers as:
Chris@19 1241
Chris@19 1242 typedef double fftw_complex[2];
Chris@19 1243
Chris@19 1244 Here, the `[0]' element holds the real part and the `[1]' element
Chris@19 1245 holds the imaginary part.
Chris@19 1246
Chris@19 1247 Alternatively, if you have a C compiler (such as `gcc') that
Chris@19 1248 supports the C99 revision of the ANSI C standard, you can use C's new
Chris@19 1249 native complex type (which is binary-compatible with the typedef above).
Chris@19 1250 In particular, if you `#include <complex.h>' _before_ `<fftw3.h>', then
Chris@19 1251 `fftw_complex' is defined to be the native complex type and you can
Chris@19 1252 manipulate it with ordinary arithmetic (e.g. `x = y * (3+4*I)', where
Chris@19 1253 `x' and `y' are `fftw_complex' and `I' is the standard symbol for the
Chris@19 1254 imaginary unit).
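The binary compatibility described above can be checked without FFTW at all. The sketch below is plain C99, and the function name `mul_3_4i' is hypothetical. It computes `x = y * (3+4*I)' with the native complex type and then copies the result into a `double[2]', which is the layout of FFTW's default `fftw_complex': `[0]' holds the real part and `[1]' the imaginary part. C99 guarantees that a `double complex' has the same representation as an array of two `double's in that order.

```c
#include <complex.h>
#include <string.h>

/* Computes x = y * (3 + 4i) with C99 complex arithmetic and stores x in
   the double[2] layout used by FFTW's default fftw_complex typedef. */
void mul_3_4i(double y_re, double y_im, double out[2])
{
    double complex y = y_re + y_im * I;
    double complex x = y * (3 + 4 * I);

    /* Same size and layout: element [0] is creal(x), [1] is cimag(x). */
    memcpy(out, &x, 2 * sizeof(double));
}
```

For example, with y = 1+2i the product is (1+2i)(3+4i) = -5+10i, so `out[0]' becomes -5 and `out[1]' becomes 10.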
Chris@19 1255
Chris@19 1256 C++ has its own `complex<T>' template class, defined in the standard
Chris@19 1257 `<complex>' header file. Reportedly, the C++ standards committee has
Chris@19 1258 recently agreed to mandate that the storage format used for this type
Chris@19 1259 be binary-compatible with the C99 type, i.e. an array `T[2]' with
Chris@19 1260 consecutive real `[0]' and imaginary `[1]' parts. (See report
Chris@19 1261 `http://www.open-std.org/jtc1/sc22/WG21/docs/papers/2002/n1388.pdf
Chris@19 1262 WG21/N1388'.) Although not part of the official standard as of this
Chris@19 1263 writing, the proposal stated that: "This solution has been tested with
Chris@19 1264 all current major implementations of the standard library and shown to
Chris@19 1265 be working." To the extent that this is true, if you have a variable
Chris@19 1266 `complex<double> *x', you can pass it directly to FFTW via
Chris@19 1267 `reinterpret_cast<fftw_complex*>(x)'. (This layout requirement was subsequently made part of the C++11 standard.)
Chris@19 1268
Chris@19 1269 
Chris@19 1270 File: fftw3.info, Node: Precision, Next: Memory Allocation, Prev: Complex numbers, Up: Data Types and Files
Chris@19 1271
Chris@19 1272 4.1.2 Precision
Chris@19 1273 ---------------
Chris@19 1274
Chris@19 1275 You can install single and long-double precision versions of FFTW,
Chris@19 1276 which replace `double' with `float' and `long double', respectively
Chris@19 1277 (*note Installation and Customization::). To use these interfaces, you:
Chris@19 1278
Chris@19 1279 * Link to the single/long-double libraries; on Unix, `-lfftw3f' or
Chris@19 1280 `-lfftw3l' instead of (or in addition to) `-lfftw3'. (You can
Chris@19 1281 link to the different-precision libraries simultaneously.)
Chris@19 1282
Chris@19 1283 * Include the _same_ `<fftw3.h>' header file.
Chris@19 1284
Chris@19 1285 * Replace all lowercase instances of `fftw_' with `fftwf_' or
Chris@19 1286 `fftwl_' for single or long-double precision, respectively.
Chris@19 1287 (`fftw_complex' becomes `fftwf_complex', `fftw_execute' becomes
Chris@19 1288 `fftwf_execute', etcetera.)
Chris@19 1289
Chris@19 1290 * Uppercase names, i.e. names beginning with `FFTW_', remain the
Chris@19 1291 same.
Chris@19 1292
Chris@19 1293 * Replace `double' with `float' or `long double' for subroutine
Chris@19 1294 parameters.
Chris@19 1295
Chris@19 1296
Chris@19 1297 Depending upon your compiler and/or hardware, `long double' may not
Chris@19 1298 be any more precise than `double' (or may not be supported at all,
Chris@19 1299 although it is standard in C99).
Chris@19 1300
Chris@19 1301 We also support using the nonstandard `__float128'
Chris@19 1302 quadruple-precision type provided by recent versions of `gcc' on 32-
Chris@19 1303 and 64-bit x86 hardware (*note Installation and Customization::). To
Chris@19 1304 use this type, link with `-lfftw3q -lquadmath -lm' (the `libquadmath'
Chris@19 1305 library provided by `gcc' is needed for quadruple-precision
Chris@19 1306 trigonometric functions) and use `fftwq_' identifiers.
Chris@19 1307
Chris@19 1308 
Chris@19 1309 File: fftw3.info, Node: Memory Allocation, Prev: Precision, Up: Data Types and Files
Chris@19 1310
Chris@19 1311 4.1.3 Memory Allocation
Chris@19 1312 -----------------------
Chris@19 1313
Chris@19 1314 void *fftw_malloc(size_t n);
Chris@19 1315 void fftw_free(void *p);
Chris@19 1316
Chris@19 1317 These are functions that behave identically to `malloc' and `free',
Chris@19 1318 except that they guarantee that the returned pointer obeys any special
Chris@19 1319 alignment restrictions imposed by any algorithm in FFTW (e.g. for SIMD
Chris@19 1320 acceleration). *Note SIMD alignment and fftw_malloc::.
Chris@19 1321
Chris@19 1322 Data allocated by `fftw_malloc' _must_ be deallocated by `fftw_free'
Chris@19 1323 and not by the ordinary `free'.
Chris@19 1324
Chris@19 1325 These routines simply call through to your operating system's
Chris@19 1326 `malloc' or, if necessary, its aligned equivalent (e.g. `memalign'), so
Chris@19 1327 you normally need not worry about any significant time or space
Chris@19 1328 overhead. You are _not required_ to use them to allocate your data,
Chris@19 1329 but we strongly recommend it.
Chris@19 1330
Chris@19 1331 Note: in C++, just as with ordinary `malloc', you must typecast the
Chris@19 1332 output of `fftw_malloc' to whatever pointer type you are allocating.
Chris@19 1333
Chris@19 1334 We also provide the following two convenience functions to allocate
Chris@19 1335 real and complex arrays with `n' elements, which are equivalent to
Chris@19 1336 `(double *) fftw_malloc(sizeof(double) * n)' and `(fftw_complex *)
Chris@19 1337 fftw_malloc(sizeof(fftw_complex) * n)', respectively:
Chris@19 1338
Chris@19 1339 double *fftw_alloc_real(size_t n);
Chris@19 1340 fftw_complex *fftw_alloc_complex(size_t n);
Chris@19 1341
Chris@19 1342 The equivalent functions in other precisions allocate arrays of `n'
Chris@19 1343 elements in that precision; for example, `fftwf_alloc_real(n)' is equivalent
Chris@19 1344 to `(float *) fftwf_malloc(sizeof(float) * n)'.
Chris@19 1345
Chris@19 1346 
Chris@19 1347 File: fftw3.info, Node: Using Plans, Next: Basic Interface, Prev: Data Types and Files, Up: FFTW Reference
Chris@19 1348
Chris@19 1349 4.2 Using Plans
Chris@19 1350 ===============
Chris@19 1351
Chris@19 1352 Plans for all transform types in FFTW are stored as type `fftw_plan'
Chris@19 1353 (an opaque pointer type), and are created by one of the various
Chris@19 1354 planning routines described in the following sections. An `fftw_plan'
Chris@19 1355 contains all information necessary to compute the transform, including
Chris@19 1356 the pointers to the input and output arrays.
Chris@19 1357
Chris@19 1358 void fftw_execute(const fftw_plan plan);
Chris@19 1359
Chris@19 1360 This executes the `plan', to compute the corresponding transform on
Chris@19 1361 the arrays for which it was planned (which must still exist). The plan
Chris@19 1362 is not modified, and `fftw_execute' can be called as many times as
Chris@19 1363 desired.
Chris@19 1364
Chris@19 1365 To apply a given plan to a different array, you can use the
Chris@19 1366 new-array execute interface. *Note New-array Execute Functions::.
Chris@19 1367
Chris@19 1368 `fftw_execute' (and equivalents) is the only function in FFTW
Chris@19 1369 guaranteed to be thread-safe; see *note Thread safety::.
Chris@19 1370
Chris@19 1371 This function:
Chris@19 1372 void fftw_destroy_plan(fftw_plan plan);
Chris@19 1373 deallocates the `plan' and all its associated data.
Chris@19 1374
Chris@19 1375 FFTW's planner saves some other persistent data, such as the
Chris@19 1376 accumulated wisdom and a list of algorithms available in the current
Chris@19 1377 configuration. If you want to deallocate all of that and reset FFTW to
Chris@19 1378 the pristine state it was in when you started your program, you can
Chris@19 1379 call:
Chris@19 1380
Chris@19 1381 void fftw_cleanup(void);
Chris@19 1382
Chris@19 1383 After calling `fftw_cleanup', all existing plans become undefined,
Chris@19 1384 and you should not attempt to execute or destroy them. You can
Chris@19 1385 however create, execute, and destroy new plans, in which case FFTW starts
Chris@19 1386 accumulating wisdom information again.
Chris@19 1387
Chris@19 1388 `fftw_cleanup' does not deallocate your plans, however. To prevent
Chris@19 1389 memory leaks, you must still call `fftw_destroy_plan' before executing
Chris@19 1390 `fftw_cleanup'.
Chris@19 1391
Chris@19 1392 Occasionally, it may be useful to know FFTW's internal "cost" metric
Chris@19 1393 that it uses to compare plans to one another; this cost is proportional
Chris@19 1394 to the execution time of the plan, in undocumented units, if the plan
Chris@19 1395 was created with the `FFTW_MEASURE' or other timing-based options, or
Chris@19 1396 alternatively is a heuristic cost function for `FFTW_ESTIMATE' plans.
Chris@19 1397 (The cost values of measured and estimated plans are not comparable,
Chris@19 1398 being in different units. Also, costs from different FFTW versions or
Chris@19 1399 the same version compiled differently may not be in the same units.
Chris@19 1400 Plans created from wisdom have a cost of 0 since no timing measurement
Chris@19 1401 is performed for them. Finally, certain problems for which only one
Chris@19 1402 top-level algorithm was possible may have required no measurements of
Chris@19 1403 the cost of the whole plan, in which case `fftw_cost' will also return
Chris@19 1404 0.) The cost metric for a given plan is returned by:
Chris@19 1405
Chris@19 1406 double fftw_cost(const fftw_plan plan);
Chris@19 1407
Chris@19 1408 The following two routines are provided purely for academic purposes
Chris@19 1409 (that is, for entertainment).
Chris@19 1410
Chris@19 1411 void fftw_flops(const fftw_plan plan,
Chris@19 1412 double *add, double *mul, double *fma);
Chris@19 1413
Chris@19 1414 Given a `plan', set `add', `mul', and `fma' to an exact count of the
Chris@19 1415 number of floating-point additions, multiplications, and fused
Chris@19 1416 multiply-add operations involved in the plan's execution. The total
Chris@19 1417 number of floating-point operations (flops) is `add + mul + 2*fma', or
Chris@19 1418 `add + mul + fma' if the hardware supports fused multiply-add
Chris@19 1419 instructions (although the number of FMA operations is only approximate
Chris@19 1420 because of compiler voodoo). (The number of operations should be an
Chris@19 1421 integer, but we use `double' to avoid overflowing `int' for large
Chris@19 1422 transforms; the arguments are of type `double' even for single and
Chris@19 1423 long-double precision versions of FFTW.)
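The flop-count rule above fits in a tiny helper. This is a sketch: `total_flops' and its `hw_fma' switch are hypothetical, and in a real program the three counts would come from a call such as `fftw_flops(plan, &n_add, &n_mul, &n_fma)'. The counts are kept as `double', for the overflow reason given above.

```c
/* Hypothetical helper: total floating-point operations for a plan, given
   the add/mul/fma counts reported by fftw_flops(). Each fused multiply-add
   counts as one operation if the hardware has FMA instructions
   (hw_fma != 0), and as two (a multiply plus an add) otherwise. */
double total_flops(double n_add, double n_mul, double n_fma, int hw_fma)
{
    return n_add + n_mul + (hw_fma ? n_fma : 2.0 * n_fma);
}
```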
Chris@19 1424
Chris@19 1425 void fftw_fprint_plan(const fftw_plan plan, FILE *output_file);
Chris@19 1426 void fftw_print_plan(const fftw_plan plan);
Chris@19 1427 char *fftw_sprint_plan(const fftw_plan plan);
Chris@19 1428
Chris@19 1429 This outputs a "nerd-readable" representation of the `plan' to the
Chris@19 1430 given file, to `stdout', or to a newly allocated NUL-terminated string
Chris@19 1431 (which the caller is responsible for deallocating with `free'),
Chris@19 1432 respectively.
Chris@19 1433
Chris@19 1434 
Chris@19 1435 File: fftw3.info, Node: Basic Interface, Next: Advanced Interface, Prev: Using Plans, Up: FFTW Reference
Chris@19 1436
Chris@19 1437 4.3 Basic Interface
Chris@19 1438 ===================
Chris@19 1439
Chris@19 1440 Recall that the FFTW API is divided into three parts(1): the "basic
Chris@19 1441 interface" computes a single transform of contiguous data, the "advanced
Chris@19 1442 interface" computes transforms of multiple or strided arrays, and the
Chris@19 1443 "guru interface" supports the most general data layouts,
Chris@19 1444 multiplicities, and strides. This section describes the basic
Chris@19 1445 interface, which we expect to satisfy the needs of most users.
Chris@19 1446
Chris@19 1447 * Menu:
Chris@19 1448
Chris@19 1449 * Complex DFTs::
Chris@19 1450 * Planner Flags::
Chris@19 1451 * Real-data DFTs::
Chris@19 1452 * Real-data DFT Array Format::
Chris@19 1453 * Real-to-Real Transforms::
Chris@19 1454 * Real-to-Real Transform Kinds::
Chris@19 1455
Chris@19 1456 ---------- Footnotes ----------
Chris@19 1457
Chris@19 1458 (1) Gallia est omnis divisa in partes tres (Julius Caesar).
Chris@19 1459
Chris@19 1460 
Chris@19 1461 File: fftw3.info, Node: Complex DFTs, Next: Planner Flags, Prev: Basic Interface, Up: Basic Interface
Chris@19 1462
Chris@19 1463 4.3.1 Complex DFTs
Chris@19 1464 ------------------
Chris@19 1465
Chris@19 1466 fftw_plan fftw_plan_dft_1d(int n0,
Chris@19 1467 fftw_complex *in, fftw_complex *out,
Chris@19 1468 int sign, unsigned flags);
Chris@19 1469 fftw_plan fftw_plan_dft_2d(int n0, int n1,
Chris@19 1470 fftw_complex *in, fftw_complex *out,
Chris@19 1471 int sign, unsigned flags);
Chris@19 1472 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
Chris@19 1473 fftw_complex *in, fftw_complex *out,
Chris@19 1474 int sign, unsigned flags);
Chris@19 1475 fftw_plan fftw_plan_dft(int rank, const int *n,
Chris@19 1476 fftw_complex *in, fftw_complex *out,
Chris@19 1477 int sign, unsigned flags);
Chris@19 1478
Chris@19 1479 Plan a complex input/output discrete Fourier transform (DFT) in zero
Chris@19 1480 or more dimensions, returning an `fftw_plan' (*note Using Plans::).
Chris@19 1481
Chris@19 1482 Once you have created a plan for a certain transform type and
Chris@19 1483 parameters, then creating another plan of the same type and parameters,
Chris@19 1484 but for different arrays, is fast and shares constant data with the
Chris@19 1485 first plan (if it still exists).
Chris@19 1486
Chris@19 1487 The planner returns `NULL' if the plan cannot be created. In the
Chris@19 1488 standard FFTW distribution, the basic interface is guaranteed to return
Chris@19 1489 a non-`NULL' plan. A plan may be `NULL', however, if you are using a
Chris@19 1490 customized FFTW configuration supporting a restricted set of transforms.
Chris@19 1491
Chris@19 1492 Arguments
Chris@19 1493 .........
Chris@19 1494
Chris@19 1495 * `rank' is the rank of the transform (it should be the size of the
Chris@19 1496 array `*n'), and can be any non-negative integer. (*Note Complex
Chris@19 1497 Multi-Dimensional DFTs::, for the definition of "rank".) The
Chris@19 1498 `_1d', `_2d', and `_3d' planners correspond to a `rank' of `1',
Chris@19 1499 `2', and `3', respectively. The rank may be zero, which is
Chris@19 1500 equivalent to a rank-1 transform of size 1, i.e. a copy of one
Chris@19 1501 number from input to output.
Chris@19 1502
Chris@19 1503 * `n0', `n1', `n2', or `n[0..rank-1]' (as appropriate for each
Chris@19 1504 routine) specify the size of the transform dimensions. They can
Chris@19 1505 be any positive integer.
Chris@19 1506
Chris@19 1507 - Multi-dimensional arrays are stored in row-major order with
Chris@19 1508 dimensions: `n0' x `n1'; or `n0' x `n1' x `n2'; or `n[0]' x
Chris@19 1509 `n[1]' x ... x `n[rank-1]'. *Note Multi-dimensional Array
Chris@19 1510 Format::.
Chris@19 1511
Chris@19 1512 - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
Chris@19 1513 11^e 13^f, where e+f is either 0 or 1, and the other exponents
Chris@19 1514 are arbitrary. Other sizes are computed by means of a slow,
Chris@19 1515 general-purpose algorithm (which nevertheless retains O(n log
Chris@19 1516 n) performance even for prime sizes). It is possible to
Chris@19 1517 customize FFTW for different array sizes; see *note
Chris@19 1518 Installation and Customization::. Transforms whose sizes are
Chris@19 1519 powers of 2 are especially fast.
Chris@19 1520
Chris@19 1521 * `in' and `out' point to the input and output arrays of the
Chris@19 1522 transform, which may be the same (yielding an in-place transform). These
Chris@19 1523 arrays are overwritten during planning, unless `FFTW_ESTIMATE' is
Chris@19 1524 used in the flags. (The arrays need not be initialized, but they
Chris@19 1525 must be allocated.)
Chris@19 1526
Chris@19 1527 If `in == out', the transform is "in-place" and the input array is
Chris@19 1528 overwritten. If `in != out', the two arrays must not overlap (but
Chris@19 1529 FFTW does not check for this condition).
Chris@19 1530
Chris@19 1531 * `sign' is the sign of the exponent in the formula that defines the
Chris@19 1532 Fourier transform. It can be -1 (= `FFTW_FORWARD') or +1 (=
Chris@19 1533 `FFTW_BACKWARD').
Chris@19 1534
Chris@19 1535 * `flags' is a bitwise OR (`|') of zero or more planner flags, as
Chris@19 1536 defined in *note Planner Flags::.
Chris@19 1537
Chris@19 1538
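The size rule stated in the argument list above can be turned into a quick check. The following sketch uses a hypothetical helper, `is_fast_fftw_size', based only on the rule as stated here. It strips the factors 2, 3, 5 and 7 completely, allows at most one factor of 11 or 13 in total, and reports whether anything else remains. Sizes for which it returns 0 still work, and still run in O(n log n) time; they are simply handled by the slower general-purpose algorithm.

```c
/* Hypothetical helper: returns 1 if n > 0 has the form
   2^a 3^b 5^c 7^d 11^e 13^f with e + f equal to 0 or 1, else 0. */
int is_fast_fftw_size(int n)
{
    static const int small_primes[] = {2, 3, 5, 7};

    if (n <= 0)
        return 0;
    for (int i = 0; i < 4; ++i)
        while (n % small_primes[i] == 0)
            n /= small_primes[i];   /* any power of 2, 3, 5, 7 is fine */
    if (n % 11 == 0)
        n /= 11;                    /* at most one factor of 11 ... */
    else if (n % 13 == 0)
        n /= 13;                    /* ... or of 13, but not both */
    return n == 1;
}
```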
Chris@19 1539 FFTW computes an unnormalized transform: computing a forward
Chris@19 1540 followed by a backward transform (or vice versa) will result in the
Chris@19 1541 original data multiplied by the size of the transform (the product of
Chris@19 1542 the dimensions). For more information, see *note What FFTW Really
Chris@19 1543 Computes::.
Chris@19 1544
Chris@19 1545 
Chris@19 1546 File: fftw3.info, Node: Planner Flags, Next: Real-data DFTs, Prev: Complex DFTs, Up: Basic Interface
Chris@19 1547
Chris@19 1548 4.3.2 Planner Flags
Chris@19 1549 -------------------
Chris@19 1550
Chris@19 1551 All of the planner routines in FFTW accept an integer `flags' argument,
Chris@19 1552 which is a bitwise OR (`|') of zero or more of the flag constants
Chris@19 1553 defined below. These flags control the rigor (and time) of the
Chris@19 1554 planning process, and can also impose (or lift) restrictions on the
Chris@19 1555 type of transform algorithm that is employed.
Chris@19 1556
Chris@19 1557 _Important:_ the planner overwrites the input array during planning
Chris@19 1558 unless a saved plan (*note Wisdom::) is available for that problem, so
Chris@19 1559 you should initialize your input data after creating the plan. The
Chris@19 1560 only exceptions to this are the `FFTW_ESTIMATE' and `FFTW_WISDOM_ONLY'
Chris@19 1561 flags, as mentioned below.
Chris@19 1562
Chris@19 1563 In all cases, if wisdom is available for the given problem that
Chris@19 1564 was created with equal-or-greater planning rigor, then the more
Chris@19 1565 rigorous wisdom is used. For example, in `FFTW_ESTIMATE' mode any
Chris@19 1566 available wisdom is used, whereas in `FFTW_PATIENT' mode only wisdom
Chris@19 1567 created in patient or exhaustive mode can be used. *Note Words of
Chris@19 1568 Wisdom-Saving Plans::.
Chris@19 1569
Chris@19 1570 Planning-rigor flags
Chris@19 1571 ....................
Chris@19 1572
Chris@19 1573 * `FFTW_ESTIMATE' specifies that, instead of actual measurements of
Chris@19 1574 different algorithms, a simple heuristic is used to pick a
Chris@19 1575 (probably sub-optimal) plan quickly. With this flag, the
Chris@19 1576 input/output arrays are not overwritten during planning.
Chris@19 1577
Chris@19 1578 * `FFTW_MEASURE' tells FFTW to find an optimized plan by actually
Chris@19 1579 _computing_ several FFTs and measuring their execution time.
Chris@19 1580 Depending on your machine, this can take some time (often a few
Chris@19 1581 seconds). `FFTW_MEASURE' is the default planning option.
Chris@19 1582
Chris@19 1583 * `FFTW_PATIENT' is like `FFTW_MEASURE', but considers a wider range
Chris@19 1584 of algorithms and often produces a "more optimal" plan, especially
Chris@19 1585 for large transforms, but at the expense of several times longer
Chris@19 1586 planning time (again, especially for large transforms).
Chris@19 1587
Chris@19 1588 * `FFTW_EXHAUSTIVE' is like `FFTW_PATIENT', but considers an even
Chris@19 1589 wider range of algorithms, including many that we think are
Chris@19 1590 unlikely to be fast, to produce the most optimal plan but with a
Chris@19 1591 substantially increased planning time.
Chris@19 1592
Chris@19 1593 * `FFTW_WISDOM_ONLY' is a special planning mode in which the plan is
Chris@19 1594 only created if wisdom is available for the given problem, and
Chris@19 1595 otherwise a `NULL' plan is returned. This can be combined with
Chris@19 1596 other flags, e.g. `FFTW_WISDOM_ONLY | FFTW_PATIENT' creates a plan
Chris@19 1597 only if wisdom is available that was created in `FFTW_PATIENT' or
Chris@19 1598 `FFTW_EXHAUSTIVE' mode. The `FFTW_WISDOM_ONLY' flag is intended
Chris@19 1599 for users who need to detect whether wisdom is available; for
Chris@19 1600 example, if wisdom is not available one may wish to allocate new
Chris@19 1601 arrays for planning so that user data is not overwritten.
Chris@19 1602
Chris@19 1603
Chris@19 1604 Algorithm-restriction flags
Chris@19 1605 ...........................
Chris@19 1606
Chris@19 1607 * `FFTW_DESTROY_INPUT' specifies that an out-of-place transform is
Chris@19 1608 allowed to _overwrite its input_ array with arbitrary data; this
Chris@19 1609 can sometimes allow more efficient algorithms to be employed.
Chris@19 1610
Chris@19 1611 * `FFTW_PRESERVE_INPUT' specifies that an out-of-place transform must
Chris@19 1612 _not change its input_ array. This is ordinarily the _default_,
Chris@19 1613 except for c2r and hc2r (i.e. complex-to-real) transforms for
Chris@19 1614 which `FFTW_DESTROY_INPUT' is the default. In the latter cases,
Chris@19 1615 passing `FFTW_PRESERVE_INPUT' will attempt to use algorithms that
Chris@19 1616 do not destroy the input, at the expense of worse performance; for
Chris@19 1617 multi-dimensional c2r transforms, however, no input-preserving
Chris@19 1618 algorithms are implemented and the planner will return `NULL' if
Chris@19 1619 one is requested.
Chris@19 1620
Chris@19 1621 * `FFTW_UNALIGNED' specifies that the algorithm may not impose any
Chris@19 1622 unusual alignment requirements on the input/output arrays (i.e. no
Chris@19 1623 SIMD may be used). This flag is normally _not necessary_, since
Chris@19 1624 the planner automatically detects misaligned arrays. The only use
Chris@19 1625 for this flag is if you want to use the new-array execute
Chris@19 1626 interface to execute a given plan on a different array that may
Chris@19 1627 not be aligned like the original. (Using `fftw_malloc' makes this
Chris@19 1628 flag unnecessary even then. You can also use `fftw_alignment_of'
Chris@19 1629 to detect whether two arrays are equivalently aligned.)
Chris@19 1630
Chris@19 1631
Chris@19 1632 Limiting planning time
Chris@19 1633 ......................
Chris@19 1634
Chris@19 1635 extern void fftw_set_timelimit(double seconds);
Chris@19 1636
Chris@19 1637 This function instructs FFTW to spend at most `seconds' seconds
Chris@19 1638 (approximately) in the planner. If `seconds == FFTW_NO_TIMELIMIT' (the
Chris@19 1639 default value, which is negative), then planning time is unbounded.
Chris@19 1640 Otherwise, FFTW plans with a progressively wider range of algorithms
Chris@19 1641 until the given time limit is reached or the given range of
Chris@19 1642 algorithms is explored, returning the best available plan.
Chris@19 1643
Chris@19 1644 For example, specifying `FFTW_PATIENT' first plans in
Chris@19 1645 `FFTW_ESTIMATE' mode, then in `FFTW_MEASURE' mode, then finally (time
Chris@19 1646 permitting) in `FFTW_PATIENT'. If `FFTW_EXHAUSTIVE' is specified
Chris@19 1647 instead, the planner will further progress to `FFTW_EXHAUSTIVE' mode.
Chris@19 1648
Chris@19 1649 Note that the `seconds' argument specifies only a rough limit; in
Chris@19 1650 practice, the planner may use somewhat more time if the time limit is
Chris@19 1651 reached when the planner is in the middle of an operation that cannot
Chris@19 1652 be interrupted. At the very least, the planner will complete planning
Chris@19 1653 in `FFTW_ESTIMATE' mode (which is thus equivalent to a time limit of 0).
Chris@19 1654
Chris@19 1655 
Chris@19 1656 File: fftw3.info, Node: Real-data DFTs, Next: Real-data DFT Array Format, Prev: Planner Flags, Up: Basic Interface
Chris@19 1657
Chris@19 1658 4.3.3 Real-data DFTs
Chris@19 1659 --------------------
Chris@19 1660
Chris@19 1661 fftw_plan fftw_plan_dft_r2c_1d(int n0,
Chris@19 1662 double *in, fftw_complex *out,
Chris@19 1663 unsigned flags);
Chris@19 1664 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
Chris@19 1665 double *in, fftw_complex *out,
Chris@19 1666 unsigned flags);
Chris@19 1667 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
Chris@19 1668 double *in, fftw_complex *out,
Chris@19 1669 unsigned flags);
Chris@19 1670 fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
Chris@19 1671 double *in, fftw_complex *out,
Chris@19 1672 unsigned flags);
Chris@19 1673
Chris@19 1674 Plan a real-input/complex-output discrete Fourier transform (DFT) in
Chris@19 1675 zero or more dimensions, returning an `fftw_plan' (*note Using Plans::).
Chris@19 1676
Chris@19 1677 Once you have created a plan for a certain transform type and
Chris@19 1678 parameters, then creating another plan of the same type and parameters,
Chris@19 1679 but for different arrays, is fast and shares constant data with the
Chris@19 1680 first plan (if it still exists).
Chris@19 1681
Chris@19 1682 The planner returns `NULL' if the plan cannot be created. A
Chris@19 1683 non-`NULL' plan is always returned by the basic interface unless you
Chris@19 1684 are using a customized FFTW configuration supporting a restricted set
Chris@19 1685 of transforms, or if you use the `FFTW_PRESERVE_INPUT' flag with a
Chris@19 1686 multi-dimensional out-of-place c2r transform (see below).
Chris@19 1687
Chris@19 1688 Arguments
Chris@19 1689 .........
Chris@19 1690
Chris@19 1691 * `rank' is the rank of the transform (it should be the size of the
Chris@19 1692 array `*n'), and can be any non-negative integer. (*Note Complex
Chris@19 1693 Multi-Dimensional DFTs::, for the definition of "rank".) The
Chris@19 1694 `_1d', `_2d', and `_3d' planners correspond to a `rank' of `1',
Chris@19 1695 `2', and `3', respectively. The rank may be zero, which is
Chris@19 1696 equivalent to a rank-1 transform of size 1, i.e. a copy of one
Chris@19 1697 real number (with zero imaginary part) from input to output.
Chris@19 1698
Chris@19 1699 * `n0', `n1', `n2', or `n[0..rank-1]' (as appropriate for each
Chris@19 1700 routine) specify the size of the transform dimensions. They can
Chris@19 1701 be any positive integer. This is different in general from the
Chris@19 1702 _physical_ array dimensions, which are described in *note
Chris@19 1703 Real-data DFT Array Format::.
Chris@19 1704
Chris@19 1705 - FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d
Chris@19 1706 11^e 13^f, where e+f is either 0 or 1, and the other exponents
Chris@19 1707 are arbitrary. Other sizes are computed by means of a slow,
Chris@19 1708 general-purpose algorithm (which nevertheless retains O(n log
Chris@19 1709 n) performance even for prime sizes). (It is possible to
Chris@19 1710 customize FFTW for different array sizes; see *note
Chris@19 1711 Installation and Customization::.) Transforms whose sizes
Chris@19 1712 are powers of 2 are especially fast, and it is generally
Chris@19 1713 beneficial for the _last_ dimension of an r2c/c2r transform
Chris@19 1714 to be _even_.
Chris@19 1715
Chris@19 1716 * `in' and `out' point to the input and output arrays of the
Chris@19 1717 transform, which may be the same (yielding an in-place transform). These
Chris@19 1718 arrays are overwritten during planning, unless `FFTW_ESTIMATE' is
Chris@19 1719 used in the flags. (The arrays need not be initialized, but they
Chris@19 1720 must be allocated.) For an in-place transform, it is important to
Chris@19 1721 remember that the real array will require padding, described in
Chris@19 1722 *note Real-data DFT Array Format::.
Chris@19 1723
Chris@19 1724 * `flags' is a bitwise OR (`|') of zero or more planner flags, as
Chris@19 1725 defined in *note Planner Flags::.
Chris@19 1726
Chris@19 1727
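The preferred-size rule above is easy to check mechanically. The following is a minimal sketch. `is_fftw_friendly' is a hypothetical helper, not an FFTW function. It strips factors of 2, 3, 5 and 7, then allows at most one factor of 11 or 13, mirroring the default configuration described above.

```c
#include <assert.h>

/* Returns 1 if n > 0 has the form 2^a 3^b 5^c 7^d 11^e 13^f with
   e + f <= 1, i.e. a size FFTW handles best by default. */
static int is_fftw_friendly(int n)
{
    static const int small[] = {2, 3, 5, 7};
    int k;

    assert(n > 0);
    for (k = 0; k < 4; ++k)
        while (n % small[k] == 0)
            n /= small[k];
    /* At most one factor of 11 or 13 may remain. */
    if (n % 11 == 0)
        n /= 11;
    else if (n % 13 == 0)
        n /= 13;
    return n == 1;
}
```

For example, 1024 and 2730 (= 2*3*5*7*13) qualify, while 121 (= 11*11) and 17 do not.
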
Chris@19 1728 The inverse transforms, taking complex input (storing the
Chris@19 1729 non-redundant half of a logically Hermitian array) to real output, are
Chris@19 1730 given by:
Chris@19 1731
Chris@19 1732 fftw_plan fftw_plan_dft_c2r_1d(int n0,
Chris@19 1733 fftw_complex *in, double *out,
Chris@19 1734 unsigned flags);
Chris@19 1735 fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1,
Chris@19 1736 fftw_complex *in, double *out,
Chris@19 1737 unsigned flags);
Chris@19 1738 fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2,
Chris@19 1739 fftw_complex *in, double *out,
Chris@19 1740 unsigned flags);
Chris@19 1741 fftw_plan fftw_plan_dft_c2r(int rank, const int *n,
Chris@19 1742 fftw_complex *in, double *out,
Chris@19 1743 unsigned flags);
Chris@19 1744
Chris@19 1745 The arguments are the same as for the r2c transforms, except that the
Chris@19 1746 input and output data formats are reversed.
Chris@19 1747
Chris@19 1748 FFTW computes an unnormalized transform: computing an r2c followed
Chris@19 1749 by a c2r transform (or vice versa) will result in the original data
Chris@19 1750 multiplied by the size of the transform (the product of the logical
Chris@19 1751 dimensions). An r2c transform produces the same output as a
Chris@19 1752 `FFTW_FORWARD' complex DFT of the same input, and a c2r transform is
Chris@19 1753 correspondingly equivalent to `FFTW_BACKWARD'. For more information,
Chris@19 1754 see *note What FFTW Really Computes::.
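
The following is a minimal sketch of undoing that scaling after an r2c/c2r round trip. `normalize_roundtrip' is a hypothetical helper. `count' is assumed to be the number of `double' values in the c2r output. For an in-place array this may include padding, which is harmless to scale.

```c
#include <assert.h>
#include <stddef.h>

/* Divide data[0..count-1] by N = n[0] * ... * n[rank-1], the product
   of the logical dimensions, undoing FFTW's unnormalized round trip. */
static void normalize_roundtrip(double *data, size_t count,
                                int rank, const int *n)
{
    double N = 1.0;
    size_t i;
    int d;

    for (d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        N *= n[d];
    }
    for (i = 0; i < count; ++i)
        data[i] /= N;
}
```
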
Chris@19 1755
Chris@19 1756 
Chris@19 1757 File: fftw3.info, Node: Real-data DFT Array Format, Next: Real-to-Real Transforms, Prev: Real-data DFTs, Up: Basic Interface
Chris@19 1758
Chris@19 1759 4.3.4 Real-data DFT Array Format
Chris@19 1760 --------------------------------
Chris@19 1761
Chris@19 1762 The output of a DFT of real data (r2c) contains symmetries that, in
Chris@19 1763 principle, make half of the outputs redundant (*note What FFTW Really
Chris@19 1764 Computes::). (Similarly for the input of an inverse c2r transform.) In
Chris@19 1765 practice, it is not possible to entirely realize these savings in an
Chris@19 1766 efficient and understandable format that generalizes to
Chris@19 1767 multi-dimensional transforms. Instead, the output of the r2c
Chris@19 1768 transforms is _slightly_ over half of the output of the corresponding
Chris@19 1769 complex transform. We do not "pack" the data in any way, but store it
Chris@19 1770 as an ordinary array of `fftw_complex' values. In fact, this data is
Chris@19 1771 simply a subsection of what would be the array in the corresponding
Chris@19 1772 complex transform.
Chris@19 1773
Chris@19 1774 Specifically, for a real transform of d (= `rank') dimensions n[0] x
Chris@19 1775 n[1] x n[2] x ... x n[d-1], the complex data is an n[0] x n[1] x n[2]
Chris@19 1776 x ... x (n[d-1]/2 + 1) array of `fftw_complex' values in row-major
Chris@19 1777 order (with the division rounded down). That is, we only store the
Chris@19 1778 _lower_ half (non-negative frequencies), plus one element, of the last
Chris@19 1779 dimension of the data from the ordinary complex transform. (We could
Chris@19 1780 have instead taken half of any other dimension, but implementation
Chris@19 1781 turns out to be simpler if the last, contiguous, dimension is used.)
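
The output dimensions described above can be computed as follows. This is a sketch with hypothetical helper names. `r2c_2d_out_index' shows where element (i, j) of a two-dimensional r2c output lands in the row-major `fftw_complex' array.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fftw_complex values produced by an r2c transform of
   logical size n[0] x ... x n[rank-1]: the last dimension is stored
   as n[rank-1]/2 + 1 (rounded down), the others in full. */
static size_t r2c_complex_count(int rank, const int *n)
{
    size_t total = 1;
    int d;

    for (d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        total *= (size_t)(d == rank - 1 ? n[d] / 2 + 1 : n[d]);
    }
    return total;
}

/* Row-major index of output element (i, j) of a 2-D r2c transform
   whose last logical dimension is n1, for 0 <= j <= n1/2. */
static size_t r2c_2d_out_index(int n1, int i, int j)
{
    assert(j >= 0 && j <= n1 / 2);
    return (size_t)i * (size_t)(n1 / 2 + 1) + (size_t)j;
}
```
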
Chris@19 1782
Chris@19 1783 For an out-of-place transform, the real data is simply an array with
Chris@19 1784 physical dimensions n[0] x n[1] x n[2] x ... x n[d-1] in row-major
Chris@19 1785 order.
Chris@19 1786
Chris@19 1787 For an in-place transform, some complications arise since the
Chris@19 1788 complex data is slightly larger than the real data. In this case, the
Chris@19 1789 final dimension of the real data must be _padded_ with extra values to
Chris@19 1790 accommodate the size of the complex data--two extra if the last
Chris@19 1791 dimension is even and one if it is odd. That is, the last dimension of
Chris@19 1792 the real data must physically contain 2 * (n[d-1]/2+1) `double' values
Chris@19 1793 (exactly enough to hold the complex data). This physical array size
Chris@19 1794 does not, however, change the _logical_ array size--only n[d-1] values
Chris@19 1795 are actually stored in the last dimension, and n[d-1] is the last
Chris@19 1796 dimension passed to the planner.
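
For the two-dimensional case, the padded layout can be sketched as below. The helpers are hypothetical, not part of FFTW. With FFTW itself, one would typically allocate `inplace_real_alloc(n0, n1)' doubles (for example with `fftw_malloc'). The same pointer, cast to `fftw_complex *', is then passed as both the input and the output of `fftw_plan_dft_r2c_2d'.

```c
#include <assert.h>
#include <stddef.h>

/* Total doubles needed for an in-place r2c/c2r transform of logical
   size n0 x n1: the last dimension is padded to 2*(n1/2 + 1). */
static size_t inplace_real_alloc(int n0, int n1)
{
    assert(n0 > 0 && n1 > 0);
    return (size_t)n0 * 2u * (size_t)(n1 / 2 + 1);
}

/* Index of logical real element (i, j), 0 <= j < n1, inside the
   padded array; slots j = n1 .. 2*(n1/2+1)-1 of each row are padding. */
static size_t inplace_real_index(int n1, int i, int j)
{
    assert(j >= 0 && j < n1);
    return (size_t)i * 2u * (size_t)(n1 / 2 + 1) + (size_t)j;
}
```

An even last dimension of 4 is padded to 6 (two extra values), while an odd one of 5 is also padded to 6 (one extra value).
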
Chris@19 1797
Chris@19 1798 
Chris@19 1799 File: fftw3.info, Node: Real-to-Real Transforms, Next: Real-to-Real Transform Kinds, Prev: Real-data DFT Array Format, Up: Basic Interface
Chris@19 1800
Chris@19 1801 4.3.5 Real-to-Real Transforms
Chris@19 1802 -----------------------------
Chris@19 1803
Chris@19 1804 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
Chris@19 1805 fftw_r2r_kind kind, unsigned flags);
Chris@19 1806 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
Chris@19 1807 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
Chris@19 1808 unsigned flags);
Chris@19 1809 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
Chris@19 1810 double *in, double *out,
Chris@19 1811 fftw_r2r_kind kind0,
Chris@19 1812 fftw_r2r_kind kind1,
Chris@19 1813 fftw_r2r_kind kind2,
Chris@19 1814 unsigned flags);
Chris@19 1815 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
Chris@19 1816 const fftw_r2r_kind *kind, unsigned flags);
Chris@19 1817
Chris@19 1818 Plan a real input/output (r2r) transform of various kinds in zero or
Chris@19 1819 more dimensions, returning an `fftw_plan' (*note Using Plans::).
Chris@19 1820
Chris@19 1821 Once you have created a plan for a certain transform type and
Chris@19 1822 parameters, then creating another plan of the same type and parameters,
Chris@19 1823 but for different arrays, is fast and shares constant data with the
Chris@19 1824 first plan (if it still exists).
Chris@19 1825
Chris@19 1826 The planner returns `NULL' if the plan cannot be created. A
Chris@19 1827 non-`NULL' plan is always returned by the basic interface unless you
Chris@19 1828 are using a customized FFTW configuration supporting a restricted set
Chris@19 1829 of transforms, or for size-1 `FFTW_REDFT00' kinds (which are not
Chris@19 1830 defined).
Chris@19 1831
Chris@19 1832 Arguments
Chris@19 1833 .........
Chris@19 1834
Chris@19 1835 * `rank' is the dimensionality of the transform (it should be the
Chris@19 1836 size of the arrays `*n' and `*kind'), and can be any non-negative
Chris@19 1837 integer. The `_1d', `_2d', and `_3d' planners correspond to a
Chris@19 1838 `rank' of `1', `2', and `3', respectively. A `rank' of zero is
Chris@19 1839 equivalent to a copy of one number from input to output.
Chris@19 1840
Chris@19 1841 * `n', or `n0'/`n1'/`n2', or `n[rank]', respectively, give the
Chris@19 1842 (physical) size of the transform dimensions. They can be any
Chris@19 1843 positive integer.
Chris@19 1844
Chris@19 1845 - Multi-dimensional arrays are stored in row-major order with
Chris@19 1846 dimensions: `n0' x `n1'; or `n0' x `n1' x `n2'; or `n[0]' x
Chris@19 1847 `n[1]' x ... x `n[rank-1]'. *Note Multi-dimensional Array
Chris@19 1848 Format::.
Chris@19 1849
Chris@19 1850 - FFTW is generally best at handling sizes of the form 2^a 3^b
Chris@19 1851 5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other
Chris@19 1852 exponents are arbitrary. Other sizes are computed by means
Chris@19 1853 of a slow, general-purpose algorithm (which nevertheless
Chris@19 1854 retains O(n log n) performance even for prime sizes). (It
Chris@19 1855 is possible to customize FFTW for different array sizes; see
Chris@19 1856 *note Installation and Customization::.) Transforms whose
Chris@19 1857 sizes are powers of 2 are especially fast.
Chris@19 1858
Chris@19 1859 - For a `REDFT00' or `RODFT00' transform kind in a dimension of
Chris@19 1860 size n, it is n-1 or n+1, respectively, that should be
Chris@19 1861 factorizable in the above form.
Chris@19 1862
Chris@19 1863 * `in' and `out' point to the input and output arrays of the
Chris@19 1864 transform, which may be the same (yielding an in-place transform). These
Chris@19 1865 arrays are overwritten during planning, unless `FFTW_ESTIMATE' is
Chris@19 1866 used in the flags. (The arrays need not be initialized, but they
Chris@19 1867 must be allocated.)
Chris@19 1868
Chris@19 1869 * `kind', or `kind0'/`kind1'/`kind2', or `kind[rank]', is the kind
Chris@19 1870 of r2r transform used for the corresponding dimension. The valid
Chris@19 1871 kind constants are described in *note Real-to-Real Transform
Chris@19 1872 Kinds::. In a multi-dimensional transform, what is computed is
Chris@19 1873 the separable product formed by taking each transform kind along
Chris@19 1874 the corresponding dimension, one dimension after another.
Chris@19 1875
Chris@19 1876 * `flags' is a bitwise OR (`|') of zero or more planner flags, as
Chris@19 1877 defined in *note Planner Flags::.
Chris@19 1878
Chris@19 1879
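The factorization rule for the `00' kinds can be expressed as a small sketch. The enumeration below is a local stand-in for `fftw_r2r_kind', and its values do not match FFTW's constants. `size_to_factor' is a hypothetical helper.

```c
#include <assert.h>

/* Local stand-in for fftw_r2r_kind (hypothetical values). */
typedef enum {
    K_R2HC, K_HC2R, K_DHT,
    K_REDFT00, K_REDFT01, K_REDFT10, K_REDFT11,
    K_RODFT00, K_RODFT01, K_RODFT10, K_RODFT11
} kind_sketch;

/* Size whose factorization governs performance for a dimension of
   size n: n-1 for REDFT00, n+1 for RODFT00, otherwise n itself. */
static int size_to_factor(kind_sketch kind, int n)
{
    assert(n > 0);
    assert(!(kind == K_REDFT00 && n == 1)); /* size-1 REDFT00 is undefined */
    switch (kind) {
    case K_REDFT00: return n - 1;
    case K_RODFT00: return n + 1;
    default:        return n;
    }
}
```

Thus a DCT-I of size 129 or a DST-I of size 127 both reduce to a power-of-2 factorization (128).
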
Chris@19 1880 
Chris@19 1881 File: fftw3.info, Node: Real-to-Real Transform Kinds, Prev: Real-to-Real Transforms, Up: Basic Interface
Chris@19 1882
Chris@19 1883 4.3.6 Real-to-Real Transform Kinds
Chris@19 1884 ----------------------------------
Chris@19 1885
Chris@19 1886 FFTW currently supports 11 different r2r transform kinds, specified by
Chris@19 1887 one of the constants below. For the precise definitions of these
Chris@19 1888 transforms, see *note What FFTW Really Computes::. For a more
Chris@19 1889 colloquial introduction to these transform kinds, see *note More DFTs
Chris@19 1890 of Real Data::.
Chris@19 1891
Chris@19 1892 For a dimension of size `n', there is a corresponding "logical"
Chris@19 1893 dimension `N' that determines the normalization (and the optimal
Chris@19 1894 factorization); the formula for `N' is given for each kind below.
Chris@19 1895 Also, each transform kind is listed with its corresponding inverse
Chris@19 1896 transform. FFTW computes unnormalized transforms: a transform followed
Chris@19 1897 by its inverse will result in the original data multiplied by `N' (or
Chris@19 1898 the product of the `N' values of all dimensions, for multi-dimensional transforms).
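
Using the per-kind formulas for `N' listed below, the overall scale factor of a transform followed by its inverse can be sketched as follows. The enumeration is a hypothetical stand-in for `fftw_r2r_kind', not FFTW's own definition.

```c
#include <assert.h>

/* Local stand-in for fftw_r2r_kind (hypothetical values). */
typedef enum {
    K_R2HC, K_HC2R, K_DHT,
    K_REDFT00, K_REDFT01, K_REDFT10, K_REDFT11,
    K_RODFT00, K_RODFT01, K_RODFT10, K_RODFT11
} kind_sketch;

/* Logical size N for one dimension of physical size n. */
static int logical_n(kind_sketch kind, int n)
{
    assert(n > 0);
    assert(!(kind == K_REDFT00 && n == 1)); /* undefined for size 1 */
    switch (kind) {
    case K_R2HC: case K_HC2R: case K_DHT:
        return n;
    case K_REDFT00:
        return 2 * (n - 1);
    case K_RODFT00:
        return 2 * (n + 1);
    default: /* REDFT01/10/11 and RODFT01/10/11 */
        return 2 * n;
    }
}

/* Factor by which a transform followed by its inverse scales the
   data: the product of the per-dimension logical sizes. */
static double roundtrip_scale(int rank, const int *n,
                              const kind_sketch *kind)
{
    double scale = 1.0;
    int d;

    for (d = 0; d < rank; ++d)
        scale *= logical_n(kind[d], n[d]);
    return scale;
}
```
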
Chris@19 1899
Chris@19 1900 * `FFTW_R2HC' computes a real-input DFT with output in "halfcomplex"
Chris@19 1901 format, i.e. real and imaginary parts for a transform of size `n'
Chris@19 1902 stored as: r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1. (Logical
Chris@19 1903 `N=n', inverse is `FFTW_HC2R'.)
Chris@19 1904
Chris@19 1905 * `FFTW_HC2R' computes the reverse of `FFTW_R2HC', above. (Logical
Chris@19 1906 `N=n', inverse is `FFTW_R2HC'.)
Chris@19 1907
Chris@19 1908 * `FFTW_DHT' computes a discrete Hartley transform. (Logical `N=n',
Chris@19 1909 inverse is `FFTW_DHT'.)
Chris@19 1910
Chris@19 1911 * `FFTW_REDFT00' computes an REDFT00 transform, i.e. a DCT-I.
Chris@19 1912 (Logical `N=2*(n-1)', inverse is `FFTW_REDFT00'.)
Chris@19 1913
Chris@19 1914 * `FFTW_REDFT10' computes an REDFT10 transform, i.e. a DCT-II
Chris@19 1915 (sometimes called "the" DCT). (Logical `N=2*n', inverse is
Chris@19 1916 `FFTW_REDFT01'.)
Chris@19 1917
Chris@19 1918 * `FFTW_REDFT01' computes an REDFT01 transform, i.e. a DCT-III
Chris@19 1919 (sometimes called "the" IDCT, being the inverse of DCT-II).
Chris@19 1920 (Logical `N=2*n', inverse is `FFTW_REDFT10'.)
Chris@19 1921
Chris@19 1922 * `FFTW_REDFT11' computes an REDFT11 transform, i.e. a DCT-IV.
Chris@19 1923 (Logical `N=2*n', inverse is `FFTW_REDFT11'.)
Chris@19 1924
Chris@19 1925 * `FFTW_RODFT00' computes an RODFT00 transform, i.e. a DST-I.
Chris@19 1926 (Logical `N=2*(n+1)', inverse is `FFTW_RODFT00'.)
Chris@19 1927
Chris@19 1928 * `FFTW_RODFT10' computes an RODFT10 transform, i.e. a DST-II.
Chris@19 1929 (Logical `N=2*n', inverse is `FFTW_RODFT01'.)
Chris@19 1930
Chris@19 1931 * `FFTW_RODFT01' computes an RODFT01 transform, i.e. a DST-III.
Chris@19 1932 (Logical `N=2*n', inverse is `FFTW_RODFT10'.)
Chris@19 1933
Chris@19 1934 * `FFTW_RODFT11' computes an RODFT11 transform, i.e. a DST-IV.
Chris@19 1935 (Logical `N=2*n', inverse is `FFTW_RODFT11'.)
Chris@19 1936
Chris@19 1937
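To make the halfcomplex layout concrete, here is a sketch that splits an `FFTW_R2HC' output of size `n' into real and imaginary parts for frequencies 0 through n/2. The helper is hypothetical, not an FFTW routine.

```c
#include <assert.h>

/* Unpack an FFTW_R2HC output hc[0..n-1] into re[0..n/2], im[0..n/2].
   Layout: hc[k] = r_k for 0 <= k <= n/2, and hc[n-k] = i_k for
   0 < k < (n+1)/2; the imaginary parts of k = 0 and (for even n)
   k = n/2 are zero and are not stored. */
static void halfcomplex_unpack(int n, const double *hc,
                               double *re, double *im)
{
    int k;

    assert(n > 0);
    for (k = 0; k <= n / 2; ++k) {
        re[k] = hc[k];
        im[k] = (k > 0 && k < (n + 1) / 2) ? hc[n - k] : 0.0;
    }
}
```

For n = 4 the stored order is r0, r1, r2, i1. For n = 5 it is r0, r1, r2, i2, i1.
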
Chris@19 1938 
Chris@19 1939 File: fftw3.info, Node: Advanced Interface, Next: Guru Interface, Prev: Basic Interface, Up: FFTW Reference
Chris@19 1940
Chris@19 1941 4.4 Advanced Interface
Chris@19 1942 ======================
Chris@19 1943
Chris@19 1944 FFTW's "advanced" interface supplements the basic interface with four
Chris@19 1945 new planner routines, providing a new level of flexibility: you can plan
Chris@19 1946 a transform of multiple arrays simultaneously, operate on non-contiguous
Chris@19 1947 (strided) data, and transform a subset of a larger multi-dimensional
Chris@19 1948 array. Other than these additional features, the planner operates in
Chris@19 1949 the same fashion as in the basic interface, and the resulting
Chris@19 1950 `fftw_plan' is used in the same way (*note Using Plans::).
Chris@19 1951
Chris@19 1952 * Menu:
Chris@19 1953
Chris@19 1954 * Advanced Complex DFTs::
Chris@19 1955 * Advanced Real-data DFTs::
Chris@19 1956 * Advanced Real-to-real Transforms::
Chris@19 1957
Chris@19 1958 
Chris@19 1959 File: fftw3.info, Node: Advanced Complex DFTs, Next: Advanced Real-data DFTs, Prev: Advanced Interface, Up: Advanced Interface
Chris@19 1960
Chris@19 1961 4.4.1 Advanced Complex DFTs
Chris@19 1962 ---------------------------
Chris@19 1963
Chris@19 1964 fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
Chris@19 1965 fftw_complex *in, const int *inembed,
Chris@19 1966 int istride, int idist,
Chris@19 1967 fftw_complex *out, const int *onembed,
Chris@19 1968 int ostride, int odist,
Chris@19 1969 int sign, unsigned flags);
Chris@19 1970
Chris@19 1971 This routine plans multiple multidimensional complex DFTs, and it
Chris@19 1972 extends the `fftw_plan_dft' routine (*note Complex DFTs::) to compute
Chris@19 1973 `howmany' transforms, each having rank `rank' and size `n'. In
Chris@19 1974 addition, the transform data need not be contiguous, but it may be laid
Chris@19 1975 out in memory with an arbitrary stride. To account for these
Chris@19 1976 possibilities, `fftw_plan_many_dft' adds the new parameters `howmany',
Chris@19 1977 {`i',`o'}`nembed', {`i',`o'}`stride', and {`i',`o'}`dist'. The FFTW
Chris@19 1978 basic interface (*note Complex DFTs::) provides routines specialized
Chris@19 1979 for ranks 1, 2, and 3, but the advanced interface handles only the
Chris@19 1980 general-rank case.
Chris@19 1981
Chris@19 1982 `howmany' is the number of transforms to compute. The resulting
Chris@19 1983 plan computes `howmany' transforms, where the input of the `k'-th
Chris@19 1984 transform is at location `in+k*idist' (in C pointer arithmetic), and
Chris@19 1985 its output is at location `out+k*odist'. Plans obtained in this way
Chris@19 1986 can often be faster than calling FFTW multiple times for the individual
Chris@19 1987 transforms. The basic `fftw_plan_dft' interface corresponds to
Chris@19 1988 `howmany=1' (in which case the `dist' parameters are ignored).
Chris@19 1989
Chris@19 1990 Each of the `howmany' transforms has rank `rank' and size `n', as in
Chris@19 1991 the basic interface. In addition, the advanced interface allows the
Chris@19 1992 input and output arrays of each transform to be row-major subarrays of
Chris@19 1993 larger rank-`rank' arrays, described by `inembed' and `onembed'
Chris@19 1994 parameters, respectively. {`i',`o'}`nembed' must be arrays of length
Chris@19 1995 `rank', and `n' should be elementwise less than or equal to
Chris@19 1996 {`i',`o'}`nembed'. Passing `NULL' for an `nembed' parameter is
Chris@19 1997 equivalent to passing `n' (i.e. same physical and logical dimensions,
Chris@19 1998 as in the basic interface).
Chris@19 1999
Chris@19 2000 The `stride' parameters indicate that the `j'-th element of the
Chris@19 2001 input or output arrays is located at `j*istride' or `j*ostride',
Chris@19 2002 respectively. (For a multi-dimensional array, `j' is the ordinary
Chris@19 2003 row-major index.) When combined with the `k'-th transform in a
Chris@19 2004 `howmany' loop, from above, this means that the (`j',`k')-th element is
Chris@19 2005 at `j*stride+k*dist'. (The basic `fftw_plan_dft' interface corresponds
Chris@19 2006 to a stride of 1.)
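
The indexing rules above, including `nembed', can be combined into one offset computation. This is a sketch under the stated conventions, with a hypothetical helper name. The offset is in units of the array's element type, for example `fftw_complex' for the complex interface.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element idx[0..rank-1] of the k-th transform: the
   row-major index j is taken over the physical dimensions nembed
   (or n when nembed is NULL), then j*stride + k*dist. */
static ptrdiff_t many_offset(int rank, const int *n, const int *nembed,
                             int stride, int dist, int k, const int *idx)
{
    ptrdiff_t j = 0;
    int d;

    for (d = 0; d < rank; ++d) {
        int phys = nembed ? nembed[d] : n[d];
        assert(idx[d] >= 0 && idx[d] < n[d] && n[d] <= phys);
        j = j * phys + idx[d];
    }
    return j * stride + (ptrdiff_t)k * dist;
}
```
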
Chris@19 2007
Chris@19 2008 For in-place transforms, the input and output `stride' and `dist'
Chris@19 2009 parameters should be the same; otherwise, the planner may return `NULL'.
Chris@19 2010
Chris@19 2011 Arrays `n', `inembed', and `onembed' are not used after this
Chris@19 2012 function returns. You can safely free or reuse them.
Chris@19 2013
Chris@19 2014 *Examples*: One transform of one 5 by 6 array contiguous in memory:
Chris@19 2015 int rank = 2;
Chris@19 2016 int n[] = {5, 6};
Chris@19 2017 int howmany = 1;
Chris@19 2018 int idist = 0, odist = 0; /* unused because howmany = 1 */
Chris@19 2019 int istride = 1, ostride = 1; /* array is contiguous in memory */
Chris@19 2020 int *inembed = n, *onembed = n;
Chris@19 2021
Chris@19 2022 Transform of three 5 by 6 arrays, each contiguous in memory, stored
Chris@19 2023 in memory one after another:
Chris@19 2024 int rank = 2;
Chris@19 2025 int n[] = {5, 6};
Chris@19 2026 int howmany = 3;
Chris@19 2027 int idist = n[0]*n[1], odist = n[0]*n[1]; /* = 30, the distance in memory
Chris@19 2028 between the first element
Chris@19 2029 of the first array and the
Chris@19 2030 first element of the second array */
Chris@19 2031 int istride = 1, ostride = 1; /* array is contiguous in memory */
Chris@19 2032 int *inembed = n, *onembed = n;
Chris@19 2033
Chris@19 2034 Transform each column of a 2d array with 10 rows and 3 columns:
Chris@19 2035 int rank = 1; /* not 2: we are computing 1d transforms */
Chris@19 2036 int n[] = {10}; /* 1d transforms of length 10 */
Chris@19 2037 int howmany = 3;
Chris@19 2038 int idist = 1, odist = 1;
Chris@19 2039 int istride = 3, ostride = 3; /* distance between two elements in
Chris@19 2040 the same column */
Chris@19 2041 int *inembed = n, *onembed = n;
Chris@19 2042
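Before handing a strided layout such as the column example to the planner, it can be worth checking that it visits each element exactly once. The following helper is hypothetical and not part of FFTW. It performs that check for rank-1 transforms of length `n'.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if howmany rank-1 transforms of length n, with element
   j of transform k at j*stride + k*dist, touch every element of an
   array of `total' elements exactly once. */
static int layout_covers_exactly(int n, int howmany, int stride,
                                 int dist, int total)
{
    unsigned char *seen;
    int j, k, ok = 1;

    assert(total > 0);
    seen = calloc((size_t)total, 1);
    assert(seen != NULL);
    for (k = 0; ok && k < howmany; ++k)
        for (j = 0; ok && j < n; ++j) {
            long off = (long)j * stride + (long)k * dist;
            if (off < 0 || off >= total || seen[off])
                ok = 0;          /* out of range or visited twice */
            else
                seen[off] = 1;
        }
    for (j = 0; ok && j < total; ++j)
        if (!seen[j])
            ok = 0;              /* some element never visited */
    free(seen);
    return ok;
}
```

With the column example's values (n = 10, howmany = 3, stride = 3, dist = 1, 30 elements), the check succeeds. Mistakenly using dist = 3 makes it fail, because transforms overlap.
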
Chris@19 2043 
Chris@19 2044 File: fftw3.info, Node: Advanced Real-data DFTs, Next: Advanced Real-to-real Transforms, Prev: Advanced Complex DFTs, Up: Advanced Interface
Chris@19 2045
Chris@19 2046 4.4.2 Advanced Real-data DFTs
Chris@19 2047 -----------------------------
Chris@19 2048
Chris@19 2049 fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany,
Chris@19 2050 double *in, const int *inembed,
Chris@19 2051 int istride, int idist,
Chris@19 2052 fftw_complex *out, const int *onembed,
Chris@19 2053 int ostride, int odist,
Chris@19 2054 unsigned flags);
Chris@19 2055 fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany,
Chris@19 2056 fftw_complex *in, const int *inembed,
Chris@19 2057 int istride, int idist,
Chris@19 2058 double *out, const int *onembed,
Chris@19 2059 int ostride, int odist,
Chris@19 2060 unsigned flags);
Chris@19 2061
Chris@19 2062 Like `fftw_plan_many_dft', these two functions add `howmany',
Chris@19 2063 `nembed', `stride', and `dist' parameters to the `fftw_plan_dft_r2c'
Chris@19 2064 and `fftw_plan_dft_c2r' functions, but otherwise behave the same as the
Chris@19 2065 basic interface.
Chris@19 2066
Chris@19 2067 The interpretation of `howmany', `stride', and `dist' is the same
Chris@19 2068 as for `fftw_plan_many_dft', above. Note that the `stride' and `dist'
Chris@19 2069 for the real array are in units of `double', and for the complex array
Chris@19 2070 are in units of `fftw_complex'.
Chris@19 2071
Chris@19 2072 If an `nembed' parameter is `NULL', it is interpreted as what it
Chris@19 2073 would be in the basic interface, as described in *note Real-data DFT
Chris@19 2074 Array Format::. That is, for the complex array the size is assumed to
Chris@19 2075 be the same as `n', but with the last dimension cut roughly in half.
Chris@19 2076 For the real array, the size is assumed to be `n' if the transform is
Chris@19 2077 out-of-place, or `n' with the last dimension "padded" if the transform
Chris@19 2078 is in-place.
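
The default (`NULL') physical dimensions described above can be sketched as follows. `r2c_default_nembed' is hypothetical. For an r2c plan, `real_embed' would play the role of `inembed' and `complex_embed' that of `onembed'. The roles swap for c2r.

```c
#include <assert.h>

/* Physical dimensions that a NULL nembed stands for, following the
   basic-interface layout: the complex array is n with its last
   dimension halved to n/2 + 1; the real array is n out-of-place, or
   n with its last dimension padded to 2*(n/2 + 1) in-place. */
static void r2c_default_nembed(int rank, const int *n, int in_place,
                               int *real_embed, int *complex_embed)
{
    int d;

    assert(rank > 0);
    for (d = 0; d < rank; ++d) {
        assert(n[d] > 0);
        real_embed[d] = n[d];
        complex_embed[d] = n[d];
    }
    complex_embed[rank - 1] = n[rank - 1] / 2 + 1;
    if (in_place)
        real_embed[rank - 1] = 2 * (n[rank - 1] / 2 + 1);
}
```
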
Chris@19 2079
Chris@19 2080 If an `nembed' parameter is non-`NULL', it is interpreted as the
Chris@19 2081 physical size of the corresponding array, in row-major order, just as
Chris@19 2082 for `fftw_plan_many_dft'. In this case, each dimension of `nembed'
Chris@19 2083 should be `>=' what it would be in the basic interface (e.g. the halved
Chris@19 2084 or padded `n').
Chris@19 2085
Chris@19 2086 Arrays `n', `inembed', and `onembed' are not used after this
Chris@19 2087 function returns. You can safely free or reuse them.
Chris@19 2088
Chris@19 2089 
Chris@19 2090 File: fftw3.info, Node: Advanced Real-to-real Transforms, Prev: Advanced Real-data DFTs, Up: Advanced Interface
Chris@19 2091
Chris@19 2092 4.4.3 Advanced Real-to-real Transforms
Chris@19 2093 --------------------------------------
Chris@19 2094
Chris@19 2095 fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
Chris@19 2096 double *in, const int *inembed,
Chris@19 2097 int istride, int idist,
Chris@19 2098 double *out, const int *onembed,
Chris@19 2099 int ostride, int odist,
Chris@19 2100 const fftw_r2r_kind *kind, unsigned flags);
Chris@19 2101
Chris@19 2102 Like `fftw_plan_many_dft', this function adds `howmany', `nembed',
Chris@19 2103 `stride', and `dist' parameters to the `fftw_plan_r2r' function, but
Chris@19 2104 otherwise behaves the same as the basic interface. The interpretation
Chris@19 2105 of those additional parameters is the same as for
Chris@19 2106 `fftw_plan_many_dft'. (Of course, the `stride' and `dist' parameters
Chris@19 2107 are now in units of `double', not `fftw_complex'.)
Chris@19 2108
Chris@19 2109 Arrays `n', `inembed', `onembed', and `kind' are not used after this
Chris@19 2110 function returns. You can safely free or reuse them.
Chris@19 2111
Chris@19 2112 
Chris@19 2113 File: fftw3.info, Node: Guru Interface, Next: New-array Execute Functions, Prev: Advanced Interface, Up: FFTW Reference
Chris@19 2114
Chris@19 2115 4.5 Guru Interface
Chris@19 2116 ==================
Chris@19 2117
Chris@19 2118 The "guru" interface to FFTW is intended to expose as much as possible
Chris@19 2119 of the flexibility in the underlying FFTW architecture. It allows one
Chris@19 2120 to compute multi-dimensional "vectors" (loops) of multi-dimensional
Chris@19 2121 transforms, where each vector/transform dimension has an independent
Chris@19 2122 size and stride. One can also use more general complex-number formats,
Chris@19 2123 e.g. separate real and imaginary arrays.
Chris@19 2124
Chris@19 2125 For those users who require the flexibility of the guru interface,
Chris@19 2126 it is important that they pay special attention to the documentation
Chris@19 2127 lest they shoot themselves in the foot.
Chris@19 2128
Chris@19 2129 * Menu:
Chris@19 2130
Chris@19 2131 * Interleaved and split arrays::
Chris@19 2132 * Guru vector and transform sizes::
Chris@19 2133 * Guru Complex DFTs::
Chris@19 2134 * Guru Real-data DFTs::
Chris@19 2135 * Guru Real-to-real Transforms::
Chris@19 2136 * 64-bit Guru Interface::
Chris@19 2137
Chris@19 2138 
Chris@19 2139 File: fftw3.info, Node: Interleaved and split arrays, Next: Guru vector and transform sizes, Prev: Guru Interface, Up: Guru Interface
Chris@19 2140
Chris@19 2141 4.5.1 Interleaved and split arrays
Chris@19 2142 ----------------------------------
Chris@19 2143
Chris@19 2144 The guru interface supports two representations of complex numbers,
Chris@19 2145 which we call the interleaved and the split format.
Chris@19 2146
Chris@19 2147 The "interleaved" format is the same one used by the basic and
Chris@19 2148 advanced interfaces, and it is documented in *note Complex numbers::.
Chris@19 2149 In the interleaved format, you provide pointers to the real part of a
Chris@19 2150 complex number, and the imaginary part is understood to be stored in the
Chris@19 2151 next memory location.
Chris@19 2152
Chris@19 2153 The "split" format allows separate pointers to the real and
Chris@19 2154 imaginary parts of a complex array.
Chris@19 2155
Chris@19 2156 Technically, the interleaved format is redundant, because you can
Chris@19 2157 always express an interleaved array in terms of a split array with
Chris@19 2158 appropriate pointers and strides. On the other hand, the interleaved
Chris@19 2159 format is simpler to use, and it is common in practice. Hence, FFTW
Chris@19 2160 supports it as a special case.
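
The redundancy mentioned above can be made explicit. In this sketch, `complex_sketch' is a local copy of the default `fftw_complex' definition (`double[2]'), and `interleaved_as_split' is hypothetical. The resulting pointers, with a stride of 2 `double's, describe the same data in split form.

```c
#include <assert.h>

/* Local copy of the default fftw_complex layout (double[2]), so the
   sketch compiles without fftw3.h. */
typedef double complex_sketch[2];

/* View an interleaved array as a split one: real parts start at the
   first double, imaginary parts one double later, and consecutive
   elements are 2 doubles apart in both. */
static void interleaved_as_split(complex_sketch *a, double **ri,
                                 double **ii, int *stride_in_doubles)
{
    assert(a != 0);
    *ri = (double *)a;
    *ii = (double *)a + 1;
    *stride_in_doubles = 2;
}
```
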
Chris@19 2161
Chris@19 2162 
Chris@19 2163 File: fftw3.info, Node: Guru vector and transform sizes, Next: Guru Complex DFTs, Prev: Interleaved and split arrays, Up: Guru Interface
Chris@19 2164
Chris@19 2165 4.5.2 Guru vector and transform sizes
Chris@19 2166 -------------------------------------
Chris@19 2167
Chris@19 2168 The guru interface introduces one basic new data structure,
Chris@19 2169 `fftw_iodim', that is used to specify sizes and strides for
Chris@19 2170 multi-dimensional transforms and vectors:
Chris@19 2171
Chris@19 2172 typedef struct {
Chris@19 2173 int n;
Chris@19 2174 int is;
Chris@19 2175 int os;
Chris@19 2176 } fftw_iodim;
Chris@19 2177
Chris@19 2178 Here, `n' is the size of the dimension, and `is' and `os' are the
Chris@19 2179 strides of that dimension for the input and output arrays. (The stride
Chris@19 2180 is the separation of consecutive elements along this dimension.)
Chris@19 2181
Chris@19 2182 The meaning of the stride parameter depends on the type of the array
Chris@19 2183 that the stride refers to. _If the array is interleaved complex,
Chris@19 2184 strides are expressed in units of complex numbers (`fftw_complex'). If
Chris@19 2185 the array is split complex or real, strides are expressed in units of
Chris@19 2186 real numbers (`double')._ This convention is consistent with the usual
Chris@19 2187 pointer arithmetic in the C language. An interleaved array is denoted
Chris@19 2188 by a pointer `p' to `fftw_complex', so that `p+1' points to the next
Chris@19 2189 complex number. Split arrays are denoted by pointers to `double', in
Chris@19 2190 which case pointer arithmetic operates in units of `sizeof(double)'.
Chris@19 2191
Chris@19 2192 The guru planner interfaces all take a (`rank', `dims[rank]') pair
Chris@19 2193 describing the transform size, and a (`howmany_rank',
Chris@19 2194 `howmany_dims[howmany_rank]') pair describing the "vector" size (a
Chris@19 2195 multi-dimensional loop of transforms to perform), where `dims' and
Chris@19 2196 `howmany_dims' are arrays of `fftw_iodim'.
Chris@19 2197
Chris@19 2198 For example, the `howmany' parameter in the advanced complex-DFT
Chris@19 2199 interface corresponds to `howmany_rank' = 1, `howmany_dims[0].n' =
Chris@19 2200 `howmany', `howmany_dims[0].is' = `idist', and `howmany_dims[0].os' =
Chris@19 2201 `odist'. (To compute a single transform, you can just use
Chris@19 2202 `howmany_rank' = 0.)
Chris@19 2203
Chris@19 2204 A row-major multidimensional array with dimensions `n[rank]' (*note
Chris@19 2205 Row-major Format::) corresponds to `dims[i].n' = `n[i]' and the
Chris@19 2206 recurrence `dims[i].is' = `n[i+1] * dims[i+1].is' (similarly for `os').
Chris@19 2207 The stride of the last (`i=rank-1') dimension is the overall stride of
Chris@19 2208 the array. For example, to be equivalent to the advanced complex-DFT
Chris@19 2209 interface, you would have `dims[rank-1].is' = `istride' and
Chris@19 2210 `dims[rank-1].os' = `ostride'.
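
The correspondence between the advanced and guru descriptions can be sketched as follows. `iodim_sketch' mirrors the `fftw_iodim' structure shown above so that the example compiles without `fftw3.h'. `advanced_to_guru' is a hypothetical helper that assumes `NULL' `nembed' parameters. The filled arrays would then go to `fftw_plan_guru_dft' with `howmany_rank' = 1.

```c
#include <assert.h>

/* Local mirror of fftw_iodim. */
typedef struct {
    int n;   /* size of the dimension */
    int is;  /* input stride */
    int os;  /* output stride */
} iodim_sketch;

/* Fill dims[0..rank-1] for a row-major array of sizes n[] with
   overall strides istride/ostride, and howmany_dims[0] for the
   single vector loop of the advanced interface. */
static void advanced_to_guru(int rank, const int *n,
                             int howmany, int istride, int idist,
                             int ostride, int odist,
                             iodim_sketch *dims, iodim_sketch *howmany_dims)
{
    int i;

    assert(rank > 0);
    dims[rank - 1].n = n[rank - 1];
    dims[rank - 1].is = istride;
    dims[rank - 1].os = ostride;
    for (i = rank - 2; i >= 0; --i) {   /* dims[i].is = n[i+1] * dims[i+1].is */
        dims[i].n = n[i];
        dims[i].is = n[i + 1] * dims[i + 1].is;
        dims[i].os = n[i + 1] * dims[i + 1].os;
    }
    howmany_dims[0].n = howmany;
    howmany_dims[0].is = idist;
    howmany_dims[0].os = odist;
}
```
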
Chris@19 2211
Chris@19 2212 In general, we only guarantee that FFTW returns a non-`NULL' plan if
Chris@19 2213 the vector and transform dimensions correspond to a set of distinct
Chris@19 2214 indices, and for in-place transforms the input/output strides should be
Chris@19 2215 the same.
Chris@19 2216
Chris@19 2217 
Chris@19 2218 File: fftw3.info, Node: Guru Complex DFTs, Next: Guru Real-data DFTs, Prev: Guru vector and transform sizes, Up: Guru Interface
Chris@19 2219
Chris@19 2220 4.5.3 Guru Complex DFTs
Chris@19 2221 -----------------------
Chris@19 2222
Chris@19 2223 fftw_plan fftw_plan_guru_dft(
Chris@19 2224 int rank, const fftw_iodim *dims,
Chris@19 2225 int howmany_rank, const fftw_iodim *howmany_dims,
Chris@19 2226 fftw_complex *in, fftw_complex *out,
Chris@19 2227 int sign, unsigned flags);
Chris@19 2228
Chris@19 2229 fftw_plan fftw_plan_guru_split_dft(
Chris@19 2230 int rank, const fftw_iodim *dims,
Chris@19 2231 int howmany_rank, const fftw_iodim *howmany_dims,
Chris@19 2232 double *ri, double *ii, double *ro, double *io,
Chris@19 2233 unsigned flags);
Chris@19 2234
Chris@19 2235 These two functions plan a complex-data, multi-dimensional DFT for
Chris@19 2236 the interleaved and split format, respectively. Transform dimensions
Chris@19 2237 are given by (`rank', `dims') over a multi-dimensional vector (loop) of
Chris@19 2238 dimensions (`howmany_rank', `howmany_dims'). `dims' and `howmany_dims'
Chris@19 2239 should point to `fftw_iodim' arrays of length `rank' and
Chris@19 2240 `howmany_rank', respectively.
Chris@19 2241
Chris@19 2242 `flags' is a bitwise OR (`|') of zero or more planner flags, as
Chris@19 2243 defined in *note Planner Flags::.
Chris@19 2244
Chris@19 2245 In the `fftw_plan_guru_dft' function, the pointers `in' and `out'
Chris@19 2246 point to the interleaved input and output arrays, respectively. The
Chris@19 2247 sign can be either -1 (= `FFTW_FORWARD') or +1 (= `FFTW_BACKWARD'). If
Chris@19 2248 the pointers are equal, the transform is in-place.
Chris@19 2249
Chris@19 2250 In the `fftw_plan_guru_split_dft' function, `ri' and `ii' point to
Chris@19 2251 the real and imaginary input arrays, and `ro' and `io' point to the
Chris@19 2252 real and imaginary output arrays. The input and output pointers may be
Chris@19 2253 the same, indicating an in-place transform. For example, for
Chris@19 2254 `fftw_complex' pointers `in' and `out', the corresponding parameters
Chris@19 2255 are:
Chris@19 2256
Chris@19 2257 ri = (double *) in;
Chris@19 2258 ii = (double *) in + 1;
Chris@19 2259 ro = (double *) out;
Chris@19 2260 io = (double *) out + 1;
Chris@19 2261
Chris@19 2262 Because `fftw_plan_guru_split_dft' accepts split arrays, strides are
Chris@19 2263 expressed in units of `double'. For a contiguous `fftw_complex' array,
Chris@19 2264 the overall stride of the transform should be 2, the distance between
Chris@19 2265 consecutive real parts or between consecutive imaginary parts; see
Chris@19 2266 *note Guru vector and transform sizes::. Note that the dimension
Chris@19 2267 strides are applied equally to the real and imaginary parts; real and
Chris@19 2268 imaginary arrays with different strides are not supported.
Chris@19 2269
Chris@19 2270 There is no `sign' parameter in `fftw_plan_guru_split_dft'. This
Chris@19 2271 function always plans for an `FFTW_FORWARD' transform. To plan for an
Chris@19 2272 `FFTW_BACKWARD' transform, you can exploit the identity that the
Chris@19 2273 backwards DFT is equal to the forwards DFT with the real and imaginary
Chris@19 2274 parts swapped. For example, in the case of the `fftw_complex' arrays
Chris@19 2275 above, the `FFTW_BACKWARD' transform is computed by the parameters:
Chris@19 2276
Chris@19 2277 ri = (double *) in + 1;
Chris@19 2278 ii = (double *) in;
Chris@19 2279 ro = (double *) out + 1;
Chris@19 2280 io = (double *) out;


File: fftw3.info, Node: Guru Real-data DFTs, Next: Guru Real-to-real Transforms, Prev: Guru Complex DFTs, Up: Guru Interface

4.5.4 Guru Real-data DFTs
-------------------------

     fftw_plan fftw_plan_guru_dft_r2c(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *in, fftw_complex *out,
          unsigned flags);

     fftw_plan fftw_plan_guru_split_dft_r2c(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *in, double *ro, double *io,
          unsigned flags);

     fftw_plan fftw_plan_guru_dft_c2r(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          fftw_complex *in, double *out,
          unsigned flags);

     fftw_plan fftw_plan_guru_split_dft_c2r(
          int rank, const fftw_iodim *dims,
          int howmany_rank, const fftw_iodim *howmany_dims,
          double *ri, double *ii, double *out,
          unsigned flags);

Plan a real-input (r2c) or real-output (c2r) multi-dimensional DFT
with transform dimensions given by (`rank', `dims') over a
multi-dimensional vector (loop) of dimensions (`howmany_rank',
`howmany_dims'). `dims' and `howmany_dims' should point to
`fftw_iodim' arrays of length `rank' and `howmany_rank', respectively.
As for the basic and advanced interfaces, an r2c transform is
`FFTW_FORWARD' and a c2r transform is `FFTW_BACKWARD'.

The _last_ dimension of `dims' is interpreted specially: that
dimension of the real array has size `dims[rank-1].n', but that
dimension of the complex array has size `dims[rank-1].n/2+1' (division
rounded down). The strides, on the other hand, are taken to be exactly
as specified. It is up to the user to specify the strides
appropriately for the peculiar dimensions of the data, and we do not
guarantee that the planner will succeed (return non-`NULL') for any
dimensions other than those described in *note Real-data DFT Array
Format:: and generalized in *note Advanced Real-data DFTs::. (That is,
for an in-place transform, each individual dimension should be able to
operate in place.)
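As an illustration of "peculiar" strides, here is a sketch of the guru `dims' for an in-place, row-major r2c transform. It assumes the padded layout of the Real-data DFT Array Format: each real row is padded to 2*(n/2+1) doubles, so that a row holds exactly n/2+1 complex numbers. Input (real) strides count doubles, and output (complex) strides count `fftw_complex' elements, so each pair of strides addresses the same bytes. The struct name is a hypothetical stand-in for `fftw_iodim'. For the matching c2r transform, the `is' and `os' columns swap roles.

```c
/* Stand-in with the same fields as fftw_iodim (hypothetical name). */
typedef struct { int n, is, os; } iodim_sketch;

/* Guru dims for an in-place, row-major r2c transform of real sizes n[].
   dims[i].n is the logical (real) size; strides are exactly as FFTW
   would read them: is in doubles, os in fftw_complex units. */
static void inplace_r2c_dims(int rank, const int *n, iodim_sketch *dims)
{
    int nc = n[rank - 1] / 2 + 1;  /* complex length of the last dimension */
    int is = 1, os = 1;            /* strides of the current dimension */
    for (int i = rank - 1; i >= 0; --i) {
        dims[i].n = n[i];
        dims[i].is = is;
        dims[i].os = os;
        if (i == rank - 1) {
            is = 2 * nc;           /* padded real row, in doubles */
            os = nc;               /* complex row, in fftw_complex units */
        } else {
            is *= n[i];
            os *= n[i];
        }
    }
}
```

For real sizes 4 x 6, the complex array is 4 x 4. The resulting strides are (8, 1) doubles for input and (4, 1) complex numbers for output, and 8 doubles occupy the same memory as 4 complex numbers.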

`in' and `out' point to the input and output arrays for r2c and c2r
transforms, respectively. For split arrays, `ri' and `ii' point to the
real and imaginary input arrays for a c2r transform, and `ro' and `io'
point to the real and imaginary output arrays for an r2c transform.
`in' and `ro' or `ri' and `out' may be the same, indicating an in-place
transform. (In-place transforms where `in' and `io' or `ii' and `out'
are the same are not currently supported.)

`flags' is a bitwise OR (`|') of zero or more planner flags, as
defined in *note Planner Flags::.

In-place transforms of rank greater than 1 are currently only
supported for interleaved arrays. For split arrays, the planner will
return `NULL'.


File: fftw3.info, Node: Guru Real-to-real Transforms, Next: 64-bit Guru Interface, Prev: Guru Real-data DFTs, Up: Guru Interface

4.5.5 Guru Real-to-real Transforms
----------------------------------

     fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims,
                                  int howmany_rank,
                                  const fftw_iodim *howmany_dims,
                                  double *in, double *out,
                                  const fftw_r2r_kind *kind,
                                  unsigned flags);

Plan a real-to-real (r2r) multi-dimensional `FFTW_FORWARD' transform
with transform dimensions given by (`rank', `dims') over a
multi-dimensional vector (loop) of dimensions (`howmany_rank',
`howmany_dims'). `dims' and `howmany_dims' should point to
`fftw_iodim' arrays of length `rank' and `howmany_rank', respectively.

The transform kind of each dimension is given by the `kind'
parameter, which should point to an array of length `rank'. Valid
`fftw_r2r_kind' constants are given in *note Real-to-Real Transform
Kinds::.

`in' and `out' point to the real input and output arrays; they may
be the same, indicating an in-place transform.

`flags' is a bitwise OR (`|') of zero or more planner flags, as
defined in *note Planner Flags::.


File: fftw3.info, Node: 64-bit Guru Interface, Prev: Guru Real-to-real Transforms, Up: Guru Interface

4.5.6 64-bit Guru Interface
---------------------------

When compiled in 64-bit mode on a 64-bit architecture (where addresses
are 64 bits wide), FFTW uses 64-bit quantities internally for all
transform sizes, strides, and so on--you don't have to do anything
special to exploit this. However, in the ordinary FFTW interfaces, you
specify the transform size by an `int' quantity, which is normally only
32 bits wide. This means that, even though FFTW is using 64-bit sizes
internally, you cannot specify a single transform dimension larger than
2^31-1 numbers.

We expect that few users will require transforms larger than this,
but, for those who do, we provide a 64-bit version of the guru
interface in which all sizes are specified as integers of type
`ptrdiff_t' instead of `int'. (`ptrdiff_t' is a signed integer type
defined by the C standard to be wide enough to represent address
differences, and thus must be at least 64 bits wide on a 64-bit
machine.) We stress that there is _no performance advantage_ to using
this interface--the same internal FFTW code is employed regardless--and
it is only necessary if you want to specify very large transform sizes.
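In the 32-bit guru interface, the strides (`is', `os') are also `int', like the sizes. So an array can need `fftw_iodim64' even when every individual dimension fits in an `int', because a row-major stride is a product of dimensions. The sketch below makes that decision. The function name is hypothetical, and it assumes a 64-bit `ptrdiff_t' so that the products cannot overflow once each factor has been checked.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if a contiguous row-major array with sizes n[0..rank-1] can
   be described with int-valued sizes and strides (fftw_iodim), or 0 if
   fftw_iodim64 would be needed. Assumes 64-bit ptrdiff_t. */
static int fits_in_int_iodim(int rank, const ptrdiff_t *n)
{
    ptrdiff_t stride = 1;                /* stride of dimension i */
    for (int i = rank - 1; i >= 0; --i) {
        if (n[i] > INT_MAX || stride > INT_MAX)
            return 0;
        stride *= n[i];                  /* becomes stride of dimension i-1 */
    }
    return 1;
}
```

For sizes 2 x 50000 x 50000, every size fits in an `int', but the outer stride is 2.5e9, which does not. The check therefore reports that the 64-bit interface is needed.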

In particular, the 64-bit guru interface is a set of planner routines
that are exactly the same as the guru planner routines, except that
they are named with `guru64' instead of `guru' and they take arguments
of type `fftw_iodim64' instead of `fftw_iodim'. For example, instead
of `fftw_plan_guru_dft', we have `fftw_plan_guru64_dft'.

     fftw_plan fftw_plan_guru64_dft(
          int rank, const fftw_iodim64 *dims,
          int howmany_rank, const fftw_iodim64 *howmany_dims,
          fftw_complex *in, fftw_complex *out,
          int sign, unsigned flags);

The `fftw_iodim64' type is similar to `fftw_iodim', with the same
interpretation, except that it uses type `ptrdiff_t' instead of type
`int'.

     typedef struct {
          ptrdiff_t n;
          ptrdiff_t is;
          ptrdiff_t os;
     } fftw_iodim64;

Every other `fftw_plan_guru' function also has a `fftw_plan_guru64'
equivalent, but we do not repeat their documentation here since they
are identical to the 32-bit versions except as noted above.


File: fftw3.info, Node: New-array Execute Functions, Next: Wisdom, Prev: Guru Interface, Up: FFTW Reference

4.6 New-array Execute Functions
===============================

Normally, one executes a plan for the arrays with which the plan was
created, by calling `fftw_execute(plan)' as described in *note Using
Plans::. However, it is possible for sophisticated users to apply a
given plan to a _different_ array using the "new-array execute"
functions detailed below, provided that the following conditions are
met:

   * The array size, strides, etcetera are the same (since those are
     set by the plan).

   * The input and output arrays are the same (in-place) or different
     (out-of-place) if the plan was originally created to be in-place or
     out-of-place, respectively.

   * For split arrays, the separations between the real and imaginary
     parts, `ii-ri' and `io-ro', are the same as they were for the
     input and output arrays when the plan was created. (This
     condition is automatically satisfied for interleaved arrays.)

   * The "alignment" of the new input/output arrays is the same as that
     of the input/output arrays when the plan was created, unless the
     plan was created with the `FFTW_UNALIGNED' flag. Here, the
     alignment is a platform-dependent quantity (for example, it is the
     address modulo 16 if SSE SIMD instructions are used, but the
     address modulo 4 for non-SIMD single-precision FFTW on the same
     machine). In general, only arrays allocated with `fftw_malloc'
     are guaranteed to be equally aligned (*note SIMD alignment and
     fftw_malloc::).

The alignment issue is especially critical, because if you don't use
`fftw_malloc' then you may have little control over the alignment of
arrays in memory. For example, neither the C++ `new' operator nor the
Fortran `allocate' statement provides strong enough guarantees about
data alignment. If you don't use `fftw_malloc', therefore, you
probably have to use `FFTW_UNALIGNED' (which disables most SIMD
support). If possible, it is probably better for you to simply create
multiple plans (creating a new plan is quick once one exists for a
given size), or better yet re-use the same array for your transforms.

For rare circumstances in which you cannot control the alignment of
allocated memory, but wish to determine whether a given array is
aligned like the original array for which a plan was created, you can
use the `fftw_alignment_of' function:

     int fftw_alignment_of(double *p);

Two arrays have equivalent alignment (for the purposes of applying a
plan) if and only if `fftw_alignment_of' returns the same value for the
corresponding pointers to their data (typecast to `double*' if
necessary).
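The pointer-level conditions in the list above can be collected into a single pre-flight check before calling `fftw_execute_split_dft'. The sketch below does this under a loudly stated assumption: it models alignment as the address modulo 16 (the SSE case mentioned above) instead of calling `fftw_alignment_of', whose return values may differ. All function names are hypothetical. Array sizes and strides are fixed by the plan and cannot be checked from the pointers alone.

```c
#include <stdint.h>

/* ASSUMPTION: alignment modelled as address modulo 16 (SSE). Real code
   should compare fftw_alignment_of() values instead. */
#define ASSUMED_ALIGNMENT 16u

static int same_alignment(const double *a, const double *b)
{
    return (uintptr_t) a % ASSUMED_ALIGNMENT ==
           (uintptr_t) b % ASSUMED_ALIGNMENT;
}

/* Byte distance q - p, computed through intptr_t so that p and q may
   belong to separately allocated arrays. */
static intptr_t byte_offset(const double *p, const double *q)
{
    return (intptr_t) q - (intptr_t) p;
}

/* Can a split-array plan created for (ri0, ii0, ro0, io0) be reused on
   (ri1, ii1, ro1, io1)? */
static int split_arrays_compatible(
    const double *ri0, const double *ii0, const double *ro0, const double *io0,
    const double *ri1, const double *ii1, const double *ro1, const double *io1)
{
    /* in-place plans need in-place arrays, out-of-place need out-of-place */
    if ((ri0 == ro0) != (ri1 == ro1))
        return 0;
    /* same real/imaginary separations ii-ri and io-ro */
    if (byte_offset(ri0, ii0) != byte_offset(ri1, ii1) ||
        byte_offset(ro0, io0) != byte_offset(ro1, io1))
        return 0;
    /* same alignment; ii and io then follow from the equal separations */
    return same_alignment(ri0, ri1) && same_alignment(ro0, ro1);
}
```

Shifting an otherwise compatible array by one `double' changes its address modulo 16 by 8, so the check rejects it. That is exactly the situation in which the plan must not be reused.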

If you are tempted to use the new-array execute interface because you
want to transform a known bunch of arrays of the same size, you should
probably use the advanced interface instead (*note Advanced
Interface::).

The new-array execute functions are:

     void fftw_execute_dft(
          const fftw_plan p,
          fftw_complex *in, fftw_complex *out);

     void fftw_execute_split_dft(
          const fftw_plan p,
          double *ri, double *ii, double *ro, double *io);

     void fftw_execute_dft_r2c(
          const fftw_plan p,
          double *in, fftw_complex *out);

     void fftw_execute_split_dft_r2c(
          const fftw_plan p,
          double *in, double *ro, double *io);

     void fftw_execute_dft_c2r(
          const fftw_plan p,
          fftw_complex *in, double *out);

     void fftw_execute_split_dft_c2r(
          const fftw_plan p,
          double *ri, double *ii, double *out);

     void fftw_execute_r2r(
          const fftw_plan p,
          double *in, double *out);

These execute the `plan' to compute the corresponding transform on
the input/output arrays specified by the subsequent arguments. The
input/output array arguments have the same meanings as the ones passed
to the guru planner routines in the preceding sections. The `plan' is
not modified, and these routines can be called as many times as
desired, or intermixed with calls to the ordinary `fftw_execute'.

The `plan' _must_ have been created for the transform type
corresponding to the execute function, e.g. it must be a complex-DFT
plan for `fftw_execute_dft'. Any of the planner routines for that
transform type, from the basic to the guru interface, could have been
used to create the plan, however.


File: fftw3.info, Node: Wisdom, Next: What FFTW Really Computes, Prev: New-array Execute Functions, Up: FFTW Reference

4.7 Wisdom
==========

This section documents the FFTW mechanism for saving and restoring
plans from disk. This mechanism is called "wisdom".

* Menu:

* Wisdom Export::
* Wisdom Import::
* Forgetting Wisdom::
* Wisdom Utilities::


File: fftw3.info, Node: Wisdom Export, Next: Wisdom Import, Prev: Wisdom, Up: Wisdom

4.7.1 Wisdom Export
-------------------

     int fftw_export_wisdom_to_filename(const char *filename);
     void fftw_export_wisdom_to_file(FILE *output_file);
     char *fftw_export_wisdom_to_string(void);
     void fftw_export_wisdom(void (*write_char)(char c, void *), void *data);

These functions allow you to export all currently accumulated wisdom
in a form from which it can be later imported and restored, even during
a separate run of the program. (*Note Words of Wisdom-Saving Plans::.)
The current store of wisdom is not affected by calling any of these
routines.

`fftw_export_wisdom' exports the wisdom to any output medium, as
specified by the callback function `write_char'. `write_char' is a
`putc'-like function that writes the character `c' to some output; its
second parameter is the `data' pointer passed to `fftw_export_wisdom'.
For convenience, the following three "wrapper" routines are provided:

`fftw_export_wisdom_to_filename' writes wisdom to a file named
`filename' (which is created or overwritten), returning `1' on success
and `0' on failure. A lower-level function, which requires you to open
and close the file yourself (e.g. if you want to write wisdom to a
portion of a larger file) is `fftw_export_wisdom_to_file'. This writes
the wisdom to the current position in `output_file', which should be
open with write permission; upon exit, the file remains open and is
positioned at the end of the wisdom data.

`fftw_export_wisdom_to_string' returns a pointer to a
null-terminated string holding the wisdom data. This string is
dynamically allocated, and it is the responsibility of the caller to
deallocate it with `free' when it is no longer needed.

All of these routines export the wisdom in the same format, which we
will not document here except to say that it is LISP-like ASCII text
that is insensitive to white space.
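When none of the wrappers fits the output medium, you supply your own `write_char' callback to `fftw_export_wisdom'. Here is a minimal sketch that collects the wisdom into a growable in-memory buffer. The struct `char_sink' and the function `sink_write_char' are hypothetical names, and silently dropping characters on allocation failure is a simplification.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer passed as the `data' argument. */
struct char_sink {
    char *buf;
    size_t len, cap;
};

/* putc-like callback matching void (*write_char)(char c, void *data):
   appends c and keeps buf null-terminated. On allocation failure the
   character is dropped (a simplification for this sketch). */
static void sink_write_char(char c, void *data)
{
    struct char_sink *s = data;
    if (s->len + 2 > s->cap) {
        size_t cap = s->cap ? 2 * s->cap : 64;
        char *p = realloc(s->buf, cap);
        if (!p)
            return;
        s->buf = p;
        s->cap = cap;
    }
    s->buf[s->len++] = c;
    s->buf[s->len] = '\0';
}
```

Typical use would be `struct char_sink s = {NULL, 0, 0}; fftw_export_wisdom(sink_write_char, &s);` followed by `free(s.buf)` once the text has been stored elsewhere.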


File: fftw3.info, Node: Wisdom Import, Next: Forgetting Wisdom, Prev: Wisdom Export, Up: Wisdom

4.7.2 Wisdom Import
-------------------

     int fftw_import_system_wisdom(void);
     int fftw_import_wisdom_from_filename(const char *filename);
     int fftw_import_wisdom_from_file(FILE *input_file);
     int fftw_import_wisdom_from_string(const char *input_string);
     int fftw_import_wisdom(int (*read_char)(void *), void *data);

These functions import wisdom into a program from data stored by the
`fftw_export_wisdom' functions above. (*Note Words of Wisdom-Saving
Plans::.) The imported wisdom replaces any wisdom already accumulated
by the running program.

`fftw_import_wisdom' imports wisdom from any input medium, as
specified by the callback function `read_char'. `read_char' is a
`getc'-like function that returns the next character in the input; its
parameter is the `data' pointer passed to `fftw_import_wisdom'. If the
end of the input data is reached (which should never happen for valid
data), `read_char' should return `EOF' (as defined in `<stdio.h>').
For convenience, the following three "wrapper" routines are provided:

`fftw_import_wisdom_from_filename' reads wisdom from a file named
`filename'. A lower-level function, which requires you to open and
close the file yourself (e.g. if you want to read wisdom from a portion
of a larger file) is `fftw_import_wisdom_from_file'. This reads wisdom
from the current position in `input_file' (which should be open with
read permission); upon exit, the file remains open, but the position of
the read pointer is unspecified.

`fftw_import_wisdom_from_string' reads wisdom from the
null-terminated string `input_string'.

`fftw_import_system_wisdom' reads wisdom from an
implementation-defined standard file (`/etc/fftw/wisdom' on Unix and
GNU systems).

The return value of these import routines is `1' if the wisdom was
read successfully and `0' otherwise. Note that, in all of these
functions, any data in the input stream past the end of the wisdom data
is simply ignored.
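The `read_char' contract is easy to get subtly wrong. Like `getc', the callback should return each character as an `unsigned char' converted to `int', so that bytes above 127 are never confused with `EOF'. Here is a minimal sketch reading from an in-memory string. The names `char_source' and `source_read_char' are hypothetical. For this particular medium, `fftw_import_wisdom_from_string' already does the job; the callback form matters for other sources, such as network streams or compressed files.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical cursor over an in-memory, null-terminated string. */
struct char_source {
    const char *p;
};

/* getc-like callback matching int (*read_char)(void *data): returns the
   next character as an unsigned char converted to int, or EOF once the
   terminating null is reached (and on every call after that). */
static int source_read_char(void *data)
{
    struct char_source *s = data;
    if (*s->p == '\0')
        return EOF;
    return (unsigned char) *s->p++;
}
```

It would be passed as `fftw_import_wisdom(source_read_char, &src)', where `src' is a `struct char_source' initialized with the text to read.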


File: fftw3.info, Node: Forgetting Wisdom, Next: Wisdom Utilities, Prev: Wisdom Import, Up: Wisdom

4.7.3 Forgetting Wisdom
-----------------------

     void fftw_forget_wisdom(void);

Calling `fftw_forget_wisdom' causes all accumulated `wisdom' to be
discarded and its associated memory to be freed. (New `wisdom' can
still be gathered subsequently, however.)


File: fftw3.info, Node: Wisdom Utilities, Prev: Forgetting Wisdom, Up: Wisdom

4.7.4 Wisdom Utilities
----------------------

FFTW includes two standalone utility programs that deal with wisdom. We
merely summarize them here, since they come with their own `man' pages
for Unix and GNU systems (with HTML versions on our web site).

The first program is `fftw-wisdom' (or `fftwf-wisdom' in single
precision, etcetera), which can be used to create a wisdom file
containing plans for any of the transform sizes and types supported by
FFTW. It is preferable to create wisdom directly from your executable
(*note Caveats in Using Wisdom::), but this program is useful for
creating global wisdom files for `fftw_import_system_wisdom'.

The second program is `fftw-wisdom-to-conf', which takes a wisdom
file as input and produces a "configuration routine" as output. The
latter is a C subroutine that you can compile and link into your
program, replacing a routine of the same name in the FFTW library, that
determines which parts of FFTW are callable by your program.
`fftw-wisdom-to-conf' produces a configuration routine that links to
only those parts of FFTW needed by the saved plans in the wisdom,
greatly reducing the size of statically linked executables (which should
only attempt to create plans corresponding to those in the wisdom,
however).


File: fftw3.info, Node: What FFTW Really Computes, Prev: Wisdom, Up: FFTW Reference

4.8 What FFTW Really Computes
=============================

In this section, we provide precise mathematical definitions for the
transforms that FFTW computes. These transform definitions are fairly
standard, but some authors follow slightly different conventions for the
normalization of the transform (the constant factor in front) and the
sign of the complex exponent. We begin by presenting the
one-dimensional (1d) transform definitions, and then give the
straightforward extension to multi-dimensional transforms.

* Menu:

* The 1d Discrete Fourier Transform (DFT)::
* The 1d Real-data DFT::
* 1d Real-even DFTs (DCTs)::
* 1d Real-odd DFTs (DSTs)::
* 1d Discrete Hartley Transforms (DHTs)::
* Multi-dimensional Transforms::


File: fftw3.info, Node: The 1d Discrete Fourier Transform (DFT), Next: The 1d Real-data DFT, Prev: What FFTW Really Computes, Up: What FFTW Really Computes

4.8.1 The 1d Discrete Fourier Transform (DFT)
---------------------------------------------

The forward (`FFTW_FORWARD') discrete Fourier transform (DFT) of a 1d
complex array X of size n computes an array Y, where:

     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .

The backward (`FFTW_BACKWARD') DFT computes:

     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .

FFTW computes an unnormalized transform, in that there is no
coefficient in front of the summation in the DFT. In other words,
applying the forward and then the backward transform will multiply the
input by n.

From above, an `FFTW_FORWARD' transform corresponds to a sign of -1
in the exponent of the DFT. Note also that we use the standard
"in-order" output ordering--the k-th output corresponds to the
frequency k/n (or k/T, where T is your total sampling period). For
those who like to think in terms of positive and negative frequencies,
this means that the positive frequencies are stored in the first half
of the output and the negative frequencies are stored in backwards
order in the second half of the output. (The frequency -k/n is the
same as the frequency (n-k)/n.)
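The in-order layout can be converted to signed frequencies with a one-line mapping. In the sketch below (a hypothetical helper), the result f means frequency f/n, or f/T in physical units. One convention choice is an assumption of this sketch: for even n, index n/2 is both +n/2 and -n/2, and it is reported as positive here. Other libraries may choose the negative value instead.

```c
/* Signed frequency index f for output k (0 <= k < n) of an in-order DFT:
   indices past the middle are negative frequencies, since -k/n is the
   same frequency as (n-k)/n. For even n, k = n/2 is reported as +n/2
   (a convention choice of this sketch). */
static int signed_frequency_index(int k, int n)
{
    return (k <= n / 2) ? k : k - n;
}
```

For n = 8, outputs 0..7 map to 0, 1, 2, 3, 4, -3, -2, -1. For n = 5, they map to 0, 1, 2, -2, -1.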


File: fftw3.info, Node: The 1d Real-data DFT, Next: 1d Real-even DFTs (DCTs), Prev: The 1d Discrete Fourier Transform (DFT), Up: What FFTW Really Computes

4.8.2 The 1d Real-data DFT
--------------------------

The real-input (r2c) DFT in FFTW computes the _forward_ transform Y of
the size `n' real array X, exactly as defined above, i.e.

     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .

This output array Y can easily be shown to possess the "Hermitian"
symmetry Y[k] = Y[n-k]*, where we take Y to be periodic so that Y[n] =
Y[0].

As a result of this symmetry, half of the output Y is redundant
(being the complex conjugate of the other half), and so the 1d r2c
transforms only output elements 0...n/2 of Y (n/2+1 complex numbers),
where the division by 2 is rounded down.

Moreover, the Hermitian symmetry implies that Y[0] and, if n is
even, the Y[n/2] element, are purely real. So, for the `R2HC' r2r
transform, the (zero) imaginary parts of these elements are not stored
in the halfcomplex output format.

The c2r and `HC2R' r2r transforms compute the backward DFT of the
_complex_ array X with Hermitian symmetry, stored in the r2c/`R2HC'
output formats, respectively, where the backward transform is defined
exactly as for the complex case:

     Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .

The outputs `Y' of this transform can easily be seen to be purely
real, and are stored as an array of real numbers.

Like FFTW's complex DFT, these transforms are unnormalized. In other
words, applying the real-to-complex (forward) and then the
complex-to-real (backward) transform will multiply the input by n.


File: fftw3.info, Node: 1d Real-even DFTs (DCTs), Next: 1d Real-odd DFTs (DSTs), Prev: The 1d Real-data DFT, Up: What FFTW Really Computes

4.8.3 1d Real-even DFTs (DCTs)
------------------------------

The real-even symmetry DFTs in FFTW are exactly equivalent to the
unnormalized forward (and backward) DFTs as defined above, where the
input array X of length N is purely real and also has "even" symmetry.
In this case, the output array is likewise real and even.

For the case of `REDFT00', this even symmetry means that X[j] =
X[N-j], where we take X to be periodic so that X[N] = X[0]. Because of
this redundancy, only the first n real numbers are actually stored,
where N = 2(n-1).

The proper definition of even symmetry for `REDFT10', `REDFT01', and
`REDFT11' transforms is somewhat more intricate because of the shifts
by 1/2 of the input and/or output, although the corresponding boundary
conditions are given in *note Real even/odd DFTs (cosine/sine
transforms)::. Because of the even symmetry, however, the sine terms
in the DFT all cancel and the remaining cosine terms are written
explicitly below. This formulation often leads people to call such a
transform a "discrete cosine transform" (DCT), although it is really
just a special case of the DFT.

In each of the definitions below, we transform a real array X of
length n to a real array Y of length n:

REDFT00 (DCT-I)
...............

An `REDFT00' transform (type-I DCT) in FFTW is defined by:

     Y[k] = X[0] + (-1)^k X[n-1]
            + 2 (sum for j = 1 to n-2 of X[j] cos(pi j k / (n-1))) .

Note that this transform is not defined for n=1. For n=2, the
summation term above is dropped as you might expect.
Chris@19 2789
Chris@19 2790 REDFT10 (DCT-II)
Chris@19 2791 ................
Chris@19 2792
Chris@19 2793 An `REDFT10' transform (type-II DCT, sometimes called "the" DCT) in
Chris@19 2794 FFTW is defined by: Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi
Chris@19 2795 (j+1/2) k / n)).
Chris@19 2796
Chris@19 2797 REDFT01 (DCT-III)
Chris@19 2798 .................
Chris@19 2799
Chris@19 2800 An `REDFT01' transform (type-III DCT) in FFTW is defined by: Y[k] =
Chris@19 2801 X[0] + 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)). In the
Chris@19 2802 case of n=1, this reduces to Y[0] = X[0]. Up to a scale factor (see
Chris@19 2803 below), this is the inverse of `REDFT10' ("the" DCT), and so the
Chris@19 2804 `REDFT01' (DCT-III) is sometimes called the "IDCT".
Chris@19 2805
Chris@19 2806 REDFT11 (DCT-IV)
Chris@19 2807 ................
Chris@19 2808
Chris@19 2809 An `REDFT11' transform (type-IV DCT) in FFTW is defined by: Y[k] = 2
Chris@19 2810 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)).
Chris@19 2811
Chris@19 2812 Inverses and Normalization
Chris@19 2813 ..........................
Chris@19 2814
Chris@19 2815 These definitions correspond directly to the unnormalized DFTs used
Chris@19 2816 elsewhere in FFTW (hence the factors of 2 in front of the summations).
Chris@19 2817 The unnormalized inverse of `REDFT00' is `REDFT00', of `REDFT10' is
Chris@19 2818 `REDFT01' and vice versa, and of `REDFT11' is `REDFT11'. Each
Chris@19 2819 unnormalized inverse results in the original array multiplied by N,
Chris@19 2820 where N is the _logical_ DFT size. For `REDFT00', N=2(n-1) (note that
Chris@19 2821 n=1 is not defined); otherwise, N=2n.
Chris@19 2822
Chris@19 2823 In defining the discrete cosine transform, some authors also include
Chris@19 2824 additional factors of sqrt(2) (or its inverse) multiplying selected
Chris@19 2825 inputs and/or outputs. This is a mostly cosmetic change that makes the
Chris@19 2826 transform orthogonal, but sacrifices the direct equivalence to a
Chris@19 2827 symmetric DFT.
Chris@19 2828
Chris@19 2829 
Chris@19 2830 File: fftw3.info, Node: 1d Real-odd DFTs (DSTs), Next: 1d Discrete Hartley Transforms (DHTs), Prev: 1d Real-even DFTs (DCTs), Up: What FFTW Really Computes
Chris@19 2831
Chris@19 2832 4.8.4 1d Real-odd DFTs (DSTs)
Chris@19 2833 -----------------------------
Chris@19 2834
Chris@19 2835 The Real-odd symmetry DFTs in FFTW are exactly equivalent to the
Chris@19 2836 unnormalized forward (and backward) DFTs as defined above, where the
Chris@19 2837 input array X of length N is purely real and also has "odd" symmetry.
Chris@19 2838 In this case, the output has odd symmetry and is purely imaginary.
Chris@19 2839
Chris@19 2840 For the case of `RODFT00', this odd symmetry means that X[j] =
Chris@19 2841 -X[N-j], where we take X to be periodic so that X[N] = X[0]. Because
Chris@19 2842 of this redundancy, only the first n real numbers starting at j=1 are
Chris@19 2843 actually stored (the j=0 element is zero), where N = 2(n+1).
Chris@19 2844
Chris@19 2845 The proper definition of odd symmetry for `RODFT10', `RODFT01', and
Chris@19 2846 `RODFT11' transforms is somewhat more intricate because of the shifts
Chris@19 2847 by 1/2 of the input and/or output, although the corresponding boundary
Chris@19 2848 conditions are given in *note Real even/odd DFTs (cosine/sine
Chris@19 2849 transforms)::. Because of the odd symmetry, however, the cosine terms
Chris@19 2850 in the DFT all cancel and the remaining sine terms are written
Chris@19 2851 explicitly below. This formulation often leads people to call such a
Chris@19 2852 transform a "discrete sine transform" (DST), although it is really just
Chris@19 2853 a special case of the DFT.
Chris@19 2854
Chris@19 2855 In each of the definitions below, we transform a real array X of
Chris@19 2856 length n to a real array Y of length n:
Chris@19 2857
Chris@19 2858 RODFT00 (DST-I)
Chris@19 2859 ...............
Chris@19 2860
Chris@19 2861 An `RODFT00' transform (type-I DST) in FFTW is defined by: Y[k] = 2
Chris@19 2862 (sum for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))).
Chris@19 2863
Chris@19 2864 RODFT10 (DST-II)
Chris@19 2865 ................
Chris@19 2866
Chris@19 2867 An `RODFT10' transform (type-II DST) in FFTW is defined by: Y[k] = 2
Chris@19 2868 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)).
Chris@19 2869
Chris@19 2870 RODFT01 (DST-III)
Chris@19 2871 .................
Chris@19 2872
Chris@19 2873 An `RODFT01' transform (type-III DST) in FFTW is defined by: Y[k] =
Chris@19 2874 (-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) /
Chris@19 2875 n)). In the case of n=1, this reduces to Y[0] = X[0].
Chris@19 2876
Chris@19 2877 RODFT11 (DST-IV)
Chris@19 2878 ................
Chris@19 2879
Chris@19 2880 An `RODFT11' transform (type-IV DST) in FFTW is defined by: Y[k] = 2
Chris@19 2881 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)).
Chris@19 2882
Chris@19 2883 Inverses and Normalization
Chris@19 2884 ..........................
Chris@19 2885
Chris@19 2886 These definitions correspond directly to the unnormalized DFTs used
Chris@19 2887 elsewhere in FFTW (hence the factors of 2 in front of the summations).
Chris@19 2888 The unnormalized inverse of `RODFT00' is `RODFT00', of `RODFT10' is
Chris@19 2889 `RODFT01' and vice versa, and of `RODFT11' is `RODFT11'. Each
Chris@19 2890 unnormalized inverse results in the original array multiplied by N,
Chris@19 2891 where N is the _logical_ DFT size. For `RODFT00', N=2(n+1); otherwise,
Chris@19 2892 N=2n.
Chris@19 2893
Chris@19 2894 In defining the discrete sine transform, some authors also include
Chris@19 2895 additional factors of sqrt(2) (or its inverse) multiplying selected
Chris@19 2896 inputs and/or outputs. This is a mostly cosmetic change that makes the
Chris@19 2897 transform orthogonal, but sacrifices the direct equivalence to an
Chris@19 2898 antisymmetric DFT.
Chris@19 2899
Chris@19 2900 
Chris@19 2901 File: fftw3.info, Node: 1d Discrete Hartley Transforms (DHTs), Next: Multi-dimensional Transforms, Prev: 1d Real-odd DFTs (DSTs), Up: What FFTW Really Computes
Chris@19 2902
Chris@19 2903 4.8.5 1d Discrete Hartley Transforms (DHTs)
Chris@19 2904 -------------------------------------------
Chris@19 2905
Chris@19 2906 The discrete Hartley transform (DHT) of a 1d real array X of size n
Chris@19 2907 computes a real array Y of the same size, where: Y[k] = sum for j = 0
Chris@19 2908 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)].  FFTW
Chris@19 2909 computes an unnormalized transform, in that there is no coefficient in
Chris@19 2910 front of the summation in the DHT. In other words, applying the
Chris@19 2911 transform twice (the DHT is its own inverse) will multiply the input by n.
Chris@19 2912
Chris@19 2913 
Chris@19 2914 File: fftw3.info, Node: Multi-dimensional Transforms, Prev: 1d Discrete Hartley Transforms (DHTs), Up: What FFTW Really Computes
Chris@19 2915
Chris@19 2916 4.8.6 Multi-dimensional Transforms
Chris@19 2917 ----------------------------------
Chris@19 2918
Chris@19 2919 The multi-dimensional transforms of FFTW, in general, compute simply the
Chris@19 2920 separable product of the given 1d transform along each dimension of the
Chris@19 2921 array. Since each of these transforms is unnormalized, computing the
Chris@19 2922 forward followed by the backward/inverse multi-dimensional transform
Chris@19 2923 will result in the original array scaled by the product of the
Chris@19 2924 normalization factors for each dimension (e.g. the product of the
Chris@19 2925 dimension sizes, for a multi-dimensional DFT).
Chris@19 2926
Chris@19 2927 The definition of FFTW's multi-dimensional DFT of real data (r2c)
Chris@19 2928 deserves special attention. In this case, we logically compute the full
Chris@19 2929 multi-dimensional DFT of the input data; since the input data are purely
Chris@19 2930 real, the output data have the Hermitian symmetry and therefore only one
Chris@19 2931 non-redundant half need be stored. More specifically, for an n[0] x
Chris@19 2932 n[1] x n[2] x ... x n[d-1] multi-dimensional real-input DFT, the full
Chris@19 2933 (logical) complex output array Y[k[0], k[1], ..., k[d-1]] has the
Chris@19 2934 symmetry: Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ...,
Chris@19 2935 n[d-1] - k[d-1]]* (where each dimension is periodic). Because of this
Chris@19 2936 symmetry, we only store the k[d-1] = 0...n[d-1]/2 elements of the
Chris@19 2937 _last_ dimension (division by 2 is rounded down). (We could instead
Chris@19 2938 have cut any other dimension in half, but the last dimension proved
Chris@19 2939 computationally convenient.) This results in the peculiar array format
Chris@19 2940 described in more detail by *note Real-data DFT Array Format::.
Chris@19 2941
Chris@19 2942 The multi-dimensional c2r transform is simply the unnormalized
Chris@19 2943 inverse of the r2c transform; i.e., it is the same as FFTW's complex
Chris@19 2944 backward multi-dimensional DFT, operating on a Hermitian input array in
Chris@19 2945 the peculiar format mentioned above and outputting a real array (since
Chris@19 2946 the DFT output is purely real).
Chris@19 2947
Chris@19 2948 We should remind the user that the separable product of 1d transforms
Chris@19 2949 along each dimension, as computed by FFTW, is not always the same thing
Chris@19 2950 as the usual multi-dimensional transform. A multi-dimensional `R2HC'
Chris@19 2951 (or `HC2R') transform is not identical to the multi-dimensional DFT,
Chris@19 2952 requiring some post-processing to combine the requisite real and
Chris@19 2953 imaginary parts, as was described in *note The Halfcomplex-format
Chris@19 2954 DFT::. Likewise, FFTW's multidimensional `FFTW_DHT' r2r transform is
Chris@19 2955 not the same thing as the logical multi-dimensional discrete Hartley
Chris@19 2956 transform defined in the literature, as discussed in *note The Discrete
Chris@19 2957 Hartley Transform::.
Chris@19 2958
Chris@19 2959 
Chris@19 2960 File: fftw3.info, Node: Multi-threaded FFTW, Next: Distributed-memory FFTW with MPI, Prev: FFTW Reference, Up: Top
Chris@19 2961
Chris@19 2962 5 Multi-threaded FFTW
Chris@19 2963 *********************
Chris@19 2964
Chris@19 2965 In this chapter we document the parallel FFTW routines for
Chris@19 2966 shared-memory parallel hardware. These routines, which support
Chris@19 2967 parallel one- and multi-dimensional transforms of both real and complex
Chris@19 2968 data, are the easiest way to take advantage of multiple processors with
Chris@19 2969 FFTW. They work just like the corresponding uniprocessor transform
Chris@19 2970 routines, except that you have an extra initialization routine to call,
Chris@19 2971 and there is a routine to set the number of threads to employ. Any
Chris@19 2972 program that uses the uniprocessor FFTW can therefore be trivially
Chris@19 2973 modified to use the multi-threaded FFTW.
Chris@19 2974
Chris@19 2975 A shared-memory machine is one in which all CPUs can directly access
Chris@19 2976 the same main memory, and such machines are now common due to the
Chris@19 2977 ubiquity of multi-core CPUs. FFTW's multi-threading support allows you
Chris@19 2978 to utilize these additional CPUs transparently from a single program.
Chris@19 2979 However, this does not necessarily translate into performance
Chris@19 2980 gains--when multiple threads/CPUs are employed, there is an overhead
Chris@19 2981 required for synchronization that may outweigh the computational
Chris@19 2982 parallelism. Therefore, you can only benefit from threads if your
Chris@19 2983 problem is sufficiently large.
Chris@19 2984
Chris@19 2985 * Menu:
Chris@19 2986
Chris@19 2987 * Installation and Supported Hardware/Software::
Chris@19 2988 * Usage of Multi-threaded FFTW::
Chris@19 2989 * How Many Threads to Use?::
Chris@19 2990 * Thread safety::
Chris@19 2991
Chris@19 2992 
Chris@19 2993 File: fftw3.info, Node: Installation and Supported Hardware/Software, Next: Usage of Multi-threaded FFTW, Prev: Multi-threaded FFTW, Up: Multi-threaded FFTW
Chris@19 2994
Chris@19 2995 5.1 Installation and Supported Hardware/Software
Chris@19 2996 ================================================
Chris@19 2997
Chris@19 2998 All of the FFTW threads code is located in the `threads' subdirectory
Chris@19 2999 of the FFTW package. On Unix systems, the FFTW threads libraries and
Chris@19 3000 header files can be automatically configured, compiled, and installed
Chris@19 3001 along with the uniprocessor FFTW libraries simply by including
Chris@19 3002 `--enable-threads' in the flags to the `configure' script (*note
Chris@19 3003 Installation on Unix::), or `--enable-openmp' to use OpenMP
Chris@19 3004 (http://www.openmp.org) threads.
Chris@19 3005
Chris@19 3006 The threads routines require your operating system to have some sort
Chris@19 3007 of shared-memory threads support. Specifically, the FFTW threads
Chris@19 3008 package works with POSIX threads (available on most Unix variants, from
Chris@19 3009 GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which are
Chris@19 3010 supported in many common compilers (e.g. gcc), are also supported, and
Chris@19 3011 may give better performance on some systems. (OpenMP threads are also
Chris@19 3012 useful if you are employing OpenMP in your own code, in order to
Chris@19 3013 minimize conflicts between threading models.) If you have a
Chris@19 3014 shared-memory machine that uses a different threads API, it should be a
Chris@19 3015 simple matter of programming to include support for it; see the file
Chris@19 3016 `threads/threads.c' for more detail.
Chris@19 3017
Chris@19 3018 You can compile FFTW with _both_ `--enable-threads' and
Chris@19 3019 `--enable-openmp' at the same time, since they install libraries with
Chris@19 3020 different names (`fftw3_threads' and `fftw3_omp', as described below).
Chris@19 3021 However, your programs may only link to _one_ of these two libraries at
Chris@19 3022 a time.
Chris@19 3023
Chris@19 3024 Ideally, of course, you should also have multiple processors in
Chris@19 3025 order to get any benefit from the threaded transforms.
Chris@19 3026
Chris@19 3027 
Chris@19 3028 File: fftw3.info, Node: Usage of Multi-threaded FFTW, Next: How Many Threads to Use?, Prev: Installation and Supported Hardware/Software, Up: Multi-threaded FFTW
Chris@19 3029
Chris@19 3030 5.2 Usage of Multi-threaded FFTW
Chris@19 3031 ================================
Chris@19 3032
Chris@19 3033 Here, it is assumed that the reader is already familiar with the usage
Chris@19 3034 of the uniprocessor FFTW routines, described elsewhere in this manual.
Chris@19 3035 We only describe what one has to change in order to use the
Chris@19 3036 multi-threaded routines.
Chris@19 3037
Chris@19 3038 First, programs using the parallel transforms should be
Chris@19 3039 linked with `-lfftw3_threads -lfftw3 -lm' on Unix, or `-lfftw3_omp
Chris@19 3040 -lfftw3 -lm' if you compiled with OpenMP. You will also need to link
Chris@19 3041 with whatever library is responsible for threads on your system (e.g.
Chris@19 3042 `-lpthread' on GNU/Linux) or include whatever compiler flag enables
Chris@19 3043 OpenMP (e.g. `-fopenmp' with gcc).
Chris@19 3044
Chris@19 3045 Second, before calling _any_ FFTW routines, you should call the
Chris@19 3046 function:
Chris@19 3047
Chris@19 3048 int fftw_init_threads(void);
Chris@19 3049
Chris@19 3050 This function, which need only be called once, performs any one-time
Chris@19 3051 initialization required to use threads on your system. It returns zero
Chris@19 3052 if there was some error (which should not happen under normal
Chris@19 3053 circumstances) and a non-zero value otherwise.
Chris@19 3054
Chris@19 3055 Third, before creating a plan that you want to parallelize, you
Chris@19 3056 should call:
Chris@19 3057
Chris@19 3058 void fftw_plan_with_nthreads(int nthreads);
Chris@19 3059
Chris@19 3060 The `nthreads' argument indicates the number of threads you want
Chris@19 3061 FFTW to use (or actually, the maximum number). All plans subsequently
Chris@19 3062 created with any planner routine will use that many threads. You can
Chris@19 3063 call `fftw_plan_with_nthreads', create some plans, call
Chris@19 3064 `fftw_plan_with_nthreads' again with a different argument, and create
Chris@19 3065 some more plans for a new number of threads. Plans already created
Chris@19 3066 before a call to `fftw_plan_with_nthreads' are unaffected. If you pass
Chris@19 3067 an `nthreads' argument of `1' (the default), threads are disabled for
Chris@19 3068 subsequent plans.
Chris@19 3069
Chris@19 3070 With OpenMP, to configure FFTW to use all of the currently running
Chris@19 3071 OpenMP threads (set by `omp_set_num_threads(nthreads)' or by the
Chris@19 3072 `OMP_NUM_THREADS' environment variable), you can do:
Chris@19 3073 `fftw_plan_with_nthreads(omp_get_max_threads())'. (The `omp_' OpenMP
Chris@19 3074 functions are declared via `#include <omp.h>'.)
Chris@19 3075
Chris@19 3076 Given a plan, you then execute it as usual with
Chris@19 3077 `fftw_execute(plan)', and the execution will use the number of threads
Chris@19 3078 specified when the plan was created. When done, you destroy it as
Chris@19 3079 usual with `fftw_destroy_plan'. As described in *note Thread safety::,
Chris@19 3080 plan _execution_ is thread-safe, but plan creation and destruction are
Chris@19 3081 _not_: you should create/destroy plans only from a single thread, but
Chris@19 3082 can safely execute multiple plans in parallel.
Chris@19 3083
Chris@19 3084 There is one additional routine: if you want to get rid of all memory
Chris@19 3085 and other resources allocated internally by FFTW, you can call:
Chris@19 3086
Chris@19 3087 void fftw_cleanup_threads(void);
Chris@19 3088
Chris@19 3089 which is much like the `fftw_cleanup()' function except that it also
Chris@19 3090 gets rid of threads-related data. You must _not_ execute any
Chris@19 3091 previously created plans after calling this function.
Chris@19 3092
Chris@19 3093 We should also mention one other restriction: if you save wisdom
Chris@19 3094 from a program using the multi-threaded FFTW, that wisdom _cannot be
Chris@19 3095 used_ by a program using only the single-threaded FFTW (i.e. not calling
Chris@19 3096 `fftw_init_threads'). *Note Words of Wisdom-Saving Plans::.
Chris@19 3097
Chris@19 3098 
Chris@19 3099 File: fftw3.info, Node: How Many Threads to Use?, Next: Thread safety, Prev: Usage of Multi-threaded FFTW, Up: Multi-threaded FFTW
Chris@19 3100
Chris@19 3101 5.3 How Many Threads to Use?
Chris@19 3102 ============================
Chris@19 3103
Chris@19 3104 There is a fair amount of overhead involved in synchronizing threads,
Chris@19 3105 so the optimal number of threads to use depends upon the size of the
Chris@19 3106 transform as well as on the number of processors you have.
Chris@19 3107
Chris@19 3108 As a general rule, you don't want to use more threads than you have
Chris@19 3109 processors. (Using more threads will work, but there will be extra
Chris@19 3110 overhead with no benefit.) In fact, if the problem size is too small,
Chris@19 3111 you may want to use fewer threads than you have processors.
Chris@19 3112
Chris@19 3113 You will have to experiment with your system to see what level of
Chris@19 3114 parallelization is best for your problem size. Typically, the problem
Chris@19 3115 will have to involve at least a few thousand data points before threads
Chris@19 3116 become beneficial. If you plan with `FFTW_PATIENT', it will
Chris@19 3117 automatically disable threads for sizes that don't benefit from
Chris@19 3118 parallelization.
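One way to encode these rules of thumb is the helper below. It is a sketch only: the threshold of 4096 points per thread is a hypothetical stand-in for "a few thousand", and the right value has to be found by experiment as described above. It queries the processor count with `sysconf(_SC_NPROCESSORS_ONLN)', which is widely available but is an extension rather than strict POSIX. It never returns more threads than processors, and it returns fewer for small problems. The result would then be passed to `fftw_plan_with_nthreads' before planning.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical rule of thumb: at least this many data points per thread. */
#define MIN_POINTS_PER_THREAD 4096

/* Number of online processors, or 1 if it cannot be determined. */
long online_processors(void)
{
    long p = sysconf(_SC_NPROCESSORS_ONLN);
    return p > 0 ? p : 1;
}

/* Never more threads than processors, and fewer for small transforms. */
int choose_nthreads(size_t total_points, long nprocs)
{
    size_t by_size = total_points / MIN_POINTS_PER_THREAD;
    if (nprocs < 1 || by_size < 2)
        return 1;
    return by_size < (size_t) nprocs ? (int) by_size : (int) nprocs;
}
```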
Chris@19 3119
Chris@19 3120 
Chris@19 3121 File: fftw3.info, Node: Thread safety, Prev: How Many Threads to Use?, Up: Multi-threaded FFTW
Chris@19 3122
Chris@19 3123 5.4 Thread safety
Chris@19 3124 =================
Chris@19 3125
Chris@19 3126 Users writing multi-threaded programs (including OpenMP) must concern
Chris@19 3127 themselves with the "thread safety" of the libraries they use--that is,
Chris@19 3128 whether it is safe to call routines in parallel from multiple threads.
Chris@19 3129 FFTW can be used in such an environment, but some care must be taken
Chris@19 3130 because the planner routines share data (e.g. wisdom and trigonometric
Chris@19 3131 tables) between calls and plans.
Chris@19 3132
Chris@19 3133 The upshot is that the only thread-safe (re-entrant) routine in FFTW
Chris@19 3134 is `fftw_execute' (and the new-array variants thereof). All other
Chris@19 3135 routines (e.g. the planner) should only be called from one thread at a
Chris@19 3136 time. So, for example, you can wrap a semaphore lock around any calls
Chris@19 3137 to the planner; even more simply, you can just create all of your plans
Chris@19 3138 from one thread. We do not think this should be an important
Chris@19 3139 restriction (FFTW is designed for the situation where the only
Chris@19 3140 performance-sensitive code is the actual execution of the transform),
Chris@19 3141 and the benefits of shared data between plans are great.
Chris@19 3142
Chris@19 3143 Note also that, since the plan is not modified by `fftw_execute', it
Chris@19 3144 is safe to execute the _same plan_ in parallel by multiple threads.
Chris@19 3145 However, since a given plan operates by default on a fixed array, you
Chris@19 3146 need to use one of the new-array execute functions (*note New-array
Chris@19 3147 Execute Functions::) so that different threads compute the transform of
Chris@19 3148 different data.
Chris@19 3149
Chris@19 3150 (Users should note that these comments only apply to programs using
Chris@19 3151 shared-memory threads or OpenMP. Parallelism using MPI or forked
Chris@19 3152 processes involves a separate address-space and global variables for
Chris@19 3153 each process, and is not susceptible to problems of this sort.)
Chris@19 3154
Chris@19 3155 If you configured FFTW with the `--enable-debug' or
Chris@19 3156 `--enable-debug-malloc' flags (*note Installation on Unix::), then
Chris@19 3157 `fftw_execute' is not thread-safe. These flags are not documented
Chris@19 3158 because they are intended only for developing and debugging FFTW, but
Chris@19 3159 if you must use `--enable-debug' then you should also specifically pass
Chris@19 3160 `--disable-debug-malloc' for `fftw_execute' to be thread-safe.
Chris@19 3161
Chris@19 3162 
Chris@19 3163 File: fftw3.info, Node: Distributed-memory FFTW with MPI, Next: Calling FFTW from Modern Fortran, Prev: Multi-threaded FFTW, Up: Top
Chris@19 3164
Chris@19 3165 6 Distributed-memory FFTW with MPI
Chris@19 3166 **********************************
Chris@19 3167
Chris@19 3168 In this chapter we document the parallel FFTW routines for parallel
Chris@19 3169 systems supporting the MPI message-passing interface. Unlike the
Chris@19 3170 shared-memory threads described in the previous chapter, MPI allows you
Chris@19 3171 to use _distributed-memory_ parallelism, where each CPU has its own
Chris@19 3172 separate memory, and which can scale up to clusters of many thousands
Chris@19 3173 of processors. This capability comes at a price, however: each process
Chris@19 3174 only stores a _portion_ of the data to be transformed, which means that
Chris@19 3175 the data structures and programming interface are quite different from
Chris@19 3176 the serial or threads versions of FFTW.
Chris@19 3177
Chris@19 3178 Distributed-memory parallelism is especially useful when you are
Chris@19 3179 transforming arrays so large that they do not fit into the memory of a
Chris@19 3180 single processor. The storage per-process required by FFTW's MPI
Chris@19 3181 routines is proportional to the total array size divided by the number
Chris@19 3182 of processes. Conversely, distributed-memory parallelism can easily
Chris@19 3183 pose an unacceptably high communications overhead for small problems;
Chris@19 3184 the threshold problem size for which parallelism becomes advantageous
Chris@19 3185 will depend on the precise problem you are interested in, your
Chris@19 3186 hardware, and your MPI implementation.
Chris@19 3187
Chris@19 3188 A note on terminology: in MPI, you divide the data among a set of
Chris@19 3189 "processes" which each run in their own memory address space.
Chris@19 3190 Generally, each process runs on a different physical processor, but
Chris@19 3191 this is not required. A set of processes in MPI is described by an
Chris@19 3192 opaque data structure called a "communicator," the most common of which
Chris@19 3193 is the predefined communicator `MPI_COMM_WORLD' which refers to _all_
Chris@19 3194 processes. For more information on these and other concepts common to
Chris@19 3195 all MPI programs, we refer the reader to the documentation at the MPI
Chris@19 3196 home page (http://www.mcs.anl.gov/research/projects/mpi/).
Chris@19 3197
Chris@19 3198 We assume in this chapter that the reader is familiar with the usage
Chris@19 3199 of the serial (uniprocessor) FFTW, and focus only on the concepts new
Chris@19 3200 to the MPI interface.
Chris@19 3201
Chris@19 3202 * Menu:
Chris@19 3203
Chris@19 3204 * FFTW MPI Installation::
Chris@19 3205 * Linking and Initializing MPI FFTW::
Chris@19 3206 * 2d MPI example::
Chris@19 3207 * MPI Data Distribution::
Chris@19 3208 * Multi-dimensional MPI DFTs of Real Data::
Chris@19 3209 * Other Multi-dimensional Real-data MPI Transforms::
Chris@19 3210 * FFTW MPI Transposes::
Chris@19 3211 * FFTW MPI Wisdom::
Chris@19 3212 * Avoiding MPI Deadlocks::
Chris@19 3213 * FFTW MPI Performance Tips::
Chris@19 3214 * Combining MPI and Threads::
Chris@19 3215 * FFTW MPI Reference::
Chris@19 3216 * FFTW MPI Fortran Interface::
Chris@19 3217
Chris@19 3218 
Chris@19 3219 File: fftw3.info, Node: FFTW MPI Installation, Next: Linking and Initializing MPI FFTW, Prev: Distributed-memory FFTW with MPI, Up: Distributed-memory FFTW with MPI
Chris@19 3220
Chris@19 3221 6.1 FFTW MPI Installation
Chris@19 3222 =========================
Chris@19 3223
Chris@19 3224 All of the FFTW MPI code is located in the `mpi' subdirectory of the
Chris@19 3225 FFTW package. On Unix systems, the FFTW MPI libraries and header files
Chris@19 3226 are automatically configured, compiled, and installed along with the
Chris@19 3227 uniprocessor FFTW libraries simply by including `--enable-mpi' in the
Chris@19 3228 flags to the `configure' script (*note Installation on Unix::).
Chris@19 3229
Chris@19 3230 Any implementation of the MPI standard, version 1 or later, should
Chris@19 3231 work with FFTW. The `configure' script will attempt to automatically
Chris@19 3232 detect how to compile and link code using your MPI implementation. In
Chris@19 3233 some cases, especially if you have multiple different MPI
Chris@19 3234 implementations installed or have an unusual MPI software package, you
Chris@19 3235 may need to provide this information explicitly.
Chris@19 3236
Chris@19 3237 Most commonly, one compiles MPI code by invoking a special compiler
Chris@19 3238 command, typically `mpicc' for C code. The `configure' script knows
Chris@19 3239 the most common names for this command, but you can specify the MPI
Chris@19 3240 compilation command explicitly by setting the `MPICC' variable, as in
Chris@19 3241 `./configure MPICC=mpicc ...'.
Chris@19 3242
Chris@19 3243 If, instead of a special compiler command, you need to link a certain
Chris@19 3244 library, you can specify the link command via the `MPILIBS' variable,
Chris@19 3245 as in `./configure MPILIBS=-lmpi ...'. Note that if your MPI library
Chris@19 3246 is installed in a non-standard location (one the compiler does not know
Chris@19 3247 about by default), you may also have to specify the location of the
Chris@19 3248 library and header files via `LDFLAGS' and `CPPFLAGS' variables,
Chris@19 3249 respectively, as in `./configure LDFLAGS=-L/path/to/mpi/libs
Chris@19 3250 CPPFLAGS=-I/path/to/mpi/include ...'.
Chris@19 3251
Chris@19 3252 
Chris@19 3253 File: fftw3.info, Node: Linking and Initializing MPI FFTW, Next: 2d MPI example, Prev: FFTW MPI Installation, Up: Distributed-memory FFTW with MPI
Chris@19 3254
Chris@19 3255 6.2 Linking and Initializing MPI FFTW
Chris@19 3256 =====================================
Chris@19 3257
Chris@19 3258 Programs using the MPI FFTW routines should be linked with `-lfftw3_mpi
Chris@19 3259 -lfftw3 -lm' on Unix in double precision, `-lfftw3f_mpi -lfftw3f -lm'
Chris@19 3260 in single precision, and so on (*note Precision::). You will also need
Chris@19 3261 to link with whatever library is responsible for MPI on your system; in
Chris@19 3262 most MPI implementations, there is a special compiler alias named
Chris@19 3263 `mpicc' to compile and link MPI code.
Chris@19 3264
Chris@19 3265 Before calling any FFTW routines except possibly `fftw_init_threads'
Chris@19 3266 (*note Combining MPI and Threads::), but after calling `MPI_Init', you
Chris@19 3267 should call the function:
Chris@19 3268
Chris@19 3269 void fftw_mpi_init(void);
Chris@19 3270
Chris@19 3271 If, at the end of your program, you want to get rid of all memory and
Chris@19 3272 other resources allocated internally by FFTW, for both the serial and
Chris@19 3273 MPI routines, you can call:
Chris@19 3274
Chris@19 3275 void fftw_mpi_cleanup(void);
Chris@19 3276
Chris@19 3277 which is much like the `fftw_cleanup()' function except that it also
Chris@19 3278 gets rid of FFTW's MPI-related data. You must _not_ execute any
Chris@19 3279 previously created plans after calling this function.
Chris@19 3280
Chris@19 3281 
Chris@19 3282 File: fftw3.info, Node: 2d MPI example, Next: MPI Data Distribution, Prev: Linking and Initializing MPI FFTW, Up: Distributed-memory FFTW with MPI
Chris@19 3283
Chris@19 3284 6.3 2d MPI example
Chris@19 3285 ==================
Chris@19 3286
Chris@19 3287 Before we document the FFTW MPI interface in detail, we begin with a
Chris@19 3288 simple example outlining how one would perform a two-dimensional `N0'
Chris@19 3289 by `N1' complex DFT.
Chris@19 3290
Chris@19 3291 #include <fftw3-mpi.h>
Chris@19 3292
Chris@19 3293 int main(int argc, char **argv)
Chris@19 3294 {
Chris@19 3295 const ptrdiff_t N0 = ..., N1 = ...;
Chris@19 3296 fftw_plan plan;
Chris@19 3297 fftw_complex *data;
Chris@19 3298 ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
Chris@19 3299
Chris@19 3300 MPI_Init(&argc, &argv);
Chris@19 3301 fftw_mpi_init();
Chris@19 3302
Chris@19 3303 /* get local data size and allocate */
Chris@19 3304 alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
Chris@19 3305 &local_n0, &local_0_start);
Chris@19 3306 data = fftw_alloc_complex(alloc_local);
Chris@19 3307
Chris@19 3308 /* create plan for in-place forward DFT */
Chris@19 3309 plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
Chris@19 3310 FFTW_FORWARD, FFTW_ESTIMATE);
Chris@19 3311
Chris@19 3312 /* initialize data to some function my_function(x,y) */
Chris@19 3313 for (i = 0; i < local_n0; ++i) for (j = 0; j < N1; ++j)
Chris@19 3314 data[i*N1 + j] = my_function(local_0_start + i, j);
Chris@19 3315
Chris@19 3316 /* compute transforms, in-place, as many times as desired */
Chris@19 3317 fftw_execute(plan);
Chris@19 3318
Chris@19 3319 fftw_destroy_plan(plan);
Chris@19 3320
Chris@19 3321 MPI_Finalize();
Chris@19 3322 }
Chris@19 3323
Chris@19 3324 As can be seen above, the MPI interface follows the same basic style
Chris@19 3325 of allocate/plan/execute/destroy as the serial FFTW routines. All of
Chris@19 3326 the MPI-specific routines are prefixed with `fftw_mpi_' instead of
Chris@19 3327 `fftw_'. There are a few important differences, however:
Chris@19 3328
Chris@19 3329 First, we must call `fftw_mpi_init()' after calling `MPI_Init'
Chris@19 3330 (required in all MPI programs) and before calling any other `fftw_mpi_'
Chris@19 3331 routine.
Chris@19 3332
Chris@19 3333 Second, when we create the plan with `fftw_mpi_plan_dft_2d',
Chris@19 3334 analogous to `fftw_plan_dft_2d', we pass an additional argument: the
Chris@19 3335 communicator, indicating which processes will participate in the
Chris@19 3336 transform (here `MPI_COMM_WORLD', indicating all processes). Whenever
Chris@19 3337 you create, execute, or destroy a plan for an MPI transform, you must
Chris@19 3338 call the corresponding FFTW routine on _all_ processes in the
Chris@19 3339 communicator for that transform. (That is, these are _collective_
Chris@19 3340 calls.) Note that the plan for the MPI transform uses the standard
`fftw_execute' and `fftw_destroy_plan' routines (on the other hand, there
Chris@19 3342 are MPI-specific new-array execute functions documented below).
Chris@19 3343
Chris@19 3344 Third, all of the FFTW MPI routines take `ptrdiff_t' arguments
Chris@19 3345 instead of `int' as for the serial FFTW. `ptrdiff_t' is a standard C
Chris@19 3346 integer type which is (at least) 32 bits wide on a 32-bit machine and
Chris@19 3347 64 bits wide on a 64-bit machine. This is to make it easy to specify
Chris@19 3348 very large parallel transforms on a 64-bit machine. (You can specify
Chris@19 3349 64-bit transform sizes in the serial FFTW, too, but only by using the
Chris@19 3350 `guru64' planner interface. *Note 64-bit Guru Interface::.)
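The practical difference shows up in element counts. A 65536 by 65536 array has 2^32 elements, which does not fit in a 32-bit `int' but does fit in a 64-bit `ptrdiff_t'. The following sketch is plain C with no FFTW calls; the helper `total_elements_2d' is hypothetical. It computes such a product in `ptrdiff_t' and refuses products that would overflow:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFTW): number of elements in an
   n0 x n1 array, computed in ptrdiff_t rather than int. The check
   against PTRDIFF_MAX rejects products that would overflow (e.g. on
   a 32-bit machine) instead of letting them wrap. */
static ptrdiff_t total_elements_2d(ptrdiff_t n0, ptrdiff_t n1)
{
    assert(n0 >= 0 && n1 >= 0);
    assert(n1 == 0 || n0 <= PTRDIFF_MAX / n1);
    return n0 * n1;
}
```

On a typical 64-bit machine, `total_elements_2d(65536, 65536)' returns 4294967296, a value that 32-bit `int' arithmetic cannot represent.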
Chris@19 3351
Chris@19 3352 Fourth, and most importantly, you don't allocate the entire
Chris@19 3353 two-dimensional array on each process. Instead, you call
Chris@19 3354 `fftw_mpi_local_size_2d' to find out what _portion_ of the array
resides on each process, and how much space to allocate. Here, the
Chris@19 3356 portion of the array on each process is a `local_n0' by `N1' slice of
Chris@19 3357 the total array, starting at index `local_0_start'. The total number
Chris@19 3358 of `fftw_complex' numbers to allocate is given by the `alloc_local'
Chris@19 3359 return value, which _may_ be greater than `local_n0 * N1' (in case some
Chris@19 3360 intermediate calculations require additional storage). The data
Chris@19 3361 distribution in FFTW's MPI interface is described in more detail by the
Chris@19 3362 next section.
Chris@19 3363
Chris@19 3364 Given the portion of the array that resides on the local process, it
Chris@19 3365 is straightforward to initialize the data (here to a function
`my_function') and otherwise manipulate it. Of course, at the end of
Chris@19 3367 the program you may want to output the data somehow, but synchronizing
Chris@19 3368 this output is up to you and is beyond the scope of this manual. (One
Chris@19 3369 good way to output a large multi-dimensional distributed array in MPI
Chris@19 3370 to a portable binary file is to use the free HDF5 library; see the HDF
Chris@19 3371 home page (http://www.hdfgroup.org/).)
Chris@19 3372
Chris@19 3373 
Chris@19 3374 File: fftw3.info, Node: MPI Data Distribution, Next: Multi-dimensional MPI DFTs of Real Data, Prev: 2d MPI example, Up: Distributed-memory FFTW with MPI
Chris@19 3375
Chris@19 3376 6.4 MPI Data Distribution
Chris@19 3377 =========================
Chris@19 3378
Chris@19 3379 The most important concept to understand in using FFTW's MPI interface
Chris@19 3380 is the data distribution. With a serial or multithreaded FFT, all of
Chris@19 3381 the inputs and outputs are stored as a single contiguous chunk of
Chris@19 3382 memory. With a distributed-memory FFT, the inputs and outputs are
Chris@19 3383 broken into disjoint blocks, one per process.
Chris@19 3384
Chris@19 3385 In particular, FFTW uses a _1d block distribution_ of the data,
Chris@19 3386 distributed along the _first dimension_. For example, if you want to
Chris@19 3387 perform a 100 x 200 complex DFT, distributed over 4 processes, each
Chris@19 3388 process will get a 25 x 200 slice of the data. That is, process 0
Chris@19 3389 will get rows 0 through 24, process 1 will get rows 25 through 49,
Chris@19 3390 process 2 will get rows 50 through 74, and process 3 will get rows 75
Chris@19 3391 through 99. If you take the same array but distribute it over 3
Chris@19 3392 processes, then it is not evenly divisible so the different processes
Chris@19 3393 will have unequal chunks. FFTW's default choice in this case is to
Chris@19 3394 assign 34 rows to processes 0 and 1, and 32 rows to process 2.
Chris@19 3395
Chris@19 3396 FFTW provides several `fftw_mpi_local_size' routines that you can
Chris@19 3397 call to find out what portion of an array is stored on the current
Chris@19 3398 process. In most cases, you should use the default block sizes picked
Chris@19 3399 by FFTW, but it is also possible to specify your own block size. For
Chris@19 3400 example, with a 100 x 200 array on three processes, you can tell FFTW
Chris@19 3401 to use a block size of 40, which would assign 40 rows to processes 0
Chris@19 3402 and 1, and 20 rows to process 2. FFTW's default is to divide the data
Chris@19 3403 equally among the processes if possible, and as best it can otherwise.
Chris@19 3404 The rows are always assigned in "rank order," i.e. process 0 gets the
Chris@19 3405 first block of rows, then process 1, and so on. (You can change this
Chris@19 3406 by using `MPI_Comm_split' to create a new communicator with re-ordered
Chris@19 3407 processes.) However, you should always call the `fftw_mpi_local_size'
Chris@19 3408 routines, if possible, rather than trying to predict FFTW's
Chris@19 3409 distribution choices.
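The arithmetic behind these examples can be sketched in a few lines of plain C. This only illustrates the rules stated above (a default block of `n0 / P' rounded up, full blocks assigned in rank order, then the remainder, then empty processes); the function names are hypothetical, and real programs should still call the `fftw_mpi_local_size' routines rather than predict FFTW's choices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not FFTW functions) reproducing the 1d block
   distribution rules described above for n0 rows on P processes. */

/* Default block size: n0 / P, rounded up. */
static ptrdiff_t default_block(ptrdiff_t n0, ptrdiff_t P)
{
    assert(n0 >= 0 && P > 0);
    return (n0 + P - 1) / P;
}

/* Rows stored on `rank' (the local_n0 analogue) for a given block size:
   full blocks in rank order, then the remainder, then zero. */
static ptrdiff_t rows_on_rank(ptrdiff_t n0, ptrdiff_t block, ptrdiff_t rank)
{
    assert(block > 0 && rank >= 0);
    ptrdiff_t remaining = n0 - rank * block;   /* rows not taken by lower ranks */
    if (remaining <= 0)
        return 0;
    return remaining < block ? remaining : block;
}

/* First global row on `rank' (the local_0_start analogue); clamped to n0
   for ranks that store no rows, which is a choice made by this sketch. */
static ptrdiff_t first_row_on_rank(ptrdiff_t n0, ptrdiff_t block, ptrdiff_t rank)
{
    assert(block > 0 && rank >= 0);
    ptrdiff_t start = rank * block;
    return start < n0 ? start : n0;
}
```

For example, `rows_on_rank(100, default_block(100, 3), r)' gives 34, 34 and 32 for r = 0, 1, 2, and a block size of 40 gives 40, 40 and 20, matching the text.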
Chris@19 3410
Chris@19 3411 In particular, it is critical that you allocate the storage size that
Chris@19 3412 is returned by `fftw_mpi_local_size', which is _not_ necessarily the
Chris@19 3413 size of the local slice of the array. The reason is that intermediate
Chris@19 3414 steps of FFTW's algorithms involve transposing the array and
Chris@19 3415 redistributing the data, so at these intermediate steps FFTW may
Chris@19 3416 require more local storage space (albeit always proportional to the
Chris@19 3417 total size divided by the number of processes). The
Chris@19 3418 `fftw_mpi_local_size' functions know how much storage is required for
Chris@19 3419 these intermediate steps and tell you the correct amount to allocate.
Chris@19 3420
Chris@19 3421 * Menu:
Chris@19 3422
Chris@19 3423 * Basic and advanced distribution interfaces::
Chris@19 3424 * Load balancing::
Chris@19 3425 * Transposed distributions::
Chris@19 3426 * One-dimensional distributions::
Chris@19 3427
Chris@19 3428 
Chris@19 3429 File: fftw3.info, Node: Basic and advanced distribution interfaces, Next: Load balancing, Prev: MPI Data Distribution, Up: MPI Data Distribution
Chris@19 3430
Chris@19 3431 6.4.1 Basic and advanced distribution interfaces
Chris@19 3432 ------------------------------------------------
Chris@19 3433
Chris@19 3434 As with the planner interface, the `fftw_mpi_local_size' distribution
Chris@19 3435 interface is broken into basic and advanced (`_many') interfaces, where
Chris@19 3436 the latter allows you to specify the block size manually and also to
Chris@19 3437 request block sizes when computing multiple transforms simultaneously.
Chris@19 3438 These functions are documented more exhaustively by the FFTW MPI
Chris@19 3439 Reference, but we summarize the basic ideas here using a couple of
Chris@19 3440 two-dimensional examples.
Chris@19 3441
Chris@19 3442 For the 100 x 200 complex-DFT example, above, we would find the
Chris@19 3443 distribution by calling the following function in the basic interface:
Chris@19 3444
Chris@19 3445 ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
Chris@19 3446 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
Chris@19 3447
Chris@19 3448 Given the total size of the data to be transformed (here, `n0 = 100'
Chris@19 3449 and `n1 = 200') and an MPI communicator (`comm'), this function
Chris@19 3450 provides three numbers.
Chris@19 3451
Chris@19 3452 First, it describes the shape of the local data: the current process
Chris@19 3453 should store a `local_n0' by `n1' slice of the overall dataset, in
Chris@19 3454 row-major order (`n1' dimension contiguous), starting at index
`local_0_start'. That is, if the total dataset is viewed as an `n0' by
Chris@19 3456 `n1' matrix, the current process should store the rows `local_0_start'
Chris@19 3457 to `local_0_start+local_n0-1'. Obviously, if you are running with only
Chris@19 3458 a single MPI process, that process will store the entire array:
Chris@19 3459 `local_0_start' will be zero and `local_n0' will be `n0'. *Note
Chris@19 3460 Row-major Format::.
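Converting between global and local indices follows directly from this description. A minimal sketch, with hypothetical helper names, that takes `local_n0' and `local_0_start' as given (for example, as returned by `fftw_mpi_local_size_2d'):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not FFTW functions). local_n0 and local_0_start are
   the values reported for this process by fftw_mpi_local_size_2d. */

/* Nonzero if this process stores global row i0. */
static int owns_row(ptrdiff_t i0, ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    return i0 >= local_0_start && i0 < local_0_start + local_n0;
}

/* Offset of global element (i0, j) in the local row-major buffer
   (n1 dimension contiguous); only valid if owns_row() is true. */
static ptrdiff_t local_offset(ptrdiff_t i0, ptrdiff_t j, ptrdiff_t n1,
                              ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(owns_row(i0, local_n0, local_0_start));
    assert(j >= 0 && j < n1);
    return (i0 - local_0_start) * n1 + j;
}
```

With the 100 x 200 array on 4 processes, process 1 has `local_n0 = 25' and `local_0_start = 25', so global element (49, 199) is at local offset 24*200 + 199 = 4999.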
Chris@19 3461
Chris@19 3462 Second, the return value is the total number of data elements (e.g.,
Chris@19 3463 complex numbers for a complex DFT) that should be allocated for the
Chris@19 3464 input and output arrays on the current process (ideally with
Chris@19 3465 `fftw_malloc' or an `fftw_alloc' function, to ensure optimal
Chris@19 3466 alignment). It might seem that this should always be equal to
Chris@19 3467 `local_n0 * n1', but this is _not_ the case. FFTW's distributed FFT
Chris@19 3468 algorithms require data redistributions at intermediate stages of the
Chris@19 3469 transform, and in some circumstances this may require slightly larger
Chris@19 3470 local storage. This is discussed in more detail below, under *note
Chris@19 3471 Load balancing::.
Chris@19 3472
Chris@19 3473 The advanced-interface `local_size' function for multidimensional
Chris@19 3474 transforms returns the same three things (`local_n0', `local_0_start',
Chris@19 3475 and the total number of elements to allocate), but takes more inputs:
Chris@19 3476
Chris@19 3477 ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
Chris@19 3478 ptrdiff_t howmany,
Chris@19 3479 ptrdiff_t block0,
Chris@19 3480 MPI_Comm comm,
Chris@19 3481 ptrdiff_t *local_n0,
Chris@19 3482 ptrdiff_t *local_0_start);
Chris@19 3483
Chris@19 3484 The two-dimensional case above corresponds to `rnk = 2' and an array
Chris@19 3485 `n' of length 2 with `n[0] = n0' and `n[1] = n1'. This routine is for
Chris@19 3486 any `rnk > 1'; one-dimensional transforms have their own interface
Chris@19 3487 because they work slightly differently, as discussed below.
Chris@19 3488
Chris@19 3489 First, the advanced interface allows you to perform multiple
Chris@19 3490 transforms at once, of interleaved data, as specified by the `howmany'
parameter. (`howmany' is 1 for a single transform.)
Chris@19 3492
Second, here you can specify your desired block size in the `n0'
dimension, `block0'. To use FFTW's default block size, pass
`FFTW_MPI_DEFAULT_BLOCK' (0) for `block0'. Otherwise, FFTW will return
`local_n0' equal to `block0' on the first `n0 / block0' processes
(rounded down), return `local_n0' equal to `n0 - block0 * (n0 /
block0)' on the next process, and `local_n0' equal to zero on any
remaining processes. In general, we recommend using the default block
size (which corresponds to `n0 / P', rounded up).
Chris@19 3501
For example, suppose you have `P = 4' processes and `n0 = 21'. The
default will be a block size of `6', which will give `local_n0 = 6' on
the first three processes and `local_n0 = 3' on the last process.
Instead, however, you could specify `block0 = 7' if you wanted, which
would give `local_n0 = 7' on processes 0 to 2 and `local_n0 = 0' on
process 3. (This choice leaves process 3 idle and increases the
largest local slice from 6 rows to 7, so it is unlikely to improve on
FFTW's default.)
Chris@19 3510
Chris@19 3511 
Chris@19 3512 File: fftw3.info, Node: Load balancing, Next: Transposed distributions, Prev: Basic and advanced distribution interfaces, Up: MPI Data Distribution
Chris@19 3513
Chris@19 3514 6.4.2 Load balancing
Chris@19 3515 --------------------
Chris@19 3516
Chris@19 3517 Ideally, when you parallelize a transform over some P processes, each
Chris@19 3518 process should end up with work that takes equal time. Otherwise, all
Chris@19 3519 of the processes end up waiting on whichever process is slowest. This
Chris@19 3520 goal is known as "load balancing." In this section, we describe the
Chris@19 3521 circumstances under which FFTW is able to load-balance well, and in
Chris@19 3522 particular how you should choose your transform size in order to load
Chris@19 3523 balance.
Chris@19 3524
Chris@19 3525 Load balancing is especially difficult when you are parallelizing
over heterogeneous machines; for example, if one of your processors is an
old 486 and another is a Pentium IV, obviously you should give the
Chris@19 3528 Pentium more work to do than the 486 since the latter is much slower.
Chris@19 3529 FFTW does not deal with this problem, however--it assumes that your
Chris@19 3530 processes run on hardware of comparable speed, and that the goal is
Chris@19 3531 therefore to divide the problem as equally as possible.
Chris@19 3532
Chris@19 3533 For a multi-dimensional complex DFT, FFTW can divide the problem
Chris@19 3534 equally among the processes if: (i) the _first_ dimension `n0' is
Chris@19 3535 divisible by P; and (ii), the _product_ of the subsequent dimensions is
Chris@19 3536 divisible by P. (For the advanced interface, where you can specify
Chris@19 3537 multiple simultaneous transforms via some "vector" length `howmany', a
Chris@19 3538 factor of `howmany' is included in the product of the subsequent
Chris@19 3539 dimensions.)
Chris@19 3540
Chris@19 3541 For a one-dimensional complex DFT, the length `N' of the data should
Chris@19 3542 be divisible by P _squared_ to be able to divide the problem equally
Chris@19 3543 among the processes.
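The two conditions above can be written as simple predicates. This is a sketch with hypothetical function names, not an FFTW API; it only restates the divisibility rules in code, with `howmany' folded into the product of the trailing dimensions as described for the advanced interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checks (not FFTW functions) for the even-division
   conditions stated above, for P processes. */

/* Multi-dimensional complex DFT with dimensions n[0..rnk-1] and `howmany'
   interleaved transforms: n[0] divisible by P, and howmany times the
   product of n[1..rnk-1] divisible by P. The product is reduced modulo P
   at each step so that it cannot overflow. */
static int nd_divides_evenly(int rnk, const ptrdiff_t *n, ptrdiff_t howmany, ptrdiff_t P)
{
    assert(rnk >= 2 && P > 0 && howmany > 0);
    ptrdiff_t rest = howmany % P;
    for (int d = 1; d < rnk; ++d)
        rest = (rest * (n[d] % P)) % P;
    return n[0] % P == 0 && rest == 0;
}

/* One-dimensional complex DFT of length N: N divisible by P squared. */
static int oned_divides_evenly(ptrdiff_t N, ptrdiff_t P)
{
    assert(P > 0);
    return N % (P * P) == 0;
}
```

A 12 x 5 x 5 transform on 4 processes shows the role of `howmany': with a single transform the trailing product 25 is not divisible by 4, but with `howmany = 4' the product becomes 100, which is.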
Chris@19 3544
Chris@19 3545 
Chris@19 3546 File: fftw3.info, Node: Transposed distributions, Next: One-dimensional distributions, Prev: Load balancing, Up: MPI Data Distribution
Chris@19 3547
Chris@19 3548 6.4.3 Transposed distributions
Chris@19 3549 ------------------------------
Chris@19 3550
Chris@19 3551 Internally, FFTW's MPI transform algorithms work by first computing
Chris@19 3552 transforms of the data local to each process, then by globally
Chris@19 3553 _transposing_ the data in some fashion to redistribute the data among
Chris@19 3554 the processes, transforming the new data local to each process, and
Chris@19 3555 transposing back. For example, a two-dimensional `n0' by `n1' array,
distributed across the `n0' dimension, is transformed by: (i)
transforming the `n1' dimension, which is local to each process; (ii)
Chris@19 3558 transposing to an `n1' by `n0' array, distributed across the `n1'
Chris@19 3559 dimension; (iii) transforming the `n0' dimension, which is now local to
Chris@19 3560 each process; (iv) transposing back.
Chris@19 3561
Chris@19 3562 However, in many applications it is acceptable to compute a
Chris@19 3563 multidimensional DFT whose results are produced in transposed order
Chris@19 3564 (e.g., `n1' by `n0' in two dimensions). This provides a significant
Chris@19 3565 performance advantage, because it means that the final transposition
Chris@19 3566 step can be omitted. FFTW supports this optimization, which you
Chris@19 3567 specify by passing the flag `FFTW_MPI_TRANSPOSED_OUT' to the planner
Chris@19 3568 routines. To compute the inverse transform of transposed output, you
Chris@19 3569 specify `FFTW_MPI_TRANSPOSED_IN' to tell it that the input is
Chris@19 3570 transposed. In this section, we explain how to interpret the output
Chris@19 3571 format of such a transform.
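The four-step scheme, and the effect of omitting the final transposition, can be checked serially without MPI. This sketch is not FFTW's algorithm or code. It uses a naive DFT per row with the forward sign convention, imitates the global transposes with ordinary in-memory transposes, and restricts lengths to divisors of 8 so that exact twiddle factors can be tabulated without the math library. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct { double re, im; } cplx;

/* e^{-2*pi*i*m/8} for m = 0..7: the forward (FFTW_FORWARD) sign convention. */
#define S8 0.70710678118654752440
static const cplx W8[8] = {
    {1, 0}, {S8, -S8}, {0, -1}, {-S8, -S8}, {-1, 0}, {-S8, S8}, {0, 1}, {S8, S8}
};

/* Twiddle e^{-2*pi*i*k*x/len}; valid only when len divides 8. */
static cplx twiddle(size_t k, size_t x, size_t len) { return W8[(k * x % len) * (8 / len)]; }

static cplx cmul(cplx a, cplx b)
{
    cplx c = {a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
    return c;
}

/* Naive forward DFT of `rows' contiguous rows of length `len', in place. */
static void dft_rows(cplx *a, size_t rows, size_t len)
{
    cplx tmp[8];
    assert(len > 0 && 8 % len == 0);
    for (size_t r = 0; r < rows; ++r) {
        cplx *row = a + r * len;
        for (size_t k = 0; k < len; ++k) {
            cplx sum = {0, 0};
            for (size_t x = 0; x < len; ++x) {
                cplx p = cmul(row[x], twiddle(k, x, len));
                sum.re += p.re; sum.im += p.im;
            }
            tmp[k] = sum;
        }
        memcpy(row, tmp, len * sizeof *row);
    }
}

/* Out-of-place transpose: n0 x n1 row-major -> n1 x n0 row-major. */
static void transpose(const cplx *in, cplx *out, size_t n0, size_t n1)
{
    for (size_t i = 0; i < n0; ++i)
        for (size_t j = 0; j < n1; ++j)
            out[j * n0 + i] = in[i * n1 + j];
}

/* Steps (i)-(iv) from the text. If transposed_out is nonzero, step (iv) is
   skipped and `a' is left in n1 x n0 order (the FFTW_MPI_TRANSPOSED_OUT idea). */
static void dft2d_by_transposes(cplx *a, size_t n0, size_t n1, int transposed_out)
{
    cplx *t = malloc(n0 * n1 * sizeof *t);
    assert(t != NULL);
    dft_rows(a, n0, n1);                    /* (i)   transform along n1 */
    transpose(a, t, n0, n1);                /* (ii)  now n1 x n0        */
    dft_rows(t, n1, n0);                    /* (iii) transform along n0 */
    if (transposed_out)
        memcpy(a, t, n0 * n1 * sizeof *a);  /* leave result as n1 x n0  */
    else
        transpose(t, a, n1, n0);            /* (iv)  back to n0 x n1    */
    free(t);
}

/* Self-check against the direct 2d DFT definition on a fixed input (n0*n1 <= 64);
   returns the largest absolute difference in any real or imaginary part. */
static double max_error_vs_direct(size_t n0, size_t n1, int transposed_out)
{
    cplx a[64], ref[64];
    double err = 0;
    assert(n0 * n1 <= 64);
    for (size_t i = 0; i < n0 * n1; ++i) { a[i].re = (double)(i % 5) - 1.5; a[i].im = (double)(i % 3); }
    for (size_t k0 = 0; k0 < n0; ++k0)
        for (size_t k1 = 0; k1 < n1; ++k1) {
            cplx sum = {0, 0};
            for (size_t x0 = 0; x0 < n0; ++x0)
                for (size_t x1 = 0; x1 < n1; ++x1) {
                    cplx p = cmul(cmul(a[x0 * n1 + x1], twiddle(k0, x0, n0)), twiddle(k1, x1, n1));
                    sum.re += p.re; sum.im += p.im;
                }
            ref[k0 * n1 + k1] = sum;
        }
    dft2d_by_transposes(a, n0, n1, transposed_out);
    for (size_t k0 = 0; k0 < n0; ++k0)
        for (size_t k1 = 0; k1 < n1; ++k1) {
            cplx got = transposed_out ? a[k1 * n0 + k0] : a[k0 * n1 + k1];
            double dr = got.re - ref[k0 * n1 + k1].re, di = got.im - ref[k0 * n1 + k1].im;
            if (dr < 0) dr = -dr;
            if (di < 0) di = -di;
            if (dr > err) err = dr;
            if (di > err) err = di;
        }
    return err;
}
```

With `transposed_out' set, output element (k0, k1) is found at position `k1*n0 + k0', i.e. in n1 x n0 order, and one full transposition has been saved.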
Chris@19 3572
Suppose you are transforming multi-dimensional data with (at
Chris@19 3574 least two) dimensions n[0] x n[1] x n[2] x ... x n[d-1] . As always,
Chris@19 3575 it is distributed along the first dimension n[0] . Now, if we compute
Chris@19 3576 its DFT with the `FFTW_MPI_TRANSPOSED_OUT' flag, the resulting output
Chris@19 3577 data are stored with the first _two_ dimensions transposed: n[1] x n[0]
Chris@19 3578 x n[2] x ... x n[d-1] , distributed along the n[1] dimension.
Chris@19 3579 Conversely, if we take the n[1] x n[0] x n[2] x ... x n[d-1] data and
Chris@19 3580 transform it with the `FFTW_MPI_TRANSPOSED_IN' flag, then the format
Chris@19 3581 goes back to the original n[0] x n[1] x n[2] x ... x n[d-1] array.
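In terms of storage, the only change is the order of the first two indices. A minimal sketch with a hypothetical helper that computes global row-major offsets, before any distribution across processes is taken into account:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function): row-major offset of the
   multi-index k in an array of dimensions n[0..d-1]. If transposed is
   nonzero, the first two dimensions are stored swapped, i.e. as
   n[1] x n[0] x n[2] x ... x n[d-1], as for FFTW_MPI_TRANSPOSED_OUT output. */
static ptrdiff_t offset_nd(int d, const ptrdiff_t *n, const ptrdiff_t *k, int transposed)
{
    assert(d >= 2);
    ptrdiff_t off = transposed ? k[1] * n[0] + k[0] : k[0] * n[1] + k[1];
    for (int i = 2; i < d; ++i)
        off = off * n[i] + k[i];
    return off;
}
```

In the transposed layout, successive values of the n[1] index select consecutive blocks of n[0] x n[2] x ... x n[d-1] elements, which is why the data can be block-distributed along n[1].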
Chris@19 3582
Chris@19 3583 There are two ways to find the portion of the transposed array that
Chris@19 3584 resides on the current process. First, you can simply call the
Chris@19 3585 appropriate `local_size' function, passing n[1] x n[0] x n[2] x ... x
Chris@19 3586 n[d-1] (the transposed dimensions). This would mean calling the
Chris@19 3587 `local_size' function twice, once for the transposed and once for the
Chris@19 3588 non-transposed dimensions. Alternatively, you can call one of the
Chris@19 3589 `local_size_transposed' functions, which returns both the
Chris@19 3590 non-transposed and transposed data distribution from a single call.
Chris@19 3591 For example, for a 3d transform with transposed output (or input), you
Chris@19 3592 might call:
Chris@19 3593
Chris@19 3594 ptrdiff_t fftw_mpi_local_size_3d_transposed(
Chris@19 3595 ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
Chris@19 3596 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@19 3597 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@19 3598
Chris@19 3599 Here, `local_n0' and `local_0_start' give the size and starting
Chris@19 3600 index of the `n0' dimension for the _non_-transposed data, as in the
Chris@19 3601 previous sections. For _transposed_ data (e.g. the output for
Chris@19 3602 `FFTW_MPI_TRANSPOSED_OUT'), `local_n1' and `local_1_start' give the
Chris@19 3603 size and starting index of the `n1' dimension, which is the first
Chris@19 3604 dimension of the transposed data (`n1' by `n0' by `n2').
Chris@19 3605
Chris@19 3606 (Note that `FFTW_MPI_TRANSPOSED_IN' is completely equivalent to
Chris@19 3607 performing `FFTW_MPI_TRANSPOSED_OUT' and passing the first two
Chris@19 3608 dimensions to the planner in reverse order, or vice versa. If you pass
Chris@19 3609 _both_ the `FFTW_MPI_TRANSPOSED_IN' and `FFTW_MPI_TRANSPOSED_OUT'
Chris@19 3610 flags, it is equivalent to swapping the first two dimensions passed to
Chris@19 3611 the planner and passing _neither_ flag.)
Chris@19 3612
Chris@19 3613 
Chris@19 3614 File: fftw3.info, Node: One-dimensional distributions, Prev: Transposed distributions, Up: MPI Data Distribution
Chris@19 3615
Chris@19 3616 6.4.4 One-dimensional distributions
Chris@19 3617 -----------------------------------
Chris@19 3618
Chris@19 3619 For one-dimensional distributed DFTs using FFTW, matters are slightly
Chris@19 3620 more complicated because the data distribution is more closely tied to
Chris@19 3621 how the algorithm works. In particular, you can no longer pass an
Chris@19 3622 arbitrary block size and must accept FFTW's default; also, the block
Chris@19 3623 sizes may be different for input and output. Also, the data
Chris@19 3624 distribution depends on the flags and transform direction, in order for
Chris@19 3625 forward and backward transforms to work correctly.
Chris@19 3626
Chris@19 3627 ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
Chris@19 3628 int sign, unsigned flags,
Chris@19 3629 ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
Chris@19 3630 ptrdiff_t *local_no, ptrdiff_t *local_o_start);
Chris@19 3631
Chris@19 3632 This function computes the data distribution for a 1d transform of
Chris@19 3633 size `n0' with the given transform `sign' and `flags'. Both input and
Chris@19 3634 output data use block distributions. The input on the current process
Chris@19 3635 will consist of `local_ni' numbers starting at index `local_i_start';
Chris@19 3636 e.g. if only a single process is used, then `local_ni' will be `n0' and
Chris@19 3637 `local_i_start' will be `0'. Similarly for the output, with `local_no'
Chris@19 3638 numbers starting at index `local_o_start'. The return value of
Chris@19 3639 `fftw_mpi_local_size_1d' will be the total number of elements to
Chris@19 3640 allocate on the current process (which might be slightly larger than
Chris@19 3641 the local size due to intermediate steps in the algorithm).
Chris@19 3642
Chris@19 3643 As mentioned above (*note Load balancing::), the data will be divided
Chris@19 3644 equally among the processes if `n0' is divisible by the _square_ of the
Chris@19 3645 number of processes. In this case, `local_ni' will equal `local_no'.
Chris@19 3646 Otherwise, they may be different.
Chris@19 3647
Chris@19 3648 For some applications, such as convolutions, the order of the output
Chris@19 3649 data is irrelevant. In this case, performance can be improved by
Chris@19 3650 specifying that the output data be stored in an FFTW-defined
Chris@19 3651 "scrambled" format. (In particular, this is the analogue of transposed
Chris@19 3652 output in the multidimensional case: scrambled output saves a
Chris@19 3653 communications step.) If you pass `FFTW_MPI_SCRAMBLED_OUT' in the
Chris@19 3654 flags, then the output is stored in this (undocumented) scrambled
Chris@19 3655 order. Conversely, to perform the inverse transform of data in
Chris@19 3656 scrambled order, pass the `FFTW_MPI_SCRAMBLED_IN' flag.
Chris@19 3657
Chris@19 3658 In MPI FFTW, only composite sizes `n0' can be parallelized; we have
Chris@19 3659 not yet implemented a parallel algorithm for large prime sizes.
Chris@19 3660
Chris@19 3661 
Chris@19 3662 File: fftw3.info, Node: Multi-dimensional MPI DFTs of Real Data, Next: Other Multi-dimensional Real-data MPI Transforms, Prev: MPI Data Distribution, Up: Distributed-memory FFTW with MPI
Chris@19 3663
Chris@19 3664 6.5 Multi-dimensional MPI DFTs of Real Data
Chris@19 3665 ===========================================
Chris@19 3666
Chris@19 3667 FFTW's MPI interface also supports multi-dimensional DFTs of real data,
Chris@19 3668 similar to the serial r2c and c2r interfaces. (Parallel
Chris@19 3669 one-dimensional real-data DFTs are not currently supported; you must
Chris@19 3670 use a complex transform and set the imaginary parts of the inputs to
Chris@19 3671 zero.)
Chris@19 3672
Chris@19 3673 The key points to understand for r2c and c2r MPI transforms (compared
to the MPI complex DFTs or the serial r2c/c2r transforms) are:
Chris@19 3675
Chris@19 3676 * Just as for serial transforms, r2c/c2r DFTs transform n[0] x n[1]
Chris@19 3677 x n[2] x ... x n[d-1] real data to/from n[0] x n[1] x n[2] x ...
Chris@19 3678 x (n[d-1]/2 + 1) complex data: the last dimension of the complex
Chris@19 3679 data is cut in half (rounded down), plus one. As for the serial
Chris@19 3680 transforms, the sizes you pass to the `plan_dft_r2c' and
Chris@19 3681 `plan_dft_c2r' are the n[0] x n[1] x n[2] x ... x n[d-1]
Chris@19 3682 dimensions of the real data.
Chris@19 3683
Chris@19 3684 * Although the real data is _conceptually_ n[0] x n[1] x n[2] x ...
Chris@19 3685 x n[d-1] , it is _physically_ stored as an n[0] x n[1] x n[2] x
Chris@19 3686 ... x [2 (n[d-1]/2 + 1)] array, where the last dimension has been
Chris@19 3687 _padded_ to make it the same size as the complex output. This is
Chris@19 3688 much like the in-place serial r2c/c2r interface (*note
Chris@19 3689 Multi-Dimensional DFTs of Real Data::), except that in MPI the
Chris@19 3690 padding is required even for out-of-place data. The extra padding
Chris@19 3691 numbers are ignored by FFTW (they are _not_ like zero-padding the
Chris@19 3692 transform to a larger size); they are only used to determine the
Chris@19 3693 data layout.
Chris@19 3694
Chris@19 3695 * The data distribution in MPI for _both_ the real and complex data
Chris@19 3696 is determined by the shape of the _complex_ data. That is, you
call the appropriate `local_size' function for the n[0] x n[1] x
n[2] x ... x (n[d-1]/2 + 1) complex data, and then use the _same_
distribution for the real data except that the last complex
dimension is replaced by a (padded) real dimension of twice the
length.
Chris@19 3703
Chris@19 3704
For example, suppose we are performing an out-of-place r2c transform
Chris@19 3706 of L x M x N real data [padded to L x M x 2(N/2+1) ], resulting in L x
Chris@19 3707 M x N/2+1 complex data. Similar to the example in *note 2d MPI
Chris@19 3708 example::, we might do something like:
Chris@19 3709
#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t L = ..., M = ..., N = ...;
    fftw_plan plan;
    double *rin;
    fftw_complex *cout;
    ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* get local data size and allocate */
    alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
                                         &local_n0, &local_0_start);
    rin = fftw_alloc_real(2 * alloc_local);
    cout = fftw_alloc_complex(alloc_local);

    /* create plan for out-of-place r2c DFT (planning with FFTW_MEASURE
       overwrites the arrays, so plan before initializing rin) */
    plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
                                    FFTW_MEASURE);

    /* initialize rin to some function my_func(x,y,z) */
    for (i = 0; i < local_n0; ++i)
        for (j = 0; j < M; ++j)
            for (k = 0; k < N; ++k)
                rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);

    /* compute transforms as many times as desired */
    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(rin);
    fftw_free(cout);

    MPI_Finalize();
}
Chris@19 3746
Chris@19 3747 Note that we allocated `rin' using `fftw_alloc_real' with an
Chris@19 3748 argument of `2 * alloc_local': since `alloc_local' is the number of
Chris@19 3749 _complex_ values to allocate, the number of _real_ values is twice as
Chris@19 3750 many. The `rin' array is then local_n0 x M x 2(N/2+1) in row-major
Chris@19 3751 order, so its `(i,j,k)' element is at the index `(i*M + j) *
Chris@19 3752 (2*(N/2+1)) + k' (*note Multi-dimensional Array Format::).
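The index arithmetic for both arrays can be written out as a pair of hypothetical helpers (not FFTW functions), where `i' is the local index along L, i.e. the global index minus `local_0_start'. Because a padded real row holds 2*(N/2+1) doubles, the same storage as N/2+1 complex values, every real row starts at exactly twice the element offset of the corresponding complex row; this is what lets one distribution serve both arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for the L x M x N r2c example above. */

/* rin: local_n0 x M x 2*(N/2+1) real array, row-major. Slots
   k = N .. 2*(N/2+1)-1 of each row are padding that FFTW ignores. */
static ptrdiff_t rin_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k, ptrdiff_t M, ptrdiff_t N)
{
    assert(j >= 0 && j < M && k >= 0 && k < 2 * (N / 2 + 1));
    return (i * M + j) * (2 * (N / 2 + 1)) + k;
}

/* cout: local_n0 x M x (N/2+1) complex array, row-major. */
static ptrdiff_t cout_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k, ptrdiff_t M, ptrdiff_t N)
{
    assert(j >= 0 && j < M && k >= 0 && k < N / 2 + 1);
    return (i * M + j) * (N / 2 + 1) + k;
}
```

With M = 3 and N = 7, each real row occupies 8 doubles (7 data values plus 1 padding slot) and each complex row holds 4 complex values, so real row (2, 1) starts at offset 56 and complex row (2, 1) at offset 28.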
Chris@19 3753
As for the complex transforms, improved performance can be obtained
by specifying that the output is the transpose of the input or vice
versa (*note Transposed distributions::). In our L x M x N r2c
example, including `FFTW_MPI_TRANSPOSED_OUT' in the flags means that
the input would be a padded L x M x 2(N/2+1) real array distributed
over the `L' dimension, while the output would be an M x L x N/2+1
complex array distributed over the `M' dimension. To perform the
inverse c2r transform with the same data distributions, you would use
the `FFTW_MPI_TRANSPOSED_IN' flag.
Chris@19 3763
Chris@19 3764 
Chris@19 3765 File: fftw3.info, Node: Other Multi-dimensional Real-data MPI Transforms, Next: FFTW MPI Transposes, Prev: Multi-dimensional MPI DFTs of Real Data, Up: Distributed-memory FFTW with MPI
Chris@19 3766
Chris@19 3767 6.6 Other multi-dimensional Real-Data MPI Transforms
Chris@19 3768 ====================================================
Chris@19 3769
Chris@19 3770 FFTW's MPI interface also supports multi-dimensional `r2r' transforms
Chris@19 3771 of all kinds supported by the serial interface (e.g. discrete cosine
Chris@19 3772 and sine transforms, discrete Hartley transforms, etc.). Only
Chris@19 3773 multi-dimensional `r2r' transforms, not one-dimensional transforms, are
Chris@19 3774 currently parallelized.
Chris@19 3775
Chris@19 3776 These are used much like the multidimensional complex DFTs discussed
Chris@19 3777 above, except that the data is real rather than complex, and one needs
Chris@19 3778 to pass an r2r transform kind (`fftw_r2r_kind') for each dimension as
Chris@19 3779 in the serial FFTW (*note More DFTs of Real Data::).
Chris@19 3780
For example, one might perform a two-dimensional L x M transform that
is an REDFT10 (DCT-II) in the first dimension and an RODFT10 (DST-II)
in the second dimension with code like:
Chris@19 3784
const ptrdiff_t L = ..., M = ...;
fftw_plan plan;
double *data;
ptrdiff_t alloc_local, local_n0, local_0_start, i, j;

/* get local data size and allocate */
alloc_local = fftw_mpi_local_size_2d(L, M, MPI_COMM_WORLD,
                                     &local_n0, &local_0_start);
data = fftw_alloc_real(alloc_local);

/* create plan for in-place REDFT10 x RODFT10 (planning with
   FFTW_MEASURE overwrites data, so plan before initializing) */
plan = fftw_mpi_plan_r2r_2d(L, M, data, data, MPI_COMM_WORLD,
                            FFTW_REDFT10, FFTW_RODFT10, FFTW_MEASURE);

/* initialize data to some function my_function(x,y) */
for (i = 0; i < local_n0; ++i) for (j = 0; j < M; ++j)
    data[i*M + j] = my_function(local_0_start + i, j);

/* compute transforms, in-place, as many times as desired */
fftw_execute(plan);

fftw_destroy_plan(plan);
fftw_free(data);
Chris@19 3807
Chris@19 3808 Notice that we use the same `local_size' functions as we did for
Chris@19 3809 complex data, only now we interpret the sizes in terms of real rather
Chris@19 3810 than complex values, and correspondingly use `fftw_alloc_real'.
Chris@19 3811
Chris@19 3812 
Chris@19 3813 File: fftw3.info, Node: FFTW MPI Transposes, Next: FFTW MPI Wisdom, Prev: Other Multi-dimensional Real-data MPI Transforms, Up: Distributed-memory FFTW with MPI
Chris@19 3814
Chris@19 3815 6.7 FFTW MPI Transposes
Chris@19 3816 =======================
Chris@19 3817
FFTW's MPI Fourier transforms rely on one or more _global
transposition_ steps for their communications. For example, the
Chris@19 3820 multidimensional transforms work by transforming along some dimensions,
Chris@19 3821 then transposing to make the first dimension local and transforming
Chris@19 3822 that, then transposing back. Because global transposition of a
Chris@19 3823 block-distributed matrix has many other potential uses besides FFTs,
Chris@19 3824 FFTW's transpose routines can be called directly, as documented in this
Chris@19 3825 section.
Chris@19 3826
Chris@19 3827 * Menu:
Chris@19 3828
Chris@19 3829 * Basic distributed-transpose interface::
Chris@19 3830 * Advanced distributed-transpose interface::
Chris@19 3831 * An improved replacement for MPI_Alltoall::
Chris@19 3832
Chris@19 3833 
Chris@19 3834 File: fftw3.info, Node: Basic distributed-transpose interface, Next: Advanced distributed-transpose interface, Prev: FFTW MPI Transposes, Up: FFTW MPI Transposes
Chris@19 3835
Chris@19 3836 6.7.1 Basic distributed-transpose interface
Chris@19 3837 -------------------------------------------
Chris@19 3838
Chris@19 3839 In particular, suppose that we have an `n0' by `n1' array in row-major
Chris@19 3840 order, block-distributed across the `n0' dimension. To transpose this
Chris@19 3841 into an `n1' by `n0' array block-distributed across the `n1' dimension,
Chris@19 3842 we would create a plan by calling the following function:
Chris@19 3843
Chris@19 3844 fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
Chris@19 3845 double *in, double *out,
Chris@19 3846 MPI_Comm comm, unsigned flags);
Chris@19 3847
Chris@19 3848 The input and output arrays (`in' and `out') can be the same. The
Chris@19 3849 transpose is actually executed by calling `fftw_execute' on the plan,
Chris@19 3850 as usual.
Chris@19 3851
Chris@19 3852 The `flags' are the usual FFTW planner flags, but support two
Chris@19 3853 additional flags: `FFTW_MPI_TRANSPOSED_OUT' and/or
Chris@19 3854 `FFTW_MPI_TRANSPOSED_IN'. What these flags indicate, for transpose
Chris@19 3855 plans, is that the output and/or input, respectively, are _locally_
Chris@19 3856 transposed. That is, on each process input data is normally stored as
Chris@19 3857 a `local_n0' by `n1' array in row-major order, but for an
Chris@19 3858 `FFTW_MPI_TRANSPOSED_IN' plan the input data is stored as `n1' by
Chris@19 3859 `local_n0' in row-major order. Similarly, `FFTW_MPI_TRANSPOSED_OUT'
Chris@19 3860 means that the output is `n0' by `local_n1' instead of `local_n1' by
Chris@19 3861 `n0'.
Chris@19 3862
Chris@19 3863 To determine the local size of the array on each process before and
Chris@19 3864 after the transpose, as well as the amount of storage that must be
Chris@19 3865 allocated, one should call `fftw_mpi_local_size_2d_transposed', just as
Chris@19 3866 for a 2d DFT as described in the previous section:
Chris@19 3867
Chris@19 3868 ptrdiff_t fftw_mpi_local_size_2d_transposed
Chris@19 3869 (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
Chris@19 3870 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@19 3871 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@19 3872
Chris@19 3873 Again, the return value is the local storage to allocate, which in
Chris@19 3874 this case is the number of _real_ (`double') values rather than complex
Chris@19 3875 numbers as in the previous examples.
Chris@19 3876
Chris@19 3877 
Chris@19 3878 File: fftw3.info, Node: Advanced distributed-transpose interface, Next: An improved replacement for MPI_Alltoall, Prev: Basic distributed-transpose interface, Up: FFTW MPI Transposes
Chris@19 3879
Chris@19 3880 6.7.2 Advanced distributed-transpose interface
Chris@19 3881 ----------------------------------------------
Chris@19 3882
Chris@19 3883 The above routines are for a transpose of a matrix of numbers (of type
Chris@19 3884 `double'), using FFTW's default block sizes. More generally, one can
Chris@19 3885 perform transposes of _tuples_ of numbers, with user-specified block
Chris@19 3886 sizes for the input and output:
Chris@19 3887
Chris@19 3888 fftw_plan fftw_mpi_plan_many_transpose
Chris@19 3889 (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
Chris@19 3890 ptrdiff_t block0, ptrdiff_t block1,
Chris@19 3891 double *in, double *out, MPI_Comm comm, unsigned flags);
Chris@19 3892
Chris@19 3893 In this case, one is transposing an `n0' by `n1' matrix of
Chris@19 3894 `howmany'-tuples (e.g. `howmany = 2' for complex numbers). The input
Chris@19 3895 is distributed along the `n0' dimension with block size `block0', and
Chris@19 3896 the `n1' by `n0' output is distributed along the `n1' dimension with
Chris@19 3897 block size `block1'. If `FFTW_MPI_DEFAULT_BLOCK' (0) is passed for a
Chris@19 3898 block size then FFTW uses its default block size. To get the local
Chris@19 3899 size of the data on each process, you should then call
Chris@19 3900 `fftw_mpi_local_size_many_transposed'.
Chris@19 3901
Chris@19 3902 
Chris@19 3903 File: fftw3.info, Node: An improved replacement for MPI_Alltoall, Prev: Advanced distributed-transpose interface, Up: FFTW MPI Transposes
Chris@19 3904
Chris@19 3905 6.7.3 An improved replacement for MPI_Alltoall
Chris@19 3906 ----------------------------------------------
Chris@19 3907
Chris@19 3908 We close this section by noting that FFTW's MPI transpose routines can
Chris@19 3909 be thought of as a generalization of the `MPI_Alltoall' function
Chris@19 3910 (albeit only for floating-point types), and in some circumstances can
Chris@19 3911 function as an improved replacement.
Chris@19 3912
Chris@19 3913 `MPI_Alltoall' is defined by the MPI standard as:
Chris@19 3914
Chris@19 3915 int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
Chris@19 3916 void *recvbuf, int recvcount, MPI_Datatype recvtype,
Chris@19 3917 MPI_Comm comm);
Chris@19 3918
Chris@19 3919 In particular, for `double*' arrays `in' and `out', consider the
Chris@19 3920 call:
Chris@19 3921
Chris@19 3922 MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);
Chris@19 3923
Chris@19 3924 This is completely equivalent to:
Chris@19 3925
Chris@19 3926 MPI_Comm_size(comm, &P);
Chris@19 3927 plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1, in, out, comm, FFTW_ESTIMATE);
Chris@19 3928 fftw_execute(plan);
Chris@19 3929 fftw_destroy_plan(plan);
Chris@19 3930
Chris@19 3931 That is, computing a P x P transpose on `P' processes, with a block
Chris@19 3932 size of 1, is just a standard all-to-all communication.
Chris@19 3933
Chris@19 3934 However, using the FFTW routine instead of `MPI_Alltoall' may have
Chris@19 3935 certain advantages. First of all, FFTW's routine can operate in-place (`in == out'),
Chris@19 3936 whereas `MPI_Alltoall' is out-of-place only, unless the MPI-2.2 `MPI_IN_PLACE' option is available.
Chris@19 3937
Chris@19 3938 Second, even for out-of-place plans, FFTW's routine may be faster,
Chris@19 3939 especially if you need to perform the all-to-all communication many
Chris@19 3940 times and can afford to use `FFTW_MEASURE' or `FFTW_PATIENT'. Not
Chris@19 3941 counting the time to create the plan, it should certainly be no slower,
Chris@19 3942 since one of the possible algorithms that FFTW uses for an
Chris@19 3943 out-of-place transpose _is_ simply to call `MPI_Alltoall'. However,
Chris@19 3944 FFTW also considers several other possible algorithms that, depending
Chris@19 3945 on your MPI implementation and your hardware, may be faster.
Chris@19 3946
Chris@19 3947 
Chris@19 3948 File: fftw3.info, Node: FFTW MPI Wisdom, Next: Avoiding MPI Deadlocks, Prev: FFTW MPI Transposes, Up: Distributed-memory FFTW with MPI
Chris@19 3949
Chris@19 3950 6.8 FFTW MPI Wisdom
Chris@19 3951 ===================
Chris@19 3952
Chris@19 3953 FFTW's "wisdom" facility (*note Words of Wisdom-Saving Plans::) can be
Chris@19 3954 used to save MPI plans as well as to save uniprocessor plans. However,
Chris@19 3955 for MPI there are several unavoidable complications.
Chris@19 3956
Chris@19 3957 First, the MPI standard does not guarantee that every process can
Chris@19 3958 perform file I/O (at least, not using C stdio routines)--in general, we
Chris@19 3959 may only assume that process 0 is capable of I/O.(1) So, if we want to
Chris@19 3960 export the wisdom from a single process to a file, we must first export
Chris@19 3961 the wisdom to a string, then send it to process 0, then write it to a
Chris@19 3962 file.
Chris@19 3963
Chris@19 3964 Second, in principle we may want to have separate wisdom for every
Chris@19 3965 process, since in general the processes may run on different hardware
Chris@19 3966 even for a single MPI program. However, in practice FFTW's MPI code is
Chris@19 3967 designed for the case of homogeneous hardware (*note Load balancing::),
Chris@19 3968 and in this case it is convenient to use the same wisdom for every
Chris@19 3969 process. Thus, we need a mechanism to synchronize the wisdom.
Chris@19 3970
Chris@19 3971 To address both of these problems, FFTW provides the following two
Chris@19 3972 functions:
Chris@19 3973
Chris@19 3974 void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
Chris@19 3975 void fftw_mpi_gather_wisdom(MPI_Comm comm);
Chris@19 3976
Chris@19 3977 Given a communicator `comm', `fftw_mpi_broadcast_wisdom' will
Chris@19 3978 broadcast the wisdom from process 0 to all other processes.
Chris@19 3979 Conversely, `fftw_mpi_gather_wisdom' will collect wisdom from all
Chris@19 3980 processes onto process 0. (If the plans created for the same problem
Chris@19 3981 by different processes are not the same, `fftw_mpi_gather_wisdom' will
Chris@19 3982 arbitrarily choose one of the plans.) Both of these functions may
Chris@19 3983 result in suboptimal plans for different processes if the processes are
Chris@19 3984 running on non-identical hardware. Both of these functions are
Chris@19 3985 _collective_ calls, which means that they must be executed by all
Chris@19 3986 processes in the communicator.
Chris@19 3987
Chris@19 3988 So, for example, a typical code snippet to import wisdom from a file
Chris@19 3989 and use it on all processes would be:
Chris@19 3990
Chris@19 3991 {
Chris@19 3992 int rank;
Chris@19 3993
Chris@19 3994 fftw_mpi_init();
Chris@19 3995 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Chris@19 3996 if (rank == 0) fftw_import_wisdom_from_filename("mywisdom");
Chris@19 3997 fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
Chris@19 3998 }
Chris@19 3999
Chris@19 4000 (Note that we must call `fftw_mpi_init' before importing any wisdom
Chris@19 4001 that might contain MPI plans.) Similarly, a typical code snippet to
Chris@19 4002 export wisdom from all processes to a file is:
Chris@19 4003
Chris@19 4004 {
Chris@19 4005 int rank;
Chris@19 4006
Chris@19 4007 fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
Chris@19 4008 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
Chris@19 4009 if (rank == 0) fftw_export_wisdom_to_filename("mywisdom");
Chris@19 4010 }
Chris@19 4011
Chris@19 4012 ---------- Footnotes ----------
Chris@19 4013
Chris@19 4014 (1) In fact, even this assumption is not technically guaranteed by
Chris@19 4015 the standard, although it seems to be universal in actual MPI
Chris@19 4016 implementations and is widely assumed by MPI-using software.
Chris@19 4017 Technically, you need to query the `MPI_IO' attribute of
Chris@19 4018 `MPI_COMM_WORLD' with `MPI_Attr_get'. If this attribute is
Chris@19 4019 `MPI_PROC_NULL', no I/O is possible. If it is `MPI_ANY_SOURCE', any
Chris@19 4020 process can perform I/O. Otherwise, it is the rank of a process that
Chris@19 4021 can perform I/O ... but since it is not guaranteed to yield the _same_
Chris@19 4022 rank on all processes, you have to do an `MPI_Allreduce' of some kind
Chris@19 4023 if you want all processes to agree about which is going to do I/O. And
Chris@19 4024 even then, the standard only guarantees that this process can perform
Chris@19 4025 output, but not input. See e.g. `Parallel Programming with MPI' by P.
Chris@19 4026 S. Pacheco, section 8.1.3. Needless to say, in our experience
Chris@19 4027 virtually no MPI programmers worry about this.
Chris@19 4028
Chris@19 4029 
Chris@19 4030 File: fftw3.info, Node: Avoiding MPI Deadlocks, Next: FFTW MPI Performance Tips, Prev: FFTW MPI Wisdom, Up: Distributed-memory FFTW with MPI
Chris@19 4031
Chris@19 4032 6.9 Avoiding MPI Deadlocks
Chris@19 4033 ==========================
Chris@19 4034
Chris@19 4035 An MPI program can _deadlock_ if one process is waiting for a message
Chris@19 4036 from another process that never gets sent. To avoid deadlocks when
Chris@19 4037 using FFTW's MPI routines, it is important to know which functions are
Chris@19 4038 _collective_: that is, which functions must _always_ be called in the
Chris@19 4039 _same order_ from _every_ process in a given communicator.
Chris@19 4040 (`MPI_Barrier' is the canonical example of a collective
Chris@19 4041 function in the MPI standard.)
Chris@19 4042
Chris@19 4043 The functions in FFTW that are _always_ collective are: every
Chris@19 4044 function beginning with `fftw_mpi_plan', as well as
Chris@19 4045 `fftw_mpi_broadcast_wisdom' and `fftw_mpi_gather_wisdom'. Also, the
Chris@19 4046 following functions from the ordinary FFTW interface are collective
Chris@19 4047 when they are applied to a plan created by an `fftw_mpi_plan' function:
Chris@19 4048 `fftw_execute', `fftw_destroy_plan', and `fftw_flops'.
Chris@19 4049
Chris@19 4050 
Chris@19 4051 File: fftw3.info, Node: FFTW MPI Performance Tips, Next: Combining MPI and Threads, Prev: Avoiding MPI Deadlocks, Up: Distributed-memory FFTW with MPI
Chris@19 4052
Chris@19 4053 6.10 FFTW MPI Performance Tips
Chris@19 4054 ==============================
Chris@19 4055
Chris@19 4056 In this section, we collect a few tips on getting the best performance
Chris@19 4057 out of FFTW's MPI transforms.
Chris@19 4058
Chris@19 4059 First, because of the 1d block distribution, FFTW's parallelization
Chris@19 4060 is currently limited by the size of the first dimension.
Chris@19 4061 (Multidimensional block distributions may be supported by a future
Chris@19 4062 version.) More generally, you should ideally arrange the dimensions so
Chris@19 4063 that FFTW can divide them equally among the processes. *Note Load
Chris@19 4064 balancing::.
Chris@19 4065
Chris@19 4066 Second, if it is not too inconvenient, you should consider working
Chris@19 4067 with transposed output for multidimensional plans, as this saves a
Chris@19 4068 considerable amount of communications. *Note Transposed
Chris@19 4069 distributions::.
Chris@19 4070
Chris@19 4071 Third, the fastest choices are generally either an in-place transform
Chris@19 4072 or an out-of-place transform with the `FFTW_DESTROY_INPUT' flag (which
Chris@19 4073 allows the input array to be used as scratch space). In-place is
Chris@19 4074 especially beneficial if the amount of data per process is large.
Chris@19 4075
Chris@19 4076 Fourth, if you have multiple arrays to transform at once, rather than
Chris@19 4077 calling FFTW's MPI transforms several times, it usually seems to be
Chris@19 4078 faster to interleave the data and use the advanced interface. (This
Chris@19 4079 groups the communications together instead of requiring separate
Chris@19 4080 messages for each transform.)
Chris@19 4081
Chris@19 4082 
Chris@19 4083 File: fftw3.info, Node: Combining MPI and Threads, Next: FFTW MPI Reference, Prev: FFTW MPI Performance Tips, Up: Distributed-memory FFTW with MPI
Chris@19 4084
Chris@19 4085 6.11 Combining MPI and Threads
Chris@19 4086 ==============================
Chris@19 4087
Chris@19 4088 In certain cases, it may be advantageous to combine MPI
Chris@19 4089 (distributed-memory) and threads (shared-memory) parallelization. FFTW
Chris@19 4090 supports this, with certain caveats. For example, if you have a
Chris@19 4091 cluster of 4-processor shared-memory nodes, you may want to use threads
Chris@19 4092 within the nodes and MPI between the nodes, instead of MPI for all
Chris@19 4093 parallelization.
Chris@19 4094
Chris@19 4095 In particular, it is possible to seamlessly combine the MPI FFTW
Chris@19 4096 routines with the multi-threaded FFTW routines (*note Multi-threaded
Chris@19 4097 FFTW::). However, some care must be taken in the initialization code,
Chris@19 4098 which should look something like this:
Chris@19 4099
Chris@19 4100 int threads_ok;
Chris@19 4101
Chris@19 4102 int main(int argc, char **argv)
Chris@19 4103 {
Chris@19 4104 int provided;
Chris@19 4105 MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
Chris@19 4106 threads_ok = provided >= MPI_THREAD_FUNNELED;
Chris@19 4107
Chris@19 4108 if (threads_ok) threads_ok = fftw_init_threads();
Chris@19 4109 fftw_mpi_init();
Chris@19 4110
Chris@19 4111 ...
Chris@19 4112 if (threads_ok) fftw_plan_with_nthreads(...);
Chris@19 4113 ...
Chris@19 4114
Chris@19 4115 MPI_Finalize();
Chris@19 4116 }
Chris@19 4117
Chris@19 4118 First, note that instead of calling `MPI_Init', you should call
Chris@19 4119 `MPI_Init_thread', which is the initialization routine defined by the
Chris@19 4120 MPI-2 standard to indicate to MPI that your program will be
Chris@19 4121 multithreaded. We pass `MPI_THREAD_FUNNELED', which indicates that we
Chris@19 4122 will only call MPI routines from the main thread. (FFTW will launch
Chris@19 4123 additional threads internally, but the extra threads will not call MPI
Chris@19 4124 code.) (You may also pass `MPI_THREAD_SERIALIZED' or
Chris@19 4125 `MPI_THREAD_MULTIPLE', which requests additional multithreading support
Chris@19 4126 from the MPI implementation, but this is not required by FFTW.) The
Chris@19 4127 `provided' parameter returns what level of thread support is actually
Chris@19 4128 provided by your MPI implementation; this _must_ be at least
Chris@19 4129 `MPI_THREAD_FUNNELED' if you want to call the FFTW threads routines, so
Chris@19 4130 we define a global variable `threads_ok' to record this. You should
Chris@19 4131 only call `fftw_init_threads' or `fftw_plan_with_nthreads' if
Chris@19 4132 `threads_ok' is true. For more information on thread safety in MPI,
Chris@19 4133 see the MPI and Threads
Chris@19 4134 (http://www.mpi-forum.org/docs/mpi-20-html/node162.htm) section of the
Chris@19 4135 MPI-2 standard.
Chris@19 4136
Chris@19 4137 Second, we must call `fftw_init_threads' _before_ `fftw_mpi_init'.
Chris@19 4138 This is critical for technical reasons having to do with how FFTW
Chris@19 4139 initializes its list of algorithms.
Chris@19 4140
Chris@19 4141 Then, if you call `fftw_plan_with_nthreads(N)', _every_ MPI process
Chris@19 4142 will launch (up to) `N' threads to parallelize its transforms.
Chris@19 4143
Chris@19 4144 For example, in the hypothetical cluster of 4-processor nodes, you
Chris@19 4145 might wish to launch only a single MPI process per node, and then call
Chris@19 4146 `fftw_plan_with_nthreads(4)' on each process to use all processors in
Chris@19 4147 the nodes.
Chris@19 4148
Chris@19 4149 This may or may not be faster than simply using as many MPI processes
Chris@19 4150 as you have processors, however. On the one hand, using threads within
Chris@19 4151 a node eliminates the need for explicit message passing within the
Chris@19 4152 node. On the other hand, FFTW's transpose routines are not
Chris@19 4153 multi-threaded, and this means that the communications that do take
Chris@19 4154 place will not benefit from parallelization within the node. Moreover,
Chris@19 4155 many MPI implementations already have optimizations to exploit shared
Chris@19 4156 memory when it is available, so adding the multithreaded FFTW on top of
Chris@19 4157 this may be superfluous.
Chris@19 4158
Chris@19 4159 
Chris@19 4160 File: fftw3.info, Node: FFTW MPI Reference, Next: FFTW MPI Fortran Interface, Prev: Combining MPI and Threads, Up: Distributed-memory FFTW with MPI
Chris@19 4161
Chris@19 4162 6.12 FFTW MPI Reference
Chris@19 4163 =======================
Chris@19 4164
Chris@19 4165 This section provides a complete reference to all FFTW MPI functions,
Chris@19 4166 datatypes, and constants. See also *note FFTW Reference:: for
Chris@19 4167 information on functions and types in common with the serial interface.
Chris@19 4168
Chris@19 4169 * Menu:
Chris@19 4170
Chris@19 4171 * MPI Files and Data Types::
Chris@19 4172 * MPI Initialization::
Chris@19 4173 * Using MPI Plans::
Chris@19 4174 * MPI Data Distribution Functions::
Chris@19 4175 * MPI Plan Creation::
Chris@19 4176 * MPI Wisdom Communication::
Chris@19 4177
Chris@19 4178 
Chris@19 4179 File: fftw3.info, Node: MPI Files and Data Types, Next: MPI Initialization, Prev: FFTW MPI Reference, Up: FFTW MPI Reference
Chris@19 4180
Chris@19 4181 6.12.1 MPI Files and Data Types
Chris@19 4182 -------------------------------
Chris@19 4183
Chris@19 4184 All programs using FFTW's MPI support should include its header file:
Chris@19 4185
Chris@19 4186 #include <fftw3-mpi.h>
Chris@19 4187
Chris@19 4188 Note that this header file includes the serial-FFTW `fftw3.h' header
Chris@19 4189 file, and also the `mpi.h' header file for MPI, so you need not include
Chris@19 4190 those files separately.
Chris@19 4191
Chris@19 4192 You must also link to _both_ the FFTW MPI library and to the serial
Chris@19 4193 FFTW library. On Unix, this means adding `-lfftw3_mpi -lfftw3 -lm' at
Chris@19 4194 the end of the link command.
Chris@19 4195
Chris@19 4196 Different precisions are handled as in the serial interface: *Note
Chris@19 4197 Precision::. That is, `fftw_' functions become `fftwf_' (in single
Chris@19 4198 precision) etcetera, and the libraries become `-lfftw3f_mpi -lfftw3f
Chris@19 4199 -lm' etcetera on Unix. Long-double precision is supported in MPI, but
Chris@19 4200 quad precision (`fftwq_') is not, due to the lack of MPI support for
Chris@19 4201 this type.
Chris@19 4202
Chris@19 4203 
Chris@19 4204 File: fftw3.info, Node: MPI Initialization, Next: Using MPI Plans, Prev: MPI Files and Data Types, Up: FFTW MPI Reference
Chris@19 4205
Chris@19 4206 6.12.2 MPI Initialization
Chris@19 4207 -------------------------
Chris@19 4208
Chris@19 4209 Before calling any other FFTW MPI (`fftw_mpi_') function, and before
Chris@19 4210 importing any wisdom for MPI problems, you must call:
Chris@19 4211
Chris@19 4212 void fftw_mpi_init(void);
Chris@19 4213
Chris@19 4214 If FFTW threads support is used, however, `fftw_mpi_init' should be
Chris@19 4215 called _after_ `fftw_init_threads' (*note Combining MPI and Threads::).
Chris@19 4216 Calling `fftw_mpi_init' additional times (before `fftw_mpi_cleanup')
Chris@19 4217 has no effect.
Chris@19 4218
Chris@19 4219 If you want to deallocate all persistent data and reset FFTW to the
Chris@19 4220 pristine state it was in when you started your program, you can call:
Chris@19 4221
Chris@19 4222 void fftw_mpi_cleanup(void);
Chris@19 4223
Chris@19 4224 (This calls `fftw_cleanup', so you need not call the serial cleanup
Chris@19 4225 routine too, although it is safe to do so.) After calling
Chris@19 4226 `fftw_mpi_cleanup', all existing plans become undefined, and you should
Chris@19 4227 not attempt to execute or destroy them. You must call `fftw_mpi_init'
Chris@19 4228 again after `fftw_mpi_cleanup' if you want to resume using the MPI FFTW
Chris@19 4229 routines.
Chris@19 4230
Chris@19 4231 
Chris@19 4232 File: fftw3.info, Node: Using MPI Plans, Next: MPI Data Distribution Functions, Prev: MPI Initialization, Up: FFTW MPI Reference
Chris@19 4233
Chris@19 4234 6.12.3 Using MPI Plans
Chris@19 4235 ----------------------
Chris@19 4236
Chris@19 4237 Once an MPI plan is created, you can execute and destroy it using
Chris@19 4238 `fftw_execute', `fftw_destroy_plan', and the other functions in the
Chris@19 4239 serial interface that operate on generic plans (*note Using Plans::).
Chris@19 4240
Chris@19 4241 The `fftw_execute' and `fftw_destroy_plan' functions, applied to MPI
Chris@19 4242 plans, are _collective_ calls: they must be called for all processes in
Chris@19 4243 the communicator that was used to create the plan.
Chris@19 4244
Chris@19 4245 You must _not_ use the serial new-array plan-execution functions
Chris@19 4246 `fftw_execute_dft' and so on (*note New-array Execute Functions::) with
Chris@19 4247 MPI plans. Such functions are specialized to the problem type, and
Chris@19 4248 there are specific new-array execute functions for MPI plans:
Chris@19 4249
Chris@19 4250 void fftw_mpi_execute_dft(fftw_plan p, fftw_complex *in, fftw_complex *out);
Chris@19 4251 void fftw_mpi_execute_dft_r2c(fftw_plan p, double *in, fftw_complex *out);
Chris@19 4252 void fftw_mpi_execute_dft_c2r(fftw_plan p, fftw_complex *in, double *out);
Chris@19 4253 void fftw_mpi_execute_r2r(fftw_plan p, double *in, double *out);
Chris@19 4254
Chris@19 4255 These functions have the same restrictions as those of the serial
Chris@19 4256 new-array execute functions. They are _always_ safe to apply to the
Chris@19 4257 _same_ `in' and `out' arrays that were used to create the plan. They
Chris@19 4258 can only be applied to new arrays if those arrays have the same types,
Chris@19 4259 dimensions, in-placeness, and alignment as the original arrays, where
Chris@19 4260 the best way to ensure the same alignment is to use FFTW's
Chris@19 4261 `fftw_malloc' and related allocation functions for all arrays (*note
Chris@19 4262 Memory Allocation::). Note that distributed transposes (*note FFTW MPI
Chris@19 4263 Transposes::) use `fftw_mpi_execute_r2r', since they count as rank-zero
Chris@19 4264 r2r plans from FFTW's perspective.
Chris@19 4265
Chris@19 4266 
Chris@19 4267 File: fftw3.info, Node: MPI Data Distribution Functions, Next: MPI Plan Creation, Prev: Using MPI Plans, Up: FFTW MPI Reference
Chris@19 4268
Chris@19 4269 6.12.4 MPI Data Distribution Functions
Chris@19 4270 --------------------------------------
Chris@19 4271
Chris@19 4272 As described above (*note MPI Data Distribution::), in order to
Chris@19 4273 allocate your arrays, _before_ creating a plan, you must first call one
Chris@19 4274 of the following routines to determine the required allocation size and
Chris@19 4275 the portion of the array locally stored on a given process. The
Chris@19 4276 `MPI_Comm' communicator passed here must be equivalent to the
Chris@19 4277 communicator used below for plan creation.
Chris@19 4278
Chris@19 4279 The basic interface for multidimensional transforms consists of the
Chris@19 4280 functions:
Chris@19 4281
Chris@19 4282 ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
Chris@19 4283 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
Chris@19 4284 ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
Chris@19 4285 MPI_Comm comm,
Chris@19 4286 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
Chris@19 4287 ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
Chris@19 4288 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
Chris@19 4289
Chris@19 4290 ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
Chris@19 4291 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@19 4292 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@19 4293 ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
Chris@19 4294 MPI_Comm comm,
Chris@19 4295 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@19 4296 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@19 4297 ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
Chris@19 4298 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@19 4299 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@19 4300
Chris@19 4301 These functions return the number of elements to allocate (complex
Chris@19 4302 numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas the
Chris@19 4303 `local_n0' and `local_0_start' arguments return the portion (`local_0_start' to
Chris@19 4304 `local_0_start + local_n0 - 1') of the first dimension of an n[0] x
Chris@19 4305 n[1] x n[2] x ... x n[d-1] array that is stored on the local process.
Chris@19 4306 *Note Basic and advanced distribution interfaces::. For
Chris@19 4307 `FFTW_MPI_TRANSPOSED_OUT' plans, the `_transposed' variants are useful
Chris@19 4308 in order to also return the local portion of the first dimension in the
Chris@19 4309 n[1] x n[0] x n[2] x ... x n[d-1] transposed output. *Note Transposed
Chris@19 4310 distributions::. The advanced interface for multidimensional
Chris@19 4311 transforms is:
Chris@19 4312
Chris@19 4313 ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
Chris@19 4314 ptrdiff_t block0, MPI_Comm comm,
Chris@19 4315 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
Chris@19 4316 ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
Chris@19 4317 ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
Chris@19 4318 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@19 4319 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@19 4320
Chris@19 4321 These differ from the basic interface in only two ways. First, they
Chris@19 4322 allow you to specify block sizes `block0' and `block1' (the latter for
Chris@19 4323 the transposed output); you can pass `FFTW_MPI_DEFAULT_BLOCK' to use
Chris@19 4324 FFTW's default block size as in the basic interface. Second, you can
Chris@19 4325 pass a `howmany' parameter, corresponding to the advanced planning
Chris@19 4326 interface below: this is for transforms of contiguous `howmany'-tuples
Chris@19 4327 of numbers (`howmany = 1' in the basic interface).
Chris@19 4328
Chris@19 4329 The corresponding basic and advanced routines for one-dimensional
Chris@19 4330 transforms (currently only complex DFTs) are:
Chris@19 4331
Chris@19 4332 ptrdiff_t fftw_mpi_local_size_1d(
Chris@19 4333 ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
Chris@19 4334 ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
Chris@19 4335 ptrdiff_t *local_no, ptrdiff_t *local_o_start);
Chris@19 4336 ptrdiff_t fftw_mpi_local_size_many_1d(
Chris@19 4337 ptrdiff_t n0, ptrdiff_t howmany,
Chris@19 4338 MPI_Comm comm, int sign, unsigned flags,
Chris@19 4339 ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
Chris@19 4340 ptrdiff_t *local_no, ptrdiff_t *local_o_start);
Chris@19 4341
Chris@19 4342 As above, the return value is the number of elements to allocate
Chris@19 4343 (complex numbers, for complex DFTs). The `local_ni' and
Chris@19 4344 `local_i_start' arguments return the portion (`local_i_start' to
Chris@19 4345 `local_i_start + local_ni - 1') of the 1d array that is stored on this
Chris@19 4346 process for the transform _input_, and `local_no' and `local_o_start'
Chris@19 4347 are the corresponding quantities for the output. The `sign'
Chris@19 4348 (`FFTW_FORWARD' or `FFTW_BACKWARD') and `flags' must match the
Chris@19 4349 arguments passed when creating a plan. Although the inputs and outputs
Chris@19 4350 have different data distributions in general, it is guaranteed that the
Chris@19 4351 _output_ data distribution of an `FFTW_FORWARD' plan will match the
Chris@19 4352 _input_ data distribution of an `FFTW_BACKWARD' plan and vice versa;
Chris@19 4353 similarly for the `FFTW_MPI_SCRAMBLED_OUT' and `FFTW_MPI_SCRAMBLED_IN'
Chris@19 4354 flags. *Note One-dimensional distributions::.
Chris@19 4355
Chris@19 4356 
Chris@19 4357 File: fftw3.info, Node: MPI Plan Creation, Next: MPI Wisdom Communication, Prev: MPI Data Distribution Functions, Up: FFTW MPI Reference
Chris@19 4358
Chris@19 4359 6.12.5 MPI Plan Creation
Chris@19 4360 ------------------------
Chris@19 4361
Chris@19 4362 Complex-data MPI DFTs
Chris@19 4363 .....................
Chris@19 4364
Chris@19 4365 Plans for complex-data DFTs (*note 2d MPI example::) are created by:
Chris@19 4366
Chris@19 4367 fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
Chris@19 4368 MPI_Comm comm, int sign, unsigned flags);
Chris@19 4369 fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
Chris@19 4370 fftw_complex *in, fftw_complex *out,
Chris@19 4371 MPI_Comm comm, int sign, unsigned flags);
Chris@19 4372 fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
Chris@19 4373 fftw_complex *in, fftw_complex *out,
Chris@19 4374 MPI_Comm comm, int sign, unsigned flags);
Chris@19 4375 fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
Chris@19 4376 fftw_complex *in, fftw_complex *out,
Chris@19 4377 MPI_Comm comm, int sign, unsigned flags);
Chris@19 4378 fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
Chris@19 4379 ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
Chris@19 4380 fftw_complex *in, fftw_complex *out,
Chris@19 4381 MPI_Comm comm, int sign, unsigned flags);
Chris@19 4382
Chris@19 4383 These are similar to their serial counterparts (*note Complex DFTs::)
Chris@19 4384 in specifying the dimensions, sign, and flags of the transform. The
Chris@19 4385 `comm' argument gives an MPI communicator that specifies the set of
Chris@19 4386 processes to participate in the transform; plan creation is a
Chris@19 4387 collective function that must be called for all processes in the
Chris@19 4388 communicator. The `in' and `out' pointers refer only to a portion of
Chris@19 4389 the overall transform data (*note MPI Data Distribution::) as specified
Chris@19 4390 by the `local_size' functions in the previous section. Unless `flags'
Chris@19 4391 contains `FFTW_ESTIMATE', these arrays are overwritten during plan
Chris@19 4392 creation as for the serial interface. For multi-dimensional
Chris@19 4393 transforms, any dimensions `> 1' are supported; for one-dimensional
Chris@19 4394 transforms, only composite (non-prime) `n0' are currently supported
Chris@19 4395 (unlike the serial FFTW). Requesting an unsupported transform size
Chris@19 4396 will yield a `NULL' plan. (As in the serial interface, highly
Chris@19 4397 composite sizes generally yield the best performance.)
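The size rules above can be checked before planning. The sketch below is a hypothetical pre-check, not part of FFTW. It finds the largest prime factor of a size by trial division and uses it to mirror the documented restriction that one-dimensional MPI DFTs need a composite `n0'. The names `largest_prime_factor' and `mpi_1d_size_expected_ok' are illustrative assumptions. FFTW's own internal test may differ, and a `NULL' plan remains the authoritative answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Largest prime factor of n (n >= 2), by trial division.
   Small largest factors mean "highly composite" sizes. */
static ptrdiff_t largest_prime_factor(ptrdiff_t n)
{
    ptrdiff_t largest = 1;
    for (ptrdiff_t p = 2; p * p <= n; ++p) {
        while (n % p == 0) {      /* divide out every copy of p */
            largest = p;
            n /= p;
        }
    }
    if (n > 1)                    /* any remainder is itself prime */
        largest = n;
    return largest;
}

/* Hypothetical helper mirroring the documented rule: a 1d MPI DFT is
   only expected to plan when n0 is composite (non-prime). */
static bool mpi_1d_size_expected_ok(ptrdiff_t n0)
{
    return n0 >= 2 && largest_prime_factor(n0) < n0;
}
```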
Chris@19 4398
Chris@19 4399 The advanced-interface `fftw_mpi_plan_many_dft' additionally allows
Chris@19 4400 you to specify the block sizes for the first dimension (`block') of the
Chris@19 4401 n[0] x n[1] x n[2] x ... x n[d-1] input data and the first dimension
Chris@19 4402 (`tblock') of the n[1] x n[0] x n[2] x ... x n[d-1] transposed data
Chris@19 4403 (at intermediate steps of the transform, and for the output if
Chris@19 4404 `FFTW_MPI_TRANSPOSED_OUT' is specified in `flags'). These must be the same
Chris@19 4405 block sizes as were passed to the corresponding `local_size' function;
Chris@19 4406 you can pass `FFTW_MPI_DEFAULT_BLOCK' to use FFTW's default block size
Chris@19 4407 as in the basic interface. Also, the `howmany' parameter specifies
Chris@19 4408 that the transform is of contiguous `howmany'-tuples rather than
Chris@19 4409 individual complex numbers; this corresponds to the same parameter in
Chris@19 4410 the serial advanced interface (*note Advanced Complex DFTs::) with
Chris@19 4411 `stride = howmany' and `dist = 1'.
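The following C sketch shows what the `block' and `howmany' parameters describe. It assumes a block distribution in which the default block is n0/P rounded up (an assumption standing in for `FFTW_MPI_DEFAULT_BLOCK'). Under that assumption, process `rank' owns rows [rank*block, min((rank+1)*block, n0)) of the first dimension. It also shows the offset of component `k' of a `howmany'-tuple under `stride = howmany' and `dist = 1'. All function names are hypothetical. The real `local_size' functions are authoritative.

```c
#include <stddef.h>

/* Assumed default block: n0 / nprocs rounded up. */
static ptrdiff_t default_block(ptrdiff_t n0, int nprocs)
{
    return (n0 + nprocs - 1) / nprocs;
}

/* Rows of the first dimension held by `rank' for a given block size. */
static void block_rows(ptrdiff_t n0, ptrdiff_t block, int rank,
                       ptrdiff_t *local_n0, ptrdiff_t *local_0_start)
{
    ptrdiff_t start = (ptrdiff_t) rank * block;
    if (start > n0)
        start = n0;               /* trailing processes may own no rows */
    ptrdiff_t end = start + block;
    if (end > n0)
        end = n0;
    *local_n0 = end - start;
    *local_0_start = start;
}

/* Offset of component k of the tuple at (row-major) point j,
   i.e. stride = howmany and dist = 1 in advanced-interface terms. */
static ptrdiff_t tuple_offset(ptrdiff_t j, ptrdiff_t k, ptrdiff_t howmany)
{
    return j * howmany + k;
}
```

For example, with n0 = 10 on 4 processes the assumed block is 3, so the local row counts are 3, 3, 3 and 1.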
Chris@19 4412
Chris@19 4413 MPI flags
Chris@19 4414 .........
Chris@19 4415
Chris@19 4416 The `flags' can be any of those for the serial FFTW (*note Planner
Chris@19 4417 Flags::), and in addition may include one or more of the following
Chris@19 4418 MPI-specific flags, which improve performance at the cost of changing
Chris@19 4419 the output or input data formats.
Chris@19 4420
Chris@19 4421 * `FFTW_MPI_SCRAMBLED_OUT', `FFTW_MPI_SCRAMBLED_IN': valid for 1d
Chris@19 4422 transforms only, these flags indicate that the output/input of the
Chris@19 4423 transform are in an undocumented "scrambled" order. A forward
Chris@19 4424 `FFTW_MPI_SCRAMBLED_OUT' transform can be inverted by a backward
Chris@19 4425 `FFTW_MPI_SCRAMBLED_IN' (times the usual 1/N normalization).
Chris@19 4426 *Note One-dimensional distributions::.
Chris@19 4427
Chris@19 4428 * `FFTW_MPI_TRANSPOSED_OUT', `FFTW_MPI_TRANSPOSED_IN': valid for
Chris@19 4429 multidimensional (`rnk > 1') transforms only, these flags specify
Chris@19 4430 that the output or input of an n[0] x n[1] x n[2] x ... x n[d-1]
Chris@19 4431 transform is transposed to n[1] x n[0] x n[2] x ... x n[d-1] .
Chris@19 4432 *Note Transposed distributions::.
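As a small illustration of the `FFTW_MPI_TRANSPOSED_OUT'/`_IN' shapes, the sketch below computes the transposed dimension list. The first two dimensions are swapped and the rest are left alone. The helper name is hypothetical and does not call FFTW.

```c
#include <stddef.h>

/* Shape of transposed data: n[1] x n[0] x n[2] x ... x n[d-1].
   Meaningful only for rnk > 1, as for the flags themselves. */
static void transposed_dims(int rnk, const ptrdiff_t *n, ptrdiff_t *out)
{
    for (int i = 0; i < rnk; ++i)
        out[i] = n[i];
    if (rnk > 1) {                /* swap the first two dimensions */
        out[0] = n[1];
        out[1] = n[0];
    }
}
```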
Chris@19 4433
Chris@19 4434
Chris@19 4435 Real-data MPI DFTs
Chris@19 4436 ..................
Chris@19 4437
Chris@19 4438 Plans for real-input/output (r2c/c2r) DFTs (*note Multi-dimensional MPI
Chris@19 4439 DFTs of Real Data::) are created by:
Chris@19 4440
Chris@19 4441 fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
Chris@19 4442 double *in, fftw_complex *out,
Chris@19 4443 MPI_Comm comm, unsigned flags);
Chris@19 4447 fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
Chris@19 4448 double *in, fftw_complex *out,
Chris@19 4449 MPI_Comm comm, unsigned flags);
Chris@19 4450 fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
Chris@19 4451 double *in, fftw_complex *out,
Chris@19 4452 MPI_Comm comm, unsigned flags);
Chris@19 4453 fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
Chris@19 4454 fftw_complex *in, double *out,
Chris@19 4455 MPI_Comm comm, unsigned flags);
Chris@19 4459 fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
Chris@19 4460 fftw_complex *in, double *out,
Chris@19 4461 MPI_Comm comm, unsigned flags);
Chris@19 4462 fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
Chris@19 4463 fftw_complex *in, double *out,
Chris@19 4464 MPI_Comm comm, unsigned flags);
Chris@19 4465
Chris@19 4466 Similar to the serial interface (*note Real-data DFTs::), these
Chris@19 4467 transform logically n[0] x n[1] x n[2] x ... x n[d-1] real data
Chris@19 4468 to/from n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) complex data,
Chris@19 4469 representing the non-redundant half of the conjugate-symmetric output of
Chris@19 4470 a real-input DFT (*note Multi-dimensional Transforms::). However, the
Chris@19 4471 real array must be stored within a padded
Chris@19 4472 n[0] x n[1] x n[2] x ... x [2 (n[d-1]/2 + 1)] array (much like the
Chris@19 4474 in-place serial r2c transforms, but here for
Chris@19 4475 out-of-place transforms as well). Currently, only multi-dimensional
Chris@19 4476 (`rnk > 1') r2c/c2r transforms are supported (requesting a plan for
Chris@19 4477 `rnk = 1' will yield `NULL'). As explained above (*note
Chris@19 4478 Multi-dimensional MPI DFTs of Real Data::), the data distribution of
Chris@19 4479 both the real and complex arrays is given by the `local_size' function
Chris@19 4480 called for the dimensions of the _complex_ array. Similar to the other
Chris@19 4481 planning functions, the input and output arrays are overwritten when
Chris@19 4482 the plan is created except in `FFTW_ESTIMATE' mode.
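The padded layout can be expressed directly. The C sketch below, with hypothetical helper names, computes the complex last dimension n/2 + 1 and the padded real last dimension 2 (n/2 + 1). It then gives the row-major offset of a real element in a two-dimensional local slab. In each padded row only the first `n1' entries hold data; the trailing one or two entries are padding.

```c
#include <stddef.h>

/* Last dimension of the complex array for an r2c/c2r transform. */
static ptrdiff_t r2c_complex_last(ptrdiff_t n_last)
{
    return n_last / 2 + 1;
}

/* Padded last dimension of the real array: one padded real row
   occupies exactly the memory of one complex row. */
static ptrdiff_t r2c_padded_real_last(ptrdiff_t n_last)
{
    return 2 * r2c_complex_last(n_last);
}

/* Row-major offset of real element (i0, i1) in a padded 2d slab. */
static ptrdiff_t padded_real_offset(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t n1)
{
    return i0 * r2c_padded_real_last(n1) + i1;
}
```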
Chris@19 4483
Chris@19 4484 As for the complex DFTs above, there is an advanced interface that
Chris@19 4485 allows you to manually specify block sizes and to transform contiguous
Chris@19 4486 `howmany'-tuples of real/complex numbers:
Chris@19 4487
Chris@19 4488 fftw_plan fftw_mpi_plan_many_dft_r2c
Chris@19 4489 (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
Chris@19 4490 ptrdiff_t iblock, ptrdiff_t oblock,
Chris@19 4491 double *in, fftw_complex *out,
Chris@19 4492 MPI_Comm comm, unsigned flags);
Chris@19 4493 fftw_plan fftw_mpi_plan_many_dft_c2r
Chris@19 4494 (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
Chris@19 4495 ptrdiff_t iblock, ptrdiff_t oblock,
Chris@19 4496 fftw_complex *in, double *out,
Chris@19 4497 MPI_Comm comm, unsigned flags);
Chris@19 4498
Chris@19 4499 MPI r2r transforms
Chris@19 4500 ..................
Chris@19 4501
Chris@19 4502 There are corresponding plan-creation routines for r2r transforms
Chris@19 4503 (*note More DFTs of Real Data::), currently supporting multidimensional
Chris@19 4504 (`rnk > 1') transforms only (`rnk = 1' will yield a `NULL' plan):
Chris@19 4505
Chris@19 4506 fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
Chris@19 4507 double *in, double *out,
Chris@19 4508 MPI_Comm comm,
Chris@19 4509 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
Chris@19 4510 unsigned flags);
Chris@19 4511 fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
Chris@19 4512 double *in, double *out,
Chris@19 4513 MPI_Comm comm,
Chris@19 4514 fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
Chris@19 4515 unsigned flags);
Chris@19 4516 fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
Chris@19 4517 double *in, double *out,
Chris@19 4518 MPI_Comm comm, const fftw_r2r_kind *kind,
Chris@19 4519 unsigned flags);
Chris@19 4520 fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
Chris@19 4521 ptrdiff_t howmany, ptrdiff_t iblock, ptrdiff_t oblock,
Chris@19 4522 double *in, double *out,
Chris@19 4523 MPI_Comm comm, const fftw_r2r_kind *kind,
Chris@19 4524 unsigned flags);
Chris@19 4525
Chris@19 4526 The parameters are much the same as for the complex DFTs above,
Chris@19 4527 except that the arrays are of real numbers (and hence the outputs of the
Chris@19 4528 `local_size' data-distribution functions should be interpreted as
Chris@19 4529 counts of real rather than complex numbers). Also, the `kind'
Chris@19 4530 parameters specify the r2r kinds along each dimension as for the serial
Chris@19 4531 interface (*note Real-to-Real Transform Kinds::). *Note Other
Chris@19 4532 Multi-dimensional Real-data MPI Transforms::.
Chris@19 4533
Chris@19 4534 MPI transposition
Chris@19 4535 .................
Chris@19 4536
Chris@19 4537 FFTW also provides routines to plan a transpose of a distributed `n0'
Chris@19 4538 by `n1' array of real numbers, or an array of `howmany'-tuples of real
Chris@19 4539 numbers with specified block sizes (*note FFTW MPI Transposes::):
Chris@19 4540
Chris@19 4541 fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
Chris@19 4542 double *in, double *out,
Chris@19 4543 MPI_Comm comm, unsigned flags);
Chris@19 4544 fftw_plan fftw_mpi_plan_many_transpose
Chris@19 4545 (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
Chris@19 4546 ptrdiff_t block0, ptrdiff_t block1,
Chris@19 4547 double *in, double *out, MPI_Comm comm, unsigned flags);
Chris@19 4548
Chris@19 4549 These plans are used with the `fftw_mpi_execute_r2r' new-array
Chris@19 4550 execute function (*note Using MPI Plans::), since they count as (rank
Chris@19 4551 zero) r2r plans from FFTW's perspective.
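To make clear what a transpose plan computes, here is a serial reference in C for the global result. An n0 x n1 row-major array of `howmany'-tuples of reals becomes an n1 x n0 array. This is only a sketch of the end result, under the assumption of contiguous tuples. It does not call FFTW or MPI, and the distributed plan instead reaches this layout with each process holding a block of rows.

```c
#include <stddef.h>

/* out[j][i][k] = in[i][j][k], with in of shape n0 x n1 x howmany and
   out of shape n1 x n0 x howmany, both row-major. */
static void transpose_reference(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                                const double *in, double *out)
{
    for (ptrdiff_t i = 0; i < n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            for (ptrdiff_t k = 0; k < howmany; ++k)
                out[(j * n0 + i) * howmany + k] = in[(i * n1 + j) * howmany + k];
}
```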
Chris@19 4552
Chris@19 4553 
Chris@19 4554 File: fftw3.info, Node: MPI Wisdom Communication, Prev: MPI Plan Creation, Up: FFTW MPI Reference
Chris@19 4555
Chris@19 4556 6.12.6 MPI Wisdom Communication
Chris@19 4557 -------------------------------
Chris@19 4558
Chris@19 4559 To facilitate synchronizing wisdom among the different MPI processes,
Chris@19 4560 we provide two functions:
Chris@19 4561
Chris@19 4562 void fftw_mpi_gather_wisdom(MPI_Comm comm);
Chris@19 4563 void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
Chris@19 4564
Chris@19 4565 The `fftw_mpi_gather_wisdom' function gathers all wisdom in the
Chris@19 4566 given communicator `comm' to the process of rank 0 in the communicator:
Chris@19 4567 that process obtains the union of all wisdom on all the processes. As
Chris@19 4568 a side effect, some other processes will gain additional wisdom from
Chris@19 4569 other processes, but only process 0 will gain the complete union.
Chris@19 4570
Chris@19 4571 The `fftw_mpi_broadcast_wisdom' function does the reverse: it exports wisdom
Chris@19 4572 from process 0 in `comm' to all other processes in the communicator,
Chris@19 4573 replacing any wisdom they currently have.
Chris@19 4574
Chris@19 4575 *Note FFTW MPI Wisdom::.
Chris@19 4576
Chris@19 4577 
Chris@19 4578 File: fftw3.info, Node: FFTW MPI Fortran Interface, Prev: FFTW MPI Reference, Up: Distributed-memory FFTW with MPI
Chris@19 4579
Chris@19 4580 6.13 FFTW MPI Fortran Interface
Chris@19 4581 ===============================
Chris@19 4582
Chris@19 4583 The FFTW MPI interface is callable from modern Fortran compilers
Chris@19 4584 supporting the Fortran 2003 `iso_c_binding' standard for calling C
Chris@19 4585 functions. As described in *note Calling FFTW from Modern Fortran::,
Chris@19 4586 this means that you can directly call FFTW's C interface from Fortran
Chris@19 4587 with only minor changes in syntax. There are, however, a few things
Chris@19 4588 specific to the MPI interface to keep in mind:
Chris@19 4589
Chris@19 4590 * Instead of including `fftw3.f03' as in *note Overview of Fortran
Chris@19 4591 interface::, you should `include 'fftw3-mpi.f03'' (after `use,
Chris@19 4592 intrinsic :: iso_c_binding' as before). The `fftw3-mpi.f03' file
Chris@19 4593 includes `fftw3.f03', so you should _not_ `include' them both
Chris@19 4594 yourself. (You will also want to include the MPI header file,
Chris@19 4595 usually via `include 'mpif.h'' or similar, although though this is
Chris@19 4596 not needed by `fftw3-mpi.f03' per se.) (To use the `fftwl_' `long
Chris@19 4597 double' extended-precision routines in supporting compilers, you
Chris@19 4598 should include `fftw3l-mpi.f03' in _addition_ to `fftw3-mpi.f03'.
Chris@19 4599 *Note Extended and quadruple precision in Fortran::.)
Chris@19 4600
Chris@19 4601 * Because of the different storage conventions between C and Fortran,
Chris@19 4602 you reverse the order of your array dimensions when passing them to
Chris@19 4603 FFTW (*note Reversing array dimensions::). This is merely a
Chris@19 4604 difference in notation and incurs no performance overhead.
Chris@19 4605 However, it means that, whereas in C the _first_ dimension is
Chris@19 4606 distributed, in Fortran the _last_ dimension of your array is
Chris@19 4607 distributed.
Chris@19 4608
Chris@19 4609 * In Fortran, communicators are stored as `integer' types; there is
Chris@19 4610 no `MPI_Comm' type, nor is there any way to access a C `MPI_Comm'.
Chris@19 4611 Fortunately, this is taken care of for you by the FFTW Fortran
Chris@19 4612 interface: whenever the C interface expects an `MPI_Comm' type,
Chris@19 4613 you should pass the Fortran communicator as an `integer'.(1)
Chris@19 4614
Chris@19 4615 * Because you need to call the `local_size' function to find out how
Chris@19 4616 much space to allocate, and this may be _larger_ than the local
Chris@19 4617 portion of the array (*note MPI Data Distribution::), you should
Chris@19 4618 _always_ allocate your arrays dynamically using FFTW's allocation
Chris@19 4619 routines as described in *note Allocating aligned memory in
Chris@19 4620 Fortran::. (Coincidentally, this also provides the best
Chris@19 4621 performance by guaranteeing proper data alignment.)
Chris@19 4622
Chris@19 4623 * Because all sizes in the MPI FFTW interface are declared as
Chris@19 4624 `ptrdiff_t' in C, you should use `integer(C_INTPTR_T)' in Fortran
Chris@19 4625 (*note FFTW Fortran type reference::).
Chris@19 4626
Chris@19 4627 * In Fortran, because of the language semantics, we generally
Chris@19 4628 recommend using the new-array execute functions for all plans,
Chris@19 4629 even in the common case where you are executing the plan on the
Chris@19 4630 same arrays for which the plan was created (*note Plan execution
Chris@19 4631 in Fortran::). However, note that in the MPI interface these
Chris@19 4632 functions are changed: `fftw_execute_dft' becomes
Chris@19 4633 `fftw_mpi_execute_dft', etcetera. *Note Using MPI Plans::.
Chris@19 4634
Chris@19 4635
Chris@19 4636 For example, here is a Fortran code snippet to perform a distributed
Chris@19 4637 L x M complex DFT in-place. (This assumes you have already
Chris@19 4638 initialized MPI with `MPI_init' and have also performed `call
Chris@19 4639 fftw_mpi_init'.)
Chris@19 4640
Chris@19 4641 use, intrinsic :: iso_c_binding
Chris@19 4642 include 'fftw3-mpi.f03'
Chris@19 4643 integer(C_INTPTR_T), parameter :: L = ...
Chris@19 4644 integer(C_INTPTR_T), parameter :: M = ...
Chris@19 4645 type(C_PTR) :: plan, cdata
Chris@19 4646 complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
Chris@19 4647 integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset
Chris@19 4648
Chris@19 4649 ! get local data size and allocate (note dimension reversal)
Chris@19 4650 alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
Chris@19 4651 local_M, local_j_offset)
Chris@19 4652 cdata = fftw_alloc_complex(alloc_local)
Chris@19 4653 call c_f_pointer(cdata, data, [L,local_M])
Chris@19 4654
Chris@19 4655 ! create MPI plan for in-place forward DFT (note dimension reversal)
Chris@19 4656 plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
Chris@19 4657 FFTW_FORWARD, FFTW_MEASURE)
Chris@19 4658
Chris@19 4659 ! initialize data to some function my_function(i,j)
Chris@19 4660 do j = 1, local_M
Chris@19 4661 do i = 1, L
Chris@19 4662 data(i, j) = my_function(i, j + local_j_offset)
Chris@19 4663 end do
Chris@19 4664 end do
Chris@19 4665
Chris@19 4666 ! compute transform (as many times as desired)
Chris@19 4667 call fftw_mpi_execute_dft(plan, data, data)
Chris@19 4668
Chris@19 4669 call fftw_destroy_plan(plan)
Chris@19 4670 call fftw_free(cdata)
Chris@19 4671
Chris@19 4672 Note that we called `fftw_mpi_local_size_2d' and
Chris@19 4673 `fftw_mpi_plan_dft_2d' with the dimensions in reversed order, since an L
Chris@19 4674 x M Fortran array is viewed by FFTW in C as an M x L array. This
Chris@19 4675 means that the array was distributed over the `M' dimension, the local
Chris@19 4676 portion of which is a L x local_M array in Fortran. (You must _not_
Chris@19 4677 use an `allocate' statement to allocate an L x local_M array, however;
Chris@19 4678 you must allocate `alloc_local' complex numbers, which may be greater
Chris@19 4679 than `L * local_M', in order to reserve space for intermediate steps of
Chris@19 4680 the transform.) Finally, we mention that because C's array indices are
Chris@19 4681 zero-based, the `local_j_offset' argument can conveniently be
Chris@19 4682 interpreted as an offset in the 1-based `j' index (rather than as a
Chris@19 4683 starting index as in C).
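The index bookkeeping can be checked in C. Fortran element data(i, j), 1-based and column-major in an L x local_M array, occupies the same memory as C element data[j-1][i-1], 0-based and row-major in a local_M x L array. Adding the 0-based `local_j_offset' to the 1-based local `j' gives the 1-based global `j'. The helper names below are hypothetical.

```c
#include <stddef.h>

/* Offset of Fortran data(i, j) in an L x local_M column-major array. */
static ptrdiff_t fortran_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t L)
{
    return (j - 1) * L + (i - 1);
}

/* Offset of C data[r][c] in a local_M x L row-major array. */
static ptrdiff_t c_offset(ptrdiff_t r, ptrdiff_t c, ptrdiff_t L)
{
    return r * L + c;
}

/* 1-based global Fortran index of local column j. */
static ptrdiff_t global_j(ptrdiff_t j, ptrdiff_t local_j_offset)
{
    return j + local_j_offset;
}
```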
Chris@19 4684
Chris@19 4685 If instead you had used the `ior(FFTW_MEASURE,
Chris@19 4686 FFTW_MPI_TRANSPOSED_OUT)' flag, the output of the transform would be a
Chris@19 4687 transposed M x local_L array, associated with the _same_ `cdata'
Chris@19 4688 allocation (since the transform is in-place), and which you could
Chris@19 4689 declare with:
Chris@19 4690
Chris@19 4691 complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
Chris@19 4692 ...
Chris@19 4693 call c_f_pointer(cdata, tdata, [M,local_L])
Chris@19 4694
Chris@19 4695 where `local_L' would have been obtained by changing the
Chris@19 4696 `fftw_mpi_local_size_2d' call to:
Chris@19 4697
Chris@19 4698 alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
Chris@19 4699 local_M, local_j_offset, local_L, local_i_offset)
Chris@19 4700
Chris@19 4701 ---------- Footnotes ----------
Chris@19 4702
Chris@19 4703 (1) Technically, this is because you aren't actually calling the C
Chris@19 4704 functions directly. You are calling wrapper functions that translate
Chris@19 4705 the communicator with `MPI_Comm_f2c' before calling the ordinary C
Chris@19 4706 interface. This is all done transparently, however, since the
Chris@19 4707 `fftw3-mpi.f03' interface file renames the wrappers so that they are
Chris@19 4708 called in Fortran with the same names as the C interface functions.
Chris@19 4709
Chris@19 4710 
Chris@19 4711 File: fftw3.info, Node: Calling FFTW from Modern Fortran, Next: Calling FFTW from Legacy Fortran, Prev: Distributed-memory FFTW with MPI, Up: Top
Chris@19 4712
Chris@19 4713 7 Calling FFTW from Modern Fortran
Chris@19 4714 **********************************
Chris@19 4715
Chris@19 4716 Fortran 2003 standardized ways for Fortran code to call C libraries,
Chris@19 4717 and this allows us to support a direct translation of the FFTW C API
Chris@19 4718 into Fortran. Compared to the legacy Fortran 77 interface (*note
Chris@19 4719 Calling FFTW from Legacy Fortran::), this direct interface offers many
Chris@19 4720 advantages, especially compile-time type-checking and aligned memory
Chris@19 4721 allocation. As of this writing, support for these C interoperability
Chris@19 4722 features seems widespread, having been implemented in nearly all major
Chris@19 4723 Fortran compilers (e.g. GNU, Intel, IBM, Oracle/Solaris, Portland
Chris@19 4724 Group, NAG).
Chris@19 4725
Chris@19 4726 This chapter documents that interface. For the most part, since this
Chris@19 4727 interface allows Fortran to call the C interface directly, the usage is
Chris@19 4728 identical to C translated to Fortran syntax. However, there are a few
Chris@19 4729 subtle points such as memory allocation, wisdom, and data types that
Chris@19 4730 deserve closer attention.
Chris@19 4731
Chris@19 4732 * Menu:
Chris@19 4733
Chris@19 4734 * Overview of Fortran interface::
Chris@19 4735 * Reversing array dimensions::
Chris@19 4736 * FFTW Fortran type reference::
Chris@19 4737 * Plan execution in Fortran::
Chris@19 4738 * Allocating aligned memory in Fortran::
Chris@19 4739 * Accessing the wisdom API from Fortran::
Chris@19 4740 * Defining an FFTW module::
Chris@19 4741
Chris@19 4742 
Chris@19 4743 File: fftw3.info, Node: Overview of Fortran interface, Next: Reversing array dimensions, Prev: Calling FFTW from Modern Fortran, Up: Calling FFTW from Modern Fortran
Chris@19 4744
Chris@19 4745 7.1 Overview of Fortran interface
Chris@19 4746 =================================
Chris@19 4747
Chris@19 4748 FFTW provides a file `fftw3.f03', found in the same directory as
Chris@19 4749 `fftw3.h' (the C header file), that defines Fortran 2003 interfaces for
Chris@19 4750 all of its C routines except the MPI routines (which are described
Chris@19 4751 elsewhere). In any Fortran subroutine where you want to use FFTW
Chris@19 4752 functions, you should begin with:
Chris@19 4753
Chris@19 4754 use, intrinsic :: iso_c_binding
Chris@19 4755 include 'fftw3.f03'
Chris@19 4756
Chris@19 4757 This includes the interface definitions and the standard
Chris@19 4758 `iso_c_binding' module (which defines the equivalents of C types). You
Chris@19 4759 can also put the FFTW functions into a module if you prefer (*note
Chris@19 4760 Defining an FFTW module::).
Chris@19 4761
Chris@19 4762 At this point, you can call anything in the FFTW C interface
Chris@19 4763 directly, almost exactly as in C other than minor changes in syntax.
Chris@19 4764 For example:
Chris@19 4765
Chris@19 4766 type(C_PTR) :: plan
Chris@19 4767 complex(C_DOUBLE_COMPLEX), dimension(1024,1000) :: in, out
Chris@19 4768 plan = fftw_plan_dft_2d(1000,1024, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
Chris@19 4769 ...
Chris@19 4770 call fftw_execute_dft(plan, in, out)
Chris@19 4771 ...
Chris@19 4772 call fftw_destroy_plan(plan)
Chris@19 4773
Chris@19 4774 A few important things to keep in mind are:
Chris@19 4775
Chris@19 4776 * FFTW plans are `type(C_PTR)'. Other C types are mapped in the
Chris@19 4777 obvious way via the `iso_c_binding' standard: `int' turns into
Chris@19 4778 `integer(C_INT)', `fftw_complex' turns into
Chris@19 4779 `complex(C_DOUBLE_COMPLEX)', `double' turns into `real(C_DOUBLE)',
Chris@19 4780 and so on. *Note FFTW Fortran type reference::.
Chris@19 4781
Chris@19 4782 * Functions in C become functions in Fortran if they have a return
Chris@19 4783 value, and subroutines in Fortran otherwise.
Chris@19 4784
Chris@19 4785 * The ordering of the Fortran array dimensions must be _reversed_
Chris@19 4786 when they are passed to the FFTW plan creation, thanks to
Chris@19 4787 differences in array indexing conventions (*note Multi-dimensional
Chris@19 4788 Array Format::). This is _unlike_ the legacy Fortran interface
Chris@19 4789 (*note Fortran-interface routines::), which reversed the dimensions
Chris@19 4790 for you. *Note Reversing array dimensions::.
Chris@19 4791
Chris@19 4792 * Using ordinary Fortran array declarations like this works, but may
Chris@19 4793 yield suboptimal performance because the data may not be not
Chris@19 4794 aligned to exploit SIMD instructions on modern proessors (*note
Chris@19 4795 SIMD alignment and fftw_malloc::). Better performance will often
Chris@19 4796 be obtained by allocating with `fftw_alloc'. *Note Allocating
Chris@19 4797 aligned memory in Fortran::.
Chris@19 4798
Chris@19 4799 * Similar to the legacy Fortran interface (*note FFTW Execution in
Chris@19 4800 Fortran::), we currently recommend _not_ using `fftw_execute' but
Chris@19 4801 rather using the more specialized functions like
Chris@19 4802 `fftw_execute_dft' (*note New-array Execute Functions::).
Chris@19 4803 However, you should execute the plan on the _same arrays_ as the
Chris@19 4804 ones for which you created the plan, unless you are especially
Chris@19 4805 careful. *Note Plan execution in Fortran::. To prevent you from
Chris@19 4806 using `fftw_execute' by mistake, the `fftw3.f03' file does not
Chris@19 4807 provide an `fftw_execute' interface declaration.
Chris@19 4808
Chris@19 4809 * Multiple planner flags are combined with `ior' (equivalent to `|'
Chris@19 4810 in C). For example, `FFTW_MEASURE | FFTW_DESTROY_INPUT' becomes
Chris@19 4811 `ior(FFTW_MEASURE, FFTW_DESTROY_INPUT)'. (You can also use `+' as
Chris@19 4812 long as you don't try to include a given flag more than once.)
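The caveat about `+' follows from how single-bit flags combine, shown here in C. OR-ing a flag that is already set changes nothing, while adding it twice carries into a different bit. The flag values below are hypothetical single-bit constants chosen for illustration. They are not FFTW's actual values.

```c
/* Hypothetical single-bit flags for illustration only (not FFTW's values). */
enum { FLAG_A = 1u << 0, FLAG_B = 1u << 3 };

/* Bitwise OR: the analogue of Fortran's ior(). Idempotent. */
static unsigned combine_or(unsigned x, unsigned y) { return x | y; }

/* Addition: equal to OR only when no bit is set in both operands. */
static unsigned combine_add(unsigned x, unsigned y) { return x + y; }
```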
Chris@19 4813
Chris@19 4814
Chris@19 4815 * Menu:
Chris@19 4816
Chris@19 4817 * Extended and quadruple precision in Fortran::
Chris@19 4818
Chris@19 4819 
Chris@19 4820 File: fftw3.info, Node: Extended and quadruple precision in Fortran, Prev: Overview of Fortran interface, Up: Overview of Fortran interface
Chris@19 4821
Chris@19 4822 7.1.1 Extended and quadruple precision in Fortran
Chris@19 4823 -------------------------------------------------
Chris@19 4824
Chris@19 4825 If FFTW is compiled in `long double' (extended) precision (*note
Chris@19 4826 Installation and Customization::), you may be able to call the
Chris@19 4827 resulting `fftwl_' routines (*note Precision::) from Fortran if your
Chris@19 4828 compiler supports the `C_LONG_DOUBLE_COMPLEX' type code.
Chris@19 4829
Chris@19 4830 Because some Fortran compilers do not support
Chris@19 4831 `C_LONG_DOUBLE_COMPLEX', the `fftwl_' declarations are segregated into
Chris@19 4832 a separate interface file `fftw3l.f03', which you should include _in
Chris@19 4833 addition_ to `fftw3.f03' (which declares precision-independent `FFTW_'
Chris@19 4834 constants):
Chris@19 4835
Chris@19 4836 use, intrinsic :: iso_c_binding
Chris@19 4837 include 'fftw3.f03'
Chris@19 4838 include 'fftw3l.f03'
Chris@19 4839
Chris@19 4840 We also support using the nonstandard `__float128'
Chris@19 4841 quadruple-precision type provided by recent versions of `gcc' on 32-
Chris@19 4842 and 64-bit x86 hardware (*note Installation and Customization::), using
Chris@19 4843 the corresponding `real(16)' and `complex(16)' types supported by
Chris@19 4844 `gfortran'. The quadruple-precision `fftwq_' functions (*note
Chris@19 4845 Precision::) are declared in an `fftw3q.f03' interface file, which
Chris@19 4846 should be included in addition to `fftw3l.f03', as above. You should
Chris@19 4847 also link with `-lfftw3q -lquadmath -lm' as in C.
Chris@19 4848
Chris@19 4849 
Chris@19 4850 File: fftw3.info, Node: Reversing array dimensions, Next: FFTW Fortran type reference, Prev: Overview of Fortran interface, Up: Calling FFTW from Modern Fortran
Chris@19 4851
Chris@19 4852 7.2 Reversing array dimensions
Chris@19 4853 ==============================
Chris@19 4854
Chris@19 4855 A minor annoyance in calling FFTW from Fortran is that FFTW's array
Chris@19 4856 dimensions are defined in the C convention (row-major order), while
Chris@19 4857 Fortran's array dimensions are the opposite convention (column-major
Chris@19 4858 order). *Note Multi-dimensional Array Format::. This is just a
Chris@19 4859 bookkeeping difference, with no effect on performance. The only
Chris@19 4860 consequence of this is that, whenever you create an FFTW plan for a
Chris@19 4861 multi-dimensional transform, you must always _reverse the ordering of
Chris@19 4862 the dimensions_.
Chris@19 4863
Chris@19 4864 For example, consider the three-dimensional (L x M x N) arrays:
Chris@19 4865
Chris@19 4866 complex(C_DOUBLE_COMPLEX), dimension(L,M,N) :: in, out
Chris@19 4867
Chris@19 4868 To plan a DFT for these arrays using `fftw_plan_dft_3d', you could
Chris@19 4869 do:
Chris@19 4870
Chris@19 4871 plan = fftw_plan_dft_3d(N,M,L, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
Chris@19 4872
Chris@19 4873 That is, from FFTW's perspective this is an N x M x L array. _No
Chris@19 4874 data transposition need occur_, as this is _only notation_. Similarly,
Chris@19 4875 to use the more generic routine `fftw_plan_dft' with the same arrays,
Chris@19 4876 you could do:
Chris@19 4877
Chris@19 4878 integer(C_INT), dimension(3) :: dims = [N,M,L]  ! not `n': Fortran names are case-insensitive
Chris@19 4879 plan = fftw_plan_dft(3, dims, in,out, FFTW_FORWARD,FFTW_ESTIMATE)
Chris@19 4880
Chris@19 4881 Note, by the way, that this is different from the legacy Fortran
Chris@19 4882 interface (*note Fortran-interface routines::), which automatically
Chris@19 4883 reverses the order of the array dimensions for you. Here, you are
Chris@19 4884 calling the C interface directly, so there is no "translation" layer.
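The claim that the reversal is "only notation" can be checked with index arithmetic. The C sketch below uses hypothetical helpers. Fortran element (i, j, k), 1-based and column-major in an L x M x N array, has the same memory offset as C element [k-1][j-1][i-1], 0-based and row-major in an N x M x L array. The sketch also includes a small routine that reverses a dimension list, as you would before passing it to a rank-generic planner.

```c
#include <stddef.h>

/* Column-major offset of 1-based (i, j, k) in an L x M x N array. */
static ptrdiff_t colmajor3(ptrdiff_t i, ptrdiff_t j, ptrdiff_t k,
                           ptrdiff_t L, ptrdiff_t M)
{
    return (i - 1) + L * ((j - 1) + M * (k - 1));
}

/* Row-major offset of 0-based [a][b][c] in an n0 x n1 x n2 array. */
static ptrdiff_t rowmajor3(ptrdiff_t a, ptrdiff_t b, ptrdiff_t c,
                           ptrdiff_t n1, ptrdiff_t n2)
{
    return (a * n1 + b) * n2 + c;
}

/* Reverse a dimension list in place, e.g. {L, M, N} -> {N, M, L}. */
static void reverse_dims(int rnk, int *n)
{
    for (int lo = 0, hi = rnk - 1; lo < hi; ++lo, --hi) {
        int t = n[lo];
        n[lo] = n[hi];
        n[hi] = t;
    }
}
```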
Chris@19 4885
Chris@19 4886 An important thing to keep in mind is the implication of this for
Chris@19 4887 multidimensional real-to-complex transforms (*note Multi-Dimensional
Chris@19 4888 DFTs of Real Data::). In C, a multidimensional real-to-complex DFT
Chris@19 4889 chops the last dimension roughly in half (N x M x L real input goes to
Chris@19 4890 N x M x L/2+1 complex output). In Fortran, because the array
Chris@19 4891 dimension notation is reversed, the _first_ dimension of the complex
Chris@19 4892 data is chopped roughly in half. For example, consider the `r2c'
Chris@19 4893 transform of L x M x N real input in Fortran:
Chris@19 4894
Chris@19 4895 type(C_PTR) :: plan
Chris@19 4896 real(C_DOUBLE), dimension(L,M,N) :: in
Chris@19 4897 complex(C_DOUBLE_COMPLEX), dimension(L/2+1,M,N) :: out
Chris@19 4898 plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
Chris@19 4899 ...
Chris@19 4900 call fftw_execute_dft_r2c(plan, in, out)
Chris@19 4901
Chris@19 4902 Alternatively, for an in-place r2c transform, as described in the C
Chris@19 4903 documentation, we must _pad_ the _first_ dimension of the real input
Chris@19 4904 with extra entries (two if L is even, one if L is odd), which are ignored by FFTW, so as to leave
Chris@19 4905 enough space for the complex output. The input is _allocated_ as a
Chris@19 4906 2[L/2+1] x M x N array, even though only L x M x N of it is actually
Chris@19 4907 used. In this example, we will allocate the array as a pointer type,
Chris@19 4908 using `fftw_alloc' to ensure aligned memory for maximum performance
Chris@19 4909 (*note Allocating aligned memory in Fortran::); this also makes it easy
Chris@19 4910 to reference the same memory as both a real array and a complex array.
Chris@19 4911
Chris@19 4912 real(C_DOUBLE), pointer :: in(:,:,:)
Chris@19 4913 complex(C_DOUBLE_COMPLEX), pointer :: out(:,:,:)
Chris@19 4914 type(C_PTR) :: plan, data
Chris@19 4915 data = fftw_alloc_complex(int((L/2+1) * M * N, C_SIZE_T))
Chris@19 4916 call c_f_pointer(data, in, [2*(L/2+1),M,N])
Chris@19 4917 call c_f_pointer(data, out, [L/2+1,M,N])
Chris@19 4918 plan = fftw_plan_dft_r2c_3d(N,M,L, in,out, FFTW_ESTIMATE)
Chris@19 4919 ...
Chris@19 4920 call fftw_execute_dft_r2c(plan, in, out)
Chris@19 4921 ...
Chris@19 4922 call fftw_destroy_plan(plan)
Chris@19 4923 call fftw_free(data)
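The sizes in this in-place example can be checked in C. The padded first dimension 2*(L/2+1) exceeds L by two entries for even L and one for odd L. The `(L/2+1)*M*N' complex values passed to `fftw_alloc_complex' hold exactly as many doubles as the padded real array needs. The helper names are hypothetical, and the sketch assumes two doubles per complex value, as for `fftw_complex'.

```c
#include <stddef.h>

/* Padded first (Fortran) dimension for an in-place r2c of real length L. */
static ptrdiff_t padded_len(ptrdiff_t L) { return 2 * (L / 2 + 1); }

/* Ignored padding entries per column: 2 for even L, 1 for odd L. */
static ptrdiff_t pad_entries(ptrdiff_t L) { return padded_len(L) - L; }

/* Doubles held by (L/2+1)*M*N complex values, two doubles apiece. */
static size_t doubles_in_complex_alloc(ptrdiff_t L, ptrdiff_t M, ptrdiff_t N)
{
    return 2u * (size_t) ((L / 2 + 1) * M * N);
}
```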
Chris@19 4924
Chris@19 4925 
Chris@19 4926 File: fftw3.info, Node: FFTW Fortran type reference, Next: Plan execution in Fortran, Prev: Reversing array dimensions, Up: Calling FFTW from Modern Fortran
Chris@19 4927
Chris@19 4928 7.3 FFTW Fortran type reference
Chris@19 4929 ===============================
Chris@19 4930
Chris@19 4931 The following are the most important type correspondences between the C
Chris@19 4932 interface and Fortran:
Chris@19 4933
Chris@19 4934 * Plans (`fftw_plan' and variants) are `type(C_PTR)' (i.e. an opaque
Chris@19 4935 pointer).
Chris@19 4936
Chris@19 4937 * The C floating-point types `double', `float', and `long double'
Chris@19 4938 correspond to `real(C_DOUBLE)', `real(C_FLOAT)', and
Chris@19 4939 `real(C_LONG_DOUBLE)', respectively. The C complex types
Chris@19 4940 `fftw_complex', `fftwf_complex', and `fftwl_complex' correspond in
Chris@19 4941 Fortran to `complex(C_DOUBLE_COMPLEX)',
Chris@19 4942 `complex(C_FLOAT_COMPLEX)', and `complex(C_LONG_DOUBLE_COMPLEX)',
Chris@19 4943 respectively. Just as in C (*note Precision::), the FFTW
Chris@19 4944 subroutines and types are prefixed with `fftw_', `fftwf_', and
Chris@19 4945 `fftwl_' for the different precisions, and link to different
Chris@19 4946 libraries (`-lfftw3', `-lfftw3f', and `-lfftw3l' on Unix), but use
Chris@19 4947 the _same_ include file `fftw3.f03' and the _same_ constants (all
Chris@19 4948 of which begin with `FFTW_'). The exception is `long double'
Chris@19 4949 precision, for which you should _also_ include `fftw3l.f03' (*note
Chris@19 4950 Extended and quadruple precision in Fortran::).
Chris@19 4951
Chris@19 4952 * The C integer types `int' and `unsigned' (used for planner flags)
Chris@19 4953 become `integer(C_INT)'. The C integer type `ptrdiff_t' (e.g. in
Chris@19 4954 the *note 64-bit Guru Interface::) becomes `integer(C_INTPTR_T)',
Chris@19 4955 and `size_t' (in `fftw_malloc' etc.) becomes `integer(C_SIZE_T)'.
Chris@19 4956
Chris@19 4957 * The `fftw_r2r_kind' type (*note Real-to-Real Transform Kinds::)
Chris@19 4958 becomes `integer(C_FFTW_R2R_KIND)'. The various constant values
Chris@19 4959 of the C enumerated type (`FFTW_R2HC' etc.) become simply integer
Chris@19 4960 constants of the same names in Fortran.
Chris@19 4961
Chris@19 4962 * Numeric array pointer arguments (e.g. `double *') become
Chris@19 4963 `dimension(*), intent(out)' arrays of the same type, or
Chris@19 4964 `dimension(*), intent(in)' if they are pointers to constant data
Chris@19 4965 (e.g. `const int *'). There are a few exceptions where numeric
Chris@19 4966 pointers refer to scalar outputs (e.g. for `fftw_flops'), in which
Chris@19 4967 case they are `intent(out)' scalar arguments in Fortran too. For
Chris@19 4968 the new-array execute functions (*note New-array Execute
Chris@19 4969 Functions::), the input arrays are declared `dimension(*),
Chris@19 4970 intent(inout)', since they can be modified in the case of in-place
Chris@19 4971 or `FFTW_DESTROY_INPUT' transforms.
Chris@19 4972
Chris@19 4973 * Pointer _return_ values (e.g. `double *') become `type(C_PTR)'.
Chris@19 4974 (If they are pointers to arrays, as for `fftw_alloc_real', you can
Chris@19 4975 convert them back to Fortran array pointers with the standard
Chris@19 4976 intrinsic function `c_f_pointer'.)
Chris@19 4977
Chris@19 4978 * The `fftw_iodim' type in the guru interface (*note Guru vector and
Chris@19 4979 transform sizes::) becomes `type(fftw_iodim)' in Fortran, a
Chris@19 4980 derived data type (the Fortran analogue of C's `struct') with
Chris@19 4981 three `integer(C_INT)' components: `n', `is', and `os', with the
Chris@19 4982 same meanings as in C. The `fftw_iodim64' type in the 64-bit guru
Chris@19 4983 interface (*note 64-bit Guru Interface::) is the same, except that
Chris@19 4984 its components are of type `integer(C_INTPTR_T)'.
Chris@19 4985
Chris@19 4986 * Using the wisdom import/export functions from Fortran is a bit
Chris@19 4987 tricky, and is discussed in *note Accessing the wisdom API from
Chris@19 4988 Fortran::. In brief, the `FILE *' arguments map to `type(C_PTR)',
Chris@19 4989 `const char *' to `character(C_CHAR), dimension(*), intent(in)'
Chris@19 4990 (null-terminated!), and the generic read-char/write-char functions
Chris@19 4991 map to `type(C_FUNPTR)'.
Chris@19 4992
Chris@19 4993
Chris@19 4994 You may be wondering if you need to search-and-replace
Chris@19 4995 `real(kind(0.0d0))' (or whatever your favorite Fortran spelling of
Chris@19 4996 "double precision" is) with `real(C_DOUBLE)' everywhere in your
Chris@19 4997 program, and similarly for `complex' and `integer' types. The answer
Chris@19 4998 is no; you can still use your existing types. As long as these types
Chris@19 4999 match their C counterparts, things should work without a hitch. The
Chris@19 5000 worst that can happen, e.g. in the (unlikely) event of a system where
Chris@19 5001 `real(kind(0.0d0))' is different from `real(C_DOUBLE)', is that the
Chris@19 5002 compiler will give you a type-mismatch error. That is, if you don't
Chris@19 5003 use the `iso_c_binding' kinds you need to accept at least the
Chris@19 5004 theoretical possibility of having to change your code in response to
Chris@19 5005 compiler errors on some future machine, but you don't need to worry
Chris@19 5006 about silently compiling incorrect code that yields runtime errors.
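The following sketch checks the point above directly. It is not part of FFTW and needs only a Fortran 2003 compiler. It compares the kinds of a few common Fortran spellings with the `iso_c_binding' kinds used by `fftw3.f03'. If a comparison prints `F', passing such variables to the FFTW interfaces should produce the compile-time type-mismatch error described above rather than silently wrong results. The variable names are illustrative.

```fortran
! Sketch: compare legacy kind spellings with the iso_c_binding kinds
! used in fftw3.f03. Not part of FFTW; needs only a Fortran 2003 compiler.
program kind_check
  use, intrinsic :: iso_c_binding
  implicit none
  logical :: dbl_ok, sgl_ok, int_ok, cplx_ok

  dbl_ok  = (kind(0.0d0) == C_DOUBLE)                   ! real(kind(0.0d0)) vs real(C_DOUBLE)
  sgl_ok  = (kind(0.0) == C_FLOAT)                      ! default real vs real(C_FLOAT)
  int_ok  = (kind(0) == C_INT)                          ! default integer vs integer(C_INT)
  cplx_ok = (kind((0.0d0, 0.0d0)) == C_DOUBLE_COMPLEX)  ! double complex vs fftw_complex

  print '(a,l2)', 'real(kind(0.0d0)) matches real(C_DOUBLE):       ', dbl_ok
  print '(a,l2)', 'default real matches real(C_FLOAT):             ', sgl_ok
  print '(a,l2)', 'default integer matches integer(C_INT):         ', int_ok
  print '(a,l2)', 'double complex matches complex(C_DOUBLE_COMPLEX):', cplx_ok
end program kind_check
```

With typical compilers and default options, all four comparisons print `T'. Options that change default kinds (such as gfortran's `-fdefault-real-8') can make some of them print `F'.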
Chris@19 5007
Chris@19 5008 
Chris@19 5009 File: fftw3.info, Node: Plan execution in Fortran, Next: Allocating aligned memory in Fortran, Prev: FFTW Fortran type reference, Up: Calling FFTW from Modern Fortran
Chris@19 5010
Chris@19 5011 7.4 Plan execution in Fortran
Chris@19 5012 =============================
Chris@19 5013
Chris@19 5014 In C, in order to use a plan, one normally calls `fftw_execute', which
Chris@19 5015 executes the plan to perform the transform on the input/output arrays
Chris@19 5016 passed when the plan was created (*note Using Plans::). The
Chris@19 5017 corresponding subroutine call in modern Fortran is:
Chris@19 5018 call fftw_execute(plan)
Chris@19 5019
Chris@19 5020 However, we have had reports that this causes problems with some
Chris@19 5021 recent optimizing Fortran compilers. The problem is that, because the
Chris@19 5022 input/output arrays are not passed as explicit arguments to
Chris@19 5023 `fftw_execute', the semantics of Fortran (unlike C) allow the compiler
Chris@19 5024 to assume that the input/output arrays are not changed by
Chris@19 5025 `fftw_execute'. As a consequence, certain compilers end up
Chris@19 5026 repositioning the call to `fftw_execute', assuming incorrectly that it
Chris@19 5027 does nothing to the arrays.
Chris@19 5028
Chris@19 5029 There are various workarounds to this, but the safest and simplest
Chris@19 5030 thing is to not use `fftw_execute' in Fortran. Instead, use the
Chris@19 5031 functions described in *note New-array Execute Functions::, which take
Chris@19 5032 the input/output arrays as explicit arguments. For example, if the
Chris@19 5033 plan is for a complex-data DFT and was created for the arrays `in' and
Chris@19 5034 `out', you would do:
Chris@19 5035 call fftw_execute_dft(plan, in, out)
Chris@19 5036
Chris@19 5037 There are a few things to be careful of, however:
Chris@19 5038
Chris@19 5039 * You must use the correct type of execute function, matching the way
Chris@19 5040 the plan was created. Complex DFT plans should use
Chris@19 5041 `fftw_execute_dft', real-input (r2c) DFT plans should use
Chris@19 5042 `fftw_execute_dft_r2c', and real-output (c2r) DFT plans should use
Chris@19 5043 `fftw_execute_dft_c2r'. The various r2r plans should use
Chris@19 5044 `fftw_execute_r2r'. Fortunately, if you use the wrong one you
Chris@19 5045 will get a compile-time type-mismatch error (unlike legacy
Chris@19 5046 Fortran).
Chris@19 5047
Chris@19 5048 * You should normally pass the same input/output arrays that were
Chris@19 5049 used when creating the plan. This is always safe.
Chris@19 5050
Chris@19 5051 * _If_ you pass _different_ input/output arrays compared to those
Chris@19 5052 used when creating the plan, you must abide by all the
Chris@19 5053 restrictions of the new-array execute functions (*note New-array
Chris@19 5054 Execute Functions::). The trickiest of these is the requirement
Chris@19 5055 that the new arrays have the same alignment as the original
Chris@19 5056 arrays; the best (and possibly only) way to guarantee this is to
Chris@19 5057 use the `fftw_alloc' functions to allocate your arrays (*note
Chris@19 5058 Allocating aligned memory in Fortran::). Alternatively, you can
Chris@19 5059 use the `FFTW_UNALIGNED' flag when creating the plan, in which
Chris@19 5060 case the plan does not depend on the alignment, but this may
Chris@19 5061 sacrifice substantial performance on architectures (like x86) with
Chris@19 5062 SIMD instructions (*note SIMD alignment and fftw_malloc::).
Chris@19 5063
Chris@19 5064
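The following sketch combines these rules for a complex 1d DFT. One plan is created for one aligned array pair. It is then applied with `fftw_execute_dft' both to that pair and to a second pair, also obtained from `fftw_alloc_complex', so the alignment requirement is met. The sketch assumes FFTW 3.3 or later, with `fftw3.f03' on the include path and linking with `-lfftw3'. The array names and test signals are illustrative only.

```fortran
! Sketch: one plan, two aligned array pairs, new-array execute.
! Assumes FFTW >= 3.3 with fftw3.f03 on the include path; link with -lfftw3.
program reuse_plan
  use, intrinsic :: iso_c_binding
  implicit none
  include 'fftw3.f03'

  integer(C_INT), parameter :: n = 8
  type(C_PTR) :: plan, p_in1, p_out1, p_in2, p_out2
  complex(C_DOUBLE_COMPLEX), pointer :: in1(:), out1(:), in2(:), out2(:)
  complex(C_DOUBLE_COMPLEX) :: dc1        ! results copied out before freeing
  real(C_DOUBLE) :: mag2(n)

  ! Allocate every array with fftw_alloc_complex so all share FFTW's alignment
  p_in1  = fftw_alloc_complex(int(n, C_SIZE_T))
  p_out1 = fftw_alloc_complex(int(n, C_SIZE_T))
  p_in2  = fftw_alloc_complex(int(n, C_SIZE_T))
  p_out2 = fftw_alloc_complex(int(n, C_SIZE_T))
  call c_f_pointer(p_in1,  in1,  [n])
  call c_f_pointer(p_out1, out1, [n])
  call c_f_pointer(p_in2,  in2,  [n])
  call c_f_pointer(p_out2, out2, [n])

  ! Plan first, then fill the inputs (planners other than FFTW_ESTIMATE
  ! may overwrite the arrays while planning)
  plan = fftw_plan_dft_1d(n, in1, out1, FFTW_FORWARD, FFTW_ESTIMATE)

  in1 = (1.0_C_DOUBLE, 0.0_C_DOUBLE)      ! constant signal
  in2 = (0.0_C_DOUBLE, 0.0_C_DOUBLE)
  in2(2) = (1.0_C_DOUBLE, 0.0_C_DOUBLE)   ! impulse at index 1 (0-based)

  ! Arrays passed explicitly, instead of call fftw_execute(plan)
  call fftw_execute_dft(plan, in1, out1)
  call fftw_execute_dft(plan, in2, out2)  ! same plan, different arrays

  dc1 = out1(1)        ! sum of the inputs: (8, 0) up to rounding
  mag2 = abs(out2)     ! a single impulse has unit magnitude at every frequency
  print *, 'out1(1) =', dc1
  print *, 'abs(out2) =', mag2

  call fftw_destroy_plan(plan)
  call fftw_free(p_in1)
  call fftw_free(p_out1)
  call fftw_free(p_in2)
  call fftw_free(p_out2)
end program reuse_plan
```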
Chris@19 5065 
Chris@19 5066 File: fftw3.info, Node: Allocating aligned memory in Fortran, Next: Accessing the wisdom API from Fortran, Prev: Plan execution in Fortran, Up: Calling FFTW from Modern Fortran
Chris@19 5067
Chris@19 5068 7.5 Allocating aligned memory in Fortran
Chris@19 5069 ========================================
Chris@19 5070
Chris@19 5071 In order to obtain maximum performance in FFTW, you should store your
Chris@19 5072 data in arrays that have been specially aligned in memory (*note SIMD
Chris@19 5073 alignment and fftw_malloc::). Enforcing alignment also permits you to
Chris@19 5074 safely use the new-array execute functions (*note New-array Execute
Chris@19 5075 Functions::) to apply a given plan to more than one pair of in/out
Chris@19 5076 arrays. Unfortunately, standard Fortran arrays do _not_ provide any
Chris@19 5077 alignment guarantees. The _only_ way to allocate aligned memory in
Chris@19 5078 standard Fortran is to allocate it with an external C function, like
Chris@19 5079 the `fftw_alloc_real' and `fftw_alloc_complex' functions. Fortunately,
Chris@19 5080 Fortran 2003 provides a simple way to associate such allocated memory
Chris@19 5081 with a standard Fortran array pointer that you can then use normally.
Chris@19 5082
Chris@19 5083 We therefore recommend allocating all your input/output arrays using
Chris@19 5084 the following technique:
Chris@19 5085
Chris@19 5086 1. Declare a `pointer', `arr', to your array of the desired type and
Chris@19 5087 dimensions. For example, `real(C_DOUBLE), pointer :: a(:,:)' for
Chris@19 5088 a 2d real array, or `complex(C_DOUBLE_COMPLEX), pointer ::
Chris@19 5089 a(:,:,:)' for a 3d complex array.
Chris@19 5090
Chris@19 5091 2. The number of elements to allocate must be an `integer(C_SIZE_T)'.
Chris@19 5092 You can either declare a variable of this type, e.g.
Chris@19 5093 `integer(C_SIZE_T) :: sz', to store the number of elements to
Chris@19 5094 allocate, or you can use the `int(..., C_SIZE_T)' intrinsic
Chris@19 5095 function, e.g. set `sz = L * M * N' or use `int(L * M * N,
Chris@19 5096 C_SIZE_T)' for an L x M x N array.
Chris@19 5097
Chris@19 5098 3. Declare a `type(C_PTR) :: p' to hold the return value from FFTW's
Chris@19 5099 allocation routine. Set `p = fftw_alloc_real(sz)' for a real
Chris@19 5100 array, or `p = fftw_alloc_complex(sz)' for a complex array.
Chris@19 5101
Chris@19 5102 4. Associate your pointer `arr' with the allocated memory `p' using
Chris@19 5103 the standard `c_f_pointer' subroutine: `call c_f_pointer(p, arr,
Chris@19 5104 [...dimensions...])', where `[...dimensions...]' is an array of
Chris@19 5105 the dimensions of the array (in the usual Fortran order), e.g.
Chris@19 5106 `call c_f_pointer(p, arr, [L,M,N])' for an L x M x N array.
Chris@19 5107 (Alternatively, you can omit the dimensions argument if you
Chris@19 5108 specified the shape explicitly when declaring `arr'.) You can now
Chris@19 5109 use `arr' as a usual multidimensional array.
Chris@19 5110
Chris@19 5111 5. When you are done using the array, deallocate the memory with `call
Chris@19 5112 fftw_free(p)'.
Chris@19 5113
Chris@19 5114
Chris@19 5115 For example, here is how we would allocate an L x M 2d real array:
Chris@19 5116
Chris@19 5117 real(C_DOUBLE), pointer :: arr(:,:)
Chris@19 5118 type(C_PTR) :: p
Chris@19 5119 p = fftw_alloc_real(int(L * M, C_SIZE_T))
Chris@19 5120 call c_f_pointer(p, arr, [L,M])
Chris@19 5121 _...use arr and arr(i,j) as usual..._
Chris@19 5122 call fftw_free(p)
Chris@19 5123
Chris@19 5124 and here is an L x M x N 3d complex array:
Chris@19 5125
Chris@19 5126 complex(C_DOUBLE_COMPLEX), pointer :: arr(:,:,:)
Chris@19 5127 type(C_PTR) :: p
Chris@19 5128 p = fftw_alloc_complex(int(L * M * N, C_SIZE_T))
Chris@19 5129 call c_f_pointer(p, arr, [L,M,N])
Chris@19 5130 _...use arr and arr(i,j,k) as usual..._
Chris@19 5131 call fftw_free(p)
Chris@19 5132
Chris@19 5133 See *note Reversing array dimensions:: for an example allocating a
Chris@19 5134 single array and associating both real and complex array pointers with
Chris@19 5135 it, for in-place real-to-complex transforms.
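The sketch below shows the allocate-then-associate pattern on its own. It uses the POSIX `posix_memalign' function as a stand-in for `fftw_alloc_real', an assumption made only so that it runs without FFTW, and then checks the alignment of the resulting Fortran array. It also does the size arithmetic in `C_SIZE_T'. This matters because `int(L * M * N, C_SIZE_T)' computes the product in default (often 32-bit) integers before converting, and so can overflow for very large arrays. The 32-byte alignment is an arbitrary example value.

```fortran
! Sketch: c_f_pointer association with an externally allocated, aligned
! block. posix_memalign (POSIX) stands in for fftw_alloc_real here.
program aligned_alloc_demo
  use, intrinsic :: iso_c_binding
  implicit none

  interface
    ! int posix_memalign(void **memptr, size_t alignment, size_t size);
    integer(C_INT) function posix_memalign(memptr, alignment, sz) &
        bind(C, name='posix_memalign')
      import :: C_INT, C_PTR, C_SIZE_T
      type(C_PTR), intent(out) :: memptr      ! void **, passed by reference
      integer(C_SIZE_T), value :: alignment, sz
    end function posix_memalign
    ! void free(void *ptr);
    subroutine free(p) bind(C, name='free')
      import :: C_PTR
      type(C_PTR), value :: p
    end subroutine free
  end interface

  integer, parameter :: L = 100, M = 50
  real(C_DOUBLE), pointer :: arr(:,:)
  type(C_PTR) :: p
  integer(C_SIZE_T) :: nbytes
  integer(C_INTPTR_T) :: addr
  logical :: aligned

  ! Size arithmetic in C_SIZE_T, so a large L*M cannot overflow default integers
  nbytes = int(L, C_SIZE_T) * int(M, C_SIZE_T) * c_sizeof(0.0_C_DOUBLE)
  ! Stand-in for p = fftw_alloc_real(int(L, C_SIZE_T) * M)
  if (posix_memalign(p, 32_C_SIZE_T, nbytes) /= 0) stop 'allocation failed'

  call c_f_pointer(p, arr, [L, M])   ! Fortran view of the C memory block
  arr = 0.0_C_DOUBLE
  arr(3, 4) = 1.0_C_DOUBLE           ! ...use arr(i,j) as usual...

  ! Reinterpret the address of arr(1,1) as an integer (assumes C_PTR and
  ! C_INTPTR_T have the same size, as on mainstream platforms)
  addr = transfer(c_loc(arr(1, 1)), 0_C_INTPTR_T)
  aligned = (modulo(addr, 32_C_INTPTR_T) == 0)
  print '(a,l2)', 'arr(1,1) is 32-byte aligned:', aligned

  call free(p)                       ! fftw_free(p) in the FFTW version
end program aligned_alloc_demo
```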
Chris@19 5136
Chris@19 5137 
Chris@19 5138 File: fftw3.info, Node: Accessing the wisdom API from Fortran, Next: Defining an FFTW module, Prev: Allocating aligned memory in Fortran, Up: Calling FFTW from Modern Fortran
Chris@19 5139
Chris@19 5140 7.6 Accessing the wisdom API from Fortran
Chris@19 5141 =========================================
Chris@19 5142
Chris@19 5143 As explained in *note Words of Wisdom-Saving Plans::, FFTW provides a
Chris@19 5144 "wisdom" API for saving plans to disk so that they can be recreated
Chris@19 5145 quickly. The C API for exporting (*note Wisdom Export::) and importing
Chris@19 5146 (*note Wisdom Import::) wisdom is somewhat tricky to use from Fortran,
Chris@19 5147 however, because of differences in file I/O and string types between C
Chris@19 5148 and Fortran.
Chris@19 5149
Chris@19 5150 * Menu:
Chris@19 5151
Chris@19 5152 * Wisdom File Export/Import from Fortran::
Chris@19 5153 * Wisdom String Export/Import from Fortran::
Chris@19 5154 * Wisdom Generic Export/Import from Fortran::
Chris@19 5155
Chris@19 5156 
Chris@19 5157 File: fftw3.info, Node: Wisdom File Export/Import from Fortran, Next: Wisdom String Export/Import from Fortran, Prev: Accessing the wisdom API from Fortran, Up: Accessing the wisdom API from Fortran
Chris@19 5158
Chris@19 5159 7.6.1 Wisdom File Export/Import from Fortran
Chris@19 5160 --------------------------------------------
Chris@19 5161
Chris@19 5162 The easiest way to export and import wisdom is to do so using
Chris@19 5163 `fftw_export_wisdom_to_filename' and `fftw_import_wisdom_from_filename'. The
Chris@19 5164 only trick is that these require you to pass a C string, which is an
Chris@19 5165 array of type `character(C_CHAR)' that is terminated by `C_NULL_CHAR'.
Chris@19 5166 You can call them like this:
Chris@19 5167
Chris@19 5168 integer(C_INT) :: ret
Chris@19 5169 ret = fftw_export_wisdom_to_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
Chris@19 5170 if (ret .eq. 0) stop 'error exporting wisdom to file'
Chris@19 5171 ret = fftw_import_wisdom_from_filename(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR)
Chris@19 5172 if (ret .eq. 0) stop 'error importing wisdom from file'
Chris@19 5173
Chris@19 5174 Note that prepending `C_CHAR_' is needed to specify that the literal
Chris@19 5175 string is of kind `C_CHAR', and we null-terminate the string by
Chris@19 5176 appending `// C_NULL_CHAR'. These functions return an `integer(C_INT)'
Chris@19 5177 (`ret') which is `0' if an error occurred during export/import and
Chris@19 5178 nonzero otherwise.
Chris@19 5179
Chris@19 5180 It is also possible to use the lower-level routines
Chris@19 5181 `fftw_export_wisdom_to_file' and `fftw_import_wisdom_from_file', which
Chris@19 5182 accept parameters of the C type `FILE*', expressed in Fortran as
Chris@19 5183 `type(C_PTR)'. However, you are then responsible for creating the
Chris@19 5184 `FILE*' yourself. You can do this by using `iso_c_binding' to define
Chris@19 5185 Fortran interfaces for the C library functions `fopen' and `fclose',
Chris@19 5186 which is a bit strange in Fortran but workable.
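For example, the following sketch declares `iso_c_binding' interfaces to `fopen' and `fclose'. It uses them to export wisdom through a `FILE*' and then read it back. It assumes FFTW 3.3 or later, with `fftw3.f03' and linking with `-lfftw3'. The file name is arbitrary, and error handling is kept minimal.

```fortran
! Sketch: wisdom export/import through a C FILE*, using hand-written
! interfaces to the C library's fopen and fclose.
! Assumes FFTW >= 3.3 with fftw3.f03; link with -lfftw3.
program wisdom_via_file_ptr
  use, intrinsic :: iso_c_binding
  implicit none
  include 'fftw3.f03'

  interface
    ! FILE *fopen(const char *path, const char *mode);
    type(C_PTR) function fopen(path, mode) bind(C, name='fopen')
      import :: C_PTR, C_CHAR
      character(C_CHAR), dimension(*), intent(in) :: path, mode
    end function fopen
    ! int fclose(FILE *stream);
    integer(C_INT) function fclose(stream) bind(C, name='fclose')
      import :: C_INT, C_PTR
      type(C_PTR), value :: stream
    end function fclose
  end interface

  type(C_PTR) :: f
  integer(C_INT) :: ret, status

  ! Export: both strings must be null-terminated C strings
  f = fopen(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR, C_CHAR_'w' // C_NULL_CHAR)
  if (.not. c_associated(f)) stop 'cannot open wisdom file for writing'
  call fftw_export_wisdom_to_file(f)
  status = fclose(f)

  ! Import: reopen the same file for reading
  f = fopen(C_CHAR_'my_wisdom.dat' // C_NULL_CHAR, C_CHAR_'r' // C_NULL_CHAR)
  if (.not. c_associated(f)) stop 'cannot open wisdom file for reading'
  ret = fftw_import_wisdom_from_file(f)
  status = fclose(f)
  if (ret == 0) stop 'error importing wisdom from file'
end program wisdom_via_file_ptr
```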
Chris@19 5187
Chris@19 5188 
Chris@19 5189 File: fftw3.info, Node: Wisdom String Export/Import from Fortran, Next: Wisdom Generic Export/Import from Fortran, Prev: Wisdom File Export/Import from Fortran, Up: Accessing the wisdom API from Fortran
Chris@19 5190
Chris@19 5191 7.6.2 Wisdom String Export/Import from Fortran
Chris@19 5192 ----------------------------------------------
Chris@19 5193
Chris@19 5194 Dealing with FFTW's C string export/import is a bit more painful. In
Chris@19 5195 particular, the `fftw_export_wisdom_to_string' function requires you to
Chris@19 5196 deal with a dynamically allocated C string. To get its length, you
Chris@19 5197 must define an interface to the C `strlen' function, and to deallocate
Chris@19 5198 it you must define an interface to C `free':
Chris@19 5199
Chris@19 5200 use, intrinsic :: iso_c_binding
Chris@19 5201 interface
Chris@19 5202 integer(C_SIZE_T) function strlen(s) bind(C, name='strlen')
Chris@19 5203 import
Chris@19 5204 type(C_PTR), value :: s
Chris@19 5205 end function strlen
Chris@19 5206 subroutine free(p) bind(C, name='free')
Chris@19 5207 import
Chris@19 5208 type(C_PTR), value :: p
Chris@19 5209 end subroutine free
Chris@19 5210 end interface
Chris@19 5211
Chris@19 5212 Given these definitions, you can then export wisdom to a Fortran
Chris@19 5213 character array:
Chris@19 5214
Chris@19 5215 character(C_CHAR), pointer :: s(:)
Chris@19 5216 integer(C_SIZE_T) :: slen
Chris@19 5217 type(C_PTR) :: p
Chris@19 5218 p = fftw_export_wisdom_to_string()
Chris@19 5219 if (.not. c_associated(p)) stop 'error exporting wisdom'
Chris@19 5220 slen = strlen(p)
Chris@19 5221 call c_f_pointer(p, s, [slen+1])
Chris@19 5222 ...
Chris@19 5223 call free(p)
Chris@19 5224
Chris@19 5225 Note that `slen' is the length of the C string, but the length of
Chris@19 5226 the array is `slen+1' because it includes the terminating null
Chris@19 5227 character. (You can omit the `+1' if you don't want Fortran to know
Chris@19 5228 about the null character.) The standard `c_associated' function checks
Chris@19 5229 whether `p' is a null pointer, which is returned by
Chris@19 5230 `fftw_export_wisdom_to_string' if there was an error.
Chris@19 5231
Chris@19 5232 To import wisdom from a string, use `fftw_import_wisdom_from_string'
Chris@19 5233 as usual; note that the argument of this function must be a
Chris@19 5234 `character(C_CHAR)' array terminated by the `C_NULL_CHAR' character,
Chris@19 5235 like the `s' array above.
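The string steps above can be tried without FFTW. In the sketch below, C's `strdup' (POSIX) stands in for `fftw_export_wisdom_to_string', since both return a `malloc'ed, null-terminated string that must be released with `free'. Here `strlen' is declared to return `integer(C_SIZE_T)', matching C's `size_t'. The sketch also copies the characters into an ordinary Fortran character variable before freeing `p', because `s' points into the C memory and becomes invalid once it is freed.

```fortran
! Sketch: handling a malloc'ed C string in Fortran. strdup (POSIX) stands
! in for fftw_export_wisdom_to_string, which also returns malloc'ed memory.
program c_string_demo
  use, intrinsic :: iso_c_binding
  implicit none

  interface
    ! size_t strlen(const char *s);
    integer(C_SIZE_T) function strlen(s) bind(C, name='strlen')
      import :: C_SIZE_T, C_PTR
      type(C_PTR), value :: s
    end function strlen
    ! char *strdup(const char *s);
    type(C_PTR) function strdup(s) bind(C, name='strdup')
      import :: C_PTR, C_CHAR
      character(C_CHAR), dimension(*), intent(in) :: s
    end function strdup
    ! void free(void *ptr);
    subroutine free(p) bind(C, name='free')
      import :: C_PTR
      type(C_PTR), value :: p
    end subroutine free
  end interface

  character(C_CHAR), pointer :: s(:)
  character(len=:, kind=C_CHAR), allocatable :: fstr
  integer(C_SIZE_T) :: slen, i
  type(C_PTR) :: p

  ! Stand-in for: p = fftw_export_wisdom_to_string()
  p = strdup(C_CHAR_'example string' // C_NULL_CHAR)
  if (.not. c_associated(p)) stop 'error exporting wisdom'
  slen = strlen(p)
  call c_f_pointer(p, s, [slen + 1])   ! +1 includes the null terminator

  ! Copy into a Fortran string (dropping the terminator) while p is valid
  allocate(character(len=slen, kind=C_CHAR) :: fstr)
  do i = 1, slen
    fstr(i:i) = s(i)
  end do
  call free(p)                         ! s now dangles; fstr does not

  print '(a)', fstr
end program c_string_demo
```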
Chris@19 5236
Chris@19 5237 
Chris@19 5238 File: fftw3.info, Node: Wisdom Generic Export/Import from Fortran, Prev: Wisdom String Export/Import from Fortran, Up: Accessing the wisdom API from Fortran
Chris@19 5239
Chris@19 5240 7.6.3 Wisdom Generic Export/Import from Fortran
Chris@19 5241 -----------------------------------------------
Chris@19 5242
Chris@19 5243 The most generic wisdom export/import functions allow you to provide an
Chris@19 5244 arbitrary callback function to read/write one character at a time in
Chris@19 5245 any way you want. However, your callback function must be written in a
Chris@19 5246 special way, using the `bind(C)' attribute to be passed to a C
Chris@19 5247 interface.
Chris@19 5248
Chris@19 5249 In particular, to call the generic wisdom export function
Chris@19 5250 `fftw_export_wisdom', you would write a callback subroutine of the form:
Chris@19 5251
Chris@19 5252 subroutine my_write_char(c, p) bind(C)
Chris@19 5253 use, intrinsic :: iso_c_binding
Chris@19 5254 character(C_CHAR), value :: c
Chris@19 5255 type(C_PTR), value :: p
Chris@19 5256 _...write c..._
Chris@19 5257 end subroutine my_write_char
Chris@19 5258
Chris@19 5259 Given such a subroutine (along with the corresponding interface
Chris@19 5260 definition), you could then export wisdom using:
Chris@19 5261
Chris@19 5262 call fftw_export_wisdom(c_funloc(my_write_char), p)
Chris@19 5263
Chris@19 5264 The standard `c_funloc' intrinsic converts a Fortran `bind(C)'
Chris@19 5265 subroutine into a C function pointer. The parameter `p' is a
Chris@19 5266 `type(C_PTR)' to any arbitrary data that you want to pass to
Chris@19 5267 `my_write_char' (or `C_NULL_PTR' if none). (Note that you can get a C
Chris@19 5268 pointer to Fortran data using the intrinsic `c_loc', and convert it
Chris@19 5269 back to a Fortran pointer in `my_write_char' using `c_f_pointer'.)
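The following sketch illustrates the `c_loc'/`c_f_pointer' round trip and does not call FFTW. It defines a `my_write_char' that appends each character to a Fortran buffer reached through `p'. The main program plays the role of `fftw_export_wisdom': it calls the callback through a procedure pointer obtained with the Fortran 2003 intrinsic `c_f_procpointer'. The `char_buffer' type, its fixed 256-character capacity, and the `write_char_fn' interface name are hypothetical.

```fortran
! Sketch: a bind(C) write_char callback that reaches Fortran data through
! its void* argument. The main program stands in for fftw_export_wisdom.
module wisdom_buffer
  use, intrinsic :: iso_c_binding
  implicit none

  type :: char_buffer                        ! hypothetical sink for wisdom
    character(len=256, kind=C_CHAR) :: text = C_CHAR_''
    integer :: n = 0
  end type char_buffer

  abstract interface                         ! C: void (*)(char, void *)
    subroutine write_char_fn(c, p) bind(C)
      import :: C_CHAR, C_PTR
      character(C_CHAR), value :: c
      type(C_PTR), value :: p
    end subroutine write_char_fn
  end interface

contains

  subroutine my_write_char(c, p) bind(C)
    character(C_CHAR), value :: c
    type(C_PTR), value :: p
    type(char_buffer), pointer :: buf
    call c_f_pointer(p, buf)                 ! recover the Fortran object
    if (buf%n < len(buf%text)) then          ! silently drop overflow
      buf%n = buf%n + 1
      buf%text(buf%n:buf%n) = c
    end if
  end subroutine my_write_char

end module wisdom_buffer

program callback_demo
  use, intrinsic :: iso_c_binding
  use wisdom_buffer
  implicit none
  type(char_buffer), target :: buf
  procedure(write_char_fn), pointer :: fp
  character(len=*, kind=C_CHAR), parameter :: msg = C_CHAR_'wisdom'
  integer :: i

  ! A C caller only sees a function pointer plus an opaque data pointer
  call c_f_procpointer(c_funloc(my_write_char), fp)
  do i = 1, len(msg)
    call fp(msg(i:i), c_loc(buf))
  end do
  print '(a)', buf%text(1:buf%n)
end program callback_demo
```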
Chris@19 5270
Chris@19 5271 Similarly, to use the generic `fftw_import_wisdom', you would define
Chris@19 5272 a callback function of the form:
Chris@19 5273
Chris@19 5274 integer(C_INT) function my_read_char(p) bind(C)
Chris@19 5275 use, intrinsic :: iso_c_binding
Chris@19 5276 type(C_PTR), value :: p
Chris@19 5277 character :: c
Chris@19 5278 _...read a character c..._
Chris@19 5279 my_read_char = ichar(c, C_INT)
Chris@19 5280 end function my_read_char
Chris@19 5281
Chris@19 5282 ....
Chris@19 5283
Chris@19 5284 integer(C_INT) :: ret
Chris@19 5285 ret = fftw_import_wisdom(c_funloc(my_read_char), p)
Chris@19 5286 if (ret .eq. 0) stop 'error importing wisdom'
Chris@19 5287
Chris@19 5288 Your function should return `-1' when the end of the input is reached.
Chris@19 5289 Again, `p' is an arbitrary `type(C_PTR)' that is passed through to your
Chris@19 5290 function. `fftw_import_wisdom' returns `0' if an error occurred and
Chris@19 5291 nonzero otherwise.
Chris@19 5292
Chris@19 5293 
Chris@19 5294 File: fftw3.info, Node: Defining an FFTW module, Prev: Accessing the wisdom API from Fortran, Up: Calling FFTW from Modern Fortran
Chris@19 5295
Chris@19 5296 7.7 Defining an FFTW module
Chris@19 5297 ===========================
Chris@19 5298
Chris@19 5299 Rather than using the `include' statement to include the `fftw3.f03'
Chris@19 5300 interface file in any subroutine where you want to use FFTW, you might
Chris@19 5301 prefer to define an FFTW Fortran module. FFTW does not install itself
Chris@19 5302 as a module, primarily because `fftw3.f03' can be shared between
Chris@19 5303 different Fortran compilers while modules (in general) cannot.
Chris@19 5304 However, it is trivial to define your own FFTW module if you want.
Chris@19 5305 Just create a file containing:
Chris@19 5306
Chris@19 5307 module FFTW3
Chris@19 5308 use, intrinsic :: iso_c_binding
Chris@19 5309 include 'fftw3.f03'
Chris@19 5310 end module
Chris@19 5311
Chris@19 5312 Compile this file into a module as usual for your compiler (e.g. with
Chris@19 5313 `gfortran -c' you will get a file `fftw3.mod'). Now, instead of
Chris@19 5314 `include 'fftw3.f03'', whenever you want to use FFTW routines you can
Chris@19 5315 just do:
Chris@19 5316
Chris@19 5317 use FFTW3
Chris@19 5318
Chris@19 5319 as usual for Fortran modules. (You still need to link to the FFTW
Chris@19 5320 library, of course.)
Chris@19 5321
Chris@19 5322 
Chris@19 5323 File: fftw3.info, Node: Calling FFTW from Legacy Fortran, Next: Upgrading from FFTW version 2, Prev: Calling FFTW from Modern Fortran, Up: Top
Chris@19 5324
Chris@19 5325 8 Calling FFTW from Legacy Fortran
Chris@19 5326 **********************************
Chris@19 5327
Chris@19 5328 This chapter describes the interface to FFTW callable by Fortran code
Chris@19 5329 in older compilers not supporting the Fortran 2003 C interoperability
Chris@19 5330 features (*note Calling FFTW from Modern Fortran::). This interface
Chris@19 5331 has the major disadvantage that it is not type-checked, so if you
Chris@19 5332 mistake the argument types or ordering then your program will not have
Chris@19 5333 any compiler errors, and will likely crash at runtime. So, greater
Chris@19 5334 care is needed. Also, technically interfacing older Fortran versions
Chris@19 5335 to C is nonstandard, but in practice we have found that the techniques
Chris@19 5336 used in this chapter have worked with all known Fortran compilers for
Chris@19 5337 many years.
Chris@19 5338
Chris@19 5339 The legacy Fortran interface differs from the C interface only in the
Chris@19 5340 prefix (`dfftw_' instead of `fftw_' in double precision) and a few
Chris@19 5341 other minor details. This Fortran interface is included in the FFTW
Chris@19 5342 libraries by default, unless a Fortran compiler isn't found on your
Chris@19 5343 system or `--disable-fortran' is included in the `configure' flags. We
Chris@19 5344 assume here that the reader is already familiar with the usage of FFTW
Chris@19 5345 in C, as described elsewhere in this manual.
Chris@19 5346
Chris@19 5347 The MPI parallel interface to FFTW is _not_ currently available to
Chris@19 5348 legacy Fortran.
Chris@19 5349
Chris@19 5350 * Menu:
Chris@19 5351
Chris@19 5352 * Fortran-interface routines::
Chris@19 5353 * FFTW Constants in Fortran::
Chris@19 5354 * FFTW Execution in Fortran::
Chris@19 5355 * Fortran Examples::
Chris@19 5356 * Wisdom of Fortran?::
Chris@19 5357
Chris@19 5358 
Chris@19 5359 File: fftw3.info, Node: Fortran-interface routines, Next: FFTW Constants in Fortran, Prev: Calling FFTW from Legacy Fortran, Up: Calling FFTW from Legacy Fortran
Chris@19 5360
Chris@19 5361 8.1 Fortran-interface routines
Chris@19 5362 ==============================
Chris@19 5363
Chris@19 5364 Nearly all of the FFTW functions have Fortran-callable equivalents.
Chris@19 5365 The name of the legacy Fortran routine is the same as that of the
Chris@19 5366 corresponding C routine, but with the `fftw_' prefix replaced by
Chris@19 5367 `dfftw_'.(1) The single and long-double precision versions use
Chris@19 5368 `sfftw_' and `lfftw_', respectively, instead of `fftwf_' and `fftwl_';
Chris@19 5369 quadruple precision (`real*16') is available on some systems as
Chris@19 5370 `fftwq_' (*note Precision::). (Note that `long double' on x86 hardware
Chris@19 5371 is usually at most 80-bit extended precision, _not_ quadruple
Chris@19 5372 precision.)
Chris@19 5373
Chris@19 5374 For the most part, all of the arguments to the functions are the
Chris@19 5375 same, with the following exceptions:
Chris@19 5376
Chris@19 5377 * `plan' variables (what would be of type `fftw_plan' in C) must be
Chris@19 5378 declared as a type that is at least as big as a pointer (address)
Chris@19 5379 on your machine. We recommend using `integer*8' everywhere, since
Chris@19 5380 this should always be big enough.
Chris@19 5381
Chris@19 5382 * Any function that returns a value (e.g. `fftw_plan_dft') is
Chris@19 5383 converted into a _subroutine_. The return value is converted into
Chris@19 5384 an additional _first_ parameter of this subroutine.(2)
Chris@19 5385
Chris@19 5386 * The Fortran routines expect multi-dimensional arrays to be in
Chris@19 5387 _column-major_ order, which is the ordinary format of Fortran
Chris@19 5388 arrays (*note Multi-dimensional Array Format::). They do this
Chris@19 5389 transparently and costlessly simply by reversing the order of the
Chris@19 5390 dimensions passed to FFTW, but this has one important consequence
Chris@19 5391 for multi-dimensional real-complex transforms, discussed below.
Chris@19 5392
Chris@19 5393 * Wisdom import and export is somewhat more tricky because one cannot
Chris@19 5394 easily pass files or strings between C and Fortran; see *note
Chris@19 5395 Wisdom of Fortran?::.
Chris@19 5396
Chris@19 5397 * Legacy Fortran cannot use the `fftw_malloc' dynamic-allocation
Chris@19 5398 routine. If you want to exploit FFTW's SIMD code (*note SIMD
Chris@19 5399 alignment and fftw_malloc::), you'll need to figure out some other
Chris@19 5400 way to ensure that your arrays are at least 16-byte aligned.
Chris@19 5401
Chris@19 5402 * Since Fortran 77 does not have data structures, the `fftw_iodim'
Chris@19 5403 structure from the guru interface (*note Guru vector and transform
Chris@19 5404 sizes::) must be split into separate arguments. In particular, any
Chris@19 5405 `fftw_iodim' array arguments in the C guru interface become three
Chris@19 5406 integer array arguments (`n', `is', and `os') in the Fortran guru
Chris@19 5407 interface, all of whose lengths should be equal to the
Chris@19 5408 corresponding `rank' argument.
Chris@19 5409
Chris@19 5410 * The guru planner interface in Fortran does _not_ do any automatic
Chris@19 5411 translation between column-major and row-major; you are responsible
Chris@19 5412 for setting the strides etcetera to correspond to your Fortran
Chris@19 5413 arrays. However, as a slight bug that we are preserving for
Chris@19 5414 backwards compatibility, the `plan_guru_r2r' in Fortran _does_
Chris@19 5415 reverse the order of its `kind' array parameter, so the `kind'
Chris@19 5416 array of that routine should be in the reverse of the order of the
Chris@19 5417 iodim arrays (see above).
Chris@19 5418
Chris@19 5419
Chris@19 5420 In general, you should take care to use Fortran data types that
Chris@19 5421 correspond to (i.e. are the same size as) the C types used by FFTW. In
Chris@19 5422 practice, this correspondence is usually straightforward (i.e.
Chris@19 5423 `integer' corresponds to `int', `real' corresponds to `float',
Chris@19 5424 etcetera). The native Fortran double/single-precision complex type
Chris@19 5425 should be compatible with `fftw_complex'/`fftwf_complex'. Such simple
Chris@19 5426 correspondences are assumed in the examples below.
Chris@19 5427
Chris@19 5428 ---------- Footnotes ----------
Chris@19 5429
Chris@19 5430 (1) Technically, Fortran 77 identifiers are not allowed to have more
Chris@19 5431 than 6 characters, nor may they contain underscores. Any compiler that
Chris@19 5432 enforces this limitation doesn't deserve to link to FFTW.
Chris@19 5433
Chris@19 5434 (2) The reason for this is that some Fortran implementations seem to
Chris@19 5435 have trouble with C function return values, and vice versa.
Chris@19 5436
Chris@19 5437 
Chris@19 5438 File: fftw3.info, Node: FFTW Constants in Fortran, Next: FFTW Execution in Fortran, Prev: Fortran-interface routines, Up: Calling FFTW from Legacy Fortran
Chris@19 5439
Chris@19 5440 8.2 FFTW Constants in Fortran
Chris@19 5441 =============================
Chris@19 5442
Chris@19 5443 When creating plans in FFTW, a number of constants are used to specify
Chris@19 5444 options, such as `FFTW_MEASURE' or `FFTW_ESTIMATE'. The same constants
Chris@19 5445 must be used with the wrapper routines, but of course the C header
Chris@19 5446 files where the constants are defined can't be incorporated directly
Chris@19 5447 into Fortran code.
Chris@19 5448
Chris@19 5449 Instead, we have placed Fortran equivalents of the FFTW constant
Chris@19 5450 definitions in the file `fftw3.f', which can be found in the same
Chris@19 5451 directory as `fftw3.h'. If your Fortran compiler supports a
Chris@19 5452 preprocessor of some sort, you should be able to `include' or
Chris@19 5453 `#include' this file; otherwise, you can paste it directly into your
Chris@19 5454 code.
Chris@19 5455
Chris@19 5456 In C, you combine different flags (like `FFTW_PRESERVE_INPUT' and
Chris@19 5457 `FFTW_MEASURE') using the ``|'' operator; in Fortran you should just
Chris@19 5458 use ``+''. (Take care not to add in the same flag more than once,
Chris@19 5459 though. Alternatively, you can use the `ior' intrinsic function
Chris@19 5460 standardized in Fortran 95.)
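The caveat about adding a flag twice is easy to see in a small sketch. The two values below are the ones `fftw3.h' assigns to `FFTW_PRESERVE_INPUT' and `FFTW_PATIENT' in FFTW 3.x (1 << 4 and 1 << 5). They are redefined here only to keep the example standalone; a real program would take them from `fftw3.f'.

```fortran
! Sketch: combining planner flags with + versus ior.
! Flag values as defined in fftw3.h (FFTW 3.x); normally taken from fftw3.f.
program flag_demo
  implicit none
  integer FFTW_PRESERVE_INPUT, FFTW_PATIENT
  parameter (FFTW_PRESERVE_INPUT = 16, FFTW_PATIENT = 32)
  integer added, ored

  ! The same flag accidentally combined twice
  added = FFTW_PRESERVE_INPUT + FFTW_PRESERVE_INPUT
  ored = ior(FFTW_PRESERVE_INPUT, FFTW_PRESERVE_INPUT)

  ! 16 + 16 = 32: silently becomes FFTW_PATIENT, without input preservation
  print *, 'with + :', added, added .eq. FFTW_PATIENT
  ! ior is idempotent, so the result stays 16
  print *, 'with ior:', ored, ored .eq. FFTW_PRESERVE_INPUT
end program flag_demo
```

Adding a flag to itself thus produces a different, unrelated flag, whereas `ior' gives the same result however many times a flag is repeated.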
Chris@19 5461
Chris@19 5462 
Chris@19 5463 File: fftw3.info, Node: FFTW Execution in Fortran, Next: Fortran Examples, Prev: FFTW Constants in Fortran, Up: Calling FFTW from Legacy Fortran
Chris@19 5464
Chris@19 5465 8.3 FFTW Execution in Fortran
Chris@19 5466 =============================
Chris@19 5467
Chris@19 5468 In C, in order to use a plan, one normally calls `fftw_execute', which
Chris@19 5469 executes the plan to perform the transform on the input/output arrays
Chris@19 5470 passed when the plan was created (*note Using Plans::). The
Chris@19 5471 corresponding subroutine call in legacy Fortran is:
Chris@19 5472 call dfftw_execute(plan)
Chris@19 5473
Chris@19 5474 However, we have had reports that this causes problems with some
Chris@19 5475 recent optimizing Fortran compilers. The problem is that, because the
Chris@19 5476 input/output arrays are not passed as explicit arguments to
Chris@19 5477 `dfftw_execute', the semantics of Fortran (unlike C) allow the compiler
Chris@19 5478 to assume that the input/output arrays are not changed by
Chris@19 5479 `dfftw_execute'. As a consequence, certain compilers end up optimizing
Chris@19 5480 out or repositioning the call to `dfftw_execute', assuming incorrectly
Chris@19 5481 that it does nothing.
Chris@19 5482
Chris@19 5483 There are various workarounds to this, but the safest and simplest
Chris@19 5484 thing is to not use `dfftw_execute' in Fortran. Instead, use the
Chris@19 5485 functions described in *note New-array Execute Functions::, which take
Chris@19 5486 the input/output arrays as explicit arguments. For example, if the
Chris@19 5487 plan is for a complex-data DFT and was created for the arrays `in' and
Chris@19 5488 `out', you would do:
Chris@19 5489 call dfftw_execute_dft(plan, in, out)
Chris@19 5490
Chris@19 5491 There are a few things to be careful of, however:
Chris@19 5492
Chris@19 5493 * You must use the correct type of execute function, matching the way
Chris@19 5494 the plan was created. Complex DFT plans should use
Chris@19 5495 `dfftw_execute_dft', real-input (r2c) DFT plans should use
Chris@19 5496 `dfftw_execute_dft_r2c', and real-output (c2r) DFT plans should
Chris@19 5497 use `dfftw_execute_dft_c2r'. The various r2r plans should use
Chris@19 5498 `dfftw_execute_r2r'.
Chris@19 5499
Chris@19 5500 * You should normally pass the same input/output arrays that were
Chris@19 5501 used when creating the plan. This is always safe.
Chris@19 5502
Chris@19 5503 * _If_ you pass _different_ input/output arrays compared to those
Chris@19 5504 used when creating the plan, you must abide by all the
Chris@19 5505 restrictions of the new-array execute functions (*note New-array
Chris@19 5506 Execute Functions::). The most difficult of these, in Fortran, is
Chris@19 5507 the requirement that the new arrays have the same alignment as the
Chris@19 5508 original arrays, because there seems to be no way in legacy
Chris@19 5509 Fortran to obtain guaranteed-aligned arrays (analogous to
Chris@19 5510 `fftw_malloc' in C). You can, of course, use the `FFTW_UNALIGNED'
Chris@19 5511 flag when creating the plan, in which case the plan does not
Chris@19 5512 depend on the alignment, but this may sacrifice substantial
Chris@19 5513 performance on architectures (like x86) with SIMD instructions
Chris@19 5514 (*note SIMD alignment and fftw_malloc::).
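In C terms, two arrays have "the same alignment" when their addresses agree modulo the SIMD vector size. The following is a minimal sketch of that check, not part of FFTW: it assumes a 16-byte boundary (as for SSE/SSE2; AVX would need 32), and the helper names `alignment_offset' and `same_simd_alignment' are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed SIMD alignment in bytes (16 for SSE/SSE2; AVX would need 32).
   This is an illustration only, not an FFTW constant. */
#define ASSUMED_SIMD_ALIGNMENT 16u

/* Offset of p from the previous ASSUMED_SIMD_ALIGNMENT boundary. */
static size_t alignment_offset(const void *p)
{
    return (size_t)((uintptr_t)p % ASSUMED_SIMD_ALIGNMENT);
}

/* Nonzero if `other' sits at the same offset from a SIMD boundary as
   `planned', i.e. a plan made for `planned' would not be defeated by
   alignment when executed on `other' (other restrictions still apply). */
static int same_simd_alignment(const double *planned, const double *other)
{
    return alignment_offset(planned) == alignment_offset(other);
}
```

For example, an array that starts two `double's (16 bytes) past a planned array passes the check, while one that starts a single `double' (8 bytes) later does not.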
Chris@19 5515
Chris@19 5516
Chris@19 5517 
Chris@19 5518 File: fftw3.info, Node: Fortran Examples, Next: Wisdom of Fortran?, Prev: FFTW Execution in Fortran, Up: Calling FFTW from Legacy Fortran
Chris@19 5519
Chris@19 5520 8.4 Fortran Examples
Chris@19 5521 ====================
Chris@19 5522
Chris@19 5523 In C, you might have something like the following to transform a
Chris@19 5524 one-dimensional complex array:
Chris@19 5525
Chris@19 5526 fftw_complex in[N], out[N];
Chris@19 5527 fftw_plan plan;
Chris@19 5528
Chris@19 5529 plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
Chris@19 5530 fftw_execute(plan);
Chris@19 5531 fftw_destroy_plan(plan);
Chris@19 5532
Chris@19 5533 In Fortran, you would use the following to accomplish the same thing:
Chris@19 5534
Chris@19 5535 double complex in, out
Chris@19 5536 dimension in(N), out(N)
Chris@19 5537 integer*8 plan
Chris@19 5538
Chris@19 5539 call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
Chris@19 5540 call dfftw_execute_dft(plan, in, out)
Chris@19 5541 call dfftw_destroy_plan(plan)
Chris@19 5542
Chris@19 5543 Notice how all routines are called as Fortran subroutines, and the
Chris@19 5544 plan is returned via the first argument to `dfftw_plan_dft_1d'. Notice
Chris@19 5545 also that we changed `fftw_execute' to `dfftw_execute_dft' (*note FFTW
Chris@19 5546 Execution in Fortran::). To do the same thing, but using 8 threads in
Chris@19 5547 parallel (*note Multi-threaded FFTW::), you would simply prefix these
Chris@19 5548 calls with:
Chris@19 5549
Chris@19 5550 integer iret
Chris@19 5551 call dfftw_init_threads(iret)
Chris@19 5552 call dfftw_plan_with_nthreads(8)
Chris@19 5553
Chris@19 5554 (You might want to check the value of `iret': if it is zero, it
Chris@19 5555 indicates an unlikely error during thread initialization.)
Chris@19 5556
Chris@19 5557 To transform a three-dimensional array in-place with C, you might do:
Chris@19 5558
Chris@19 5559 fftw_complex arr[L][M][N];
Chris@19 5560 fftw_plan plan;
Chris@19 5561
Chris@19 5562 plan = fftw_plan_dft_3d(L,M,N, arr,arr,
Chris@19 5563 FFTW_FORWARD, FFTW_ESTIMATE);
Chris@19 5564 fftw_execute(plan);
Chris@19 5565 fftw_destroy_plan(plan);
Chris@19 5566
Chris@19 5567 In Fortran, you would use this instead:
Chris@19 5568
Chris@19 5569 double complex arr
Chris@19 5570 dimension arr(L,M,N)
Chris@19 5571 integer*8 plan
Chris@19 5572
Chris@19 5573 call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
Chris@19 5574 & FFTW_FORWARD, FFTW_ESTIMATE)
Chris@19 5575 call dfftw_execute_dft(plan, arr, arr)
Chris@19 5576 call dfftw_destroy_plan(plan)
Chris@19 5577
Chris@19 5578 Note that we pass the array dimensions in the "natural" order in
Chris@19 5579 both C and Fortran.
Chris@19 5580
Chris@19 5581 To transform a one-dimensional real array in Fortran, you might do:
Chris@19 5582
Chris@19 5583 double precision in
Chris@19 5584 dimension in(N)
Chris@19 5585 double complex out
Chris@19 5586 dimension out(N/2 + 1)
Chris@19 5587 integer*8 plan
Chris@19 5588
Chris@19 5589 call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
Chris@19 5590 call dfftw_execute_dft_r2c(plan, in, out)
Chris@19 5591 call dfftw_destroy_plan(plan)
Chris@19 5592
Chris@19 5593 To transform a two-dimensional real array, out of place, you might
Chris@19 5594 use the following:
Chris@19 5595
Chris@19 5596 double precision in
Chris@19 5597 dimension in(M,N)
Chris@19 5598 double complex out
Chris@19 5599 dimension out(M/2 + 1, N)
Chris@19 5600 integer*8 plan
Chris@19 5601
Chris@19 5602 call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
Chris@19 5603 call dfftw_execute_dft_r2c(plan, in, out)
Chris@19 5604 call dfftw_destroy_plan(plan)
Chris@19 5605
Chris@19 5606 *Important:* Notice that it is the _first_ dimension of the complex
Chris@19 5607 output array that is cut in half in Fortran, rather than the last
Chris@19 5608 dimension as in C. This is a consequence of the interface routines
Chris@19 5609 reversing the order of the array dimensions passed to FFTW so that the
Chris@19 5610 Fortran program can use its ordinary column-major order.
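To see why, note that a column-major Fortran array `a(M,N)' has exactly the memory layout of a row-major C array `a[N][M]'. The r2c transform stores n/2 + 1 complex values along the last dimension in C order, which is the first dimension as seen from Fortran. The sketch below mirrors that bookkeeping in C; the helper names are hypothetical, and `MAX_RANK' is an arbitrary limit chosen for the illustration.

```c
#include <stddef.h>

#define MAX_RANK 8  /* arbitrary limit for this illustration */

/* Offset of Fortran element a(i, j) (1-based, i, j >= 1) in a column-major
   array whose leading dimension is m. */
static size_t fortran_offset(size_t i, size_t j, size_t m)
{
    return (i - 1) + (j - 1) * m;
}

/* Offset of C element a[r][c] (0-based) in a row-major array with ncols columns. */
static size_t c_offset(size_t r, size_t c, size_t ncols)
{
    return r * ncols + c;
}

/* Given real-array dimensions as declared in Fortran, compute the r2c output
   dimensions as they must be declared in Fortran (1 <= rank <= MAX_RANK). */
static void fortran_r2c_output_dims(int rank, const int *fdims, int *out_fdims)
{
    int cdims[MAX_RANK];
    /* Step 1: the Fortran wrappers hand the dimensions to FFTW reversed. */
    for (int d = 0; d < rank; ++d)
        cdims[d] = fdims[rank - 1 - d];
    /* Step 2: in C order, the last dimension is cut to n/2 + 1. */
    cdims[rank - 1] = cdims[rank - 1] / 2 + 1;
    /* Step 3: reverse again to see the shape from the Fortran side. */
    for (int d = 0; d < rank; ++d)
        out_fdims[d] = cdims[rank - 1 - d];
}
```

For real dimensions (6, 5) this yields complex dimensions (4, 5), matching `out(M/2 + 1, N)' in the example above.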
Chris@19 5611
Chris@19 5612 
Chris@19 5613 File: fftw3.info, Node: Wisdom of Fortran?, Prev: Fortran Examples, Up: Calling FFTW from Legacy Fortran
Chris@19 5614
Chris@19 5615 8.5 Wisdom of Fortran?
Chris@19 5616 ======================
Chris@19 5617
Chris@19 5618 In this section, we discuss how one can import/export FFTW wisdom
Chris@19 5619 (saved plans) to/from a Fortran program; we assume that the reader is
Chris@19 5620 already familiar with wisdom, as described in *note Words of
Chris@19 5621 Wisdom-Saving Plans::.
Chris@19 5622
Chris@19 5623 The basic problem is that it is difficult to (portably) pass files and
Chris@19 5624 strings between Fortran and C, so we cannot provide a direct Fortran
Chris@19 5625 equivalent to the `fftw_export_wisdom_to_file', etcetera, functions.
Chris@19 5626 Fortran interfaces _are_ provided for the functions that do not take
Chris@19 5627 file/string arguments, however: `dfftw_import_system_wisdom',
Chris@19 5628 `dfftw_import_wisdom', `dfftw_export_wisdom', and `dfftw_forget_wisdom'.
Chris@19 5629
Chris@19 5630 So, for example, to import the system-wide wisdom, you would do:
Chris@19 5631
Chris@19 5632 integer isuccess
Chris@19 5633 call dfftw_import_system_wisdom(isuccess)
Chris@19 5634
Chris@19 5635 As usual, the C return value is turned into a first parameter;
Chris@19 5636 `isuccess' is non-zero on success and zero on failure (e.g. if there is
Chris@19 5637 no system wisdom installed).
Chris@19 5638
Chris@19 5639 If you want to import/export wisdom from/to an arbitrary file or
Chris@19 5640 elsewhere, you can employ the generic `dfftw_import_wisdom' and
Chris@19 5641 `dfftw_export_wisdom' functions, for which you must supply a subroutine
Chris@19 5642 to read/write one character at a time. The FFTW package contains an
Chris@19 5643 example file `doc/f77_wisdom.f' demonstrating how to implement
Chris@19 5644 `import_wisdom_from_file' and `export_wisdom_to_file' subroutines in
Chris@19 5645 this way. (These routines cannot be compiled into the FFTW library
Chris@19 5646 itself, lest all FFTW-using programs be required to link with the
Chris@19 5647 Fortran I/O library.)
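The one-character-at-a-time protocol is the same one used by the C functions `fftw_export_wisdom' (which calls a writer taking a `char' and a `void *' data pointer) and `fftw_import_wisdom' (which calls a reader returning the next character, or `EOF' at the end). As a minimal sketch of that protocol, here is a pair of C callbacks that keep the wisdom text in memory; the `char_buffer' type and the callback names are hypothetical, and the fixed 4096-byte capacity is an arbitrary assumption.

```c
#include <stdio.h>  /* EOF, size_t */

/* Hypothetical in-memory sink/source for wisdom text. */
struct char_buffer {
    char data[4096];
    size_t len;   /* characters stored so far */
    size_t pos;   /* next character to hand back */
};

/* Writer callback with the shape fftw_export_wisdom expects.
   Characters beyond the buffer capacity are silently dropped. */
static void buffer_write_char(char c, void *p)
{
    struct char_buffer *b = p;
    if (b->len < sizeof b->data)
        b->data[b->len++] = c;
}

/* Reader callback with the shape fftw_import_wisdom expects;
   returns EOF once the stored text is exhausted. */
static int buffer_read_char(void *p)
{
    struct char_buffer *b = p;
    if (b->pos >= b->len)
        return EOF;
    return (unsigned char)b->data[b->pos++];
}
```

With FFTW linked in, the C calls would be `fftw_export_wisdom(buffer_write_char, &buf)' and later `fftw_import_wisdom(buffer_read_char, &buf)'; from Fortran, the same roles are played by Fortran subroutines, as in `doc/f77_wisdom.f'.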
Chris@19 5648
Chris@19 5649 
Chris@19 5650 File: fftw3.info, Node: Upgrading from FFTW version 2, Next: Installation and Customization, Prev: Calling FFTW from Legacy Fortran, Up: Top
Chris@19 5651
Chris@19 5652 9 Upgrading from FFTW version 2
Chris@19 5653 *******************************
Chris@19 5654
Chris@19 5655 In this chapter, we outline the process for updating codes designed for
Chris@19 5656 the older FFTW 2 interface to work with FFTW 3. The interface for FFTW
Chris@19 5657 3 is not backwards-compatible with the interface for FFTW 2 and earlier
Chris@19 5658 versions; codes written to use those versions will fail to link with
Chris@19 5659 FFTW 3. Nor is it possible to write "compatibility wrappers" to bridge
Chris@19 5660 the gap (at least not efficiently), because FFTW 3 has different
Chris@19 5661 semantics from previous versions. However, upgrading should be a
Chris@19 5662 straightforward process because the data formats are identical and the
Chris@19 5663 overall style of planning/execution is essentially the same.
Chris@19 5664
Chris@19 5665 Unlike FFTW 2, there are no separate header files for real and
Chris@19 5666 complex transforms (or even for different precisions) in FFTW 3; all
Chris@19 5667 interfaces are defined in the `<fftw3.h>' header file.
Chris@19 5668
Chris@19 5669 Numeric Types
Chris@19 5670 =============
Chris@19 5671
Chris@19 5672 The main difference in data types is that `fftw_complex' in FFTW 2 was
Chris@19 5673 defined as a `struct' with macros `c_re' and `c_im' for accessing the
Chris@19 5674 real/imaginary parts. (This is binary-compatible with FFTW 3 on any
Chris@19 5675 machine except perhaps for some older Crays in single precision.) The
Chris@19 5676 equivalent macros for FFTW 3 are:
Chris@19 5677
Chris@19 5678 #define c_re(c) ((c)[0])
Chris@19 5679 #define c_im(c) ((c)[1])
Chris@19 5680
Chris@19 5681 This does not work if you are using the C99 complex type, however,
Chris@19 5682 unless you insert a `double*' typecast into the above macros (*note
Chris@19 5683 Complex numbers::).
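A minimal sketch of the macros in use. It assumes FFTW 3's default definition `typedef double fftw_complex[2]' (reproduced so the fragment compiles without `<fftw3.h>'; omit it if you include the header), and it shows the `double*' cast that makes the same access work for a C99 `double complex'. The `c99_re'/`c99_im' macros and the `fill' helper are hypothetical names for illustration.

```c
#include <complex.h>

/* Stand-in for FFTW 3's default definition (used when <complex.h> is not
   included before <fftw3.h>); omit this line if you include <fftw3.h>. */
typedef double fftw_complex[2];

/* FFTW 2-style accessors for FFTW 3's two-double layout. */
#define c_re(c) ((c)[0])
#define c_im(c) ((c)[1])

/* With C99 complex numbers, cast to double* first; C99 guarantees that a
   double complex is laid out like an array of two doubles. */
#define c99_re(z) (((double *) &(z))[0])
#define c99_im(z) (((double *) &(z))[1])

/* Set each of the n elements of a to re + i*im using the FFTW 2-style macros. */
static void fill(fftw_complex *a, int n, double re, double im)
{
    for (int k = 0; k < n; ++k) {
        c_re(a[k]) = re;
        c_im(a[k]) = im;
    }
}
```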
Chris@19 5684
Chris@19 5685 Also, FFTW 2 had an `fftw_real' typedef that was an alias for
Chris@19 5686 `double' (in double precision). In FFTW 3 you should just use `double'
Chris@19 5687 (or whatever precision you are employing).
Chris@19 5688
Chris@19 5689 Plans
Chris@19 5690 =====
Chris@19 5691
Chris@19 5692 The major difference between FFTW 2 and FFTW 3 is in the
Chris@19 5693 planning/execution division of labor. In FFTW 2, plans were found for a
Chris@19 5694 given transform size and type, and then could be applied to _any_
Chris@19 5695 arrays and for _any_ multiplicity/stride parameters. In FFTW 3, you
Chris@19 5696 specify the particular arrays, stride parameters, etcetera when
Chris@19 5697 creating the plan, and the plan is then executed for _those_ arrays
Chris@19 5698 (unless the guru interface is used) and _those_ parameters _only_.
Chris@19 5699 (FFTW 2 had "specific planner" routines that planned for a particular
Chris@19 5700 array and stride, but the plan could still be used for other arrays and
Chris@19 5701 strides.) That is, much of the information that was formerly specified
Chris@19 5702 at execution time is now specified at planning time.
Chris@19 5703
Chris@19 5704 Like FFTW 2's specific planner routines, the FFTW 3 planner
Chris@19 5705 overwrites the input/output arrays unless you use `FFTW_ESTIMATE'.
Chris@19 5706
Chris@19 5707 FFTW 2 had separate data types `fftw_plan', `fftwnd_plan',
Chris@19 5708 `rfftw_plan', and `rfftwnd_plan' for complex and real one- and
Chris@19 5709 multi-dimensional transforms, and each type had its own `destroy'
Chris@19 5710 function. In FFTW 3, all plans are of type `fftw_plan' and all are
Chris@19 5711 destroyed by `fftw_destroy_plan(plan)'.
Chris@19 5712
Chris@19 5713 Where you formerly used `fftw_create_plan' and `fftw_one' to plan
Chris@19 5714 and compute a single 1d transform, you would now use `fftw_plan_dft_1d'
Chris@19 5715 to plan the transform. If you used the generic `fftw' function to
Chris@19 5716 execute the transform with multiplicity (`howmany') and stride
Chris@19 5717 parameters, you would now use the advanced interface
Chris@19 5718 `fftw_plan_many_dft' to specify those parameters. The plans are now
Chris@19 5719 executed with `fftw_execute(plan)', which takes all of its parameters
Chris@19 5720 (including the input/output arrays) from the plan.
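For a one-dimensional transform, the `howmany'/stride/distance parameters mean the same thing in FFTW 3's advanced interface as in FFTW 2's `fftw': element j of transform k lives at `in[k*idist + j*istride]' (likewise for the output). The sketch below encodes that rule and checks whether a batch layout touches each slot exactly once; `many_index' and `layout_is_dense' are hypothetical helpers, not FFTW functions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Index of element j of the k-th transform in a batch with the given
   stride and distance (one-dimensional case). */
static size_t many_index(size_t k, size_t j, size_t stride, size_t dist)
{
    return k * dist + j * stride;
}

/* Returns 1 if `howmany' transforms of length n touch each of the
   n*howmany slots exactly once (no overlap, no gaps), else 0. */
static int layout_is_dense(size_t n, size_t howmany, size_t stride, size_t dist)
{
    size_t total = n * howmany;
    unsigned char *seen = calloc(total ? total : 1, 1);
    int ok = seen != NULL;
    for (size_t k = 0; ok && k < howmany; ++k)
        for (size_t j = 0; ok && j < n; ++j) {
            size_t idx = many_index(k, j, stride, dist);
            if (idx >= total || seen[idx])
                ok = 0;      /* out of range or overlapping */
            else
                seen[idx] = 1;
        }
    free(seen);
    return ok;
}
```

Both contiguous batches (stride 1, distance n) and interleaved batches (stride `howmany', distance 1) are dense; stride 1 with a distance smaller than n makes consecutive transforms overlap.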
Chris@19 5721
Chris@19 5722 In-place transforms no longer interpret their output argument as
Chris@19 5723 scratch space, nor is there an `FFTW_IN_PLACE' flag. You simply pass
Chris@19 5724 the same pointer for both the input and output arguments. (Previously,
Chris@19 5725 the output `ostride' and `odist' parameters were ignored for in-place
Chris@19 5726 transforms; now, if they are specified via the advanced interface, they
Chris@19 5727 are significant even in the in-place case, although they should
Chris@19 5728 normally equal the corresponding input parameters.)
Chris@19 5729
Chris@19 5730 The `FFTW_ESTIMATE' and `FFTW_MEASURE' flags have the same meaning
Chris@19 5731 as before, although the planning time will differ. You may also
Chris@19 5732 consider using `FFTW_PATIENT', which is like `FFTW_MEASURE' except that
Chris@19 5733 it takes more time in order to consider a wider variety of algorithms.
Chris@19 5734
Chris@19 5735 For multi-dimensional complex DFTs, instead of `fftwnd_create_plan'
Chris@19 5736 (or `fftw2d_create_plan' or `fftw3d_create_plan'), followed by
Chris@19 5737 `fftwnd_one', you would use `fftw_plan_dft' (or `fftw_plan_dft_2d' or
Chris@19 5738 `fftw_plan_dft_3d'), followed by `fftw_execute'. If you used `fftwnd'
Chris@19 5739 to specify strides etcetera, you would instead specify these via
Chris@19 5740 `fftw_plan_many_dft'.
Chris@19 5741
Chris@19 5742 The analogues to `rfftw_create_plan' and `rfftw_one' with
Chris@19 5743 `FFTW_REAL_TO_COMPLEX' or `FFTW_COMPLEX_TO_REAL' directions are
Chris@19 5744 `fftw_plan_r2r_1d' with kind `FFTW_R2HC' or `FFTW_HC2R', followed by
Chris@19 5745 `fftw_execute'. The stride etcetera arguments of `rfftw' are now in
Chris@19 5746 `fftw_plan_many_r2r'.
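The output of an `FFTW_R2HC' r2r transform of size n is stored in FFTW's "halfcomplex" order: r0, r1, ..., r(n/2), followed by the imaginary parts i((n+1)/2-1), ..., i2, i1 in reverse. If ported code needs ordinary complex values, a minimal sketch of the unpacking looks like this; `unpack_halfcomplex' is a hypothetical helper, not an FFTW routine.

```c
/* Unpack a halfcomplex array of length n,
     hc = r0, r1, ..., r_{n/2}, i_{(n+1)/2-1}, ..., i2, i1,
   into re[k], im[k] for k = 0 .. n/2 (n/2 + 1 values each). */
static void unpack_halfcomplex(int n, const double *hc, double *re, double *im)
{
    for (int k = 0; k <= n / 2; ++k) {
        re[k] = hc[k];
        /* Im(X_k) is stored at hc[n-k], except for k = 0 and (for even n)
           k = n/2, whose imaginary parts are zero for real input. */
        im[k] = (k > 0 && k < n - k) ? hc[n - k] : 0.0;
    }
}
```

For n = 4, the array {10, 2, 3, 4} unpacks to 10, 2+4i, and 3; for odd n every output except the first has a stored imaginary part.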
Chris@19 5747
Chris@19 5748 Instead of `rfftwnd_create_plan' (or `rfftw2d_create_plan' or
Chris@19 5749 `rfftw3d_create_plan') followed by `rfftwnd_one_real_to_complex' or
Chris@19 5750 `rfftwnd_one_complex_to_real', you now use `fftw_plan_dft_r2c' (or
Chris@19 5751 `fftw_plan_dft_r2c_2d' or `fftw_plan_dft_r2c_3d') or
Chris@19 5752 `fftw_plan_dft_c2r' (or `fftw_plan_dft_c2r_2d' or
Chris@19 5753 `fftw_plan_dft_c2r_3d'), respectively, followed by `fftw_execute'. As
Chris@19 5754 usual, the strides etcetera of `rfftwnd_real_to_complex' or
Chris@19 5755 `rfftwnd_complex_to_real' are now specified in the advanced planner
Chris@19 5756 routines, `fftw_plan_many_dft_r2c' or `fftw_plan_many_dft_c2r'.
Chris@19 5757
Chris@19 5758 Wisdom
Chris@19 5759 ======
Chris@19 5760
Chris@19 5761 In FFTW 2, you had to supply the `FFTW_USE_WISDOM' flag in order to use
Chris@19 5762 wisdom; in FFTW 3, wisdom is always used. (You could simulate the FFTW
Chris@19 5763 2 wisdom-less behavior by calling `fftw_forget_wisdom' after every
Chris@19 5764 planner call.)
Chris@19 5765
Chris@19 5766 The FFTW 3 wisdom import/export routines are almost the same as
Chris@19 5767 before (although the storage format is entirely different). There is
Chris@19 5768 one significant difference, however. In FFTW 2, the import routines
Chris@19 5769 would never read past the end of the wisdom, so you could store extra
Chris@19 5770 data beyond the wisdom in the same file, for example. In FFTW 3, the
Chris@19 5771 file-import routine may read up to a few hundred bytes past the end of
Chris@19 5772 the wisdom, so you cannot store other data just beyond it.(1)
Chris@19 5773
Chris@19 5774 Wisdom has been enhanced by additional humility in FFTW 3: whereas
Chris@19 5775 FFTW 2 would re-use wisdom for a given transform size regardless of the
Chris@19 5776 stride etc., in FFTW 3 wisdom is only used with the strides etc. for
Chris@19 5777 which it was created. Unfortunately, this means FFTW 3 has to create
Chris@19 5778 new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g.
Chris@19 5779 one transform of size 1024 also created wisdom for all smaller powers
Chris@19 5780 of 2, but this no longer occurs).
Chris@19 5781
Chris@19 5782 FFTW 3 also has the new routine `fftw_import_system_wisdom' to
Chris@19 5783 import wisdom from a standard system-wide location.
Chris@19 5784
Chris@19 5785 Memory allocation
Chris@19 5786 =================
Chris@19 5787
Chris@19 5788 In FFTW 3, we recommend allocating your arrays with `fftw_malloc' and
Chris@19 5789 deallocating them with `fftw_free'; this is not required, but allows
Chris@19 5790 optimal performance when SIMD acceleration is used. (Those two
Chris@19 5791 functions actually existed in FFTW 2, and worked the same way, but were
Chris@19 5792 not documented.)
Chris@19 5793
Chris@19 5794 In FFTW 2, there were `fftw_malloc_hook' and `fftw_free_hook'
Chris@19 5795 functions that allowed the user to replace FFTW's memory-allocation
Chris@19 5796 routines (e.g. to implement different error-handling, since by default
Chris@19 5797 FFTW prints an error message and calls `exit' to abort the program if
Chris@19 5798 `malloc' returns `NULL'). These hooks are not supported in FFTW 3;
Chris@19 5799 those few users who require this functionality can just directly modify
Chris@19 5800 the memory-allocation routines in FFTW (they are defined in
Chris@19 5801 `kernel/alloc.c').
Chris@19 5802
Chris@19 5803 Fortran interface
Chris@19 5804 =================
Chris@19 5805
Chris@19 5806 In FFTW 2, the subroutine names were obtained by replacing `fftw_' with
Chris@19 5807 `fftw_f77_'; in FFTW 3, you replace `fftw_' with `dfftw_' (or `sfftw_'
Chris@19 5808 or `lfftw_', depending upon the precision).
Chris@19 5809
Chris@19 5810 In FFTW 3, we have begun recommending that you always declare the
Chris@19 5811 type used to store plans as `integer*8'. (Too many people didn't notice
Chris@19 5812 our instruction to switch from `integer' to `integer*8' for 64-bit
Chris@19 5813 machines.)
Chris@19 5814
Chris@19 5815 In FFTW 3, we provide a `fftw3.f' "header file" to include in your
Chris@19 5816 code (and which is officially installed on Unix systems). (In FFTW 2,
Chris@19 5817 we supplied a `fftw_f77.i' file, but it was not installed.)
Chris@19 5818
Chris@19 5819 Otherwise, the C-Fortran interface relationship is much the same as
Chris@19 5820 it was before (e.g. return values become initial parameters, and
Chris@19 5821 multi-dimensional arrays are in column-major order). Unlike FFTW 2, we
Chris@19 5822 do provide some support for wisdom import/export in Fortran (*note
Chris@19 5823 Wisdom of Fortran?::).
Chris@19 5824
Chris@19 5825 Threads
Chris@19 5826 =======
Chris@19 5827
Chris@19 5828 Like FFTW 2, only the execution routines are thread-safe. All planner
Chris@19 5829 routines, etcetera, should be called by only a single thread at a time
Chris@19 5830 (*note Thread safety::). _Unlike_ FFTW 2, there is no special
Chris@19 5831 `FFTW_THREADSAFE' flag for the planner to allow a given plan to be
Chris@19 5832 usable by multiple threads in parallel; this is now the case by default.
Chris@19 5833
Chris@19 5834 The multi-threaded version of FFTW 2 required you to pass the number
Chris@19 5835 of threads each time you execute the transform. The number of threads
Chris@19 5836 is now stored in the plan, and is specified before the planner is
Chris@19 5837 called by `fftw_plan_with_nthreads'. The threads initialization
Chris@19 5838 routine used to be called `fftw_threads_init' and would return zero on
Chris@19 5839 success; the new routine is called `fftw_init_threads' and returns zero
Chris@19 5840 on failure. *Note Multi-threaded FFTW::.
Chris@19 5841
Chris@19 5842 There is no separate threads header file in FFTW 3; all the function
Chris@19 5843 prototypes are in `<fftw3.h>'. However, you still have to link to a
Chris@19 5844 separate library (`-lfftw3_threads -lfftw3 -lm' on Unix), as well as to
Chris@19 5845 the threading library (e.g. POSIX threads on Unix).
Chris@19 5846
Chris@19 5847 ---------- Footnotes ----------
Chris@19 5848
Chris@19 5849 (1) We do our own buffering because GNU libc I/O routines are
Chris@19 5850 horribly slow for single-character I/O, apparently for thread-safety
Chris@19 5851 reasons (whether you are using threads or not).
Chris@19 5852
Chris@19 5853 
Chris@19 5854 File: fftw3.info, Node: Installation and Customization, Next: Acknowledgments, Prev: Upgrading from FFTW version 2, Up: Top
Chris@19 5855
Chris@19 5856 10 Installation and Customization
Chris@19 5857 *********************************
Chris@19 5858
Chris@19 5859 This chapter describes the installation and customization of FFTW, the
Chris@19 5860 latest version of which may be downloaded from the FFTW home page
Chris@19 5861 (http://www.fftw.org).
Chris@19 5862
Chris@19 5863 In principle, FFTW should work on any system with an ANSI C compiler
Chris@19 5864 (`gcc' is fine). However, planner time is drastically reduced if FFTW
Chris@19 5865 can exploit a hardware cycle counter; FFTW comes with cycle-counter
Chris@19 5866 support for all modern general-purpose CPUs, but you may need to add a
Chris@19 5867 couple of lines of code if your compiler is not yet supported (*note
Chris@19 5868 Cycle Counters::). (On Unix, there will be a warning at the end of the
Chris@19 5869 `configure' output if no cycle counter is found.)
Chris@19 5870
Chris@19 5871 Installation of FFTW is simplest if you have a Unix or a GNU system,
Chris@19 5872 such as GNU/Linux, and we describe this case in the first section below,
Chris@19 5873 including the use of special configuration options to e.g. install
Chris@19 5874 different precisions or exploit optimizations for particular
Chris@19 5875 architectures (e.g. SIMD). Compilation on non-Unix systems is a more
Chris@19 5876 manual process, but we outline the procedure in the second section. It
Chris@19 5877 is also likely that pre-compiled binaries will be available for popular
Chris@19 5878 systems.
Chris@19 5879
Chris@19 5880 Finally, we describe how you can customize FFTW for particular needs
Chris@19 5881 by generating _codelets_ for fast transforms of sizes not supported
Chris@19 5882 efficiently by the standard FFTW distribution.
Chris@19 5883
Chris@19 5884 * Menu:
Chris@19 5885
Chris@19 5886 * Installation on Unix::
Chris@19 5887 * Installation on non-Unix systems::
Chris@19 5888 * Cycle Counters::
Chris@19 5889 * Generating your own code::
Chris@19 5890
Chris@19 5891 
Chris@19 5892 File: fftw3.info, Node: Installation on Unix, Next: Installation on non-Unix systems, Prev: Installation and Customization, Up: Installation and Customization
Chris@19 5893
Chris@19 5894 10.1 Installation on Unix
Chris@19 5895 =========================
Chris@19 5896
Chris@19 5897 FFTW comes with a `configure' program in the GNU style. Installation
Chris@19 5898 can be as simple as:
Chris@19 5899
Chris@19 5900 ./configure
Chris@19 5901 make
Chris@19 5902 make install
Chris@19 5903
Chris@19 5904 This will build the uniprocessor complex and real transform libraries
Chris@19 5905 along with the test programs. (We recommend that you use GNU `make' if
Chris@19 5906 it is available; on some systems it is called `gmake'.) The "`make
Chris@19 5907 install'" command installs the fftw and rfftw libraries in standard
Chris@19 5908 places, and typically requires root privileges (unless you specify a
Chris@19 5909 different install directory with the `--prefix' flag to `configure').
Chris@19 5910 You can also type "`make check'" to put the FFTW test programs through
Chris@19 5911 their paces. If you have problems during configuration or compilation,
Chris@19 5912 you may want to run "`make distclean'" before trying again; this
Chris@19 5913 ensures that you don't have any stale files left over from previous
Chris@19 5914 compilation attempts.
Chris@19 5915
Chris@19 5916 The `configure' script chooses the `gcc' compiler by default, if it
Chris@19 5917 is available; you can select some other compiler with:
Chris@19 5918 ./configure CC="<the name of your C compiler>"
Chris@19 5919
Chris@19 5920 The `configure' script knows good `CFLAGS' (C compiler flags) for a
Chris@19 5921 few systems. If your system is not known, the `configure' script will
Chris@19 5922 print out a warning. In this case, you should re-configure FFTW with
Chris@19 5923 the command
Chris@19 5924 ./configure CFLAGS="<write your CFLAGS here>"
Chris@19 5925 and then compile as usual. If you do find an optimal set of
Chris@19 5926 `CFLAGS' for your system, please let us know what they are (along with
Chris@19 5927 the output of `config.guess') so that we can include them in future
Chris@19 5928 releases.
Chris@19 5929
Chris@19 5930 `configure' supports all the standard flags defined by the GNU
Chris@19 5931 Coding Standards; see the `INSTALL' file in FFTW or the GNU web page
Chris@19 5932 (http://www.gnu.org/prep/standards/html_node/index.html). Note
Chris@19 5933 especially `--help' to list all flags and `--enable-shared' to create
Chris@19 5934 shared, rather than static, libraries. `configure' also accepts a few
Chris@19 5935 FFTW-specific flags, particularly:
Chris@19 5936
Chris@19 5937 * `--enable-float': Produces a single-precision version of FFTW
Chris@19 5938 (`float') instead of the default double-precision (`double').
Chris@19 5939 *Note Precision::.
Chris@19 5940
Chris@19 5941 * `--enable-long-double': Produces a long-double precision version of
Chris@19 5942 FFTW (`long double') instead of the default double-precision
Chris@19 5943 (`double'). The `configure' script will halt with an error
Chris@19 5944 message if `long double' is the same size as `double' on your
Chris@19 5945 machine/compiler. *Note Precision::.
Chris@19 5946
Chris@19 5947 * `--enable-quad-precision': Produces a quadruple-precision version
Chris@19 5948 of FFTW using the nonstandard `__float128' type provided by `gcc'
Chris@19 5949 4.6 or later on x86, x86-64, and Itanium architectures, instead of
Chris@19 5950 the default double-precision (`double'). The `configure' script
Chris@19 5951 will halt with an error message if the compiler is not `gcc'
Chris@19 5952 version 4.6 or later or if `gcc''s `libquadmath' library is not
Chris@19 5953 installed. *Note Precision::.
Chris@19 5954
Chris@19 5955 * `--enable-threads': Enables compilation and installation of the
Chris@19 5956 FFTW threads library (*note Multi-threaded FFTW::), which provides
Chris@19 5957 a simple interface to parallel transforms for SMP systems. By
Chris@19 5958 default, the threads routines are not compiled.
Chris@19 5959
Chris@19 5960 * `--enable-openmp': Like `--enable-threads', but using OpenMP
Chris@19 5961 compiler directives in order to induce parallelism rather than
Chris@19 5962 spawning its own threads directly, and installing an `fftw3_omp'
Chris@19 5963 library rather than an `fftw3_threads' library (*note
Chris@19 5964 Multi-threaded FFTW::). You can use both `--enable-openmp' and
Chris@19 5965 `--enable-threads' since they compile/install libraries with
Chris@19 5966 different names. By default, the OpenMP routines are not compiled.
Chris@19 5967
Chris@19 5968 * `--with-combined-threads': By default, if `--enable-threads' is
Chris@19 5969 used, the threads support is compiled into a separate library that
Chris@19 5970 must be linked in addition to the main FFTW library. This is so
Chris@19 5971 that users of the serial library do not need to link the system
Chris@19 5972 threads libraries. If `--with-combined-threads' is specified,
Chris@19 5973 however, then no separate threads library is created, and threads
Chris@19 5974 are included in the main FFTW library. This is mainly useful
Chris@19 5975 under Windows, where no system threads library is required and
Chris@19 5976 inter-library dependencies are problematic.
Chris@19 5977
Chris@19 5978 * `--enable-mpi': Enables compilation and installation of the FFTW
Chris@19 5979 MPI library (*note Distributed-memory FFTW with MPI::), which
Chris@19 5980 provides parallel transforms for distributed-memory systems with
Chris@19 5981 MPI. (By default, the MPI routines are not compiled.) *Note FFTW
Chris@19 5982 MPI Installation::.
Chris@19 5983
Chris@19 5984 * `--disable-fortran': Disables inclusion of legacy-Fortran wrapper
Chris@19 5985 routines (*note Calling FFTW from Legacy Fortran::) in the standard
Chris@19 5986 FFTW libraries. These wrapper routines increase the library size
Chris@19 5987 by only a negligible amount, so they are included by default as
Chris@19 5988 long as the `configure' script finds a Fortran compiler on your
Chris@19 5989 system. (To specify a particular Fortran compiler foo, pass
Chris@19 5990 `F77=foo' to `configure'.)
Chris@19 5991
Chris@19 5992 * `--with-g77-wrappers': By default, when Fortran wrappers are
Chris@19 5993 included, the wrappers employ the linking conventions of the
Chris@19 5994 Fortran compiler detected by the `configure' script. If this
Chris@19 5995 compiler is GNU `g77', however, then _two_ versions of the
Chris@19 5996 wrappers are included: one with `g77''s idiosyncratic convention
Chris@19 5997 of appending two underscores to identifiers, and one with the more
Chris@19 5998 common convention of appending only a single underscore. This
Chris@19 5999 way, the same FFTW library will work with both `g77' and other
Chris@19 6000 Fortran compilers, such as GNU `gfortran'. However, the converse
Chris@19 6001 is not true: if you configure with a different compiler, then the
Chris@19 6002 `g77'-compatible wrappers are not included. By specifying
Chris@19 6003 `--with-g77-wrappers', the `g77'-compatible wrappers are included
Chris@19 6004 in addition to wrappers for whatever Fortran compiler `configure'
Chris@19 6005 finds.
Chris@19 6006
Chris@19 6007 * `--with-slow-timer': Disables the use of hardware cycle counters,
Chris@19 6008 and falls back on `gettimeofday' or `clock'. This greatly worsens
Chris@19 6009 performance, and should generally not be used (unless you don't
Chris@19 6010 have a cycle counter but still really want an optimized plan
Chris@19 6011 regardless of the time). *Note Cycle Counters::.
Chris@19 6012
Chris@19 6013 * `--enable-sse', `--enable-sse2', `--enable-avx',
Chris@19 6014 `--enable-altivec', `--enable-neon': Enable the compilation of
Chris@19 6015 SIMD code for SSE (Pentium III+), SSE2 (Pentium IV+), AVX (Sandy
Chris@19 6016 Bridge, Interlagos), AltiVec (PowerPC G4+), and NEON (some ARM
Chris@19 6017 processors). SSE, AltiVec, and NEON only work with
Chris@19 6018 `--enable-float' (above). SSE2 works in both single and double
Chris@19 6019 precision (and is simply SSE in single precision). The resulting
Chris@19 6020 code will _still work_ on earlier CPUs lacking the SIMD extensions
Chris@19 6021 (SIMD is automatically disabled, although the FFTW library is
Chris@19 6022 still larger).
Chris@19 6023 - These options require a compiler supporting SIMD extensions,
Chris@19 6024 and compiler support is always a bit flaky: see the FFTW FAQ
Chris@19 6025 for a list of compiler versions that have problems compiling
Chris@19 6026 FFTW.
Chris@19 6027
Chris@19 6028 - With AltiVec and `gcc', you may have to use the
Chris@19 6029 `-mabi=altivec' option when compiling any code that links to
Chris@19 6030 FFTW, in order to properly align the stack; otherwise, FFTW
Chris@19 6031 could crash when it tries to use an AltiVec feature. (This
Chris@19 6032 is not necessary on MacOS X.)
Chris@19 6033
Chris@19 6034 - With SSE/SSE2 and `gcc', you should use a version of gcc that
Chris@19 6035 properly aligns the stack when compiling any code that links
Chris@19 6036 to FFTW. By default, `gcc' 2.95 and later versions align the
Chris@19 6037 stack as needed, but you should not compile FFTW with the
Chris@19 6038 `-Os' option or the `-mpreferred-stack-boundary' option with
Chris@19 6039 an argument less than 4.
Chris@19 6040
Chris@19 6041 - Because of the large variety of ARM processors and ABIs, FFTW
Chris@19 6042 does not attempt to guess the correct `gcc' flags for
Chris@19 6043 generating NEON code. In general, you will have to provide
Chris@19 6044 them on the command line. This command line is known to have
Chris@19 6045 worked at least once:
Chris@19 6046 ./configure --with-slow-timer --host=arm-linux-gnueabi \
Chris@19 6047 --enable-single --enable-neon \
Chris@19 6048 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
Chris@19 6049
Chris@19 6050
Chris@19 6051 To force `configure' to use a particular C compiler foo (instead of
Chris@19 6052 the default, usually `gcc'), pass `CC=foo' to the `configure' script;
Chris@19 6053 you may also need to set the flags via the variable `CFLAGS' as
Chris@19 6054 described above.
Chris@19 6055
Chris@19 6056 
Chris@19 6057 File: fftw3.info, Node: Installation on non-Unix systems, Next: Cycle Counters, Prev: Installation on Unix, Up: Installation and Customization
Chris@19 6058
Chris@19 6059 10.2 Installation on non-Unix systems
Chris@19 6060 =====================================
Chris@19 6061
Chris@19 6062 It should be relatively straightforward to compile FFTW even on non-Unix
Chris@19 6063 systems lacking the niceties of a `configure' script. Basically, you
Chris@19 6064 need to edit the `config.h' header (copy it from `config.h.in') to
Chris@19 6065 `#define' the various options and compiler characteristics, and then
Chris@19 6066 compile all the `.c' files in the relevant directories.
Chris@19 6067
Chris@19 6068 The `config.h' header contains about 100 options to set, each one
Chris@19 6069 initially an `#undef', each documented with a comment, and most of them
Chris@19 6070 fairly obvious. For most of the options, you should simply `#define'
Chris@19 6071 them to `1' if they are applicable, although a few options require a
Chris@19 6072 particular value (e.g. `SIZEOF_LONG_LONG' should be defined to the size
Chris@19 6073 of the `long long' type, in bytes, or zero if it is not supported). We
Chris@19 6074 will likely post some sample `config.h' files for various operating
Chris@19 6075 systems and compilers for you to use (at least as a starting point).
Chris@19 6076 Please let us know if you have to hand-create a configuration file
Chris@19 6077 (and/or a pre-compiled binary) that you want to share.
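
Values such as `SIZEOF_LONG_LONG' depend on the target compiler. One way to fill them in without guessing is to compile and run a small program with that same compiler and paste its output into `config.h'. The sketch below formats one `#define' line per type. The function names are hypothetical. Apart from `SIZEOF_LONG_LONG', the macro names are assumed to follow the same `SIZEOF_' convention, so check them against your `config.h.in' before using the output.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one config.h line such as "#define SIZEOF_LONG_LONG 8".
   Returns snprintf's result (the length of the full line). */
static int format_size_define(char *buf, size_t len,
                              const char *macro, size_t size)
{
    return snprintf(buf, len, "#define %s %zu", macro, size);
}

/* Print size definitions for the compiler this file is built with.
   Only SIZEOF_LONG_LONG is named in the manual; the other macro
   names are assumptions to be checked against config.h.in. */
static void print_size_defines(FILE *out)
{
    static const struct { const char *macro; size_t size; } entries[] = {
        { "SIZEOF_INT",       sizeof(int) },
        { "SIZEOF_LONG",      sizeof(long) },
        { "SIZEOF_LONG_LONG", sizeof(long long) },
        { "SIZEOF_SIZE_T",    sizeof(size_t) },
        { "SIZEOF_VOID_P",    sizeof(void *) },
    };
    char line[64];
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        format_size_define(line, sizeof line, entries[i].macro,
                           entries[i].size);
        fprintf(out, "%s\n", line);
    }
}
```

Calling `print_size_defines(stdout)' prints lines ready to paste. A compiler that lacks `long long' cannot compile this sketch as written. In that case, drop that entry and define `SIZEOF_LONG_LONG' to zero by hand, as described above.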
Chris@19 6078
Chris@19 6079 To create the FFTW library, you will then need to compile all of the
Chris@19 6080 `.c' files in the `kernel', `dft', `dft/scalar', `dft/scalar/codelets',
Chris@19 6081 `rdft', `rdft/scalar', `rdft/scalar/r2cf', `rdft/scalar/r2cb',
Chris@19 6082 `rdft/scalar/r2r', `reodft', and `api' directories. If you are
Chris@19 6083 compiling with SIMD support (e.g. you defined `HAVE_SSE2' in
Chris@19 6084 `config.h'), then you also need to compile the `.c' files in the
Chris@19 6085 `simd-support', `{dft,rdft}/simd', `{dft,rdft}/simd/*' directories.
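
The directory list above can also be written down as data. That form is convenient when generating a project file or build script for a non-Unix toolchain. The following sketch uses hypothetical function and array names. It returns the directories whose `.c' files must be compiled and adds the SIMD directories only when SIMD support is enabled. The `*' entries are kept as wildcards, because the manual does not enumerate those subdirectories.

```c
#include <stdbool.h>
#include <stddef.h>

/* Directories whose .c files are always compiled (from the manual). */
static const char *const fftw_base_dirs[] = {
    "kernel", "dft", "dft/scalar", "dft/scalar/codelets",
    "rdft", "rdft/scalar", "rdft/scalar/r2cf", "rdft/scalar/r2cb",
    "rdft/scalar/r2r", "reodft", "api",
};

/* Extra directories needed when a SIMD option such as HAVE_SSE2 is
   defined in config.h; "*" stands for every subdirectory. */
static const char *const fftw_simd_dirs[] = {
    "simd-support", "dft/simd", "rdft/simd", "dft/simd/*", "rdft/simd/*",
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Fill `out' (capacity `cap') with the directories to compile and
   return how many entries were written. */
static size_t fftw_source_dirs(bool with_simd, const char **out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < COUNT(fftw_base_dirs) && n < cap; i++)
        out[n++] = fftw_base_dirs[i];
    if (with_simd)
        for (size_t i = 0; i < COUNT(fftw_simd_dirs) && n < cap; i++)
            out[n++] = fftw_simd_dirs[i];
    return n;
}
```

A build script could then loop over the result and pass each `DIR/*.c' to the compiler.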
Chris@19 6086
Chris@19 6087 Once these files are all compiled, link them into a library, or a
Chris@19 6088 shared library, or directly into your program.
Chris@19 6089
Chris@19 6090 To compile the FFTW test program, additionally compile the code in
Chris@19 6091 the `libbench2/' directory, and link it into a library. Then compile
Chris@19 6092 the code in the `tests/' directory and link it to the `libbench2' and
Chris@19 6093 FFTW libraries. To compile the `fftw-wisdom' (command-line) tool
Chris@19 6094 (*note Wisdom Utilities::), compile `tools/fftw-wisdom.c' and link it
Chris@19 6095 to the `libbench2' and FFTW libraries.
Chris@19 6096
Chris@19 6097 
Chris@19 6098 File: fftw3.info, Node: Cycle Counters, Next: Generating your own code, Prev: Installation on non-Unix systems, Up: Installation and Customization
Chris@19 6099
Chris@19 6100 10.3 Cycle Counters
Chris@19 6101 ===================
Chris@19 6102
Chris@19 6103 FFTW's planner actually executes and times different possible FFT
Chris@19 6104 algorithms in order to pick the fastest plan for a given n. In order
Chris@19 6105 to do this in as short a time as possible, however, the timer must have
Chris@19 6106 a very high resolution, and to accomplish this we employ the hardware
Chris@19 6107 "cycle counters" that are available on most CPUs. Currently, FFTW
Chris@19 6108 supports the cycle counters on x86, PowerPC/POWER, Alpha, UltraSPARC
Chris@19 6109 (SPARC v9), IA64, PA-RISC, and MIPS processors.
Chris@19 6110
Chris@19 6111 Access to the cycle counters, unfortunately, is a compiler and/or
Chris@19 6112 operating-system dependent task, often requiring inline assembly
Chris@19 6113 language, and it may be that your compiler is not supported. If it is
Chris@19 6114 _not_ supported, FFTW will by default fall back on its estimator
Chris@19 6115 (effectively using `FFTW_ESTIMATE' for all plans).
Chris@19 6116
Chris@19 6117 You can add support by editing the file `kernel/cycle.h'; normally,
Chris@19 6118 this will involve adapting one of the examples already present in order
Chris@19 6119 to use the inline-assembler syntax for your C compiler, and will only
Chris@19 6120 require a couple of lines of code. Anyone adding support for a new
Chris@19 6121 system to `cycle.h' is encouraged to email us at <fftw@fftw.org>.
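
The sketch below illustrates the kind of entry found in `kernel/cycle.h'. It follows that file's convention: a `ticks' type, a `getticks()' function, an `elapsed()' function, and a `HAVE_TICK_COUNTER' macro. This example targets `gcc' on x86 and uses the `rdtsc' instruction. Treat it as an illustration rather than a copy of FFTW's code. x86 is already supported, so a real addition would target a different compiler or CPU. Check the exact names against the `cycle.h' shipped with your FFTW version.

```c
#include <stdint.h>

/* Sketch modeled on kernel/cycle.h conventions; compare with the
   real file before adapting it. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
typedef uint64_t ticks;

/* Read the x86 time-stamp counter: rdtsc leaves the low 32 bits
   in EAX and the high 32 bits in EDX. */
static __inline__ ticks getticks(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((ticks)hi << 32) | lo;
}

/* Difference between two readings, as a double. */
static __inline__ double elapsed(ticks t1, ticks t0)
{
    return (double)t1 - (double)t0;
}

/* Tell the rest of the code that a cycle counter is available. */
#define HAVE_TICK_COUNTER
#endif
```

On other compilers or CPUs, nothing is defined here and `HAVE_TICK_COUNTER' stays undefined. This corresponds to the fallback behavior described above.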
Chris@19 6122
Chris@19 6123 If a cycle counter is not available on your system (e.g. some
Chris@19 6124 embedded processor), and you don't want to use estimated plans, as a
Chris@19 6125 last resort you can use the `--with-slow-timer' option to `configure'
Chris@19 6126 (on Unix) or `#define WITH_SLOW_TIMER' in `config.h' (elsewhere). This
Chris@19 6127 will use the much lower-resolution `gettimeofday' function, or even
Chris@19 6128 `clock' if the former is unavailable, and planning will be extremely
Chris@19 6129 slow.
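
Measuring a timer's resolution shows why planning becomes slow with such a timer. Any interval shorter than one timer step reads as zero, so each candidate must be run many times before its time can be compared meaningfully. The following POSIX sketch is not FFTW's own code, and its helper names are hypothetical. It waits until `gettimeofday' reports a new value and returns the size of one full step in seconds. A portable alternative with usually coarser resolution is `clock()' from `<time.h>'.

```c
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in seconds, via gettimeofday. */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1e-6 * (double)tv.tv_usec;
}

/* Busy-wait until the reading differs from t, and return it. */
static double next_reading(double t)
{
    double u;
    do {
        u = wall_seconds();
    } while (u == t);
    return u;
}

/* Estimate the timer's resolution: first align to a tick edge,
   then measure one complete step. */
static double wall_resolution(void)
{
    double t0 = next_reading(wall_seconds());
    double t1 = next_reading(t0);
    return t1 - t0;
}
```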
Chris@19 6130
Chris@19 6131 
Chris@19 6132 File: fftw3.info, Node: Generating your own code, Prev: Cycle Counters, Up: Installation and Customization
Chris@19 6133
Chris@19 6134 10.4 Generating your own code
Chris@19 6135 =============================
Chris@19 6136
Chris@19 6137 The directory `genfft' contains the programs that were used to generate
Chris@19 6138 FFTW's "codelets," which are hard-coded transforms of small sizes. We
Chris@19 6139 do not expect casual users to employ the generator, which is a rather
Chris@19 6140 sophisticated program that generates directed acyclic graphs of FFT
Chris@19 6141 algorithms and performs algebraic simplifications on them. It was
Chris@19 6142 written in Objective Caml, a dialect of ML, which is available at
Chris@19 6143 `http://caml.inria.fr/ocaml/index.en.html'.
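
The following hand-written example shows what a codelet is. It is a size-4 complex DFT in the straight-line style that `genfft' emits: no loops and no twiddle-factor tables, only additions, subtractions, and the trivial multiplications by -i and +i. It is a hypothetical illustration, not generated output. Real codelets are much larger and use a different calling convention with strides. The function computes the forward transform Y[k] = sum_j X[j] exp(-2 pi i j k / 4), using FFTW's sign convention, on split real and imaginary arrays.

```c
/* Hypothetical size-4 forward DFT "codelet" on split arrays:
   ri/ii hold the input's real/imaginary parts, ro/io the output's. */
static void dft4_forward(const double *ri, const double *ii,
                         double *ro, double *io)
{
    /* Stage 1: butterflies on inputs (0,2) and (1,3). */
    double t0r = ri[0] + ri[2], t0i = ii[0] + ii[2];
    double t1r = ri[0] - ri[2], t1i = ii[0] - ii[2];
    double t2r = ri[1] + ri[3], t2i = ii[1] + ii[3];
    double t3r = ri[1] - ri[3], t3i = ii[1] - ii[3];

    /* Stage 2: Y0 = t0 + t2, Y2 = t0 - t2, Y1 = t1 - i*t3,
       Y3 = t1 + i*t3.  Multiplying by -i or +i is a swap of real
       and imaginary parts plus a sign flip. */
    ro[0] = t0r + t2r;  io[0] = t0i + t2i;
    ro[2] = t0r - t2r;  io[2] = t0i - t2i;
    ro[1] = t1r + t3i;  io[1] = t1i - t3r;
    ro[3] = t1r - t3i;  io[3] = t1i + t3r;
}
```

If the input is the samples of exp(2 pi i j / 4), that is (1, i, -1, -i), the output is (4, 0, 0, 0) shifted to index 1, namely (0, 4, 0, 0), as the definition predicts.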
Chris@19 6144
Chris@19 6145 If you have Objective Caml installed (along with recent versions of
Chris@19 6146 GNU `autoconf', `automake', and `libtool'), then you can change the set
Chris@19 6147 of codelets that are generated or play with the generation options.
Chris@19 6148 The set of generated codelets is specified by the
Chris@19 6149 `{dft,rdft}/{scalar,simd}/*/Makefile.am' files. For example, you can
Chris@19 6150 add efficient REDFT codelets of small sizes by modifying
Chris@19 6151 `rdft/scalar/r2r/Makefile.am'. After you modify any `Makefile.am'
Chris@19 6152 files, you can type `sh bootstrap.sh' in the top-level directory
Chris@19 6153 followed by `make' to re-generate the files.
Chris@19 6154
Chris@19 6155 We do not provide more details about the code-generation process,
Chris@19 6156 since we do not expect that most users will need to generate their own
Chris@19 6157 code. However, feel free to contact us at <fftw@fftw.org> if you are
Chris@19 6158 interested in the subject.
Chris@19 6159
Chris@19 6160 You might find it interesting to learn Caml and/or some modern
Chris@19 6161 programming techniques that we used in the generator (including monadic
Chris@19 6162 programming), especially if you heard the rumor that Java and
Chris@19 6163 object-oriented programming are the latest advancement in the field.
Chris@19 6164 The internal operation of the codelet generator is described in the
Chris@19 6165 paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is
Chris@19 6166 available from the FFTW home page (http://www.fftw.org) and also
Chris@19 6167 appeared in the `Proceedings of the 1999 ACM SIGPLAN Conference on
Chris@19 6168 Programming Language Design and Implementation (PLDI)'.
Chris@19 6169
Chris@19 6170 
Chris@19 6171 File: fftw3.info, Node: Acknowledgments, Next: License and Copyright, Prev: Installation and Customization, Up: Top
Chris@19 6172
Chris@19 6173 11 Acknowledgments
Chris@19 6174 ******************
Chris@19 6175
Chris@19 6176 Matteo Frigo was supported in part by the Special Research Program SFB
Chris@19 6177 F011 "AURORA" of the Austrian Science Fund FWF and by MIT Lincoln
Chris@19 6178 Laboratory. For previous versions of FFTW, he was supported in part by
Chris@19 6179 the Defense Advanced Research Projects Agency (DARPA), under Grants
Chris@19 6180 N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment
Chris@19 6181 Corporation Fellowship.
Chris@19 6182
Chris@19 6183 Steven G. Johnson was supported in part by a Dept. of Defense NDSEG
Chris@19 6184 Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials
Chris@19 6185 Research Science and Engineering Center program of the National Science
Chris@19 6186 Foundation under award DMR-9400334.
Chris@19 6187
Chris@19 6188 Code for the Cell Broadband Engine was graciously donated to the FFTW
Chris@19 6189 project by the IBM Austin Research Lab and included in fftw-3.2. (This
Chris@19 6190 code was removed in fftw-3.3.)
Chris@19 6191
Chris@19 6192 Code for the MIPS paired-single SIMD support was graciously donated
Chris@19 6193 to the FFTW project by CodeSourcery, Inc.
Chris@19 6194
Chris@19 6195 We are grateful to Sun Microsystems Inc. for its donation of a
Chris@19 6196 cluster of nine 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These
Chris@19 6197 machines served as the primary platform for the development of early
Chris@19 6198 versions of FFTW.
Chris@19 6199
Chris@19 6200 We thank Intel Corporation for donating a four-processor Pentium Pro
Chris@19 6201 machine. We thank the GNU/Linux community for giving us a decent OS to
Chris@19 6202 run on that machine.
Chris@19 6203
Chris@19 6204 We are thankful to the AMD corporation for donating an AMD Athlon XP
Chris@19 6205 1700+ computer to the FFTW project.
Chris@19 6206
Chris@19 6207 We thank the Compaq/HP testdrive program and VA Software Corporation
Chris@19 6208 (SourceForge.net) for providing remote access to machines that were used
Chris@19 6209 to test FFTW.
Chris@19 6210
Chris@19 6211 The `genfft' suite of code generators was written using Objective
Chris@19 6212 Caml, a dialect of ML. Objective Caml is a small and elegant language
Chris@19 6213 developed by Xavier Leroy. The implementation is available from
Chris@19 6214 `http://caml.inria.fr/'. In previous releases
Chris@19 6215 of FFTW, `genfft' was written in Caml Light, by the same authors. An
Chris@19 6216 even earlier implementation of `genfft' was written in Scheme, but Caml
Chris@19 6217 is definitely better for this kind of application.
Chris@19 6218
Chris@19 6219 FFTW uses many tools from the GNU project, including `automake',
Chris@19 6220 `texinfo', and `libtool'.
Chris@19 6221
Chris@19 6222 Prof. Charles E. Leiserson of MIT provided continuous support and
Chris@19 6223 encouragement. This program would not exist without him. Charles also
Chris@19 6224 proposed the name "codelets" for the basic FFT blocks.
Chris@19 6225
Chris@19 6226 Prof. John D. Joannopoulos of MIT demonstrated continuing tolerance
Chris@19 6227 of Steven's "extra-curricular" computer-science activities, as well as
Chris@19 6228 remarkable creativity in working them into his grant proposals.
Chris@19 6229 Steven's physics degree would not exist without him.
Chris@19 6230
Chris@19 6231 Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually
Chris@19 6232 led to the SIMD support in FFTW 3.
Chris@19 6233
Chris@19 6234 Stefan Kral wrote most of the K7 code generator distributed with FFTW
Chris@19 6235 3.0.x and 3.1.x.
Chris@19 6236
Chris@19 6237 Andrew Sterian contributed the Windows timing code in FFTW 2.
Chris@19 6238
Chris@19 6239 Didier Miras reported a bug in the test procedure used in FFTW 1.2.
Chris@19 6240 We now use a completely different test algorithm by Funda Ergun that
Chris@19 6241 does not require a separate FFT program to compare against.
Chris@19 6242
Chris@19 6243 Wolfgang Reimer contributed the Pentium cycle counter and a few fixes
Chris@19 6244 that help portability.
Chris@19 6245
Chris@19 6246 Ming-Chang Liu uncovered a well-hidden bug in the complex transforms
Chris@19 6247 of FFTW 2.0 and supplied a patch to correct it.
Chris@19 6248
Chris@19 6249 The FFTW FAQ was written in `bfnn' (Bizarre Format With No Name) and
Chris@19 6250 formatted using the tools developed by Ian Jackson for the Linux FAQ.
Chris@19 6251
Chris@19 6252 _We are especially thankful to all of our users for their continuing
Chris@19 6253 support, feedback, and interest during our development of FFTW._
Chris@19 6254
Chris@19 6255 
Chris@19 6256 File: fftw3.info, Node: License and Copyright, Next: Concept Index, Prev: Acknowledgments, Up: Top
Chris@19 6257
Chris@19 6258 12 License and Copyright
Chris@19 6259 ************************
Chris@19 6260
Chris@19 6261 FFTW is Copyright (C) 2003, 2007-11 Matteo Frigo, Copyright (C) 2003,
Chris@19 6262 2007-11 Massachusetts Institute of Technology.
Chris@19 6263
Chris@19 6264 FFTW is free software; you can redistribute it and/or modify it
Chris@19 6265 under the terms of the GNU General Public License as published by the
Chris@19 6266 Free Software Foundation; either version 2 of the License, or (at your
Chris@19 6267 option) any later version.
Chris@19 6268
Chris@19 6269 This program is distributed in the hope that it will be useful, but
Chris@19 6270 WITHOUT ANY WARRANTY; without even the implied warranty of
Chris@19 6271 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Chris@19 6272 General Public License for more details.
Chris@19 6273
Chris@19 6274 You should have received a copy of the GNU General Public License
Chris@19 6275 along with this program; if not, write to the Free Software Foundation,
Chris@19 6276 Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. You
Chris@19 6277 can also find the GPL on the GNU web site
Chris@19 6278 (http://www.gnu.org/licenses/gpl-2.0.html).
Chris@19 6279
Chris@19 6280 In addition, we kindly ask you to acknowledge FFTW and its authors in
Chris@19 6281 any program or publication in which you use FFTW. (You are not
Chris@19 6282 _required_ to do so; it is up to your common sense to decide whether
Chris@19 6283 you want to comply with this request or not.) For general
Chris@19 6284 publications, we suggest referencing: Matteo Frigo and Steven G.
Chris@19 6285 Johnson, "The design and implementation of FFTW3," Proc. IEEE 93 (2),
Chris@19 6286 216-231 (2005).
Chris@19 6287
Chris@19 6288 Non-free versions of FFTW are available under terms different from
Chris@19 6289 those of the General Public License. (For example, they do not require
Chris@19 6290 you to accompany any object code using FFTW with the corresponding
Chris@19 6291 source code.) For these alternative terms you must purchase a license from
Chris@19 6292 MIT's Technology Licensing Office. Users interested in such a license
Chris@19 6293 should contact us (<fftw@fftw.org>) for more information.
Chris@19 6294