cannam@127: @node Tutorial, Other Important Topics, Introduction, Top cannam@127: @chapter Tutorial cannam@127: @menu cannam@127: * Complex One-Dimensional DFTs:: cannam@127: * Complex Multi-Dimensional DFTs:: cannam@127: * One-Dimensional DFTs of Real Data:: cannam@127: * Multi-Dimensional DFTs of Real Data:: cannam@127: * More DFTs of Real Data:: cannam@127: @end menu cannam@127: cannam@127: This chapter describes the basic usage of FFTW, i.e., how to compute cannam@127: @cindex basic interface cannam@127: the Fourier transform of a single array. This chapter tells the cannam@127: truth, but not the @emph{whole} truth. Specifically, FFTW implements cannam@127: additional routines and flags that are not documented here, although cannam@127: in many cases we try to indicate where added capabilities exist. For cannam@127: more complete information, see @ref{FFTW Reference}. (Note that you cannam@127: need to compile and install FFTW before you can use it in a program. cannam@127: For the details of the installation, see @ref{Installation and cannam@127: Customization}.) cannam@127: cannam@127: We recommend that you read this tutorial in order.@footnote{You can cannam@127: read the tutorial in bit-reversed order after computing your first cannam@127: transform.} At the least, read the first section (@pxref{Complex cannam@127: One-Dimensional DFTs}) before reading any of the others, even if your cannam@127: main interest lies in one of the other transform types. cannam@127: cannam@127: Users of FFTW version 2 and earlier may also want to read @ref{Upgrading cannam@127: from FFTW version 2}. cannam@127: cannam@127: @c ------------------------------------------------------------ cannam@127: @node Complex One-Dimensional DFTs, Complex Multi-Dimensional DFTs, Tutorial, Tutorial cannam@127: @section Complex One-Dimensional DFTs cannam@127: cannam@127: @quotation cannam@127: Plan: To bother about the best method of accomplishing an accidental result. cannam@127: [Ambrose Bierce, @cite{The Enlarged Devil's Dictionary}.] cannam@127: @cindex Devil cannam@127: @end quotation cannam@127: cannam@127: @iftex cannam@127: @medskip cannam@127: @end iftex cannam@127: cannam@127: The basic usage of FFTW to compute a one-dimensional DFT of size cannam@127: @code{N} is simple, and it typically looks something like this code: cannam@127: cannam@127: @example cannam@127: #include cannam@127: ... cannam@127: @{ cannam@127: fftw_complex *in, *out; cannam@127: fftw_plan p; cannam@127: ... cannam@127: in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N); cannam@127: out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N); cannam@127: p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE); cannam@127: ... cannam@127: fftw_execute(p); /* @r{repeat as needed} */ cannam@127: ... cannam@127: fftw_destroy_plan(p); cannam@127: fftw_free(in); fftw_free(out); cannam@127: @} cannam@127: @end example cannam@127: cannam@127: You must link this code with the @code{fftw3} library. On Unix systems, cannam@127: link with @code{-lfftw3 -lm}. cannam@127: cannam@127: The example code first allocates the input and output arrays. You can cannam@127: allocate them in any way that you like, but we recommend using cannam@127: @code{fftw_malloc}, which behaves like cannam@127: @findex fftw_malloc cannam@127: @code{malloc} except that it properly aligns the array when SIMD cannam@127: instructions (such as SSE and Altivec) are available (@pxref{SIMD cannam@127: alignment and fftw_malloc}). [Alternatively, we provide a convenient wrapper function @code{fftw_alloc_complex(N)} which has the same effect.] cannam@127: @findex fftw_alloc_complex cannam@127: @cindex SIMD cannam@127: cannam@127: cannam@127: The data is an array of type @code{fftw_complex}, which is by default a cannam@127: @code{double[2]} composed of the real (@code{in[i][0]}) and imaginary cannam@127: (@code{in[i][1]}) parts of a complex number. cannam@127: @tindex fftw_complex cannam@127: cannam@127: The next step is to create a @dfn{plan}, which is an object cannam@127: @cindex plan cannam@127: that contains all the data that FFTW needs to compute the FFT. cannam@127: This function creates the plan: cannam@127: cannam@127: @example cannam@127: fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, cannam@127: int sign, unsigned flags); cannam@127: @end example cannam@127: @findex fftw_plan_dft_1d cannam@127: @tindex fftw_plan cannam@127: cannam@127: The first argument, @code{n}, is the size of the transform you are cannam@127: trying to compute. The size @code{n} can be any positive integer, but cannam@127: sizes that are products of small factors are transformed most cannam@127: efficiently (although prime sizes still use an @Onlogn{} algorithm). cannam@127: cannam@127: The next two arguments are pointers to the input and output arrays of cannam@127: the transform. These pointers can be equal, indicating an cannam@127: @dfn{in-place} transform. cannam@127: @cindex in-place cannam@127: cannam@127: cannam@127: The fourth argument, @code{sign}, can be either @code{FFTW_FORWARD} cannam@127: (@code{-1}) or @code{FFTW_BACKWARD} (@code{+1}), cannam@127: @ctindex FFTW_FORWARD cannam@127: @ctindex FFTW_BACKWARD cannam@127: and indicates the direction of the transform you are interested in; cannam@127: technically, it is the sign of the exponent in the transform. cannam@127: cannam@127: The @code{flags} argument is usually either @code{FFTW_MEASURE} or cannam@127: @cindex flags cannam@127: @code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} instructs FFTW to run cannam@127: @ctindex FFTW_MEASURE cannam@127: and measure the execution time of several FFTs in order to find the cannam@127: best way to compute the transform of size @code{n}. This process takes cannam@127: some time (usually a few seconds), depending on your machine and on cannam@127: the size of the transform. @code{FFTW_ESTIMATE}, on the contrary, cannam@127: does not run any computation and just builds a cannam@127: @ctindex FFTW_ESTIMATE cannam@127: reasonable plan that is probably sub-optimal. In short, if your cannam@127: program performs many transforms of the same size and initialization cannam@127: time is not important, use @code{FFTW_MEASURE}; otherwise use the cannam@127: estimate. cannam@127: cannam@127: @emph{You must create the plan before initializing the input}, because cannam@127: @code{FFTW_MEASURE} overwrites the @code{in}/@code{out} arrays. cannam@127: (Technically, @code{FFTW_ESTIMATE} does not touch your arrays, but you cannam@127: should always create plans first just to be sure.) cannam@127: cannam@127: Once the plan has been created, you can use it as many times as you cannam@127: like for transforms on the specified @code{in}/@code{out} arrays, cannam@127: computing the actual transforms via @code{fftw_execute(plan)}: cannam@127: @example cannam@127: void fftw_execute(const fftw_plan plan); cannam@127: @end example cannam@127: @findex fftw_execute cannam@127: cannam@127: The DFT results are stored in-order in the array @code{out}, with the cannam@127: zero-frequency (DC) component in @code{out[0]}. cannam@127: @cindex frequency cannam@127: If @code{in != out}, the transform is @dfn{out-of-place} and the input cannam@127: array @code{in} is not modified. Otherwise, the input array is cannam@127: overwritten with the transform. cannam@127: cannam@127: @cindex execute cannam@127: If you want to transform a @emph{different} array of the same size, you cannam@127: can create a new plan with @code{fftw_plan_dft_1d} and FFTW cannam@127: automatically reuses the information from the previous plan, if cannam@127: possible. Alternatively, with the ``guru'' interface you can apply a cannam@127: given plan to a different array, if you are careful. cannam@127: @xref{FFTW Reference}. cannam@127: cannam@127: When you are done with the plan, you deallocate it by calling cannam@127: @code{fftw_destroy_plan(plan)}: cannam@127: @example cannam@127: void fftw_destroy_plan(fftw_plan plan); cannam@127: @end example cannam@127: @findex fftw_destroy_plan cannam@127: If you allocate an array with @code{fftw_malloc()} you must deallocate cannam@127: it with @code{fftw_free()}. Do not use @code{free()} or, heaven cannam@127: forbid, @code{delete}. cannam@127: @findex fftw_free cannam@127: cannam@127: FFTW computes an @emph{unnormalized} DFT. Thus, computing a forward cannam@127: followed by a backward transform (or vice versa) results in the original cannam@127: array scaled by @code{n}. For the definition of the DFT, see @ref{What cannam@127: FFTW Really Computes}. cannam@127: @cindex DFT cannam@127: @cindex normalization cannam@127: cannam@127: cannam@127: If you have a C compiler, such as @code{gcc}, that supports the cannam@127: C99 standard, and you @code{#include } @emph{before} cannam@127: @code{}, then @code{fftw_complex} is the native cannam@127: double-precision complex type and you can manipulate it with ordinary cannam@127: arithmetic. Otherwise, FFTW defines its own complex type, which is cannam@127: bit-compatible with the C99 complex type. @xref{Complex numbers}. cannam@127: (The C++ @code{} template class may also be usable via a cannam@127: typecast.) cannam@127: @cindex C++ cannam@127: cannam@127: To use single or long-double precision versions of FFTW, replace the cannam@127: @code{fftw_} prefix by @code{fftwf_} or @code{fftwl_} and link with cannam@127: @code{-lfftw3f} or @code{-lfftw3l}, but use the @emph{same} cannam@127: @code{} header file. cannam@127: @cindex precision cannam@127: cannam@127: cannam@127: Many more flags exist besides @code{FFTW_MEASURE} and cannam@127: @code{FFTW_ESTIMATE}. For example, use @code{FFTW_PATIENT} if you're cannam@127: willing to wait even longer for a possibly even faster plan (@pxref{FFTW cannam@127: Reference}). cannam@127: @ctindex FFTW_PATIENT cannam@127: You can also save plans for future use, as described by @ref{Words of cannam@127: Wisdom-Saving Plans}. cannam@127: cannam@127: @c ------------------------------------------------------------ cannam@127: @node Complex Multi-Dimensional DFTs, One-Dimensional DFTs of Real Data, Complex One-Dimensional DFTs, Tutorial cannam@127: @section Complex Multi-Dimensional DFTs cannam@127: cannam@127: Multi-dimensional transforms work much the same way as one-dimensional cannam@127: transforms: you allocate arrays of @code{fftw_complex} (preferably cannam@127: using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as cannam@127: many times as you want with @code{fftw_execute(plan)}, and clean up cannam@127: with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). cannam@127: cannam@127: FFTW provides two routines for creating plans for 2d and 3d transforms, cannam@127: and one routine for creating plans of arbitrary dimensionality. cannam@127: The 2d and 3d routines have the following signature: cannam@127: @example cannam@127: fftw_plan fftw_plan_dft_2d(int n0, int n1, cannam@127: fftw_complex *in, fftw_complex *out, cannam@127: int sign, unsigned flags); cannam@127: fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2, cannam@127: fftw_complex *in, fftw_complex *out, cannam@127: int sign, unsigned flags); cannam@127: @end example cannam@127: @findex fftw_plan_dft_2d cannam@127: @findex fftw_plan_dft_3d cannam@127: cannam@127: These routines create plans for @code{n0} by @code{n1} two-dimensional cannam@127: (2d) transforms and @code{n0} by @code{n1} by @code{n2} 3d transforms, cannam@127: respectively. All of these transforms operate on contiguous arrays in cannam@127: the C-standard @dfn{row-major} order, so that the last dimension has the cannam@127: fastest-varying index in the array. This layout is described further in cannam@127: @ref{Multi-dimensional Array Format}. cannam@127: cannam@127: FFTW can also compute transforms of higher dimensionality. In order to cannam@127: avoid confusion between the various meanings of the the word cannam@127: ``dimension'', we use the term @emph{rank} cannam@127: @cindex rank cannam@127: to denote the number of independent indices in an array.@footnote{The cannam@127: term ``rank'' is commonly used in the APL, FORTRAN, and Common Lisp cannam@127: traditions, although it is not so common in the C@tie{}world.} For cannam@127: example, we say that a 2d transform has rank@tie{}2, a 3d transform has cannam@127: rank@tie{}3, and so on. You can plan transforms of arbitrary rank by cannam@127: means of the following function: cannam@127: cannam@127: @example cannam@127: fftw_plan fftw_plan_dft(int rank, const int *n, cannam@127: fftw_complex *in, fftw_complex *out, cannam@127: int sign, unsigned flags); cannam@127: @end example cannam@127: @findex fftw_plan_dft cannam@127: cannam@127: Here, @code{n} is a pointer to an array @code{n[rank]} denoting an cannam@127: @code{n[0]} by @code{n[1]} by @dots{} by @code{n[rank-1]} transform. cannam@127: Thus, for example, the call cannam@127: @example cannam@127: fftw_plan_dft_2d(n0, n1, in, out, sign, flags); cannam@127: @end example cannam@127: is equivalent to the following code fragment: cannam@127: @example cannam@127: int n[2]; cannam@127: n[0] = n0; cannam@127: n[1] = n1; cannam@127: fftw_plan_dft(2, n, in, out, sign, flags); cannam@127: @end example cannam@127: @code{fftw_plan_dft} is not restricted to 2d and 3d transforms, cannam@127: however, but it can plan transforms of arbitrary rank. cannam@127: cannam@127: You may have noticed that all the planner routines described so far cannam@127: have overlapping functionality. For example, you can plan a 1d or 2d cannam@127: transform by using @code{fftw_plan_dft} with a @code{rank} of @code{1} cannam@127: or @code{2}, or even by calling @code{fftw_plan_dft_3d} with @code{n0} cannam@127: and/or @code{n1} equal to @code{1} (with no loss in efficiency). This cannam@127: pattern continues, and FFTW's planning routines in general form a cannam@127: ``partial order,'' sequences of cannam@127: @cindex partial order cannam@127: interfaces with strictly increasing generality but correspondingly cannam@127: greater complexity. cannam@127: cannam@127: @code{fftw_plan_dft} is the most general complex-DFT routine that we cannam@127: describe in this tutorial, but there are also the advanced and guru interfaces, cannam@127: @cindex advanced interface cannam@127: @cindex guru interface cannam@127: which allow one to efficiently combine multiple/strided transforms cannam@127: into a single FFTW plan, transform a subset of a larger cannam@127: multi-dimensional array, and/or to handle more general complex-number cannam@127: formats. For more information, see @ref{FFTW Reference}. cannam@127: cannam@127: @c ------------------------------------------------------------ cannam@127: @node One-Dimensional DFTs of Real Data, Multi-Dimensional DFTs of Real Data, Complex Multi-Dimensional DFTs, Tutorial cannam@127: @section One-Dimensional DFTs of Real Data cannam@127: cannam@127: In many practical applications, the input data @code{in[i]} are purely cannam@127: real numbers, in which case the DFT output satisfies the ``Hermitian'' cannam@127: @cindex Hermitian cannam@127: redundancy: @code{out[i]} is the conjugate of @code{out[n-i]}. It is cannam@127: possible to take advantage of these circumstances in order to achieve cannam@127: roughly a factor of two improvement in both speed and memory usage. cannam@127: cannam@127: In exchange for these speed and space advantages, the user sacrifices cannam@127: some of the simplicity of FFTW's complex transforms. First of all, the cannam@127: input and output arrays are of @emph{different sizes and types}: the cannam@127: input is @code{n} real numbers, while the output is @code{n/2+1} cannam@127: complex numbers (the non-redundant outputs); this also requires slight cannam@127: ``padding'' of the input array for cannam@127: @cindex padding cannam@127: in-place transforms. Second, the inverse transform (complex to real) cannam@127: has the side-effect of @emph{overwriting its input array}, by default. cannam@127: Neither of these inconveniences should pose a serious problem for cannam@127: users, but it is important to be aware of them. cannam@127: cannam@127: The routines to perform real-data transforms are almost the same as cannam@127: those for complex transforms: you allocate arrays of @code{double} cannam@127: and/or @code{fftw_complex} (preferably using @code{fftw_malloc} or cannam@127: @code{fftw_alloc_complex}), create an @code{fftw_plan}, execute it as cannam@127: many times as you want with @code{fftw_execute(plan)}, and clean up cannam@127: with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only cannam@127: differences are that the input (or output) is of type @code{double} cannam@127: and there are new routines to create the plan. In one dimension: cannam@127: cannam@127: @example cannam@127: fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, cannam@127: unsigned flags); cannam@127: fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, cannam@127: unsigned flags); cannam@127: @end example cannam@127: @findex fftw_plan_dft_r2c_1d cannam@127: @findex fftw_plan_dft_c2r_1d cannam@127: cannam@127: for the real input to complex-Hermitian output (@dfn{r2c}) and cannam@127: complex-Hermitian input to real output (@dfn{c2r}) transforms. cannam@127: @cindex r2c cannam@127: @cindex c2r cannam@127: Unlike the complex DFT planner, there is no @code{sign} argument. cannam@127: Instead, r2c DFTs are always @code{FFTW_FORWARD} and c2r DFTs are cannam@127: always @code{FFTW_BACKWARD}. cannam@127: @ctindex FFTW_FORWARD cannam@127: @ctindex FFTW_BACKWARD cannam@127: (For single/long-double precision cannam@127: @code{fftwf} and @code{fftwl}, @code{double} should be replaced by cannam@127: @code{float} and @code{long double}, respectively.) cannam@127: @cindex precision cannam@127: cannam@127: cannam@127: Here, @code{n} is the ``logical'' size of the DFT, not necessarily the cannam@127: physical size of the array. In particular, the real (@code{double}) cannam@127: array has @code{n} elements, while the complex (@code{fftw_complex}) cannam@127: array has @code{n/2+1} elements (where the division is rounded down). cannam@127: For an in-place transform, cannam@127: @cindex in-place cannam@127: @code{in} and @code{out} are aliased to the same array, which must be cannam@127: big enough to hold both; so, the real array would actually have cannam@127: @code{2*(n/2+1)} elements, where the elements beyond the first cannam@127: @code{n} are unused padding. (Note that this is very different from cannam@127: the concept of ``zero-padding'' a transform to a larger length, which cannam@127: changes the logical size of the DFT by actually adding new input cannam@127: data.) The @math{k}th element of the complex array is exactly the cannam@127: same as the @math{k}th element of the corresponding complex DFT. All cannam@127: positive @code{n} are supported; products of small factors are most cannam@127: efficient, but an @Onlogn algorithm is used even for prime sizes. cannam@127: cannam@127: As noted above, the c2r transform destroys its input array even for cannam@127: out-of-place transforms. This can be prevented, if necessary, by cannam@127: including @code{FFTW_PRESERVE_INPUT} in the @code{flags}, with cannam@127: unfortunately some sacrifice in performance. cannam@127: @cindex flags cannam@127: @ctindex FFTW_PRESERVE_INPUT cannam@127: This flag is also not currently supported for multi-dimensional real cannam@127: DFTs (next section). cannam@127: cannam@127: Readers familiar with DFTs of real data will recall that the 0th (the cannam@127: ``DC'') and @code{n/2}-th (the ``Nyquist'' frequency, when @code{n} is cannam@127: even) elements of the complex output are purely real. Some cannam@127: implementations therefore store the Nyquist element where the DC cannam@127: imaginary part would go, in order to make the input and output arrays cannam@127: the same size. Such packing, however, does not generalize well to cannam@127: multi-dimensional transforms, and the space savings are miniscule in cannam@127: any case; FFTW does not support it. cannam@127: cannam@127: An alternative interface for one-dimensional r2c and c2r DFTs can be cannam@127: found in the @samp{r2r} interface (@pxref{The Halfcomplex-format cannam@127: DFT}), with ``halfcomplex''-format output that @emph{is} the same size cannam@127: (and type) as the input array. cannam@127: @cindex halfcomplex format cannam@127: That interface, although it is not very useful for multi-dimensional cannam@127: transforms, may sometimes yield better performance. cannam@127: cannam@127: @c ------------------------------------------------------------ cannam@127: @node Multi-Dimensional DFTs of Real Data, More DFTs of Real Data, One-Dimensional DFTs of Real Data, Tutorial cannam@127: @section Multi-Dimensional DFTs of Real Data cannam@127: cannam@127: Multi-dimensional DFTs of real data use the following planner routines: cannam@127: cannam@127: @example cannam@127: fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1, cannam@127: double *in, fftw_complex *out, cannam@127: unsigned flags); cannam@127: fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2, cannam@127: double *in, fftw_complex *out, cannam@127: unsigned flags); cannam@127: fftw_plan fftw_plan_dft_r2c(int rank, const int *n, cannam@127: double *in, fftw_complex *out, cannam@127: unsigned flags); cannam@127: @end example cannam@127: @findex fftw_plan_dft_r2c_2d cannam@127: @findex fftw_plan_dft_r2c_3d cannam@127: @findex fftw_plan_dft_r2c cannam@127: cannam@127: as well as the corresponding @code{c2r} routines with the input/output cannam@127: types swapped. These routines work similarly to their complex cannam@127: analogues, except for the fact that here the complex output array is cut cannam@127: roughly in half and the real array requires padding for in-place cannam@127: transforms (as in 1d, above). cannam@127: cannam@127: As before, @code{n} is the logical size of the array, and the cannam@127: consequences of this on the the format of the complex arrays deserve cannam@127: careful attention. cannam@127: @cindex r2c/c2r multi-dimensional array format cannam@127: Suppose that the real data has dimensions @ndims (in row-major order). cannam@127: Then, after an r2c transform, the output is an @ndimshalf array of cannam@127: @code{fftw_complex} values in row-major order, corresponding to slightly cannam@127: over half of the output of the corresponding complex DFT. (The division cannam@127: is rounded down.) The ordering of the data is otherwise exactly the cannam@127: same as in the complex-DFT case. cannam@127: cannam@127: For out-of-place transforms, this is the end of the story: the real cannam@127: data is stored as a row-major array of size @ndims and the complex cannam@127: data is stored as a row-major array of size @ndimshalf{}. cannam@127: cannam@127: For in-place transforms, however, extra padding of the real-data array cannam@127: is necessary because the complex array is larger than the real array, cannam@127: and the two arrays share the same memory locations. Thus, for cannam@127: in-place transforms, the final dimension of the real-data array must cannam@127: be padded with extra values to accommodate the size of the complex cannam@127: data---two values if the last dimension is even and one if it is odd. cannam@127: @cindex padding cannam@127: That is, the last dimension of the real data must physically contain cannam@127: @tex cannam@127: $2 (n_{d-1}/2+1)$ cannam@127: @end tex cannam@127: @ifinfo cannam@127: 2 * (n[d-1]/2+1) cannam@127: @end ifinfo cannam@127: @html cannam@127: 2 * (nd-1/2+1) cannam@127: @end html cannam@127: @code{double} values (exactly enough to hold the complex data). cannam@127: This physical array size does not, however, change the @emph{logical} cannam@127: array size---only cannam@127: @tex cannam@127: $n_{d-1}$ cannam@127: @end tex cannam@127: @ifinfo cannam@127: n[d-1] cannam@127: @end ifinfo cannam@127: @html cannam@127: nd-1 cannam@127: @end html cannam@127: values are actually stored in the last dimension, and cannam@127: @tex cannam@127: $n_{d-1}$ cannam@127: @end tex cannam@127: @ifinfo cannam@127: n[d-1] cannam@127: @end ifinfo cannam@127: @html cannam@127: nd-1 cannam@127: @end html cannam@127: is the last dimension passed to the plan-creation routine. cannam@127: cannam@127: For example, consider the transform of a two-dimensional real array of cannam@127: size @code{n0} by @code{n1}. The output of the r2c transform is a cannam@127: two-dimensional complex array of size @code{n0} by @code{n1/2+1}, where cannam@127: the @code{y} dimension has been cut nearly in half because of cannam@127: redundancies in the output. Because @code{fftw_complex} is twice the cannam@127: size of @code{double}, the output array is slightly bigger than the cannam@127: input array. Thus, if we want to compute the transform in place, we cannam@127: must @emph{pad} the input array so that it is of size @code{n0} by cannam@127: @code{2*(n1/2+1)}. If @code{n1} is even, then there are two padding cannam@127: elements at the end of each row (which need not be initialized, as they cannam@127: are only used for output). cannam@127: cannam@127: @ifhtml cannam@127: The following illustration depicts the input and output arrays just cannam@127: described, for both the out-of-place and in-place transforms (with the cannam@127: arrows indicating consecutive memory locations): cannam@127: @image{rfftwnd-for-html} cannam@127: @end ifhtml cannam@127: @ifnotinfo cannam@127: @ifnothtml cannam@127: @float Figure,fig:rfftwnd cannam@127: @center @image{rfftwnd} cannam@127: @caption{Illustration of the data layout for a 2d @code{nx} by @code{ny} cannam@127: real-to-complex transform.} cannam@127: @end float cannam@127: @ref{fig:rfftwnd} depicts the input and output arrays just cannam@127: described, for both the out-of-place and in-place transforms (with the cannam@127: arrows indicating consecutive memory locations): cannam@127: @end ifnothtml cannam@127: @end ifnotinfo cannam@127: cannam@127: These transforms are unnormalized, so an r2c followed by a c2r cannam@127: transform (or vice versa) will result in the original data scaled by cannam@127: the number of real data elements---that is, the product of the cannam@127: (logical) dimensions of the real data. cannam@127: @cindex normalization cannam@127: cannam@127: cannam@127: (Because the last dimension is treated specially, if it is equal to cannam@127: @code{1} the transform is @emph{not} equivalent to a lower-dimensional cannam@127: r2c/c2r transform. In that case, the last complex dimension also has cannam@127: size @code{1} (@code{=1/2+1}), and no advantage is gained over the cannam@127: complex transforms.) cannam@127: cannam@127: @c ------------------------------------------------------------ cannam@127: @node More DFTs of Real Data, , Multi-Dimensional DFTs of Real Data, Tutorial cannam@127: @section More DFTs of Real Data cannam@127: @menu cannam@127: * The Halfcomplex-format DFT:: cannam@127: * Real even/odd DFTs (cosine/sine transforms):: cannam@127: * The Discrete Hartley Transform:: cannam@127: @end menu cannam@127: cannam@127: FFTW supports several other transform types via a unified @dfn{r2r} cannam@127: (real-to-real) interface, cannam@127: @cindex r2r cannam@127: so called because it takes a real (@code{double}) array and outputs a cannam@127: real array of the same size. These r2r transforms currently fall into cannam@127: three categories: DFTs of real input and complex-Hermitian output in cannam@127: halfcomplex format, DFTs of real input with even/odd symmetry cannam@127: (a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete cannam@127: Hartley transforms (DHTs), all described in more detail by the cannam@127: following sections. cannam@127: cannam@127: The r2r transforms follow the by now familiar interface of creating an cannam@127: @code{fftw_plan}, executing it with @code{fftw_execute(plan)}, and cannam@127: destroying it with @code{fftw_destroy_plan(plan)}. Furthermore, all cannam@127: r2r transforms share the same planner interface: cannam@127: cannam@127: @example cannam@127: fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, cannam@127: fftw_r2r_kind kind, unsigned flags); cannam@127: fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out, cannam@127: fftw_r2r_kind kind0, fftw_r2r_kind kind1, cannam@127: unsigned flags); cannam@127: fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2, cannam@127: double *in, double *out, cannam@127: fftw_r2r_kind kind0, cannam@127: fftw_r2r_kind kind1, cannam@127: fftw_r2r_kind kind2, cannam@127: unsigned flags); cannam@127: fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, cannam@127: const fftw_r2r_kind *kind, unsigned flags); cannam@127: @end example cannam@127: @findex fftw_plan_r2r_1d cannam@127: @findex fftw_plan_r2r_2d cannam@127: @findex fftw_plan_r2r_3d cannam@127: @findex fftw_plan_r2r cannam@127: cannam@127: Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional cannam@127: transforms for contiguous arrays in row-major order, transforming (real) cannam@127: input to output of the same size, where @code{n} specifies the cannam@127: @emph{physical} dimensions of the arrays. All positive @code{n} are cannam@127: supported (with the exception of @code{n=1} for the @code{FFTW_REDFT00} cannam@127: kind, noted in the real-even subsection below); products of small cannam@127: factors are most efficient (factorizing @code{n-1} and @code{n+1} for cannam@127: @code{FFTW_REDFT00} and @code{FFTW_RODFT00} kinds, described below), but cannam@127: an @Onlogn algorithm is used even for prime sizes. cannam@127: cannam@127: Each dimension has a @dfn{kind} parameter, of type cannam@127: @code{fftw_r2r_kind}, specifying the kind of r2r transform to be used cannam@127: for that dimension. cannam@127: @cindex kind (r2r) cannam@127: @tindex fftw_r2r_kind cannam@127: (In the case of @code{fftw_plan_r2r}, this is an array @code{kind[rank]} cannam@127: where @code{kind[i]} is the transform kind for the dimension cannam@127: @code{n[i]}.) The kind can be one of a set of predefined constants, cannam@127: defined in the following subsections. cannam@127: cannam@127: In other words, FFTW computes the separable product of the specified cannam@127: r2r transforms over each dimension, which can be used e.g. for partial cannam@127: differential equations with mixed boundary conditions. (For some r2r cannam@127: kinds, notably the halfcomplex DFT and the DHT, such a separable cannam@127: product is somewhat problematic in more than one dimension, however, cannam@127: as is described below.) cannam@127: cannam@127: In the current version of FFTW, all r2r transforms except for the cannam@127: halfcomplex type are computed via pre- or post-processing of cannam@127: halfcomplex transforms, and they are therefore not as fast as they cannam@127: could be. Since most other general DCT/DST codes employ a similar cannam@127: algorithm, however, FFTW's implementation should provide at least cannam@127: competitive performance. cannam@127: cannam@127: @c =========> cannam@127: @node The Halfcomplex-format DFT, Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data, More DFTs of Real Data cannam@127: @subsection The Halfcomplex-format DFT cannam@127: cannam@127: An r2r kind of @code{FFTW_R2HC} (@dfn{r2hc}) corresponds to an r2c DFT cannam@127: @ctindex FFTW_R2HC cannam@127: @cindex r2c cannam@127: @cindex r2hc cannam@127: (@pxref{One-Dimensional DFTs of Real Data}) but with ``halfcomplex'' cannam@127: format output, and may sometimes be faster and/or more convenient than cannam@127: the latter. cannam@127: @cindex halfcomplex format cannam@127: The inverse @dfn{hc2r} transform is of kind @code{FFTW_HC2R}. cannam@127: @ctindex FFTW_HC2R cannam@127: @cindex hc2r cannam@127: This consists of the non-redundant half of the complex output for a 1d cannam@127: real-input DFT of size @code{n}, stored as a sequence of @code{n} real cannam@127: numbers (@code{double}) in the format: cannam@127: cannam@127: @tex cannam@127: $$ cannam@127: r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1 cannam@127: $$ cannam@127: @end tex cannam@127: @ifinfo cannam@127: r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1 cannam@127: @end ifinfo cannam@127: @html cannam@127:

cannam@127: r0, r1, r2, ..., rn/2, i(n+1)/2-1, ..., i2, i1 cannam@127:

cannam@127: @end html cannam@127: cannam@127: Here, cannam@127: @ifinfo cannam@127: rk cannam@127: @end ifinfo cannam@127: @tex cannam@127: $r_k$ cannam@127: @end tex cannam@127: @html cannam@127: rk cannam@127: @end html cannam@127: is the real part of the @math{k}th output, and cannam@127: @ifinfo cannam@127: ik cannam@127: @end ifinfo cannam@127: @tex cannam@127: $i_k$ cannam@127: @end tex cannam@127: @html cannam@127: ik cannam@127: @end html cannam@127: is the imaginary part. (Division by 2 is rounded down.) For a cannam@127: halfcomplex array @code{hc[n]}, the @math{k}th component thus has its cannam@127: real part in @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with cannam@127: the exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter cannam@127: only if @code{n} is even)---in these two cases, the imaginary part is cannam@127: zero due to symmetries of the real-input DFT, and is not stored. cannam@127: Thus, the r2hc transform of @code{n} real values is a halfcomplex array of cannam@127: length @code{n}, and vice versa for hc2r. cannam@127: @cindex normalization cannam@127: cannam@127: cannam@127: Aside from the differing format, the output of cannam@127: @code{FFTW_R2HC}/@code{FFTW_HC2R} is otherwise exactly the same as for cannam@127: the corresponding 1d r2c/c2r transform cannam@127: (i.e. @code{FFTW_FORWARD}/@code{FFTW_BACKWARD} transforms, respectively). cannam@127: Recall that these transforms are unnormalized, so r2hc followed by hc2r cannam@127: will result in the original data multiplied by @code{n}. Furthermore, cannam@127: like the c2r transform, an out-of-place hc2r transform will cannam@127: @emph{destroy its input} array. cannam@127: cannam@127: Although these halfcomplex transforms can be used with the cannam@127: multi-dimensional r2r interface, the interpretation of such a separable cannam@127: product of transforms along each dimension is problematic. For example, cannam@127: consider a two-dimensional @code{n0} by @code{n1}, r2hc by r2hc cannam@127: transform planned by @code{fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, cannam@127: FFTW_R2HC, FFTW_MEASURE)}. Conceptually, FFTW first transforms the rows cannam@127: (of size @code{n1}) to produce halfcomplex rows, and then transforms the cannam@127: columns (of size @code{n0}). Half of these column transforms, however, cannam@127: are of imaginary parts, and should therefore be multiplied by @math{i} cannam@127: and combined with the r2hc transforms of the real columns to produce the cannam@127: 2d DFT amplitudes; FFTW's r2r transform does @emph{not} perform this cannam@127: combination for you. Thus, if a multi-dimensional real-input/output DFT cannam@127: is required, we recommend using the ordinary r2c/c2r cannam@127: interface (@pxref{Multi-Dimensional DFTs of Real Data}). cannam@127: cannam@127: @c =========> cannam@127: @node Real even/odd DFTs (cosine/sine transforms), The Discrete Hartley Transform, The Halfcomplex-format DFT, More DFTs of Real Data cannam@127: @subsection Real even/odd DFTs (cosine/sine transforms) cannam@127: cannam@127: The Fourier transform of a real-even function @math{f(-x) = f(x)} is cannam@127: real-even, and @math{i} times the Fourier transform of a real-odd cannam@127: function @math{f(-x) = -f(x)} is real-odd. Similar results hold for a cannam@127: discrete Fourier transform, and thus for these symmetries the need for cannam@127: complex inputs/outputs is entirely eliminated. Moreover, one gains a cannam@127: factor of two in speed/space from the fact that the data are real, and cannam@127: an additional factor of two from the even/odd symmetry: only the cannam@127: non-redundant (first) half of the array need be stored. The result is cannam@127: the real-even DFT (@dfn{REDFT}) and the real-odd DFT (@dfn{RODFT}), also cannam@127: known as the discrete cosine and sine transforms (@dfn{DCT} and cannam@127: @dfn{DST}), respectively. cannam@127: @cindex real-even DFT cannam@127: @cindex REDFT cannam@127: @cindex real-odd DFT cannam@127: @cindex RODFT cannam@127: @cindex discrete cosine transform cannam@127: @cindex DCT cannam@127: @cindex discrete sine transform cannam@127: @cindex DST cannam@127: cannam@127: cannam@127: (In this section, we describe the 1d transforms; multi-dimensional cannam@127: transforms are just a separable product of these transforms operating cannam@127: along each dimension.) cannam@127: cannam@127: Because of the discrete sampling, one has an additional choice: is the cannam@127: data even/odd around a sampling point, or around the point halfway cannam@127: between two samples? The latter corresponds to @emph{shifting} the cannam@127: samples by @emph{half} an interval, and gives rise to several transform cannam@127: variants denoted by REDFT@math{ab} and RODFT@math{ab}: @math{a} and cannam@127: @math{b} are @math{0} or @math{1}, and indicate whether the input cannam@127: (@math{a}) and/or output (@math{b}) are shifted by half a sample cannam@127: (@math{1} means it is shifted). These are also known as types I-IV of cannam@127: the DCT and DST, and all four types are supported by FFTW's r2r cannam@127: interface.@footnote{There are also type V-VIII transforms, which cannam@127: correspond to a logical DFT of @emph{odd} size @math{N}, independent of cannam@127: whether the physical size @code{n} is odd, but we do not support these cannam@127: variants.} cannam@127: cannam@127: The r2r kinds for the various REDFT and RODFT types supported by FFTW, cannam@127: along with the boundary conditions at both ends of the @emph{input} cannam@127: array (@code{n} real numbers @code{in[j=0..n-1]}), are: cannam@127: cannam@127: @itemize @bullet cannam@127: cannam@127: @item cannam@127: @code{FFTW_REDFT00} (DCT-I): even around @math{j=0} and even around @math{j=n-1}. cannam@127: @ctindex FFTW_REDFT00 cannam@127: cannam@127: @item cannam@127: @code{FFTW_REDFT10} (DCT-II, ``the'' DCT): even around @math{j=-0.5} and even around @math{j=n-0.5}. cannam@127: @ctindex FFTW_REDFT10 cannam@127: cannam@127: @item cannam@127: @code{FFTW_REDFT01} (DCT-III, ``the'' IDCT): even around @math{j=0} and odd around @math{j=n}. cannam@127: @ctindex FFTW_REDFT01 cannam@127: @cindex IDCT cannam@127: cannam@127: @item cannam@127: @code{FFTW_REDFT11} (DCT-IV): even around @math{j=-0.5} and odd around @math{j=n-0.5}. cannam@127: @ctindex FFTW_REDFT11 cannam@127: cannam@127: @item cannam@127: @code{FFTW_RODFT00} (DST-I): odd around @math{j=-1} and odd around @math{j=n}. cannam@127: @ctindex FFTW_RODFT00 cannam@127: cannam@127: @item cannam@127: @code{FFTW_RODFT10} (DST-II): odd around @math{j=-0.5} and odd around @math{j=n-0.5}. cannam@127: @ctindex FFTW_RODFT10 cannam@127: cannam@127: @item cannam@127: @code{FFTW_RODFT01} (DST-III): odd around @math{j=-1} and even around @math{j=n-1}. cannam@127: @ctindex FFTW_RODFT01 cannam@127: cannam@127: @item cannam@127: @code{FFTW_RODFT11} (DST-IV): odd around @math{j=-0.5} and even around @math{j=n-0.5}. cannam@127: @ctindex FFTW_RODFT11 cannam@127: cannam@127: @end itemize cannam@127: cannam@127: Note that these symmetries apply to the ``logical'' array being cannam@127: transformed; @strong{there are no constraints on your physical input cannam@127: data}. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the cannam@127: data @math{abcde}, it corresponds to the DFT of the logical even array cannam@127: @math{abcdedcb} of size 8. A size-4 REDFT10 (DCT-II) of the data cannam@127: @math{abcd} corresponds to the size-8 logical DFT of the even array cannam@127: @math{abcddcba}, shifted by half a sample. cannam@127: cannam@127: All of these transforms are invertible. The inverse of R*DFT00 is cannam@127: R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called cannam@127: simply ``the'' DCT and IDCT, respectively); and of R*DFT11 is R*DFT11. cannam@127: However, the transforms computed by FFTW are unnormalized, exactly cannam@127: like the corresponding real and complex DFTs, so computing a transform cannam@127: followed by its inverse yields the original array scaled by @math{N}, cannam@127: where @math{N} is the @emph{logical} DFT size. For REDFT00, cannam@127: @math{N=2(n-1)}; for RODFT00, @math{N=2(n+1)}; otherwise, @math{N=2n}. cannam@127: @cindex normalization cannam@127: @cindex IDCT cannam@127: cannam@127: cannam@127: Note that the boundary conditions of the transform output array are cannam@127: given by the input boundary conditions of the inverse transform. cannam@127: Thus, the above transforms are all inequivalent in terms of cannam@127: input/output boundary conditions, even neglecting the 0.5 shift cannam@127: difference. cannam@127: cannam@127: FFTW is most efficient when @math{N} is a product of small factors; note cannam@127: that this @emph{differs} from the factorization of the physical size cannam@127: @code{n} for REDFT00 and RODFT00! There is another oddity: @code{n=1} cannam@127: REDFT00 transforms correspond to @math{N=0}, and so are @emph{not cannam@127: defined} (the planner will return @code{NULL}). Otherwise, any positive cannam@127: @code{n} is supported. cannam@127: cannam@127: For the precise mathematical definitions of these transforms as used by cannam@127: FFTW, see @ref{What FFTW Really Computes}. (For people accustomed to cannam@127: the DCT/DST, FFTW's definitions have a coefficient of @math{2} in front cannam@127: of the cos/sin functions so that they correspond precisely to an cannam@127: even/odd DFT of size @math{N}. Some authors also include additional cannam@127: multiplicative factors of cannam@127: @ifinfo cannam@127: sqrt(2) cannam@127: @end ifinfo cannam@127: @html cannam@127: √2 cannam@127: @end html cannam@127: @tex cannam@127: $\sqrt{2}$ cannam@127: @end tex cannam@127: for selected inputs and outputs; this makes cannam@127: the transform orthogonal, but sacrifices the direct equivalence to a cannam@127: symmetric DFT.) cannam@127: cannam@127: @subsubheading Which type do you need? cannam@127: cannam@127: Since the required flavor of even/odd DFT depends upon your problem, cannam@127: you are the best judge of this choice, but we can make a few comments cannam@127: on relative efficiency to help you in your selection. In particular, cannam@127: R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 cannam@127: (especially for odd sizes), while the R*DFT00 transforms are sometimes cannam@127: significantly slower (especially for even sizes).@footnote{R*DFT00 is cannam@127: sometimes slower in FFTW because we discovered that the standard cannam@127: algorithm for computing this by a pre/post-processed real DFT---the cannam@127: algorithm used in FFTPACK, Numerical Recipes, and other sources for cannam@127: decades now---has serious numerical problems: it already loses several cannam@127: decimal places of accuracy for 16k sizes. There seem to be only two cannam@127: alternatives in the literature that do not suffer similarly: a cannam@127: recursive decomposition into smaller DCTs, which would require a large cannam@127: set of codelets for efficiency and generality, or sacrificing a factor of cannam@127: @tex cannam@127: $\sim 2$ cannam@127: @end tex cannam@127: @ifnottex cannam@127: 2 cannam@127: @end ifnottex cannam@127: in speed to use a real DFT of twice the size. We currently cannam@127: employ the latter technique for general @math{n}, as well as a limited cannam@127: form of the former method: a split-radix decomposition when @math{n} cannam@127: is odd (@math{N} a multiple of 4). For @math{N} containing many cannam@127: factors of 2, the split-radix method seems to recover most of the cannam@127: speed of the standard algorithm without the accuracy tradeoff.} cannam@127: cannam@127: Thus, if only the boundary conditions on the transform inputs are cannam@127: specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over cannam@127: R*DFT11 (unless the half-sample shift or the self-inverse property is cannam@127: significant for your problem). cannam@127: cannam@127: If performance is important to you and you are using only small sizes cannam@127: (say @math{n<200}), e.g. for multi-dimensional transforms, then you cannam@127: might consider generating hard-coded transforms of those sizes and types cannam@127: that you are interested in (@pxref{Generating your own code}). cannam@127: cannam@127: We are interested in hearing what types of symmetric transforms you find cannam@127: most useful. cannam@127: cannam@127: @c =========> cannam@127: @node The Discrete Hartley Transform, , Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data cannam@127: @subsection The Discrete Hartley Transform cannam@127: cannam@127: If you are planning to use the DHT because you've heard that it is cannam@127: ``faster'' than the DFT (FFT), @strong{stop here}. The DHT is not cannam@127: faster than the DFT. That story is an old but enduring misconception cannam@127: that was debunked in 1987. cannam@127: cannam@127: The discrete Hartley transform (DHT) is an invertible linear transform cannam@127: closely related to the DFT. In the DFT, one multiplies each input by cannam@127: @math{cos - i * sin} (a complex exponential), whereas in the DHT each cannam@127: input is multiplied by simply @math{cos + sin}. Thus, the DHT cannam@127: transforms @code{n} real numbers to @code{n} real numbers, and has the cannam@127: convenient property of being its own inverse. In FFTW, a DHT (of any cannam@127: positive @code{n}) can be specified by an r2r kind of @code{FFTW_DHT}. cannam@127: @ctindex FFTW_DHT cannam@127: @cindex discrete Hartley transform cannam@127: @cindex DHT cannam@127: cannam@127: Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of cannam@127: size @code{n} followed by another DHT of the same size will result in cannam@127: the original array multiplied by @code{n}. cannam@127: @cindex normalization cannam@127: cannam@127: The DHT was originally proposed as a more efficient alternative to the cannam@127: DFT for real data, but it was subsequently shown that a specialized DFT cannam@127: (such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW, cannam@127: the DHT is actually computed by post-processing an r2hc transform, so cannam@127: there is ordinarily no reason to prefer it from a performance cannam@127: perspective.@footnote{We provide the DHT mainly as a byproduct of some cannam@127: internal algorithms. FFTW computes a real input/output DFT of cannam@127: @emph{prime} size by re-expressing it as a DHT plus post/pre-processing cannam@127: and then using Rader's prime-DFT algorithm adapted to the DHT.} cannam@127: However, we have heard rumors that the DHT might be the most appropriate cannam@127: transform in its own right for certain applications, and we would be cannam@127: very interested to hear from anyone who finds it useful. cannam@127: cannam@127: If @code{FFTW_DHT} is specified for multiple dimensions of a cannam@127: multi-dimensional transform, FFTW computes the separable product of 1d cannam@127: DHTs along each dimension. Unfortunately, this is not quite the same cannam@127: thing as a true multi-dimensional DHT; you can compute the latter, if cannam@127: necessary, with at most @code{rank-1} post-processing passes cannam@127: [see e.g. H. Hao and R. N. Bracewell, @i{Proc. IEEE} @b{75}, 264--266 (1987)]. cannam@127: cannam@127: For the precise mathematical definition of the DHT as used by FFTW, see cannam@127: @ref{What FFTW Really Computes}. cannam@127: