annotate src/fftw-3.3.3/doc/tutorial.texi @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI compatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 37bf6b4a2645
children
rev   line source
Chris@10 1 @node Tutorial, Other Important Topics, Introduction, Top
Chris@10 2 @chapter Tutorial
Chris@10 3 @menu
Chris@10 4 * Complex One-Dimensional DFTs::
Chris@10 5 * Complex Multi-Dimensional DFTs::
Chris@10 6 * One-Dimensional DFTs of Real Data::
Chris@10 7 * Multi-Dimensional DFTs of Real Data::
Chris@10 8 * More DFTs of Real Data::
Chris@10 9 @end menu
Chris@10 10
Chris@10 11 This chapter describes the basic usage of FFTW, i.e., how to compute
Chris@10 12 @cindex basic interface
Chris@10 13 the Fourier transform of a single array. This chapter tells the
Chris@10 14 truth, but not the @emph{whole} truth. Specifically, FFTW implements
Chris@10 15 additional routines and flags that are not documented here, although
Chris@10 16 in many cases we try to indicate where added capabilities exist. For
Chris@10 17 more complete information, see @ref{FFTW Reference}. (Note that you
Chris@10 18 need to compile and install FFTW before you can use it in a program.
Chris@10 19 For the details of the installation, see @ref{Installation and
Chris@10 20 Customization}.)
Chris@10 21
Chris@10 22 We recommend that you read this tutorial in order.@footnote{You can
Chris@10 23 read the tutorial in bit-reversed order after computing your first
Chris@10 24 transform.} At the least, read the first section (@pxref{Complex
Chris@10 25 One-Dimensional DFTs}) before reading any of the others, even if your
Chris@10 26 main interest lies in one of the other transform types.
Chris@10 27
Chris@10 28 Users of FFTW version 2 and earlier may also want to read @ref{Upgrading
Chris@10 29 from FFTW version 2}.
Chris@10 30
Chris@10 31 @c ------------------------------------------------------------
Chris@10 32 @node Complex One-Dimensional DFTs, Complex Multi-Dimensional DFTs, Tutorial, Tutorial
Chris@10 33 @section Complex One-Dimensional DFTs
Chris@10 34
Chris@10 35 @quotation
Chris@10 36 Plan: To bother about the best method of accomplishing an accidental result.
Chris@10 37 [Ambrose Bierce, @cite{The Enlarged Devil's Dictionary}.]
Chris@10 38 @cindex Devil
Chris@10 39 @end quotation
Chris@10 40
Chris@10 41 @iftex
Chris@10 42 @medskip
Chris@10 43 @end iftex
Chris@10 44
Chris@10 45 The basic usage of FFTW to compute a one-dimensional DFT of size
Chris@10 46 @code{N} is simple, and it typically looks something like this code:
Chris@10 47
Chris@10 48 @example
Chris@10 49 #include <fftw3.h>
Chris@10 50 ...
Chris@10 51 @{
Chris@10 52 fftw_complex *in, *out;
Chris@10 53 fftw_plan p;
Chris@10 54 ...
Chris@10 55 in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
Chris@10 56 out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
Chris@10 57 p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
Chris@10 58 ...
Chris@10 59 fftw_execute(p); /* @r{repeat as needed} */
Chris@10 60 ...
Chris@10 61 fftw_destroy_plan(p);
Chris@10 62 fftw_free(in); fftw_free(out);
Chris@10 63 @}
Chris@10 64 @end example
Chris@10 65
Chris@10 66 You must link this code with the @code{fftw3} library. On Unix systems,
Chris@10 67 link with @code{-lfftw3 -lm}.
Chris@10 68
Chris@10 69 The example code first allocates the input and output arrays. You can
Chris@10 70 allocate them in any way that you like, but we recommend using
Chris@10 71 @code{fftw_malloc}, which behaves like
Chris@10 72 @findex fftw_malloc
Chris@10 73 @code{malloc} except that it properly aligns the array when SIMD
Chris@10 74 instructions (such as SSE and Altivec) are available (@pxref{SIMD
Chris@10 75 alignment and fftw_malloc}). [Alternatively, we provide a convenient wrapper function @code{fftw_alloc_complex(N)} which has the same effect.]
Chris@10 76 @findex fftw_alloc_complex
Chris@10 77 @cindex SIMD
Chris@10 78
Chris@10 79
Chris@10 80 The data is an array of type @code{fftw_complex}, which is by default a
Chris@10 81 @code{double[2]} composed of the real (@code{in[i][0]}) and imaginary
Chris@10 82 (@code{in[i][1]}) parts of a complex number.
Chris@10 83 @tindex fftw_complex
Chris@10 84
Chris@10 85 The next step is to create a @dfn{plan}, which is an object
Chris@10 86 @cindex plan
Chris@10 87 that contains all the data that FFTW needs to compute the FFT.
Chris@10 88 This function creates the plan:
Chris@10 89
Chris@10 90 @example
Chris@10 91 fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
Chris@10 92 int sign, unsigned flags);
Chris@10 93 @end example
Chris@10 94 @findex fftw_plan_dft_1d
Chris@10 95 @tindex fftw_plan
Chris@10 96
Chris@10 97 The first argument, @code{n}, is the size of the transform you are
Chris@10 98 trying to compute. The size @code{n} can be any positive integer, but
Chris@10 99 sizes that are products of small factors are transformed most
Chris@10 100 efficiently (although prime sizes still use an @Onlogn{} algorithm).
Chris@10 101
Chris@10 102 The next two arguments are pointers to the input and output arrays of
Chris@10 103 the transform. These pointers can be equal, indicating an
Chris@10 104 @dfn{in-place} transform.
Chris@10 105 @cindex in-place
Chris@10 106
Chris@10 107
Chris@10 108 The fourth argument, @code{sign}, can be either @code{FFTW_FORWARD}
Chris@10 109 (@code{-1}) or @code{FFTW_BACKWARD} (@code{+1}),
Chris@10 110 @ctindex FFTW_FORWARD
Chris@10 111 @ctindex FFTW_BACKWARD
Chris@10 112 and indicates the direction of the transform you are interested in;
Chris@10 113 technically, it is the sign of the exponent in the transform.
Chris@10 114
Chris@10 115 The @code{flags} argument is usually either @code{FFTW_MEASURE} or
Chris@10 116 @cindex flags
Chris@10 117 @code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} instructs FFTW to run
Chris@10 118 @ctindex FFTW_MEASURE
Chris@10 119 and measure the execution time of several FFTs in order to find the
Chris@10 120 best way to compute the transform of size @code{n}. This process takes
Chris@10 121 some time (usually a few seconds), depending on your machine and on
Chris@10 122 the size of the transform. @code{FFTW_ESTIMATE}, on the contrary,
Chris@10 123 does not run any computation and just builds a
Chris@10 124 @ctindex FFTW_ESTIMATE
Chris@10 125 reasonable plan that is probably sub-optimal. In short, if your
Chris@10 126 program performs many transforms of the same size and initialization
Chris@10 127 time is not important, use @code{FFTW_MEASURE}; otherwise use the
Chris@10 128 estimate.
Chris@10 129
Chris@10 130 @emph{You must create the plan before initializing the input}, because
Chris@10 131 @code{FFTW_MEASURE} overwrites the @code{in}/@code{out} arrays.
Chris@10 132 (Technically, @code{FFTW_ESTIMATE} does not touch your arrays, but you
Chris@10 133 should always create plans first just to be sure.)
Chris@10 134
Chris@10 135 Once the plan has been created, you can use it as many times as you
Chris@10 136 like for transforms on the specified @code{in}/@code{out} arrays,
Chris@10 137 computing the actual transforms via @code{fftw_execute(plan)}:
Chris@10 138 @example
Chris@10 139 void fftw_execute(const fftw_plan plan);
Chris@10 140 @end example
Chris@10 141 @findex fftw_execute
Chris@10 142
Chris@10 143 The DFT results are stored in-order in the array @code{out}, with the
Chris@10 144 zero-frequency (DC) component in @code{out[0]}.
Chris@10 145 @cindex frequency
Chris@10 146 If @code{in != out}, the transform is @dfn{out-of-place} and the input
Chris@10 147 array @code{in} is not modified. Otherwise, the input array is
Chris@10 148 overwritten with the transform.
Chris@10 149
Chris@10 150 @cindex execute
Chris@10 151 If you want to transform a @emph{different} array of the same size, you
Chris@10 152 can create a new plan with @code{fftw_plan_dft_1d} and FFTW
Chris@10 153 automatically reuses the information from the previous plan, if
Chris@10 154 possible. Alternatively, with the ``guru'' interface you can apply a
Chris@10 155 given plan to a different array, if you are careful.
Chris@10 156 @xref{FFTW Reference}.
Chris@10 157
Chris@10 158 When you are done with the plan, you deallocate it by calling
Chris@10 159 @code{fftw_destroy_plan(plan)}:
Chris@10 160 @example
Chris@10 161 void fftw_destroy_plan(fftw_plan plan);
Chris@10 162 @end example
Chris@10 163 @findex fftw_destroy_plan
Chris@10 164 If you allocate an array with @code{fftw_malloc()} you must deallocate
Chris@10 165 it with @code{fftw_free()}. Do not use @code{free()} or, heaven
Chris@10 166 forbid, @code{delete}.
Chris@10 167 @findex fftw_free
Chris@10 168
Chris@10 169 FFTW computes an @emph{unnormalized} DFT. Thus, computing a forward
Chris@10 170 followed by a backward transform (or vice versa) results in the original
Chris@10 171 array scaled by @code{n}. For the definition of the DFT, see @ref{What
Chris@10 172 FFTW Really Computes}.
Chris@10 173 @cindex DFT
Chris@10 174 @cindex normalization
Chris@10 175
Chris@10 176
Chris@10 177 If you have a C compiler, such as @code{gcc}, that supports the
Chris@10 178 C99 standard, and you @code{#include <complex.h>} @emph{before}
Chris@10 179 @code{<fftw3.h>}, then @code{fftw_complex} is the native
Chris@10 180 double-precision complex type and you can manipulate it with ordinary
Chris@10 181 arithmetic. Otherwise, FFTW defines its own complex type, which is
Chris@10 182 bit-compatible with the C99 complex type. @xref{Complex numbers}.
Chris@10 183 (The C++ @code{<complex>} template class may also be usable via a
Chris@10 184 typecast.)
Chris@10 185 @cindex C++
Chris@10 186
Chris@10 187 To use single or long-double precision versions of FFTW, replace the
Chris@10 188 @code{fftw_} prefix by @code{fftwf_} or @code{fftwl_} and link with
Chris@10 189 @code{-lfftw3f} or @code{-lfftw3l}, but use the @emph{same}
Chris@10 190 @code{<fftw3.h>} header file.
Chris@10 191 @cindex precision
Chris@10 192
Chris@10 193
Chris@10 194 Many more flags exist besides @code{FFTW_MEASURE} and
Chris@10 195 @code{FFTW_ESTIMATE}. For example, use @code{FFTW_PATIENT} if you're
Chris@10 196 willing to wait even longer for a possibly even faster plan (@pxref{FFTW
Chris@10 197 Reference}).
Chris@10 198 @ctindex FFTW_PATIENT
Chris@10 199 You can also save plans for future use, as described by @ref{Words of
Chris@10 200 Wisdom-Saving Plans}.
Chris@10 201
Chris@10 202 @c ------------------------------------------------------------
Chris@10 203 @node Complex Multi-Dimensional DFTs, One-Dimensional DFTs of Real Data, Complex One-Dimensional DFTs, Tutorial
Chris@10 204 @section Complex Multi-Dimensional DFTs
Chris@10 205
Chris@10 206 Multi-dimensional transforms work much the same way as one-dimensional
Chris@10 207 transforms: you allocate arrays of @code{fftw_complex} (preferably
Chris@10 208 using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as
Chris@10 209 many times as you want with @code{fftw_execute(plan)}, and clean up
Chris@10 210 with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}).
Chris@10 211
Chris@10 212 FFTW provides two routines for creating plans for 2d and 3d transforms,
Chris@10 213 and one routine for creating plans of arbitrary dimensionality.
Chris@10 214 The 2d and 3d routines have the following signature:
Chris@10 215 @example
Chris@10 216 fftw_plan fftw_plan_dft_2d(int n0, int n1,
Chris@10 217 fftw_complex *in, fftw_complex *out,
Chris@10 218 int sign, unsigned flags);
Chris@10 219 fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
Chris@10 220 fftw_complex *in, fftw_complex *out,
Chris@10 221 int sign, unsigned flags);
Chris@10 222 @end example
Chris@10 223 @findex fftw_plan_dft_2d
Chris@10 224 @findex fftw_plan_dft_3d
Chris@10 225
Chris@10 226 These routines create plans for @code{n0} by @code{n1} two-dimensional
Chris@10 227 (2d) transforms and @code{n0} by @code{n1} by @code{n2} 3d transforms,
Chris@10 228 respectively. All of these transforms operate on contiguous arrays in
Chris@10 229 the C-standard @dfn{row-major} order, so that the last dimension has the
Chris@10 230 fastest-varying index in the array. This layout is described further in
Chris@10 231 @ref{Multi-dimensional Array Format}.
Chris@10 232
Chris@10 233 FFTW can also compute transforms of higher dimensionality. In order to
Chris@10 234 avoid confusion between the various meanings of the the word
Chris@10 235 ``dimension'', we use the term @emph{rank}
Chris@10 236 @cindex rank
Chris@10 237 to denote the number of independent indices in an array.@footnote{The
Chris@10 238 term ``rank'' is commonly used in the APL, FORTRAN, and Common Lisp
Chris@10 239 traditions, although it is not so common in the C@tie{}world.} For
Chris@10 240 example, we say that a 2d transform has rank@tie{}2, a 3d transform has
Chris@10 241 rank@tie{}3, and so on. You can plan transforms of arbitrary rank by
Chris@10 242 means of the following function:
Chris@10 243
Chris@10 244 @example
Chris@10 245 fftw_plan fftw_plan_dft(int rank, const int *n,
Chris@10 246 fftw_complex *in, fftw_complex *out,
Chris@10 247 int sign, unsigned flags);
Chris@10 248 @end example
Chris@10 249 @findex fftw_plan_dft
Chris@10 250
Chris@10 251 Here, @code{n} is a pointer to an array @code{n[rank]} denoting an
Chris@10 252 @code{n[0]} by @code{n[1]} by @dots{} by @code{n[rank-1]} transform.
Chris@10 253 Thus, for example, the call
Chris@10 254 @example
Chris@10 255 fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
Chris@10 256 @end example
Chris@10 257 is equivalent to the following code fragment:
Chris@10 258 @example
Chris@10 259 int n[2];
Chris@10 260 n[0] = n0;
Chris@10 261 n[1] = n1;
Chris@10 262 fftw_plan_dft(2, n, in, out, sign, flags);
Chris@10 263 @end example
Chris@10 264 @code{fftw_plan_dft} is not restricted to 2d and 3d transforms,
Chris@10 265 however, but it can plan transforms of arbitrary rank.
Chris@10 266
Chris@10 267 You may have noticed that all the planner routines described so far
Chris@10 268 have overlapping functionality. For example, you can plan a 1d or 2d
Chris@10 269 transform by using @code{fftw_plan_dft} with a @code{rank} of @code{1}
Chris@10 270 or @code{2}, or even by calling @code{fftw_plan_dft_3d} with @code{n0}
Chris@10 271 and/or @code{n1} equal to @code{1} (with no loss in efficiency). This
Chris@10 272 pattern continues, and FFTW's planning routines in general form a
Chris@10 273 ``partial order,'' sequences of
Chris@10 274 @cindex partial order
Chris@10 275 interfaces with strictly increasing generality but correspondingly
Chris@10 276 greater complexity.
Chris@10 277
Chris@10 278 @code{fftw_plan_dft} is the most general complex-DFT routine that we
Chris@10 279 describe in this tutorial, but there are also the advanced and guru interfaces,
Chris@10 280 @cindex advanced interface
Chris@10 281 @cindex guru interface
Chris@10 282 which allow one to efficiently combine multiple/strided transforms
Chris@10 283 into a single FFTW plan, transform a subset of a larger
Chris@10 284 multi-dimensional array, and/or to handle more general complex-number
Chris@10 285 formats. For more information, see @ref{FFTW Reference}.
Chris@10 286
Chris@10 287 @c ------------------------------------------------------------
Chris@10 288 @node One-Dimensional DFTs of Real Data, Multi-Dimensional DFTs of Real Data, Complex Multi-Dimensional DFTs, Tutorial
Chris@10 289 @section One-Dimensional DFTs of Real Data
Chris@10 290
Chris@10 291 In many practical applications, the input data @code{in[i]} are purely
Chris@10 292 real numbers, in which case the DFT output satisfies the ``Hermitian''
Chris@10 293 @cindex Hermitian
Chris@10 294 redundancy: @code{out[i]} is the conjugate of @code{out[n-i]}. It is
Chris@10 295 possible to take advantage of these circumstances in order to achieve
Chris@10 296 roughly a factor of two improvement in both speed and memory usage.
Chris@10 297
Chris@10 298 In exchange for these speed and space advantages, the user sacrifices
Chris@10 299 some of the simplicity of FFTW's complex transforms. First of all, the
Chris@10 300 input and output arrays are of @emph{different sizes and types}: the
Chris@10 301 input is @code{n} real numbers, while the output is @code{n/2+1}
Chris@10 302 complex numbers (the non-redundant outputs); this also requires slight
Chris@10 303 ``padding'' of the input array for
Chris@10 304 @cindex padding
Chris@10 305 in-place transforms. Second, the inverse transform (complex to real)
Chris@10 306 has the side-effect of @emph{overwriting its input array}, by default.
Chris@10 307 Neither of these inconveniences should pose a serious problem for
Chris@10 308 users, but it is important to be aware of them.
Chris@10 309
Chris@10 310 The routines to perform real-data transforms are almost the same as
Chris@10 311 those for complex transforms: you allocate arrays of @code{double}
Chris@10 312 and/or @code{fftw_complex} (preferably using @code{fftw_malloc} or
Chris@10 313 @code{fftw_alloc_complex}), create an @code{fftw_plan}, execute it as
Chris@10 314 many times as you want with @code{fftw_execute(plan)}, and clean up
Chris@10 315 with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only
Chris@10 316 differences are that the input (or output) is of type @code{double}
Chris@10 317 and there are new routines to create the plan. In one dimension:
Chris@10 318
Chris@10 319 @example
Chris@10 320 fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
Chris@10 321 unsigned flags);
Chris@10 322 fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
Chris@10 323 unsigned flags);
Chris@10 324 @end example
Chris@10 325 @findex fftw_plan_dft_r2c_1d
Chris@10 326 @findex fftw_plan_dft_c2r_1d
Chris@10 327
Chris@10 328 for the real input to complex-Hermitian output (@dfn{r2c}) and
Chris@10 329 complex-Hermitian input to real output (@dfn{c2r}) transforms.
Chris@10 330 @cindex r2c
Chris@10 331 @cindex c2r
Chris@10 332 Unlike the complex DFT planner, there is no @code{sign} argument.
Chris@10 333 Instead, r2c DFTs are always @code{FFTW_FORWARD} and c2r DFTs are
Chris@10 334 always @code{FFTW_BACKWARD}.
Chris@10 335 @ctindex FFTW_FORWARD
Chris@10 336 @ctindex FFTW_BACKWARD
Chris@10 337 (For single/long-double precision
Chris@10 338 @code{fftwf} and @code{fftwl}, @code{double} should be replaced by
Chris@10 339 @code{float} and @code{long double}, respectively.)
Chris@10 340 @cindex precision
Chris@10 341
Chris@10 342
Chris@10 343 Here, @code{n} is the ``logical'' size of the DFT, not necessarily the
Chris@10 344 physical size of the array. In particular, the real (@code{double})
Chris@10 345 array has @code{n} elements, while the complex (@code{fftw_complex})
Chris@10 346 array has @code{n/2+1} elements (where the division is rounded down).
Chris@10 347 For an in-place transform,
Chris@10 348 @cindex in-place
Chris@10 349 @code{in} and @code{out} are aliased to the same array, which must be
Chris@10 350 big enough to hold both; so, the real array would actually have
Chris@10 351 @code{2*(n/2+1)} elements, where the elements beyond the first
Chris@10 352 @code{n} are unused padding. (Note that this is very different from
Chris@10 353 the concept of ``zero-padding'' a transform to a larger length, which
Chris@10 354 changes the logical size of the DFT by actually adding new input
Chris@10 355 data.) The @math{k}th element of the complex array is exactly the
Chris@10 356 same as the @math{k}th element of the corresponding complex DFT. All
Chris@10 357 positive @code{n} are supported; products of small factors are most
Chris@10 358 efficient, but an @Onlogn algorithm is used even for prime sizes.
Chris@10 359
Chris@10 360 As noted above, the c2r transform destroys its input array even for
Chris@10 361 out-of-place transforms. This can be prevented, if necessary, by
Chris@10 362 including @code{FFTW_PRESERVE_INPUT} in the @code{flags}, with
Chris@10 363 unfortunately some sacrifice in performance.
Chris@10 364 @cindex flags
Chris@10 365 @ctindex FFTW_PRESERVE_INPUT
Chris@10 366 This flag is also not currently supported for multi-dimensional real
Chris@10 367 DFTs (next section).
Chris@10 368
Chris@10 369 Readers familiar with DFTs of real data will recall that the 0th (the
Chris@10 370 ``DC'') and @code{n/2}-th (the ``Nyquist'' frequency, when @code{n} is
Chris@10 371 even) elements of the complex output are purely real. Some
Chris@10 372 implementations therefore store the Nyquist element where the DC
Chris@10 373 imaginary part would go, in order to make the input and output arrays
Chris@10 374 the same size. Such packing, however, does not generalize well to
Chris@10 375 multi-dimensional transforms, and the space savings are miniscule in
Chris@10 376 any case; FFTW does not support it.
Chris@10 377
Chris@10 378 An alternative interface for one-dimensional r2c and c2r DFTs can be
Chris@10 379 found in the @samp{r2r} interface (@pxref{The Halfcomplex-format
Chris@10 380 DFT}), with ``halfcomplex''-format output that @emph{is} the same size
Chris@10 381 (and type) as the input array.
Chris@10 382 @cindex halfcomplex format
Chris@10 383 That interface, although it is not very useful for multi-dimensional
Chris@10 384 transforms, may sometimes yield better performance.
Chris@10 385
Chris@10 386 @c ------------------------------------------------------------
Chris@10 387 @node Multi-Dimensional DFTs of Real Data, More DFTs of Real Data, One-Dimensional DFTs of Real Data, Tutorial
Chris@10 388 @section Multi-Dimensional DFTs of Real Data
Chris@10 389
Chris@10 390 Multi-dimensional DFTs of real data use the following planner routines:
Chris@10 391
Chris@10 392 @example
Chris@10 393 fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
Chris@10 394 double *in, fftw_complex *out,
Chris@10 395 unsigned flags);
Chris@10 396 fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
Chris@10 397 double *in, fftw_complex *out,
Chris@10 398 unsigned flags);
Chris@10 399 fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
Chris@10 400 double *in, fftw_complex *out,
Chris@10 401 unsigned flags);
Chris@10 402 @end example
Chris@10 403 @findex fftw_plan_dft_r2c_2d
Chris@10 404 @findex fftw_plan_dft_r2c_3d
Chris@10 405 @findex fftw_plan_dft_r2c
Chris@10 406
Chris@10 407 as well as the corresponding @code{c2r} routines with the input/output
Chris@10 408 types swapped. These routines work similarly to their complex
Chris@10 409 analogues, except for the fact that here the complex output array is cut
Chris@10 410 roughly in half and the real array requires padding for in-place
Chris@10 411 transforms (as in 1d, above).
Chris@10 412
Chris@10 413 As before, @code{n} is the logical size of the array, and the
Chris@10 414 consequences of this on the the format of the complex arrays deserve
Chris@10 415 careful attention.
Chris@10 416 @cindex r2c/c2r multi-dimensional array format
Chris@10 417 Suppose that the real data has dimensions @ndims (in row-major order).
Chris@10 418 Then, after an r2c transform, the output is an @ndimshalf array of
Chris@10 419 @code{fftw_complex} values in row-major order, corresponding to slightly
Chris@10 420 over half of the output of the corresponding complex DFT. (The division
Chris@10 421 is rounded down.) The ordering of the data is otherwise exactly the
Chris@10 422 same as in the complex-DFT case.
Chris@10 423
Chris@10 424 For out-of-place transforms, this is the end of the story: the real
Chris@10 425 data is stored as a row-major array of size @ndims and the complex
Chris@10 426 data is stored as a row-major array of size @ndimshalf{}.
Chris@10 427
Chris@10 428 For in-place transforms, however, extra padding of the real-data array
Chris@10 429 is necessary because the complex array is larger than the real array,
Chris@10 430 and the two arrays share the same memory locations. Thus, for
Chris@10 431 in-place transforms, the final dimension of the real-data array must
Chris@10 432 be padded with extra values to accommodate the size of the complex
Chris@10 433 data---two values if the last dimension is even and one if it is odd.
Chris@10 434 @cindex padding
Chris@10 435 That is, the last dimension of the real data must physically contain
Chris@10 436 @tex
Chris@10 437 $2 (n_{d-1}/2+1)$
Chris@10 438 @end tex
Chris@10 439 @ifinfo
Chris@10 440 2 * (n[d-1]/2+1)
Chris@10 441 @end ifinfo
Chris@10 442 @html
Chris@10 443 2 * (n<sub>d-1</sub>/2+1)
Chris@10 444 @end html
Chris@10 445 @code{double} values (exactly enough to hold the complex data).
Chris@10 446 This physical array size does not, however, change the @emph{logical}
Chris@10 447 array size---only
Chris@10 448 @tex
Chris@10 449 $n_{d-1}$
Chris@10 450 @end tex
Chris@10 451 @ifinfo
Chris@10 452 n[d-1]
Chris@10 453 @end ifinfo
Chris@10 454 @html
Chris@10 455 n<sub>d-1</sub>
Chris@10 456 @end html
Chris@10 457 values are actually stored in the last dimension, and
Chris@10 458 @tex
Chris@10 459 $n_{d-1}$
Chris@10 460 @end tex
Chris@10 461 @ifinfo
Chris@10 462 n[d-1]
Chris@10 463 @end ifinfo
Chris@10 464 @html
Chris@10 465 n<sub>d-1</sub>
Chris@10 466 @end html
Chris@10 467 is the last dimension passed to the plan-creation routine.
Chris@10 468
Chris@10 469 For example, consider the transform of a two-dimensional real array of
Chris@10 470 size @code{n0} by @code{n1}. The output of the r2c transform is a
Chris@10 471 two-dimensional complex array of size @code{n0} by @code{n1/2+1}, where
Chris@10 472 the @code{y} dimension has been cut nearly in half because of
Chris@10 473 redundancies in the output. Because @code{fftw_complex} is twice the
Chris@10 474 size of @code{double}, the output array is slightly bigger than the
Chris@10 475 input array. Thus, if we want to compute the transform in place, we
Chris@10 476 must @emph{pad} the input array so that it is of size @code{n0} by
Chris@10 477 @code{2*(n1/2+1)}. If @code{n1} is even, then there are two padding
Chris@10 478 elements at the end of each row (which need not be initialized, as they
Chris@10 479 are only used for output).
Chris@10 480
Chris@10 481 @ifhtml
Chris@10 482 The following illustration depicts the input and output arrays just
Chris@10 483 described, for both the out-of-place and in-place transforms (with the
Chris@10 484 arrows indicating consecutive memory locations):
Chris@10 485 @image{rfftwnd-for-html}
Chris@10 486 @end ifhtml
Chris@10 487 @ifnotinfo
Chris@10 488 @ifnothtml
Chris@10 489 @float Figure,fig:rfftwnd
Chris@10 490 @center @image{rfftwnd}
Chris@10 491 @caption{Illustration of the data layout for a 2d @code{nx} by @code{ny}
Chris@10 492 real-to-complex transform.}
Chris@10 493 @end float
Chris@10 494 @ref{fig:rfftwnd} depicts the input and output arrays just
Chris@10 495 described, for both the out-of-place and in-place transforms (with the
Chris@10 496 arrows indicating consecutive memory locations):
Chris@10 497 @end ifnothtml
Chris@10 498 @end ifnotinfo
Chris@10 499
Chris@10 500 These transforms are unnormalized, so an r2c followed by a c2r
Chris@10 501 transform (or vice versa) will result in the original data scaled by
Chris@10 502 the number of real data elements---that is, the product of the
Chris@10 503 (logical) dimensions of the real data.
Chris@10 504 @cindex normalization
Chris@10 505
Chris@10 506
Chris@10 507 (Because the last dimension is treated specially, if it is equal to
Chris@10 508 @code{1} the transform is @emph{not} equivalent to a lower-dimensional
Chris@10 509 r2c/c2r transform. In that case, the last complex dimension also has
Chris@10 510 size @code{1} (@code{=1/2+1}), and no advantage is gained over the
Chris@10 511 complex transforms.)
Chris@10 512
Chris@10 513 @c ------------------------------------------------------------
Chris@10 514 @node More DFTs of Real Data, , Multi-Dimensional DFTs of Real Data, Tutorial
Chris@10 515 @section More DFTs of Real Data
Chris@10 516 @menu
Chris@10 517 * The Halfcomplex-format DFT::
Chris@10 518 * Real even/odd DFTs (cosine/sine transforms)::
Chris@10 519 * The Discrete Hartley Transform::
Chris@10 520 @end menu
Chris@10 521
Chris@10 522 FFTW supports several other transform types via a unified @dfn{r2r}
Chris@10 523 (real-to-real) interface,
Chris@10 524 @cindex r2r
Chris@10 525 so called because it takes a real (@code{double}) array and outputs a
Chris@10 526 real array of the same size. These r2r transforms currently fall into
Chris@10 527 three categories: DFTs of real input and complex-Hermitian output in
Chris@10 528 halfcomplex format, DFTs of real input with even/odd symmetry
Chris@10 529 (a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete
Chris@10 530 Hartley transforms (DHTs), all described in more detail by the
Chris@10 531 following sections.
Chris@10 532
Chris@10 533 The r2r transforms follow the by now familiar interface of creating an
Chris@10 534 @code{fftw_plan}, executing it with @code{fftw_execute(plan)}, and
Chris@10 535 destroying it with @code{fftw_destroy_plan(plan)}. Furthermore, all
Chris@10 536 r2r transforms share the same planner interface:
Chris@10 537
Chris@10 538 @example
Chris@10 539 fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
Chris@10 540 fftw_r2r_kind kind, unsigned flags);
Chris@10 541 fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
Chris@10 542 fftw_r2r_kind kind0, fftw_r2r_kind kind1,
Chris@10 543 unsigned flags);
Chris@10 544 fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
Chris@10 545 double *in, double *out,
Chris@10 546 fftw_r2r_kind kind0,
Chris@10 547 fftw_r2r_kind kind1,
Chris@10 548 fftw_r2r_kind kind2,
Chris@10 549 unsigned flags);
Chris@10 550 fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
Chris@10 551 const fftw_r2r_kind *kind, unsigned flags);
Chris@10 552 @end example
Chris@10 553 @findex fftw_plan_r2r_1d
Chris@10 554 @findex fftw_plan_r2r_2d
Chris@10 555 @findex fftw_plan_r2r_3d
Chris@10 556 @findex fftw_plan_r2r
Chris@10 557
Chris@10 558 Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
Chris@10 559 transforms for contiguous arrays in row-major order, transforming (real)
Chris@10 560 input to output of the same size, where @code{n} specifies the
Chris@10 561 @emph{physical} dimensions of the arrays. All positive @code{n} are
Chris@10 562 supported (with the exception of @code{n=1} for the @code{FFTW_REDFT00}
Chris@10 563 kind, noted in the real-even subsection below); products of small
Chris@10 564 factors are most efficient (factorizing @code{n-1} and @code{n+1} for
Chris@10 565 @code{FFTW_REDFT00} and @code{FFTW_RODFT00} kinds, described below), but
Chris@10 566 an @Onlogn algorithm is used even for prime sizes.
Chris@10 567
Chris@10 568 Each dimension has a @dfn{kind} parameter, of type
Chris@10 569 @code{fftw_r2r_kind}, specifying the kind of r2r transform to be used
Chris@10 570 for that dimension.
Chris@10 571 @cindex kind (r2r)
Chris@10 572 @tindex fftw_r2r_kind
Chris@10 573 (In the case of @code{fftw_plan_r2r}, this is an array @code{kind[rank]}
Chris@10 574 where @code{kind[i]} is the transform kind for the dimension
Chris@10 575 @code{n[i]}.) The kind can be one of a set of predefined constants,
Chris@10 576 defined in the following subsections.
Chris@10 577
Chris@10 578 In other words, FFTW computes the separable product of the specified
Chris@10 579 r2r transforms over each dimension, which can be used e.g. for partial
Chris@10 580 differential equations with mixed boundary conditions. (For some r2r
Chris@10 581 kinds, notably the halfcomplex DFT and the DHT, such a separable
Chris@10 582 product is somewhat problematic in more than one dimension, however,
Chris@10 583 as is described below.)
Chris@10 584
Chris@10 585 In the current version of FFTW, all r2r transforms except for the
Chris@10 586 halfcomplex type are computed via pre- or post-processing of
Chris@10 587 halfcomplex transforms, and they are therefore not as fast as they
Chris@10 588 could be. Since most other general DCT/DST codes employ a similar
Chris@10 589 algorithm, however, FFTW's implementation should provide at least
Chris@10 590 competitive performance.
Chris@10 591
Chris@10 592 @c =========>
Chris@10 593 @node The Halfcomplex-format DFT, Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data, More DFTs of Real Data
Chris@10 594 @subsection The Halfcomplex-format DFT
Chris@10 595
Chris@10 596 An r2r kind of @code{FFTW_R2HC} (@dfn{r2hc}) corresponds to an r2c DFT
Chris@10 597 @ctindex FFTW_R2HC
Chris@10 598 @cindex r2c
Chris@10 599 @cindex r2hc
Chris@10 600 (@pxref{One-Dimensional DFTs of Real Data}) but with ``halfcomplex''
Chris@10 601 format output, and may sometimes be faster and/or more convenient than
Chris@10 602 the latter.
Chris@10 603 @cindex halfcomplex format
Chris@10 604 The inverse @dfn{hc2r} transform is of kind @code{FFTW_HC2R}.
Chris@10 605 @ctindex FFTW_HC2R
Chris@10 606 @cindex hc2r
Chris@10 607 This consists of the non-redundant half of the complex output for a 1d
Chris@10 608 real-input DFT of size @code{n}, stored as a sequence of @code{n} real
Chris@10 609 numbers (@code{double}) in the format:
Chris@10 610
Chris@10 611 @tex
Chris@10 612 $$
Chris@10 613 r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1
Chris@10 614 $$
Chris@10 615 @end tex
Chris@10 616 @ifinfo
Chris@10 617 r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1
Chris@10 618 @end ifinfo
Chris@10 619 @html
Chris@10 620 <p align=center>
Chris@10 621 r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub>
Chris@10 622 </p>
Chris@10 623 @end html
Chris@10 624
Chris@10 625 Here,
Chris@10 626 @ifinfo
Chris@10 627 rk
Chris@10 628 @end ifinfo
Chris@10 629 @tex
Chris@10 630 $r_k$
Chris@10 631 @end tex
Chris@10 632 @html
Chris@10 633 r<sub>k</sub>
Chris@10 634 @end html
Chris@10 635 is the real part of the @math{k}th output, and
Chris@10 636 @ifinfo
Chris@10 637 ik
Chris@10 638 @end ifinfo
Chris@10 639 @tex
Chris@10 640 $i_k$
Chris@10 641 @end tex
Chris@10 642 @html
Chris@10 643 i<sub>k</sub>
Chris@10 644 @end html
Chris@10 645 is the imaginary part. (Division by 2 is rounded down.) For a
Chris@10 646 halfcomplex array @code{hc[n]}, the @math{k}th component thus has its
Chris@10 647 real part in @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with
Chris@10 648 the exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter
Chris@10 649 only if @code{n} is even)---in these two cases, the imaginary part is
Chris@10 650 zero due to symmetries of the real-input DFT, and is not stored.
Chris@10 651 Thus, the r2hc transform of @code{n} real values is a halfcomplex array of
Chris@10 652 length @code{n}, and vice versa for hc2r.
Chris@10 653 @cindex normalization
Chris@10 654
Chris@10 655
Chris@10 656 Aside from the differing format, the output of
Chris@10 657 @code{FFTW_R2HC}/@code{FFTW_HC2R} is otherwise exactly the same as for
Chris@10 658 the corresponding 1d r2c/c2r transform
Chris@10 659 (i.e. @code{FFTW_FORWARD}/@code{FFTW_BACKWARD} transforms, respectively).
Chris@10 660 Recall that these transforms are unnormalized, so r2hc followed by hc2r
Chris@10 661 will result in the original data multiplied by @code{n}. Furthermore,
Chris@10 662 like the c2r transform, an out-of-place hc2r transform will
Chris@10 663 @emph{destroy its input} array.
Chris@10 664
Chris@10 665 Although these halfcomplex transforms can be used with the
Chris@10 666 multi-dimensional r2r interface, the interpretation of such a separable
Chris@10 667 product of transforms along each dimension is problematic. For example,
Chris@10 668 consider a two-dimensional @code{n0} by @code{n1}, r2hc by r2hc
Chris@10 669 transform planned by @code{fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC,
Chris@10 670 FFTW_R2HC, FFTW_MEASURE)}. Conceptually, FFTW first transforms the rows
Chris@10 671 (of size @code{n1}) to produce halfcomplex rows, and then transforms the
Chris@10 672 columns (of size @code{n0}). Half of these column transforms, however,
Chris@10 673 are of imaginary parts, and should therefore be multiplied by @math{i}
Chris@10 674 and combined with the r2hc transforms of the real columns to produce the
Chris@10 675 2d DFT amplitudes; FFTW's r2r transform does @emph{not} perform this
Chris@10 676 combination for you. Thus, if a multi-dimensional real-input/output DFT
Chris@10 677 is required, we recommend using the ordinary r2c/c2r
Chris@10 678 interface (@pxref{Multi-Dimensional DFTs of Real Data}).
Chris@10 679
Chris@10 680 @c =========>
Chris@10 681 @node Real even/odd DFTs (cosine/sine transforms), The Discrete Hartley Transform, The Halfcomplex-format DFT, More DFTs of Real Data
Chris@10 682 @subsection Real even/odd DFTs (cosine/sine transforms)
Chris@10 683
Chris@10 684 The Fourier transform of a real-even function @math{f(-x) = f(x)} is
Chris@10 685 real-even, and @math{i} times the Fourier transform of a real-odd
Chris@10 686 function @math{f(-x) = -f(x)} is real-odd. Similar results hold for a
Chris@10 687 discrete Fourier transform, and thus for these symmetries the need for
Chris@10 688 complex inputs/outputs is entirely eliminated. Moreover, one gains a
Chris@10 689 factor of two in speed/space from the fact that the data are real, and
Chris@10 690 an additional factor of two from the even/odd symmetry: only the
Chris@10 691 non-redundant (first) half of the array need be stored. The result is
Chris@10 692 the real-even DFT (@dfn{REDFT}) and the real-odd DFT (@dfn{RODFT}), also
Chris@10 693 known as the discrete cosine and sine transforms (@dfn{DCT} and
Chris@10 694 @dfn{DST}), respectively.
Chris@10 695 @cindex real-even DFT
Chris@10 696 @cindex REDFT
Chris@10 697 @cindex real-odd DFT
Chris@10 698 @cindex RODFT
Chris@10 699 @cindex discrete cosine transform
Chris@10 700 @cindex DCT
Chris@10 701 @cindex discrete sine transform
Chris@10 702 @cindex DST
Chris@10 703
Chris@10 704
Chris@10 705 (In this section, we describe the 1d transforms; multi-dimensional
Chris@10 706 transforms are just a separable product of these transforms operating
Chris@10 707 along each dimension.)
Chris@10 708
Chris@10 709 Because of the discrete sampling, one has an additional choice: is the
Chris@10 710 data even/odd around a sampling point, or around the point halfway
Chris@10 711 between two samples? The latter corresponds to @emph{shifting} the
Chris@10 712 samples by @emph{half} an interval, and gives rise to several transform
Chris@10 713 variants denoted by REDFT@math{ab} and RODFT@math{ab}: @math{a} and
Chris@10 714 @math{b} are @math{0} or @math{1}, and indicate whether the input
Chris@10 715 (@math{a}) and/or output (@math{b}) are shifted by half a sample
Chris@10 716 (@math{1} means it is shifted). These are also known as types I-IV of
Chris@10 717 the DCT and DST, and all four types are supported by FFTW's r2r
Chris@10 718 interface.@footnote{There are also type V-VIII transforms, which
Chris@10 719 correspond to a logical DFT of @emph{odd} size @math{N}, independent of
Chris@10 720 whether the physical size @code{n} is odd, but we do not support these
Chris@10 721 variants.}
Chris@10 722
Chris@10 723 The r2r kinds for the various REDFT and RODFT types supported by FFTW,
Chris@10 724 along with the boundary conditions at both ends of the @emph{input}
Chris@10 725 array (@code{n} real numbers @code{in[j=0..n-1]}), are:
Chris@10 726
Chris@10 727 @itemize @bullet
Chris@10 728
Chris@10 729 @item
Chris@10 730 @code{FFTW_REDFT00} (DCT-I): even around @math{j=0} and even around @math{j=n-1}.
Chris@10 731 @ctindex FFTW_REDFT00
Chris@10 732
Chris@10 733 @item
Chris@10 734 @code{FFTW_REDFT10} (DCT-II, ``the'' DCT): even around @math{j=-0.5} and even around @math{j=n-0.5}.
Chris@10 735 @ctindex FFTW_REDFT10
Chris@10 736
Chris@10 737 @item
Chris@10 738 @code{FFTW_REDFT01} (DCT-III, ``the'' IDCT): even around @math{j=0} and odd around @math{j=n}.
Chris@10 739 @ctindex FFTW_REDFT01
Chris@10 740 @cindex IDCT
Chris@10 741
Chris@10 742 @item
Chris@10 743 @code{FFTW_REDFT11} (DCT-IV): even around @math{j=-0.5} and odd around @math{j=n-0.5}.
Chris@10 744 @ctindex FFTW_REDFT11
Chris@10 745
Chris@10 746 @item
Chris@10 747 @code{FFTW_RODFT00} (DST-I): odd around @math{j=-1} and odd around @math{j=n}.
Chris@10 748 @ctindex FFTW_RODFT00
Chris@10 749
Chris@10 750 @item
Chris@10 751 @code{FFTW_RODFT10} (DST-II): odd around @math{j=-0.5} and odd around @math{j=n-0.5}.
Chris@10 752 @ctindex FFTW_RODFT10
Chris@10 753
Chris@10 754 @item
Chris@10 755 @code{FFTW_RODFT01} (DST-III): odd around @math{j=-1} and even around @math{j=n-1}.
Chris@10 756 @ctindex FFTW_RODFT01
Chris@10 757
Chris@10 758 @item
Chris@10 759 @code{FFTW_RODFT11} (DST-IV): odd around @math{j=-0.5} and even around @math{j=n-0.5}.
Chris@10 760 @ctindex FFTW_RODFT11
Chris@10 761
Chris@10 762 @end itemize
Chris@10 763
Chris@10 764 Note that these symmetries apply to the ``logical'' array being
Chris@10 765 transformed; @strong{there are no constraints on your physical input
Chris@10 766 data}. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the
Chris@10 767 data @math{abcde}, it corresponds to the DFT of the logical even array
Chris@10 768 @math{abcdedcb} of size 8. A size-4 REDFT10 (DCT-II) of the data
Chris@10 769 @math{abcd} corresponds to the size-8 logical DFT of the even array
Chris@10 770 @math{abcddcba}, shifted by half a sample.
Chris@10 771
Chris@10 772 All of these transforms are invertible. The inverse of R*DFT00 is
Chris@10 773 R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
Chris@10 774 simply ``the'' DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
Chris@10 775 However, the transforms computed by FFTW are unnormalized, exactly
Chris@10 776 like the corresponding real and complex DFTs, so computing a transform
Chris@10 777 followed by its inverse yields the original array scaled by @math{N},
Chris@10 778 where @math{N} is the @emph{logical} DFT size. For REDFT00,
Chris@10 779 @math{N=2(n-1)}; for RODFT00, @math{N=2(n+1)}; otherwise, @math{N=2n}.
Chris@10 780 @cindex normalization
Chris@10 781 @cindex IDCT
Chris@10 782
Chris@10 783
Chris@10 784 Note that the boundary conditions of the transform output array are
Chris@10 785 given by the input boundary conditions of the inverse transform.
Chris@10 786 Thus, the above transforms are all inequivalent in terms of
Chris@10 787 input/output boundary conditions, even neglecting the 0.5 shift
Chris@10 788 difference.
Chris@10 789
Chris@10 790 FFTW is most efficient when @math{N} is a product of small factors; note
Chris@10 791 that this @emph{differs} from the factorization of the physical size
Chris@10 792 @code{n} for REDFT00 and RODFT00! There is another oddity: @code{n=1}
Chris@10 793 REDFT00 transforms correspond to @math{N=0}, and so are @emph{not
Chris@10 794 defined} (the planner will return @code{NULL}). Otherwise, any positive
Chris@10 795 @code{n} is supported.
Chris@10 796
Chris@10 797 For the precise mathematical definitions of these transforms as used by
Chris@10 798 FFTW, see @ref{What FFTW Really Computes}. (For people accustomed to
Chris@10 799 the DCT/DST, FFTW's definitions have a coefficient of @math{2} in front
Chris@10 800 of the cos/sin functions so that they correspond precisely to an
Chris@10 801 even/odd DFT of size @math{N}. Some authors also include additional
Chris@10 802 multiplicative factors of
Chris@10 803 @ifinfo
Chris@10 804 sqrt(2)
Chris@10 805 @end ifinfo
Chris@10 806 @html
Chris@10 807 &radic;2
Chris@10 808 @end html
Chris@10 809 @tex
Chris@10 810 $\sqrt{2}$
Chris@10 811 @end tex
Chris@10 812 for selected inputs and outputs; this makes
Chris@10 813 the transform orthogonal, but sacrifices the direct equivalence to a
Chris@10 814 symmetric DFT.)
Chris@10 815
Chris@10 816 @subsubheading Which type do you need?
Chris@10 817
Chris@10 818 Since the required flavor of even/odd DFT depends upon your problem,
Chris@10 819 you are the best judge of this choice, but we can make a few comments
Chris@10 820 on relative efficiency to help you in your selection. In particular,
Chris@10 821 R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11
Chris@10 822 (especially for odd sizes), while the R*DFT00 transforms are sometimes
Chris@10 823 significantly slower (especially for even sizes).@footnote{R*DFT00 is
Chris@10 824 sometimes slower in FFTW because we discovered that the standard
Chris@10 825 algorithm for computing this by a pre/post-processed real DFT---the
Chris@10 826 algorithm used in FFTPACK, Numerical Recipes, and other sources for
Chris@10 827 decades now---has serious numerical problems: it already loses several
Chris@10 828 decimal places of accuracy for 16k sizes. There seem to be only two
Chris@10 829 alternatives in the literature that do not suffer similarly: a
Chris@10 830 recursive decomposition into smaller DCTs, which would require a large
Chris@10 831 set of codelets for efficiency and generality, or sacrificing a factor of
Chris@10 832 @tex
Chris@10 833 $\sim 2$
Chris@10 834 @end tex
Chris@10 835 @ifnottex
Chris@10 836 2
Chris@10 837 @end ifnottex
Chris@10 838 in speed to use a real DFT of twice the size. We currently
Chris@10 839 employ the latter technique for general @math{n}, as well as a limited
Chris@10 840 form of the former method: a split-radix decomposition when @math{n}
Chris@10 841 is odd (@math{N} a multiple of 4). For @math{N} containing many
Chris@10 842 factors of 2, the split-radix method seems to recover most of the
Chris@10 843 speed of the standard algorithm without the accuracy tradeoff.}
Chris@10 844
Chris@10 845 Thus, if only the boundary conditions on the transform inputs are
Chris@10 846 specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
Chris@10 847 R*DFT11 (unless the half-sample shift or the self-inverse property is
Chris@10 848 significant for your problem).
Chris@10 849
Chris@10 850 If performance is important to you and you are using only small sizes
Chris@10 851 (say @math{n<200}), e.g. for multi-dimensional transforms, then you
Chris@10 852 might consider generating hard-coded transforms of those sizes and types
Chris@10 853 that you are interested in (@pxref{Generating your own code}).
Chris@10 854
Chris@10 855 We are interested in hearing what types of symmetric transforms you find
Chris@10 856 most useful.
Chris@10 857
Chris@10 858 @c =========>
Chris@10 859 @node The Discrete Hartley Transform, , Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data
Chris@10 860 @subsection The Discrete Hartley Transform
Chris@10 861
Chris@10 862 If you are planning to use the DHT because you've heard that it is
Chris@10 863 ``faster'' than the DFT (FFT), @strong{stop here}. The DHT is not
Chris@10 864 faster than the DFT. That story is an old but enduring misconception
Chris@10 865 that was debunked in 1987.
Chris@10 866
Chris@10 867 The discrete Hartley transform (DHT) is an invertible linear transform
Chris@10 868 closely related to the DFT. In the DFT, one multiplies each input by
Chris@10 869 @math{cos - i * sin} (a complex exponential), whereas in the DHT each
Chris@10 870 input is multiplied by simply @math{cos + sin}. Thus, the DHT
Chris@10 871 transforms @code{n} real numbers to @code{n} real numbers, and has the
Chris@10 872 convenient property of being its own inverse. In FFTW, a DHT (of any
Chris@10 873 positive @code{n}) can be specified by an r2r kind of @code{FFTW_DHT}.
Chris@10 874 @ctindex FFTW_DHT
Chris@10 875 @cindex discrete Hartley transform
Chris@10 876 @cindex DHT
Chris@10 877
Chris@10 878 Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
Chris@10 879 size @code{n} followed by another DHT of the same size will result in
Chris@10 880 the original array multiplied by @code{n}.
Chris@10 881 @cindex normalization
Chris@10 882
Chris@10 883 The DHT was originally proposed as a more efficient alternative to the
Chris@10 884 DFT for real data, but it was subsequently shown that a specialized DFT
Chris@10 885 (such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW,
Chris@10 886 the DHT is actually computed by post-processing an r2hc transform, so
Chris@10 887 there is ordinarily no reason to prefer it from a performance
Chris@10 888 perspective.@footnote{We provide the DHT mainly as a byproduct of some
Chris@10 889 internal algorithms. FFTW computes a real input/output DFT of
Chris@10 890 @emph{prime} size by re-expressing it as a DHT plus post/pre-processing
Chris@10 891 and then using Rader's prime-DFT algorithm adapted to the DHT.}
Chris@10 892 However, we have heard rumors that the DHT might be the most appropriate
Chris@10 893 transform in its own right for certain applications, and we would be
Chris@10 894 very interested to hear from anyone who finds it useful.
Chris@10 895
Chris@10 896 If @code{FFTW_DHT} is specified for multiple dimensions of a
Chris@10 897 multi-dimensional transform, FFTW computes the separable product of 1d
Chris@10 898 DHTs along each dimension. Unfortunately, this is not quite the same
Chris@10 899 thing as a true multi-dimensional DHT; you can compute the latter, if
Chris@10 900 necessary, with at most @code{rank-1} post-processing passes
Chris@10 901 [see e.g. H. Hao and R. N. Bracewell, @i{Proc. IEEE} @b{75}, 264--266 (1987)].
Chris@10 902
Chris@10 903 For the precise mathematical definition of the DHT as used by FFTW, see
Chris@10 904 @ref{What FFTW Really Computes}.
Chris@10 905