FFTW FAQ - Section 3 
Using FFTW
FFTW 3 has semantics incompatible with earlier versions: its plans can
only be used for a given stride, multiplicity, and other
characteristics of the input and output arrays; these stronger
semantics are necessary for performance reasons.  Thus, it is
impossible to efficiently emulate the older interface (whose plans can
be used for any transform of the same size).  We believe that it
should be possible to upgrade most programs without any difficulty,
however.  
There are several reasons: 
- It was important for performance reasons that the plan be specific to
array characteristics like the stride (and alignment, for SIMD), and
requiring that the user maintain these invariants is error prone. 
- In most high-performance applications, as far as we can tell, you are
usually transforming the same array over and over, so FFTW's semantics
should not be a burden.  
- If you need to transform another array of the same size, creating a
new plan once the first exists is a cheap operation. 
- If you need to transform many arrays of the same size at once, you
should really use the plan_manyroutines in FFTW's "advanced"
interface.
- If the abovementioned array characteristics are the same, you are
willing to pay close attention to the documentation, and you really
need to, we provide a "new-array execution" interface to
apply a plan to a new array.  
You are probably recreating the plan before every transform, rather
than creating it once and reusing it for all transforms of the same
size.  FFTW is designed to be used in the following way:
- First, you create a plan.  This will take several seconds. 
- Then, you reuse the plan many times to perform FFTs.  These are fast. 
If you don't need to compute many transforms and the time for the
planner is significant, you have two options.  First, you can use theFFTW_ESTIMATE option in the planner, which uses heuristics
instead of runtime measurements and produces a good plan in a short
time.  Second, you can use the wisdom feature to precompute the plan;
see Q3.9 `Can I save FFTW's plans?' 
Probably, NaNs or similar are creeping into your data, and the
slowdown is due to the resulting floating-point exceptions.  For
example, be aware that repeatedly FFTing the same array is a diverging
process (because FFTW computes the unnormalized transform). 
Did the FFTW test programs pass (make check, or cd tests; make bigcheck if you want to be paranoid)?  If so, you almost
certainly have a bug in your own code.  For example, you could be
passing invalid arguments (such as wrongly-sized arrays) to FFTW, or
you could simply have memory corruption elsewhere in your program that
causes random crashes later on.  Please don't complain to us unless
you can come up with a minimal self-contained program (preferably
under 30 lines) that illustrates the problem. 
As described in the manual, on 64-bit machines you must store the
plans in variables large enough to hold a pointer, for example
integer*8.  We recommend using integer*8 on 32-bit machines as well, to simplify porting. 
People follow many different conventions for the DFT, and you should
be sure to know the ones that we use (described in the FFTW manual). 
In particular, you should be aware that the
FFTW_FORWARD/FFTW_BACKWARD directions correspond to signs of -1/+1 in the exponent of the DFT definition. 
(Numerical Recipes uses the opposite convention.)   
You should also know that we compute an unnormalized transform.  In
contrast, Matlab is an example of program that computes a normalized
transform.  See Q3.10 `Why does your inverse transform return a scaled
result?'.  
Finally, note that floating-point arithmetic is not exact, so
different FFT algorithms will give slightly different results (on the
order of the numerical accuracy; typically a fractional difference of
1e-15 or so in double precision).  
If you use FFTW_MEASURE or FFTW_PATIENT mode, then the algorithm FFTW employs is not deterministic: it depends on
runtime performance measurements.  This will cause the results to vary
slightly from run to run.  However, the differences should be slight,
on the order of the floating-point precision, and therefore should
have no practical impact on most applications. 
If you use saved plans (wisdom) or FFTW_ESTIMATE mode, however, then the algorithm is deterministic and the results should be
identical between runs.  
Yes. Starting with version 1.2, FFTW provides the
wisdom mechanism for saving plans; see the FFTW manual. 
Computing the forward transform followed by the backward transform (or
vice versa) yields the original array scaled by the size of the array.
 (For multi-dimensional transforms, the size of the array is the
product of the dimensions.)  We could, instead, have chosen a
normalization that would have returned the unscaled array. Or, to
accomodate the many conventions in this matter, the transform routines
could have accepted a "scale factor" parameter. We did not
do this, however, for two reasons. First, we didn't want to sacrifice
performance in the common case where the scale factor is 1. Second, in
real applications the FFT is followed or preceded by some computation
on the data, into which the scale factor can typically be absorbed at
little or no cost.  
For human viewing of a spectrum, it is often convenient to put the
origin in frequency space at the center of the output array, rather
than in the zero-th element (the default in FFTW).  If all of the
dimensions of your array are even, you can accomplish this by simply
multiplying each element of the input array by (-1)^(i + j + ...),
where i, j, etcetera are the indices of the element.  (This trick is a
general property of the DFT, and is not specific to FFTW.)
FFTW performs an FFT on an array of floating-point values.  You can
certainly use it to compute the transform of an image or audio stream,
but you are responsible for figuring out your data format and
converting it to the form FFTW requires. 
The libraries must be listed in the correct order
(-lfftw3 -lm for FFTW 3.x) and after your program sources/objects.  (The general rule is that if A uses B, then A must be listed before B in the link command.).  
You're a C++ programmer, aren't you?  You have to compile the FFTW
library and link it into your program, not just
#include <fftw3.h>.  (Yes, this is really a FAQ.) 
You cannot declare large arrays with automatic storage (e.g. via
fftw_complex array[N]); you should use fftw_malloc (or equivalent) to allocate the arrays you want
to transform if they are larger than a few hundred elements. 
After you create a plan, FFTW caches the information required to
quickly recreate the plan.  (See Q3.9 `Can I save FFTW's plans?') It also maintains a small amount of other persistent memory.  You can deallocate all of
FFTW's internally allocated memory, if you wish, by calling
fftw_cleanup(), as documented in the manual.  
You should initialize your input array after creating the plan, unless you use FFTW_ESTIMATE: planning with FFTW_MEASURE or FFTW_PATIENT overwrites the input/output arrays, as described in the manual. 
Please do not ask us Windows-specific questions.  We do not
use Windows.  We know nothing about Visual Basic, Visual C++, or .NET.
 Please find the appropriate Usenet discussion group and ask your
question there.  See also Q2.2 `Does FFTW run on Windows?'.  
In general, no, an FFT intrinsically computes all outputs from all
inputs.  In principle, there is something called a
pruned FFT that can do what you want, but to compute K outputs out of N the
complexity is in general O(N log K) instead of O(N log N), thus saving
only a small additive factor in the log.  (The same argument holds if
you instead have only K nonzero inputs.)
There are some specific cases in which you can get the O(N log K)
performance benefits easily, however, by combining a few ordinary
FFTs.  In particular, the case where you want the first K outputs,
where K divides N, can be handled by performing N/K transforms of size
K and then summing the outputs multiplied by appropriate phase
factors.  For more details, see pruned FFTs with FFTW.  
There are also some algorithms that compute pruned transforms
approximately, but they are beyond the scope of this FAQ. 
You can use the FFTW guru interface to create a rank-0 transform of
vector rank 2 where the vector strides are transposed.  (A rank-0
transform is equivalent to a 1D transform of size 1, which.  just
copies the input into the output.)  Specifying the same location for
the input and output makes the transpose in-place. 
For double-valued data stored in row-major format, plan creation looks
like this: 
fftw_plan plan_transpose(int rows, int cols, double *in, double *out)
{
    const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */
    fftw_iodim howmany_dims[2];
    howmany_dims[0].n  = rows;
    howmany_dims[0].is = cols;
    howmany_dims[0].os = 1;
    howmany_dims[1].n  = cols;
    howmany_dims[1].is = 1;
    howmany_dims[1].os = rows;
    return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL,
                              /*howmany_rank=*/ 2, howmany_dims,
                              in, out, /*kind=*/ NULL, flags);
}
(This entry was written by Rhys Ulerich.)
Next: Internals of FFTW.
Back: Installing FFTW.
Return to contents.
Matteo Frigo and Steven G. Johnson / fftw@fftw.org
- 30 July 2016
Extracted from FFTW Frequently Asked Questions with Answers,
Copyright © 2016 Matteo Frigo and Massachusetts Institute of Technology.