Lib/fftw-3.2.1/doc/fftw3.texi @ changeset 7:c6f38cba266d

Cleaned up redundant code

author:  Geogaddi\David <d.m.ronan@qmul.ac.uk>
date:    Wed, 22 Jul 2015 15:14:58 +0100
parents: 25bf17994ef1
\input texinfo @c -*-texinfo-*- @comment %**start of header @setfilename fftw3.info @include version.texi @settitle FFTW @value{VERSION} @setchapternewpage odd @c define constant index (ct) @defcodeindex ct @syncodeindex ct fn @syncodeindex vr fn @syncodeindex pg fn @syncodeindex tp fn @c define foreign function index (ff) @defcodeindex ff @syncodeindex ff cp @c define foreign constant index (fc) @defcodeindex fc @syncodeindex fc cp @c define foreign program index (fp) @defcodeindex fp @syncodeindex fp cp @comment %**end of header @iftex @paragraphindent 0 @parskip=@medskipamount @end iftex @macro Onlogn @ifinfo O(n log n) @end ifinfo @html <i>O</i>(<i>n</i> log <i>n</i>) @end html @tex $O(n \\log n)$ @end tex @end macro @macro ndims @ifinfo n[0] x n[1] x n[2] x ... x n[d-1] @end ifinfo @html n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × n<sub>d-1</sub> @end html @tex $n_0 \\times n_1 \\times n_2 \\times \\cdots \\times n_{d-1}$ @end tex @end macro @macro ndimshalf @ifinfo n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) @end ifinfo @html n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × (n<sub>d-1</sub>/2 + 1) @end html @tex $n_0 \\times n_1 \\times n_2 \\times \\cdots \\times (n_{d-1}/2 + 1)$ @end tex @end macro @macro twodims{d1, d2} @ifinfo \d1\ x \d2\ @end ifinfo @html \d1\ × \d2\ @end html @tex $\d1\ \\times \d2\$ @end tex @end macro @macro threedims{d1, d2, d3} @ifinfo \d1\ x \d2\ x \d3\ @end ifinfo @html \d1\ × \d2\ × \d3\ @end html @tex $\d1\ \\times \d2\ \\times \d3\$ @end tex @end macro @macro dimk{k} @ifinfo n[\k\] @end ifinfo @html n<sub>\k\</sub> @end html @tex $n_\k\$ @end tex @end macro @macro ndimstrans @ifinfo n[1] x n[0] x n[2] x ... x n[d-1] @end ifinfo @html n<sub>1</sub> × n<sub>0</sub> × n<sub>2</sub> ×…× n<sub>d-1</sub> @end html @tex $n_1 \\times n_0 \\times n_2 \\times \\cdots \\times n_{d-1}$ @end tex @end macro @copying This manual is for FFTW (version @value{VERSION}, @value{UPDATED}). Copyright @copyright{} 2003 Matteo Frigo. Copyright @copyright{} 2003 Massachusetts Institute of Technology. @quotation Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. @end quotation @end copying @dircategory Texinfo documentation system @direntry * fftw3: (fftw3). FFTW User's Manual. @end direntry @titlepage @title FFTW @subtitle for version @value{VERSION}, @value{UPDATED} @author{Matteo Frigo} @author{Steven G. Johnson} @page @vskip 0pt plus 1filll @insertcopying @end titlepage @contents @ifnottex @node Top, Introduction, (dir), (dir) @top FFTW User Manual Welcome to FFTW, the Fastest Fourier Transform in the West. FFTW is a collection of fast C routines to compute the discrete Fourier transform. This manual documents FFTW version @value{VERSION}. 
@end ifnottex @menu * Introduction:: * Tutorial:: * Other Important Topics:: * FFTW Reference:: * Multi-threaded FFTW:: * FFTW on the Cell Processor:: * Calling FFTW from Fortran:: * Upgrading from FFTW version 2:: * Installation and Customization:: * Acknowledgments:: * License and Copyright:: * Concept Index:: * Library Index:: @detailmenu --- The Detailed Node Listing --- Tutorial * Complex One-Dimensional DFTs:: * Complex Multi-Dimensional DFTs:: * One-Dimensional DFTs of Real Data:: * Multi-Dimensional DFTs of Real Data:: * More DFTs of Real Data:: More DFTs of Real Data * The Halfcomplex-format DFT:: * Real even/odd DFTs (cosine/sine transforms):: * The Discrete Hartley Transform:: Other Important Topics * Data Alignment:: * Multi-dimensional Array Format:: * Words of Wisdom-Saving Plans:: * Caveats in Using Wisdom:: Data Alignment * SIMD alignment and fftw_malloc:: * Stack alignment on x86:: Multi-dimensional Array Format * Row-major Format:: * Column-major Format:: * Fixed-size Arrays in C:: * Dynamic Arrays in C:: * Dynamic Arrays in C-The Wrong Way:: FFTW Reference * Data Types and Files:: * Using Plans:: * Basic Interface:: * Advanced Interface:: * Guru Interface:: * New-array Execute Functions:: * Wisdom:: * What FFTW Really Computes:: Data Types and Files * Complex numbers:: * Precision:: * Memory Allocation:: Basic Interface * Complex DFTs:: * Planner Flags:: * Real-data DFTs:: * Real-data DFT Array Format:: * Real-to-Real Transforms:: * Real-to-Real Transform Kinds:: Advanced Interface * Advanced Complex DFTs:: * Advanced Real-data DFTs:: * Advanced Real-to-real Transforms:: Guru Interface * Interleaved and split arrays:: * Guru vector and transform sizes:: * Guru Complex DFTs:: * Guru Real-data DFTs:: * Guru Real-to-real Transforms:: * 64-bit Guru Interface:: Wisdom * Wisdom Export:: * Wisdom Import:: * Forgetting Wisdom:: * Wisdom Utilities:: What FFTW Really Computes * The 1d Discrete Fourier Transform (DFT):: * The 1d Real-data DFT:: * 1d Real-even DFTs (DCTs):: * 1d Real-odd DFTs (DSTs):: * 1d Discrete Hartley Transforms (DHTs):: * Multi-dimensional Transforms:: Multi-threaded FFTW * Installation and Supported Hardware/Software:: * Usage of Multi-threaded FFTW:: * How Many Threads to Use?:: * Thread safety:: FFTW on the Cell Processor * Cell Installation:: * Cell Caveats:: * FFTW Accuracy on Cell:: Calling FFTW from Fortran * Fortran-interface routines:: * FFTW Constants in Fortran:: * FFTW Execution in Fortran:: * Fortran Examples:: * Wisdom of Fortran?:: Installation and Customization * Installation on Unix:: * Installation on non-Unix systems:: * Cycle Counters:: * Generating your own code:: @end detailmenu @end menu @c ************************************************************ @node Introduction, Tutorial, Top, Top @chapter Introduction This manual documents version @value{VERSION} of FFTW, the @emph{Fastest Fourier Transform in the West}. FFTW is a comprehensive collection of fast C routines for computing the discrete Fourier transform (DFT) and various special cases thereof. @cindex discrete Fourier transform @cindex DFT @itemize @bullet @item FFTW computes the DFT of complex data, real data, even- or odd-symmetric real data (these symmetric transforms are usually known as the discrete cosine or sine transform, respectively), and the discrete Hartley transform (DHT) of real data. @item The input data can have arbitrary length. FFTW employs @Onlogn{} algorithms for all lengths, including prime numbers. @item FFTW supports arbitrary multi-dimensional data. 
@item FFTW supports the SSE, SSE2, Altivec, and MIPS PS instruction sets. @item FFTW @value{VERSION} includes parallel (multi-threaded) transforms for shared-memory systems. FFTW @value{VERSION} does not include distributed-memory parallel transforms, but we plan to implement an MPI version soon. (Meanwhile, you can use the MPI implementation from FFTW 2.1.3.) @end itemize We assume herein that you are familiar with the properties and uses of the DFT that are relevant to your application. Otherwise, see e.g. @cite{The Fast Fourier Transform and Its Applications} by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1988). @uref{http://www.fftw.org, Our web page} also has links to FFT-related information online. @cindex FFTW @c TODO: revise. We don't need to brag any longer @c @c FFTW is usually faster (and sometimes much faster) than all other @c freely-available Fourier transform programs found on the Net. It is @c competitive with (and often faster than) the FFT codes in Sun's @c Performance Library, IBM's ESSL library, HP's CXML library, and @c Intel's MKL library, which are targeted at specific machines. @c Moreover, FFTW's performance is @emph{portable}. Indeed, FFTW is @c unique in that it automatically adapts itself to your machine, your @c cache, the size of your memory, your number of registers, and all the @c other factors that normally make it impossible to optimize a program @c for more than one machine. An extensive comparison of FFTW's @c performance with that of other Fourier transform codes has been made, @c and the results are available on the Web at @c @uref{http://fftw.org/~benchfft, the benchFFT home page}. @c @cindex benchmark @c @fpindex benchfft In order to use FFTW effectively, you need to learn one basic concept of FFTW's internal structure: FFTW does not use a fixed algorithm for computing the transform, but instead it adapts the DFT algorithm to details of the underlying hardware in order to maximize performance. Hence, the computation of the transform is split into two phases. First, FFTW's @dfn{planner} ``learns'' the fastest way to compute the transform on your machine. The planner @cindex planner produces a data structure called a @dfn{plan} that contains this @cindex plan information. Subsequently, the plan is @dfn{executed} @cindex execute to transform the array of input data as dictated by the plan. The plan can be reused as many times as needed. In typical high-performance applications, many transforms of the same size are computed and, consequently, a relatively expensive initialization of this sort is acceptable. On the other hand, if you need a single transform of a given size, the one-time cost of the planner becomes significant. For this case, FFTW provides fast planners based on heuristics or on previously computed plans. FFTW supports transforms of data with arbitrary length, rank, multiplicity, and a general memory layout. In simple cases, however, this generality may be unnecessary and confusing. Consequently, we organized the interface to FFTW into three levels of increasing generality. @itemize @bullet @item The @dfn{basic interface} computes a single transform of contiguous data. @item The @dfn{advanced interface} computes transforms of multiple or strided arrays. @item The @dfn{guru interface} supports the most general data layouts, multiplicities, and strides. @end itemize We expect that most users will be best served by the basic interface, whereas the guru interface requires careful attention to the documentation to avoid problems. 
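To give a concrete preview of this two-phase structure (an illustrative sketch only; @ref{Tutorial} develops real examples, and @code{N} and @code{HOW_MANY} here are placeholders):

@example
@{
    fftw_complex *buf;
    fftw_plan p;
    int t;
    buf = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    p = fftw_plan_dft_1d(N, buf, buf, FFTW_FORWARD, FFTW_MEASURE); /* plan once */
    for (t = 0; t < HOW_MANY; ++t) @{
        /* ... fill buf with this batch's input ... */
        fftw_execute(p);            /* reuse the same plan every time */
        /* ... use the transformed data in buf ... */
    @}
    fftw_destroy_plan(p);
    fftw_free(buf);
@}
@end example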
@cindex basic interface @cindex advanced interface @cindex guru interface Besides the automatic performance adaptation performed by the planner, it is also possible for advanced users to customize FFTW manually. For example, if code space is a concern, we provide a tool that links only the subset of FFTW needed by your application. Conversely, you may need to extend FFTW because the standard distribution is not sufficient for your needs. For example, the standard FFTW distribution works most efficiently for arrays whose size can be factored into small primes (@math{2}, @math{3}, @math{5}, and @math{7}), and otherwise it uses a slower general-purpose routine. If you need efficient transforms of other sizes, you can use FFTW's code generator, which produces fast C programs (``codelets'') for any particular array size you may care about. @cindex code generator @cindex codelet For example, if you need transforms of size @ifinfo @math{513 = 19 x 3^3}, @end ifinfo @tex $513 = 19 \cdot 3^3$, @end tex @html 513 = 19*3<sup>3</sup>, @end html you can customize FFTW to support the factor @math{19} efficiently. For more information regarding FFTW, see the paper, ``The Design and Implementation of FFTW3,'' by M. Frigo and S. G. Johnson, which was an invited paper in @cite{Proc. IEEE} @b{93} (2), p. 216 (2005). The code generator is described in the paper ``A fast Fourier transform compiler'', @cindex compiler by M. Frigo, in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Atlanta, Georgia, May 1999}. These papers, along with the latest version of FFTW, the FAQ, benchmarks, and other links, are available at @uref{http://www.fftw.org, the FFTW home page}. The current version of FFTW incorporates many good ideas from the past thirty years of FFT literature. In one way or another, FFTW uses the Cooley-Tukey algorithm, the prime factor algorithm, Rader's algorithm for prime sizes, and a split-radix algorithm (with a variation due to Dan Bernstein). FFTW's code generator also produces new algorithms that we do not completely understand. @cindex algorithm The reader is referred to the cited papers for the appropriate references. The rest of this manual is organized as follows. We first discuss the sequential (single-processor) implementation. We start by describing the basic interface/features of FFTW in @ref{Tutorial}. The following chapter discusses @ref{Other Important Topics}, including @ref{Data Alignment}, the storage scheme of multi-dimensional arrays (@pxref{Multi-dimensional Array Format}), and FFTW's mechanism for storing plans on disk (@pxref{Words of Wisdom-Saving Plans}). Next, @ref{FFTW Reference} provides comprehensive documentation of all FFTW's features. Parallel transforms are discussed in their own chapters: @ref{Multi-threaded FFTW}. Fortran programmers can also use FFTW, as described in @ref{Calling FFTW from Fortran}. @ref{Installation and Customization} explains how to install FFTW in your computer system and how to adapt FFTW to your needs. License and copyright information is given in @ref{License and Copyright}. Finally, we thank all the people who helped us in @ref{Acknowledgments}. 
@c ************************************************************ @node Tutorial, Other Important Topics, Introduction, Top @chapter Tutorial @menu * Complex One-Dimensional DFTs:: * Complex Multi-Dimensional DFTs:: * One-Dimensional DFTs of Real Data:: * Multi-Dimensional DFTs of Real Data:: * More DFTs of Real Data:: @end menu This chapter describes the basic usage of FFTW, i.e., how to compute @cindex basic interface the Fourier transform of a single array. This chapter tells the truth, but not the @emph{whole} truth. Specifically, FFTW implements additional routines and flags that are not documented here, although in many cases we try to indicate where added capabilities exist. For more complete information, see @ref{FFTW Reference}. (Note that you need to compile and install FFTW before you can use it in a program. For the details of the installation, see @ref{Installation and Customization}.) We recommend that you read this tutorial in order.@footnote{You can read the tutorial in bit-reversed order after computing your first transform.} At the least, read the first section (@pxref{Complex One-Dimensional DFTs}) before reading any of the others, even if your main interest lies in one of the other transform types. Users of FFTW version 2 and earlier may also want to read @ref{Upgrading from FFTW version 2}. @c ------------------------------------------------------------ @node Complex One-Dimensional DFTs, Complex Multi-Dimensional DFTs, Tutorial, Tutorial @section Complex One-Dimensional DFTs @quotation Plan: To bother about the best method of accomplishing an accidental result. [Ambrose Bierce, @cite{The Enlarged Devil's Dictionary}.] @cindex Devil @end quotation @iftex @medskip @end iftex The basic usage of FFTW to compute a one-dimensional DFT of size @code{N} is simple, and it typically looks something like this code: @example #include <fftw3.h> ... @{ fftw_complex *in, *out; fftw_plan p; ... in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N); out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N); p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE); ... fftw_execute(p); /* @r{repeat as needed} */ ... fftw_destroy_plan(p); fftw_free(in); fftw_free(out); @} @end example (When you compile, you must also link with the @code{fftw3} library, e.g. @code{-lfftw3 -lm} on Unix systems.) First you allocate the input and output arrays. You can allocate them in any way that you like, but we recommend using @code{fftw_malloc}, which behaves like @findex fftw_malloc @code{malloc} except that it properly aligns the array when SIMD instructions (such as SSE and Altivec) are available (@pxref{SIMD alignment and fftw_malloc}). @cindex SIMD The data is an array of type @code{fftw_complex}, which is by default a @code{double[2]} composed of the real (@code{in[i][0]}) and imaginary (@code{in[i][1]}) parts of a complex number. @tindex fftw_complex The next step is to create a @dfn{plan}, which is an object @cindex plan that contains all the data that FFTW needs to compute the FFT. This function creates the plan: @example fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); @end example @findex fftw_plan_dft_1d @tindex fftw_plan The first argument, @code{n}, is the size of the transform you are trying to compute. The size @code{n} can be any positive integer, but sizes that are products of small factors are transformed most efficiently (although prime sizes still use an @Onlogn{} algorithm). 
The next two arguments are pointers to the input and output arrays of the transform. These pointers can be equal, indicating an @dfn{in-place} transform. @cindex in-place The fourth argument, @code{sign}, can be either @code{FFTW_FORWARD} (@code{-1}) or @code{FFTW_BACKWARD} (@code{+1}), @ctindex FFTW_FORWARD @ctindex FFTW_BACKWARD and indicates the direction of the transform you are interested in; technically, it is the sign of the exponent in the transform. The @code{flags} argument is usually either @code{FFTW_MEASURE} or @cindex flags @code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} instructs FFTW to run @ctindex FFTW_MEASURE and measure the execution time of several FFTs in order to find the best way to compute the transform of size @code{n}. This process takes some time (usually a few seconds), depending on your machine and on the size of the transform. @code{FFTW_ESTIMATE}, on the contrary, does not run any computation and just builds a @ctindex FFTW_ESTIMATE reasonable plan that is probably sub-optimal. In short, if your program performs many transforms of the same size and initialization time is not important, use @code{FFTW_MEASURE}; otherwise use the estimate. The data in the @code{in}/@code{out} arrays is @emph{overwritten} during @code{FFTW_MEASURE} planning, so such planning should be done @emph{before} the input is initialized by the user. Once the plan has been created, you can use it as many times as you like for transforms on the specified @code{in}/@code{out} arrays, computing the actual transforms via @code{fftw_execute(plan)}: @example void fftw_execute(const fftw_plan plan); @end example @findex fftw_execute @cindex execute If you want to transform a @emph{different} array of the same size, you can create a new plan with @code{fftw_plan_dft_1d} and FFTW automatically reuses the information from the previous plan, if possible. (Alternatively, with the ``guru'' interface you can apply a given plan to a different array, if you are careful. @xref{FFTW Reference}.) When you are done with the plan, you deallocate it by calling @code{fftw_destroy_plan(plan)}: @example void fftw_destroy_plan(fftw_plan plan); @end example @findex fftw_destroy_plan Arrays allocated with @code{fftw_malloc} should be deallocated by @code{fftw_free} rather than the ordinary @code{free} (or, heaven forbid, @code{delete}). @findex fftw_free The DFT results are stored in-order in the array @code{out}, with the zero-frequency (DC) component in @code{out[0]}. @cindex frequency If @code{in != out}, the transform is @dfn{out-of-place} and the input array @code{in} is not modified. Otherwise, the input array is overwritten with the transform. Users should note that FFTW computes an @emph{unnormalized} DFT. Thus, computing a forward followed by a backward transform (or vice versa) results in the original array scaled by @code{n}. For the definition of the DFT, see @ref{What FFTW Really Computes}. @cindex DFT @cindex normalization If you have a C compiler, such as @code{gcc}, that supports the recent C99 standard, and you @code{#include <complex.h>} @emph{before} @code{<fftw3.h>}, then @code{fftw_complex} is the native double-precision complex type and you can manipulate it with ordinary arithmetic. Otherwise, FFTW defines its own complex type, which is bit-compatible with the C99 complex type. @xref{Complex numbers}. (The C++ @code{<complex>} template class may also be usable via a typecast.) 
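As an illustration of the normalization convention mentioned above (a sketch, not code from the FFTW distribution, assuming @code{in} and @code{out} are @code{fftw_complex} arrays of length @code{n} allocated e.g. with @code{fftw_malloc}), a forward transform followed by a backward transform recovers the original data after dividing by @code{n}:

@example
@{
    fftw_plan fwd, bwd;
    int i;
    fwd = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    bwd = fftw_plan_dft_1d(n, out, in, FFTW_BACKWARD, FFTW_ESTIMATE);
    /* ... initialize in[] ... */
    fftw_execute(fwd);          /* in -> out (unnormalized DFT) */
    fftw_execute(bwd);          /* out -> in, now scaled by n   */
    for (i = 0; i < n; ++i) @{  /* undo the factor of n */
        in[i][0] /= n;
        in[i][1] /= n;
    @}
    fftw_destroy_plan(fwd);
    fftw_destroy_plan(bwd);
@}
@end example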
@cindex C++ Single and long-double precision versions of FFTW may be installed; to use them, replace the @code{fftw_} prefix by @code{fftwf_} or @code{fftwl_} and link with @code{-lfftw3f} or @code{-lfftw3l}, but use the @emph{same} @code{<fftw3.h>} header file. @cindex precision Many more flags exist besides @code{FFTW_MEASURE} and @code{FFTW_ESTIMATE}. For example, use @code{FFTW_PATIENT} if you're willing to wait even longer for a possibly even faster plan (@pxref{FFTW Reference}). @ctindex FFTW_PATIENT You can also save plans for future use, as described by @ref{Words of Wisdom-Saving Plans}. @c ------------------------------------------------------------ @node Complex Multi-Dimensional DFTs, One-Dimensional DFTs of Real Data, Complex One-Dimensional DFTs, Tutorial @section Complex Multi-Dimensional DFTs Multi-dimensional transforms work much the same way as one-dimensional transforms: you allocate arrays of @code{fftw_complex} (preferably using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as many times as you want with @code{fftw_execute(plan)}, and clean up with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only difference is the routine you use to create the plan: @example fftw_plan fftw_plan_dft_2d(int n0, int n1, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft(int rank, const int *n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); @end example @findex fftw_plan_dft_2d @findex fftw_plan_dft_3d @findex fftw_plan_dft These routines create plans for @code{n0} by @code{n1} two-dimensional (2d) transforms, @code{n0} by @code{n1} by @code{n2} 3d transforms, and arbitrary @code{rank}-dimensional transforms, respectively. In the @cindex rank third case, @code{n} is a pointer to an array @code{n[rank]} denoting an @code{n[0]} by @code{n[1]} by @dots{} by @code{n[rank-1]} transform. All of these transforms operate on contiguous arrays in the C-standard @dfn{row-major} order, so that the last dimension has the fastest-varying index in the array. This layout is described further in @ref{Multi-dimensional Array Format}. You may have noticed that all the planner routines described so far have overlapping functionality. For example, you can plan a 1d or 2d transform by using @code{fftw_plan_dft} with a @code{rank} of @code{1} or @code{2}, or even by calling @code{fftw_plan_dft_3d} with @code{n0} and/or @code{n1} equal to @code{1} (with no loss in efficiency). This pattern continues, and FFTW's planning routines in general form a ``partial order,'' sequences of @cindex partial order interfaces with strictly increasing generality but correspondingly greater complexity. @code{fftw_plan_dft} is the most general complex-DFT routine that we describe in this tutorial, but there are also the advanced and guru interfaces, @cindex advanced interface @cindex guru interface which allow one to efficiently combine multiple/strided transforms into a single FFTW plan, transform a subset of a larger multi-dimensional array, and/or to handle more general complex-number formats. For more information, see @ref{FFTW Reference}. 
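As an illustration (a sketch along the lines of the one-dimensional example above, not additional library API; @code{N0} and @code{N1} are placeholder sizes), a two-dimensional in-place transform might look like this, with element @math{(i,j)} of the row-major array stored at index @code{i*N1 + j}:

@example
@{
    fftw_complex *data;
    fftw_plan p;
    int i, j;
    data = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N0 * N1);
    p = fftw_plan_dft_2d(N0, N1, data, data,   /* in == out: in-place */
                         FFTW_FORWARD, FFTW_MEASURE);
    for (i = 0; i < N0; ++i)        /* initialize after planning, since  */
        for (j = 0; j < N1; ++j) @{ /* FFTW_MEASURE overwrites the array */
            data[i*N1 + j][0] = i;  /* arbitrary real part */
            data[i*N1 + j][1] = j;  /* arbitrary imaginary part */
        @}
    fftw_execute(p);
    fftw_destroy_plan(p);
    fftw_free(data);
@}
@end example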
@c ------------------------------------------------------------ @node One-Dimensional DFTs of Real Data, Multi-Dimensional DFTs of Real Data, Complex Multi-Dimensional DFTs, Tutorial @section One-Dimensional DFTs of Real Data In many practical applications, the input data @code{in[i]} are purely real numbers, in which case the DFT output satisfies the ``Hermitian'' @cindex Hermitian redundancy: @code{out[i]} is the conjugate of @code{out[n-i]}. It is possible to take advantage of these circumstances in order to achieve roughly a factor of two improvement in both speed and memory usage. In exchange for these speed and space advantages, the user sacrifices some of the simplicity of FFTW's complex transforms. First of all, the input and output arrays are of @emph{different sizes and types}: the input is @code{n} real numbers, while the output is @code{n/2+1} complex numbers (the non-redundant outputs); this also requires slight ``padding'' of the input array for @cindex padding in-place transforms. Second, the inverse transform (complex to real) has the side-effect of @emph{destroying its input array}, by default. Neither of these inconveniences should pose a serious problem for users, but it is important to be aware of them. The routines to perform real-data transforms are almost the same as those for complex transforms: you allocate arrays of @code{double} and/or @code{fftw_complex} (preferably using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as many times as you want with @code{fftw_execute(plan)}, and clean up with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only differences are that the input (or output) is of type @code{double} and there are new routines to create the plan. In one dimension: @example fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags); @end example @findex fftw_plan_dft_r2c_1d @findex fftw_plan_dft_c2r_1d for the real input to complex-Hermitian output (@dfn{r2c}) and complex-Hermitian input to real output (@dfn{c2r}) transforms. @cindex r2c @cindex c2r Unlike the complex DFT planner, there is no @code{sign} argument. Instead, r2c DFTs are always @code{FFTW_FORWARD} and c2r DFTs are always @code{FFTW_BACKWARD}. @ctindex FFTW_FORWARD @ctindex FFTW_BACKWARD (For single/long-double precision @code{fftwf} and @code{fftwl}, @code{double} should be replaced by @code{float} and @code{long double}, respectively.) @cindex precision Here, @code{n} is the ``logical'' size of the DFT, not necessarily the physical size of the array. In particular, the real (@code{double}) array has @code{n} elements, while the complex (@code{fftw_complex}) array has @code{n/2+1} elements (where the division is rounded down). For an in-place transform, @cindex in-place @code{in} and @code{out} are aliased to the same array, which must be big enough to hold both; so, the real array would actually have @code{2*(n/2+1)} elements, where the elements beyond the first @code{n} are unused padding. The @math{k}th element of the complex array is exactly the same as the @math{k}th element of the corresponding complex DFT. All positive @code{n} are supported; products of small factors are most efficient, but an @Onlogn algorithm is used even for prime sizes. As noted above, the c2r transform destroys its input array even for out-of-place transforms. 
This can be prevented, if necessary, by including @code{FFTW_PRESERVE_INPUT} in the @code{flags}, with unfortunately some sacrifice in performance. @cindex flags @ctindex FFTW_PRESERVE_INPUT This flag is also not currently supported for multi-dimensional real DFTs (next section). Readers familiar with DFTs of real data will recall that the 0th (the ``DC'') and @code{n/2}-th (the ``Nyquist'' frequency, when @code{n} is even) elements of the complex output are purely real. Some implementations therefore store the Nyquist element where the DC imaginary part would go, in order to make the input and output arrays the same size. Such packing, however, does not generalize well to multi-dimensional transforms, and the space savings are minuscule in any case; FFTW does not support it. An alternative interface for one-dimensional r2c and c2r DFTs can be found in the @samp{r2r} interface (@pxref{The Halfcomplex-format DFT}), with ``halfcomplex''-format output that @emph{is} the same size (and type) as the input array. @cindex halfcomplex format That interface, although it is not very useful for multi-dimensional transforms, may sometimes yield better performance. @c ------------------------------------------------------------ @node Multi-Dimensional DFTs of Real Data, More DFTs of Real Data, One-Dimensional DFTs of Real Data, Tutorial @section Multi-Dimensional DFTs of Real Data Multi-dimensional DFTs of real data use the following planner routines: @example fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c(int rank, const int *n, double *in, fftw_complex *out, unsigned flags); @end example @findex fftw_plan_dft_r2c_2d @findex fftw_plan_dft_r2c_3d @findex fftw_plan_dft_r2c as well as the corresponding @code{c2r} routines with the input/output types swapped. These routines work similarly to their complex analogues, except for the fact that here the complex output array is cut roughly in half and the real array requires padding for in-place transforms (as in 1d, above). As before, @code{n} is the logical size of the array, and the consequences of this on the format of the complex arrays deserve careful attention. @cindex r2c/c2r multi-dimensional array format Suppose that the real data has dimensions @ndims (in row-major order). Then, after an r2c transform, the output is an @ndimshalf array of @code{fftw_complex} values in row-major order, corresponding to slightly over half of the output of the corresponding complex DFT. (The division is rounded down.) The ordering of the data is otherwise exactly the same as in the complex-DFT case. Since the complex data is slightly larger than the real data, some complications arise for in-place transforms. In this case, the final dimension of the real data must be padded with extra values to accommodate the size of the complex data---two values if the last dimension is even and one if it is odd. @cindex padding That is, the last dimension of the real data must physically contain @tex $2 (n_{d-1}/2+1)$ @end tex @ifinfo 2 * (n[d-1]/2+1) @end ifinfo @html 2 * (n<sub>d-1</sub>/2+1) @end html @code{double} values (exactly enough to hold the complex data).
This physical array size does not, however, change the @emph{logical} array size---only @tex $n_{d-1}$ @end tex @ifinfo n[d-1] @end ifinfo @html n<sub>d-1</sub> @end html values are actually stored in the last dimension, and @tex $n_{d-1}$ @end tex @ifinfo n[d-1] @end ifinfo @html n<sub>d-1</sub> @end html is the last dimension passed to the plan-creation routine. For example, consider the transform of a two-dimensional real array of size @code{n0} by @code{n1}. The output of the r2c transform is a two-dimensional complex array of size @code{n0} by @code{n1/2+1}, where the @code{y} dimension has been cut nearly in half because of redundancies in the output. Because @code{fftw_complex} is twice the size of @code{double}, the output array is slightly bigger than the input array. Thus, if we want to compute the transform in place, we must @emph{pad} the input array so that it is of size @code{n0} by @code{2*(n1/2+1)}. If @code{n1} is even, then there are two padding elements at the end of each row (which need not be initialized, as they are only used for output). @ifnotinfo The following illustration depicts the input and output arrays just described, for both the out-of-place and in-place transforms (with the arrows indicating consecutive memory locations): @image{rfftwnd} @end ifnotinfo These transforms are unnormalized, so an r2c followed by a c2r transform (or vice versa) will result in the original data scaled by the number of real data elements---that is, the product of the (logical) dimensions of the real data. @cindex normalization (Because the last dimension is treated specially, if it is equal to @code{1} the transform is @emph{not} equivalent to a lower-dimensional r2c/c2r transform. In that case, the last complex dimension also has size @code{1} (@code{=1/2+1}), and no advantage is gained over the complex transforms.) @c ------------------------------------------------------------ @node More DFTs of Real Data, , Multi-Dimensional DFTs of Real Data, Tutorial @section More DFTs of Real Data @menu * The Halfcomplex-format DFT:: * Real even/odd DFTs (cosine/sine transforms):: * The Discrete Hartley Transform:: @end menu FFTW supports several other transform types via a unified @dfn{r2r} (real-to-real) interface, @cindex r2r so called because it takes a real (@code{double}) array and outputs a real array of the same size. These r2r transforms currently fall into three categories: DFTs of real input and complex-Hermitian output in halfcomplex format, DFTs of real input with even/odd symmetry (a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete Hartley transforms (DHTs), all described in more detail by the following sections. The r2r transforms follow the by now familiar interface of creating an @code{fftw_plan}, executing it with @code{fftw_execute(plan)}, and destroying it with @code{fftw_destroy_plan(plan)}. 
Furthermore, all r2r transforms share the same planner interface: @example fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, fftw_r2r_kind kind, unsigned flags); fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out, fftw_r2r_kind kind0, fftw_r2r_kind kind1, unsigned flags); fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2, double *in, double *out, fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2, unsigned flags); fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, const fftw_r2r_kind *kind, unsigned flags); @end example @findex fftw_plan_r2r_1d @findex fftw_plan_r2r_2d @findex fftw_plan_r2r_3d @findex fftw_plan_r2r Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional transforms for contiguous arrays in row-major order, transforming (real) input to output of the same size, where @code{n} specifies the @emph{physical} dimensions of the arrays. All positive @code{n} are supported (with the exception of @code{n=1} for the @code{FFTW_REDFT00} kind, noted in the real-even subsection below); products of small factors are most efficient (factorizing @code{n-1} and @code{n+1} for @code{FFTW_REDFT00} and @code{FFTW_RODFT00} kinds, described below), but an @Onlogn algorithm is used even for prime sizes. Each dimension has a @dfn{kind} parameter, of type @code{fftw_r2r_kind}, specifying the kind of r2r transform to be used for that dimension. @cindex kind (r2r) @tindex fftw_r2r_kind (In the case of @code{fftw_plan_r2r}, this is an array @code{kind[rank]} where @code{kind[i]} is the transform kind for the dimension @code{n[i]}.) The kind can be one of a set of predefined constants, defined in the following subsections. In other words, FFTW computes the separable product of the specified r2r transforms over each dimension, which can be used e.g. for partial differential equations with mixed boundary conditions. (For some r2r kinds, notably the halfcomplex DFT and the DHT, such a separable product is somewhat problematic in more than one dimension, however, as is described below.) In the current version of FFTW, all r2r transforms except for the halfcomplex type are computed via pre- or post-processing of halfcomplex transforms, and they are therefore not as fast as they could be. Since most other general DCT/DST codes employ a similar algorithm, however, FFTW's implementation should provide at least competitive performance. @c =========> @node The Halfcomplex-format DFT, Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data, More DFTs of Real Data @subsection The Halfcomplex-format DFT An r2r kind of @code{FFTW_R2HC} (@dfn{r2hc}) corresponds to an r2c DFT @ctindex FFTW_R2HC @cindex r2c @cindex r2hc (@pxref{One-Dimensional DFTs of Real Data}) but with ``halfcomplex'' format output, and may sometimes be faster and/or more convenient than the latter. @cindex halfcomplex format The inverse @dfn{hc2r} transform is of kind @code{FFTW_HC2R}. 
@ctindex FFTW_HC2R @cindex hc2r This consists of the non-redundant half of the complex output for a 1d real-input DFT of size @code{n}, stored as a sequence of @code{n} real numbers (@code{double}) in the format: @tex $$ r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1 $$ @end tex @ifinfo r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1 @end ifinfo @html <p align=center> r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub> </p> @end html Here, @ifinfo rk @end ifinfo @tex $r_k$ @end tex @html r<sub>k</sub> @end html is the real part of the @math{k}th output, and @ifinfo ik @end ifinfo @tex $i_k$ @end tex @html i<sub>k</sub> @end html is the imaginary part. (Division by 2 is rounded down.) For a halfcomplex array @code{hc[n]}, the @math{k}th component thus has its real part in @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with the exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter only if @code{n} is even)---in these two cases, the imaginary part is zero due to symmetries of the real-input DFT, and is not stored. Thus, the r2hc transform of @code{n} real values is a halfcomplex array of length @code{n}, and vice versa for hc2r. @cindex normalization Aside from the differing format, the output of @code{FFTW_R2HC}/@code{FFTW_HC2R} is otherwise exactly the same as for the corresponding 1d r2c/c2r transform (i.e. @code{FFTW_FORWARD}/@code{FFTW_BACKWARD} transforms, respectively). Recall that these transforms are unnormalized, so r2hc followed by hc2r will result in the original data multiplied by @code{n}. Furthermore, like the c2r transform, an out-of-place hc2r transform will @emph{destroy its input} array. Although these halfcomplex transforms can be used with the multi-dimensional r2r interface, the interpretation of such a separable product of transforms along each dimension is problematic. For example, consider a two-dimensional @code{n0} by @code{n1}, r2hc by r2hc transform planned by @code{fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC, FFTW_R2HC, FFTW_MEASURE)}. Conceptually, FFTW first transforms the rows (of size @code{n1}) to produce halfcomplex rows, and then transforms the columns (of size @code{n0}). Half of these column transforms, however, are of imaginary parts, and should therefore be multiplied by @math{i} and combined with the r2hc transforms of the real columns to produce the 2d DFT amplitudes; FFTW's r2r transform does @emph{not} perform this combination for you. Thus, if a multi-dimensional real-input/output DFT is required, we recommend using the ordinary r2c/c2r interface (@pxref{Multi-Dimensional DFTs of Real Data}). @c =========> @node Real even/odd DFTs (cosine/sine transforms), The Discrete Hartley Transform, The Halfcomplex-format DFT, More DFTs of Real Data @subsection Real even/odd DFTs (cosine/sine transforms) The Fourier transform of a real-even function @math{f(-x) = f(x)} is real-even, and @math{i} times the Fourier transform of a real-odd function @math{f(-x) = -f(x)} is real-odd. Similar results hold for a discrete Fourier transform, and thus for these symmetries the need for complex inputs/outputs is entirely eliminated. Moreover, one gains a factor of two in speed/space from the fact that the data are real, and an additional factor of two from the even/odd symmetry: only the non-redundant (first) half of the array need be stored.
The result is the real-even DFT (@dfn{REDFT}) and the real-odd DFT (@dfn{RODFT}), also known as the discrete cosine and sine transforms (@dfn{DCT} and @dfn{DST}), respectively. @cindex real-even DFT @cindex REDFT @cindex real-odd DFT @cindex RODFT @cindex discrete cosine transform @cindex DCT @cindex discrete sine transform @cindex DST (In this section, we describe the 1d transforms; multi-dimensional transforms are just a separable product of these transforms operating along each dimension.) Because of the discrete sampling, one has an additional choice: is the data even/odd around a sampling point, or around the point halfway between two samples? The latter corresponds to @emph{shifting} the samples by @emph{half} an interval, and gives rise to several transform variants denoted by REDFT@math{ab} and RODFT@math{ab}: @math{a} and @math{b} are @math{0} or @math{1}, and indicate whether the input (@math{a}) and/or output (@math{b}) are shifted by half a sample (@math{1} means it is shifted). These are also known as types I-IV of the DCT and DST, and all four types are supported by FFTW's r2r interface.@footnote{There are also type V-VIII transforms, which correspond to a logical DFT of @emph{odd} size @math{N}, independent of whether the physical size @code{n} is odd, but we do not support these variants.} The r2r kinds for the various REDFT and RODFT types supported by FFTW, along with the boundary conditions at both ends of the @emph{input} array (@code{n} real numbers @code{in[j=0..n-1]}), are: @itemize @bullet @item @code{FFTW_REDFT00} (DCT-I): even around @math{j=0} and even around @math{j=n-1}. @ctindex FFTW_REDFT00 @item @code{FFTW_REDFT10} (DCT-II, ``the'' DCT): even around @math{j=-0.5} and even around @math{j=n-0.5}. @ctindex FFTW_REDFT10 @item @code{FFTW_REDFT01} (DCT-III, ``the'' IDCT): even around @math{j=0} and odd around @math{j=n}. @ctindex FFTW_REDFT01 @cindex IDCT @item @code{FFTW_REDFT11} (DCT-IV): even around @math{j=-0.5} and odd around @math{j=n-0.5}. @ctindex FFTW_REDFT11 @item @code{FFTW_RODFT00} (DST-I): odd around @math{j=-1} and odd around @math{j=n}. @ctindex FFTW_RODFT00 @item @code{FFTW_RODFT10} (DST-II): odd around @math{j=-0.5} and odd around @math{j=n-0.5}. @ctindex FFTW_RODFT10 @item @code{FFTW_RODFT01} (DST-III): odd around @math{j=-1} and even around @math{j=n-1}. @ctindex FFTW_RODFT01 @item @code{FFTW_RODFT11} (DST-IV): odd around @math{j=-0.5} and even around @math{j=n-0.5}. @ctindex FFTW_RODFT11 @end itemize Note that these symmetries apply to the ``logical'' array being transformed; @strong{there are no constraints on your physical input data}. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the data @math{abcde}, it corresponds to the DFT of the logical even array @math{abcdedcb} of size 8. A size-4 REDFT10 (DCT-II) of the data @math{abcd} corresponds to the size-8 logical DFT of the even array @math{abcddcba}, shifted by half a sample. All of these transforms are invertible. The inverse of R*DFT00 is R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called simply ``the'' DCT and IDCT, respectively); and of R*DFT11 is R*DFT11. However, the transforms computed by FFTW are unnormalized, exactly like the corresponding real and complex DFTs, so computing a transform followed by its inverse yields the original array scaled by @math{N}, where @math{N} is the @emph{logical} DFT size. For REDFT00, @math{N=2(n-1)}; for RODFT00, @math{N=2(n+1)}; otherwise, @math{N=2n}. 
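As an illustration of this normalization (a sketch, not code from the FFTW distribution, assuming @code{in} and @code{out} are @code{double} arrays of length @code{n}), a REDFT10 (DCT-II) followed by a REDFT01 (DCT-III) of the same physical size @code{n} returns the original data multiplied by @math{N=2n}:

@example
@{
    fftw_plan dct, idct;
    int j;
    dct  = fftw_plan_r2r_1d(n, in, out, FFTW_REDFT10, FFTW_ESTIMATE);
    idct = fftw_plan_r2r_1d(n, out, in, FFTW_REDFT01, FFTW_ESTIMATE);
    /* ... initialize in[0..n-1] ... */
    fftw_execute(dct);             /* "the" DCT  */
    fftw_execute(idct);            /* "the" IDCT */
    for (j = 0; j < n; ++j)
        in[j] /= 2.0 * n;          /* undo the logical DFT size N = 2n */
    fftw_destroy_plan(dct);
    fftw_destroy_plan(idct);
@}
@end example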
@cindex normalization @cindex IDCT Note that the boundary conditions of the transform output array are given by the input boundary conditions of the inverse transform. Thus, the above transforms are all inequivalent in terms of input/output boundary conditions, even neglecting the 0.5 shift difference. FFTW is most efficient when @math{N} is a product of small factors; note that this @emph{differs} from the factorization of the physical size @code{n} for REDFT00 and RODFT00! There is another oddity: @code{n=1} REDFT00 transforms correspond to @math{N=0}, and so are @emph{not defined} (the planner will return @code{NULL}). Otherwise, any positive @code{n} is supported. For the precise mathematical definitions of these transforms as used by FFTW, see @ref{What FFTW Really Computes}. (For people accustomed to the DCT/DST, FFTW's definitions have a coefficient of @math{2} in front of the cos/sin functions so that they correspond precisely to an even/odd DFT of size @math{N}. Some authors also include additional multiplicative factors of @ifinfo sqrt(2) @end ifinfo @html √2 @end html @tex $\sqrt{2}$ @end tex for selected inputs and outputs; this makes the transform orthogonal, but sacrifices the direct equivalence to a symmetric DFT.) @subsubheading Which type do you need? Since the required flavor of even/odd DFT depends upon your problem, you are the best judge of this choice, but we can make a few comments on relative efficiency to help you in your selection. In particular, R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11 (especially for odd sizes), while the R*DFT00 transforms are sometimes significantly slower (especially for even sizes).@footnote{R*DFT00 is sometimes slower in FFTW because we discovered that the standard algorithm for computing this by a pre/post-processed real DFT---the algorithm used in FFTPACK, Numerical Recipes, and other sources for decades now---has serious numerical problems: it already loses several decimal places of accuracy for 16k sizes. There seem to be only two alternatives in the literature that do not suffer similarly: a recursive decomposition into smaller DCTs, which would require a large set of codelets for efficiency and generality, or sacrificing a factor of @tex $\sim 2$ @end tex @ifnottex ~2 @end ifnottex in speed to use a real DFT of twice the size. We currently employ the latter technique for general @math{n}, as well as a limited form of the former method: a split-radix decomposition when @math{n} is odd (@math{N} a multiple of 4). For @math{N} containing many factors of 2, the split-radix method seems to recover most of the speed of the standard algorithm without the accuracy tradeoff.} Thus, if only the boundary conditions on the transform inputs are specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over R*DFT11 (unless the half-sample shift or the self-inverse property is significant for your problem). If performance is important to you and you are using only small sizes (say @math{n<200}), e.g. for multi-dimensional transforms, then you might consider generating hard-coded transforms of those sizes and types that you are interested in (@pxref{Generating your own code}). We are interested in hearing what types of symmetric transforms you find most useful. @c =========> @node The Discrete Hartley Transform, , Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data @subsection The Discrete Hartley Transform The discrete Hartley transform (DHT) is an invertible linear transform closely related to the DFT. 
In the DFT, one multiplies each input by @math{cos - i * sin} (a complex exponential), whereas in the DHT each input is multiplied by simply @math{cos + sin}. Thus, the DHT transforms @code{n} real numbers to @code{n} real numbers, and has the convenient property of being its own inverse. In FFTW, a DHT (of any positive @code{n}) can be specified by an r2r kind of @code{FFTW_DHT}. @ctindex FFTW_DHT @cindex discrete Hartley transform @cindex DHT If you are planning to use the DHT because you've heard that it is ``faster'' than the DFT (FFT), @strong{stop here}. That story is an old but enduring misconception that was debunked in 1987: a properly designed real-input FFT (such as FFTW's) has no more operations in general than an FHT. Moreover, in FFTW, the DHT is ordinarily @emph{slower} than the DFT for composite sizes (see below). Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of size @code{n} followed by another DHT of the same size will result in the original array multiplied by @code{n}. @cindex normalization The DHT was originally proposed as a more efficient alternative to the DFT for real data, but it was subsequently shown that a specialized DFT (such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW, the DHT is actually computed by post-processing an r2hc transform, so there is ordinarily no reason to prefer it from a performance perspective.@footnote{We provide the DHT mainly as a byproduct of some internal algorithms. FFTW computes a real input/output DFT of @emph{prime} size by re-expressing it as a DHT plus post/pre-processing and then using Rader's prime-DFT algorithm adapted to the DHT.} However, we have heard rumors that the DHT might be the most appropriate transform in its own right for certain applications, and we would be very interested to hear from anyone who finds it useful. If @code{FFTW_DHT} is specified for multiple dimensions of a multi-dimensional transform, FFTW computes the separable product of 1d DHTs along each dimension. Unfortunately, this is not quite the same thing as a true multi-dimensional DHT; you can compute the latter, if necessary, with at most @code{rank-1} post-processing passes [see e.g. H. Hao and R. N. Bracewell, @i{Proc. IEEE} @b{75}, 264--266 (1987)]. For the precise mathematical definition of the DHT as used by FFTW, see @ref{What FFTW Really Computes}. @c ************************************************************ @node Other Important Topics, FFTW Reference, Tutorial, Top @chapter Other Important Topics @menu * Data Alignment:: * Multi-dimensional Array Format:: * Words of Wisdom-Saving Plans:: * Caveats in Using Wisdom:: @end menu @c ------------------------------------------------------------ @node Data Alignment, Multi-dimensional Array Format, Other Important Topics, Other Important Topics @section Data Alignment @cindex alignment @menu * SIMD alignment and fftw_malloc:: * Stack alignment on x86:: @end menu In order to get the best performance from FFTW, one needs to be somewhat aware of two problems related to data alignment on x86 (Pentia) architectures: alignment of allocated arrays (for use with SIMD acceleration), and alignment of the stack. @c =========> @node SIMD alignment and fftw_malloc, Stack alignment on x86, Data Alignment, Data Alignment @subsection SIMD alignment and fftw_malloc SIMD, which stands for ``Single Instruction Multiple Data,'' is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. 
SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2 (single/double precision) on Pentium III and higher and on AMD64, AltiVec (single precision) on some PowerPCs (Apple G4 and higher), and MIPS Paired Single. FFTW can be compiled to support the SIMD instructions on any of these systems. @cindex SIMD @cindex SSE @cindex SSE2 @cindex AltiVec @cindex MIPS PS @cindex precision A program linking to an FFTW library compiled with SIMD support can obtain a nonnegligible speedup for most complex and r2c/c2r transforms. In order to obtain this speedup, however, the arrays of complex (or real) data passed to FFTW must be specially aligned in memory (typically 16-byte aligned), and often this alignment is more stringent than that provided by the usual @code{malloc} (etc.) allocation routines. @cindex portability In order to guarantee proper alignment for SIMD, therefore, in case your program is ever linked against a SIMD-using FFTW, we recommend allocating your transform data with @code{fftw_malloc} and de-allocating it with @code{fftw_free}. @findex fftw_malloc @findex fftw_free These have exactly the same interface and behavior as @code{malloc}/@code{free}, except that for a SIMD FFTW they ensure that the returned pointer has the necessary alignment (by calling @code{memalign} or its equivalent on your OS). You are not @emph{required} to use @code{fftw_malloc}. You can allocate your data in any way that you like, from @code{malloc} to @code{new} (in C++) to a fixed-size array declaration. If the array happens not to be properly aligned, FFTW will not use the SIMD extensions. @cindex C++ @c =========> @node Stack alignment on x86, , SIMD alignment and fftw_malloc, Data Alignment @subsection Stack alignment on x86 On the Pentium and subsequent x86 processors, there is a substantial performance penalty if double-precision variables are not stored 8-byte aligned; a factor of two or more is not unusual. Unfortunately, the stack (the place that local variables and subroutine arguments live) is not guaranteed by the Intel ABI to be 8-byte aligned. Recent versions of @code{gcc} (as well as most other compilers, we are told, such as Intel's, Metrowerks', and Microsoft's) are able to keep the stack 8-byte aligned; @code{gcc} does this by default (see @code{-mpreferred-stack-boundary} in the @code{gcc} documentation). If you are not certain whether your compiler maintains stack alignment by default, it is a good idea to make sure. Unfortunately, @code{gcc} only @emph{preserves} the stack alignment---as a result, if the stack starts off misaligned, it will always be misaligned, with a disastrous effect on performance (in double precision). To prevent this, FFTW includes hacks to align its own stack if necessary, so it should perform well even if you call it from a program with a misaligned stack. Currently, our hacks support @code{gcc} and the Intel C compiler; if you use another compiler you are on your own. Fortunately, recent versions of glibc (on GNU/Linux) provide a properly-aligned starting stack, but this was not the case with a number of older versions, and we are not certain of the situation on other operating systems. Hopefully, as time goes by this will become less of a concern. @c ------------------------------------------------------------ @node Multi-dimensional Array Format, Words of Wisdom-Saving Plans, Data Alignment, Other Important Topics @section Multi-dimensional Array Format This section describes the format in which multi-dimensional arrays are stored in FFTW. 
We felt that a detailed discussion of this topic was necessary. Since several different formats are common, this topic is often a source of confusion among users. @menu * Row-major Format:: * Column-major Format:: * Fixed-size Arrays in C:: * Dynamic Arrays in C:: * Dynamic Arrays in C-The Wrong Way:: @end menu @c =========> @node Row-major Format, Column-major Format, Multi-dimensional Array Format, Multi-dimensional Array Format @subsection Row-major Format @cindex row-major The multi-dimensional arrays passed to @code{fftw_plan_dft} etcetera are expected to be stored as a single contiguous block in @dfn{row-major} order (sometimes called ``C order''). Basically, this means that as you step through adjacent memory locations, the first dimension's index varies most slowly and the last dimension's index varies most quickly. To be more explicit, let us consider an array of rank @math{d} whose dimensions are @ndims{}. Now, we specify a location in the array by a sequence of @math{d} (zero-based) indices, one for each dimension: @tex $(i_0, i_1, i_2, \ldots, i_{d-1})$. @end tex @ifinfo (i[0], i[1], ..., i[d-1]). @end ifinfo @html (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>,..., i<sub>d-1</sub>). @end html If the array is stored in row-major order, then this element is located at the position @tex $i_{d-1} + n_{d-1} (i_{d-2} + n_{d-2} (\ldots + n_1 i_0))$. @end tex @ifinfo i[d-1] + n[d-1] * (i[d-2] + n[d-2] * (... + n[1] * i[0])). @end ifinfo @html i<sub>d-1</sub> + n<sub>d-1</sub> * (i<sub>d-2</sub> + n<sub>d-2</sub> * (... + n<sub>1</sub> * i<sub>0</sub>)). @end html Note that, for the ordinary complex DFT, each element of the array must be of type @code{fftw_complex}; i.e. a (real, imaginary) pair of (double-precision) numbers. In the advanced FFTW interface, the physical dimensions @math{n} from which the indices are computed can be different from (larger than) the logical dimensions of the transform to be computed, in order to transform a subset of a larger array. @cindex advanced interface Note also that, in the advanced interface, the expression above is multiplied by a @dfn{stride} to get the actual array index---this is useful in situations where each element of the multi-dimensional array is actually a data structure (or another array), and you just want to transform a single field. In the basic interface, however, the stride is 1. @cindex stride @c =========> @node Column-major Format, Fixed-size Arrays in C, Row-major Format, Multi-dimensional Array Format @subsection Column-major Format @cindex column-major Readers from the Fortran world are used to arrays stored in @dfn{column-major} order (sometimes called ``Fortran order''). This is essentially the exact opposite of row-major order in that, here, the @emph{first} dimension's index varies most quickly. If you have an array stored in column-major order and wish to transform it using FFTW, it is quite easy to do. When creating the plan, simply pass the dimensions of the array to the planner in @emph{reverse order}. For example, if your array is a rank three @code{N x M x L} matrix in column-major order, you should pass the dimensions of the array as if it were an @code{L x M x N} matrix (which it is, from the perspective of FFTW). This is done for you @emph{automatically} by the FFTW Fortran interface (@pxref{Calling FFTW from Fortran}). 
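For example (an illustrative sketch; @code{data}, @code{N}, @code{M}, and @code{L} are placeholders), to transform a column-major @threedims{N,M,L} complex array, perhaps one filled in by Fortran code, you would pass the dimensions reversed:

@example
@{
    fftw_plan p;
    /* data holds an N x M x L array stored in column-major order */
    p = fftw_plan_dft_3d(L, M, N,   /* dimensions passed in reverse */
                         data, data, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);
@}
@end example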
@cindex Fortran interface
@c =========>
@node Fixed-size Arrays in C, Dynamic Arrays in C, Column-major Format, Multi-dimensional Array Format
@subsection Fixed-size Arrays in C
@cindex C multi-dimensional arrays

A multi-dimensional array whose size is declared at compile time in C is @emph{already} in row-major order.  You don't have to do anything special to transform it.  For example:

@example
@{
     fftw_complex data[N0][N1][N2];
     fftw_plan plan;
     ...
     plan = fftw_plan_dft_3d(N0, N1, N2, &data[0][0][0], &data[0][0][0],
                             FFTW_FORWARD, FFTW_ESTIMATE);
     ...
@}
@end example

This will plan a 3d in-place transform of size @code{N0 x N1 x N2}.  Notice how we took the address of the zero-th element to pass to the planner (we could also have used a typecast).

However, we tend to @emph{discourage} users from declaring their arrays in this way, for two reasons.  First, this allocates the array on the stack (``automatic'' storage), which has a very limited size on most operating systems (declaring an array with more than a few thousand elements will often cause a crash).  (You can get around this limitation on many systems by declaring the array as @code{static} and/or global, but that has its own drawbacks.)  Second, it may not optimally align the array for use with a SIMD FFTW (@pxref{SIMD alignment and fftw_malloc}).  Instead, we recommend using @code{fftw_malloc}, as described below.

@c =========>
@node Dynamic Arrays in C, Dynamic Arrays in C-The Wrong Way, Fixed-size Arrays in C, Multi-dimensional Array Format
@subsection Dynamic Arrays in C

We recommend allocating most arrays dynamically, with @code{fftw_malloc}.  This isn't too hard to do, although it is not as straightforward for multi-dimensional arrays as it is for one-dimensional arrays.

Creating the array is simple: using a dynamic-allocation routine like @code{fftw_malloc}, allocate an array big enough to store N @code{fftw_complex} values (for a complex DFT), where N is the product of the sizes of the array dimensions (i.e. the total number of complex values in the array).  For example, here is code to allocate a @threedims{5,12,27} rank-3 array:
@findex fftw_malloc

@example
fftw_complex *an_array;
an_array = (fftw_complex*) fftw_malloc(5*12*27 * sizeof(fftw_complex));
@end example

Accessing the array elements, however, is more tricky---you can't simply use multiple applications of the @samp{[]} operator like you could for fixed-size arrays.  Instead, you have to explicitly compute the offset into the array using the formula given earlier for row-major arrays.  For example, to reference the @math{(i,j,k)}-th element of the array allocated above, you would use the expression @code{an_array[k + 27 * (j + 12 * i)]}.

This pain can be alleviated somewhat by defining appropriate macros, or, in C++, creating a class and overloading the @samp{()} operator.  The recent C99 standard provides a way to reinterpret the dynamic array as a ``variable-length'' multi-dimensional array amenable to @samp{[]}, but this feature is not yet widely supported by compilers.
@cindex C99
@cindex C++

@c =========>
@node Dynamic Arrays in C-The Wrong Way, , Dynamic Arrays in C, Multi-dimensional Array Format
@subsection Dynamic Arrays in C---The Wrong Way

A different method for allocating multi-dimensional arrays in C is often suggested that is incompatible with FFTW: @emph{using it will cause FFTW to die a painful death}.  We discuss the technique here, however, because it is so commonly known and used.
This method is to create arrays of pointers of arrays of pointers of @dots{}etcetera. For example, the analogue in this method to the example above is: @example int i,j; fftw_complex ***a_bad_array; /* @r{another way to make a 5x12x27 array} */ a_bad_array = (fftw_complex ***) malloc(5 * sizeof(fftw_complex **)); for (i = 0; i < 5; ++i) @{ a_bad_array[i] = (fftw_complex **) malloc(12 * sizeof(fftw_complex *)); for (j = 0; j < 12; ++j) a_bad_array[i][j] = (fftw_complex *) malloc(27 * sizeof(fftw_complex)); @} @end example As you can see, this sort of array is inconvenient to allocate (and deallocate). On the other hand, it has the advantage that the @math{(i,j,k)}-th element can be referenced simply by @code{a_bad_array[i][j][k]}. If you like this technique and want to maximize convenience in accessing the array, but still want to pass the array to FFTW, you can use a hybrid method. Allocate the array as one contiguous block, but also declare an array of arrays of pointers that point to appropriate places in the block. That sort of trick is beyond the scope of this documentation; for more information on multi-dimensional arrays in C, see the @code{comp.lang.c} @uref{http://www.eskimo.com/~scs/C-faq/s6.html, FAQ}. @c ------------------------------------------------------------ @node Words of Wisdom-Saving Plans, Caveats in Using Wisdom, Multi-dimensional Array Format, Other Important Topics @section Words of Wisdom---Saving Plans @cindex wisdom @cindex saving plans to disk FFTW implements a method for saving plans to disk and restoring them. In fact, what FFTW does is more general than just saving and loading plans. The mechanism is called @dfn{wisdom}. Here, we describe this feature at a high level. @xref{FFTW Reference}, for a less casual but more complete discussion of how to use wisdom in FFTW. Plans created with the @code{FFTW_MEASURE}, @code{FFTW_PATIENT}, or @code{FFTW_EXHAUSTIVE} options produce near-optimal FFT performance, but may require a long time to compute because FFTW must measure the runtime of many possible plans and select the best one. This setup is designed for the situations where so many transforms of the same size must be computed that the start-up time is irrelevant. For short initialization times, but slower transforms, we have provided @code{FFTW_ESTIMATE}. The @code{wisdom} mechanism is a way to get the best of both worlds: you compute a good plan once, save it to disk, and later reload it as many times as necessary. The wisdom mechanism can actually save and reload many plans at once, not just one. @ctindex FFTW_MEASURE @ctindex FFTW_PATIENT @ctindex FFTW_EXHAUSTIVE @ctindex FFTW_ESTIMATE Whenever you create a plan, the FFTW planner accumulates wisdom, which is information sufficient to reconstruct the plan. After planning, you can save this information to disk by means of the function: @example void fftw_export_wisdom_to_file(FILE *output_file); @end example @findex fftw_export_wisdom_to_file The next time you run the program, you can restore the wisdom with @code{fftw_import_wisdom_from_file} (which returns non-zero on success), and then recreate the plan using the same flags as before. @example int fftw_import_wisdom_from_file(FILE *input_file); @end example @findex fftw_import_wisdom_from_file Wisdom is automatically used for any size to which it is applicable, as long as the planner flags are not more ``patient'' than those with which the wisdom was created. 
For example, wisdom created with @code{FFTW_MEASURE} can be used if you later plan with @code{FFTW_ESTIMATE} or @code{FFTW_MEASURE}, but not with @code{FFTW_PATIENT}. The @code{wisdom} is cumulative, and is stored in a global, private data structure managed internally by FFTW. The storage space required is minimal, proportional to the logarithm of the sizes the wisdom was generated from. If memory usage is a concern, however, the wisdom can be forgotten and its associated memory freed by calling: @example void fftw_forget_wisdom(void); @end example @findex fftw_forget_wisdom Wisdom can be exported to a file, a string, or any other medium. For details, see @ref{Wisdom}. @node Caveats in Using Wisdom, , Words of Wisdom-Saving Plans, Other Important Topics @section Caveats in Using Wisdom @cindex wisdom, problems with @quotation @html <i> @end html For in much wisdom is much grief, and he that increaseth knowledge increaseth sorrow. @html </i> @end html [Ecclesiastes 1:18] @cindex Ecclesiastes @end quotation @iftex @medskip @end iftex @cindex portability There are pitfalls to using wisdom, in that it can negate FFTW's ability to adapt to changing hardware and other conditions. For example, it would be perfectly possible to export wisdom from a program running on one processor and import it into a program running on another processor. Doing so, however, would mean that the second program would use plans optimized for the first processor, instead of the one it is running on. It should be safe to reuse wisdom as long as the hardware and program binaries remain unchanged. (Actually, the optimal plan may change even between runs of the same binary on identical hardware, due to differences in the virtual memory environment, etcetera. Users seriously interested in performance should worry about this problem, too.) It is likely that, if the same wisdom is used for two different program binaries, even running on the same machine, the plans may be sub-optimal because of differing code alignments. It is therefore wise to recreate wisdom every time an application is recompiled. The more the underlying hardware and software changes between the creation of wisdom and its use, the greater grows the risk of sub-optimal plans. Nevertheless, if the choice is between using @code{FFTW_ESTIMATE} or using possibly-suboptimal wisdom (created on the same machine, but for a different binary), the wisdom is likely to be better. For this reason, we provide a function to import wisdom from a standard system-wide location (@code{/etc/fftw/wisdom} on Unix): @cindex wisdom, system-wide @example int fftw_import_system_wisdom(void); @end example @findex fftw_import_system_wisdom FFTW also provides a standalone program, @code{fftw-wisdom} (described by its own @code{man} page on Unix) with which users can create wisdom, e.g. for a canonical set of sizes to store in the system wisdom file. @xref{Wisdom Utilities}. @cindex fftw-wisdom utility @c ************************************************************ @node FFTW Reference, Multi-threaded FFTW, Other Important Topics, Top @chapter FFTW Reference This chapter provides a complete reference for all sequential (i.e., one-processor) FFTW functions. Parallel transforms are described in later chapters. 
@menu * Data Types and Files:: * Using Plans:: * Basic Interface:: * Advanced Interface:: * Guru Interface:: * New-array Execute Functions:: * Wisdom:: * What FFTW Really Computes:: @end menu @c ------------------------------------------------------------ @node Data Types and Files, Using Plans, FFTW Reference, FFTW Reference @section Data Types and Files All programs using FFTW should include its header file: @example #include <fftw3.h> @end example You must also link to the FFTW library. On Unix, this means adding @code{-lfftw3 -lm} at the @emph{end} of the link command. @menu * Complex numbers:: * Precision:: * Memory Allocation:: @end menu @c =========> @node Complex numbers, Precision, Data Types and Files, Data Types and Files @subsection Complex numbers The default FFTW interface uses @code{double} precision for all floating-point numbers, and defines a @code{fftw_complex} type to hold complex numbers as: @example typedef double fftw_complex[2]; @end example @tindex fftw_complex Here, the @code{[0]} element holds the real part and the @code{[1]} element holds the imaginary part. Alternatively, if you have a C compiler (such as @code{gcc}) that supports the C99 revision of the ANSI C standard, you can use C's new native complex type (which is binary-compatible with the typedef above). In particular, if you @code{#include <complex.h>} @emph{before} @code{<fftw3.h>}, then @code{fftw_complex} is defined to be the native complex type and you can manipulate it with ordinary arithmetic (e.g. @code{x = y * (3+4*I)}, where @code{x} and @code{y} are @code{fftw_complex} and @code{I} is the standard symbol for the imaginary unit); @cindex C99 C++ has its own @code{complex<T>} template class, defined in the standard @code{<complex>} header file. Reportedly, the C++ standards committee has recently agreed to mandate that the storage format used for this type be binary-compatible with the C99 type, i.e. an array @code{T[2]} with consecutive real @code{[0]} and imaginary @code{[1]} parts. (See report @uref{http://anubis.dkuug.dk/JTC1/SC22/WG21/docs/papers/2002/1388.pdf, WG21/N1388}.) Although not part of the official standard as of this writing, the proposal stated that: ``This solution has been tested with all current major implementations of the standard library and shown to be working.'' To the extent that this is true, if you have a variable @code{complex<double> *x}, you can pass it directly to FFTW via @code{reinterpret_cast<fftw_complex*>(x)}. @cindex C++ @cindex portability @c =========> @node Precision, Memory Allocation, Complex numbers, Data Types and Files @subsection Precision @cindex precision You can install single and long-double precision versions of FFTW, which replace @code{double} with @code{float} and @code{long double}, respectively (@pxref{Installation and Customization}). To use these interfaces, you: @itemize @bullet @item Link to the single/long-double libraries; on Unix, @code{-lfftw3f} or @code{-lfftw3l} instead of (or in addition to) @code{-lfftw3}. (You can link to the different-precision libraries simultaneously.) @item Include the @emph{same} @code{<fftw3.h>} header file. @item Replace all lowercase instances of @samp{fftw_} with @samp{fftwf_} or @code{fftwl_} for single or long-double precision, respectively. (@code{fftw_complex} becomes @code{fftwf_complex}, @code{fftw_execute} becomes @code{fftwf_execute}, etcetera.) @item Uppercase names, i.e. names beginning with @samp{FFTW_}, remain the same. 
@item Replace @code{double} with @code{float} or @code{long double} for subroutine parameters. @end itemize Depending upon your compiler and/or hardware, @code{long double} may not be any more precise than @code{double} (or may not be supported at all, although it is standard in C99). @cindex C99 @c =========> @node Memory Allocation, , Precision, Data Types and Files @subsection Memory Allocation @example void *fftw_malloc(size_t n); void fftw_free(void *p); @end example @findex fftw_malloc @findex fftw_free These are functions that behave identically to @code{malloc} and @code{free}, except that they guarantee that the returned pointer obeys any special alignment restrictions imposed by any algorithm in FFTW (e.g. for SIMD acceleration). @xref{Data Alignment}. @cindex alignment Data allocated by @code{fftw_malloc} @emph{must} be deallocated by @code{fftw_free} and not by the ordinary @code{free}. These routines simply call through to your operating system's @code{malloc} or, if necessary, its aligned equivalent (e.g. @code{memalign}), so you normally need not worry about any significant time or space overhead. You are @emph{not required} to use them to allocate your data, but we strongly recommend it. Note: in C++, just as with ordinary @code{malloc}, you must typecast the output of @code{fftw_malloc} to whatever pointer type you are allocating. @cindex C++ @c ------------------------------------------------------------ @node Using Plans, Basic Interface, Data Types and Files, FFTW Reference @section Using Plans Plans for all transform types in FFTW are stored as type @code{fftw_plan} (an opaque pointer type), and are created by one of the various planning routines described in the following sections. @tindex fftw_plan An @code{fftw_plan} contains all information necessary to compute the transform, including the pointers to the input and output arrays. @example void fftw_execute(const fftw_plan plan); @end example @findex fftw_execute This executes the @code{plan}, to compute the corresponding transform on the arrays for which it was planned (which must still exist). The plan is not modified, and @code{fftw_execute} can be called as many times as desired. To apply a given plan to a different array, you can use the new-array execute interface. @xref{New-array Execute Functions}. @code{fftw_execute} (and equivalents) is the only function in FFTW guaranteed to be thread-safe; see @ref{Thread safety}. This function: @example void fftw_destroy_plan(fftw_plan plan); @end example @findex fftw_destroy_plan deallocates the @code{plan} and all its associated data. FFTW's planner saves some other persistent data, such as the accumulated wisdom and a list of algorithms available in the current configuration. If you want to deallocate all of that and reset FFTW to the pristine state it was in when you started your program, you can call: @example void fftw_cleanup(void); @end example @findex fftw_cleanup After calling @code{fftw_cleanup}, all existing plans become undefined, and you should not attempt to execute them nor to destroy them. You can however create and execute/destroy new plans, in which case FFTW starts accumulating wisdom information again. @code{fftw_cleanup} does not deallocate your plans; you should still call @code{fftw_destroy_plan} for this purpose. The following two routines are provided purely for academic purposes (that is, for entertainment). 
@example void fftw_flops(const fftw_plan plan, double *add, double *mul, double *fma); @end example @findex fftw_flops Given a @code{plan}, set @code{add}, @code{mul}, and @code{fma} to an exact count of the number of floating-point additions, multiplications, and fused multiply-add operations involved in the plan's execution. The total number of floating-point operations (flops) is @code{add + mul + 2*fma}, or @code{add + mul + fma} if the hardware supports fused multiply-add instructions (although the number of FMA operations is only approximate because of compiler voodoo). (The number of operations should be an integer, but we use @code{double} to avoid overflowing @code{int} for large transforms; the arguments are of type @code{double} even for single and long-double precision versions of FFTW.) @example void fftw_fprint_plan(const fftw_plan plan, FILE *output_file); void fftw_print_plan(const fftw_plan plan); @end example @findex fftw_fprint_plan @findex fftw_print_plan This outputs a ``nerd-readable'' representation of the @code{plan} to the given file or to @code{stdout}, respectively. @c ------------------------------------------------------------ @node Basic Interface, Advanced Interface, Using Plans, FFTW Reference @section Basic Interface @cindex basic interface The basic interface, which we expect to satisfy the needs of most users, provides planner routines for transforms of a single contiguous array with any of FFTW's supported transform types. @menu * Complex DFTs:: * Planner Flags:: * Real-data DFTs:: * Real-data DFT Array Format:: * Real-to-Real Transforms:: * Real-to-Real Transform Kinds:: @end menu @c =========> @node Complex DFTs, Planner Flags, Basic Interface, Basic Interface @subsection Complex DFTs @example fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft_2d(int n0, int n1, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_dft(int rank, const int *n, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); @end example @findex fftw_plan_dft_1d @findex fftw_plan_dft_2d @findex fftw_plan_dft_3d @findex fftw_plan_dft Plan a complex input/output discrete Fourier transform (DFT) in zero or more dimensions, returning an @code{fftw_plan} (@pxref{Using Plans}). Once you have created a plan for a certain transform type and parameters, then creating another plan of the same type and parameters, but for different arrays, is fast and shares constant data with the first plan (if it still exists). The planner returns @code{NULL} if the plan cannot be created. A non-@code{NULL} plan is always returned by the basic interface unless you are using a customized FFTW configuration supporting a restricted set of transforms. @subsubheading Arguments @itemize @bullet @item @code{rank} is the dimensionality of the transform (it should be the size of the array @code{*n}), and can be any non-negative integer. The @samp{_1d}, @samp{_2d}, and @samp{_3d} planners correspond to a @code{rank} of @code{1}, @code{2}, and @code{3}, respectively. A @code{rank} of zero is equivalent to a transform of size 1, i.e. a copy of one number from input to output. @item @code{n}, or @code{n0}/@code{n1}/@code{n2}, or @code{n[rank]}, respectively, gives the size of the transform dimensions. They can be any positive integer. 
@itemize @minus @item @cindex row-major Multi-dimensional arrays are stored in row-major order with dimensions: @code{n0} x @code{n1}; or @code{n0} x @code{n1} x @code{n2}; or @code{n[0]} x @code{n[1]} x ... x @code{n[rank-1]}. @xref{Multi-dimensional Array Format}. @item FFTW is best at handling sizes of the form @ifinfo @math{2^a 3^b 5^c 7^d 11^e 13^f}, @end ifinfo @tex $2^a 3^b 5^c 7^d 11^e 13^f$, @end tex @html 2<sup>a</sup> 3<sup>b</sup> 5<sup>c</sup> 7<sup>d</sup> 11<sup>e</sup> 13<sup>f</sup>, @end html where @math{e+f} is either @math{0} or @math{1}, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose algorithm (which nevertheless retains @Onlogn performance even for prime sizes). It is possible to customize FFTW for different array sizes; see @ref{Installation and Customization}. Transforms whose sizes are powers of @math{2} are especially fast. @end itemize @item @code{in} and @code{out} point to the input and output arrays of the transform, which may be the same (yielding an in-place transform). @cindex in-place These arrays are overwritten during planning, unless @code{FFTW_ESTIMATE} is used in the flags. (The arrays need not be initialized, but they must be allocated.) If @code{in == out}, the transform is @dfn{in-place} and the input array is overwritten. If @code{in != out}, the two arrays must not overlap (but FFTW does not check for this condition). @item @code{sign} is the sign of the exponent in the formula that defines the Fourier transform. It can be @math{-1} (= @code{FFTW_FORWARD}) or @math{+1} (= @code{FFTW_BACKWARD}). @item @cindex flags @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, as defined in @ref{Planner Flags}. @end itemize FFTW computes an unnormalized transform: computing a forward followed by a backward transform (or vice versa) will result in the original data multiplied by the size of the transform (the product of the dimensions). @cindex normalization For more information, see @ref{What FFTW Really Computes}. @c =========> @node Planner Flags, Real-data DFTs, Complex DFTs, Basic Interface @subsection Planner Flags All of the planner routines in FFTW accept an integer @code{flags} argument, which is a bitwise OR (@samp{|}) of zero or more of the flag constants defined below. These flags control the rigor (and time) of the planning process, and can also impose (or lift) restrictions on the type of transform algorithm that is employed. @emph{Important:} the planner overwrites the input array during planning unless a saved plan (@pxref{Wisdom}) is available for that problem, so you should initialize your input data after creating the plan. The only exceptions to this are the @code{FFTW_ESTIMATE} and @code{FFTW_WISDOM_ONLY} flags, as mentioned below. In all cases, if wisdom is available for the given problem that was created with equal-or-greater planning rigor, then it is used instead. For example, in @code{FFTW_ESTIMATE} mode any available wisdom is used, whereas in @code{FFTW_PATIENT} mode only wisdom created in patient or exhaustive mode can be used. @xref{Words of Wisdom-Saving Plans}. @subsubheading Planning-rigor flags @itemize @bullet @item @ctindex FFTW_ESTIMATE @code{FFTW_ESTIMATE} specifies that, instead of actual measurements of different algorithms, a simple heuristic is used to pick a (probably sub-optimal) plan quickly. With this flag, the input/output arrays are not overwritten during planning. 
@item
@ctindex FFTW_MEASURE
@code{FFTW_MEASURE} tells FFTW to find an optimized plan by actually @emph{computing} several FFTs and measuring their execution time.  Depending on your machine, this can take some time (often a few seconds).  @code{FFTW_MEASURE} is the default planning option.

@item
@ctindex FFTW_PATIENT
@code{FFTW_PATIENT} is like @code{FFTW_MEASURE}, but considers a wider range of algorithms and often produces a ``more optimal'' plan (especially for large transforms), but at the expense of several times longer planning time (especially for large transforms).

@item
@ctindex FFTW_EXHAUSTIVE
@code{FFTW_EXHAUSTIVE} is like @code{FFTW_PATIENT}, but considers an even wider range of algorithms, including many that we think are unlikely to be fast, to produce the most optimal plan but with a substantially increased planning time.

@item
@ctindex FFTW_WISDOM_ONLY
@code{FFTW_WISDOM_ONLY} is a special planning mode in which the plan is only created if wisdom is available for the given problem, and otherwise a @code{NULL} plan is returned.  This can be combined with other flags, e.g. @samp{FFTW_WISDOM_ONLY | FFTW_PATIENT} creates a plan only if wisdom is available that was created in @code{FFTW_PATIENT} or @code{FFTW_EXHAUSTIVE} mode.  The @code{FFTW_WISDOM_ONLY} flag is intended for users who need to detect whether wisdom is available; for example, if wisdom is not available one may wish to allocate new arrays for planning so that user data is not overwritten.

@end itemize

@subsubheading Algorithm-restriction flags

@itemize @bullet

@item
@ctindex FFTW_DESTROY_INPUT
@code{FFTW_DESTROY_INPUT} specifies that an out-of-place transform is allowed to @emph{overwrite its input} array with arbitrary data; this can sometimes allow more efficient algorithms to be employed.
@cindex out-of-place

@item
@ctindex FFTW_PRESERVE_INPUT
@code{FFTW_PRESERVE_INPUT} specifies that an out-of-place transform must @emph{not change its input} array.  This is ordinarily the @emph{default}, except for c2r and hc2r (i.e. complex-to-real) transforms for which @code{FFTW_DESTROY_INPUT} is the default.  In the latter cases, passing @code{FFTW_PRESERVE_INPUT} will attempt to use algorithms that do not destroy the input, at the expense of worse performance; for multi-dimensional c2r transforms, however, no input-preserving algorithms are implemented and the planner will return @code{NULL} if one is requested.
@cindex c2r
@cindex hc2r

@item
@ctindex FFTW_UNALIGNED
@cindex alignment
@code{FFTW_UNALIGNED} specifies that the algorithm may not impose any unusual alignment requirements on the input/output arrays (i.e. no SIMD may be used).  This flag is normally @emph{not necessary}, since the planner automatically detects misaligned arrays.  The only use for this flag is if you want to use the new-array execute interface to execute a given plan on a different array that may not be aligned like the original.  (Using @code{fftw_malloc} makes this flag unnecessary even then.)

@end itemize

@subsubheading Limiting planning time

@example
extern void fftw_set_timelimit(double seconds);
@end example
@findex fftw_set_timelimit

This function instructs FFTW to spend at most @code{seconds} seconds (approximately) in the planner.  If @code{seconds == FFTW_NO_TIMELIMIT} (the default value, which is negative), then planning time is unbounded.  Otherwise, FFTW plans with a progressively wider range of algorithms until the given time limit is reached or the given range of algorithms is explored, returning the best available plan.
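As a rough illustration (our own sketch, not code from the FFTW distribution; the helper name @code{plan_with_budget} is invented), one might bound the planner to roughly two seconds while still requesting @code{FFTW_PATIENT} rigor:

@example
#include <fftw3.h>

/* Plan a 1d DFT of length n, but spend at most about two seconds in
   the planner; whatever has been found by then (at worst an
   FFTW_ESTIMATE-quality plan) is returned. */
fftw_plan plan_with_budget(int n, fftw_complex *in, fftw_complex *out)
@{
     fftw_plan p;
     fftw_set_timelimit(2.0);
     p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_PATIENT);
     fftw_set_timelimit(FFTW_NO_TIMELIMIT);  /* restore the default */
     return p;
@}
@end example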
@ctindex FFTW_NO_TIMELIMIT For example, specifying @code{FFTW_PATIENT} first plans in @code{FFTW_ESTIMATE} mode, then in @code{FFTW_MEASURE} mode, then finally (time permitting) in @code{FFTW_PATIENT}. If @code{FFTW_EXHAUSTIVE} is specified instead, the planner will further progress to @code{FFTW_EXHAUSTIVE} mode. Note that the @code{seconds} argument specifies only a rough limit; in practice, the planner may use somewhat more time if the time limit is reached when the planner is in the middle of an operation that cannot be interrupted. At the very least, the planner will complete planning in @code{FFTW_ESTIMATE} mode (which is thus equivalent to a time limit of 0). @c =========> @node Real-data DFTs, Real-data DFT Array Format, Planner Flags, Basic Interface @subsection Real-data DFTs @example fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_dft_r2c(int rank, const int *n, double *in, fftw_complex *out, unsigned flags); @end example @findex fftw_plan_dft_r2c_1d @findex fftw_plan_dft_r2c_2d @findex fftw_plan_dft_r2c_3d @findex fftw_plan_dft_r2c @cindex r2c Plan a real-input/complex-output discrete Fourier transform (DFT) in zero or more dimensions, returning an @code{fftw_plan} (@pxref{Using Plans}). Once you have created a plan for a certain transform type and parameters, then creating another plan of the same type and parameters, but for different arrays, is fast and shares constant data with the first plan (if it still exists). The planner returns @code{NULL} if the plan cannot be created. A non-@code{NULL} plan is always returned by the basic interface unless you are using a customized FFTW configuration supporting a restricted set of transforms, or if you use the @code{FFTW_PRESERVE_INPUT} flag with a multi-dimensional out-of-place c2r transform (see below). @subsubheading Arguments @itemize @bullet @item @code{rank} is the dimensionality of the transform (it should be the size of the array @code{*n}), and can be any non-negative integer. The @samp{_1d}, @samp{_2d}, and @samp{_3d} planners correspond to a @code{rank} of @code{1}, @code{2}, and @code{3}, respectively. A @code{rank} of zero is equivalent to a transform of size 1, i.e. a copy of one number (with zero imaginary part) from input to output. @item @code{n}, or @code{n0}/@code{n1}/@code{n2}, or @code{n[rank]}, respectively, gives the size of the @emph{logical} transform dimensions. They can be any positive integer. This is different in general from the @emph{physical} array dimensions, which are described in @ref{Real-data DFT Array Format}. @itemize @minus @item FFTW is best at handling sizes of the form @ifinfo @math{2^a 3^b 5^c 7^d 11^e 13^f}, @end ifinfo @tex $2^a 3^b 5^c 7^d 11^e 13^f$, @end tex @html 2<sup>a</sup> 3<sup>b</sup> 5<sup>c</sup> 7<sup>d</sup> 11<sup>e</sup> 13<sup>f</sup>, @end html where @math{e+f} is either @math{0} or @math{1}, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose algorithm (which nevertheless retains @Onlogn performance even for prime sizes). (It is possible to customize FFTW for different array sizes; see @ref{Installation and Customization}.) 
Transforms whose sizes are powers of @math{2} are especially fast, and it is generally beneficial for the @emph{last} dimension of an r2c/c2r transform to be @emph{even}. @end itemize @item @code{in} and @code{out} point to the input and output arrays of the transform, which may be the same (yielding an in-place transform). @cindex in-place These arrays are overwritten during planning, unless @code{FFTW_ESTIMATE} is used in the flags. (The arrays need not be initialized, but they must be allocated.) For an in-place transform, it is important to remember that the real array will require padding, described in @ref{Real-data DFT Array Format}. @cindex padding @item @cindex flags @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, as defined in @ref{Planner Flags}. @end itemize The inverse transforms, taking complex input (storing the non-redundant half of a logically Hermitian array) to real output, are given by: @example fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags); fftw_plan fftw_plan_dft_c2r_2d(int n0, int n1, fftw_complex *in, double *out, unsigned flags); fftw_plan fftw_plan_dft_c2r_3d(int n0, int n1, int n2, fftw_complex *in, double *out, unsigned flags); fftw_plan fftw_plan_dft_c2r(int rank, const int *n, fftw_complex *in, double *out, unsigned flags); @end example @findex fftw_plan_dft_c2r_1d @findex fftw_plan_dft_c2r_2d @findex fftw_plan_dft_c2r_3d @findex fftw_plan_dft_c2r @cindex c2r The arguments are the same as for the r2c transforms, except that the input and output data formats are reversed. FFTW computes an unnormalized transform: computing an r2c followed by a c2r transform (or vice versa) will result in the original data multiplied by the size of the transform (the product of the logical dimensions). @cindex normalization An r2c transform produces the same output as a @code{FFTW_FORWARD} complex DFT of the same input, and a c2r transform is correspondingly equivalent to @code{FFTW_BACKWARD}. For more information, see @ref{What FFTW Really Computes}. @c =========> @node Real-data DFT Array Format, Real-to-Real Transforms, Real-data DFTs, Basic Interface @subsection Real-data DFT Array Format @cindex r2c/c2r multi-dimensional array format The output of a DFT of real data (r2c) contains symmetries that, in principle, make half of the outputs redundant (@pxref{What FFTW Really Computes}). (Similarly for the input of an inverse c2r transform.) In practice, it is not possible to entirely realize these savings in an efficient and understandable format that generalizes to multi-dimensional transforms. Instead, the output of the r2c transforms is @emph{slightly} over half of the output of the corresponding complex transform. We do not ``pack'' the data in any way, but store it as an ordinary array of @code{fftw_complex} values. In fact, this data is simply a subsection of what would be the array in the corresponding complex transform. Specifically, for a real transform of @math{d} (= @code{rank}) dimensions @ndims{}, the complex data is an @ndimshalf array of @code{fftw_complex} values in row-major order (with the division rounded down). That is, we only store the @emph{lower} half (non-negative frequencies), plus one element, of the last dimension of the data from the ordinary complex transform. (We could have instead taken half of any other dimension, but implementation turns out to be simpler if the last, contiguous, dimension is used.) 
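As a concrete illustration (our own sketch, not taken from the FFTW sources), an out-of-place r2c transform of a @twodims{100,200} real array produces @code{100 x (200/2 + 1) = 100 x 101} complex values, so the two arrays could be allocated and planned as:

@example
#include <fftw3.h>

double *in;
fftw_complex *out;
fftw_plan p;

in  = (double *) fftw_malloc(sizeof(double) * 100 * 200);
out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * 100 * (200/2 + 1));
p   = fftw_plan_dft_r2c_2d(100, 200, in, out, FFTW_MEASURE);
@end example

The last (contiguous) dimension is the one that is cut roughly in half, exactly as described above.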
@cindex out-of-place For an out-of-place transform, the real data is simply an array with physical dimensions @ndims in row-major order. @cindex in-place @cindex padding For an in-place transform, some complications arise since the complex data is slightly larger than the real data. In this case, the final dimension of the real data must be @emph{padded} with extra values to accommodate the size of the complex data---two extra if the last dimension is even and one if it is odd. That is, the last dimension of the real data must physically contain @tex $2 (n_{d-1}/2+1)$ @end tex @ifinfo 2 * (n[d-1]/2+1) @end ifinfo @html 2 * (n<sub>d-1</sub>/2+1) @end html @code{double} values (exactly enough to hold the complex data). This physical array size does not, however, change the @emph{logical} array size---only @tex $n_{d-1}$ @end tex @ifinfo n[d-1] @end ifinfo @html n<sub>d-1</sub> @end html values are actually stored in the last dimension, and @tex $n_{d-1}$ @end tex @ifinfo n[d-1] @end ifinfo @html n<sub>d-1</sub> @end html is the last dimension passed to the planner. @c =========> @node Real-to-Real Transforms, Real-to-Real Transform Kinds, Real-data DFT Array Format, Basic Interface @subsection Real-to-Real Transforms @cindex r2r @example fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out, fftw_r2r_kind kind, unsigned flags); fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out, fftw_r2r_kind kind0, fftw_r2r_kind kind1, unsigned flags); fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2, double *in, double *out, fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2, unsigned flags); fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out, const fftw_r2r_kind *kind, unsigned flags); @end example @findex fftw_plan_r2r_1d @findex fftw_plan_r2r_2d @findex fftw_plan_r2r_3d @findex fftw_plan_r2r Plan a real input/output (r2r) transform of various kinds in zero or more dimensions, returning an @code{fftw_plan} (@pxref{Using Plans}). Once you have created a plan for a certain transform type and parameters, then creating another plan of the same type and parameters, but for different arrays, is fast and shares constant data with the first plan (if it still exists). The planner returns @code{NULL} if the plan cannot be created. A non-@code{NULL} plan is always returned by the basic interface unless you are using a customized FFTW configuration supporting a restricted set of transforms, or for size-1 @code{FFTW_REDFT00} kinds (which are not defined). @ctindex FFTW_REDFT00 @subsubheading Arguments @itemize @bullet @item @code{rank} is the dimensionality of the transform (it should be the size of the arrays @code{*n} and @code{*kind}), and can be any non-negative integer. The @samp{_1d}, @samp{_2d}, and @samp{_3d} planners correspond to a @code{rank} of @code{1}, @code{2}, and @code{3}, respectively. A @code{rank} of zero is equivalent to a copy of one number from input to output. @item @code{n}, or @code{n0}/@code{n1}/@code{n2}, or @code{n[rank]}, respectively, gives the (physical) size of the transform dimensions. They can be any positive integer. @itemize @minus @item @cindex row-major Multi-dimensional arrays are stored in row-major order with dimensions: @code{n0} x @code{n1}; or @code{n0} x @code{n1} x @code{n2}; or @code{n[0]} x @code{n[1]} x ... x @code{n[rank-1]}. @xref{Multi-dimensional Array Format}. 
@item
FFTW is generally best at handling sizes of the form
@ifinfo
@math{2^a 3^b 5^c 7^d 11^e 13^f},
@end ifinfo
@tex
$2^a 3^b 5^c 7^d 11^e 13^f$,
@end tex
@html
2<sup>a</sup> 3<sup>b</sup> 5<sup>c</sup> 7<sup>d</sup> 11<sup>e</sup> 13<sup>f</sup>,
@end html
where @math{e+f} is either @math{0} or @math{1}, and the other exponents are arbitrary.  Other sizes are computed by means of a slow, general-purpose algorithm (which nevertheless retains @Onlogn performance even for prime sizes).  (It is possible to customize FFTW for different array sizes; see @ref{Installation and Customization}.)  Transforms whose sizes are powers of @math{2} are especially fast.

@item
For a @code{REDFT00} or @code{RODFT00} transform kind in a dimension of size @math{n}, it is @math{n-1} or @math{n+1}, respectively, that should be factorizable in the above form.

@end itemize

@item
@code{in} and @code{out} point to the input and output arrays of the transform, which may be the same (yielding an in-place transform).
@cindex in-place
These arrays are overwritten during planning, unless @code{FFTW_ESTIMATE} is used in the flags.  (The arrays need not be initialized, but they must be allocated.)

@item
@code{kind}, or @code{kind0}/@code{kind1}/@code{kind2}, or @code{kind[rank]}, is the kind of r2r transform used for the corresponding dimension.  The valid kind constants are described in @ref{Real-to-Real Transform Kinds}.  In a multi-dimensional transform, what is computed is the separable product formed by taking each transform kind along the corresponding dimension, one dimension after another.

@item
@cindex flags
@code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, as defined in @ref{Planner Flags}.

@end itemize

@c =========>
@node Real-to-Real Transform Kinds, , Real-to-Real Transforms, Basic Interface
@subsection Real-to-Real Transform Kinds
@cindex kind (r2r)

FFTW currently supports 11 different r2r transform kinds, specified by one of the constants below.  For the precise definitions of these transforms, see @ref{What FFTW Really Computes}.  For a more colloquial introduction to these transform kinds, see @ref{More DFTs of Real Data}.

For a dimension of size @code{n}, there is a corresponding ``logical'' dimension @code{N} that determines the normalization (and the optimal factorization); the formula for @code{N} is given for each kind below.  Also, with each transform kind is listed its corresponding inverse transform.  FFTW computes unnormalized transforms: a transform followed by its inverse will result in the original data multiplied by @code{N} (or the product of the @code{N}'s for each dimension, in multi-dimensions).
@cindex normalization

@itemize @bullet

@item
@ctindex FFTW_R2HC
@code{FFTW_R2HC} computes a real-input DFT with output in ``halfcomplex'' format, i.e. real and imaginary parts for a transform of size @code{n} stored as:
@tex
$$
r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1
$$
@end tex
@ifinfo
r0, r1, r2, ..., r(n/2), i((n+1)/2-1), ..., i2, i1
@end ifinfo
@html
<p align=center>
r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub>
</p>
@end html
(Logical @code{N=n}, inverse is @code{FFTW_HC2R}.)

@item
@ctindex FFTW_HC2R
@code{FFTW_HC2R} computes the reverse of @code{FFTW_R2HC}, above.  (Logical @code{N=n}, inverse is @code{FFTW_R2HC}.)

@item
@ctindex FFTW_DHT
@code{FFTW_DHT} computes a discrete Hartley transform.  (Logical @code{N=n}, inverse is @code{FFTW_DHT}.)
@cindex discrete Hartley transform

@item
@ctindex FFTW_REDFT00
@code{FFTW_REDFT00} computes an REDFT00 transform, i.e. a DCT-I.  (Logical @code{N=2*(n-1)}, inverse is @code{FFTW_REDFT00}.)
@cindex discrete cosine transform
@cindex DCT

@item
@ctindex FFTW_REDFT10
@code{FFTW_REDFT10} computes an REDFT10 transform, i.e. a DCT-II (sometimes called ``the'' DCT).  (Logical @code{N=2*n}, inverse is @code{FFTW_REDFT01}.)

@item
@ctindex FFTW_REDFT01
@code{FFTW_REDFT01} computes an REDFT01 transform, i.e. a DCT-III (sometimes called ``the'' IDCT, being the inverse of DCT-II).  (Logical @code{N=2*n}, inverse is @code{FFTW_REDFT10}.)
@cindex IDCT

@item
@ctindex FFTW_REDFT11
@code{FFTW_REDFT11} computes an REDFT11 transform, i.e. a DCT-IV.  (Logical @code{N=2*n}, inverse is @code{FFTW_REDFT11}.)

@item
@ctindex FFTW_RODFT00
@code{FFTW_RODFT00} computes an RODFT00 transform, i.e. a DST-I.  (Logical @code{N=2*(n+1)}, inverse is @code{FFTW_RODFT00}.)
@cindex discrete sine transform
@cindex DST

@item
@ctindex FFTW_RODFT10
@code{FFTW_RODFT10} computes an RODFT10 transform, i.e. a DST-II.  (Logical @code{N=2*n}, inverse is @code{FFTW_RODFT01}.)

@item
@ctindex FFTW_RODFT01
@code{FFTW_RODFT01} computes an RODFT01 transform, i.e. a DST-III.  (Logical @code{N=2*n}, inverse is @code{FFTW_RODFT10}.)

@item
@ctindex FFTW_RODFT11
@code{FFTW_RODFT11} computes an RODFT11 transform, i.e. a DST-IV.  (Logical @code{N=2*n}, inverse is @code{FFTW_RODFT11}.)

@end itemize

@c ------------------------------------------------------------
@node Advanced Interface, Guru Interface, Basic Interface, FFTW Reference
@section Advanced Interface
@cindex advanced interface

FFTW's ``advanced'' interface supplements the basic interface with four new planner routines, providing a new level of flexibility: you can plan a transform of multiple arrays simultaneously, operate on non-contiguous (strided) data, and transform a subset of a larger multi-dimensional array.  Other than these additional features, the planner operates in the same fashion as in the basic interface, and the resulting @code{fftw_plan} is used in the same way (@pxref{Using Plans}).

@menu
* Advanced Complex DFTs::
* Advanced Real-data DFTs::
* Advanced Real-to-real Transforms::
@end menu

@c =========>
@node Advanced Complex DFTs, Advanced Real-data DFTs, Advanced Interface, Advanced Interface
@subsection Advanced Complex DFTs

@example
fftw_plan fftw_plan_many_dft(int rank, const int *n, int howmany,
                             fftw_complex *in, const int *inembed,
                             int istride, int idist,
                             fftw_complex *out, const int *onembed,
                             int ostride, int odist,
                             int sign, unsigned flags);
@end example
@findex fftw_plan_many_dft

This plans multidimensional complex DFTs, and is exactly the same as @code{fftw_plan_dft} except for the new parameters @code{howmany}, @{@code{i},@code{o}@}@code{nembed}, @{@code{i},@code{o}@}@code{stride}, and @{@code{i},@code{o}@}@code{dist}.

@code{howmany} is the number of transforms to compute, where the @code{k}-th transform is of the arrays starting at @code{in+k*idist} and @code{out+k*odist}.  The resulting plans can often be faster than calling FFTW multiple times for the individual transforms.  The basic @code{fftw_plan_dft} interface corresponds to @code{howmany=1} (in which case the @code{dist} parameters are ignored).
@cindex howmany parameter
@cindex dist

The two @code{nembed} parameters (which should be arrays of length @code{rank}) indicate the sizes of the input and output array dimensions, respectively, where the transform is of a subarray of size @code{n}.
(Each dimension of @code{n} should be @code{<=} the corresponding dimension of the @code{nembed} arrays.) That is, the input and output arrays are stored in row-major order with size given by @code{nembed} (not counting the strides and howmany multiplicities). Passing @code{NULL} for an @code{nembed} parameter is equivalent to passing @code{n} (i.e. same physical and logical dimensions, as in the basic interface.) The @code{stride} parameters indicate that the @code{j}-th element of the input or output arrays is located at @code{j*istride} or @code{j*ostride}, respectively. (For a multi-dimensional array, @code{j} is the ordinary row-major index.) When combined with the @code{k}-th transform in a @code{howmany} loop, from above, this means that the (@code{j},@code{k})-th element is at @code{j*stride+k*dist}. (The basic @code{fftw_plan_dft} interface corresponds to a stride of 1.) @cindex stride For in-place transforms, the input and output @code{stride} and @code{dist} parameters should be the same; otherwise, the planner may return @code{NULL}. Arrays @code{n}, @code{inembed}, and @code{onembed} are not used after this function returns. You can safely free or reuse them. So, for example, to transform a sequence of contiguous arrays, stored one after another, one would use a @code{stride} of 1 and a @code{dist} of @math{N}, where @math{N} is the product of the dimensions. In another example, to transform an array of contiguous ``vectors'' of length @math{M}, one would use a @code{howmany} of @math{M}, a @code{stride} of @math{M}, and a @code{dist} of 1. @cindex vector @c =========> @node Advanced Real-data DFTs, Advanced Real-to-real Transforms, Advanced Complex DFTs, Advanced Interface @subsection Advanced Real-data DFTs @example fftw_plan fftw_plan_many_dft_r2c(int rank, const int *n, int howmany, double *in, const int *inembed, int istride, int idist, fftw_complex *out, const int *onembed, int ostride, int odist, unsigned flags); fftw_plan fftw_plan_many_dft_c2r(int rank, const int *n, int howmany, fftw_complex *in, const int *inembed, int istride, int idist, double *out, const int *onembed, int ostride, int odist, unsigned flags); @end example @findex fftw_plan_many_dft_r2c @findex fftw_plan_many_dft_c2r Like @code{fftw_plan_many_dft}, these two functions add @code{howmany}, @code{nembed}, @code{stride}, and @code{dist} parameters to the @code{fftw_plan_dft_r2c} and @code{fftw_plan_dft_c2r} functions, but otherwise behave the same as the basic interface. The interpretation of @code{howmany}, @code{stride}, and @code{dist} are the same as for @code{fftw_plan_many_dft}, above. Note that the @code{stride} and @code{dist} for the real array are in units of @code{double}, and for the complex array are in units of @code{fftw_complex}. If an @code{nembed} parameter is @code{NULL}, it is interpreted as what it would be in the basic interface, as described in @ref{Real-data DFT Array Format}. That is, for the complex array the size is assumed to be the same as @code{n}, but with the last dimension cut roughly in half. For the real array, the size is assumed to be @code{n} if the transform is out-of-place, or @code{n} with the last dimension ``padded'' if the transform is in-place. If an @code{nembed} parameter is non-@code{NULL}, it is interpreted as the physical size of the corresponding array, in row-major order, just as for @code{fftw_plan_many_dft}. In this case, each dimension of @code{nembed} should be @code{>=} what it would be in the basic interface (e.g. the halved or padded @code{n}). 
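For instance (an illustrative sketch of ours, not from the FFTW distribution), to compute an independent 1d r2c transform of each of the 10 contiguous rows of a @code{10 x 128} real array, writing the results into the 10 rows of a @code{10 x 65} complex array, one might write:

@example
#include <fftw3.h>

int n[1] = @{ 128 @};   /* logical size of each 1d transform */
double *in;            /* 10 x 128 real input, row-major */
fftw_complex *out;     /* 10 x (128/2 + 1) = 10 x 65 complex output */
fftw_plan p;

in  = (double *) fftw_malloc(sizeof(double) * 10 * 128);
out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * 10 * 65);

/* howmany = 10 transforms; within each row consecutive elements are
   contiguous (stride 1); consecutive rows are 128 doubles apart on
   input and 65 fftw_complex apart on output; NULL nembed means the
   default physical dimensions described above. */
p = fftw_plan_many_dft_r2c(1, n, 10,
                           in,  NULL, 1, 128,
                           out, NULL, 1, 65,
                           FFTW_MEASURE);
@end example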
Arrays @code{n}, @code{inembed}, and @code{onembed} are not used after this function returns.  You can safely free or reuse them.

@c =========>
@node Advanced Real-to-real Transforms, , Advanced Real-data DFTs, Advanced Interface
@subsection Advanced Real-to-real Transforms

@example
fftw_plan fftw_plan_many_r2r(int rank, const int *n, int howmany,
                             double *in, const int *inembed,
                             int istride, int idist,
                             double *out, const int *onembed,
                             int ostride, int odist,
                             const fftw_r2r_kind *kind, unsigned flags);
@end example
@findex fftw_plan_many_r2r

Like @code{fftw_plan_many_dft}, this function adds @code{howmany}, @code{nembed}, @code{stride}, and @code{dist} parameters to the @code{fftw_plan_r2r} function, but otherwise behaves the same as the basic interface.  The interpretation of those additional parameters is the same as for @code{fftw_plan_many_dft}.  (Of course, the @code{stride} and @code{dist} parameters are now in units of @code{double}, not @code{fftw_complex}.)

Arrays @code{n}, @code{inembed}, @code{onembed}, and @code{kind} are not used after this function returns.  You can safely free or reuse them.

@c ------------------------------------------------------------
@node Guru Interface, New-array Execute Functions, Advanced Interface, FFTW Reference
@section Guru Interface
@cindex guru interface

The ``guru'' interface to FFTW is intended to expose as much as possible of the flexibility in the underlying FFTW architecture.  It allows one to compute multi-dimensional ``vectors'' (loops) of multi-dimensional transforms, where each vector/transform dimension has an independent size and stride.
@cindex vector
One can also use more general complex-number formats, e.g. separate real and imaginary arrays.

For those users who require the flexibility of the guru interface, it is important that they pay special attention to the documentation lest they shoot themselves in the foot.

@menu
* Interleaved and split arrays::
* Guru vector and transform sizes::
* Guru Complex DFTs::
* Guru Real-data DFTs::
* Guru Real-to-real Transforms::
* 64-bit Guru Interface::
@end menu

@c =========>
@node Interleaved and split arrays, Guru vector and transform sizes, Guru Interface, Guru Interface
@subsection Interleaved and split arrays

The guru interface supports two representations of complex numbers, which we call the interleaved and the split format.

The @dfn{interleaved} format is the same one used by the basic and advanced interfaces, and it is documented in @ref{Complex numbers}.  In the interleaved format, you provide pointers to the real part of a complex number, and the imaginary part is understood to be stored in the next memory location.
@cindex interleaved format

The @dfn{split} format allows separate pointers to the real and imaginary parts of a complex array.
@cindex split format

Technically, the interleaved format is redundant, because you can always express an interleaved array in terms of a split array with appropriate pointers and strides.  On the other hand, the interleaved format is simpler to use, and it is common in practice.  Hence, FFTW supports it as a special case.
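To make the distinction concrete, here is a schematic of ours (not an excerpt from the FFTW sources), showing the same @code{N} complex values held both ways, for some size @code{N}:

@example
/* Interleaved: one array of N fftw_complex values; the real and
   imaginary parts of element j live at in[j][0] and in[j][1],
   i.e. at ((double *) in)[2*j] and ((double *) in)[2*j + 1]. */
fftw_complex *in = (fftw_complex *) fftw_malloc(N * sizeof(fftw_complex));

/* Split: two separate arrays of N doubles, one holding all the real
   parts and one holding all the imaginary parts. */
double *re = (double *) fftw_malloc(N * sizeof(double));
double *im = (double *) fftw_malloc(N * sizeof(double));
@end example

Split pointers like @code{re} and @code{im} are the sort of arguments taken by the @samp{split} guru planners described below.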
@c =========> @node Guru vector and transform sizes, Guru Complex DFTs, Interleaved and split arrays, Guru Interface @subsection Guru vector and transform sizes The guru interface introduces one basic new data structure, @code{fftw_iodim}, that is used to specify sizes and strides for multi-dimensional transforms and vectors: @example typedef struct @{ int n; int is; int os; @} fftw_iodim; @end example @tindex fftw_iodim Here, @code{n} is the size of the dimension, and @code{is} and @code{os} are the strides of that dimension for the input and output arrays. (The stride is the separation of consecutive elements along this dimension.) The meaning of the stride parameter depends on the type of the array that the stride refers to. @emph{If the array is interleaved complex, strides are expressed in units of complex numbers (@code{fftw_complex}). If the array is split complex or real, strides are expressed in units of real numbers (@code{double}).} This convention is consistent with the usual pointer arithmetic in the C language. An interleaved array is denoted by a pointer @code{p} to @code{fftw_complex}, so that @code{p+1} points to the next complex number. Split arrays are denoted by pointers to @code{double}, in which case pointer arithmetic operates in units of @code{sizeof(double)}. @cindex stride The guru planner interfaces all take a (@code{rank}, @code{dims[rank]}) pair describing the transform size, and a (@code{howmany_rank}, @code{howmany_dims[howmany_rank]}) pair describing the ``vector'' size (a multi-dimensional loop of transforms to perform), where @code{dims} and @code{howmany_dims} are arrays of @code{fftw_iodim}. For example, the @code{howmany} parameter in the advanced complex-DFT interface corresponds to @code{howmany_rank} = 1, @code{howmany_dims[0].n} = @code{howmany}, @code{howmany_dims[0].is} = @code{idist}, and @code{howmany_dims[0].os} = @code{odist}. @cindex howmany loop @cindex dist (To compute a single transform, you can just use @code{howmany_rank} = 0.) A row-major multidimensional array with dimensions @code{n[rank]} (@pxref{Row-major Format}) corresponds to @code{dims[i].n} = @code{n[i]} and the recurrence @code{dims[i].is} = @code{n[i+1] * dims[i+1].is} (similarly for @code{os}). The stride of the last (@code{i=rank-1}) dimension is the overall stride of the array. e.g. to be equivalent to the advanced complex-DFT interface, you would have @code{dims[rank-1].is} = @code{istride} and @code{dims[rank-1].os} = @code{ostride}. @cindex row-major In general, we only guarantee FFTW to return a non-@code{NULL} plan if the vector and transform dimensions correspond to a set of distinct indices, and for in-place transforms the input/output strides should be the same. @c =========> @node Guru Complex DFTs, Guru Real-data DFTs, Guru vector and transform sizes, Guru Interface @subsection Guru Complex DFTs @example fftw_plan fftw_plan_guru_dft( int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, fftw_complex *in, fftw_complex *out, int sign, unsigned flags); fftw_plan fftw_plan_guru_split_dft( int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, double *ri, double *ii, double *ro, double *io, unsigned flags); @end example @findex fftw_plan_guru_dft @findex fftw_plan_guru_split_dft These two functions plan a complex-data, multi-dimensional DFT for the interleaved and split format, respectively. 
Transform dimensions are given by (@code{rank}, @code{dims}) over a multi-dimensional vector (loop) of dimensions (@code{howmany_rank}, @code{howmany_dims}). @code{dims} and @code{howmany_dims} should point to @code{fftw_iodim} arrays of length @code{rank} and @code{howmany_rank}, respectively. @cindex flags @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, as defined in @ref{Planner Flags}. In the @code{fftw_plan_guru_dft} function, the pointers @code{in} and @code{out} point to the interleaved input and output arrays, respectively. The sign can be either @math{-1} (= @code{FFTW_FORWARD}) or @math{+1} (= @code{FFTW_BACKWARD}). If the pointers are equal, the transform is in-place. In the @code{fftw_plan_guru_split_dft} function, @code{ri} and @code{ii} point to the real and imaginary input arrays, and @code{ro} and @code{io} point to the real and imaginary output arrays. The input and output pointers may be the same, indicating an in-place transform. For example, for @code{fftw_complex} pointers @code{in} and @code{out}, the corresponding parameters are: @example ri = (double *) in; ii = (double *) in + 1; ro = (double *) out; io = (double *) out + 1; @end example Because @code{fftw_plan_guru_split_dft} accepts split arrays, strides are expressed in units of @code{double}. For a contiguous @code{fftw_complex} array, the overall stride of the transform should be 2, the distance between consecutive real parts or between consecutive imaginary parts; see @ref{Guru vector and transform sizes}. Note that the dimension strides are applied equally to the real and imaginary parts; real and imaginary arrays with different strides are not supported. There is no @code{sign} parameter in @code{fftw_plan_guru_split_dft}. This function always plans for an @code{FFTW_FORWARD} transform. To plan for an @code{FFTW_BACKWARD} transform, you can exploit the identity that the backwards DFT is equal to the forwards DFT with the real and imaginary parts swapped. For example, in the case of the @code{fftw_complex} arrays above, the @code{FFTW_BACKWARD} transform is computed by the parameters: @example ri = (double *) in + 1; ii = (double *) in; ro = (double *) out + 1; io = (double *) out; @end example @c =========> @node Guru Real-data DFTs, Guru Real-to-real Transforms, Guru Complex DFTs, Guru Interface @subsection Guru Real-data DFTs @example fftw_plan fftw_plan_guru_dft_r2c( int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, double *in, fftw_complex *out, unsigned flags); fftw_plan fftw_plan_guru_split_dft_r2c( int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, double *in, double *ro, double *io, unsigned flags); fftw_plan fftw_plan_guru_dft_c2r( int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, fftw_complex *in, double *out, unsigned flags); fftw_plan fftw_plan_guru_split_dft_c2r( int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, double *ri, double *ii, double *out, unsigned flags); @end example @findex fftw_plan_guru_dft_r2c @findex fftw_plan_guru_split_dft_r2c @findex fftw_plan_guru_dft_c2r @findex fftw_plan_guru_split_dft_c2r Plan a real-input (r2c) or real-output (c2r), multi-dimensional DFT with transform dimensions given by (@code{rank}, @code{dims}) over a multi-dimensional vector (loop) of dimensions (@code{howmany_rank}, @code{howmany_dims}). 
@code{dims} and @code{howmany_dims} should point to @code{fftw_iodim} arrays of length @code{rank} and @code{howmany_rank}, respectively. As for the basic and advanced interfaces, an r2c transform is @code{FFTW_FORWARD} and a c2r transform is @code{FFTW_BACKWARD}. The @emph{last} dimension of @code{dims} is interpreted specially: that dimension of the real array has size @code{dims[rank-1].n}, but that dimension of the complex array has size @code{dims[rank-1].n/2+1} (division rounded down). The strides, on the other hand, are taken to be exactly as specified. It is up to the user to specify the strides appropriately for the peculiar dimensions of the data, and we do not guarantee that the planner will succeed (return non-@code{NULL}) for any dimensions other than those described in @ref{Real-data DFT Array Format} and generalized in @ref{Advanced Real-data DFTs}. (That is, for an in-place transform, each individual dimension should be able to operate in place.) @cindex in-place @code{in} and @code{out} point to the input and output arrays for r2c and c2r transforms, respectively. For split arrays, @code{ri} and @code{ii} point to the real and imaginary input arrays for a c2r transform, and @code{ro} and @code{io} point to the real and imaginary output arrays for an r2c transform. @code{in} and @code{ro} or @code{ri} and @code{out} may be the same, indicating an in-place transform. (In-place transforms where @code{in} and @code{io} or @code{ii} and @code{out} are the same are not currently supported.) @cindex flags @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, as defined in @ref{Planner Flags}. In-place transforms of rank greater than 1 are currently only supported for interleaved arrays. For split arrays, the planner will return @code{NULL}. @cindex in-place @c =========> @node Guru Real-to-real Transforms, 64-bit Guru Interface, Guru Real-data DFTs, Guru Interface @subsection Guru Real-to-real Transforms @example fftw_plan fftw_plan_guru_r2r(int rank, const fftw_iodim *dims, int howmany_rank, const fftw_iodim *howmany_dims, double *in, double *out, const fftw_r2r_kind *kind, unsigned flags); @end example @findex fftw_plan_guru_r2r Plan a real-to-real (r2r) multi-dimensional @code{FFTW_FORWARD} transform with transform dimensions given by (@code{rank}, @code{dims}) over a multi-dimensional vector (loop) of dimensions (@code{howmany_rank}, @code{howmany_dims}). @code{dims} and @code{howmany_dims} should point to @code{fftw_iodim} arrays of length @code{rank} and @code{howmany_rank}, respectively. The transform kind of each dimension is given by the @code{kind} parameter, which should point to an array of length @code{rank}. Valid @code{fftw_r2r_kind} constants are given in @ref{Real-to-Real Transform Kinds}. @code{in} and @code{out} point to the real input and output arrays; they may be the same, indicating an in-place transform. @cindex flags @code{flags} is a bitwise OR (@samp{|}) of zero or more planner flags, as defined in @ref{Planner Flags}. @c =========> @node 64-bit Guru Interface, , Guru Real-to-real Transforms, Guru Interface @subsection 64-bit Guru Interface @cindex 64-bit architecture When compiled in 64-bit mode on a 64-bit architecture (where addresses are 64 bits wide), FFTW uses 64-bit quantities internally for all transform sizes, strides, and so on---you don't have to do anything special to exploit this. However, in the ordinary FFTW interfaces, you specify the transform size by an @code{int} quantity, which is normally only 32 bits wide. 
This means that, even though FFTW is using 64-bit sizes internally, you
cannot specify a single transform dimension larger than
@ifinfo
2^31-1
@end ifinfo
@html
2<sup><small>31</small></sup>−1
@end html
@tex
$2^{31}-1$
@end tex
numbers. We expect that few users will require transforms larger than
this, but, for those who do, we provide a 64-bit version of the guru
interface in which all sizes are specified as integers of type
@code{ptrdiff_t} instead of @code{int}. (@code{ptrdiff_t} is a signed
integer type defined by the C standard to be wide enough to represent
address differences, and thus must be at least 64 bits wide on a 64-bit
machine.) We stress that there is @emph{no performance advantage} to
using this interface---the same internal FFTW code is employed
regardless---and it is only necessary if you want to specify very large
transform sizes.
@tindex ptrdiff_t

In particular, the 64-bit guru interface is a set of planner routines
that are exactly the same as the guru planner routines, except that they
are named with @samp{guru64} instead of @samp{guru} and they take
arguments of type @code{fftw_iodim64} instead of @code{fftw_iodim}.
For example, instead of @code{fftw_plan_guru_dft}, we have
@code{fftw_plan_guru64_dft}.

@example
fftw_plan fftw_plan_guru64_dft(
     int rank, const fftw_iodim64 *dims,
     int howmany_rank, const fftw_iodim64 *howmany_dims,
     fftw_complex *in, fftw_complex *out,
     int sign, unsigned flags);
@end example
@findex fftw_plan_guru64_dft

The @code{fftw_iodim64} type is similar to @code{fftw_iodim}, with the
same interpretation, except that it uses type @code{ptrdiff_t} instead
of type @code{int}.

@example
typedef struct @{
     ptrdiff_t n;
     ptrdiff_t is;
     ptrdiff_t os;
@} fftw_iodim64;
@end example
@tindex fftw_iodim64

Every other @samp{fftw_plan_guru} function also has a
@samp{fftw_plan_guru64} equivalent, but we do not repeat their
documentation here since they are identical to the 32-bit versions
except as noted above.

@c -----------------------------------------------------------
@node New-array Execute Functions, Wisdom, Guru Interface, FFTW Reference
@section New-array Execute Functions
@cindex execute
@cindex new-array execution

Normally, one executes a plan for the arrays with which the plan was
created, by calling @code{fftw_execute(plan)} as described in
@ref{Using Plans}.
@findex fftw_execute
However, it is possible for sophisticated users to apply a given plan
to a @emph{different} array using the ``new-array execute'' functions
detailed below, provided that the following conditions are met:

@itemize @bullet

@item
The array size, strides, etcetera are the same (since those are set by
the plan).

@item
The input and output arrays are the same (in-place) or different
(out-of-place) if the plan was originally created to be in-place or
out-of-place, respectively.

@item
For split arrays, the separations between the real and imaginary parts,
@code{ii-ri} and @code{io-ro}, are the same as they were for the input
and output arrays when the plan was created. (This condition is
automatically satisfied for interleaved arrays.)

@item
The @dfn{alignment} of the new input/output arrays is the same as that
of the input/output arrays when the plan was created, unless the plan
was created with the @code{FFTW_UNALIGNED} flag.
@ctindex FFTW_UNALIGNED
Here, the alignment is a platform-dependent quantity (for example, it is
the address modulo 16 if SSE SIMD instructions are used, but the address
modulo 4 for non-SIMD single-precision FFTW on the same machine).
In general, only arrays allocated with @code{fftw_malloc} are guaranteed
to be equally aligned (@pxref{SIMD alignment and fftw_malloc}).

@end itemize

@cindex alignment
The alignment issue is especially critical, because if you don't use
@code{fftw_malloc} then you may have little control over the alignment
of arrays in memory. For example, neither the C++ @code{new} function
nor the Fortran @code{allocate} statement provide strong enough
guarantees about data alignment. If you don't use @code{fftw_malloc},
therefore, you probably have to use @code{FFTW_UNALIGNED} (which
disables most SIMD support). If possible, it is probably better for you
to simply create multiple plans (creating a new plan is quick once one
exists for a given size), or better yet re-use the same array for your
transforms. If you are tempted to use the new-array execute interface
because you want to transform a known bunch of arrays of the same size,
you should probably go use the advanced interface instead
(@pxref{Advanced Interface}).

The new-array execute functions are:

@example
void fftw_execute_dft(
     const fftw_plan p,
     fftw_complex *in, fftw_complex *out);

void fftw_execute_split_dft(
     const fftw_plan p,
     double *ri, double *ii, double *ro, double *io);

void fftw_execute_dft_r2c(
     const fftw_plan p,
     double *in, fftw_complex *out);

void fftw_execute_split_dft_r2c(
     const fftw_plan p,
     double *in, double *ro, double *io);

void fftw_execute_dft_c2r(
     const fftw_plan p,
     fftw_complex *in, double *out);

void fftw_execute_split_dft_c2r(
     const fftw_plan p,
     double *ri, double *ii, double *out);

void fftw_execute_r2r(
     const fftw_plan p,
     double *in, double *out);
@end example

@findex fftw_execute_dft
@findex fftw_execute_split_dft
@findex fftw_execute_dft_r2c
@findex fftw_execute_split_dft_r2c
@findex fftw_execute_dft_c2r
@findex fftw_execute_split_dft_c2r
@findex fftw_execute_r2r

These execute the @code{plan} to compute the corresponding transform on
the input/output arrays specified by the subsequent arguments. The
input/output array arguments have the same meanings as the ones passed
to the guru planner routines in the preceding sections. The @code{plan}
is not modified, and these routines can be called as many times as
desired, or intermixed with calls to the ordinary @code{fftw_execute}.

The @code{plan} @emph{must} have been created for the transform type
corresponding to the execute function, e.g. it must be a complex-DFT
plan for @code{fftw_execute_dft}. Any of the planner routines for that
transform type, from the basic to the guru interface, could have been
used to create the plan, however.

@c ------------------------------------------------------------
@node Wisdom, What FFTW Really Computes, New-array Execute Functions, FFTW Reference
@section Wisdom
@cindex wisdom
@cindex saving plans to disk

This section documents the FFTW mechanism for saving and restoring
plans from disk. This mechanism is called @dfn{wisdom}.
@menu * Wisdom Export:: * Wisdom Import:: * Forgetting Wisdom:: * Wisdom Utilities:: @end menu @c =========> @node Wisdom Export, Wisdom Import, Wisdom, Wisdom @subsection Wisdom Export @example void fftw_export_wisdom_to_file(FILE *output_file); char *fftw_export_wisdom_to_string(void); void fftw_export_wisdom(void (*write_char)(char c, void *), void *data); @end example @findex fftw_export_wisdom @findex fftw_export_wisdom_to_file @findex fftw_export_wisdom_to_string These functions allow you to export all currently accumulated wisdom in a form from which it can be later imported and restored, even during a separate run of the program. (@xref{Words of Wisdom-Saving Plans}.) The current store of wisdom is not affected by calling any of these routines. @code{fftw_export_wisdom} exports the wisdom to any output medium, as specified by the callback function @code{write_char}. @code{write_char} is a @code{putc}-like function that writes the character @code{c} to some output; its second parameter is the @code{data} pointer passed to @code{fftw_export_wisdom}. For convenience, the following two ``wrapper'' routines are provided: @code{fftw_export_wisdom_to_file} writes the wisdom to the current position in @code{output_file}, which should be open with write permission. Upon exit, the file remains open and is positioned at the end of the wisdom data. @code{fftw_export_wisdom_to_string} returns a pointer to a @code{NULL}-terminated string holding the wisdom data. This string is dynamically allocated, and it is the responsibility of the caller to deallocate it with @code{free} when it is no longer needed. All of these routines export the wisdom in the same format, which we will not document here except to say that it is LISP-like ASCII text that is insensitive to white space. @c =========> @node Wisdom Import, Forgetting Wisdom, Wisdom Export, Wisdom @subsection Wisdom Import @example int fftw_import_system_wisdom(void); int fftw_import_wisdom_from_file(FILE *input_file); int fftw_import_wisdom_from_string(const char *input_string); int fftw_import_wisdom(int (*read_char)(void *), void *data); @end example @findex fftw_import_wisdom @findex fftw_import_system_wisdom @findex fftw_import_wisdom_from_file @findex fftw_import_wisdom_from_string These functions import wisdom into a program from data stored by the @code{fftw_export_wisdom} functions above. (@xref{Words of Wisdom-Saving Plans}.) The imported wisdom replaces any wisdom already accumulated by the running program. @code{fftw_import_wisdom} imports wisdom from any input medium, as specified by the callback function @code{read_char}. @code{read_char} is a @code{getc}-like function that returns the next character in the input; its parameter is the @code{data} pointer passed to @code{fftw_import_wisdom}. If the end of the input data is reached (which should never happen for valid data), @code{read_char} should return @code{EOF} (as defined in @code{<stdio.h>}). For convenience, the following two ``wrapper'' routines are provided: @code{fftw_import_wisdom_from_file} reads wisdom from the current position in @code{input_file}, which should be open with read permission. Upon exit, the file remains open, but the position of the read pointer is unspecified. @code{fftw_import_wisdom_from_string} reads wisdom from the @code{NULL}-terminated string @code{input_string}. @code{fftw_import_system_wisdom} reads wisdom from an implementation-defined standard file (@code{/etc/fftw/wisdom} on Unix and GNU systems). 
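
For instance (a sketch of typical usage, with a hypothetical file name @code{wisdom.dat}; error handling is abbreviated), a program might restore previously saved wisdom at startup and save the accumulated wisdom before exiting:

@example
FILE *f = fopen("wisdom.dat", "r");
if (f) @{
     fftw_import_wisdom_from_file(f);  /* restore saved plans, if any */
     fclose(f);
@}

/* ... create and execute plans as usual ... */

f = fopen("wisdom.dat", "w");
if (f) @{
     fftw_export_wisdom_to_file(f);    /* save accumulated wisdom */
     fclose(f);
@}
@end example
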
@cindex wisdom, system-wide The return value of these import routines is @code{1} if the wisdom was read successfully and @code{0} otherwise. Note that, in all of these functions, any data in the input stream past the end of the wisdom data is simply ignored. @c =========> @node Forgetting Wisdom, Wisdom Utilities, Wisdom Import, Wisdom @subsection Forgetting Wisdom @example void fftw_forget_wisdom(void); @end example @findex fftw_forget_wisdom Calling @code{fftw_forget_wisdom} causes all accumulated @code{wisdom} to be discarded and its associated memory to be freed. (New @code{wisdom} can still be gathered subsequently, however.) @c =========> @node Wisdom Utilities, , Forgetting Wisdom, Wisdom @subsection Wisdom Utilities FFTW includes two standalone utility programs that deal with wisdom. We merely summarize them here, since they come with their own @code{man} pages for Unix and GNU systems (with HTML versions on our web site). The first program is @code{fftw-wisdom} (or @code{fftwf-wisdom} in single precision, etcetera), which can be used to create a wisdom file containing plans for any of the transform sizes and types supported by FFTW. It is preferable to create wisdom directly from your executable (@pxref{Caveats in Using Wisdom}), but this program is useful for creating global wisdom files for @code{fftw_import_system_wisdom}. @cindex fftw-wisdom utility The second program is @code{fftw-wisdom-to-conf}, which takes a wisdom file as input and produces a @dfn{configuration routine} as output. The latter is a C subroutine that you can compile and link into your program, replacing a routine of the same name in the FFTW library, that determines which parts of FFTW are callable by your program. @code{fftw-wisdom-to-conf} produces a configuration routine that links to only those parts of FFTW needed by the saved plans in the wisdom, greatly reducing the size of statically linked executables (which should only attempt to create plans corresponding to those in the wisdom, however). @cindex fftw-wisdom-to-conf utility @cindex configuration routines @c ------------------------------------------------------------ @node What FFTW Really Computes, , Wisdom, FFTW Reference @section What FFTW Really Computes In this section, we provide precise mathematical definitions for the transforms that FFTW computes. These transform definitions are fairly standard, but some authors follow slightly different conventions for the normalization of the transform (the constant factor in front) and the sign of the complex exponent. We begin by presenting the one-dimensional (1d) transform definitions, and then give the straightforward extension to multi-dimensional transforms. @menu * The 1d Discrete Fourier Transform (DFT):: * The 1d Real-data DFT:: * 1d Real-even DFTs (DCTs):: * 1d Real-odd DFTs (DSTs):: * 1d Discrete Hartley Transforms (DHTs):: * Multi-dimensional Transforms:: @end menu @c =========> @node The 1d Discrete Fourier Transform (DFT), The 1d Real-data DFT, What FFTW Really Computes, What FFTW Really Computes @subsection The 1d Discrete Fourier Transform (DFT) @cindex discrete Fourier transform @cindex DFT The forward (@code{FFTW_FORWARD}) discrete Fourier transform (DFT) of a 1d complex array @math{X} of size @math{n} computes an array @math{Y}, where: @tex $$ Y_k = \sum_{j = 0}^{n - 1} X_j e^{-2\pi j k \sqrt{-1}/n} \ . $$ @end tex @ifinfo @center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) . 
@end ifinfo
@html
<center><img src="equation-dft.png" align="top">.</center>
@end html

The backward (@code{FFTW_BACKWARD}) DFT computes:
@tex
$$
     Y_k = \sum_{j = 0}^{n - 1} X_j e^{2\pi j k \sqrt{-1}/n} \ .
$$
@end tex
@ifinfo
@center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
@end ifinfo
@html
<center><img src="equation-idft.png" align="top">.</center>
@end html

@cindex normalization
FFTW computes an unnormalized transform, in that there is no coefficient
in front of the summation in the DFT. In other words, applying the
forward and then the backward transform will multiply the input by
@math{n}.

@cindex frequency
From above, an @code{FFTW_FORWARD} transform corresponds to a sign of
@math{-1} in the exponent of the DFT. Note also that we use the standard
``in-order'' output ordering---the @math{k}-th output corresponds to the
frequency @math{k/n} (or @math{k/T}, where @math{T} is your total
sampling period). For those who like to think in terms of positive and
negative frequencies, this means that the positive frequencies are
stored in the first half of the output and the negative frequencies are
stored in backwards order in the second half of the output. (The
frequency @math{-k/n} is the same as the frequency @math{(n-k)/n}.)

@c =========>
@node The 1d Real-data DFT, 1d Real-even DFTs (DCTs), The 1d Discrete Fourier Transform (DFT), What FFTW Really Computes
@subsection The 1d Real-data DFT

The real-input (r2c) DFT in FFTW computes the @emph{forward} transform
@math{Y} of the size @code{n} real array @math{X}, exactly as defined
above, i.e.
@tex
$$
     Y_k = \sum_{j = 0}^{n - 1} X_j e^{-2\pi j k \sqrt{-1}/n} \ .
$$
@end tex
@ifinfo
@center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(-2 pi j k sqrt(-1)/n) .
@end ifinfo
@html
<center><img src="equation-dft.png" align="top">.</center>
@end html
This output array @math{Y} can easily be shown to possess the
``Hermitian'' symmetry
@cindex Hermitian
@tex
$Y_k = Y_{n-k}^*$,
@end tex
@ifinfo
Y[k] = Y[n-k]*,
@end ifinfo
@html
<i>Y<sub>k</sub> = Y<sub>n-k</sub></i><sup>*</sup>,
@end html
where we take @math{Y} to be periodic so that
@tex
$Y_n = Y_0$.
@end tex
@ifinfo
Y[n] = Y[0].
@end ifinfo
@html
<i>Y<sub>n</sub> = Y</i><sub>0</sub>.
@end html

As a result of this symmetry, half of the output @math{Y} is redundant
(being the complex conjugate of the other half), and so the 1d r2c
transforms only output elements @math{0}@dots{}@math{n/2} of @math{Y}
(@math{n/2+1} complex numbers), where the division by @math{2} is
rounded down.

Moreover, the Hermitian symmetry implies that
@tex
$Y_0$
@end tex
@ifinfo
Y[0]
@end ifinfo
@html
<i>Y</i><sub>0</sub>
@end html
and, if @math{n} is even, the
@tex
$Y_{n/2}$
@end tex
@ifinfo
Y[n/2]
@end ifinfo
@html
<i>Y</i><sub><i>n</i>/2</sub>
@end html
element, are purely real. So, for the @code{R2HC} r2r transform, the
imaginary parts of these elements are not stored in the halfcomplex
output format.
@cindex r2r
@ctindex R2HC
@cindex halfcomplex format

The c2r and @code{HC2R} r2r transforms compute the backward DFT of the
@emph{complex} array @math{X} with Hermitian symmetry, stored in the
r2c/@code{R2HC} output formats, respectively, where the backward
transform is defined exactly as for the complex case:
@tex
$$
     Y_k = \sum_{j = 0}^{n - 1} X_j e^{2\pi j k \sqrt{-1}/n} \ .
$$
@end tex
@ifinfo
@center Y[k] = sum for j = 0 to (n - 1) of X[j] * exp(2 pi j k sqrt(-1)/n) .
@end ifinfo @html <center><img src="equation-idft.png" align="top">.</center> @end html The outputs @code{Y} of this transform can easily be seen to be purely real, and are stored as an array of real numbers. @cindex normalization Like FFTW's complex DFT, these transforms are unnormalized. In other words, applying the real-to-complex (forward) and then the complex-to-real (backward) transform will multiply the input by @math{n}. @c =========> @node 1d Real-even DFTs (DCTs), 1d Real-odd DFTs (DSTs), The 1d Real-data DFT, What FFTW Really Computes @subsection 1d Real-even DFTs (DCTs) The Real-even symmetry DFTs in FFTW are exactly equivalent to the unnormalized forward (and backward) DFTs as defined above, where the input array @math{X} of length @math{N} is purely real and is also @dfn{even} symmetry. In this case, the output array is likewise real and even symmetry. @cindex real-even DFT @cindex REDFT @ctindex REDFT00 For the case of @code{REDFT00}, this even symmetry means that @tex $X_j = X_{N-j}$, @end tex @ifinfo X[j] = X[N-j], @end ifinfo @html <i>X<sub>j</sub> = X<sub>N-j</sub></i>, @end html where we take @math{X} to be periodic so that @tex $X_N = X_0$. @end tex @ifinfo X[N] = X[0]. @end ifinfo @html <i>X<sub>N</sub> = X</i><sub>0</sub>. @end html Because of this redundancy, only the first @math{n} real numbers are actually stored, where @math{N = 2(n-1)}. The proper definition of even symmetry for @code{REDFT10}, @code{REDFT01}, and @code{REDFT11} transforms is somewhat more intricate because of the shifts by @math{1/2} of the input and/or output, although the corresponding boundary conditions are given in @ref{Real even/odd DFTs (cosine/sine transforms)}. Because of the even symmetry, however, the sine terms in the DFT all cancel and the remaining cosine terms are written explicitly below. This formulation often leads people to call such a transform a @dfn{discrete cosine transform} (DCT), although it is really just a special case of the DFT. @cindex discrete cosine transform @cindex DCT In each of the definitions below, we transform a real array @math{X} of length @math{n} to a real array @math{Y} of length @math{n}: @subsubheading REDFT00 (DCT-I) @ctindex REDFT00 An @code{REDFT00} transform (type-I DCT) in FFTW is defined by: @tex $$ Y_k = X_0 + (-1)^k X_{n-1} + 2 \sum_{j=1}^{n-2} X_j \cos [ \pi j k / (n-1)]. $$ @end tex @ifinfo Y[k] = X[0] + (-1)^k X[n-1] + 2 (sum for j = 1 to n-2 of X[j] cos(pi jk /(n-1))). @end ifinfo @html <center><img src="equation-redft00.png" align="top">.</center> @end html Note that this transform is not defined for @math{n=1}. For @math{n=2}, the summation term above is dropped as you might expect. @subsubheading REDFT10 (DCT-II) @ctindex REDFT10 An @code{REDFT10} transform (type-II DCT, sometimes called ``the'' DCT) in FFTW is defined by: @tex $$ Y_k = 2 \sum_{j=0}^{n-1} X_j \cos [\pi (j+1/2) k / n]. $$ @end tex @ifinfo Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) k / n)). @end ifinfo @html <center><img src="equation-redft10.png" align="top">.</center> @end html @subsubheading REDFT01 (DCT-III) @ctindex REDFT01 An @code{REDFT01} transform (type-III DCT) in FFTW is defined by: @tex $$ Y_k = X_0 + 2 \sum_{j=1}^{n-1} X_j \cos [\pi j (k+1/2) / n]. $$ @end tex @ifinfo Y[k] = X[0] + 2 (sum for j = 1 to n-1 of X[j] cos(pi j (k+1/2) / n)). @end ifinfo @html <center><img src="equation-redft01.png" align="top">.</center> @end html In the case of @math{n=1}, this reduces to @tex $Y_0 = X_0$. @end tex @ifinfo Y[0] = X[0]. 
@end ifinfo @html <i>Y</i><sub>0</sub> = <i>X</i><sub>0</sub>. @end html Up to a scale factor (see below), this is the inverse of @code{REDFT10} (``the'' DCT), and so the @code{REDFT01} (DCT-III) is sometimes called the ``IDCT''. @cindex IDCT @subsubheading REDFT11 (DCT-IV) @ctindex REDFT11 An @code{REDFT11} transform (type-IV DCT) in FFTW is defined by: @tex $$ Y_k = 2 \sum_{j=0}^{n-1} X_j \cos [\pi (j+1/2) (k+1/2) / n]. $$ @end tex @ifinfo Y[k] = 2 (sum for j = 0 to n-1 of X[j] cos(pi (j+1/2) (k+1/2) / n)). @end ifinfo @html <center><img src="equation-redft11.png" align="top">.</center> @end html @subsubheading Inverses and Normalization These definitions correspond directly to the unnormalized DFTs used elsewhere in FFTW (hence the factors of @math{2} in front of the summations). The unnormalized inverse of @code{REDFT00} is @code{REDFT00}, of @code{REDFT10} is @code{REDFT01} and vice versa, and of @code{REDFT11} is @code{REDFT11}. Each unnormalized inverse results in the original array multiplied by @math{N}, where @math{N} is the @emph{logical} DFT size. For @code{REDFT00}, @math{N=2(n-1)} (note that @math{n=1} is not defined); otherwise, @math{N=2n}. @cindex normalization In defining the discrete cosine transform, some authors also include additional factors of @ifinfo sqrt(2) @end ifinfo @html √2 @end html @tex $\sqrt{2}$ @end tex (or its inverse) multiplying selected inputs and/or outputs. This is a mostly cosmetic change that makes the transform orthogonal, but sacrifices the direct equivalence to a symmetric DFT. @c =========> @node 1d Real-odd DFTs (DSTs), 1d Discrete Hartley Transforms (DHTs), 1d Real-even DFTs (DCTs), What FFTW Really Computes @subsection 1d Real-odd DFTs (DSTs) The Real-odd symmetry DFTs in FFTW are exactly equivalent to the unnormalized forward (and backward) DFTs as defined above, where the input array @math{X} of length @math{N} is purely real and is also @dfn{odd} symmetry. In this case, the output is odd symmetry and purely imaginary. @cindex real-odd DFT @cindex RODFT @ctindex RODFT00 For the case of @code{RODFT00}, this odd symmetry means that @tex $X_j = -X_{N-j}$, @end tex @ifinfo X[j] = -X[N-j], @end ifinfo @html <i>X<sub>j</sub> = -X<sub>N-j</sub></i>, @end html where we take @math{X} to be periodic so that @tex $X_N = X_0$. @end tex @ifinfo X[N] = X[0]. @end ifinfo @html <i>X<sub>N</sub> = X</i><sub>0</sub>. @end html Because of this redundancy, only the first @math{n} real numbers starting at @math{j=1} are actually stored (the @math{j=0} element is zero), where @math{N = 2(n+1)}. The proper definition of odd symmetry for @code{RODFT10}, @code{RODFT01}, and @code{RODFT11} transforms is somewhat more intricate because of the shifts by @math{1/2} of the input and/or output, although the corresponding boundary conditions are given in @ref{Real even/odd DFTs (cosine/sine transforms)}. Because of the odd symmetry, however, the cosine terms in the DFT all cancel and the remaining sine terms are written explicitly below. This formulation often leads people to call such a transform a @dfn{discrete sine transform} (DST), although it is really just a special case of the DFT. @cindex discrete sine transform @cindex DST In each of the definitions below, we transform a real array @math{X} of length @math{n} to a real array @math{Y} of length @math{n}: @subsubheading RODFT00 (DST-I) @ctindex RODFT00 An @code{RODFT00} transform (type-I DST) in FFTW is defined by: @tex $$ Y_k = 2 \sum_{j=0}^{n-1} X_j \sin [ \pi (j+1) (k+1) / (n+1)]. 
$$ @end tex @ifinfo Y[k] = 2 (sum for j = 0 to n-1 of X[j] sin(pi (j+1)(k+1) / (n+1))). @end ifinfo @html <center><img src="equation-rodft00.png" align="top">.</center> @end html @subsubheading RODFT10 (DST-II) @ctindex RODFT10 An @code{RODFT10} transform (type-II DST) in FFTW is defined by: @tex $$ Y_k = 2 \sum_{j=0}^{n-1} X_j \sin [\pi (j+1/2) (k+1) / n]. $$ @end tex @ifinfo Y[k] = 2 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1) / n)). @end ifinfo @html <center><img src="equation-rodft10.png" align="top">.</center> @end html @subsubheading RODFT01 (DST-III) @ctindex RODFT01 An @code{RODFT01} transform (type-III DST) in FFTW is defined by: @tex $$ Y_k = (-1)^k X_{n-1} + 2 \sum_{j=0}^{n-2} X_j \sin [\pi (j+1) (k+1/2) / n]. $$ @end tex @ifinfo Y[k] = (-1)^k X[n-1] + 2 (sum for j = 0 to n-2 of X[j] sin(pi (j+1) (k+1/2) / n)). @end ifinfo @html <center><img src="equation-rodft01.png" align="top">.</center> @end html In the case of @math{n=1}, this reduces to @tex $Y_0 = X_0$. @end tex @ifinfo Y[0] = X[0]. @end ifinfo @html <i>Y</i><sub>0</sub> = <i>X</i><sub>0</sub>. @end html @subsubheading RODFT11 (DST-IV) @ctindex RODFT11 An @code{RODFT11} transform (type-IV DST) in FFTW is defined by: @tex $$ Y_k = 2 \sum_{j=0}^{n-1} X_j \sin [\pi (j+1/2) (k+1/2) / n]. $$ @end tex @ifinfo Y[k] = 2 (sum for j = 0 to n-1 of X[j] sin(pi (j+1/2) (k+1/2) / n)). @end ifinfo @html <center><img src="equation-rodft11.png" align="top">.</center> @end html @subsubheading Inverses and Normalization These definitions correspond directly to the unnormalized DFTs used elsewhere in FFTW (hence the factors of @math{2} in front of the summations). The unnormalized inverse of @code{RODFT00} is @code{RODFT00}, of @code{RODFT10} is @code{RODFT01} and vice versa, and of @code{RODFT11} is @code{RODFT11}. Each unnormalized inverse results in the original array multiplied by @math{N}, where @math{N} is the @emph{logical} DFT size. For @code{RODFT00}, @math{N=2(n+1)}; otherwise, @math{N=2n}. @cindex normalization In defining the discrete sine transform, some authors also include additional factors of @ifinfo sqrt(2) @end ifinfo @html √2 @end html @tex $\sqrt{2}$ @end tex (or its inverse) multiplying selected inputs and/or outputs. This is a mostly cosmetic change that makes the transform orthogonal, but sacrifices the direct equivalence to an antisymmetric DFT. @c =========> @node 1d Discrete Hartley Transforms (DHTs), Multi-dimensional Transforms, 1d Real-odd DFTs (DSTs), What FFTW Really Computes @subsection 1d Discrete Hartley Transforms (DHTs) @cindex discrete Hartley transform @cindex DHT The discrete Hartley transform (DHT) of a 1d real array @math{X} of size @math{n} computes a real array @math{Y} of the same size, where: @tex $$ Y_k = \sum_{j = 0}^{n - 1} X_j [ \cos(2\pi j k / n) + \sin(2\pi j k / n)]. $$ @end tex @ifinfo @center Y[k] = sum for j = 0 to (n - 1) of X[j] * [cos(2 pi j k / n) + sin(2 pi j k / n)]. @end ifinfo @html <center><img src="equation-dht.png" align="top">.</center> @end html @cindex normalization FFTW computes an unnormalized transform, in that there is no coefficient in front of the summation in the DHT. In other words, applying the transform twice (the DHT is its own inverse) will multiply the input by @math{n}. 
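
To make the normalization concrete, the following sketch (ours; it assumes arrays of size @code{n} allocated with @code{fftw_malloc}) applies the DHT twice and recovers the original data multiplied by @math{n}:

@example
double *in  = fftw_malloc(sizeof(double) * n);
double *out = fftw_malloc(sizeof(double) * n);
fftw_plan p = fftw_plan_r2r_1d(n, in, out, FFTW_DHT, FFTW_ESTIMATE);
fftw_plan q = fftw_plan_r2r_1d(n, out, in, FFTW_DHT, FFTW_ESTIMATE);
/* ... initialize in[] ... */
fftw_execute(p);   /* out = DHT(in) */
fftw_execute(q);   /* in = DHT(out), i.e. n times the original in[] */
fftw_destroy_plan(p);
fftw_destroy_plan(q);
fftw_free(in);
fftw_free(out);
@end example
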
@c =========>
@node Multi-dimensional Transforms, , 1d Discrete Hartley Transforms (DHTs), What FFTW Really Computes
@subsection Multi-dimensional Transforms

The multi-dimensional transforms of FFTW, in general, compute simply the
separable product of the given 1d transform along each dimension of the
array. Since each of these transforms is unnormalized, computing the
forward followed by the backward/inverse multi-dimensional transform
will result in the original array scaled by the product of the
normalization factors for each dimension (e.g. the product of the
dimension sizes, for a multi-dimensional DFT).

@tex
As an explicit example, consider the following exact mathematical
definition of our multi-dimensional DFT. Let $X$ be a $d$-dimensional
complex array whose elements are $X[j_1, j_2, \ldots, j_d]$, where
$0 \leq j_s < n_s$ for all~$s \in \{ 1, 2, \ldots, d \}$. Let also
$\omega_s = e^{2\pi \sqrt{-1}/n_s}$, for all~$s \in \{ 1, 2, \ldots, d \}$.

The forward transform computes a complex array~$Y$, whose structure is
the same as that of~$X$, defined by

$$
   Y[k_1, k_2, \ldots, k_d] =
      \sum_{j_1 = 0}^{n_1 - 1}
      \sum_{j_2 = 0}^{n_2 - 1}
      \cdots
      \sum_{j_d = 0}^{n_d - 1}
      X[j_1, j_2, \ldots, j_d]
      \omega_1^{-j_1 k_1}
      \omega_2^{-j_2 k_2}
      \cdots
      \omega_d^{-j_d k_d} \ .
$$

The backward transform computes

$$
   Y[k_1, k_2, \ldots, k_d] =
      \sum_{j_1 = 0}^{n_1 - 1}
      \sum_{j_2 = 0}^{n_2 - 1}
      \cdots
      \sum_{j_d = 0}^{n_d - 1}
      X[j_1, j_2, \ldots, j_d]
      \omega_1^{j_1 k_1}
      \omega_2^{j_2 k_2}
      \cdots
      \omega_d^{j_d k_d} \ .
$$

Computing the forward transform followed by the backward transform
will multiply the array by $\prod_{s=1}^{d} n_s$.
@end tex

@cindex r2c
The definition of FFTW's multi-dimensional DFT of real data (r2c)
deserves special attention. In this case, we logically compute the full
multi-dimensional DFT of the input data; since the input data are purely
real, the output data have the Hermitian symmetry and therefore only one
non-redundant half need be stored. More specifically, for an @ndims
multi-dimensional real-input DFT, the full (logical) complex output array
@tex
$Y[k_0, k_1, \ldots, k_{d-1}]$
@end tex
@html
<i>Y</i>[<i>k</i><sub>0</sub>, <i>k</i><sub>1</sub>, ...,
<i>k</i><sub><i>d-1</i></sub>]
@end html
@ifinfo
Y[k[0], k[1], ..., k[d-1]]
@end ifinfo
has the symmetry:
@tex
$$
   Y[k_0, k_1, \ldots, k_{d-1}] = Y[n_0 - k_0, n_1 - k_1, \ldots, n_{d-1} - k_{d-1}]^*
$$
@end tex
@html
<i>Y</i>[<i>k</i><sub>0</sub>, <i>k</i><sub>1</sub>, ...,
<i>k</i><sub><i>d-1</i></sub>] =
<i>Y</i>[<i>n</i><sub>0</sub> - <i>k</i><sub>0</sub>,
<i>n</i><sub>1</sub> - <i>k</i><sub>1</sub>, ...,
<i>n</i><sub><i>d-1</i></sub> - <i>k</i><sub><i>d-1</i></sub>]<sup>*</sup>
@end html
@ifinfo
Y[k[0], k[1], ..., k[d-1]] = Y[n[0] - k[0], n[1] - k[1], ..., n[d-1] - k[d-1]]*
@end ifinfo
(where each dimension is periodic). Because of this symmetry, we only
store the
@tex
$k_{d-1} = 0 \cdots n_{d-1}/2$
@end tex
@html
<i>k</i><sub><i>d-1</i></sub> = 0...<i>n</i><sub><i>d-1</i></sub>/2
@end html
@ifinfo
k[d-1] = 0...n[d-1]/2
@end ifinfo
elements of the @emph{last} dimension (division by @math{2} is rounded
down). (We could instead have cut any other dimension in half, but the
last dimension proved computationally convenient.) This results in the
peculiar array format described in more detail by
@ref{Real-data DFT Array Format}.

The multi-dimensional c2r transform is simply the unnormalized inverse
of the r2c transform, i.e.
it is the same as FFTW's complex backward multi-dimensional DFT,
operating on a Hermitian input array in the peculiar format mentioned
above and outputting a real array (since the DFT output is purely real).

We should remind the user that the separable product of 1d transforms
along each dimension, as computed by FFTW, is not always the same thing
as the usual multi-dimensional transform. A multi-dimensional
@code{R2HC} (or @code{HC2R}) transform is not identical to the
multi-dimensional DFT, requiring some post-processing to combine the
requisite real and imaginary parts, as was described in
@ref{The Halfcomplex-format DFT}. Likewise, FFTW's multidimensional
@code{FFTW_DHT} r2r transform is not the same thing as the logical
multi-dimensional discrete Hartley transform defined in the literature,
as discussed in @ref{The Discrete Hartley Transform}.

@c ************************************************************
@node Multi-threaded FFTW, FFTW on the Cell Processor, FFTW Reference, Top
@chapter Multi-threaded FFTW
@cindex parallel transform

In this chapter we document the parallel FFTW routines for shared-memory
parallel hardware. These routines, which support parallel one- and
multi-dimensional transforms of both real and complex data, are the
easiest way to take advantage of multiple processors with FFTW. They
work just like the corresponding uniprocessor transform routines, except
that you have an extra initialization routine to call, and there is a
routine to set the number of threads to employ. Any program that uses
the uniprocessor FFTW can therefore be trivially modified to use the
multi-threaded FFTW.

A shared-memory machine is one in which all CPUs can directly access the
same main memory, and such machines are now common due to the ubiquity
of multi-core CPUs. FFTW's multi-threading support allows you to utilize
these additional CPUs transparently from a single program. However, this
does not necessarily translate into performance gains---when multiple
threads/CPUs are employed, there is an overhead required for
synchronization that may outweigh the computational parallelism.
Therefore, you can only benefit from threads if your problem is
sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package. On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}).
@fpindex configure

@cindex portability
The threads routines require your operating system to have some sort of
shared-memory threads support. Specifically, the FFTW threads package
works with POSIX threads (available on most Unix variants, from
GNU/Linux to MacOS X) and Win32 threads. We also support using
@uref{http://www.openmp.org,OpenMP}, enabled by using
@code{--enable-openmp} (@emph{instead} of @code{--enable-threads}).
(This may be useful if you are employing that sort of directive in your
own code, in order to minimize conflicts.)
If you have a shared-memory machine that uses a different threads API, it should be a simple matter of programming to include support for it; see the file @code{threads/threads.c} for more detail. Ideally, of course, you should also have multiple processors in order to get any benefit from the threaded transforms. @c ------------------------------------------------------------ @node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW @section Usage of Multi-threaded FFTW Here, it is assumed that the reader is already familiar with the usage of the uniprocessor FFTW routines, described elsewhere in this manual. We only describe what one has to change in order to use the multi-threaded routines. First, programs using the parallel complex transforms should be linked with @code{-lfftw3_threads -lfftw3 -lm} on Unix. You will also need to link with whatever library is responsible for threads on your system (e.g. @code{-lpthread} on GNU/Linux). @cindex linking on Unix Second, before calling @emph{any} FFTW routines, you should call the function: @example int fftw_init_threads(void); @end example @findex fftw_init_threads This function, which need only be called once, performs any one-time initialization required to use threads on your system. It returns zero if there was some error (which should not happen under normal circumstances) and a non-zero value otherwise. Third, before creating a plan that you want to parallelize, you should call: @example void fftw_plan_with_nthreads(int nthreads); @end example @findex fftw_plan_with_nthreads The @code{nthreads} argument indicates the number of threads you want FFTW to use (or actually, the maximum number). All plans subsequently created with any planner routine will use that many threads. You can call @code{fftw_plan_with_nthreads}, create some plans, call @code{fftw_plan_with_nthreads} again with a different argument, and create some more plans for a new number of threads. Plans already created before a call to @code{fftw_plan_with_nthreads} are unaffected. If you pass an @code{nthreads} argument of @code{1} (the default), threads are disabled for subsequent plans. Given a plan, you then execute it as usual with @code{fftw_execute(plan)}, and the execution will use the number of threads specified when the plan was created. When done, you destroy it as usual with @code{fftw_destroy_plan}. There is one additional routine: if you want to get rid of all memory and other resources allocated internally by FFTW, you can call: @example void fftw_cleanup_threads(void); @end example @findex fftw_cleanup_threads which is much like the @code{fftw_cleanup()} function except that it also gets rid of threads-related data. You must @emph{not} execute any previously created plans after calling this function. We should also mention one other restriction: if you save wisdom from a program using the multi-threaded FFTW, that wisdom @emph{cannot be used} by a program using only the single-threaded FFTW (i.e. not calling @code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}. @c ------------------------------------------------------------ @node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW @section How Many Threads to Use? @cindex number of threads There is a fair amount of overhead involved in synchronizing threads, so the optimal number of threads to use depends upon the size of the transform as well as on the number of processors you have. 
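
For reference, the complete calling sequence described in the preceding section looks like the following minimal sketch (ours; it assumes arrays @code{in} and @code{out} of size @code{n} allocated with @code{fftw_malloc}, and four threads):

@example
if (!fftw_init_threads()) @{
     /* thread initialization failed; handle the error */
@}
fftw_plan_with_nthreads(4);   /* plans created below use up to 4 threads */

fftw_plan plan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
fftw_execute(plan);           /* the transform runs with the threads */
fftw_destroy_plan(plan);

fftw_cleanup_threads();       /* release threads-related data when done */
@end example
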
As a general rule, you don't want to use more threads than you have processors. (Using more threads will work, but there will be extra overhead with no benefit.) In fact, if the problem size is too small, you may want to use fewer threads than you have processors. You will have to experiment with your system to see what level of parallelization is best for your problem size. Typically, the problem will have to involve at least a few thousand data points before threads become beneficial. If you plan with @code{FFTW_PATIENT}, it will automatically disable threads for sizes that don't benefit from parallelization. @ctindex FFTW_PATIENT @c ------------------------------------------------------------ @node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW @section Thread safety @cindex threads @cindex thread safety Users writing multi-threaded programs must concern themselves with the @dfn{thread safety} of the libraries they use---that is, whether it is safe to call routines in parallel from multiple threads. FFTW can be used in such an environment, but some care must be taken because the planner routines share data (e.g. wisdom and trigonometric tables) between calls and plans. The upshot is that the only thread-safe (re-entrant) routine in FFTW is @code{fftw_execute} (and the new-array variants thereof). All other routines (e.g. the planner) should only be called from one thread at a time. So, for example, you can wrap a semaphore lock around any calls to the planner; even more simply, you can just create all of your plans from one thread. We do not think this should be an important restriction (FFTW is designed for the situation where the only performance-sensitive code is the actual execution of the transform), and the benefits of shared data between plans are great. Note also that, since the plan is not modified by @code{fftw_execute}, it is safe to execute the @emph{same plan} in parallel by multiple threads. However, since a given plan operates by default on a fixed array, you need to use one of the new-array execute functions (@pxref{New-array Execute Functions}) so that different threads compute the transform of different data. (Users should note that these comments only apply to programs using shared-memory threads. Parallelism using MPI or forked processes involves a separate address-space and global variables for each process, and is not susceptible to problems of this sort.) @c ************************************************************ @node FFTW on the Cell Processor, Calling FFTW from Fortran, Multi-threaded FFTW, Top @chapter FFTW on the Cell Processor @cindex Cell processor Starting with version 3.2, FFTW contains specific support for the Cell Broadband Engine (``Cell'') processor, graciously donated by the IBM Austin Research Laboratory. Cell consists of one PowerPC core (``PPE'') and of a number of Synergistic Processing Elements (``SPE'') to which the PPE can delegate computation. The IBM QS20 Cell blade offers 8 SPEs per Cell chip. The Sony Playstation 3 contains 6 useable SPEs. Currently, FFTW fully utilizes the SPEs for one- and multi-dimensional complex FFTs of sizes that can be factored into small primes, both in single and double precision. Transforms of real data use SPEs only partially at this time. If FFTW cannot use the SPEs, it falls back to a slower computation on the PPE. FFTW is meant to use the SPEs transparently without user intervention. However, certain caveats apply, which are discussed later in this document. 
@menu * Cell Installation:: * Cell Caveats:: * FFTW Accuracy on Cell:: @end menu @c ------------------------------------------------------------ @node Cell Installation, Cell Caveats, FFTW on the Cell Processor, FFTW on the Cell Processor @section Cell Installation All of the FFTW Cell code is located in the @code{cell} subdirectory of the FFTW package. On Unix systems, the FFTW Cell support is automatically configured, compiled, and included in the uniprocessor FFTW libraries simply by including @code{--enable-cell} in the flags to the @code{configure} script (@pxref{Installation on Unix}). @fpindex configure Both double precision (the default) and single precision are supported on the Cell; for the latter, configure with @code{--enable-cell --enable-single}. In addition, the PPE supports the Altivec (or VMX) instruction set in single precision. (Altivec is Apple/Freescale terminology, VMX is IBM terminology for the same thing.) You can enable support for Altivec with the @code{--enable-altivec} flag (single precision only). The software compiles with the Cell SDK 2.0, and probably with earlier ones as well. @c ------------------------------------------------------------ @node Cell Caveats, FFTW Accuracy on Cell, Cell Installation, FFTW on the Cell Processor @section Cell Caveats @itemize @bullet @item The FFTW benchmark program allocates memory using malloc() or equivalent library calls, reflecting the common usage of the FFTW library. However, you can sometimes improve performance significantly by allocating memory in system-specific large TLB pages. E.g., we have seen 39 GFLOPS/s for a @threedims{256,256,256} problem using large pages, whereas the speed is about 25 GFLOPS/s with normal pages. YMMV. @item FFTW hoards all available SPEs for itself. You can optionally choose a different number of SPEs by calling the undocumented function @code{fftw_cell_set_nspe(n)}, where @code{n} is the number of desired SPEs. Expect this interface to go away once we figure out how to make FFTW play nicely with other Cell software. In particular, if you try to link both the single and double precision of FFTW in the same program (which you can do), they will both try to grab all SPEs and the second one will hang. @item The SPEs demand that data be stored in contiguous arrays aligned at 16-byte boundaries. If you instruct FFTW to operate on noncontiguous or nonaligned data, the SPEs will not be used, resulting in slow execution. @xref{Data Alignment}. @item The @code{FFTW_ESTIMATE} mode may produce seriously suboptimal plans, and it becomes particularly confused if you enable both the SPEs and Altivec. If you care about performance, please use @code{FFTW_MEASURE} or @code{FFTW_PATIENT} until we figure out a more reliable performance model. @end itemize @c ------------------------------------------------------------ @node FFTW Accuracy on Cell, , Cell Caveats, FFTW on the Cell Processor @section FFTW Accuracy on Cell The SPEs are fully IEEE-754 compliant in double precision. In single precision, they only implement round-towards-zero as opposed to the standard round-to-even mode. (The PPE is fully IEEE-754 compliant like all other PowerPC implementations.) Because of the rounding mode, FFTW is less accurate when running on the SPEs than on the PPE. The accuracy loss is hard to quantify in general, but as a rough guideline, the L2 norm of the relative roundoff error for random inputs is 4 to 8 times larger than the corresponding calculation in round-to-even arithmetic. 
In other words, expect to lose 2 to 3 bits of accuracy. FFTW currently does not use any algorithm that degrades accuracy to gain performance on the SPE. One implication of this choice is that large 1D transforms run slower than they would if we were willing to sacrifice another bit or so of accuracy. @c ************************************************************ @node Calling FFTW from Fortran, Upgrading from FFTW version 2, FFTW on the Cell Processor, Top @chapter Calling FFTW from Fortran @cindex Fortran interface This chapter describes the Fortran-callable interface to FFTW, which differs from the C interface only in the prefix (@samp{dfftw_} instead of @samp{fftw_}), and a few other minor details. The Fortran interface is included in the FFTW libraries by default, unless a Fortran compiler isn't found on your system or @code{--disable-fortran} is included in the @code{configure} flags. We assume here that the reader is already familiar with the usage of FFTW in C, as described elsewhere in this manual. @menu * Fortran-interface routines:: * FFTW Constants in Fortran:: * FFTW Execution in Fortran:: * Fortran Examples:: * Wisdom of Fortran?:: @end menu @c ------------------------------------------------------- @node Fortran-interface routines, FFTW Constants in Fortran, Calling FFTW from Fortran, Calling FFTW from Fortran @section Fortran-interface routines Nearly all of the FFTW functions have Fortran-callable equivalents. The name of the Fortran routine is the same as that of the corresponding C routine, but with the @samp{fftw_} prefix replaced by @samp{dfftw_}. (The single and long-double precision versions use @samp{sfftw_} and @samp{lfftw_}, respectively, instead of @samp{fftwf_} and @samp{fftwl_}.)@footnote{Technically, Fortran 77 identifiers are not allowed to have more than 6 characters, nor may they contain underscores. Any compiler that enforces this limitation doesn't deserve to link to FFTW.} For the most part, all of the arguments to the functions are the same, with the following exceptions: @itemize @bullet @item @code{plan} variables (what would be of type @code{fftw_plan} in C), must be declared as a type that is at least as big as a pointer (address) on your machine. We recommend using @code{integer*8}. @cindex portability @item Any function that returns a value (e.g. @code{fftw_plan_dft}) is converted into a @emph{subroutine}. The return value is converted into an additional @emph{first} parameter of this subroutine.@footnote{The reason for this is that some Fortran implementations seem to have trouble with C function return values, and vice versa.} @item @cindex column-major The Fortran routines expect multi-dimensional arrays to be in @emph{column-major} order, which is the ordinary format of Fortran arrays (@pxref{Multi-dimensional Array Format}). They do this transparently and costlessly simply by reversing the order of the dimensions passed to FFTW, but this has one important consequence for multi-dimensional real-complex transforms, discussed below. @item Wisdom import and export is somewhat more tricky because one cannot easily pass files or strings between C and Fortran; see @ref{Wisdom of Fortran?}. @item Fortran cannot use the @code{fftw_malloc} dynamic-allocation routine. If you want to exploit the SIMD FFTW (@pxref{Data Alignment}), you'll need to figure out some other way to ensure that your arrays are at least 16-byte aligned. 
@item @tindex fftw_iodim @cindex guru interface Since Fortran 77 does not have data structures, the @code{fftw_iodim} structure from the guru interface (@pxref{Guru vector and transform sizes}) must be split into separate arguments. In particular, any @code{fftw_iodim} array arguments in the C guru interface become three integer array arguments (@code{n}, @code{is}, and @code{os}) in the Fortran guru interface, all of whose lengths should be equal to the corresponding @code{rank} argument. @item The guru planner interface in Fortran does @emph{not} do any automatic translation between column-major and row-major; you are responsible for setting the strides etcetera to correspond to your Fortran arrays. However, as a slight bug that we are preserving for backwards compatibility, the @samp{plan_guru_r2r} in Fortran @emph{does} reverse the order of its @code{kind} array parameter, so the @code{kind} array of that routine should be in the reverse of the order of the iodim arrays (see above). @end itemize In general, you should take care to use Fortran data types that correspond to (i.e. are the same size as) the C types used by FFTW. If your C and Fortran compilers are made by the same vendor, the correspondence is usually straightforward (i.e. @code{integer} corresponds to @code{int}, @code{real} corresponds to @code{float}, etcetera). The native Fortran double/single-precision complex type should be compatible with @code{fftw_complex}/@code{fftwf_complex}. Such simple correspondences are assumed in the examples below. @cindex portability @c ------------------------------------------------------- @node FFTW Constants in Fortran, FFTW Execution in Fortran, Fortran-interface routines, Calling FFTW from Fortran @section FFTW Constants in Fortran When creating plans in FFTW, a number of constants are used to specify options, such as @code{FFTW_MEASURE} or @code{FFTW_ESTIMATE}. The same constants must be used with the wrapper routines, but of course the C header files where the constants are defined can't be incorporated directly into Fortran code. Instead, we have placed Fortran equivalents of the FFTW constant definitions in the file @code{fftw3.f}, which can be found in the same directory as @code{fftw3.h}. If your Fortran compiler supports a preprocessor of some sort, you should be able to @code{include} or @code{#include} this file; otherwise, you can paste it directly into your code. @cindex flags In C, you combine different flags (like @code{FFTW_PRESERVE_INPUT} and @code{FFTW_MEASURE}) using the @samp{@code{|}} operator; in Fortran you should just use @samp{@code{+}}. (Take care not to add in the same flag more than once, though.) @c ------------------------------------------------------- @node FFTW Execution in Fortran, Fortran Examples, FFTW Constants in Fortran, Calling FFTW from Fortran @section FFTW Execution in Fortran In C, in order to use a plan, one normally calls @code{fftw_execute}, which executes the plan to perform the transform on the input/output arrays passed when the plan was created (@pxref{Using Plans}). The corresponding subroutine call in Fortran is: @example call dfftw_execute(plan) @end example @findex dfftw_execute However, we have had reports that this causes problems with some recent optimizing Fortran compilers. The problem is, because the input/output arrays are not passed as explicit arguments to @code{dfftw_execute}, the semantics of Fortran (unlike C) allow the compiler to assume that the input/output arrays are not changed by @code{dfftw_execute}. 
As a consequence, certain compilers end up optimizing out or
repositioning the call to @code{dfftw_execute}, assuming incorrectly
that it does nothing.

There are various workarounds to this, but the safest and simplest thing
is to not use @code{dfftw_execute} in Fortran. Instead, use the
functions described in @ref{New-array Execute Functions}, which take the
input/output arrays as explicit arguments. For example, if the plan is
for a complex-data DFT and was created for the arrays @code{in} and
@code{out}, you would do:

@example
call dfftw_execute_dft(plan, in, out)
@end example
@findex dfftw_execute_dft

There are a few things to be careful of, however:

@itemize @bullet

@item
You must use the correct type of execute function, matching the way the
plan was created. Complex DFT plans should use
@code{dfftw_execute_dft}, real-input (r2c) DFT plans should use
@code{dfftw_execute_dft_r2c}, and real-output (c2r) DFT plans should use
@code{dfftw_execute_dft_c2r}. The various r2r plans should use
@code{dfftw_execute_r2r}.

@item
You should normally pass the same input/output arrays that were used
when creating the plan. This is always safe.

@item
@emph{If} you pass @emph{different} input/output arrays compared to
those used when creating the plan, you must abide by all the
restrictions of the new-array execute functions
(@pxref{New-array Execute Functions}). The most difficult of these, in
Fortran, is the requirement that the new arrays have the same alignment
as the original arrays, because there seems to be no way in Fortran to
obtain guaranteed-aligned arrays (analogous to @code{fftw_malloc} in C).
You can, of course, use the @code{FFTW_UNALIGNED} flag when creating the
plan, in which case the plan does not depend on the alignment, but this
may sacrifice substantial performance on architectures (like x86) with
SIMD instructions (@pxref{SIMD alignment and fftw_malloc}).
@ctindex FFTW_UNALIGNED

@end itemize

@c -------------------------------------------------------
@node Fortran Examples, Wisdom of Fortran?, FFTW Execution in Fortran, Calling FFTW from Fortran
@section Fortran Examples

In C, you might have something like the following to transform a
one-dimensional complex array:

@example
fftw_complex in[N], out[N];
fftw_plan plan;
plan = fftw_plan_dft_1d(N,in,out,FFTW_FORWARD,FFTW_ESTIMATE);
fftw_execute(plan);
fftw_destroy_plan(plan);
@end example

In Fortran, you would use the following to accomplish the same thing:

@example
double complex in, out
dimension in(N), out(N)
integer*8 plan
call dfftw_plan_dft_1d(plan,N,in,out,FFTW_FORWARD,FFTW_ESTIMATE)
call dfftw_execute_dft(plan, in, out)
call dfftw_destroy_plan(plan)
@end example
@findex dfftw_plan_dft_1d
@findex dfftw_execute_dft
@findex dfftw_destroy_plan

Notice how all routines are called as Fortran subroutines, and the plan
is returned via the first argument to @code{dfftw_plan_dft_1d}. Notice
also that we changed @code{fftw_execute} to @code{dfftw_execute_dft}
(@pxref{FFTW Execution in Fortran}).
To do the same thing, but using 8 threads in parallel (@pxref{Multi-threaded FFTW}), you would simply prefix these calls with:

@example
      call dfftw_init_threads(iret)
      call dfftw_plan_with_nthreads(8)
@end example
@findex dfftw_init_threads
@findex dfftw_plan_with_nthreads

To transform a three-dimensional array in-place with C, you might do:

@example
fftw_complex arr[L][M][N];
fftw_plan plan;
plan = fftw_plan_dft_3d(L,M,N, arr,arr, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(plan);
fftw_destroy_plan(plan);
@end example

In Fortran, you would use this instead:

@example
      double complex arr
      dimension arr(L,M,N)
      integer*8 plan
      call dfftw_plan_dft_3d(plan, L,M,N, arr,arr,
     &                       FFTW_FORWARD, FFTW_ESTIMATE)
      call dfftw_execute_dft(plan, arr, arr)
      call dfftw_destroy_plan(plan)
@end example
@findex dfftw_plan_dft_3d

Note that we pass the array dimensions in the ``natural'' order in both C and Fortran.

To transform a one-dimensional real array in Fortran, you might do:

@example
      double precision in
      dimension in(N)
      double complex out
      dimension out(N/2 + 1)
      integer*8 plan
      call dfftw_plan_dft_r2c_1d(plan,N,in,out,FFTW_ESTIMATE)
      call dfftw_execute_dft_r2c(plan, in, out)
      call dfftw_destroy_plan(plan)
@end example
@findex dfftw_plan_dft_r2c_1d
@findex dfftw_execute_dft_r2c

To transform a two-dimensional real array, out of place, you might use the following:

@example
      double precision in
      dimension in(M,N)
      double complex out
      dimension out(M/2 + 1, N)
      integer*8 plan
      call dfftw_plan_dft_r2c_2d(plan,M,N,in,out,FFTW_ESTIMATE)
      call dfftw_execute_dft_r2c(plan, in, out)
      call dfftw_destroy_plan(plan)
@end example
@findex dfftw_plan_dft_r2c_2d

@strong{Important:} Notice that it is the @emph{first} dimension of the complex output array that is cut in half in Fortran, rather than the last dimension as in C. This is a consequence of the interface routines reversing the order of the array dimensions passed to FFTW so that the Fortran program can use its ordinary column-major order.
@cindex column-major
@cindex r2c/c2r multi-dimensional array format

@c -------------------------------------------------------
@node Wisdom of Fortran?, , Fortran Examples, Calling FFTW from Fortran
@section Wisdom of Fortran?

In this section, we discuss how one can import/export FFTW wisdom (saved plans) to/from a Fortran program; we assume that the reader is already familiar with wisdom, as described in @ref{Words of Wisdom-Saving Plans}.
@cindex portability

The basic problem is that it is difficult to (portably) pass files and strings between Fortran and C, so we cannot provide a direct Fortran equivalent to the @code{fftw_export_wisdom_to_file}, etcetera, functions. Fortran interfaces @emph{are} provided for the functions that do not take file/string arguments, however: @code{dfftw_import_system_wisdom}, @code{dfftw_import_wisdom}, @code{dfftw_export_wisdom}, and @code{dfftw_forget_wisdom}.
@findex dfftw_import_system_wisdom
@findex dfftw_import_wisdom
@findex dfftw_export_wisdom
@findex dfftw_forget_wisdom

So, for example, to import the system-wide wisdom, you would do:

@example
      integer isuccess
      call dfftw_import_system_wisdom(isuccess)
@end example

As usual, the C return value is turned into a first parameter; @code{isuccess} is non-zero on success and zero on failure (e.g. if there is no system wisdom installed).
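Continuing this example, a trivial sketch of acting on @code{isuccess} (the message text here is arbitrary) might be:

@example
      if (isuccess .eq. 0) then
         print *, 'no system wisdom installed'
      end if
@end example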
If you want to import/export wisdom from/to an arbitrary file or elsewhere, you can employ the generic @code{dfftw_import_wisdom} and @code{dfftw_export_wisdom} functions, for which you must supply a subroutine to read/write one character at a time. The FFTW package contains an example file @code{doc/f77_wisdom.f} demonstrating how to implement @code{import_wisdom_from_file} and @code{export_wisdom_to_file} subroutines in this way. (These routines cannot be compiled into the FFTW library itself, lest all FFTW-using programs be required to link with the Fortran I/O library.) @c ************************************************************ @node Upgrading from FFTW version 2, Installation and Customization, Calling FFTW from Fortran, Top @chapter Upgrading from FFTW version 2 In this chapter, we outline the process for updating codes designed for the older FFTW 2 interface to work with FFTW 3. The interface for FFTW 3 is not backwards-compatible with the interface for FFTW 2 and earlier versions; codes written to use those versions will fail to link with FFTW 3. Nor is it possible to write ``compatibility wrappers'' to bridge the gap (at least not efficiently), because FFTW 3 has different semantics from previous versions. However, upgrading should be a straightforward process because the data formats are identical and the overall style of planning/execution is essentially the same. Unlike FFTW 2, there are no separate header files for real and complex transforms (or even for different precisions) in FFTW 3; all interfaces are defined in the @code{<fftw3.h>} header file. @heading Numeric Types The main difference in data types is that @code{fftw_complex} in FFTW 2 was defined as a @code{struct} with macros @code{c_re} and @code{c_im} for accessing the real/imaginary parts. (This is binary-compatible with FFTW 3 on any machine except perhaps for some older Crays in single precision.) The equivalent macros for FFTW 3 are: @example #define c_re(c) ((c)[0]) #define c_im(c) ((c)[1]) @end example This does not work if you are using the C99 complex type, however, unless you insert a @code{double*} typecast into the above macros (@pxref{Complex numbers}). Also, FFTW 2 had an @code{fftw_real} typedef that was an alias for @code{double} (in double precision). In FFTW 3 you should just use @code{double} (or whatever precision you are employing). @heading Plans The major difference between FFTW 2 and FFTW 3 is in the planning/execution division of labor. In FFTW 2, plans were found for a given transform size and type, and then could be applied to @emph{any} arrays and for @emph{any} multiplicity/stride parameters. In FFTW 3, you specify the particular arrays, stride parameters, etcetera when creating the plan, and the plan is then executed for @emph{those} arrays (unless the guru interface is used) and @emph{those} parameters @emph{only}. (FFTW 2 had ``specific planner'' routines that planned for a particular array and stride, but the plan could still be used for other arrays and strides.) That is, much of the information that was formerly specified at execution time is now specified at planning time. Like FFTW 2's specific planner routines, the FFTW 3 planner overwrites the input/output arrays unless you use @code{FFTW_ESTIMATE}. FFTW 2 had separate data types @code{fftw_plan}, @code{fftwnd_plan}, @code{rfftw_plan}, and @code{rfftwnd_plan} for complex and real one- and multi-dimensional transforms, and each type had its own @samp{destroy} function. 
In FFTW 3, all plans are of type @code{fftw_plan} and all are destroyed by @code{fftw_destroy_plan(plan)}.

Where you formerly used @code{fftw_create_plan} and @code{fftw_one} to plan and compute a single 1d transform, you would now use @code{fftw_plan_dft_1d} to plan the transform. If you used the generic @code{fftw} function to execute the transform with multiplicity (@code{howmany}) and stride parameters, you would now use the advanced interface @code{fftw_plan_many_dft} to specify those parameters. The plans are now executed with @code{fftw_execute(plan)}, which takes all of its parameters (including the input/output arrays) from the plan.

In-place transforms no longer interpret their output argument as scratch space, nor is there an @code{FFTW_IN_PLACE} flag. You simply pass the same pointer for both the input and output arguments. (Previously, the output @code{ostride} and @code{odist} parameters were ignored for in-place transforms; now, if they are specified via the advanced interface, they are significant even in the in-place case, although they should normally equal the corresponding input parameters.)

The @code{FFTW_ESTIMATE} and @code{FFTW_MEASURE} flags have the same meaning as before, although the planning time will differ. You may also consider using @code{FFTW_PATIENT}, which is like @code{FFTW_MEASURE} except that it takes more time in order to consider a wider variety of algorithms.

For multi-dimensional complex DFTs, instead of @code{fftwnd_create_plan} (or @code{fftw2d_create_plan} or @code{fftw3d_create_plan}), followed by @code{fftwnd_one}, you would use @code{fftw_plan_dft} (or @code{fftw_plan_dft_2d} or @code{fftw_plan_dft_3d}), followed by @code{fftw_execute}. If you used @code{fftwnd} to specify strides etcetera, you would instead specify these via @code{fftw_plan_many_dft}.

The analogues to @code{rfftw_create_plan} and @code{rfftw_one} with @code{FFTW_REAL_TO_COMPLEX} or @code{FFTW_COMPLEX_TO_REAL} directions are @code{fftw_plan_r2r_1d} with kind @code{FFTW_R2HC} or @code{FFTW_HC2R}, followed by @code{fftw_execute}. The stride etcetera arguments of @code{rfftw} are now in @code{fftw_plan_many_r2r}.

Instead of @code{rfftwnd_create_plan} (or @code{rfftw2d_create_plan} or @code{rfftw3d_create_plan}) followed by @code{rfftwnd_one_real_to_complex} or @code{rfftwnd_one_complex_to_real}, you now use @code{fftw_plan_dft_r2c} (or @code{fftw_plan_dft_r2c_2d} or @code{fftw_plan_dft_r2c_3d}) or @code{fftw_plan_dft_c2r} (or @code{fftw_plan_dft_c2r_2d} or @code{fftw_plan_dft_c2r_3d}), respectively, followed by @code{fftw_execute}. As usual, the strides etcetera of @code{rfftwnd_real_to_complex} or @code{rfftwnd_complex_to_real} are now specified in the advanced planner routines, @code{fftw_plan_many_dft_r2c} or @code{fftw_plan_many_dft_c2r}.

@heading Wisdom

In FFTW 2, you had to supply the @code{FFTW_USE_WISDOM} flag in order to use wisdom; in FFTW 3, wisdom is always used. (You could simulate the FFTW 2 wisdom-less behavior by calling @code{fftw_forget_wisdom} after every planner call.)

The FFTW 3 wisdom import/export routines are almost the same as before (although the storage format is entirely different). There is one significant difference, however. In FFTW 2, the import routines would never read past the end of the wisdom, so you could store extra data beyond the wisdom in the same file, for example.
In FFTW 3, the file-import routine may read up to a few hundred bytes past the end of the wisdom, so you cannot store other data just beyond it.@footnote{We do our own buffering because GNU libc I/O routines are horribly slow for single-character I/O, apparently for thread-safety reasons (whether you are using threads or not).} Wisdom has been enhanced by additional humility in FFTW 3: whereas FFTW 2 would re-use wisdom for a given transform size regardless of the stride etc., in FFTW 3 wisdom is only used with the strides etc. for which it was created. Unfortunately, this means FFTW 3 has to create new plans from scratch more often than FFTW 2 (in FFTW 2, planning e.g. one transform of size 1024 also created wisdom for all smaller powers of 2, but this no longer occurs). FFTW 3 also has the new routine @code{fftw_import_system_wisdom} to import wisdom from a standard system-wide location. @heading Memory allocation In FFTW 3, we recommend allocating your arrays with @code{fftw_malloc} and deallocating them with @code{fftw_free}; this is not required, but allows optimal performance when SIMD acceleration is used. (Those two functions actually existed in FFTW 2, and worked the same way, but were not documented.) In FFTW 2, there were @code{fftw_malloc_hook} and @code{fftw_free_hook} functions that allowed the user to replace FFTW's memory-allocation routines (e.g. to implement different error-handling, since by default FFTW prints an error message and calls @code{exit} to abort the program if @code{malloc} returns @code{NULL}). These hooks are not supported in FFTW 3; those few users who require this functionality can just directly modify the memory-allocation routines in FFTW (they are defined in @code{kernel/alloc.c}). @heading Fortran interface In FFTW 2, the subroutine names were obtained by replacing @samp{fftw_} with @samp{fftw_f77}; in FFTW 3, you replace @samp{fftw_} with @samp{dfftw_} (or @samp{sfftw_} or @samp{lfftw_}, depending upon the precision). In FFTW 3, we have begun recommending that you always declare the type used to store plans as @code{integer*8}. (Too many people didn't notice our instruction to switch from @code{integer} to @code{integer*8} for 64-bit machines.) In FFTW 3, we provide a @code{fftw3.f} ``header file'' to include in your code (and which is officially installed on Unix systems). (In FFTW 2, we supplied a @code{fftw_f77.i} file, but it was not installed.) Otherwise, the C-Fortran interface relationship is much the same as it was before (e.g. return values become initial parameters, and multi-dimensional arrays are in column-major order). Unlike FFTW 2, we do provide some support for wisdom import/export in Fortran (@pxref{Wisdom of Fortran?}). @heading Threads Like FFTW 2, only the execution routines are thread-safe. All planner routines, etcetera, should be called by only a single thread at a time (@pxref{Thread safety}). @emph{Unlike} FFTW 2, there is no special @code{FFTW_THREADSAFE} flag for the planner to allow a given plan to be usable by multiple threads in parallel; this is now the case by default. The multi-threaded version of FFTW 2 required you to pass the number of threads each time you execute the transform. The number of threads is now stored in the plan, and is specified before the planner is called by @code{fftw_plan_with_nthreads}. The threads initialization routine used to be called @code{fftw_threads_init} and would return zero on success; the new routine is called @code{fftw_init_threads} and returns zero on failure. 
@xref{Multi-threaded FFTW}. There is no separate threads header file in FFTW 3; all the function prototypes are in @code{<fftw3.h>}. However, you still have to link to a separate library (@code{-lfftw3_threads -lfftw3 -lm} on Unix), as well as to the threading library (e.g. POSIX threads on Unix). @c ************************************************************ @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top @chapter Installation and Customization @cindex installation This chapter describes the installation and customization of FFTW, the latest version of which may be downloaded from @uref{http://www.fftw.org, the FFTW home page}. In principle, FFTW should work on any system with an ANSI C compiler (@code{gcc} is fine). However, planner time is drastically reduced if FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter support for all modern general-purpose CPUs, but you may need to add a couple of lines of code if your compiler is not yet supported (@pxref{Cycle Counters}). (On Unix, there will be a warning at the end of the @code{configure} output if no cycle counter is found.) @cindex cycle counter @cindex compiler @cindex portability Installation of FFTW is simplest if you have a Unix or a GNU system, such as GNU/Linux, and we describe this case in the first section below, including the use of special configuration options to e.g. install different precisions or exploit optimizations for particular architectures (e.g. SIMD). Compilation on non-Unix systems is a more manual process, but we outline the procedure in the second section. It is also likely that pre-compiled binaries will be available for popular systems. Finally, we describe how you can customize FFTW for particular needs by generating @emph{codelets} for fast transforms of sizes not supported efficiently by the standard FFTW distribution. @cindex codelet @menu * Installation on Unix:: * Installation on non-Unix systems:: * Cycle Counters:: * Generating your own code:: @end menu @c ------------------------------------------------------------ @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization @section Installation on Unix FFTW comes with a @code{configure} program in the GNU style. Installation can be as simple as: @fpindex configure @example ./configure make make install @end example This will build the uniprocessor complex and real transform libraries along with the test programs. (We recommend that you use GNU @code{make} if it is available; on some systems it is called @code{gmake}.) The ``@code{make install}'' command installs the fftw and rfftw libraries in standard places, and typically requires root privileges (unless you specify a different install directory with the @code{--prefix} flag to @code{configure}). You can also type ``@code{make check}'' to put the FFTW test programs through their paces. If you have problems during configuration or compilation, you may want to run ``@code{make distclean}'' before trying again; this ensures that you don't have any stale files left over from previous compilation attempts. The @code{configure} script chooses the @code{gcc} compiler by default, if it is available; you can select some other compiler with: @example ./configure CC="@r{@i{<the name of your C compiler>}}" @end example The @code{configure} script knows good @code{CFLAGS} (C compiler flags) @cindex compiler flags for a few systems. 
If your system is not known, the @code{configure} script will print out a warning. In this case, you should re-configure FFTW with the command

@example
./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
@end example

and then compile as usual. If you do find an optimal set of @code{CFLAGS} for your system, please let us know what they are (along with the output of @code{config.guess}) so that we can include them in future releases.

@code{configure} supports all the standard flags defined by the GNU Coding Standards; see the @code{INSTALL} file in FFTW or @uref{http://www.gnu.org/prep/standards_toc.html, the GNU web page}. Note especially @code{--help} to list all flags and @code{--enable-shared} to create shared, rather than static, libraries.

@code{configure} also accepts a few FFTW-specific flags, particularly:

@itemize @bullet

@item
@cindex portability
@code{--enable-portable-binary}: Disable compiler optimizations that would produce unportable binaries. @b{Important:} Use this if you are distributing compiled binaries to people who may not use exactly the same processor as you.

@item
@code{--with-gcc-arch=}@i{arch}: When compiling with @code{gcc}, FFTW tries to deduce the current CPU in order to tell @code{gcc} what architecture to tune for; this option overrides that guess (i.e. @i{arch} should be a valid argument for @code{gcc}'s @code{-march} or @code{-mtune} flags). You might do this because the deduced architecture was wrong or because you want to tune for a different CPU than the one you are compiling with. You can use @code{--without-gcc-arch} to disable architecture-specific tuning entirely. Note that if @code{--enable-portable-binary} is enabled (above), then we use @code{-mtune} but not @code{-march}, so the resulting binary will run on any architecture even though it is optimized for a particular one.

@item
@cindex precision
@code{--enable-float}: Produces a single-precision version of FFTW (@code{float}) instead of the default double-precision (@code{double}). @xref{Precision}.

@item
@cindex precision
@code{--enable-long-double}: Produces a long-double precision version of FFTW (@code{long double}) instead of the default double-precision (@code{double}). The @code{configure} script will halt with an error message if @code{long double} is the same size as @code{double} on your machine/compiler. @xref{Precision}.

@item
@cindex threads
@code{--enable-threads}: Enables compilation and installation of the FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a simple interface to parallel transforms for SMP systems. By default, the threads routines are not compiled.

@item
@code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP compiler directives in order to induce parallelism rather than spawning its own threads directly. Useful especially for programs already employing such directives, in order to minimize conflicts between different parallelization mechanisms. Use either @code{--enable-openmp} or @code{--enable-threads}, not both; in either case the multi-threaded FFTW interface/library (@pxref{Multi-threaded FFTW}) is compiled (with different back ends).

@item
@code{--with-combined-threads}: By default, if @code{--enable-threads} or @code{--enable-openmp} are used, the threads support is compiled into a separate library that must be linked in addition to the main FFTW library. This is so that users of the serial library do not need to link the system threads libraries.
If @code{--with-combined-threads} is specified, however, then no separate threads library is created, and threads are included in the main FFTW library. This is mainly useful under Windows, where no system threads library is required and inter-library dependencies are problematic. @item @cindex Cell processor @code{--enable-cell}: Enables code to exploit the Cell processor (@pxref{FFTW on the Cell Processor}), assuming you have the Cell SDK. By default, code for the Cell processor is not compiled. @item @cindex Fortran-callable wrappers @code{--disable-fortran}: Disables inclusion of Fortran-callable wrapper routines (@pxref{Calling FFTW from Fortran}) in the standard FFTW libraries. These wrapper routines increase the library size by only a negligible amount, so they are included by default as long as the @code{configure} script finds a Fortran compiler on your system. (To specify a particular Fortran compiler @i{foo}, pass @code{F77=}@i{foo} to @code{configure}.) @item @code{--with-g77-wrappers}: By default, when Fortran wrappers are included, the wrappers employ the linking conventions of the Fortran compiler detected by the @code{configure} script. If this compiler is GNU @code{g77}, however, then @emph{two} versions of the wrappers are included: one with @code{g77}'s idiosyncratic convention of appending two underscores to identifiers, and one with the more common convention of appending only a single underscore. This way, the same FFTW library will work with both @code{g77} and other Fortran compilers, such as GNU @code{gfortran}. However, the converse is not true: if you configure with a different compiler, then the @code{g77}-compatible wrappers are not included. By specifying @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are included in addition to wrappers for whatever Fortran compiler @code{configure} finds. @fpindex g77 @item @code{--with-slow-timer}: Disables the use of hardware cycle counters, and falls back on @code{gettimeofday} or @code{clock}. This greatly worsens performance, and should generally not be used (unless you don't have a cycle counter but still really want an optimized plan regardless of the time). @xref{Cycle Counters}. @item @code{--enable-sse}, @code{--enable-sse2}, @code{--enable-altivec}, @code{--enable-mips-ps}: Enable the compilation of SIMD code for SSE (Pentium III+), SSE2 (Pentium IV+), AltiVec (PowerPC G4+), or MIPS PS. SSE, AltiVec, and MIPS PS only work with @code{--enable-float} (above), while SSE2 only works in double precision (the default). The resulting code will @emph{still work} on earlier CPUs lacking the SIMD extensions (SIMD is automatically disabled, although the FFTW library is still larger). @itemize @minus @item These options require a compiler supporting SIMD extensions, and compiler support is still a bit flaky: see the FFTW FAQ for a list of compiler versions that have problems compiling FFTW. @item With the Linux kernel, you may have to recompile the kernel with the option to support SSE/SSE2/AltiVec (see the ``Processor type and features'' settings). @item With AltiVec and @code{gcc}, you may have to use the @code{-mabi=altivec} option when compiling any code that links to FFTW, in order to properly align the stack; otherwise, FFTW could crash when it tries to use an AltiVec feature. (This is not necessary on MacOS X.) @item With SSE/SSE2 and @code{gcc}, you should use a version of gcc that properly aligns the stack when compiling any code that links to FFTW. 
By default, @code{gcc} 2.95 and later versions align the stack as needed, but you should not compile FFTW with the @code{-Os} option or the @code{-mpreferred-stack-boundary} option with an argument less than 4.

@end itemize

@end itemize
@cindex compiler

To force @code{configure} to use a particular C compiler @i{foo} (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the @code{configure} script; you may also need to set the flags via the variable @code{CFLAGS} as described above.
@cindex compiler flags

@c ------------------------------------------------------------
@node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
@section Installation on non-Unix systems

It should be relatively straightforward to compile FFTW even on non-Unix systems lacking the niceties of a @code{configure} script. Basically, you need to edit the @code{config.h} header (copy it from @code{config.h.in}) to @code{#define} the various options and compiler characteristics, and then compile all the @samp{.c} files in the relevant directories. The @code{config.h} header contains about 100 options to set, each one initially an @code{#undef}, each documented with a comment, and most of them fairly obvious. For most of the options, you should simply @code{#define} them to @code{1} if they are applicable, although a few options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should be defined to the size of the @code{long long} type, in bytes, or zero if it is not supported).

We will likely post some sample @code{config.h} files for various operating systems and compilers for you to use (at least as a starting point). Please let us know if you have to hand-create a configuration file (and/or a pre-compiled binary) that you want to share.

To create the FFTW library, you will then need to compile all of the @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar}, @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar}, @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb}, @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories. If you are compiling with SIMD support (e.g. you defined @code{HAVE_SSE2} in @code{config.h}), then you also need to compile the @code{.c} files in the @code{simd}, @code{simd/nonportable}, @code{dft/simd}, and @code{dft/simd/codelets} directories. Once these files are all compiled, link them into a library, or a shared library, or directly into your program.

To compile the FFTW test program, additionally compile the code in the @code{libbench2/} directory, and link it into a library. Then compile the code in the @code{tests/} directory and link it to the @code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom} (command-line) tool (@pxref{Wisdom Utilities}), compile @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW libraries.

@c ------------------------------------------------------------
@node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
@section Cycle Counters
@cindex cycle counter

FFTW's planner actually executes and times different possible FFT algorithms in order to pick the fastest plan for a given @math{n}. In order to do this in as short a time as possible, however, the timer must have a very high resolution, and to accomplish this we employ the hardware @dfn{cycle counters} that are available on most CPUs.
Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha, UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors. @cindex compiler Access to the cycle counters, unfortunately, is a compiler and/or operating-system dependent task, often requiring inline assembly language, and it may be that your compiler is not supported. If you are @emph{not} supported, FFTW will by default fall back on its estimator (effectively using @code{FFTW_ESTIMATE} for all plans). @ctindex FFTW_ESTIMATE You can add support by editing the file @code{kernel/cycle.h}; normally, this will involve adapting one of the examples already present in order to use the inline-assembler syntax for your C compiler, and will only require a couple of lines of code. Anyone adding support for a new system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}. If a cycle counter is not available on your system (e.g. some embedded processor), and you don't want to use estimated plans, as a last resort you can use the @code{--with-slow-timer} option to @code{configure} (on Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere). This will use the much lower-resolution @code{gettimeofday} function, or even @code{clock} if the former is unavailable, and planning will be extremely slow. @c ------------------------------------------------------------ @node Generating your own code, , Cycle Counters, Installation and Customization @section Generating your own code @cindex code generator The directory @code{genfft} contains the programs that were used to generate FFTW's ``codelets,'' which are hard-coded transforms of small sizes. @cindex codelet We do not expect casual users to employ the generator, which is a rather sophisticated program that generates directed acyclic graphs of FFT algorithms and performs algebraic simplifications on them. It was written in Objective Caml, a dialect of ML, which is available at @uref{http://pauillac.inria.fr/ocaml/}. @cindex Caml If you have Objective Caml installed (along with recent versions of GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you can change the set of codelets that are generated or play with the generation options. The set of generated codelets is specified by the @code{dft/codelets/*/Makefile.am}, @code{dft/simd/codelets/Makefile.am}, and @code{rdft/codelets/*/Makefile.am} files. For example, you can add efficient REDFT codelets of small sizes by modifying @code{rdft/codelets/r2r/Makefile.am}. @cindex REDFT After you modify any @code{Makefile.am} files, you can type @code{sh bootstrap.sh} in the top-level directory followed by @code{make} to re-generate the files. We do not provide more details about the code-generation process, since we do not expect that most users will need to generate their own code. However, feel free to contact us at @email{fftw@@fftw.org} if you are interested in the subject. @cindex monadic programming You might find it interesting to learn Caml and/or some modern programming techniques that we used in the generator (including monadic programming), especially if you heard the rumor that Java and object-oriented programming are the latest advancement in the field. The internal operation of the codelet generator is described in the paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is available from the @uref{http://www.fftw.org,FFTW home page} and also appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)}. 
@c ************************************************************ @node Acknowledgments, License and Copyright, Installation and Customization, Top @chapter Acknowledgments Matteo Frigo was supported in part by the Special Research Program SFB F011 ``AURORA'' of the Austrian Science Fund FWF and by MIT Lincoln Laboratory. For previous versions of FFTW, he was supported in part by the Defense Advanced Research Projects Agency (DARPA), under Grants N00014-94-1-0985 and F30602-97-1-0270, and by a Digital Equipment Corporation Fellowship. Steven G. Johnson was supported in part by a Dept.@ of Defense NDSEG Fellowship, an MIT Karl Taylor Compton Fellowship, and by the Materials Research Science and Engineering Center program of the National Science Foundation under award DMR-9400334. Code for the Cell Broadband Engine was graciously donated to the FFTW project by the IBM Austin Research Lab. Code for the MIPS paired-single SIMD support was graciously donated to the FFTW project by CodeSourcery, Inc. We are grateful to Sun Microsystems Inc.@ for its donation of a cluster of 9 8-processor Ultra HPC 5000 SMPs (24 Gflops peak). These machines served as the primary platform for the development of early versions of FFTW. We thank Intel Corporation for donating a four-processor Pentium Pro machine. We thank the GNU/Linux community for giving us a decent OS to run on that machine. We are thankful to the AMD corporation for donating an AMD Athlon XP 1700+ computer to the FFTW project. We thank the Compaq/HP testdrive program and VA Software Corporation (SourceForge.net) for providing remote access to machines that were used to test FFTW. The @code{genfft} suite of code generators was written using Objective Caml, a dialect of ML. Objective Caml is a small and elegant language developed by Xavier Leroy. The implementation is available from @uref{http://caml.inria.fr/, @code{http://caml.inria.fr/}}. In previous releases of FFTW, @code{genfft} was written in Caml Light, by the same authors. An even earlier implementation of @code{genfft} was written in Scheme, but Caml is definitely better for this kind of application. @cindex Caml @cindex LISP FFTW uses many tools from the GNU project, including @code{automake}, @code{texinfo}, and @code{libtool}. Prof.@ Charles E.@ Leiserson of MIT provided continuous support and encouragement. This program would not exist without him. Charles also proposed the name ``codelets'' for the basic FFT blocks. @cindex codelet Prof.@ John D.@ Joannopoulos of MIT demonstrated continuing tolerance of Steven's ``extra-curricular'' computer-science activities, as well as remarkable creativity in working them into his grant proposals. Steven's physics degree would not exist without him. Franz Franchetti wrote SIMD extensions to FFTW 2, which eventually led to the SIMD support in FFTW 3. Stefan Kral wrote most of the K7 code generator distributed with FFTW 3.0.x and 3.1.x. Andrew Sterian contributed the Windows timing code in FFTW 2. Didier Miras reported a bug in the test procedure used in FFTW 1.2. We now use a completely different test algorithm by Funda Ergun that does not require a separate FFT program to compare against. Wolfgang Reimer contributed the Pentium cycle counter and a few fixes that help portability. Ming-Chang Liu uncovered a well-hidden bug in the complex transforms of FFTW 2.0 and supplied a patch to correct it. The FFTW FAQ was written in @code{bfnn} (Bizarre Format With No Name) and formatted using the tools developed by Ian Jackson for the Linux FAQ. 
@emph{We are especially thankful to all of our users for their continuing support, feedback, and interest during our development of FFTW.} @c ************************************************************ @node License and Copyright, Concept Index, Acknowledgments, Top @chapter License and Copyright FFTW is Copyright @copyright{} 2003 Matteo Frigo, Copyright @copyright{} 2003 Massachusetts Institute of Technology. FFTW is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. You can also find the @uref{http://www.gnu.org/copyleft/gpl.html, GPL on the GNU web site}. In addition, we kindly ask you to acknowledge FFTW and its authors in any program or publication in which you use FFTW. (You are not @emph{required} to do so; it is up to your common sense to decide whether you want to comply with this request or not.) For general publications, we suggest referencing: Matteo Frigo and Steven G. Johnson, ``The design and implementation of FFTW3,'' @i{Proc. IEEE} @b{93} (2), 216--231 (2005). Non-free versions of FFTW are available under terms different from those of the General Public License. (e.g. they do not require you to accompany any object code using FFTW with the corresponding source code.) For these alternative terms you must purchase a license from MIT's Technology Licensing Office. Users interested in such a license should contact us (@email{fftw@@fftw.org}) for more information. @node Concept Index, Library Index, License and Copyright, Top @chapter Concept Index @printindex cp @node Library Index, , Concept Index, Top @chapter Library Index @printindex fn @c ************************************************************ @bye