comparison src/fftw-3.3.3/doc/intro.texi @ 10:37bf6b4a2645

Add FFTW3
author Chris Cannam
date Wed, 20 Mar 2013 15:35:50 +0000
parents
children
comparison
equal deleted inserted replaced
9:c0fb53affa76 10:37bf6b4a2645
1 @node Introduction, Tutorial, Top, Top
2 @chapter Introduction
3 This manual documents version @value{VERSION} of FFTW, the
4 @emph{Fastest Fourier Transform in the West}. FFTW is a comprehensive
5 collection of fast C routines for computing the discrete Fourier
6 transform (DFT) and various special cases thereof.
7 @cindex discrete Fourier transform
8 @cindex DFT
9 @itemize @bullet
10 @item FFTW computes the DFT of complex data, real data, even-
11 or odd-symmetric real data (these symmetric transforms are usually
12 known as the discrete cosine or sine transform, respectively), and the
13 discrete Hartley transform (DHT) of real data.
14
15 @item The input data can have arbitrary length.
16 FFTW employs @Onlogn{} algorithms for all lengths, including
17 prime numbers.
18
19 @item FFTW supports arbitrary multi-dimensional data.
20
21 @item FFTW supports the SSE, SSE2, AVX, Altivec, and MIPS PS instruction
22 sets.
23
24 @item FFTW includes parallel (multi-threaded) transforms
25 for shared-memory systems.
26 @item Starting with version 3.3, FFTW includes distributed-memory parallel
27 transforms using MPI.
28 @end itemize
29
30 We assume herein that you are familiar with the properties and uses of
31 the DFT that are relevant to your application. Otherwise, see
32 e.g. @cite{The Fast Fourier Transform and Its Applications} by E. O. Brigham
33 (Prentice-Hall, Englewood Cliffs, NJ, 1988).
34 @uref{http://www.fftw.org, Our web page} also has links to FFT-related
35 information online.
36 @cindex FFTW
37
38 @c TODO: revise. We don't need to brag any longer
39 @c
40 @c FFTW is usually faster (and sometimes much faster) than all other
41 @c freely-available Fourier transform programs found on the Net. It is
42 @c competitive with (and often faster than) the FFT codes in Sun's
43 @c Performance Library, IBM's ESSL library, HP's CXML library, and
44 @c Intel's MKL library, which are targeted at specific machines.
45 @c Moreover, FFTW's performance is @emph{portable}. Indeed, FFTW is
46 @c unique in that it automatically adapts itself to your machine, your
47 @c cache, the size of your memory, your number of registers, and all the
48 @c other factors that normally make it impossible to optimize a program
49 @c for more than one machine. An extensive comparison of FFTW's
50 @c performance with that of other Fourier transform codes has been made,
51 @c and the results are available on the Web at
52 @c @uref{http://fftw.org/benchfft, the benchFFT home page}.
53 @c @cindex benchmark
54 @c @fpindex benchfft
55
56 In order to use FFTW effectively, you need to learn one basic concept
57 of FFTW's internal structure: FFTW does not use a fixed algorithm for
58 computing the transform, but instead it adapts the DFT algorithm to
59 details of the underlying hardware in order to maximize performance.
60 Hence, the computation of the transform is split into two phases.
61 First, FFTW's @dfn{planner} ``learns'' the fastest way to compute the
62 transform on your machine. The planner
63 @cindex planner
64 produces a data structure called a @dfn{plan} that contains this
65 @cindex plan
66 information. Subsequently, the plan is @dfn{executed}
67 @cindex execute
68 to transform the array of input data as dictated by the plan. The
69 plan can be reused as many times as needed. In typical
70 high-performance applications, many transforms of the same size are
71 computed and, consequently, a relatively expensive initialization of
72 this sort is acceptable. On the other hand, if you need a single
73 transform of a given size, the one-time cost of the planner becomes
74 significant. For this case, FFTW provides fast planners based on
75 heuristics or on previously computed plans.
76
77 FFTW supports transforms of data with arbitrary length, rank,
78 multiplicity, and a general memory layout. In simple cases, however,
79 this generality may be unnecessary and confusing. Consequently, we
80 organized the interface to FFTW into three levels of increasing
81 generality.
82 @itemize @bullet
83 @item The @dfn{basic interface} computes a single
84 transform of contiguous data.
85 @item The @dfn{advanced interface} computes transforms
86 of multiple or strided arrays.
87 @item The @dfn{guru interface} supports the most general data
88 layouts, multiplicities, and strides.
89 @end itemize
90 We expect that most users will be best served by the basic interface,
91 whereas the guru interface requires careful attention to the
92 documentation to avoid problems.
93 @cindex basic interface
94 @cindex advanced interface
95 @cindex guru interface
96
97
98 Besides the automatic performance adaptation performed by the planner,
99 it is also possible for advanced users to customize FFTW manually. For
100 example, if code space is a concern, we provide a tool that links only
101 the subset of FFTW needed by your application. Conversely, you may need
102 to extend FFTW because the standard distribution is not sufficient for
103 your needs. For example, the standard FFTW distribution works most
104 efficiently for arrays whose size can be factored into small primes
105 (@math{2}, @math{3}, @math{5}, and @math{7}), and otherwise it uses a
106 slower general-purpose routine. If you need efficient transforms of
107 other sizes, you can use FFTW's code generator, which produces fast C
108 programs (``codelets'') for any particular array size you may care
109 about.
110 @cindex code generator
111 @cindex codelet
112 For example, if you need transforms of size
113 @ifinfo
114 @math{513 = 19 x 3^3},
115 @end ifinfo
116 @tex
117 $513 = 19 \cdot 3^3$,
118 @end tex
119 @html
120 513&nbsp;=&nbsp;19*3<sup>3</sup>,
121 @end html
122 you can customize FFTW to support the factor @math{19} efficiently.
123
124 For more information regarding FFTW, see the paper, ``The Design and
125 Implementation of FFTW3,'' by M. Frigo and S. G. Johnson, which was an
126 invited paper in @cite{Proc. IEEE} @b{93} (2), p. 216 (2005). The
127 code generator is described in the paper ``A fast Fourier transform
128 compiler'',
129 @cindex compiler
130 by M. Frigo, in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference
131 on Programming Language Design and Implementation (PLDI), Atlanta,
132 Georgia, May 1999}. These papers, along with the latest version of
133 FFTW, the FAQ, benchmarks, and other links, are available at
134 @uref{http://www.fftw.org, the FFTW home page}.
135
136 The current version of FFTW incorporates many good ideas from the past
137 thirty years of FFT literature. In one way or another, FFTW uses the
138 Cooley-Tukey algorithm, the prime factor algorithm, Rader's algorithm
139 for prime sizes, and a split-radix algorithm (with a
140 ``conjugate-pair'' variation pointed out to us by Dan Bernstein).
141 FFTW's code generator also produces new algorithms that we do not
142 completely understand.
143 @cindex algorithm
144 The reader is referred to the cited papers for the appropriate
145 references.
146
147 The rest of this manual is organized as follows. We first discuss the
148 sequential (single-processor) implementation. We start by describing
149 the basic interface/features of FFTW in @ref{Tutorial}.
150 Next, @ref{Other Important Topics} discusses data alignment
151 (@pxref{SIMD alignment and fftw_malloc}),
152 the storage scheme of multi-dimensional arrays
153 (@pxref{Multi-dimensional Array Format}), and FFTW's mechanism for
154 storing plans on disk (@pxref{Words of Wisdom-Saving Plans}). Next,
155 @ref{FFTW Reference} provides comprehensive documentation of all
156 FFTW's features. Parallel transforms are discussed in their own
157 chapters: @ref{Multi-threaded FFTW} and @ref{Distributed-memory FFTW
158 with MPI}. Fortran programmers can also use FFTW, as described in
159 @ref{Calling FFTW from Legacy Fortran} and @ref{Calling FFTW from
160 Modern Fortran}. @ref{Installation and Customization} explains how to
161 install FFTW in your computer system and how to adapt FFTW to your
162 needs. License and copyright information is given in @ref{License and
163 Copyright}. Finally, we thank all the people who helped us in
164 @ref{Acknowledgments}.
165