annotate src/fftw-3.3.8/doc/intro.texi @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI compatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents d0c2a83c1364
children
rev   line source
Chris@82 1 @node Introduction, Tutorial, Top, Top
Chris@82 2 @chapter Introduction
Chris@82 3 This manual documents version @value{VERSION} of FFTW, the
Chris@82 4 @emph{Fastest Fourier Transform in the West}. FFTW is a comprehensive
Chris@82 5 collection of fast C routines for computing the discrete Fourier
Chris@82 6 transform (DFT) and various special cases thereof.
Chris@82 7 @cindex discrete Fourier transform
Chris@82 8 @cindex DFT
Chris@82 9 @itemize @bullet
Chris@82 10 @item FFTW computes the DFT of complex data, real data, even-
Chris@82 11 or odd-symmetric real data (these symmetric transforms are usually
Chris@82 12 known as the discrete cosine or sine transform, respectively), and the
Chris@82 13 discrete Hartley transform (DHT) of real data.
Chris@82 14
Chris@82 15 @item The input data can have arbitrary length.
Chris@82 16 FFTW employs @Onlogn{} algorithms for all lengths, including
Chris@82 17 prime numbers.
Chris@82 18
Chris@82 19 @item FFTW supports arbitrary multi-dimensional data.
Chris@82 20
Chris@82 21 @item FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX, and
Chris@82 22 NEON vector instruction sets.
Chris@82 23
Chris@82 24 @item FFTW includes parallel (multi-threaded) transforms
Chris@82 25 for shared-memory systems.
Chris@82 26 @item Starting with version 3.3, FFTW includes distributed-memory parallel
Chris@82 27 transforms using MPI.
Chris@82 28 @end itemize
Chris@82 29
Chris@82 30 We assume herein that you are familiar with the properties and uses of
Chris@82 31 the DFT that are relevant to your application. Otherwise, see
Chris@82 32 e.g. @cite{The Fast Fourier Transform and Its Applications} by E. O. Brigham
Chris@82 33 (Prentice-Hall, Englewood Cliffs, NJ, 1988).
Chris@82 34 @uref{http://www.fftw.org, Our web page} also has links to FFT-related
Chris@82 35 information online.
Chris@82 36 @cindex FFTW
Chris@82 37
Chris@82 38 @c TODO: revise. We don't need to brag any longer
Chris@82 39 @c
Chris@82 40 @c FFTW is usually faster (and sometimes much faster) than all other
Chris@82 41 @c freely-available Fourier transform programs found on the Net. It is
Chris@82 42 @c competitive with (and often faster than) the FFT codes in Sun's
Chris@82 43 @c Performance Library, IBM's ESSL library, HP's CXML library, and
Chris@82 44 @c Intel's MKL library, which are targeted at specific machines.
Chris@82 45 @c Moreover, FFTW's performance is @emph{portable}. Indeed, FFTW is
Chris@82 46 @c unique in that it automatically adapts itself to your machine, your
Chris@82 47 @c cache, the size of your memory, your number of registers, and all the
Chris@82 48 @c other factors that normally make it impossible to optimize a program
Chris@82 49 @c for more than one machine. An extensive comparison of FFTW's
Chris@82 50 @c performance with that of other Fourier transform codes has been made,
Chris@82 51 @c and the results are available on the Web at
Chris@82 52 @c @uref{http://fftw.org/benchfft, the benchFFT home page}.
Chris@82 53 @c @cindex benchmark
Chris@82 54 @c @fpindex benchfft
Chris@82 55
Chris@82 56 In order to use FFTW effectively, you need to learn one basic concept
Chris@82 57 of FFTW's internal structure: FFTW does not use a fixed algorithm for
Chris@82 58 computing the transform, but instead it adapts the DFT algorithm to
Chris@82 59 details of the underlying hardware in order to maximize performance.
Chris@82 60 Hence, the computation of the transform is split into two phases.
Chris@82 61 First, FFTW's @dfn{planner} ``learns'' the fastest way to compute the
Chris@82 62 transform on your machine. The planner
Chris@82 63 @cindex planner
Chris@82 64 produces a data structure called a @dfn{plan} that contains this
Chris@82 65 @cindex plan
Chris@82 66 information. Subsequently, the plan is @dfn{executed}
Chris@82 67 @cindex execute
Chris@82 68 to transform the array of input data as dictated by the plan. The
Chris@82 69 plan can be reused as many times as needed. In typical
Chris@82 70 high-performance applications, many transforms of the same size are
Chris@82 71 computed and, consequently, a relatively expensive initialization of
Chris@82 72 this sort is acceptable. On the other hand, if you need a single
Chris@82 73 transform of a given size, the one-time cost of the planner becomes
Chris@82 74 significant. For this case, FFTW provides fast planners based on
Chris@82 75 heuristics or on previously computed plans.
Chris@82 76
Chris@82 77 FFTW supports transforms of data with arbitrary length, rank,
Chris@82 78 multiplicity, and a general memory layout. In simple cases, however,
Chris@82 79 this generality may be unnecessary and confusing. Consequently, we
Chris@82 80 organized the interface to FFTW into three levels of increasing
Chris@82 81 generality.
Chris@82 82 @itemize @bullet
Chris@82 83 @item The @dfn{basic interface} computes a single
Chris@82 84 transform of contiguous data.
Chris@82 85 @item The @dfn{advanced interface} computes transforms
Chris@82 86 of multiple or strided arrays.
Chris@82 87 @item The @dfn{guru interface} supports the most general data
Chris@82 88 layouts, multiplicities, and strides.
Chris@82 89 @end itemize
Chris@82 90 We expect that most users will be best served by the basic interface,
Chris@82 91 whereas the guru interface requires careful attention to the
Chris@82 92 documentation to avoid problems.
Chris@82 93 @cindex basic interface
Chris@82 94 @cindex advanced interface
Chris@82 95 @cindex guru interface
Chris@82 96
Chris@82 97
Chris@82 98 Besides the automatic performance adaptation performed by the planner,
Chris@82 99 it is also possible for advanced users to customize FFTW manually. For
Chris@82 100 example, if code space is a concern, we provide a tool that links only
Chris@82 101 the subset of FFTW needed by your application. Conversely, you may need
Chris@82 102 to extend FFTW because the standard distribution is not sufficient for
Chris@82 103 your needs. For example, the standard FFTW distribution works most
Chris@82 104 efficiently for arrays whose size can be factored into small primes
Chris@82 105 (@math{2}, @math{3}, @math{5}, and @math{7}), and otherwise it uses a
Chris@82 106 slower general-purpose routine. If you need efficient transforms of
Chris@82 107 other sizes, you can use FFTW's code generator, which produces fast C
Chris@82 108 programs (``codelets'') for any particular array size you may care
Chris@82 109 about.
Chris@82 110 @cindex code generator
Chris@82 111 @cindex codelet
Chris@82 112 For example, if you need transforms of size
Chris@82 113 @ifinfo
Chris@82 114 @math{513 = 19 x 3^3},
Chris@82 115 @end ifinfo
Chris@82 116 @tex
Chris@82 117 $513 = 19 \cdot 3^3$,
Chris@82 118 @end tex
Chris@82 119 @html
Chris@82 120 513&nbsp;=&nbsp;19*3<sup>3</sup>,
Chris@82 121 @end html
Chris@82 122 you can customize FFTW to support the factor @math{19} efficiently.
Chris@82 123
Chris@82 124 For more information regarding FFTW, see the paper, ``The Design and
Chris@82 125 Implementation of FFTW3,'' by M. Frigo and S. G. Johnson, which was an
Chris@82 126 invited paper in @cite{Proc. IEEE} @b{93} (2), p. 216 (2005). The
Chris@82 127 code generator is described in the paper ``A fast Fourier transform
Chris@82 128 compiler'',
Chris@82 129 @cindex compiler
Chris@82 130 by M. Frigo, in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference
Chris@82 131 on Programming Language Design and Implementation (PLDI), Atlanta,
Chris@82 132 Georgia, May 1999}. These papers, along with the latest version of
Chris@82 133 FFTW, the FAQ, benchmarks, and other links, are available at
Chris@82 134 @uref{http://www.fftw.org, the FFTW home page}.
Chris@82 135
Chris@82 136 The current version of FFTW incorporates many good ideas from the past
Chris@82 137 thirty years of FFT literature. In one way or another, FFTW uses the
Chris@82 138 Cooley-Tukey algorithm, the prime factor algorithm, Rader's algorithm
Chris@82 139 for prime sizes, and a split-radix algorithm (with a
Chris@82 140 ``conjugate-pair'' variation pointed out to us by Dan Bernstein).
Chris@82 141 FFTW's code generator also produces new algorithms that we do not
Chris@82 142 completely understand.
Chris@82 143 @cindex algorithm
Chris@82 144 The reader is referred to the cited papers for the appropriate
Chris@82 145 references.
Chris@82 146
Chris@82 147 The rest of this manual is organized as follows. We first discuss the
Chris@82 148 sequential (single-processor) implementation. We start by describing
Chris@82 149 the basic interface/features of FFTW in @ref{Tutorial}.
Chris@82 150 Next, @ref{Other Important Topics} discusses data alignment
Chris@82 151 (@pxref{SIMD alignment and fftw_malloc}),
Chris@82 152 the storage scheme of multi-dimensional arrays
Chris@82 153 (@pxref{Multi-dimensional Array Format}), and FFTW's mechanism for
Chris@82 154 storing plans on disk (@pxref{Words of Wisdom-Saving Plans}). Next,
Chris@82 155 @ref{FFTW Reference} provides comprehensive documentation of all
Chris@82 156 FFTW's features. Parallel transforms are discussed in their own
Chris@82 157 chapters: @ref{Multi-threaded FFTW} and @ref{Distributed-memory FFTW
Chris@82 158 with MPI}. Fortran programmers can also use FFTW, as described in
Chris@82 159 @ref{Calling FFTW from Legacy Fortran} and @ref{Calling FFTW from
Chris@82 160 Modern Fortran}. @ref{Installation and Customization} explains how to
Chris@82 161 install FFTW in your computer system and how to adapt FFTW to your
Chris@82 162 needs. License and copyright information is given in @ref{License and
Chris@82 163 Copyright}. Finally, we thank all the people who helped us in
Chris@82 164 @ref{Acknowledgments}.
Chris@82 165