comparison src/fftw-3.3.5/doc/install.texi @ 127:7867fa7e1b6b

Current fftw source
author Chris Cannam <cannam@all-day-breakfast.com>
date Tue, 18 Oct 2016 13:40:26 +0100
126:4a7071416412 127:7867fa7e1b6b
1 @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
2 @chapter Installation and Customization
3 @cindex installation
4
5 This chapter describes the installation and customization of FFTW, the
6 latest version of which may be downloaded from
7 @uref{http://www.fftw.org, the FFTW home page}.
8
9 In principle, FFTW should work on any system with an ANSI C compiler
10 (@code{gcc} is fine). However, planner time is drastically reduced if
11 FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
12 support for all modern general-purpose CPUs, but you may need to add a
13 couple of lines of code if your compiler is not yet supported
14 (@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
15 of the @code{configure} output if no cycle counter is found.)
16 @cindex cycle counter
17 @cindex compiler
18 @cindex portability
19
20
21 Installation of FFTW is simplest if you have a Unix or a GNU system,
22 such as GNU/Linux, and we describe this case in the first section below,
23 including the use of special configuration options to e.g. install
24 different precisions or exploit optimizations for particular
25 architectures (e.g. SIMD). Compilation on non-Unix systems is a more
26 manual process, but we outline the procedure in the second section. It
27 is also likely that pre-compiled binaries will be available for popular
28 systems.
29
30 Finally, we describe how you can customize FFTW for particular needs by
31 generating @emph{codelets} for fast transforms of sizes not supported
32 efficiently by the standard FFTW distribution.
33 @cindex codelet
34
35 @menu
36 * Installation on Unix::
37 * Installation on non-Unix systems::
38 * Cycle Counters::
39 * Generating your own code::
40 @end menu
41
42 @c ------------------------------------------------------------
43
44 @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
45 @section Installation on Unix
46
47 FFTW comes with a @code{configure} program in the GNU style.
48 Installation can be as simple as:
49 @fpindex configure
50
51 @example
52 ./configure
53 make
54 make install
55 @end example
56
57 This will build the uniprocessor FFTW library (complex and real
58 transforms) along with the test programs. (We recommend that you use GNU
59 @code{make} if it is available; on some systems it is called
60 @code{gmake}.) The ``@code{make install}'' command installs the FFTW
61 library in standard places, and typically requires root
62 privileges (unless you specify a different install directory with the
63 @code{--prefix} flag to @code{configure}). You can also type
64 ``@code{make check}'' to put the FFTW test programs through their paces.
65 If you have problems during configuration or compilation, you may want
66 to run ``@code{make distclean}'' before trying again; this ensures that
67 you don't have any stale files left over from previous compilation
68 attempts.
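
The steps above can be wrapped in a small shell function. This is only a sketch: the function name @code{install_fftw} and its argument order are hypothetical, and it assumes you run it from the top-level FFTW source directory. It uses only the @code{--prefix} flag and the @code{make} targets described above, so that no root privileges are needed when the prefix is a directory you own.

```shell
# Sketch: configure, build, test, and install FFTW under a private prefix.
# install_fftw is a hypothetical helper, not part of FFTW.
install_fftw() {
    prefix=$1; shift          # install directory, e.g. "$HOME/fftw"
    # Any remaining arguments are passed to configure unchanged.
    ./configure --prefix="$prefix" "$@" &&
        make &&
        make check &&
        make install
}
```

For example, @code{install_fftw "$HOME/fftw" --enable-shared} would install a shared library under @code{$HOME/fftw}. If a step fails, run @code{make distclean} before calling the function again.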
69
70 The @code{configure} script chooses the @code{gcc} compiler by default,
71 if it is available; you can select some other compiler with:
72 @example
73 ./configure CC="@r{@i{<the name of your C compiler>}}"
74 @end example
75
76 The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
77 @cindex compiler flags
78 for a few systems. If your system is not known, the @code{configure}
79 script will print out a warning. In this case, you should re-configure
80 FFTW with the command
81 @example
82 ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
83 @end example
84 and then compile as usual. If you do find an optimal set of
85 @code{CFLAGS} for your system, please let us know what they are (along
86 with the output of @code{config.guess}) so that we can include them in
87 future releases.
88
89 @code{configure} supports all the standard flags defined by the GNU
90 Coding Standards; see the @code{INSTALL} file in FFTW or
91 @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
92 Note especially @code{--help} to list all flags and
93 @code{--enable-shared} to create shared, rather than static, libraries.
94 @code{configure} also accepts a few FFTW-specific flags, particularly:
95
96 @itemize @bullet
97
98 @item
99 @cindex precision
100 @code{--enable-float}: Produces a single-precision version of FFTW
101 (@code{float}) instead of the default double-precision (@code{double}).
102 @xref{Precision}.
103
104 @item
105 @cindex precision
106 @code{--enable-long-double}: Produces a long-double precision version of
107 FFTW (@code{long double}) instead of the default double-precision
108 (@code{double}). The @code{configure} script will halt with an error
109 message if @code{long double} is the same size as @code{double} on your
110 machine/compiler. @xref{Precision}.
111
112 @item
113 @cindex precision
114 @code{--enable-quad-precision}: Produces a quadruple-precision version
115 of FFTW using the nonstandard @code{__float128} type provided by
116 @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
117 instead of the default double-precision (@code{double}). The
118 @code{configure} script will halt with an error message if the
119 compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
120 @code{libquadmath} library is not installed. @xref{Precision}.
121
122 @item
123 @cindex threads
124 @code{--enable-threads}: Enables compilation and installation of the
125 FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
126 simple interface to parallel transforms for SMP systems. By default,
127 the threads routines are not compiled.
128
129 @item
130 @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
131 compiler directives in order to induce parallelism rather than
132 spawning its own threads directly, and installing an @samp{fftw3_omp} library
133 rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
134 FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
135 since they compile/install libraries with different names. By default,
136 the OpenMP routines are not compiled.
137
138 @item
139 @code{--with-combined-threads}: By default, if @code{--enable-threads}
140 is used, the threads support is compiled into a separate library that
141 must be linked in addition to the main FFTW library. This is so that
142 users of the serial library do not need to link the system threads
143 libraries. If @code{--with-combined-threads} is specified, however,
144 then no separate threads library is created, and threads are included
145 in the main FFTW library. This is mainly useful under Windows, where
146 no system threads library is required and inter-library dependencies
147 are problematic.
148
149 @item
150 @cindex MPI
151 @code{--enable-mpi}: Enables compilation and installation of the FFTW
152 MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
153 parallel transforms for distributed-memory systems with MPI. (By
154 default, the MPI routines are not compiled.) @xref{FFTW MPI
155 Installation}.
156
157 @item
158 @cindex Fortran-callable wrappers
159 @code{--disable-fortran}: Disables inclusion of legacy-Fortran
160 wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
161 FFTW libraries. These wrapper routines increase the library size by
162 only a negligible amount, so they are included by default as long as
163 the @code{configure} script finds a Fortran compiler on your system.
164 (To specify a particular Fortran compiler @i{foo}, pass
165 @code{F77=}@i{foo} to @code{configure}.)
166
167 @item
168 @code{--with-g77-wrappers}: By default, when Fortran wrappers are
169 included, the wrappers employ the linking conventions of the Fortran
170 compiler detected by the @code{configure} script. If this compiler is
171 GNU @code{g77}, however, then @emph{two} versions of the wrappers are
172 included: one with @code{g77}'s idiosyncratic convention of appending
173 two underscores to identifiers, and one with the more common
174 convention of appending only a single underscore. This way, the same
175 FFTW library will work with both @code{g77} and other Fortran
176 compilers, such as GNU @code{gfortran}. However, the converse is not
177 true: if you configure with a different compiler, then the
178 @code{g77}-compatible wrappers are not included. By specifying
179 @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
180 included in addition to wrappers for whatever Fortran compiler
181 @code{configure} finds.
182 @fpindex g77
183
184 @item
185 @code{--with-slow-timer}: Disables the use of hardware cycle counters,
186 and falls back on @code{gettimeofday} or @code{clock}. This greatly
187 slows down planning, and should generally not be used (unless you don't
188 have a cycle counter but still really want an optimized plan regardless
189 of the time). @xref{Cycle Counters}.
190
191 @item
192 @code{--enable-sse} (single precision),
193 @code{--enable-sse2} (single, double),
194 @code{--enable-avx} (single, double),
195 @code{--enable-avx2} (single, double),
196 @code{--enable-avx512} (single, double),
197 @code{--enable-avx-128-fma},
198 @code{--enable-kcvi} (single),
199 @code{--enable-altivec} (single),
200 @code{--enable-vsx} (single, double),
201 @code{--enable-neon} (single, double on aarch64),
202 @code{--enable-generic-simd128},
203 and
204 @code{--enable-generic-simd256}:
205
206 Enable various SIMD instruction sets. You need a compiler that supports
207 the given SIMD extensions, but FFTW will try to detect at runtime
208 whether the CPU supports these extensions. That is, you can compile
209 with @code{--enable-avx} and the code will still run on a CPU without AVX
210 support.
211
212 @itemize @minus
213 @item
214 These options require a compiler supporting SIMD extensions, and
215 compiler support is always a bit flaky: see the FFTW FAQ for a list of
216 compiler versions that have problems compiling FFTW.
217 @item
218 Because of the large variety of ARM processors and ABIs, FFTW
219 does not attempt to guess the correct @code{gcc} flags for generating
220 NEON code. In general, you will have to provide them on the command line.
221 This command line is known to have worked at least once:
222 @example
223 ./configure --with-slow-timer --host=arm-linux-gnueabi \
224 --enable-single --enable-neon \
225 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
226 @end example
227 @end itemize
228
229 @end itemize
230
231 @cindex compiler
232 To force @code{configure} to use a particular C compiler @i{foo}
233 (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the
234 @code{configure} script; you may also need to set the flags via the variable
235 @code{CFLAGS} as described above.
236 @cindex compiler flags
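
Because each @code{configure} run selects a single precision, installing several precisions means configuring and building more than once. The following sketch shows one way to do that from a single source tree, assuming the libraries for different precisions can share one install prefix (they are installed under different names). The function name @code{build_all_precisions} is hypothetical; the flags are the ones listed above.

```shell
# Sketch: build and install the double, single, and long-double libraries
# one after another. build_all_precisions is a hypothetical helper.
build_all_precisions() {
    prefix=$1; shift          # install directory shared by all precisions
    for prec in "" --enable-float --enable-long-double; do
        # $prec is left unquoted so the empty (double) case adds no argument.
        ./configure --prefix="$prefix" $prec "$@" &&
            make &&
            make install &&
            make distclean || return 1   # clean up before the next precision
    done
}
```

For example, @code{build_all_precisions "$HOME/fftw" --enable-threads} would build each precision with the threads library. On a machine where @code{long double} is the same size as @code{double}, @code{configure} halts with an error, so the loop stops at that step.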
237
238 @c ------------------------------------------------------------
239 @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
240 @section Installation on non-Unix systems
241
242 It should be relatively straightforward to compile FFTW even on non-Unix
243 systems lacking the niceties of a @code{configure} script. Basically,
244 you need to edit the @code{config.h} header (copy it from
245 @code{config.h.in}) to @code{#define} the various options and compiler
246 characteristics, and then compile all the @samp{.c} files in the
247 relevant directories.
248
249 The @code{config.h} header contains about 100 options to set, each one
250 initially an @code{#undef}, each documented with a comment, and most of
251 them fairly obvious. For most of the options, you should simply
252 @code{#define} them to @code{1} if they are applicable, although a few
253 options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
254 be defined to the size of the @code{long long} type, in bytes, or zero
255 if it is not supported). We will likely post some sample
256 @code{config.h} files for various operating systems and compilers for
257 you to use (at least as a starting point). Please let us know if you
258 have to hand-create a configuration file (and/or a pre-compiled binary)
259 that you want to share.
260
261 To create the FFTW library, you will then need to compile all of the
262 @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
263 @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
264 @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
265 @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
266 If you are compiling with SIMD support (e.g. you defined
267 @code{HAVE_SSE2} in @code{config.h}), then you also need to compile
268 the @code{.c} files in the @code{simd-support},
269 @code{@{dft,rdft@}/simd}, @code{@{dft,rdft@}/simd/*} directories.
270
271 Once these files are all compiled, link them into a library, or a shared
272 library, or directly into your program.
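
As a rough illustration of the procedure, here is what the manual build could look like with a Unix-style compiler driver and @code{ar}; on other toolchains the commands differ but the structure is the same. This is a sketch under stated assumptions, not a tested recipe: the function name, the @code{build} directory, the include paths, and the library name @code{libfftw3.a} are all assumptions, and only the scalar (non-SIMD) directories listed above are compiled. It expects @code{config.h} to already exist in the top-level directory.

```shell
# Sketch only: compile the scalar FFTW sources by hand and archive them.
# The directory list comes from the text above; the include paths and the
# library name libfftw3.a are assumptions that may need adjusting.
build_fftw_by_hand() {
    cc_cmd=${CC:-cc}
    dirs="kernel dft dft/scalar dft/scalar/codelets rdft rdft/scalar
          rdft/scalar/r2cf rdft/scalar/r2cb rdft/scalar/r2r reodft api"
    incs="-I. -Ikernel -Idft -Idft/scalar -Irdft -Irdft/scalar -Ireodft -Iapi"
    mkdir -p build || return 1
    for dir in $dirs; do
        for src in "$dir"/*.c; do
            [ -f "$src" ] || continue    # skip directories with no .c files
            # Fold the path into the object name so that files with the same
            # base name in different directories do not overwrite each other.
            obj=build/$(printf '%s' "${src%.c}" | tr '/' '_').o
            $cc_cmd ${CFLAGS:-} $incs -c "$src" -o "$obj" || return 1
        done
    done
    ar rcs libfftw3.a build/*.o
}
```

Run @code{build_fftw_by_hand} from the top-level source directory; set @code{CC} and @code{CFLAGS} in the environment to choose the compiler and its flags.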
273
274 To compile the FFTW test program, additionally compile the code in the
275 @code{libbench2/} directory, and link it into a library. Then compile
276 the code in the @code{tests/} directory and link it to the
277 @code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
278 (command-line) tool (@pxref{Wisdom Utilities}), compile
279 @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
280 libraries.
281
282 @c ------------------------------------------------------------
283 @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
284 @section Cycle Counters
285 @cindex cycle counter
286
287 FFTW's planner actually executes and times different possible FFT
288 algorithms in order to pick the fastest plan for a given @math{n}. In
289 order to do this in as short a time as possible, however, the timer must
290 have a very high resolution, and to accomplish this we employ the
291 hardware @dfn{cycle counters} that are available on most CPUs.
292 Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
293 UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.
294
295 @cindex compiler
296 Access to the cycle counters, unfortunately, is a compiler and/or
297 operating-system dependent task, often requiring inline assembly
298 language, and it may be that your compiler is not supported. If you are
299 @emph{not} supported, FFTW will by default fall back on its estimator
300 (effectively using @code{FFTW_ESTIMATE} for all plans).
301 @ctindex FFTW_ESTIMATE
302
303 You can add support by editing the file @code{kernel/cycle.h}; normally,
304 this will involve adapting one of the examples already present in order
305 to use the inline-assembler syntax for your C compiler, and will only
306 require a couple of lines of code. Anyone adding support for a new
307 system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.
308
309 If a cycle counter is not available on your system (e.g. some embedded
310 processor), and you don't want to use estimated plans, as a last resort
311 you can use the @code{--with-slow-timer} option to @code{configure} (on
312 Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
313 This will use the much lower-resolution @code{gettimeofday} function, or even
314 @code{clock} if the former is unavailable, and planning will be
315 extremely slow.
316
317 @c ------------------------------------------------------------
318 @node Generating your own code, , Cycle Counters, Installation and Customization
319 @section Generating your own code
320 @cindex code generator
321
322 The directory @code{genfft} contains the programs that were used to
323 generate FFTW's ``codelets,'' which are hard-coded transforms of small
324 sizes.
325 @cindex codelet
326 We do not expect casual users to employ the generator, which is a rather
327 sophisticated program that generates directed acyclic graphs of FFT
328 algorithms and performs algebraic simplifications on them. It was
329 written in Objective Caml, a dialect of ML, which is available at
330 @uref{http://caml.inria.fr/ocaml/index.en.html}.
331 @cindex Caml
332
333
334 If you have Objective Caml installed (along with recent versions of
335 GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
336 can change the set of codelets that are generated or play with the
337 generation options. The set of generated codelets is specified by the
338 @code{@{dft,rdft@}/@{scalar,simd@}/*/Makefile.am} files. For example, you can add
339 efficient REDFT codelets of small sizes by modifying
340 @code{rdft/scalar/r2r/Makefile.am}.
341 @cindex REDFT
342 After you modify any @code{Makefile.am} files, you can type @code{sh
343 bootstrap.sh} in the top-level directory followed by @code{make} to
344 re-generate the files.
345
346 We do not provide more details about the code-generation process, since
347 we do not expect that most users will need to generate their own code.
348 However, feel free to contact us at @email{fftw@@fftw.org} if
349 you are interested in the subject.
350
351 @cindex monadic programming
352 You might find it interesting to learn Caml and/or some modern
353 programming techniques that we used in the generator (including monadic
354 programming), especially if you heard the rumor that Java and
355 object-oriented programming are the latest advancement in the field.
356 The internal operation of the codelet generator is described in the
357 paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
358 available from the @uref{http://www.fftw.org,FFTW home page} and also
359 appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
360 Programming Language Design and Implementation (PLDI)}.
361