comparison src/fftw-3.3.5/doc/install.texi @ 127:7867fa7e1b6b
Current fftw source
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Tue, 18 Oct 2016 13:40:26 +0100 |
parents | |
children |
126:4a7071416412 | 127:7867fa7e1b6b |
---|---|
1 @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top | |
2 @chapter Installation and Customization | |
3 @cindex installation | |
4 | |
5 This chapter describes the installation and customization of FFTW, the | |
6 latest version of which may be downloaded from | |
7 @uref{http://www.fftw.org, the FFTW home page}. | |
8 | |
9 In principle, FFTW should work on any system with an ANSI C compiler | |
10 (@code{gcc} is fine). However, planner time is drastically reduced if | |
11 FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter | |
12 support for all modern general-purpose CPUs, but you may need to add a | |
13 couple of lines of code if your compiler is not yet supported | |
14 (@pxref{Cycle Counters}). (On Unix, there will be a warning at the end | |
15 of the @code{configure} output if no cycle counter is found.) | |
16 @cindex cycle counter | |
17 @cindex compiler | |
18 @cindex portability | |
19 | |
20 | |
21 Installation of FFTW is simplest if you have a Unix or a GNU system, | |
22 such as GNU/Linux, and we describe this case in the first section below, | |
23 including the use of special configuration options to, for example, install | |
24 different precisions or exploit optimizations for particular | |
25 architectures (e.g. SIMD). Compilation on non-Unix systems is a more | |
26 manual process, but we outline the procedure in the second section. It | |
27 is also likely that pre-compiled binaries will be available for popular | |
28 systems. | |
29 | |
30 Finally, we describe how you can customize FFTW for particular needs by | |
31 generating @emph{codelets} for fast transforms of sizes not supported | |
32 efficiently by the standard FFTW distribution. | |
33 @cindex codelet | |
34 | |
35 @menu | |
36 * Installation on Unix:: | |
37 * Installation on non-Unix systems:: | |
38 * Cycle Counters:: | |
39 * Generating your own code:: | |
40 @end menu | |
41 | |
42 @c ------------------------------------------------------------ | |
43 | |
44 @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization | |
45 @section Installation on Unix | |
46 | |
47 FFTW comes with a @code{configure} program in the GNU style. | |
48 Installation can be as simple as: | |
49 @fpindex configure | |
50 | |
51 @example | |
52 ./configure | |
53 make | |
54 make install | |
55 @end example | |
56 | |
57 This will build the uniprocessor complex and real transform libraries | |
58 along with the test programs. (We recommend that you use GNU | |
59 @code{make} if it is available; on some systems it is called | |
60 @code{gmake}.) The ``@code{make install}'' command installs the FFTW | |
61 libraries in standard places, and typically requires root | |
62 privileges (unless you specify a different install directory with the | |
63 @code{--prefix} flag to @code{configure}). You can also type | |
64 ``@code{make check}'' to put the FFTW test programs through their paces. | |
65 If you have problems during configuration or compilation, you may want | |
66 to run ``@code{make distclean}'' before trying again; this ensures that | |
67 you don't have any stale files left over from previous compilation | |
68 attempts. | |
69 | |
70 The @code{configure} script chooses the @code{gcc} compiler by default, | |
71 if it is available; you can select some other compiler with: | |
72 @example | |
73 ./configure CC="@r{@i{<the name of your C compiler>}}" | |
74 @end example | |
75 | |
76 The @code{configure} script knows good @code{CFLAGS} (C compiler flags) | |
77 @cindex compiler flags | |
78 for a few systems. If your system is not known, the @code{configure} | |
79 script will print out a warning. In this case, you should re-configure | |
80 FFTW with the command | |
81 @example | |
82 ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}" | |
83 @end example | |
84 and then compile as usual. If you do find an optimal set of | |
85 @code{CFLAGS} for your system, please let us know what they are (along | |
86 with the output of @code{config.guess}) so that we can include them in | |
87 future releases. | |
88 | |
89 @code{configure} supports all the standard flags defined by the GNU | |
90 Coding Standards; see the @code{INSTALL} file in FFTW or | |
91 @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}. | |
92 Note especially @code{--help} to list all flags and | |
93 @code{--enable-shared} to create shared, rather than static, libraries. | |
94 @code{configure} also accepts a few FFTW-specific flags, particularly: | |
95 | |
96 @itemize @bullet | |
97 | |
98 @item | |
99 @cindex precision | |
100 @code{--enable-float}: Produces a single-precision version of FFTW | |
101 (@code{float}) instead of the default double-precision (@code{double}). | |
102 @xref{Precision}. | |
103 | |
104 @item | |
105 @cindex precision | |
106 @code{--enable-long-double}: Produces a long-double precision version of | |
107 FFTW (@code{long double}) instead of the default double-precision | |
108 (@code{double}). The @code{configure} script will halt with an error | |
109 message if @code{long double} is the same size as @code{double} on your | |
110 machine/compiler. @xref{Precision}. | |
111 | |
112 @item | |
113 @cindex precision | |
114 @code{--enable-quad-precision}: Produces a quadruple-precision version | |
115 of FFTW using the nonstandard @code{__float128} type provided by | |
116 @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures, | |
117 instead of the default double-precision (@code{double}). The | |
118 @code{configure} script will halt with an error message if the | |
119 compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s | |
120 @code{libquadmath} library is not installed. @xref{Precision}. | |
121 | |
122 @item | |
123 @cindex threads | |
124 @code{--enable-threads}: Enables compilation and installation of the | |
125 FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a | |
126 simple interface to parallel transforms for SMP systems. By default, | |
127 the threads routines are not compiled. | |
128 | |
129 @item | |
130 @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP | |
131 compiler directives in order to induce parallelism rather than | |
132 spawning its own threads directly, and installing an @samp{fftw3_omp} library | |
133 rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded | |
134 FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads} | |
135 since they compile/install libraries with different names. By default, | |
136 the OpenMP routines are not compiled. | |
137 | |
138 @item | |
139 @code{--with-combined-threads}: By default, if @code{--enable-threads} | |
140 is used, the threads support is compiled into a separate library that | |
141 must be linked in addition to the main FFTW library. This is so that | |
142 users of the serial library do not need to link the system threads | |
143 libraries. If @code{--with-combined-threads} is specified, however, | |
144 then no separate threads library is created, and threads are included | |
145 in the main FFTW library. This is mainly useful under Windows, where | |
146 no system threads library is required and inter-library dependencies | |
147 are problematic. | |
148 | |
149 @item | |
150 @cindex MPI | |
151 @code{--enable-mpi}: Enables compilation and installation of the FFTW | |
152 MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides | |
153 parallel transforms for distributed-memory systems with MPI. (By | |
154 default, the MPI routines are not compiled.) @xref{FFTW MPI | |
155 Installation}. | |
156 | |
157 @item | |
158 @cindex Fortran-callable wrappers | |
159 @code{--disable-fortran}: Disables inclusion of legacy-Fortran | |
160 wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard | |
161 FFTW libraries. These wrapper routines increase the library size by | |
162 only a negligible amount, so they are included by default as long as | |
163 the @code{configure} script finds a Fortran compiler on your system. | |
164 (To specify a particular Fortran compiler @i{foo}, pass | |
165 @code{F77=}@i{foo} to @code{configure}.) | |
166 | |
167 @item | |
168 @code{--with-g77-wrappers}: By default, when Fortran wrappers are | |
169 included, the wrappers employ the linking conventions of the Fortran | |
170 compiler detected by the @code{configure} script. If this compiler is | |
171 GNU @code{g77}, however, then @emph{two} versions of the wrappers are | |
172 included: one with @code{g77}'s idiosyncratic convention of appending | |
173 two underscores to identifiers, and one with the more common | |
174 convention of appending only a single underscore. This way, the same | |
175 FFTW library will work with both @code{g77} and other Fortran | |
176 compilers, such as GNU @code{gfortran}. However, the converse is not | |
177 true: if you configure with a different compiler, then the | |
178 @code{g77}-compatible wrappers are not included. By specifying | |
179 @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are | |
180 included in addition to wrappers for whatever Fortran compiler | |
181 @code{configure} finds. | |
182 @fpindex g77 | |
183 | |
184 @item | |
185 @code{--with-slow-timer}: Disables the use of hardware cycle counters, | |
186 and falls back on @code{gettimeofday} or @code{clock}. This greatly | |
187 slows down planning, and should generally not be used (unless you don't | |
188 have a cycle counter but still really want an optimized plan regardless | |
189 of the time). @xref{Cycle Counters}. | |
190 | |
191 @item | |
192 @code{--enable-sse} (single precision), | |
193 @code{--enable-sse2} (single, double), | |
194 @code{--enable-avx} (single, double), | |
195 @code{--enable-avx2} (single, double), | |
196 @code{--enable-avx512} (single, double), | |
197 @code{--enable-avx-128-fma}, | |
198 @code{--enable-kcvi} (single), | |
199 @code{--enable-altivec} (single), | |
200 @code{--enable-vsx} (single, double), | |
201 @code{--enable-neon} (single, double on aarch64), | |
202 @code{--enable-generic-simd128}, | |
203 and | |
204 @code{--enable-generic-simd256}: | |
205 | |
206 Enable various SIMD instruction sets. You need a compiler that supports | |
207 the given SIMD extensions, but FFTW will try to detect at runtime | |
208 whether the CPU supports these extensions. That is, you can compile | |
209 with @code{--enable-avx} and the code will still run on a CPU without AVX | |
210 support. | |
211 | |
212 @itemize @minus | |
213 @item | |
214 These options require a compiler supporting SIMD extensions, and | |
215 compiler support is always a bit flaky: see the FFTW FAQ for a list of | |
216 compiler versions that have problems compiling FFTW. | |
217 @item | |
218 Because of the large variety of ARM processors and ABIs, FFTW | |
219 does not attempt to guess the correct @code{gcc} flags for generating | |
220 NEON code. In general, you will have to provide them on the command line. | |
221 This command line is known to have worked at least once: | |
222 @example | |
223 ./configure --with-slow-timer --host=arm-linux-gnueabi \ | |
224 --enable-single --enable-neon \ | |
225 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp" | |
226 @end example | |
227 @end itemize | |
228 | |
229 @end itemize | |
230 | |
231 @cindex compiler | |
232 To force @code{configure} to use a particular C compiler @i{foo} | |
233 (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the | |
234 @code{configure} script; you may also need to set the flags via the variable | |
235 @code{CFLAGS} as described above. | |
236 @cindex compiler flags | |
237 | |
238 @c ------------------------------------------------------------ | |
239 @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization | |
240 @section Installation on non-Unix systems | |
241 | |
242 It should be relatively straightforward to compile FFTW even on non-Unix | |
243 systems lacking the niceties of a @code{configure} script. Basically, | |
244 you need to edit the @code{config.h} header (copy it from | |
245 @code{config.h.in}) to @code{#define} the various options and compiler | |
246 characteristics, and then compile all the @samp{.c} files in the | |
247 relevant directories. | |
248 | |
249 The @code{config.h} header contains about 100 options to set, each one | |
250 initially an @code{#undef}, each documented with a comment, and most of | |
251 them fairly obvious. For most of the options, you should simply | |
252 @code{#define} them to @code{1} if they are applicable, although a few | |
253 options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should | |
254 be defined to the size of the @code{long long} type, in bytes, or zero | |
255 if it is not supported). We will likely post some sample | |
256 @code{config.h} files for various operating systems and compilers for | |
257 you to use (at least as a starting point). Please let us know if you | |
258 have to hand-create a configuration file (and/or a pre-compiled binary) | |
259 that you want to share. | |
260 | |
261 To create the FFTW library, you will then need to compile all of the | |
262 @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar}, | |
263 @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar}, | |
264 @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb}, | |
265 @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories. | |
266 If you are compiling with SIMD support (e.g. you defined | |
267 @code{HAVE_SSE2} in @code{config.h}), then you also need to compile | |
268 the @code{.c} files in the @code{simd-support}, | |
269 @code{@{dft,rdft@}/simd}, and @code{@{dft,rdft@}/simd/*} directories. | |
270 | |
271 Once these files are all compiled, link them into a library, or a shared | |
272 library, or directly into your program. | |
273 | |
274 To compile the FFTW test program, additionally compile the code in the | |
275 @code{libbench2/} directory, and link it into a library. Then compile | |
276 the code in the @code{tests/} directory and link it to the | |
277 @code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom} | |
278 (command-line) tool (@pxref{Wisdom Utilities}), compile | |
279 @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW | |
280 libraries. | |
281 | |
282 @c ------------------------------------------------------------ | |
283 @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization | |
284 @section Cycle Counters | |
285 @cindex cycle counter | |
286 | |
287 FFTW's planner actually executes and times different possible FFT | |
288 algorithms in order to pick the fastest plan for a given @math{n}. In | |
289 order to do this in as short a time as possible, however, the timer must | |
290 have a very high resolution, and to accomplish this we employ the | |
291 hardware @dfn{cycle counters} that are available on most CPUs. | |
292 Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha, | |
293 UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors. | |
294 | |
295 @cindex compiler | |
296 Access to the cycle counters, unfortunately, is a compiler and/or | |
297 operating-system dependent task, often requiring inline assembly | |
298 language, and it may be that your compiler is not supported. If it is | |
299 @emph{not} supported, FFTW will by default fall back on its estimator | |
300 (effectively using @code{FFTW_ESTIMATE} for all plans). | |
301 @ctindex FFTW_ESTIMATE | |
302 | |
303 You can add support by editing the file @code{kernel/cycle.h}; normally, | |
304 this will involve adapting one of the examples already present in order | |
305 to use the inline-assembler syntax for your C compiler, and will only | |
306 require a couple of lines of code. Anyone adding support for a new | |
307 system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}. | |
308 | |
309 If a cycle counter is not available on your system (e.g. some embedded | |
310 processor), and you don't want to use estimated plans, as a last resort | |
311 you can use the @code{--with-slow-timer} option to @code{configure} (on | |
312 Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere). | |
313 This will use the much lower-resolution @code{gettimeofday} function, or even | |
314 @code{clock} if the former is unavailable, and planning will be | |
315 extremely slow. | |
316 | |
317 @c ------------------------------------------------------------ | |
318 @node Generating your own code, , Cycle Counters, Installation and Customization | |
319 @section Generating your own code | |
320 @cindex code generator | |
321 | |
322 The directory @code{genfft} contains the programs that were used to | |
323 generate FFTW's ``codelets,'' which are hard-coded transforms of small | |
324 sizes. | |
325 @cindex codelet | |
326 We do not expect casual users to employ the generator, which is a rather | |
327 sophisticated program that generates directed acyclic graphs of FFT | |
328 algorithms and performs algebraic simplifications on them. It was | |
329 written in Objective Caml, a dialect of ML, which is available at | |
330 @uref{http://caml.inria.fr/ocaml/index.en.html}. | |
331 @cindex Caml | |
332 | |
333 | |
334 If you have Objective Caml installed (along with recent versions of | |
335 GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you | |
336 can change the set of codelets that are generated or play with the | |
337 generation options. The set of generated codelets is specified by the | |
338 @code{@{dft,rdft@}/@{codelets,simd@}/*/Makefile.am} files. For example, you can add | |
339 efficient REDFT codelets of small sizes by modifying | |
340 @code{rdft/codelets/r2r/Makefile.am}. | |
341 @cindex REDFT | |
342 After you modify any @code{Makefile.am} files, you can type @code{sh | |
343 bootstrap.sh} in the top-level directory followed by @code{make} to | |
344 re-generate the files. | |
345 | |
346 We do not provide more details about the code-generation process, since | |
347 we do not expect that most users will need to generate their own code. | |
348 However, feel free to contact us at @email{fftw@@fftw.org} if | |
349 you are interested in the subject. | |
350 | |
351 @cindex monadic programming | |
352 You might find it interesting to learn Caml and/or some modern | |
353 programming techniques that we used in the generator (including monadic | |
354 programming), especially if you heard the rumor that Java and | |
355 object-oriented programming are the latest advancement in the field. | |
356 The internal operation of the codelet generator is described in the | |
357 paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is | |
358 available from the @uref{http://www.fftw.org,FFTW home page} and also | |
359 appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on | |
360 Programming Language Design and Implementation (PLDI)}. | |
361 |
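
The Cycle Counters section says that adding support for a new compiler usually means adapting an existing example in `kernel/cycle.h` and takes only a couple of lines. As a rough illustration of what such an addition might look like, here is a minimal sketch for GCC-compatible compilers on x86. The names `ticks`, `getticks`, and `elapsed` and the macro `SKETCH_HAVE_TICK_COUNTER` are assumptions chosen for this example, not copied from the FFTW source, so check them against the real header before adapting anything. The `clock` fallback is only there to keep the sketch compilable on other platforms.

```c
#include <time.h>

/* Hypothetical sketch of a cycle.h-style counter. All names here are
   assumptions for illustration, not taken from the FFTW source. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
typedef unsigned long long ticks;

static inline ticks getticks(void)
{
    unsigned int lo, hi;
    /* rdtsc places the 64-bit time-stamp counter in edx:eax */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((ticks)hi << 32) | lo;
}
#define SKETCH_HAVE_TICK_COUNTER 1
#else
/* Low-resolution fallback so the sketch still compiles elsewhere. */
typedef clock_t ticks;

static inline ticks getticks(void)
{
    return clock();
}
#endif

/* Difference between two counter readings, as a double. */
static inline double elapsed(ticks t1, ticks t0)
{
    return (double)t1 - (double)t0;
}
```

A caller would read `getticks()` before and after the code being timed and pass both readings to `elapsed`. The result is in counter units, not seconds, so it is only meaningful for comparing candidates on the same machine.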