annotate src/fftw-3.3.3/doc/install.texi @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 37bf6b4a2645
children
rev   line source
Chris@10 1 @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
Chris@10 2 @chapter Installation and Customization
Chris@10 3 @cindex installation
Chris@10 4
Chris@10 5 This chapter describes the installation and customization of FFTW, the
Chris@10 6 latest version of which may be downloaded from
Chris@10 7 @uref{http://www.fftw.org, the FFTW home page}.
Chris@10 8
Chris@10 9 In principle, FFTW should work on any system with an ANSI C compiler
Chris@10 10 (@code{gcc} is fine). However, planner time is drastically reduced if
Chris@10 11 FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
Chris@10 12 support for all modern general-purpose CPUs, but you may need to add a
Chris@10 13 couple of lines of code if your compiler is not yet supported
Chris@10 14 (@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
Chris@10 15 of the @code{configure} output if no cycle counter is found.)
Chris@10 16 @cindex cycle counter
Chris@10 17 @cindex compiler
Chris@10 18 @cindex portability
Chris@10 19
Chris@10 20
Chris@10 21 Installation of FFTW is simplest if you have a Unix or a GNU system,
Chris@10 22 such as GNU/Linux, and we describe this case in the first section below,
Chris@10 23 including the use of special configuration options to e.g. install
Chris@10 24 different precisions or exploit optimizations for particular
Chris@10 25 architectures (e.g. SIMD). Compilation on non-Unix systems is a more
Chris@10 26 manual process, but we outline the procedure in the second section. It
Chris@10 27 is also likely that pre-compiled binaries will be available for popular
Chris@10 28 systems.
Chris@10 29
Chris@10 30 Finally, we describe how you can customize FFTW for particular needs by
Chris@10 31 generating @emph{codelets} for fast transforms of sizes not supported
Chris@10 32 efficiently by the standard FFTW distribution.
Chris@10 33 @cindex codelet
Chris@10 34
Chris@10 35 @menu
Chris@10 36 * Installation on Unix::
Chris@10 37 * Installation on non-Unix systems::
Chris@10 38 * Cycle Counters::
Chris@10 39 * Generating your own code::
Chris@10 40 @end menu
Chris@10 41
Chris@10 42 @c ------------------------------------------------------------
Chris@10 43
Chris@10 44 @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
Chris@10 45 @section Installation on Unix
Chris@10 46
Chris@10 47 FFTW comes with a @code{configure} program in the GNU style.
Chris@10 48 Installation can be as simple as:
Chris@10 49 @fpindex configure
Chris@10 50
Chris@10 51 @example
Chris@10 52 ./configure
Chris@10 53 make
Chris@10 54 make install
Chris@10 55 @end example
Chris@10 56
Chris@10 57 This will build the uniprocessor complex and real transform libraries
Chris@10 58 along with the test programs. (We recommend that you use GNU
Chris@10 59 @code{make} if it is available; on some systems it is called
Chris@10 60 @code{gmake}.) The ``@code{make install}'' command installs the FFTW
Chris@10 61 libraries in standard places, and typically requires root
Chris@10 62 privileges (unless you specify a different install directory with the
Chris@10 63 @code{--prefix} flag to @code{configure}). You can also type
Chris@10 64 ``@code{make check}'' to put the FFTW test programs through their paces.
Chris@10 65 If you have problems during configuration or compilation, you may want
Chris@10 66 to run ``@code{make distclean}'' before trying again; this ensures that
Chris@10 67 you don't have any stale files left over from previous compilation
Chris@10 68 attempts.
Chris@10 69
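As an illustrative sketch of installing without root privileges, the
commands below configure FFTW for a user-owned directory, run the test
programs, and then install. The directory @file{$HOME/fftw} is only an
assumed example; substitute any directory you can write to.

@example
./configure --prefix=$HOME/fftw
make
make check
make install
@end example
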
Chris@10 70 The @code{configure} script chooses the @code{gcc} compiler by default,
Chris@10 71 if it is available; you can select some other compiler with:
Chris@10 72 @example
Chris@10 73 ./configure CC="@r{@i{<the name of your C compiler>}}"
Chris@10 74 @end example
Chris@10 75
Chris@10 76 The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
Chris@10 77 @cindex compiler flags
Chris@10 78 for a few systems. If your system is not known, the @code{configure}
Chris@10 79 script will print out a warning. In this case, you should re-configure
Chris@10 80 FFTW with the command
Chris@10 81 @example
Chris@10 82 ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
Chris@10 83 @end example
Chris@10 84 and then compile as usual. If you do find an optimal set of
Chris@10 85 @code{CFLAGS} for your system, please let us know what they are (along
Chris@10 86 with the output of @code{config.guess}) so that we can include them in
Chris@10 87 future releases.
Chris@10 88
Chris@10 89 @code{configure} supports all the standard flags defined by the GNU
Chris@10 90 Coding Standards; see the @code{INSTALL} file in FFTW or
Chris@10 91 @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
Chris@10 92 Note especially @code{--help} to list all flags and
Chris@10 93 @code{--enable-shared} to create shared, rather than static, libraries.
Chris@10 94 @code{configure} also accepts a few FFTW-specific flags, particularly:
Chris@10 95
Chris@10 96 @itemize @bullet
Chris@10 97
Chris@10 98 @item
Chris@10 99 @cindex precision
Chris@10 100 @code{--enable-float}: Produces a single-precision version of FFTW
Chris@10 101 (@code{float}) instead of the default double-precision (@code{double}).
Chris@10 102 @xref{Precision}.
Chris@10 103
Chris@10 104 @item
Chris@10 105 @cindex precision
Chris@10 106 @code{--enable-long-double}: Produces a long-double precision version of
Chris@10 107 FFTW (@code{long double}) instead of the default double-precision
Chris@10 108 (@code{double}). The @code{configure} script will halt with an error
Chris@10 109 message if @code{long double} is the same size as @code{double} on your
Chris@10 110 machine/compiler. @xref{Precision}.
Chris@10 111
Chris@10 112 @item
Chris@10 113 @cindex precision
Chris@10 114 @code{--enable-quad-precision}: Produces a quadruple-precision version
Chris@10 115 of FFTW using the nonstandard @code{__float128} type provided by
Chris@10 116 @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
Chris@10 117 instead of the default double-precision (@code{double}). The
Chris@10 118 @code{configure} script will halt with an error message if the
Chris@10 119 compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
Chris@10 120 @code{libquadmath} library is not installed. @xref{Precision}.
Chris@10 121
Chris@10 122 @item
Chris@10 123 @cindex threads
Chris@10 124 @code{--enable-threads}: Enables compilation and installation of the
Chris@10 125 FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
Chris@10 126 simple interface to parallel transforms for SMP systems. By default,
Chris@10 127 the threads routines are not compiled.
Chris@10 128
Chris@10 129 @item
Chris@10 130 @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
Chris@10 131 compiler directives in order to induce parallelism rather than
Chris@10 132 spawning its own threads directly, and installing an @samp{fftw3_omp} library
Chris@10 133 rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
Chris@10 134 FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
Chris@10 135 since they compile/install libraries with different names. By default,
Chris@10 136 the OpenMP routines are not compiled.
Chris@10 137
Chris@10 138 @item
Chris@10 139 @code{--with-combined-threads}: By default, if @code{--enable-threads}
Chris@10 140 is used, the threads support is compiled into a separate library that
Chris@10 141 must be linked in addition to the main FFTW library. This is so that
Chris@10 142 users of the serial library do not need to link the system threads
Chris@10 143 libraries. If @code{--with-combined-threads} is specified, however,
Chris@10 144 then no separate threads library is created, and threads are included
Chris@10 145 in the main FFTW library. This is mainly useful under Windows, where
Chris@10 146 no system threads library is required and inter-library dependencies
Chris@10 147 are problematic.
Chris@10 148
Chris@10 149 @item
Chris@10 150 @cindex MPI
Chris@10 151 @code{--enable-mpi}: Enables compilation and installation of the FFTW
Chris@10 152 MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
Chris@10 153 parallel transforms for distributed-memory systems with MPI. (By
Chris@10 154 default, the MPI routines are not compiled.) @xref{FFTW MPI
Chris@10 155 Installation}.
Chris@10 156
Chris@10 157 @item
Chris@10 158 @cindex Fortran-callable wrappers
Chris@10 159 @code{--disable-fortran}: Disables inclusion of legacy-Fortran
Chris@10 160 wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
Chris@10 161 FFTW libraries. These wrapper routines increase the library size by
Chris@10 162 only a negligible amount, so they are included by default as long as
Chris@10 163 the @code{configure} script finds a Fortran compiler on your system.
Chris@10 164 (To specify a particular Fortran compiler @i{foo}, pass
Chris@10 165 @code{F77=}@i{foo} to @code{configure}.)
Chris@10 166
Chris@10 167 @item
Chris@10 168 @code{--with-g77-wrappers}: By default, when Fortran wrappers are
Chris@10 169 included, the wrappers employ the linking conventions of the Fortran
Chris@10 170 compiler detected by the @code{configure} script. If this compiler is
Chris@10 171 GNU @code{g77}, however, then @emph{two} versions of the wrappers are
Chris@10 172 included: one with @code{g77}'s idiosyncratic convention of appending
Chris@10 173 two underscores to identifiers, and one with the more common
Chris@10 174 convention of appending only a single underscore. This way, the same
Chris@10 175 FFTW library will work with both @code{g77} and other Fortran
Chris@10 176 compilers, such as GNU @code{gfortran}. However, the converse is not
Chris@10 177 true: if you configure with a different compiler, then the
Chris@10 178 @code{g77}-compatible wrappers are not included. By specifying
Chris@10 179 @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
Chris@10 180 included in addition to wrappers for whatever Fortran compiler
Chris@10 181 @code{configure} finds.
Chris@10 182 @fpindex g77
Chris@10 183
Chris@10 184 @item
Chris@10 185 @code{--with-slow-timer}: Disables the use of hardware cycle counters,
Chris@10 186 and falls back on @code{gettimeofday} or @code{clock}. This greatly
Chris@10 187 slows down planning, and should generally not be used (unless you don't
Chris@10 188 have a cycle counter but still really want an optimized plan regardless
Chris@10 189 of the time). @xref{Cycle Counters}.
Chris@10 190
Chris@10 191 @item
Chris@10 192 @code{--enable-sse}, @code{--enable-sse2}, @code{--enable-avx},
Chris@10 193 @code{--enable-altivec}, @code{--enable-neon}: Enable the compilation of
Chris@10 194 SIMD code for SSE (Pentium III+), SSE2 (Pentium IV+), AVX (Sandy Bridge,
Chris@10 195 Interlagos), AltiVec (PowerPC G4+), NEON (some ARM processors). SSE,
Chris@10 196 AltiVec, and NEON only work with @code{--enable-float} (above). SSE2
Chris@10 197 works in both single and double precision (and is simply SSE in single
Chris@10 198 precision). The resulting code will @emph{still work} on earlier CPUs
Chris@10 199 lacking the SIMD extensions (SIMD is automatically disabled, although
Chris@10 200 the FFTW library is still larger).
Chris@10 201 @itemize @minus
Chris@10 202 @item
Chris@10 203 These options require a compiler supporting SIMD extensions, and
Chris@10 204 compiler support is always a bit flaky: see the FFTW FAQ for a list of
Chris@10 205 compiler versions that have problems compiling FFTW.
Chris@10 206 @item
Chris@10 207 With AltiVec and @code{gcc}, you may have to use the
Chris@10 208 @code{-mabi=altivec} option when compiling any code that links to FFTW,
Chris@10 209 in order to properly align the stack; otherwise, FFTW could crash when
Chris@10 210 it tries to use an AltiVec feature. (This is not necessary on MacOS X.)
Chris@10 211 @item
Chris@10 212 With SSE/SSE2 and @code{gcc}, you should use a version of @code{gcc} that
Chris@10 213 properly aligns the stack when compiling any code that links to FFTW.
Chris@10 214 By default, @code{gcc} 2.95 and later versions align the stack as
Chris@10 215 needed, but you should not compile FFTW with the @code{-Os} option or the
Chris@10 216 @code{-mpreferred-stack-boundary} option with an argument less than 4.
Chris@10 217 @item
Chris@10 218 Because of the large variety of ARM processors and ABIs, FFTW
Chris@10 219 does not attempt to guess the correct @code{gcc} flags for generating
Chris@10 220 NEON code. In general, you will have to provide them on the command line.
Chris@10 221 This command line is known to have worked at least once:
Chris@10 222 @example
Chris@10 223 ./configure --with-slow-timer --host=arm-linux-gnueabi \
Chris@10 224 --enable-single --enable-neon \
Chris@10 225 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
Chris@10 226 @end example
Chris@10 227 @end itemize
Chris@10 228
Chris@10 229 @end itemize
Chris@10 230
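For instance, a single-precision build with SSE code and the threads
library enabled might be configured as follows. This is only a sketch
combining flags from the list above; which SIMD flag is appropriate
depends on your CPU and compiler.

@example
./configure --enable-float --enable-sse --enable-threads
make
make install
@end example
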
Chris@10 231 @cindex compiler
Chris@10 232 To force @code{configure} to use a particular C compiler @i{foo}
Chris@10 233 (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the
Chris@10 234 @code{configure} script; you may also need to set the flags via the variable
Chris@10 235 @code{CFLAGS} as described above.
Chris@10 236 @cindex compiler flags
Chris@10 237
Chris@10 238 @c ------------------------------------------------------------
Chris@10 239 @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
Chris@10 240 @section Installation on non-Unix systems
Chris@10 241
Chris@10 242 It should be relatively straightforward to compile FFTW even on non-Unix
Chris@10 243 systems lacking the niceties of a @code{configure} script. Basically,
Chris@10 244 you need to edit the @code{config.h} header (copy it from
Chris@10 245 @code{config.h.in}) to @code{#define} the various options and compiler
Chris@10 246 characteristics, and then compile all the @samp{.c} files in the
Chris@10 247 relevant directories.
Chris@10 248
Chris@10 249 The @code{config.h} header contains about 100 options to set, each one
Chris@10 250 initially an @code{#undef}, each documented with a comment, and most of
Chris@10 251 them fairly obvious. For most of the options, you should simply
Chris@10 252 @code{#define} them to @code{1} if they are applicable, although a few
Chris@10 253 options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
Chris@10 254 be defined to the size of the @code{long long} type, in bytes, or zero
Chris@10 255 if it is not supported). We will likely post some sample
Chris@10 256 @code{config.h} files for various operating systems and compilers for
Chris@10 257 you to use (at least as a starting point). Please let us know if you
Chris@10 258 have to hand-create a configuration file (and/or a pre-compiled binary)
Chris@10 259 that you want to share.
Chris@10 260
Chris@10 261 To create the FFTW library, you will then need to compile all of the
Chris@10 262 @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
Chris@10 263 @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
Chris@10 264 @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
Chris@10 265 @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
Chris@10 266 If you are compiling with SIMD support (e.g. you defined
Chris@10 267 @code{HAVE_SSE2} in @code{config.h}), then you also need to compile
Chris@10 268 the @code{.c} files in the @code{simd-support},
Chris@10 269 @code{@{dft,rdft@}/simd}, @code{@{dft,rdft@}/simd/*} directories.
Chris@10 270
Chris@10 271 Once these files are all compiled, link them into a library, or a shared
Chris@10 272 library, or directly into your program.
Chris@10 273
Chris@10 274 To compile the FFTW test program, additionally compile the code in the
Chris@10 275 @code{libbench2/} directory, and link it into a library. Then compile
Chris@10 276 the code in the @code{tests/} directory and link it to the
Chris@10 277 @code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
Chris@10 278 (command-line) tool (@pxref{Wisdom Utilities}), compile
Chris@10 279 @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
Chris@10 280 libraries.
Chris@10 281
Chris@10 282 @c ------------------------------------------------------------
Chris@10 283 @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
Chris@10 284 @section Cycle Counters
Chris@10 285 @cindex cycle counter
Chris@10 286
Chris@10 287 FFTW's planner actually executes and times different possible FFT
Chris@10 288 algorithms in order to pick the fastest plan for a given @math{n}. In
Chris@10 289 order to do this in as short a time as possible, however, the timer must
Chris@10 290 have a very high resolution, and to accomplish this we employ the
Chris@10 291 hardware @dfn{cycle counters} that are available on most CPUs.
Chris@10 292 Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
Chris@10 293 UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.
Chris@10 294
Chris@10 295 @cindex compiler
Chris@10 296 Access to the cycle counters, unfortunately, is a compiler and/or
Chris@10 297 operating-system dependent task, often requiring inline assembly
Chris@10 298 language, and it may be that your compiler is not supported. If it is
Chris@10 299 @emph{not} supported, FFTW will by default fall back on its estimator
Chris@10 300 (effectively using @code{FFTW_ESTIMATE} for all plans).
Chris@10 301 @ctindex FFTW_ESTIMATE
Chris@10 302
Chris@10 303 You can add support by editing the file @code{kernel/cycle.h}; normally,
Chris@10 304 this will involve adapting one of the examples already present in order
Chris@10 305 to use the inline-assembler syntax for your C compiler, and will only
Chris@10 306 require a couple of lines of code. Anyone adding support for a new
Chris@10 307 system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.
Chris@10 308
Chris@10 309 If a cycle counter is not available on your system (e.g. some embedded
Chris@10 310 processor), and you don't want to use estimated plans, as a last resort
Chris@10 311 you can use the @code{--with-slow-timer} option to @code{configure} (on
Chris@10 312 Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
Chris@10 313 This will use the much lower-resolution @code{gettimeofday} function, or even
Chris@10 314 @code{clock} if the former is unavailable, and planning will be
Chris@10 315 extremely slow.
Chris@10 316
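On Unix, one possible way to check for the missing-cycle-counter warning
and, if necessary, reconfigure with the slow timer is sketched below.
The log file name @file{configure.log} is an arbitrary choice, and the
exact wording of the warning is not reproduced here.

@example
./configure 2>&1 | tee configure.log
tail configure.log
./configure --with-slow-timer
@end example
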
Chris@10 317 @c ------------------------------------------------------------
Chris@10 318 @node Generating your own code, , Cycle Counters, Installation and Customization
Chris@10 319 @section Generating your own code
Chris@10 320 @cindex code generator
Chris@10 321
Chris@10 322 The directory @code{genfft} contains the programs that were used to
Chris@10 323 generate FFTW's ``codelets,'' which are hard-coded transforms of small
Chris@10 324 sizes.
Chris@10 325 @cindex codelet
Chris@10 326 We do not expect casual users to employ the generator, which is a rather
Chris@10 327 sophisticated program that generates directed acyclic graphs of FFT
Chris@10 328 algorithms and performs algebraic simplifications on them. It was
Chris@10 329 written in Objective Caml, a dialect of ML, which is available at
Chris@10 330 @uref{http://caml.inria.fr/ocaml/index.en.html}.
Chris@10 331 @cindex Caml
Chris@10 332
Chris@10 333
Chris@10 334 If you have Objective Caml installed (along with recent versions of
Chris@10 335 GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
Chris@10 336 can change the set of codelets that are generated or play with the
Chris@10 337 generation options. The set of generated codelets is specified by the
Chris@10 338 @code{@{dft,rdft@}/@{scalar,simd@}/*/Makefile.am} files. For example, you can add
Chris@10 339 efficient REDFT codelets of small sizes by modifying
Chris@10 340 @code{rdft/scalar/r2r/Makefile.am}.
Chris@10 341 @cindex REDFT
Chris@10 342 After you modify any @code{Makefile.am} files, you can type @code{sh
Chris@10 343 bootstrap.sh} in the top-level directory followed by @code{make} to
Chris@10 344 re-generate the files.
Chris@10 345
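In outline, the regeneration step described above amounts to the
following, run from the top-level directory after editing the relevant
@code{Makefile.am} files. This sketch assumes that Objective Caml and
the GNU autotools are already installed.

@example
sh bootstrap.sh
make
@end example
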
Chris@10 346 We do not provide more details about the code-generation process, since
Chris@10 347 we do not expect that most users will need to generate their own code.
Chris@10 348 However, feel free to contact us at @email{fftw@@fftw.org} if
Chris@10 349 you are interested in the subject.
Chris@10 350
Chris@10 351 @cindex monadic programming
Chris@10 352 You might find it interesting to learn Caml and/or some modern
Chris@10 353 programming techniques that we used in the generator (including monadic
Chris@10 354 programming), especially if you heard the rumor that Java and
Chris@10 355 object-oriented programming are the latest advancement in the field.
Chris@10 356 The internal operation of the codelet generator is described in the
Chris@10 357 paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
Chris@10 358 available from the @uref{http://www.fftw.org,FFTW home page} and also
Chris@10 359 appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
Chris@10 360 Programming Language Design and Implementation (PLDI)}.
Chris@10 361