annotate src/fftw-3.3.5/doc/install.texi @ 169:223a55898ab9 tip default

Add null config files
author Chris Cannam <cannam@all-day-breakfast.com>
date Mon, 02 Mar 2020 14:03:47 +0000
parents 7867fa7e1b6b
children
rev   line source
cannam@127 1 @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
cannam@127 2 @chapter Installation and Customization
cannam@127 3 @cindex installation
cannam@127 4
cannam@127 5 This chapter describes the installation and customization of FFTW, the
cannam@127 6 latest version of which may be downloaded from
cannam@127 7 @uref{http://www.fftw.org, the FFTW home page}.
cannam@127 8
cannam@127 9 In principle, FFTW should work on any system with an ANSI C compiler
cannam@127 10 (@code{gcc} is fine). However, planner time is drastically reduced if
cannam@127 11 FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
cannam@127 12 support for all modern general-purpose CPUs, but you may need to add a
cannam@127 13 couple of lines of code if your compiler is not yet supported
cannam@127 14 (@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
cannam@127 15 of the @code{configure} output if no cycle counter is found.)
cannam@127 16 @cindex cycle counter
cannam@127 17 @cindex compiler
cannam@127 18 @cindex portability
cannam@127 19
cannam@127 20
cannam@127 21 Installation of FFTW is simplest if you have a Unix or a GNU system,
cannam@127 22 such as GNU/Linux, and we describe this case in the first section below,
cannam@127 23 including the use of special configuration options to e.g. install
cannam@127 24 different precisions or exploit optimizations for particular
cannam@127 25 architectures (e.g. SIMD). Compilation on non-Unix systems is a more
cannam@127 26 manual process, but we outline the procedure in the second section. It
cannam@127 27 is also likely that pre-compiled binaries will be available for popular
cannam@127 28 systems.
cannam@127 29
cannam@127 30 Finally, we describe how you can customize FFTW for particular needs by
cannam@127 31 generating @emph{codelets} for fast transforms of sizes not supported
cannam@127 32 efficiently by the standard FFTW distribution.
cannam@127 33 @cindex codelet
cannam@127 34
cannam@127 35 @menu
cannam@127 36 * Installation on Unix::
cannam@127 37 * Installation on non-Unix systems::
cannam@127 38 * Cycle Counters::
cannam@127 39 * Generating your own code::
cannam@127 40 @end menu
cannam@127 41
cannam@127 42 @c ------------------------------------------------------------
cannam@127 43
cannam@127 44 @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
cannam@127 45 @section Installation on Unix
cannam@127 46
cannam@127 47 FFTW comes with a @code{configure} program in the GNU style.
cannam@127 48 Installation can be as simple as:
cannam@127 49 @fpindex configure
cannam@127 50
cannam@127 51 @example
cannam@127 52 ./configure
cannam@127 53 make
cannam@127 54 make install
cannam@127 55 @end example
cannam@127 56
cannam@127 57 This will build the serial (uniprocessor) FFTW library, which provides
cannam@127 58 both complex and real transforms, along with the test programs. (We
cannam@127 59 recommend GNU @code{make} if it is available; on some systems it is
cannam@127 60 called @code{gmake}.) The ``@code{make install}'' command installs the
cannam@127 61 FFTW library in standard places, and typically requires root
cannam@127 62 privileges (unless you specify a different install directory with the
cannam@127 63 @code{--prefix} flag to @code{configure}). You can also type
cannam@127 64 ``@code{make check}'' to put the FFTW test programs through their paces.
cannam@127 65 If you have problems during configuration or compilation, you may want
cannam@127 66 to run ``@code{make distclean}'' before trying again; this ensures that
cannam@127 67 you don't have any stale files left over from previous compilation
cannam@127 68 attempts.
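
As a sketch of an installation that does not need root privileges, the
commands below install into a directory under your home directory. The
path @code{$HOME/fftw} is only an illustration; any writable directory
will do.

@example
./configure --prefix=$HOME/fftw
make
make check
make install
@end example

Programs that use this copy of FFTW will then typically need to be
compiled with @code{-I$HOME/fftw/include} and linked with
@code{-L$HOME/fftw/lib}.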
cannam@127 69
cannam@127 70 The @code{configure} script chooses the @code{gcc} compiler by default,
cannam@127 71 if it is available; you can select some other compiler with:
cannam@127 72 @example
cannam@127 73 ./configure CC="@r{@i{<the name of your C compiler>}}"
cannam@127 74 @end example
cannam@127 75
cannam@127 76 The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
cannam@127 77 @cindex compiler flags
cannam@127 78 for a few systems. If your system is not known, the @code{configure}
cannam@127 79 script will print out a warning. In this case, you should re-configure
cannam@127 80 FFTW with the command
cannam@127 81 @example
cannam@127 82 ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
cannam@127 83 @end example
cannam@127 84 and then compile as usual. If you do find an optimal set of
cannam@127 85 @code{CFLAGS} for your system, please let us know what they are (along
cannam@127 86 with the output of @code{config.guess}) so that we can include them in
cannam@127 87 future releases.
cannam@127 88
cannam@127 89 @code{configure} supports all the standard flags defined by the GNU
cannam@127 90 Coding Standards; see the @code{INSTALL} file in FFTW or
cannam@127 91 @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
cannam@127 92 Note especially @code{--help} to list all flags and
cannam@127 93 @code{--enable-shared} to create shared, rather than static, libraries.
cannam@127 94 @code{configure} also accepts a few FFTW-specific flags, particularly:
cannam@127 95
cannam@127 96 @itemize @bullet
cannam@127 97
cannam@127 98 @item
cannam@127 99 @cindex precision
cannam@127 100 @code{--enable-float}: Produces a single-precision version of FFTW
cannam@127 101 (@code{float}) instead of the default double-precision (@code{double}).
cannam@127 102 @xref{Precision}.
cannam@127 103
cannam@127 104 @item
cannam@127 105 @cindex precision
cannam@127 106 @code{--enable-long-double}: Produces a long-double precision version of
cannam@127 107 FFTW (@code{long double}) instead of the default double-precision
cannam@127 108 (@code{double}). The @code{configure} script will halt with an error
cannam@127 109 message if @code{long double} is the same size as @code{double} on your
cannam@127 110 machine/compiler. @xref{Precision}.
cannam@127 111
cannam@127 112 @item
cannam@127 113 @cindex precision
cannam@127 114 @code{--enable-quad-precision}: Produces a quadruple-precision version
cannam@127 115 of FFTW using the nonstandard @code{__float128} type provided by
cannam@127 116 @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
cannam@127 117 instead of the default double-precision (@code{double}). The
cannam@127 118 @code{configure} script will halt with an error message if the
cannam@127 119 compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
cannam@127 120 @code{libquadmath} library is not installed. @xref{Precision}.
cannam@127 121
cannam@127 122 @item
cannam@127 123 @cindex threads
cannam@127 124 @code{--enable-threads}: Enables compilation and installation of the
cannam@127 125 FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
cannam@127 126 simple interface to parallel transforms for SMP systems. By default,
cannam@127 127 the threads routines are not compiled.
cannam@127 128
cannam@127 129 @item
cannam@127 130 @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
cannam@127 131 compiler directives in order to induce parallelism rather than
cannam@127 132 spawning its own threads directly, and installing an @samp{fftw3_omp} library
cannam@127 133 rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
cannam@127 134 FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
cannam@127 135 since they compile/install libraries with different names. By default,
cannam@127 136 the OpenMP routines are not compiled.
cannam@127 137
cannam@127 138 @item
cannam@127 139 @code{--with-combined-threads}: By default, if @code{--enable-threads}
cannam@127 140 is used, the threads support is compiled into a separate library that
cannam@127 141 must be linked in addition to the main FFTW library. This is so that
cannam@127 142 users of the serial library do not need to link the system threads
cannam@127 143 libraries. If @code{--with-combined-threads} is specified, however,
cannam@127 144 then no separate threads library is created, and threads are included
cannam@127 145 in the main FFTW library. This is mainly useful under Windows, where
cannam@127 146 no system threads library is required and inter-library dependencies
cannam@127 147 are problematic.
cannam@127 148
cannam@127 149 @item
cannam@127 150 @cindex MPI
cannam@127 151 @code{--enable-mpi}: Enables compilation and installation of the FFTW
cannam@127 152 MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
cannam@127 153 parallel transforms for distributed-memory systems with MPI. (By
cannam@127 154 default, the MPI routines are not compiled.) @xref{FFTW MPI
cannam@127 155 Installation}.
cannam@127 156
cannam@127 157 @item
cannam@127 158 @cindex Fortran-callable wrappers
cannam@127 159 @code{--disable-fortran}: Disables inclusion of legacy-Fortran
cannam@127 160 wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
cannam@127 161 FFTW libraries. These wrapper routines increase the library size by
cannam@127 162 only a negligible amount, so they are included by default as long as
cannam@127 163 the @code{configure} script finds a Fortran compiler on your system.
cannam@127 164 (To specify a particular Fortran compiler @i{foo}, pass
cannam@127 165 @code{F77=}@i{foo} to @code{configure}.)
cannam@127 166
cannam@127 167 @item
cannam@127 168 @code{--with-g77-wrappers}: By default, when Fortran wrappers are
cannam@127 169 included, the wrappers employ the linking conventions of the Fortran
cannam@127 170 compiler detected by the @code{configure} script. If this compiler is
cannam@127 171 GNU @code{g77}, however, then @emph{two} versions of the wrappers are
cannam@127 172 included: one with @code{g77}'s idiosyncratic convention of appending
cannam@127 173 two underscores to identifiers, and one with the more common
cannam@127 174 convention of appending only a single underscore. This way, the same
cannam@127 175 FFTW library will work with both @code{g77} and other Fortran
cannam@127 176 compilers, such as GNU @code{gfortran}. However, the converse is not
cannam@127 177 true: if you configure with a different compiler, then the
cannam@127 178 @code{g77}-compatible wrappers are not included. By specifying
cannam@127 179 @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
cannam@127 180 included in addition to wrappers for whatever Fortran compiler
cannam@127 181 @code{configure} finds.
cannam@127 182 @fpindex g77
cannam@127 183
cannam@127 184 @item
cannam@127 185 @code{--with-slow-timer}: Disables the use of hardware cycle counters,
cannam@127 186 and falls back on @code{gettimeofday} or @code{clock}. This makes
cannam@127 187 planning far slower, and should generally not be used (unless you don't
cannam@127 188 have a cycle counter but still really want an optimized plan regardless
cannam@127 189 of the time). @xref{Cycle Counters}.
cannam@127 190
cannam@127 191 @item
cannam@127 192 @code{--enable-sse} (single precision),
cannam@127 193 @code{--enable-sse2} (single, double),
cannam@127 194 @code{--enable-avx} (single, double),
cannam@127 195 @code{--enable-avx2} (single, double),
cannam@127 196 @code{--enable-avx512} (single, double),
cannam@127 197 @code{--enable-avx-128-fma},
cannam@127 198 @code{--enable-kcvi} (single),
cannam@127 199 @code{--enable-altivec} (single),
cannam@127 200 @code{--enable-vsx} (single, double),
cannam@127 201 @code{--enable-neon} (single, double on aarch64),
cannam@127 202 @code{--enable-generic-simd128},
cannam@127 203 and
cannam@127 204 @code{--enable-generic-simd256}:
cannam@127 205
cannam@127 206 Enable various SIMD instruction sets. You need a compiler that supports
cannam@127 207 the given SIMD extensions, but FFTW will try to detect at runtime
cannam@127 208 whether the CPU supports these extensions. That is, you can compile
cannam@127 209 with @code{--enable-avx} and the code will still run on a CPU without AVX
cannam@127 210 support.
cannam@127 211
cannam@127 212 @itemize @minus
cannam@127 213 @item
cannam@127 214 These options require a compiler supporting SIMD extensions, and
cannam@127 215 compiler support is always a bit flaky: see the FFTW FAQ for a list of
cannam@127 216 compiler versions that have problems compiling FFTW.
cannam@127 217 @item
cannam@127 218 Because of the large variety of ARM processors and ABIs, FFTW
cannam@127 219 does not attempt to guess the correct @code{gcc} flags for generating
cannam@127 220 NEON code. In general, you will have to provide them on the command line.
cannam@127 221 This command line is known to have worked at least once:
cannam@127 222 @example
cannam@127 223 ./configure --with-slow-timer --host=arm-linux-gnueabi \
cannam@127 224 --enable-single --enable-neon \
cannam@127 225 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
cannam@127 226 @end example
cannam@127 227 @end itemize
cannam@127 228
cannam@127 229 @end itemize
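
As an illustration of how these flags combine, the following sketch
builds and installs the double-precision library and then the
single-precision library from the same source tree, each with shared
libraries, threads, and SSE2/AVX support. The particular selection of
flags is only an example; choose the ones appropriate to your machine
and compiler.

@example
./configure --enable-shared --enable-threads --enable-sse2 --enable-avx
make
make install
make distclean
./configure --enable-shared --enable-threads --enable-float --enable-sse2 --enable-avx
make
make install
@end example

The two precisions are installed as separate libraries with different
names (@pxref{Precision}), so the second installation does not
overwrite the first.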
cannam@127 230
cannam@127 231 @cindex compiler
cannam@127 232 To force @code{configure} to use a particular C compiler @i{foo}
cannam@127 233 (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the
cannam@127 234 @code{configure} script; you may also need to set the flags via the variable
cannam@127 235 @code{CFLAGS} as described above.
cannam@127 236 @cindex compiler flags
cannam@127 237
cannam@127 238 @c ------------------------------------------------------------
cannam@127 239 @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
cannam@127 240 @section Installation on non-Unix systems
cannam@127 241
cannam@127 242 It should be relatively straightforward to compile FFTW even on non-Unix
cannam@127 243 systems lacking the niceties of a @code{configure} script. Basically,
cannam@127 244 you need to edit the @code{config.h} header (copy it from
cannam@127 245 @code{config.h.in}) to @code{#define} the various options and compiler
cannam@127 246 characteristics, and then compile all the @samp{.c} files in the
cannam@127 247 relevant directories.
cannam@127 248
cannam@127 249 The @code{config.h} header contains about 100 options to set, each one
cannam@127 250 initially an @code{#undef}, each documented with a comment, and most of
cannam@127 251 them fairly obvious. For most of the options, you should simply
cannam@127 252 @code{#define} them to @code{1} if they are applicable, although a few
cannam@127 253 options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
cannam@127 254 be defined to the size of the @code{long long} type, in bytes, or zero
cannam@127 255 if it is not supported). We will likely post some sample
cannam@127 256 @code{config.h} files for various operating systems and compilers for
cannam@127 257 you to use (at least as a starting point). Please let us know if you
cannam@127 258 have to hand-create a configuration file (and/or a pre-compiled binary)
cannam@127 259 that you want to share.
cannam@127 260
cannam@127 261 To create the FFTW library, you will then need to compile all of the
cannam@127 262 @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
cannam@127 263 @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
cannam@127 264 @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
cannam@127 265 @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
cannam@127 266 If you are compiling with SIMD support (e.g. you defined
cannam@127 267 @code{HAVE_SSE2} in @code{config.h}), then you also need to compile
cannam@127 268 the @code{.c} files in the @code{simd-support},
cannam@127 269 @code{@{dft,rdft@}/simd}, @code{@{dft,rdft@}/simd/*} directories.
cannam@127 270
cannam@127 271 Once these files are all compiled, link them into a library, or a shared
cannam@127 272 library, or directly into your program.
cannam@127 273
cannam@127 274 To compile the FFTW test program, additionally compile the code in the
cannam@127 275 @code{libbench2/} directory, and link it into a library. Then compile
cannam@127 276 the code in the @code{tests/} directory and link it to the
cannam@127 277 @code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
cannam@127 278 (command-line) tool (@pxref{Wisdom Utilities}), compile
cannam@127 279 @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
cannam@127 280 libraries.
cannam@127 281
cannam@127 282 @c ------------------------------------------------------------
cannam@127 283 @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
cannam@127 284 @section Cycle Counters
cannam@127 285 @cindex cycle counter
cannam@127 286
cannam@127 287 FFTW's planner actually executes and times different possible FFT
cannam@127 288 algorithms in order to pick the fastest plan for a given @math{n}. In
cannam@127 289 order to do this in as short a time as possible, however, the timer must
cannam@127 290 have a very high resolution, and to accomplish this we employ the
cannam@127 291 hardware @dfn{cycle counters} that are available on most CPUs.
cannam@127 292 Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
cannam@127 293 UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.
cannam@127 294
cannam@127 295 @cindex compiler
cannam@127 296 Access to the cycle counters, unfortunately, is a compiler and/or
cannam@127 297 operating-system dependent task, often requiring inline assembly
cannam@127 298 language, and it may be that your compiler is not supported. If you are
cannam@127 299 @emph{not} supported, FFTW will by default fall back on its estimator
cannam@127 300 (effectively using @code{FFTW_ESTIMATE} for all plans).
cannam@127 301 @ctindex FFTW_ESTIMATE
cannam@127 302
cannam@127 303 You can add support by editing the file @code{kernel/cycle.h}; normally,
cannam@127 304 this will involve adapting one of the examples already present in order
cannam@127 305 to use the inline-assembler syntax for your C compiler, and will only
cannam@127 306 require a couple of lines of code. Anyone adding support for a new
cannam@127 307 system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.
cannam@127 308
cannam@127 309 If a cycle counter is not available on your system (e.g. some embedded
cannam@127 310 processor), and you don't want to use estimated plans, as a last resort
cannam@127 311 you can use the @code{--with-slow-timer} option to @code{configure} (on
cannam@127 312 Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
cannam@127 313 This will use the much lower-resolution @code{gettimeofday} function, or even
cannam@127 314 @code{clock} if the former is unavailable, and planning will be
cannam@127 315 extremely slow.
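
Before resorting to the slow timer on Unix, you may want to confirm
that @code{configure} really failed to find a cycle counter. A minimal
sketch, assuming a Bourne-style shell: the log file name
@code{configure-output.log} is arbitrary, and the search pattern is
only a guess, since the exact wording of the warning is not given here.

@example
./configure 2>&1 | tee configure-output.log
grep -i "cycle" configure-output.log
@end example

If the warning is present and you still want measured rather than
estimated plans, re-run @code{configure} with @code{--with-slow-timer}.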
cannam@127 316
cannam@127 317 @c ------------------------------------------------------------
cannam@127 318 @node Generating your own code, , Cycle Counters, Installation and Customization
cannam@127 319 @section Generating your own code
cannam@127 320 @cindex code generator
cannam@127 321
cannam@127 322 The directory @code{genfft} contains the programs that were used to
cannam@127 323 generate FFTW's ``codelets,'' which are hard-coded transforms of small
cannam@127 324 sizes.
cannam@127 325 @cindex codelet
cannam@127 326 We do not expect casual users to employ the generator, which is a rather
cannam@127 327 sophisticated program that generates directed acyclic graphs of FFT
cannam@127 328 algorithms and performs algebraic simplifications on them. It was
cannam@127 329 written in Objective Caml, a dialect of ML, which is available at
cannam@127 330 @uref{http://caml.inria.fr/ocaml/index.en.html}.
cannam@127 331 @cindex Caml
cannam@127 332
cannam@127 333
cannam@127 334 If you have Objective Caml installed (along with recent versions of
cannam@127 335 GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
cannam@127 336 can change the set of codelets that are generated or play with the
cannam@127 337 generation options. The set of generated codelets is specified by the
cannam@127 338 @code{@{dft,rdft@}/@{codelets,simd@}/*/Makefile.am} files. For example, you can add
cannam@127 339 efficient REDFT codelets of small sizes by modifying
cannam@127 340 @code{rdft/scalar/r2r/Makefile.am}.
cannam@127 341 @cindex REDFT
cannam@127 342 After you modify any @code{Makefile.am} files, you can type @code{sh
cannam@127 343 bootstrap.sh} in the top-level directory followed by @code{make} to
cannam@127 344 re-generate the files.
cannam@127 345
cannam@127 346 We do not provide more details about the code-generation process, since
cannam@127 347 we do not expect that most users will need to generate their own code.
cannam@127 348 However, feel free to contact us at @email{fftw@@fftw.org} if
cannam@127 349 you are interested in the subject.
cannam@127 350
cannam@127 351 @cindex monadic programming
cannam@127 352 You might find it interesting to learn Caml and/or some modern
cannam@127 353 programming techniques that we used in the generator (including monadic
cannam@127 354 programming), especially if you heard the rumor that Java and
cannam@127 355 object-oriented programming are the latest advancement in the field.
cannam@127 356 The internal operation of the codelet generator is described in the
cannam@127 357 paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
cannam@127 358 available from the @uref{http://www.fftw.org,FFTW home page} and also
cannam@127 359 appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
cannam@127 360 Programming Language Design and Implementation (PLDI)}.
cannam@127 361