@node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
@chapter Installation and Customization
@cindex installation

This chapter describes the installation and customization of FFTW, the
latest version of which may be downloaded from
@uref{http://www.fftw.org, the FFTW home page}.

In principle, FFTW should work on any system with an ANSI C compiler
(@code{gcc} is fine). However, planner time is drastically reduced if
FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
support for all modern general-purpose CPUs, but you may need to add a
couple of lines of code if your compiler is not yet supported
(@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
of the @code{configure} output if no cycle counter is found.)
@cindex cycle counter
@cindex compiler
@cindex portability


Installation of FFTW is simplest if you have a Unix or a GNU system,
such as GNU/Linux, and we describe this case in the first section below,
including the use of special configuration options to e.g. install
different precisions or exploit optimizations for particular
architectures (e.g. SIMD). Compilation on non-Unix systems is a more
manual process, but we outline the procedure in the second section. It
is also likely that pre-compiled binaries will be available for popular
systems.

Finally, we describe how you can customize FFTW for particular needs by
generating @emph{codelets} for fast transforms of sizes not supported
efficiently by the standard FFTW distribution.
@cindex codelet

@menu
* Installation on Unix::
* Installation on non-Unix systems::
* Cycle Counters::
* Generating your own code::
@end menu

@c ------------------------------------------------------------

@node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
@section Installation on Unix

FFTW comes with a @code{configure} program in the GNU style.
Installation can be as simple as:
@fpindex configure

@example
./configure
make
make install
@end example

This will build the uniprocessor complex and real transform libraries
along with the test programs. (We recommend that you use GNU
@code{make} if it is available; on some systems it is called
@code{gmake}.) The ``@code{make install}'' command installs the FFTW
libraries in standard places, and typically requires root
privileges (unless you specify a different install directory with the
@code{--prefix} flag to @code{configure}). You can also type
``@code{make check}'' to put the FFTW test programs through their paces.
If you have problems during configuration or compilation, you may want
to run ``@code{make distclean}'' before trying again; this ensures that
you don't have any stale files left over from previous compilation
attempts.

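
The steps above can be adapted to a per-user installation that needs no
root privileges. The following is only a sketch: the install directory
@code{$HOME/fftw} is an arbitrary choice made for illustration, not
something FFTW prescribes.

@example
# Configure for a user-writable prefix (hypothetical path).
./configure --prefix="$HOME/fftw"
make
# Optional: run the test programs before installing.
make check
make install
@end example

With the default GNU directory layout, programs that use this copy would
then be compiled and linked against @code{$HOME/fftw/include} and
@code{$HOME/fftw/lib}.
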

The @code{configure} script chooses the @code{gcc} compiler by default,
if it is available; you can select some other compiler with:
@example
./configure CC="@r{@i{<the name of your C compiler>}}"
@end example

The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
@cindex compiler flags
for a few systems. If your system is not known, the @code{configure}
script will print out a warning. In this case, you should re-configure
FFTW with the command
@example
./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
@end example
and then compile as usual. If you do find an optimal set of
@code{CFLAGS} for your system, please let us know what they are (along
with the output of @code{config.guess}) so that we can include them in
future releases.

@code{configure} supports all the standard flags defined by the GNU
Coding Standards; see the @code{INSTALL} file in FFTW or
@uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
Note especially @code{--help} to list all flags and
@code{--enable-shared} to create shared, rather than static, libraries.
@code{configure} also accepts a few FFTW-specific flags, particularly:

@itemize @bullet

@item
@cindex precision
@code{--enable-float}: Produces a single-precision version of FFTW
(@code{float}) instead of the default double-precision (@code{double}).
@xref{Precision}.

@item
@cindex precision
@code{--enable-long-double}: Produces a long-double precision version of
FFTW (@code{long double}) instead of the default double-precision
(@code{double}).
The @code{configure} script will halt with an error
message if @code{long double} is the same size as @code{double} on your
machine/compiler. @xref{Precision}.

@item
@cindex precision
@code{--enable-quad-precision}: Produces a quadruple-precision version
of FFTW using the nonstandard @code{__float128} type provided by
@code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
instead of the default double-precision (@code{double}). The
@code{configure} script will halt with an error message if the
compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
@code{libquadmath} library is not installed. @xref{Precision}.

@item
@cindex threads
@code{--enable-threads}: Enables compilation and installation of the
FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
simple interface to parallel transforms for SMP systems. By default,
the threads routines are not compiled.

@item
@code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
compiler directives in order to induce parallelism rather than
spawning its own threads directly, and installing an @samp{fftw3_omp} library
rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
since they compile/install libraries with different names. By default,
the OpenMP routines are not compiled.

@item
@code{--with-combined-threads}: By default, if @code{--enable-threads}
is used, the threads support is compiled into a separate library that
must be linked in addition to the main FFTW library.
This is so that
users of the serial library do not need to link the system threads
libraries. If @code{--with-combined-threads} is specified, however,
then no separate threads library is created, and threads are included
in the main FFTW library. This is mainly useful under Windows, where
no system threads library is required and inter-library dependencies
are problematic.

@item
@cindex MPI
@code{--enable-mpi}: Enables compilation and installation of the FFTW
MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
parallel transforms for distributed-memory systems with MPI. (By
default, the MPI routines are not compiled.) @xref{FFTW MPI
Installation}.

@item
@cindex Fortran-callable wrappers
@code{--disable-fortran}: Disables inclusion of legacy-Fortran
wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
FFTW libraries. These wrapper routines increase the library size by
only a negligible amount, so they are included by default as long as
the @code{configure} script finds a Fortran compiler on your system.
(To specify a particular Fortran compiler @i{foo}, pass
@code{F77=}@i{foo} to @code{configure}.)

@item
@code{--with-g77-wrappers}: By default, when Fortran wrappers are
included, the wrappers employ the linking conventions of the Fortran
compiler detected by the @code{configure} script.
If this compiler is
GNU @code{g77}, however, then @emph{two} versions of the wrappers are
included: one with @code{g77}'s idiosyncratic convention of appending
two underscores to identifiers, and one with the more common
convention of appending only a single underscore. This way, the same
FFTW library will work with both @code{g77} and other Fortran
compilers, such as GNU @code{gfortran}. However, the converse is not
true: if you configure with a different compiler, then the
@code{g77}-compatible wrappers are not included. By specifying
@code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
included in addition to wrappers for whatever Fortran compiler
@code{configure} finds.
@fpindex g77

@item
@code{--with-slow-timer}: Disables the use of hardware cycle counters,
and falls back on @code{gettimeofday} or @code{clock}. This greatly
worsens performance, and should generally not be used (unless you don't
have a cycle counter but still really want an optimized plan regardless
of the time). @xref{Cycle Counters}.

@item
@code{--enable-sse} (single precision),
@code{--enable-sse2} (single, double),
@code{--enable-avx} (single, double),
@code{--enable-avx2} (single, double),
@code{--enable-avx512} (single, double),
@code{--enable-avx-128-fma},
@code{--enable-kcvi} (single),
@code{--enable-altivec} (single),
@code{--enable-vsx} (single, double),
@code{--enable-neon} (single, double on aarch64),
@code{--enable-generic-simd128},
and
@code{--enable-generic-simd256}:

You need compiler that supports cannam@167: the given SIMD extensions, but FFTW will try to detect at runtime cannam@167: whether the CPU supports these extensions. That is, you can compile cannam@167: with@code{--enable-avx} and the code will still run on a CPU without AVX cannam@167: support. cannam@167: cannam@167: @itemize @minus cannam@167: @item cannam@167: These options require a compiler supporting SIMD extensions, and cannam@167: compiler support is always a bit flaky: see the FFTW FAQ for a list of cannam@167: compiler versions that have problems compiling FFTW. cannam@167: @item cannam@167: Because of the large variety of ARM processors and ABIs, FFTW cannam@167: does not attempt to guess the correct @code{gcc} flags for generating cannam@167: NEON code. In general, you will have to provide them on the command line. cannam@167: This command line is known to have worked at least once: cannam@167: @example cannam@167: ./configure --with-slow-timer --host=arm-linux-gnueabi \ cannam@167: --enable-single --enable-neon \ cannam@167: "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp" cannam@167: @end example cannam@167: @end itemize cannam@167: cannam@167: @end itemize cannam@167: cannam@167: @cindex compiler cannam@167: To force @code{configure} to use a particular C compiler @i{foo} cannam@167: (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the cannam@167: @code{configure} script; you may also need to set the flags via the variable cannam@167: @code{CFLAGS} as described above. 
@cindex compiler flags

@c ------------------------------------------------------------
@node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
@section Installation on non-Unix systems

It should be relatively straightforward to compile FFTW even on non-Unix
systems lacking the niceties of a @code{configure} script. Basically,
you need to edit the @code{config.h} header (copy it from
@code{config.h.in}) to @code{#define} the various options and compiler
characteristics, and then compile all the @samp{.c} files in the
relevant directories.

The @code{config.h} header contains about 100 options to set, each one
initially an @code{#undef}, each documented with a comment, and most of
them fairly obvious. For most of the options, you should simply
@code{#define} them to @code{1} if they are applicable, although a few
options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
be defined to the size of the @code{long long} type, in bytes, or zero
if it is not supported). We will likely post some sample
@code{config.h} files for various operating systems and compilers for
you to use (at least as a starting point). Please let us know if you
have to hand-create a configuration file (and/or a pre-compiled binary)
that you want to share.

To create the FFTW library, you will then need to compile all of the
@samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
@code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
@code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
@code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
If you are compiling with SIMD support (e.g. you defined
@code{HAVE_SSE2} in @code{config.h}), then you also need to compile
the @samp{.c} files in the @code{simd-support},
@code{@{dft,rdft@}/simd}, and @code{@{dft,rdft@}/simd/*} directories.

Once these files are all compiled, link them into a library, or a shared
library, or directly into your program.

To compile the FFTW test program, additionally compile the code in the
@code{libbench2/} directory, and link it into a library. Then compile
the code in the @code{tests/} directory and link it to the
@code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
(command-line) tool (@pxref{Wisdom Utilities}), compile
@code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
libraries.

@c ------------------------------------------------------------
@node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
@section Cycle Counters
@cindex cycle counter

FFTW's planner actually executes and times different possible FFT
algorithms in order to pick the fastest plan for a given @math{n}. In
order to do this in as short a time as possible, however, the timer must
have a very high resolution, and to accomplish this we employ the
hardware @dfn{cycle counters} that are available on most CPUs.
Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.

@cindex compiler
Access to the cycle counters, unfortunately, is a compiler and/or
operating-system dependent task, often requiring inline assembly
language, and it may be that your compiler is not supported. If your
compiler is @emph{not} supported, FFTW will by default fall back on its
estimator (effectively using @code{FFTW_ESTIMATE} for all plans).
@ctindex FFTW_ESTIMATE

You can add support by editing the file @code{kernel/cycle.h}; normally,
this will involve adapting one of the examples already present in order
to use the inline-assembler syntax for your C compiler, and will only
require a couple of lines of code. Anyone adding support for a new
system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.

If a cycle counter is not available on your system (e.g. some embedded
processor), and you don't want to use estimated plans, as a last resort
you can use the @code{--with-slow-timer} option to @code{configure} (on
Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
This will use the much lower-resolution @code{gettimeofday} function, or even
@code{clock} if the former is unavailable, and planning will be
extremely slow.

@c ------------------------------------------------------------
@node Generating your own code, , Cycle Counters, Installation and Customization
@section Generating your own code
@cindex code generator

The directory @code{genfft} contains the programs that were used to
generate FFTW's ``codelets,'' which are hard-coded transforms of small
sizes.
@cindex codelet
We do not expect casual users to employ the generator, which is a rather
sophisticated program that generates directed acyclic graphs of FFT
algorithms and performs algebraic simplifications on them. It was
written in Objective Caml, a dialect of ML, which is available at
@uref{http://caml.inria.fr/ocaml/index.en.html}.
@cindex Caml


If you have Objective Caml installed (along with recent versions of
GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
can change the set of codelets that are generated or play with the
generation options. The set of generated codelets is specified by the
@code{@{dft,rdft@}/@{codelets,simd@}/*/Makefile.am} files. For example, you can add
efficient REDFT codelets of small sizes by modifying
@code{rdft/codelets/r2r/Makefile.am}.
@cindex REDFT
After you modify any @code{Makefile.am} files, you can type @code{sh
bootstrap.sh} in the top-level directory followed by @code{make} to
re-generate the files.

We do not provide more details about the code-generation process, since
we do not expect that most users will need to generate their own code.
However, feel free to contact us at @email{fftw@@fftw.org} if
you are interested in the subject.

@cindex monadic programming
You might find it interesting to learn Caml and/or some modern
programming techniques that we used in the generator (including monadic
programming), especially if you heard the rumor that Java and
object-oriented programming are the latest advancement in the field.
The internal operation of the codelet generator is described in the
paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
available from the @uref{http://www.fftw.org,FFTW home page} and also
appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI)}.