sv-dependency-builds: src/fftw-3.3.3/doc/install.texi annotate

annotate src/fftw-3.3.3/doc/install.texi @ 23:619f715526df sv_v2.1

Update Vamp plugin SDK to 2.5

author	Chris Cannam
date	Thu, 09 May 2013 10:52:46 +0100
parents	37bf6b4a2645
children

rev	line source
Chris@10	1 @node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
Chris@10	2 @chapter Installation and Customization
Chris@10	3 @cindex installation
Chris@10	4
Chris@10	5 This chapter describes the installation and customization of FFTW, the
Chris@10	6 latest version of which may be downloaded from
Chris@10	7 @uref{http://www.fftw.org, the FFTW home page}.
Chris@10	8
Chris@10	9 In principle, FFTW should work on any system with an ANSI C compiler
Chris@10	10 (@code{gcc} is fine). However, planner time is drastically reduced if
Chris@10	11 FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
Chris@10	12 support for all modern general-purpose CPUs, but you may need to add a
Chris@10	13 couple of lines of code if your compiler is not yet supported
Chris@10	14 (@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
Chris@10	15 of the @code{configure} output if no cycle counter is found.)
Chris@10	16 @cindex cycle counter
Chris@10	17 @cindex compiler
Chris@10	18 @cindex portability
Chris@10	19
Chris@10	20
Chris@10	21 Installation of FFTW is simplest if you have a Unix or a GNU system,
Chris@10	22 such as GNU/Linux, and we describe this case in the first section below,
Chris@10	23 including the use of special configuration options to e.g. install
Chris@10	24 different precisions or exploit optimizations for particular
Chris@10	25 architectures (e.g. SIMD). Compilation on non-Unix systems is a more
Chris@10	26 manual process, but we outline the procedure in the second section. It
Chris@10	27 is also likely that pre-compiled binaries will be available for popular
Chris@10	28 systems.
Chris@10	29
Chris@10	30 Finally, we describe how you can customize FFTW for particular needs by
Chris@10	31 generating @emph{codelets} for fast transforms of sizes not supported
Chris@10	32 efficiently by the standard FFTW distribution.
Chris@10	33 @cindex codelet
Chris@10	34
Chris@10	35 @menu
Chris@10	36 * Installation on Unix::
Chris@10	37 * Installation on non-Unix systems::
Chris@10	38 * Cycle Counters::
Chris@10	39 * Generating your own code::
Chris@10	40 @end menu
Chris@10	41
Chris@10	42 @c ------------------------------------------------------------
Chris@10	43
Chris@10	44 @node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
Chris@10	45 @section Installation on Unix
Chris@10	46
Chris@10	47 FFTW comes with a @code{configure} program in the GNU style.
Chris@10	48 Installation can be as simple as:
Chris@10	49 @fpindex configure
Chris@10	50
Chris@10	51 @example
Chris@10	52 ./configure
Chris@10	53 make
Chris@10	54 make install
Chris@10	55 @end example
Chris@10	56
Chris@10	57 This will build the uniprocessor complex and real transform libraries
Chris@10	58 along with the test programs. (We recommend that you use GNU
Chris@10	59 @code{make} if it is available; on some systems it is called
Chris@10	60 @code{gmake}.) The ``@code{make install}'' command installs the FFTW
Chris@10	61 libraries in standard places, and typically requires root
Chris@10	62 privileges (unless you specify a different install directory with the
Chris@10	63 @code{--prefix} flag to @code{configure}). You can also type
Chris@10	64 ``@code{make check}'' to put the FFTW test programs through their paces.
Chris@10	65 If you have problems during configuration or compilation, you may want
Chris@10	66 to run ``@code{make distclean}'' before trying again; this ensures that
Chris@10	67 you don't have any stale files left over from previous compilation
Chris@10	68 attempts.
Chris@10	69
Chris@10	70 The @code{configure} script chooses the @code{gcc} compiler by default,
Chris@10	71 if it is available; you can select some other compiler with:
Chris@10	72 @example
Chris@10	73 ./configure CC="@r{@i{<the name of your C compiler>}}"
Chris@10	74 @end example
Chris@10	75
Chris@10	76 The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
Chris@10	77 @cindex compiler flags
Chris@10	78 for a few systems. If your system is not known, the @code{configure}
Chris@10	79 script will print out a warning. In this case, you should re-configure
Chris@10	80 FFTW with the command
Chris@10	81 @example
Chris@10	82 ./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
Chris@10	83 @end example
Chris@10	84 and then compile as usual. If you do find an optimal set of
Chris@10	85 @code{CFLAGS} for your system, please let us know what they are (along
Chris@10	86 with the output of @code{config.guess}) so that we can include them in
Chris@10	87 future releases.
Chris@10	88
Chris@10	89 @code{configure} supports all the standard flags defined by the GNU
Chris@10	90 Coding Standards; see the @code{INSTALL} file in FFTW or
Chris@10	91 @uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
Chris@10	92 Note especially @code{--help} to list all flags and
Chris@10	93 @code{--enable-shared} to create shared, rather than static, libraries.
Chris@10	94 @code{configure} also accepts a few FFTW-specific flags, particularly:
Chris@10	95
Chris@10	96 @itemize @bullet
Chris@10	97
Chris@10	98 @item
Chris@10	99 @cindex precision
Chris@10	100 @code{--enable-float}: Produces a single-precision version of FFTW
Chris@10	101 (@code{float}) instead of the default double-precision (@code{double}).
Chris@10	102 @xref{Precision}.
Chris@10	103
Chris@10	104 @item
Chris@10	105 @cindex precision
Chris@10	106 @code{--enable-long-double}: Produces a long-double precision version of
Chris@10	107 FFTW (@code{long double}) instead of the default double-precision
Chris@10	108 (@code{double}). The @code{configure} script will halt with an error
Chris@10	109 message if @code{long double} is the same size as @code{double} on your
Chris@10	110 machine/compiler. @xref{Precision}.
Chris@10	111
Chris@10	112 @item
Chris@10	113 @cindex precision
Chris@10	114 @code{--enable-quad-precision}: Produces a quadruple-precision version
Chris@10	115 of FFTW using the nonstandard @code{__float128} type provided by
Chris@10	116 @code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
Chris@10	117 instead of the default double-precision (@code{double}). The
Chris@10	118 @code{configure} script will halt with an error message if the
Chris@10	119 compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
Chris@10	120 @code{libquadmath} library is not installed. @xref{Precision}.
Chris@10	121
Chris@10	122 @item
Chris@10	123 @cindex threads
Chris@10	124 @code{--enable-threads}: Enables compilation and installation of the
Chris@10	125 FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
Chris@10	126 simple interface to parallel transforms for SMP systems. By default,
Chris@10	127 the threads routines are not compiled.
Chris@10	128
Chris@10	129 @item
Chris@10	130 @code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
Chris@10	131 compiler directives in order to induce parallelism rather than
Chris@10	132 spawning its own threads directly, and installing an @samp{fftw3_omp} library
Chris@10	133 rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
Chris@10	134 FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
Chris@10	135 since they compile/install libraries with different names. By default,
Chris@10	136 the OpenMP routines are not compiled.
Chris@10	137
Chris@10	138 @item
Chris@10	139 @code{--with-combined-threads}: By default, if @code{--enable-threads}
Chris@10	140 is used, the threads support is compiled into a separate library that
Chris@10	141 must be linked in addition to the main FFTW library. This is so that
Chris@10	142 users of the serial library do not need to link the system threads
Chris@10	143 libraries. If @code{--with-combined-threads} is specified, however,
Chris@10	144 then no separate threads library is created, and threads are included
Chris@10	145 in the main FFTW library. This is mainly useful under Windows, where
Chris@10	146 no system threads library is required and inter-library dependencies
Chris@10	147 are problematic.
Chris@10	148
Chris@10	149 @item
Chris@10	150 @cindex MPI
Chris@10	151 @code{--enable-mpi}: Enables compilation and installation of the FFTW
Chris@10	152 MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
Chris@10	153 parallel transforms for distributed-memory systems with MPI. (By
Chris@10	154 default, the MPI routines are not compiled.) @xref{FFTW MPI
Chris@10	155 Installation}.
Chris@10	156
Chris@10	157 @item
Chris@10	158 @cindex Fortran-callable wrappers
Chris@10	159 @code{--disable-fortran}: Disables inclusion of legacy-Fortran
Chris@10	160 wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
Chris@10	161 FFTW libraries. These wrapper routines increase the library size by
Chris@10	162 only a negligible amount, so they are included by default as long as
Chris@10	163 the @code{configure} script finds a Fortran compiler on your system.
Chris@10	164 (To specify a particular Fortran compiler @i{foo}, pass
Chris@10	165 @code{F77=}@i{foo} to @code{configure}.)
Chris@10	166
Chris@10	167 @item
Chris@10	168 @code{--with-g77-wrappers}: By default, when Fortran wrappers are
Chris@10	169 included, the wrappers employ the linking conventions of the Fortran
Chris@10	170 compiler detected by the @code{configure} script. If this compiler is
Chris@10	171 GNU @code{g77}, however, then @emph{two} versions of the wrappers are
Chris@10	172 included: one with @code{g77}'s idiosyncratic convention of appending
Chris@10	173 two underscores to identifiers, and one with the more common
Chris@10	174 convention of appending only a single underscore. This way, the same
Chris@10	175 FFTW library will work with both @code{g77} and other Fortran
Chris@10	176 compilers, such as GNU @code{gfortran}. However, the converse is not
Chris@10	177 true: if you configure with a different compiler, then the
Chris@10	178 @code{g77}-compatible wrappers are not included. By specifying
Chris@10	179 @code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
Chris@10	180 included in addition to wrappers for whatever Fortran compiler
Chris@10	181 @code{configure} finds.
Chris@10	182 @fpindex g77
Chris@10	183
Chris@10	184 @item
Chris@10	185 @code{--with-slow-timer}: Disables the use of hardware cycle counters,
Chris@10	186 and falls back on @code{gettimeofday} or @code{clock}. This greatly
Chris@10	187 slows down planning, and should generally not be used (unless you don't
Chris@10	188 have a cycle counter but still really want an optimized plan regardless
Chris@10	189 of the time). @xref{Cycle Counters}.
Chris@10	190
Chris@10	191 @item
Chris@10	192 @code{--enable-sse}, @code{--enable-sse2}, @code{--enable-avx},
Chris@10	193 @code{--enable-altivec}, @code{--enable-neon}: Enable the compilation of
Chris@10	194 SIMD code for SSE (Pentium III+), SSE2 (Pentium IV+), AVX (Sandy Bridge,
Chris@10	195 Interlagos), AltiVec (PowerPC G4+), NEON (some ARM processors). SSE,
Chris@10	196 AltiVec, and NEON only work with @code{--enable-float} (above). SSE2
Chris@10	197 works in both single and double precision (and is simply SSE in single
Chris@10	198 precision). The resulting code will @emph{still work} on earlier CPUs
Chris@10	199 lacking the SIMD extensions (SIMD is automatically disabled, although
Chris@10	200 the FFTW library is still larger).
Chris@10	201 @itemize @minus
Chris@10	202 @item
Chris@10	203 These options require a compiler supporting SIMD extensions, and
Chris@10	204 compiler support is always a bit flaky: see the FFTW FAQ for a list of
Chris@10	205 compiler versions that have problems compiling FFTW.
Chris@10	206 @item
Chris@10	207 With AltiVec and @code{gcc}, you may have to use the
Chris@10	208 @code{-mabi=altivec} option when compiling any code that links to FFTW,
Chris@10	209 in order to properly align the stack; otherwise, FFTW could crash when
Chris@10	210 it tries to use an AltiVec feature. (This is not necessary on MacOS X.)
Chris@10	211 @item
Chris@10	212 With SSE/SSE2 and @code{gcc}, you should use a version of @code{gcc} that
Chris@10	213 properly aligns the stack when compiling any code that links to FFTW.
Chris@10	214 By default, @code{gcc} 2.95 and later versions align the stack as
Chris@10	215 needed, but you should not compile FFTW with the @code{-Os} option or the
Chris@10	216 @code{-mpreferred-stack-boundary} option with an argument less than 4.
Chris@10	217 @item
Chris@10	218 Because of the large variety of ARM processors and ABIs, FFTW
Chris@10	219 does not attempt to guess the correct @code{gcc} flags for generating
Chris@10	220 NEON code. In general, you will have to provide them on the command line.
Chris@10	221 This command line is known to have worked at least once:
Chris@10	222 @example
Chris@10	223 ./configure --with-slow-timer --host=arm-linux-gnueabi \
Chris@10	224 --enable-single --enable-neon \
Chris@10	225 "CC=arm-linux-gnueabi-gcc -march=armv7-a -mfloat-abi=softfp"
Chris@10	226 @end example
Chris@10	227 @end itemize
Chris@10	228
Chris@10	229 @end itemize
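
The @code{--enable-long-double} item above notes that @code{configure} halts if @code{long double} is the same size as @code{double}. The following is a minimal sketch of how you might check this yourself before configuring. It is not the test that @code{configure} actually runs, and the function names @code{long_double_is_wider} and @code{report_long_double} are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if long double is wider than double with this compiler,
   0 otherwise.  (Hypothetical helper, not part of FFTW.) */
static int long_double_is_wider(void)
{
    return sizeof(long double) > sizeof(double);
}

/* Print both sizes and a one-line verdict; returns the same value as
   long_double_is_wider().  (Hypothetical helper, not part of FFTW.) */
static int report_long_double(FILE *out)
{
    int wider = long_double_is_wider();
    assert(out != NULL);
    fprintf(out, "sizeof(double) = %lu, sizeof(long double) = %lu: %s\n",
            (unsigned long)sizeof(double),
            (unsigned long)sizeof(long double),
            wider ? "long double is wider than double"
                  : "long double is the same size as double");
    return wider;
}
```

Calling @code{report_long_double(stdout)} from a small @code{main}, built with the same compiler and flags you intend to pass to @code{configure}, shows whether the two types differ on your machine.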
Chris@10	230
Chris@10	231 @cindex compiler
Chris@10	232 To force @code{configure} to use a particular C compiler @i{foo}
Chris@10	233 (instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the
Chris@10	234 @code{configure} script; you may also need to set the flags via the variable
Chris@10	235 @code{CFLAGS} as described above.
Chris@10	236 @cindex compiler flags
Chris@10	237
Chris@10	238 @c ------------------------------------------------------------
Chris@10	239 @node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
Chris@10	240 @section Installation on non-Unix systems
Chris@10	241
Chris@10	242 It should be relatively straightforward to compile FFTW even on non-Unix
Chris@10	243 systems lacking the niceties of a @code{configure} script. Basically,
Chris@10	244 you need to edit the @code{config.h} header (copy it from
Chris@10	245 @code{config.h.in}) to @code{#define} the various options and compiler
Chris@10	246 characteristics, and then compile all the @samp{.c} files in the
Chris@10	247 relevant directories.
Chris@10	248
Chris@10	249 The @code{config.h} header contains about 100 options to set, each one
Chris@10	250 initially an @code{#undef}, each documented with a comment, and most of
Chris@10	251 them fairly obvious. For most of the options, you should simply
Chris@10	252 @code{#define} them to @code{1} if they are applicable, although a few
Chris@10	253 options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
Chris@10	254 be defined to the size of the @code{long long} type, in bytes, or zero
Chris@10	255 if it is not supported). We will likely post some sample
Chris@10	256 @code{config.h} files for various operating systems and compilers for
Chris@10	257 you to use (at least as a starting point). Please let us know if you
Chris@10	258 have to hand-create a configuration file (and/or a pre-compiled binary)
Chris@10	259 that you want to share.
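
When filling in @code{config.h} by hand, options such as @code{SIZEOF_LONG_LONG} need a value that depends on your compiler. The sketch below shows one way to obtain it: compile a small helper with the target compiler and let it print the line to paste into @code{config.h}. The helper names @code{format_size_define} and @code{print_sizeof_long_long} are hypothetical and are not part of FFTW.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one "#define NAME value" line for a hand-written config.h.
   Returns the number of characters written, excluding the terminator.
   (Hypothetical helper, not part of FFTW.) */
static int format_size_define(char *buf, size_t buflen,
                              const char *name, size_t size)
{
    int n = snprintf(buf, buflen, "#define %s %lu",
                     name, (unsigned long)size);
    assert(n > 0 && (size_t)n < buflen);   /* the line must fit in buf */
    return n;
}

/* Print the SIZEOF_LONG_LONG line for the compiler building this file. */
static void print_sizeof_long_long(FILE *out)
{
    char line[64];
    format_size_define(line, sizeof line,
                       "SIZEOF_LONG_LONG", sizeof(long long));
    fprintf(out, "%s\n", line);
}
```

Calling @code{print_sizeof_long_long(stdout)} from a small @code{main} prints a single @code{#define} line. The same pattern should work for other size-valued options, whose exact names are documented in the comments of @code{config.h.in}.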
Chris@10	260
Chris@10	261 To create the FFTW library, you will then need to compile all of the
Chris@10	262 @samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
Chris@10	263 @code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
Chris@10	264 @code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
Chris@10	265 @code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
Chris@10	266 If you are compiling with SIMD support (e.g. you defined
Chris@10	267 @code{HAVE_SSE2} in @code{config.h}), then you also need to compile
Chris@10	268 the @code{.c} files in the @code{simd-support},
Chris@10	269 @code{@{dft,rdft@}/simd}, @code{@{dft,rdft@}/simd/*} directories.
Chris@10	270
Chris@10	271 Once these files are all compiled, link them into a library, or a shared
Chris@10	272 library, or directly into your program.
Chris@10	273
Chris@10	274 To compile the FFTW test program, additionally compile the code in the
Chris@10	275 @code{libbench2/} directory, and link it into a library. Then compile
Chris@10	276 the code in the @code{tests/} directory and link it to the
Chris@10	277 @code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
Chris@10	278 (command-line) tool (@pxref{Wisdom Utilities}), compile
Chris@10	279 @code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
Chris@10	280 libraries.
Chris@10	281
Chris@10	282 @c ------------------------------------------------------------
Chris@10	283 @node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
Chris@10	284 @section Cycle Counters
Chris@10	285 @cindex cycle counter
Chris@10	286
Chris@10	287 FFTW's planner actually executes and times different possible FFT
Chris@10	288 algorithms in order to pick the fastest plan for a given @math{n}. In
Chris@10	289 order to do this in as short a time as possible, however, the timer must
Chris@10	290 have a very high resolution, and to accomplish this we employ the
Chris@10	291 hardware @dfn{cycle counters} that are available on most CPUs.
Chris@10	292 Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
Chris@10	293 UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.
Chris@10	294
Chris@10	295 @cindex compiler
Chris@10	296 Access to the cycle counters, unfortunately, is a compiler and/or
Chris@10	297 operating-system dependent task, often requiring inline assembly
Chris@10	298 language, and it may be that your compiler is not supported. If it is
Chris@10	299 @emph{not} supported, FFTW will by default fall back on its estimator
Chris@10	300 (effectively using @code{FFTW_ESTIMATE} for all plans).
Chris@10	301 @ctindex FFTW_ESTIMATE
Chris@10	302
Chris@10	303 You can add support by editing the file @code{kernel/cycle.h}; normally,
Chris@10	304 this will involve adapting one of the examples already present in order
Chris@10	305 to use the inline-assembler syntax for your C compiler, and will only
Chris@10	306 require a couple of lines of code. Anyone adding support for a new
Chris@10	307 system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.
Chris@10	308
Chris@10	309 If a cycle counter is not available on your system (e.g. some embedded
Chris@10	310 processor), and you don't want to use estimated plans, as a last resort
Chris@10	311 you can use the @code{--with-slow-timer} option to @code{configure} (on
Chris@10	312 Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
Chris@10	313 This will use the much lower-resolution @code{gettimeofday} function, or even
Chris@10	314 @code{clock} if the former is unavailable, and planning will be
Chris@10	315 extremely slow.
Chris@10	316
Chris@10	317 @c ------------------------------------------------------------
Chris@10	318 @node Generating your own code, , Cycle Counters, Installation and Customization
Chris@10	319 @section Generating your own code
Chris@10	320 @cindex code generator
Chris@10	321
Chris@10	322 The directory @code{genfft} contains the programs that were used to
Chris@10	323 generate FFTW's ``codelets,'' which are hard-coded transforms of small
Chris@10	324 sizes.
Chris@10	325 @cindex codelet
Chris@10	326 We do not expect casual users to employ the generator, which is a rather
Chris@10	327 sophisticated program that generates directed acyclic graphs of FFT
Chris@10	328 algorithms and performs algebraic simplifications on them. It was
Chris@10	329 written in Objective Caml, a dialect of ML, which is available at
Chris@10	330 @uref{http://caml.inria.fr/ocaml/index.en.html}.
Chris@10	331 @cindex Caml
Chris@10	332
Chris@10	333
Chris@10	334 If you have Objective Caml installed (along with recent versions of
Chris@10	335 GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
Chris@10	336 can change the set of codelets that are generated or play with the
Chris@10	337 generation options. The set of generated codelets is specified by the
Chris@10	338 @code{@{dft,rdft@}/@{codelets,simd@}/*/Makefile.am} files. For example, you can add
Chris@10	339 efficient REDFT codelets of small sizes by modifying
Chris@10	340 @code{rdft/scalar/r2r/Makefile.am}.
Chris@10	341 @cindex REDFT
Chris@10	342 After you modify any @code{Makefile.am} files, you can type @code{sh
Chris@10	343 bootstrap.sh} in the top-level directory followed by @code{make} to
Chris@10	344 re-generate the files.
Chris@10	345
Chris@10	346 We do not provide more details about the code-generation process, since
Chris@10	347 we do not expect that most users will need to generate their own code.
Chris@10	348 However, feel free to contact us at @email{fftw@@fftw.org} if
Chris@10	349 you are interested in the subject.
Chris@10	350
Chris@10	351 @cindex monadic programming
Chris@10	352 You might find it interesting to learn Caml and/or some modern
Chris@10	353 programming techniques that we used in the generator (including monadic
Chris@10	354 programming), especially if you heard the rumor that Java and
Chris@10	355 object-oriented programming are the latest advancement in the field.
Chris@10	356 The internal operation of the codelet generator is described in the
Chris@10	357 paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
Chris@10	358 available from the @uref{http://www.fftw.org,FFTW home page} and also
Chris@10	359 appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
Chris@10	360 Programming Language Design and Implementation (PLDI)}.
Chris@10	361
