Chris@42:
Chris@42: The innovation (if it can be so called) in FFTW consists in having a
Chris@42: variety of composable solvers, representing different FFT algorithms and implementation strategies, whose combination into a
Chris@42: particular plan for a given size can be determined at runtime according to the characteristics of your machine/compiler.
Chris@42: This peculiar software architecture allows FFTW to adapt itself to
Chris@42: almost any machine.
Chris@42:
Chris@42: For more details (albeit somewhat outdated), see the paper "FFTW:
Chris@42: An Adaptive Software Architecture for the FFT", by M. Frigo and
Chris@42: S. G. Johnson, Proc. ICASSP 3, 1381 (1998), also available at the FFTW web page.
Chris@42:
Chris@42:
Chris@42: This is a complex question, and there is no simple answer. In fact,
Chris@42: the authors do not fully know the answer, either. In addition to many
Chris@42: small performance hacks throughout FFTW, there are three general
Chris@42: reasons for FFTW's speed.
Chris@42:
Chris@42:
FFTW uses a variety of FFT algorithms and implementation styles
Chris@42: that can be arbitrarily composed to adapt itself to
Chris@42: a machine. See Q4.1 `How does FFTW work?'.
Chris@42:
FFTW uses a code generator to produce highly-optimized
Chris@42: routines for computing small transforms.
Chris@42:
Chris@42:
FFTW uses explicit divide-and-conquer to take advantage
Chris@42: of the memory hierarchy.
Chris@42:
Chris@42: For more details (albeit somewhat outdated), see the paper "FFTW:
Chris@42: An Adaptive Software Architecture for the FFT", by M. Frigo and
Chris@42: S. G. Johnson, Proc. ICASSP 3, 1381 (1998), available along with other references at
Chris@42: the FFTW web page.
Chris@42: Next: Known bugs.
Chris@42: Back: Using FFTW.
Chris@42: Return to contents.