In this chapter we document the parallel FFTW routines for shared-memory parallel hardware. These routines, which support parallel one- and multi-dimensional transforms of both real and complex data, are the easiest way to take advantage of multiple processors with FFTW. They work just like the corresponding uniprocessor transform routines, except that you have an extra initialization routine to call, and there is a routine to set the number of threads to employ. Any program that uses the uniprocessor FFTW can therefore be trivially modified to use the multi-threaded FFTW.
A shared-memory machine is one in which all CPUs can directly access the same main memory, and such machines are now common due to the ubiquity of multi-core CPUs. FFTW's multi-threading support allows you to utilize these additional CPUs transparently from a single program. However, this does not necessarily translate into performance gains: when multiple threads/CPUs are employed, there is an overhead required for synchronization that may outweigh the computational parallelism. Therefore, you can only benefit from threads if your problem is sufficiently large.