In this chapter we document the parallel FFTW routines for shared-memory parallel hardware. These routines, which support parallel one- and multi-dimensional transforms of both real and complex data, are the easiest way to take advantage of multiple processors with FFTW. They work just like the corresponding uniprocessor transform routines, except that you have an extra initialization routine to call, and there is a routine to set the number of threads to employ. Any program that uses the uniprocessor FFTW can therefore be trivially modified to use the multi-threaded FFTW.
A shared-memory machine is one in which all CPUs can directly access the same main memory, and such machines are now common due to the ubiquity of multi-core CPUs. FFTW's multi-threading support allows you to utilize these additional CPUs transparently from a single program. However, this does not necessarily translate into performance gains: when multiple threads/CPUs are employed, there is an overhead required for synchronization that may outweigh the computational parallelism. Therefore, you can only benefit from threads if your problem is sufficiently large.