In this chapter we document the parallel FFTW routines for shared-memory parallel hardware. These routines, which support parallel one- and multi-dimensional transforms of both real and complex data, are the easiest way to take advantage of multiple processors with FFTW. They work just like the corresponding uniprocessor transform routines, except that you have an extra initialization routine to call, and there is a routine to set the number of threads to employ. Any program that uses the uniprocessor FFTW can therefore be trivially modified to use the multi-threaded FFTW.
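For concreteness, here is a minimal sketch of such a modification using FFTW's threads interface; the one-dimensional complex transform, the transform size, and the thread count below are arbitrary illustrations, not prescriptions from this chapter:

     #include <fftw3.h>

     int main(void)
     {
         const int N = 1024;            /* illustrative transform size */
         fftw_complex *in, *out;
         fftw_plan p;

         fftw_init_threads();           /* extra initialization: call once, before any other FFTW routine */
         fftw_plan_with_nthreads(4);    /* plans created below will use 4 threads */

         in  = fftw_malloc(sizeof(fftw_complex) * N);
         out = fftw_malloc(sizeof(fftw_complex) * N);

         /* planning and execution are unchanged from the uniprocessor case */
         p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
         /* ... fill in[] with your data, then execute as usual ... */
         fftw_execute(p);

         fftw_destroy_plan(p);
         fftw_free(in);
         fftw_free(out);
         fftw_cleanup_threads();        /* release the threading resources */
         return 0;
     }

Compiling such a program also requires linking against the threads library in addition to the ordinary FFTW library (e.g. -lfftw3_threads -lfftw3 -lm on Unix), assuming FFTW was built with thread support.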
A shared-memory machine is one in which all CPUs can directly access the same main memory, and such machines are now common due to the ubiquity of multi-core CPUs. FFTW's multi-threading support allows you to utilize these additional CPUs transparently from a single program. However, this does not necessarily translate into performance gains: when multiple threads/CPUs are employed, there is a synchronization overhead that may outweigh the gains from computational parallelism. Therefore, you can only benefit from threads if your problem is sufficiently large.