FFTW 3.3.8: Multi-threaded FFTW

5 Multi-threaded FFTW

In this chapter we document the parallel FFTW routines for shared-memory parallel hardware. These routines, which support parallel one- and multi-dimensional transforms of both real and complex data, are the easiest way to take advantage of multiple processors with FFTW. They work just like the corresponding uniprocessor transform routines, except that you have an extra initialization routine to call, and there is a routine to set the number of threads to employ. Any program that uses the uniprocessor FFTW can therefore be trivially modified to use the multi-threaded FFTW.
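
For concreteness, here is a minimal sketch of such a program, assuming the threads interface described in the following sections (fftw_init_threads, fftw_plan_with_nthreads, and fftw_cleanup_threads) and a link against the threads library (typically -lfftw3_threads -lfftw3 -lm, though the exact flags depend on your build):

    #include <fftw3.h>

    int main(void)
    {
        const int n = 1024;

        /* One-time initialization of the threads system; returns zero on error. */
        if (!fftw_init_threads())
            return 1;

        /* All plans created after this call will try to use 4 threads. */
        fftw_plan_with_nthreads(4);

        fftw_complex *in  = fftw_alloc_complex(n);
        fftw_complex *out = fftw_alloc_complex(n);

        /* Planning and execution look exactly like the uniprocessor case. */
        fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
        /* ... fill `in` with data ... */
        fftw_execute(p);

        fftw_destroy_plan(p);
        fftw_free(in);
        fftw_free(out);
        fftw_cleanup_threads();   /* also frees data allocated by fftw_init_threads */
        return 0;
    }

Apart from the two extra calls at the start and the cleanup call at the end, this is identical to a single-threaded FFTW program.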


A shared-memory machine is one in which all CPUs can directly access the same main memory, and such machines are now common due to the ubiquity of multi-core CPUs. FFTW’s multi-threading support allows you to utilize these additional CPUs transparently from a single program. However, this does not necessarily translate into performance gains: when multiple threads/CPUs are employed, there is an overhead required for synchronization that may outweigh the computational parallelism. Therefore, you can only benefit from threads if your problem is sufficiently large.
