In this chapter we document the parallel FFTW routines for shared-memory parallel hardware. These routines, which support parallel one- and multi-dimensional transforms of both real and complex data, are the easiest way to take advantage of multiple processors with FFTW. They work just like the corresponding uniprocessor transform routines, except that you have an extra initialization routine to call, and there is a routine to set the number of threads to employ. Any program that uses the uniprocessor FFTW can therefore be trivially modified to use the multi-threaded FFTW.
A shared-memory machine is one in which all CPUs can directly access the same main memory, and such machines are now common due to the ubiquity of multi-core CPUs. FFTW's multi-threading support allows you to utilize these additional CPUs transparently from a single program. However, this does not necessarily translate into performance gains: when multiple threads/CPUs are employed, there is an overhead required for synchronization that may outweigh the computational parallelism. Therefore, you can only benefit from threads if your problem is sufficiently large.
• Installation and Supported Hardware/Software
• Usage of Multi-threaded FFTW
• How Many Threads to Use?
• Thread safety