FFTW 3.3.5: Distributed-memory FFTW with MPI

6 Distributed-memory FFTW with MPI

In this chapter we document the parallel FFTW routines for parallel systems supporting the MPI message-passing interface. Unlike the shared-memory threads described in the previous chapter, MPI allows you to use distributed-memory parallelism, where each CPU has its own separate memory, and which can scale up to clusters of many thousands of processors. This capability comes at a price, however: each process only stores a portion of the data to be transformed, which means that the data structures and programming interface are quite different from the serial or threads versions of FFTW.
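To give a flavor of what this interface looks like, here is a minimal sketch of a two-dimensional in-place complex DFT distributed over all processes. The particular sizes and flags (a 256 × 256 transform planned with FFTW_ESTIMATE) are illustrative choices only; each routine used here is documented in the sections that follow. Such a program is typically compiled with an MPI C compiler and linked with -lfftw3_mpi -lfftw3 -lm.

#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t n0 = 256, n1 = 256;   /* global transform dimensions (illustrative) */
    fftw_plan plan;
    fftw_complex *data;
    ptrdiff_t alloc_local, local_n0, local_0_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* ask FFTW how much of the n0 x n1 array this process stores */
    alloc_local = fftw_mpi_local_size_2d(n0, n1, MPI_COMM_WORLD,
                                         &local_n0, &local_0_start);
    data = fftw_alloc_complex(alloc_local);

    /* plan an in-place forward DFT distributed over MPI_COMM_WORLD */
    plan = fftw_mpi_plan_dft_2d(n0, n1, data, data, MPI_COMM_WORLD,
                                FFTW_FORWARD, FFTW_ESTIMATE);

    /* ... initialize the local_n0 x n1 slab of rows starting at local_0_start ... */

    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(data);
    MPI_Finalize();
    return 0;
}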

Distributed-memory parallelism is especially useful when you are transforming arrays so large that they do not fit into the memory of a single processor. The storage per-process required by FFTW’s MPI routines is proportional to the total array size divided by the number of processes. Conversely, distributed-memory parallelism can easily pose an unacceptably high communications overhead for small problems; the threshold problem size for which parallelism becomes advantageous will depend on the precise problem you are interested in, your hardware, and your MPI implementation.

A note on terminology: in MPI, you divide the data among a set of “processes” which each run in their own memory address space. Generally, each process runs on a different physical processor, but this is not required. A set of processes in MPI is described by an opaque data structure called a “communicator,” the most common of which is the predefined communicator MPI_COMM_WORLD which refers to all processes. For more information on these and other concepts common to all MPI programs, we refer the reader to the documentation at the MPI home page.
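For readers new to MPI itself, the following sketch (plain MPI, with no FFTW calls) illustrates these terms: every process in MPI_COMM_WORLD runs the same program and queries its own index (its “rank”) and the total number of processes. Launched with, e.g., mpirun -np 4 ./a.out, it prints one line per process.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's index: 0 .. size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes in the communicator */

    printf("I am process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}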

We assume in this chapter that the reader is familiar with the usage of the serial (uniprocessor) FFTW, and focus only on the concepts new to the MPI interface.
