6 Distributed-memory FFTW with MPI


In this chapter we document the parallel FFTW routines for systems supporting the MPI message-passing interface. Unlike the shared-memory threads described in the previous chapter, MPI allows you to use distributed-memory parallelism, where each CPU has its own separate memory, and which can scale up to clusters of many thousands of processors. This capability comes at a price, however: each process stores only a portion of the data to be transformed, which means that the data structures and programming interface are quite different from those of the serial or threads versions of FFTW.
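To give a sense of how the interface differs, the listing below is a minimal sketch (not a complete tutorial example) of a two-dimensional complex DFT distributed over all processes; the dimensions N0 and N1 are arbitrary illustrative values, and error checking is omitted. The routines it uses, such as fftw_mpi_init, fftw_mpi_local_size_2d, and fftw_mpi_plan_dft_2d, are documented in the sections that follow.

  #include <mpi.h>
  #include <fftw3-mpi.h>

  int main(int argc, char **argv)
  {
      const ptrdiff_t N0 = 128, N1 = 128;   /* global (not per-process) array size */
      ptrdiff_t alloc_local, local_n0, local_0_start, i, j;
      fftw_complex *data;
      fftw_plan plan;

      MPI_Init(&argc, &argv);
      fftw_mpi_init();

      /* Ask FFTW how much of the N0 x N1 array this process owns:
         rows local_0_start .. local_0_start + local_n0 - 1. */
      alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                           &local_n0, &local_0_start);
      data = fftw_alloc_complex(alloc_local);

      /* Plan an in-place forward DFT of the distributed array. */
      plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
                                  FFTW_FORWARD, FFTW_ESTIMATE);

      /* Initialize only the locally stored rows (after planning). */
      for (i = 0; i < local_n0; ++i)
          for (j = 0; j < N1; ++j) {
              data[i*N1 + j][0] = local_0_start + i;  /* real part */
              data[i*N1 + j][1] = j;                  /* imaginary part */
          }

      fftw_execute(plan);   /* collective: all processes must call this */

      fftw_destroy_plan(plan);
      fftw_free(data);
      fftw_mpi_cleanup();
      MPI_Finalize();
      return 0;
  }

A program like this is typically compiled with an MPI compiler wrapper (e.g. mpicc), linked with -lfftw3_mpi -lfftw3 -lm, and launched with mpirun; the exact commands depend on your MPI installation.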


Distributed-memory parallelism is especially useful when you are transforming arrays so large that they do not fit into the memory of a single processor. The storage per process required by FFTW’s MPI routines is proportional to the total array size divided by the number of processes. Conversely, distributed-memory parallelism can easily pose an unacceptably high communications overhead for small problems; the threshold problem size at which parallelism becomes advantageous will depend on the precise problem you are interested in, your hardware, and your MPI implementation.


A note on terminology: in MPI, you divide the data among a set of “processes” which each run in their own memory address space. Generally, each process runs on a different physical processor, but this is not required. A set of processes in MPI is described by an opaque data structure called a “communicator,” the most common of which is the predefined communicator MPI_COMM_WORLD, which refers to all processes. For more information on these and other concepts common to all MPI programs, we refer the reader to the documentation at the MPI home page.
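For readers who have not used MPI before, the following short program (which does not use FFTW at all) is an illustrative sketch of these two concepts: every process runs the same code, and MPI_COMM_WORLD is the communicator that names all of them.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;

      MPI_Init(&argc, &argv);

      /* MPI_COMM_WORLD is the predefined communicator containing every process. */
      MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes in total */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which one am I (0 .. size-1) */

      printf("hello from process %d of %d\n", rank, size);

      MPI_Finalize();
      return 0;
  }

Launched with, for example, mpirun -np 4, this starts four processes, each of which prints its own rank within MPI_COMM_WORLD.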


We assume in this chapter that the reader is familiar with the usage of the serial (uniprocessor) FFTW, and focus only on the concepts new to the MPI interface.
