Distributed-memory FFTW with MPI - FFTW 3.3.4




6 Distributed-memory FFTW with MPI


In this chapter we document the parallel FFTW routines for systems supporting the MPI message-passing interface. Unlike the shared-memory threads described in the previous chapter, MPI allows you to use distributed-memory parallelism, where each CPU has its own separate memory, and which can scale up to clusters of many thousands of processors. This capability comes at a price, however: each process only stores a portion of the data to be transformed, which means that the data structures and programming interface are quite different from the serial or threaded versions of FFTW.
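To make "each process only stores a portion of the data" concrete: by default, FFTW's MPI interface distributes a multidimensional array by dividing its first dimension among the processes, and real code should ask FFTW for the actual split with routines such as fftw_mpi_local_size_2d. The sketch below is only an illustration under an assumed, hypothetical even-as-possible row split (the helpers block_distribution and local_elements are not part of FFTW, and FFTW's own split may differ). It computes how many rows of an n0-by-n1 array a given rank owns and where its slab begins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of FFTW: split the n0 rows of an array
 * among nprocs processes as evenly as possible. The first (n0 % nprocs)
 * ranks receive one extra row. Writes this rank's row count to *local_n0
 * and the index of its first row to *local_0_start. */
static void block_distribution(ptrdiff_t n0, int nprocs, int rank,
                               ptrdiff_t *local_n0, ptrdiff_t *local_0_start)
{
    assert(n0 >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    ptrdiff_t base = n0 / nprocs;   /* rows every rank gets */
    ptrdiff_t extra = n0 % nprocs;  /* leftover rows, one each for the lowest ranks */
    *local_n0 = base + (rank < extra ? 1 : 0);
    /* Rows owned by lower ranks: base each, plus one per lower rank below extra. */
    *local_0_start = rank * base + (rank < extra ? rank : extra);
}

/* Hypothetical helper: number of elements this rank stores for an
 * n0-by-n1 array under the split above. */
static ptrdiff_t local_elements(ptrdiff_t n0, ptrdiff_t n1, int nprocs, int rank)
{
    ptrdiff_t local_n0, local_0_start;
    block_distribution(n0, nprocs, rank, &local_n0, &local_0_start);
    return local_n0 * n1;
}
```

For example, 10 rows over 4 processes gives the row counts 3, 3, 2, 2 starting at rows 0, 3, 6, 8. Spreading the remainder over the lowest ranks keeps every process within one row of the others, so each stores close to n0·n1/P elements.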

Distributed-memory parallelism is especially useful when you are transforming arrays so large that they do not fit into the memory of a single processor. The per-process storage required by FFTW's MPI routines is proportional to the total array size divided by the number of processes. Conversely, distributed-memory parallelism can easily impose an unacceptably high communications overhead for small problems; the threshold problem size above which parallelism becomes advantageous depends on the precise problem you are interested in, your hardware, and your MPI implementation.
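As a rough worked example of that proportionality, assume a three-dimensional double-precision complex array (16 bytes per element) of size 1024 × 1024 × 1024 spread over P = 64 processes, and ignore any extra buffer space the MPI transforms may request:

```latex
\text{storage per process} \approx \frac{N s}{P}
  = \frac{2^{30} \cdot 16\ \text{bytes}}{64}
  = \frac{2^{34}}{2^{6}}\ \text{bytes}
  = 2^{28}\ \text{bytes} = 256\ \text{MiB}
```

The whole array needs 2^34 bytes = 16 GiB, more than many single nodes provide, while each of the 64 processes holds only about 256 MiB.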

A note on terminology: in MPI, you divide the data among a set of “processes,” each of which runs in its own memory address space. Generally, each process runs on a different physical processor, but this is not required. A set of processes in MPI is described by an opaque data structure called a “communicator,” the most common of which is the predefined communicator MPI_COMM_WORLD, which refers to all processes. For more information on these and other concepts common to all MPI programs, we refer the reader to the documentation at the MPI home page.

We assume in this chapter that the reader is familiar with the usage of the serial (uniprocessor) FFTW, and focus only on the concepts new to the MPI interface.
