The most important concept to understand in using FFTW's MPI interface
is the data distribution.  With a serial or multithreaded FFT, all of
the inputs and outputs are stored as a single contiguous chunk of
memory.  With a distributed-memory FFT, the inputs and outputs are
broken into disjoint blocks, one per process.

In particular, FFTW uses a 1d block distribution of the data,
distributed along the first dimension.  For example, if you want to
perform a 100 × 200 complex DFT, distributed over 4 processes, each
process will get a 25 × 200 slice of the data.  That is, process 0
will get rows 0 through 24, process 1 will get rows 25 through 49,
process 2 will get rows 50 through 74, and process 3 will get rows 75
through 99.  If you take the same array but distribute it over 3
processes, then it is not evenly divisible and the different processes
will have unequal chunks.  FFTW's default choice in this case is to
assign 34 rows to processes 0 and 1, and 32 rows to process 2.
FFTW provides several `fftw_mpi_local_size' routines that you can
call to find out what portion of an array is stored on the current
process.  In most cases, you should use the default block sizes picked
by FFTW, but it is also possible to specify your own block size.  For
example, with a 100 × 200 array on three processes, you can tell FFTW
to use a block size of 40, which would assign 40 rows to processes 0
and 1, and 20 rows to process 2.  FFTW's default is to divide the data
equally among the processes if possible, and as best it can otherwise.
The rows are always assigned in “rank order,” i.e. process 0 gets the
first block of rows, then process 1, and so on.  (You can change this
by using `MPI_Comm_split' to create a new communicator with re-ordered
processes.)  However, you should always call the
`fftw_mpi_local_size' routines, if possible, rather than trying to
predict FFTW's distribution choices.
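If you do want to specify your own block size, the advanced
`fftw_mpi_local_size_many' routine takes an explicit block argument
for the first dimension (the special value `FFTW_MPI_DEFAULT_BLOCK'
requests FFTW's default).  As a sketch, the call below requests the
40/40/20 row split mentioned above for a 100 × 200 array on 3
processes:

     #include <fftw3-mpi.h>
     #include <stdio.h>

     int main(int argc, char **argv)
     {
         const ptrdiff_t n[2] = {100, 200};
         ptrdiff_t alloc_local, local_n0, local_0_start;
         int rank;

         MPI_Init(&argc, &argv);
         fftw_mpi_init();
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);

         /* ask for a block size of 40 rows rather than the default
            (FFTW_MPI_DEFAULT_BLOCK): on 3 processes this yields
            40, 40, and 20 rows for processes 0, 1, and 2 */
         alloc_local = fftw_mpi_local_size_many(2, n, 1 /* howmany */,
                                                40 /* block0 */,
                                                MPI_COMM_WORLD,
                                                &local_n0, &local_0_start);

         printf("process %d: %d rows, starting at row %d (alloc %d)\n",
                rank, (int) local_n0, (int) local_0_start,
                (int) alloc_local);

         MPI_Finalize();
         return 0;
     }

Note that if you use a non-default block size when querying the
distribution, the corresponding advanced plan-creation routine
(e.g. `fftw_mpi_plan_many_dft') takes matching block arguments, so the
plan and the allocation must agree.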