FFTW 3.3.8: Multi-dimensional MPI DFTs of Real Data

FFTW’s MPI interface also supports multi-dimensional DFTs of real cannam@167: data, similar to the serial r2c and c2r interfaces. (Parallel cannam@167: one-dimensional real-data DFTs are not currently supported; you must cannam@167: use a complex transform and set the imaginary parts of the inputs to cannam@167: zero.) cannam@167:

The key points to understand for r2c and c2r MPI transforms (compared cannam@167: to the MPI complex DFTs or the serial r2c/c2r transforms), are: cannam@167:

For example suppose we are performing an out-of-place r2c transform of cannam@167: L × M × N cannam@167: real data [padded to L × M × 2(N/2+1) cannam@167: ], cannam@167: resulting in L × M × N/2+1 cannam@167: complex data. Similar to the cannam@167: example in 2d MPI example, we might do something like: cannam@167:

cannam@167:

#include <fftw3-mpi.h>
cannam@167: 
cannam@167: int main(int argc, char **argv)
cannam@167: {
cannam@167:     const ptrdiff_t L = ..., M = ..., N = ...;
cannam@167:     fftw_plan plan;
cannam@167:     double *rin;
cannam@167:     fftw_complex *cout;
cannam@167:     ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
cannam@167: 
cannam@167:     MPI_Init(&argc, &argv);
cannam@167:     fftw_mpi_init();
cannam@167: 
cannam@167:     /* get local data size and allocate */
cannam@167:     alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
cannam@167:                                          &local_n0, &local_0_start);
cannam@167:     rin = fftw_alloc_real(2 * alloc_local);
cannam@167:     cout = fftw_alloc_complex(alloc_local);
cannam@167: 
cannam@167:     /* create plan for out-of-place r2c DFT */
cannam@167:     plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
cannam@167:                                     FFTW_MEASURE);
cannam@167: 
cannam@167:     /* initialize rin to some function my_func(x,y,z) */
cannam@167:     for (i = 0; i < local_n0; ++i)
cannam@167:        for (j = 0; j < M; ++j)
cannam@167:          for (k = 0; k < N; ++k)
cannam@167:        rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
cannam@167: 
cannam@167:     /* compute transforms as many times as desired */
cannam@167:     fftw_execute(plan);
cannam@167: 
cannam@167:     fftw_destroy_plan(plan);
cannam@167: 
cannam@167:     MPI_Finalize();
cannam@167: }
cannam@167:

Note that we allocated rin using fftw_alloc_real with an cannam@167: argument of 2 * alloc_local: since alloc_local is the cannam@167: number of complex values to allocate, the number of real cannam@167: values is twice as many. The rin array is then cannam@167: local_n0 × M × 2(N/2+1) cannam@167: in row-major order, so its cannam@167: (i,j,k) element is at the index

(i*M + j) * (2*(N/2+1)) +
cannam@167: k

(see Multi-dimensional Array Format). cannam@167:

As for the complex transforms, improved performance can be obtained by cannam@167: specifying that the output is the transpose of the input or vice versa cannam@167: (see Transposed distributions). In our L × M × N cannam@167: r2c cannam@167: example, including FFTW_TRANSPOSED_OUT in the flags means that cannam@167: the input would be a padded L × M × 2(N/2+1) cannam@167: real array cannam@167: distributed over the L dimension, while the output would be a cannam@167: M × L × N/2+1 cannam@167: complex array distributed over the M cannam@167: dimension. To perform the inverse c2r transform with the same data cannam@167: distributions, you would use the FFTW_TRANSPOSED_IN flag. cannam@167:

6.5 Multi-dimensional MPI DFTs of Real Data