Multi-dimensional MPI DFTs of Real Data

cannam@95: cannam@95: cannam@95: Multi-dimensional MPI DFTs of Real Data - FFTW 3.3.3 cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95: cannam@95:

cannam@95: cannam@95: cannam@95:

cannam@95: Next: Other Multi-dimensional Real-data MPI Transforms, cannam@95: Previous: MPI Data Distribution, cannam@95: Up: Distributed-memory FFTW with MPI cannam@95:

cannam@95:

cannam@95: cannam@95:

6.5 Multi-dimensional MPI DFTs of Real Data

cannam@95: cannam@95:

FFTW's MPI interface also supports multi-dimensional DFTs of real cannam@95: data, similar to the serial r2c and c2r interfaces. (Parallel cannam@95: one-dimensional real-data DFTs are not currently supported; you must cannam@95: use a complex transform and set the imaginary parts of the inputs to cannam@95: zero.) cannam@95: cannam@95:

The key points to understand for r2c and c2r MPI transforms (compared cannam@95: to the MPI complex DFTs or the serial r2c/c2r transforms), are: cannam@95: cannam@95:

Just as for serial transforms, r2c/c2r DFTs transform n₀ × n₁ × n₂ × … × n_d-1 real cannam@95: data to/from n₀ × n₁ × n₂ × … × (n_d-1/2 + 1) complex data: the last dimension of the cannam@95: complex data is cut in half (rounded down), plus one. As for the cannam@95: serial transforms, the sizes you pass to the ‘plan_dft_r2c’ and cannam@95: ‘plan_dft_c2r’ are the n₀ × n₁ × n₂ × … × n_d-1 dimensions of the real data. cannam@95: cannam@95:
Although the real data is conceptually n₀ × n₁ × n₂ × … × n_d-1, it is cannam@95: physically stored as an n₀ × n₁ × n₂ × … × [2 (n_d-1/2 + 1)] array, where the last cannam@95: dimension has been padded to make it the same size as the cannam@95: complex output. This is much like the in-place serial r2c/c2r cannam@95: interface (see Multi-Dimensional DFTs of Real Data), except that cannam@95: in MPI the padding is required even for out-of-place data. The extra cannam@95: padding numbers are ignored by FFTW (they are not like cannam@95: zero-padding the transform to a larger size); they are only used to cannam@95: determine the data layout. cannam@95: cannam@95:
The data distribution in MPI for both the real and complex data cannam@95: is determined by the shape of the complex data. That is, you cannam@95: call the appropriate ‘local size’ function for the n₀ × n₁ × n₂ × … × (n_d-1/2 + 1) cannam@95: cannam@95:
complex data, and then use the same distribution for the real cannam@95: data except that the last complex dimension is replaced by a (padded) cannam@95: real dimension of twice the length. cannam@95: cannam@95:

cannam@95: cannam@95:

For example suppose we are performing an out-of-place r2c transform of cannam@95: L × M × N real data [padded to L × M × 2(N/2+1)], cannam@95: resulting in L × M × N/2+1 complex data. Similar to the cannam@95: example in 2d MPI example, we might do something like: cannam@95: cannam@95:

     #include <fftw3-mpi.h>
cannam@95:      
cannam@95:      int main(int argc, char **argv)
cannam@95:      {
cannam@95:          const ptrdiff_t L = ..., M = ..., N = ...;
cannam@95:          fftw_plan plan;
cannam@95:          double *rin;
cannam@95:          fftw_complex *cout;
cannam@95:          ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
cannam@95:      
cannam@95:          MPI_Init(&argc, &argv);
cannam@95:          fftw_mpi_init();
cannam@95:      
cannam@95:          /* get local data size and allocate */
cannam@95:          alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
cannam@95:                                               &local_n0, &local_0_start);
cannam@95:          rin = fftw_alloc_real(2 * alloc_local);
cannam@95:          cout = fftw_alloc_complex(alloc_local);
cannam@95:      
cannam@95:          /* create plan for out-of-place r2c DFT */
cannam@95:          plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
cannam@95:                                          FFTW_MEASURE);
cannam@95:      
cannam@95:          /* initialize rin to some function my_func(x,y,z) */
cannam@95:          for (i = 0; i < local_n0; ++i)
cannam@95:             for (j = 0; j < M; ++j)
cannam@95:               for (k = 0; k < N; ++k)
cannam@95:             rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
cannam@95:      
cannam@95:          /* compute transforms as many times as desired */
cannam@95:          fftw_execute(plan);
cannam@95:      
cannam@95:          fftw_destroy_plan(plan);
cannam@95:      
cannam@95:          MPI_Finalize();
cannam@95:      }
cannam@95:

cannam@95:

Note that we allocated rin using fftw_alloc_real with an cannam@95: argument of 2 * alloc_local: since alloc_local is the cannam@95: number of complex values to allocate, the number of real cannam@95: values is twice as many. The rin array is then cannam@95: local_n0 × M × 2(N/2+1) in row-major order, so its cannam@95: (i,j,k) element is at the index (i*M + j) * (2*(N/2+1)) + cannam@95: k (see Multi-dimensional Array Format). cannam@95: cannam@95:

As for the complex transforms, improved performance can be obtained by cannam@95: specifying that the output is the transpose of the input or vice versa cannam@95: (see Transposed distributions). In our L × M × N r2c cannam@95: example, including FFTW_TRANSPOSED_OUT in the flags means that cannam@95: the input would be a padded L × M × 2(N/2+1) real array cannam@95: distributed over the L dimension, while the output would be a cannam@95: M × L × N/2+1 complex array distributed over the M cannam@95: dimension. To perform the inverse c2r transform with the same data cannam@95: distributions, you would use the FFTW_TRANSPOSED_IN flag. cannam@95: cannam@95: cannam@95: cannam@95: