Chris@19: Chris@19: Chris@19: Multi-dimensional MPI DFTs of Real Data - FFTW 3.3.4 Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19: Chris@19:
Chris@19: Chris@19: Chris@19:

Chris@19: Next: , Chris@19: Previous: MPI Data Distribution, Chris@19: Up: Distributed-memory FFTW with MPI Chris@19:


Chris@19:
Chris@19: Chris@19:

6.5 Multi-dimensional MPI DFTs of Real Data

Chris@19: Chris@19:

FFTW's MPI interface also supports multi-dimensional DFTs of real Chris@19: data, similar to the serial r2c and c2r interfaces. (Parallel Chris@19: one-dimensional real-data DFTs are not currently supported; you must Chris@19: use a complex transform and set the imaginary parts of the inputs to Chris@19: zero.) Chris@19: Chris@19:

The key points to understand for r2c and c2r MPI transforms (compared Chris@19: to the MPI complex DFTs or the serial r2c/c2r transforms), are: Chris@19: Chris@19:

Chris@19: Chris@19:

For example suppose we are performing an out-of-place r2c transform of Chris@19: L × M × N real data [padded to L × M × 2(N/2+1)], Chris@19: resulting in L × M × N/2+1 complex data. Similar to the Chris@19: example in 2d MPI example, we might do something like: Chris@19: Chris@19:

     #include <fftw3-mpi.h>
Chris@19:      
Chris@19:      int main(int argc, char **argv)
Chris@19:      {
Chris@19:          const ptrdiff_t L = ..., M = ..., N = ...;
Chris@19:          fftw_plan plan;
Chris@19:          double *rin;
Chris@19:          fftw_complex *cout;
Chris@19:          ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
Chris@19:      
Chris@19:          MPI_Init(&argc, &argv);
Chris@19:          fftw_mpi_init();
Chris@19:      
Chris@19:          /* get local data size and allocate */
Chris@19:          alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
Chris@19:                                               &local_n0, &local_0_start);
Chris@19:          rin = fftw_alloc_real(2 * alloc_local);
Chris@19:          cout = fftw_alloc_complex(alloc_local);
Chris@19:      
Chris@19:          /* create plan for out-of-place r2c DFT */
Chris@19:          plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
Chris@19:                                          FFTW_MEASURE);
Chris@19:      
Chris@19:          /* initialize rin to some function my_func(x,y,z) */
Chris@19:          for (i = 0; i < local_n0; ++i)
Chris@19:             for (j = 0; j < M; ++j)
Chris@19:               for (k = 0; k < N; ++k)
Chris@19:             rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
Chris@19:      
Chris@19:          /* compute transforms as many times as desired */
Chris@19:          fftw_execute(plan);
Chris@19:      
Chris@19:          fftw_destroy_plan(plan);
Chris@19:      
Chris@19:          MPI_Finalize();
Chris@19:      }
Chris@19: 
Chris@19:

Note that we allocated rin using fftw_alloc_real with an Chris@19: argument of 2 * alloc_local: since alloc_local is the Chris@19: number of complex values to allocate, the number of real Chris@19: values is twice as many. The rin array is then Chris@19: local_n0 × M × 2(N/2+1) in row-major order, so its Chris@19: (i,j,k) element is at the index (i*M + j) * (2*(N/2+1)) + Chris@19: k (see Multi-dimensional Array Format). Chris@19: Chris@19:

As for the complex transforms, improved performance can be obtained by Chris@19: specifying that the output is the transpose of the input or vice versa Chris@19: (see Transposed distributions). In our L × M × N r2c Chris@19: example, including FFTW_TRANSPOSED_OUT in the flags means that Chris@19: the input would be a padded L × M × 2(N/2+1) real array Chris@19: distributed over the L dimension, while the output would be a Chris@19: M × L × N/2+1 complex array distributed over the M Chris@19: dimension. To perform the inverse c2r transform with the same data Chris@19: distributions, you would use the FFTW_TRANSPOSED_IN flag. Chris@19: Chris@19: Chris@19: Chris@19: