Chris@82: Chris@82: Chris@82: Chris@82: Chris@82:
Chris@82:Chris@82: Next: Other Multi-dimensional Real-data MPI Transforms, Previous: MPI Data Distribution, Up: Distributed-memory FFTW with MPI [Contents][Index]
Chris@82:FFTW’s MPI interface also supports multi-dimensional DFTs of real Chris@82: data, similar to the serial r2c and c2r interfaces. (Parallel Chris@82: one-dimensional real-data DFTs are not currently supported; you must Chris@82: use a complex transform and set the imaginary parts of the inputs to Chris@82: zero.) Chris@82:
Chris@82:The key points to understand for r2c and c2r MPI transforms (compared Chris@82: to the MPI complex DFTs or the serial r2c/c2r transforms), are: Chris@82:
Chris@82:For example suppose we are performing an out-of-place r2c transform of Chris@82: L × M × N Chris@82: real data [padded to L × M × 2(N/2+1) Chris@82: ], Chris@82: resulting in L × M × N/2+1 Chris@82: complex data. Similar to the Chris@82: example in 2d MPI example, we might do something like: Chris@82:
Chris@82:#include <fftw3-mpi.h> Chris@82: Chris@82: int main(int argc, char **argv) Chris@82: { Chris@82: const ptrdiff_t L = ..., M = ..., N = ...; Chris@82: fftw_plan plan; Chris@82: double *rin; Chris@82: fftw_complex *cout; Chris@82: ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k; Chris@82: Chris@82: MPI_Init(&argc, &argv); Chris@82: fftw_mpi_init(); Chris@82: Chris@82: /* get local data size and allocate */ Chris@82: alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD, Chris@82: &local_n0, &local_0_start); Chris@82: rin = fftw_alloc_real(2 * alloc_local); Chris@82: cout = fftw_alloc_complex(alloc_local); Chris@82: Chris@82: /* create plan for out-of-place r2c DFT */ Chris@82: plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD, Chris@82: FFTW_MEASURE); Chris@82: Chris@82: /* initialize rin to some function my_func(x,y,z) */ Chris@82: for (i = 0; i < local_n0; ++i) Chris@82: for (j = 0; j < M; ++j) Chris@82: for (k = 0; k < N; ++k) Chris@82: rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k); Chris@82: Chris@82: /* compute transforms as many times as desired */ Chris@82: fftw_execute(plan); Chris@82: Chris@82: fftw_destroy_plan(plan); Chris@82: Chris@82: MPI_Finalize(); Chris@82: } Chris@82:
Note that we allocated rin
using fftw_alloc_real
with an
Chris@82: argument of 2 * alloc_local
: since alloc_local
is the
Chris@82: number of complex values to allocate, the number of real
Chris@82: values is twice as many. The rin
array is then
Chris@82: local_n0 × M × 2(N/2+1)
Chris@82: in row-major order, so its
Chris@82: (i,j,k)
element is at the index (i*M + j) * (2*(N/2+1)) +
Chris@82: k
(see Multi-dimensional Array Format).
Chris@82:
As for the complex transforms, improved performance can be obtained by
Chris@82: specifying that the output is the transpose of the input or vice versa
Chris@82: (see Transposed distributions). In our L × M × N
Chris@82: r2c
Chris@82: example, including FFTW_TRANSPOSED_OUT
in the flags means that
Chris@82: the input would be a padded L × M × 2(N/2+1)
Chris@82: real array
Chris@82: distributed over the L
dimension, while the output would be a
Chris@82: M × L × N/2+1
Chris@82: complex array distributed over the M
Chris@82: dimension. To perform the inverse c2r transform with the same data
Chris@82: distributions, you would use the FFTW_TRANSPOSED_IN
flag.
Chris@82:
Chris@82: Next: Other Multi-dimensional Real-data MPI Transforms, Previous: MPI Data Distribution, Up: Distributed-memory FFTW with MPI [Contents][Index]
Chris@82: