Basic distributed-transpose interface - FFTW 3.2alpha3

6.7.1 Basic distributed-transpose interface

In particular, suppose that we have an n0 by n1 array in row-major order, block-distributed across the n0 dimension. To transpose this into an n1 by n0 array block-distributed across the n1 dimension, we would create a plan by calling the following function:

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);
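To make the data movement concrete, the following serial C sketch simulates what a transpose plan computes when neither FFTW_MPI_TRANSPOSED_IN nor FFTW_MPI_TRANSPOSED_OUT is given. It is not FFTW code and makes no MPI calls: local_count and simulate_transpose are hypothetical helpers, the "processes" are just an array of per-process buffers, and the block sizes b0 and b1 are chosen by the caller rather than by FFTW's own distribution rules. Element (i, j) of the global n0 by n1 input ends up at position (j, i) of the global n1 by n0 output, usually on a different process. For clarity the sketch uses separate input and output buffers, although FFTW itself allows in and out to be the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rows owned by process r when n rows are split into
   contiguous blocks of b rows (the last non-empty block may be shorter). */
static ptrdiff_t local_count(ptrdiff_t n, ptrdiff_t b, int r)
{
    ptrdiff_t start = (ptrdiff_t) r * b;
    if (start >= n)
        return 0;
    return (n - start < b) ? n - start : b;
}

/* Serial model of a transpose with the default (non-transposed) layouts.
   in[r]  : process r's local_n0 x n1 row-major slab, global rows r*b0...
   out[r] : process r's local_n1 x n0 row-major slab, global rows r*b1...
   Every output element is copied from whichever "process" owns it. */
static void simulate_transpose(ptrdiff_t n0, ptrdiff_t n1, int nprocs,
                               ptrdiff_t b0, ptrdiff_t b1,
                               double *const *in, double *const *out)
{
    /* the blocks must cover both dimensions */
    assert(b0 > 0 && b1 > 0);
    assert((ptrdiff_t) nprocs * b0 >= n0 && (ptrdiff_t) nprocs * b1 >= n1);

    for (int r = 0; r < nprocs; ++r) {
        ptrdiff_t local_n1 = local_count(n1, b1, r);
        for (ptrdiff_t jl = 0; jl < local_n1; ++jl) {
            ptrdiff_t j = (ptrdiff_t) r * b1 + jl;   /* global input column */
            for (ptrdiff_t i = 0; i < n0; ++i) {
                int owner = (int) (i / b0);          /* holder of input row i */
                ptrdiff_t il = i - (ptrdiff_t) owner * b0;
                out[r][jl * n0 + i] = in[owner][il * n1 + j];
            }
        }
    }
}
```

For example, with n0 = 3, n1 = 2, two processes, b0 = 2 and b1 = 1, process 0 starts with input rows 0 and 1 and ends with output row 0 (the first column of the input), while process 1 starts with input row 2 and ends with output row 1.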

The input and output arrays (in and out) can be the same. The transpose is actually executed by calling fftw_execute on the plan, as usual.

The flags are the usual FFTW planner flags, but two additional flags are also supported: FFTW_MPI_TRANSPOSED_OUT and/or FFTW_MPI_TRANSPOSED_IN. For transpose plans, these flags indicate that the output and/or input, respectively, are locally transposed. That is, on each process the input data is normally stored as a local_n0 by n1 array in row-major order, but for an FFTW_MPI_TRANSPOSED_IN plan the input data is stored as n1 by local_n0 in row-major order. Similarly, FFTW_MPI_TRANSPOSED_OUT means that the output is n0 by local_n1 instead of local_n1 by n0.

To determine the local size of the array on each process before and after the transpose, as well as the amount of storage that must be allocated, one should call fftw_mpi_local_size_2d_transposed, just as for a 2d DFT as described in the previous section:

     ptrdiff_t fftw_mpi_local_size_2d_transposed
                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
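The outputs of fftw_mpi_local_size_2d_transposed can be modeled serially as below. This is a hedged sketch, not FFTW's implementation: model_local_size_2d_transposed and block_range are hypothetical names, and the block size of ceil(n / nprocs) for both dimensions is an assumption about FFTW's default distribution that this section does not state. The value returned by the model is only a lower bound (enough doubles to hold either the local_n0 by n1 input or the local_n1 by n0 output); the real function may ask for more, so a program should always allocate what the real function returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: contiguous block distribution with block size
   ceil(n / nprocs), which is an ASSUMPTION about FFTW's default. */
static void block_range(ptrdiff_t n, int nprocs, int rank,
                        ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    ptrdiff_t b = (n + nprocs - 1) / nprocs;        /* ceil(n / nprocs) */
    ptrdiff_t start = (ptrdiff_t) rank * b;
    if (start > n)
        start = n;                                   /* trailing ranks may own nothing */
    ptrdiff_t end = (start + b < n) ? start + b : n;
    *local_n = end - start;
    *local_start = start;
}

/* Hypothetical stand-in for fftw_mpi_local_size_2d_transposed, evaluated
   for one given rank. Returns a lower bound on the number of doubles to
   allocate: room for the larger of the input and output slabs. */
static ptrdiff_t model_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1,
        int nprocs, int rank,
        ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
        ptrdiff_t *local_n1, ptrdiff_t *local_1_start)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    block_range(n0, nprocs, rank, local_n0, local_0_start);
    block_range(n1, nprocs, rank, local_n1, local_1_start);
    ptrdiff_t before = *local_n0 * n1;   /* local_n0 x n1 input slab  */
    ptrdiff_t after  = *local_n1 * n0;   /* local_n1 x n0 output slab */
    return before > after ? before : after;
}
```

With n0 = 5, n1 = 3 and two processes, this model gives rank 0 input rows 0 to 2 and output rows 0 to 1 (10 doubles), and rank 1 input rows 3 to 4 and output row 2 (6 doubles).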

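Returning to the FFTW_MPI_TRANSPOSED_IN and FFTW_MPI_TRANSPOSED_OUT flags described earlier, the local layouts they select can be written as index formulas. The helpers below are hypothetical (not part of FFTW): in_offset and out_offset give the offset of an element within one process's local buffer for each layout, and to_transposed_in performs the purely local rearrangement from the default input layout (local_n0 by n1, row-major) into the FFTW_MPI_TRANSPOSED_IN layout (n1 by local_n0, row-major).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not FFTW API): offset in the local input buffer of
   the element in local row i (0 <= i < local_n0), global column j (0 <= j < n1).
   Default layout:         local_n0 x n1, row-major.
   FFTW_MPI_TRANSPOSED_IN: n1 x local_n0, row-major. */
static ptrdiff_t in_offset(int transposed_in, ptrdiff_t i, ptrdiff_t j,
                           ptrdiff_t local_n0, ptrdiff_t n1)
{
    return transposed_in ? j * local_n0 + i : i * n1 + j;
}

/* Hypothetical helper: offset in the local output buffer of the element in
   local row j (0 <= j < local_n1) of the n1 x n0 result, column i (0 <= i < n0).
   Default layout:          local_n1 x n0, row-major.
   FFTW_MPI_TRANSPOSED_OUT: n0 x local_n1, row-major. */
static ptrdiff_t out_offset(int transposed_out, ptrdiff_t j, ptrdiff_t i,
                            ptrdiff_t local_n1, ptrdiff_t n0)
{
    return transposed_out ? i * local_n1 + j : j * n0 + i;
}

/* Purely local rearrangement from the default input layout into the
   FFTW_MPI_TRANSPOSED_IN layout; src and dst must not overlap. */
static void to_transposed_in(const double *src, double *dst,
                             ptrdiff_t local_n0, ptrdiff_t n1)
{
    assert(src != dst);
    for (ptrdiff_t i = 0; i < local_n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            dst[in_offset(1, i, j, local_n0, n1)] =
                src[in_offset(0, i, j, local_n0, n1)];
}
```

For instance, a 2 by 3 local slab {0, 1, 2, 3, 4, 5} in the default layout becomes {0, 3, 1, 4, 2, 5} in the FFTW_MPI_TRANSPOSED_IN layout.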