In particular, suppose that we have an n0 by n1 array in row-major order, block-distributed across the n0 dimension. To transpose this into an n1 by n0 array block-distributed across the n1 dimension, we would create a plan by calling the following function:

fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                  double *in, double *out,
                                  MPI_Comm comm, unsigned flags);

The input and output arrays (in and out) can be the same. The transpose is actually executed by calling fftw_execute on the plan, as usual.

The flags argument takes the usual FFTW planner flags, and additionally supports two flags: FFTW_MPI_TRANSPOSED_OUT and/or FFTW_MPI_TRANSPOSED_IN. For transpose plans, these flags indicate that the output and/or input, respectively, are locally transposed. That is, on each process the input data is normally stored as a local_n0 by n1 array in row-major order, but for an FFTW_MPI_TRANSPOSED_IN plan the input data is stored as n1 by local_n0 in row-major order. Similarly, FFTW_MPI_TRANSPOSED_OUT means that the output is n0 by local_n1 instead of local_n1 by n0.

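The four local layouts described above differ only in how an element's position maps to an offset in the local buffer. The following is a minimal sketch of that index arithmetic in plain C. It does not call FFTW, and the helper names (in_offset_default and so on) are hypothetical, introduced here only for illustration. Here i and j denote the row and column of an element in the original n0 by n1 array, and i_local and j_local are the same indices counted from the start of the local block.

```c
#include <assert.h>
#include <stddef.h>

/* Default input layout: local_n0 by n1, row-major.
   (Hypothetical helper, not part of the FFTW API.) */
static ptrdiff_t in_offset_default(ptrdiff_t i_local, ptrdiff_t j,
                                   ptrdiff_t n1)
{
    assert(i_local >= 0 && j >= 0 && j < n1);
    return i_local * n1 + j;
}

/* FFTW_MPI_TRANSPOSED_IN layout: n1 by local_n0, row-major. */
static ptrdiff_t in_offset_transposed(ptrdiff_t i_local, ptrdiff_t j,
                                      ptrdiff_t local_n0)
{
    assert(j >= 0 && i_local >= 0 && i_local < local_n0);
    return j * local_n0 + i_local;
}

/* Default output layout: local_n1 by n0, row-major. */
static ptrdiff_t out_offset_default(ptrdiff_t j_local, ptrdiff_t i,
                                    ptrdiff_t n0)
{
    assert(j_local >= 0 && i >= 0 && i < n0);
    return j_local * n0 + i;
}

/* FFTW_MPI_TRANSPOSED_OUT layout: n0 by local_n1, row-major. */
static ptrdiff_t out_offset_transposed(ptrdiff_t j_local, ptrdiff_t i,
                                       ptrdiff_t local_n1)
{
    assert(i >= 0 && j_local >= 0 && j_local < local_n1);
    return i * local_n1 + j_local;
}
```

With either TRANSPOSED flag, the set of elements a process owns is unchanged. Only their order in local memory differs.
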
To determine the local size of the array on each process before and after the transpose, as well as the amount of storage that must be allocated, one should call fftw_mpi_local_size_2d_transposed, just as for a 2d DFT as described in the previous section:

ptrdiff_t fftw_mpi_local_size_2d_transposed
                (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

Again, the return value is the local storage to allocate, which in this case is the number of real (double) values rather than complex numbers as in the previous examples.

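When checking the result of a transpose plan, it can help to have a serial reference for the block a given process should hold afterwards. The sketch below is one way to compute it in plain C, assuming the default output layout (local_n1 by n0, without FFTW_MPI_TRANSPOSED_OUT) and assuming the full input array is available in a test harness. The function name expected_output_block is hypothetical, and the code does not call FFTW. The local_n1 and local_1_start arguments are meant to be the values reported by fftw_mpi_local_size_2d_transposed.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference for one process's output block (hypothetical helper,
   not part of the FFTW API).
   global_in: full n0 by n1 input array, row-major.
   out_block: local_n1 by n0, row-major; must hold at least
              local_n1 * n0 doubles. */
static void expected_output_block(const double *global_in,
                                  ptrdiff_t n0, ptrdiff_t n1,
                                  ptrdiff_t local_n1,
                                  ptrdiff_t local_1_start,
                                  double *out_block)
{
    assert(local_1_start >= 0 && local_1_start + local_n1 <= n1);
    for (ptrdiff_t jl = 0; jl < local_n1; ++jl) {
        ptrdiff_t j = local_1_start + jl;   /* global column index */
        for (ptrdiff_t i = 0; i < n0; ++i) {
            /* Row jl of the local output is column j of the input. */
            out_block[jl * n0 + i] = global_in[i * n1 + j];
        }
    }
}
```

For a 2 by 3 input with rows {0, 1, 2} and {3, 4, 5}, a process owning columns 1 and 2 (local_n1 = 2, local_1_start = 1) would hold {1, 4, 2, 5}. A real buffer passed to FFTW should still be sized by the return value of fftw_mpi_local_size_2d_transposed, which may exceed local_n1 * n0.
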