The above routines are for a transpose of a matrix of numbers (of type
double), using FFTW's default block sizes. More generally, one
can perform transposes of tuples of numbers, with
user-specified block sizes for the input and output:

     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

In this case, one is transposing an n0 by n1 matrix of
howmany-tuples (e.g. howmany = 2 for complex numbers).
The input is distributed along the n0 dimension with block size
block0, and the n1 by n0 output is distributed
along the n1 dimension with block size block1. If
FFTW_MPI_DEFAULT_BLOCK (0) is passed for a block size then FFTW
uses its default block size. To get the local size of the data on
each process, you should then call fftw_mpi_local_size_many_transposed.
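
To make the block distribution described above concrete, the following C
sketch computes which slice of a dimension a given process would hold for a
chosen block size, and how many doubles that slice occupies for
howmany-tuples. The names block_range, default_block, block_of_rank and
local_doubles are hypothetical helpers introduced only for this illustration;
they are not part of FFTW. The sketch assumes a plain contiguous layout in
which process k holds indices starting at k times the block size, and it
assumes that the default block size is the dimension divided by the number
of processes, rounded up. In a real program the authoritative values come
from fftw_mpi_local_size_many_transposed, whose returned allocation size may
be larger than the figure computed here.

```c
#include <stddef.h>

/* Hypothetical helper type for illustration; not part of FFTW. */
typedef struct {
    ptrdiff_t local_n;      /* number of indices held by this process */
    ptrdiff_t local_start;  /* global index of the first one */
} block_range;

/* Assumed default block size: n divided by nprocs, rounded up. */
static ptrdiff_t default_block(ptrdiff_t n, int nprocs)
{
    return (n + nprocs - 1) / nprocs;
}

/* Slice of a dimension of length n held by `rank` for block size `block`,
   assuming process k starts at index k * block. Processes whose start
   falls past the end hold nothing. */
static block_range block_of_rank(ptrdiff_t n, ptrdiff_t block, int rank)
{
    block_range r;
    ptrdiff_t start = (ptrdiff_t)rank * block;
    r.local_start = start < n ? start : n;
    r.local_n = (n - r.local_start < block) ? n - r.local_start : block;
    return r;
}

/* Number of doubles in a local slab of local_rows rows, each holding
   row_len tuples of howmany doubles. This counts the data only; it is
   not the allocation size that FFTW reports. */
static ptrdiff_t local_doubles(ptrdiff_t local_rows, ptrdiff_t row_len,
                               ptrdiff_t howmany)
{
    return local_rows * row_len * howmany;
}
```

For the input side one would call block_of_rank(n0, block0, rank) and
local_doubles(local_n0, n1, howmany); for the transposed output the roles of
n0 and n1 are swapped and block1 is used. Under these assumptions, a block
size that does not divide the dimension evenly leaves the last process with
a shorter slice, or with none at all.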