The above routines are for a transpose of a matrix of numbers (of type
double), using FFTW’s default block sizes. More generally, one
can perform transposes of tuples of numbers, with
user-specified block sizes for the input and output:
    fftw_plan fftw_mpi_plan_many_transpose
                    (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                     ptrdiff_t block0, ptrdiff_t block1,
                     double *in, double *out, MPI_Comm comm, unsigned flags);
In this case, one is transposing an n0 by n1 matrix of
howmany-tuples (e.g. howmany = 2 for complex numbers).
The input is distributed along the n0 dimension with block size
block0, and the n1 by n0 output is distributed
along the n1 dimension with block size block1. If
FFTW_MPI_DEFAULT_BLOCK (0) is passed for a block size, then FFTW
uses its default block size. To get the local size of the data on
each process, you should then call fftw_mpi_local_size_many_transposed.
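To make the block distribution concrete, the following is a minimal sketch in plain C (no MPI or FFTW calls) of which rows a process would hold. It assumes the usual contiguous layout, where process p owns rows p*block up to but not including min((p+1)*block, n), and it assumes a default block size of ceil(n/P) for P processes. The names local_range, default_block, block_range, and local_doubles are hypothetical helpers introduced only for illustration. They are not part of FFTW, and in real code the sizes should come from fftw_mpi_local_size_many_transposed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of the FFTW API. */

typedef struct {
    ptrdiff_t start;  /* index of the first locally held row */
    ptrdiff_t count;  /* number of locally held rows (may be 0) */
} local_range;

/* Assumed default block size: ceil(n / nprocs). */
static ptrdiff_t default_block(ptrdiff_t n, int nprocs)
{
    assert(n >= 0 && nprocs > 0);
    return (n + nprocs - 1) / nprocs;
}

/* Rows held by `rank` when n rows are split into contiguous blocks of
   `block` rows; ranks past the end of the data hold zero rows. */
static local_range block_range(ptrdiff_t n, ptrdiff_t block, int rank)
{
    local_range r;
    ptrdiff_t start, end;

    assert(n >= 0 && block > 0 && rank >= 0);
    start = (ptrdiff_t)rank * block;
    if (start > n)
        start = n;
    end = start + block;
    if (end > n)
        end = n;
    r.start = start;
    r.count = end - start;
    return r;
}

/* Number of doubles of input data held locally: each of the `rows` local
   rows has n1 entries, and each entry is a tuple of `howmany` doubles.
   This counts the data only; it is not the allocation size. */
static ptrdiff_t local_doubles(ptrdiff_t rows, ptrdiff_t n1, ptrdiff_t howmany)
{
    return rows * n1 * howmany;
}
```

Under these assumptions, n0 = 10 rows on 4 processes give a default block of 3, so ranks 0 through 3 hold 3, 3, 3, and 1 rows respectively. With n1 = 5 and howmany = 2, rank 0 would hold 3 * 5 * 2 = 30 doubles of input. The buffer actually passed to the plan may need to be larger than this count, which is why the allocation size should be taken from fftw_mpi_local_size_many_transposed rather than computed by hand.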