Advanced distributed-transpose interface - FFTW 3.2alpha3

Next: , Previous: Basic distributed-transpose interface, Up: FFTW MPI Transposes



6.7.2 Advanced distributed-transpose interface


The above routines are for a transpose of a matrix of numbers (of type double), using FFTW's default block sizes. More generally, one can perform transposes of tuples of numbers, with user-specified block sizes for the input and output:

     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

In this case, one is transposing an n0 by n1 matrix of howmany-tuples (e.g. howmany = 2 for complex numbers). The input is distributed along the n0 dimension with block size block0, and the n1 by n0 output is distributed along the n1 dimension with block size block1. If FFTW_MPI_DEFAULT_BLOCK (0) is passed for a block size then FFTW uses its default block size. To get the local size of the data on each process, you should then call fftw_mpi_local_size_many_transposed.