The above routines are for a transpose of a matrix of numbers (of type
double), using FFTW's default block sizes.  More generally, one
can perform transposes of tuples of numbers, with
user-specified block sizes for the input and output:

     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

In this case, one is transposing an n0 by n1 matrix of
howmany-tuples (e.g. howmany = 2 for complex numbers).
The input is distributed along the n0 dimension with block size
block0, and the n1 by n0 output is distributed
along the n1 dimension with block size block1.  If
FFTW_MPI_DEFAULT_BLOCK (0) is passed for a block size then FFTW
uses its default block size.  To get the local size of the data on
each process, you should then call fftw_mpi_local_size_many_transposed.