In particular, suppose that we have an n0 by n1 array in
row-major order, block-distributed across the n0 dimension.  To
transpose this into an n1 by n0 array block-distributed
across the n1 dimension, we would create a plan by calling the
following function:

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);

The input and output arrays (in and out) can be the
same.  The transpose is actually executed by calling
fftw_execute on the plan, as usual.

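When checking the result of a transpose plan on a small problem, it can help to compare against a single-process reference. The following is a minimal sketch in plain C, not part of FFTW: reference_transpose is a hypothetical helper name, and it assumes the whole n0 by n1 array fits in one process's memory.

```c
#include <stddef.h>

/* Hypothetical helper, NOT an FFTW function: serial, out-of-place
   transpose of an n0 by n1 row-major array into an n1 by n0
   row-major array.  Unlike the FFTW plan, in and out must not
   overlap here. */
static inline void reference_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       const double *in, double *out)
{
    for (ptrdiff_t i = 0; i < n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            out[j * n0 + i] = in[i * n1 + j];  /* (i,j) -> (j,i) */
}
```

Gathering the distributed output onto one process and comparing it with this reference is one way to test a transpose plan. The gathering step is left out of the sketch.
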
The flags are the usual FFTW planner flags, plus two additional
flags that may be used separately or together:
FFTW_MPI_TRANSPOSED_OUT and FFTW_MPI_TRANSPOSED_IN.  For
transpose plans, these flags indicate that the output and/or input,
respectively, are locally transposed.  That is, on each process the
input data is normally stored as a local_n0 by n1 array in row-major
order, but for an FFTW_MPI_TRANSPOSED_IN plan the input data is
stored as n1 by local_n0 in row-major order.  Similarly,
FFTW_MPI_TRANSPOSED_OUT means that the output is n0 by
local_n1 instead of local_n1 by n0.

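The four local layouts described above can be summarized as index computations. This is a minimal sketch in plain C; the function names are hypothetical helpers, not part of the FFTW API. It assumes i_local and j_local are row and column indices relative to the start of the local block, and that i and j are global indices into the original n0 by n1 array.

```c
#include <stddef.h>

/* Hypothetical helpers, NOT FFTW functions.  Each returns the offset
   into the local buffer of element (i, j) of the original array. */

/* Default input: local_n0 by n1, row-major. */
static inline ptrdiff_t in_index(ptrdiff_t i_local, ptrdiff_t j,
                                 ptrdiff_t n1)
{
    return i_local * n1 + j;
}

/* FFTW_MPI_TRANSPOSED_IN: n1 by local_n0, row-major. */
static inline ptrdiff_t in_index_transposed(ptrdiff_t i_local, ptrdiff_t j,
                                            ptrdiff_t local_n0)
{
    return j * local_n0 + i_local;
}

/* Default output: local_n1 by n0, row-major. */
static inline ptrdiff_t out_index(ptrdiff_t j_local, ptrdiff_t i,
                                  ptrdiff_t n0)
{
    return j_local * n0 + i;
}

/* FFTW_MPI_TRANSPOSED_OUT: n0 by local_n1, row-major. */
static inline ptrdiff_t out_index_transposed(ptrdiff_t j_local, ptrdiff_t i,
                                             ptrdiff_t local_n1)
{
    return i * local_n1 + j_local;
}
```

On the input side, i_local would be i minus the first row owned by the process. On the output side, j_local would be j minus the first column owned by the process.
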
To determine the local size of the array on each process before and
after the transpose, as well as the amount of storage that must be
allocated, one should call fftw_mpi_local_size_2d_transposed,
just as for a 2d DFT as described in the previous section:

     ptrdiff_t fftw_mpi_local_size_2d_transposed
                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

Again, the return value is the local storage to allocate, which in
this case is the number of real (double) values rather
than complex numbers as in the previous examples.