Basic distributed-transpose interface

Basic distributed-transpose interface - FFTW 3.3.4


6.7.1 Basic distributed-transpose interface


In particular, suppose that we have an n0 by n1 array in row-major order, block-distributed across the n0 dimension. To transpose this into an n1 by n0 array block-distributed across the n1 dimension, we would create a plan by calling the following function:

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);


The input and output arrays (in and out) can be the same. The transpose is actually executed by calling fftw_execute on the plan, as usual.
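To make the global effect of the plan concrete, here is a minimal serial sketch of the same index mapping on a single process. It is not part of FFTW: the name reference_transpose is hypothetical, and unlike the FFTW plan this sketch only works out-of-place. A routine like this can be handy for checking a distributed result on small arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial reference (not an FFTW function): transpose an
 * n0 x n1 row-major array into an n1 x n0 row-major array.
 * Out-of-place only: in and out must be different buffers. */
static void reference_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                const double *in, double *out)
{
    assert(in != out);
    for (ptrdiff_t i = 0; i < n0; ++i) {
        for (ptrdiff_t j = 0; j < n1; ++j) {
            /* element (i, j) of the input becomes element (j, i) of the output */
            out[j * n0 + i] = in[i * n1 + j];
        }
    }
}
```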

The flags argument accepts the usual FFTW planner flags, plus two additional flags: FFTW_MPI_TRANSPOSED_OUT and/or FFTW_MPI_TRANSPOSED_IN. What these flags indicate, for transpose plans, is that the output and/or input, respectively, are locally transposed. That is, on each process input data is normally stored as a local_n0 by n1 array in row-major order, but for an FFTW_MPI_TRANSPOSED_IN plan the input data is stored as n1 by local_n0 in row-major order. Similarly, FFTW_MPI_TRANSPOSED_OUT means that the output is n0 by local_n1 instead of local_n1 by n0.
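The four local layouts described above differ only in how a pair of indices maps to an offset in the local buffer. The following sketch spells out that arithmetic. The helper names input_offset and output_offset and the boolean parameters transposed_in and transposed_out are hypothetical, introduced here for illustration; in real code the booleans would presumably come from testing flags against FFTW_MPI_TRANSPOSED_IN and FFTW_MPI_TRANSPOSED_OUT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of input element (i, j) in the local
 * buffer, with 0 <= i < local_n0 (local row) and 0 <= j < n1. */
static ptrdiff_t input_offset(ptrdiff_t i, ptrdiff_t j,
                              ptrdiff_t local_n0, ptrdiff_t n1,
                              int transposed_in)
{
    assert(0 <= i && i < local_n0 && 0 <= j && j < n1);
    return transposed_in ? j * local_n0 + i   /* stored as n1 by local_n0 */
                         : i * n1 + j;        /* stored as local_n0 by n1 */
}

/* Hypothetical helper: offset of output element (j, i) in the local
 * buffer, with 0 <= j < local_n1 (local row of the transposed array)
 * and 0 <= i < n0. */
static ptrdiff_t output_offset(ptrdiff_t j, ptrdiff_t i,
                               ptrdiff_t local_n1, ptrdiff_t n0,
                               int transposed_out)
{
    assert(0 <= j && j < local_n1 && 0 <= i && i < n0);
    return transposed_out ? i * local_n1 + j  /* stored as n0 by local_n1 */
                          : j * n0 + i;       /* stored as local_n1 by n0 */
}
```

Note that i and j here are local to the process in the distributed dimension; the corresponding global index is obtained by adding the local start offset for that dimension.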

To determine the local size of the array on each process before and after the transpose, as well as the amount of storage that must be allocated, one should call fftw_mpi_local_size_2d_transposed, just as for a 2d DFT as described in the previous section:

     ptrdiff_t fftw_mpi_local_size_2d_transposed
                     (ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                      ptrdiff_t *local_n1, ptrdiff_t *local_1_start);


Again, the return value is the local storage to allocate, which in this case is the number of real (double) values rather than complex numbers as in the previous examples.
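To build intuition for what local_n0, local_0_start, local_n1 and local_1_start mean, the sketch below computes a block split by ceiling division and a lower bound on the number of double values a process must hold. Both helpers (block_split and min_local_doubles) are hypothetical, and the split rule is an assumption made for illustration: it is not guaranteed to match what FFTW chooses, and the value FFTW returns may be larger than this bound. Real code should always use the values from fftw_mpi_local_size_2d_transposed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block split (an assumption, not FFTW's documented rule):
 * distribute n items over nprocs ranks in contiguous blocks of
 * ceil(n / nprocs); trailing ranks may receive fewer or zero items. */
static void block_split(ptrdiff_t n, int nprocs, int rank,
                        ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    assert(n >= 0 && nprocs > 0 && 0 <= rank && rank < nprocs);
    ptrdiff_t block = (n + nprocs - 1) / nprocs;
    ptrdiff_t start = (ptrdiff_t)rank * block;
    if (start > n)
        start = n;
    ptrdiff_t end = start + block;
    if (end > n)
        end = n;
    *local_start = start;
    *local_n = end - start;
}

/* Lower bound on the number of doubles one process needs: enough for
 * its local_n0 x n1 input block and its local_n1 x n0 output block. */
static ptrdiff_t min_local_doubles(ptrdiff_t n0, ptrdiff_t n1,
                                   ptrdiff_t local_n0, ptrdiff_t local_n1)
{
    ptrdiff_t in_count = local_n0 * n1;
    ptrdiff_t out_count = local_n1 * n0;
    return in_count > out_count ? in_count : out_count;
}
```

Because the count is in double values, the size in bytes is the count multiplied by sizeof(double); with FFTW one would typically pass the returned count directly to fftw_alloc_real.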