Internally, FFTW’s MPI transform algorithms work by first computing
transforms of the data local to each process, then globally
transposing the data to redistribute it among the processes,
transforming the new data local to each process, and finally
transposing back. For example, a two-dimensional n0 by
n1 array, distributed across the n0 dimension, is
transformed by: (i) transforming the n1 dimension, which is
local to each process; (ii) transposing to an n1 by n0
array, distributed across the n1 dimension; (iii) transforming
the n0 dimension, which is now local to each process; (iv)
transposing back.

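The four steps can be checked on a single process with a small serial model. This is an illustrative sketch only: it performs no MPI communication and does not call FFTW, it uses a naive DFT, and it assumes (to keep the arithmetic exact in integers) that every dimension has length 1, 2 or 4, so that every twiddle factor is exactly ±1 or ±i. The names `dft1d`, `transpose`, `dft2d_by_transposes` and `dft2d_direct` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Exact complex numbers with integer parts; enough for this toy model. */
typedef struct { long re, im; } cplx;

/* exp(-2*pi*i*m/4) for any integer m: exactly 1, -i, -1 or i. */
static cplx root4(long m) {
    static const cplx r[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    return r[((m % 4) + 4) % 4];
}

static cplx cmul(cplx a, cplx b) {
    cplx c = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return c;
}

/* Naive forward DFT of n contiguous elements, x -> y (no overlap).
   Assumption: n is 1, 2 or 4, so exp(-2*pi*i*j*k/n) = root4(j*k*(4/n)). */
static void dft1d(size_t n, const cplx *x, cplx *y) {
    assert(n == 1 || n == 2 || n == 4);
    for (size_t k = 0; k < n; k++) {
        cplx acc = {0, 0};
        for (size_t j = 0; j < n; j++) {
            cplx t = cmul(x[j], root4((long)(j * k * (4 / n))));
            acc.re += t.re;
            acc.im += t.im;
        }
        y[k] = acc;
    }
}

/* Out-of-place transpose: src is r x c (row-major), dst becomes c x r. */
static void transpose(size_t r, size_t c, const cplx *src, cplx *dst) {
    for (size_t i = 0; i < r; i++)
        for (size_t j = 0; j < c; j++)
            dst[j * r + i] = src[i * c + j];
}

/* 2D DFT of the n0 x n1 array a using steps (i)-(iv); tmp is scratch
   space for n0*n1 elements, and the result is left in a. */
static void dft2d_by_transposes(size_t n0, size_t n1, cplx *a, cplx *tmp) {
    for (size_t i = 0; i < n0; i++)          /* (i) transform along n1 */
        dft1d(n1, a + i * n1, tmp + i * n1);
    transpose(n0, n1, tmp, a);               /* (ii) a is now n1 x n0 */
    for (size_t i = 0; i < n1; i++)          /* (iii) transform along n0 */
        dft1d(n0, a + i * n0, tmp + i * n0);
    transpose(n1, n0, tmp, a);               /* (iv) back to n0 x n1 */
}

/* Reference: the 2D DFT evaluated directly from its definition. */
static void dft2d_direct(size_t n0, size_t n1, const cplx *x, cplx *y) {
    assert((n0 == 1 || n0 == 2 || n0 == 4) && (n1 == 1 || n1 == 2 || n1 == 4));
    for (size_t k0 = 0; k0 < n0; k0++)
        for (size_t k1 = 0; k1 < n1; k1++) {
            cplx acc = {0, 0};
            for (size_t j0 = 0; j0 < n0; j0++)
                for (size_t j1 = 0; j1 < n1; j1++) {
                    long m = (long)(j0 * k0 * (4 / n0) + j1 * k1 * (4 / n1));
                    cplx t = cmul(x[j0 * n1 + j1], root4(m));
                    acc.re += t.re;
                    acc.im += t.im;
                }
            y[k0 * n1 + k1] = acc;
        }
}
```

For a 2 × 4 array, `dft2d_by_transposes(2, 4, a, tmp)` leaves in `a` the same values that `dft2d_direct` computes from the definition; in the distributed setting, steps (ii) and (iv) are where the interprocess communication would happen.
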
However, in many applications it is acceptable to compute a
multidimensional DFT whose results are produced in transposed order
(e.g., n1 by n0 in two dimensions). This provides a
significant performance advantage, because it means that the final
transposition step can be omitted. FFTW supports this optimization,
which you specify by passing the flag FFTW_MPI_TRANSPOSED_OUT
to the planner routines. To compute the inverse transform of
transposed output, you specify FFTW_MPI_TRANSPOSED_IN to tell
the planner that the input is transposed. In this section, we explain how to
interpret the output format of such a transform.

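To see what omitting the final transposition means for the data layout, here is a serial sketch (no MPI, no FFTW, a naive DFT, and dimension lengths assumed to be 1, 2 or 4 so the integer arithmetic is exact). The hypothetical `dft2d_transposed_out` stops after step (iii), so output element (k0, k1) ends up at `out[k1*n0 + k0]`; the hypothetical `idft2d_transposed_in` starts from that transposed layout, skips the first transposition, and returns an ordinary n0 × n1 array. Like FFTW's transforms, these are unnormalized, so a forward/backward round trip multiplies the data by n0*n1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { long re, im; } cplx;

/* exp(-2*pi*i*m/4) for any integer m: exactly 1, -i, -1 or i. */
static cplx root4(long m) {
    static const cplx r[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    return r[((m % 4) + 4) % 4];
}

static cplx cmul(cplx a, cplx b) {
    cplx c = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return c;
}

/* Naive unnormalized DFT of n contiguous elements, x -> y (no overlap),
   with exponent sign `sign` (-1 forward, +1 backward).
   Assumption: n is 1, 2 or 4, so every twiddle factor is exact. */
static void dft1d(size_t n, const cplx *x, cplx *y, int sign) {
    assert(n == 1 || n == 2 || n == 4);
    assert(sign == -1 || sign == 1);
    for (size_t k = 0; k < n; k++) {
        cplx acc = {0, 0};
        for (size_t j = 0; j < n; j++) {
            long m = (long)(j * k * (4 / n));
            cplx t = cmul(x[j], root4(sign < 0 ? m : -m));
            acc.re += t.re;
            acc.im += t.im;
        }
        y[k] = acc;
    }
}

/* Out-of-place transpose: src is r x c (row-major), dst becomes c x r. */
static void transpose(size_t r, size_t c, const cplx *src, cplx *dst) {
    for (size_t i = 0; i < r; i++)
        for (size_t j = 0; j < c; j++)
            dst[j * r + i] = src[i * c + j];
}

/* Steps (i)-(iii) only: forward DFT of the n0 x n1 array `in`, result left
   in `out` in transposed n1 x n0 order. tmp holds n0*n1 elements. */
static void dft2d_transposed_out(size_t n0, size_t n1, const cplx *in,
                                 cplx *out, cplx *tmp) {
    for (size_t i = 0; i < n0; i++)              /* along n1 */
        dft1d(n1, in + i * n1, tmp + i * n1, -1);
    transpose(n0, n1, tmp, out);                 /* out is n1 x n0 */
    for (size_t i = 0; i < n1; i++)              /* along n0 */
        dft1d(n0, out + i * n0, tmp + i * n0, -1);
    memcpy(out, tmp, n0 * n1 * sizeof *out);
}

/* Backward DFT whose input is in transposed n1 x n0 order; the first
   transposition is skipped and `out` is an ordinary n0 x n1 array. */
static void idft2d_transposed_in(size_t n0, size_t n1, const cplx *in,
                                 cplx *out, cplx *tmp) {
    for (size_t i = 0; i < n1; i++)              /* along n0 */
        dft1d(n0, in + i * n0, tmp + i * n0, +1);
    transpose(n1, n0, tmp, out);                 /* out is n0 x n1 */
    for (size_t i = 0; i < n0; i++)              /* along n1 */
        dft1d(n1, out + i * n1, tmp + i * n1, +1);
    memcpy(out, tmp, n0 * n1 * sizeof *out);
}
```

For instance, a 2 × 4 input that is zero except for a 1 at (j0, j1) = (0, 1) produces, in the transposed output, the value -i for (k0, k1) = (0, 1) at position 2 rather than at position 1 where the non-transposed layout would put it.
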
Suppose you are transforming multidimensional data with (at
least two) dimensions n0 × n1 × n2 × … × n_{d-1}. As always, it is distributed along
the first dimension, n0. Now, if we compute its DFT with the
FFTW_MPI_TRANSPOSED_OUT flag, the resulting output data are stored
with the first two dimensions transposed, n1 × n0 × n2 × … × n_{d-1},
and distributed along the n1 dimension. Conversely, if we take the
n1 × n0 × n2 × … × n_{d-1} data and transform it with the
FFTW_MPI_TRANSPOSED_IN flag, then the format goes back to the
original n0 × n1 × n2 × … × n_{d-1} array.

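The layout rule can be written as an index calculation. As a minimal sketch, assuming ordinary row-major (C) order for the global array and ignoring how it is split across processes, the hypothetical helpers `offset` and `transposed_offset` below give the position of output element (k0, k1, k2, …) in the normal and in the transposed layout. The only difference is that the first two dimensions and the first two indices swap roles.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of multi-index k in an array with dimensions n[0..d-1]. */
static ptrdiff_t offset(int d, const ptrdiff_t *n, const ptrdiff_t *k) {
    ptrdiff_t off = 0;
    for (int i = 0; i < d; i++) {
        assert(0 <= k[i] && k[i] < n[i]);
        off = off * n[i] + k[i];
    }
    return off;
}

/* Offset of element (k0, k1, k2, ...) when it is stored in the transposed
   n1 x n0 x n2 x ... layout: swap the first two dimensions and indices.
   The limit d <= 8 only exists for the fixed-size scratch arrays. */
static ptrdiff_t transposed_offset(int d, const ptrdiff_t *n, const ptrdiff_t *k) {
    assert(d >= 2 && d <= 8);
    ptrdiff_t nt[8], kt[8];
    for (int i = 0; i < d; i++) { nt[i] = n[i]; kt[i] = k[i]; }
    nt[0] = n[1]; nt[1] = n[0];
    kt[0] = k[1]; kt[1] = k[0];
    return offset(d, nt, kt);
}
```

For a 4 × 3 × 5 array, element (2, 1, 3) sits at offset (2·3 + 1)·5 + 3 = 38 in the normal layout, but at (1·4 + 2)·5 + 3 = 33 in the transposed 3 × 4 × 5 layout.
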
There are two ways to find the portion of the transposed array that
resides on the current process. First, you can simply call the
appropriate ‘local_size’ function, passing n1 × n0 × n2 × … × n_{d-1} (the
transposed dimensions). This would mean calling the ‘local_size’
function twice, once for the transposed and once for the
non-transposed dimensions. Alternatively, you can call one of the
‘local_size_transposed’ functions, which return both the
non-transposed and transposed data distributions from a single call.
For example, for a 3d transform with transposed output (or input), you
might call:

    ptrdiff_t fftw_mpi_local_size_3d_transposed(
                    ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
                    ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                    ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

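To illustrate what the two pairs of outputs describe, the sketch below models them without MPI. It is not FFTW's implementation: it assumes a plain block distribution in which each of P processes receives ceil(n/P) consecutive indices (the last processes possibly fewer or none), and it returns the larger of the two slab sizes as the number of elements to allocate. The real `fftw_mpi_local_size_3d_transposed` chooses the distribution itself and may request more storage, so a real program should use its values. `block_range` and `model_local_size_3d_transposed` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block distribution: rank r of nprocs gets up to b = ceil(n/nprocs)
   consecutive indices starting at r*b, clipped to [0, n). */
static void block_range(ptrdiff_t n, int nprocs, int rank,
                        ptrdiff_t *local_n, ptrdiff_t *local_start) {
    assert(n >= 0 && nprocs > 0 && 0 <= rank && rank < nprocs);
    ptrdiff_t b = (n + nprocs - 1) / nprocs;
    ptrdiff_t start = (ptrdiff_t)rank * b;
    if (start > n) start = n;
    ptrdiff_t end = start + b;
    if (end > n) end = n;
    *local_start = start;
    *local_n = end - start;
}

/* Model of both distributions for an n0 x n1 x n2 transform with transposed
   output: the input slab is local_n0 x n1 x n2 (rows of n0), the output slab
   is local_n1 x n0 x n2 (rows of n1). Returns the larger slab size. */
static ptrdiff_t model_local_size_3d_transposed(
        ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, int nprocs, int rank,
        ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
        ptrdiff_t *local_n1, ptrdiff_t *local_1_start) {
    block_range(n0, nprocs, rank, local_n0, local_0_start);
    block_range(n1, nprocs, rank, local_n1, local_1_start);
    ptrdiff_t in_size = *local_n0 * n1 * n2;
    ptrdiff_t out_size = *local_n1 * n0 * n2;
    return in_size > out_size ? in_size : out_size;
}
```

Under this model, a 10 × 6 × 4 transform on 4 processes gives rank 0 rows 0 to 2 of n0 and rows 0 to 1 of n1, while rank 3 holds one row of n0 and no rows of the transposed output, which is the kind of imbalance the load-balancing discussion refers to.
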
Here, local_n0 and local_0_start give the size and
starting index of the n0 dimension for the
non-transposed data, as in the previous sections. For
transposed data (e.g. the output for
FFTW_MPI_TRANSPOSED_OUT), local_n1 and
local_1_start give the size and starting index of the n1
dimension, which is the first dimension of the transposed data
(n1 by n0 by n2).

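Concretely, each process holds a contiguous local_n1 × n0 × n2 slab of the transposed output, starting at row local_1_start of the n1 dimension. A minimal sketch of the index bookkeeping, assuming row-major storage of that slab, is below; `index3`, `transposed_local_to_global` and `transposed_global_to_local` are hypothetical helpers, not FFTW functions.

```c
#include <assert.h>
#include <stddef.h>

/* Global indices (k0, k1, k2) of an output element. */
typedef struct { ptrdiff_t k0, k1, k2; } index3;

/* Element i of this process's transposed-output slab, which has shape
   local_n1 x n0 x n2 and begins at row local_1_start of the n1 dimension. */
static index3 transposed_local_to_global(ptrdiff_t i, ptrdiff_t n0, ptrdiff_t n2,
                                         ptrdiff_t local_n1, ptrdiff_t local_1_start) {
    assert(0 <= i && i < local_n1 * n0 * n2);
    index3 g;
    g.k2 = i % n2;
    g.k0 = (i / n2) % n0;
    g.k1 = local_1_start + i / (n0 * n2);
    return g;
}

/* Inverse mapping: local slab offset of global element (k0, k1, k2), which
   must lie in rows local_1_start .. local_1_start + local_n1 - 1 of n1. */
static ptrdiff_t transposed_global_to_local(index3 g, ptrdiff_t n0, ptrdiff_t n2,
                                            ptrdiff_t local_n1, ptrdiff_t local_1_start) {
    assert(local_1_start <= g.k1 && g.k1 < local_1_start + local_n1);
    assert(0 <= g.k0 && g.k0 < n0 && 0 <= g.k2 && g.k2 < n2);
    return ((g.k1 - local_1_start) * n0 + g.k0) * n2 + g.k2;
}
```

In a real program, n0, n2, local_n1 and local_1_start would come from `fftw_mpi_local_size_3d_transposed`, and i would run over the local_n1 · n0 · n2 elements of the output array.
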
(Note that FFTW_MPI_TRANSPOSED_IN is completely equivalent to
performing FFTW_MPI_TRANSPOSED_OUT and passing the first two
dimensions to the planner in reverse order, or vice versa. If you
pass both the FFTW_MPI_TRANSPOSED_IN and
FFTW_MPI_TRANSPOSED_OUT flags, it is equivalent to swapping the
first two dimensions passed to the planner and passing neither
flag.)

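This equivalence rests on the fact that the multidimensional DFT kernel is a product of one exponential per dimension, so relabeling the first two dimensions does not change any output value. The serial check below evaluates both layouts directly from the DFT definition, as an illustration rather than a use of FFTW: `ref_transposed_in` and `ref_transposed_out` are hypothetical names, and lengths are assumed to be 1, 2 or 4 so that the integer arithmetic is exact.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long re, im; } cplx;

/* exp(-2*pi*i*m/4) for any integer m: exactly 1, -i, -1 or i. */
static cplx root4(long m) {
    static const cplx r[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };
    return r[((m % 4) + 4) % 4];
}

static cplx cmul(cplx a, cplx b) {
    cplx c = { a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re };
    return c;
}

/* Twiddle exponent j*k/n in quarter turns; assumes n is 1, 2 or 4. */
static long q(size_t j, size_t k, size_t n) {
    assert(n == 1 || n == 2 || n == 4);
    return (long)(j * k * (4 / n));
}

/* Logical n0 x n1 forward DFT, input stored transposed (n1 x n0),
   output stored n0 x n1: the TRANSPOSED_IN layout. */
static void ref_transposed_in(size_t n0, size_t n1, const cplx *in, cplx *out) {
    for (size_t k0 = 0; k0 < n0; k0++)
        for (size_t k1 = 0; k1 < n1; k1++) {
            cplx acc = {0, 0};
            for (size_t j0 = 0; j0 < n0; j0++)
                for (size_t j1 = 0; j1 < n1; j1++) {
                    cplx t = cmul(in[j1 * n0 + j0], root4(q(j0, k0, n0) + q(j1, k1, n1)));
                    acc.re += t.re;
                    acc.im += t.im;
                }
            out[k0 * n1 + k1] = acc;
        }
}

/* Logical m0 x m1 forward DFT, input stored m0 x m1, output stored
   transposed (m1 x m0): the TRANSPOSED_OUT layout. */
static void ref_transposed_out(size_t m0, size_t m1, const cplx *in, cplx *out) {
    for (size_t k0 = 0; k0 < m0; k0++)
        for (size_t k1 = 0; k1 < m1; k1++) {
            cplx acc = {0, 0};
            for (size_t j0 = 0; j0 < m0; j0++)
                for (size_t j1 = 0; j1 < m1; j1++) {
                    cplx t = cmul(in[j0 * m1 + j1], root4(q(j0, k0, m0) + q(j1, k1, m1)));
                    acc.re += t.re;
                    acc.im += t.im;
                }
            out[k1 * m0 + k0] = acc;
        }
}
```

Given the same 8-element buffer, `ref_transposed_in(2, 4, x, a)` and `ref_transposed_out(4, 2, x, b)` fill `a` and `b` with identical values, matching the statement that a TRANSPOSED_IN plan for n0 × n1 and a TRANSPOSED_OUT plan for n1 × n0 describe the same transform.
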