Internally, FFTW's MPI transform algorithms work by first computing
transforms of the data local to each process, then by globally
transposing the data in some fashion to redistribute the data
among the processes, transforming the new data local to each process,
and transposing back. For example, a two-dimensional n0 by n1
array, distributed across the n0 dimension, is
transformed by: (i) transforming the n1 dimension, which is
local to each process; (ii) transposing to an n1 by n0
array, distributed across the n1 dimension; (iii) transforming
the n0 dimension, which is now local to each process; (iv)
transposing back.
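
For example, consider a 4 by 6 array distributed over 2 processes:
each process begins with a 2 by 6 slab; step (i) transforms the
length-6 rows locally; step (ii) transposes globally to a 6 by 4
array, so that each process now holds a 3 by 4 slab; step (iii)
transforms the length-4 rows locally; and step (iv) transposes back
to the original 4 by 6 layout.
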
However, in many applications it is acceptable to compute a
multidimensional DFT whose results are produced in transposed order
(e.g., n1 by n0 in two dimensions). This provides a
significant performance advantage, because it means that the final
transposition step can be omitted. FFTW supports this optimization,
which you specify by passing the flag FFTW_MPI_TRANSPOSED_OUT
to the planner routines. To compute the inverse transform of
transposed output, you specify FFTW_MPI_TRANSPOSED_IN to tell
it that the input is transposed. In this section, we explain how to
interpret the output format of such a transform.
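
For example, a matched pair of 3d plans might be created as in the
following sketch (the size N0 by N1 by N2, the in-place array data,
and the use of MPI_COMM_WORLD are placeholder assumptions, not part
of the manual's examples):

/* forward DFT whose output is left in transposed order: */
fftw_plan fwd = fftw_mpi_plan_dft_3d(N0, N1, N2, data, data,
                                     MPI_COMM_WORLD, FFTW_FORWARD,
                                     FFTW_MEASURE | FFTW_MPI_TRANSPOSED_OUT);

/* matching inverse DFT that consumes the transposed data: */
fftw_plan bwd = fftw_mpi_plan_dft_3d(N0, N1, N2, data, data,
                                     MPI_COMM_WORLD, FFTW_BACKWARD,
                                     FFTW_MEASURE | FFTW_MPI_TRANSPOSED_IN);
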
Suppose you are transforming multi-dimensional data with (at
least two) dimensions n0 × n1 × n2 × … × n(d-1). As always, it is
distributed along the first dimension n0. Now, if we compute its
DFT with the FFTW_MPI_TRANSPOSED_OUT flag, the resulting output
data are stored with the first two dimensions transposed:
n1 × n0 × n2 × … × n(d-1), distributed along the n1 dimension.
Conversely, if we take the n1 × n0 × n2 × … × n(d-1) data and
transform it with the FFTW_MPI_TRANSPOSED_IN flag, then the format
goes back to the original n0 × n1 × n2 × … × n(d-1) array.
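
Concretely, for a 3d transform the transposed output is stored
row-major as local_n1 × n0 × n2, where local_n1 and local_1_start
describe this process's share of the n1 dimension (they are returned
by the ‘local_size_transposed’ functions described below). A sketch
of the resulting indexing, with i0, i1, i2, and out as placeholder
names:

/* global element (i0, i1, i2) of the transposed output, for
   local_1_start <= i1 < local_1_start + local_n1: */
fftw_complex value = out[((i1 - local_1_start) * n0 + i0) * n2 + i2];
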
There are two ways to find the portion of the transposed array that
resides on the current process. First, you can simply call the
appropriate ‘local_size’ function, passing n1 × n0 × n2 × … × n(d-1)
(the transposed dimensions). This would mean calling the ‘local_size’
function twice, once for the transposed and once for the
non-transposed dimensions. Alternatively, you can call one of the
‘local_size_transposed’ functions, which returns both the
non-transposed and transposed data distribution from a single call.
For example, for a 3d transform with transposed output (or input), you
might call:
ptrdiff_t fftw_mpi_local_size_3d_transposed(
                ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
                ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Here, local_n0 and local_0_start give the size and
starting index of the n0 dimension for the
non-transposed data, as in the previous sections. For
transposed data (e.g. the output for
FFTW_MPI_TRANSPOSED_OUT), local_n1 and
local_1_start give the size and starting index of the n1
dimension, which is the first dimension of the transposed data
(n1 by n0 by n2).
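
Putting these pieces together, a complete plan-creation sequence
might look like the following sketch (the size 256 × 256 × 256, the
in-place choice, and MPI_COMM_WORLD are placeholder assumptions):

#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t N0 = 256, N1 = 256, N2 = 256; /* placeholder size */
    ptrdiff_t alloc_local, local_n0, local_0_start, local_n1, local_1_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* get the non-transposed (input) and transposed (output)
       distributions, plus the required local allocation size: */
    alloc_local = fftw_mpi_local_size_3d_transposed(
        N0, N1, N2, MPI_COMM_WORLD,
        &local_n0, &local_0_start, &local_n1, &local_1_start);
    fftw_complex *data = fftw_alloc_complex(alloc_local);

    /* input is local_n0 x N1 x N2; output will be local_n1 x N0 x N2
       because of FFTW_MPI_TRANSPOSED_OUT: */
    fftw_plan plan = fftw_mpi_plan_dft_3d(
        N0, N1, N2, data, data, MPI_COMM_WORLD,
        FFTW_FORWARD, FFTW_MEASURE | FFTW_MPI_TRANSPOSED_OUT);

    /* ... initialize data here (after planning, since FFTW_MEASURE
       overwrites the array) and then: */
    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(data);
    MPI_Finalize();
    return 0;
}
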
(Note that FFTW_MPI_TRANSPOSED_IN is completely equivalent to
performing FFTW_MPI_TRANSPOSED_OUT and passing the first two
dimensions to the planner in reverse order, or vice versa. If you
pass both the FFTW_MPI_TRANSPOSED_IN and
FFTW_MPI_TRANSPOSED_OUT flags, it is equivalent to swapping the
first two dimensions passed to the planner and passing neither
flag.)
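
In code, the first equivalence says that these two plans compute the
same transform with the same data layouts (a sketch; all variable
names are placeholders):

/* normal dimension order, transposed input: */
fftw_plan a = fftw_mpi_plan_dft_3d(n0, n1, n2, in, out, comm,
                                   FFTW_FORWARD, FFTW_MPI_TRANSPOSED_IN);

/* first two dimensions swapped, transposed output: */
fftw_plan b = fftw_mpi_plan_dft_3d(n1, n0, n2, in, out, comm,
                                   FFTW_FORWARD, FFTW_MPI_TRANSPOSED_OUT);
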