FFTW 3.3.8: Transposed distributions

Internally, FFTW’s MPI transform algorithms work by first computing transforms of the data local to each process, then by globally transposing the data in some fashion to redistribute the data among the processes, transforming the new data local to each process, and transposing back. For example, a two-dimensional n0 by n1 array, distributed across the n0 dimension, is transformed by: (i) transforming the n1 dimension, which is local to each process; (ii) transposing to an n1 by n0 array, distributed across the n1 dimension; (iii) transforming the n0 dimension, which is now local to each process; (iv) transposing back.

However, in many applications it is acceptable to compute a multidimensional DFT whose results are produced in transposed order (e.g., n1 by n0 in two dimensions). This provides a significant performance advantage, because it means that the final transposition step can be omitted. FFTW supports this optimization, which you specify by passing the flag FFTW_MPI_TRANSPOSED_OUT to the planner routines. To compute the inverse transform of transposed output, you specify FFTW_MPI_TRANSPOSED_IN to tell it that the input is transposed. In this section, we explain how to interpret the output format of such a transform.

Suppose you are transforming multi-dimensional data with (at least two) dimensions n₀ × n₁ × n₂ × … × n_{d-1}. As always, it is distributed along the first dimension n₀. Now, if we compute its DFT with the FFTW_MPI_TRANSPOSED_OUT flag, the resulting output data are stored with the first two dimensions transposed: n₁ × n₀ × n₂ × … × n_{d-1}, distributed along the n₁ dimension. Conversely, if we take the n₁ × n₀ × n₂ × … × n_{d-1} data and transform it with the FFTW_MPI_TRANSPOSED_IN flag, then the format goes back to the original n₀ × n₁ × n₂ × … × n_{d-1} array.
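To make the two layouts concrete, the sketch below computes where a logical element (i0, i1, i2) of a three-dimensional array sits in a process's local row-major buffer, in both the non-transposed and the transposed format. The function names are hypothetical helpers, not FFTW routines; the start arguments are assumed to be the local starting indices of the distributed dimension (pass 0 to index the whole array as if it lived on one process).

```c
#include <assert.h>
#include <stddef.h>

/* Local offset of logical element (i0, i1, i2) in the non-transposed
   n0 x n1 x n2 layout, where this process owns rows starting at
   local_0_start along the n0 dimension. n0 is not needed for the
   arithmetic, only the trailing dimensions are. */
ptrdiff_t offset_normal(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                        ptrdiff_t n1, ptrdiff_t n2,
                        ptrdiff_t local_0_start)
{
    assert(i0 >= local_0_start && 0 <= i1 && i1 < n1 && 0 <= i2 && i2 < n2);
    return ((i0 - local_0_start) * n1 + i1) * n2 + i2;
}

/* Local offset of the same logical element in the transposed
   n1 x n0 x n2 layout, where this process owns rows starting at
   local_1_start along the n1 dimension. Only the first two
   dimensions trade places; i2 stays the fastest-varying index. */
ptrdiff_t offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                            ptrdiff_t n0, ptrdiff_t n2,
                            ptrdiff_t local_1_start)
{
    assert(i1 >= local_1_start && 0 <= i0 && i0 < n0 && 0 <= i2 && i2 < n2);
    return ((i1 - local_1_start) * n0 + i0) * n2 + i2;
}
```

For a 2 × 3 × 4 array held entirely on one process, element (1, 0, 2) is at offset (1·3 + 0)·4 + 2 = 14 in the original layout and at (0·2 + 1)·4 + 2 = 6 in the transposed one.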

There are two ways to find the portion of the transposed array that resides on the current process. First, you can simply call the appropriate ‘local_size’ function, passing n₁ × n₀ × n₂ × … × n_{d-1} (the transposed dimensions). This would mean calling the ‘local_size’ function twice, once for the transposed and once for the non-transposed dimensions. Alternatively, you can call one of the ‘local_size_transposed’ functions, which returns both the non-transposed and transposed data distribution from a single call. For example, for a 3d transform with transposed output (or input), you might call:


ptrdiff_t fftw_mpi_local_size_3d_transposed(
                ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
                ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

Here, local_n0 and local_0_start give the size and starting index of the n0 dimension for the non-transposed data, as in the previous sections. For transposed data (e.g. the output for FFTW_MPI_TRANSPOSED_OUT), local_n1 and local_1_start give the size and starting index of the n1 dimension, which is the first dimension of the transposed data (n1 by n0 by n2).
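As a rough illustration of what the four output parameters mean, the sketch below computes them without MPI for a given rank. It ASSUMES that each dimension is split into contiguous blocks of size ceil(n / nprocs), one block per process in rank order; that distribution is an assumption made for this example, and in a real program the values must always be taken from the ‘local_size’ functions themselves. The names sketch_block and sketch_local_3d_transposed are hypothetical and do not exist in FFTW, and the sketch makes no attempt to reproduce the function's return value.

```c
#include <assert.h>
#include <stddef.h>

/* Portion of a dimension of length n owned by `rank` out of `nprocs`
   processes, ASSUMING contiguous blocks of size ceil(n / nprocs).
   A process past the end of the data gets local_n == 0; in that case
   the start value reported by this sketch (n) carries no meaning. */
void sketch_block(ptrdiff_t n, int nprocs, int rank,
                  ptrdiff_t *local_n, ptrdiff_t *local_start)
{
    assert(n > 0 && nprocs > 0 && 0 <= rank && rank < nprocs);
    ptrdiff_t block = (n + nprocs - 1) / nprocs;   /* ceil(n / nprocs) */
    ptrdiff_t start = block * rank;
    if (start >= n) {
        *local_n = 0;
        *local_start = n;
        return;
    }
    *local_start = start;
    *local_n = (n - start < block) ? n - start : block;
}

/* Mimics only the four output parameters of a 3d 'local_size_transposed'
   call: the n0 split describes the non-transposed n0 x n1 x n2 data,
   the n1 split describes the transposed n1 x n0 x n2 data. */
void sketch_local_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                int nprocs, int rank,
                                ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                ptrdiff_t *local_n1, ptrdiff_t *local_1_start)
{
    assert(n2 > 0);  /* n2 is never distributed, so it does not affect the split */
    sketch_block(n0, nprocs, rank, local_n0, local_0_start);
    sketch_block(n1, nprocs, rank, local_n1, local_1_start);
}
```

Under this assumption, a 10 × 6 × 5 array on 4 processes gives rank 3 one slab of the non-transposed data (starting at index 9) but no slab at all of the transposed data, which shows why the two distributions have to be queried separately.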

(Note that FFTW_MPI_TRANSPOSED_IN is completely equivalent to performing FFTW_MPI_TRANSPOSED_OUT and passing the first two dimensions to the planner in reverse order, or vice versa. If you pass both the FFTW_MPI_TRANSPOSED_IN and FFTW_MPI_TRANSPOSED_OUT flags, it is equivalent to swapping the first two dimensions passed to the planner and passing neither flag.)
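The equivalences in this note can be checked by writing down, for each flag combination, the order of the first two dimensions of the input and output arrays. The sketch below does only that bookkeeping; it does not call FFTW. The SKETCH_ flag constants, the sketch_layout type, and both functions are hypothetical, and the constant values are arbitrary rather than FFTW's real flag values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two planner flags (arbitrary values). */
enum { SKETCH_TRANSPOSED_IN = 1, SKETCH_TRANSPOSED_OUT = 2 };

/* Order of the first two dimensions of the input and output arrays. */
typedef struct {
    ptrdiff_t in_dim0, in_dim1;
    ptrdiff_t out_dim0, out_dim1;
} sketch_layout;

/* Layouts for a plan created with dimensions (n0, n1, ...) and flags. */
sketch_layout sketch_layouts(ptrdiff_t n0, ptrdiff_t n1, unsigned flags)
{
    assert(n0 > 0 && n1 > 0);
    int t_in  = (flags & SKETCH_TRANSPOSED_IN)  != 0;
    int t_out = (flags & SKETCH_TRANSPOSED_OUT) != 0;
    sketch_layout l;
    l.in_dim0  = t_in  ? n1 : n0;   /* transposed input is n1 x n0 */
    l.in_dim1  = t_in  ? n0 : n1;
    l.out_dim0 = t_out ? n1 : n0;   /* transposed output is n1 x n0 */
    l.out_dim1 = t_out ? n0 : n1;
    return l;
}

/* Nonzero if two plans read and write the same array shapes. */
int sketch_same_layout(sketch_layout a, sketch_layout b)
{
    return a.in_dim0 == b.in_dim0 && a.in_dim1 == b.in_dim1 &&
           a.out_dim0 == b.out_dim0 && a.out_dim1 == b.out_dim1;
}
```

With dimensions 4 and 6, for instance, sketch_layouts(4, 6, SKETCH_TRANSPOSED_IN) and sketch_layouts(6, 4, SKETCH_TRANSPOSED_OUT) both describe a 6 × 4 input and a 4 × 6 output, matching the first equivalence stated in the note.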

6.4.3 Transposed distributions