Transposed distributions - FFTW 3.2alpha3


6.4.3 Transposed distributions


Internally, FFTW's MPI transform algorithms work by first computing transforms of the data local to each process, then globally transposing the data to redistribute it among the processes, transforming the new data local to each process, and transposing back. For example, a two-dimensional n0 by n1 array, distributed across the n0 dimension, is transformed by: (i) transforming the n1 dimension, which is local to each process; (ii) transposing to an n1 by n0 array, distributed across the n1 dimension; (iii) transforming the n0 dimension, which is now local to each process; (iv) transposing back.

However, in many applications it is acceptable to compute a multidimensional DFT whose results are produced in transposed order (e.g., n1 by n0 in two dimensions). This provides a significant performance advantage, because it means that the final transposition step can be omitted. FFTW supports this optimization, which you specify by passing the flag FFTW_MPI_TRANSPOSED_OUT to the planner routines. To compute the inverse transform of transposed output, you specify FFTW_MPI_TRANSPOSED_IN to tell it that the input is transposed. In this section, we explain how to interpret the output format of such a transform.

Suppose you are transforming multi-dimensional data with (at least two) dimensions n0 × n1 × n2 × … × nd-1. As always, it is distributed along the first dimension n0. Now, if we compute its DFT with the FFTW_MPI_TRANSPOSED_OUT flag, the resulting output data are stored with the first two dimensions transposed: n1 × n0 × n2 × … × nd-1, distributed along the n1 dimension. Conversely, if we take the n1 × n0 × n2 × … × nd-1 data and transform it with the FFTW_MPI_TRANSPOSED_IN flag, then the format goes back to the original n0 × n1 × n2 × … × nd-1.
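To make the transposed storage order concrete, here is a minimal serial sketch in plain C. It does not call FFTW or MPI. It only shows where one logical element (i0, i1, i2) of a three-dimensional array lands in row-major memory, first in the ordinary n0 × n1 × n2 order and then in the transposed n1 × n0 × n2 order. The function names `offset_normal` and `offset_transposed` are hypothetical helpers introduced for illustration, and the sketch ignores how the data are split among processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row-major offset of logical element (i0, i1, i2)
   in an array stored in the ordinary n0 x n1 x n2 order. */
ptrdiff_t offset_normal(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                        ptrdiff_t n1, ptrdiff_t n2)
{
    assert(i1 >= 0 && i1 < n1 && i2 >= 0 && i2 < n2 && i0 >= 0);
    return (i0 * n1 + i1) * n2 + i2;
}

/* Hypothetical helper: row-major offset of the same logical element when
   the first two dimensions are swapped, i.e. storage order n1 x n0 x n2. */
ptrdiff_t offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                            ptrdiff_t n0, ptrdiff_t n2)
{
    assert(i0 >= 0 && i0 < n0 && i2 >= 0 && i2 < n2 && i1 >= 0);
    return (i1 * n0 + i0) * n2 + i2;
}
```

In both layouts the trailing dimension n2 stays contiguous; only the roles of the first two indices are exchanged.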

There are two ways to find the portion of the transposed array that resides on the current process. First, you can simply call the appropriate `local_size' function, passing n1 × n0 × n2 × … × nd-1 (the transposed dimensions). This would mean calling the `local_size' function twice, once for the transposed and once for the non-transposed dimensions. Alternatively, you can call one of the `local_size_transposed' functions, which return both the non-transposed and transposed data distribution from a single call. For example, for a 3d transform with transposed output (or input), you might call:

     ptrdiff_t fftw_mpi_local_size_3d_transposed(
                     ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
                     ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                     ptrdiff_t *local_n1, ptrdiff_t *local_1_start);

Here, local_n0 and local_0_start give the size and starting index of the n0 dimension for the non-transposed data, as in the previous sections. For transposed data (e.g. the output for FFTW_MPI_TRANSPOSED_OUT), local_n1 and local_1_start give the size and starting index of the n1 dimension, which is the first dimension of the transposed data (n1 by n0 by n2).
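As an illustration of how local_n1 and local_1_start might be used, the following sketch maps a global index (i0, i1, i2) to an offset within one process's portion of the transposed output. It is plain C and does not call FFTW. It assumes the local portion is a row-major local_n1 × n0 × n2 block of complex data, and that local_n1 and local_1_start were obtained from a prior call such as the one shown earlier. The name `local_offset_transposed` and the -1 "not on this process" convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of global element (i0, i1, i2) within this
   process's slab of transposed (n1 x n0 x n2) data, or -1 if the element's
   i1 index falls outside the range [local_1_start, local_1_start + local_n1)
   owned by this process. */
ptrdiff_t local_offset_transposed(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                                  ptrdiff_t n0, ptrdiff_t n2,
                                  ptrdiff_t local_n1, ptrdiff_t local_1_start)
{
    assert(i0 >= 0 && i0 < n0 && i2 >= 0 && i2 < n2);
    if (i1 < local_1_start || i1 >= local_1_start + local_n1)
        return -1; /* element lives on another process */
    /* Index relative to the first i1 row stored locally. */
    return ((i1 - local_1_start) * n0 + i0) * n2 + i2;
}
```

A caller would typically loop over the local i1 range directly rather than test every global index; the bounds check here only makes the ownership rule explicit.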

(Note that FFTW_MPI_TRANSPOSED_IN is completely equivalent to performing FFTW_MPI_TRANSPOSED_OUT and passing the first two dimensions to the planner in reverse order, or vice versa. If you pass both the FFTW_MPI_TRANSPOSED_IN and FFTW_MPI_TRANSPOSED_OUT flags, it is equivalent to swapping the first two dimensions passed to the planner and passing neither flag.)
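The equivalences in the note above can be checked with a small bookkeeping model. The sketch below is plain C and does not call FFTW. It records only the storage order of the first two dimensions of the input and output for a given pair of planner dimensions and flags. The `SKETCH_` constants are hypothetical stand-ins, not FFTW's actual flag values, and `sketch_layout_for` is an illustrative helper rather than a library routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two flags; these are NOT FFTW's values. */
enum { SKETCH_TRANSPOSED_IN = 1, SKETCH_TRANSPOSED_OUT = 2 };

/* Storage order of the first two dimensions of the input and output. */
typedef struct {
    ptrdiff_t in[2];
    ptrdiff_t out[2];
} sketch_layout;

/* Model: a transposed side stores n1 x n0, otherwise n0 x n1. */
sketch_layout sketch_layout_for(ptrdiff_t n0, ptrdiff_t n1, unsigned flags)
{
    sketch_layout s;
    int tin = (flags & SKETCH_TRANSPOSED_IN) != 0;
    int tout = (flags & SKETCH_TRANSPOSED_OUT) != 0;
    assert(n0 > 0 && n1 > 0);
    s.in[0] = tin ? n1 : n0;
    s.in[1] = tin ? n0 : n1;
    s.out[0] = tout ? n1 : n0;
    s.out[1] = tout ? n0 : n1;
    return s;
}

/* Returns 1 if both layouts store input and output in the same order. */
int sketch_same_layout(sketch_layout a, sketch_layout b)
{
    return a.in[0] == b.in[0] && a.in[1] == b.in[1] &&
           a.out[0] == b.out[0] && a.out[1] == b.out[1];
}
```

Under this model, planning with dimensions (n0, n1) and only the "in" flag gives the same layouts as planning with (n1, n0) and only the "out" flag, and passing both flags with (n0, n1) matches passing neither with (n1, n0).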