For one-dimensional distributed DFTs using FFTW, matters are slightly more complicated because the data distribution is more closely tied to how the algorithm works. In particular, you can no longer pass an arbitrary block size and must accept FFTW’s default; the block sizes may also differ between input and output. Moreover, the data distribution depends on the flags and transform direction, in order for forward and backward transforms to work correctly.

    ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
                                     int sign, unsigned flags,
                                     ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                                     ptrdiff_t *local_no, ptrdiff_t *local_o_start);

This function computes the data distribution for a 1d transform of size n0 with the given transform sign and flags. Both input and output data use block distributions. The input on the current process will consist of local_ni numbers starting at index local_i_start; e.g. if only a single process is used, then local_ni will be n0 and local_i_start will be 0. Similarly for the output, with local_no numbers starting at index local_o_start. The return value of fftw_mpi_local_size_1d will be the total number of elements to allocate on the current process (which might be slightly larger than the local size due to intermediate steps in the algorithm).

As mentioned above (see Load balancing), the data will be divided equally among the processes if n0 is divisible by the square of the number of processes. In this case, local_ni will equal local_no. Otherwise, they may be different.

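The divisibility condition can be checked before choosing a problem size. The sketch below is plain C and does not call FFTW; the function names splits_equally_1d and equal_share_1d are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if n0 is divisible by the square of the
   number of processes (the condition stated above for an equal split),
   and 0 otherwise. */
int splits_equally_1d(ptrdiff_t n0, int nprocs)
{
    assert(n0 > 0 && nprocs > 0);
    ptrdiff_t p = (ptrdiff_t)nprocs;
    return n0 % (p * p) == 0;
}

/* Hypothetical helper: per-process element count in the equal-split case.
   Only meaningful when splits_equally_1d() returns 1. */
ptrdiff_t equal_share_1d(ptrdiff_t n0, int nprocs)
{
    assert(splits_equally_1d(n0, nprocs));
    return n0 / nprocs;
}
```

For example, n0 = 1024 on 4 processes satisfies the condition (1024 is divisible by 16), giving 256 elements per process, whereas n0 = 1000 on 4 processes does not. When the check fails, the actual block sizes should be taken from fftw_mpi_local_size_1d rather than computed by hand.
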
For some applications, such as convolutions, the order of the output data is irrelevant. In this case, performance can be improved by specifying that the output data be stored in an FFTW-defined “scrambled” format. (In particular, this is the analogue of transposed output in the multidimensional case: scrambled output saves a communications step.) If you pass FFTW_MPI_SCRAMBLED_OUT in the flags, then the output is stored in this (undocumented) scrambled order. Conversely, to perform the inverse transform of data in scrambled order, pass the FFTW_MPI_SCRAMBLED_IN flag.

In MPI FFTW, only composite sizes n0 can be parallelized; we have not yet implemented a parallel algorithm for large prime sizes.

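An application that picks n0 at run time may want to detect the prime case up front. The following is a small plain-C sketch using trial division; the name is_composite is hypothetical, and the sketch only detects the case without making any claim about how FFTW behaves when given a prime size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if n0 has a factor other than 1 and
   itself, and 0 if n0 is prime or equal to 1. Uses trial division up to
   sqrt(n0), written as d <= n0 / d to avoid overflow in d * d. */
int is_composite(ptrdiff_t n0)
{
    assert(n0 > 0);
    for (ptrdiff_t d = 2; d <= n0 / d; ++d) {
        if (n0 % d == 0)
            return 1;
    }
    return 0;
}
```

A caller could use this to choose a nearby composite size, or to pad the data, before requesting a distributed 1d plan.
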