6.4.4 One-dimensional distributions


For one-dimensional distributed DFTs using FFTW, matters are slightly more complicated because the data distribution is more closely tied to how the algorithm works. In particular, you can no longer pass an arbitrary block size and must accept FFTW's default; moreover, the block sizes may be different for input and output. Also, the data distribution depends on the flags and transform direction, in order for forward and backward transforms to work correctly.

     ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
                     int sign, unsigned flags,
                     ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                     ptrdiff_t *local_no, ptrdiff_t *local_o_start);

This function computes the data distribution for a 1d transform of size n0 with the given transform sign and flags. Both input and output data use block distributions. The input on the current process will consist of local_ni numbers starting at index local_i_start; e.g. if only a single process is used, then local_ni will be n0 and local_i_start will be 0. Similarly for the output, with local_no numbers starting at index local_o_start. The return value of fftw_mpi_local_size_1d is the total number of elements to allocate on the current process (which might be slightly larger than the local size due to intermediate steps in the algorithm).
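As a rough illustration of the call sequence (a minimal sketch only; the function name allocate_local_1d and the choices of MPI_COMM_WORLD, FFTW_FORWARD, and FFTW_ESTIMATE are ours for illustration), one might query the distribution and allocate the local array like this:

     #include <fftw3-mpi.h>
     
     /* Sketch: query the 1d data distribution for a forward transform of
        size n0 and allocate the local buffer.  Assumes MPI and FFTW's MPI
        interface were already initialized (MPI_Init, fftw_mpi_init). */
     void allocate_local_1d(ptrdiff_t n0)
     {
         ptrdiff_t local_ni, local_i_start, local_no, local_o_start;
         ptrdiff_t alloc_local = fftw_mpi_local_size_1d(
             n0, MPI_COMM_WORLD, FFTW_FORWARD, FFTW_ESTIMATE,
             &local_ni, &local_i_start, &local_no, &local_o_start);
     
         /* alloc_local may exceed both local_ni and local_no, so always
            allocate alloc_local complex numbers. */
         fftw_complex *data = fftw_malloc(sizeof(fftw_complex) * alloc_local);
     
         /* ... create a plan and fill data[0 .. local_ni-1], which holds the
            global indices local_i_start .. local_i_start + local_ni - 1 ... */
     
         fftw_free(data);
     }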

As mentioned above (see Load balancing), the data will be divided equally among the processes if n0 is divisible by the square of the number of processes. In this case, local_ni will equal local_no; otherwise, they may be different. For example, with 4 processes and n0 = 1024 (which is divisible by 16), each process gets 256 contiguous elements for both input and output.

For some applications, such as convolutions, the order of the output data is irrelevant. In this case, performance can be improved by specifying that the output data be stored in an FFTW-defined “scrambled” format. (In particular, this is the analogue of transposed output in the multidimensional case: scrambled output saves a communications step.) If you pass FFTW_MPI_SCRAMBLED_OUT in the flags, then the output is stored in this (undocumented) scrambled order. Conversely, to perform the inverse transform of data in scrambled order, pass the FFTW_MPI_SCRAMBLED_IN flag.

In MPI FFTW, only composite sizes n0 can be parallelized; we have not yet implemented a parallel algorithm for large prime sizes.
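As a sketch of how the two flags pair up in practice (the helper name and buffer handling are illustrative assumptions; fftw_mpi_plan_dft_1d is FFTW's 1d MPI complex-DFT planner), a forward plan with scrambled output can be matched by an inverse plan that reads scrambled input:

     #include <fftw3-mpi.h>
     
     /* Sketch: a forward transform with scrambled output followed by an
        inverse transform that consumes the scrambled data.  Assumes `in'
        and `out' were allocated with the sizes reported by
        fftw_mpi_local_size_1d for the corresponding sign and flags. */
     fftw_plan plan_scrambled_pair(ptrdiff_t n0, fftw_complex *in,
                                   fftw_complex *out, fftw_plan *inverse)
     {
         /* Forward: output is left in FFTW's internal scrambled order. */
         fftw_plan fwd = fftw_mpi_plan_dft_1d(
             n0, in, out, MPI_COMM_WORLD, FFTW_FORWARD,
             FFTW_ESTIMATE | FFTW_MPI_SCRAMBLED_OUT);
     
         /* Inverse: input is expected in that same scrambled order, so the
            round trip avoids the extra communication step on both ends. */
         *inverse = fftw_mpi_plan_dft_1d(
             n0, out, in, MPI_COMM_WORLD, FFTW_BACKWARD,
             FFTW_ESTIMATE | FFTW_MPI_SCRAMBLED_IN);
     
         return fwd;
     }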