One-dimensional distributions - FFTW 3.3.3

6.4.4 One-dimensional distributions


For one-dimensional distributed DFTs using FFTW, matters are slightly more complicated because the data distribution is more closely tied to how the algorithm works. In particular, you can no longer pass an arbitrary block size and must accept FFTW's default. The block sizes may also differ between input and output. Finally, the data distribution depends on the flags and the transform direction, so that forward and backward transforms work correctly.

     ptrdiff_t fftw_mpi_local_size_1d(ptrdiff_t n0, MPI_Comm comm,
                                      int sign, unsigned flags,
                                      ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                                      ptrdiff_t *local_no, ptrdiff_t *local_o_start);

This function computes the data distribution for a 1d transform of size n0 with the given transform sign and flags. Both input and output data use block distributions. The input on the current process will consist of local_ni numbers starting at index local_i_start; for example, if only a single process is used, then local_ni will be n0 and local_i_start will be 0. Similarly for the output, with local_no numbers starting at index local_o_start. The return value of fftw_mpi_local_size_1d will be the total number of elements to allocate on the current process (which might be slightly larger than the local size due to intermediate steps in the algorithm).
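The block layout described above can be made concrete with a little index arithmetic. The following is a minimal sketch, not part of FFTW: the helper names owns_index and local_offset are hypothetical, and the sketch assumes in-order data, where a process holds the contiguous global indices from local_start up to (but not including) local_start + local_n.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function): returns 1 if global
   index i falls inside the local block
   [local_start, local_start + local_n), and 0 otherwise. */
int owns_index(ptrdiff_t i, ptrdiff_t local_start, ptrdiff_t local_n)
{
    return i >= local_start && i < local_start + local_n;
}

/* Hypothetical helper (not an FFTW function): converts a global index
   owned by this process into an offset into its local array. */
ptrdiff_t local_offset(ptrdiff_t i, ptrdiff_t local_start, ptrdiff_t local_n)
{
    assert(owns_index(i, local_start, local_n));
    return i - local_start;
}
```

For the input array, one would pass the values that fftw_mpi_local_size_1d stored in local_i_start and local_ni; for in-order output, local_o_start and local_no. In the single-process case, where local_ni is n0 and local_i_start is 0, every global index is owned locally and its offset equals the index itself.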

As mentioned above (see Load balancing), the data will be divided equally among the processes if n0 is divisible by the square of the number of processes. In this case, local_ni will equal local_no. Otherwise, they may be different.
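The divisibility condition is easy to check before choosing a transform size. The sketch below is illustrative only: divides_equally is a hypothetical helper, not an FFTW function, and it tests just the condition stated above (n0 divisible by the square of the process count). It does not predict the block sizes FFTW chooses when the condition fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function): returns 1 if n0 is
   divisible by nprocs squared, the condition under which the text
   says the data is divided equally among the processes. */
int divides_equally(ptrdiff_t n0, int nprocs)
{
    /* Widen before multiplying so the square is computed in ptrdiff_t. */
    ptrdiff_t p2 = (ptrdiff_t)nprocs * (ptrdiff_t)nprocs;
    assert(n0 > 0 && nprocs > 0);
    return n0 % p2 == 0;
}
```

For instance, with 4 processes the square is 16, so n0 = 1024 satisfies the condition while n0 = 1000 does not (1000 leaves a remainder of 8). The process count would normally come from MPI_Comm_size on the communicator passed to FFTW.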

For some applications, such as convolutions, the order of the output data is irrelevant. In this case, performance can be improved by specifying that the output data be stored in an FFTW-defined “scrambled” format. (In particular, this is the analogue of transposed output in the multidimensional case: scrambled output saves a communications step.) If you pass FFTW_MPI_SCRAMBLED_OUT in the flags, then the output is stored in this (undocumented) scrambled order. Conversely, to perform the inverse transform of data in scrambled order, pass the FFTW_MPI_SCRAMBLED_IN flag.

In MPI FFTW, only composite sizes n0 can be parallelized; we have not yet implemented a parallel algorithm for large prime sizes.
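An application that picks n0 at run time may want to check for a composite size up front. The following is a rough sketch using plain trial division. The name is_composite_size is hypothetical and not part of FFTW, and how FFTW itself responds to a prime n0 is not covered by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function): returns 1 if n0 is
   composite (has a divisor other than 1 and itself), 0 otherwise.
   Uses trial division up to the square root of n0. */
int is_composite_size(ptrdiff_t n0)
{
    ptrdiff_t d;
    assert(n0 > 0);
    if (n0 < 4)
        return 0; /* 1, 2 and 3 are not composite */
    /* The bound d <= n0 / d is equivalent to d * d <= n0 but cannot
       overflow. */
    for (d = 2; d <= n0 / d; ++d) {
        if (n0 % d == 0)
            return 1;
    }
    return 0;
}
```

Trial division costs on the order of the square root of n0 steps, which is negligible next to the transform itself for typical sizes.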