MPI Plan Creation - FFTW 3.3.3

6.12.5 MPI Plan Creation

Complex-data MPI DFTs

Plans for complex-data DFTs (see 2d MPI example) are created by:

     fftw_plan fftw_mpi_plan_dft_1d(ptrdiff_t n0, fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_2d(ptrdiff_t n0, ptrdiff_t n1,
                                    fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                    fftw_complex *in, fftw_complex *out,
                                    MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_dft(int rnk, const ptrdiff_t *n,
                                 fftw_complex *in, fftw_complex *out,
                                 MPI_Comm comm, int sign, unsigned flags);
     fftw_plan fftw_mpi_plan_many_dft(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t howmany, ptrdiff_t block, ptrdiff_t tblock,
                                      fftw_complex *in, fftw_complex *out,
                                      MPI_Comm comm, int sign, unsigned flags);

These are similar to their serial counterparts (see Complex DFTs) in specifying the dimensions, sign, and flags of the transform. The comm argument gives an MPI communicator that specifies the set of processes that participate in the transform; plan creation is a collective function that must be called by all processes in the communicator. The in and out pointers refer only to a portion of the overall transform data (see MPI Data Distribution), as specified by the ‘local_size’ functions in the previous section. Unless flags contains FFTW_ESTIMATE, these arrays are overwritten during plan creation, as for the serial interface. For multi-dimensional transforms, any dimensions > 1 are supported; for one-dimensional transforms, only composite (non-prime) n0 are currently supported (unlike the serial FFTW). Requesting an unsupported transform size will yield a NULL plan. (As in the serial interface, highly composite sizes generally yield the best performance.)
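To make "a portion of the overall transform data" concrete, the sketch below models which rows of the first dimension each process holds. It is an illustrative helper, not FFTW API: the names `slab`, `default_slab`, and `local_elements` are hypothetical, and the code assumes the default block size is ⌈n0/P⌉ for P processes. In real code, take local_n0 and local_0_start from the ‘local_size’ functions rather than recomputing them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type (not part of FFTW): the rows of dimension 0 owned by
 * one process in a block distribution of the first dimension. */
typedef struct {
    ptrdiff_t local_n0;      /* number of rows owned by this rank */
    ptrdiff_t local_0_start; /* global index of the first owned row */
} slab;

static ptrdiff_t min_pd(ptrdiff_t a, ptrdiff_t b) { return a < b ? a : b; }

/* Hypothetical helper: rows owned by `rank` out of `nprocs`, ASSUMING the
 * default block size is ceil(n0 / nprocs). Trailing ranks may get 0 rows. */
slab default_slab(ptrdiff_t n0, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    ptrdiff_t block = (n0 + nprocs - 1) / nprocs; /* ceil(n0 / nprocs) */
    ptrdiff_t start = min_pd((ptrdiff_t) rank * block, n0);
    ptrdiff_t end = min_pd(start + block, n0);
    slab s = { end - start, start };
    return s;
}

/* Hypothetical helper: number of complex elements in this rank's slab of an
 * n[0] x n[1] x ... x n[rnk-1] array, i.e. local_n0 * n[1] * ... * n[rnk-1]. */
ptrdiff_t local_elements(slab s, int rnk, const ptrdiff_t *n)
{
    ptrdiff_t count = s.local_n0;
    for (int d = 1; d < rnk; ++d)
        count *= n[d];
    return count;
}
```

The alloc_local value returned by the real ‘local_size’ functions can be larger than this count, because the local arrays may also need to hold data in the transposed distribution used at intermediate steps; allocate alloc_local elements, not local_elements().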

The advanced-interface fftw_mpi_plan_many_dft additionally allows you to specify the block sizes for the first dimension (block) of the $n_0 \times n_1 \times n_2 \times \cdots \times n_{d-1}$ input data and the first dimension (tblock) of the $n_1 \times n_0 \times n_2 \times \cdots \times n_{d-1}$ transposed data (at intermediate steps of the transform, and for the output if FFTW_MPI_TRANSPOSED_OUT is specified in flags). These must be the same block sizes as were passed to the corresponding ‘local_size’ function; you can pass FFTW_MPI_DEFAULT_BLOCK to use FFTW's default block size as in the basic interface. Also, the howmany parameter specifies that the transform is of contiguous howmany-tuples rather than individual complex numbers; this corresponds to the same parameter in the serial advanced interface (see Advanced Complex DFTs) with stride = howmany and dist = 1.
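The stride = howmany, dist = 1 correspondence means that the howmany components of each tuple sit next to each other in memory. A sketch of the resulting offset computation, using a hypothetical helper `tuple_offset` (not part of FFTW) and assuming row-major storage with the first index taken relative to local_0_start:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): offset, within one process's local
 * array, of component j of the howmany-tuple at multi-index i[0..rnk-1].
 * i[0] is the LOCAL row index (global row minus local_0_start); the other
 * indices are global. n[0] is not needed for the offset itself. */
ptrdiff_t tuple_offset(int rnk, const ptrdiff_t *n, const ptrdiff_t *i,
                       ptrdiff_t howmany, ptrdiff_t j)
{
    assert(rnk >= 1 && howmany >= 1 && j >= 0 && j < howmany);
    ptrdiff_t offset = i[0];
    for (int d = 1; d < rnk; ++d)
        offset = offset * n[d] + i[d];   /* row-major flattening */
    return offset * howmany + j;         /* tuple components are contiguous */
}
```

Consecutive elements of one of the howmany transforms are therefore howmany apart (the stride), while the same element of consecutive transforms is 1 apart (the dist).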

MPI flags

The flags can be any of those for the serial FFTW (see Planner Flags), and in addition may include one or more of the following MPI-specific flags, which improve performance at the cost of changing the output or input data formats.

FFTW_MPI_SCRAMBLED_OUT, FFTW_MPI_SCRAMBLED_IN: valid for 1d transforms only, these flags indicate that the output/input of the transform is in an undocumented “scrambled” order. A forward FFTW_MPI_SCRAMBLED_OUT transform can be inverted by a backward FFTW_MPI_SCRAMBLED_IN transform (times the usual 1/N normalization). See One-dimensional distributions.

FFTW_MPI_TRANSPOSED_OUT, FFTW_MPI_TRANSPOSED_IN: valid for multidimensional (rnk > 1) transforms only, these flags specify that the output or input of an $n_0 \times n_1 \times n_2 \times \cdots \times n_{d-1}$ transform is transposed to $n_1 \times n_0 \times n_2 \times \cdots \times n_{d-1}$. See Transposed distributions.
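The effect of FFTW_MPI_TRANSPOSED_OUT on where an output element lives can be sketched with a hypothetical helper `output_offset` (not FFTW API). It assumes that, with the flag set, the local output block is indexed from local_1_start along dimension 1 (as returned by the transposed ‘local_size’ functions), and that all remaining dimensions keep their row-major order:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): local offset of output element
 * (k[0], k[1], ..., k[rnk-1]) of an n[0] x n[1] x ... transform.
 * transposed == 0: output stored as n[0] x n[1] x n[2] x ...,
 *                  distributed over dimension 0 (local_start = local_0_start).
 * transposed != 0: models FFTW_MPI_TRANSPOSED_OUT; output stored as
 *                  n[1] x n[0] x n[2] x ..., distributed over dimension 1
 *                  (local_start = local_1_start). */
ptrdiff_t output_offset(int rnk, const ptrdiff_t *n, const ptrdiff_t *k,
                        int transposed, ptrdiff_t local_start)
{
    assert(rnk > 1); /* the TRANSPOSED flags apply to rnk > 1 only */
    ptrdiff_t offset;
    if (transposed)
        offset = (k[1] - local_start) * n[0] + k[0]; /* dims 0 and 1 swapped */
    else
        offset = (k[0] - local_start) * n[1] + k[1];
    for (int d = 2; d < rnk; ++d)
        offset = offset * n[d] + k[d];               /* other dims unchanged */
    return offset;
}
```

Only the first two dimensions trade places, which is why the transposed layout avoids a final global transpose step.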

Real-data MPI DFTs

Plans for real-input/output (r2c/c2r) DFTs (see Multi-dimensional MPI DFTs of Real Data) are created by:

     fftw_plan fftw_mpi_plan_dft_r2c_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                        double *in, fftw_complex *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_r2c(int rnk, const ptrdiff_t *n,
                                     double *in, fftw_complex *out,
                                     MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                        fftw_complex *in, double *out,
                                        MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_dft_c2r(int rnk, const ptrdiff_t *n,
                                     fftw_complex *in, double *out,
                                     MPI_Comm comm, unsigned flags);

Similar to the serial interface (see Real-data DFTs), these transform logically $n_0 \times n_1 \times n_2 \times \cdots \times n_{d-1}$ real data to/from $n_0 \times n_1 \times n_2 \times \cdots \times (\lfloor n_{d-1}/2 \rfloor + 1)$ complex data, representing the non-redundant half of the conjugate-symmetric output of a real-input DFT (see Multi-dimensional Transforms). However, the real array must be stored within a padded $n_0 \times n_1 \times n_2 \times \cdots \times [2(\lfloor n_{d-1}/2 \rfloor + 1)]$ array (much like the in-place serial r2c transforms, but here for out-of-place transforms as well). Currently, only multi-dimensional (rnk > 1) r2c/c2r transforms are supported (requesting a plan for rnk = 1 will yield NULL). As explained above (see Multi-dimensional MPI DFTs of Real Data), the data distribution of both the real and complex arrays is given by the ‘local_size’ function called for the dimensions of the complex array. Similar to the other planning functions, the input and output arrays are overwritten when the plan is created, except in FFTW_ESTIMATE mode.
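The padding rule amounts to simple arithmetic on the last dimension. The following sketch uses hypothetical helper names (not FFTW API) and computes a 2d offset into the padded local real array, assuming row-major storage with the first index taken relative to local_0_start:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not part of FFTW) for the r2c/c2r layout of a
 * logical n0 x n1 x ... x n_{d-1} real array. */

/* Length of the last dimension of the complex array: floor(n/2) + 1. */
ptrdiff_t complex_last_dim(ptrdiff_t n_last) { return n_last / 2 + 1; }

/* Padded length of the last dimension of the real array:
 * 2 * (floor(n/2) + 1) doubles, for in-place AND out-of-place MPI plans. */
ptrdiff_t padded_last_dim(ptrdiff_t n_last) { return 2 * (n_last / 2 + 1); }

/* Offset of real element (i_local, j) in a 2d padded local real array whose
 * logical row length is n1; columns n1 .. padded_last_dim(n1)-1 are padding. */
ptrdiff_t real_offset_2d(ptrdiff_t i_local, ptrdiff_t j, ptrdiff_t n1)
{
    assert(j >= 0 && j < n1); /* only logical columns hold data */
    return i_local * padded_last_dim(n1) + j;
}
```

With this padding, one real row occupies exactly as many bytes as one row of the complex array (floor(n/2) + 1 complex numbers), which is why a single ‘local_size’ call for the complex dimensions can describe both arrays.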

As for the complex DFTs above, there is an advanced interface that allows you to manually specify block sizes and to transform contiguous howmany-tuples of real/complex numbers:

     fftw_plan fftw_mpi_plan_many_dft_r2c
                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                    ptrdiff_t iblock, ptrdiff_t oblock,
                    double *in, fftw_complex *out,
                    MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_many_dft_c2r
                   (int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                    ptrdiff_t iblock, ptrdiff_t oblock,
                    fftw_complex *in, double *out,
                    MPI_Comm comm, unsigned flags);

cannam@95:

MPI r2r transforms

There are corresponding plan-creation routines for r2r transforms (see More DFTs of Real Data), currently supporting multidimensional (rnk > 1) transforms only (rnk = 1 will yield a NULL plan):

     fftw_plan fftw_mpi_plan_r2r_2d(ptrdiff_t n0, ptrdiff_t n1,
                                    double *in, double *out,
                                    MPI_Comm comm,
                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1,
                                    unsigned flags);
     fftw_plan fftw_mpi_plan_r2r_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                    double *in, double *out,
                                    MPI_Comm comm,
                                    fftw_r2r_kind kind0, fftw_r2r_kind kind1, fftw_r2r_kind kind2,
                                    unsigned flags);
     fftw_plan fftw_mpi_plan_r2r(int rnk, const ptrdiff_t *n,
                                 double *in, double *out,
                                 MPI_Comm comm, const fftw_r2r_kind *kind,
                                 unsigned flags);
     fftw_plan fftw_mpi_plan_many_r2r(int rnk, const ptrdiff_t *n,
                                      ptrdiff_t howmany,
                                      ptrdiff_t iblock, ptrdiff_t oblock,
                                      double *in, double *out,
                                      MPI_Comm comm, const fftw_r2r_kind *kind,
                                      unsigned flags);

The parameters are much the same as for the complex DFTs above, except that the arrays are of real numbers (and hence the outputs of the ‘local_size’ data-distribution functions should be interpreted as counts of real rather than complex numbers). Also, the kind parameters specify the r2r kinds along each dimension as for the serial interface (see Real-to-Real Transform Kinds). See Other Multi-dimensional Real-data MPI Transforms.
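Because the kind parameters set the r2r kind independently along each dimension, the scale factor picked up by a forward-then-inverse round trip is a product over dimensions. The sketch below encodes the logical-size rules from the serial documentation (see Real-to-Real Transform Kinds), assuming they carry over unchanged to MPI plans; the enum `r2r_kind` and both functions are hypothetical stand-ins, so use fftw_r2r_kind and the FFTW_* constants in real code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum mirroring the serial r2r kinds (NOT FFTW's own type). */
typedef enum {
    R2HC, HC2R, DHT,
    REDFT00, REDFT01, REDFT10, REDFT11,
    RODFT00, RODFT01, RODFT10, RODFT11
} r2r_kind;

/* Logical transform size N of a 1d r2r transform of n real inputs. */
ptrdiff_t logical_size(r2r_kind kind, ptrdiff_t n)
{
    switch (kind) {
    case REDFT00: assert(n > 1); return 2 * (n - 1);
    case RODFT00: return 2 * (n + 1);
    case R2HC: case HC2R: case DHT: return n;
    default: return 2 * n; /* REDFT01/10/11 and RODFT01/10/11 */
    }
}

/* Scale factor after applying, along every dimension, a transform followed
 * by its inverse kind (e.g. REDFT10 then REDFT01): the product of the
 * per-dimension logical sizes. Divide by this to recover the input. */
ptrdiff_t roundtrip_scale(int rnk, const ptrdiff_t *n, const r2r_kind *kind)
{
    ptrdiff_t scale = 1;
    for (int d = 0; d < rnk; ++d)
        scale *= logical_size(kind[d], n[d]);
    return scale;
}
```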

MPI transposition

FFTW also provides routines to plan a transpose of a distributed n0 by n1 array of real numbers, or an array of howmany-tuples of real numbers with specified block sizes (see FFTW MPI Transposes):

     fftw_plan fftw_mpi_plan_transpose(ptrdiff_t n0, ptrdiff_t n1,
                                       double *in, double *out,
                                       MPI_Comm comm, unsigned flags);
     fftw_plan fftw_mpi_plan_many_transpose
                     (ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                      ptrdiff_t block0, ptrdiff_t block1,
                      double *in, double *out, MPI_Comm comm, unsigned flags);

These plans are used with the fftw_mpi_execute_r2r new-array execute function (see Using MPI Plans), since they count as (rank zero) r2r plans from FFTW's perspective.
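When testing a distributed transpose, a serial reference for the expected result can help. The sketch below uses a hypothetical function `reference_transpose` (not part of FFTW) to compute what an fftw_mpi_plan_many_transpose plan is expected to produce when the communicator contains a single process. It assumes the n0 × n1 array of howmany-tuples is stored in row-major order with each tuple contiguous; with more processes, each rank would hold only its block of input rows (block0) and output rows (block1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial reference (not part of FFTW): out-of-place transpose
 * of an n0 x n1 array of howmany-tuples into an n1 x n0 array of the same
 * tuples. Each tuple stays contiguous; only the two array indices swap. */
void reference_transpose(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t howmany,
                         const double *in, double *out)
{
    assert(in != out); /* this reference is out-of-place only */
    for (ptrdiff_t i = 0; i < n0; ++i)
        for (ptrdiff_t j = 0; j < n1; ++j)
            for (ptrdiff_t k = 0; k < howmany; ++k)
                out[(j * n0 + i) * howmany + k] = in[(i * n1 + j) * howmany + k];
}
```

Comparing the gathered output of the distributed plan against this reference on small sizes is one simple way to check block sizes and layout assumptions.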