As described above (see MPI Data Distribution), before creating a
plan you must first call one of the following routines to determine
the required allocation size and the portion of the array stored
locally on a given process; you need both in order to allocate your
arrays.  The MPI_Comm communicator passed here must be
equivalent to the communicator used below for plan creation.
   
The basic interface for multidimensional transforms consists of the functions:
     ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_3d(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                      MPI_Comm comm,
                                      ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     
     ptrdiff_t fftw_mpi_local_size_2d_transposed(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
     ptrdiff_t fftw_mpi_local_size_3d_transposed(ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2,
                                                 MPI_Comm comm,
                                                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
     ptrdiff_t fftw_mpi_local_size_transposed(int rnk, const ptrdiff_t *n, MPI_Comm comm,
                                              ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                              ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
These functions return the number of elements to allocate (complex
numbers for DFT/r2c/c2r plans, real numbers for r2r plans), whereas
the local_n0 and local_0_start arguments return the portion
(local_0_start to local_0_start + local_n0 - 1) of the
first dimension of an n0 × n1 × n2 × … × n(d-1) array that is stored on the local
process.  See Basic and advanced distribution interfaces.  For
FFTW_MPI_TRANSPOSED_OUT plans, the ‘_transposed’ variants
additionally return, in local_n1 and local_1_start, the local portion
of the first dimension in the n1 × n0 × n2 × … × n(d-1) transposed
output.  See Transposed distributions.

The advanced interface for multidimensional transforms is:
   
     ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                        ptrdiff_t block0, MPI_Comm comm,
                                        ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
     ptrdiff_t fftw_mpi_local_size_many_transposed(int rnk, const ptrdiff_t *n, ptrdiff_t howmany,
                                                   ptrdiff_t block0, ptrdiff_t block1, MPI_Comm comm,
                                                   ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
                                                   ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
   These differ from the basic interface in only two ways.  First, they
allow you to specify block sizes block0 and block1 (the
latter for the transposed output); you can pass
FFTW_MPI_DEFAULT_BLOCK to use FFTW's default block size as in
the basic interface.  Second, you can pass a howmany parameter,
corresponding to the advanced planning interface below: this is for
transforms of contiguous howmany-tuples of numbers
(howmany = 1 in the basic interface).
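
To make the roles of the block size and of howmany more concrete, here is a minimal sketch of how a block distribution could map a process rank to a local portion of a dimension.  It is not FFTW code: the type block_range and the sketch_* functions are hypothetical names introduced only for illustration, and the default block size of ceil(n / P) over P processes is an assumption of this sketch.  The product local_n0 × n1 × howmany is only a lower bound on the allocation; always allocate the count returned by the fftw_mpi_local_size routines, which may be larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration; not part of the FFTW API. */
typedef struct {
    ptrdiff_t local_n;     /* number of indices owned by this process */
    ptrdiff_t local_start; /* first owned index (0 if nothing is owned) */
} block_range;

/* Assumed default block size: ceil(n / nprocs). */
ptrdiff_t sketch_default_block(ptrdiff_t n, int nprocs)
{
    assert(n > 0 && nprocs > 0);
    return (n + nprocs - 1) / nprocs;
}

/* Portion of a dimension of length n owned by process `rank` when the
   dimension is cut into consecutive blocks of size `block`. */
block_range sketch_block_range(ptrdiff_t n, ptrdiff_t block, int rank)
{
    block_range r = {0, 0};
    ptrdiff_t start, remaining;

    assert(n > 0 && block > 0 && rank >= 0);
    start = (ptrdiff_t)rank * block;
    remaining = n - start;
    if (remaining > 0) {
        /* The last non-empty block may be shorter than `block`. */
        r.local_n = remaining < block ? remaining : block;
        r.local_start = start;
    }
    /* Otherwise this process owns nothing; the sketch reports (0, 0). */
    return r;
}

/* Lower bound on the number of elements in the local slab of an
   n0 x n1 array of howmany-tuples; the real routines may return more. */
ptrdiff_t sketch_min_local_elements(block_range rows, ptrdiff_t n1,
                                    ptrdiff_t howmany)
{
    return rows.local_n * n1 * howmany;
}
```

Under these assumptions, a first dimension of n0 = 10 split over 4 processes gets a block size of 3, so ranks 0 to 2 hold 3 rows each and rank 3 holds the remaining row.  For a transposed output, the same calculation would apply to n1 with block1 in place of block0.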
   
The corresponding basic and advanced routines for one-dimensional transforms (currently only complex DFTs) are:
     ptrdiff_t fftw_mpi_local_size_1d(
                  ptrdiff_t n0, MPI_Comm comm, int sign, unsigned flags,
                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
     ptrdiff_t fftw_mpi_local_size_many_1d(
                  ptrdiff_t n0, ptrdiff_t howmany,
                  MPI_Comm comm, int sign, unsigned flags,
                  ptrdiff_t *local_ni, ptrdiff_t *local_i_start,
                  ptrdiff_t *local_no, ptrdiff_t *local_o_start);
   As above, the return value is the number of elements to allocate
(complex numbers, for complex DFTs).  The local_ni and
local_i_start arguments return the portion
(local_i_start to local_i_start + local_ni - 1) of the
1d array that is stored on this process for the transform
input, and local_no and local_o_start are the
corresponding quantities for the output.  The sign
(FFTW_FORWARD or FFTW_BACKWARD) and flags must
match the arguments passed when creating a plan.  Although the inputs
and outputs have different data distributions in general, it is
guaranteed that the output data distribution of an
FFTW_FORWARD plan will match the input data distribution
of an FFTW_BACKWARD plan and vice versa; similarly for the
FFTW_MPI_SCRAMBLED_OUT and FFTW_MPI_SCRAMBLED_IN flags. 
See One-dimensional distributions.
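
Since the input and output portions generally differ, it can help to keep the four returned values together and translate global indices separately for each side.  The following is a minimal sketch of that bookkeeping, not FFTW code: the struct local_1d and the sketch_* functions are hypothetical names, and the mapping assumes that neither FFTW_MPI_SCRAMBLED_IN nor FFTW_MPI_SCRAMBLED_OUT is used, so that local element i corresponds to global index local_i_start + i (or local_o_start + i for the output).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the values that fftw_mpi_local_size_1d
   returns through its pointer arguments. */
typedef struct {
    ptrdiff_t local_ni, local_i_start; /* input portion  */
    ptrdiff_t local_no, local_o_start; /* output portion */
} local_1d;

/* Shared helper: offset of global index g within [start, start + n),
   or -1 if g lies outside that range. */
ptrdiff_t sketch_offset(ptrdiff_t start, ptrdiff_t n, ptrdiff_t g)
{
    if (g < start || g >= start + n)
        return -1;
    return g - start;
}

/* Local offset of global input index g, or -1 if this process does
   not store it in the transform input. */
ptrdiff_t sketch_input_offset(const local_1d *d, ptrdiff_t g)
{
    assert(d != NULL);
    return sketch_offset(d->local_i_start, d->local_ni, g);
}

/* Local offset of global output index g, or -1 if this process does
   not store it in the transform output. */
ptrdiff_t sketch_output_offset(const local_1d *d, ptrdiff_t g)
{
    assert(d != NULL);
    return sketch_offset(d->local_o_start, d->local_no, g);
}
```

With made-up values local_ni = 4, local_i_start = 8, local_no = 3 and local_o_start = 6, global index 8 is the first local input element but the third local output element, which is why the two sides need separate lookups.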