We close this section by noting that FFTW’s MPI transpose routines can be thought of as a generalization of the MPI_Alltoall function (albeit only for floating-point types), and in some circumstances can function as an improved replacement.
MPI_Alltoall is defined by the MPI standard as:
int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                 MPI_Comm comm);
In particular, for double* arrays in and out, consider the call:
MPI_Alltoall(in, howmany, MPI_DOUBLE, out, howmany, MPI_DOUBLE, comm);
This is completely equivalent to:
MPI_Comm_size(comm, &P);
plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1,
                                    in, out, comm, FFTW_ESTIMATE);
fftw_execute(plan);
fftw_destroy_plan(plan);
That is, computing a P × P transpose on P processes, with a block size of 1, is just a standard all-to-all communication.
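As a concrete illustration, here is a minimal sketch of a complete program built around this equivalence; the value of howmany and the initialization step are placeholders chosen for the example:

#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t howmany = 4; /* doubles exchanged per process pair */
    int P;
    double *in, *out;
    fftw_plan plan;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    /* each process exchanges howmany doubles with each of the P
       processes, so the local buffers hold P * howmany doubles */
    in = fftw_alloc_real(P * howmany);
    out = fftw_alloc_real(P * howmany);

    /* a P x P transpose with block size 1 on P processes: the same
       data motion as MPI_Alltoall(in, howmany, MPI_DOUBLE,
                                   out, howmany, MPI_DOUBLE, ...) */
    plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1,
                                        in, out, MPI_COMM_WORLD,
                                        FFTW_ESTIMATE);

    /* ... initialize in[] ... */

    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    fftw_mpi_cleanup();
    MPI_Finalize();
    return 0;
}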
However, using the FFTW routine instead of MPI_Alltoall may have certain advantages. First of all, FFTW’s routine can operate in-place (in == out) whereas MPI_Alltoall can only operate out-of-place.
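For example, reusing P, howmany, and comm from above, an in-place exchange is simply a matter of passing the same buffer twice (a sketch; buf is a name introduced for this example):

double *buf = fftw_alloc_real(P * howmany);
/* pass the same pointer as both input and output: FFTW performs the
   exchange in-place, which MPI_Alltoall cannot do */
plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1,
                                    buf, buf, comm, FFTW_ESTIMATE);
fftw_execute(plan);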
Second, even for out-of-place plans, FFTW’s routine may be faster, especially if you need to perform the all-to-all communication many times and can afford to use FFTW_MEASURE or FFTW_PATIENT. It should certainly be no slower, not including the time to create the plan, since one of the possible algorithms that FFTW uses for an out-of-place transpose is simply to call MPI_Alltoall. However, FFTW also considers several other possible algorithms that, depending on your MPI implementation and your hardware, may be faster.
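For instance, a communication that is repeated many times might be planned once with FFTW_MEASURE and then executed repeatedly, amortizing the planning cost (a sketch; niter and the data-update step are placeholders). Note that FFTW_MEASURE overwrites the in and out buffers while planning, so the arrays should be initialized after the plan is created:

/* plan once: slower to create, but FFTW times several all-to-all
   algorithms and keeps the fastest one for this machine */
plan = fftw_mpi_plan_many_transpose(P, P, howmany, 1, 1,
                                    in, out, comm, FFTW_MEASURE);

for (int iter = 0; iter < niter; ++iter) {
    /* ... fill in[] with this iteration's data ... */
    fftw_execute(plan); /* the same plan is reused for every exchange */
}
fftw_destroy_plan(plan);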