FFTW 3.3.8: FFTW MPI Performance Tips


6.10 FFTW MPI Performance Tips

In this section, we collect a few tips on getting the best performance out of FFTW’s MPI transforms.

First, because of the 1d block distribution, FFTW’s parallelization is currently limited by the size of the first dimension. (Multidimensional block distributions may be supported by a future version.) More generally, you should ideally arrange the dimensions so that FFTW can divide them equally among the processes. See Load balancing.
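
To make the load-balancing point concrete, here is a minimal sketch in plain C that models a 1d block distribution. It assumes that each process receives a block of ceil(n0 / nprocs) rows, which is our reading of the default discussed in Load balancing; the function names `rows_on_rank` and `idle_ranks` are hypothetical and are not part of FFTW. Real code should ask FFTW for the actual distribution with the `fftw_mpi_local_size` family of routines rather than rely on this model.

```c
#include <stddef.h>

/* Hypothetical model, NOT an FFTW routine: number of rows of the first
   dimension (length n0) owned by `rank` when the rows are split across
   `nprocs` processes in blocks of ceil(n0 / nprocs) rows.
   The ceil(n0 / nprocs) block size is an assumption of this sketch. */
ptrdiff_t rows_on_rank(ptrdiff_t n0, int nprocs, int rank)
{
    ptrdiff_t block = (n0 + nprocs - 1) / nprocs; /* ceil(n0 / nprocs) */
    ptrdiff_t start = (ptrdiff_t)rank * block;    /* first row of this rank */

    if (start >= n0)
        return 0;                 /* nothing left: this rank holds no data */
    if (n0 - start < block)
        return n0 - start;        /* last non-empty rank gets the remainder */
    return block;
}

/* Hypothetical model: how many ranks end up with no rows at all. */
int idle_ranks(ptrdiff_t n0, int nprocs)
{
    int idle = 0;
    for (int rank = 0; rank < nprocs; ++rank)
        if (rows_on_rank(n0, nprocs, rank) == 0)
            ++idle;
    return idle;
}
```

Under this model, a first dimension of 100 on 64 processes leaves 14 processes without data, whereas a first dimension of 128 keeps all 64 busy with 2 rows each. Choosing a first dimension that is a multiple of the process count avoids the imbalance.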

Second, if it is not too inconvenient, you should consider working with transposed output for multidimensional plans, as this saves a considerable amount of communication. See Transposed distributions.

Third, the fastest choices are generally either an in-place transform or an out-of-place transform with the FFTW_DESTROY_INPUT flag (which allows the input array to be used as scratch space). In-place is especially beneficial if the amount of data per process is large.

Fourth, if you have multiple arrays to transform at once, rather than calling FFTW’s MPI transforms several times, it usually seems to be faster to interleave the data and use the advanced interface. (This groups the communications together instead of requiring separate messages for each transform.)
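
The interleaved layout can be sketched in plain C as follows. The helper `interleave` is hypothetical and not part of FFTW. It uses the C99 `double complex` type and assumes the separate arrays all have the same local length. The packed buffer is laid out the way we understand the advanced interface (for example `fftw_mpi_plan_many_dft` with its `howmany` argument) to expect: contiguous `howmany`-tuples, one tuple per array position.

```c
#include <stddef.h>
#include <complex.h>

/* Hypothetical helper, NOT an FFTW routine: pack `howmany` separate arrays
   of `len` elements each into one interleaved buffer, so that element j of
   array k lands at dst[j * howmany + k]. `dst` must have room for
   howmany * len elements. */
void interleave(const double complex *const *src, size_t howmany,
                size_t len, double complex *dst)
{
    for (size_t j = 0; j < len; ++j)          /* position within each array */
        for (size_t k = 0; k < howmany; ++k)  /* which array */
            dst[j * howmany + k] = src[k][j];
}
```

With two arrays {1, 2, 3} and {10, 20, 30}, the packed buffer reads 1, 10, 2, 20, 3, 30. A single plan created with `howmany` set to 2 can then transform both arrays at once, instead of one plan execution per array.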
