6.10 FFTW MPI Performance Tips

In this section, we collect a few tips on getting the best performance out of FFTW’s MPI transforms.

First, because of the 1d block distribution, FFTW’s parallelization is currently limited by the size of the first dimension.  (Multidimensional block distributions may be supported by a future version.)  More generally, you should ideally arrange the dimensions so that FFTW can divide them equally among the processes. See Load balancing.
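
As a quick check, you can query how the first dimension is divided with fftw_mpi_local_size_2d.  The following is a minimal sketch (the dimensions N0 and N1 and the helper name report_local_block are hypothetical, not part of FFTW) that prints each process’s share of the rows:

    #include <stdio.h>
    #include <fftw3-mpi.h>

    /* Hypothetical helper: print which rows of the first dimension this
       process owns for an N0 x N1 complex DFT. */
    static void report_local_block(ptrdiff_t N0, ptrdiff_t N1, MPI_Comm comm)
    {
         ptrdiff_t local_n0, local_0_start;
         ptrdiff_t alloc_local =
              fftw_mpi_local_size_2d(N0, N1, comm,
                                     &local_n0, &local_0_start);
         int rank;
         MPI_Comm_rank(comm, &rank);
         printf("rank %d: rows %td .. %td (%td rows), alloc_local = %td\n",
                rank, local_0_start, local_0_start + local_n0 - 1,
                local_n0, alloc_local);
    }

If some processes report far fewer rows than others (or none at all), the first dimension is too small or not evenly divisible by the number of processes, and the load will be unbalanced.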

Second, if it is not too inconvenient, you should consider working with transposed output for multidimensional plans, as this saves a considerable amount of communication. See Transposed distributions.
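
For instance, a 2d in-place complex DFT could be planned with transposed output roughly as follows (a sketch; the sizes N0 and N1 and the function name plan_transposed are placeholders).  The array must then be sized with fftw_mpi_local_size_2d_transposed, and the output is stored as an N1 x N0 array distributed along N1:

    #include <fftw3-mpi.h>

    /* Sketch: plan an in-place forward DFT of N0 x N1 whose output is
       left transposed (N1 x N0, distributed along the N1 dimension). */
    static fftw_plan plan_transposed(ptrdiff_t N0, ptrdiff_t N1,
                                     MPI_Comm comm, fftw_complex **data)
    {
         ptrdiff_t local_n0, local_0_start, local_n1, local_1_start;
         ptrdiff_t alloc_local =
              fftw_mpi_local_size_2d_transposed(N0, N1, comm,
                                                &local_n0, &local_0_start,
                                                &local_n1, &local_1_start);
         *data = fftw_alloc_complex(alloc_local);
         return fftw_mpi_plan_dft_2d(N0, N1, *data, *data, comm,
                                     FFTW_FORWARD,
                                     FFTW_MEASURE | FFTW_MPI_TRANSPOSED_OUT);
    }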

Third, the fastest choices are generally either an in-place transform or an out-of-place transform with the FFTW_DESTROY_INPUT flag (which allows the input array to be used as scratch space).  In-place is especially beneficial if the amount of data per process is large.
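
The two flavors look like this in a sketch (the sizes N0 and N1 and the helper name make_plans are hypothetical; ‘in’ and ‘out’ are each assumed to hold the alloc_local complex values returned by fftw_mpi_local_size_2d):

    #include <fftw3-mpi.h>

    /* Sketch: the two generally-fastest planning choices. */
    static void make_plans(ptrdiff_t N0, ptrdiff_t N1,
                           fftw_complex *in, fftw_complex *out,
                           MPI_Comm comm)
    {
         /* In-place: input and output are the same array. */
         fftw_plan p_in_place =
              fftw_mpi_plan_dft_2d(N0, N1, in, in, comm,
                                   FFTW_FORWARD, FFTW_MEASURE);

         /* Out-of-place: FFTW_DESTROY_INPUT permits FFTW to use 'in'
            as scratch space during the transform. */
         fftw_plan p_out_of_place =
              fftw_mpi_plan_dft_2d(N0, N1, in, out, comm,
                                   FFTW_FORWARD,
                                   FFTW_MEASURE | FFTW_DESTROY_INPUT);

         /* ...execute with fftw_execute, then destroy as usual... */
         fftw_destroy_plan(p_in_place);
         fftw_destroy_plan(p_out_of_place);
    }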

Fourth, if you have multiple arrays to transform at once, rather than calling FFTW’s MPI transforms several times, it usually seems to be faster to interleave the data and use the advanced interface.  (This groups the communications together instead of requiring separate messages for each transform.)
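
A sketch of this approach for a batch of in-place 2d complex DFTs (the names N0, N1, howmany, and plan_batch are placeholders): the howmany values belonging to each grid point are stored contiguously, and a single plan transforms them all:

    #include <fftw3-mpi.h>

    /* Sketch: plan 'howmany' interleaved in-place forward DFTs of size
       N0 x N1 with a single MPI plan. */
    static fftw_plan plan_batch(ptrdiff_t N0, ptrdiff_t N1,
                                ptrdiff_t howmany, MPI_Comm comm,
                                fftw_complex **data)
    {
         const ptrdiff_t n[2] = { N0, N1 };
         ptrdiff_t local_n0, local_0_start;
         ptrdiff_t alloc_local =
              fftw_mpi_local_size_many(2, n, howmany,
                                       FFTW_MPI_DEFAULT_BLOCK, comm,
                                       &local_n0, &local_0_start);
         *data = fftw_alloc_complex(alloc_local);
         return fftw_mpi_plan_many_dft(2, n, howmany,
                                       FFTW_MPI_DEFAULT_BLOCK,
                                       FFTW_MPI_DEFAULT_BLOCK,
                                       *data, *data, comm,
                                       FFTW_FORWARD, FFTW_MEASURE);
    }

With this arrangement, the howmany transforms share one block distribution and one communication step, rather than requiring a separate exchange for each transform.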
