FFTW 3.3.5: Load balancing


6.4.2 Load balancing


Ideally, when you parallelize a transform over some P processes, each process should end up with work that takes equal time. Otherwise, all of the processes end up waiting on whichever process is slowest. This goal is known as “load balancing.” In this section, we describe the circumstances under which FFTW is able to load-balance well, and in particular how you should choose your transform size in order to load balance.
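To make the cost of an uneven split concrete, here is a minimal sketch that counts how many rows each process would own. It assumes, purely for illustration, that the n0 rows are handed out in contiguous blocks of ceil(n0/P) rows; the function name block_rows is hypothetical and is not part of the FFTW API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function): number of rows owned by
   process `rank` when n0 rows are dealt out in contiguous blocks of
   ceil(n0 / p) rows. Trailing processes may get a short block or none. */
ptrdiff_t block_rows(ptrdiff_t n0, int p, int rank)
{
    assert(n0 >= 0 && p > 0 && rank >= 0 && rank < p);

    ptrdiff_t block = (n0 + p - 1) / p;   /* ceil(n0 / p) */
    ptrdiff_t start = block * rank;       /* first row of this block */

    if (start >= n0)
        return 0;                         /* nothing left for this rank */

    ptrdiff_t left = n0 - start;          /* rows remaining from start */
    return left < block ? left : block;
}
```

Under this assumed distribution, n0 = 10 over P = 4 gives the four processes 3, 3, 3, and 1 rows, so the last process finishes early and waits, whereas n0 = 12 gives every process exactly 3 rows.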


Load balancing is especially difficult when you are parallelizing over heterogeneous machines; for example, if one of your processors is an old 486 and another is a Pentium IV, obviously you should give the Pentium more work to do than the 486, since the latter is much slower. FFTW does not deal with this problem, however: it assumes that your processes run on hardware of comparable speed, and that the goal is therefore to divide the problem as equally as possible.


For a multi-dimensional complex DFT, FFTW can divide the problem equally among the processes if: (i) the first dimension n0 is divisible by P; and (ii) the product of the subsequent dimensions is divisible by P. (For the advanced interface, where you can specify multiple simultaneous transforms via some “vector” length howmany, a factor of howmany is included in the product of the subsequent dimensions.)
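The two conditions above can be checked before planning a transform. The following is only a sketch: multidim_divides_equally is a hypothetical helper, not an FFTW function, and it assumes the dimensions are given as an array n[0..rnk-1] with howmany set to 1 when the basic interface is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function). Returns 1 if
   (i)  n[0] is divisible by p, and
   (ii) howmany * n[1] * ... * n[rnk-1] is divisible by p;
   returns 0 otherwise. */
int multidim_divides_equally(int rnk, const ptrdiff_t *n,
                             ptrdiff_t howmany, int p)
{
    assert(rnk >= 2 && n != NULL && howmany >= 1 && p > 0);

    /* Condition (i): the first dimension. */
    if (n[0] % p != 0)
        return 0;

    /* Condition (ii): only the remainder modulo p is tracked, so the
       full product is never formed and cannot overflow. */
    ptrdiff_t rem = howmany % p;
    for (int i = 1; i < rnk; ++i)
        rem = (rem * (n[i] % p)) % p;

    return rem == 0;
}
```

For example, with P = 8 a 128 × 96 × 10 transform passes both tests, while 128 × 3 × 5 fails condition (ii) because 15 is not divisible by 8. The same 128 × 3 × 5 shape passes once howmany = 8 transforms are batched, since the factor of howmany enters the product.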


For a one-dimensional complex DFT, the length N of the data should be divisible by P squared to be able to divide the problem equally among the processes.
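The one-dimensional rule is easy to get wrong, because N being divisible by P is not enough. A minimal sketch of the check follows; onedim_divides_equally is a hypothetical name, not an FFTW function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function). Returns 1 if the
   one-dimensional length n is divisible by p squared, else 0. */
int onedim_divides_equally(ptrdiff_t n, int p)
{
    assert(n > 0 && p > 0);

    /* Widen before multiplying so p * p is computed in ptrdiff_t. */
    ptrdiff_t p2 = (ptrdiff_t)p * (ptrdiff_t)p;
    return n % p2 == 0;
}
```

For example, N = 1024 satisfies the rule for P = 4 (since 16 divides 1024) but not for P = 64: although 64 divides 1024, 64 squared is 4096, which does not.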
