Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: FFTW 3.3.8: Load balancing Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: Chris@82:
Chris@82:

Chris@82: Next: , Previous: , Up: MPI Data Distribution   [Contents][Index]

Chris@82:
Chris@82:
Chris@82: Chris@82:

6.4.2 Load balancing

Chris@82: Chris@82: Chris@82:

Ideally, when you parallelize a transform over some P Chris@82: processes, each process should end up with work that takes equal time. Chris@82: Otherwise, all of the processes end up waiting on whichever process is Chris@82: slowest. This goal is known as “load balancing.” In this section, Chris@82: we describe the circumstances under which FFTW is able to load-balance Chris@82: well, and in particular how you should choose your transform size in Chris@82: order to load balance. Chris@82:

Chris@82:

Load balancing is especially difficult when you are parallelizing over Chris@82: heterogeneous machines; for example, if one of your processors is a Chris@82: old 486 and another is a Pentium IV, obviously you should give the Chris@82: Pentium more work to do than the 486 since the latter is much slower. Chris@82: FFTW does not deal with this problem, however—it assumes that your Chris@82: processes run on hardware of comparable speed, and that the goal is Chris@82: therefore to divide the problem as equally as possible. Chris@82:

Chris@82:

For a multi-dimensional complex DFT, FFTW can divide the problem Chris@82: equally among the processes if: (i) the first dimension Chris@82: n0 is divisible by P; and (ii), the product of Chris@82: the subsequent dimensions is divisible by P. (For the advanced Chris@82: interface, where you can specify multiple simultaneous transforms via Chris@82: some “vector” length howmany, a factor of howmany is Chris@82: included in the product of the subsequent dimensions.) Chris@82:

Chris@82:

For a one-dimensional complex DFT, the length N of the data Chris@82: should be divisible by P squared to be able to divide Chris@82: the problem equally among the processes. Chris@82:

Chris@82: Chris@82: Chris@82: Chris@82: Chris@82: