cannam@95: <html lang="en">
cannam@95: <head>
cannam@95: <title>Load balancing - FFTW 3.3.3</title>
cannam@95: <meta http-equiv="Content-Type" content="text/html">
cannam@95: <meta name="description" content="FFTW 3.3.3">
cannam@95: <meta name="generator" content="makeinfo 4.13">
cannam@95: <link title="Top" rel="start" href="index.html#Top">
cannam@95: <link rel="up" href="MPI-Data-Distribution.html#MPI-Data-Distribution" title="MPI Data Distribution">
cannam@95: <link rel="prev" href="Basic-and-advanced-distribution-interfaces.html#Basic-and-advanced-distribution-interfaces" title="Basic and advanced distribution interfaces">
cannam@95: <link rel="next" href="Transposed-distributions.html#Transposed-distributions" title="Transposed distributions">
cannam@95: <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
cannam@95: <!--
cannam@95: This manual is for FFTW
cannam@95: (version 3.3.3, 25 November 2012).
cannam@95: 
cannam@95: Copyright (C) 2003 Matteo Frigo.
cannam@95: 
cannam@95: Copyright (C) 2003 Massachusetts Institute of Technology.
cannam@95: 
cannam@95:      Permission is granted to make and distribute verbatim copies of
cannam@95:      this manual provided the copyright notice and this permission
cannam@95:      notice are preserved on all copies.
cannam@95: 
cannam@95:      Permission is granted to copy and distribute modified versions of
cannam@95:      this manual under the conditions for verbatim copying, provided
cannam@95:      that the entire resulting derived work is distributed under the
cannam@95:      terms of a permission notice identical to this one.
cannam@95: 
cannam@95:      Permission is granted to copy and distribute translations of this
cannam@95:      manual into another language, under the above conditions for
cannam@95:      modified versions, except that this permission notice may be
cannam@95:      stated in a translation approved by the Free Software Foundation.
cannam@95:    -->
cannam@95: <meta http-equiv="Content-Style-Type" content="text/css">
cannam@95: <style type="text/css"><!--
cannam@95:   pre.display { font-family:inherit }
cannam@95:   pre.format  { font-family:inherit }
cannam@95:   pre.smalldisplay { font-family:inherit; font-size:smaller }
cannam@95:   pre.smallformat  { font-family:inherit; font-size:smaller }
cannam@95:   pre.smallexample { font-size:smaller }
cannam@95:   pre.smalllisp    { font-size:smaller }
cannam@95:   span.sc    { font-variant:small-caps }
cannam@95:   span.roman { font-family:serif; font-weight:normal; } 
cannam@95:   span.sansserif { font-family:sans-serif; font-weight:normal; } 
cannam@95: --></style>
cannam@95: </head>
cannam@95: <body>
cannam@95: <div class="node">
cannam@95: <a name="Load-balancing"></a>
cannam@95: <p>
cannam@95: Next:&nbsp;<a rel="next" accesskey="n" href="Transposed-distributions.html#Transposed-distributions">Transposed distributions</a>,
cannam@95: Previous:&nbsp;<a rel="previous" accesskey="p" href="Basic-and-advanced-distribution-interfaces.html#Basic-and-advanced-distribution-interfaces">Basic and advanced distribution interfaces</a>,
cannam@95: Up:&nbsp;<a rel="up" accesskey="u" href="MPI-Data-Distribution.html#MPI-Data-Distribution">MPI Data Distribution</a>
cannam@95: <hr>
cannam@95: </div>
cannam@95: 
cannam@95: <h4 class="subsection">6.4.2 Load balancing</h4>
cannam@95: 
cannam@95: <p><a name="index-load-balancing-378"></a>
cannam@95: Ideally, when you parallelize a transform over some P
cannam@95: processes, each process should end up with work that takes equal time. 
cannam@95: Otherwise, all of the processes end up waiting on whichever process is
cannam@95: slowest.  This goal is known as &ldquo;load balancing.&rdquo;  In this section,
cannam@95: we describe the circumstances under which FFTW is able to load-balance
cannam@95: well, and in particular how you should choose your transform size in
cannam@95: order to load-balance.
cannam@95: 
cannam@95:    <p>Load balancing is especially difficult when you are parallelizing over
cannam@95: heterogeneous machines; for example, if one of your processors is an
cannam@95: old 486 and another is a Pentium IV, you should give the
cannam@95: Pentium more work to do than the 486, since the latter is much slower. 
cannam@95: FFTW does not deal with this problem, however: it assumes that your
cannam@95: processes run on hardware of comparable speed, and that the goal is
cannam@95: therefore to divide the problem as equally as possible.
cannam@95: 
cannam@95:    <p>For a multi-dimensional complex DFT, FFTW can divide the problem
cannam@95: equally among the processes if: (i) the <em>first</em> dimension
cannam@95: <code>n0</code> is divisible by P; and (ii) the <em>product</em> of
cannam@95: the subsequent dimensions is divisible by P.  (For the advanced
cannam@95: interface, where you can specify multiple simultaneous transforms via
cannam@95: some &ldquo;vector&rdquo; length <code>howmany</code>, a factor of <code>howmany</code> is
cannam@95: included in the product of the subsequent dimensions.)
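
   <p>As a minimal sketch of the two conditions above, the following C helper
checks whether a given set of dimensions satisfies them.  The function name
<code>mpi_dft_divides_equally</code> and its arguments are hypothetical and are
not part of the FFTW API; the helper only performs the divisibility arithmetic,
and it assumes a rank of at least 2, with <code>P</code> equal to the number of
processes and <code>howmany</code> set to 1 for the basic interface.

```c
#include <stddef.h>

/* Hypothetical helper (not an FFTW function).
 * Returns 1 if (i) n[0] is divisible by P and (ii) the product
 * howmany * n[1] * ... * n[rank-1] is divisible by P; returns 0 otherwise. */
int mpi_dft_divides_equally(int rank, const ptrdiff_t *n,
                            ptrdiff_t howmany, ptrdiff_t P)
{
    ptrdiff_t rest;
    int i;

    if (rank < 2 || P <= 0 || howmany <= 0)
        return 0;

    /* Condition (i): the first dimension. */
    if (n[0] % P != 0)
        return 0;

    /* Condition (ii): the product of the remaining dimensions, including
     * howmany.  Reducing modulo P at every step keeps the intermediate
     * values below P * P instead of forming the full product. */
    rest = howmany % P;
    for (i = 1; i < rank; ++i)
        rest = (rest * (n[i] % P)) % P;

    return rest == 0;
}
```

   <p>For example, with P = 8 a 128 by 96 by 64 transform passes both tests,
whereas a 100 by 30 transform fails the first one because 100 is not a
multiple of 8.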
cannam@95: 
cannam@95:    <p>For a one-dimensional complex DFT, the length <code>N</code> of the data
cannam@95: should be divisible by P <em>squared</em> to be able to divide
cannam@95: the problem equally among the processes.
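
   <p>The one-dimensional condition can be sketched the same way.  The helper
name <code>mpi_dft_1d_divides_equally</code> is again hypothetical and not part
of the FFTW API; it simply tests whether <code>N</code> is a multiple of P
squared.

```c
#include <stddef.h>

/* Hypothetical helper (not an FFTW function).
 * Returns 1 if N is divisible by P * P, and 0 otherwise. */
int mpi_dft_1d_divides_equally(ptrdiff_t N, ptrdiff_t P)
{
    if (N <= 0 || P <= 0)
        return 0;

    /* Divide by P twice rather than computing P * P, so that a large P
     * cannot overflow the intermediate product. */
    if (N % P != 0)
        return 0;
    return (N / P) % P == 0;
}
```

   <p>For P = 4, a length of 1024 qualifies because 1024 is a multiple of 16,
while a length of 24 does not, even though 24 is itself a multiple of 4.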
cannam@95: 
cannam@95:    </body></html>
cannam@95: