<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This manual is for FFTW
(version 3.3.5, 30 July 2016).

Copyright (C) 2003 Matteo Frigo.

Copyright (C) 2003 Massachusetts Institute of Technology.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Free Software Foundation.
-->
<!-- Created by GNU Texinfo 5.2, http://www.gnu.org/software/texinfo/ -->
<head>
<title>FFTW 3.3.5: FFTW MPI Performance Tips</title>

<meta name="description" content="FFTW 3.3.5: FFTW MPI Performance Tips">
<meta name="keywords" content="FFTW 3.3.5: FFTW MPI Performance Tips">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Distributed_002dmemory-FFTW-with-MPI.html#Distributed_002dmemory-FFTW-with-MPI" rel="up" title="Distributed-memory FFTW with MPI">
<link href="Combining-MPI-and-Threads.html#Combining-MPI-and-Threads" rel="next" title="Combining MPI and Threads">
<link href="Avoiding-MPI-Deadlocks.html#Avoiding-MPI-Deadlocks" rel="prev" title="Avoiding MPI Deadlocks">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: 
serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="FFTW-MPI-Performance-Tips"></a>
<div class="header">
<p>
Next: <a href="Combining-MPI-and-Threads.html#Combining-MPI-and-Threads" accesskey="n" rel="next">Combining MPI and Threads</a>, Previous: <a href="Avoiding-MPI-Deadlocks.html#Avoiding-MPI-Deadlocks" accesskey="p" rel="prev">Avoiding MPI Deadlocks</a>, Up: <a href="Distributed_002dmemory-FFTW-with-MPI.html#Distributed_002dmemory-FFTW-with-MPI" accesskey="u" rel="up">Distributed-memory FFTW with MPI</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="FFTW-MPI-Performance-Tips-1"></a>
<h3 class="section">6.10 FFTW MPI Performance Tips</h3>

<p>In this section, we collect a few tips on getting the best performance
out of FFTW’s MPI transforms.
</p>
<p>First, because of the 1d block distribution, FFTW’s parallelization is
currently limited by the size of the first dimension.
(Multidimensional block distributions may be supported by a future
version.) More generally, you should ideally arrange the dimensions so
that FFTW can divide them equally among the processes. See <a href="Load-balancing.html#Load-balancing">Load balancing</a>.
<a name="index-block-distribution-2"></a>
<a name="index-load-balancing-1"></a>
</p>

<p>Second, if it is not too inconvenient, you should consider working
with transposed output for multidimensional plans, as this saves a
considerable amount of communications. See <a href="Transposed-distributions.html#Transposed-distributions">Transposed distributions</a>.
<a name="index-transpose-3"></a>
</p>

<p>Third, the fastest choices are generally either an in-place transform
or an out-of-place transform with the <code>FFTW_DESTROY_INPUT</code> flag
(which allows the input array to be used as scratch space). In-place
is especially beneficial if the amount of data per process is large.
<a name="index-FFTW_005fDESTROY_005fINPUT-1"></a>
</p>

<p>Fourth, if you have multiple arrays to transform at once, rather than
calling FFTW’s MPI transforms several times it usually seems to be
faster to interleave the data and use the advanced interface. (This
groups the communications together instead of requiring separate
messages for each transform.)
</p>



</body>
</html>