<!--
annotate src/fftw-3.3.3/doc/html/MPI-Data-Distribution.html @ 83:ae30d91d2ffe
Replace these with versions built using an older toolset (so as to avoid
ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author: Chris Cannam
date:   Fri, 07 Feb 2020 11:51:13 +0000
parent: 37bf6b4a2645
-->
<html lang="en">
<head>
<title>MPI Data Distribution - FFTW 3.3.3</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="FFTW 3.3.3">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Distributed_002dmemory-FFTW-with-MPI.html#Distributed_002dmemory-FFTW-with-MPI" title="Distributed-memory FFTW with MPI">
<link rel="prev" href="2d-MPI-example.html#g_t2d-MPI-example" title="2d MPI example">
<link rel="next" href="Multi_002ddimensional-MPI-DFTs-of-Real-Data.html#Multi_002ddimensional-MPI-DFTs-of-Real-Data" title="Multi-dimensional MPI DFTs of Real Data">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This manual is for FFTW
(version 3.3.3, 25 November 2012).

Copyright (C) 2003 Matteo Frigo.

Copyright (C) 2003 Massachusetts Institute of Technology.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission
notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided
that the entire resulting derived work is distributed under the
terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for
modified versions, except that this permission notice may be
stated in a translation approved by the Free Software Foundation.
-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
</head>
<body>
<div class="node">
<a name="MPI-Data-Distribution"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="Multi_002ddimensional-MPI-DFTs-of-Real-Data.html#Multi_002ddimensional-MPI-DFTs-of-Real-Data">Multi-dimensional MPI DFTs of Real Data</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="2d-MPI-example.html#g_t2d-MPI-example">2d MPI example</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Distributed_002dmemory-FFTW-with-MPI.html#Distributed_002dmemory-FFTW-with-MPI">Distributed-memory FFTW with MPI</a>
<hr>
</div>

<h3 class="section">6.4 MPI Data Distribution</h3>

<p><a name="index-data-distribution-368"></a>
The most important concept to understand in using FFTW's MPI interface
is the data distribution.  With a serial or multithreaded FFT, all of
the inputs and outputs are stored as a single contiguous chunk of
memory.  With a distributed-memory FFT, the inputs and outputs are
broken into disjoint blocks, one per process.

<p>In particular, FFTW uses a <em>1d block distribution</em> of the data,
distributed along the <em>first dimension</em>.  For example, if you
want to perform a 100&nbsp;&times;&nbsp;200 complex DFT, distributed over 4
processes, each process will get a 25&nbsp;&times;&nbsp;200 slice of the data.
That is, process 0 will get rows 0 through 24, process 1 will get rows
25 through 49, process 2 will get rows 50 through 74, and process 3
will get rows 75 through 99.  If you distribute the same array over 3
processes instead, the 100 rows are not evenly divisible by 3, so the
processes will receive unequal chunks.  FFTW's default choice in this
case is to assign 34 rows each to processes 0 and 1, and 32 rows to
process 2.
<a name="index-block-distribution-369"></a>

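<p>For illustration only, the default split described above can be
reproduced by hand: the block size is the number of rows divided by the
number of processes, rounded up, and later processes take whatever
remains.  The following stand-alone sketch (not part of the FFTW API)
prints the 34/34/32 division of 100 rows over 3 processes:

<pre class="example">
     #include &lt;stdio.h&gt;

     int main(void)
     {
         int n0 = 100, nproc = 3;
         int block = (n0 + nproc - 1) / nproc;  /* ceil(100/3) = 34 */
         for (int p = 0; p &lt; nproc; ++p) {
             int start = p * block;
             int remaining = n0 - start;
             int rows = remaining &lt; block ? remaining : block;
             printf("process %d: %d rows\n", p, rows);  /* 34, 34, 32 */
         }
         return 0;
     }
</pre>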
<p>FFTW provides several &lsquo;<samp><span class="samp">fftw_mpi_local_size</span></samp>&rsquo; routines that you can
call to find out what portion of an array is stored on the current
process.  In most cases, you should use the default block sizes picked
by FFTW, but it is also possible to specify your own block size.  For
example, with a 100&nbsp;&times;&nbsp;200 array on three processes, you can
tell FFTW to use a block size of 40, which would assign 40 rows to
processes 0 and 1, and 20 rows to process 2.  FFTW's default is to
divide the data equally among the processes if possible, and as best
it can otherwise.  The rows are always assigned in &ldquo;rank order,&rdquo;
i.e. process 0 gets the first block of rows, then process 1, and so
on.  (You can change this by using <code>MPI_Comm_split</code> to create a
new communicator with re-ordered processes.)  However, you should
always call the &lsquo;<samp><span class="samp">fftw_mpi_local_size</span></samp>&rsquo; routines, if possible,
rather than trying to predict FFTW's distribution choices.

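<p>For the 100&nbsp;&times;&nbsp;200 transform above, the local portion can
be queried with <code>fftw_mpi_local_size_2d</code>, as in the following
minimal sketch (error checking omitted):

<pre class="example">
     #include &lt;stdio.h&gt;
     #include &lt;fftw3-mpi.h&gt;

     int main(int argc, char **argv)
     {
         ptrdiff_t alloc_local, local_n0, local_0_start;

         MPI_Init(&amp;argc, &amp;argv);
         fftw_mpi_init();

         /* alloc_local: number of complex values to allocate locally;
            this process stores the rows
            [local_0_start, local_0_start + local_n0) */
         alloc_local = fftw_mpi_local_size_2d(100, 200, MPI_COMM_WORLD,
                                              &amp;local_n0, &amp;local_0_start);

         printf("rows %ld..%ld, allocate %ld complex numbers\n",
                (long) local_0_start,
                (long) (local_0_start + local_n0 - 1),
                (long) alloc_local);

         MPI_Finalize();
         return 0;
     }
</pre>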
<p>In particular, it is critical that you allocate the storage size that
is returned by &lsquo;<samp><span class="samp">fftw_mpi_local_size</span></samp>&rsquo;, which is <em>not</em>
necessarily the size of the local slice of the array.  The reason is
that intermediate steps of FFTW's algorithms involve transposing the
array and redistributing the data, so at these intermediate steps FFTW
may require more local storage space (albeit always proportional to
the total size divided by the number of processes).  The
&lsquo;<samp><span class="samp">fftw_mpi_local_size</span></samp>&rsquo; functions know how much storage is required
for these intermediate steps and tell you the correct amount to
allocate.

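<p>Concretely, the return value of the
&lsquo;<samp><span class="samp">fftw_mpi_local_size</span></samp>&rsquo; call, not the product of the local
dimensions, should be passed to the allocator, as in this sketch:

<pre class="example">
     fftw_complex *data;
     ptrdiff_t alloc_local, local_n0, local_0_start;

     alloc_local = fftw_mpi_local_size_2d(100, 200, MPI_COMM_WORLD,
                                          &amp;local_n0, &amp;local_0_start);

     /* allocate alloc_local complex numbers: this may exceed
        local_n0 * 200, the size of the local slice, because FFTW
        may need extra scratch space for intermediate transposes */
     data = fftw_alloc_complex(alloc_local);
</pre>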
<ul class="menu">
<li><a accesskey="1" href="Basic-and-advanced-distribution-interfaces.html#Basic-and-advanced-distribution-interfaces">Basic and advanced distribution interfaces</a>
<li><a accesskey="2" href="Load-balancing.html#Load-balancing">Load balancing</a>
<li><a accesskey="3" href="Transposed-distributions.html#Transposed-distributions">Transposed distributions</a>
<li><a accesskey="4" href="One_002ddimensional-distributions.html#One_002ddimensional-distributions">One-dimensional distributions</a>
</ul>

</body></html>