annotate Lib/fftw-3.2.1/doc/html/.svn/text-base/Basic-and-advanced-distribution-interfaces.html.svn-base @ 14:636c989477e7

XML changes for Public.
author Geogaddi\David <d.m.ronan@qmul.ac.uk>
date Wed, 04 May 2016 11:02:59 +0100
parents 25bf17994ef1
rev   line source
d@0 1 <html lang="en">
d@0 2 <head>
d@0 3 <title>Basic and advanced distribution interfaces - FFTW 3.2alpha3</title>
d@0 4 <meta http-equiv="Content-Type" content="text/html">
d@0 5 <meta name="description" content="FFTW 3.2alpha3">
d@0 6 <meta name="generator" content="makeinfo 4.8">
d@0 7 <link title="Top" rel="start" href="index.html#Top">
d@0 8 <link rel="up" href="MPI-data-distribution.html#MPI-data-distribution" title="MPI data distribution">
d@0 9 <link rel="prev" href="MPI-data-distribution.html#MPI-data-distribution" title="MPI data distribution">
d@0 10 <link rel="next" href="Load-balancing.html#Load-balancing" title="Load balancing">
d@0 11 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
d@0 12 <!--
d@0 13 This manual is for FFTW
d@0 14 (version 3.2alpha3, 14 August 2007).
d@0 15
d@0 16 Copyright (C) 2003 Matteo Frigo.
d@0 17
d@0 18 Copyright (C) 2003 Massachusetts Institute of Technology.
d@0 19
d@0 20 Permission is granted to make and distribute verbatim copies of
d@0 21 this manual provided the copyright notice and this permission
d@0 22 notice are preserved on all copies.
d@0 23
d@0 24 Permission is granted to copy and distribute modified versions of
d@0 25 this manual under the conditions for verbatim copying, provided
d@0 26 that the entire resulting derived work is distributed under the
d@0 27 terms of a permission notice identical to this one.
d@0 28
d@0 29 Permission is granted to copy and distribute translations of this
d@0 30 manual into another language, under the above conditions for
d@0 31 modified versions, except that this permission notice may be
d@0 32 stated in a translation approved by the Free Software Foundation.
d@0 33 -->
d@0 34 <meta http-equiv="Content-Style-Type" content="text/css">
d@0 35 <style type="text/css"><!--
d@0 36 pre.display { font-family:inherit }
d@0 37 pre.format { font-family:inherit }
d@0 38 pre.smalldisplay { font-family:inherit; font-size:smaller }
d@0 39 pre.smallformat { font-family:inherit; font-size:smaller }
d@0 40 pre.smallexample { font-size:smaller }
d@0 41 pre.smalllisp { font-size:smaller }
d@0 42 span.sc { font-variant:small-caps }
d@0 43 span.roman { font-family:serif; font-weight:normal; }
d@0 44 span.sansserif { font-family:sans-serif; font-weight:normal; }
d@0 45 --></style>
d@0 46 </head>
d@0 47 <body>
d@0 48 <div class="node">
d@0 49 <p>
d@0 50 <a name="Basic-and-advanced-distribution-interfaces"></a>
d@0 51 Next:&nbsp;<a rel="next" accesskey="n" href="Load-balancing.html#Load-balancing">Load balancing</a>,
d@0 52 Previous:&nbsp;<a rel="previous" accesskey="p" href="MPI-data-distribution.html#MPI-data-distribution">MPI data distribution</a>,
d@0 53 Up:&nbsp;<a rel="up" accesskey="u" href="MPI-data-distribution.html#MPI-data-distribution">MPI data distribution</a>
d@0 54 <hr>
d@0 55 </div>
d@0 56
d@0 57 <h4 class="subsection">6.4.1 Basic and advanced distribution interfaces</h4>
d@0 58
d@0 59 <p>As with the planner interface, the `<samp><span class="samp">fftw_mpi_local_size</span></samp>'
d@0 60 distribution interface is broken into basic and advanced
d@0 61 (`<samp><span class="samp">_many</span></samp>') interfaces, where the latter allows you to specify the
d@0 62 block size manually and also to request block sizes when computing
d@0 63 multiple transforms simultaneously. These functions are documented
d@0 64 more exhaustively by the FFTW MPI Reference, but we summarize the
d@0 65 basic ideas here using a couple of two-dimensional examples.
d@0 66
d@0 67 <p>For the 100&nbsp;&times;&nbsp;200 complex-DFT example, above, we would find
d@0 68 the distribution by calling the following function in the basic
d@0 69 interface:
d@0 70
d@0 71 <pre class="example"> ptrdiff_t fftw_mpi_local_size_2d(ptrdiff_t n0, ptrdiff_t n1, MPI_Comm comm,
d@0 72 ptrdiff_t *local_n0, ptrdiff_t *local_0_start);
d@0 73 </pre>
d@0 74 <p><a name="index-fftw_005fmpi_005flocal_005fsize_005f2d-352"></a>
d@0 75 Given the total size of the data to be transformed (here, <code>n0 =
d@0 76 100</code> and <code>n1 = 200</code>) and an MPI communicator (<code>comm</code>), this
d@0 77 function provides three numbers.
d@0 78
d@0 79 <p>First, it describes the shape of the local data: the current process
d@0 80 should store a <code>local_n0</code> by <code>n1</code> slice of the overall
d@0 81 dataset, in row-major order (<code>n1</code> dimension contiguous), starting
d@0 82 at index <code>local_0_start</code>. That is, if the total dataset is
d@0 83 viewed as an <code>n0</code> by <code>n1</code> matrix, the current process should
d@0 84 store the rows <code>local_0_start</code> to
d@0 85 <code>local_0_start+local_n0-1</code>. Obviously, if you are running with
d@0 86 only a single MPI process, that process will store the entire array:
d@0 87 <code>local_0_start</code> will be zero and <code>local_n0</code> will be
d@0 88 <code>n0</code>. See <a href="Row_002dmajor-Format.html#Row_002dmajor-Format">Row-major Format</a>.
d@0 89 <a name="index-row_002dmajor-353"></a>
d@0 90 <p>Second, the return value is the total number of data elements (e.g.,
d@0 91 complex numbers for a complex DFT) that should be allocated for the
d@0 92 input and output arrays on the current process (ideally with
d@0 93 <code>fftw_malloc</code>, to ensure optimal alignment). It might seem that
d@0 94 this should always be equal to <code>local_n0 * n1</code>, but this is
d@0 95 <em>not</em> the case. FFTW's distributed FFT algorithms require data
d@0 96 redistributions at intermediate stages of the transform, and in some
d@0 97 circumstances this may require slightly larger local storage. This is
d@0 98 discussed in more detail below, under <a href="Load-balancing.html#Load-balancing">Load balancing</a>.
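<p>The local row-major layout described above can be sketched in plain C.
This is only an illustration, not FFTW code: <code>local_offset</code> is a
hypothetical helper name, and the values of <code>local_n0</code> and
<code>local_0_start</code> are assumed to have been obtained from
<code>fftw_mpi_local_size_2d</code>. The array itself should still be
allocated with the number of elements that function returns, not with
<code>local_n0 * n1</code>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): offset of global element (i, j)
   inside the local slice, which holds rows local_0_start through
   local_0_start + local_n0 - 1 in row-major order.
   Returns -1 if row i is stored on a different process. */
ptrdiff_t local_offset(ptrdiff_t i, ptrdiff_t j, ptrdiff_t n1,
                       ptrdiff_t local_n0, ptrdiff_t local_0_start)
{
    assert(j >= 0 && j < n1);
    if (i < local_0_start || i >= local_0_start + local_n0)
        return -1;                        /* row lives elsewhere */
    return (i - local_0_start) * n1 + j;  /* n1 dimension is contiguous */
}
```

<p>For instance, a process holding rows 25 to 49 of the 100&nbsp;&times;&nbsp;200
array (<code>local_0_start = 25</code>, <code>local_n0 = 25</code>) would find
global element (26, 3) at local offset <code>(26 - 25) * 200 + 3 = 203</code>.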
d@0 99 <a name="index-fftw_005fmalloc-354"></a>
d@0 100 <p>The advanced-interface `<samp><span class="samp">local_size</span></samp>' function for multidimensional
d@0 101 transforms returns the same three things (<code>local_n0</code>,
d@0 102 <code>local_0_start</code>, and the total number of elements to allocate),
d@0 103 but takes more inputs:
d@0 104
d@0 105 <pre class="example"> ptrdiff_t fftw_mpi_local_size_many(int rnk, const ptrdiff_t *n,
d@0 106 ptrdiff_t howmany,
d@0 107 ptrdiff_t block0,
d@0 108 MPI_Comm comm,
d@0 109 ptrdiff_t *local_n0,
d@0 110 ptrdiff_t *local_0_start);
d@0 111 </pre>
d@0 112 <p><a name="index-fftw_005fmpi_005flocal_005fsize_005fmany-355"></a>
d@0 113 The two-dimensional case above corresponds to <code>rnk = 2</code> and an
d@0 114 array <code>n</code> of length 2 with <code>n[0] = n0</code> and <code>n[1] = n1</code>.
d@0 115 This routine is for any <code>rnk &gt; 1</code>; one-dimensional transforms
d@0 116 have their own interface because they work slightly differently, as
d@0 117 discussed below.
d@0 118
d@0 119 <p>First, the advanced interface allows you to perform multiple
d@0 120 transforms at once, of interleaved data, as specified by the
d@0 121 <code>howmany</code> parameter. (<code>howmany</code> is 1 for a single
d@0 122 transform.)
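<p>As a rough sketch of what &ldquo;interleaved&rdquo; means for indexing, the
helper below computes the position of one element of one transform. It is
not an FFTW function: the name <code>interleaved_offset</code> is
hypothetical, and the code assumes that the <code>howmany</code> transforms
form the last, fastest-varying index, so that the local data is laid out as
<code>local_n0</code> by <code>n1</code> by <code>howmany</code> in row-major
order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FFTW): offset of element (i_local, j)
   of transform number k, ASSUMING the howmany transforms are interleaved
   as the last (contiguous) index of the local row-major array. */
ptrdiff_t interleaved_offset(ptrdiff_t i_local, ptrdiff_t j, ptrdiff_t k,
                             ptrdiff_t n1, ptrdiff_t howmany)
{
    assert(i_local >= 0);
    assert(j >= 0 && j < n1);
    assert(k >= 0 && k < howmany);
    return (i_local * n1 + j) * howmany + k;
}
```

<p>With <code>howmany = 1</code> this reduces to the ordinary row-major
offset <code>i_local * n1 + j</code>.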
d@0 123
d@0 124 <p>Second, here you can specify your desired block size in the <code>n0</code>
d@0 125 dimension, <code>block0</code>. To use FFTW's default block size, pass
d@0 126 <code>FFTW_MPI_DEFAULT_BLOCK</code> (0) for <code>block0</code>. Otherwise, on
d@0 127 <code>P</code> processes, FFTW will return <code>local_n0</code> equal to
d@0 128 <code>block0</code> on the first <code>n0 / block0</code> processes (rounded down),
d@0 129 return <code>local_n0</code> equal to <code>n0 - block0 * (n0 / block0)</code> on
d@0 130 the next process, and <code>local_n0</code> equal to zero on any remaining
d@0 131 processes. In general, we recommend using the default block size
d@0 132 (which corresponds to <code>n0 / P</code>, rounded up).
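<p>The block rule can be sketched in plain C as follows. This is a minimal
illustration under stated assumptions, not FFTW's implementation:
<code>default_block</code> and <code>block_slice</code> are hypothetical
helper names, the number of processes that receive a full block is taken to
be <code>n0 / block0</code> (rounded down), and the value stored in
<code>local_0_start</code> for a process with no rows is an arbitrary choice
of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an FFTW function): the default block size,
   n0 / P rounded up. */
ptrdiff_t default_block(ptrdiff_t n0, int P)
{
    assert(n0 >= 0 && P > 0);
    return (n0 + P - 1) / P;
}

/* Hypothetical helper: rows of the n0 dimension assigned to process `rank`
   when the rows are handed out in consecutive blocks of block0 rows. */
void block_slice(ptrdiff_t n0, ptrdiff_t block0, int rank,
                 ptrdiff_t *local_n0, ptrdiff_t *local_0_start)
{
    assert(n0 >= 0 && block0 > 0 && rank >= 0);
    ptrdiff_t full = n0 / block0;        /* processes holding a full block */

    if (rank < full) {
        *local_n0 = block0;
        *local_0_start = rank * block0;
    } else if (rank == full) {
        *local_n0 = n0 - block0 * full;  /* leftover rows, possibly zero */
        *local_0_start = rank * block0;
    } else {
        *local_n0 = 0;                   /* nothing left for this process */
        *local_0_start = n0;             /* arbitrary for an empty slice */
    }
}
```

<p>For <code>n0 = 100</code> on <code>P = 8</code> processes,
<code>default_block</code> gives 13, so processes 0 to 6 each hold 13 rows
and process 7 holds the remaining 9 rows, starting at row 91.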
d@0 133 <a name="index-FFTW_005fMPI_005fDEFAULT_005fBLOCK-356"></a><a name="index-block-distribution-357"></a>
d@0 134 <p>For example, suppose you have <code>P = 4</code> processes and <code>n0 =
d@0 135 21</code>. The default will be a block size of <code>6</code>, which will give
d@0 136 <code>local_n0 = 6</code> on the first three processes and <code>local_n0 =
d@0 137 3</code> on the last process. Instead, however, you could specify
d@0 138 <code>block0 = 5</code> if you wanted, which would give <code>local_n0 = 5</code>
d@0 139 on processes 0 to 2, <code>local_n0 = 6</code> on process 3. (This choice,
d@0 140 while it may look superficially more &ldquo;balanced,&rdquo; has the same
d@0 141 critical path as FFTW's default but requires more communications.)
d@0 142
d@0 143 </body></html>
d@0 144