annotate src/fftw-3.3.3/doc/html/Transposed-distributions.html @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 37bf6b4a2645
children
rev   line source
Chris@10 1 <html lang="en">
Chris@10 2 <head>
Chris@10 3 <title>Transposed distributions - FFTW 3.3.3</title>
Chris@10 4 <meta http-equiv="Content-Type" content="text/html">
Chris@10 5 <meta name="description" content="FFTW 3.3.3">
Chris@10 6 <meta name="generator" content="makeinfo 4.13">
Chris@10 7 <link title="Top" rel="start" href="index.html#Top">
Chris@10 8 <link rel="up" href="MPI-Data-Distribution.html#MPI-Data-Distribution" title="MPI Data Distribution">
Chris@10 9 <link rel="prev" href="Load-balancing.html#Load-balancing" title="Load balancing">
Chris@10 10 <link rel="next" href="One_002ddimensional-distributions.html#One_002ddimensional-distributions" title="One-dimensional distributions">
Chris@10 11 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
Chris@10 12 <!--
Chris@10 13 This manual is for FFTW
Chris@10 14 (version 3.3.3, 25 November 2012).
Chris@10 15
Chris@10 16 Copyright (C) 2003 Matteo Frigo.
Chris@10 17
Chris@10 18 Copyright (C) 2003 Massachusetts Institute of Technology.
Chris@10 19
Chris@10 20 Permission is granted to make and distribute verbatim copies of
Chris@10 21 this manual provided the copyright notice and this permission
Chris@10 22 notice are preserved on all copies.
Chris@10 23
Chris@10 24 Permission is granted to copy and distribute modified versions of
Chris@10 25 this manual under the conditions for verbatim copying, provided
Chris@10 26 that the entire resulting derived work is distributed under the
Chris@10 27 terms of a permission notice identical to this one.
Chris@10 28
Chris@10 29 Permission is granted to copy and distribute translations of this
Chris@10 30 manual into another language, under the above conditions for
Chris@10 31 modified versions, except that this permission notice may be
Chris@10 32 stated in a translation approved by the Free Software Foundation.
Chris@10 33 -->
Chris@10 34 <meta http-equiv="Content-Style-Type" content="text/css">
Chris@10 35 <style type="text/css"><!--
Chris@10 36 pre.display { font-family:inherit }
Chris@10 37 pre.format { font-family:inherit }
Chris@10 38 pre.smalldisplay { font-family:inherit; font-size:smaller }
Chris@10 39 pre.smallformat { font-family:inherit; font-size:smaller }
Chris@10 40 pre.smallexample { font-size:smaller }
Chris@10 41 pre.smalllisp { font-size:smaller }
Chris@10 42 span.sc { font-variant:small-caps }
Chris@10 43 span.roman { font-family:serif; font-weight:normal; }
Chris@10 44 span.sansserif { font-family:sans-serif; font-weight:normal; }
Chris@10 45 --></style>
Chris@10 46 </head>
Chris@10 47 <body>
Chris@10 48 <div class="node">
Chris@10 49 <a name="Transposed-distributions"></a>
Chris@10 50 <p>
Chris@10 51 Next:&nbsp;<a rel="next" accesskey="n" href="One_002ddimensional-distributions.html#One_002ddimensional-distributions">One-dimensional distributions</a>,
Chris@10 52 Previous:&nbsp;<a rel="previous" accesskey="p" href="Load-balancing.html#Load-balancing">Load balancing</a>,
Chris@10 53 Up:&nbsp;<a rel="up" accesskey="u" href="MPI-Data-Distribution.html#MPI-Data-Distribution">MPI Data Distribution</a>
Chris@10 54 <hr>
Chris@10 55 </div>
Chris@10 56
Chris@10 57 <h4 class="subsection">6.4.3 Transposed distributions</h4>
Chris@10 58
Chris@10 59 <p>Internally, FFTW's MPI transform algorithms work by first computing
Chris@10 60 transforms of the data local to each process, then by globally
Chris@10 61 <em>transposing</em> the data in some fashion to redistribute the data
Chris@10 62 among the processes, transforming the new data local to each process,
Chris@10 63 and transposing back. For example, a two-dimensional <code>n0</code> by
Chris@10 64 <code>n1</code> array, distributed across the <code>n0</code> dimension, is
Chris@10 65 transformed by: (i) transforming the <code>n1</code> dimension, which is
Chris@10 66 local to each process; (ii) transposing to an <code>n1</code> by <code>n0</code>
Chris@10 67 array, distributed across the <code>n1</code> dimension; (iii) transforming
Chris@10 68 the <code>n0</code> dimension, which is now local to each process; (iv)
Chris@10 69 transposing back.
Chris@10 70 <a name="index-transpose-379"></a>
Chris@10 71
Chris@10 72 <p>However, in many applications it is acceptable to compute a
Chris@10 73 multidimensional DFT whose results are produced in transposed order
Chris@10 74 (e.g., <code>n1</code> by <code>n0</code> in two dimensions). This provides a
Chris@10 75 significant performance advantage, because it means that the final
Chris@10 76 transposition step can be omitted. FFTW supports this optimization,
Chris@10 77 which you specify by passing the flag <code>FFTW_MPI_TRANSPOSED_OUT</code>
Chris@10 78 to the planner routines. To compute the inverse transform of
Chris@10 79 transposed output, you specify <code>FFTW_MPI_TRANSPOSED_IN</code> to tell
Chris@10 80 the planner that the input is transposed. In this section, we explain how to
Chris@10 81 interpret the output format of such a transform.
Chris@10 82 <a name="index-FFTW_005fMPI_005fTRANSPOSED_005fOUT-380"></a><a name="index-FFTW_005fMPI_005fTRANSPOSED_005fIN-381"></a>
Chris@10 83
Chris@10 84 <p>Suppose you are transforming multi-dimensional data with (at
Chris@10 85 least two) dimensions n<sub>0</sub>&nbsp;&times;&nbsp;n<sub>1</sub>&nbsp;&times;&nbsp;n<sub>2</sub>&nbsp;&times;&nbsp;&hellip;&nbsp;&times;&nbsp;n<sub>d-1</sub>. As always, it is distributed along
Chris@10 86 the first dimension n<sub>0</sub>. Now, if we compute its DFT with the
Chris@10 87 <code>FFTW_MPI_TRANSPOSED_OUT</code> flag, the resulting output data are stored
Chris@10 88 with the first <em>two</em> dimensions transposed: n<sub>1</sub>&nbsp;&times;&nbsp;n<sub>0</sub>&nbsp;&times;&nbsp;n<sub>2</sub>&nbsp;&times;&nbsp;&hellip;&nbsp;&times;&nbsp;n<sub>d-1</sub>,
Chris@10 89 distributed along the n<sub>1</sub> dimension. Conversely, if we take the
Chris@10 90 n<sub>1</sub>&nbsp;&times;&nbsp;n<sub>0</sub>&nbsp;&times;&nbsp;n<sub>2</sub>&nbsp;&times;&nbsp;&hellip;&nbsp;&times;&nbsp;n<sub>d-1</sub> data and transform it with the
Chris@10 91 <code>FFTW_MPI_TRANSPOSED_IN</code> flag, then the format goes back to the
Chris@10 92 original n<sub>0</sub>&nbsp;&times;&nbsp;n<sub>1</sub>&nbsp;&times;&nbsp;n<sub>2</sub>&nbsp;&times;&nbsp;&hellip;&nbsp;&times;&nbsp;n<sub>d-1</sub> array.
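<p>To make the transposed layout concrete, the following sketch computes the
transposed shape and the row-major offset at which a logical element
(i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>, &hellip;) appears in the
transposed <em>global</em> array. The helper names
(<code>transposed_shape</code>, <code>transposed_offset</code>) are
hypothetical and not part of FFTW, and the sketch ignores the distribution
across processes entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the first two extents: n0 x n1 x n2 x ... -> n1 x n0 x n2 x ... */
void transposed_shape(const ptrdiff_t *n, int rank, ptrdiff_t *nt)
{
    assert(rank >= 2);
    for (int d = 0; d < rank; ++d)
        nt[d] = n[d];
    nt[0] = n[1];
    nt[1] = n[0];
}

/* Row-major offset of the logical element idx = (i0, i1, i2, ...) within
   an array stored in transposed order, where n holds the ORIGINAL
   (non-transposed) extents n0, n1, n2, ... */
ptrdiff_t transposed_offset(const ptrdiff_t *n, const ptrdiff_t *idx, int rank)
{
    assert(rank >= 2);
    ptrdiff_t off = idx[1];          /* i1 is now the slowest-varying index */
    off = off * n[0] + idx[0];       /* followed by i0 */
    for (int d = 2; d < rank; ++d)   /* remaining dimensions are unchanged */
        off = off * n[d] + idx[d];
    return off;
}
```

<p>For a 2 by 3 by 4 array, for instance, the transposed shape is 3 by 2 by 4,
and the logical element (1, 0, 2) is found at index (0, 1, 2) of the transposed
array.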
Chris@10 93
Chris@10 94 <p>There are two ways to find the portion of the transposed array that
Chris@10 95 resides on the current process. First, you can simply call the
Chris@10 96 appropriate &lsquo;<samp><span class="samp">local_size</span></samp>&rsquo; function, passing n<sub>1</sub>&nbsp;&times;&nbsp;n<sub>0</sub>&nbsp;&times;&nbsp;n<sub>2</sub>&nbsp;&times;&nbsp;&hellip;&nbsp;&times;&nbsp;n<sub>d-1</sub> (the
Chris@10 97 transposed dimensions). This would mean calling the &lsquo;<samp><span class="samp">local_size</span></samp>&rsquo;
Chris@10 98 function twice, once for the transposed and once for the
Chris@10 99 non-transposed dimensions. Alternatively, you can call one of the
Chris@10 100 &lsquo;<samp><span class="samp">local_size_transposed</span></samp>&rsquo; functions, which return both the
Chris@10 101 non-transposed and transposed data distributions from a single call.
Chris@10 102 For example, for a 3d transform with transposed output (or input), you
Chris@10 103 might call:
Chris@10 104
Chris@10 105 <pre class="example"> ptrdiff_t fftw_mpi_local_size_3d_transposed(
Chris@10 106 ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
Chris@10 107 ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
Chris@10 108 ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
Chris@10 109 </pre>
Chris@10 110 <p><a name="index-fftw_005fmpi_005flocal_005fsize_005f3d_005ftransposed-382"></a>
Chris@10 111 Here, <code>local_n0</code> and <code>local_0_start</code> give the size and
Chris@10 112 starting index of the <code>n0</code> dimension for the
Chris@10 113 <em>non</em>-transposed data, as in the previous sections. For
Chris@10 114 <em>transposed</em> data (e.g. the output for
Chris@10 115 <code>FFTW_MPI_TRANSPOSED_OUT</code>), <code>local_n1</code> and
Chris@10 116 <code>local_1_start</code> give the size and starting index of the <code>n1</code>
Chris@10 117 dimension, which is the first dimension of the transposed data
Chris@10 118 (<code>n1</code> by <code>n0</code> by <code>n2</code>).
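<p>As a rough illustration of how <code>local_n1</code> and
<code>local_1_start</code> might be used, the sketch below locates a global
element (i<sub>0</sub>, i<sub>1</sub>, i<sub>2</sub>) inside one process's
portion of the transposed output. It assumes a complex 3d DFT whose local
output is stored as a row-major <code>local_n1</code> by <code>n0</code> by
<code>n2</code> block, and it takes <code>local_n1</code> and
<code>local_1_start</code> as plain arguments rather than calling
<code>fftw_mpi_local_size_3d_transposed</code> itself. The function name
<code>local_transposed_offset</code> is hypothetical and not part of FFTW.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the global element (i0, i1, i2) within this process's
   transposed slab (local_n1 x n0 x n2, row-major), or -1 if the i1 row
   is stored on another process. local_n1 and local_1_start are assumed
   to be the values reported by fftw_mpi_local_size_3d_transposed. */
ptrdiff_t local_transposed_offset(ptrdiff_t i0, ptrdiff_t i1, ptrdiff_t i2,
                                  ptrdiff_t n0, ptrdiff_t n2,
                                  ptrdiff_t local_n1, ptrdiff_t local_1_start)
{
    assert(i0 >= 0 && i0 < n0);
    assert(i2 >= 0 && i2 < n2);
    if (i1 < local_1_start || i1 >= local_1_start + local_n1)
        return -1;                         /* not owned by this process */
    ptrdiff_t j1 = i1 - local_1_start;     /* local index along n1 */
    return (j1 * n0 + i0) * n2 + i2;
}
```

<p>For example, with <code>n0</code> = 4, <code>n2</code> = 5, and a process
holding <code>local_n1</code> = 2 rows starting at
<code>local_1_start</code> = 3, the element (1, 4, 2) lies at local offset 27,
while any element with i<sub>1</sub> = 2 belongs to a different process.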
Chris@10 119
Chris@10 120 <p>(Note that <code>FFTW_MPI_TRANSPOSED_IN</code> is completely equivalent to
Chris@10 121 specifying <code>FFTW_MPI_TRANSPOSED_OUT</code> and passing the first two
Chris@10 122 dimensions to the planner in reverse order, or vice versa. If you
Chris@10 123 pass <em>both</em> the <code>FFTW_MPI_TRANSPOSED_IN</code> and
Chris@10 124 <code>FFTW_MPI_TRANSPOSED_OUT</code> flags, it is equivalent to swapping the
Chris@10 125 first two dimensions passed to the planner and passing <em>neither</em>
Chris@10 126 flag.)
Chris@10 127
Chris@10 128 </body></html>
Chris@10 129