Chris@10
|
1 <html lang="en">
|
Chris@10
|
2 <head>
|
Chris@10
|
3 <title>Multi-dimensional MPI DFTs of Real Data - FFTW 3.3.3</title>
|
Chris@10
|
4 <meta http-equiv="Content-Type" content="text/html">
|
Chris@10
|
5 <meta name="description" content="FFTW 3.3.3">
|
Chris@10
|
6 <meta name="generator" content="makeinfo 4.13">
|
Chris@10
|
7 <link title="Top" rel="start" href="index.html#Top">
|
Chris@10
|
8 <link rel="up" href="Distributed_002dmemory-FFTW-with-MPI.html#Distributed_002dmemory-FFTW-with-MPI" title="Distributed-memory FFTW with MPI">
|
Chris@10
|
9 <link rel="prev" href="MPI-Data-Distribution.html#MPI-Data-Distribution" title="MPI Data Distribution">
|
Chris@10
|
10 <link rel="next" href="Other-Multi_002ddimensional-Real_002ddata-MPI-Transforms.html#Other-Multi_002ddimensional-Real_002ddata-MPI-Transforms" title="Other Multi-dimensional Real-data MPI Transforms">
|
Chris@10
|
11 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
|
Chris@10
|
12 <!--
|
Chris@10
|
13 This manual is for FFTW
|
Chris@10
|
14 (version 3.3.3, 25 November 2012).
|
Chris@10
|
15
|
Chris@10
|
16 Copyright (C) 2003 Matteo Frigo.
|
Chris@10
|
17
|
Chris@10
|
18 Copyright (C) 2003 Massachusetts Institute of Technology.
|
Chris@10
|
19
|
Chris@10
|
20 Permission is granted to make and distribute verbatim copies of
|
Chris@10
|
21 this manual provided the copyright notice and this permission
|
Chris@10
|
22 notice are preserved on all copies.
|
Chris@10
|
23
|
Chris@10
|
24 Permission is granted to copy and distribute modified versions of
|
Chris@10
|
25 this manual under the conditions for verbatim copying, provided
|
Chris@10
|
26 that the entire resulting derived work is distributed under the
|
Chris@10
|
27 terms of a permission notice identical to this one.
|
Chris@10
|
28
|
Chris@10
|
29 Permission is granted to copy and distribute translations of this
|
Chris@10
|
30 manual into another language, under the above conditions for
|
Chris@10
|
31 modified versions, except that this permission notice may be
|
Chris@10
|
32 stated in a translation approved by the Free Software Foundation.
|
Chris@10
|
33 -->
|
Chris@10
|
34 <meta http-equiv="Content-Style-Type" content="text/css">
|
Chris@10
|
35 <style type="text/css"><!--
|
Chris@10
|
36 pre.display { font-family:inherit }
|
Chris@10
|
37 pre.format { font-family:inherit }
|
Chris@10
|
38 pre.smalldisplay { font-family:inherit; font-size:smaller }
|
Chris@10
|
39 pre.smallformat { font-family:inherit; font-size:smaller }
|
Chris@10
|
40 pre.smallexample { font-size:smaller }
|
Chris@10
|
41 pre.smalllisp { font-size:smaller }
|
Chris@10
|
42 span.sc { font-variant:small-caps }
|
Chris@10
|
43 span.roman { font-family:serif; font-weight:normal; }
|
Chris@10
|
44 span.sansserif { font-family:sans-serif; font-weight:normal; }
|
Chris@10
|
45 --></style>
|
Chris@10
|
46 </head>
|
Chris@10
|
47 <body>
|
Chris@10
|
48 <div class="node">
|
Chris@10
|
49 <a name="Multi-dimensional-MPI-DFTs-of-Real-Data"></a>
|
Chris@10
|
50 <a name="Multi_002ddimensional-MPI-DFTs-of-Real-Data"></a>
|
Chris@10
|
51 <p>
|
Chris@10
|
52 Next: <a rel="next" accesskey="n" href="Other-Multi_002ddimensional-Real_002ddata-MPI-Transforms.html#Other-Multi_002ddimensional-Real_002ddata-MPI-Transforms">Other Multi-dimensional Real-data MPI Transforms</a>,
|
Chris@10
|
53 Previous: <a rel="previous" accesskey="p" href="MPI-Data-Distribution.html#MPI-Data-Distribution">MPI Data Distribution</a>,
|
Chris@10
|
54 Up: <a rel="up" accesskey="u" href="Distributed_002dmemory-FFTW-with-MPI.html#Distributed_002dmemory-FFTW-with-MPI">Distributed-memory FFTW with MPI</a>
|
Chris@10
|
55 <hr>
|
Chris@10
|
56 </div>
|
Chris@10
|
57
|
Chris@10
|
58 <h3 class="section">6.5 Multi-dimensional MPI DFTs of Real Data</h3>
|
Chris@10
|
59
|
Chris@10
|
60 <p>FFTW's MPI interface also supports multi-dimensional DFTs of real
|
Chris@10
|
61 data, similar to the serial r2c and c2r interfaces. (Parallel
|
Chris@10
|
62 one-dimensional real-data DFTs are not currently supported; you must
|
Chris@10
|
63 use a complex transform and set the imaginary parts of the inputs to
|
Chris@10
|
64 zero.)
|
Chris@10
|
65
|
Chris@10
|
66 <p>The key points to understand for r2c and c2r MPI transforms (compared
|
Chris@10
|
67 to the MPI complex DFTs or the serial r2c/c2r transforms), are:
|
Chris@10
|
68
|
Chris@10
|
69 <ul>
|
Chris@10
|
70 <li>Just as for serial transforms, r2c/c2r DFTs transform n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × n<sub>d-1</sub> real
|
Chris@10
|
71 data to/from n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × (n<sub>d-1</sub>/2 + 1) complex data: the last dimension of the
|
Chris@10
|
72 complex data is cut in half (rounded down), plus one. As for the
|
Chris@10
|
73 serial transforms, the sizes you pass to the ‘<samp><span class="samp">plan_dft_r2c</span></samp>’ and
|
Chris@10
|
74 ‘<samp><span class="samp">plan_dft_c2r</span></samp>’ are the n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × n<sub>d-1</sub> dimensions of the real data.
|
Chris@10
|
75
|
Chris@10
|
76 <li><a name="index-padding-386"></a>Although the real data is <em>conceptually</em> n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × n<sub>d-1</sub>, it is
|
Chris@10
|
77 <em>physically</em> stored as an n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × [2 (n<sub>d-1</sub>/2 + 1)] array, where the last
|
Chris@10
|
78 dimension has been <em>padded</em> to make it the same size as the
|
Chris@10
|
79 complex output. This is much like the in-place serial r2c/c2r
|
Chris@10
|
80 interface (see <a href="Multi_002dDimensional-DFTs-of-Real-Data.html#Multi_002dDimensional-DFTs-of-Real-Data">Multi-Dimensional DFTs of Real Data</a>), except that
|
Chris@10
|
81 in MPI the padding is required even for out-of-place data. The extra
|
Chris@10
|
82 padding numbers are ignored by FFTW (they are <em>not</em> like
|
Chris@10
|
83 zero-padding the transform to a larger size); they are only used to
|
Chris@10
|
84 determine the data layout.
|
Chris@10
|
85
|
Chris@10
|
86 <li><a name="index-data-distribution-387"></a>The data distribution in MPI for <em>both</em> the real and complex data
|
Chris@10
|
87 is determined by the shape of the <em>complex</em> data. That is, you
|
Chris@10
|
88 call the appropriate ‘<samp><span class="samp">local size</span></samp>’ function for the n<sub>0</sub> × n<sub>1</sub> × n<sub>2</sub> × … × (n<sub>d-1</sub>/2 + 1)
|
Chris@10
|
89
|
Chris@10
|
90 <p>complex data, and then use the <em>same</em> distribution for the real
|
Chris@10
|
91 data except that the last complex dimension is replaced by a (padded)
|
Chris@10
|
92 real dimension of twice the length.
|
Chris@10
|
93
|
Chris@10
|
94 </ul>
|
Chris@10
|
95
|
Chris@10
|
96 <p>For example suppose we are performing an out-of-place r2c transform of
|
Chris@10
|
97 L × M × N real data [padded to L × M × 2(N/2+1)],
|
Chris@10
|
98 resulting in L × M × N/2+1 complex data. Similar to the
|
Chris@10
|
99 example in <a href="2d-MPI-example.html#g_t2d-MPI-example">2d MPI example</a>, we might do something like:
|
Chris@10
|
100
|
Chris@10
|
101 <pre class="example"> #include <fftw3-mpi.h>
|
Chris@10
|
102
|
Chris@10
|
103 int main(int argc, char **argv)
|
Chris@10
|
104 {
|
Chris@10
|
105 const ptrdiff_t L = ..., M = ..., N = ...;
|
Chris@10
|
106 fftw_plan plan;
|
Chris@10
|
107 double *rin;
|
Chris@10
|
108 fftw_complex *cout;
|
Chris@10
|
109 ptrdiff_t alloc_local, local_n0, local_0_start, i, j, k;
|
Chris@10
|
110
|
Chris@10
|
111 MPI_Init(&argc, &argv);
|
Chris@10
|
112 fftw_mpi_init();
|
Chris@10
|
113
|
Chris@10
|
114 /* <span class="roman">get local data size and allocate</span> */
|
Chris@10
|
115 alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD,
|
Chris@10
|
116 &local_n0, &local_0_start);
|
Chris@10
|
117 rin = fftw_alloc_real(2 * alloc_local);
|
Chris@10
|
118 cout = fftw_alloc_complex(alloc_local);
|
Chris@10
|
119
|
Chris@10
|
120 /* <span class="roman">create plan for out-of-place r2c DFT</span> */
|
Chris@10
|
121 plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD,
|
Chris@10
|
122 FFTW_MEASURE);
|
Chris@10
|
123
|
Chris@10
|
124 /* <span class="roman">initialize rin to some function</span> my_func(x,y,z) */
|
Chris@10
|
125 for (i = 0; i < local_n0; ++i)
|
Chris@10
|
126 for (j = 0; j < M; ++j)
|
Chris@10
|
127 for (k = 0; k < N; ++k)
|
Chris@10
|
128 rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start+i, j, k);
|
Chris@10
|
129
|
Chris@10
|
130 /* <span class="roman">compute transforms as many times as desired</span> */
|
Chris@10
|
131 fftw_execute(plan);
|
Chris@10
|
132
|
Chris@10
|
133 fftw_destroy_plan(plan);
|
Chris@10
|
134
|
Chris@10
|
135 MPI_Finalize();
|
Chris@10
|
136 }
|
Chris@10
|
137 </pre>
|
Chris@10
|
138 <p><a name="index-fftw_005falloc_005freal-388"></a><a name="index-row_002dmajor-389"></a>Note that we allocated <code>rin</code> using <code>fftw_alloc_real</code> with an
|
Chris@10
|
139 argument of <code>2 * alloc_local</code>: since <code>alloc_local</code> is the
|
Chris@10
|
140 number of <em>complex</em> values to allocate, the number of <em>real</em>
|
Chris@10
|
141 values is twice as many. The <code>rin</code> array is then
|
Chris@10
|
142 local_n0 × M × 2(N/2+1) in row-major order, so its
|
Chris@10
|
143 <code>(i,j,k)</code> element is at the index <code>(i*M + j) * (2*(N/2+1)) +
|
Chris@10
|
144 k</code> (see <a href="Multi_002ddimensional-Array-Format.html#Multi_002ddimensional-Array-Format">Multi-dimensional Array Format</a>).
|
Chris@10
|
145
|
Chris@10
|
146 <p><a name="index-transpose-390"></a><a name="index-FFTW_005fTRANSPOSED_005fOUT-391"></a><a name="index-FFTW_005fTRANSPOSED_005fIN-392"></a>As for the complex transforms, improved performance can be obtained by
|
Chris@10
|
147 specifying that the output is the transpose of the input or vice versa
|
Chris@10
|
148 (see <a href="Transposed-distributions.html#Transposed-distributions">Transposed distributions</a>). In our L × M × N r2c
|
Chris@10
|
149 example, including <code>FFTW_TRANSPOSED_OUT</code> in the flags means that
|
Chris@10
|
150 the input would be a padded L × M × 2(N/2+1) real array
|
Chris@10
|
151 distributed over the <code>L</code> dimension, while the output would be a
|
Chris@10
|
152 M × L × N/2+1 complex array distributed over the <code>M</code>
|
Chris@10
|
153 dimension. To perform the inverse c2r transform with the same data
|
Chris@10
|
154 distributions, you would use the <code>FFTW_TRANSPOSED_IN</code> flag.
|
Chris@10
|
155
|
Chris@10
|
156 <!-- -->
|
Chris@10
|
157 </body></html>
|
Chris@10
|
158
|