comparison src/fftw-3.3.3/doc/threads.texi @ 10:37bf6b4a2645
Add FFTW3
author | Chris Cannam |
---|---|
date | Wed, 20 Mar 2013 15:35:50 +0000 |
parents | |
children |
9:c0fb53affa76 | 10:37bf6b4a2645 |
---|---|
1 @node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top | |
2 @chapter Multi-threaded FFTW | |
3 | |
4 @cindex parallel transform | |
5 In this chapter we document the parallel FFTW routines for | |
6 shared-memory parallel hardware. These routines, which support | |
7 parallel one- and multi-dimensional transforms of both real and | |
8 complex data, are the easiest way to take advantage of multiple | |
9 processors with FFTW. They work just like the corresponding | |
10 uniprocessor transform routines, except that you have an extra | |
11 initialization routine to call, and there is a routine to set the | |
12 number of threads to employ. Any program that uses the uniprocessor | |
13 FFTW can therefore be trivially modified to use the multi-threaded | |
14 FFTW. | |
15 | |
16 A shared-memory machine is one in which all CPUs can directly access | |
17 the same main memory, and such machines are now common due to the | |
18 ubiquity of multi-core CPUs. FFTW's multi-threading support allows | |
19 you to utilize these additional CPUs transparently from a single | |
20 program. However, this does not necessarily translate into | |
21 performance gains---when multiple threads/CPUs are employed, there is | |
22 an overhead required for synchronization that may outweigh the | |
23 computational parallelism. Therefore, you can only benefit from | |
24 threads if your problem is sufficiently large. | |
25 @cindex shared-memory | |
26 @cindex threads | |
27 | |
28 @menu | |
29 * Installation and Supported Hardware/Software:: | |
30 * Usage of Multi-threaded FFTW:: | |
31 * How Many Threads to Use?:: | |
32 * Thread safety:: | |
33 @end menu | |
34 | |
35 @c ------------------------------------------------------------ | |
36 @node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW | |
37 @section Installation and Supported Hardware/Software | |
38 | |
39 All of the FFTW threads code is located in the @code{threads} | |
40 subdirectory of the FFTW package. On Unix systems, the FFTW threads | |
41 libraries and header files can be automatically configured, compiled, | |
42 and installed along with the uniprocessor FFTW libraries simply by | |
43 including @code{--enable-threads} in the flags to the @code{configure} | |
44 script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use | |
45 @uref{http://www.openmp.org,OpenMP} threads. | |
46 @fpindex configure | |
47 | |
48 | |
49 @cindex portability | |
50 @cindex OpenMP | |
51 The threads routines require your operating system to have some sort | |
52 of shared-memory threads support. Specifically, the FFTW threads | |
53 package works with POSIX threads (available on most Unix variants, | |
54 from GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which | |
55 are supported in many common compilers (e.g. gcc), are also supported, | |
56 and may give better performance on some systems. (OpenMP threads are | |
57 also useful if you are employing OpenMP in your own code, in order to | |
58 minimize conflicts between threading models.) If you have a | |
59 shared-memory machine that uses a different threads API, it should be | |
60 a simple matter of programming to include support for it; see the file | |
61 @code{threads/threads.c} for more detail. | |
62 | |
63 You can compile FFTW with @emph{both} @code{--enable-threads} and | |
64 @code{--enable-openmp} at the same time, since they install libraries | |
65 with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as | |
66 described below). However, your programs may only link to @emph{one} | |
67 of these two libraries at a time. | |
68 | |
69 Ideally, of course, you should also have multiple processors in order to | |
70 get any benefit from the threaded transforms. | |
71 | |
72 @c ------------------------------------------------------------ | |
73 @node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW | |
74 @section Usage of Multi-threaded FFTW | |
75 | |
76 Here, it is assumed that the reader is already familiar with the usage | |
77 of the uniprocessor FFTW routines, described elsewhere in this manual. | |
78 We only describe what one has to change in order to use the | |
79 multi-threaded routines. | |
80 | |
81 @cindex OpenMP | |
82 First, programs using the parallel complex transforms should be linked | |
83 with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp | |
84 -lfftw3 -lm} if you compiled with OpenMP. You will also need to link | |
85 with whatever library is responsible for threads on your system | |
86 (e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag | |
87 enables OpenMP (e.g. @code{-fopenmp} with gcc). | |
88 @cindex linking on Unix | |
89 | |
90 | |
91 Second, before calling @emph{any} FFTW routines, you should call the | |
92 function: | |
93 | |
94 @example | |
95 int fftw_init_threads(void); | |
96 @end example | |
97 @findex fftw_init_threads | |
98 | |
99 This function, which need only be called once, performs any one-time | |
100 initialization required to use threads on your system. It returns zero | |
101 if there was some error (which should not happen under normal | |
102 circumstances) and a non-zero value otherwise. | |
103 | |
104 Third, before creating a plan that you want to parallelize, you should | |
105 call: | |
106 | |
107 @example | |
108 void fftw_plan_with_nthreads(int nthreads); | |
109 @end example | |
110 @findex fftw_plan_with_nthreads | |
111 | |
112 The @code{nthreads} argument indicates the number of threads you want | |
113 FFTW to use (or actually, the maximum number). All plans subsequently | |
114 created with any planner routine will use that many threads. You can | |
115 call @code{fftw_plan_with_nthreads}, create some plans, call | |
116 @code{fftw_plan_with_nthreads} again with a different argument, and | |
117 create some more plans for a new number of threads. Plans already created | |
118 before a call to @code{fftw_plan_with_nthreads} are unaffected. If you | |
119 pass an @code{nthreads} argument of @code{1} (the default), threads are | |
120 disabled for subsequent plans. | |
121 | |
122 @cindex OpenMP | |
123 With OpenMP, to configure FFTW to use all of the currently running | |
124 OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the | |
125 @code{OMP_NUM_THREADS} environment variable), you can do: | |
126 @code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_} | |
127 OpenMP functions are declared via @code{#include <omp.h>}.) | |
128 | |
129 @cindex thread safety | |
130 Given a plan, you then execute it as usual with | |
131 @code{fftw_execute(plan)}, and the execution will use the number of | |
132 threads specified when the plan was created. When done, you destroy | |
133 it as usual with @code{fftw_destroy_plan}. As described in | |
134 @ref{Thread safety}, plan @emph{execution} is thread-safe, but plan | |
135 creation and destruction are @emph{not}: you should create/destroy | |
136 plans only from a single thread, but can safely execute multiple plans | |
137 in parallel. | |
138 | |
139 There is one additional routine: if you want to get rid of all memory | |
140 and other resources allocated internally by FFTW, you can call: | |
141 | |
142 @example | |
143 void fftw_cleanup_threads(void); | |
144 @end example | |
145 @findex fftw_cleanup_threads | |
146 | |
147 which is much like the @code{fftw_cleanup()} function except that it | |
148 also gets rid of threads-related data. You must @emph{not} execute any | |
149 previously created plans after calling this function. | |
150 | |
151 We should also mention one other restriction: if you save wisdom from a | |
152 program using the multi-threaded FFTW, that wisdom @emph{cannot be used} | |
153 by a program using only the single-threaded FFTW (i.e. not calling | |
154 @code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}. | |
155 | |
156 @c ------------------------------------------------------------ | |
157 @node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW | |
158 @section How Many Threads to Use? | |
159 | |
160 @cindex number of threads | |
161 There is a fair amount of overhead involved in synchronizing threads, | |
162 so the optimal number of threads to use depends upon the size of the | |
163 transform as well as on the number of processors you have. | |
164 | |
165 As a general rule, you don't want to use more threads than you have | |
166 processors. (Using more threads will work, but there will be extra | |
167 overhead with no benefit.) In fact, if the problem size is too small, | |
168 you may want to use fewer threads than you have processors. | |
169 | |
170 You will have to experiment with your system to see what level of | |
171 parallelization is best for your problem size. Typically, the problem | |
172 will have to involve at least a few thousand data points before threads | |
173 become beneficial. If you plan with @code{FFTW_PATIENT}, it will | |
174 automatically disable threads for sizes that don't benefit from | |
175 parallelization. | |
176 @ctindex FFTW_PATIENT | |
177 | |
178 @c ------------------------------------------------------------ | |
179 @node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW | |
180 @section Thread safety | |
181 | |
182 @cindex threads | |
183 @cindex OpenMP | |
184 @cindex thread safety | |
185 Users writing multi-threaded programs (including OpenMP) must concern | |
186 themselves with the @dfn{thread safety} of the libraries they | |
187 use---that is, whether it is safe to call routines in parallel from | |
188 multiple threads. FFTW can be used in such an environment, but some | |
189 care must be taken because the planner routines share data | |
190 (e.g. wisdom and trigonometric tables) between calls and plans. | |
191 | |
192 The upshot is that the only thread-safe (re-entrant) routine in FFTW is | |
193 @code{fftw_execute} (and the new-array variants thereof). All other routines | |
194 (e.g. the planner) should only be called from one thread at a time. So, | |
195 for example, you can wrap a semaphore lock around any calls to the | |
196 planner; even more simply, you can just create all of your plans from | |
197 one thread. We do not think this should be an important restriction | |
198 (FFTW is designed for the situation where the only performance-sensitive | |
199 code is the actual execution of the transform), and the benefits of | |
200 shared data between plans are great. | |
201 | |
202 Note also that, since the plan is not modified by @code{fftw_execute}, | |
203 it is safe to execute the @emph{same plan} in parallel by multiple | |
204 threads. However, since a given plan operates by default on a fixed | |
205 array, you need to use one of the new-array execute functions (@pxref{New-array Execute Functions}) so that different threads compute the transform of different data. | |
206 | |
207 (Users should note that these comments only apply to programs using | |
208 shared-memory threads or OpenMP. Parallelism using MPI or forked processes | |
209 involves a separate address-space and global variables for each process, | |
210 and is not susceptible to problems of this sort.) | |
211 | |
212 If you configured FFTW with the @code{--enable-debug} or | |
213 @code{--enable-debug-malloc} flags (@pxref{Installation on Unix}), | |
214 then @code{fftw_execute} is not thread-safe. These flags are not | |
215 documented because they are intended only for developing | |
216 and debugging FFTW, but if you must use @code{--enable-debug} then you | |
217 should also specifically pass @code{--disable-debug-malloc} for | |
218 @code{fftw_execute} to be thread-safe. | |
219 |