src/fftw-3.3.3/doc/threads.texi @ 10:37bf6b4a2645
Add FFTW3
author Chris Cannam
date Wed, 20 Mar 2013 15:35:50 +0000
@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
@chapter Multi-threaded FFTW

@cindex parallel transform
In this chapter we document the parallel FFTW routines for
shared-memory parallel hardware. These routines, which support
parallel one- and multi-dimensional transforms of both real and
complex data, are the easiest way to take advantage of multiple
processors with FFTW. They work just like the corresponding
uniprocessor transform routines, except that you have an extra
initialization routine to call, and there is a routine to set the
number of threads to employ. Any program that uses the uniprocessor
FFTW can therefore be trivially modified to use the multi-threaded
FFTW.

A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs. FFTW's multi-threading support allows
you to utilize these additional CPUs transparently from a single
program. However, this does not necessarily translate into
performance gains---when multiple threads/CPUs are employed, there is
an overhead required for synchronization that may outweigh the
computational parallelism. Therefore, you can only benefit from
threads if your problem is sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package. On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
@uref{http://www.openmp.org,OpenMP} threads.
@fpindex configure
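
For example, on a typical Unix system, a POSIX-threads build might be
configured and installed as follows (the installation prefix and any
other options are, of course, up to you; substitute
@code{--enable-openmp} if you prefer OpenMP threads):

@example
./configure --enable-threads
make
make install
@end example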


@cindex portability
@cindex OpenMP
The threads routines require your operating system to have some sort
of shared-memory threads support. Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants,
from GNU/Linux to MacOS X) and Win32 threads. OpenMP threads,
supported by many common compilers (e.g. gcc), are also available
and may give better performance on some systems. (OpenMP threads are
also useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.) If you have a
shared-memory machine that uses a different threads API, it should be
a simple matter of programming to include support for it; see the file
@code{threads/threads.c} for more detail.

You can compile FFTW with @emph{both} @code{--enable-threads} and
@code{--enable-openmp} at the same time, since they install libraries
with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
described below). However, your programs may only link to @emph{one}
of these two libraries at a time.

Ideally, of course, you should also have multiple processors in order to
get any benefit from the threaded transforms.
@c ------------------------------------------------------------
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
@section Usage of Multi-threaded FFTW

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

@cindex OpenMP
First, programs using the parallel complex transforms should be linked
with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
-lfftw3 -lm} if you compiled with OpenMP. You will also need to link
with whatever library is responsible for threads on your system
(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
enables OpenMP (e.g. @code{-fopenmp} with gcc).
@cindex linking on Unix

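Concretely, a link command for a POSIX-threads build on GNU/Linux might
look like the following (@code{myprog.c} is a hypothetical source file):

@example
gcc myprog.c -lfftw3_threads -lfftw3 -lpthread -lm
@end example
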

Second, before calling @emph{any} FFTW routines, you should call the
function:

@example
int fftw_init_threads(void);
@end example
@findex fftw_init_threads

This function, which need only be called once, performs any one-time
initialization required to use threads on your system. It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

Third, before creating a plan that you want to parallelize, you should
call:

@example
void fftw_plan_with_nthreads(int nthreads);
@end example
@findex fftw_plan_with_nthreads

The @code{nthreads} argument indicates the number of threads you want
FFTW to use (or actually, the maximum number). All plans subsequently
created with any planner routine will use that many threads. You can
call @code{fftw_plan_with_nthreads}, create some plans, call
@code{fftw_plan_with_nthreads} again with a different argument, and
create some more plans for a new number of threads. Plans already created
before a call to @code{fftw_plan_with_nthreads} are unaffected. If you
pass an @code{nthreads} argument of @code{1} (the default), threads are
disabled for subsequent plans.
@cindex OpenMP
With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
@code{OMP_NUM_THREADS} environment variable), you can do:
@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_}
OpenMP functions are declared via @code{#include <omp.h>}.)

@cindex thread safety
Given a plan, you then execute it as usual with
@code{fftw_execute(plan)}, and the execution will use the number of
threads specified when the plan was created. When done, you destroy
it as usual with @code{fftw_destroy_plan}. As described in
@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
creation and destruction are @emph{not}: you should create/destroy
plans only from a single thread, but can safely execute multiple plans
in parallel.

There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

@example
void fftw_cleanup_threads(void);
@end example
@findex fftw_cleanup_threads

which is much like the @code{fftw_cleanup()} function except that it
also gets rid of threads-related data. You must @emph{not} execute any
previously created plans after calling this function.

We should also mention one other restriction: if you save wisdom from a
program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
by a program using only the single-threaded FFTW (i.e. not calling
@code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}.

@c ------------------------------------------------------------
@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
@section How Many Threads to Use?

@cindex number of threads
There is a fair amount of overhead involved in synchronizing threads,
so the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.

As a general rule, you don't want to use more threads than you have
processors. (Using more threads will work, but there will be extra
overhead with no benefit.) In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with @code{FFTW_PATIENT}, it will
automatically disable threads for sizes that don't benefit from
parallelization.
@ctindex FFTW_PATIENT

@c ------------------------------------------------------------
@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW
@section Thread safety

@cindex threads
@cindex OpenMP
@cindex thread safety
Users writing multi-threaded programs (including OpenMP) must concern
themselves with the @dfn{thread safety} of the libraries they
use---that is, whether it is safe to call routines in parallel from
multiple threads. FFTW can be used in such an environment, but some
care must be taken because the planner routines share data
(e.g. wisdom and trigonometric tables) between calls and plans.

The upshot is that the only thread-safe (re-entrant) routine in FFTW is
@code{fftw_execute} (and the new-array variants thereof). All other
routines (e.g. the planner) should only be called from one thread at a
time. So, for example, you can wrap a semaphore lock around any calls
to the planner; even more simply, you can just create all of your plans
from one thread. We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

Note also that, since the plan is not modified by @code{fftw_execute},
it is safe to execute the @emph{same plan} in parallel by multiple
threads. However, since a given plan operates by default on a fixed
array, you need to use one of the new-array execute functions
(@pxref{New-array Execute Functions}) so that different threads compute
the transform of different data.

(Users should note that these comments only apply to programs using
shared-memory threads or OpenMP. Parallelism using MPI or forked
processes involves a separate address space and global variables for
each process, and is not susceptible to problems of this sort.)

If you configured FFTW with the @code{--enable-debug} or
@code{--enable-debug-malloc} flags (@pxref{Installation on Unix}),
then @code{fftw_execute} is not thread-safe. These flags are not
documented because they are intended only for developing
and debugging FFTW, but if you must use @code{--enable-debug} then you
should also specifically pass @code{--disable-debug-malloc} for
@code{fftw_execute} to be thread-safe.