annotate src/fftw-3.3.3/doc/threads.texi @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI compatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 37bf6b4a2645
@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
@chapter Multi-threaded FFTW

@cindex parallel transform
In this chapter we document the parallel FFTW routines for
shared-memory parallel hardware. These routines, which support
parallel one- and multi-dimensional transforms of both real and
complex data, are the easiest way to take advantage of multiple
processors with FFTW. They work just like the corresponding
uniprocessor transform routines, except that you have an extra
initialization routine to call, and there is a routine to set the
number of threads to employ. Any program that uses the uniprocessor
FFTW can therefore be trivially modified to use the multi-threaded
FFTW.

A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs. FFTW's multi-threading support allows
you to utilize these additional CPUs transparently from a single
program. However, this does not necessarily translate into
performance gains---when multiple threads/CPUs are employed, there is
an overhead required for synchronization that may outweigh the
computational parallelism. Therefore, you can only benefit from
threads if your problem is sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package. On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
@uref{http://www.openmp.org,OpenMP} threads.
@fpindex configure


@cindex portability
@cindex OpenMP
The threads routines require your operating system to have some sort
of shared-memory threads support. Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants,
from GNU/Linux to Mac OS X) and Win32 threads. OpenMP threads, which
are supported by many common compilers (e.g. gcc), are also supported,
and may give better performance on some systems. (OpenMP threads are
also useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.) If you have a
shared-memory machine that uses a different threads API, it should be
a simple matter of programming to include support for it; see the file
@code{threads/threads.c} for more detail.

You can compile FFTW with @emph{both} @code{--enable-threads} and
@code{--enable-openmp} at the same time, since they install libraries
with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
described below). However, your programs may only link to @emph{one}
of these two libraries at a time.

Ideally, of course, you should also have multiple processors in order to
get any benefit from the threaded transforms.

@c ------------------------------------------------------------
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
@section Usage of Multi-threaded FFTW

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

@cindex OpenMP
First, programs using the parallel complex transforms should be linked
with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
-lfftw3 -lm} if you compiled with OpenMP. You will also need to link
with whatever library is responsible for threads on your system
(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
enables OpenMP (e.g. @code{-fopenmp} with gcc).
@cindex linking on Unix

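For instance, typical build commands might look like the following (a sketch only: the program and output names are placeholders, and the exact flags depend on your compiler and on where FFTW is installed):

```shell
# POSIX-threads build:
cc myprog.c -o myprog -lfftw3_threads -lfftw3 -lpthread -lm

# OpenMP build (gcc):
gcc -fopenmp myprog.c -o myprog -lfftw3_omp -lfftw3 -lm
```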
Second, before calling @emph{any} FFTW routines, you should call the
function:

@example
int fftw_init_threads(void);
@end example
@findex fftw_init_threads

This function, which need only be called once, performs any one-time
initialization required to use threads on your system. It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

Third, before creating a plan that you want to parallelize, you should
call:

@example
void fftw_plan_with_nthreads(int nthreads);
@end example
@findex fftw_plan_with_nthreads

The @code{nthreads} argument indicates the number of threads you want
FFTW to use (or actually, the maximum number). All plans subsequently
created with any planner routine will use that many threads. You can
call @code{fftw_plan_with_nthreads}, create some plans, call
@code{fftw_plan_with_nthreads} again with a different argument, and
create some more plans for a new number of threads. Plans already created
before a call to @code{fftw_plan_with_nthreads} are unaffected. If you
pass an @code{nthreads} argument of @code{1} (the default), threads are
disabled for subsequent plans.

@cindex OpenMP
With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
@code{OMP_NUM_THREADS} environment variable), you can do:
@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_}
OpenMP functions are declared via @code{#include <omp.h>}.)

@cindex thread safety
Given a plan, you then execute it as usual with
@code{fftw_execute(plan)}, and the execution will use the number of
threads specified when the plan was created. When done, you destroy
it as usual with @code{fftw_destroy_plan}. As described in
@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
creation and destruction are @emph{not}: you should create/destroy
plans only from a single thread, but can safely execute multiple plans
in parallel.

There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

@example
void fftw_cleanup_threads(void);
@end example
@findex fftw_cleanup_threads

which is much like the @code{fftw_cleanup()} function except that it
also gets rid of threads-related data. You must @emph{not} execute any
previously created plans after calling this function.

We should also mention one other restriction: if you save wisdom from a
program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
by a program using only the single-threaded FFTW (i.e. not calling
@code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}.

@c ------------------------------------------------------------
@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
@section How Many Threads to Use?

@cindex number of threads
There is a fair amount of overhead involved in synchronizing threads,
so the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.

As a general rule, you don't want to use more threads than you have
processors. (Using more threads will work, but there will be extra
overhead with no benefit.) In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with @code{FFTW_PATIENT}, it will
automatically disable threads for sizes that don't benefit from
parallelization.
@ctindex FFTW_PATIENT

@c ------------------------------------------------------------
@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW
@section Thread safety

@cindex threads
@cindex OpenMP
@cindex thread safety
Users writing multi-threaded programs (including OpenMP) must concern
themselves with the @dfn{thread safety} of the libraries they
use---that is, whether it is safe to call routines in parallel from
multiple threads. FFTW can be used in such an environment, but some
care must be taken because the planner routines share data
(e.g. wisdom and trigonometric tables) between calls and plans.

The upshot is that the only thread-safe (re-entrant) routine in FFTW is
@code{fftw_execute} (and the new-array variants thereof). All other routines
(e.g. the planner) should only be called from one thread at a time. So,
for example, you can wrap a semaphore lock around any calls to the
planner; even more simply, you can just create all of your plans from
one thread. We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

Note also that, since the plan is not modified by @code{fftw_execute},
it is safe to execute the @emph{same plan} in parallel by multiple
threads. However, since a given plan operates by default on a fixed
array, you need to use one of the new-array execute functions
(@pxref{New-array Execute Functions}) so that different threads compute
the transform of different data.

(Users should note that these comments only apply to programs using
shared-memory threads or OpenMP. Parallelism using MPI or forked processes
involves a separate address space and global variables for each process,
and is not susceptible to problems of this sort.)

If you configured FFTW with the @code{--enable-debug} or
@code{--enable-debug-malloc} flags (@pxref{Installation on Unix}),
then @code{fftw_execute} is not thread-safe. These flags are not
documented because they are intended only for developing
and debugging FFTW, but if you must use @code{--enable-debug} then you
should also specifically pass @code{--disable-debug-malloc} for
@code{fftw_execute} to be thread-safe.