@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
@chapter Multi-threaded FFTW

@cindex parallel transform
In this chapter we document the parallel FFTW routines for
shared-memory parallel hardware. These routines, which support
parallel one- and multi-dimensional transforms of both real and
complex data, are the easiest way to take advantage of multiple
processors with FFTW. They work just like the corresponding
uniprocessor transform routines, except that you have an extra
initialization routine to call, and there is a routine to set the
number of threads to employ. Any program that uses the uniprocessor
FFTW can therefore be trivially modified to use the multi-threaded
FFTW.

A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs. FFTW's multi-threading support allows
you to utilize these additional CPUs transparently from a single
program. However, this does not necessarily translate into
performance gains---when multiple threads/CPUs are employed, there is
an overhead required for synchronization that may outweigh the
computational parallelism. Therefore, you can only benefit from
threads if your problem is sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package. On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
@uref{http://www.openmp.org,OpenMP} threads.
@fpindex configure


@cindex portability
@cindex OpenMP
The threads routines require your operating system to have some sort
of shared-memory threads support. Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants,
from GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which
are supported in many common compilers (e.g. gcc), are also supported,
and may give better performance on some systems. (OpenMP threads are
also useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.) If you have a
shared-memory machine that uses a different threads API, it should be
a simple matter of programming to include support for it; see the file
@code{threads/threads.c} for more detail.

You can compile FFTW with @emph{both} @code{--enable-threads} and
@code{--enable-openmp} at the same time, since they install libraries
with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
described below). However, your programs may only link to @emph{one}
of these two libraries at a time.

Ideally, of course, you should also have multiple processors in order to
get any benefit from the threaded transforms.

@c ------------------------------------------------------------
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
@section Usage of Multi-threaded FFTW

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

@cindex OpenMP
First, programs using the parallel complex transforms should be linked
with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
-lfftw3 -lm} if you compiled with OpenMP. You will also need to link
with whatever library is responsible for threads on your system
(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
enables OpenMP (e.g. @code{-fopenmp} with gcc).
@cindex linking on Unix


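For example, on a GNU/Linux system a POSIX-threads build might be
compiled and linked as follows (the file name @code{myprog.c} is a
placeholder):

@example
gcc myprog.c -o myprog -lfftw3_threads -lfftw3 -lpthread -lm
@end example

@noindent
and an OpenMP build as:

@example
gcc -fopenmp myprog.c -o myprog -lfftw3_omp -lfftw3 -lm
@end example
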
Second, before calling @emph{any} FFTW routines, you should call the
function:

@example
int fftw_init_threads(void);
@end example
@findex fftw_init_threads

This function, which need only be called once, performs any one-time
initialization required to use threads on your system. It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

Third, before creating a plan that you want to parallelize, you should
call:

@example
void fftw_plan_with_nthreads(int nthreads);
@end example
@findex fftw_plan_with_nthreads

The @code{nthreads} argument indicates the number of threads you want
FFTW to use (or actually, the maximum number). All plans subsequently
created with any planner routine will use that many threads. You can
call @code{fftw_plan_with_nthreads}, create some plans, call
@code{fftw_plan_with_nthreads} again with a different argument, and
create some more plans for a new number of threads. Plans already created
before a call to @code{fftw_plan_with_nthreads} are unaffected. If you
pass an @code{nthreads} argument of @code{1} (the default), threads are
disabled for subsequent plans.

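Putting these steps together, a minimal threaded transform might look
like the following sketch (the size @code{65536} and the @code{4}
threads are arbitrary illustrations, not recommendations):

@example
#include <fftw3.h>

int main(void)
@{
     fftw_complex *in, *out;
     fftw_plan p;
     int N = 65536;                       /* illustrative problem size */

     if (!fftw_init_threads()) return 1;  /* one-time initialization */
     fftw_plan_with_nthreads(4);          /* plans below use up to 4 threads */

     in = fftw_alloc_complex(N);
     out = fftw_alloc_complex(N);
     p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

     /* ... initialize in[], then execute (possibly many times) ... */
     fftw_execute(p);

     fftw_destroy_plan(p);
     fftw_free(in);
     fftw_free(out);
     return 0;
@}
@end example
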
@cindex OpenMP
With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
@code{OMP_NUM_THREADS} environment variable), you can do:
@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_}
OpenMP functions are declared via @code{#include <omp.h>}.)

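For instance, a sketch of this setup (assuming the program is compiled
with OpenMP enabled):

@example
#include <omp.h>
#include <fftw3.h>

void setup_fftw_threads(void)
@{
     fftw_init_threads();
     /* use however many threads OpenMP is currently configured for */
     fftw_plan_with_nthreads(omp_get_max_threads());
@}
@end example
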
@cindex thread safety
Given a plan, you then execute it as usual with
@code{fftw_execute(plan)}, and the execution will use the number of
threads specified when the plan was created. When done, you destroy
it as usual with @code{fftw_destroy_plan}. As described in
@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
creation and destruction are @emph{not}: you should create/destroy
plans only from a single thread, but can safely execute multiple plans
in parallel.

There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

@example
void fftw_cleanup_threads(void);
@end example
@findex fftw_cleanup_threads

which is much like the @code{fftw_cleanup()} function except that it
also gets rid of threads-related data. You must @emph{not} execute any
previously created plans after calling this function.

We should also mention one other restriction: if you save wisdom from a
program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
by a program using only the single-threaded FFTW (i.e. not calling
@code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}.

@c ------------------------------------------------------------
@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
@section How Many Threads to Use?

@cindex number of threads
There is a fair amount of overhead involved in synchronizing threads,
so the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.

As a general rule, you don't want to use more threads than you have
processors. (Using more threads will work, but there will be extra
overhead with no benefit.) In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with @code{FFTW_PATIENT}, it will
automatically disable threads for sizes that don't benefit from
parallelization.
@ctindex FFTW_PATIENT

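For example, the following sketch leaves that decision to the planner
(the limit of @code{8} threads and the size @code{256} are arbitrary,
and @code{in}, @code{out}, and @code{p} are assumed to be declared as
in the earlier example):

@example
fftw_plan_with_nthreads(8);   /* an upper bound only */
p = fftw_plan_dft_1d(256, in, out, FFTW_FORWARD, FFTW_PATIENT);
/* for a transform this small, FFTW_PATIENT will typically settle on
   fewer threads (or none) if they do not actually help */
@end example
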
@c ------------------------------------------------------------
@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW
@section Thread safety

@cindex threads
@cindex OpenMP
@cindex thread safety
Users writing multi-threaded programs (including OpenMP) must concern
themselves with the @dfn{thread safety} of the libraries they
use---that is, whether it is safe to call routines in parallel from
multiple threads. FFTW can be used in such an environment, but some
care must be taken because the planner routines share data
(e.g. wisdom and trigonometric tables) between calls and plans.

The upshot is that the only thread-safe routine in FFTW is
@code{fftw_execute} (and the new-array variants thereof). All other routines
(e.g. the planner) should only be called from one thread at a time. So,
for example, you can wrap a semaphore lock around any calls to the
planner; even more simply, you can just create all of your plans from
one thread. We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

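For example, a program that must create plans from several threads
could serialize its own planner calls (a sketch using POSIX threads;
the function name and mutex are illustrative, not part of FFTW):

@example
#include <pthread.h>
#include <fftw3.h>

static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;

fftw_plan make_plan_locked(int n, fftw_complex *in, fftw_complex *out)
@{
     fftw_plan p;
     pthread_mutex_lock(&planner_lock);   /* one thread in the planner at a time */
     p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
     pthread_mutex_unlock(&planner_lock);
     return p;
@}
@end example
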
Note also that, since the plan is not modified by @code{fftw_execute},
it is safe to execute the @emph{same plan} in parallel by multiple
threads. However, since a given plan operates by default on a fixed
array, you need to use one of the new-array execute functions
(@pxref{New-array Execute Functions}) so that different threads compute
the transform of different data.

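For instance, several threads can share one plan as in the following
sketch, where each thread passes its own arrays (assumed to have the
same size and alignment as the arrays the plan was created with) to the
new-array execute function:

@example
#include <fftw3.h>

static fftw_plan p;   /* created once, then only read by the workers */

void setup(int n, fftw_complex *in, fftw_complex *out)
@{
     p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
@}

/* may be called concurrently from several threads */
void transform_my_block(fftw_complex *my_in, fftw_complex *my_out)
@{
     fftw_execute_dft(p, my_in, my_out);   /* thread-safe new-array execution */
@}
@end example
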
(Users should note that these comments only apply to programs using
shared-memory threads or OpenMP. Parallelism using MPI or forked processes
involves a separate address-space and global variables for each process,
and is not susceptible to problems of this sort.)

The FFTW planner is intended to be called from a single thread. If you
really must call it from multiple threads, you are expected to grab
whatever lock makes sense for your application, with the understanding
that you may be holding that lock for a long time, which is undesirable.

Neither strategy works, however, in the following situation. The
``application'' is structured as a set of ``plugins'' which are unaware
of each other, and for whatever reason the ``plugins'' cannot coordinate
on grabbing the lock. (This is not a technical problem, but an
organizational one. The ``plugins'' are written by independent agents,
and from the perspective of each plugin's author, each plugin is using
FFTW correctly from a single thread.) To cope with this situation,
starting from FFTW-3.3.5, FFTW supports an API to make the planner
thread-safe:

@example
void fftw_make_planner_thread_safe(void);
@end example
@findex fftw_make_planner_thread_safe

This call operates by brute force: it just installs a hook that wraps a
lock (chosen by us) around all planner calls. So there is no magic and
you get the worst of all worlds. The planner is still single-threaded,
but you cannot choose which lock to use. The planner still holds the
lock for a long time, but you cannot impose a timeout on lock
acquisition. As of FFTW-3.3.5 and FFTW-3.3.6, this call does not work
when using OpenMP as the threading substrate. (Suggestions on what to do
about this bug are welcome.) @emph{Do not use
@code{fftw_make_planner_thread_safe} unless there is no other choice,}
such as in the application/plugin situation.
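
In the plugin scenario above, each plugin can simply make this call once
before its first planner call, for example (a sketch;
@code{my_plugin_init} is a hypothetical entry point, not an FFTW
routine):

@example
#include <fftw3.h>

void my_plugin_init(void)
@{
     fftw_make_planner_thread_safe();   /* planner calls now take an internal lock */
     /* ... create this plugin's plans as usual ... */
@}
@end example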