@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
@chapter Multi-threaded FFTW

@cindex parallel transform
In this chapter we document the parallel FFTW routines for
shared-memory parallel hardware. These routines, which support
parallel one- and multi-dimensional transforms of both real and
complex data, are the easiest way to take advantage of multiple
processors with FFTW. They work just like the corresponding
uniprocessor transform routines, except that you have an extra
initialization routine to call, and there is a routine to set the
number of threads to employ. Any program that uses the uniprocessor
FFTW can therefore be trivially modified to use the multi-threaded
FFTW.

A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs. FFTW's multi-threading support allows
you to utilize these additional CPUs transparently from a single
program. However, this does not necessarily translate into
performance gains---when multiple threads/CPUs are employed, there is
an overhead required for synchronization that may outweigh the
computational parallelism. Therefore, you can only benefit from
threads if your problem is sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package. On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
@uref{http://www.openmp.org,OpenMP} threads.
@fpindex configure
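
For example, on a typical Unix system you might build and install both
the serial and the POSIX-threads libraries with something like the
following (an illustrative sketch only; add whatever other
@code{configure} options your installation needs):

@example
./configure --enable-threads
make
make install
@end example

Substitute (or add) @code{--enable-openmp} if you want the OpenMP
version instead.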


@cindex portability
@cindex OpenMP
The threads routines require your operating system to have some sort
of shared-memory threads support. Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants,
from GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which
many common compilers (e.g. gcc) provide, are also supported and may
give better performance on some systems. (OpenMP threads are also
useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.) If you have a
shared-memory machine that uses a different threads API, it should be
a simple matter of programming to include support for it; see the file
@code{threads/threads.c} for more detail.

You can compile FFTW with @emph{both} @code{--enable-threads} and
@code{--enable-openmp} at the same time, since they install libraries
with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
described below). However, your programs may only link to @emph{one}
of these two libraries at a time.

Ideally, of course, you should also have multiple processors in order to
get any benefit from the threaded transforms.

@c ------------------------------------------------------------
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
@section Usage of Multi-threaded FFTW

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

@cindex OpenMP
First, programs using the parallel complex transforms should be linked
with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
-lfftw3 -lm} if you compiled with OpenMP. You will also need to link
with whatever library is responsible for threads on your system
(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
enables OpenMP (e.g. @code{-fopenmp} with gcc).
@cindex linking on Unix
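
For example, with gcc on a GNU/Linux system using POSIX threads, the
link command might look something like the following (an illustrative
sketch; @code{myprogram.c} is a hypothetical source file):

@example
gcc myprogram.c -lfftw3_threads -lfftw3 -lm -lpthread
@end example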


Second, before calling @emph{any} FFTW routines, you should call the
function:

@example
int fftw_init_threads(void);
@end example
@findex fftw_init_threads

This function, which need only be called once, performs any one-time
initialization required to use threads on your system. It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.
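
Since the error case should not arise in practice, a minimal check
such as the following sketch is usually sufficient (assuming
@code{<stdio.h>} and @code{<stdlib.h>} have been included):

@example
if (!fftw_init_threads())
@{
     fprintf(stderr, "fftw_init_threads failed\n");
     exit(EXIT_FAILURE);
@}
@end example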

Third, before creating a plan that you want to parallelize, you should
call:

@example
void fftw_plan_with_nthreads(int nthreads);
@end example
@findex fftw_plan_with_nthreads

The @code{nthreads} argument indicates the number of threads you want
FFTW to use (or actually, the maximum number). All plans subsequently
created with any planner routine will use that many threads. You can
call @code{fftw_plan_with_nthreads}, create some plans, call
@code{fftw_plan_with_nthreads} again with a different argument, and
create some more plans for a new number of threads. Plans already created
before a call to @code{fftw_plan_with_nthreads} are unaffected. If you
pass an @code{nthreads} argument of @code{1} (the default), threads are
disabled for subsequent plans.
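
For example, the following sketch (the sizes are arbitrary, and the
arrays @code{in1}, @code{out1}, @code{in2}, and @code{out2} are assumed
to have been allocated earlier with @code{fftw_malloc}) plans a large
transform with four threads and a small one with a single thread:

@example
fftw_plan big, small;

fftw_plan_with_nthreads(4);
big = fftw_plan_dft_1d(1048576, in1, out1, FFTW_FORWARD, FFTW_MEASURE);

fftw_plan_with_nthreads(1);
small = fftw_plan_dft_1d(256, in2, out2, FFTW_FORWARD, FFTW_MEASURE);
@end example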

@cindex OpenMP
With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
@code{OMP_NUM_THREADS} environment variable), you can do:
@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_}
OpenMP functions are declared via @code{#include <omp.h>}.)

@cindex thread safety
Given a plan, you then execute it as usual with
@code{fftw_execute(plan)}, and the execution will use the number of
threads specified when the plan was created. When done, you destroy
it as usual with @code{fftw_destroy_plan}. As described in
@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
creation and destruction are @emph{not}: you should create/destroy
plans only from a single thread, but can safely execute multiple plans
in parallel.

There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

@example
void fftw_cleanup_threads(void);
@end example
@findex fftw_cleanup_threads

which is much like the @code{fftw_cleanup()} function except that it
also gets rid of threads-related data. You must @emph{not} execute any
previously created plans after calling this function.
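
Putting these pieces together, a complete (if minimal) multi-threaded
program might look like the following sketch; the transform size,
thread count, and input data are arbitrary illustrative choices:

@example
#include <fftw3.h>

int main(void)
@{
     const int N = 65536;
     fftw_complex *in, *out;
     fftw_plan p;
     int i;

     if (!fftw_init_threads())         /* one-time initialization */
          return 1;
     fftw_plan_with_nthreads(4);       /* threads for subsequent plans */

     in = fftw_malloc(sizeof(fftw_complex) * N);
     out = fftw_malloc(sizeof(fftw_complex) * N);
     p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

     for (i = 0; i < N; ++i) @{        /* fill the input array */
          in[i][0] = 1.0;              /* real part */
          in[i][1] = 0.0;              /* imaginary part */
     @}

     fftw_execute(p);                  /* uses up to 4 threads */

     fftw_destroy_plan(p);
     fftw_free(in);
     fftw_free(out);
     fftw_cleanup_threads();           /* release threads-related data */
     return 0;
@}
@end example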

We should also mention one other restriction: if you save wisdom from a
program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
by a program using only the single-threaded FFTW (i.e. not calling
@code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}.

@c ------------------------------------------------------------
@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
@section How Many Threads to Use?

@cindex number of threads
There is a fair amount of overhead involved in synchronizing threads,
so the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.

As a general rule, you don't want to use more threads than you have
processors. (Using more threads will work, but there will be extra
overhead with no benefit.) In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with @code{FFTW_PATIENT}, it will
automatically disable threads for sizes that don't benefit from
parallelization.
@ctindex FFTW_PATIENT
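
One way to experiment is simply to time the same transform planned with
different thread counts, as in the following sketch (illustrative only;
it assumes a POSIX system for @code{clock_gettime}, and the transform
size and thread counts are arbitrary):

@example
#include <stdio.h>
#include <time.h>
#include <fftw3.h>

static double now(void)          /* wall-clock time in seconds */
@{
     struct timespec ts;
     clock_gettime(CLOCK_MONOTONIC, &ts);
     return ts.tv_sec + 1e-9 * ts.tv_nsec;
@}

int main(void)
@{
     const int N = 1048576;
     fftw_complex *in = fftw_malloc(sizeof(fftw_complex) * N);
     fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);
     int nt, i;

     fftw_init_threads();
     for (nt = 1; nt <= 8; nt *= 2) @{
          fftw_plan p;
          double t0;

          fftw_plan_with_nthreads(nt);
          p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_MEASURE);
          for (i = 0; i < N; ++i) @{   /* refill input: FFTW_MEASURE
                                          overwrites the arrays */
               in[i][0] = 1.0;
               in[i][1] = 0.0;
          @}

          t0 = now();
          for (i = 0; i < 10; ++i)
               fftw_execute(p);
          printf("%d threads: %g seconds per transform\n",
                 nt, (now() - t0) / 10);
          fftw_destroy_plan(p);
     @}
     fftw_free(in);
     fftw_free(out);
     fftw_cleanup_threads();
     return 0;
@}
@end example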

@c ------------------------------------------------------------
@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW
@section Thread safety

@cindex threads
@cindex OpenMP
@cindex thread safety
Users writing multi-threaded programs (including OpenMP) must concern
themselves with the @dfn{thread safety} of the libraries they
use---that is, whether it is safe to call routines in parallel from
multiple threads. FFTW can be used in such an environment, but some
care must be taken because the planner routines share data
(e.g. wisdom and trigonometric tables) between calls and plans.

The upshot is that the only thread-safe (re-entrant) routine in FFTW is
@code{fftw_execute} (and the new-array variants thereof). All other routines
(e.g. the planner) should only be called from one thread at a time. So,
for example, you can wrap a semaphore lock around any calls to the
planner; even more simply, you can just create all of your plans from
one thread. We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.
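
If you nevertheless need to create plans from more than one thread, one
simple approach is to protect every planner (and plan-destruction) call
with a single mutex. For example, a sketch using POSIX threads, with a
hypothetical wrapper function:

@example
#include <pthread.h>
#include <fftw3.h>

static pthread_mutex_t planner_mutex = PTHREAD_MUTEX_INITIALIZER;

fftw_plan plan_dft_1d_locked(int n, fftw_complex *in, fftw_complex *out)
@{
     fftw_plan p;
     pthread_mutex_lock(&planner_mutex);
     p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
     pthread_mutex_unlock(&planner_mutex);
     return p;
@}
@end example

(Calls to @code{fftw_destroy_plan} would need to be protected by the
same mutex.)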

Note also that, since the plan is not modified by @code{fftw_execute},
it is safe to execute the @emph{same plan} in parallel by multiple
threads. However, since a given plan operates by default on a fixed
array, you need to use one of the new-array execute functions
(@pxref{New-array Execute Functions}) so that different threads
compute the transform of different data.
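
For instance, with OpenMP you might transform many independent,
contiguous blocks of data using one shared plan, along the following
lines (a sketch only: @code{p} is assumed to be a plan for a
size-@code{n} transform, @code{in} and @code{out} to hold
@code{nblocks} such blocks and to have been allocated with
@code{fftw_malloc}, and @code{b}, @code{n}, and @code{nblocks} to be
@code{int} variables; see @ref{New-array Execute Functions} for the
alignment caveats):

@example
#pragma omp parallel for
for (b = 0; b < nblocks; ++b)
     fftw_execute_dft(p, in + (size_t) b * n, out + (size_t) b * n);
@end example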

(Users should note that these comments only apply to programs using
shared-memory threads or OpenMP. Parallelism using MPI or forked processes
involves a separate address-space and global variables for each process,
and is not susceptible to problems of this sort.)

If you configured FFTW with the @code{--enable-debug} or
@code{--enable-debug-malloc} flags (@pxref{Installation on Unix}),
then @code{fftw_execute} is not thread-safe. These flags are not
documented because they are intended only for developing
and debugging FFTW, but if you must use @code{--enable-debug} then you
should also specifically pass @code{--disable-debug-malloc} for
@code{fftw_execute} to be thread-safe.
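
For example (an illustrative @code{configure} invocation only):

@example
./configure --enable-threads --enable-debug --disable-debug-malloc
@end example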