Mercurial > hg > sv-dependency-builds
diff src/fftw-3.3.3/doc/threads.texi @ 10:37bf6b4a2645
Add FFTW3
author | Chris Cannam |
---|---|
date | Wed, 20 Mar 2013 15:35:50 +0000 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/fftw-3.3.3/doc/threads.texi Wed Mar 20 15:35:50 2013 +0000 @@ -0,0 +1,219 @@ +@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top +@chapter Multi-threaded FFTW + +@cindex parallel transform +In this chapter we document the parallel FFTW routines for +shared-memory parallel hardware. These routines, which support +parallel one- and multi-dimensional transforms of both real and +complex data, are the easiest way to take advantage of multiple +processors with FFTW. They work just like the corresponding +uniprocessor transform routines, except that you have an extra +initialization routine to call, and there is a routine to set the +number of threads to employ. Any program that uses the uniprocessor +FFTW can therefore be trivially modified to use the multi-threaded +FFTW. + +A shared-memory machine is one in which all CPUs can directly access +the same main memory, and such machines are now common due to the +ubiquity of multi-core CPUs. FFTW's multi-threading support allows +you to utilize these additional CPUs transparently from a single +program. However, this does not necessarily translate into +performance gains---when multiple threads/CPUs are employed, there is +an overhead required for synchronization that may outweigh the +computatational parallelism. Therefore, you can only benefit from +threads if your problem is sufficiently large. +@cindex shared-memory +@cindex threads + +@menu +* Installation and Supported Hardware/Software:: +* Usage of Multi-threaded FFTW:: +* How Many Threads to Use?:: +* Thread safety:: +@end menu + +@c ------------------------------------------------------------ +@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW +@section Installation and Supported Hardware/Software + +All of the FFTW threads code is located in the @code{threads} +subdirectory of the FFTW package. On Unix systems, the FFTW threads +libraries and header files can be automatically configured, compiled, +and installed along with the uniprocessor FFTW libraries simply by +including @code{--enable-threads} in the flags to the @code{configure} +script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use +@uref{http://www.openmp.org,OpenMP} threads. +@fpindex configure + + +@cindex portability +@cindex OpenMP +The threads routines require your operating system to have some sort +of shared-memory threads support. Specifically, the FFTW threads +package works with POSIX threads (available on most Unix variants, +from GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which +are supported in many common compilers (e.g. gcc) are also supported, +and may give better performance on some systems. (OpenMP threads are +also useful if you are employing OpenMP in your own code, in order to +minimize conflicts between threading models.) If you have a +shared-memory machine that uses a different threads API, it should be +a simple matter of programming to include support for it; see the file +@code{threads/threads.c} for more detail. + +You can compile FFTW with @emph{both} @code{--enable-threads} and +@code{--enable-openmp} at the same time, since they install libraries +with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as +described below). However, your programs may only link to @emph{one} +of these two libraries at a time. + +Ideally, of course, you should also have multiple processors in order to +get any benefit from the threaded transforms. + +@c ------------------------------------------------------------ +@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW +@section Usage of Multi-threaded FFTW + +Here, it is assumed that the reader is already familiar with the usage +of the uniprocessor FFTW routines, described elsewhere in this manual. +We only describe what one has to change in order to use the +multi-threaded routines. + +@cindex OpenMP +First, programs using the parallel complex transforms should be linked +with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp +-lfftw3 -lm} if you compiled with OpenMP. You will also need to link +with whatever library is responsible for threads on your system +(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag +enables OpenMP (e.g. @code{-fopenmp} with gcc). +@cindex linking on Unix + + +Second, before calling @emph{any} FFTW routines, you should call the +function: + +@example +int fftw_init_threads(void); +@end example +@findex fftw_init_threads + +This function, which need only be called once, performs any one-time +initialization required to use threads on your system. It returns zero +if there was some error (which should not happen under normal +circumstances) and a non-zero value otherwise. + +Third, before creating a plan that you want to parallelize, you should +call: + +@example +void fftw_plan_with_nthreads(int nthreads); +@end example +@findex fftw_plan_with_nthreads + +The @code{nthreads} argument indicates the number of threads you want +FFTW to use (or actually, the maximum number). All plans subsequently +created with any planner routine will use that many threads. You can +call @code{fftw_plan_with_nthreads}, create some plans, call +@code{fftw_plan_with_nthreads} again with a different argument, and +create some more plans for a new number of threads. Plans already created +before a call to @code{fftw_plan_with_nthreads} are unaffected. If you +pass an @code{nthreads} argument of @code{1} (the default), threads are +disabled for subsequent plans. + +@cindex OpenMP +With OpenMP, to configure FFTW to use all of the currently running +OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the +@code{OMP_NUM_THREADS} environment variable), you can do: +@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_} +OpenMP functions are declared via @code{#include <omp.h>}.) + +@cindex thread safety +Given a plan, you then execute it as usual with +@code{fftw_execute(plan)}, and the execution will use the number of +threads specified when the plan was created. When done, you destroy +it as usual with @code{fftw_destroy_plan}. As described in +@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan +creation and destruction are @emph{not}: you should create/destroy +plans only from a single thread, but can safely execute multiple plans +in parallel. + +There is one additional routine: if you want to get rid of all memory +and other resources allocated internally by FFTW, you can call: + +@example +void fftw_cleanup_threads(void); +@end example +@findex fftw_cleanup_threads + +which is much like the @code{fftw_cleanup()} function except that it +also gets rid of threads-related data. You must @emph{not} execute any +previously created plans after calling this function. + +We should also mention one other restriction: if you save wisdom from a +program using the multi-threaded FFTW, that wisdom @emph{cannot be used} +by a program using only the single-threaded FFTW (i.e. not calling +@code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}. + +@c ------------------------------------------------------------ +@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW +@section How Many Threads to Use? + +@cindex number of threads +There is a fair amount of overhead involved in synchronizing threads, +so the optimal number of threads to use depends upon the size of the +transform as well as on the number of processors you have. + +As a general rule, you don't want to use more threads than you have +processors. (Using more threads will work, but there will be extra +overhead with no benefit.) In fact, if the problem size is too small, +you may want to use fewer threads than you have processors. + +You will have to experiment with your system to see what level of +parallelization is best for your problem size. Typically, the problem +will have to involve at least a few thousand data points before threads +become beneficial. If you plan with @code{FFTW_PATIENT}, it will +automatically disable threads for sizes that don't benefit from +parallelization. +@ctindex FFTW_PATIENT + +@c ------------------------------------------------------------ +@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW +@section Thread safety + +@cindex threads +@cindex OpenMP +@cindex thread safety +Users writing multi-threaded programs (including OpenMP) must concern +themselves with the @dfn{thread safety} of the libraries they +use---that is, whether it is safe to call routines in parallel from +multiple threads. FFTW can be used in such an environment, but some +care must be taken because the planner routines share data +(e.g. wisdom and trigonometric tables) between calls and plans. + +The upshot is that the only thread-safe (re-entrant) routine in FFTW is +@code{fftw_execute} (and the new-array variants thereof). All other routines +(e.g. the planner) should only be called from one thread at a time. So, +for example, you can wrap a semaphore lock around any calls to the +planner; even more simply, you can just create all of your plans from +one thread. We do not think this should be an important restriction +(FFTW is designed for the situation where the only performance-sensitive +code is the actual execution of the transform), and the benefits of +shared data between plans are great. + +Note also that, since the plan is not modified by @code{fftw_execute}, +it is safe to execute the @emph{same plan} in parallel by multiple +threads. However, since a given plan operates by default on a fixed +array, you need to use one of the new-array execute functions (@pxref{New-array Execute Functions}) so that different threads compute the transform of different data. + +(Users should note that these comments only apply to programs using +shared-memory threads or OpenMP. Parallelism using MPI or forked processes +involves a separate address-space and global variables for each process, +and is not susceptible to problems of this sort.) + +If you are configured FFTW with the @code{--enable-debug} or +@code{--enable-debug-malloc} flags (@pxref{Installation on Unix}), +then @code{fftw_execute} is not thread-safe. These flags are not +documented because they are intended only for developing +and debugging FFTW, but if you must use @code{--enable-debug} then you +should also specifically pass @code{--disable-debug-malloc} for +@code{fftw_execute} to be thread-safe. +