diff src/fftw-3.3.3/doc/threads.texi @ 10:37bf6b4a2645

Add FFTW3
author Chris Cannam
date Wed, 20 Mar 2013 15:35:50 +0000
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/fftw-3.3.3/doc/threads.texi	Wed Mar 20 15:35:50 2013 +0000
@@ -0,0 +1,219 @@
+@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
+@chapter Multi-threaded FFTW
+
+@cindex parallel transform
+In this chapter we document the parallel FFTW routines for
+shared-memory parallel hardware.  These routines, which support
+parallel one- and multi-dimensional transforms of both real and
+complex data, are the easiest way to take advantage of multiple
+processors with FFTW.  They work just like the corresponding
+uniprocessor transform routines, except that you have an extra
+initialization routine to call, and there is a routine to set the
+number of threads to employ.  Any program that uses the uniprocessor
+FFTW can therefore be trivially modified to use the multi-threaded
+FFTW.
+
+A shared-memory machine is one in which all CPUs can directly access
+the same main memory, and such machines are now common due to the
+ubiquity of multi-core CPUs.  FFTW's multi-threading support allows
+you to utilize these additional CPUs transparently from a single
+program.  However, this does not necessarily translate into
+performance gains---when multiple threads/CPUs are employed, there is
+a synchronization overhead that may outweigh the benefit of the added
+computational parallelism.  Therefore, you can only benefit from
+threads if your problem is sufficiently large.
+@cindex shared-memory
+@cindex threads
+
+@menu
+* Installation and Supported Hardware/Software::  
+* Usage of Multi-threaded FFTW::  
+* How Many Threads to Use?::    
+* Thread safety::               
+@end menu
+
+@c ------------------------------------------------------------
+@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
+@section Installation and Supported Hardware/Software
+
+All of the FFTW threads code is located in the @code{threads}
+subdirectory of the FFTW package.  On Unix systems, the FFTW threads
+libraries and header files can be automatically configured, compiled,
+and installed along with the uniprocessor FFTW libraries simply by
+including @code{--enable-threads} in the flags to the @code{configure}
+script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
+@uref{http://www.openmp.org,OpenMP} threads.
+@fpindex configure
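+
+For example, on a typical Unix system one might build and install the
+POSIX-threads version with something like the following (a sketch;
+add whatever other @code{configure} options your installation needs):
+
+@example
+./configure --enable-threads
+make
+make install
+@end example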
+
+
+@cindex portability
+@cindex OpenMP
+The threads routines require your operating system to have some sort
+of shared-memory threads support.  Specifically, the FFTW threads
+package works with POSIX threads (available on most Unix variants,
+from GNU/Linux to Mac OS X) and Win32 threads.  OpenMP threads, which
+many common compilers (e.g. gcc) provide, are also supported and may
+give better performance on some systems.  (OpenMP threads are
+also useful if you are employing OpenMP in your own code, in order to
+minimize conflicts between threading models.)  If you have a
+shared-memory machine that uses a different threads API, it should be
+a simple matter of programming to include support for it; see the file
+@code{threads/threads.c} for more detail.
+
+You can compile FFTW with @emph{both} @code{--enable-threads} and
+@code{--enable-openmp} at the same time, since they install libraries
+with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
+described below).  However, your programs may only link to @emph{one}
+of these two libraries at a time.
+
+Ideally, of course, you should also have multiple processors in order to
+get any benefit from the threaded transforms.
+
+@c ------------------------------------------------------------
+@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
+@section Usage of Multi-threaded FFTW
+
+Here, it is assumed that the reader is already familiar with the usage
+of the uniprocessor FFTW routines, described elsewhere in this manual.
+We only describe what one has to change in order to use the
+multi-threaded routines.
+
+@cindex OpenMP
+First, programs using the parallel complex transforms should be linked
+with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
+-lfftw3 -lm} if you compiled with OpenMP. You will also need to link
+with whatever library is responsible for threads on your system
+(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
+enables OpenMP (e.g. @code{-fopenmp} with gcc).
+@cindex linking on Unix
+
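+For example, a hypothetical program @code{myprog.c} might be linked on
+GNU/Linux with something like:
+
+@example
+gcc -o myprog myprog.c -lfftw3_threads -lfftw3 -lpthread -lm
+@end example
+
+or, if FFTW was compiled with OpenMP,
+
+@example
+gcc -fopenmp -o myprog myprog.c -lfftw3_omp -lfftw3 -lm
+@end example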
+
+Second, before calling @emph{any} FFTW routines, you should call the
+function:
+
+@example
+int fftw_init_threads(void);
+@end example
+@findex fftw_init_threads
+
+This function, which need only be called once, performs any one-time
+initialization required to use threads on your system.  It returns zero
+if there was some error (which should not happen under normal
+circumstances) and a non-zero value otherwise.
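+
+For instance, a cautious program might check the return value along
+these lines (a sketch; how to handle a failure is up to you):
+
+@example
+if (!fftw_init_threads()) @{
+     /* initialization failed: fall back to the
+        uniprocessor FFTW, or abort */
+@}
+@end example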
+
+Third, before creating a plan that you want to parallelize, you should
+call:
+
+@example
+void fftw_plan_with_nthreads(int nthreads);
+@end example
+@findex fftw_plan_with_nthreads
+
+The @code{nthreads} argument indicates the number of threads you want
+FFTW to use (or actually, the maximum number).  All plans subsequently
+created with any planner routine will use that many threads.  You can
+call @code{fftw_plan_with_nthreads}, create some plans, call
+@code{fftw_plan_with_nthreads} again with a different argument, and
+create some more plans for a new number of threads.  Plans already created
+before a call to @code{fftw_plan_with_nthreads} are unaffected.  If you
+pass an @code{nthreads} argument of @code{1} (the default), threads are
+disabled for subsequent plans.
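+
+For example (a sketch, assuming a transform size @code{n} and arrays
+@code{in} and @code{out} that have already been allocated), you might
+create one plan that uses up to four threads and a later plan with
+threads disabled:
+
+@example
+fftw_plan_with_nthreads(4);
+p1 = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
+
+fftw_plan_with_nthreads(1);   /* subsequent plans are single-threaded */
+p2 = fftw_plan_dft_1d(n, out, in, FFTW_BACKWARD, FFTW_MEASURE);
+@end example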
+
+@cindex OpenMP
+With OpenMP, to configure FFTW to use all of the currently running
+OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
+@code{OMP_NUM_THREADS} environment variable), you can do:
+@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_}
+OpenMP functions are declared via @code{#include <omp.h>}.)
+
+@cindex thread safety
+Given a plan, you then execute it as usual with
+@code{fftw_execute(plan)}, and the execution will use the number of
+threads specified when the plan was created.  When done, you destroy
+it as usual with @code{fftw_destroy_plan}.  As described in
+@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
+creation and destruction are @emph{not}: you should create/destroy
+plans only from a single thread, but can safely execute multiple plans
+in parallel.
+
+There is one additional routine: if you want to get rid of all memory
+and other resources allocated internally by FFTW, you can call:
+
+@example
+void fftw_cleanup_threads(void);
+@end example
+@findex fftw_cleanup_threads
+
+which is much like the @code{fftw_cleanup()} function except that it
+also gets rid of threads-related data.  You must @emph{not} execute any
+previously created plans after calling this function.
+
+We should also mention one other restriction: if you save wisdom from a
+program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
+by a program using only the single-threaded FFTW (i.e. not calling
+@code{fftw_init_threads}).  @xref{Words of Wisdom-Saving Plans}.
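+
+Putting the above steps together, a minimal multi-threaded program
+might look like the following sketch (the transform size and the
+choice of four threads are arbitrary, for illustration only):
+
+@example
+#include <fftw3.h>
+
+int main(void)
+@{
+     const int n = 65536;
+     fftw_complex *in, *out;
+     fftw_plan p;
+
+     if (!fftw_init_threads())
+          return 1;               /* thread initialization failed */
+
+     in = fftw_alloc_complex(n);
+     out = fftw_alloc_complex(n);
+
+     fftw_plan_with_nthreads(4);
+     p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
+
+     /* ... fill in[] with data, then transform (repeatedly) ... */
+     fftw_execute(p);
+
+     fftw_destroy_plan(p);
+     fftw_free(in);
+     fftw_free(out);
+     fftw_cleanup_threads();
+     return 0;
+@}
+@end example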
+
+@c ------------------------------------------------------------
+@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
+@section How Many Threads to Use?
+
+@cindex number of threads
+There is a fair amount of overhead involved in synchronizing threads,
+so the optimal number of threads to use depends upon the size of the
+transform as well as on the number of processors you have.
+
+As a general rule, you don't want to use more threads than you have
+processors.  (Using more threads will work, but there will be extra
+overhead with no benefit.)  In fact, if the problem size is too small,
+you may want to use fewer threads than you have processors.
+
+You will have to experiment with your system to see what level of
+parallelization is best for your problem size.  Typically, the problem
+will have to involve at least a few thousand data points before threads
+become beneficial.  If you plan with @code{FFTW_PATIENT}, it will
+automatically disable threads for sizes that don't benefit from
+parallelization.
+@ctindex FFTW_PATIENT
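+
+For example, one can simply request patient planning and let FFTW
+decide (a sketch; @code{n}, @code{in}, and @code{out} are assumed to
+be set up as usual):
+
+@example
+fftw_plan_with_nthreads(8);   /* allow up to 8 threads */
+p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_PATIENT);
+/* with FFTW_PATIENT, threads are disabled automatically for
+   sizes that do not benefit from parallelization */
+@end example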
+
+@c ------------------------------------------------------------
+@node Thread safety,  , How Many Threads to Use?, Multi-threaded FFTW
+@section Thread safety
+
+@cindex threads
+@cindex OpenMP
+@cindex thread safety
+Users writing multi-threaded programs (including OpenMP) must concern
+themselves with the @dfn{thread safety} of the libraries they
+use---that is, whether it is safe to call routines in parallel from
+multiple threads.  FFTW can be used in such an environment, but some
+care must be taken because the planner routines share data
+(e.g. wisdom and trigonometric tables) between calls and plans.
+
+The upshot is that the only thread-safe (re-entrant) routine in FFTW is
+@code{fftw_execute} (and the new-array variants thereof).  All other routines
+(e.g. the planner) should only be called from one thread at a time.  So,
+for example, you can wrap a semaphore lock around any calls to the
+planner; even more simply, you can just create all of your plans from
+one thread.  We do not think this should be an important restriction
+(FFTW is designed for the situation where the only performance-sensitive
+code is the actual execution of the transform), and the benefits of
+shared data between plans are great.
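+
+For example, with POSIX threads one might serialize all planner calls
+with a mutex (a sketch; @code{make_plan} is a hypothetical helper, and
+a semaphore or any other lock would do equally well):
+
+@example
+#include <pthread.h>
+#include <fftw3.h>
+
+static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;
+
+fftw_plan make_plan(int n, fftw_complex *in, fftw_complex *out)
+@{
+     fftw_plan p;
+     pthread_mutex_lock(&planner_lock);
+     p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);
+     pthread_mutex_unlock(&planner_lock);
+     return p;
+@}
+@end example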
+
+Note also that, since the plan is not modified by @code{fftw_execute},
+it is safe to execute the @emph{same plan} in parallel by multiple
+threads.  However, since a given plan operates by default on a fixed
+array, you need to use one of the new-array execute functions
+(@pxref{New-array Execute Functions}) so that different threads
+compute the transform of different data.
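+
+For example, with POSIX threads each worker might apply the @emph{same}
+plan to its own arrays via @code{fftw_execute_dft} (a sketch;
+@code{struct job} and @code{run_job} are hypothetical names, the plan
+is assumed to have been created in advance from a single thread, and
+each thread's arrays must match the size and alignment of those the
+plan was created with):
+
+@example
+#include <pthread.h>
+#include <fftw3.h>
+
+struct job @{ fftw_plan p; fftw_complex *in, *out; @};
+
+static void *run_job(void *arg)
+@{
+     struct job *j = (struct job *) arg;
+     /* the plan is shared; each thread transforms its own data */
+     fftw_execute_dft(j->p, j->in, j->out);
+     return NULL;
+@}
+
+/* in the main thread, after creating the plan and filling jobs[i]:
+     pthread_create(&tid[i], NULL, run_job, &jobs[i]);
+   ...and later pthread_join(tid[i], NULL) for each i. */
+@end example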
+
+(Users should note that these comments apply only to programs using
+shared-memory threads or OpenMP.  Parallelism using MPI or forked
+processes gives each process its own address space and global
+variables, and so is not susceptible to problems of this sort.)
+
+If you configured FFTW with the @code{--enable-debug} or
+@code{--enable-debug-malloc} flags (@pxref{Installation on Unix}),
+then @code{fftw_execute} is not thread-safe.  These flags are not
+documented because they are intended only for developing
+and debugging FFTW, but if you must use @code{--enable-debug} then you
+should also specifically pass @code{--disable-debug-malloc} for
+@code{fftw_execute} to be thread-safe.
+