@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
@chapter Multi-threaded FFTW

@cindex parallel transform
In this chapter we document the parallel FFTW routines for
shared-memory parallel hardware.  These routines, which support
parallel one- and multi-dimensional transforms of both real and
complex data, are the easiest way to take advantage of multiple
processors with FFTW.  They work just like the corresponding
uniprocessor transform routines, except that you have an extra
initialization routine to call, and there is a routine to set the
number of threads to employ.  Any program that uses the uniprocessor
FFTW can therefore be trivially modified to use the multi-threaded
FFTW.

A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs.  FFTW's multi-threading support allows
you to utilize these additional CPUs transparently from a single
program.  However, this does not necessarily translate into
performance gains---when multiple threads/CPUs are employed, there is
an overhead required for synchronization that may outweigh the
computational parallelism.  Therefore, you can only benefit from
threads if your problem is sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package.  On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
@uref{http://www.openmp.org,OpenMP} threads.
@fpindex configure

@cindex portability
@cindex OpenMP
The threads routines require your operating system to have some sort
of shared-memory threads support.  Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants,
from GNU/Linux to MacOS X) and Win32 threads.  OpenMP threads, which
are supported in many common compilers (e.g. gcc), are also supported,
and may give better performance on some systems.  (OpenMP threads are
also useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.)
If you have a
shared-memory machine that uses a different threads API, it should be
a simple matter of programming to include support for it; see the file
@code{threads/threads.c} for more detail.

You can compile FFTW with @emph{both} @code{--enable-threads} and
@code{--enable-openmp} at the same time, since they install libraries
with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
described below).  However, your programs may only link to @emph{one}
of these two libraries at a time.

Ideally, of course, you should also have multiple processors in order to
get any benefit from the threaded transforms.

@c ------------------------------------------------------------
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
@section Usage of Multi-threaded FFTW

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

@cindex OpenMP
First, programs using the parallel complex transforms should be linked
with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
-lfftw3 -lm} if you compiled with OpenMP.  You will also need to link
with whatever library is responsible for threads on your system
(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
enables OpenMP (e.g. @code{-fopenmp} with gcc).
@cindex linking on Unix

Second, before calling @emph{any} FFTW routines, you should call the
function:

@example
int fftw_init_threads(void);
@end example
@findex fftw_init_threads

This function, which need only be called once, performs any one-time
initialization required to use threads on your system.  It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

Third, before creating a plan that you want to parallelize, you should
call:

@example
void fftw_plan_with_nthreads(int nthreads);
@end example
@findex fftw_plan_with_nthreads

The @code{nthreads} argument indicates the number of threads you want
FFTW to use (or actually, the maximum number).  All plans subsequently
created with any planner routine will use that many threads.  You can
call @code{fftw_plan_with_nthreads}, create some plans, call
@code{fftw_plan_with_nthreads} again with a different argument, and
create some more plans for a new number of threads.  Plans already created
before a call to @code{fftw_plan_with_nthreads} are unaffected.  If you
pass an @code{nthreads} argument of @code{1} (the default), threads are
disabled for subsequent plans.

@cindex OpenMP
With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
@code{OMP_NUM_THREADS} environment variable), you can do:
@code{fftw_plan_with_nthreads(omp_get_max_threads())}.
(The @samp{omp_}
OpenMP functions are declared via @code{#include <omp.h>}.)

@cindex thread safety
Given a plan, you then execute it as usual with
@code{fftw_execute(plan)}, and the execution will use the number of
threads specified when the plan was created.  When done, you destroy
it as usual with @code{fftw_destroy_plan}.  As described in
@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
creation and destruction are @emph{not}: you should create/destroy
plans only from a single thread, but can safely execute multiple plans
in parallel.

There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

@example
void fftw_cleanup_threads(void);
@end example
@findex fftw_cleanup_threads

which is much like the @code{fftw_cleanup()} function except that it
also gets rid of threads-related data.  You must @emph{not} execute any
previously created plans after calling this function.

We should also mention one other restriction: if you save wisdom from a
program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
by a program using only the single-threaded FFTW (i.e. not calling
@code{fftw_init_threads}).  @xref{Words of Wisdom-Saving Plans}.

@c ------------------------------------------------------------
@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
@section How Many Threads to Use?
@cindex number of threads
There is a fair amount of overhead involved in synchronizing threads,
so the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.

As a general rule, you don't want to use more threads than you have
processors.  (Using more threads will work, but there will be extra
overhead with no benefit.)  In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

You will have to experiment with your system to see what level of
parallelization is best for your problem size.  Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial.  If you plan with @code{FFTW_PATIENT}, it will
automatically disable threads for sizes that don't benefit from
parallelization.
@ctindex FFTW_PATIENT

@c ------------------------------------------------------------
@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW
@section Thread safety

@cindex threads
@cindex OpenMP
@cindex thread safety
Users writing multi-threaded programs (including OpenMP) must concern
themselves with the @dfn{thread safety} of the libraries they
use---that is, whether it is safe to call routines in parallel from
multiple threads.  FFTW can be used in such an environment, but some
care must be taken because the planner routines share data
(e.g. wisdom and trigonometric tables) between calls and plans.
The upshot is that the only thread-safe routine in FFTW is
@code{fftw_execute} (and the new-array variants thereof).  All other
routines (e.g. the planner) should only be called from one thread at a
time.  So, for example, you can wrap a semaphore lock around any calls
to the planner; even more simply, you can just create all of your plans
from one thread.  We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

Note also that, since the plan is not modified by @code{fftw_execute},
it is safe to execute the @emph{same plan} in parallel by multiple
threads.  However, since a given plan operates by default on a fixed
array, you need to use one of the new-array execute functions
(@pxref{New-array Execute Functions}) so that different threads compute
the transform of different data.

(Users should note that these comments only apply to programs using
shared-memory threads or OpenMP.  Parallelism using MPI or forked
processes involves a separate address space and global variables for
each process, and is not susceptible to problems of this sort.)

The FFTW planner is intended to be called from a single thread.  If you
really must call it from multiple threads, you are expected to grab
whatever lock makes sense for your application, with the understanding
that you may be holding that lock for a long time, which is undesirable.

Neither strategy works, however, in the following situation.
The
``application'' is structured as a set of ``plugins'' which are unaware
of each other, and for whatever reason the ``plugins'' cannot coordinate
on grabbing the lock.  (This is not a technical problem, but an
organizational one.  The ``plugins'' are written by independent agents,
and from the perspective of each plugin's author, each plugin is using
FFTW correctly from a single thread.)  To cope with this situation,
starting from FFTW-3.3.5, FFTW supports an API to make the planner
thread-safe:

@example
void fftw_make_planner_thread_safe(void);
@end example
@findex fftw_make_planner_thread_safe

This call operates by brute force: it just installs a hook that wraps a
lock (chosen by us) around all planner calls.  So there is no magic and
you get the worst of all worlds.  The planner is still single-threaded,
but you cannot choose which lock to use.  The planner still holds the
lock for a long time, but you cannot impose a timeout on lock
acquisition.  As of FFTW-3.3.5 and FFTW-3.3.6, this call does not work
when using OpenMP as threading substrate.  (Suggestions on what to do
about this bug are welcome.)  @emph{Do not use
@code{fftw_make_planner_thread_safe} unless there is no other choice,}
such as in the application/plugin situation.