@node Multi-threaded FFTW, Distributed-memory FFTW with MPI, FFTW Reference, Top
@chapter Multi-threaded FFTW

@cindex parallel transform
In this chapter we document the parallel FFTW routines for
shared-memory parallel hardware. These routines, which support
parallel one- and multi-dimensional transforms of both real and
complex data, are the easiest way to take advantage of multiple
processors with FFTW. They work just like the corresponding
uniprocessor transform routines, except that you have an extra
initialization routine to call, and there is a routine to set the
number of threads to employ. Any program that uses the uniprocessor
FFTW can therefore be trivially modified to use the multi-threaded
FFTW.

A shared-memory machine is one in which all CPUs can directly access
the same main memory, and such machines are now common due to the
ubiquity of multi-core CPUs. FFTW's multi-threading support allows
you to utilize these additional CPUs transparently from a single
program. However, this does not necessarily translate into
performance gains: when multiple threads/CPUs are employed, there is
an overhead required for synchronization that may outweigh the
computational parallelism. Therefore, you can only benefit from
threads if your problem is sufficiently large.
@cindex shared-memory
@cindex threads

@menu
* Installation and Supported Hardware/Software::
* Usage of Multi-threaded FFTW::
* How Many Threads to Use?::
* Thread safety::
@end menu

@c ------------------------------------------------------------
@node Installation and Supported Hardware/Software, Usage of Multi-threaded FFTW, Multi-threaded FFTW, Multi-threaded FFTW
@section Installation and Supported Hardware/Software

All of the FFTW threads code is located in the @code{threads}
subdirectory of the FFTW package. On Unix systems, the FFTW threads
libraries and header files can be automatically configured, compiled,
and installed along with the uniprocessor FFTW libraries simply by
including @code{--enable-threads} in the flags to the @code{configure}
script (@pxref{Installation on Unix}), or @code{--enable-openmp} to use
@uref{http://www.openmp.org,OpenMP} threads.
@fpindex configure


@cindex portability
@cindex OpenMP
The threads routines require your operating system to have some sort
of shared-memory threads support. Specifically, the FFTW threads
package works with POSIX threads (available on most Unix variants,
from GNU/Linux to MacOS X) and Win32 threads. OpenMP threads, which
are supported in many common compilers (e.g. gcc), are also supported,
and may give better performance on some systems. (OpenMP threads are
also useful if you are employing OpenMP in your own code, in order to
minimize conflicts between threading models.)
If you have a
shared-memory machine that uses a different threads API, it should be
a simple matter of programming to include support for it; see the file
@code{threads/threads.c} for more detail.

You can compile FFTW with @emph{both} @code{--enable-threads} and
@code{--enable-openmp} at the same time, since they install libraries
with different names (@samp{fftw3_threads} and @samp{fftw3_omp}, as
described below). However, your programs may only link to @emph{one}
of these two libraries at a time.

Ideally, of course, you should also have multiple processors in order to
get any benefit from the threaded transforms.

@c ------------------------------------------------------------
@node Usage of Multi-threaded FFTW, How Many Threads to Use?, Installation and Supported Hardware/Software, Multi-threaded FFTW
@section Usage of Multi-threaded FFTW

Here, it is assumed that the reader is already familiar with the usage
of the uniprocessor FFTW routines, described elsewhere in this manual.
We only describe what one has to change in order to use the
multi-threaded routines.

@cindex OpenMP
First, programs using the parallel complex transforms should be linked
with @code{-lfftw3_threads -lfftw3 -lm} on Unix, or @code{-lfftw3_omp
-lfftw3 -lm} if you compiled with OpenMP. You will also need to link
with whatever library is responsible for threads on your system
(e.g. @code{-lpthread} on GNU/Linux) or include whatever compiler flag
enables OpenMP (e.g. @code{-fopenmp} with gcc).
@cindex linking on Unix


Second, before calling @emph{any} FFTW routines, you should call the
function:

@example
int fftw_init_threads(void);
@end example
@findex fftw_init_threads

This function, which need only be called once, performs any one-time
initialization required to use threads on your system. It returns zero
if there was some error (which should not happen under normal
circumstances) and a non-zero value otherwise.

Third, before creating a plan that you want to parallelize, you should
call:

@example
void fftw_plan_with_nthreads(int nthreads);
@end example
@findex fftw_plan_with_nthreads

The @code{nthreads} argument indicates the number of threads you want
FFTW to use (or actually, the maximum number). All plans subsequently
created with any planner routine will use that many threads. You can
call @code{fftw_plan_with_nthreads}, create some plans, call
@code{fftw_plan_with_nthreads} again with a different argument, and
create some more plans for a new number of threads. Plans already created
before a call to @code{fftw_plan_with_nthreads} are unaffected. If you
pass an @code{nthreads} argument of @code{1} (the default), threads are
disabled for subsequent plans.

@cindex OpenMP
With OpenMP, to configure FFTW to use all of the currently running
OpenMP threads (set by @code{omp_set_num_threads(nthreads)} or by the
@code{OMP_NUM_THREADS} environment variable), you can do:
@code{fftw_plan_with_nthreads(omp_get_max_threads())}. (The @samp{omp_}
OpenMP functions are declared via @code{#include <omp.h>}.)

@cindex thread safety
Given a plan, you then execute it as usual with
@code{fftw_execute(plan)}, and the execution will use the number of
threads specified when the plan was created. When done, you destroy
it as usual with @code{fftw_destroy_plan}. As described in
@ref{Thread safety}, plan @emph{execution} is thread-safe, but plan
creation and destruction are @emph{not}: you should create/destroy
plans only from a single thread, but can safely execute multiple plans
in parallel.

There is one additional routine: if you want to get rid of all memory
and other resources allocated internally by FFTW, you can call:

@example
void fftw_cleanup_threads(void);
@end example
@findex fftw_cleanup_threads

which is much like the @code{fftw_cleanup()} function except that it
also gets rid of threads-related data. You must @emph{not} execute any
previously created plans after calling this function.

We should also mention one other restriction: if you save wisdom from a
program using the multi-threaded FFTW, that wisdom @emph{cannot be used}
by a program using only the single-threaded FFTW (i.e. not calling
@code{fftw_init_threads}). @xref{Words of Wisdom-Saving Plans}.

@c ------------------------------------------------------------
@node How Many Threads to Use?, Thread safety, Usage of Multi-threaded FFTW, Multi-threaded FFTW
@section How Many Threads to Use?

@cindex number of threads
There is a fair amount of overhead involved in synchronizing threads,
so the optimal number of threads to use depends upon the size of the
transform as well as on the number of processors you have.
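
As a rough illustration of this trade-off, the sketch below picks a
thread count that never exceeds the number of processors and falls back
to a single thread for small transforms. The helper name
@code{choose_nthreads} and the cutoff of 4096 points per thread are
assumptions made for the example; they are not FFTW features, and a
suitable cutoff can only be found by measuring on your own system.

```c
#include <stddef.h>

/* Assumed cutoff, for illustration only: require this many data points
   per thread before adding another thread. Tune by measurement. */
#define MIN_POINTS_PER_THREAD 4096

/* Hypothetical helper: suggests a thread count for a transform of
   n_points data points on a machine with n_procs processors. */
int choose_nthreads(size_t n_points, int n_procs)
{
    if (n_procs < 1)
        n_procs = 1;

    /* Threads that the problem size alone would justify. */
    size_t by_size = n_points / MIN_POINTS_PER_THREAD;
    if (by_size < 1)
        by_size = 1;

    /* Never use more threads than there are processors. */
    return by_size < (size_t)n_procs ? (int)by_size : n_procs;
}
```

The result can be passed directly to @code{fftw_plan_with_nthreads}
before the plan is created.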

As a general rule, you don't want to use more threads than you have
processors. (Using more threads will work, but there will be extra
overhead with no benefit.) In fact, if the problem size is too small,
you may want to use fewer threads than you have processors.

You will have to experiment with your system to see what level of
parallelization is best for your problem size. Typically, the problem
will have to involve at least a few thousand data points before threads
become beneficial. If you plan with @code{FFTW_PATIENT}, it will
automatically disable threads for sizes that don't benefit from
parallelization.
@ctindex FFTW_PATIENT

@c ------------------------------------------------------------
@node Thread safety, , How Many Threads to Use?, Multi-threaded FFTW
@section Thread safety

@cindex threads
@cindex OpenMP
@cindex thread safety
Users writing multi-threaded programs (including OpenMP) must concern
themselves with the @dfn{thread safety} of the libraries they
use, that is, whether it is safe to call routines in parallel from
multiple threads. FFTW can be used in such an environment, but some
care must be taken because the planner routines share data
(e.g. wisdom and trigonometric tables) between calls and plans.

The upshot is that the only thread-safe (re-entrant) routine in FFTW is
@code{fftw_execute} (and the new-array variants thereof). All other routines
(e.g. the planner) should only be called from one thread at a time. So,
for example, you can wrap a semaphore lock around any calls to the
planner; even more simply, you can just create all of your plans from
one thread.
We do not think this should be an important restriction
(FFTW is designed for the situation where the only performance-sensitive
code is the actual execution of the transform), and the benefits of
shared data between plans are great.

Note also that, since the plan is not modified by @code{fftw_execute},
it is safe to execute the @emph{same plan} in parallel by multiple
threads. However, since a given plan operates by default on a fixed
array, you need to use one of the new-array execute functions
(@pxref{New-array Execute Functions}) so that different threads compute
the transform of different data.

(Users should note that these comments only apply to programs using
shared-memory threads or OpenMP. Parallelism using MPI or forked processes
involves a separate address-space and global variables for each process,
and is not susceptible to problems of this sort.)

If you configured FFTW with the @code{--enable-debug} or
@code{--enable-debug-malloc} flags (@pxref{Installation on Unix}),
then @code{fftw_execute} is not thread-safe. These flags are not
documented because they are intended only for developing
and debugging FFTW, but if you must use @code{--enable-debug} then you
should also specifically pass @code{--disable-debug-malloc} for
@code{fftw_execute} to be thread-safe.
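
The locking approach mentioned earlier in this section can be sketched
with a POSIX mutex, as below. Everything named here
(@code{plan_serialized}, @code{planner_fn}, @code{fake_planner},
@code{demo_count_plans}) is a hypothetical illustration rather than
part of FFTW, and the stand-in planner only increments an unprotected
counter to mimic the shared planner state. In a real program the
callback would call an FFTW planner routine such as
@code{fftw_plan_dft_1d}.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type: creates a plan and returns it as an
   opaque pointer. */
typedef void *(*planner_fn)(void *arg);

/* A single lock shared by every thread that wants to create a plan. */
static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs make_plan(arg) while holding the lock, so that only one thread
   at a time is inside the planner. */
void *plan_serialized(planner_fn make_plan, void *arg)
{
    pthread_mutex_lock(&planner_lock);
    void *plan = make_plan(arg);
    pthread_mutex_unlock(&planner_lock);
    return plan;
}

/* Stand-in for the planner: updates shared state with no locking of
   its own, so it is only correct when callers are serialized. */
static long plans_made = 0;

static void *fake_planner(void *arg)
{
    (void)arg;
    plans_made++;
    return &plans_made;
}

/* Each worker requests *count plans through the serialized wrapper. */
static void *worker(void *count)
{
    int n = *(int *)count;
    for (int i = 0; i < n; i++)
        plan_serialized(fake_planner, NULL);
    return NULL;
}

/* Starts up to 16 workers and returns the number of plans created. */
long demo_count_plans(int nthreads, int plans_per_thread)
{
    pthread_t tid[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    plans_made = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, worker, &plans_per_thread) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return plans_made;
}
```

Only plan creation needs the lock. Once the plans exist, the threads
can call @code{fftw_execute} or a new-array execute function without
holding it.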