In certain cases, it may be advantageous to combine MPI (distributed-memory) and threads (shared-memory) parallelization. FFTW supports this, with certain caveats. For example, if you have a cluster of 4-processor shared-memory nodes, you may want to use threads within the nodes and MPI between the nodes, instead of MPI for all parallelization.

In particular, it is possible to seamlessly combine the MPI FFTW routines with the multi-threaded FFTW routines (see Multi-threaded FFTW). However, some care must be taken in the initialization code, which should look something like this:

#include <mpi.h>
#include <fftw3-mpi.h>

int threads_ok;   /* global flag: nonzero if the FFTW threads routines may be used */

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    threads_ok = provided >= MPI_THREAD_FUNNELED;

    if (threads_ok) threads_ok = fftw_init_threads();
    fftw_mpi_init();

    ...
    if (threads_ok) fftw_plan_with_nthreads(...);
    ...

    MPI_Finalize();
}

First, note that instead of calling MPI_Init, you should call MPI_Init_thread, which is the initialization routine defined by the MPI-2 standard to indicate to MPI that your program will be multithreaded. We pass MPI_THREAD_FUNNELED, which indicates that we will only call MPI routines from the main thread. (FFTW will launch additional threads internally, but the extra threads will not call MPI code.) (You may also pass MPI_THREAD_SERIALIZED or MPI_THREAD_MULTIPLE, which request additional multithreading support from the MPI implementation, but this is not required by FFTW.) The provided parameter returns the level of thread support actually provided by your MPI implementation; this must be at least MPI_THREAD_FUNNELED if you want to call the FFTW threads routines, so we define a global variable threads_ok to record this. You should only call fftw_init_threads or fftw_plan_with_nthreads if threads_ok is true. For more information on thread safety in MPI, see the MPI and Threads section of the MPI-2 standard.

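As a concrete illustration of the alternative thread levels mentioned above, here is a minimal sketch (not part of the listing above) of how the start of main would change if you requested MPI_THREAD_MULTIPLE instead; the threads_ok test is unchanged, because FFTW itself only needs MPI_THREAD_FUNNELED:

    int provided;
    /* Ask for the strongest thread level; the MPI library may grant less. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    /* FFTW's extra threads never call MPI, so FUNNELED is still sufficient: */
    threads_ok = provided >= MPI_THREAD_FUNNELED;
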
Second, we must call fftw_init_threads before fftw_mpi_init. This is critical for technical reasons having to do with how FFTW initializes its list of algorithms.

Then, if you call fftw_plan_with_nthreads(N), every MPI process will launch (up to) N threads to parallelize its transforms.

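For concreteness, here is a hedged sketch of what the elided planning code in the listing above might look like for an n0 × n1 complex DFT distributed over MPI_COMM_WORLD; the problem size and the thread count nthreads are placeholders you would choose yourself:

    const ptrdiff_t n0 = 1024, n1 = 1024;   /* placeholder problem size */
    int nthreads = 4;                        /* placeholder thread count */
    ptrdiff_t alloc_local, local_n0, local_0_start;
    fftw_complex *data;
    fftw_plan plan;

    if (threads_ok) fftw_plan_with_nthreads(nthreads);

    /* how much of the n0 x n1 array this process stores locally: */
    alloc_local = fftw_mpi_local_size_2d(n0, n1, MPI_COMM_WORLD,
                                         &local_n0, &local_0_start);
    data = fftw_alloc_complex(alloc_local);

    plan = fftw_mpi_plan_dft_2d(n0, n1, data, data, MPI_COMM_WORLD,
                                FFTW_FORWARD, FFTW_MEASURE);

    /* ... initialize data, fftw_execute(plan), ... */

    fftw_destroy_plan(plan);
    fftw_free(data);
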
For example, in the hypothetical cluster of 4-processor nodes, you might wish to launch only a single MPI process per node, and then call fftw_plan_with_nthreads(4) on each process to use all processors in the nodes.

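If you would rather not hard-code the thread count, one possibility (a sketch assuming a POSIX system and exactly one MPI process per node, neither of which FFTW itself requires) is to ask the operating system how many processors are online:

    #include <unistd.h>   /* for sysconf */

    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1) ncores = 1;   /* be defensive if the query fails */
    if (threads_ok) fftw_plan_with_nthreads((int) ncores);
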
This may or may not be faster than simply using as many MPI processes as you have processors, however. On the one hand, using threads within a node eliminates the need for explicit message passing within the node. On the other hand, FFTW's transpose routines are not multi-threaded, and this means that the communications that do take place will not benefit from parallelization within the node. Moreover, many MPI implementations already have optimizations to exploit shared memory when it is available, so adding the multithreaded FFTW on top of this may be superfluous.