FFTW's “wisdom” facility (see Words of Wisdom-Saving Plans) can be
used to save MPI plans as well as to save uniprocessor plans.
However, for MPI there are several unavoidable complications.

First, the MPI standard does not guarantee that every process can
perform file I/O (at least, not using C stdio routines); only the
process of rank MPI_IO can do so in general (unless MPI_IO ==
MPI_ANY_SOURCE).  (In practice, MPI_IO is commonly assumed to be
zero, i.e. at least process 0 can perform I/O, since otherwise a
single-process run could not perform I/O.)  So, if we want to export
the wisdom from a single process to a file, we must first export the
wisdom to a string, then send it to process 0, then write it to a
file.

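For concreteness, this string-based route could be coded by hand
roughly as in the following sketch (error handling omitted; the
helper name save_wisdom_by_hand and the file name "mywisdom" are
arbitrary choices for this illustration).  In practice, the
fftw_mpi_gather_wisdom function introduced below takes care of this
bookkeeping for you.

     /* Sketch only: serialize wisdom to a string on each nonzero rank,
        merge the strings on rank 0, and write the file there. */
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <fftw3-mpi.h>

     void save_wisdom_by_hand(MPI_Comm comm)  /* hypothetical helper */
     {
         int rank, size;
         MPI_Comm_rank(comm, &rank);
         MPI_Comm_size(comm, &size);

         char *s = fftw_export_wisdom_to_string();  /* allocated by FFTW */
         if (rank != 0) {
             int len = (int) strlen(s) + 1;
             MPI_Send(&len, 1, MPI_INT, 0, 0, comm);
             MPI_Send(s, len, MPI_CHAR, 0, 1, comm);
         }
         else {
             int p, len;
             FILE *f;
             for (p = 1; p < size; ++p) {  /* merge every rank's wisdom */
                 char *buf;
                 MPI_Recv(&len, 1, MPI_INT, p, 0, comm, MPI_STATUS_IGNORE);
                 buf = malloc(len);
                 MPI_Recv(buf, len, MPI_CHAR, p, 1, comm, MPI_STATUS_IGNORE);
                 fftw_import_wisdom_from_string(buf);
                 free(buf);
             }
             if ((f = fopen("mywisdom", "w"))) {
                 fftw_export_wisdom_to_file(f);
                 fclose(f);
             }
         }
         free(s);  /* the exported string must be freed by the caller */
     }
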
Second, in principle we may want to have separate wisdom for every
process, since in general the processes may run on different hardware
even for a single MPI program.  However, in practice FFTW's MPI code
is designed for the case of homogeneous hardware (see Load
balancing), and in this case it is convenient to use the same wisdom
for every process.  Thus, we need a mechanism to synchronize the
wisdom.

To address both of these problems, FFTW provides the following two
functions:

     void fftw_mpi_broadcast_wisdom(MPI_Comm comm);
     void fftw_mpi_gather_wisdom(MPI_Comm comm);

Given a communicator comm, fftw_mpi_broadcast_wisdom will broadcast
the wisdom from process 0 to all other processes.  Conversely,
fftw_mpi_gather_wisdom will collect wisdom from all processes onto
process 0.  (If the plans created for the same problem by different
processes are not the same, fftw_mpi_gather_wisdom will arbitrarily
choose one of the plans.)  Both of these functions may result in
suboptimal plans for different processes if the processes are running
on non-identical hardware.  Both of these functions are collective
calls, which means that they must be executed by all processes in the
communicator.

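In particular, because these are collective calls, they must not be
guarded by a rank test; only the file I/O should be restricted to
process 0.  A brief sketch of the distinction (assuming rank was
obtained from MPI_Comm_rank):

     /* WRONG: ranks other than 0 never reach the collective call,
        so the program deadlocks. */
     if (rank == 0)
         fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);

     /* RIGHT: every process in the communicator makes the call;
        restrict only the file I/O (as in the snippets below) to rank 0. */
     fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
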
So, for example, a typical code snippet to import wisdom from a file
and use it on all processes would be:

     {
         int rank;
         FILE *f;

         fftw_mpi_init();
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         if (rank == 0 && (f = fopen("mywisdom", "r"))) {
             fftw_import_wisdom_from_file(f);
             fclose(f);
         }
         fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);
     }

(Note that we must call fftw_mpi_init before importing any wisdom
that might contain MPI plans.)  Similarly, a typical code snippet to
export wisdom from all processes to a file is:

     {
         int rank;
         FILE *f;

         fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
         if (rank == 0 && (f = fopen("mywisdom", "w"))) {
             fftw_export_wisdom_to_file(f);
             fclose(f);
         }
     }

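For reference, one way these pieces might fit together in a complete
program is sketched below, assuming a 2d in-place complex DFT of size
N0 by N1 and the wisdom file name "mywisdom" (both arbitrary choices
for this example):

     #include <fftw3-mpi.h>

     int main(int argc, char **argv)
     {
         const ptrdiff_t N0 = 256, N1 = 256;  /* illustrative size */
         fftw_plan plan;
         fftw_complex *data;
         ptrdiff_t alloc_local, local_n0, local_0_start;
         int rank;
         FILE *f;

         MPI_Init(&argc, &argv);
         fftw_mpi_init();
         MPI_Comm_rank(MPI_COMM_WORLD, &rank);

         /* import wisdom on rank 0, then share it with all processes */
         if (rank == 0 && (f = fopen("mywisdom", "r"))) {
             fftw_import_wisdom_from_file(f);
             fclose(f);
         }
         fftw_mpi_broadcast_wisdom(MPI_COMM_WORLD);

         /* plan and use a transform as usual */
         alloc_local = fftw_mpi_local_size_2d(N0, N1, MPI_COMM_WORLD,
                                              &local_n0, &local_0_start);
         data = fftw_alloc_complex(alloc_local);
         plan = fftw_mpi_plan_dft_2d(N0, N1, data, data, MPI_COMM_WORLD,
                                     FFTW_FORWARD, FFTW_MEASURE);
         /* ... initialize data, fftw_execute(plan), etc. ... */

         /* collect everyone's wisdom and save it from rank 0 */
         fftw_mpi_gather_wisdom(MPI_COMM_WORLD);
         if (rank == 0 && (f = fopen("mywisdom", "w"))) {
             fftw_export_wisdom_to_file(f);
             fclose(f);
         }

         fftw_destroy_plan(plan);
         fftw_free(data);
         MPI_Finalize();
         return 0;
     }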