The FFTW benchmark program allocates memory using malloc() or
equivalent library calls, reflecting the common usage of the FFTW
library. However, you can sometimes improve performance significantly
by allocating memory in system-specific large TLB pages. For example,
we have seen 39 GFLOPS for a 256 × 256 × 256 problem using
large pages, whereas the speed is about 25 GFLOPS with normal pages.
Your mileage may vary.

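As a rough illustration of the large-page idea, the following sketch
tries to obtain a buffer backed by large pages on Linux and falls back
to an ordinary aligned allocation. It is not part of FFTW or its
benchmark program: the names buffer, buffer_alloc and buffer_free are
hypothetical, the use of mmap() with MAP_HUGETLB is one possible
Linux-specific mechanism, and the large page size passed by the caller
(for instance 16 MiB) is an assumption that must match your system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper type, not an FFTW type. */
typedef struct {
    void  *ptr;   /* start of the buffer, NULL on failure */
    size_t len;   /* mapped length (large pages) or requested size */
    int    large; /* 1 if backed by large pages, 0 if fallback */
} buffer;

/* Try large pages first; fall back to a 16-byte-aligned allocation.
   large_page_bytes is an assumption supplied by the caller. */
static buffer buffer_alloc(size_t bytes, size_t large_page_bytes)
{
    buffer b = { NULL, bytes, 0 };
    assert(bytes > 0 && large_page_bytes > 0);
#ifdef MAP_HUGETLB
    /* Round the length up to a whole number of large pages so that
       the same length can later be handed to munmap(). */
    size_t len = (bytes + large_page_bytes - 1)
                 / large_page_bytes * large_page_bytes;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        b.ptr = p;
        b.len = len;
        b.large = 1;
        return b;
    }
#endif
    /* No large pages available: use normal pages instead. */
    if (posix_memalign(&b.ptr, 16, bytes) != 0)
        b.ptr = NULL;
    return b;
}

/* Release a buffer with the mechanism that allocated it. */
static void buffer_free(buffer b)
{
#ifdef MAP_HUGETLB
    if (b.large) {
        munmap(b.ptr, b.len);
        return;
    }
#endif
    free(b.ptr);
}
```

A caller would request the array with something like
buffer_alloc(nbytes, (size_t)16 << 20), pass the resulting pointer to
the planner as the input or output array, and check the large field to
see whether large pages were actually obtained. Large pages usually
have to be reserved by the system administrator beforehand.
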
FFTW hoards all available SPEs for itself. You can optionally
choose a different number of SPEs by calling the undocumented
function fftw_cell_set_nspe(n), where n is the number of desired
SPEs. Expect this interface to go away once we figure out how to
make FFTW play nicely with other Cell software.

In particular, if you try to link both the single- and double-precision
versions of FFTW in the same program (which you can do), they will both
try to grab all SPEs and the second one will hang.

The SPEs demand that data be stored in contiguous arrays aligned at
16-byte boundaries. If you instruct FFTW to operate on
noncontiguous or nonaligned data, the SPEs will not be used,
resulting in slow execution. See Data Alignment.

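The alignment requirement can be checked before planning. The sketch
below is only an illustration: is_aligned16 and
alloc_complex_aligned16 are hypothetical helpers, not FFTW functions,
and posix_memalign() stands in for whatever aligned allocator you use
(FFTW's own fftw_malloc() is the usual choice in real programs).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: allocate n complex values, stored as n pairs
   of doubles, in one contiguous 16-byte-aligned block.
   Release the result with free(). */
static double *alloc_complex_aligned16(size_t n)
{
    void *p = NULL;
    assert(n > 0);
    if (posix_memalign(&p, 16, n * 2 * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}
```

Note that a pointer into the middle of an aligned array is not
necessarily aligned: with 8-byte doubles, an offset of one double
breaks the 16-byte alignment, whereas an offset of one complex element
(two doubles) preserves it.
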
The FFTW_ESTIMATE mode may produce seriously suboptimal plans, and
it becomes particularly confused if you enable both the SPEs and
Altivec. If you care about performance, please use FFTW_MEASURE
or FFTW_PATIENT until we figure out a more reliable performance model.
