Cell Caveats - FFTW 3.2.1


6.2 Cell Caveats

The FFTW benchmark program allocates memory using malloc() or equivalent library calls, reflecting the common usage of the FFTW library. However, you can sometimes improve performance significantly by allocating memory in system-specific large TLB pages. For example, we have seen 39 GFLOPS for a 256 × 256 × 256 problem using large pages, whereas the speed is about 25 GFLOPS with normal pages. YMMV.
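
As a rough illustration of what allocating in large pages can look like, here is a minimal sketch for Linux that requests an anonymous mapping backed by huge pages and falls back to normal pages if none are available. This is an assumption for illustration, not what the FFTW benchmark does: the helper names alloc_large_pages and free_large_pages are hypothetical, the MAP_HUGETLB flag requires a kernel that supports it, and other systems or older kernels need a different mechanism (such as a hugetlbfs mount).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not part of FFTW): map at least `bytes` bytes,
 * preferring huge pages.  `huge_page_size` must be the system's actual
 * huge-page size (e.g. as reported in /proc/meminfo).  On success the
 * mapped length is stored in *mapped_len, and *got_huge is 1 if the
 * huge-page request succeeded or 0 if we fell back to normal pages. */
void *alloc_large_pages(size_t bytes, size_t huge_page_size,
                        size_t *mapped_len, int *got_huge)
{
    size_t len;
    void *p = MAP_FAILED;

    assert(huge_page_size > 0);
    /* Round the length up to a whole number of huge pages. */
    len = (bytes + huge_page_size - 1) / huge_page_size * huge_page_size;

    *got_huge = 0;
#ifdef MAP_HUGETLB
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        *got_huge = 1;
#endif
    if (p == MAP_FAILED)     /* no huge pages available: use normal pages */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    *mapped_len = len;
    return p;
}

/* Release a mapping obtained from alloc_large_pages(). */
void free_large_pages(void *p, size_t mapped_len)
{
    if (p != NULL)
        munmap(p, mapped_len);
}
```

A buffer obtained this way is page-aligned and contiguous, so it can be handed to FFTW like any other array. Whether it actually helps depends on the system and problem size, so measure both variants.
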
FFTW hoards all available SPEs for itself. You can optionally choose a different number of SPEs by calling the undocumented function fftw_cell_set_nspe(n), where n is the number of desired SPEs. Expect this interface to go away once we figure out how to make FFTW play nicely with other Cell software.
In particular, if you try to link both the single- and double-precision versions of FFTW in the same program (which you can do), they will both try to grab all SPEs and the second one will hang.
The SPEs demand that data be stored in contiguous arrays aligned at 16-byte boundaries. If you instruct FFTW to operate on noncontiguous or nonaligned data, the SPEs will not be used, resulting in slow execution. See Data Alignment.
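
To make the alignment requirement concrete, the sketch below checks whether a pointer lies on a 16-byte boundary and obtains an aligned, contiguous buffer with the POSIX call posix_memalign. The helper names are hypothetical and not part of FFTW; in an FFTW program you would normally obtain suitably aligned memory from fftw_malloc instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical helper: allocate n contiguous complex doubles
 * (two doubles each) on a 16-byte boundary; release with free(). */
double *alloc_complex_aligned16(size_t n)
{
    void *p = NULL;

    if (posix_memalign(&p, 16, n * 2 * sizeof(double)) != 0)
        return NULL;
    assert(is_aligned16(p));
    return p;
}
```

Note that an offset of a single double (8 bytes) into such a buffer, for example buf + 1, is no longer 16-byte aligned, so a transform started at that address would fall under the caveat above.
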
The FFTW_ESTIMATE mode may produce seriously suboptimal plans, and it becomes particularly confused if you enable both the SPEs and Altivec. If you care about performance, please use FFTW_MEASURE or FFTW_PATIENT until we figure out a more reliable performance model.
