FFTW on the Cell Processor - FFTW 3.2.1




6 FFTW on the Cell Processor


Starting with version 3.2, FFTW contains specific support for the Cell Broadband Engine (“Cell”) processor, graciously donated by the IBM Austin Research Laboratory.

Cell consists of one PowerPC core (“PPE”) and a number of Synergistic Processing Elements (“SPEs”) to which the PPE can delegate computation. The IBM QS20 Cell blade offers 8 SPEs per Cell chip. The Sony PlayStation 3 contains 6 usable SPEs.

Currently, FFTW fully utilizes the SPEs for one- and multi-dimensional complex FFTs of sizes that can be factored into small primes, in both single and double precision. Transforms of real data use the SPEs only partially at this time. If FFTW cannot use the SPEs, it falls back to a slower computation on the PPE.
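As a rough illustration of what "sizes that can be factored into small primes" means, the following sketch tests a transform size and rounds it up to the next size of that form. Both functions are hypothetical helpers, not part of the FFTW API, and the prime set {2, 3, 5, 7} is an assumption made for illustration only; this section does not state which primes the Cell code actually handles.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed set of "small" primes, chosen only for illustration.
   The actual set supported on the SPEs is not given in this section. */
static const size_t SMALL_PRIMES[] = {2, 3, 5, 7};

/* Hypothetical helper: returns true if n has no prime factor
   outside SMALL_PRIMES. Size 0 is rejected. */
bool factors_into_small_primes(size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < sizeof SMALL_PRIMES / sizeof SMALL_PRIMES[0]; ++i) {
        /* Divide out every occurrence of this prime. */
        while (n % SMALL_PRIMES[i] == 0)
            n /= SMALL_PRIMES[i];
    }
    /* Anything left over is a product of larger primes. */
    return n == 1;
}

/* Hypothetical helper: smallest size >= n that factors into
   SMALL_PRIMES, e.g. for choosing a zero-padded transform length. */
size_t next_small_prime_size(size_t n)
{
    if (n == 0)
        n = 1;
    while (!factors_into_small_primes(n))
        ++n;
    return n;
}
```

Under these assumptions, a size such as 1000 (2^3 · 5^3) passes the check, while 1001 (7 · 11 · 13) does not; padding 1001 up to 1008 (2^4 · 3^2 · 7) yields a size of the required form. Whether a given size actually runs on the SPEs still depends on FFTW itself.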

FFTW is meant to use the SPEs transparently without user intervention. However, certain caveats apply, which are discussed later in this document.
