<html lang="en">
<head>
<title>Cell Caveats - FFTW 3.2.1</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="FFTW 3.2.1">
<meta name="generator" content="makeinfo 4.8">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="FFTW-on-the-Cell-Processor.html#FFTW-on-the-Cell-Processor" title="FFTW on the Cell Processor">
<link rel="prev" href="Cell-Installation.html#Cell-Installation" title="Cell Installation">
<link rel="next" href="FFTW-Accuracy-on-Cell.html#FFTW-Accuracy-on-Cell" title="FFTW Accuracy on Cell">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
This manual is for FFTW
(version 3.2.1, 5 February 2009).

Copyright (C) 2003 Matteo Frigo.

Copyright (C) 2003 Massachusetts Institute of Technology.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission
notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided
that the entire resulting derived work is distributed under the
terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for
modified versions, except that this permission notice may be
stated in a translation approved by the Free Software Foundation.
-->
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
pre.display { font-family:inherit }
pre.format { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }
pre.smallformat { font-family:inherit; font-size:smaller }
pre.smallexample { font-size:smaller }
pre.smalllisp { font-size:smaller }
span.sc { font-variant:small-caps }
span.roman { font-family:serif; font-weight:normal; }
span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
</head>
<body>
<div class="node">
<p>
<a name="Cell-Caveats"></a>
Next: <a rel="next" accesskey="n" href="FFTW-Accuracy-on-Cell.html#FFTW-Accuracy-on-Cell">FFTW Accuracy on Cell</a>,
Previous: <a rel="previous" accesskey="p" href="Cell-Installation.html#Cell-Installation">Cell Installation</a>,
Up: <a rel="up" accesskey="u" href="FFTW-on-the-Cell-Processor.html#FFTW-on-the-Cell-Processor">FFTW on the Cell Processor</a>
<hr>
</div>

<h3 class="section">6.2 Cell Caveats</h3>

<ul>
<li>The FFTW benchmark program allocates memory using <code>malloc()</code> or
equivalent library calls, reflecting the common usage of the FFTW
library.  However, you can sometimes improve performance significantly
by allocating memory in system-specific large TLB pages.  For example, we
have seen 39 GFLOPS for a 256 &times; 256 &times; 256 problem using
large pages, whereas the speed is about 25 GFLOPS with normal pages.
Your mileage may vary.

<li>By default, FFTW hoards all available SPEs for itself.  You can optionally
choose a different number of SPEs by calling the undocumented
function <code>fftw_cell_set_nspe(n)</code>, where <code>n</code> is the desired
number of SPEs.  Expect this interface to go away once we figure out how to
make FFTW play nicely with other Cell software.

     <p>In particular, if you try to link both the single- and double-precision
versions of FFTW into the same program (which you can do), they will both try
to grab all SPEs and the second one will hang.

<li>The SPEs demand that data be stored in contiguous arrays aligned at
16-byte boundaries.  If you instruct FFTW to operate on
noncontiguous or nonaligned data, the SPEs will not be used,
resulting in slow execution.  See <a href="Data-Alignment.html#Data-Alignment">Data Alignment</a>.

<li>The <code>FFTW_ESTIMATE</code> mode may produce seriously suboptimal plans, and
it becomes particularly confused if you enable both the SPEs and
AltiVec.  If you care about performance, please use <code>FFTW_MEASURE</code>
or <code>FFTW_PATIENT</code> until we figure out a more reliable performance model.

</ul>
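<p>The large-page caveat above can be sketched as follows.  This is a minimal
illustration only, assuming a Linux system whose kernel and C library expose
the <code>MAP_HUGETLB</code> flag of <code>mmap</code> and whose administrator
has reserved large pages; other systems use different mechanisms.  The helper
names <code>alloc_large_or_normal</code> and <code>free_large_or_normal</code>
are hypothetical and are not part of FFTW.

```c
#define _GNU_SOURCE            /* exposes MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper, not part of FFTW: try to back `bytes` with large
   pages; if that fails, fall back to an ordinary 16-byte-aligned block.
   *used_large is set to 1 only when the large-page mapping succeeded.
   Keep `bytes` a multiple of the large-page size so that the mapping
   can later be unmapped with the same length. */
static void *alloc_large_or_normal(size_t bytes, int *used_large)
{
    void *p = MAP_FAILED;

    assert(used_large != NULL);
#ifdef MAP_HUGETLB
    /* Fails (returns MAP_FAILED) when no large pages are reserved. */
    p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (p != MAP_FAILED) {
        *used_large = 1;
        return p;
    }

    /* Fallback: normal pages, still aligned on a 16-byte boundary. */
    *used_large = 0;
    if (posix_memalign(&p, 16, bytes) != 0)
        return NULL;
    return p;
}

/* Release a block obtained from alloc_large_or_normal(). */
static void free_large_or_normal(void *p, size_t bytes, int used_large)
{
    if (used_large)
        munmap(p, bytes);
    else
        free(p);
}
```

<p>A caller would pass the returned pointer to the planner and executor in
place of a <code>malloc()</code>ed array, and remember <code>used_large</code>
so that the block is released the same way it was obtained.  Whether large
pages help at all depends on the problem size and the system.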
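<p>The 16-byte alignment requirement above can be checked before handing an
array to FFTW.  The sketch below is illustrative only:
<code>is_aligned16</code> and <code>alloc_aligned_doubles</code> are
hypothetical helpers, not FFTW functions, and POSIX
<code>posix_memalign</code> stands in for FFTW's own
<code>fftw_malloc</code> so that the example builds without the library.

```c
#define _POSIX_C_SOURCE 200112L   /* declares posix_memalign in <stdlib.h> */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not an FFTW function: returns 1 when p lies on a
   16-byte boundary and 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Allocate n doubles starting on a 16-byte boundary.  Returns NULL on
   failure; release the block with free(). */
static double *alloc_aligned_doubles(size_t n)
{
    void *p = NULL;

    if (posix_memalign(&p, 16, n * sizeof(double)) != 0)
        return NULL;
    return p;
}
```

<p>Note that alignment is a property of the address actually passed in: with
an aligned base array <code>a</code> of 8-byte doubles, <code>a + 2</code> is
still on a 16-byte boundary but <code>a + 1</code> is not, so a transform
started at an odd element offset would count as nonaligned data.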
<!-- -->
</body></html>