comparison Lib/fftw-3.2.1/doc/html/.svn/text-base/Cell-Caveats.html.svn-base @ 15:585caf503ef5 tip

Tidy up for ROLI
author Geogaddi\David <d.m.ronan@qmul.ac.uk>
date Tue, 17 May 2016 18:50:19 +0100
parents 636c989477e7
14:636c989477e7 15:585caf503ef5
1 <html lang="en">
2 <head>
3 <title>Cell Caveats - FFTW 3.2.1</title>
4 <meta http-equiv="Content-Type" content="text/html">
5 <meta name="description" content="FFTW 3.2.1">
6 <meta name="generator" content="makeinfo 4.8">
7 <link title="Top" rel="start" href="index.html#Top">
8 <link rel="up" href="FFTW-on-the-Cell-Processor.html#FFTW-on-the-Cell-Processor" title="FFTW on the Cell Processor">
9 <link rel="prev" href="Cell-Installation.html#Cell-Installation" title="Cell Installation">
10 <link rel="next" href="FFTW-Accuracy-on-Cell.html#FFTW-Accuracy-on-Cell" title="FFTW Accuracy on Cell">
11 <link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
12 <!--
13 This manual is for FFTW
14 (version 3.2.1, 5 February 2009).
15
16 Copyright (C) 2003 Matteo Frigo.
17
18 Copyright (C) 2003 Massachusetts Institute of Technology.
19
20 Permission is granted to make and distribute verbatim copies of
21 this manual provided the copyright notice and this permission
22 notice are preserved on all copies.
23
24 Permission is granted to copy and distribute modified versions of
25 this manual under the conditions for verbatim copying, provided
26 that the entire resulting derived work is distributed under the
27 terms of a permission notice identical to this one.
28
29 Permission is granted to copy and distribute translations of this
30 manual into another language, under the above conditions for
31 modified versions, except that this permission notice may be
32 stated in a translation approved by the Free Software Foundation.
33 -->
34 <meta http-equiv="Content-Style-Type" content="text/css">
35 <style type="text/css"><!--
36 pre.display { font-family:inherit }
37 pre.format { font-family:inherit }
38 pre.smalldisplay { font-family:inherit; font-size:smaller }
39 pre.smallformat { font-family:inherit; font-size:smaller }
40 pre.smallexample { font-size:smaller }
41 pre.smalllisp { font-size:smaller }
42 span.sc { font-variant:small-caps }
43 span.roman { font-family:serif; font-weight:normal; }
44 span.sansserif { font-family:sans-serif; font-weight:normal; }
45 --></style>
46 </head>
47 <body>
48 <div class="node">
49 <p>
50 <a name="Cell-Caveats"></a>
51 Next:&nbsp;<a rel="next" accesskey="n" href="FFTW-Accuracy-on-Cell.html#FFTW-Accuracy-on-Cell">FFTW Accuracy on Cell</a>,
52 Previous:&nbsp;<a rel="previous" accesskey="p" href="Cell-Installation.html#Cell-Installation">Cell Installation</a>,
53 Up:&nbsp;<a rel="up" accesskey="u" href="FFTW-on-the-Cell-Processor.html#FFTW-on-the-Cell-Processor">FFTW on the Cell Processor</a>
54 <hr>
55 </div>
56
57 <h3 class="section">6.2 Cell Caveats</h3>
58
59 <ul>
60 <li>The FFTW benchmark program allocates memory using <code>malloc()</code> or
61 equivalent library calls, reflecting the common usage of the FFTW
62 library. However, you can sometimes improve performance significantly
63 by allocating memory in system-specific large TLB pages. For example, we
64 have seen 39 GFLOPS for a 256&nbsp;&times;&nbsp;256&nbsp;&times;&nbsp;256 problem using
65 large pages, whereas the speed is about 25 GFLOPS with normal pages.
66 Your mileage may vary.
67
68 <li>FFTW hoards all available SPEs for itself. You can optionally
69 choose a different number of SPEs by calling the undocumented
70 function <code>fftw_cell_set_nspe(n)</code>, where <code>n</code> is the desired number of
71 SPEs. Expect this interface to go away once we figure out how to
72 make FFTW play nicely with other Cell software.
73
74 <p>In particular, if you try to link both the single- and double-precision
75 versions of FFTW in the same program (which you can do), they will both try
76 to grab all SPEs and the second one will hang.
77
78 <li>The SPEs demand that data be stored in contiguous arrays aligned at
79 16-byte boundaries. If you instruct FFTW to operate on
80 noncontiguous or nonaligned data, the SPEs will not be used,
81 resulting in slow execution. See <a href="Data-Alignment.html#Data-Alignment">Data Alignment</a>.
82
83 <li>The <code>FFTW_ESTIMATE</code> mode may produce seriously suboptimal plans, and
84 it becomes particularly confused if you enable both the SPEs and
85 Altivec. If you care about performance, please use <code>FFTW_MEASURE</code>
86 or <code>FFTW_PATIENT</code> until we figure out a more reliable performance model.
87
88 </ul>
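<p>The 16-byte alignment requirement above can be checked before a buffer is handed to the planner. The following is a minimal sketch, not part of FFTW: the names <code>SPE_ALIGNMENT</code>, <code>is_spe_aligned</code> and <code>alloc_aligned_doubles</code> are hypothetical, and the C11 function <code>aligned_alloc</code> is assumed to be available and is used only so that the example compiles without the library. In an FFTW program you would normally obtain aligned memory from <code>fftw_malloc</code> instead.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Boundary the SPEs require, as stated in the caveat above. */
#define SPE_ALIGNMENT 16

/* Hypothetical helper: return 1 if p lies on a 16-byte boundary, else 0. */
int is_spe_aligned(const void *p)
{
    return ((uintptr_t)p % SPE_ALIGNMENT) == 0;
}

/* Hypothetical helper: allocate room for n doubles on a 16-byte boundary.
 * Assumes C11 aligned_alloc; the result is released with free().
 * Returns NULL if the allocation fails. */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes;

    assert(n > 0);
    bytes = n * sizeof(double);
    /* aligned_alloc expects the size to be a multiple of the alignment,
     * so round the byte count up to the next multiple of 16. */
    bytes = (bytes + SPE_ALIGNMENT - 1) / SPE_ALIGNMENT * SPE_ALIGNMENT;
    return aligned_alloc(SPE_ALIGNMENT, bytes);
}
```

<p>A caller might test <code>is_spe_aligned(buf)</code> on an existing buffer and copy the data into an aligned, contiguous array when the check fails, rather than silently falling back to the slower non-SPE path.</p>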
89
90 <!-- -->
91 </body></html>
92