Mercurial > hg > sv-dependency-builds
comparison src/fftw-3.3.5/doc/FAQ/fftw-faq.ascii @ 42:2cd0e3b3e1fd
Current fftw source
author | Chris Cannam |
---|---|
date | Tue, 18 Oct 2016 13:40:26 +0100 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
41:481f5f8c5634 | 42:2cd0e3b3e1fd |
---|---|
1 FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS | |
2 30 Jul 2016 | |
3 Matteo Frigo | |
4 Steven G. Johnson | |
5 <fftw@fftw.org> | |
6 | |
7 This is the list of Frequently Asked Questions about FFTW, a collection of | |
8 fast C routines for computing the Discrete Fourier Transform in one or | |
9 more dimensions. | |
10 | |
11 =============================================================================== | |
12 | |
13 Index | |
14 | |
15 Section 1. Introduction and General Information | |
16 Q1.1 What is FFTW? | |
17 Q1.2 How do I obtain FFTW? | |
18 Q1.3 Is FFTW free software? | |
19 Q1.4 What is this about non-free licenses? | |
20 Q1.5 In the West? I thought MIT was in the East? | |
21 | |
22 Section 2. Installing FFTW | |
23 Q2.1 Which systems does FFTW run on? | |
24 Q2.2 Does FFTW run on Windows? | |
25 Q2.3 My compiler has trouble with FFTW. | |
26 Q2.4 FFTW does not compile on Solaris, complaining about const. | |
27 Q2.5 What's the difference between --enable-3dnow and --enable-k7? | |
28 Q2.6 What's the difference between the fma and the non-fma versions? | |
29 Q2.7 Which language is FFTW written in? | |
30 Q2.8 Can I call FFTW from Fortran? | |
31 Q2.9 Can I call FFTW from C++? | |
32 Q2.10 Why isn't FFTW written in Fortran/C++? | |
33 Q2.11 How do I compile FFTW to run in single precision? | |
34 Q2.12 --enable-k7 does not work on x86-64 | |
35 | |
36 Section 3. Using FFTW | |
37 Q3.1 Why not support the FFTW 2 interface in FFTW 3? | |
38 Q3.2 Why do FFTW 3 plans encapsulate the input/output arrays and not ju | |
39 Q3.3 FFTW seems really slow. | |
40 Q3.4 FFTW slows down after repeated calls. | |
41 Q3.5 An FFTW routine is crashing when I call it. | |
42 Q3.6 My Fortran program crashes when calling FFTW. | |
43 Q3.7 FFTW gives results different from my old FFT. | |
44 Q3.8 FFTW gives different results between runs | |
45 Q3.9 Can I save FFTW's plans? | |
46 Q3.10 Why does your inverse transform return a scaled result? | |
47 Q3.11 How can I make FFTW put the origin (zero frequency) at the center | |
48 Q3.12 How do I FFT an image/audio file in *foobar* format? | |
49 Q3.13 My program does not link (on Unix). | |
50 Q3.14 I included your header, but linking still fails. | |
51 Q3.15 My program crashes, complaining about stack space. | |
52 Q3.16 FFTW seems to have a memory leak. | |
53 Q3.17 The output of FFTW's transform is all zeros. | |
54 Q3.18 How do I call FFTW from the Microsoft language du jour? | |
55 Q3.19 Can I compute only a subset of the DFT outputs? | |
56 Q3.20 Can I use FFTW's routines for in-place and out-of-place matrix tra | |
57 | |
58 Section 4. Internals of FFTW | |
59 Q4.1 How does FFTW work? | |
60 Q4.2 Why is FFTW so fast? | |
61 | |
62 Section 5. Known bugs | |
63 Q5.1 FFTW 1.1 crashes in rfftwnd on Linux. | |
64 Q5.2 The MPI transforms in FFTW 1.2 give incorrect results/leak memory. | |
65 Q5.3 The test programs in FFTW 1.2.1 fail when I change FFTW to use sin | |
66 Q5.4 The test program in FFTW 1.2.1 fails for n > 46340. | |
67 Q5.5 The threaded code fails on Linux Redhat 5.0 | |
68 Q5.6 FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dime | |
69 Q5.7 FFTW 2.0's complex transforms give the wrong results with prime fa | |
70 Q5.8 FFTW 2.1.1's MPI test programs crash with MPICH. | |
71 Q5.9 FFTW 2.1.2's multi-threaded transforms don't work on AIX. | |
72 Q5.10 FFTW 2.1.2's complex transforms give incorrect results for large p | |
73 Q5.11 FFTW 2.1.3's multi-threaded transforms don't give any speedup on S | |
74 Q5.12 FFTW 2.1.3 crashes on AIX. | |
75 | |
76 =============================================================================== | |
77 | |
78 Section 1. Introduction and General Information | |
79 | |
80 Q1.1 What is FFTW? | |
81 Q1.2 How do I obtain FFTW? | |
82 Q1.3 Is FFTW free software? | |
83 Q1.4 What is this about non-free licenses? | |
84 Q1.5 In the West? I thought MIT was in the East? | |
85 | |
86 ------------------------------------------------------------------------------- | |
87 | |
88 Question 1.1. What is FFTW? | |
89 | |
90 FFTW is a free collection of fast C routines for computing the Discrete | |
91 Fourier Transform in one or more dimensions. It includes complex, real, | |
92 symmetric, and parallel transforms, and can handle arbitrary array sizes | |
93 efficiently. FFTW is typically faster than other publically-available FFT | |
94 implementations, and is even competitive with vendor-tuned libraries. | |
95 (See our web page for extensive benchmarks.) To achieve this performance, | |
96 FFTW uses novel code-generation and runtime self-optimization techniques | |
97 (along with many other tricks). | |
98 | |
99 ------------------------------------------------------------------------------- | |
100 | |
101 Question 1.2. How do I obtain FFTW? | |
102 | |
103 FFTW can be found at the FFTW web page. You can also retrieve it from | |
104 ftp.fftw.org in /pub/fftw. | |
105 | |
106 ------------------------------------------------------------------------------- | |
107 | |
108 Question 1.3. Is FFTW free software? | |
109 | |
110 Starting with version 1.3, FFTW is Free Software in the technical sense | |
111 defined by the Free Software Foundation (see Categories of Free and | |
112 Non-Free Software), and is distributed under the terms of the GNU General | |
113 Public License. Previous versions of FFTW were distributed without fee | |
114 for noncommercial use, but were not technically ``free.'' | |
115 | |
116 Non-free licenses for FFTW are also available that permit different terms | |
117 of use than the GPL. | |
118 | |
119 ------------------------------------------------------------------------------- | |
120 | |
121 Question 1.4. What is this about non-free licenses? | |
122 | |
123 The non-free licenses are for companies that wish to use FFTW in their | |
124 products but are unwilling to release their software under the GPL (which | |
125 would require them to release source code and allow free redistribution). | |
126 Such users can purchase an unlimited-use license from MIT. Contact us for | |
127 more details. | |
128 | |
129 We could instead have released FFTW under the LGPL, or even disallowed | |
130 non-Free usage. Suffice it to say, however, that MIT owns the copyright | |
131 to FFTW and they only let us GPL it because we convinced them that it | |
132 would neither affect their licensing revenue nor irritate existing | |
133 licensees. | |
134 | |
135 ------------------------------------------------------------------------------- | |
136 | |
137 Question 1.5. In the West? I thought MIT was in the East? | |
138 | |
139 Not to an Italian. You could say that we're a Spaghetti Western (with | |
140 apologies to Sergio Leone). | |
141 | |
142 =============================================================================== | |
143 | |
144 Section 2. Installing FFTW | |
145 | |
146 Q2.1 Which systems does FFTW run on? | |
147 Q2.2 Does FFTW run on Windows? | |
148 Q2.3 My compiler has trouble with FFTW. | |
149 Q2.4 FFTW does not compile on Solaris, complaining about const. | |
150 Q2.5 What's the difference between --enable-3dnow and --enable-k7? | |
151 Q2.6 What's the difference between the fma and the non-fma versions? | |
152 Q2.7 Which language is FFTW written in? | |
153 Q2.8 Can I call FFTW from Fortran? | |
154 Q2.9 Can I call FFTW from C++? | |
155 Q2.10 Why isn't FFTW written in Fortran/C++? | |
156 Q2.11 How do I compile FFTW to run in single precision? | |
157 Q2.12 --enable-k7 does not work on x86-64 | |
158 | |
159 ------------------------------------------------------------------------------- | |
160 | |
161 Question 2.1. Which systems does FFTW run on? | |
162 | |
163 FFTW is written in ANSI C, and should work on any system with a decent C | |
164 compiler. (See also Q2.2 `Does FFTW run on Windows?', Q2.3 `My compiler | |
165 has trouble with FFTW.'.) FFTW can also take advantage of certain | |
166 hardware-specific features, such as cycle counters and SIMD instructions, | |
167 but this is optional. | |
168 | |
169 ------------------------------------------------------------------------------- | |
170 | |
171 Question 2.2. Does FFTW run on Windows? | |
172 | |
173 Yes, many people have reported successfully using FFTW on Windows with | |
174 various compilers. FFTW was not developed on Windows, but the source code | |
175 is essentially straight ANSI C. See also the FFTW Windows installation | |
176 notes, Q2.3 `My compiler has trouble with FFTW.', and Q3.18 `How do I call | |
177 FFTW from the Microsoft language du jour?'. | |
178 | |
179 ------------------------------------------------------------------------------- | |
180 | |
181 Question 2.3. My compiler has trouble with FFTW. | |
182 | |
183 Complain fiercely to the vendor of the compiler. | |
184 | |
185 We have successfully used gcc 3.2.x on x86 and PPC, a recent Compaq C | |
186 compiler for Alpha, version 6 of IBM's xlc compiler for AIX, Intel's icc | |
187 versions 5-7, and Sun WorkShop cc version 6. | |
188 | |
189 FFTW is likely to push compilers to their limits, however, and several | |
190 compiler bugs have been exposed by FFTW. A partial list follows. | |
191 | |
192 gcc 2.95.x for Solaris/SPARC produces incorrect code for the test program | |
193 (workaround: recompile the libbench2 directory with -O2). | |
194 | |
195 NetBSD/macppc 1.6 comes with a gcc version that also miscompiles the test | |
196 program. (Please report a workaround if you know one.) | |
197 | |
198 gcc 3.2.3 for ARM reportedly crashes during compilation. This bug is | |
199 reportedly fixed in later versions of gcc. | |
200 | |
201 Versions 8.0 and 8.1 of Intel's icc falsely claim to be gcc, so you should | |
202 specify CC="icc -no-gcc"; this is automatic in FFTW 3.1. icc-8.0.066 | |
203 reportely produces incorrect code for FFTW 2.1.5, but is fixed in version | |
204 8.1. icc-7.1 compiler build 20030402Z appears to produce incorrect | |
205 dependencies, causing the compilation to fail. icc-7.1 build 20030307Z | |
206 appears to work fine. (Use icc -V to check which build you have.) As of | |
207 2003/04/18, build 20030402Z appears not to be available any longer on | |
208 Intel's website, whereas the older build 20030307Z is available. | |
209 | |
210 ranlib of GNU binutils 2.9.1 on Irix has been observed to corrupt the FFTW | |
211 libraries, causing a link failure when FFTW is compiled. Since ranlib is | |
212 completely superfluous on Irix, we suggest deleting it from your system | |
213 and replacing it with a symbolic link to /bin/echo. | |
214 | |
215 If support for SIMD instructions is enabled in FFTW, further compiler | |
216 problems may appear: | |
217 | |
218 gcc 3.4.[0123] for x86 produces incorrect SSE2 code for FFTW when -O2 (the | |
219 best choice for FFTW) is used, causing FFTW to crash (make check crashes). | |
220 This bug is fixed in gcc 3.4.4. On x86_64 (amd64/em64t), gcc 3.4.4 | |
221 reportedly still has a similar problem, but this is fixed as of gcc 3.4.6. | |
222 | |
223 gcc-3.2 for x86 produces incorrect SIMD code if -O3 is used. The same | |
224 compiler produces incorrect SIMD code if no optimization is used, too. | |
225 When using gcc-3.2, it is a good idea not to change the default CFLAGS | |
226 selected by the configure script. | |
227 | |
228 Some 3.0.x and 3.1.x versions of gcc on x86 may crash. gcc so-called 2.96 | |
229 shipping with RedHat 7.3 crashes when compiling SIMD code. In both cases, | |
230 please upgrade to gcc-3.2 or later. | |
231 | |
232 Intel's icc 6.0 misaligns SSE constants, but FFTW has a workaround. icc | |
233 8.x fails to compile FFTW 3.0.x because it falsely claims to be gcc; we | |
234 believe this to be a bug in icc, but FFTW 3.1 has a workaround. | |
235 | |
236 Visual C++ 2003 reportedly produces incorrect code for SSE/SSE2 when | |
237 compiling FFTW. This bug was reportedly fixed in VC++ 2005; | |
238 alternatively, you could switch to the Intel compiler. VC++ 6.0 also | |
239 reportedly produces incorrect code for the file reodft11e-r2hc-odd.c | |
240 unless optimizations are disabled for that file. | |
241 | |
242 gcc 2.95 on MacOS X miscompiles AltiVec code (fixed in later versions). | |
243 gcc 3.2.x miscompiles AltiVec permutations, but FFTW has a workaround. | |
244 gcc 4.0.1 on MacOS for Intel crashes when compiling FFTW; a workaround is | |
245 to compile one file without optimization: cd kernel; make CFLAGS=" " | |
246 trig.lo. | |
247 | |
248 gcc 4.1.1 reportedly crashes when compiling FFTW for MIPS; the workaround | |
249 is to compile the file it crashes on (t2_64.c) with a lower optimization | |
250 level. | |
251 | |
252 gcc versions 4.1.2 to 4.2.0 for x86 reportedly miscompile FFTW 3.1's test | |
253 program, causing make check to crash (gcc bug #26528). The bug was | |
254 reportedly fixed in gcc version 4.2.1 and later. A workaround is to | |
255 compile libbench2/verify-lib.c without optimization. | |
256 | |
257 ------------------------------------------------------------------------------- | |
258 | |
259 Question 2.4. FFTW does not compile on Solaris, complaining about const. | |
260 | |
261 We know that at least on Solaris 2.5.x with Sun's compilers 4.2 you might | |
262 get error messages from make such as | |
263 | |
264 "./fftw.h", line 88: warning: const is a keyword in ANSI C | |
265 | |
266 This is the case when the configure script reports that const does not | |
267 work: | |
268 | |
269 checking for working const... (cached) no | |
270 | |
271 You should be aware that Solaris comes with two compilers, namely, | |
272 /opt/SUNWspro/SC4.2/bin/cc and /usr/ucb/cc. The latter compiler is | |
273 non-ANSI. Indeed, it is a perverse shell script that calls the real | |
274 compiler in non-ANSI mode. In order to compile FFTW, change your path so | |
275 that the right cc is used. | |
276 | |
277 To know whether your compiler is the right one, type cc -V. If the | |
278 compiler prints ``ucbcc'', as in | |
279 | |
280 ucbcc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2 | |
281 | |
282 then the compiler is wrong. The right message is something like | |
283 | |
284 cc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2 | |
285 | |
286 ------------------------------------------------------------------------------- | |
287 | |
288 Question 2.5. What's the difference between --enable-3dnow and --enable-k7? | |
289 | |
290 --enable-k7 enables 3DNow! instructions on K7 processors (AMD Athlon and | |
291 its variants). K7 support is provided by assembly routines generated by a | |
292 special purpose compiler. As of fftw-3.2, --enable-k7 is no longer | |
293 supported. | |
294 | |
295 --enable-3dnow enables generic 3DNow! support using gcc builtin functions. | |
296 This works on earlier AMD processors, but it is not as fast as our special | |
297 assembly routines. As of fftw-3.1, --enable-3dnow is no longer supported. | |
298 | |
299 ------------------------------------------------------------------------------- | |
300 | |
301 Question 2.6. What's the difference between the fma and the non-fma versions? | |
302 | |
303 The fma version tries to exploit the fused multiply-add instructions | |
304 implemented in many processors such as PowerPC, ia-64, and MIPS. The two | |
305 FFTW packages are otherwise identical. In FFTW 3.1, the fma and non-fma | |
306 versions were merged together into a single package, and the configure | |
307 script attempts to automatically guess which version to use. | |
308 | |
309 The FFTW 3.1 configure script enables fma by default on PowerPC, Itanium, | |
310 and PA-RISC, and disables it otherwise. You can force one or the other by | |
311 using the --enable-fma or --disable-fma flag for configure. | |
312 | |
313 Definitely use fma if you have a PowerPC-based system with gcc (or IBM | |
314 xlc). This includes all GNU/Linux systems for PowerPC and the older | |
315 PowerPC-based MacOS systems. Also use it on PA-RISC and Itanium with the | |
316 HP/UX compiler. | |
317 | |
318 Definitely do not use the fma version if you have an ia-32 processor | |
319 (Intel, AMD, MacOS on Intel, etcetera). | |
320 | |
321 For other architectures/compilers, the situation is not so clear. For | |
322 example, ia-64 has the fma instruction, but gcc-3.2 appears not to exploit | |
323 it correctly. Other compilers may do the right thing, but we have not | |
324 tried them. Please send us your feedback so that we can update this FAQ | |
325 entry. | |
326 | |
327 ------------------------------------------------------------------------------- | |
328 | |
329 Question 2.7. Which language is FFTW written in? | |
330 | |
331 FFTW is written in ANSI C. Most of the code, however, was automatically | |
332 generated by a program called genfft, written in the Objective Caml | |
333 dialect of ML. You do not need to know ML or to have an Objective Caml | |
334 compiler in order to use FFTW. | |
335 | |
336 genfft is provided with the FFTW sources, which means that you can play | |
337 with the code generator if you want. In this case, you need a working | |
338 Objective Caml system. Objective Caml is available from the Caml web | |
339 page. | |
340 | |
341 ------------------------------------------------------------------------------- | |
342 | |
343 Question 2.8. Can I call FFTW from Fortran? | |
344 | |
345 Yes, FFTW (versions 1.3 and higher) contains a Fortran-callable interface, | |
346 documented in the FFTW manual. | |
347 | |
348 By default, FFTW configures its Fortran interface to work with the first | |
349 compiler it finds, e.g. g77. To configure for a different, incompatible | |
350 Fortran compiler foobar, use ./configure F77=foobar when installing FFTW. | |
351 (In the case of g77, however, FFTW 3.x also includes an extra set of | |
352 Fortran-callable routines with one less underscore at the end of | |
353 identifiers, which should cover most other Fortran compilers on Linux at | |
354 least.) | |
355 | |
356 ------------------------------------------------------------------------------- | |
357 | |
358 Question 2.9. Can I call FFTW from C++? | |
359 | |
360 Most definitely. FFTW should compile and/or link under any C++ compiler. | |
361 Moreover, it is likely that the C++ <complex> template class is | |
362 bit-compatible with FFTW's complex-number format (see the FFTW manual for | |
363 more details). | |
364 | |
365 ------------------------------------------------------------------------------- | |
366 | |
367 Question 2.10. Why isn't FFTW written in Fortran/C++? | |
368 | |
369 Because we don't like those languages, and neither approaches the | |
370 portability of C. | |
371 | |
372 ------------------------------------------------------------------------------- | |
373 | |
374 Question 2.11. How do I compile FFTW to run in single precision? | |
375 | |
376 On a Unix system: configure --enable-float. On a non-Unix system: edit | |
377 config.h to #define the symbol FFTW_SINGLE (for FFTW 3.x). In both cases, | |
378 you must then recompile FFTW. In FFTW 3, all FFTW identifiers will then | |
379 begin with fftwf_ instead of fftw_. | |
380 | |
381 ------------------------------------------------------------------------------- | |
382 | |
383 Question 2.12. --enable-k7 does not work on x86-64 | |
384 | |
385 Support for --enable-k7 was discontinued in fftw-3.2. | |
386 | |
387 The fftw-3.1 release supports --enable-k7. This option only works on | |
388 32-bit x86 machines that implement 3DNow!, including the AMD Athlon and | |
389 the AMD Opteron in 32-bit mode. --enable-k7 does not work on AMD Opteron | |
390 in 64-bit mode. Use --enable-sse for x86-64 machines. | |
391 | |
392 FFTW supports 3DNow! by means of assembly code generated by a | |
393 special-purpose compiler. It is hard to produce assembly code that works | |
394 in both 32-bit and 64-bit mode. | |
395 | |
396 =============================================================================== | |
397 | |
398 Section 3. Using FFTW | |
399 | |
400 Q3.1 Why not support the FFTW 2 interface in FFTW 3? | |
401 Q3.2 Why do FFTW 3 plans encapsulate the input/output arrays and not ju | |
402 Q3.3 FFTW seems really slow. | |
403 Q3.4 FFTW slows down after repeated calls. | |
404 Q3.5 An FFTW routine is crashing when I call it. | |
405 Q3.6 My Fortran program crashes when calling FFTW. | |
406 Q3.7 FFTW gives results different from my old FFT. | |
407 Q3.8 FFTW gives different results between runs | |
408 Q3.9 Can I save FFTW's plans? | |
409 Q3.10 Why does your inverse transform return a scaled result? | |
410 Q3.11 How can I make FFTW put the origin (zero frequency) at the center | |
411 Q3.12 How do I FFT an image/audio file in *foobar* format? | |
412 Q3.13 My program does not link (on Unix). | |
413 Q3.14 I included your header, but linking still fails. | |
414 Q3.15 My program crashes, complaining about stack space. | |
415 Q3.16 FFTW seems to have a memory leak. | |
416 Q3.17 The output of FFTW's transform is all zeros. | |
417 Q3.18 How do I call FFTW from the Microsoft language du jour? | |
418 Q3.19 Can I compute only a subset of the DFT outputs? | |
419 Q3.20 Can I use FFTW's routines for in-place and out-of-place matrix tra | |
420 | |
421 ------------------------------------------------------------------------------- | |
422 | |
423 Question 3.1. Why not support the FFTW 2 interface in FFTW 3? | |
424 | |
425 FFTW 3 has semantics incompatible with earlier versions: its plans can | |
426 only be used for a given stride, multiplicity, and other characteristics | |
427 of the input and output arrays; these stronger semantics are necessary for | |
428 performance reasons. Thus, it is impossible to efficiently emulate the | |
429 older interface (whose plans can be used for any transform of the same | |
430 size). We believe that it should be possible to upgrade most programs | |
431 without any difficulty, however. | |
432 | |
433 ------------------------------------------------------------------------------- | |
434 | |
435 Question 3.2. Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm? | |
436 | |
437 There are several reasons: | |
438 | |
439 * It was important for performance reasons that the plan be specific to | |
440 array characteristics like the stride (and alignment, for SIMD), and | |
441 requiring that the user maintain these invariants is error prone. | |
442 * In most high-performance applications, as far as we can tell, you are | |
443 usually transforming the same array over and over, so FFTW's semantics | |
444 should not be a burden. | |
445 * If you need to transform another array of the same size, creating a new | |
446 plan once the first exists is a cheap operation. | |
447 * If you need to transform many arrays of the same size at once, you | |
448 should really use the plan_many routines in FFTW's "advanced" interface. | |
449 * If the abovementioned array characteristics are the same, you are | |
450 willing to pay close attention to the documentation, and you really need | |
451 to, we provide a "new-array execution" interface to apply a plan to a | |
452 new array. | |
453 | |
454 ------------------------------------------------------------------------------- | |
455 | |
456 Question 3.3. FFTW seems really slow. | |
457 | |
458 You are probably recreating the plan before every transform, rather than | |
459 creating it once and reusing it for all transforms of the same size. FFTW | |
460 is designed to be used in the following way: | |
461 | |
462 * First, you create a plan. This will take several seconds. | |
463 * Then, you reuse the plan many times to perform FFTs. These are fast. | |
464 | |
465 If you don't need to compute many transforms and the time for the planner | |
466 is significant, you have two options. First, you can use the | |
467 FFTW_ESTIMATE option in the planner, which uses heuristics instead of | |
468 runtime measurements and produces a good plan in a short time. Second, | |
469 you can use the wisdom feature to precompute the plan; see Q3.9 `Can I | |
470 save FFTW's plans?' | |
471 | |
472 ------------------------------------------------------------------------------- | |
473 | |
474 Question 3.4. FFTW slows down after repeated calls. | |
475 | |
476 Probably, NaNs or similar are creeping into your data, and the slowdown is | |
477 due to the resulting floating-point exceptions. For example, be aware | |
478 that repeatedly FFTing the same array is a diverging process (because FFTW | |
479 computes the unnormalized transform). | |
480 | |
481 ------------------------------------------------------------------------------- | |
482 | |
483 Question 3.5. An FFTW routine is crashing when I call it. | |
484 | |
485 Did the FFTW test programs pass (make check, or cd tests; make bigcheck if | |
486 you want to be paranoid)? If so, you almost certainly have a bug in your | |
487 own code. For example, you could be passing invalid arguments (such as | |
488 wrongly-sized arrays) to FFTW, or you could simply have memory corruption | |
489 elsewhere in your program that causes random crashes later on. Please | |
490 don't complain to us unless you can come up with a minimal self-contained | |
491 program (preferably under 30 lines) that illustrates the problem. | |
492 | |
493 ------------------------------------------------------------------------------- | |
494 | |
495 Question 3.6. My Fortran program crashes when calling FFTW. | |
496 | |
497 As described in the manual, on 64-bit machines you must store the plans in | |
498 variables large enough to hold a pointer, for example integer*8. We | |
499 recommend using integer*8 on 32-bit machines as well, to simplify porting. | |
500 | |
501 ------------------------------------------------------------------------------- | |
502 | |
503 Question 3.7. FFTW gives results different from my old FFT. | |
504 | |
505 People follow many different conventions for the DFT, and you should be | |
506 sure to know the ones that we use (described in the FFTW manual). In | |
507 particular, you should be aware that the FFTW_FORWARD/FFTW_BACKWARD | |
508 directions correspond to signs of -1/+1 in the exponent of the DFT | |
509 definition. (*Numerical Recipes* uses the opposite convention.) | |
510 | |
511 You should also know that we compute an unnormalized transform. In | |
512 contrast, Matlab is an example of program that computes a normalized | |
513 transform. See Q3.10 `Why does your inverse transform return a scaled | |
514 result?'. | |
515 | |
516 Finally, note that floating-point arithmetic is not exact, so different | |
517 FFT algorithms will give slightly different results (on the order of the | |
518 numerical accuracy; typically a fractional difference of 1e-15 or so in | |
519 double precision). | |
520 | |
521 ------------------------------------------------------------------------------- | |
522 | |
523 Question 3.8. FFTW gives different results between runs | |
524 | |
525 If you use FFTW_MEASURE or FFTW_PATIENT mode, then the algorithm FFTW | |
526 employs is not deterministic: it depends on runtime performance | |
527 measurements. This will cause the results to vary slightly from run to | |
528 run. However, the differences should be slight, on the order of the | |
529 floating-point precision, and therefore should have no practical impact on | |
530 most applications. | |
531 | |
532 If you use saved plans (wisdom) or FFTW_ESTIMATE mode, however, then the | |
533 algorithm is deterministic and the results should be identical between | |
534 runs. | |
535 | |
536 ------------------------------------------------------------------------------- | |
537 | |
538 Question 3.9. Can I save FFTW's plans? | |
539 | |
540 Yes. Starting with version 1.2, FFTW provides the wisdom mechanism for | |
541 saving plans; see the FFTW manual. | |
542 | |
543 ------------------------------------------------------------------------------- | |
544 | |
545 Question 3.10. Why does your inverse transform return a scaled result? | |
546 | |
547 Computing the forward transform followed by the backward transform (or | |
548 vice versa) yields the original array scaled by the size of the array. | |
549 (For multi-dimensional transforms, the size of the array is the product of | |
550 the dimensions.) We could, instead, have chosen a normalization that | |
551 would have returned the unscaled array. Or, to accomodate the many | |
552 conventions in this matter, the transform routines could have accepted a | |
553 "scale factor" parameter. We did not do this, however, for two reasons. | |
554 First, we didn't want to sacrifice performance in the common case where | |
555 the scale factor is 1. Second, in real applications the FFT is followed or | |
556 preceded by some computation on the data, into which the scale factor can | |
557 typically be absorbed at little or no cost. | |
558 | |
559 ------------------------------------------------------------------------------- | |
560 | |
561 Question 3.11. How can I make FFTW put the origin (zero frequency) at the center of its output? | |
562 | |
563 For human viewing of a spectrum, it is often convenient to put the origin | |
564 in frequency space at the center of the output array, rather than in the | |
565 zero-th element (the default in FFTW). If all of the dimensions of your | |
566 array are even, you can accomplish this by simply multiplying each element | |
567 of the input array by (-1)^(i + j + ...), where i, j, etcetera are the | |
568 indices of the element. (This trick is a general property of the DFT, and | |
569 is not specific to FFTW.) | |
570 | |
571 ------------------------------------------------------------------------------- | |
572 | |
573 Question 3.12. How do I FFT an image/audio file in *foobar* format? | |
574 | |
575 FFTW performs an FFT on an array of floating-point values. You can | |
576 certainly use it to compute the transform of an image or audio stream, but | |
577 you are responsible for figuring out your data format and converting it to | |
578 the form FFTW requires. | |
579 | |
580 ------------------------------------------------------------------------------- | |
581 | |
582 Question 3.13. My program does not link (on Unix). | |
583 | |
584 The libraries must be listed in the correct order (-lfftw3 -lm for FFTW | |
585 3.x) and *after* your program sources/objects. (The general rule is that | |
586 if *A* uses *B*, then *A* must be listed before *B* in the link command.). | |
587 | |
588 ------------------------------------------------------------------------------- | |
589 | |
590 Question 3.14. I included your header, but linking still fails. | |
591 | |
592 You're a C++ programmer, aren't you? You have to compile the FFTW library | |
593 and link it into your program, not just #include <fftw3.h>. (Yes, this is | |
594 really a FAQ.) | |
595 | |
596 ------------------------------------------------------------------------------- | |
597 | |
598 Question 3.15. My program crashes, complaining about stack space. | |
599 | |
600 You cannot declare large arrays with automatic storage (e.g. via | |
601 fftw_complex array[N]); you should use fftw_malloc (or equivalent) to | |
602 allocate the arrays you want to transform if they are larger than a few | |
603 hundred elements. | |
604 | |
605 ------------------------------------------------------------------------------- | |
606 | |
607 Question 3.16. FFTW seems to have a memory leak. | |
608 | |
609 After you create a plan, FFTW caches the information required to quickly | |
610 recreate the plan. (See Q3.9 `Can I save FFTW's plans?') It also | |
611 maintains a small amount of other persistent memory. You can deallocate | |
612 all of FFTW's internally allocated memory, if you wish, by calling | |
613 fftw_cleanup(), as documented in the manual. | |
614 | |
615 ------------------------------------------------------------------------------- | |
616 | |
617 Question 3.17. The output of FFTW's transform is all zeros. | |
618 | |
619 You should initialize your input array *after* creating the plan, unless | |
620 you use FFTW_ESTIMATE: planning with FFTW_MEASURE or FFTW_PATIENT | |
621 overwrites the input/output arrays, as described in the manual. | |
622 | |
623 ------------------------------------------------------------------------------- | |
624 | |
625 Question 3.18. How do I call FFTW from the Microsoft language du jour? | |
626 | |
627 Please *do not* ask us Windows-specific questions. We do not use Windows. | |
628 We know nothing about Visual Basic, Visual C++, or .NET. Please find the | |
629 appropriate Usenet discussion group and ask your question there. See also | |
630 Q2.2 `Does FFTW run on Windows?'. | |
631 | |
632 ------------------------------------------------------------------------------- | |
633 | |
634 Question 3.19. Can I compute only a subset of the DFT outputs? | |
635 | |
636 In general, no, an FFT intrinsically computes all outputs from all inputs. | |
637 In principle, there is something called a *pruned FFT* that can do what | |
638 you want, but to compute K outputs out of N the complexity is in general | |
639 O(N log K) instead of O(N log N), thus saving only a small additive factor | |
640 in the log. (The same argument holds if you instead have only K nonzero | |
641 inputs.) | |
642 | |
643 There are some specific cases in which you can get the O(N log K) | |
644 performance benefits easily, however, by combining a few ordinary FFTs. | |
645 In particular, the case where you want the first K outputs, where K | |
646 divides N, can be handled by performing N/K transforms of size K and then | |
647 summing the outputs multiplied by appropriate phase factors. For more | |
648 details, see pruned FFTs with FFTW. | |
649 | |
650 There are also some algorithms that compute pruned transforms | |
651 *approximately*, but they are beyond the scope of this FAQ. | |
652 | |
653 ------------------------------------------------------------------------------- | |
654 | |
655 Question 3.20. Can I use FFTW's routines for in-place and out-of-place matrix transposition? | |
656 | |
657 You can use the FFTW guru interface to create a rank-0 transform of vector | |
658 rank 2 where the vector strides are transposed. (A rank-0 transform is | |
659 equivalent to a 1D transform of size 1, which. just copies the input into | |
660 the output.) Specifying the same location for the input and output makes | |
661 the transpose in-place. | |
662 | |
663 For double-valued data stored in row-major format, plan creation looks | |
664 like this: | |
665 | |
666 fftw_plan plan_transpose(int rows, int cols, double *in, double *out) | |
667 { | |
668 const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */ | |
669 fftw_iodim howmany_dims[2]; | |
670 | |
671 howmany_dims[0].n = rows; | |
672 howmany_dims[0].is = cols; | |
673 howmany_dims[0].os = 1; | |
674 | |
675 howmany_dims[1].n = cols; | |
676 howmany_dims[1].is = 1; | |
677 howmany_dims[1].os = rows; | |
678 | |
679 return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL, | |
680 /*howmany_rank=*/ 2, howmany_dims, | |
681 in, out, /*kind=*/ NULL, flags); | |
682 } | |
683 (This entry was written by Rhys Ulerich.) | |
684 | |
685 =============================================================================== | |
686 | |
687 Section 4. Internals of FFTW | |
688 | |
689 Q4.1 How does FFTW work? | |
690 Q4.2 Why is FFTW so fast? | |
691 | |
692 ------------------------------------------------------------------------------- | |
693 | |
694 Question 4.1. How does FFTW work? | |
695 | |
696 The innovation (if it can be so called) in FFTW consists in having a | |
697 variety of composable *solvers*, representing different FFT algorithms and | |
698 implementation strategies, whose combination into a particular *plan* for | |
699 a given size can be determined at runtime according to the characteristics | |
700 of your machine/compiler. This peculiar software architecture allows FFTW | |
701 to adapt itself to almost any machine. | |
702 | |
703 For more details (albeit somewhat outdated), see the paper "FFTW: An | |
704 Adaptive Software Architecture for the FFT", by M. Frigo and S. G. | |
705 Johnson, *Proc. ICASSP* 3, 1381 (1998), also available at the FFTW web | |
706 page. | |
707 | |
708 ------------------------------------------------------------------------------- | |
709 | |
710 Question 4.2. Why is FFTW so fast? | |
711 | |
712 This is a complex question, and there is no simple answer. In fact, the | |
713 authors do not fully know the answer, either. In addition to many small | |
714 performance hacks throughout FFTW, there are three general reasons for | |
715 FFTW's speed. | |
716 | |
717 * FFTW uses a variety of FFT algorithms and implementation styles that | |
718 can be arbitrarily composed to adapt itself to a machine. See Q4.1 `How | |
719 does FFTW work?'. | |
720 * FFTW uses a code generator to produce highly-optimized routines for | |
721 computing small transforms. | |
722 * FFTW uses explicit divide-and-conquer to take advantage of the memory | |
723 hierarchy. | |
724 | |
725 For more details (albeit somewhat outdated), see the paper "FFTW: An | |
726 Adaptive Software Architecture for the FFT", by M. Frigo and S. G. | |
727 Johnson, *Proc. ICASSP* 3, 1381 (1998), available along with other | |
728 references at the FFTW web page. | |
729 | |
730 =============================================================================== | |
731 | |
732 Section 5. Known bugs | |
733 | |
734 Q5.1 FFTW 1.1 crashes in rfftwnd on Linux. | |
735 Q5.2 The MPI transforms in FFTW 1.2 give incorrect results/leak memory. | |
736 Q5.3 The test programs in FFTW 1.2.1 fail when I change FFTW to use sin | |
737 Q5.4 The test program in FFTW 1.2.1 fails for n > 46340. | |
738 Q5.5 The threaded code fails on Linux Redhat 5.0 | |
739 Q5.6 FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dime | |
740 Q5.7 FFTW 2.0's complex transforms give the wrong results with prime fa | |
741 Q5.8 FFTW 2.1.1's MPI test programs crash with MPICH. | |
742 Q5.9 FFTW 2.1.2's multi-threaded transforms don't work on AIX. | |
743 Q5.10 FFTW 2.1.2's complex transforms give incorrect results for large p | |
744 Q5.11 FFTW 2.1.3's multi-threaded transforms don't give any speedup on S | |
745 Q5.12 FFTW 2.1.3 crashes on AIX. | |
746 | |
747 ------------------------------------------------------------------------------- | |
748 | |
749 Question 5.1. FFTW 1.1 crashes in rfftwnd on Linux. | |
750 | |
751 This bug was fixed in FFTW 1.2. There was a bug in rfftwnd causing an | |
752 incorrect amount of memory to be allocated. The bug showed up in Linux | |
753 with libc-5.3.12 (and nowhere else that we know of). | |
754 | |
755 ------------------------------------------------------------------------------- | |
756 | |
757 Question 5.2. The MPI transforms in FFTW 1.2 give incorrect results/leak memory. | |
758 | |
759 These bugs were corrected in FFTW 1.2.1. The MPI transforms (really, just | |
760 the transpose routines) in FFTW 1.2 had bugs that could cause errors in | |
761 some situations. | |
762 | |
763 ------------------------------------------------------------------------------- | |
764 | |
765 Question 5.3. The test programs in FFTW 1.2.1 fail when I change FFTW to use single precision. | |
766 | |
767 This bug was fixed in FFTW 1.3. (Older versions of FFTW did work in | |
768 single precision, but the test programs didn't--the error tolerances in | |
769 the tests were set for double precision.) | |
770 | |
771 ------------------------------------------------------------------------------- | |
772 | |
773 Question 5.4. The test program in FFTW 1.2.1 fails for n > 46340. | |
774 | |
775 This bug was fixed in FFTW 1.3. FFTW 1.2.1 produced the right answer, but | |
776 the test program was wrong. For large n, n*n in the naive transform that | |
777 we used for comparison overflows 32 bit integer precision, breaking the | |
778 test. | |
779 | |
780 ------------------------------------------------------------------------------- | |
781 | |
782 Question 5.5. The threaded code fails on Linux Redhat 5.0 | |
783 | |
784 We had problems with glibc-2.0.5. The code should work with glibc-2.0.7. | |
785 | |
786 ------------------------------------------------------------------------------- | |
787 | |
788 Question 5.6. FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dimension >= 65536. | |
789 | |
790 This bug was fixed in FFTW 2.0.1. (There was a 32-bit integer overflow | |
791 due to a poorly-parenthesized expression.) | |
792 | |
793 ------------------------------------------------------------------------------- | |
794 | |
795 Question 5.7. FFTW 2.0's complex transforms give the wrong results with prime factors 17 to 97. | |
796 | |
797 There was a bug in the complex transforms that could cause incorrect | |
798 results under (hopefully rare) circumstances for lengths with | |
799 intermediate-size prime factors (17-97). This bug was fixed in FFTW | |
800 2.1.1. | |
801 | |
802 ------------------------------------------------------------------------------- | |
803 | |
804 Question 5.8. FFTW 2.1.1's MPI test programs crash with MPICH. | |
805 | |
806 This bug was fixed in FFTW 2.1.2. The 2.1/2.1.1 MPI test programs crashed | |
807 when using the MPICH implementation of MPI with the ch_p4 device (TCP/IP); | |
808 the transforms themselves worked fine. | |
809 | |
810 ------------------------------------------------------------------------------- | |
811 | |
812 Question 5.9. FFTW 2.1.2's multi-threaded transforms don't work on AIX. | |
813 | |
814 This bug was fixed in FFTW 2.1.3. The multi-threaded transforms in | |
815 previous versions didn't work with AIX's pthreads implementation, which | |
816 idiosyncratically creates threads in detached (non-joinable) mode by | |
817 default. | |
818 | |
819 ------------------------------------------------------------------------------- | |
820 | |
821 Question 5.10. FFTW 2.1.2's complex transforms give incorrect results for large prime sizes. | |
822 | |
823 This bug was fixed in FFTW 2.1.3. FFTW's complex-transform algorithm for | |
824 prime sizes (in versions 2.0 to 2.1.2) had an integer overflow problem | |
825 that caused incorrect results for many primes greater than 32768 (on | |
826 32-bit machines). (Sizes without large prime factors are not affected.) | |
827 | |
828 ------------------------------------------------------------------------------- | |
829 | |
830 Question 5.11. FFTW 2.1.3's multi-threaded transforms don't give any speedup on Solaris. | |
831 | |
832 This bug was fixed in FFTW 2.1.4. (By default, Solaris creates threads | |
833 that do not parallelize over multiple processors, so one has to request | |
834 the proper behavior specifically.) | |
835 | |
836 ------------------------------------------------------------------------------- | |
837 | |
838 Question 5.12. FFTW 2.1.3 crashes on AIX. | |
839 | |
840 The FFTW 2.1.3 configure script picked incorrect compiler flags for the | |
841 xlc compiler on newer IBM processors. This is fixed in FFTW 2.1.4. | |
842 |