comparison src/fftw-3.3.5/doc/FAQ/fftw-faq.bfnn @ 42:2cd0e3b3e1fd

Current fftw source
author Chris Cannam
date Tue, 18 Oct 2016 13:40:26 +0100
parents
children
comparison
equal deleted inserted replaced
41:481f5f8c5634 42:2cd0e3b3e1fd
1 \comment This is the source for the FFTW FAQ list, in
2 \comment the Bizarre Format With No Name. It is turned into Lout
3 \comment input, HTML, plain ASCII and an Info document by a Perl script.
4 \comment
5 \comment The format and scripts come from the Linux FAQ, by
6 \comment Ian Jackson.
7 \set brieftitle FFTW FAQ
8 \set author <A href="http://www.fftw.org">Matteo Frigo and Steven G. Johnson</A> / <A href="mailto:fftw@fftw.org">fftw@fftw.org</A>
9 \set authormail fftw@fftw.org
10 \set title FFTW Frequently Asked Questions with Answers
11 \set copyholder Matteo Frigo and Massachusetts Institute of Technology
12 \call-html startup html.refs2
13 \copyto ASCII
14 FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS
15 `date '+%d %h %Y'`
16 Matteo Frigo
17 Steven G. Johnson
18 <fftw@fftw.org>
19
20 \endcopy
21 \copyto INFO
22 INFO-DIR-SECTION Development
23 START-INFO-DIR-ENTRY
24 * FFTW FAQ: (fftw-faq). FFTW Frequently Asked Questions with Answers.
25 END-INFO-DIR-ENTRY
26
27 
28 File: $prefix.info, Node: Top, Next: Question 1.1, Up: (dir)
29
30 FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS
31 `date '+%d %h %Y'`
32 Matteo Frigo
33 Steven G. Johnson
34 <fftw@fftw.org>
35
36 \endcopy
37
38 This is the list of Frequently Asked Questions about FFTW, a
39 collection of fast C routines for computing the Discrete Fourier
40 Transform in one or more dimensions.
41
42 \section Index
43
44 \index
45
46 \comment ######################################################################
47
48 \section Introduction and General Information
49
50 \question 26aug:whatisfftw What is FFTW?
51
52 FFTW is a free collection of fast C routines for computing the
53 Discrete Fourier Transform in one or more dimensions. It includes
54 complex, real, symmetric, and parallel transforms, and can handle
55 arbitrary array sizes efficiently. FFTW is typically faster than
56 other publically-available FFT implementations, and is even
57 competitive with vendor-tuned libraries. (See our web page for
58 extensive benchmarks.) To achieve this performance, FFTW uses novel
59 code-generation and runtime self-optimization techniques (along with
60 many other tricks).
61
62 \question 26aug:whereisfftw How do I obtain FFTW?
63
64 FFTW can be found at \docref{the FFTW web page\}. You can also
65 retrieve it from \ftpon ftp.fftw.org in \ftpin /pub/fftw.
66
67 \question 26aug:isfftwfree Is FFTW free software?
68
69 Starting with version 1.3, FFTW is Free Software in the technical
70 sense defined by the Free Software Foundation (see \docref{Categories
71 of Free and Non-Free Software\}), and is distributed under the terms
72 of the GNU General Public License. Previous versions of FFTW were
73 distributed without fee for noncommercial use, but were not
74 technically ``free.''
75
76 Non-free licenses for FFTW are also available that permit different
77 terms of use than the GPL.
78
79 \question 10apr:nonfree What is this about non-free licenses?
80
81 The non-free licenses are for companies that wish to use FFTW in their
82 products but are unwilling to release their software under the GPL
83 (which would require them to release source code and allow free
84 redistribution). Such users can purchase an unlimited-use license
85 from MIT. Contact us for more details.
86
87 We could instead have released FFTW under the LGPL, or even disallowed
88 non-Free usage. Suffice it to say, however, that MIT owns the
89 copyright to FFTW and they only let us GPL it because we convinced
90 them that it would neither affect their licensing revenue nor irritate
91 existing licensees.
92
93 \question 24oct:west In the West? I thought MIT was in the East?
94
95 Not to an Italian. You could say that we're a Spaghetti Western
96 (with apologies to Sergio Leone).
97
98 \comment ######################################################################
99
100 \section Installing FFTW
101
102 \question 26aug:systems Which systems does FFTW run on?
103
104 FFTW is written in ANSI C, and should work on any system with a decent
105 C compiler. (See also \qref runOnWindows, \qref compilerCrashes.)
106 FFTW can also take advantage of certain hardware-specific features,
107 such as cycle counters and SIMD instructions, but this is optional.
108
109 \question 26aug:runOnWindows Does FFTW run on Windows?
110
111 Yes, many people have reported successfully using FFTW on Windows with
112 various compilers. FFTW was not developed on Windows, but the source
113 code is essentially straight ANSI C. See also the \docref{FFTW
114 Windows installation notes\}, \qref compilerCrashes, and \qref
115 vbetalia.
116
117 \question 26aug:compilerCrashes My compiler has trouble with FFTW.
118
119 Complain fiercely to the vendor of the compiler.
120
121 We have successfully used \courier{gcc\} 3.2.x on x86 and PPC, a
122 recent Compaq C compiler for Alpha, version 6 of IBM's \courier{xlc\}
123 compiler for AIX, Intel's \courier{icc\} versions 5-7, and Sun
124 WorkShop \courier{cc\} version 6.
125
126 FFTW is likely to push compilers to their limits, however, and several
127 compiler bugs have been exposed by FFTW. A partial list follows.
128
129 \courier{gcc\} 2.95.x for Solaris/SPARC produces incorrect code for
130 the test program (workaround: recompile the \courier{libbench2\}
131 directory with \courier{-O2\}).
132
133 NetBSD/macppc 1.6 comes with a \courier{gcc\} version that also
134 miscompiles the test program. (Please report a workaround if you know
135 one.)
136
137 \courier{gcc\} 3.2.3 for ARM reportedly crashes during compilation.
138 This bug is reportedly fixed in later versions of \courier{gcc\}.
139
140 Versions 8.0 and 8.1 of Intel's \courier{icc\} falsely claim to be
141 \courier{gcc\}, so you should specify \courier{CC="icc -no-gcc"\};
142 this is automatic in FFTW 3.1. \courier{icc-8.0.066\} reportely
143 produces incorrect code for FFTW 2.1.5, but is fixed in version 8.1.
144 \courier{icc-7.1\} compiler build 20030402Z appears to produce
145 incorrect dependencies, causing the compilation to fail.
146 \courier{icc-7.1\} build 20030307Z appears to work fine. (Use
147 \courier{icc -V\} to check which build you have.) As of 2003/04/18,
148 build 20030402Z appears not to be available any longer on Intel's
149 website, whereas the older build 20030307Z is available.
150
151 \courier{ranlib\} of GNU \courier{binutils\} 2.9.1 on Irix has been
152 observed to corrupt the FFTW libraries, causing a link failure when
153 FFTW is compiled. Since \courier{ranlib\} is completely superfluous
154 on Irix, we suggest deleting it from your system and replacing it with
155 a symbolic link to \courier{/bin/echo\}.
156
157 If support for SIMD instructions is enabled in FFTW, further compiler
158 problems may appear:
159
160 \courier{gcc\} 3.4.[0123] for x86 produces incorrect SSE2 code for
161 FFTW when \courier{-O2\} (the best choice for FFTW) is used, causing
162 FFTW to crash (\courier{make check\} crashes). This bug is fixed in
163 \courier{gcc\} 3.4.4. On x86_64 (amd64/em64t), \courier{gcc\} 3.4.4
164 reportedly still has a similar problem, but this is fixed as of
165 \courier{gcc\} 3.4.6.
166
167 \courier{gcc-3.2\} for x86 produces incorrect SIMD code if
168 \courier{-O3\} is used. The same compiler produces incorrect SIMD
169 code if no optimization is used, too. When using \courier{gcc-3.2\},
170 it is a good idea not to change the default \courier{CFLAGS\} selected
171 by the \courier{configure\} script.
172
173 Some 3.0.x and 3.1.x versions of \courier{gcc\} on \courier{x86\} may
174 crash. \courier{gcc\} so-called 2.96 shipping with RedHat 7.3 crashes
175 when compiling SIMD code. In both cases, please upgrade to
176 \courier{gcc-3.2\} or later.
177
178 Intel's \courier{icc\} 6.0 misaligns SSE constants, but FFTW has a
179 workaround. \courier{icc\} 8.x fails to compile FFTW 3.0.x because it
180 falsely claims to be \courier{gcc\}; we believe this to be a bug in
181 \courier{icc\}, but FFTW 3.1 has a workaround.
182
183 Visual C++ 2003 reportedly produces incorrect code for SSE/SSE2 when
184 compiling FFTW. This bug was reportedly fixed in VC++ 2005;
185 alternatively, you could switch to the Intel compiler. VC++ 6.0 also
186 reportedly produces incorrect code for the file
187 \courier{reodft11e-r2hc-odd.c\} unless optimizations are disabled for
188 that file.
189
190 \courier{gcc\} 2.95 on MacOS X miscompiles AltiVec code (fixed in
191 later versions). \courier{gcc\} 3.2.x miscompiles AltiVec
192 permutations, but FFTW has a workaround. \courier{gcc\} 4.0.1 on
193 MacOS for Intel crashes when compiling FFTW; a workaround is to
194 compile one file without optimization: \courier{cd kernel; make
195 CFLAGS=" " trig.lo\}.
196
197 \courier{gcc\} 4.1.1 reportedly crashes when compiling FFTW for MIPS;
198 the workaround is to compile the file it crashes on
199 (\courier{t2_64.c\}) with a lower optimization level.
200
201 \courier{gcc\} versions 4.1.2 to 4.2.0 for x86 reportedly miscompile
202 FFTW 3.1's test program, causing \courier{make check\} to crash
203 (\courier{gcc\} bug #26528). The bug was reportedly fixed in
204 \courier{gcc\} version 4.2.1 and later. A workaround is to compile
205 \courier{libbench2/verify-lib.c\} without optimization.
206
207 \question 26aug:solarisSucks FFTW does not compile on Solaris, complaining about \courier{const\}.
208
209 We know that at least on Solaris 2.5.x with Sun's compilers 4.2 you
210 might get error messages from \courier{make\} such as
211
212 \courier{"./fftw.h", line 88: warning: const is a keyword in ANSI C\}
213
214 This is the case when the \courier{configure\} script reports that
215 \courier{const\} does not work:
216
217 \courier{checking for working const... (cached) no\}
218
219 You should be aware that Solaris comes with two compilers, namely,
220 \courier{/opt/SUNWspro/SC4.2/bin/cc\} and \courier{/usr/ucb/cc\}. The
221 latter compiler is non-ANSI. Indeed, it is a perverse shell script
222 that calls the real compiler in non-ANSI mode. In order
223 to compile FFTW, change your path so that the right \courier{cc\}
224 is used.
225
226 To know whether your compiler is the right one, type
227 \courier{cc -V\}. If the compiler prints ``\courier{ucbcc\}'',
228 as in
229
230 \courier{ucbcc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2\}
231
232 then the compiler is wrong. The right message is something like
233
234 \courier{cc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2\}
235
236 \question 19mar:3dnow What's the difference between \courier{--enable-3dnow\} and \courier{--enable-k7\}?
237
238 \courier{--enable-k7\} enables 3DNow! instructions on K7 processors
239 (AMD Athlon and its variants). K7 support is provided by assembly
240 routines generated by a special purpose compiler.
241 As of fftw-3.2, --enable-k7 is no longer supported.
242
243 \courier{--enable-3dnow\} enables generic 3DNow! support using
244 \courier{gcc\} builtin functions. This works on earlier AMD
245 processors, but it is not as fast as our special assembly routines.
246 As of fftw-3.1, --enable-3dnow is no longer supported.
247
248 \question 18apr:fma What's the difference between the fma and the non-fma versions?
249
250 The fma version tries to exploit the fused multiply-add instructions
251 implemented in many processors such as PowerPC, ia-64, and MIPS. The
252 two FFTW packages are otherwise identical. In FFTW 3.1, the fma and
253 non-fma versions were merged together into a single package, and the
254 \courier{configure\} script attempts to automatically guess which
255 version to use.
256
257 The FFTW 3.1 \courier{configure\} script enables fma by default on
258 PowerPC, Itanium, and PA-RISC, and disables it otherwise. You can
259 force one or the other by using the \courier{--enable-fma\} or
260 \courier{--disable-fma\} flag for \courier{configure\}.
261
262 Definitely use fma if you have a PowerPC-based system with
263 \courier{gcc\} (or IBM \courier{xlc\}). This includes all GNU/Linux
264 systems for PowerPC and the older PowerPC-based MacOS systems. Also
265 use it on PA-RISC and Itanium with the HP/UX compiler.
266
267 Definitely do not use the fma version if you have an ia-32 processor
268 (Intel, AMD, MacOS on Intel, etcetera).
269
270 For other architectures/compilers, the situation is not so clear. For
271 example, ia-64 has the fma instruction, but \courier{gcc-3.2\} appears
272 not to exploit it correctly. Other compilers may do the right thing,
273 but we have not tried them. Please send us your feedback so that we
274 can update this FAQ entry.
275
276 \question 26aug:languages Which language is FFTW written in?
277
278 FFTW is written in ANSI C. Most of the code, however, was
279 automatically generated by a program called \courier{genfft\}, written
280 in the Objective Caml dialect of ML. You do not need to know ML or to
281 have an Objective Caml compiler in order to use FFTW.
282
283 \courier{genfft\} is provided with the FFTW sources, which means that
284 you can play with the code generator if you want. In this case, you
285 need a working Objective Caml system. Objective Caml is available
286 from \docref{the Caml web page\}.
287
288 \question 26aug:fortran Can I call FFTW from Fortran?
289
290 Yes, FFTW (versions 1.3 and higher) contains a Fortran-callable
291 interface, documented in the FFTW manual.
292
293 By default, FFTW configures its Fortran interface to work with the
294 first compiler it finds, e.g. \courier{g77\}. To configure for a
295 different, incompatible Fortran compiler \courier{foobar\}, use
296 \courier{./configure F77=foobar\} when installing FFTW. (In the case
297 of \courier{g77\}, however, FFTW 3.x also includes an extra set of
298 Fortran-callable routines with one less underscore at the end of
299 identifiers, which should cover most other Fortran compilers on Linux
300 at least.)
301
302 \question 26aug:cplusplus Can I call FFTW from C++?
303
304 Most definitely. FFTW should compile and/or link under any C++
305 compiler. Moreover, it is likely that the C++ \courier{<complex>\}
306 template class is bit-compatible with FFTW's complex-number format
307 (see the FFTW manual for more details).
308
309 \question 26aug:whynotfortran Why isn't FFTW written in Fortran/C++?
310
311 Because we don't like those languages, and neither approaches the
312 portability of C.
313
314 \question 29mar:singleprec How do I compile FFTW to run in single precision?
315
316 On a Unix system: \courier{configure --enable-float\}. On a non-Unix
317 system: edit \courier{config.h\} to \courier{#define\} the symbol
318 \courier{FFTW_SINGLE\} (for FFTW 3.x). In both cases, you must then
319 recompile FFTW. In FFTW 3, all FFTW identifiers will then begin with
320 \courier{fftwf_\} instead of \courier{fftw_\}.
321
322 \question 28mar:64bitk7 --enable-k7 does not work on x86-64
323
324 Support for --enable-k7 was discontinued in fftw-3.2.
325
326 The fftw-3.1 release supports --enable-k7. This option only works on
327 32-bit x86 machines that implement 3DNow!, including the AMD Athlon
328 and the AMD Opteron in 32-bit mode. --enable-k7 does not work on AMD
329 Opteron in 64-bit mode. Use --enable-sse for x86-64 machines.
330
331 FFTW supports 3DNow! by means of assembly code generated by a
332 special-purpose compiler. It is hard to produce assembly code that
333 works in both 32-bit and 64-bit mode.
334
335 \comment ######################################################################
336
337 \section Using FFTW
338
339 \question 15mar:fftw2to3 Why not support the FFTW 2 interface in FFTW 3?
340
341 FFTW 3 has semantics incompatible with earlier versions: its plans can
342 only be used for a given stride, multiplicity, and other
343 characteristics of the input and output arrays; these stronger
344 semantics are necessary for performance reasons. Thus, it is
345 impossible to efficiently emulate the older interface (whose plans can
346 be used for any transform of the same size). We believe that it
347 should be possible to upgrade most programs without any difficulty,
348 however.
349
350 \question 20mar:planperarray Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?
351
352 There are several reasons:
353
354 \call startlist
355 \call item
356 It was important for performance reasons that the plan be specific to
357 array characteristics like the stride (and alignment, for SIMD), and
358 requiring that the user maintain these invariants is error prone.
359 \call item
360 In most high-performance applications, as far as we can tell, you are
361 usually transforming the same array over and over, so FFTW's semantics
362 should not be a burden.
363 \call item
364 If you need to transform another array of the same size, creating a
365 new plan once the first exists is a cheap operation.
366 \call item
367 If you need to transform many arrays of the same size at once, you
368 should really use the \courier{plan_many\} routines in FFTW's "advanced"
369 interface.
370 \call item
371 If the abovementioned array characteristics are the same, you are
372 willing to pay close attention to the documentation, and you really
373 need to, we provide a "new-array execution" interface to apply a plan
374 to a new array.
375 \call endlist
376
377 \question 25may:slow FFTW seems really slow.
378
379 You are probably recreating the plan before every transform, rather
380 than creating it once and reusing it for all transforms of the same
381 size. FFTW is designed to be used in the following way:
382
383 \call startlist
384 \call item
385 First, you create a plan. This will take several seconds.
386 \call item
387 Then, you reuse the plan many times to perform FFTs. These are fast.
388 \call endlist
389
390 If you don't need to compute many transforms and the time for the
391 planner is significant, you have two options. First, you can use the
392 \courier{FFTW_ESTIMATE\} option in the planner, which uses heuristics
393 instead of runtime measurements and produces a good plan in a short
394 time. Second, you can use the wisdom feature to precompute the plan;
395 see \qref savePlans
396
397 \question 22oct:slows FFTW slows down after repeated calls.
398
399 Probably, NaNs or similar are creeping into your data, and the
400 slowdown is due to the resulting floating-point exceptions. For
401 example, be aware that repeatedly FFTing the same array is a diverging
402 process (because FFTW computes the unnormalized transform).
403
404 \question 22oct:segfault An FFTW routine is crashing when I call it.
405
406 Did the FFTW test programs pass (\courier{make check\}, or \courier{cd
407 tests; make bigcheck\} if you want to be paranoid)? If so, you almost
408 certainly have a bug in your own code. For example, you could be
409 passing invalid arguments (such as wrongly-sized arrays) to FFTW, or
410 you could simply have memory corruption elsewhere in your program that
411 causes random crashes later on. Please don't complain to us unless
412 you can come up with a minimal self-contained program (preferably
413 under 30 lines) that illustrates the problem.
414
415 \question 22oct:fortran64 My Fortran program crashes when calling FFTW.
416
417 As described in the manual, on 64-bit machines you must store the
418 plans in variables large enough to hold a pointer, for example
419 \courier{integer*8\}. We recommend using \courier{integer*8\} on
420 32-bit machines as well, to simplify porting.
421
422 \question 24mar:conventions FFTW gives results different from my old FFT.
423
424 People follow many different conventions for the DFT, and you should
425 be sure to know the ones that we use (described in the FFTW manual).
426 In particular, you should be aware that the
427 \courier{FFTW_FORWARD\}/\courier{FFTW_BACKWARD\} directions correspond
428 to signs of -1/+1 in the exponent of the DFT definition.
429 (\italic{Numerical Recipes\} uses the opposite convention.)
430
431 You should also know that we compute an unnormalized transform. In
432 contrast, Matlab is an example of program that computes a normalized
433 transform. See \qref whyscaled.
434
435 Finally, note that floating-point arithmetic is not exact, so
436 different FFT algorithms will give slightly different results (on the
437 order of the numerical accuracy; typically a fractional difference of
438 1e-15 or so in double precision).
439
440 \question 31aug:nondeterministic FFTW gives different results between runs
441
442 If you use \courier{FFTW_MEASURE\} or \courier{FFTW_PATIENT\} mode,
443 then the algorithm FFTW employs is not deterministic: it depends on
444 runtime performance measurements. This will cause the results to vary
445 slightly from run to run. However, the differences should be slight,
446 on the order of the floating-point precision, and therefore should
447 have no practical impact on most applications.
448
449 If you use saved plans (wisdom) or \courier{FFTW_ESTIMATE\} mode,
450 however, then the algorithm is deterministic and the results should be
451 identical between runs.
452
453 \question 26aug:savePlans Can I save FFTW's plans?
454
455 Yes. Starting with version 1.2, FFTW provides the \courier{wisdom\}
456 mechanism for saving plans; see the FFTW manual.
457
458 \question 14sep:whyscaled Why does your inverse transform return a scaled result?
459
460 Computing the forward transform followed by the backward transform (or
461 vice versa) yields the original array scaled by the size of the array.
462 (For multi-dimensional transforms, the size of the array is the
463 product of the dimensions.) We could, instead, have chosen a
464 normalization that would have returned the unscaled array. Or, to
465 accomodate the many conventions in this matter, the transform routines
466 could have accepted a "scale factor" parameter. We did not do this,
467 however, for two reasons. First, we didn't want to sacrifice
468 performance in the common case where the scale factor is 1. Second, in
469 real applications the FFT is followed or preceded by some computation
470 on the data, into which the scale factor can typically be absorbed at
471 little or no cost.
472
473 \question 02dec:centerorigin How can I make FFTW put the origin (zero frequency) at the center of its output?
474
475 For human viewing of a spectrum, it is often convenient to put the
476 origin in frequency space at the center of the output array, rather
477 than in the zero-th element (the default in FFTW). If all of the
478 dimensions of your array are even, you can accomplish this by simply
479 multiplying each element of the input array by (-1)^(i + j + ...),
480 where i, j, etcetera are the indices of the element. (This trick is a
481 general property of the DFT, and is not specific to FFTW.)
482
483 \question 08may:imageaudio How do I FFT an image/audio file in \italic{foobar\} format?
484
485 FFTW performs an FFT on an array of floating-point values. You can
486 certainly use it to compute the transform of an image or audio stream,
487 but you are responsible for figuring out your data format and
488 converting it to the form FFTW requires.
489
490 \question 09apr:linkfails My program does not link (on Unix).
491
492 The libraries must be listed in the correct order (\courier{-lfftw3
493 -lm\} for FFTW 3.x) and \italic{after\} your program sources/objects.
494 (The general rule is that if \italic{A\} uses \italic{B\}, then
495 \italic{A\} must be listed before \italic{B\} in the link command.).
496
497 \question 15mar:linkheader I included your header, but linking still fails.
498
499 You're a C++ programmer, aren't you? You have to compile the FFTW
500 library and link it into your program, not just \courier{#include
501 <fftw3.h>\}. (Yes, this is really a FAQ.)
502
503 \question 22oct:nostack My program crashes, complaining about stack space.
504
505 You cannot declare large arrays with automatic storage (e.g. via
506 \courier{fftw_complex array[N]\}); you should use
507 \courier{fftw_malloc\} (or equivalent) to allocate the arrays you want
508 to transform if they are larger than a few hundred elements.
509
510 \question 13may:leaks FFTW seems to have a memory leak.
511
512 After you create a plan, FFTW caches the information required to
513 quickly recreate the plan. (See \qref savePlans) It also maintains a
514 small amount of other persistent memory. You can deallocate all of
515 FFTW's internally allocated memory, if you wish, by calling
516 \courier{fftw_cleanup()\}, as documented in the manual.
517
518 \question 16may:allzero The output of FFTW's transform is all zeros.
519
520 You should initialize your input array \italic{after\} creating the
521 plan, unless you use \courier{FFTW_ESTIMATE\}: planning with
522 \courier{FFTW_MEASURE\} or \courier{FFTW_PATIENT\} overwrites the
523 input/output arrays, as described in the manual.
524
525 \question 05sep:vbetalia How do I call FFTW from the Microsoft language du jour?
526
527 Please \italic{do not\} ask us Windows-specific questions. We do not
528 use Windows. We know nothing about Visual Basic, Visual C++, or .NET.
529 Please find the appropriate Usenet discussion group and ask your
530 question there. See also \qref runOnWindows.
531
532 \question 15oct:pruned Can I compute only a subset of the DFT outputs?
533
534 In general, no, an FFT intrinsically computes all outputs from all
535 inputs. In principle, there is something called a \italic{pruned
536 FFT\} that can do what you want, but to compute K outputs out of N the
537 complexity is in general O(N log K) instead of O(N log N), thus saving
538 only a small additive factor in the log. (The same argument holds if
539 you instead have only K nonzero inputs.)
540
541 There are some specific cases in which you can get the O(N log K)
542 performance benefits easily, however, by combining a few ordinary
543 FFTs. In particular, the case where you want the first K outputs,
544 where K divides N, can be handled by performing N/K transforms of size
545 K and then summing the outputs multiplied by appropriate phase
546 factors. For more details, see \docref{pruned FFTs with FFTW\}.
547
548 There are also some algorithms that compute pruned transforms
549 \italic{approximately\}, but they are beyond the scope of this FAQ.
550
551 \question 21jan:transpose Can I use FFTW's routines for in-place and out-of-place matrix transposition?
552
553 You can use the FFTW guru interface to create a rank-0 transform of
554 vector rank 2 where the vector strides are transposed. (A rank-0
555 transform is equivalent to a 1D transform of size 1, which. just
556 copies the input into the output.) Specifying the same location for
557 the input and output makes the transpose in-place.
558
559 For double-valued data stored in row-major format, plan creation looks like
560 this:
561
562 \verbatim
563 fftw_plan plan_transpose(int rows, int cols, double *in, double *out)
564 {
565 const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */
566 fftw_iodim howmany_dims[2];
567
568 howmany_dims[0].n = rows;
569 howmany_dims[0].is = cols;
570 howmany_dims[0].os = 1;
571
572 howmany_dims[1].n = cols;
573 howmany_dims[1].is = 1;
574 howmany_dims[1].os = rows;
575
576 return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL,
577 /*howmany_rank=*/ 2, howmany_dims,
578 in, out, /*kind=*/ NULL, flags);
579 }
580 \endverbatim
581
582 (This entry was written by Rhys Ulerich.)
583
584 \comment ######################################################################
585
586 \section Internals of FFTW
587
588 \question 26aug:howworks How does FFTW work?
589
590 The innovation (if it can be so called) in FFTW consists in having a
591 variety of composable \italic{solvers\}, representing different FFT
592 algorithms and implementation strategies, whose combination into a
593 particular \italic{plan\} for a given size can be determined at
594 runtime according to the characteristics of your machine/compiler.
595 This peculiar software architecture allows FFTW to adapt itself to
596 almost any machine.
597
598 For more details (albeit somewhat outdated), see the paper "FFTW: An
599 Adaptive Software Architecture for the FFT", by M. Frigo and
600 S. G. Johnson, \italic{Proc. ICASSP\} 3, 1381 (1998), also
601 available at \docref{the FFTW web page\}.
602
603 \question 26aug:whyfast Why is FFTW so fast?
604
605 This is a complex question, and there is no simple answer. In fact,
606 the authors do not fully know the answer, either. In addition to many
607 small performance hacks throughout FFTW, there are three general
608 reasons for FFTW's speed.
609
610 \call startlist
611 \call item
612 FFTW uses a variety of FFT algorithms and implementation styles
613 that can be arbitrarily composed to adapt itself to
614 a machine. See \qref howworks.
615 \call item
616 FFTW uses a code generator to produce highly-optimized
617 routines for computing small transforms.
618 \call item
619 FFTW uses explicit divide-and-conquer to take advantage
620 of the memory hierarchy.
621 \call endlist
622
623 For more details (albeit somewhat outdated), see the paper "FFTW: An
624 Adaptive Software Architecture for the FFT", by M. Frigo and
625 S. G. Johnson, \italic{Proc. ICASSP\} 3, 1381 (1998),
626 available along with other references at \docref{the FFTW web page\}.
627
628 \comment ######################################################################
629
630 \section Known bugs
631
632 \question 27aug:rfftwndbug FFTW 1.1 crashes in rfftwnd on Linux.
633
634 This bug was fixed in FFTW 1.2. There was a bug in \courier{rfftwnd\}
635 causing an incorrect amount of memory to be allocated. The bug showed
636 up in Linux with libc-5.3.12 (and nowhere else that we know of).
637
638 \question 15oct:fftwmpibug The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
639
640 These bugs were corrected in FFTW 1.2.1. The MPI transforms (really,
641 just the transpose routines) in FFTW 1.2 had bugs that could cause
642 errors in some situations.
643
644 \question 05nov:testsingbug The test programs in FFTW 1.2.1 fail when I change FFTW to use single precision.
645
646 This bug was fixed in FFTW 1.3. (Older versions of FFTW did
647 work in single precision, but the test programs didn't--the error
648 tolerances in the tests were set for double precision.)
649
650 \question 24mar:teststoobig The test program in FFTW 1.2.1 fails for n > 46340.
651
652 This bug was fixed in FFTW 1.3. FFTW 1.2.1 produced the right answer,
653 but the test program was wrong. For large n, n*n in the naive
654 transform that we used for comparison overflows 32 bit integer
655 precision, breaking the test.
656
657 \question 24aug:linuxthreads The threaded code fails on Linux Redhat 5.0
658
659 We had problems with glibc-2.0.5. The code should work with
660 glibc-2.0.7.
661
662 \question 26sep:bigrfftwnd FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dimension >= 65536.
663
664 This bug was fixed in FFTW 2.0.1. (There was a 32-bit integer overflow due
665 to a poorly-parenthesized expression.)
666
667 \question 26mar:primebug FFTW 2.0's complex transforms give the wrong results with prime factors 17 to 97.
668
669 There was a bug in the complex transforms that could cause incorrect
670 results under (hopefully rare) circumstances for lengths with
671 intermediate-size prime factors (17-97). This bug was fixed in FFTW
672 2.1.1.
673
674 \question 05apr:mpichbug FFTW 2.1.1's MPI test programs crash with MPICH.
675
676 This bug was fixed in FFTW 2.1.2. The 2.1/2.1.1 MPI test programs crashed
677 when using the MPICH implementation of MPI with the \courier{ch_p4\}
678 device (TCP/IP); the transforms themselves worked fine.
679
680 \question 25may:aixthreadbug FFTW 2.1.2's multi-threaded transforms don't work on AIX.
681
682 This bug was fixed in FFTW 2.1.3. The multi-threaded transforms in
683 previous versions didn't work with AIX's \courier{pthreads\}
684 implementation, which idiosyncratically creates threads in detached
685 (non-joinable) mode by default.
686
687 \question 27sep:bigprimebug FFTW 2.1.2's complex transforms give incorrect results for large prime sizes.
688
689 This bug was fixed in FFTW 2.1.3. FFTW's complex-transform algorithm
690 for prime sizes (in versions 2.0 to 2.1.2) had an integer overflow
691 problem that caused incorrect results for many primes greater than
692 32768 (on 32-bit machines). (Sizes without large prime factors are
693 not affected.)
694
695 \question 25may:solaristhreadbug FFTW 2.1.3's multi-threaded transforms don't give any speedup on Solaris.
696
697 This bug was fixed in FFTW 2.1.4. (By default, Solaris creates
698 threads that do not parallelize over multiple processors, so one has
699 to request the proper behavior specifically.)
700
701 \question 03may:aixflags FFTW 2.1.3 crashes on AIX.
702
703 The FFTW 2.1.3 \courier{configure\} script picked incorrect compiler
704 flags for the \courier{xlc\} compiler on newer IBM processors. This
705 is fixed in FFTW 2.1.4.
706
707 \comment Here it ends!
708