comparison src/fftw-3.3.3/doc/FAQ/fftw-faq.bfnn @ 10:37bf6b4a2645

Add FFTW3
author Chris Cannam
date Wed, 20 Mar 2013 15:35:50 +0000
parents
children
comparison
equal deleted inserted replaced
9:c0fb53affa76 10:37bf6b4a2645
1 \comment This is the source for the FFTW FAQ list, in
2 \comment the Bizarre Format With No Name. It is turned into Lout
3 \comment input, HTML, plain ASCII and an Info document by a Perl script.
4 \comment
5 \comment The format and scripts come from the Linux FAQ, by
6 \comment Ian Jackson.
7 \set brieftitle FFTW FAQ
8 \set author <A href="http://www.fftw.org">Matteo Frigo and Steven G. Johnson</A> / <A href="mailto:fftw@fftw.org">fftw@fftw.org</A>
9 \set authormail fftw@fftw.org
10 \set title FFTW Frequently Asked Questions with Answers
11 \set copyholder Matteo Frigo and Massachusetts Institute of Technology
12 \call-html startup html.refs2
13 \copyto ASCII
14 FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS
15 `date '+%d %h %Y'`
16 Matteo Frigo
17 Steven G. Johnson
18 <fftw@fftw.org>
19
20 \endcopy
21 \copyto INFO
22 START-INFO-DIR-ENTRY
23 * FFTW FAQ: (fftw-faq). FFTW Frequently Asked Questions with Answers.
24 END-INFO-DIR-ENTRY
25
26 
27 File: $prefix.info, Node: Top, Next: Question 1.1, Up: (dir)
28
29 FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS
30 `date '+%d %h %Y'`
31 Matteo Frigo
32 Steven G. Johnson
33 <fftw@fftw.org>
34
35 \endcopy
36
37 This is the list of Frequently Asked Questions about FFTW, a
38 collection of fast C routines for computing the Discrete Fourier
39 Transform in one or more dimensions.
40
41 \section Index
42
43 \index
44
45 \comment ######################################################################
46
47 \section Introduction and General Information
48
49 \question 26aug:whatisfftw What is FFTW?
50
51 FFTW is a free collection of fast C routines for computing the
52 Discrete Fourier Transform in one or more dimensions. It includes
53 complex, real, symmetric, and parallel transforms, and can handle
54 arbitrary array sizes efficiently. FFTW is typically faster than
55 other publically-available FFT implementations, and is even
56 competitive with vendor-tuned libraries. (See our web page for
57 extensive benchmarks.) To achieve this performance, FFTW uses novel
58 code-generation and runtime self-optimization techniques (along with
59 many other tricks).
60
61 \question 26aug:whereisfftw How do I obtain FFTW?
62
63 FFTW can be found at \docref{the FFTW web page\}. You can also
64 retrieve it from \ftpon ftp.fftw.org in \ftpin /pub/fftw.
65
66 \question 26aug:isfftwfree Is FFTW free software?
67
68 Starting with version 1.3, FFTW is Free Software in the technical
69 sense defined by the Free Software Foundation (see \docref{Categories
70 of Free and Non-Free Software\}), and is distributed under the terms
71 of the GNU General Public License. Previous versions of FFTW were
72 distributed without fee for noncommercial use, but were not
73 technically ``free.''
74
75 Non-free licenses for FFTW are also available that permit different
76 terms of use than the GPL.
77
78 \question 10apr:nonfree What is this about non-free licenses?
79
80 The non-free licenses are for companies that wish to use FFTW in their
81 products but are unwilling to release their software under the GPL
82 (which would require them to release source code and allow free
83 redistribution). Such users can purchase an unlimited-use license
84 from MIT. Contact us for more details.
85
86 We could instead have released FFTW under the LGPL, or even disallowed
87 non-Free usage. Suffice it to say, however, that MIT owns the
88 copyright to FFTW and they only let us GPL it because we convinced
89 them that it would neither affect their licensing revenue nor irritate
90 existing licensees.
91
92 \question 24oct:west In the West? I thought MIT was in the East?
93
94 Not to an Italian. You could say that we're a Spaghetti Western
95 (with apologies to Sergio Leone).
96
97 \comment ######################################################################
98
99 \section Installing FFTW
100
101 \question 26aug:systems Which systems does FFTW run on?
102
103 FFTW is written in ANSI C, and should work on any system with a decent
104 C compiler. (See also \qref runOnWindows, \qref compilerCrashes.)
105 FFTW can also take advantage of certain hardware-specific features,
106 such as cycle counters and SIMD instructions, but this is optional.
107
108 \question 26aug:runOnWindows Does FFTW run on Windows?
109
110 Yes, many people have reported successfully using FFTW on Windows with
111 various compilers. FFTW was not developed on Windows, but the source
112 code is essentially straight ANSI C. See also the \docref{FFTW
113 Windows installation notes\}, \qref compilerCrashes, and \qref
114 vbetalia.
115
116 \question 26aug:compilerCrashes My compiler has trouble with FFTW.
117
118 Complain fiercely to the vendor of the compiler.
119
120 We have successfully used \courier{gcc\} 3.2.x on x86 and PPC, a
121 recent Compaq C compiler for Alpha, version 6 of IBM's \courier{xlc\}
122 compiler for AIX, Intel's \courier{icc\} versions 5-7, and Sun
123 WorkShop \courier{cc\} version 6.
124
125 FFTW is likely to push compilers to their limits, however, and several
126 compiler bugs have been exposed by FFTW. A partial list follows.
127
128 \courier{gcc\} 2.95.x for Solaris/SPARC produces incorrect code for
129 the test program (workaround: recompile the \courier{libbench2\}
130 directory with \courier{-O2\}).
131
132 NetBSD/macppc 1.6 comes with a \courier{gcc\} version that also
133 miscompiles the test program. (Please report a workaround if you know
134 one.)
135
136 \courier{gcc\} 3.2.3 for ARM reportedly crashes during compilation.
137 This bug is reportedly fixed in later versions of \courier{gcc\}.
138
139 Versions 8.0 and 8.1 of Intel's \courier{icc\} falsely claim to be
140 \courier{gcc\}, so you should specify \courier{CC="icc -no-gcc"\};
141 this is automatic in FFTW 3.1. \courier{icc-8.0.066\} reportely
142 produces incorrect code for FFTW 2.1.5, but is fixed in version 8.1.
143 \courier{icc-7.1\} compiler build 20030402Z appears to produce
144 incorrect dependencies, causing the compilation to fail.
145 \courier{icc-7.1\} build 20030307Z appears to work fine. (Use
146 \courier{icc -V\} to check which build you have.) As of 2003/04/18,
147 build 20030402Z appears not to be available any longer on Intel's
148 website, whereas the older build 20030307Z is available.
149
150 \courier{ranlib\} of GNU \courier{binutils\} 2.9.1 on Irix has been
151 observed to corrupt the FFTW libraries, causing a link failure when
152 FFTW is compiled. Since \courier{ranlib\} is completely superfluous
153 on Irix, we suggest deleting it from your system and replacing it with
154 a symbolic link to \courier{/bin/echo\}.
155
156 If support for SIMD instructions is enabled in FFTW, further compiler
157 problems may appear:
158
159 \courier{gcc\} 3.4.[0123] for x86 produces incorrect SSE2 code for
160 FFTW when \courier{-O2\} (the best choice for FFTW) is used, causing
161 FFTW to crash (\courier{make check\} crashes). This bug is fixed in
162 \courier{gcc\} 3.4.4. On x86_64 (amd64/em64t), \courier{gcc\} 3.4.4
163 reportedly still has a similar problem, but this is fixed as of
164 \courier{gcc\} 3.4.6.
165
166 \courier{gcc-3.2\} for x86 produces incorrect SIMD code if
167 \courier{-O3\} is used. The same compiler produces incorrect SIMD
168 code if no optimization is used, too. When using \courier{gcc-3.2\},
169 it is a good idea not to change the default \courier{CFLAGS\} selected
170 by the \courier{configure\} script.
171
172 Some 3.0.x and 3.1.x versions of \courier{gcc\} on \courier{x86\} may
173 crash. \courier{gcc\} so-called 2.96 shipping with RedHat 7.3 crashes
174 when compiling SIMD code. In both cases, please upgrade to
175 \courier{gcc-3.2\} or later.
176
177 Intel's \courier{icc\} 6.0 misaligns SSE constants, but FFTW has a
178 workaround. \courier{icc\} 8.x fails to compile FFTW 3.0.x because it
179 falsely claims to be \courier{gcc\}; we believe this to be a bug in
180 \courier{icc\}, but FFTW 3.1 has a workaround.
181
182 Visual C++ 2003 reportedly produces incorrect code for SSE/SSE2 when
183 compiling FFTW. This bug was reportedly fixed in VC++ 2005;
184 alternatively, you could switch to the Intel compiler. VC++ 6.0 also
185 reportedly produces incorrect code for the file
186 \courier{reodft11e-r2hc-odd.c\} unless optimizations are disabled for
187 that file.
188
189 \courier{gcc\} 2.95 on MacOS X miscompiles AltiVec code (fixed in
190 later versions). \courier{gcc\} 3.2.x miscompiles AltiVec
191 permutations, but FFTW has a workaround. \courier{gcc\} 4.0.1 on
192 MacOS for Intel crashes when compiling FFTW; a workaround is to
193 compile one file without optimization: \courier{cd kernel; make
194 CFLAGS=" " trig.lo\}.
195
196 \courier{gcc\} 4.1.1 reportedly crashes when compiling FFTW for MIPS;
197 the workaround is to compile the file it crashes on
198 (\courier{t2_64.c\}) with a lower optimization level.
199
200 \courier{gcc\} versions 4.1.2 to 4.2.0 for x86 reportedly miscompile
201 FFTW 3.1's test program, causing \courier{make check\} to crash
202 (\courier{gcc\} bug #26528). The bug was reportedly fixed in
203 \courier{gcc\} version 4.2.1 and later. A workaround is to compile
204 \courier{libbench2/verify-lib.c\} without optimization.
205
206 \question 26aug:solarisSucks FFTW does not compile on Solaris, complaining about \courier{const\}.
207
208 We know that at least on Solaris 2.5.x with Sun's compilers 4.2 you
209 might get error messages from \courier{make\} such as
210
211 \courier{"./fftw.h", line 88: warning: const is a keyword in ANSI C\}
212
213 This is the case when the \courier{configure\} script reports that
214 \courier{const\} does not work:
215
216 \courier{checking for working const... (cached) no\}
217
218 You should be aware that Solaris comes with two compilers, namely,
219 \courier{/opt/SUNWspro/SC4.2/bin/cc\} and \courier{/usr/ucb/cc\}. The
220 latter compiler is non-ANSI. Indeed, it is a perverse shell script
221 that calls the real compiler in non-ANSI mode. In order
222 to compile FFTW, change your path so that the right \courier{cc\}
223 is used.
224
225 To know whether your compiler is the right one, type
226 \courier{cc -V\}. If the compiler prints ``\courier{ucbcc\}'',
227 as in
228
229 \courier{ucbcc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2\}
230
231 then the compiler is wrong. The right message is something like
232
233 \courier{cc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2\}
234
235 \question 19mar:3dnow What's the difference between \courier{--enable-3dnow\} and \courier{--enable-k7\}?
236
237 \courier{--enable-k7\} enables 3DNow! instructions on K7 processors
238 (AMD Athlon and its variants). K7 support is provided by assembly
239 routines generated by a special purpose compiler.
240 As of fftw-3.2, --enable-k7 is no longer supported.
241
242 \courier{--enable-3dnow\} enables generic 3DNow! support using
243 \courier{gcc\} builtin functions. This works on earlier AMD
244 processors, but it is not as fast as our special assembly routines.
245 As of fftw-3.1, --enable-3dnow is no longer supported.
246
247 \question 18apr:fma What's the difference between the fma and the non-fma versions?
248
249 The fma version tries to exploit the fused multiply-add instructions
250 implemented in many processors such as PowerPC, ia-64, and MIPS. The
251 two FFTW packages are otherwise identical. In FFTW 3.1, the fma and
252 non-fma versions were merged together into a single package, and the
253 \courier{configure\} script attempts to automatically guess which
254 version to use.
255
256 The FFTW 3.1 \courier{configure\} script enables fma by default on
257 PowerPC, Itanium, and PA-RISC, and disables it otherwise. You can
258 force one or the other by using the \courier{--enable-fma\} or
259 \courier{--disable-fma\} flag for \courier{configure\}.
260
261 Definitely use fma if you have a PowerPC-based system with
262 \courier{gcc\} (or IBM \courier{xlc\}). This includes all GNU/Linux
263 systems for PowerPC and the older PowerPC-based MacOS systems. Also
264 use it on PA-RISC and Itanium with the HP/UX compiler.
265
266 Definitely do not use the fma version if you have an ia-32 processor
267 (Intel, AMD, MacOS on Intel, etcetera).
268
269 For other architectures/compilers, the situation is not so clear. For
270 example, ia-64 has the fma instruction, but \courier{gcc-3.2\} appears
271 not to exploit it correctly. Other compilers may do the right thing,
272 but we have not tried them. Please send us your feedback so that we
273 can update this FAQ entry.
274
275 \question 26aug:languages Which language is FFTW written in?
276
277 FFTW is written in ANSI C. Most of the code, however, was
278 automatically generated by a program called \courier{genfft\}, written
279 in the Objective Caml dialect of ML. You do not need to know ML or to
280 have an Objective Caml compiler in order to use FFTW.
281
282 \courier{genfft\} is provided with the FFTW sources, which means that
283 you can play with the code generator if you want. In this case, you
284 need a working Objective Caml system. Objective Caml is available
285 from \docref{the Caml web page\}.
286
287 \question 26aug:fortran Can I call FFTW from Fortran?
288
289 Yes, FFTW (versions 1.3 and higher) contains a Fortran-callable
290 interface, documented in the FFTW manual.
291
292 By default, FFTW configures its Fortran interface to work with the
293 first compiler it finds, e.g. \courier{g77\}. To configure for a
294 different, incompatible Fortran compiler \courier{foobar\}, use
295 \courier{./configure F77=foobar\} when installing FFTW. (In the case
296 of \courier{g77\}, however, FFTW 3.x also includes an extra set of
297 Fortran-callable routines with one less underscore at the end of
298 identifiers, which should cover most other Fortran compilers on Linux
299 at least.)
300
301 \question 26aug:cplusplus Can I call FFTW from C++?
302
303 Most definitely. FFTW should compile and/or link under any C++
304 compiler. Moreover, it is likely that the C++ \courier{<complex>\}
305 template class is bit-compatible with FFTW's complex-number format
306 (see the FFTW manual for more details).
307
308 \question 26aug:whynotfortran Why isn't FFTW written in Fortran/C++?
309
310 Because we don't like those languages, and neither approaches the
311 portability of C.
312
313 \question 29mar:singleprec How do I compile FFTW to run in single precision?
314
315 On a Unix system: \courier{configure --enable-float\}. On a non-Unix
316 system: edit \courier{config.h\} to \courier{#define\} the symbol
317 \courier{FFTW_SINGLE\} (for FFTW 3.x). In both cases, you must then
318 recompile FFTW. In FFTW 3, all FFTW identifiers will then begin with
319 \courier{fftwf_\} instead of \courier{fftw_\}.
320
321 \question 28mar:64bitk7 --enable-k7 does not work on x86-64
322
323 Support for --enable-k7 was discontinued in fftw-3.2.
324
325 The fftw-3.1 release supports --enable-k7. This option only works on
326 32-bit x86 machines that implement 3DNow!, including the AMD Athlon
327 and the AMD Opteron in 32-bit mode. --enable-k7 does not work on AMD
328 Opteron in 64-bit mode. Use --enable-sse for x86-64 machines.
329
330 FFTW supports 3DNow! by means of assembly code generated by a
331 special-purpose compiler. It is hard to produce assembly code that
332 works in both 32-bit and 64-bit mode.
333
334 \comment ######################################################################
335
336 \section Using FFTW
337
338 \question 15mar:fftw2to3 Why not support the FFTW 2 interface in FFTW 3?
339
340 FFTW 3 has semantics incompatible with earlier versions: its plans can
341 only be used for a given stride, multiplicity, and other
342 characteristics of the input and output arrays; these stronger
343 semantics are necessary for performance reasons. Thus, it is
344 impossible to efficiently emulate the older interface (whose plans can
345 be used for any transform of the same size). We believe that it
346 should be possible to upgrade most programs without any difficulty,
347 however.
348
349 \question 20mar:planperarray Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?
350
351 There are several reasons:
352
353 \call startlist
354 \call item
355 It was important for performance reasons that the plan be specific to
356 array characteristics like the stride (and alignment, for SIMD), and
357 requiring that the user maintain these invariants is error prone.
358 \call item
359 In most high-performance applications, as far as we can tell, you are
360 usually transforming the same array over and over, so FFTW's semantics
361 should not be a burden.
362 \call item
363 If you need to transform another array of the same size, creating a
364 new plan once the first exists is a cheap operation.
365 \call item
366 If you need to transform many arrays of the same size at once, you
367 should really use the \courier{plan_many\} routines in FFTW's "advanced"
368 interface.
369 \call item
370 If the abovementioned array characteristics are the same, you are
371 willing to pay close attention to the documentation, and you really
372 need to, we provide a "new-array execution" interface to apply a plan
373 to a new array.
374 \call endlist
375
376 \question 25may:slow FFTW seems really slow.
377
378 You are probably recreating the plan before every transform, rather
379 than creating it once and reusing it for all transforms of the same
380 size. FFTW is designed to be used in the following way:
381
382 \call startlist
383 \call item
384 First, you create a plan. This will take several seconds.
385 \call item
386 Then, you reuse the plan many times to perform FFTs. These are fast.
387 \call endlist
388
389 If you don't need to compute many transforms and the time for the
390 planner is significant, you have two options. First, you can use the
391 \courier{FFTW_ESTIMATE\} option in the planner, which uses heuristics
392 instead of runtime measurements and produces a good plan in a short
393 time. Second, you can use the wisdom feature to precompute the plan;
394 see \qref savePlans
395
396 \question 22oct:slows FFTW slows down after repeated calls.
397
398 Probably, NaNs or similar are creeping into your data, and the
399 slowdown is due to the resulting floating-point exceptions. For
400 example, be aware that repeatedly FFTing the same array is a diverging
401 process (because FFTW computes the unnormalized transform).
402
403 \question 22oct:segfault An FFTW routine is crashing when I call it.
404
405 Did the FFTW test programs pass (\courier{make check\}, or \courier{cd
406 tests; make bigcheck\} if you want to be paranoid)? If so, you almost
407 certainly have a bug in your own code. For example, you could be
408 passing invalid arguments (such as wrongly-sized arrays) to FFTW, or
409 you could simply have memory corruption elsewhere in your program that
410 causes random crashes later on. Please don't complain to us unless
411 you can come up with a minimal self-contained program (preferably
412 under 30 lines) that illustrates the problem.
413
414 \question 22oct:fortran64 My Fortran program crashes when calling FFTW.
415
416 As described in the manual, on 64-bit machines you must store the
417 plans in variables large enough to hold a pointer, for example
418 \courier{integer*8\}. We recommend using \courier{integer*8\} on
419 32-bit machines as well, to simplify porting.
420
421 \question 24mar:conventions FFTW gives results different from my old FFT.
422
423 People follow many different conventions for the DFT, and you should
424 be sure to know the ones that we use (described in the FFTW manual).
425 In particular, you should be aware that the
426 \courier{FFTW_FORWARD\}/\courier{FFTW_BACKWARD\} directions correspond
427 to signs of -1/+1 in the exponent of the DFT definition.
428 (\italic{Numerical Recipes\} uses the opposite convention.)
429
430 You should also know that we compute an unnormalized transform. In
431 contrast, Matlab is an example of program that computes a normalized
432 transform. See \qref whyscaled.
433
434 Finally, note that floating-point arithmetic is not exact, so
435 different FFT algorithms will give slightly different results (on the
436 order of the numerical accuracy; typically a fractional difference of
437 1e-15 or so in double precision).
438
439 \question 31aug:nondeterministic FFTW gives different results between runs
440
441 If you use \courier{FFTW_MEASURE\} or \courier{FFTW_PATIENT\} mode,
442 then the algorithm FFTW employs is not deterministic: it depends on
443 runtime performance measurements. This will cause the results to vary
444 slightly from run to run. However, the differences should be slight,
445 on the order of the floating-point precision, and therefore should
446 have no practical impact on most applications.
447
448 If you use saved plans (wisdom) or \courier{FFTW_ESTIMATE\} mode,
449 however, then the algorithm is deterministic and the results should be
450 identical between runs.
451
452 \question 26aug:savePlans Can I save FFTW's plans?
453
454 Yes. Starting with version 1.2, FFTW provides the \courier{wisdom\}
455 mechanism for saving plans; see the FFTW manual.
456
457 \question 14sep:whyscaled Why does your inverse transform return a scaled result?
458
459 Computing the forward transform followed by the backward transform (or
460 vice versa) yields the original array scaled by the size of the array.
461 (For multi-dimensional transforms, the size of the array is the
462 product of the dimensions.) We could, instead, have chosen a
463 normalization that would have returned the unscaled array. Or, to
464 accomodate the many conventions in this matter, the transform routines
465 could have accepted a "scale factor" parameter. We did not do this,
466 however, for two reasons. First, we didn't want to sacrifice
467 performance in the common case where the scale factor is 1. Second, in
468 real applications the FFT is followed or preceded by some computation
469 on the data, into which the scale factor can typically be absorbed at
470 little or no cost.
471
472 \question 02dec:centerorigin How can I make FFTW put the origin (zero frequency) at the center of its output?
473
474 For human viewing of a spectrum, it is often convenient to put the
475 origin in frequency space at the center of the output array, rather
476 than in the zero-th element (the default in FFTW). If all of the
477 dimensions of your array are even, you can accomplish this by simply
478 multiplying each element of the input array by (-1)^(i + j + ...),
479 where i, j, etcetera are the indices of the element. (This trick is a
480 general property of the DFT, and is not specific to FFTW.)
481
482 \question 08may:imageaudio How do I FFT an image/audio file in \italic{foobar\} format?
483
484 FFTW performs an FFT on an array of floating-point values. You can
485 certainly use it to compute the transform of an image or audio stream,
486 but you are responsible for figuring out your data format and
487 converting it to the form FFTW requires.
488
489 \question 09apr:linkfails My program does not link (on Unix).
490
491 The libraries must be listed in the correct order (\courier{-lfftw3
492 -lm\} for FFTW 3.x) and \italic{after\} your program sources/objects.
493 (The general rule is that if \italic{A\} uses \italic{B\}, then
494 \italic{A\} must be listed before \italic{B\} in the link command.).
495
496 \question 15mar:linkheader I included your header, but linking still fails.
497
498 You're a C++ programmer, aren't you? You have to compile the FFTW
499 library and link it into your program, not just \courier{#include
500 <fftw3.h>\}. (Yes, this is really a FAQ.)
501
502 \question 22oct:nostack My program crashes, complaining about stack space.
503
504 You cannot declare large arrays with automatic storage (e.g. via
505 \courier{fftw_complex array[N]\}); you should use
506 \courier{fftw_malloc\} (or equivalent) to allocate the arrays you want
507 to transform if they are larger than a few hundred elements.
508
509 \question 13may:leaks FFTW seems to have a memory leak.
510
511 After you create a plan, FFTW caches the information required to
512 quickly recreate the plan. (See \qref savePlans) It also maintains a
513 small amount of other persistent memory. You can deallocate all of
514 FFTW's internally allocated memory, if you wish, by calling
515 \courier{fftw_cleanup()\}, as documented in the manual.
516
517 \question 16may:allzero The output of FFTW's transform is all zeros.
518
519 You should initialize your input array \italic{after\} creating the
520 plan, unless you use \courier{FFTW_ESTIMATE\}: planning with
521 \courier{FFTW_MEASURE\} or \courier{FFTW_PATIENT\} overwrites the
522 input/output arrays, as described in the manual.
523
524 \question 05sep:vbetalia How do I call FFTW from the Microsoft language du jour?
525
526 Please \italic{do not\} ask us Windows-specific questions. We do not
527 use Windows. We know nothing about Visual Basic, Visual C++, or .NET.
528 Please find the appropriate Usenet discussion group and ask your
529 question there. See also \qref runOnWindows.
530
531 \question 15oct:pruned Can I compute only a subset of the DFT outputs?
532
533 In general, no, an FFT intrinsically computes all outputs from all
534 inputs. In principle, there is something called a \italic{pruned
535 FFT\} that can do what you want, but to compute K outputs out of N the
536 complexity is in general O(N log K) instead of O(N log N), thus saving
537 only a small additive factor in the log. (The same argument holds if
538 you instead have only K nonzero inputs.)
539
540 There are some specific cases in which you can get the O(N log K)
541 performance benefits easily, however, by combining a few ordinary
542 FFTs. In particular, the case where you want the first K outputs,
543 where K divides N, can be handled by performing N/K transforms of size
544 K and then summing the outputs multiplied by appropriate phase
545 factors. For more details, see \docref{pruned FFTs with FFTW\}.
546
547 There are also some algorithms that compute pruned transforms
548 \italic{approximately\}, but they are beyond the scope of this FAQ.
549
550 \question 21jan:transpose Can I use FFTW's routines for in-place and out-of-place matrix transposition?
551
552 You can use the FFTW guru interface to create a rank-0 transform of
553 vector rank 2 where the vector strides are transposed. (A rank-0
554 transform is equivalent to a 1D transform of size 1, which. just
555 copies the input into the output.) Specifying the same location for
556 the input and output makes the transpose in-place.
557
558 For double-valued data stored in row-major format, plan creation looks like
559 this:
560
561 \verbatim
562 fftw_plan plan_transpose(int rows, int cols, double *in, double *out)
563 {
564 const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */
565 fftw_iodim howmany_dims[2];
566
567 howmany_dims[0].n = rows;
568 howmany_dims[0].is = cols;
569 howmany_dims[0].os = 1;
570
571 howmany_dims[1].n = cols;
572 howmany_dims[1].is = 1;
573 howmany_dims[1].os = rows;
574
575 return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL,
576 /*howmany_rank=*/ 2, howmany_dims,
577 in, out, /*kind=*/ NULL, flags);
578 }
579 \endverbatim
580
581 (This entry was written by Rhys Ulerich.)
582
583 \comment ######################################################################
584
585 \section Internals of FFTW
586
587 \question 26aug:howworks How does FFTW work?
588
589 The innovation (if it can be so called) in FFTW consists in having a
590 variety of composable \italic{solvers\}, representing different FFT
591 algorithms and implementation strategies, whose combination into a
592 particular \italic{plan\} for a given size can be determined at
593 runtime according to the characteristics of your machine/compiler.
594 This peculiar software architecture allows FFTW to adapt itself to
595 almost any machine.
596
597 For more details (albeit somewhat outdated), see the paper "FFTW: An
598 Adaptive Software Architecture for the FFT", by M. Frigo and
599 S. G. Johnson, \italic{Proc. ICASSP\} 3, 1381 (1998), also
600 available at \docref{the FFTW web page\}.
601
602 \question 26aug:whyfast Why is FFTW so fast?
603
604 This is a complex question, and there is no simple answer. In fact,
605 the authors do not fully know the answer, either. In addition to many
606 small performance hacks throughout FFTW, there are three general
607 reasons for FFTW's speed.
608
609 \call startlist
610 \call item
611 FFTW uses a variety of FFT algorithms and implementation styles
612 that can be arbitrarily composed to adapt itself to
613 a machine. See \qref howworks.
614 \call item
615 FFTW uses a code generator to produce highly-optimized
616 routines for computing small transforms.
617 \call item
618 FFTW uses explicit divide-and-conquer to take advantage
619 of the memory hierarchy.
620 \call endlist
621
622 For more details (albeit somewhat outdated), see the paper "FFTW: An
623 Adaptive Software Architecture for the FFT", by M. Frigo and
624 S. G. Johnson, \italic{Proc. ICASSP\} 3, 1381 (1998),
625 available along with other references at \docref{the FFTW web page\}.
626
627 \comment ######################################################################
628
629 \section Known bugs
630
631 \question 27aug:rfftwndbug FFTW 1.1 crashes in rfftwnd on Linux.
632
633 This bug was fixed in FFTW 1.2. There was a bug in \courier{rfftwnd\}
634 causing an incorrect amount of memory to be allocated. The bug showed
635 up in Linux with libc-5.3.12 (and nowhere else that we know of).
636
637 \question 15oct:fftwmpibug The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
638
639 These bugs were corrected in FFTW 1.2.1. The MPI transforms (really,
640 just the transpose routines) in FFTW 1.2 had bugs that could cause
641 errors in some situations.
642
643 \question 05nov:testsingbug The test programs in FFTW 1.2.1 fail when I change FFTW to use single precision.
644
645 This bug was fixed in FFTW 1.3. (Older versions of FFTW did
646 work in single precision, but the test programs didn't--the error
647 tolerances in the tests were set for double precision.)
648
649 \question 24mar:teststoobig The test program in FFTW 1.2.1 fails for n > 46340.
650
651 This bug was fixed in FFTW 1.3. FFTW 1.2.1 produced the right answer,
652 but the test program was wrong. For large n, n*n in the naive
653 transform that we used for comparison overflows 32 bit integer
654 precision, breaking the test.
655
656 \question 24aug:linuxthreads The threaded code fails on Linux Redhat 5.0
657
658 We had problems with glibc-2.0.5. The code should work with
659 glibc-2.0.7.
660
661 \question 26sep:bigrfftwnd FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dimension >= 65536.
662
663 This bug was fixed in FFTW 2.0.1. (There was a 32-bit integer overflow due
664 to a poorly-parenthesized expression.)
665
666 \question 26mar:primebug FFTW 2.0's complex transforms give the wrong results with prime factors 17 to 97.
667
668 There was a bug in the complex transforms that could cause incorrect
669 results under (hopefully rare) circumstances for lengths with
670 intermediate-size prime factors (17-97). This bug was fixed in FFTW
671 2.1.1.
672
673 \question 05apr:mpichbug FFTW 2.1.1's MPI test programs crash with MPICH.
674
675 This bug was fixed in FFTW 2.1.2. The 2.1/2.1.1 MPI test programs crashed
676 when using the MPICH implementation of MPI with the \courier{ch_p4\}
677 device (TCP/IP); the transforms themselves worked fine.
678
679 \question 25may:aixthreadbug FFTW 2.1.2's multi-threaded transforms don't work on AIX.
680
681 This bug was fixed in FFTW 2.1.3. The multi-threaded transforms in
682 previous versions didn't work with AIX's \courier{pthreads\}
683 implementation, which idiosyncratically creates threads in detached
684 (non-joinable) mode by default.
685
686 \question 27sep:bigprimebug FFTW 2.1.2's complex transforms give incorrect results for large prime sizes.
687
688 This bug was fixed in FFTW 2.1.3. FFTW's complex-transform algorithm
689 for prime sizes (in versions 2.0 to 2.1.2) had an integer overflow
690 problem that caused incorrect results for many primes greater than
691 32768 (on 32-bit machines). (Sizes without large prime factors are
692 not affected.)
693
694 \question 25may:solaristhreadbug FFTW 2.1.3's multi-threaded transforms don't give any speedup on Solaris.
695
696 This bug was fixed in FFTW 2.1.4. (By default, Solaris creates
697 threads that do not parallelize over multiple processors, so one has
698 to request the proper behavior specifically.)
699
700 \question 03may:aixflags FFTW 2.1.3 crashes on AIX.
701
702 The FFTW 2.1.3 \courier{configure\} script picked incorrect compiler
703 flags for the \courier{xlc\} compiler on newer IBM processors. This
704 is fixed in FFTW 2.1.4.
705
706 \comment Here it ends!
707