annotate src/fftw-3.3.5/doc/FAQ/fftw-faq.ascii @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 2cd0e3b3e1fd
children
rev   line source
Chris@42 1 FFTW FREQUENTLY ASKED QUESTIONS WITH ANSWERS
Chris@42 2 30 Jul 2016
Chris@42 3 Matteo Frigo
Chris@42 4 Steven G. Johnson
Chris@42 5 <fftw@fftw.org>
Chris@42 6
Chris@42 7 This is the list of Frequently Asked Questions about FFTW, a collection of
Chris@42 8 fast C routines for computing the Discrete Fourier Transform in one or
Chris@42 9 more dimensions.
Chris@42 10
Chris@42 11 ===============================================================================
Chris@42 12
Chris@42 13 Index
Chris@42 14
Chris@42 15 Section 1. Introduction and General Information
Chris@42 16 Q1.1 What is FFTW?
Chris@42 17 Q1.2 How do I obtain FFTW?
Chris@42 18 Q1.3 Is FFTW free software?
Chris@42 19 Q1.4 What is this about non-free licenses?
Chris@42 20 Q1.5 In the West? I thought MIT was in the East?
Chris@42 21
Chris@42 22 Section 2. Installing FFTW
Chris@42 23 Q2.1 Which systems does FFTW run on?
Chris@42 24 Q2.2 Does FFTW run on Windows?
Chris@42 25 Q2.3 My compiler has trouble with FFTW.
Chris@42 26 Q2.4 FFTW does not compile on Solaris, complaining about const.
Chris@42 27 Q2.5 What's the difference between --enable-3dnow and --enable-k7?
Chris@42 28 Q2.6 What's the difference between the fma and the non-fma versions?
Chris@42 29 Q2.7 Which language is FFTW written in?
Chris@42 30 Q2.8 Can I call FFTW from Fortran?
Chris@42 31 Q2.9 Can I call FFTW from C++?
Chris@42 32 Q2.10 Why isn't FFTW written in Fortran/C++?
Chris@42 33 Q2.11 How do I compile FFTW to run in single precision?
Chris@42 34 Q2.12 --enable-k7 does not work on x86-64
Chris@42 35
Chris@42 36 Section 3. Using FFTW
Chris@42 37 Q3.1 Why not support the FFTW 2 interface in FFTW 3?
Chris@42 38 Q3.2 Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?
Chris@42 39 Q3.3 FFTW seems really slow.
Chris@42 40 Q3.4 FFTW slows down after repeated calls.
Chris@42 41 Q3.5 An FFTW routine is crashing when I call it.
Chris@42 42 Q3.6 My Fortran program crashes when calling FFTW.
Chris@42 43 Q3.7 FFTW gives results different from my old FFT.
Chris@42 44 Q3.8 FFTW gives different results between runs
Chris@42 45 Q3.9 Can I save FFTW's plans?
Chris@42 46 Q3.10 Why does your inverse transform return a scaled result?
Chris@42 47 Q3.11 How can I make FFTW put the origin (zero frequency) at the center of its output?
Chris@42 48 Q3.12 How do I FFT an image/audio file in *foobar* format?
Chris@42 49 Q3.13 My program does not link (on Unix).
Chris@42 50 Q3.14 I included your header, but linking still fails.
Chris@42 51 Q3.15 My program crashes, complaining about stack space.
Chris@42 52 Q3.16 FFTW seems to have a memory leak.
Chris@42 53 Q3.17 The output of FFTW's transform is all zeros.
Chris@42 54 Q3.18 How do I call FFTW from the Microsoft language du jour?
Chris@42 55 Q3.19 Can I compute only a subset of the DFT outputs?
Chris@42 56 Q3.20 Can I use FFTW's routines for in-place and out-of-place matrix tra
Chris@42 57
Chris@42 58 Section 4. Internals of FFTW
Chris@42 59 Q4.1 How does FFTW work?
Chris@42 60 Q4.2 Why is FFTW so fast?
Chris@42 61
Chris@42 62 Section 5. Known bugs
Chris@42 63 Q5.1 FFTW 1.1 crashes in rfftwnd on Linux.
Chris@42 64 Q5.2 The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
Chris@42 65 Q5.3 The test programs in FFTW 1.2.1 fail when I change FFTW to use sin
Chris@42 66 Q5.4 The test program in FFTW 1.2.1 fails for n > 46340.
Chris@42 67 Q5.5 The threaded code fails on Linux Redhat 5.0
Chris@42 68 Q5.6 FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dime
Chris@42 69 Q5.7 FFTW 2.0's complex transforms give the wrong results with prime fa
Chris@42 70 Q5.8 FFTW 2.1.1's MPI test programs crash with MPICH.
Chris@42 71 Q5.9 FFTW 2.1.2's multi-threaded transforms don't work on AIX.
Chris@42 72 Q5.10 FFTW 2.1.2's complex transforms give incorrect results for large p
Chris@42 73 Q5.11 FFTW 2.1.3's multi-threaded transforms don't give any speedup on S
Chris@42 74 Q5.12 FFTW 2.1.3 crashes on AIX.
Chris@42 75
Chris@42 76 ===============================================================================
Chris@42 77
Chris@42 78 Section 1. Introduction and General Information
Chris@42 79
Chris@42 80 Q1.1 What is FFTW?
Chris@42 81 Q1.2 How do I obtain FFTW?
Chris@42 82 Q1.3 Is FFTW free software?
Chris@42 83 Q1.4 What is this about non-free licenses?
Chris@42 84 Q1.5 In the West? I thought MIT was in the East?
Chris@42 85
Chris@42 86 -------------------------------------------------------------------------------
Chris@42 87
Chris@42 88 Question 1.1. What is FFTW?
Chris@42 89
Chris@42 90 FFTW is a free collection of fast C routines for computing the Discrete
Chris@42 91 Fourier Transform in one or more dimensions. It includes complex, real,
Chris@42 92 symmetric, and parallel transforms, and can handle arbitrary array sizes
Chris@42 93 efficiently. FFTW is typically faster than other publicly available FFT
Chris@42 94 implementations, and is even competitive with vendor-tuned libraries.
Chris@42 95 (See our web page for extensive benchmarks.) To achieve this performance,
Chris@42 96 FFTW uses novel code-generation and runtime self-optimization techniques
Chris@42 97 (along with many other tricks).
Chris@42 98
Chris@42 99 -------------------------------------------------------------------------------
Chris@42 100
Chris@42 101 Question 1.2. How do I obtain FFTW?
Chris@42 102
Chris@42 103 FFTW can be found at the FFTW web page. You can also retrieve it from
Chris@42 104 ftp.fftw.org in /pub/fftw.
Chris@42 105
Chris@42 106 -------------------------------------------------------------------------------
Chris@42 107
Chris@42 108 Question 1.3. Is FFTW free software?
Chris@42 109
Chris@42 110 Starting with version 1.3, FFTW is Free Software in the technical sense
Chris@42 111 defined by the Free Software Foundation (see Categories of Free and
Chris@42 112 Non-Free Software), and is distributed under the terms of the GNU General
Chris@42 113 Public License. Previous versions of FFTW were distributed without fee
Chris@42 114 for noncommercial use, but were not technically ``free.''
Chris@42 115
Chris@42 116 Non-free licenses for FFTW are also available that permit different terms
Chris@42 117 of use than the GPL.
Chris@42 118
Chris@42 119 -------------------------------------------------------------------------------
Chris@42 120
Chris@42 121 Question 1.4. What is this about non-free licenses?
Chris@42 122
Chris@42 123 The non-free licenses are for companies that wish to use FFTW in their
Chris@42 124 products but are unwilling to release their software under the GPL (which
Chris@42 125 would require them to release source code and allow free redistribution).
Chris@42 126 Such users can purchase an unlimited-use license from MIT. Contact us for
Chris@42 127 more details.
Chris@42 128
Chris@42 129 We could instead have released FFTW under the LGPL, or even disallowed
Chris@42 130 non-Free usage. Suffice it to say, however, that MIT owns the copyright
Chris@42 131 to FFTW and they only let us GPL it because we convinced them that it
Chris@42 132 would neither affect their licensing revenue nor irritate existing
Chris@42 133 licensees.
Chris@42 134
Chris@42 135 -------------------------------------------------------------------------------
Chris@42 136
Chris@42 137 Question 1.5. In the West? I thought MIT was in the East?
Chris@42 138
Chris@42 139 Not to an Italian. You could say that we're a Spaghetti Western (with
Chris@42 140 apologies to Sergio Leone).
Chris@42 141
Chris@42 142 ===============================================================================
Chris@42 143
Chris@42 144 Section 2. Installing FFTW
Chris@42 145
Chris@42 146 Q2.1 Which systems does FFTW run on?
Chris@42 147 Q2.2 Does FFTW run on Windows?
Chris@42 148 Q2.3 My compiler has trouble with FFTW.
Chris@42 149 Q2.4 FFTW does not compile on Solaris, complaining about const.
Chris@42 150 Q2.5 What's the difference between --enable-3dnow and --enable-k7?
Chris@42 151 Q2.6 What's the difference between the fma and the non-fma versions?
Chris@42 152 Q2.7 Which language is FFTW written in?
Chris@42 153 Q2.8 Can I call FFTW from Fortran?
Chris@42 154 Q2.9 Can I call FFTW from C++?
Chris@42 155 Q2.10 Why isn't FFTW written in Fortran/C++?
Chris@42 156 Q2.11 How do I compile FFTW to run in single precision?
Chris@42 157 Q2.12 --enable-k7 does not work on x86-64
Chris@42 158
Chris@42 159 -------------------------------------------------------------------------------
Chris@42 160
Chris@42 161 Question 2.1. Which systems does FFTW run on?
Chris@42 162
Chris@42 163 FFTW is written in ANSI C, and should work on any system with a decent C
Chris@42 164 compiler. (See also Q2.2 `Does FFTW run on Windows?', Q2.3 `My compiler
Chris@42 165 has trouble with FFTW.'.) FFTW can also take advantage of certain
Chris@42 166 hardware-specific features, such as cycle counters and SIMD instructions,
Chris@42 167 but this is optional.
Chris@42 168
Chris@42 169 -------------------------------------------------------------------------------
Chris@42 170
Chris@42 171 Question 2.2. Does FFTW run on Windows?
Chris@42 172
Chris@42 173 Yes, many people have reported successfully using FFTW on Windows with
Chris@42 174 various compilers. FFTW was not developed on Windows, but the source code
Chris@42 175 is essentially straight ANSI C. See also the FFTW Windows installation
Chris@42 176 notes, Q2.3 `My compiler has trouble with FFTW.', and Q3.18 `How do I call
Chris@42 177 FFTW from the Microsoft language du jour?'.
Chris@42 178
Chris@42 179 -------------------------------------------------------------------------------
Chris@42 180
Chris@42 181 Question 2.3. My compiler has trouble with FFTW.
Chris@42 182
Chris@42 183 Complain fiercely to the vendor of the compiler.
Chris@42 184
Chris@42 185 We have successfully used gcc 3.2.x on x86 and PPC, a recent Compaq C
Chris@42 186 compiler for Alpha, version 6 of IBM's xlc compiler for AIX, Intel's icc
Chris@42 187 versions 5-7, and Sun WorkShop cc version 6.
Chris@42 188
Chris@42 189 FFTW is likely to push compilers to their limits, however, and several
Chris@42 190 compiler bugs have been exposed by FFTW. A partial list follows.
Chris@42 191
Chris@42 192 gcc 2.95.x for Solaris/SPARC produces incorrect code for the test program
Chris@42 193 (workaround: recompile the libbench2 directory with -O2).
Chris@42 194
Chris@42 195 NetBSD/macppc 1.6 comes with a gcc version that also miscompiles the test
Chris@42 196 program. (Please report a workaround if you know one.)
Chris@42 197
Chris@42 198 gcc 3.2.3 for ARM reportedly crashes during compilation. This bug is
Chris@42 199 reportedly fixed in later versions of gcc.
Chris@42 200
Chris@42 201 Versions 8.0 and 8.1 of Intel's icc falsely claim to be gcc, so you should
Chris@42 202 specify CC="icc -no-gcc"; this is automatic in FFTW 3.1. icc-8.0.066
Chris@42 203 reportedly produces incorrect code for FFTW 2.1.5, but is fixed in version
Chris@42 204 8.1. icc-7.1 compiler build 20030402Z appears to produce incorrect
Chris@42 205 dependencies, causing the compilation to fail. icc-7.1 build 20030307Z
Chris@42 206 appears to work fine. (Use icc -V to check which build you have.) As of
Chris@42 207 2003/04/18, build 20030402Z appears not to be available any longer on
Chris@42 208 Intel's website, whereas the older build 20030307Z is available.
Chris@42 209
Chris@42 210 ranlib of GNU binutils 2.9.1 on Irix has been observed to corrupt the FFTW
Chris@42 211 libraries, causing a link failure when FFTW is compiled. Since ranlib is
Chris@42 212 completely superfluous on Irix, we suggest deleting it from your system
Chris@42 213 and replacing it with a symbolic link to /bin/echo.
Chris@42 214
Chris@42 215 If support for SIMD instructions is enabled in FFTW, further compiler
Chris@42 216 problems may appear:
Chris@42 217
Chris@42 218 gcc 3.4.[0123] for x86 produces incorrect SSE2 code for FFTW when -O2 (the
Chris@42 219 best choice for FFTW) is used, causing FFTW to crash (make check crashes).
Chris@42 220 This bug is fixed in gcc 3.4.4. On x86_64 (amd64/em64t), gcc 3.4.4
Chris@42 221 reportedly still has a similar problem, but this is fixed as of gcc 3.4.6.
Chris@42 222
Chris@42 223 gcc-3.2 for x86 produces incorrect SIMD code if -O3 is used. The same
Chris@42 224 compiler produces incorrect SIMD code if no optimization is used, too.
Chris@42 225 When using gcc-3.2, it is a good idea not to change the default CFLAGS
Chris@42 226 selected by the configure script.
Chris@42 227
Chris@42 228 Some 3.0.x and 3.1.x versions of gcc on x86 may crash. The so-called gcc
Chris@42 229 2.96 shipped with RedHat 7.3 crashes when compiling SIMD code. In both cases,
Chris@42 230 please upgrade to gcc-3.2 or later.
Chris@42 231
Chris@42 232 Intel's icc 6.0 misaligns SSE constants, but FFTW has a workaround. icc
Chris@42 233 8.x fails to compile FFTW 3.0.x because it falsely claims to be gcc; we
Chris@42 234 believe this to be a bug in icc, but FFTW 3.1 has a workaround.
Chris@42 235
Chris@42 236 Visual C++ 2003 reportedly produces incorrect code for SSE/SSE2 when
Chris@42 237 compiling FFTW. This bug was reportedly fixed in VC++ 2005;
Chris@42 238 alternatively, you could switch to the Intel compiler. VC++ 6.0 also
Chris@42 239 reportedly produces incorrect code for the file reodft11e-r2hc-odd.c
Chris@42 240 unless optimizations are disabled for that file.
Chris@42 241
Chris@42 242 gcc 2.95 on MacOS X miscompiles AltiVec code (fixed in later versions).
Chris@42 243 gcc 3.2.x miscompiles AltiVec permutations, but FFTW has a workaround.
Chris@42 244 gcc 4.0.1 on MacOS for Intel crashes when compiling FFTW; a workaround is
Chris@42 245 to compile one file without optimization: cd kernel; make CFLAGS=" "
Chris@42 246 trig.lo.
Chris@42 247
Chris@42 248 gcc 4.1.1 reportedly crashes when compiling FFTW for MIPS; the workaround
Chris@42 249 is to compile the file it crashes on (t2_64.c) with a lower optimization
Chris@42 250 level.
Chris@42 251
Chris@42 252 gcc versions 4.1.2 to 4.2.0 for x86 reportedly miscompile FFTW 3.1's test
Chris@42 253 program, causing make check to crash (gcc bug #26528). The bug was
Chris@42 254 reportedly fixed in gcc version 4.2.1 and later. A workaround is to
Chris@42 255 compile libbench2/verify-lib.c without optimization.
Chris@42 256
Chris@42 257 -------------------------------------------------------------------------------
Chris@42 258
Chris@42 259 Question 2.4. FFTW does not compile on Solaris, complaining about const.
Chris@42 260
Chris@42 261 We know that at least on Solaris 2.5.x with Sun's compilers 4.2 you might
Chris@42 262 get error messages from make such as
Chris@42 263
Chris@42 264 "./fftw.h", line 88: warning: const is a keyword in ANSI C
Chris@42 265
Chris@42 266 This is the case when the configure script reports that const does not
Chris@42 267 work:
Chris@42 268
Chris@42 269 checking for working const... (cached) no
Chris@42 270
Chris@42 271 You should be aware that Solaris comes with two compilers, namely,
Chris@42 272 /opt/SUNWspro/SC4.2/bin/cc and /usr/ucb/cc. The latter compiler is
Chris@42 273 non-ANSI. Indeed, it is a perverse shell script that calls the real
Chris@42 274 compiler in non-ANSI mode. In order to compile FFTW, change your path so
Chris@42 275 that the right cc is used.
Chris@42 276
Chris@42 277 To know whether your compiler is the right one, type cc -V. If the
Chris@42 278 compiler prints ``ucbcc'', as in
Chris@42 279
Chris@42 280 ucbcc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2
Chris@42 281
Chris@42 282 then the compiler is wrong. The right message is something like
Chris@42 283
Chris@42 284 cc: WorkShop Compilers 4.2 30 Oct 1996 C 4.2
Chris@42 285
Chris@42 286 -------------------------------------------------------------------------------
Chris@42 287
Chris@42 288 Question 2.5. What's the difference between --enable-3dnow and --enable-k7?
Chris@42 289
Chris@42 290 --enable-k7 enables 3DNow! instructions on K7 processors (AMD Athlon and
Chris@42 291 its variants). K7 support is provided by assembly routines generated by a
Chris@42 292 special purpose compiler. As of fftw-3.2, --enable-k7 is no longer
Chris@42 293 supported.
Chris@42 294
Chris@42 295 --enable-3dnow enables generic 3DNow! support using gcc builtin functions.
Chris@42 296 This works on earlier AMD processors, but it is not as fast as our special
Chris@42 297 assembly routines. As of fftw-3.1, --enable-3dnow is no longer supported.
Chris@42 298
Chris@42 299 -------------------------------------------------------------------------------
Chris@42 300
Chris@42 301 Question 2.6. What's the difference between the fma and the non-fma versions?
Chris@42 302
Chris@42 303 The fma version tries to exploit the fused multiply-add instructions
Chris@42 304 implemented in many processors such as PowerPC, ia-64, and MIPS. The two
Chris@42 305 FFTW packages are otherwise identical. In FFTW 3.1, the fma and non-fma
Chris@42 306 versions were merged together into a single package, and the configure
Chris@42 307 script attempts to automatically guess which version to use.
Chris@42 308
Chris@42 309 The FFTW 3.1 configure script enables fma by default on PowerPC, Itanium,
Chris@42 310 and PA-RISC, and disables it otherwise. You can force one or the other by
Chris@42 311 using the --enable-fma or --disable-fma flag for configure.
Chris@42 312
Chris@42 313 Definitely use fma if you have a PowerPC-based system with gcc (or IBM
Chris@42 314 xlc). This includes all GNU/Linux systems for PowerPC and the older
Chris@42 315 PowerPC-based MacOS systems. Also use it on PA-RISC and Itanium with the
Chris@42 316 HP/UX compiler.
Chris@42 317
Chris@42 318 Definitely do not use the fma version if you have an ia-32 processor
Chris@42 319 (Intel, AMD, MacOS on Intel, etcetera).
Chris@42 320
Chris@42 321 For other architectures/compilers, the situation is not so clear. For
Chris@42 322 example, ia-64 has the fma instruction, but gcc-3.2 appears not to exploit
Chris@42 323 it correctly. Other compilers may do the right thing, but we have not
Chris@42 324 tried them. Please send us your feedback so that we can update this FAQ
Chris@42 325 entry.
Chris@42 326
Chris@42 327 -------------------------------------------------------------------------------
Chris@42 328
Chris@42 329 Question 2.7. Which language is FFTW written in?
Chris@42 330
Chris@42 331 FFTW is written in ANSI C. Most of the code, however, was automatically
Chris@42 332 generated by a program called genfft, written in the Objective Caml
Chris@42 333 dialect of ML. You do not need to know ML or to have an Objective Caml
Chris@42 334 compiler in order to use FFTW.
Chris@42 335
Chris@42 336 genfft is provided with the FFTW sources, which means that you can play
Chris@42 337 with the code generator if you want. In this case, you need a working
Chris@42 338 Objective Caml system. Objective Caml is available from the Caml web
Chris@42 339 page.
Chris@42 340
Chris@42 341 -------------------------------------------------------------------------------
Chris@42 342
Chris@42 343 Question 2.8. Can I call FFTW from Fortran?
Chris@42 344
Chris@42 345 Yes, FFTW (versions 1.3 and higher) contains a Fortran-callable interface,
Chris@42 346 documented in the FFTW manual.
Chris@42 347
Chris@42 348 By default, FFTW configures its Fortran interface to work with the first
Chris@42 349 compiler it finds, e.g. g77. To configure for a different, incompatible
Chris@42 350 Fortran compiler foobar, use ./configure F77=foobar when installing FFTW.
Chris@42 351 (In the case of g77, however, FFTW 3.x also includes an extra set of
Chris@42 352 Fortran-callable routines with one fewer underscore at the end of
Chris@42 353 identifiers, which should cover most other Fortran compilers on Linux at
Chris@42 354 least.)
Chris@42 355
Chris@42 356 -------------------------------------------------------------------------------
Chris@42 357
Chris@42 358 Question 2.9. Can I call FFTW from C++?
Chris@42 359
Chris@42 360 Most definitely. FFTW should compile and/or link under any C++ compiler.
Chris@42 361 Moreover, it is likely that the C++ <complex> template class is
Chris@42 362 bit-compatible with FFTW's complex-number format (see the FFTW manual for
Chris@42 363 more details).
Chris@42 364
Chris@42 365 -------------------------------------------------------------------------------
Chris@42 366
Chris@42 367 Question 2.10. Why isn't FFTW written in Fortran/C++?
Chris@42 368
Chris@42 369 Because we don't like those languages, and neither approaches the
Chris@42 370 portability of C.
Chris@42 371
Chris@42 372 -------------------------------------------------------------------------------
Chris@42 373
Chris@42 374 Question 2.11. How do I compile FFTW to run in single precision?
Chris@42 375
Chris@42 376 On a Unix system: configure --enable-float. On a non-Unix system: edit
Chris@42 377 config.h to #define the symbol FFTW_SINGLE (for FFTW 3.x). In both cases,
Chris@42 378 you must then recompile FFTW. In FFTW 3, all FFTW identifiers will then
Chris@42 379 begin with fftwf_ instead of fftw_.
Chris@42 380
Chris@42 381 -------------------------------------------------------------------------------
Chris@42 382
Chris@42 383 Question 2.12. --enable-k7 does not work on x86-64
Chris@42 384
Chris@42 385 Support for --enable-k7 was discontinued in fftw-3.2.
Chris@42 386
Chris@42 387 The fftw-3.1 release supports --enable-k7. This option only works on
Chris@42 388 32-bit x86 machines that implement 3DNow!, including the AMD Athlon and
Chris@42 389 the AMD Opteron in 32-bit mode. --enable-k7 does not work on AMD Opteron
Chris@42 390 in 64-bit mode. Use --enable-sse for x86-64 machines.
Chris@42 391
Chris@42 392 FFTW supports 3DNow! by means of assembly code generated by a
Chris@42 393 special-purpose compiler. It is hard to produce assembly code that works
Chris@42 394 in both 32-bit and 64-bit mode.
Chris@42 395
Chris@42 396 ===============================================================================
Chris@42 397
Chris@42 398 Section 3. Using FFTW
Chris@42 399
Chris@42 400 Q3.1 Why not support the FFTW 2 interface in FFTW 3?
Chris@42 401 Q3.2 Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?
Chris@42 402 Q3.3 FFTW seems really slow.
Chris@42 403 Q3.4 FFTW slows down after repeated calls.
Chris@42 404 Q3.5 An FFTW routine is crashing when I call it.
Chris@42 405 Q3.6 My Fortran program crashes when calling FFTW.
Chris@42 406 Q3.7 FFTW gives results different from my old FFT.
Chris@42 407 Q3.8 FFTW gives different results between runs
Chris@42 408 Q3.9 Can I save FFTW's plans?
Chris@42 409 Q3.10 Why does your inverse transform return a scaled result?
Chris@42 410 Q3.11 How can I make FFTW put the origin (zero frequency) at the center of its output?
Chris@42 411 Q3.12 How do I FFT an image/audio file in *foobar* format?
Chris@42 412 Q3.13 My program does not link (on Unix).
Chris@42 413 Q3.14 I included your header, but linking still fails.
Chris@42 414 Q3.15 My program crashes, complaining about stack space.
Chris@42 415 Q3.16 FFTW seems to have a memory leak.
Chris@42 416 Q3.17 The output of FFTW's transform is all zeros.
Chris@42 417 Q3.18 How do I call FFTW from the Microsoft language du jour?
Chris@42 418 Q3.19 Can I compute only a subset of the DFT outputs?
Chris@42 419 Q3.20 Can I use FFTW's routines for in-place and out-of-place matrix tra
Chris@42 420
Chris@42 421 -------------------------------------------------------------------------------
Chris@42 422
Chris@42 423 Question 3.1. Why not support the FFTW 2 interface in FFTW 3?
Chris@42 424
Chris@42 425 FFTW 3 has semantics incompatible with earlier versions: its plans can
Chris@42 426 only be used for a given stride, multiplicity, and other characteristics
Chris@42 427 of the input and output arrays; these stronger semantics are necessary for
Chris@42 428 performance reasons. Thus, it is impossible to efficiently emulate the
Chris@42 429 older interface (whose plans can be used for any transform of the same
Chris@42 430 size). We believe that it should be possible to upgrade most programs
Chris@42 431 without any difficulty, however.
Chris@42 432
Chris@42 433 -------------------------------------------------------------------------------
Chris@42 434
Chris@42 435 Question 3.2. Why do FFTW 3 plans encapsulate the input/output arrays and not just the algorithm?
Chris@42 436
Chris@42 437 There are several reasons:
Chris@42 438
Chris@42 439 * It was important for performance reasons that the plan be specific to
Chris@42 440 array characteristics like the stride (and alignment, for SIMD), and
Chris@42 441 requiring that the user maintain these invariants is error prone.
Chris@42 442 * In most high-performance applications, as far as we can tell, you are
Chris@42 443 usually transforming the same array over and over, so FFTW's semantics
Chris@42 444 should not be a burden.
Chris@42 445 * If you need to transform another array of the same size, creating a new
Chris@42 446 plan once the first exists is a cheap operation.
Chris@42 447 * If you need to transform many arrays of the same size at once, you
Chris@42 448 should really use the plan_many routines in FFTW's "advanced" interface.
Chris@42 449 * If the abovementioned array characteristics are the same, you are
Chris@42 450 willing to pay close attention to the documentation, and you really need
Chris@42 451 to, we provide a "new-array execution" interface to apply a plan to a
Chris@42 452 new array.
Chris@42 453
Chris@42 454 -------------------------------------------------------------------------------
Chris@42 455
Chris@42 456 Question 3.3. FFTW seems really slow.
Chris@42 457
Chris@42 458 You are probably recreating the plan before every transform, rather than
Chris@42 459 creating it once and reusing it for all transforms of the same size. FFTW
Chris@42 460 is designed to be used in the following way:
Chris@42 461
Chris@42 462 * First, you create a plan. This will take several seconds.
Chris@42 463 * Then, you reuse the plan many times to perform FFTs. These are fast.
Chris@42 464
Chris@42 465 If you don't need to compute many transforms and the time for the planner
Chris@42 466 is significant, you have two options. First, you can use the
Chris@42 467 FFTW_ESTIMATE option in the planner, which uses heuristics instead of
Chris@42 468 runtime measurements and produces a good plan in a short time. Second,
Chris@42 469 you can use the wisdom feature to precompute the plan; see Q3.9 `Can I
Chris@42 470 save FFTW's plans?'
Chris@42 471
Chris@42 472 -------------------------------------------------------------------------------
Chris@42 473
Chris@42 474 Question 3.4. FFTW slows down after repeated calls.
Chris@42 475
Chris@42 476 Probably, NaNs or similar are creeping into your data, and the slowdown is
Chris@42 477 due to the resulting floating-point exceptions. For example, be aware
Chris@42 478 that repeatedly FFTing the same array is a diverging process (because FFTW
Chris@42 479 computes the unnormalized transform).
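
As a back-of-the-envelope illustration of why this diverges, the sketch below
counts how many unnormalized forward+backward round trips it takes before a
value overflows a double. It relies only on the fact that one such round trip
multiplies the data by the array size n; it does not call FFTW, and the
function name is made up for this example.

```c
#include <assert.h>
#include <float.h>

/* Count how many unnormalized forward+backward round trips on a length-n
 * array it takes for a value of magnitude x0 to overflow a double.
 * Each round trip multiplies the data by n. */
int round_trips_until_overflow(double x0, double n)
{
    assert(x0 > 0.0 && n > 1.0);
    int trips = 0;
    double x = x0;
    while (x <= DBL_MAX) {      /* false once x has become +infinity */
        x *= n;
        ++trips;
    }
    return trips;
}
```

For n = 1024 and data of magnitude 1, the value becomes infinite on the 103rd
round trip. Once infinities are present, operations such as inf - inf produce
NaNs, which is the situation described above. Rescaling by 1/n after each
round trip avoids the growth.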
Chris@42 480
Chris@42 481 -------------------------------------------------------------------------------
Chris@42 482
Chris@42 483 Question 3.5. An FFTW routine is crashing when I call it.
Chris@42 484
Chris@42 485 Did the FFTW test programs pass (make check, or cd tests; make bigcheck if
Chris@42 486 you want to be paranoid)? If so, you almost certainly have a bug in your
Chris@42 487 own code. For example, you could be passing invalid arguments (such as
Chris@42 488 wrongly-sized arrays) to FFTW, or you could simply have memory corruption
Chris@42 489 elsewhere in your program that causes random crashes later on. Please
Chris@42 490 don't complain to us unless you can come up with a minimal self-contained
Chris@42 491 program (preferably under 30 lines) that illustrates the problem.
Chris@42 492
Chris@42 493 -------------------------------------------------------------------------------
Chris@42 494
Chris@42 495 Question 3.6. My Fortran program crashes when calling FFTW.
Chris@42 496
Chris@42 497 As described in the manual, on 64-bit machines you must store the plans in
Chris@42 498 variables large enough to hold a pointer, for example integer*8. We
Chris@42 499 recommend using integer*8 on 32-bit machines as well, to simplify porting.
Chris@42 500
Chris@42 501 -------------------------------------------------------------------------------
Chris@42 502
Chris@42 503 Question 3.7. FFTW gives results different from my old FFT.
Chris@42 504
Chris@42 505 People follow many different conventions for the DFT, and you should be
Chris@42 506 sure to know the ones that we use (described in the FFTW manual). In
Chris@42 507 particular, you should be aware that the FFTW_FORWARD/FFTW_BACKWARD
Chris@42 508 directions correspond to signs of -1/+1 in the exponent of the DFT
Chris@42 509 definition. (*Numerical Recipes* uses the opposite convention.)
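
Written out, the convention described above looks as follows for a
one-dimensional transform of length n, with input X and output Y (this is a
restatement of the sign rule, without any normalization factor; see the FFTW
manual for the authoritative definition):

```latex
% FFTW_FORWARD: sign -1 in the exponent
Y_k = \sum_{j=0}^{n-1} X_j \, e^{-2\pi i \, jk/n}, \qquad k = 0, \dots, n-1

% FFTW_BACKWARD: sign +1 in the exponent
Y_k = \sum_{j=0}^{n-1} X_j \, e^{+2\pi i \, jk/n}, \qquad k = 0, \dots, n-1
```

A library that uses the opposite sign returns the complex conjugate of these
outputs when the input is real.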
Chris@42 510
Chris@42 511 You should also know that we compute an unnormalized transform. In
Chris@42 512 contrast, Matlab is an example of a program that computes a normalized
Chris@42 513 transform. See Q3.10 `Why does your inverse transform return a scaled
Chris@42 514 result?'.
Chris@42 515
Chris@42 516 Finally, note that floating-point arithmetic is not exact, so different
Chris@42 517 FFT algorithms will give slightly different results (on the order of the
Chris@42 518 numerical accuracy; typically a fractional difference of 1e-15 or so in
Chris@42 519 double precision).
Chris@42 520
Chris@42 521 -------------------------------------------------------------------------------
Chris@42 522
Chris@42 523 Question 3.8. FFTW gives different results between runs
Chris@42 524
Chris@42 525 If you use FFTW_MEASURE or FFTW_PATIENT mode, then the algorithm FFTW
Chris@42 526 employs is not deterministic: it depends on runtime performance
Chris@42 527 measurements. This will cause the results to vary slightly from run to
Chris@42 528 run. However, the differences should be slight, on the order of the
Chris@42 529 floating-point precision, and therefore should have no practical impact on
Chris@42 530 most applications.
Chris@42 531
Chris@42 532 If you use saved plans (wisdom) or FFTW_ESTIMATE mode, however, then the
Chris@42 533 algorithm is deterministic and the results should be identical between
Chris@42 534 runs.
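
The run-to-run differences described above arise because different algorithms
perform the same arithmetic in a different order, and floating-point addition
is not associative. The sketch below shows the effect in isolation, without
FFTW: the same three numbers summed in two orders. The function names are made
up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum v[0], v[1], ..., v[n-1] in that order. */
double sum_forward(const double *v, size_t n)
{
    assert(v != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Sum the same values in the opposite order: v[n-1], ..., v[0]. */
double sum_backward(const double *v, size_t n)
{
    assert(v != NULL);
    double s = 0.0;
    for (size_t i = n; i-- > 0; )
        s += v[i];
    return s;
}
```

With IEEE double precision and the input { 0.1, 0.2, 0.3 }, the two results
differ by about 1.1e-16. That is the size of discrepancy to expect between
plans, and it is why an exact bitwise comparison between runs is the wrong
test.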
Chris@42 535
Chris@42 536 -------------------------------------------------------------------------------
Chris@42 537
Chris@42 538 Question 3.9. Can I save FFTW's plans?
Chris@42 539
Chris@42 540 Yes. Starting with version 1.2, FFTW provides the wisdom mechanism for
Chris@42 541 saving plans; see the FFTW manual.
Chris@42 542
Chris@42 543 -------------------------------------------------------------------------------
Chris@42 544
Chris@42 545 Question 3.10. Why does your inverse transform return a scaled result?
Chris@42 546
Chris@42 547 Computing the forward transform followed by the backward transform (or
Chris@42 548 vice versa) yields the original array scaled by the size of the array.
Chris@42 549 (For multi-dimensional transforms, the size of the array is the product of
Chris@42 550 the dimensions.) We could, instead, have chosen a normalization that
Chris@42 551 would have returned the unscaled array. Or, to accommodate the many
Chris@42 552 conventions in this matter, the transform routines could have accepted a
Chris@42 553 "scale factor" parameter. We did not do this, however, for two reasons.
Chris@42 554 First, we didn't want to sacrifice performance in the common case where
Chris@42 555 the scale factor is 1. Second, in real applications the FFT is followed or
Chris@42 556 preceded by some computation on the data, into which the scale factor can
Chris@42 557 typically be absorbed at little or no cost.
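
To make the scaling concrete, here is a small sketch that does not call FFTW
at all: a naive, unnormalized DFT of length 4, applied forward and then
backward. The names cplx, dft4 and round_trip4 are made up for this
illustration; the sign convention (-1 forward, +1 backward) matches
FFTW_FORWARD/FFTW_BACKWARD.

```c
#include <assert.h>

typedef struct { double re, im; } cplx;

/* Naive unnormalized DFT of length 4.
 * sign = -1 plays the role of FFTW_FORWARD, sign = +1 of FFTW_BACKWARD.
 * For n = 4 every factor exp(sign*2*pi*i*j*k/4) is a power of (sign*i),
 * so a four-entry table replaces the trigonometric functions. */
void dft4(const cplx in[4], cplx out[4], int sign)
{
    /* Powers of i: 1, i, -1, -i. */
    static const double pre[4] = { 1.0, 0.0, -1.0, 0.0 };
    static const double pim[4] = { 0.0, 1.0, 0.0, -1.0 };

    assert(sign == 1 || sign == -1);
    for (int k = 0; k < 4; ++k) {
        out[k].re = 0.0;
        out[k].im = 0.0;
        for (int j = 0; j < 4; ++j) {
            int p = (j * k) % 4;        /* exponent of (sign*i) */
            double wr = pre[p];
            double wi = sign * pim[p];
            out[k].re += in[j].re * wr - in[j].im * wi;
            out[k].im += in[j].re * wi + in[j].im * wr;
        }
    }
}

/* Forward transform followed by backward transform, with no rescaling. */
void round_trip4(const cplx in[4], cplx out[4])
{
    cplx freq[4];
    dft4(in, freq, -1);
    dft4(freq, out, +1);
}
```

Every element of the output of round_trip4 equals 4 times the corresponding
input element. To recover the original data, divide by n (here 4), or fold
the factor 1/n into a neighbouring step of your own computation, as suggested
above.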
Chris@42 558
Chris@42 559 -------------------------------------------------------------------------------
Chris@42 560
Chris@42 561 Question 3.11. How can I make FFTW put the origin (zero frequency) at the center of its output?
Chris@42 562
Chris@42 563 For human viewing of a spectrum, it is often convenient to put the origin
Chris@42 564 in frequency space at the center of the output array, rather than in the
Chris@42 565 zero-th element (the default in FFTW). If all of the dimensions of your
Chris@42 566 array are even, you can accomplish this by simply multiplying each element
Chris@42 567 of the input array by (-1)^(i + j + ...), where i, j, etcetera are the
Chris@42 568 indices of the element. (This trick is a general property of the DFT, and
Chris@42 569 is not specific to FFTW.)
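
A minimal sketch of the modulation step for a two-dimensional,
row-major array follows. The function name
modulate_for_centered_origin is hypothetical and nothing here calls
FFTW; the array would be passed to the transform afterwards.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: multiply element (i, j) of a row-major
   rows x cols array by (-1)^(i + j).  Apply it to the input array
   before computing the transform. */
void modulate_for_centered_origin(int rows, int cols, double complex *data)
{
    /* the trick relies on every dimension being even */
    assert(rows % 2 == 0 && cols % 2 == 0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            if ((i + j) % 2 != 0)   /* odd index sum: factor is -1 */
                data[(size_t)i * cols + j] = -data[(size_t)i * cols + j];
}
```

After transforming the modulated input, the zero-frequency term appears
at index (rows/2, cols/2) instead of (0, 0).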
Chris@42 570
Chris@42 571 -------------------------------------------------------------------------------
Chris@42 572
Chris@42 573 Question 3.12. How do I FFT an image/audio file in *foobar* format?
Chris@42 574
Chris@42 575 FFTW performs an FFT on an array of floating-point values. You can
Chris@42 576 certainly use it to compute the transform of an image or audio stream, but
Chris@42 577 you are responsible for figuring out your data format and converting it to
Chris@42 578 the form FFTW requires.
Chris@42 579
Chris@42 580 -------------------------------------------------------------------------------
Chris@42 581
Chris@42 582 Question 3.13. My program does not link (on Unix).
Chris@42 583
Chris@42 584 The libraries must be listed in the correct order (-lfftw3 -lm for FFTW
Chris@42 585 3.x) and *after* your program sources/objects. (The general rule is that
Chris@42 586 if *A* uses *B*, then *A* must be listed before *B* in the link command.)
Chris@42 587
Chris@42 588 -------------------------------------------------------------------------------
Chris@42 589
Chris@42 590 Question 3.14. I included your header, but linking still fails.
Chris@42 591
Chris@42 592 You're a C++ programmer, aren't you? You have to compile the FFTW library
Chris@42 593 and link it into your program, not just #include <fftw3.h>. (Yes, this is
Chris@42 594 really a FAQ.)
Chris@42 595
Chris@42 596 -------------------------------------------------------------------------------
Chris@42 597
Chris@42 598 Question 3.15. My program crashes, complaining about stack space.
Chris@42 599
Chris@42 600 You cannot declare large arrays with automatic storage (e.g. via
Chris@42 601 fftw_complex array[N]); you should use fftw_malloc (or equivalent) to
Chris@42 602 allocate the arrays you want to transform if they are larger than a few
Chris@42 603 hundred elements.
Chris@42 604
Chris@42 605 -------------------------------------------------------------------------------
Chris@42 606
Chris@42 607 Question 3.16. FFTW seems to have a memory leak.
Chris@42 608
Chris@42 609 After you create a plan, FFTW caches the information required to quickly
Chris@42 610 recreate the plan. (See Q3.9 `Can I save FFTW's plans?') It also
Chris@42 611 maintains a small amount of other persistent memory. You can deallocate
Chris@42 612 all of FFTW's internally allocated memory, if you wish, by calling
Chris@42 613 fftw_cleanup(), as documented in the manual.
Chris@42 614
Chris@42 615 -------------------------------------------------------------------------------
Chris@42 616
Chris@42 617 Question 3.17. The output of FFTW's transform is all zeros.
Chris@42 618
Chris@42 619 You should initialize your input array *after* creating the plan, unless
Chris@42 620 you use FFTW_ESTIMATE: planning with FFTW_MEASURE or FFTW_PATIENT
Chris@42 621 overwrites the input/output arrays, as described in the manual.
Chris@42 622
Chris@42 623 -------------------------------------------------------------------------------
Chris@42 624
Chris@42 625 Question 3.18. How do I call FFTW from the Microsoft language du jour?
Chris@42 626
Chris@42 627 Please *do not* ask us Windows-specific questions. We do not use Windows.
Chris@42 628 We know nothing about Visual Basic, Visual C++, or .NET. Please find the
Chris@42 629 appropriate Usenet discussion group and ask your question there. See also
Chris@42 630 Q2.2 `Does FFTW run on Windows?'.
Chris@42 631
Chris@42 632 -------------------------------------------------------------------------------
Chris@42 633
Chris@42 634 Question 3.19. Can I compute only a subset of the DFT outputs?
Chris@42 635
Chris@42 636 In general, no: an FFT intrinsically computes all outputs from all inputs.
Chris@42 637 In principle, there is something called a *pruned FFT* that can do what
Chris@42 638 you want, but to compute K outputs out of N the complexity is in general
Chris@42 639 O(N log K) instead of O(N log N), thus saving only a small additive factor
Chris@42 640 in the log. (The same argument holds if you instead have only K nonzero
Chris@42 641 inputs.)
Chris@42 642
Chris@42 643 There are some specific cases in which you can get the O(N log K)
Chris@42 644 performance benefits easily, however, by combining a few ordinary FFTs.
Chris@42 645 In particular, the case where you want the first K outputs, where K
Chris@42 646 divides N, can be handled by performing N/K transforms of size K and then
Chris@42 647 summing the outputs multiplied by appropriate phase factors. For more
Chris@42 648 details, see pruned FFTs with FFTW.
Chris@42 649
Chris@42 650 There are also some algorithms that compute pruned transforms
Chris@42 651 *approximately*, but they are beyond the scope of this FAQ.
Chris@42 652
Chris@42 653 -------------------------------------------------------------------------------
Chris@42 654
Chris@42 655 Question 3.20. Can I use FFTW's routines for in-place and out-of-place matrix transposition?
Chris@42 656
Chris@42 657 You can use the FFTW guru interface to create a rank-0 transform of vector
Chris@42 658 rank 2 where the vector strides are transposed. (A rank-0 transform is
Chris@42 659 equivalent to a 1D transform of size 1, which just copies the input into
Chris@42 660 the output.) Specifying the same location for the input and output makes
Chris@42 661 the transpose in-place.
Chris@42 662
Chris@42 663 For double-valued data stored in row-major format, plan creation looks
Chris@42 664 like this:
Chris@42 665
Chris@42 666 fftw_plan plan_transpose(int rows, int cols, double *in, double *out)
Chris@42 667 {
Chris@42 668     const unsigned flags = FFTW_ESTIMATE; /* other flags are possible */
Chris@42 669     fftw_iodim howmany_dims[2];
Chris@42 670
Chris@42 671     howmany_dims[0].n = rows;
Chris@42 672     howmany_dims[0].is = cols;
Chris@42 673     howmany_dims[0].os = 1;
Chris@42 674
Chris@42 675     howmany_dims[1].n = cols;
Chris@42 676     howmany_dims[1].is = 1;
Chris@42 677     howmany_dims[1].os = rows;
Chris@42 678
Chris@42 679     return fftw_plan_guru_r2r(/*rank=*/ 0, /*dims=*/ NULL,
Chris@42 680                               /*howmany_rank=*/ 2, howmany_dims,
Chris@42 681                               in, out, /*kind=*/ NULL, flags);
Chris@42 682 }
Chris@42 683 (This entry was written by Rhys Ulerich.)
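
For illustration, the out-of-place copy described by these strides
corresponds to the plain loop nest below. The name transpose_copy is
hypothetical, and this is only a sketch of the semantics: the algorithm
FFTW's planner actually chooses may be quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop: copy the row-major rows x cols array "in"
   into the row-major cols x rows array "out", using the same input and
   output strides as howmany_dims in plan_transpose. */
void transpose_copy(int rows, int cols, const double *in, double *out)
{
    assert(in != out);   /* this simple loop nest is out-of-place only */
    for (int i = 0; i < rows; ++i)        /* dim 0: is = cols, os = 1    */
        for (int j = 0; j < cols; ++j)    /* dim 1: is = 1,    os = rows */
            out[(size_t)i * 1 + (size_t)j * rows] =
                in[(size_t)i * cols + (size_t)j * 1];
}
```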
Chris@42 684
Chris@42 685 ===============================================================================
Chris@42 686
Chris@42 687 Section 4. Internals of FFTW
Chris@42 688
Chris@42 689 Q4.1 How does FFTW work?
Chris@42 690 Q4.2 Why is FFTW so fast?
Chris@42 691
Chris@42 692 -------------------------------------------------------------------------------
Chris@42 693
Chris@42 694 Question 4.1. How does FFTW work?
Chris@42 695
Chris@42 696 The innovation (if it can be so called) in FFTW consists in having a
Chris@42 697 variety of composable *solvers*, representing different FFT algorithms and
Chris@42 698 implementation strategies, whose combination into a particular *plan* for
Chris@42 699 a given size can be determined at runtime according to the characteristics
Chris@42 700 of your machine/compiler. This peculiar software architecture allows FFTW
Chris@42 701 to adapt itself to almost any machine.
Chris@42 702
Chris@42 703 For more details (albeit somewhat outdated), see the paper "FFTW: An
Chris@42 704 Adaptive Software Architecture for the FFT", by M. Frigo and S. G.
Chris@42 705 Johnson, *Proc. ICASSP* 3, 1381 (1998), also available at the FFTW web
Chris@42 706 page.
Chris@42 707
Chris@42 708 -------------------------------------------------------------------------------
Chris@42 709
Chris@42 710 Question 4.2. Why is FFTW so fast?
Chris@42 711
Chris@42 712 This is a complex question, and there is no simple answer. In fact, the
Chris@42 713 authors do not fully know the answer, either. In addition to many small
Chris@42 714 performance hacks throughout FFTW, there are three general reasons for
Chris@42 715 FFTW's speed.
Chris@42 716
Chris@42 717 * FFTW uses a variety of FFT algorithms and implementation styles that
Chris@42 718 can be arbitrarily composed to adapt itself to a machine. See Q4.1 `How
Chris@42 719 does FFTW work?'.
Chris@42 720 * FFTW uses a code generator to produce highly-optimized routines for
Chris@42 721 computing small transforms.
Chris@42 722 * FFTW uses explicit divide-and-conquer to take advantage of the memory
Chris@42 723 hierarchy.
Chris@42 724
Chris@42 725 For more details (albeit somewhat outdated), see the paper "FFTW: An
Chris@42 726 Adaptive Software Architecture for the FFT", by M. Frigo and S. G.
Chris@42 727 Johnson, *Proc. ICASSP* 3, 1381 (1998), available along with other
Chris@42 728 references at the FFTW web page.
Chris@42 729
Chris@42 730 ===============================================================================
Chris@42 731
Chris@42 732 Section 5. Known bugs
Chris@42 733
Chris@42 734 Q5.1 FFTW 1.1 crashes in rfftwnd on Linux.
Chris@42 735 Q5.2 The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
Chris@42 736 Q5.3 The test programs in FFTW 1.2.1 fail when I change FFTW to use sin
Chris@42 737 Q5.4 The test program in FFTW 1.2.1 fails for n > 46340.
Chris@42 738 Q5.5 The threaded code fails on Linux Redhat 5.0
Chris@42 739 Q5.6 FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dime
Chris@42 740 Q5.7 FFTW 2.0's complex transforms give the wrong results with prime fa
Chris@42 741 Q5.8 FFTW 2.1.1's MPI test programs crash with MPICH.
Chris@42 742 Q5.9 FFTW 2.1.2's multi-threaded transforms don't work on AIX.
Chris@42 743 Q5.10 FFTW 2.1.2's complex transforms give incorrect results for large p
Chris@42 744 Q5.11 FFTW 2.1.3's multi-threaded transforms don't give any speedup on S
Chris@42 745 Q5.12 FFTW 2.1.3 crashes on AIX.
Chris@42 746
Chris@42 747 -------------------------------------------------------------------------------
Chris@42 748
Chris@42 749 Question 5.1. FFTW 1.1 crashes in rfftwnd on Linux.
Chris@42 750
Chris@42 751 This bug was fixed in FFTW 1.2. There was a bug in rfftwnd causing an
Chris@42 752 incorrect amount of memory to be allocated. The bug showed up in Linux
Chris@42 753 with libc-5.3.12 (and nowhere else that we know of).
Chris@42 754
Chris@42 755 -------------------------------------------------------------------------------
Chris@42 756
Chris@42 757 Question 5.2. The MPI transforms in FFTW 1.2 give incorrect results/leak memory.
Chris@42 758
Chris@42 759 These bugs were corrected in FFTW 1.2.1. The MPI transforms (really, just
Chris@42 760 the transpose routines) in FFTW 1.2 had bugs that could cause errors in
Chris@42 761 some situations.
Chris@42 762
Chris@42 763 -------------------------------------------------------------------------------
Chris@42 764
Chris@42 765 Question 5.3. The test programs in FFTW 1.2.1 fail when I change FFTW to use single precision.
Chris@42 766
Chris@42 767 This bug was fixed in FFTW 1.3. (Older versions of FFTW did work in
Chris@42 768 single precision, but the test programs didn't--the error tolerances in
Chris@42 769 the tests were set for double precision.)
Chris@42 770
Chris@42 771 -------------------------------------------------------------------------------
Chris@42 772
Chris@42 773 Question 5.4. The test program in FFTW 1.2.1 fails for n > 46340.
Chris@42 774
Chris@42 775 This bug was fixed in FFTW 1.3. FFTW 1.2.1 produced the right answer, but
Chris@42 776 the test program was wrong. For large n, n*n in the naive transform that
Chris@42 777 we used for comparison overflows 32 bit integer precision, breaking the
Chris@42 778 test.
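
The threshold follows from simple arithmetic: 46340^2 = 2147395600 still
fits in a signed 32-bit integer (maximum 2147483647), whereas
46341^2 = 2147488281 does not. The sketch below checks this; the
function names are hypothetical and are not taken from the FFTW test
program.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n*n fits in a signed 32-bit integer, 0 otherwise.  The
   product is formed in 64 bits so that the check itself cannot overflow. */
int square_fits_in_int32(int32_t n)
{
    int64_t sq = (int64_t)n * (int64_t)n;
    return sq <= INT32_MAX;
}

/* Squares n in 32-bit arithmetic, refusing inputs that would overflow. */
int32_t checked_square(int32_t n)
{
    assert(square_fits_in_int32(n));
    return n * n;
}
```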
Chris@42 779
Chris@42 780 -------------------------------------------------------------------------------
Chris@42 781
Chris@42 782 Question 5.5. The threaded code fails on Linux Redhat 5.0
Chris@42 783
Chris@42 784 We had problems with glibc-2.0.5. The code should work with glibc-2.0.7.
Chris@42 785
Chris@42 786 -------------------------------------------------------------------------------
Chris@42 787
Chris@42 788 Question 5.6. FFTW 2.0's rfftwnd fails for rank > 1 transforms with a final dimension >= 65536.
Chris@42 789
Chris@42 790 This bug was fixed in FFTW 2.0.1. (There was a 32-bit integer overflow
Chris@42 791 due to a poorly-parenthesized expression.)
Chris@42 792
Chris@42 793 -------------------------------------------------------------------------------
Chris@42 794
Chris@42 795 Question 5.7. FFTW 2.0's complex transforms give the wrong results with prime factors 17 to 97.
Chris@42 796
Chris@42 797 There was a bug in the complex transforms that could cause incorrect
Chris@42 798 results under (hopefully rare) circumstances for lengths with
Chris@42 799 intermediate-size prime factors (17-97). This bug was fixed in FFTW
Chris@42 800 2.1.1.
Chris@42 801
Chris@42 802 -------------------------------------------------------------------------------
Chris@42 803
Chris@42 804 Question 5.8. FFTW 2.1.1's MPI test programs crash with MPICH.
Chris@42 805
Chris@42 806 This bug was fixed in FFTW 2.1.2. The 2.1/2.1.1 MPI test programs crashed
Chris@42 807 when using the MPICH implementation of MPI with the ch_p4 device (TCP/IP);
Chris@42 808 the transforms themselves worked fine.
Chris@42 809
Chris@42 810 -------------------------------------------------------------------------------
Chris@42 811
Chris@42 812 Question 5.9. FFTW 2.1.2's multi-threaded transforms don't work on AIX.
Chris@42 813
Chris@42 814 This bug was fixed in FFTW 2.1.3. The multi-threaded transforms in
Chris@42 815 previous versions didn't work with AIX's pthreads implementation, which
Chris@42 816 idiosyncratically creates threads in detached (non-joinable) mode by
Chris@42 817 default.
Chris@42 818
Chris@42 819 -------------------------------------------------------------------------------
Chris@42 820
Chris@42 821 Question 5.10. FFTW 2.1.2's complex transforms give incorrect results for large prime sizes.
Chris@42 822
Chris@42 823 This bug was fixed in FFTW 2.1.3. FFTW's complex-transform algorithm for
Chris@42 824 prime sizes (in versions 2.0 to 2.1.2) had an integer overflow problem
Chris@42 825 that caused incorrect results for many primes greater than 32768 (on
Chris@42 826 32-bit machines). (Sizes without large prime factors are not affected.)
Chris@42 827
Chris@42 828 -------------------------------------------------------------------------------
Chris@42 829
Chris@42 830 Question 5.11. FFTW 2.1.3's multi-threaded transforms don't give any speedup on Solaris.
Chris@42 831
Chris@42 832 This bug was fixed in FFTW 2.1.4. (By default, Solaris creates threads
Chris@42 833 that do not parallelize over multiple processors, so one has to request
Chris@42 834 the proper behavior specifically.)
Chris@42 835
Chris@42 836 -------------------------------------------------------------------------------
Chris@42 837
Chris@42 838 Question 5.12. FFTW 2.1.3 crashes on AIX.
Chris@42 839
Chris@42 840 The FFTW 2.1.3 configure script picked incorrect compiler flags for the
Chris@42 841 xlc compiler on newer IBM processors. This is fixed in FFTW 2.1.4.
Chris@42 842