annotate fft/fftw/fftw-3.3.4/NEWS @ 40:223f770b5341 kissfft-double tip

Try a double-precision kissfft
author Chris Cannam
date Wed, 07 Sep 2016 10:40:32 +0100
parents 26056e866c29
children
rev   line source
Chris@19 1 FFTW 3.3.4
Chris@19 2
Chris@19 3 * New functions fftw_alignment_of (to check whether two arrays are
Chris@19 4 equally aligned for the purposes of applying a plan) and fftw_sprint_plan
Chris@19 5 (to output a description of plan to a string).
Chris@19 6
Chris@19 7 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
Chris@19 8 bug report.
Chris@19 9
Chris@19 10 * Fixed manual to work with texinfo-5.
Chris@19 11
Chris@19 12 * Increased timing interval on x86_64 to reduce timing errors.
Chris@19 13
Chris@19 14 * Default to Win32 threads, not pthreads, if both are present.
Chris@19 15
Chris@19 16 * Various build-script fixes.
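
The fftw_alignment_of entry above is about deciding whether two arrays are "equally aligned". As a rough illustration only, the idea can be sketched as comparing pointer addresses modulo a SIMD width. The helper names `alignment_mod16` and `equally_aligned` are hypothetical, the 16-byte width is an assumption borrowed from the 16-byte alignment mentioned in the FFTW 3.3 notes, and this is not FFTW's implementation.

```c
#include <stdint.h>

/* Hypothetical helper, not part of FFTW: the address of p modulo an
   assumed 16-byte SIMD alignment (0 means "16-byte aligned"). */
int alignment_mod16(const void *p)
{
    return (int)((uintptr_t)p % 16u);
}

/* In this sketch two arrays count as "equally aligned" when their
   residues match, so the same alignment assumptions hold for both. */
int equally_aligned(const void *a, const void *b)
{
    return alignment_mod16(a) == alignment_mod16(b);
}
```

With 8-byte doubles, `buf` and `buf + 2` have the same residue while `buf` and `buf + 1` do not, which is the kind of difference such a check is meant to catch before a plan is applied to a new array.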
Chris@19 17
Chris@19 18 FFTW 3.3.3
Chris@19 19
Chris@19 20 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
Chris@19 21 bug report and patch, and to Graham Dennis for the bug report).
Chris@19 22
Chris@19 23 * Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
Chris@19 24 appears to speed up even ARM processors with a 64-bit NEON pipe.
Chris@19 25
Chris@19 26 * Speed improvements for single-precision AVX.
Chris@19 27
Chris@19 28 * Speed up planner on machines without "official" cycle counters, such as ARM.
Chris@19 29
Chris@19 30 FFTW 3.3.2
Chris@19 31
Chris@19 32 * Removed an archaic stack-alignment hack that was failing with
Chris@19 33 gcc-4.7/i386.
Chris@19 34
Chris@19 35 * Added stack-alignment hack necessary for gcc on Windows/i386. We
Chris@19 36 will regret this in ten years (see previous change).
Chris@19 37
Chris@19 38 * Fix incompatibility with Intel icc which pretends to be gcc
Chris@19 39 but does not support quad precision.
Chris@19 40
Chris@19 41 * make libfftw{threads,mpi} depend upon libfftw when using libtool;
Chris@19 42 this is consistent with most other libraries and simplifies the life
Chris@19 43 of various distributors of GNU/Linux.
Chris@19 44
Chris@19 45 FFTW 3.3.1
Chris@19 46
Chris@19 47 * Changes since 3.3.1-beta1:
Chris@19 48
Chris@19 49 - Reduced planning time in estimate mode for sizes with large
Chris@19 50 prime factors.
Chris@19 51
Chris@19 52 - Added AVX autodetection under Visual Studio. Thanks Carsten
Chris@19 53 Steger for submitting the necessary code.
Chris@19 54
Chris@19 55 - Modern Fortran interface now uses a separate fftw3l.f03 interface
Chris@19 56 file for the long double interface, which is not supported by
Chris@19 57 some Fortran compilers. Provided new fftw3q.f03 interface file
Chris@19 58 to access the quadruple-precision FFTW routines with recent
Chris@19 59 versions of gcc/gfortran.
Chris@19 60
Chris@19 61 * Added support for the NEON extensions to the ARM ISA. (Note to beta
Chris@19 62 users: an ARM cycle counter is not yet implemented; please contact
Chris@19 63 fftw@fftw.org if you know how to do it right.)
Chris@19 64
Chris@19 65 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
Chris@19 66 Kyle Spyksma for the bug report.
Chris@19 67
Chris@19 68 FFTW 3.3
Chris@19 69
Chris@19 70 * Changes since 3.3-beta1:
Chris@19 71
Chris@19 72 - Compiling OpenMP support (--enable-openmp) now installs a
Chris@19 73 fftw3_omp library, instead of fftw3_threads, so that OpenMP
Chris@19 74 and POSIX threads (--enable-threads) libraries can be built
Chris@19 75 and installed at the same time.
Chris@19 76
Chris@19 77 - Various minor compilation fixes, corrections of manual typos, and
Chris@19 78 improvements to the benchmark test program.
Chris@19 79
Chris@19 80 * Add support for the AVX extensions to x86 and x86-64. The AVX code
Chris@19 81 works with 16-byte alignment (as opposed to 32-byte alignment),
Chris@19 82 so there is no ABI change compared to FFTW 3.2.2.
Chris@19 83
Chris@19 84 * Added Fortran 2003 interface, which should be usable on most modern
Chris@19 85 Fortran compilers (e.g. gfortran) and provides type-checked access
Chris@19 86 to the C FFTW interface. (The legacy Fortran-77 interface is
Chris@19 87 still included also.)
Chris@19 88
Chris@19 89 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
Chris@19 90 the major changes in the MPI transforms are:
Chris@19 91 - Fixed some deadlock and crashing bugs.
Chris@19 92 - Added Fortran 2003 interface.
Chris@19 93 - Added new-array execute functions for MPI plans.
Chris@19 94 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
Chris@19 95 thanks to Jonathan Bentz for the bug report.
Chris@19 96 - Expanded documentation.
Chris@19 97 - 'make check' now runs MPI tests
Chris@19 98 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
Chris@19 99
Chris@19 100 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
Chris@19 101 x86-64, and Itanium). The new routines use the fftwq_ prefix.
Chris@19 102
Chris@19 103 * Removed support for MIPS paired-single instructions due to lack of
Chris@19 104 available hardware for testing. Users who want this functionality
Chris@19 105 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
Chris@19 106 on MIPS; this only concerns special instructions available on some
Chris@19 107 MIPS chips.)
Chris@19 108
Chris@19 109 * Removed support for the Cell Broadband Engine. Cell users should
Chris@19 110 use FFTW 3.2.x.
Chris@19 111
Chris@19 112 * New convenience functions fftw_alloc_real and fftw_alloc_complex
Chris@19 113 to use fftw_malloc for real and complex arrays without typecasts
Chris@19 114 or sizeof.
Chris@19 115
Chris@19 116 * New convenience functions fftw_export_wisdom_to_filename and
Chris@19 117 fftw_import_wisdom_from_filename that export/import wisdom
Chris@19 118 to a file, which don't require you to open/close the file yourself.
Chris@19 119
Chris@19 120 * New function fftw_cost to return FFTW's internal cost metric for
Chris@19 121 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
Chris@19 122 suggestion.
Chris@19 123
Chris@19 124 * The --enable-sse2 configure flag now works in both double and single
Chris@19 125 precision (and is equivalent to --enable-sse in the latter case).
Chris@19 126
Chris@19 127 * Remove --enable-portable-binary flag: we now produce portable binaries
Chris@19 128 by default.
Chris@19 129
Chris@19 130 * Remove the automatic detection of native architecture flag for gcc
Chris@19 131 which was introduced in fftw-3.1, since newer gcc supports -mtune=native.
Chris@19 132 Remove the --with-gcc-arch flag; if you want to specify a particular
Chris@19 133 arch to configure, use ./configure CC="gcc -mtune=...".
Chris@19 134
Chris@19 135 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
Chris@19 136
Chris@19 137 * Fixed build failure when srand48 declaration is missing;
Chris@19 138 thanks to Ralf Wildenhues for the bug report.
Chris@19 139
Chris@19 140 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
Chris@19 141 is equivalent to no timelimit in all cases. Thanks to William Andrew
Chris@19 142 Burnson for the bug report.
Chris@19 143
Chris@19 144 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
Chris@19 145 too large a buffer.
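
The fftw_alloc_real and fftw_alloc_complex entry above describes wrappers that remove the typecast and sizeof from allocation calls. A minimal sketch of that wrapper pattern follows. `alloc_real_sketch` is a hypothetical name, and it uses plain malloc as a stand-in, so unlike fftw_malloc it makes no promise about SIMD-friendly alignment. The overflow guard is an addition for illustration and is not claimed to match FFTW's behavior.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the "allocate n reals" convenience pattern.
   The caller passes an element count; the cast and sizeof live here. */
double *alloc_real_sketch(size_t n)
{
    /* Refuse counts whose byte size would overflow size_t. */
    if (n > SIZE_MAX / sizeof(double))
        return NULL;
    return malloc(n * sizeof(double));
}
```

A caller would write `double *in = alloc_real_sketch(n);` and later `free(in);`. With the real library the matching pair would be fftw_alloc_real and fftw_free.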
Chris@19 146
Chris@19 147 FFTW 3.2.2
Chris@19 148
Chris@19 149 * Improve performance of some copy operations of complex arrays on
Chris@19 150 x86 machines.
Chris@19 151
Chris@19 152 * Add configure flag to disable alloca(), which is broken in mingw64.
Chris@19 153
Chris@19 154 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
Chris@19 155 between fftw-3.1.3 and 3.2. This regression has now been fixed.
Chris@19 156
Chris@19 157 FFTW 3.2.1
Chris@19 158
Chris@19 159 * Performance improvements for some multidimensional r2c/c2r transforms;
Chris@19 160 thanks to Eugene Miloslavsky for his benchmark reports.
Chris@19 161
Chris@19 162 * Compile with icc on MacOS X, use better icc compiler flags.
Chris@19 163
Chris@19 164 * Compilation fixes for systems where snprintf is defined as a macro;
Chris@19 165 thanks to Marcus Mae for the bug report.
Chris@19 166
Chris@19 167 * Fortran documentation now recommends not using dfftw_execute,
Chris@19 168 because of reports of problems with various Fortran compilers;
Chris@19 169 it is better to use dfftw_execute_dft etcetera.
Chris@19 170
Chris@19 171 * Some documentation clarifications, e.g. of the fact that --enable-openmp
Chris@19 172 and --enable-threads are mutually exclusive (thanks to Long To),
Chris@19 173 and document slightly odd behavior of plan_guru_r2r in Fortran
Chris@19 174 (thanks to Alexander Pozdneev).
Chris@19 175
Chris@19 176 * FAQ was accidentally omitted from 3.2 tarball.
Chris@19 177
Chris@19 178 * Remove some extraneous (harmless) files accidentally included in
Chris@19 179 a subdirectory of the 3.2 tarball.
Chris@19 180
Chris@19 181 FFTW 3.2
Chris@19 182
Chris@19 183 * Worked around apparent glibc bug that leads to rare hangs when freeing
Chris@19 184 semaphores.
Chris@19 185
Chris@19 186 * Fixed segfault due to unaligned access in certain obscure problems
Chris@19 187 that use SSE and multiple threads.
Chris@19 188
Chris@19 189 * MPI transforms not included, as they are still in alpha; the alpha
Chris@19 190 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
Chris@19 191
Chris@19 192 FFTW 3.2alpha3
Chris@19 193
Chris@19 194 * Performance improvements for sizes with factors of 5 and 10.
Chris@19 195
Chris@19 196 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Chris@19 197 Emmenlauer and Phil Dumont.
Chris@19 198
Chris@19 199 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
Chris@19 200
Chris@19 201 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
Chris@19 202 for the suggestions.
Chris@19 203
Chris@19 204 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
Chris@19 205 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
Chris@19 206
Chris@19 207 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
Chris@19 208 from working in single precision (thanks to Eric A. Borisch for the report).
Chris@19 209
Chris@19 210 * Added 'make check' for MPI code (which still fails in a couple corner
Chris@19 211 cases, but should be much better than in alpha2).
Chris@19 212
Chris@19 213 * Many other small fixes.
Chris@19 214
Chris@19 215 FFTW 3.2alpha2
Chris@19 216
Chris@19 217 * Support for the Cell processor, donated by IBM Research; see README.Cell
Chris@19 218 and the Cell section of the manual.
Chris@19 219
Chris@19 220 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
Chris@19 221 function with the same semantics, but which takes fftw_iodim64 instead of
Chris@19 222 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
Chris@19 223 ptrdiff_t integer types as parameters, which is a 64-bit type on
Chris@19 224 64-bit machines. This is only useful for specifying very large transforms
Chris@19 225 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
Chris@19 226 regardless of what API you choose.)
Chris@19 227
Chris@19 228 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
Chris@19 229 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
Chris@19 230 distributed transpose operations, with 1d block distributions.
Chris@19 231 (This is an alpha preview: routines have not been exhaustively
Chris@19 232 tested, documentation is incomplete, and some functionality is
Chris@19 233 missing, e.g. Fortran support.) See mpi/README and also the MPI
Chris@19 234 section of the manual.
Chris@19 235
Chris@19 236 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
Chris@19 237
Chris@19 238 * Rewritten multi-threaded support for better performance by
Chris@19 239 re-using a fixed pool of threads rather than continually
Chris@19 240 respawning and joining (which nowadays is much slower).
Chris@19 241
Chris@19 242 * Support for MIPS paired-single SIMD instructions, donated by
Chris@19 243 Codesourcery.
Chris@19 244
Chris@19 245 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
Chris@19 246 available and return NULL otherwise.
Chris@19 247
Chris@19 248 * Removed k7 support, which only worked in 32-bit mode and is
Chris@19 249 becoming obsolete. Use --enable-sse instead.
Chris@19 250
Chris@19 251 * Added --with-g77-wrappers configure option to force inclusion
Chris@19 252 of g77 wrappers, in addition to whatever is needed for the
Chris@19 253 detected Fortran compilers. This is mainly intended for GNU/Linux
Chris@19 254 distros switching to gfortran that wish to include both
Chris@19 255 gfortran and g77 support in FFTW.
Chris@19 256
Chris@19 257 * In manual, renamed "guru execute" functions to "new-array execute"
Chris@19 258 functions, to reduce confusion with the guru planner interface.
Chris@19 259 (The programming interface is unchanged.)
Chris@19 260
Chris@19 261 * Add missing __declspec attribute to threads API functions when compiling
Chris@19 262 for Windows; thanks to Robert O. Morris for the bug report.
Chris@19 263
Chris@19 264 * Fixed missing return value from dfftw_init_threads in Fortran;
Chris@19 265 thanks to Markus Wetzstein for the bug report.
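
The 64-bit API entry above says fftw_iodim64 is the same as fftw_iodim except that its members are ptrdiff_t. The sketch below mirrors that relationship with two hypothetical struct types (`iodim_int_sketch` and `iodim_ptr_sketch`, using the n/is/os field names as an assumption about layout) and a hypothetical `narrow_iodim` helper that reports when a dimension no longer fits the int-based form.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the int-based and ptrdiff_t-based dimension
   descriptors: n is the length, is/os the input/output strides. */
typedef struct { int n, is, os; } iodim_int_sketch;
typedef struct { ptrdiff_t n, is, os; } iodim_ptr_sketch;

/* Copies *in to *out and returns 1 when every field fits in an int;
   returns 0 (leaving *out untouched) when narrowing would lose data. */
int narrow_iodim(const iodim_ptr_sketch *in, iodim_int_sketch *out)
{
    if (in->n < 0 || in->n > INT_MAX)
        return 0;
    if (in->is < INT_MIN || in->is > INT_MAX)
        return 0;
    if (in->os < INT_MIN || in->os > INT_MAX)
        return 0;
    out->n = (int)in->n;
    out->is = (int)in->is;
    out->os = (int)in->os;
    return 1;
}
```

On a platform where ptrdiff_t is wider than int, a length of INT_MAX + 1 is rejected by `narrow_iodim`, which is the case the 64-bit variants are described as covering.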
Chris@19 266
Chris@19 267 FFTW 3.1.3
Chris@19 268
Chris@19 269 * Bug fix: FFTW computes incorrect results when the user plans both
Chris@19 270 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
Chris@19 271 by incorrect sharing of twiddle-factor tables between the two
Chris@19 272 transforms, and only occurs when both are used. Thanks to Paul
Chris@19 273 A. Valiant for the bug report.
Chris@19 274
Chris@19 275 FFTW 3.1.2
Chris@19 276
Chris@19 277 * Correct bug in configure script: --enable-portable-binary option was ignored!
Chris@19 278 Thanks to Andrew Salamon for the bug report.
Chris@19 279
Chris@19 280 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
Chris@19 281 either if we are using gcc. Thanks to Guy Moebs for the bug report.
Chris@19 282
Chris@19 283 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
Chris@19 284 and suggest a workaround. configure script now detects Core/Duo arch.
Chris@19 285
Chris@19 286 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
Chris@19 287 thanks to Markus Dittrich.
Chris@19 288
Chris@19 289 FFTW 3.1.1
Chris@19 290
Chris@19 291 * Performance improvements for Intel EM64T.
Chris@19 292
Chris@19 293 * Performance improvements for large-size transforms with SIMD.
Chris@19 294
Chris@19 295 * Cycle counter support for Intel icc and Visual C++ on x86-64.
Chris@19 296
Chris@19 297 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
Chris@19 298
Chris@19 299 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
Chris@19 300
Chris@19 301 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
Chris@19 302
Chris@19 303 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
Chris@19 304 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
Chris@19 305
Chris@19 306 FFTW 3.1
Chris@19 307
Chris@19 308 * Faster FFTW_ESTIMATE planner.
Chris@19 309
Chris@19 310 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
Chris@19 311
Chris@19 312 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
Chris@19 313
Chris@19 314 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
Chris@19 315
Chris@19 316 * Faster in-place non-square transpositions (FFTW uses these internally
Chris@19 317 for in-place FFTs, and you can also perform them explicitly using
Chris@19 318 the guru interface).
Chris@19 319
Chris@19 320 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
Chris@19 321 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
Chris@19 322
Chris@19 323 * SIMD support for split complex arrays.
Chris@19 324
Chris@19 325 * Much faster Altivec/VMX performance.
Chris@19 326
Chris@19 327 * New fftw_set_timelimit function to specify a (rough) upper bound to the
Chris@19 328 planning time (does not affect ESTIMATE mode).
Chris@19 329
Chris@19 330 * Removed --enable-3dnow support; use --enable-k7 instead.
Chris@19 331
Chris@19 332 * FMA (fused multiply-add) version is now included in "standard" FFTW,
Chris@19 333 and is enabled with --enable-fma (the default on PowerPC and Itanium).
Chris@19 334
Chris@19 335 * Automatic detection of native architecture flag for gcc. New
Chris@19 336 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
Chris@19 337 for people distributing compiled binaries of FFTW (see manual).
Chris@19 338
Chris@19 339 * Automatic detection of Altivec under Linux with gcc 3.4 (so that
Chris@19 340 same binary should work on both Altivec and non-Altivec PowerPCs).
Chris@19 341
Chris@19 342 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Chris@19 343 Solaris/Intel.
Chris@19 344
Chris@19 345 * Various documentation clarifications.
Chris@19 346
Chris@19 347 * 64-bit clean. (Fixes a bug affecting the split guru planner on
Chris@19 348 64-bit machines, reported by David Necas.)
Chris@19 349
Chris@19 350 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
Chris@19 351 non-SSE machines (causing a crash) for --enable-sse binaries.
Chris@19 352
Chris@19 353 * Fixed bug that caused HC2R transforms to destroy the input in
Chris@19 354 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
Chris@19 355
Chris@19 356 * Fixed bug where wisdom would be lost under rare circumstances,
Chris@19 357 causing excessive planning time.
Chris@19 358
Chris@19 359 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
Chris@19 360
Chris@19 361 * Fixed accidentally exported symbol that prohibited simultaneous
Chris@19 362 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
Chris@19 363
Chris@19 364 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
Chris@19 365
Chris@19 366 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
Chris@19 367
Chris@19 368 * Fix build failure if no Fortran compiler is found (thanks to Charles
Chris@19 369 Radley for the bug report).
Chris@19 370
Chris@19 371 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
Chris@19 372 detection of icc architecture flag (e.g. -xW).
Chris@19 373
Chris@19 374 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
Chris@19 375
Chris@19 376 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
Chris@19 377
Chris@19 378 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
Chris@19 379 but its malloc is 16-byte aligned).
Chris@19 380
Chris@19 381 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
Chris@19 382 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
Chris@19 383 reports/fixes). Added x86-64 cycle counter for PGI compilers,
Chris@19 384 courtesy Cristiano Calonaci.
Chris@19 385
Chris@19 386 * Fix compilation problem in test program due to C99 conflict.
Chris@19 387
Chris@19 388 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
Chris@19 389 Manuel Guerrero).
Chris@19 390
Chris@19 391 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
Chris@19 392
Chris@19 393 * Work around Visual C++ (version 6/7) bug in SSE compilation;
Chris@19 394 thanks to Eddie Yee for his detailed report.
Chris@19 395
Chris@19 396 Changes from FFTW 3.1 beta 2:
Chris@19 397
Chris@19 398 * Several minor compilation fixes.
Chris@19 399
Chris@19 400 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
Chris@19 401 fftw_set_timelimit function. Make wisdom work with time-limited plans.
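
The change above replaces a global variable with a setter function. One plausible reason for that design, sketched here under assumptions, is that a setter can normalize its argument, for example treating any negative value as "no limit" (the FFTW 3.3 notes mention that a negative timelimit should mean no timelimit). All names below are hypothetical and this is not FFTW's planner code.

```c
/* Hypothetical module-private state: a negative value means "no limit". */
static double timelimit_seconds = -1.0;

/* Setter that normalizes every negative input to the same sentinel. */
void set_timelimit_sketch(double seconds)
{
    timelimit_seconds = (seconds < 0.0) ? -1.0 : seconds;
}

/* Returns 1 while a planner loop would be allowed to keep searching. */
int planning_may_continue(double elapsed_seconds)
{
    return timelimit_seconds < 0.0 || elapsed_seconds < timelimit_seconds;
}
```

After `set_timelimit_sketch(2.0)`, a search that has run for 3 seconds would stop, while after `set_timelimit_sketch(-5.0)` it would never be stopped by the limit.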
Chris@19 402
Chris@19 403 Changes from FFTW 3.1 beta 1:
Chris@19 404
Chris@19 405 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
Chris@19 406
Chris@19 407 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
Chris@19 408
Chris@19 409 * Further speed improvements for Altivec/VMX.
Chris@19 410
Chris@19 411 * Further speed improvements for non-square transpositions.
Chris@19 412
Chris@19 413 * Many minor tweaks.
Chris@19 414
Chris@19 415 FFTW 3.0.1
Chris@19 416
Chris@19 417 * Some speed improvements in SIMD code.
Chris@19 418
Chris@19 419 * --without-cycle-counter option is removed. If no cycle counter is found,
Chris@19 420 then the estimator is always used. A --with-slow-timer option is provided
Chris@19 421 to force the use of lower-resolution timers.
Chris@19 422
Chris@19 423 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
Chris@19 424
Chris@19 425 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
Chris@19 426
Chris@19 427 * Added S390 cycle counter, courtesy of James Treacy.
Chris@19 428
Chris@19 429 * Added missing static keyword that prevented simultaneous linkage
Chris@19 430 of different-precision versions; thanks to Rasmus Larsen for the bug report.
Chris@19 431
Chris@19 432 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
Chris@19 433
Chris@19 434 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
Chris@19 435
Chris@19 436 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
Chris@19 437 preprocessor limits; thanks to Peter Vouras for the bug report.
Chris@19 438
Chris@19 439 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
Chris@19 440 thanks to Nicolas Decoster for the patch.
Chris@19 441
Chris@19 442 * Added 'make smallcheck' target in tests/ directory, at the request of
Chris@19 443 James Treacy.
Chris@19 444
Chris@19 445 FFTW 3.0
Chris@19 446
Chris@19 447 Major goals of this release:
Chris@19 448
Chris@19 449 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
Chris@19 450
Chris@19 451 * Complete rewrite, to make it easier to add new algorithms and transforms.
Chris@19 452
Chris@19 453 * New API, to support more general semantics.
Chris@19 454
Chris@19 455 Other enhancements:
Chris@19 456
Chris@19 457 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
Chris@19 458 (With special thanks to Franz Franchetti for many experimental prototypes
Chris@19 459 and to Stefan Kral for the vectorizing generator from fftwgel.)
Chris@19 460
Chris@19 461 * True in-place 1d transforms of large sizes (as well as compressed
Chris@19 462 twiddle tables for additional memory/cache savings).
Chris@19 463
Chris@19 464 * More arbitrary placement of real & imaginary data, e.g. including
Chris@19 465 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
Chris@19 466
Chris@19 467 * Efficient prime-size transforms of real data.
Chris@19 468
Chris@19 469 * Multidimensional transforms can operate on a subset of a larger matrix,
Chris@19 470 and/or transform selected dimensions of a multidimensional array.
Chris@19 471
Chris@19 472 * By popular demand, simultaneous linking to double precision (fftw),
Chris@19 473 single precision (fftwf), and long-double precision (fftwl) versions
Chris@19 474 of FFTW is now supported.
Chris@19 475
Chris@19 476 * Cycle counters (on all modern CPUs) are exploited to speed planning.
Chris@19 477
Chris@19 478 * Efficient transforms of real even/odd arrays, a.k.a. discrete
Chris@19 479 cosine/sine transforms (types I-IV). (Currently these work via pre/post
Chris@19 480 processing of real transforms, à la FFTPACK, so they are not optimal.)
Chris@19 481
Chris@19 482 * DHTs (Discrete Hartley Transforms), again via post-processing
Chris@19 483 of real transforms (and thus suboptimal, for now).
Chris@19 484
Chris@19 485 * Support for linking to just those parts of FFTW that you need,
Chris@19 486 greatly reducing the size of statically linked programs when
Chris@19 487 only a limited set of transform sizes/types are required.
Chris@19 488
Chris@19 489 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
Chris@19 490 with a command-line tool (fftw-wisdom) to generate/update it.
Chris@19 491
Chris@19 492 * Fortran API can be used with both g77 and non-g77 compilers
Chris@19 493 simultaneously.
Chris@19 494
Chris@19 495 * Multi-threaded version has optional OpenMP support.
Chris@19 496
Chris@19 497 * Authors' good looks have greatly improved with age.
Chris@19 498
Chris@19 499 Changes from 3.0beta3:
Chris@19 500
Chris@19 501 * Separate FMA distribution to better exploit fused multiply-add instructions
Chris@19 502 on PowerPC (and possibly other) architectures.
Chris@19 503
Chris@19 504 * Performance improvements via some inlining tweaks.
Chris@19 505
Chris@19 506 * fftw_flops now returns double arguments, not int, to avoid overflows
Chris@19 507 for large sizes.
Chris@19 508
Chris@19 509 * Workarounds for automake bugs.
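
The fftw_flops entry above switches from int to double to avoid overflow at large sizes. The sketch below shows how quickly an operation count can outgrow an int, using the commonly quoted nominal estimate 5 N log2 N for a power-of-two size N. That formula is an assumption chosen for illustration and is not what fftw_flops reports; both function names are hypothetical.

```c
#include <limits.h>

/* Nominal 5*N*log2(N) operation estimate for a power-of-two size n,
   accumulated in double so large sizes do not overflow. */
double nominal_flops_sketch(unsigned long long n)
{
    unsigned log2n = 0;
    unsigned long long m = n;
    while (m > 1) {          /* integer log2 by repeated halving */
        m >>= 1;
        ++log2n;
    }
    return 5.0 * (double)n * (double)log2n;
}

/* Returns 1 when the estimate could still be stored in an int. */
int flops_fit_in_int(double flops)
{
    return flops <= (double)INT_MAX;
}
```

For N = 1024 the estimate is 51200 and fits comfortably, whereas for N = 2^28 it is about 3.8e10, which exceeds a 32-bit int.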
Chris@19 510
Chris@19 511 Changes from 3.0beta2:
Chris@19 512
Chris@19 513 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
Chris@19 514 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
Chris@19 515 we replaced it with a slower routine that is more accurate.
Chris@19 516
Chris@19 517 * The guru planner and execute functions now have two variants, one that
Chris@19 518 takes complex arguments and one that takes separate real/imag pointers.
Chris@19 519
Chris@19 520 * Execute and planner routines now automatically align the stack on x86,
Chris@19 521 in case the calling program is misaligned.
Chris@19 522
Chris@19 523 * README file for test program.
Chris@19 524
Chris@19 525 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
Chris@19 526
Chris@19 527 * Eliminated internal fftw_threads_init function, which some people were
Chris@19 528 calling accidentally instead of the fftw_init_threads API function.
Chris@19 529
Chris@19 530 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
Chris@19 531
Chris@19 532 * Support AMD x86-64 SIMD and cycle counter.
Chris@19 533
Chris@19 534 * Support SSE2 intrinsics in forthcoming gcc 3.3.
Chris@19 535
Chris@19 536 Changes from 3.0beta1:
Chris@19 537
Chris@19 538 * Faster in-place 1d transforms of non-power-of-two sizes.
Chris@19 539
Chris@19 540 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
Chris@19 541 transforms.
Chris@19 542
Chris@19 543 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
Chris@19 544 default distribution only includes hard-coded size-8 DCT-II/III, however.
Chris@19 545
Chris@19 546 * Many minor improvements to the manual. Added section on using the
Chris@19 547 codelet generator to customize and enhance FFTW.
Chris@19 548
Chris@19 549 * The default 'make check' should now only take a few minutes; for more
Chris@19 550 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
Chris@19 551
Chris@19 552 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
Chris@19 553 the latter uses stdout.
Chris@19 554
Chris@19 555 * Fixed ability to compile with a C++ compiler.
Chris@19 556
Chris@19 557 * Fixed support for C99 complex type under glibc.
Chris@19 558
Chris@19 559 * Fixed problems with alloca under MinGW, AIX.
Chris@19 560
Chris@19 561 * Workaround for gcc/SPARC bug.
Chris@19 562
Chris@19 563 * Fixed multi-threaded initialization failure on IRIX due to lack of
Chris@19 564 user-accessible PTHREAD_SCOPE_SYSTEM there.