annotate src/fftw-3.3.3/NEWS @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 37bf6b4a2645
children
rev   line source
Chris@10 1 FFTW 3.3.3
Chris@10 2
Chris@10 3 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
Chris@10 4 bug report and patch, and to Graham Dennis for the bug report).
Chris@10 5
Chris@10 6 * Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
Chris@10 7 appears to speed up even ARM processors with a 64-bit NEON pipe.
Chris@10 8
Chris@10 9 * Speed improvements for single-precision AVX.
Chris@10 10
Chris@10 11 * Speed up planner on machines without "official" cycle counters, such as ARM.
Chris@10 12
Chris@10 13 FFTW 3.3.2
Chris@10 14
Chris@10 15 * Removed an archaic stack-alignment hack that was failing with
Chris@10 16 gcc-4.7/i386.
Chris@10 17
Chris@10 18 * Added stack-alignment hack necessary for gcc on Windows/i386. We
Chris@10 19 will regret this in ten years (see previous change).
Chris@10 20
Chris@10 21 * Fix incompatibility with Intel icc, which pretends to be gcc
Chris@10 22 but does not support quad precision.
Chris@10 23
Chris@10 24 * Make libfftw{threads,mpi} depend upon libfftw when using libtool;
Chris@10 25 this is consistent with most other libraries and simplifies life
Chris@10 26 for various GNU/Linux distributors.
Chris@10 27
Chris@10 28 FFTW 3.3.1
Chris@10 29
Chris@10 30 * Changes since 3.3.1-beta1:
Chris@10 31
Chris@10 32 - Reduced planning time in estimate mode for sizes with large
Chris@10 33 prime factors.
Chris@10 34
Chris@10 35 - Added AVX autodetection under Visual Studio. Thanks to Carsten
Chris@10 36 Steger for submitting the necessary code.
Chris@10 37
Chris@10 38 - Modern Fortran interface now uses a separate fftw3l.f03 interface
Chris@10 39 file for the long double interface, which is not supported by
Chris@10 40 some Fortran compilers. Provided new fftw3q.f03 interface file
Chris@10 41 to access the quadruple-precision FFTW routines with recent
Chris@10 42 versions of gcc/gfortran.
Chris@10 43
Chris@10 44 * Added support for the NEON extensions to the ARM ISA. (Note to beta
Chris@10 45 users: an ARM cycle counter is not yet implemented; please contact
Chris@10 46 fftw@fftw.org if you know how to do it right.)
Chris@10 47
Chris@10 48 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
Chris@10 49 Kyle Spyksma for the bug report.
Chris@10 50
Chris@10 51 FFTW 3.3
Chris@10 52
Chris@10 53 * Changes since 3.3-beta1:
Chris@10 54
Chris@10 55 - Compiling OpenMP support (--enable-openmp) now installs a
Chris@10 56 fftw3_omp library, instead of fftw3_threads, so that OpenMP
Chris@10 57 and POSIX threads (--enable-threads) libraries can be built
Chris@10 58 and installed at the same time.
Chris@10 59
Chris@10 60 - Various minor compilation fixes, corrections of manual typos, and
Chris@10 61 improvements to the benchmark test program.
Chris@10 62
Chris@10 63 * Add support for the AVX extensions to x86 and x86-64. The AVX code
Chris@10 64 works with 16-byte alignment (as opposed to 32-byte alignment),
Chris@10 65 so there is no ABI change compared to FFTW 3.2.2.
Chris@10 66
Chris@10 67 * Added Fortran 2003 interface, which should be usable on most modern
Chris@10 68 Fortran compilers (e.g. gfortran) and provides type-checked access
Chris@10 69 to the C FFTW interface. (The legacy Fortran-77 interface is
Chris@10 70 also still included.)
Chris@10 71
Chris@10 72 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
Chris@10 73 the major changes in the MPI transforms are:
Chris@10 74 - Fixed some deadlock and crashing bugs.
Chris@10 75 - Added Fortran 2003 interface.
Chris@10 76 - Added new-array execute functions for MPI plans.
Chris@10 77 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
Chris@10 78 thanks to Jonathan Bentz for the bug report.
Chris@10 79 - Expanded documentation.
Chris@10 80 - 'make check' now runs MPI tests.
Chris@10 81 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
Chris@10 82
Chris@10 83 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
Chris@10 84 x86-64, and Itanium). The new routines use the fftwq_ prefix.
Chris@10 85
Chris@10 86 * Removed support for MIPS paired-single instructions due to lack of
Chris@10 87 available hardware for testing. Users who want this functionality
Chris@10 88 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
Chris@10 89 on MIPS; this only concerns special instructions available on some
Chris@10 90 MIPS chips.)
Chris@10 91
Chris@10 92 * Removed support for the Cell Broadband Engine. Cell users should
Chris@10 93 use FFTW 3.2.x.
Chris@10 94
Chris@10 95 * New convenience functions fftw_alloc_real and fftw_alloc_complex
Chris@10 96 to use fftw_malloc for real and complex arrays without typecasts
Chris@10 97 or sizeof.
Chris@10 98
Chris@10 99 * New convenience functions fftw_export_wisdom_to_filename and
Chris@10 100 fftw_import_wisdom_from_filename that export/import wisdom
Chris@10 101 to/from a named file, so you don't have to open/close the file yourself.
Chris@10 102
Chris@10 103 * New function fftw_cost to return FFTW's internal cost metric for
Chris@10 104 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
Chris@10 105 suggestion.
Chris@10 106
Chris@10 107 * The --enable-sse2 configure flag now works in both double and single
Chris@10 108 precision (and is equivalent to --enable-sse in the latter case).
Chris@10 109
Chris@10 110 * Remove --enable-portable-binary flag: we now produce portable binaries
Chris@10 111 by default.
Chris@10 112
Chris@10 113 * Remove the automatic detection of the native architecture flag for gcc,
Chris@10 114 which was introduced in fftw-3.1, since newer gcc supports -mtune=native.
Chris@10 115 Remove the --with-gcc-arch flag; if you want to specify a particular
Chris@10 116 arch to configure, use ./configure CC="gcc -mtune=...".
Chris@10 117
Chris@10 118 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
Chris@10 119
Chris@10 120 * Fixed build failure when the srand48 declaration is missing;
Chris@10 121 thanks to Ralf Wildenhues for the bug report.
Chris@10 122
Chris@10 123 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
Chris@10 124 is equivalent to no timelimit in all cases. Thanks to William Andrew
Chris@10 125 Burnson for the bug report.
Chris@10 126
Chris@10 127 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
Chris@10 128 too large a buffer.
Chris@10 129
Chris@10 130 FFTW 3.2.2
Chris@10 131
Chris@10 132 * Improve performance of some copy operations of complex arrays on
Chris@10 133 x86 machines.
Chris@10 134
Chris@10 135 * Add configure flag to disable alloca(), which is broken in mingw64.
Chris@10 136
Chris@10 137 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
Chris@10 138 between fftw-3.1.3 and 3.2. This regression has now been fixed.
Chris@10 139
Chris@10 140 FFTW 3.2.1
Chris@10 141
Chris@10 142 * Performance improvements for some multidimensional r2c/c2r transforms;
Chris@10 143 thanks to Eugene Miloslavsky for his benchmark reports.
Chris@10 144
Chris@10 145 * Compile with icc on MacOS X, use better icc compiler flags.
Chris@10 146
Chris@10 147 * Compilation fixes for systems where snprintf is defined as a macro;
Chris@10 148 thanks to Marcus Mae for the bug report.
Chris@10 149
Chris@10 150 * Fortran documentation now recommends not using dfftw_execute,
Chris@10 151 because of reports of problems with various Fortran compilers;
Chris@10 152 it is better to use dfftw_execute_dft etcetera.
Chris@10 153
Chris@10 154 * Some documentation clarifications, e.g. of the fact that --enable-openmp
Chris@10 155 and --enable-threads are mutually exclusive (thanks to Long To),
Chris@10 156 and document slightly odd behavior of plan_guru_r2r in Fortran
Chris@10 157 (thanks to Alexander Pozdneev).
Chris@10 158
Chris@10 159 * FAQ was accidentally omitted from 3.2 tarball.
Chris@10 160
Chris@10 161 * Remove some extraneous (harmless) files accidentally included in
Chris@10 162 a subdirectory of the 3.2 tarball.
Chris@10 163
Chris@10 164 FFTW 3.2
Chris@10 165
Chris@10 166 * Worked around apparent glibc bug that leads to rare hangs when freeing
Chris@10 167 semaphores.
Chris@10 168
Chris@10 169 * Fixed segfault due to unaligned access in certain obscure problems
Chris@10 170 that use SSE and multiple threads.
Chris@10 171
Chris@10 172 * MPI transforms not included, as they are still in alpha; the alpha
Chris@10 173 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
Chris@10 174
Chris@10 175 FFTW 3.2alpha3
Chris@10 176
Chris@10 177 * Performance improvements for sizes with factors of 5 and 10.
Chris@10 178
Chris@10 179 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Chris@10 180 Emmenlauer and Phil Dumont.
Chris@10 181
Chris@10 182 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
Chris@10 183
Chris@10 184 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
Chris@10 185 for the suggestions.
Chris@10 186
Chris@10 187 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
Chris@10 188 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
Chris@10 189
Chris@10 190 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
Chris@10 191 from working in single precision (thanks to Eric A. Borisch for the report).
Chris@10 192
Chris@10 193 * Added 'make check' for MPI code (which still fails in a couple of corner
Chris@10 194 cases, but should be much better than in alpha2).
Chris@10 195
Chris@10 196 * Many other small fixes.
Chris@10 197
Chris@10 198 FFTW 3.2alpha2
Chris@10 199
Chris@10 200 * Support for the Cell processor, donated by IBM Research; see README.Cell
Chris@10 201 and the Cell section of the manual.
Chris@10 202
Chris@10 203 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
Chris@10 204 function with the same semantics, but which takes fftw_iodim64 instead of
Chris@10 205 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that its
Chris@10 206 fields have the ptrdiff_t integer type, which is a 64-bit type on
Chris@10 207 64-bit machines. This is only useful for specifying very large transforms
Chris@10 208 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
Chris@10 209 regardless of what API you choose.)
Chris@10 210
Chris@10 211 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
Chris@10 212 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
Chris@10 213 distributed transpose operations, with 1d block distributions.
Chris@10 214 (This is an alpha preview: routines have not been exhaustively
Chris@10 215 tested, documentation is incomplete, and some functionality is
Chris@10 216 missing, e.g. Fortran support.) See mpi/README and also the MPI
Chris@10 217 section of the manual.
Chris@10 218
Chris@10 219 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
Chris@10 220
Chris@10 221 * Rewritten multi-threaded support for better performance by
Chris@10 222 re-using a fixed pool of threads rather than continually
Chris@10 223 respawning and joining (which nowadays is much slower).
Chris@10 224
Chris@10 225 * Support for MIPS paired-single SIMD instructions, donated by
Chris@10 226 Codesourcery.
Chris@10 227
Chris@10 228 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
Chris@10 229 available and return NULL otherwise.
Chris@10 230
Chris@10 231 * Removed k7 support, which only worked in 32-bit mode and is
Chris@10 232 becoming obsolete. Use --enable-sse instead.
Chris@10 233
Chris@10 234 * Added --with-g77-wrappers configure option to force inclusion
Chris@10 235 of g77 wrappers, in addition to whatever is needed for the
Chris@10 236 detected Fortran compilers. This is mainly intended for GNU/Linux
Chris@10 237 distros switching to gfortran that wish to include both
Chris@10 238 gfortran and g77 support in FFTW.
Chris@10 239
Chris@10 240 * In manual, renamed "guru execute" functions to "new-array execute"
Chris@10 241 functions, to reduce confusion with the guru planner interface.
Chris@10 242 (The programming interface is unchanged.)
Chris@10 243
Chris@10 244 * Add missing __declspec attribute to threads API functions when compiling
Chris@10 245 for Windows; thanks to Robert O. Morris for the bug report.
Chris@10 246
Chris@10 247 * Fixed missing return value from dfftw_init_threads in Fortran;
Chris@10 248 thanks to Markus Wetzstein for the bug report.
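To show when the plan_guru64 variants are actually needed, here is a minimal sketch, assuming a 64-bit ptrdiff_t. It deliberately avoids fftw3.h: iodim64_sketch is a hypothetical local stand-in with the same three fields (n, is, os) as fftw_iodim64, and fits_int/dims_fit_int_api are illustrative helpers, not FFTW functions. A dimension list can go through the int-based fftw_iodim interface only if every size and stride fits in an int; a 2^31-point 1d transform already fails that test.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in with the same fields as fftw_iodim64:
   size n, input stride is, output stride os (all ptrdiff_t). */
typedef struct {
    ptrdiff_t n, is, os;
} iodim64_sketch;

/* Nonzero if v is representable in an int field. */
static int fits_int(ptrdiff_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Nonzero if every dimension could also be described through the
   int-based fftw_iodim interface without truncation. */
int dims_fit_int_api(const iodim64_sketch *dims, int rank)
{
    for (int i = 0; i < rank; i++)
        if (!fits_int(dims[i].n) || !fits_int(dims[i].is) ||
            !fits_int(dims[i].os))
            return 0;
    return 1;
}
```

In real code the 64-bit descriptors would be passed to a function such as fftw_plan_guru64_dft; this sketch only models the size check that motivates it.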
Chris@10 249
Chris@10 250 FFTW 3.1.1
Chris@10 251
Chris@10 252 * Performance improvements for Intel EM64T.
Chris@10 253
Chris@10 254 * Performance improvements for large-size transforms with SIMD.
Chris@10 255
Chris@10 256 * Cycle counter support for Intel icc and Visual C++ on x86-64.
Chris@10 257
Chris@10 258 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
Chris@10 259
Chris@10 260 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
Chris@10 261
Chris@10 262 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
Chris@10 263
Chris@10 264 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
Chris@10 265 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
Chris@10 266
Chris@10 267 FFTW 3.1
Chris@10 268
Chris@10 269 * Faster FFTW_ESTIMATE planner.
Chris@10 270
Chris@10 271 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
Chris@10 272
Chris@10 273 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
Chris@10 274
Chris@10 275 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
Chris@10 276
Chris@10 277 * Faster in-place non-square transpositions (FFTW uses these internally
Chris@10 278 for in-place FFTs, and you can also perform them explicitly using
Chris@10 279 the guru interface).
Chris@10 280
Chris@10 281 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
Chris@10 282 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
Chris@10 283
Chris@10 284 * SIMD support for split complex arrays.
Chris@10 285
Chris@10 286 * Much faster Altivec/VMX performance.
Chris@10 287
Chris@10 288 * New fftw_set_timelimit function to specify a (rough) upper bound on the
Chris@10 289 planning time (does not affect ESTIMATE mode).
Chris@10 290
Chris@10 291 * Removed --enable-3dnow support; use --enable-k7 instead.
Chris@10 292
Chris@10 293 * FMA (fused multiply-add) version is now included in "standard" FFTW,
Chris@10 294 and is enabled with --enable-fma (the default on PowerPC and Itanium).
Chris@10 295
Chris@10 296 * Automatic detection of native architecture flag for gcc. New
Chris@10 297 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
Chris@10 298 for people distributing compiled binaries of FFTW (see manual).
Chris@10 299
Chris@10 300 * Automatic detection of Altivec under Linux with gcc 3.4 (so that the
Chris@10 301 same binary should work on both Altivec and non-Altivec PowerPCs).
Chris@10 302
Chris@10 303 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Chris@10 304 Solaris/Intel.
Chris@10 305
Chris@10 306 * Various documentation clarifications.
Chris@10 307
Chris@10 308 * 64-bit clean. (Fixes a bug affecting the split guru planner on
Chris@10 309 64-bit machines, reported by David Necas.)
Chris@10 310
Chris@10 311 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
Chris@10 312 non-SSE machines (causing a crash) for --enable-sse binaries.
Chris@10 313
Chris@10 314 * Fixed bug that caused HC2R transforms to destroy the input in
Chris@10 315 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
Chris@10 316
Chris@10 317 * Fixed bug where wisdom would be lost under rare circumstances,
Chris@10 318 causing excessive planning time.
Chris@10 319
Chris@10 320 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
Chris@10 321
Chris@10 322 * Fixed accidentally exported symbol that prevented simultaneous
Chris@10 323 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
Chris@10 324
Chris@10 325 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
Chris@10 326
Chris@10 327 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
Chris@10 328
Chris@10 329 * Fix build failure if no Fortran compiler is found (thanks to Charles
Chris@10 330 Radley for the bug report).
Chris@10 331
Chris@10 332 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
Chris@10 333 detection of icc architecture flag (e.g. -xW).
Chris@10 334
Chris@10 335 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
Chris@10 336
Chris@10 337 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
Chris@10 338
Chris@10 339 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
Chris@10 340 but its malloc is 16-byte aligned).
Chris@10 341
Chris@10 342 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
Chris@10 343 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
Chris@10 344 reports/fixes). Added x86-64 cycle counter for PGI compilers,
Chris@10 345 courtesy Cristiano Calonaci.
Chris@10 346
Chris@10 347 * Fix compilation problem in test program due to C99 conflict.
Chris@10 348
Chris@10 349 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
Chris@10 350 Manuel Guerrero).
Chris@10 351
Chris@10 352 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
Chris@10 353
Chris@10 354 * Work around Visual C++ (version 6/7) bug in SSE compilation;
Chris@10 355 thanks to Eddie Yee for his detailed report.
Chris@10 356
Chris@10 357 Changes from FFTW 3.1 beta 2:
Chris@10 358
Chris@10 359 * Several minor compilation fixes.
Chris@10 360
Chris@10 361 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
Chris@10 362 fftw_set_timelimit function. Make wisdom work with time-limited plans.
Chris@10 363
Chris@10 364 Changes from FFTW 3.1 beta 1:
Chris@10 365
Chris@10 366 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
Chris@10 367
Chris@10 368 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
Chris@10 369
Chris@10 370 * Further speed improvements for Altivec/VMX.
Chris@10 371
Chris@10 372 * Further speed improvements for non-square transpositions.
Chris@10 373
Chris@10 374 * Many minor tweaks.
Chris@10 375
Chris@10 376 FFTW 3.0.1
Chris@10 377
Chris@10 378 * Some speed improvements in SIMD code.
Chris@10 379
Chris@10 380 * --without-cycle-counter option is removed. If no cycle counter is found,
Chris@10 381 then the estimator is always used. A --with-slow-timer option is provided
Chris@10 382 to force the use of lower-resolution timers.
Chris@10 383
Chris@10 384 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
Chris@10 385
Chris@10 386 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
Chris@10 387
Chris@10 388 * Added S390 cycle counter, courtesy of James Treacy.
Chris@10 389
Chris@10 390 * Added missing static keyword that prevented simultaneous linkage
Chris@10 391 of different-precision versions; thanks to Rasmus Larsen for the bug report.
Chris@10 392
Chris@10 393 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
Chris@10 394
Chris@10 395 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
Chris@10 396
Chris@10 397 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
Chris@10 398 preprocessor limits; thanks to Peter Vouras for the bug report.
Chris@10 399
Chris@10 400 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
Chris@10 401 thanks to Nicolas Decoster for the patch.
Chris@10 402
Chris@10 403 * Added 'make smallcheck' target in tests/ directory, at the request of
Chris@10 404 James Treacy.
Chris@10 405
Chris@10 406 FFTW 3.0
Chris@10 407
Chris@10 408 Major goals of this release:
Chris@10 409
Chris@10 410 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
Chris@10 411
Chris@10 412 * Complete rewrite, to make it easier to add new algorithms and transforms.
Chris@10 413
Chris@10 414 * New API, to support more general semantics.
Chris@10 415
Chris@10 416 Other enhancements:
Chris@10 417
Chris@10 418 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
Chris@10 419 (With special thanks to Franz Franchetti for many experimental prototypes
Chris@10 420 and to Stefan Kral for the vectorizing generator from fftwgel.)
Chris@10 421
Chris@10 422 * True in-place 1d transforms of large sizes (as well as compressed
Chris@10 423 twiddle tables for additional memory/cache savings).
Chris@10 424
Chris@10 425 * More arbitrary placement of real & imaginary data, e.g. including
Chris@10 426 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
Chris@10 427
Chris@10 428 * Efficient prime-size transforms of real data.
Chris@10 429
Chris@10 430 * Multidimensional transforms can operate on a subset of a larger matrix,
Chris@10 431 and/or transform selected dimensions of a multidimensional array.
Chris@10 432
Chris@10 433 * By popular demand, simultaneous linking to double precision (fftw),
Chris@10 434 single precision (fftwf), and long-double precision (fftwl) versions
Chris@10 435 of FFTW is now supported.
Chris@10 436
Chris@10 437 * Cycle counters (on all modern CPUs) are exploited to speed planning.
Chris@10 438
Chris@10 439 * Efficient transforms of real even/odd arrays, a.k.a. discrete
Chris@10 440 cosine/sine transforms (types I-IV). (These currently work via
Chris@10 441 pre/post-processing of real transforms, à la FFTPACK, so they are not optimal.)
Chris@10 442
Chris@10 443 * DHTs (Discrete Hartley Transforms), again via post-processing
Chris@10 444 of real transforms (and thus suboptimal, for now).
Chris@10 445
Chris@10 446 * Support for linking to just those parts of FFTW that you need,
Chris@10 447 greatly reducing the size of statically linked programs when
Chris@10 448 only a limited set of transform sizes/types is required.
Chris@10 449
Chris@10 450 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
Chris@10 451 with a command-line tool (fftw-wisdom) to generate/update it.
Chris@10 452
Chris@10 453 * Fortran API can be used with both g77 and non-g77 compilers
Chris@10 454 simultaneously.
Chris@10 455
Chris@10 456 * Multi-threaded version has optional OpenMP support.
Chris@10 457
Chris@10 458 * Authors' good looks have greatly improved with age.
Chris@10 459
Chris@10 460 Changes from 3.0beta3:
Chris@10 461
Chris@10 462 * Separate FMA distribution to better exploit fused multiply-add instructions
Chris@10 463 on PowerPC (and possibly other) architectures.
Chris@10 464
Chris@10 465 * Performance improvements via some inlining tweaks.
Chris@10 466
Chris@10 467 * fftw_flops now returns double arguments, not int, to avoid overflows
Chris@10 468 for large sizes.
Chris@10 469
Chris@10 470 * Workarounds for automake bugs.
Chris@10 471
Chris@10 472 Changes from 3.0beta2:
Chris@10 473
Chris@10 474 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
Chris@10 475 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
Chris@10 476 we replaced it with a slower routine that is more accurate.
Chris@10 477
Chris@10 478 * The guru planner and execute functions now have two variants, one that
Chris@10 479 takes complex arguments and one that takes separate real/imag pointers.
Chris@10 480
Chris@10 481 * Execute and planner routines now automatically align the stack on x86,
Chris@10 482 in case the calling program is misaligned.
Chris@10 483
Chris@10 484 * README file for test program.
Chris@10 485
Chris@10 486 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
Chris@10 487
Chris@10 488 * Eliminated internal fftw_threads_init function, which some people were
Chris@10 489 calling accidentally instead of the fftw_init_threads API function.
Chris@10 490
Chris@10 491 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
Chris@10 492
Chris@10 493 * Support AMD x86-64 SIMD and cycle counter.
Chris@10 494
Chris@10 495 * Support SSE2 intrinsics in forthcoming gcc 3.3.
Chris@10 496
Chris@10 497 Changes from 3.0beta1:
Chris@10 498
Chris@10 499 * Faster in-place 1d transforms of non-power-of-two sizes.
Chris@10 500
Chris@10 501 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
Chris@10 502 transforms.
Chris@10 503
Chris@10 504 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
Chris@10 505 default distribution only includes hard-coded size-8 DCT-II/III, however.
Chris@10 506
Chris@10 507 * Many minor improvements to the manual. Added section on using the
Chris@10 508 codelet generator to customize and enhance FFTW.
Chris@10 509
Chris@10 510 * The default 'make check' should now only take a few minutes; for more
Chris@10 511 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
Chris@10 512
Chris@10 513 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
Chris@10 514 the latter uses stdout.
Chris@10 515
Chris@10 516 * Fixed ability to compile with a C++ compiler.
Chris@10 517
Chris@10 518 * Fixed support for C99 complex type under glibc.
Chris@10 519
Chris@10 520 * Fixed problems with alloca under MinGW, AIX.
Chris@10 521
Chris@10 522 * Workaround for gcc/SPARC bug.
Chris@10 523
Chris@10 524 * Fixed multi-threaded initialization failure on IRIX due to lack of
Chris@10 525 user-accessible PTHREAD_SCOPE_SYSTEM there.