FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc, which pretends to be gcc
  but does not support quad precision.

* Make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks to Carsten
    Steger for submitting the necessary code.

  - The modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided a new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  also still included.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests.
  - Some ABI changes: not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to/from a named file, without requiring you to open and close the
  file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc,
  which was introduced in fftw-3.1, since newer versions of gcc support
  -mtune=native. Remove the --with-gcc-arch flag; if you want to specify
  a particular arch to configure, use ./configure CC="gcc -mtune=...".

* The --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when the srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.

FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X; use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and documentation of the slightly odd behavior of plan_guru_r2r in
  Fortran (thanks to Alexander Pozdneev).

* The FAQ was accidentally omitted from the 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
  ptrdiff_t integer types as parameters, which is a 64-bit type on
  64-bit machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create a plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  the same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems; thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (These currently work via
  pre/post-processing of real transforms, à la FFTPACK, and so are
  not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types is required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns its operation counts through double arguments,
  not int, to avoid overflows for large sizes.

* Workarounds for automake bugs.

Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.
cannam@95 518 * Fixed support for C99 complex type under glibc.
cannam@95 519
cannam@95 520 * Fixed problems with alloca under MinGW, AIX.
cannam@95 521
cannam@95 522 * Workaround for gcc/SPARC bug.
cannam@95 523
cannam@95 524 * Fixed multi-threaded initialization failure on IRIX due to lack of
cannam@95 525 user-accessible PTHREAD_SCOPE_SYSTEM there.