annotate src/fftw-3.3.5/NEWS @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents 2cd0e3b3e1fd
children
rev   line source
Chris@42 1 FFTW 3.3.5:
Chris@42 2
Chris@42 3 * New SIMD support:
Chris@42 4 - Power8 VSX instructions in single and double precision.
Chris@42 5 To use, add --enable-vsx to configure.
Chris@42 6 - Support for AVX2 (256-bit FMA instructions).
Chris@42 7 To use, add --enable-avx2 to configure.
Chris@42 8 - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
Chris@42 9 This code is expected to work but the FFTW maintainers do not have
Chris@42 10 hardware to test it.
Chris@42 11 - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
Chris@42 12 - Double precision Neon SIMD for aarch64.
Chris@42 13 This code is expected to work but the FFTW maintainers do not have
Chris@42 14 hardware to test it.
Chris@42 15 - generic SIMD support using gcc vector intrinsics
Chris@42 16 * Add fftw_make_planner_thread_safe() API
Chris@42 17 * fix #18 (disable float128 for CUDACC)
Chris@42 18 * fix #19: missing Fortran interface for fftwq_alloc_real
Chris@42 19 * fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
Chris@42 20 * fix: Avoid segfaults due to double free in MPI transpose
Chris@42 21
Chris@42 22 * Special note for distribution maintainers: Although FFTW supports a
Chris@42 23 zillion SIMD instruction sets, enabling them all at the same time is
Chris@42 24 a bad idea, because it increases the planning time for minimal gain.
Chris@42 25 We recommend that general-purpose x86 distributions only enable SSE2
Chris@42 26 and perhaps AVX. Users who care about the last ounce of performance
Chris@42 27 should recompile FFTW themselves.
Chris@42 28
Chris@42 29 FFTW 3.3.4
Chris@42 30
Chris@42 31 * New functions fftw_alignment_of (to check whether two arrays are
Chris@42 32 equally aligned for the purposes of applying a plan) and fftw_sprint_plan
Chris@42 33 (to output a description of plan to a string).
Chris@42 34
Chris@42 35 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
Chris@42 36 bug report.
Chris@42 37
Chris@42 38 * Fixed manual to work with texinfo-5.
Chris@42 39
Chris@42 40 * Increased timing interval on x86_64 to reduce timing errors.
Chris@42 41
Chris@42 42 * Default to Win32 threads, not pthreads, if both are present.
Chris@42 43
Chris@42 44 * Various build-script fixes.
Chris@42 45
Chris@42 46 FFTW 3.3.3
Chris@42 47
Chris@42 48 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
Chris@42 49 bug report and patch, and to Graham Dennis for the bug report).
Chris@42 50
Chris@42 51 * Use 128-bit ARM NEON instructions instead of 64-bits. This change
Chris@42 52 appears to speed up even ARM processors with a 64-bit NEON pipe.
Chris@42 53
Chris@42 54 * Speed improvements for single-precision AVX.
Chris@42 55
Chris@42 56 * Speed up planner on machines without "official" cycle counters, such as ARM.
Chris@42 57
Chris@42 58 FFTW 3.3.2
Chris@42 59
Chris@42 60 * Removed an archaic stack-alignment hack that was failing with
Chris@42 61 gcc-4.7/i386.
Chris@42 62
Chris@42 63 * Added stack-alignment hack necessary for gcc on Windows/i386. We
Chris@42 64 will regret this in ten years (see previous change).
Chris@42 65
Chris@42 66 * Fix incompatibility with Intel icc which pretends to be gcc
Chris@42 67 but does not support quad precision.
Chris@42 68
Chris@42 69 * make libfftw{threads,mpi} depend upon libfftw when using libtool;
Chris@42 70 this is consistent with most other libraries and simplifies the life
Chris@42 71 of various distributors of GNU/Linux.
Chris@42 72
Chris@42 73 FFTW 3.3.1
Chris@42 74
Chris@42 75 * Changes since 3.3.1-beta1:
Chris@42 76
Chris@42 77 - Reduced planning time in estimate mode for sizes with large
Chris@42 78 prime factors.
Chris@42 79
Chris@42 80 - Added AVX autodetection under Visual Studio. Thanks Carsten
Chris@42 81 Steger for submitting the necessary code.
Chris@42 82
Chris@42 83 - Modern Fortran interface now uses a separate fftw3l.f03 interface
Chris@42 84 file for the long double interface, which is not supported by
Chris@42 85 some Fortran compilers. Provided new fftw3q.f03 interface file
Chris@42 86 to access the quadruple-precision FFTW routines with recent
Chris@42 87 versions of gcc/gfortran.
Chris@42 88
Chris@42 89 * Added support for the NEON extensions to the ARM ISA. (Note to beta
Chris@42 90 users: an ARM cycle counter is not yet implemented; please contact
Chris@42 91 fftw@fftw.org if you know how to do it right.)
Chris@42 92
Chris@42 93 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
Chris@42 94 Kyle Spyksma for the bug report.
Chris@42 95
Chris@42 96 FFTW 3.3
Chris@42 97
Chris@42 98 * Changes since 3.3-beta1:
Chris@42 99
Chris@42 100 - Compiling OpenMP support (--enable-openmp) now installs a
Chris@42 101 fftw3_omp library, instead of fftw3_threads, so that OpenMP
Chris@42 102 and POSIX threads (--enable-threads) libraries can be built
Chris@42 103 and installed at the same time.
Chris@42 104
Chris@42 105 - Various minor compilation fixes, corrections of manual typos, and
Chris@42 106 improvements to the benchmark test program.
Chris@42 107
Chris@42 108 * Add support for the AVX extensions to x86 and x86-64. The AVX code
Chris@42 109 works with 16-byte alignment (as opposed to 32-byte alignment),
Chris@42 110 so there is no ABI change compared to FFTW 3.2.2.
Chris@42 111
Chris@42 112 * Added Fortran 2003 interface, which should be usable on most modern
Chris@42 113 Fortran compilers (e.g. gfortran) and provides type-checked access
Chris@42 114 to the C FFTW interface. (The legacy Fortran-77 interface is
Chris@42 115 still included also.)
Chris@42 116
Chris@42 117 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
Chris@42 118 the major changes in the MPI transforms are:
Chris@42 119 - Fixed some deadlock and crashing bugs.
Chris@42 120 - Added Fortran 2003 interface.
Chris@42 121 - Added new-array execute functions for MPI plans.
Chris@42 122 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
Chris@42 123 thanks to Jonathan Bentz for the bug report.
Chris@42 124 - Expanded documentation.
Chris@42 125 - 'make check' now runs MPI tests
Chris@42 126 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
Chris@42 127
Chris@42 128 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
Chris@42 129 x86-64, and Itanium). The new routines use the fftwq_ prefix.
Chris@42 130
Chris@42 131 * Removed support for MIPS paired-single instructions due to lack of
Chris@42 132 available hardware for testing. Users who want this functionality
Chris@42 133 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
Chris@42 134 on MIPS; this only concerns special instructions available on some
Chris@42 135 MIPS chips.)
Chris@42 136
Chris@42 137 * Removed support for the Cell Broadband Engine. Cell users should
Chris@42 138 use FFTW 3.2.x.
Chris@42 139
Chris@42 140 * New convenience functions fftw_alloc_real and fftw_alloc_complex
Chris@42 141 to use fftw_malloc for real and complex arrays without typecasts
Chris@42 142 or sizeof.
Chris@42 143
Chris@42 144 * New convenience functions fftw_export_wisdom_to_filename and
Chris@42 145 fftw_import_wisdom_from_filename that export/import wisdom
Chris@42 146 to a file, which don't require you to open/close the file yourself.
Chris@42 147
Chris@42 148 * New function fftw_cost to return FFTW's internal cost metric for
Chris@42 149 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
Chris@42 150 suggestion.
Chris@42 151
Chris@42 152 * The --enable-sse2 configure flag now works in both double and single
Chris@42 153 precision (and is equivalent to --enable-sse in the latter case).
Chris@42 154
Chris@42 155 * Remove --enable-portable-binary flag: we now produce portable binaries
Chris@42 156 by default.
Chris@42 157
Chris@42 158 * Remove the automatic detection of native architecture flag for gcc
Chris@42 159 which was introduced in fftw-3.1, since new gcc supports -mtune=native.
Chris@42 160 Remove the --with-gcc-arch flag; if you want to specify a particular
Chris@42 161 arch to configure, use ./configure CC="gcc -mtune=...".
Chris@42 162
Chris@42 163 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
Chris@42 164
Chris@42 165 * Fixed build failure when srand48 declaration is missing;
Chris@42 166 thanks to Ralf Wildenhues for the bug report.
Chris@42 167
Chris@42 168 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
Chris@42 169 is equivalent to no timelimit in all cases. Thanks to William Andrew
Chris@42 170 Burnson for the bug report.
Chris@42 171
Chris@42 172 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
Chris@42 173 too large a buffer.
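The fftw_alloc_real and fftw_alloc_complex entry in the 3.3 list above describes wrappers that hide the typecast and the sizeof arithmetic. A minimal sketch of that pattern follows. It uses plain malloc as a stand-in, so unlike fftw_malloc it makes no SIMD alignment guarantee, and the names alloc_real_sketch, alloc_complex_sketch and complex_sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for fftw_complex, which is double[2] (real, imaginary) by default. */
typedef double complex_sketch[2];

/* Hypothetical helper: n doubles, or NULL on overflow or allocation failure. */
double *alloc_real_sketch(size_t n)
{
    assert(n > 0);
    if (n > SIZE_MAX / sizeof(double))
        return NULL;
    return (double *)malloc(sizeof(double) * n);
}

/* Hypothetical helper: n complex values, or NULL on overflow or failure. */
complex_sketch *alloc_complex_sketch(size_t n)
{
    assert(n > 0);
    if (n > SIZE_MAX / sizeof(complex_sketch))
        return NULL;
    return (complex_sketch *)malloc(sizeof(complex_sketch) * n);
}
```

The caller writes alloc_real_sketch(n) rather than (double *)malloc(sizeof(double) * n) and releases the memory with free. With the real library the matching release call is fftw_free.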
Chris@42 174
Chris@42 175 FFTW 3.2.2
Chris@42 176
Chris@42 177 * Improve performance of some copy operations of complex arrays on
Chris@42 178 x86 machines.
Chris@42 179
Chris@42 180 * Add configure flag to disable alloca(), which is broken in mingw64.
Chris@42 181
Chris@42 182 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
Chris@42 183 between fftw-3.1.3 and 3.2. This regression has now been fixed.
Chris@42 184
Chris@42 185 FFTW 3.2.1
Chris@42 186
Chris@42 187 * Performance improvements for some multidimensional r2c/c2r transforms;
Chris@42 188 thanks to Eugene Miloslavsky for his benchmark reports.
Chris@42 189
Chris@42 190 * Compile with icc on MacOS X, use better icc compiler flags.
Chris@42 191
Chris@42 192 * Compilation fixes for systems where snprintf is defined as a macro;
Chris@42 193 thanks to Marcus Mae for the bug report.
Chris@42 194
Chris@42 195 * Fortran documentation now recommends not using dfftw_execute,
Chris@42 196 because of reports of problems with various Fortran compilers;
Chris@42 197 it is better to use dfftw_execute_dft etcetera.
Chris@42 198
Chris@42 199 * Some documentation clarifications, e.g. of fact that --enable-openmp
Chris@42 200 and --enable-threads are mutually exclusive (thanks to Long To),
Chris@42 201 and document slightly odd behavior of plan_guru_r2r in Fortran
Chris@42 202 (thanks to Alexander Pozdneev).
Chris@42 203
Chris@42 204 * FAQ was accidentally omitted from 3.2 tarball.
Chris@42 205
Chris@42 206 * Remove some extraneous (harmless) files accidentally included in
Chris@42 207 a subdirectory of the 3.2 tarball.
Chris@42 208
Chris@42 209 FFTW 3.2
Chris@42 210
Chris@42 211 * Worked around apparent glibc bug that leads to rare hangs when freeing
Chris@42 212 semaphores.
Chris@42 213
Chris@42 214 * Fixed segfault due to unaligned access in certain obscure problems
Chris@42 215 that use SSE and multiple threads.
Chris@42 216
Chris@42 217 * MPI transforms not included, as they are still in alpha; the alpha
Chris@42 218 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
Chris@42 219
Chris@42 220 FFTW 3.2alpha3
Chris@42 221
Chris@42 222 * Performance improvements for sizes with factors of 5 and 10.
Chris@42 223
Chris@42 224 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Chris@42 225 Emmenlauer and Phil Dumont.
Chris@42 226
Chris@42 227 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
Chris@42 228
Chris@42 229 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
Chris@42 230 for the suggestions.
Chris@42 231
Chris@42 232 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
Chris@42 233 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
Chris@42 234
Chris@42 235 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
Chris@42 236 from working in single precision (thanks to Eric A. Borisch for the report).
Chris@42 237
Chris@42 238 * Added 'make check' for MPI code (which still fails in a couple corner
Chris@42 239 cases, but should be much better than in alpha2).
Chris@42 240
Chris@42 241 * Many other small fixes.
Chris@42 242
Chris@42 243 FFTW 3.2alpha2
Chris@42 244
Chris@42 245 * Support for the Cell processor, donated by IBM Research; see README.Cell
Chris@42 246 and the Cell section of the manual.
Chris@42 247
Chris@42 248 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
Chris@42 249 function with the same semantics, but which takes fftw_iodim64 instead of
Chris@42 250 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
Chris@42 251 ptrdiff_t integer types as parameters, which is a 64-bit type on
Chris@42 252 64-bit machines. This is only useful for specifying very large transforms
Chris@42 253 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
Chris@42 254 regardless of what API you choose.)
Chris@42 255
Chris@42 256 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
Chris@42 257 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
Chris@42 258 distributed transpose operations, with 1d block distributions.
Chris@42 259 (This is an alpha preview: routines have not been exhaustively
Chris@42 260 tested, documentation is incomplete, and some functionality is
Chris@42 261 missing, e.g. Fortran support.) See mpi/README and also the MPI
Chris@42 262 section of the manual.
Chris@42 263
Chris@42 264 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
Chris@42 265
Chris@42 266 * Rewritten multi-threaded support for better performance by
Chris@42 267 re-using a fixed pool of threads rather than continually
Chris@42 268 respawning and joining (which nowadays is much slower).
Chris@42 269
Chris@42 270 * Support for MIPS paired-single SIMD instructions, donated by
Chris@42 271 Codesourcery.
Chris@42 272
Chris@42 273 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
Chris@42 274 available and return NULL otherwise.
Chris@42 275
Chris@42 276 * Removed k7 support, which only worked in 32-bit mode and is
Chris@42 277 becoming obsolete. Use --enable-sse instead.
Chris@42 278
Chris@42 279 * Added --with-g77-wrappers configure option to force inclusion
Chris@42 280 of g77 wrappers, in addition to whatever is needed for the
Chris@42 281 detected Fortran compilers. This is mainly intended for GNU/Linux
Chris@42 282 distros switching to gfortran that wish to include both
Chris@42 283 gfortran and g77 support in FFTW.
Chris@42 284
Chris@42 285 * In manual, renamed "guru execute" functions to "new-array execute"
Chris@42 286 functions, to reduce confusion with the guru planner interface.
Chris@42 287 (The programming interface is unchanged.)
Chris@42 288
Chris@42 289 * Add missing __declspec attribute to threads API functions when compiling
Chris@42 290 for Windows; thanks to Robert O. Morris for the bug report.
Chris@42 291
Chris@42 292 * Fixed missing return value from dfftw_init_threads in Fortran;
Chris@42 293 thanks to Markus Wetzstein for the bug report.
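The 64-bit API entry in the 3.2alpha2 list above says that fftw_iodim64 differs from fftw_iodim only in using ptrdiff_t for its fields. The sketch below mirrors that shape in a hypothetical iodim64_sketch struct, fills it for a contiguous row-major array, and tests whether any length or stride would overflow the int fields of the plain structure. The helper names are assumptions for illustration, not library functions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mirrors the shape of fftw_iodim64: length, input stride, output stride. */
typedef struct {
    ptrdiff_t n;
    ptrdiff_t is;
    ptrdiff_t os;
} iodim64_sketch;

/* Fill dims for a contiguous row-major array; strides are in elements. */
void row_major_dims(int rank, const ptrdiff_t *sizes, iodim64_sketch *dims)
{
    ptrdiff_t stride = 1;
    assert(rank > 0);
    for (int i = rank - 1; i >= 0; --i) {
        dims[i].n = sizes[i];
        dims[i].is = stride;
        dims[i].os = stride;
        stride *= sizes[i];
    }
}

/* Nonzero when some length or stride does not fit in an int field. */
int needs_64bit_dims(int rank, const iodim64_sketch *dims)
{
    for (int i = 0; i < rank; ++i)
        if (dims[i].n > INT_MAX || dims[i].is > INT_MAX || dims[i].os > INT_MAX)
            return 1;
    return 0;
}
```

On a machine with 64-bit ptrdiff_t and 32-bit int, a 2 x 65536 x 65536 array has an outermost stride of 2^32 elements, so it needs the 64-bit variant, while a 3 x 5 array does not.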
Chris@42 294
Chris@42 295 FFTW 3.1.3
Chris@42 296
Chris@42 297 * Bug fix: FFTW computes incorrect results when the user plans both
Chris@42 298 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
Chris@42 299 by incorrect sharing of twiddle-factor tables between the two
Chris@42 300 transforms, and only occurs when both are used. Thanks to Paul
Chris@42 301 A. Valiant for the bug report.
Chris@42 302
Chris@42 303 FFTW 3.1.2
Chris@42 304
Chris@42 305 * Correct bug in configure script: --enable-portable-binary option was ignored!
Chris@42 306 Thanks to Andrew Salamon for the bug report.
Chris@42 307
Chris@42 308 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
Chris@42 309 either if we are using gcc. Thanks to Guy Moebs for the bug report.
Chris@42 310
Chris@42 311 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
Chris@42 312 and suggest a workaround. configure script now detects Core/Duo arch.
Chris@42 313
Chris@42 314 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
Chris@42 315 thanks to Markus Dittrich.
Chris@42 316
Chris@42 317 FFTW 3.1.1
Chris@42 318
Chris@42 319 * Performance improvements for Intel EM64T.
Chris@42 320
Chris@42 321 * Performance improvements for large-size transforms with SIMD.
Chris@42 322
Chris@42 323 * Cycle counter support for Intel icc and Visual C++ on x86-64.
Chris@42 324
Chris@42 325 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
Chris@42 326
Chris@42 327 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
Chris@42 328
Chris@42 329 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
Chris@42 330
Chris@42 331 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
Chris@42 332 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
Chris@42 333
Chris@42 334 FFTW 3.1
Chris@42 335
Chris@42 336 * Faster FFTW_ESTIMATE planner.
Chris@42 337
Chris@42 338 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
Chris@42 339
Chris@42 340 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
Chris@42 341
Chris@42 342 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
Chris@42 343
Chris@42 344 * Faster in-place non-square transpositions (FFTW uses these internally
Chris@42 345 for in-place FFTs, and you can also perform them explicitly using
Chris@42 346 the guru interface).
Chris@42 347
Chris@42 348 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
Chris@42 349 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
Chris@42 350
Chris@42 351 * SIMD support for split complex arrays.
Chris@42 352
Chris@42 353 * Much faster Altivec/VMX performance.
Chris@42 354
Chris@42 355 * New fftw_set_timelimit function to specify a (rough) upper bound to the
Chris@42 356 planning time (does not affect ESTIMATE mode).
Chris@42 357
Chris@42 358 * Removed --enable-3dnow support; use --enable-k7 instead.
Chris@42 359
Chris@42 360 * FMA (fused multiply-add) version is now included in "standard" FFTW,
Chris@42 361 and is enabled with --enable-fma (the default on PowerPC and Itanium).
Chris@42 362
Chris@42 363 * Automatic detection of native architecture flag for gcc. New
Chris@42 364 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
Chris@42 365 for people distributing compiled binaries of FFTW (see manual).
Chris@42 366
Chris@42 367 * Automatic detection of Altivec under Linux with gcc 3.4 (so that
Chris@42 368 same binary should work on both Altivec and non-Altivec PowerPCs).
Chris@42 369
Chris@42 370 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Chris@42 371 Solaris/Intel.
Chris@42 372
Chris@42 373 * Various documentation clarifications.
Chris@42 374
Chris@42 375 * 64-bit clean. (Fixes a bug affecting the split guru planner on
Chris@42 376 64-bit machines, reported by David Necas.)
Chris@42 377
Chris@42 378 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
Chris@42 379 non-SSE machines (causing a crash) for --enable-sse binaries.
Chris@42 380
Chris@42 381 * Fixed bug that caused HC2R transforms to destroy the input in
Chris@42 382 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
Chris@42 383
Chris@42 384 * Fixed bug where wisdom would be lost under rare circumstances,
Chris@42 385 causing excessive planning time.
Chris@42 386
Chris@42 387 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
Chris@42 388
Chris@42 389 * Fixed accidentally exported symbol that prohibited simultaneous
Chris@42 390 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
Chris@42 391
Chris@42 392 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
Chris@42 393
Chris@42 394 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
Chris@42 395
Chris@42 396 * Fix build failure if no Fortran compiler is found (thanks to Charles
Chris@42 397 Radley for the bug report).
Chris@42 398
Chris@42 399 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
Chris@42 400 detection of icc architecture flag (e.g. -xW).
Chris@42 401
Chris@42 402 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
Chris@42 403
Chris@42 404 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
Chris@42 405
Chris@42 406 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
Chris@42 407 but its malloc is 16-byte aligned).
Chris@42 408
Chris@42 409 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
Chris@42 410 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
Chris@42 411 reports/fixes). Added x86-64 cycle counter for PGI compilers,
Chris@42 412 courtesy Cristiano Calonaci.
Chris@42 413
Chris@42 414 * Fix compilation problem in test program due to C99 conflict.
Chris@42 415
Chris@42 416 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
Chris@42 417 Manuel Guerrero).
Chris@42 418
Chris@42 419 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
Chris@42 420
Chris@42 421 * Work around Visual C++ (version 6/7) bug in SSE compilation;
Chris@42 422 thanks to Eddie Yee for his detailed report.
Chris@42 423
Chris@42 424 Changes from FFTW 3.1 beta 2:
Chris@42 425
Chris@42 426 * Several minor compilation fixes.
Chris@42 427
Chris@42 428 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
Chris@42 429 fftw_set_timelimit function. Make wisdom work with time-limited plans.
Chris@42 430
Chris@42 431 Changes from FFTW 3.1 beta 1:
Chris@42 432
Chris@42 433 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
Chris@42 434
Chris@42 435 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
Chris@42 436
Chris@42 437 * Further speed improvements for Altivec/VMX.
Chris@42 438
Chris@42 439 * Further speed improvements for non-square transpositions.
Chris@42 440
Chris@42 441 * Many minor tweaks.
Chris@42 442
Chris@42 443 FFTW 3.0.1
Chris@42 444
Chris@42 445 * Some speed improvements in SIMD code.
Chris@42 446
Chris@42 447 * --without-cycle-counter option is removed. If no cycle counter is found,
Chris@42 448 then the estimator is always used. A --with-slow-timer option is provided
Chris@42 449 to force the use of lower-resolution timers.
Chris@42 450
Chris@42 451 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
Chris@42 452
Chris@42 453 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
Chris@42 454
Chris@42 455 * Added S390 cycle counter, courtesy of James Treacy.
Chris@42 456
Chris@42 457 * Added missing static keyword that prevented simultaneous linkage
Chris@42 458 of different-precision versions; thanks to Rasmus Larsen for the bug report.
Chris@42 459
Chris@42 460 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
Chris@42 461
Chris@42 462 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
Chris@42 463
Chris@42 464 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
Chris@42 465 preprocessor limits; thanks to Peter Vouras for the bug report.
Chris@42 466
Chris@42 467 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
Chris@42 468 thanks to Nicolas Decoster for the patch.
Chris@42 469
Chris@42 470 * Added 'make smallcheck' target in tests/ directory, at the request of
Chris@42 471 James Treacy.
Chris@42 472
Chris@42 473 FFTW 3.0
Chris@42 474
Chris@42 475 Major goals of this release:
Chris@42 476
Chris@42 477 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
Chris@42 478
Chris@42 479 * Complete rewrite, to make it easier to add new algorithms and transforms.
Chris@42 480
Chris@42 481 * New API, to support more general semantics.
Chris@42 482
Chris@42 483 Other enhancements:
Chris@42 484
Chris@42 485 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
Chris@42 486 (With special thanks to Franz Franchetti for many experimental prototypes
Chris@42 487 and to Stefan Kral for the vectorizing generator from fftwgel.)
Chris@42 488
Chris@42 489 * True in-place 1d transforms of large sizes (as well as compressed
Chris@42 490 twiddle tables for additional memory/cache savings).
Chris@42 491
Chris@42 492 * More arbitrary placement of real & imaginary data, e.g. including
Chris@42 493 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
Chris@42 494
Chris@42 495 * Efficient prime-size transforms of real data.
Chris@42 496
Chris@42 497 * Multidimensional transforms can operate on a subset of a larger matrix,
Chris@42 498 and/or transform selected dimensions of a multidimensional array.
Chris@42 499
Chris@42 500 * By popular demand, simultaneous linking to double precision (fftw),
Chris@42 501 single precision (fftwf), and long-double precision (fftwl) versions
Chris@42 502 of FFTW is now supported.
Chris@42 503
Chris@42 504 * Cycle counters (on all modern CPUs) are exploited to speed planning.
Chris@42 505
Chris@42 506 * Efficient transforms of real even/odd arrays, a.k.a. discrete
Chris@42 507 cosine/sine transforms (types I-IV). (Currently work via pre/post
Chris@42 508 processing of real transforms, à la FFTPACK, so are not optimal.)
Chris@42 509
Chris@42 510 * DHTs (Discrete Hartley Transforms), again via post-processing
Chris@42 511 of real transforms (and thus suboptimal, for now).
Chris@42 512
Chris@42 513 * Support for linking to just those parts of FFTW that you need,
Chris@42 514 greatly reducing the size of statically linked programs when
Chris@42 515 only a limited set of transform sizes/types are required.
Chris@42 516
Chris@42 517 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
Chris@42 518 with a command-line tool (fftw-wisdom) to generate/update it.
Chris@42 519
Chris@42 520 * Fortran API can be used with both g77 and non-g77 compilers
Chris@42 521 simultaneously.
Chris@42 522
Chris@42 523 * Multi-threaded version has optional OpenMP support.
Chris@42 524
Chris@42 525 * Authors' good looks have greatly improved with age.
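The "more arbitrary placement of real & imaginary data" entry in the list above distinguishes interleaved storage (as in FFTW 2.x) from separate real and imaginary arrays. The sketch below shows the two layouts by converting between them. The function names are hypothetical. FFTW itself can transform either layout in place through its planner interfaces, so a copy like this is only a way to make the layouts concrete.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved layout: re0, im0, re1, im1, ...
   Split layout: one array of real parts, one of imaginary parts. */
void interleaved_to_split(const double *in, size_t n, double *re, double *im)
{
    assert(in != NULL && re != NULL && im != NULL);
    for (size_t k = 0; k < n; ++k) {
        re[k] = in[2 * k];
        im[k] = in[2 * k + 1];
    }
}

/* Inverse of the above: pack n split values back into 2*n interleaved doubles. */
void split_to_interleaved(const double *re, const double *im, size_t n, double *out)
{
    assert(re != NULL && im != NULL && out != NULL);
    for (size_t k = 0; k < n; ++k) {
        out[2 * k] = re[k];
        out[2 * k + 1] = im[k];
    }
}
```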
Chris@42 526
Chris@42 527 Changes from 3.0beta3:
Chris@42 528
Chris@42 529 * Separate FMA distribution to better exploit fused multiply-add instructions
Chris@42 530 on PowerPC (and possibly other) architectures.
Chris@42 531
Chris@42 532 * Performance improvements via some inlining tweaks.
Chris@42 533
Chris@42 534 * fftw_flops now returns double arguments, not int, to avoid overflows
Chris@42 535 for large sizes.
Chris@42 536
Chris@42 537 * Workarounds for automake bugs.
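The fftw_flops entry above switches the operation counts from int to double to avoid overflow at large sizes. To see how quickly an int runs out, the sketch below uses the conventional nominal count of 5 N log2 N for a complex FFT of size N = 2^k. This formula is an assumption chosen for illustration and is not the exact count fftw_flops reports for a given plan. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Nominal 5*N*log2(N) operation count for a complex FFT of size N = 2^log2n. */
double nominal_fft_flops(unsigned log2n)
{
    double n = 1.0;
    assert(log2n <= 64);           /* keep the sketch well inside double range */
    for (unsigned i = 0; i < log2n; ++i)
        n *= 2.0;                  /* build 2^log2n without needing libm */
    return 5.0 * n * (double)log2n;
}

/* Nonzero when the count could not be stored in an int. */
int overflows_int(double flops)
{
    return flops > (double)INT_MAX;
}
```

With a 32-bit int, the nominal count still fits at N = 2^24 (2013265920) but no longer fits at N = 2^25 (4194304000).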
Chris@42 538
Chris@42 539 Changes from 3.0beta2:
Chris@42 540
Chris@42 541 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
Chris@42 542 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
Chris@42 543 we replaced it with a slower routine that is more accurate.
Chris@42 544
Chris@42 545 * The guru planner and execute functions now have two variants, one that
Chris@42 546 takes complex arguments and one that takes separate real/imag pointers.
Chris@42 547
Chris@42 548 * Execute and planner routines now automatically align the stack on x86,
Chris@42 549 in case the calling program is misaligned.
Chris@42 550
Chris@42 551 * README file for test program.
Chris@42 552
Chris@42 553 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
Chris@42 554
Chris@42 555 * Eliminated internal fftw_threads_init function, which some people were
Chris@42 556 calling accidentally instead of the fftw_init_threads API function.
Chris@42 557
Chris@42 558 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
Chris@42 559
Chris@42 560 * Support AMD x86-64 SIMD and cycle counter.
Chris@42 561
Chris@42 562 * Support SSE2 intrinsics in forthcoming gcc 3.3.
Chris@42 563
Chris@42 564 Changes from 3.0beta1:
Chris@42 565
Chris@42 566 * Faster in-place 1d transforms of non-power-of-two sizes.
Chris@42 567
Chris@42 568 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
Chris@42 569 transforms.
Chris@42 570
Chris@42 571 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
Chris@42 572 default distribution only includes hard-coded size-8 DCT-II/III, however.
Chris@42 573
Chris@42 574 * Many minor improvements to the manual. Added section on using the
Chris@42 575 codelet generator to customize and enhance FFTW.
Chris@42 576
Chris@42 577 * The default 'make check' should now only take a few minutes; for more
Chris@42 578 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
Chris@42 579
Chris@42 580 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
Chris@42 581 the latter uses stdout.
Chris@42 582
Chris@42 583 * Fixed ability to compile with a C++ compiler.
Chris@42 584
Chris@42 585 * Fixed support for C99 complex type under glibc.
Chris@42 586
Chris@42 587 * Fixed problems with alloca under MinGW, AIX.
Chris@42 588
Chris@42 589 * Workaround for gcc/SPARC bug.
Chris@42 590
Chris@42 591 * Fixed multi-threaded initialization failure on IRIX due to lack of
Chris@42 592 user-accessible PTHREAD_SCOPE_SYSTEM there.