annotate src/fftw-3.3.8/NEWS @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents d0c2a83c1364
children
rev   line source
Chris@82 1 FFTW 3.3.8:
Chris@82 2
Chris@82 3 * Fixed AVX, AVX2 for gcc-8.
Chris@82 4
Chris@82 5 By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code
Chris@82 6 assumed that the compiler honors the distinction between +0 and -0,
Chris@82 7 but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math.
Chris@82 8 This release ensures that FFTW works with gcc-8 -ffast-math, and
Chris@82 9 removes -ffast-math from the default CFLAGS for good measure.
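The +0/-0 distinction mentioned in the 3.3.8 note can be illustrated with a small sketch. It is written in Python for brevity and is not FFTW code. The helper names are hypothetical. Flipping a sign by XOR-ing with the bit pattern of -0.0 is a common SIMD idiom, but that this is the exact pattern affected in FFTW is an assumption here. The two zeros compare equal yet differ in their sign bit, so a compiler that treats them as interchangeable can silently turn a negation mask into a no-op.

```python
import math
import struct

def bits(x):
    """Return the IEEE-754 binary64 bit pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(b):
    """Inverse of bits(): rebuild a float from its 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def xor_with(x, mask):
    """XOR the bit pattern of x with the bit pattern of mask."""
    return from_bits(bits(x) ^ bits(mask))

# The two zeros compare equal under ordinary floating-point comparison...
zeros_equal = (0.0 == -0.0)

# ...but only -0.0 has the sign bit set.
sign_of_neg_zero = math.copysign(1.0, -0.0)

# With the correct mask the value is negated; if -0.0 is replaced by +0.0
# (hypothetical failure mode), the XOR leaves the value unchanged.
negated = xor_with(2.5, -0.0)
unchanged = xor_with(2.5, 0.0)
```

Here `negated` is -2.5 while `unchanged` stays 2.5, even though the two masks compare equal as numbers.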
Chris@82 10
Chris@82 11 FFTW 3.3.7:
Chris@82 12
Chris@82 13 * Experimental support for CMake.
Chris@82 14
Chris@82 15 The primary build mechanism for FFTW remains GNU autoconf/automake.
Chris@82 16 CMake support is meant to offer an easy way to compile FFTW on
Chris@82 17 Windows, and as such it does not cover all the features of the
Chris@82 18 automake build system, such as exotic cycle counters,
Chris@82 19 cross-compiling, or build of binaries for a mixture of ISAs
Chris@82 20 (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.
Chris@82 21
Chris@82 22 * Fixes for armv7a cycle counter.
Chris@82 23 * Official support for aarch64, now that we have hardware to test it.
Chris@82 24 * Tweak usage of FMA instructions in a way that favors newer processors
Chris@82 25 (Skylake and Ryzen) over older processors (Haswell).
Chris@82 26 * tests/bench: use 64-bit precision to compute mflops.
Chris@82 27
Chris@82 28 FFTW 3.3.6-pl2:
Chris@82 29
Chris@82 30 * Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.
Chris@82 31
Chris@82 32 FFTW 3.3.6-pl1:
Chris@82 33
Chris@82 34 * Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
Chris@82 35 shared libraries of the form libfftw3.so.2.6.6 instead of
Chris@82 36 libfftw3.so.3.*.
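For readers unfamiliar with libtool versioning, the following sketch shows how a wrong version triple leads to a wrong file name. On Linux, libtool turns `-version-info current:revision:age` into the suffix `(current - age).(age).(revision)`. The function name and the two triples below are hypothetical: they were chosen only because they reproduce file names of the kind mentioned above, and are not taken from the FFTW build files.

```python
def linux_so_suffix(current, revision, age):
    """Suffix libtool appends to libNAME.so on Linux for current:revision:age."""
    if age > current:
        raise ValueError("age must not exceed current")
    return f"{current - age}.{age}.{revision}"

# Hypothetical triple with age one too large: the major number drops to 2.
wrong = "libfftw3.so." + linux_so_suffix(8, 6, 6)   # libfftw3.so.2.6.6

# Hypothetical corrected triple: the major number is 3 again.
fixed = "libfftw3.so." + linux_so_suffix(8, 6, 5)   # libfftw3.so.3.5.6
```

The major number is what the dynamic linker matches on, so an off-by-one in `age` is enough to make the library look ABI-incompatible with earlier 3.x releases.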
Chris@82 37
Chris@82 38 FFTW 3.3.6:
Chris@82 39
Chris@82 40 * The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
Chris@82 41 work, and this 3.3.6 fixes it. Sorry about that.
Chris@82 42 * compilation fixes for IBM XLC
Chris@82 43 * compilation fixes for threads on Windows
Chris@82 44 * fix SIMD autodetection on amd64 when (_MSC_VER > 1500)
Chris@82 45
Chris@82 46 FFTW 3.3.5:
Chris@82 47
Chris@82 48 * New SIMD support:
Chris@82 49 - Power8 VSX instructions in single and double precision.
Chris@82 50 To use, add --enable-vsx to configure.
Chris@82 51 - Support for AVX2 (256-bit FMA instructions).
Chris@82 52 To use, add --enable-avx2 to configure.
Chris@82 53 - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
Chris@82 54 This code is expected to work but the FFTW maintainers do not have
Chris@82 55 hardware to test it.
Chris@82 56 - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
Chris@82 57 - Double precision Neon SIMD for aarch64.
Chris@82 58 This code is expected to work but the FFTW maintainers do not have
Chris@82 59 hardware to test it.
Chris@82 60 - generic SIMD support using gcc vector intrinsics
Chris@82 61 * Add fftw_make_planner_thread_safe() API
Chris@82 62 * fix #18 (disable float128 for CUDACC)
Chris@82 63 * fix #19: missing Fortran interface for fftwq_alloc_real
Chris@82 64 * fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
Chris@82 65 * fix: Avoid segfaults due to double free in MPI transpose
Chris@82 66
Chris@82 67 * Special note for distribution maintainers: Although FFTW supports a
Chris@82 68 zillion SIMD instruction sets, enabling them all at the same time is
Chris@82 69 a bad idea, because it increases the planning time for minimal gain.
Chris@82 70 We recommend that general-purpose x86 distributions only enable SSE2
Chris@82 71 and perhaps AVX. Users who care about the last ounce of performance
Chris@82 72 should recompile FFTW themselves.
Chris@82 73
Chris@82 74 FFTW 3.3.4
Chris@82 75
Chris@82 76 * New functions fftw_alignment_of (to check whether two arrays are
Chris@82 77 equally aligned for the purposes of applying a plan) and fftw_sprint_plan
Chris@82 78 (to output a description of plan to a string).
Chris@82 79
Chris@82 80 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
Chris@82 81 bug report.
Chris@82 82
Chris@82 83 * Fixed manual to work with texinfo-5.
Chris@82 84
Chris@82 85 * Increased timing interval on x86_64 to reduce timing errors.
Chris@82 86
Chris@82 87 * Default to Win32 threads, not pthreads, if both are present.
Chris@82 88
Chris@82 89 * Various build-script fixes.
Chris@82 90
Chris@82 91 FFTW 3.3.3
Chris@82 92
Chris@82 93 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
Chris@82 94 bug report and patch, and to Graham Dennis for the bug report).
Chris@82 95
Chris@82 96 * Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
Chris@82 97 appears to speed up even ARM processors with a 64-bit NEON pipe.
Chris@82 98
Chris@82 99 * Speed improvements for single-precision AVX.
Chris@82 100
Chris@82 101 * Speed up planner on machines without "official" cycle counters, such as ARM.
Chris@82 102
Chris@82 103 FFTW 3.3.2
Chris@82 104
Chris@82 105 * Removed an archaic stack-alignment hack that was failing with
Chris@82 106 gcc-4.7/i386.
Chris@82 107
Chris@82 108 * Added stack-alignment hack necessary for gcc on Windows/i386. We
Chris@82 109 will regret this in ten years (see previous change).
Chris@82 110
Chris@82 111 * Fix incompatibility with Intel icc which pretends to be gcc
Chris@82 112 but does not support quad precision.
Chris@82 113
Chris@82 114 * make libfftw{threads,mpi} depend upon libfftw when using libtool;
Chris@82 115 this is consistent with most other libraries and simplifies the life
Chris@82 116 of various distributors of GNU/Linux.
Chris@82 117
Chris@82 118 FFTW 3.3.1
Chris@82 119
Chris@82 120 * Changes since 3.3.1-beta1:
Chris@82 121
Chris@82 122 - Reduced planning time in estimate mode for sizes with large
Chris@82 123 prime factors.
Chris@82 124
Chris@82 125 - Added AVX autodetection under Visual Studio. Thanks Carsten
Chris@82 126 Steger for submitting the necessary code.
Chris@82 127
Chris@82 128 - Modern Fortran interface now uses a separate fftw3l.f03 interface
Chris@82 129 file for the long double interface, which is not supported by
Chris@82 130 some Fortran compilers. Provided new fftw3q.f03 interface file
Chris@82 131 to access the quadruple-precision FFTW routines with recent
Chris@82 132 versions of gcc/gfortran.
Chris@82 133
Chris@82 134 * Added support for the NEON extensions to the ARM ISA. (Note to beta
Chris@82 135 users: an ARM cycle counter is not yet implemented; please contact
Chris@82 136 fftw@fftw.org if you know how to do it right.)
Chris@82 137
Chris@82 138 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
Chris@82 139 Kyle Spyksma for the bug report.
Chris@82 140
Chris@82 141 FFTW 3.3
Chris@82 142
Chris@82 143 * Changes since 3.3-beta1:
Chris@82 144
Chris@82 145 - Compiling OpenMP support (--enable-openmp) now installs a
Chris@82 146 fftw3_omp library, instead of fftw3_threads, so that OpenMP
Chris@82 147 and POSIX threads (--enable-threads) libraries can be built
Chris@82 148 and installed at the same time.
Chris@82 149
Chris@82 150 - Various minor compilation fixes, corrections of manual typos, and
Chris@82 151 improvements to the benchmark test program.
Chris@82 152
Chris@82 153 * Add support for the AVX extensions to x86 and x86-64. The AVX code
Chris@82 154 works with 16-byte alignment (as opposed to 32-byte alignment),
Chris@82 155 so there is no ABI change compared to FFTW 3.2.2.
Chris@82 156
Chris@82 157 * Added Fortran 2003 interface, which should be usable on most modern
Chris@82 158 Fortran compilers (e.g. gfortran) and provides type-checked access
Chris@82 159 to the C FFTW interface. (The legacy Fortran-77 interface is
Chris@82 160 still included also.)
Chris@82 161
Chris@82 162 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
Chris@82 163 the major changes in the MPI transforms are:
Chris@82 164 - Fixed some deadlock and crashing bugs.
Chris@82 165 - Added Fortran 2003 interface.
Chris@82 166 - Added new-array execute functions for MPI plans.
Chris@82 167 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
Chris@82 168 thanks to Jonathan Bentz for the bug report.
Chris@82 169 - Expanded documentation.
Chris@82 170 - 'make check' now runs MPI tests
Chris@82 171 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
Chris@82 172
Chris@82 173 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
Chris@82 174 x86-64, and Itanium). The new routines use the fftwq_ prefix.
Chris@82 175
Chris@82 176 * Removed support for MIPS paired-single instructions due to lack of
Chris@82 177 available hardware for testing. Users who want this functionality
Chris@82 178 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
Chris@82 179 on MIPS; this only concerns special instructions available on some
Chris@82 180 MIPS chips.)
Chris@82 181
Chris@82 182 * Removed support for the Cell Broadband Engine. Cell users should
Chris@82 183 use FFTW 3.2.x.
Chris@82 184
Chris@82 185 * New convenience functions fftw_alloc_real and fftw_alloc_complex
Chris@82 186 to use fftw_malloc for real and complex arrays without typecasts
Chris@82 187 or sizeof.
Chris@82 188
Chris@82 189 * New convenience functions fftw_export_wisdom_to_filename and
Chris@82 190 fftw_import_wisdom_from_filename that export/import wisdom
Chris@82 191 to a file, which don't require you to open/close the file yourself.
Chris@82 192
Chris@82 193 * New function fftw_cost to return FFTW's internal cost metric for
Chris@82 194 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
Chris@82 195 suggestion.
Chris@82 196
Chris@82 197 * The --enable-sse2 configure flag now works in both double and single
Chris@82 198 precision (and is equivalent to --enable-sse in the latter case).
Chris@82 199
Chris@82 200 * Remove --enable-portable-binary flag: we now produce portable binaries
Chris@82 201 by default.
Chris@82 202
Chris@82 203 * Remove the automatic detection of native architecture flag for gcc
Chris@82 204 which was introduced in fftw-3.1, since new gcc supports -mtune=native.
Chris@82 205 Remove the --with-gcc-arch flag; if you want to specify a particular
Chris@82 206 arch to configure, use ./configure CC="gcc -mtune=...".
Chris@82 207
Chris@82 208 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
Chris@82 209
Chris@82 210 * Fixed build failure when srand48 declaration is missing;
Chris@82 211 thanks to Ralf Wildenhues for the bug report.
Chris@82 212
Chris@82 213 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
Chris@82 214 is equivalent to no timelimit in all cases. Thanks to William Andrew
Chris@82 215 Burnson for the bug report.
Chris@82 216
Chris@82 217 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
Chris@82 218 too large a buffer.
Chris@82 219
Chris@82 220 FFTW 3.2.2
Chris@82 221
Chris@82 222 * Improve performance of some copy operations of complex arrays on
Chris@82 223 x86 machines.
Chris@82 224
Chris@82 225 * Add configure flag to disable alloca(), which is broken in mingw64.
Chris@82 226
Chris@82 227 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
Chris@82 228 between fftw-3.1.3 and 3.2. This regression has now been fixed.
Chris@82 229
Chris@82 230 FFTW 3.2.1
Chris@82 231
Chris@82 232 * Performance improvements for some multidimensional r2c/c2r transforms;
Chris@82 233 thanks to Eugene Miloslavsky for his benchmark reports.
Chris@82 234
Chris@82 235 * Compile with icc on MacOS X, use better icc compiler flags.
Chris@82 236
Chris@82 237 * Compilation fixes for systems where snprintf is defined as a macro;
Chris@82 238 thanks to Marcus Mae for the bug report.
Chris@82 239
Chris@82 240 * Fortran documentation now recommends not using dfftw_execute,
Chris@82 241 because of reports of problems with various Fortran compilers;
Chris@82 242 it is better to use dfftw_execute_dft etcetera.
Chris@82 243
Chris@82 244 * Some documentation clarifications, e.g. of fact that --enable-openmp
Chris@82 245 and --enable-threads are mutually exclusive (thanks to Long To),
Chris@82 246 and document slightly odd behavior of plan_guru_r2r in Fortran
Chris@82 247 (thanks to Alexander Pozdneev).
Chris@82 248
Chris@82 249 * FAQ was accidentally omitted from 3.2 tarball.
Chris@82 250
Chris@82 251 * Remove some extraneous (harmless) files accidentally included in
Chris@82 252 a subdirectory of the 3.2 tarball.
Chris@82 253
Chris@82 254 FFTW 3.2
Chris@82 255
Chris@82 256 * Worked around apparent glibc bug that leads to rare hangs when freeing
Chris@82 257 semaphores.
Chris@82 258
Chris@82 259 * Fixed segfault due to unaligned access in certain obscure problems
Chris@82 260 that use SSE and multiple threads.
Chris@82 261
Chris@82 262 * MPI transforms not included, as they are still in alpha; the alpha
Chris@82 263 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
Chris@82 264
Chris@82 265 FFTW 3.2alpha3
Chris@82 266
Chris@82 267 * Performance improvements for sizes with factors of 5 and 10.
Chris@82 268
Chris@82 269 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Chris@82 270 Emmenlauer and Phil Dumont.
Chris@82 271
Chris@82 272 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
Chris@82 273
Chris@82 274 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
Chris@82 275 for the suggestions.
Chris@82 276
Chris@82 277 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
Chris@82 278 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
Chris@82 279
Chris@82 280 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
Chris@82 281 from working in single precision (thanks to Eric A. Borisch for the report).
Chris@82 282
Chris@82 283 * Added 'make check' for MPI code (which still fails in a couple corner
Chris@82 284 cases, but should be much better than in alpha2).
Chris@82 285
Chris@82 286 * Many other small fixes.
Chris@82 287
Chris@82 288 FFTW 3.2alpha2
Chris@82 289
Chris@82 290 * Support for the Cell processor, donated by IBM Research; see README.Cell
Chris@82 291 and the Cell section of the manual.
Chris@82 292
Chris@82 293 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
Chris@82 294 function with the same semantics, but which takes fftw_iodim64 instead of
Chris@82 295 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
Chris@82 296 ptrdiff_t integer types as parameters, which is a 64-bit type on
Chris@82 297 64-bit machines. This is only useful for specifying very large transforms
Chris@82 298 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
Chris@82 299 regardless of what API you choose.)
Chris@82 300
Chris@82 301 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
Chris@82 302 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
Chris@82 303 distributed transpose operations, with 1d block distributions.
Chris@82 304 (This is an alpha preview: routines have not been exhaustively
Chris@82 305 tested, documentation is incomplete, and some functionality is
Chris@82 306 missing, e.g. Fortran support.) See mpi/README and also the MPI
Chris@82 307 section of the manual.
Chris@82 308
Chris@82 309 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
Chris@82 310
Chris@82 311 * Rewritten multi-threaded support for better performance by
Chris@82 312 re-using a fixed pool of threads rather than continually
Chris@82 313 respawning and joining (which nowadays is much slower).
Chris@82 314
Chris@82 315 * Support for MIPS paired-single SIMD instructions, donated by
Chris@82 316 CodeSourcery.
Chris@82 317
Chris@82 318 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
Chris@82 319 available and return NULL otherwise.
Chris@82 320
Chris@82 321 * Removed k7 support, which only worked in 32-bit mode and is
Chris@82 322 becoming obsolete. Use --enable-sse instead.
Chris@82 323
Chris@82 324 * Added --with-g77-wrappers configure option to force inclusion
Chris@82 325 of g77 wrappers, in addition to whatever is needed for the
Chris@82 326 detected Fortran compilers. This is mainly intended for GNU/Linux
Chris@82 327 distros switching to gfortran that wish to include both
Chris@82 328 gfortran and g77 support in FFTW.
Chris@82 329
Chris@82 330 * In manual, renamed "guru execute" functions to "new-array execute"
Chris@82 331 functions, to reduce confusion with the guru planner interface.
Chris@82 332 (The programming interface is unchanged.)
Chris@82 333
Chris@82 334 * Add missing __declspec attribute to threads API functions when compiling
Chris@82 335 for Windows; thanks to Robert O. Morris for the bug report.
Chris@82 336
Chris@82 337 * Fixed missing return value from dfftw_init_threads in Fortran;
Chris@82 338 thanks to Markus Wetzstein for the bug report.
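The motivation for the 64-bit guru API in the 3.2alpha2 notes above can be sketched as follows. The two classes mirror the layouts of fftw_iodim (three int fields) and fftw_iodim64 (three ptrdiff_t fields) using Python's ctypes. This is an illustration only: the class names are hypothetical, `is_` stands in for the C field `is` (a reserved word in Python), and `c_ssize_t` is used on the assumption that it has the same width as ptrdiff_t on the platform at hand.

```python
import ctypes

class IoDim(ctypes.Structure):
    """Mirror of the int-based dimension descriptor: size and strides."""
    _fields_ = [("n", ctypes.c_int),
                ("is_", ctypes.c_int),
                ("os", ctypes.c_int)]

class IoDim64(ctypes.Structure):
    """Mirror of the ptrdiff_t-based descriptor (c_ssize_t as a stand-in)."""
    _fields_ = [("n", ctypes.c_ssize_t),
                ("is_", ctypes.c_ssize_t),
                ("os", ctypes.c_ssize_t)]

# Largest value a C int can hold on this platform.
INT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_int) - 1) - 1

def fits_in_int(value):
    """True if value can be stored in a C int without overflow."""
    return -INT_MAX - 1 <= value <= INT_MAX
```

A size or stride just past `INT_MAX` fails `fits_in_int`, which is the case the plan_guru64 functions exist for; smaller problems gain nothing from the wider type.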
Chris@82 339
Chris@82 340 FFTW 3.1.3
Chris@82 341
Chris@82 342 * Bug fix: FFTW computes incorrect results when the user plans both
Chris@82 343 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
Chris@82 344 by incorrect sharing of twiddle-factor tables between the two
Chris@82 345 transforms, and only occurs when both are used. Thanks to Paul
Chris@82 346 A. Valiant for the bug report.
Chris@82 347
Chris@82 348 FFTW 3.1.2
Chris@82 349
Chris@82 350 * Correct bug in configure script: --enable-portable-binary option was ignored!
Chris@82 351 Thanks to Andrew Salamon for the bug report.
Chris@82 352
Chris@82 353 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
Chris@82 354 either if we are using gcc. Thanks to Guy Moebs for the bug report.
Chris@82 355
Chris@82 356 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
Chris@82 357 and suggest a workaround. configure script now detects Core/Duo arch.
Chris@82 358
Chris@82 359 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
Chris@82 360 thanks to Markus Dittrich.
Chris@82 361
Chris@82 362 FFTW 3.1.1
Chris@82 363
Chris@82 364 * Performance improvements for Intel EM64T.
Chris@82 365
Chris@82 366 * Performance improvements for large-size transforms with SIMD.
Chris@82 367
Chris@82 368 * Cycle counter support for Intel icc and Visual C++ on x86-64.
Chris@82 369
Chris@82 370 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
Chris@82 371
Chris@82 372 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
Chris@82 373
Chris@82 374 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
Chris@82 375
Chris@82 376 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
Chris@82 377 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
Chris@82 378
Chris@82 379 FFTW 3.1
Chris@82 380
Chris@82 381 * Faster FFTW_ESTIMATE planner.
Chris@82 382
Chris@82 383 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
Chris@82 384
Chris@82 385 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
Chris@82 386
Chris@82 387 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
Chris@82 388
Chris@82 389 * Faster in-place non-square transpositions (FFTW uses these internally
Chris@82 390 for in-place FFTs, and you can also perform them explicitly using
Chris@82 391 the guru interface).
Chris@82 392
Chris@82 393 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
Chris@82 394 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
Chris@82 395
Chris@82 396 * SIMD support for split complex arrays.
Chris@82 397
Chris@82 398 * Much faster Altivec/VMX performance.
Chris@82 399
Chris@82 400 * New fftw_set_timelimit function to specify a (rough) upper bound to the
Chris@82 401 planning time (does not affect ESTIMATE mode).
Chris@82 402
Chris@82 403 * Removed --enable-3dnow support; use --enable-k7 instead.
Chris@82 404
Chris@82 405 * FMA (fused multiply-add) version is now included in "standard" FFTW,
Chris@82 406 and is enabled with --enable-fma (the default on PowerPC and Itanium).
Chris@82 407
Chris@82 408 * Automatic detection of native architecture flag for gcc. New
Chris@82 409 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
Chris@82 410 for people distributing compiled binaries of FFTW (see manual).
Chris@82 411
Chris@82 412 * Automatic detection of Altivec under Linux with gcc 3.4 (so that
Chris@82 413 same binary should work on both Altivec and non-Altivec PowerPCs).
Chris@82 414
Chris@82 415 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Chris@82 416 Solaris/Intel.
Chris@82 417
Chris@82 418 * Various documentation clarifications.
Chris@82 419
Chris@82 420 * 64-bit clean. (Fixes a bug affecting the split guru planner on
Chris@82 421 64-bit machines, reported by David Necas.)
Chris@82 422
Chris@82 423 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
Chris@82 424 non-SSE machines (causing a crash) for --enable-sse binaries.
Chris@82 425
Chris@82 426 * Fixed bug that caused HC2R transforms to destroy the input in
Chris@82 427 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
Chris@82 428
Chris@82 429 * Fixed bug where wisdom would be lost under rare circumstances,
Chris@82 430 causing excessive planning time.
Chris@82 431
Chris@82 432 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
Chris@82 433
Chris@82 434 * Fixed accidentally exported symbol that prohibited simultaneous
Chris@82 435 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
Chris@82 436
Chris@82 437 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
Chris@82 438
Chris@82 439 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
Chris@82 440
Chris@82 441 * Fix build failure if no Fortran compiler is found (thanks to Charles
Chris@82 442 Radley for the bug report).
Chris@82 443
Chris@82 444 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
Chris@82 445 detection of icc architecture flag (e.g. -xW).
Chris@82 446
Chris@82 447 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
Chris@82 448
Chris@82 449 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
Chris@82 450
Chris@82 451 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
Chris@82 452 but its malloc is 16-byte aligned).
Chris@82 453
Chris@82 454 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
Chris@82 455 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
Chris@82 456 reports/fixes). Added x86-64 cycle counter for PGI compilers,
Chris@82 457 courtesy Cristiano Calonaci.
Chris@82 458
Chris@82 459 * Fix compilation problem in test program due to C99 conflict.
Chris@82 460
Chris@82 461 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
Chris@82 462 Manuel Guerrero).
Chris@82 463
Chris@82 464 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
Chris@82 465
Chris@82 466 * Work around Visual C++ (version 6/7) bug in SSE compilation;
Chris@82 467 thanks to Eddie Yee for his detailed report.
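The 3.1 notes above mention Bluestein's algorithm for prime-size DFTs. As a rough illustration of the idea (a textbook sketch in Python, not FFTW's implementation), the identity jk = (j^2 + k^2 - (k-j)^2)/2 rewrites a DFT of any length n as a convolution with a "chirp" sequence, and that convolution can be computed with zero-padded power-of-two FFTs. All function names here are hypothetical.

```python
import cmath

def fft_pow2(a, sign=-1):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft_pow2(a[0::2], sign)
    odd = fft_pow2(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def naive_dft(x):
    """O(n^2) reference DFT used only for comparison."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def bluestein_dft(x):
    """DFT of arbitrary length n via a chirp convolution of length m >= 2n-1."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:
        m *= 2
    # chirp[j] = exp(-i*pi*j^2/n); j^2 is reduced mod 2n to keep angles small.
    chirp = [cmath.exp(-1j * cmath.pi * (j * j % (2 * n)) / n) for j in range(n)]
    # Pre-multiplied input, zero-padded to length m.
    a = [x[j] * chirp[j] for j in range(n)] + [0j] * (m - n)
    # Conjugate chirp, wrapped around so that negative lags land at m - j.
    b = [0j] * m
    for j in range(n):
        b[j] = chirp[j].conjugate()
        if j:
            b[m - j] = chirp[j].conjugate()
    # Circular convolution through three power-of-two FFTs.
    fa = fft_pow2(a)
    fb = fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    # Undo the unnormalized inverse FFT and apply the final chirp.
    return [chirp[k] * conv[k] / m for k in range(n)]
```

For a length-7 input such as `[1, 2, 3, 4, 5, 6, 7]`, `bluestein_dft` agrees with `naive_dft` up to rounding error while only ever running FFTs of length 16.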
Chris@82 468
Chris@82 469 Changes from FFTW 3.1 beta 2:
Chris@82 470
Chris@82 471 * Several minor compilation fixes.
Chris@82 472
Chris@82 473 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
Chris@82 474 fftw_set_timelimit function. Make wisdom work with time-limited plans.
Chris@82 475
Chris@82 476 Changes from FFTW 3.1 beta 1:
Chris@82 477
Chris@82 478 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
Chris@82 479
Chris@82 480 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
Chris@82 481
Chris@82 482 * Further speed improvements for Altivec/VMX.
Chris@82 483
Chris@82 484 * Further speed improvements for non-square transpositions.
Chris@82 485
Chris@82 486 * Many minor tweaks.
Chris@82 487
Chris@82 488 FFTW 3.0.1
Chris@82 489
Chris@82 490 * Some speed improvements in SIMD code.
Chris@82 491
Chris@82 492 * --without-cycle-counter option is removed. If no cycle counter is found,
Chris@82 493 then the estimator is always used. A --with-slow-timer option is provided
Chris@82 494 to force the use of lower-resolution timers.
Chris@82 495
Chris@82 496 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
Chris@82 497
Chris@82 498 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
Chris@82 499
Chris@82 500 * Added S390 cycle counter, courtesy of James Treacy.
Chris@82 501
Chris@82 502 * Added missing static keyword that prevented simultaneous linkage
Chris@82 503 of different-precision versions; thanks to Rasmus Larsen for the bug report.
Chris@82 504
Chris@82 505 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
Chris@82 506
Chris@82 507 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
Chris@82 508
Chris@82 509 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
Chris@82 510 preprocessor limits; thanks to Peter Vouras for the bug report.
Chris@82 511
Chris@82 512 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
Chris@82 513 thanks to Nicolas Decoster for the patch.
Chris@82 514
Chris@82 515 * Added 'make smallcheck' target in tests/ directory, at the request of
Chris@82 516 James Treacy.
Chris@82 517
Chris@82 518 FFTW 3.0
Chris@82 519
Chris@82 520 Major goals of this release:
Chris@82 521
Chris@82 522 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
Chris@82 523
Chris@82 524 * Complete rewrite, to make it easier to add new algorithms and transforms.
Chris@82 525
Chris@82 526 * New API, to support more general semantics.
Chris@82 527
Chris@82 528 Other enhancements:
Chris@82 529
Chris@82 530 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
Chris@82 531 (With special thanks to Franz Franchetti for many experimental prototypes
Chris@82 532 and to Stefan Kral for the vectorizing generator from fftwgel.)
Chris@82 533
Chris@82 534 * True in-place 1d transforms of large sizes (as well as compressed
Chris@82 535 twiddle tables for additional memory/cache savings).
Chris@82 536
Chris@82 537 * More arbitrary placement of real & imaginary data, e.g. including
Chris@82 538 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
Chris@82 539
Chris@82 540 * Efficient prime-size transforms of real data.
Chris@82 541
Chris@82 542 * Multidimensional transforms can operate on a subset of a larger matrix,
Chris@82 543 and/or transform selected dimensions of a multidimensional array.
Chris@82 544
Chris@82 545 * By popular demand, simultaneous linking to double precision (fftw),
Chris@82 546 single precision (fftwf), and long-double precision (fftwl) versions
Chris@82 547 of FFTW is now supported.
Chris@82 548
Chris@82 549 * Cycle counters (on all modern CPUs) are exploited to speed planning.
Chris@82 550
Chris@82 551 * Efficient transforms of real even/odd arrays, a.k.a. discrete
Chris@82 552 cosine/sine transforms (types I-IV). (Currently work via pre/post
Chris@82 553 processing of real transforms, à la FFTPACK, so are not optimal.)
Chris@82 554
Chris@82 555 * DHTs (Discrete Hartley Transforms), again via post-processing
Chris@82 556 of real transforms (and thus suboptimal, for now).
Chris@82 557
Chris@82 558 * Support for linking to just those parts of FFTW that you need,
Chris@82 559 greatly reducing the size of statically linked programs when
Chris@82 560 only a limited set of transform sizes/types are required.
Chris@82 561
Chris@82 562 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
Chris@82 563 with a command-line tool (fftw-wisdom) to generate/update it.
Chris@82 564
Chris@82 565 * Fortran API can be used with both g77 and non-g77 compilers
Chris@82 566 simultaneously.
Chris@82 567
Chris@82 568 * Multi-threaded version has optional OpenMP support.
Chris@82 569
Chris@82 570 * Authors' good looks have greatly improved with age.
Chris@82 571
Chris@82 572 Changes from 3.0beta3:
Chris@82 573
Chris@82 574 * Separate FMA distribution to better exploit fused multiply-add instructions
Chris@82 575 on PowerPC (and possibly other) architectures.
Chris@82 576
Chris@82 577 * Performance improvements via some inlining tweaks.
Chris@82 578
Chris@82 579 * fftw_flops now returns double arguments, not int, to avoid overflows
Chris@82 580 for large sizes.
Chris@82 581
Chris@82 582 * Workarounds for automake bugs.
Chris@82 583
Chris@82 584 Changes from 3.0beta2:
Chris@82 585
Chris@82 586 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
Chris@82 587 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
Chris@82 588 we replaced it with a slower routine that is more accurate.
Chris@82 589
Chris@82 590 * The guru planner and execute functions now have two variants, one that
Chris@82 591 takes complex arguments and one that takes separate real/imag pointers.
Chris@82 592
Chris@82 593 * Execute and planner routines now automatically align the stack on x86,
Chris@82 594 in case the calling program is misaligned.
Chris@82 595
Chris@82 596 * README file for test program.
Chris@82 597
Chris@82 598 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
Chris@82 599
Chris@82 600 * Eliminated internal fftw_threads_init function, which some people were
Chris@82 601 calling accidentally instead of the fftw_init_threads API function.
Chris@82 602
Chris@82 603 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
Chris@82 604
Chris@82 605 * Support AMD x86-64 SIMD and cycle counter.
Chris@82 606
Chris@82 607 * Support SSE2 intrinsics in forthcoming gcc 3.3.
Chris@82 608
Chris@82 609 Changes from 3.0beta1:
Chris@82 610
Chris@82 611 * Faster in-place 1d transforms of non-power-of-two sizes.
Chris@82 612
Chris@82 613 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
Chris@82 614 transforms.
Chris@82 615
Chris@82 616 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
Chris@82 617 default distribution only includes hard-coded size-8 DCT-II/III, however.
Chris@82 618
Chris@82 619 * Many minor improvements to the manual. Added section on using the
Chris@82 620 codelet generator to customize and enhance FFTW.
Chris@82 621
Chris@82 622 * The default 'make check' should now only take a few minutes; for more
Chris@82 623 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
Chris@82 624
Chris@82 625 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
Chris@82 626 the latter uses stdout.
Chris@82 627
Chris@82 628 * Fixed ability to compile with a C++ compiler.
Chris@82 629
Chris@82 630 * Fixed support for C99 complex type under glibc.
Chris@82 631
Chris@82 632 * Fixed problems with alloca under MinGW, AIX.
Chris@82 633
Chris@82 634 * Workaround for gcc/SPARC bug.
Chris@82 635
Chris@82 636 * Fixed multi-threaded initialization failure on IRIX due to lack of
Chris@82 637 user-accessible PTHREAD_SCOPE_SYSTEM there.