annotate src/fftw-3.3.8/NEWS @ 169:223a55898ab9 tip default

Add null config files
author Chris Cannam <cannam@all-day-breakfast.com>
date Mon, 02 Mar 2020 14:03:47 +0000
parents bd3cc4d1df30
rev   line source
cannam@167 1 FFTW 3.3.8:
cannam@167 2
cannam@167 3 * Fixed AVX, AVX2 for gcc-8.
cannam@167 4
cannam@167 5 By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code
cannam@167 6 assumed that the compiler honors the distinction between +0 and -0,
cannam@167 7 but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math.
cannam@167 8 This release ensures that FFTW works with gcc-8 -ffast-math, and
cannam@167 9 removes -ffast-math from the default CFLAGS for good measure.
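The role of the sign of zero can be illustrated outside FFTW. The Python sketch below is not FFTW code. It assumes, purely for illustration, that the affected construct resembles the common SIMD idiom of negating a value by XOR-ing it with a -0.0 constant; the entry above does not say which construct was actually affected. If a compiler treats -0.0 as interchangeable with +0.0, such a mask loses its sign bit and the negation silently becomes a no-op. The helper names `float_bits` and `xor_with_mask` are hypothetical.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def xor_with_mask(x: float, mask: float) -> float:
    """XOR the bit patterns of x and mask, as a SIMD xor instruction would."""
    raw = float_bits(x) ^ float_bits(mask)
    return struct.unpack("<d", struct.pack("<Q", raw))[0]


# +0.0 and -0.0 compare equal, yet they differ in exactly the sign bit.
print(0.0 == -0.0)               # True
print(hex(float_bits(-0.0)))     # 0x8000000000000000

# With a genuine -0.0 mask the XOR flips the sign ...
print(xor_with_mask(1.5, -0.0))  # -1.5
# ... but if the mask has been folded to +0.0, nothing happens.
print(xor_with_mask(1.5, 0.0))   # 1.5
```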
cannam@167 10
cannam@167 11 FFTW 3.3.7:
cannam@167 12
cannam@167 13 * Experimental support for CMake.
cannam@167 14
cannam@167 15 The primary build mechanism for FFTW remains GNU autoconf/automake.
cannam@167 16 CMake support is meant to offer an easy way to compile FFTW on
cannam@167 17 Windows, and as such it does not cover all the features of the
cannam@167 18 automake build system, such as exotic cycle counters,
cannam@167 19 cross-compiling, or building binaries for a mixture of ISAs
cannam@167 20 (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.
cannam@167 21
cannam@167 22 * Fixes for armv7a cycle counter.
cannam@167 23 * Official support for aarch64, now that we have hardware to test it.
cannam@167 24 * Tweak usage of FMA instructions in a way that favors newer processors
cannam@167 25 (Skylake and Ryzen) over older processors (Haswell).
cannam@167 26 * tests/bench: use 64-bit precision to compute mflops.
cannam@167 27
cannam@167 28 FFTW 3.3.6-pl2:
cannam@167 29
cannam@167 30 * Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.
cannam@167 31
cannam@167 32 FFTW 3.3.6-pl1:
cannam@167 33
cannam@167 34 * Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
cannam@167 35 shared libraries of the form libfftw3.so.2.6.6 instead of
cannam@167 36 libfftw3.so.3.*.
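For context, libtool derives the shared-library suffix from a current:revision:age triple given to -version-info; on GNU/Linux the resulting file is named lib.so.(current - age).(age).(revision). The Python sketch below reproduces that arithmetic. The triples in the usage lines are hypothetical values chosen only to reproduce the two suffix shapes mentioned above; they are not taken from FFTW's build files.

```python
def libtool_suffix(current: int, revision: int, age: int) -> str:
    """Map a libtool current:revision:age triple to the GNU/Linux file suffix."""
    if age > current:
        raise ValueError("libtool requires age <= current")
    major = current - age  # oldest interface number still supported
    return f"{major}.{age}.{revision}"


# Hypothetical triples: an age that is one too large drops the major number.
print(libtool_suffix(8, 6, 6))  # 2.6.6
print(libtool_suffix(8, 6, 5))  # 3.5.6
```

An age that is too large by one is enough to change the major number, and with it the soname that dependent programs look up at run time.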
cannam@167 37
cannam@167 38 FFTW 3.3.6:
cannam@167 39
cannam@167 40 * The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
cannam@167 41 work, and this 3.3.6 fixes it. Sorry about that.
cannam@167 42 * compilation fixes for IBM XLC
cannam@167 43 * compilation fixes for threads on Windows
cannam@167 44 * fix SIMD autodetection on amd64 when (_MSC_VER > 1500)
cannam@167 45
cannam@167 46 FFTW 3.3.5:
cannam@167 47
cannam@167 48 * New SIMD support:
cannam@167 49 - Power8 VSX instructions in single and double precision.
cannam@167 50 To use, add --enable-vsx to configure.
cannam@167 51 - Support for AVX2 (256-bit FMA instructions).
cannam@167 52 To use, add --enable-avx2 to configure.
cannam@167 53 - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
cannam@167 54 This code is expected to work but the FFTW maintainers do not have
cannam@167 55 hardware to test it.
cannam@167 56 - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
cannam@167 57 - Double precision Neon SIMD for aarch64.
cannam@167 58 This code is expected to work but the FFTW maintainers do not have
cannam@167 59 hardware to test it.
cannam@167 60 - generic SIMD support using gcc vector intrinsics
cannam@167 61 * Add fftw_make_planner_thread_safe() API
cannam@167 62 * fix #18 (disable float128 for CUDACC)
cannam@167 63 * fix #19: missing Fortran interface for fftwq_alloc_real
cannam@167 64 * fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
cannam@167 65 * fix: Avoid segfaults due to double free in MPI transpose
cannam@167 66
cannam@167 67 * Special note for distribution maintainers: Although FFTW supports a
cannam@167 68 zillion SIMD instruction sets, enabling them all at the same time is
cannam@167 69 a bad idea, because it increases the planning time for minimal gain.
cannam@167 70 We recommend that general-purpose x86 distributions only enable SSE2
cannam@167 71 and perhaps AVX. Users who care about the last ounce of performance
cannam@167 72 should recompile FFTW themselves.
cannam@167 73
cannam@167 74 FFTW 3.3.4
cannam@167 75
cannam@167 76 * New functions fftw_alignment_of (to check whether two arrays are
cannam@167 77 equally aligned for the purposes of applying a plan) and fftw_sprint_plan
cannam@167 78 (to output a description of plan to a string).
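As a rough illustration of what "equally aligned" means: two buffers are interchangeable for an alignment-sensitive plan when their addresses leave the same remainder modulo the SIMD alignment boundary. The Python sketch below models that check on plain integer addresses. The 16-byte boundary and the helper names are assumptions for illustration; the real fftw_alignment_of may use a different boundary depending on how FFTW was built.

```python
def alignment_of(address: int, boundary: int = 16) -> int:
    """Remainder of an address modulo the (assumed) alignment boundary."""
    return address % boundary


def equally_aligned(a: int, b: int, boundary: int = 16) -> bool:
    """True if a plan created for a buffer at address a could, under this
    simplified model, also be applied to a buffer at address b."""
    return alignment_of(a, boundary) == alignment_of(b, boundary)


print(alignment_of(0x1000))             # 0
print(alignment_of(0x1008))             # 8
print(equally_aligned(0x1008, 0x2008))  # True
print(equally_aligned(0x1000, 0x2008))  # False
```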
cannam@167 79
cannam@167 80 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
cannam@167 81 bug report.
cannam@167 82
cannam@167 83 * Fixed manual to work with texinfo-5.
cannam@167 84
cannam@167 85 * Increased timing interval on x86_64 to reduce timing errors.
cannam@167 86
cannam@167 87 * Default to Win32 threads, not pthreads, if both are present.
cannam@167 88
cannam@167 89 * Various build-script fixes.
cannam@167 90
cannam@167 91 FFTW 3.3.3
cannam@167 92
cannam@167 93 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
cannam@167 94 bug report and patch, and to Graham Dennis for the bug report).
cannam@167 95
cannam@167 96 * Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
cannam@167 97 appears to speed up even ARM processors with a 64-bit NEON pipe.
cannam@167 98
cannam@167 99 * Speed improvements for single-precision AVX.
cannam@167 100
cannam@167 101 * Speed up planner on machines without "official" cycle counters, such as ARM.
cannam@167 102
cannam@167 103 FFTW 3.3.2
cannam@167 104
cannam@167 105 * Removed an archaic stack-alignment hack that was failing with
cannam@167 106 gcc-4.7/i386.
cannam@167 107
cannam@167 108 * Added stack-alignment hack necessary for gcc on Windows/i386. We
cannam@167 109 will regret this in ten years (see previous change).
cannam@167 110
cannam@167 111 * Fix incompatibility with Intel icc which pretends to be gcc
cannam@167 112 but does not support quad precision.
cannam@167 113
cannam@167 114 * make libfftw{threads,mpi} depend upon libfftw when using libtool;
cannam@167 115 this is consistent with most other libraries and simplifies the life
cannam@167 116 of various distributors of GNU/Linux.
cannam@167 117
cannam@167 118 FFTW 3.3.1
cannam@167 119
cannam@167 120 * Changes since 3.3.1-beta1:
cannam@167 121
cannam@167 122 - Reduced planning time in estimate mode for sizes with large
cannam@167 123 prime factors.
cannam@167 124
cannam@167 125 - Added AVX autodetection under Visual Studio. Thanks Carsten
cannam@167 126 Steger for submitting the necessary code.
cannam@167 127
cannam@167 128 - Modern Fortran interface now uses a separate fftw3l.f03 interface
cannam@167 129 file for the long double interface, which is not supported by
cannam@167 130 some Fortran compilers. Provided new fftw3q.f03 interface file
cannam@167 131 to access the quadruple-precision FFTW routines with recent
cannam@167 132 versions of gcc/gfortran.
cannam@167 133
cannam@167 134 * Added support for the NEON extensions to the ARM ISA. (Note to beta
cannam@167 135 users: an ARM cycle counter is not yet implemented; please contact
cannam@167 136 fftw@fftw.org if you know how to do it right.)
cannam@167 137
cannam@167 138 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
cannam@167 139 Kyle Spyksma for the bug report.
cannam@167 140
cannam@167 141 FFTW 3.3
cannam@167 142
cannam@167 143 * Changes since 3.3-beta1:
cannam@167 144
cannam@167 145 - Compiling OpenMP support (--enable-openmp) now installs a
cannam@167 146 fftw3_omp library, instead of fftw3_threads, so that OpenMP
cannam@167 147 and POSIX threads (--enable-threads) libraries can be built
cannam@167 148 and installed at the same time.
cannam@167 149
cannam@167 150 - Various minor compilation fixes, corrections of manual typos, and
cannam@167 151 improvements to the benchmark test program.
cannam@167 152
cannam@167 153 * Add support for the AVX extensions to x86 and x86-64. The AVX code
cannam@167 154 works with 16-byte alignment (as opposed to 32-byte alignment),
cannam@167 155 so there is no ABI change compared to FFTW 3.2.2.
cannam@167 156
cannam@167 157 * Added Fortran 2003 interface, which should be usable on most modern
cannam@167 158 Fortran compilers (e.g. gfortran) and provides type-checked access
cannam@167 159 to the C FFTW interface. (The legacy Fortran-77 interface is
cannam@167 160 still included also.)
cannam@167 161
cannam@167 162 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
cannam@167 163 the major changes in the MPI transforms are:
cannam@167 164 - Fixed some deadlock and crashing bugs.
cannam@167 165 - Added Fortran 2003 interface.
cannam@167 166 - Added new-array execute functions for MPI plans.
cannam@167 167 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
cannam@167 168 thanks to Jonathan Bentz for the bug report.
cannam@167 169 - Expanded documentation.
cannam@167 170 - 'make check' now runs MPI tests
cannam@167 171 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
cannam@167 172
cannam@167 173 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
cannam@167 174 x86-64, and Itanium). The new routines use the fftwq_ prefix.
cannam@167 175
cannam@167 176 * Removed support for MIPS paired-single instructions due to lack of
cannam@167 177 available hardware for testing. Users who want this functionality
cannam@167 178 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
cannam@167 179 on MIPS; this only concerns special instructions available on some
cannam@167 180 MIPS chips.)
cannam@167 181
cannam@167 182 * Removed support for the Cell Broadband Engine. Cell users should
cannam@167 183 use FFTW 3.2.x.
cannam@167 184
cannam@167 185 * New convenience functions fftw_alloc_real and fftw_alloc_complex
cannam@167 186 to use fftw_malloc for real and complex arrays without typecasts
cannam@167 187 or sizeof.
cannam@167 188
cannam@167 189 * New convenience functions fftw_export_wisdom_to_filename and
cannam@167 190 fftw_import_wisdom_from_filename that export/import wisdom
cannam@167 191 to a file, which don't require you to open/close the file yourself.
cannam@167 192
cannam@167 193 * New function fftw_cost to return FFTW's internal cost metric for
cannam@167 194 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
cannam@167 195 suggestion.
cannam@167 196
cannam@167 197 * The --enable-sse2 configure flag now works in both double and single
cannam@167 198 precision (and is equivalent to --enable-sse in the latter case).
cannam@167 199
cannam@167 200 * Remove --enable-portable-binary flag: we now produce portable binaries
cannam@167 201 by default.
cannam@167 202
cannam@167 203 * Remove the automatic detection of native architecture flag for gcc
cannam@167 204 which was introduced in fftw-3.1, since new gcc supports -mtune=native.
cannam@167 205 Remove the --with-gcc-arch flag; if you want to specify a particular
cannam@167 206 arch to configure, use ./configure CC="gcc -mtune=...".
cannam@167 207
cannam@167 208 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
cannam@167 209
cannam@167 210 * Fixed build failure when srand48 declaration is missing;
cannam@167 211 thanks to Ralf Wildenhues for the bug report.
cannam@167 212
cannam@167 213 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
cannam@167 214 is equivalent to no timelimit in all cases. Thanks to William Andrew
cannam@167 215 Burnson for the bug report.
cannam@167 216
cannam@167 217 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
cannam@167 218 too large a buffer.
cannam@167 219
cannam@167 220 FFTW 3.2.2
cannam@167 221
cannam@167 222 * Improve performance of some copy operations of complex arrays on
cannam@167 223 x86 machines.
cannam@167 224
cannam@167 225 * Add configure flag to disable alloca(), which is broken in mingw64.
cannam@167 226
cannam@167 227 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
cannam@167 228 between fftw-3.1.3 and 3.2. This regression has now been fixed.
cannam@167 229
cannam@167 230 FFTW 3.2.1
cannam@167 231
cannam@167 232 * Performance improvements for some multidimensional r2c/c2r transforms;
cannam@167 233 thanks to Eugene Miloslavsky for his benchmark reports.
cannam@167 234
cannam@167 235 * Compile with icc on MacOS X, use better icc compiler flags.
cannam@167 236
cannam@167 237 * Compilation fixes for systems where snprintf is defined as a macro;
cannam@167 238 thanks to Marcus Mae for the bug report.
cannam@167 239
cannam@167 240 * Fortran documentation now recommends not using dfftw_execute,
cannam@167 241 because of reports of problems with various Fortran compilers;
cannam@167 242 it is better to use dfftw_execute_dft etcetera.
cannam@167 243
cannam@167 244 * Some documentation clarifications, e.g. of the fact that --enable-openmp
cannam@167 245 and --enable-threads are mutually exclusive (thanks to Long To),
cannam@167 246 and document slightly odd behavior of plan_guru_r2r in Fortran
cannam@167 247 (thanks to Alexander Pozdneev).
cannam@167 248
cannam@167 249 * FAQ was accidentally omitted from 3.2 tarball.
cannam@167 250
cannam@167 251 * Remove some extraneous (harmless) files accidentally included in
cannam@167 252 a subdirectory of the 3.2 tarball.
cannam@167 253
cannam@167 254 FFTW 3.2
cannam@167 255
cannam@167 256 * Worked around apparent glibc bug that leads to rare hangs when freeing
cannam@167 257 semaphores.
cannam@167 258
cannam@167 259 * Fixed segfault due to unaligned access in certain obscure problems
cannam@167 260 that use SSE and multiple threads.
cannam@167 261
cannam@167 262 * MPI transforms not included, as they are still in alpha; the alpha
cannam@167 263 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
cannam@167 264
cannam@167 265 FFTW 3.2alpha3
cannam@167 266
cannam@167 267 * Performance improvements for sizes with factors of 5 and 10.
cannam@167 268
cannam@167 269 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
cannam@167 270 Emmenlauer and Phil Dumont.
cannam@167 271
cannam@167 272 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
cannam@167 273
cannam@167 274 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
cannam@167 275 for the suggestions.
cannam@167 276
cannam@167 277 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
cannam@167 278 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
cannam@167 279
cannam@167 280 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
cannam@167 281 from working in single precision (thanks to Eric A. Borisch for the report).
cannam@167 282
cannam@167 283 * Added 'make check' for MPI code (which still fails in a couple corner
cannam@167 284 cases, but should be much better than in alpha2).
cannam@167 285
cannam@167 286 * Many other small fixes.
cannam@167 287
cannam@167 288 FFTW 3.2alpha2
cannam@167 289
cannam@167 290 * Support for the Cell processor, donated by IBM Research; see README.Cell
cannam@167 291 and the Cell section of the manual.
cannam@167 292
cannam@167 293 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
cannam@167 294 function with the same semantics, but which takes fftw_iodim64 instead of
cannam@167 295 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
cannam@167 296 ptrdiff_t integer types as parameters, which is a 64-bit type on
cannam@167 297 64-bit machines. This is only useful for specifying very large transforms
cannam@167 298 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
cannam@167 299 regardless of what API you choose.)
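To make "very large transforms" concrete: the 64-bit variants matter when an individual dimension length or stride no longer fits in a C int. The Python sketch below checks this for a list of (n, input stride, output stride) triples, mirroring the fields of fftw_iodim. The helper name `needs_64bit_dims` and the assumption of a 32-bit C int are illustrative, not part of FFTW.

```python
INT_MAX = 2**31 - 1  # assumption: C int is 32 bits wide


def needs_64bit_dims(dims) -> bool:
    """dims is an iterable of (n, input_stride, output_stride) triples.

    Returns True if any single value is too large for a 32-bit int, i.e. the
    plan could only be described with ptrdiff_t-sized fields.
    """
    return any(abs(value) > INT_MAX for triple in dims for value in triple)


# A 65536 x 65536 row-major array holds 2^32 elements in total, yet every
# length and stride still fits in an int.
print(needs_64bit_dims([(65536, 65536, 65536), (65536, 1, 1)]))  # False

# A one-dimensional transform of 2^32 points does not fit.
print(needs_64bit_dims([(2**32, 1, 1)]))                         # True
```

In this model a large total element count is not enough on its own to need the 64-bit fields; what matters is each length and stride individually.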
cannam@167 300
cannam@167 301 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
cannam@167 302 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
cannam@167 303 distributed transpose operations, with 1d block distributions.
cannam@167 304 (This is an alpha preview: routines have not been exhaustively
cannam@167 305 tested, documentation is incomplete, and some functionality is
cannam@167 306 missing, e.g. Fortran support.) See mpi/README and also the MPI
cannam@167 307 section of the manual.
cannam@167 308
cannam@167 309 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
cannam@167 310
cannam@167 311 * Rewritten multi-threaded support for better performance by
cannam@167 312 re-using a fixed pool of threads rather than continually
cannam@167 313 respawning and joining (which nowadays is much slower).
cannam@167 314
cannam@167 315 * Support for MIPS paired-single SIMD instructions, donated by
cannam@167 316 Codesourcery.
cannam@167 317
cannam@167 318 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
cannam@167 319 available and return NULL otherwise.
cannam@167 320
cannam@167 321 * Removed k7 support, which only worked in 32-bit mode and is
cannam@167 322 becoming obsolete. Use --enable-sse instead.
cannam@167 323
cannam@167 324 * Added --with-g77-wrappers configure option to force inclusion
cannam@167 325 of g77 wrappers, in addition to whatever is needed for the
cannam@167 326 detected Fortran compilers. This is mainly intended for GNU/Linux
cannam@167 327 distros switching to gfortran that wish to include both
cannam@167 328 gfortran and g77 support in FFTW.
cannam@167 329
cannam@167 330 * In manual, renamed "guru execute" functions to "new-array execute"
cannam@167 331 functions, to reduce confusion with the guru planner interface.
cannam@167 332 (The programming interface is unchanged.)
cannam@167 333
cannam@167 334 * Add missing __declspec attribute to threads API functions when compiling
cannam@167 335 for Windows; thanks to Robert O. Morris for the bug report.
cannam@167 336
cannam@167 337 * Fixed missing return value from dfftw_init_threads in Fortran;
cannam@167 338 thanks to Markus Wetzstein for the bug report.
cannam@167 339
cannam@167 340 FFTW 3.1.3
cannam@167 341
cannam@167 342 * Bug fix: FFTW computes incorrect results when the user plans both
cannam@167 343 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
cannam@167 344 by incorrect sharing of twiddle-factor tables between the two
cannam@167 345 transforms, and only occurs when both are used. Thanks to Paul
cannam@167 346 A. Valiant for the bug report.
cannam@167 347
cannam@167 348 FFTW 3.1.2
cannam@167 349
cannam@167 350 * Correct bug in configure script: --enable-portable-binary option was ignored!
cannam@167 351 Thanks to Andrew Salamon for the bug report.
cannam@167 352
cannam@167 353 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
cannam@167 354 either if we are using gcc. Thanks to Guy Moebs for the bug report.
cannam@167 355
cannam@167 356 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
cannam@167 357 and suggest a workaround. configure script now detects Core/Duo arch.
cannam@167 358
cannam@167 359 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
cannam@167 360 thanks to Markus Dittrich.
cannam@167 361
cannam@167 362 FFTW 3.1.1
cannam@167 363
cannam@167 364 * Performance improvements for Intel EM64T.
cannam@167 365
cannam@167 366 * Performance improvements for large-size transforms with SIMD.
cannam@167 367
cannam@167 368 * Cycle counter support for Intel icc and Visual C++ on x86-64.
cannam@167 369
cannam@167 370 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
cannam@167 371
cannam@167 372 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
cannam@167 373
cannam@167 374 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
cannam@167 375
cannam@167 376 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
cannam@167 377 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
cannam@167 378
cannam@167 379 FFTW 3.1
cannam@167 380
cannam@167 381 * Faster FFTW_ESTIMATE planner.
cannam@167 382
cannam@167 383 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
cannam@167 384
cannam@167 385 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
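For readers unfamiliar with the term, the textbook four-step decomposition splits a length-N transform with N = n1 * n2 into column transforms, a twiddle multiplication, a transposition, and row transforms. The Python sketch below follows that textbook form with a naive O(N^2) sub-transform for clarity. It is an illustration only and says nothing about how FFTW implements the algorithm internally; the function names are hypothetical.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT, used here as the small sub-transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_dft(x, n1, n2):
    """DFT of length n1*n2 with input index n2*a + c and output index k1 + n1*k2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 transforms of length n1 over the strided columns x[c::n2].
    cols = [naive_dft(x[c::n2]) for c in range(n2)]  # cols[c][k1]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i*c*k1/n).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Steps 3 and 4: transpose (gather over c), then n1 transforms of length n2.
    out = [0j] * n
    for k1 in range(n1):
        row = naive_dft([cols[c][k1] for c in range(n2)])  # row[k2]
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


x = [complex(j % 5, -j) for j in range(12)]
err = max(abs(p - q) for p, q in zip(four_step_dft(x, 3, 4), naive_dft(x)))
print(err < 1e-9)  # True
```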
cannam@167 386
cannam@167 387 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
cannam@167 388
cannam@167 389 * Faster in-place non-square transpositions (FFTW uses these internally
cannam@167 390 for in-place FFTs, and you can also perform them explicitly using
cannam@167 391 the guru interface).
cannam@167 392
cannam@167 393 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
cannam@167 394 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
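Bluestein's algorithm rewrites a DFT of arbitrary length n as a convolution with a "chirp" sequence, which can then be evaluated with zero-padded power-of-two FFTs. The Python sketch below is a minimal textbook version, assuming a simple recursive radix-2 FFT for the convolution. It is not FFTW's implementation, the function names are hypothetical, and the Rader variant mentioned above is not shown.

```python
import cmath


def fft_pow2(x, sign=-1):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd = fft_pow2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """DFT of arbitrary length via the identity j*k = (j^2 + k^2 - (k-j)^2) / 2."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:  # padded length for an alias-free cyclic convolution
        m *= 2
    # Chirp w[j] = exp(-i*pi*j^2/n); j^2 is reduced mod 2n to keep angles small.
    w = [cmath.exp(-1j * cmath.pi * ((j * j) % (2 * n)) / n) for j in range(n)]
    a = [x[j] * w[j] for j in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = w[0].conjugate()
    for j in range(1, n):
        b[j] = b[m - j] = w[j].conjugate()  # chirp is even, so wrap it around
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    return [w[k] * conv[k] / m for k in range(n)]


x = [complex(j, -j * j) for j in range(7)]  # prime length 7
ref = [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / 7) for j in range(7))
       for k in range(7)]
print(max(abs(p - q) for p, q in zip(bluestein_dft(x), ref)) < 1e-9)  # True
```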
cannam@167 395
cannam@167 396 * SIMD support for split complex arrays.
cannam@167 397
cannam@167 398 * Much faster Altivec/VMX performance.
cannam@167 399
cannam@167 400 * New fftw_set_timelimit function to specify a (rough) upper bound to the
cannam@167 401 planning time (does not affect ESTIMATE mode).
cannam@167 402
cannam@167 403 * Removed --enable-3dnow support; use --enable-k7 instead.
cannam@167 404
cannam@167 405 * FMA (fused multiply-add) version is now included in "standard" FFTW,
cannam@167 406 and is enabled with --enable-fma (the default on PowerPC and Itanium).
cannam@167 407
cannam@167 408 * Automatic detection of native architecture flag for gcc. New
cannam@167 409 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
cannam@167 410 for people distributing compiled binaries of FFTW (see manual).
cannam@167 411
cannam@167 412 * Automatic detection of Altivec under Linux with gcc 3.4 (so that
cannam@167 413 same binary should work on both Altivec and non-Altivec PowerPCs).
cannam@167 414
cannam@167 415 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
cannam@167 416 Solaris/Intel.
cannam@167 417
cannam@167 418 * Various documentation clarifications.
cannam@167 419
cannam@167 420 * 64-bit clean. (Fixes a bug affecting the split guru planner on
cannam@167 421 64-bit machines, reported by David Necas.)
cannam@167 422
cannam@167 423 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
cannam@167 424 non-SSE machines (causing a crash) for --enable-sse binaries.
cannam@167 425
cannam@167 426 * Fixed bug that caused HC2R transforms to destroy the input in
cannam@167 427 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
cannam@167 428
cannam@167 429 * Fixed bug where wisdom would be lost under rare circumstances,
cannam@167 430 causing excessive planning time.
cannam@167 431
cannam@167 432 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
cannam@167 433
cannam@167 434 * Fixed accidentally exported symbol that prohibited simultaneous
cannam@167 435 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
cannam@167 436
cannam@167 437 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
cannam@167 438
cannam@167 439 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
cannam@167 440
cannam@167 441 * Fix build failure if no Fortran compiler is found (thanks to Charles
cannam@167 442 Radley for the bug report).
cannam@167 443
cannam@167 444 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
cannam@167 445 detection of icc architecture flag (e.g. -xW).
cannam@167 446
cannam@167 447 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
cannam@167 448
cannam@167 449 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
cannam@167 450
cannam@167 451 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
cannam@167 452 but its malloc is 16-byte aligned).
cannam@167 453
cannam@167 454 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
cannam@167 455 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
cannam@167 456 reports/fixes). Added x86-64 cycle counter for PGI compilers,
cannam@167 457 courtesy Cristiano Calonaci.
cannam@167 458
cannam@167 459 * Fix compilation problem in test program due to C99 conflict.
cannam@167 460
cannam@167 461 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
cannam@167 462 Manuel Guerrero).
cannam@167 463
cannam@167 464 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
cannam@167 465
cannam@167 466 * Work around Visual C++ (version 6/7) bug in SSE compilation;
cannam@167 467 thanks to Eddie Yee for his detailed report.
cannam@167 468
cannam@167 469 Changes from FFTW 3.1 beta 2:
cannam@167 470
cannam@167 471 * Several minor compilation fixes.
cannam@167 472
cannam@167 473 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
cannam@167 474 fftw_set_timelimit function. Make wisdom work with time-limited plans.
cannam@167 475
cannam@167 476 Changes from FFTW 3.1 beta 1:
cannam@167 477
cannam@167 478 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
cannam@167 479
cannam@167 480 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
cannam@167 481
cannam@167 482 * Further speed improvements for Altivec/VMX.
cannam@167 483
cannam@167 484 * Further speed improvements for non-square transpositions.
cannam@167 485
cannam@167 486 * Many minor tweaks.
cannam@167 487
cannam@167 488 FFTW 3.0.1
cannam@167 489
cannam@167 490 * Some speed improvements in SIMD code.
cannam@167 491
cannam@167 492 * --without-cycle-counter option is removed. If no cycle counter is found,
cannam@167 493 then the estimator is always used. A --with-slow-timer option is provided
cannam@167 494 to force the use of lower-resolution timers.
cannam@167 495
cannam@167 496 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
cannam@167 497
cannam@167 498 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
cannam@167 499
cannam@167 500 * Added S390 cycle counter, courtesy of James Treacy.
cannam@167 501
cannam@167 502 * Added missing static keyword that prevented simultaneous linkage
cannam@167 503 of different-precision versions; thanks to Rasmus Larsen for the bug report.
cannam@167 504
cannam@167 505 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
cannam@167 506
cannam@167 507 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
cannam@167 508
cannam@167 509 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
cannam@167 510 preprocessor limits; thanks to Peter Vouras for the bug report.
cannam@167 511
cannam@167 512 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
cannam@167 513 thanks to Nicolas Decoster for the patch.
cannam@167 514
cannam@167 515 * Added 'make smallcheck' target in tests/ directory, at the request of
cannam@167 516 James Treacy.
cannam@167 517
cannam@167 518 FFTW 3.0
cannam@167 519
cannam@167 520 Major goals of this release:
cannam@167 521
cannam@167 522 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
cannam@167 523
cannam@167 524 * Complete rewrite, to make it easier to add new algorithms and transforms.
cannam@167 525
cannam@167 526 * New API, to support more general semantics.
cannam@167 527
cannam@167 528 Other enhancements:
cannam@167 529
cannam@167 530 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
cannam@167 531 (With special thanks to Franz Franchetti for many experimental prototypes
cannam@167 532 and to Stefan Kral for the vectorizing generator from fftwgel.)
cannam@167 533
cannam@167 534 * True in-place 1d transforms of large sizes (as well as compressed
cannam@167 535 twiddle tables for additional memory/cache savings).
cannam@167 536
cannam@167 537 * More arbitrary placement of real & imaginary data, e.g. including
cannam@167 538 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
cannam@167 539
cannam@167 540 * Efficient prime-size transforms of real data.
cannam@167 541
cannam@167 542 * Multidimensional transforms can operate on a subset of a larger matrix,
cannam@167 543 and/or transform selected dimensions of a multidimensional array.
cannam@167 544
cannam@167 545 * By popular demand, simultaneous linking to double precision (fftw),
cannam@167 546 single precision (fftwf), and long-double precision (fftwl) versions
cannam@167 547 of FFTW is now supported.
cannam@167 548
cannam@167 549 * Cycle counters (on all modern CPUs) are exploited to speed planning.
cannam@167 550
cannam@167 551 * Efficient transforms of real even/odd arrays, a.k.a. discrete
cannam@167 552 cosine/sine transforms (types I-IV). (These currently work via pre/post
cannam@167 553 processing of real transforms, a la FFTPACK, so they are not optimal.)
cannam@167 554
cannam@167 555 * DHTs (Discrete Hartley Transforms), again via post-processing
cannam@167 556 of real transforms (and thus suboptimal, for now).
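The post-processing idea can be shown with the standard identity H[k] = Re(X[k]) - Im(X[k]), where X is the DFT of a real input with the exp(-2*pi*i*j*k/n) sign convention. The Python sketch below checks that identity against the direct definition of the DHT. For brevity it uses a naive full complex DFT rather than a real-input transform, and it is not a description of FFTW's internal code; the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) DFT with the exp(-2*pi*i*j*k/n) convention."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dht_via_dft(x):
    """Discrete Hartley transform obtained by post-processing a DFT."""
    return [c.real - c.imag for c in dft(x)]


def dht_direct(x):
    """DHT from its definition: sum of x[j] * (cos(t) + sin(t)), t = 2*pi*j*k/n."""
    n = len(x)
    return [sum(x[j] * (math.cos(2 * math.pi * j * k / n)
                        + math.sin(2 * math.pi * j * k / n)) for j in range(n))
            for k in range(n)]


x = [1.0, 2.0, -3.0, 0.5, 4.0]
err = max(abs(p - q) for p, q in zip(dht_via_dft(x), dht_direct(x)))
print(err < 1e-9)  # True
```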
cannam@167 557
cannam@167 558 * Support for linking to just those parts of FFTW that you need,
cannam@167 559 greatly reducing the size of statically linked programs when
cannam@167 560 only a limited set of transform sizes/types are required.
cannam@167 561
cannam@167 562 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
cannam@167 563 with a command-line tool (fftw-wisdom) to generate/update it.
cannam@167 564
cannam@167 565 * Fortran API can be used with both g77 and non-g77 compilers
cannam@167 566 simultaneously.
cannam@167 567
cannam@167 568 * Multi-threaded version has optional OpenMP support.
cannam@167 569
cannam@167 570 * Authors' good looks have greatly improved with age.
cannam@167 571
cannam@167 572 Changes from 3.0beta3:
cannam@167 573
cannam@167 574 * Separate FMA distribution to better exploit fused multiply-add instructions
cannam@167 575 on PowerPC (and possibly other) architectures.
cannam@167 576
cannam@167 577 * Performance improvements via some inlining tweaks.
cannam@167 578
cannam@167 579 * fftw_flops now returns double arguments, not int, to avoid overflows
cannam@167 580 for large sizes.
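A quick calculation shows how soon an int would overflow. The Python sketch below uses the nominal 5 N log2 N count that is conventionally quoted for complex FFT benchmarks. This is an assumption for illustration and not the exact operation count that fftw_flops reports; a 32-bit int is also assumed.

```python
import math

INT32_MAX = 2**31 - 1  # assumption: C int is 32 bits wide


def nominal_fft_flops(n: int) -> float:
    """Nominal flop count 5 * n * log2(n) for a complex FFT of length n."""
    return 5.0 * n * math.log2(n)


def fits_in_int32(value: float) -> bool:
    return value <= INT32_MAX


print(fits_in_int32(nominal_fft_flops(2**24)))  # True  (2,013,265,920)
print(fits_in_int32(nominal_fft_flops(2**25)))  # False (4,194,304,000)
```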
cannam@167 581
cannam@167 582 * Workarounds for automake bugs.
cannam@167 583
cannam@167 584 Changes from 3.0beta2:
cannam@167 585
cannam@167 586 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
cannam@167 587 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
cannam@167 588 we replaced it with a slower routine that is more accurate.
cannam@167 589
cannam@167 590 * The guru planner and execute functions now have two variants, one that
cannam@167 591 takes complex arguments and one that takes separate real/imag pointers.
cannam@167 592
cannam@167 593 * Execute and planner routines now automatically align the stack on x86,
cannam@167 594 in case the calling program is misaligned.
cannam@167 595
cannam@167 596 * README file for test program.
cannam@167 597
cannam@167 598 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
cannam@167 599
cannam@167 600 * Eliminated internal fftw_threads_init function, which some people were
cannam@167 601 calling accidentally instead of the fftw_init_threads API function.
cannam@167 602
cannam@167 603 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
cannam@167 604
cannam@167 605 * Support AMD x86-64 SIMD and cycle counter.
cannam@167 606
cannam@167 607 * Support SSE2 intrinsics in forthcoming gcc 3.3.
cannam@167 608
cannam@167 609 Changes from 3.0beta1:
cannam@167 610
cannam@167 611 * Faster in-place 1d transforms of non-power-of-two sizes.
cannam@167 612
cannam@167 613 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
cannam@167 614 transforms.
cannam@167 615
cannam@167 616 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
cannam@167 617 default distribution only includes hard-coded size-8 DCT-II/III, however.
cannam@167 618
cannam@167 619 * Many minor improvements to the manual. Added section on using the
cannam@167 620 codelet generator to customize and enhance FFTW.
cannam@167 621
cannam@167 622 * The default 'make check' should now only take a few minutes; for more
cannam@167 623 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
cannam@167 624
cannam@167 625 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
cannam@167 626 the latter uses stdout.
cannam@167 627
cannam@167 628 * Fixed ability to compile with a C++ compiler.
cannam@167 629
cannam@167 630 * Fixed support for C99 complex type under glibc.
cannam@167 631
cannam@167 632 * Fixed problems with alloca under MinGW, AIX.
cannam@167 633
cannam@167 634 * Workaround for gcc/SPARC bug.
cannam@167 635
cannam@167 636 * Fixed multi-threaded initialization failure on IRIX due to lack of
cannam@167 637 user-accessible PTHREAD_SCOPE_SYSTEM there.