Mercurial > hg > sv-dependency-builds
diff src/fftw-3.3.3/NEWS @ 95:89f5e221ed7b
Add FFTW3
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Wed, 20 Mar 2013 15:35:50 +0000 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/src/fftw-3.3.3/NEWS Wed Mar 20 15:35:50 2013 +0000 @@ -0,0 +1,525 @@ +FFTW 3.3.3 + +* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the + bug report and patch, and to Graham Dennis for the bug report). + +* Use 128-bit ARM NEON instructions instead of 64-bits. This change + appears to speed up even ARM processors with a 64-bit NEON pipe. + +* Speed improvements for single-precision AVX. + +* Speed up planner on machines without "official" cycle counters, such as ARM. + +FFTW 3.3.2 + +* Removed an archaic stack-alignment hack that was failing with + gcc-4.7/i386. + +* Added stack-alignment hack necessary for gcc on Windows/i386. We + will regret this in ten years (see previous change). + +* Fix incompatibility with Intel icc which pretends to be gcc + but does not support quad precision. + +* make libfftw{threads,mpi} depend upon libfftw when using libtool; + this is consistent with most other libraries and simplifies the life + of various distributors of GNU/Linux. + +FFTW 3.3.1 + +* Changes since 3.3.1-beta1: + + - Reduced planning time in estimate mode for sizes with large + prime factors. + + - Added AVX autodetection under Visual Studio. Thanks Carsten + Steger for submitting the necessary code. + + - Modern Fortran interface now uses a separate fftw3l.f03 interface + file for the long double interface, which is not supported by + some Fortran compilers. Provided new fftw3q.f03 interface file + to access the quadruple-precision FFTW routines with recent + versions of gcc/gfortran. + +* Added support for the NEON extensions to the ARM ISA. (Note to beta + users: an ARM cycle counter is not yet implemented; please contact + fftw@fftw.org if you know how to do it right.) + +* MPI code now compiles even if mpicc is a C++ compiler; thanks to + Kyle Spyksma for the bug report. + +FFTW 3.3 + +* Changes since 3.3-beta1: + + - Compiling OpenMP support (--enable-openmp) now installs a + fftw3_omp library, instead of fftw3_threads, so that OpenMP + and POSIX threads (--enable-threads) libraries can be built + and installed at the same time. + + - Various minor compilation fixes, corrections of manual typos, and + improvements to the benchmark test program. + +* Add support for the AVX extensions to x86 and x86-64. The AVX code + works with 16-byte alignment (as opposed to 32-byte alignment), + so there is no ABI change compared to FFTW 3.2.2. + +* Added Fortran 2003 interface, which should be usable on most modern + Fortran compilers (e.g. gfortran) and provides type-checked access + to the the C FFTW interface. (The legacy Fortran-77 interface is + still included also.) + +* Added MPI distributed-memory transforms. Compared to 3.3alpha, + the major changes in the MPI transforms are: + - Fixed some deadlock and crashing bugs. + - Added Fortran 2003 interface. + - Added new-array execute functions for MPI plans. + - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24; + thanks to Jonathan Bentz for the bug report. + - Expanded documentation. + - 'make check' now runs MPI tests + - Some ABI changes - not binary-compatible with 3.3alpha MPI. + +* Add support for quad-precision __float128 in gcc 4.6 or later (on x86. + x86-64, and Itanium). The new routines use the fftwq_ prefix. + +* Removed support for MIPS paired-single instructions due to lack of + available hardware for testing. Users who want this functionality + should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works + on MIPS; this only concerns special instructions available on some + MIPS chips.) + +* Removed support for the Cell Broadband Engine. Cell users should + use FFTW 3.2.x. + +* New convenience functions fftw_alloc_real and fftw_alloc_complex + to use fftw_malloc for real and complex arrays without typecasts + or sizeof. + +* New convenience functions fftw_export_wisdom_to_filename and + fftw_import_wisdom_from_filename that export/import wisdom + to a file, which don't require you to open/close the file yourself. + +* New function fftw_cost to return FFTW's internal cost metric for + a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the + suggestion. + +* The --enable-sse2 configure flag now works in both double and single + precision (and is equivalent to --enable-sse in the latter case). + +* Remove --enable-portable-binary flag: we new produce portable binaries + by default. + +* Remove the automatic detection of native architecture flag for gcc + which was introduced in fftw-3.1, since new gcc supports -mtune=native. + Remove the --with-gcc-arch flag; if you want to specify a particlar + arch to configure, use ./configure CC="gcc -mtune=...". + +* --with-our-malloc16 configure flag is now renamed --with-our-malloc. + +* Fixed build problem failure when srand48 declaration is missing; + thanks to Ralf Wildenhues for the bug report. + +* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit + is equivalent to no timelimit in all cases. Thanks to William Andrew + Burnson for the bug report. + +* Fixed stack-overflow problem on OpenBSD caused by using alloca with + too large a buffer. + +FFTW 3.2.2 + +* Improve performance of some copy operations of complex arrays on + x86 machines. + +* Add configure flag to disable alloca(), which is broken in mingw64. + +* Planning in FFTW_ESTIMATE mode for r2r transforms became slower + between fftw-3.1.3 and 3.2. This regression has now been fixed. + +FFTW 3.2.1 + +* Performance improvements for some multidimensional r2c/c2r transforms; + thanks to Eugene Miloslavsky for his benchmark reports. + +* Compile with icc on MacOS X, use better icc compiler flags. + +* Compilation fixes for systems where snprintf is defined as a macro; + thanks to Marcus Mae for the bug report. + +* Fortran documentation now recommends not using dfftw_execute, + because of reports of problems with various Fortran compilers; + it is better to use dfftw_execute_dft etcetera. + +* Some documentation clarifications, e.g. of fact that --enable-openmp + and --enable-threads are mutually exclusive (thanks to Long To), + and document slightly odd behavior of plan_guru_r2r in Fortran + (thanks to Alexander Pozdneev). + +* FAQ was accidentally omitted from 3.2 tarball. + +* Remove some extraneous (harmless) files accidentally included in + a subdirectory of the 3.2 tarball. + +FFTW 3.2 + +* Worked around apparent glibc bug that leads to rare hangs when freeing + semaphores. + +* Fixed segfault due to unaligned access in certain obscure problems + that use SSE and multiple threads. + +* MPI transforms not included, as they are still in alpha; the alpha + versions of the MPI transforms have been moved to FFTW 3.3alpha1. + +FFTW 3.2alpha3 + +* Performance improvements for sizes with factors of 5 and 10. + +* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario + Emmenlauer and Phil Dumont. + +* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code. + +* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner + for the suggestions. + +* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle + counter for AIX/xlc (thanks to Jeff Haferman for the bug report). + +* Fixed incorrect type prefix in MPI code that prevented wisdom routines + from working in single precision (thanks to Eric A. Borisch for the report). + +* Added 'make check' for MPI code (which still fails in a couple corner + cases, but should be much better than in alpha2). + +* Many other small fixes. + +FFTW 3.2alpha2 + +* Support for the Cell processor, donated by IBM Research; see README.Cell + and the Cell section of the manual. + +* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64" + function with the same semantics, but which takes fftw_iodim64 instead of + fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes + ptrdiff_t integer types as parameters, which is a 64-bit type on + 64-bit machines. This is only useful for specifying very large transforms + on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere + regardless of what API you choose.) + +* Experimental MPI support. Complex one- and multi-dimensional FFTs, + multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and + distributed transpose operations, with 1d block distributions. + (This is an alpha preview: routines have not been exhaustively + tested, documentation is incomplete, and some functionality is + missing, e.g. Fortran support.) See mpi/README and also the MPI + section of the manual. + +* Significantly faster r2c/c2r transforms, especially on machines with SIMD. + +* Rewritten multi-threaded support for better performance by + re-using a fixed pool of threads rather than continually + respawning and joining (which nowadays is much slower). + +* Support for MIPS paired-single SIMD instructions, donated by + Codesourcery. + +* FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is + available and return NULL otherwise. + +* Removed k7 support, which only worked in 32-bit mode and is + becoming obsolete. Use --enable-sse instead. + +* Added --with-g77-wrappers configure option to force inclusion + of g77 wrappers, in addition to whatever is needed for the + detected Fortran compilers. This is mainly intended for GNU/Linux + distros switching to gfortran that wish to include both + gfortran and g77 support in FFTW. + +* In manual, renamed "guru execute" functions to "new-array execute" + functions, to reduce confusion with the guru planner interface. + (The programming interface is unchanged.) + +* Add missing __declspec attribute to threads API functions when compiling + for Windows; thanks to Robert O. Morris for the bug report. + +* Fixed missing return value from dfftw_init_threads in Fortran; + thanks to Markus Wetzstein for the bug report. + +FFTW 3.1.1 + +* Performance improvements for Intel EMT64. + +* Performance improvements for large-size transforms with SIMD. + +* Cycle counter support for Intel icc and Visual C++ on x86-64. + +* In fftw-wisdom tool, replaced obsolete --impatient with --measure. + +* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas. + +* Windows DLL support for Fortran API (added missing __declspec(dllexport)). + +* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486 + CPUs lacking a CPUID instruction; thanks to Eric Korpela. + +FFTW 3.1 + +* Faster FFTW_ESTIMATE planner. + +* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size. + +* "4-step" algorithm for faster FFTs of very large sizes (> 2^18). + +* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats). + +* Faster in-place non-square transpositions (FFTW uses these internally + for in-place FFTs, and you can also perform them explicitly using + the guru interface). + +* Faster prime-size DFTs: implemented Bluestein's algorithm, as well + as a zero-padded Rader variant to limit recursive use of Rader's algorithm. + +* SIMD support for split complex arrays. + +* Much faster Altivec/VMX performance. + +* New fftw_set_timelimit function to specify a (rough) upper bound to the + planning time (does not affect ESTIMATE mode). + +* Removed --enable-3dnow support; use --enable-k7 instead. + +* FMA (fused multiply-add) version is now included in "standard" FFTW, + and is enabled with --enable-fma (the default on PowerPC and Itanium). + +* Automatic detection of native architecture flag for gcc. New + configure options: --enable-portable-binary and --with-gcc-arch=<arch>, + for people distributing compiled binaries of FFTW (see manual). + +* Automatic detection of Altivec under Linux with gcc 3.4 (so that + same binary should work on both Altivec and non-Altivec PowerPCs). + +* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX, + Solaris/Intel. + +* Various documentation clarifications. + +* 64-bit clean. (Fixes a bug affecting the split guru planner on + 64-bit machines, reported by David Necas.) + +* Fixed Debian bug #259612: inadvertent use of SSE instructions on + non-SSE machines (causing a crash) for --enable-sse binaries. + +* Fixed bug that caused HC2R transforms to destroy the input in + certain cases, even if the user specified FFTW_PRESERVE_INPUT. + +* Fixed bug where wisdom would be lost under rare circumstances, + causing excessive planning time. + +* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2. + +* Fixed accidentally exported symbol that prohibited simultaneous + linking to double/single multithreaded FFTW (thanks to Alessio Massaro). + +* Support Win32 threads under MinGW (thanks to Alessio Massaro). + +* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod. + +* Fix build failure if no Fortran compiler is found (thanks to Charles + Radley for the bug report). + +* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic + detection of icc architecture flag (e.g. -xW). + +* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer). + +* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski). + +* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign, + but its malloc is 16-byte aligned). + +* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc, + MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for + reports/fixes). Added x86-64 cycle counter for PGI compilers, + courtesy Cristiano Calonaci. + +* Fix compilation problem in test program due to C99 conflict. + +* Portability fix for import_system_wisdom with djgpp (thanks to Juan + Manuel Guerrero). + +* Fixed compilation failure on MacOS 10.3 due to getopt conflict. + +* Work around Visual C++ (version 6/7) bug in SSE compilation; + thanks to Eddie Yee for his detailed report. + +Changes from FFTW 3.1 beta 2: + +* Several minor compilation fixes. + +* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with + fftw_set_timelimit function. Make wisdom work with time-limited plans. + +Changes from FFTW 3.1 beta 1: + +* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback. + +* Fixed more 64-bit problems, thanks to John Pavel for the bug report. + +* Further speed improvements for Altivec/VMX. + +* Further speed improvements for non-square transpositions. + +* Many minor tweaks. + +FFTW 3.0.1 + +* Some speed improvements in SIMD code. + +* --without-cycle-counter option is removed. If no cycle counter is found, + then the estimator is always used. A --with-slow-timer option is provided + to force the use of lower-resolution timers. + +* Several fixes for compilation under Visual C++, with help from Stefane Ruel. + +* Added x86 cycle counter for Visual C++, with help from Morten Nissov. + +* Added S390 cycle counter, courtesy of James Treacy. + +* Added missing static keyword that prevented simultaneous linkage + of different-precision versions; thanks to Rasmus Larsen for the bug report. + +* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson. + +* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report. + +* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase + preprocessor limits; thanks to Peter Vouras for the bug report. + +* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script; + thanks to Nicolas Decoster for the patch. + +* Added 'make smallcheck' target in tests/ directory, at the request of + James Treacy. + +FFTW 3.0 + +Major goals of this release: + +* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below). + +* Complete rewrite, to make it easier to add new algorithms and transforms. + +* New API, to support more general semantics. + +Other enhancements: + +* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec). + (With special thanks to Franz Franchetti for many experimental prototypes + and to Stefan Kral for the vectorizing generator from fftwgel.) + +* True in-place 1d transforms of large sizes (as well as compressed + twiddle tables for additional memory/cache savings). + +* More arbitrary placement of real & imaginary data, e.g. including + interleaved (as in FFTW 2.x) as well as separate real/imag arrays. + +* Efficient prime-size transforms of real data. + +* Multidimensional transforms can operate on a subset of a larger matrix, + and/or transform selected dimensions of a multidimensional array. + +* By popular demand, simultaneous linking to double precision (fftw), + single precision (fftwf), and long-double precision (fftwl) versions + of FFTW is now supported. + +* Cycle counters (on all modern CPUs) are exploited to speed planning. + +* Efficient transforms of real even/odd arrays, a.k.a. discrete + cosine/sine transforms (types I-IV). (Currently work via pre/post + processing of real transforms, ala FFTPACK, so are not optimal.) + +* DHTs (Discrete Hartley Transforms), again via post-processing + of real transforms (and thus suboptimal, for now). + +* Support for linking to just those parts of FFTW that you need, + greatly reducing the size of statically linked programs when + only a limited set of transform sizes/types are required. + +* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along + with a command-line tool (fftw-wisdom) to generate/update it. + +* Fortran API can be used with both g77 and non-g77 compilers + simultaneously. + +* Multi-threaded version has optional OpenMP support. + +* Authors' good looks have greatly improved with age. + +Changes from 3.0beta3: + +* Separate FMA distribution to better exploit fused multiply-add instructions + on PowerPC (and possibly other) architectures. + +* Performance improvements via some inlining tweaks. + +* fftw_flops now returns double arguments, not int, to avoid overflows + for large sizes. + +* Workarounds for automake bugs. + +Changes from 3.0beta2: + +* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in + FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so + we replaced it with a slower routine that is more accurate. + +* The guru planner and execute functions now have two variants, one that + takes complex arguments and one that takes separate real/imag pointers. + +* Execute and planner routines now automatically align the stack on x86, + in case the calling program is misaligned. + +* README file for test program. + +* Fixed bugs in the combination of SIMD with multi-threaded transforms. + +* Eliminated internal fftw_threads_init function, which some people were + calling accidentally instead of the fftw_init_threads API function. + +* Check for -openmp flag (Intel C compiler) when --enable-openmp is used. + +* Support AMD x86-64 SIMD and cycle counter. + +* Support SSE2 intrinsics in forthcoming gcc 3.3. + +Changes from 3.0beta1: + +* Faster in-place 1d transforms of non-power-of-two sizes. + +* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT + transforms. + +* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the + default distribution only includes hard-coded size-8 DCT-II/III, however. + +* Many minor improvements to the manual. Added section on using the + codelet generator to customize and enhance FFTW. + +* The default 'make check' should now only take a few minutes; for more + strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'. + +* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where + the latter uses stdout. + +* Fixed ability to compile with a C++ compiler. + +* Fixed support for C99 complex type under glibc. + +* Fixed problems with alloca under MinGW, AIX. + +* Workaround for gcc/SPARC bug. + +* Fixed multi-threaded initialization failure on IRIX due to lack of + user-accessible PTHREAD_SCOPE_SYSTEM there.