FFTW 3.3.8:

* Fixed AVX, AVX2 for gcc-8.

  By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code
  assumed that the compiler honors the distinction between +0 and -0,
  but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math.
  This release ensures that FFTW works with gcc-8 -ffast-math, and
  removes -ffast-math from the default CFLAGS for good measure.

FFTW 3.3.7:

* Experimental support for CMake.

  The primary build mechanism for FFTW remains GNU autoconf/automake.
  CMake support is meant to offer an easy way to compile FFTW on
  Windows, and as such it does not cover all the features of the
  automake build system, such as exotic cycle counters,
  cross-compiling, or build of binaries for a mixture of ISAs
  (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.

* Fixes for armv7a cycle counter.
* Official support for aarch64, now that we have hardware to test it.
* Tweak usage of FMA instructions in a way that favors newer processors
  (Skylake and Ryzen) over older processors (Haswell).
* tests/bench: use 64-bit precision to compute mflops.

FFTW 3.3.6-pl2:

* Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.

FFTW 3.3.6-pl1:

* Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
  shared libraries of the form libfftw3.so.2.6.6 instead of
  libfftw3.so.3.*.

FFTW 3.3.6:

* The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
  work, and this release fixes it. Sorry about that.
* compilation fixes for IBM XLC
* compilation fixes for threads on Windows
* fix SIMD autodetection on amd64 when (_MSC_VER > 1500)

FFTW 3.3.5:

* New SIMD support:
  - Power8 VSX instructions in single and double precision.
    To use, add --enable-vsx to configure.
  - Support for AVX2 (256-bit FMA instructions).
    To use, add --enable-avx2 to configure.
  - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
  - Double precision Neon SIMD for aarch64.
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - generic SIMD support using gcc vector intrinsics
* Add fftw_make_planner_thread_safe() API
* fix #18 (disable float128 for CUDACC)
* fix #19: missing Fortran interface for fftwq_alloc_real
* fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
* fix: Avoid segfaults due to double free in MPI transpose

* Special note for distribution maintainers: Although FFTW supports a
  zillion SIMD instruction sets, enabling them all at the same time is
  a bad idea, because it increases the planning time for minimal gain.
  We recommend that general-purpose x86 distributions only enable SSE2
  and perhaps AVX. Users who care about the last ounce of performance
  should recompile FFTW themselves.

FFTW 3.3.4

* New functions fftw_alignment_of (to check whether two arrays are
  equally aligned for the purposes of applying a plan) and fftw_sprint_plan
  (to output a description of plan to a string).

* Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
  bug report.

* Fixed manual to work with texinfo-5.

* Increased timing interval on x86_64 to reduce timing errors.

* Default to Win32 threads, not pthreads, if both are present.

* Various build-script fixes.

FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc which pretends to be gcc
  but does not support quad precision.

* make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.
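
Illustration for the fftw_alignment_of entry under FFTW 3.3.4 above: the
sketch below shows one way to read "equally aligned", namely that two
pointers leave the same remainder modulo the SIMD alignment. It is not
FFTW's implementation. The names sketch_alignment_of and
sketch_equally_aligned are hypothetical, and the 16-byte alignment is an
assumption (the real value depends on how the library was configured).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SIMD alignment in bytes; 16 is chosen for illustration only. */
#define SKETCH_ALIGNMENT 16

/* Hypothetical stand-in for fftw_alignment_of: the address of p
   modulo the assumed alignment. */
static int sketch_alignment_of(const double *p)
{
    assert(p != 0);  /* a null array has no meaningful alignment */
    return (int)((uintptr_t)p % SKETCH_ALIGNMENT);
}

/* Two arrays count as "equally aligned" when their remainders match. */
static int sketch_equally_aligned(const double *a, const double *b)
{
    return sketch_alignment_of(a) == sketch_alignment_of(b);
}
```

With 8-byte doubles, buf and buf + 2 are 16 bytes apart and therefore
equally aligned under this definition, while buf and buf + 1 are not.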

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g.
  gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  still included also.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests
  - Some ABI changes - not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to a file, which don't require you to open/close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc
  which was introduced in fftw-3.1, since new gcc supports -mtune=native.
  Remove the --with-gcc-arch flag; if you want to specify a particular
  arch to configure, use ./configure CC="gcc -mtune=...".

* --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.

FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and document slightly odd behavior of plan_guru_r2r in Fortran
  (thanks to Alexander Pozdneev).

* FAQ was accidentally omitted from 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
  ptrdiff_t integer types as parameters, which is a 64-bit type on
  64-bit machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.

FFTW 3.1.3

* Bug fix: FFTW computes incorrect results when the user plans both
  REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
  by incorrect sharing of twiddle-factor tables between the two
  transforms, and only occurs when both are used. Thanks to Paul A.
  Valiant for the bug report.

FFTW 3.1.2

* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc. Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround. configure script now detects Core/Duo arch.

* Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (Currently work via pre/post
  processing of real transforms, ala FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns double arguments, not int, to avoid overflows
  for large sizes.

* Workarounds for automake bugs.

Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.
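
Illustration for the 3.0beta3 note above that fftw_flops reports its
counts as double rather than int: the sketch below shows why an int is too
small for large sizes. It does not call FFTW. The helper names are
hypothetical, and the 5 n log2(n) figure is only the conventional
rule-of-thumb operation count for a size-n FFT, assumed here for
illustration; it is not the exact count that fftw_flops reports.

```c
#include <assert.h>
#include <limits.h>

/* Rule-of-thumb operation count 5 * n * floor(log2(n)), accumulated in
   double so that it cannot wrap around for large n. */
static double sketch_flop_estimate(unsigned long long n)
{
    double log2n = 0.0;
    unsigned long long m;

    assert(n >= 1);
    for (m = n; m > 1; m >>= 1)
        log2n += 1.0;            /* counts the halvings: floor(log2 n) */
    return 5.0 * (double)n * log2n;
}

/* Nonzero when the estimate could be stored in an int without overflow. */
static int sketch_fits_in_int(double flops)
{
    return flops <= (double)INT_MAX;
}
```

For n = 1024 the estimate is 51200 and fits in an int. For n = 2^28 it is
about 3.8e10, which exceeds a 32-bit INT_MAX, so an int counter would
overflow.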