sv-dependency-builds: src/fftw-3.3.8/NEWS annotate

annotate src/fftw-3.3.8/NEWS @ 82:d0c2a83c1364

Add FFTW 3.3.8 source, and a Linux build

author	Chris Cannam
date	Tue, 19 Nov 2019 14:52:55 +0000

rev	line source
Chris@82	1 FFTW 3.3.8:
Chris@82	2
Chris@82	3 * Fixed AVX, AVX2 for gcc-8.
Chris@82	4
Chris@82	5 By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code
Chris@82	6 assumed that the compiler honors the distinction between +0 and -0,
Chris@82	7 but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math.
Chris@82	8 This release ensures that FFTW works with gcc-8 -ffast-math, and
Chris@82	9 removes -ffast-math from the default CFLAGS for good measure.
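
To illustrate why the +0/-0 distinction in the 3.3.8 note matters, here is a minimal Python sketch. It is not FFTW code, and the helper names are hypothetical. The two zeros compare equal but differ in the sign bit, and a common SIMD idiom negates a value by XOR-ing it with the bit pattern of -0. If a compiler treats -0 as +0, that XOR becomes a no-op. Whether this is exactly the idiom that failed in 3.3.7 is an assumption.

```python
import math
import struct

SIGN_MASK = 0x8000000000000000  # bit pattern of -0.0 as an IEEE 754 double


def bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    """Inverse of bits(): reinterpret a 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def xor_negate(x: float, mask: float = -0.0) -> float:
    """Flip the sign of x by XOR-ing it with the bit pattern of mask.

    With mask = -0.0 this negates x. With mask = +0.0 (what the mask
    effectively becomes if -0 is folded into +0) it leaves x unchanged.
    """
    return from_bits(bits(x) ^ bits(mask))


if __name__ == "__main__":
    print("0.0 == -0.0:", 0.0 == -0.0)
    print("sign of -0.0:", math.copysign(1.0, -0.0))
    print("XOR with -0.0:", xor_negate(1.5))
    print("XOR with +0.0:", xor_negate(1.5, mask=0.0))
```

The comparison shows why such a bug can go unnoticed: an equality test cannot tell the two masks apart, only the resulting arithmetic can.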
Chris@82	10
Chris@82	11 FFTW 3.3.7:
Chris@82	12
Chris@82	13 * Experimental support for CMake.
Chris@82	14
Chris@82	15 The primary build mechanism for FFTW remains GNU autoconf/automake.
Chris@82	16 CMake support is meant to offer an easy way to compile FFTW on
Chris@82	17 Windows, and as such it does not cover all the features of the
Chris@82	18 automake build system, such as exotic cycle counters,
Chris@82	19 cross-compiling, or build of binaries for a mixture of ISAs
Chris@82	20 (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.
Chris@82	21
Chris@82	22 * Fixes for armv7a cycle counter.
Chris@82	23 * Official support for aarch64, now that we have hardware to test it.
Chris@82	24 * Tweak usage of FMA instructions in a way that favors newer processors
Chris@82	25 (Skylake and Ryzen) over older processors (Haswell).
Chris@82	26 * tests/bench: use 64-bit precision to compute mflops.
Chris@82	27
Chris@82	28 FFTW 3.3.6-pl2:
Chris@82	29
Chris@82	30 * Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.
Chris@82	31
Chris@82	32 FFTW 3.3.6-pl1:
Chris@82	33
Chris@82	34 * Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
Chris@82	35 shared libraries of the form libfftw3.so.2.6.6 instead of
Chris@82	36 libfftw3.so.3.*.
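
The effect of a wrong libtool version number on the file name can be sketched as follows. On Linux, libtool maps a -version-info triple current:revision:age to the suffix (current - age).age.revision. The function names and the two example triples are assumptions, chosen only to reproduce the file-name shapes quoted above. They are not taken from the FFTW build scripts.

```python
def linux_version_suffix(current: int, revision: int, age: int) -> str:
    """Map a libtool current:revision:age triple to a Linux .so suffix."""
    if min(current, revision, age) < 0:
        raise ValueError("version components must be non-negative")
    if age > current:
        raise ValueError("age must not exceed current")
    return f"{current - age}.{age}.{revision}"


def shared_library_name(name: str, current: int, revision: int, age: int) -> str:
    """Build the full shared-library file name, e.g. libfftw3.so.X.Y.Z."""
    return f"lib{name}.so.{linux_version_suffix(current, revision, age)}"


if __name__ == "__main__":
    # Hypothetical triples: an age that is one too large lowers the major number.
    print(shared_library_name("fftw3", 8, 6, 6))
    print(shared_library_name("fftw3", 8, 6, 5))
```

Because the leading number is current - age, an off-by-one in age is enough to change the major version that programs link against.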
Chris@82	37
Chris@82	38 FFTW 3.3.6:
Chris@82	39
Chris@82	40 * The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
Chris@82	41 work, and this 3.3.6 fixes it. Sorry about that.
Chris@82	42 * compilation fixes for IBM XLC
Chris@82	43 * compilation fixes for threads on Windows
Chris@82	44 * fix SIMD autodetection on amd64 when (_MSC_VER > 1500)
Chris@82	45
Chris@82	46 FFTW 3.3.5:
Chris@82	47
Chris@82	48 * New SIMD support:
Chris@82	49 - Power8 VSX instructions in single and double precision.
Chris@82	50 To use, add --enable-vsx to configure.
Chris@82	51 - Support for AVX2 (256-bit FMA instructions).
Chris@82	52 To use, add --enable-avx2 to configure.
Chris@82	53 - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
Chris@82	54 This code is expected to work but the FFTW maintainers do not have
Chris@82	55 hardware to test it.
Chris@82	56 - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
Chris@82	57 - Double precision Neon SIMD for aarch64.
Chris@82	58 This code is expected to work but the FFTW maintainers do not have
Chris@82	59 hardware to test it.
Chris@82	60 - generic SIMD support using gcc vector intrinsics
Chris@82	61 * Add fftw_make_planner_thread_safe() API
Chris@82	62 * fix #18 (disable float128 for CUDACC)
Chris@82	63 * fix #19: missing Fortran interface for fftwq_alloc_real
Chris@82	64 * fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
Chris@82	65 * fix: Avoid segfaults due to double free in MPI transpose
Chris@82	66
Chris@82	67 * Special note for distribution maintainers: Although FFTW supports a
Chris@82	68 zillion SIMD instruction sets, enabling them all at the same time is
Chris@82	69 a bad idea, because it increases the planning time for minimal gain.
Chris@82	70 We recommend that general-purpose x86 distributions only enable SSE2
Chris@82	71 and perhaps AVX. Users who care about the last ounce of performance
Chris@82	72 should recompile FFTW themselves.
Chris@82	73
Chris@82	74 FFTW 3.3.4
Chris@82	75
Chris@82	76 * New functions fftw_alignment_of (to check whether two arrays are
Chris@82	77 equally aligned for the purposes of applying a plan) and fftw_sprint_plan
Chris@82	78 (to output a description of plan to a string).
Chris@82	79
Chris@82	80 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
Chris@82	81 bug report.
Chris@82	82
Chris@82	83 * Fixed manual to work with texinfo-5.
Chris@82	84
Chris@82	85 * Increased timing interval on x86_64 to reduce timing errors.
Chris@82	86
Chris@82	87 * Default to Win32 threads, not pthreads, if both are present.
Chris@82	88
Chris@82	89 * Various build-script fixes.
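
The idea behind fftw_alignment_of in the 3.3.4 notes above can be pictured with a small Python model. This is a sketch under stated assumptions, not the library's implementation: addresses are plain integers, the 16-byte boundary is an assumed value, and both function names are hypothetical. The model treats two arrays as equally aligned when their addresses leave the same remainder modulo that boundary.

```python
ASSUMED_ALIGNMENT = 16  # assumed SIMD boundary in bytes; the real value is build-dependent


def alignment_of(address: int, boundary: int = ASSUMED_ALIGNMENT) -> int:
    """Return the offset of an address from the previous aligned boundary."""
    if address < 0:
        raise ValueError("address must be non-negative")
    return address % boundary


def equally_aligned(addr_a: int, addr_b: int,
                    boundary: int = ASSUMED_ALIGNMENT) -> bool:
    """True if both addresses share the same offset from an aligned boundary."""
    return alignment_of(addr_a, boundary) == alignment_of(addr_b, boundary)


if __name__ == "__main__":
    planned = 0x1000   # address the plan was created with (hypothetical)
    shifted = 0x1008   # same buffer, offset by one double
    other = 0x2000     # a different, aligned buffer
    print(equally_aligned(planned, other))
    print(equally_aligned(planned, shifted))
```

In this model a plan created for one array could be reused on another only when equally_aligned holds for the corresponding pointers.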
Chris@82	90
Chris@82	91 FFTW 3.3.3
Chris@82	92
Chris@82	93 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
Chris@82	94 bug report and patch, and to Graham Dennis for the bug report).
Chris@82	95
Chris@82	96 * Use 128-bit ARM NEON instructions instead of 64-bits. This change
Chris@82	97 appears to speed up even ARM processors with a 64-bit NEON pipe.
Chris@82	98
Chris@82	99 * Speed improvements for single-precision AVX.
Chris@82	100
Chris@82	101 * Speed up planner on machines without "official" cycle counters, such as ARM.
Chris@82	102
Chris@82	103 FFTW 3.3.2
Chris@82	104
Chris@82	105 * Removed an archaic stack-alignment hack that was failing with
Chris@82	106 gcc-4.7/i386.
Chris@82	107
Chris@82	108 * Added stack-alignment hack necessary for gcc on Windows/i386. We
Chris@82	109 will regret this in ten years (see previous change).
Chris@82	110
Chris@82	111 * Fix incompatibility with Intel icc which pretends to be gcc
Chris@82	112 but does not support quad precision.
Chris@82	113
Chris@82	114 * make libfftw{threads,mpi} depend upon libfftw when using libtool;
Chris@82	115 this is consistent with most other libraries and simplifies the life
Chris@82	116 of various distributors of GNU/Linux.
Chris@82	117
Chris@82	118 FFTW 3.3.1
Chris@82	119
Chris@82	120 * Changes since 3.3.1-beta1:
Chris@82	121
Chris@82	122 - Reduced planning time in estimate mode for sizes with large
Chris@82	123 prime factors.
Chris@82	124
Chris@82	125 - Added AVX autodetection under Visual Studio. Thanks Carsten
Chris@82	126 Steger for submitting the necessary code.
Chris@82	127
Chris@82	128 - Modern Fortran interface now uses a separate fftw3l.f03 interface
Chris@82	129 file for the long double interface, which is not supported by
Chris@82	130 some Fortran compilers. Provided new fftw3q.f03 interface file
Chris@82	131 to access the quadruple-precision FFTW routines with recent
Chris@82	132 versions of gcc/gfortran.
Chris@82	133
Chris@82	134 * Added support for the NEON extensions to the ARM ISA. (Note to beta
Chris@82	135 users: an ARM cycle counter is not yet implemented; please contact
Chris@82	136 fftw@fftw.org if you know how to do it right.)
Chris@82	137
Chris@82	138 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
Chris@82	139 Kyle Spyksma for the bug report.
Chris@82	140
Chris@82	141 FFTW 3.3
Chris@82	142
Chris@82	143 * Changes since 3.3-beta1:
Chris@82	144
Chris@82	145 - Compiling OpenMP support (--enable-openmp) now installs a
Chris@82	146 fftw3_omp library, instead of fftw3_threads, so that OpenMP
Chris@82	147 and POSIX threads (--enable-threads) libraries can be built
Chris@82	148 and installed at the same time.
Chris@82	149
Chris@82	150 - Various minor compilation fixes, corrections of manual typos, and
Chris@82	151 improvements to the benchmark test program.
Chris@82	152
Chris@82	153 * Add support for the AVX extensions to x86 and x86-64. The AVX code
Chris@82	154 works with 16-byte alignment (as opposed to 32-byte alignment),
Chris@82	155 so there is no ABI change compared to FFTW 3.2.2.
Chris@82	156
Chris@82	157 * Added Fortran 2003 interface, which should be usable on most modern
Chris@82	158 Fortran compilers (e.g. gfortran) and provides type-checked access
Chris@82	159 to the C FFTW interface. (The legacy Fortran-77 interface is
Chris@82	160 still included also.)
Chris@82	161
Chris@82	162 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
Chris@82	163 the major changes in the MPI transforms are:
Chris@82	164 - Fixed some deadlock and crashing bugs.
Chris@82	165 - Added Fortran 2003 interface.
Chris@82	166 - Added new-array execute functions for MPI plans.
Chris@82	167 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
Chris@82	168 thanks to Jonathan Bentz for the bug report.
Chris@82	169 - Expanded documentation.
Chris@82	170 - 'make check' now runs MPI tests
Chris@82	171 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
Chris@82	172
Chris@82	173 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
Chris@82	174 x86-64, and Itanium). The new routines use the fftwq_ prefix.
Chris@82	175
Chris@82	176 * Removed support for MIPS paired-single instructions due to lack of
Chris@82	177 available hardware for testing. Users who want this functionality
Chris@82	178 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
Chris@82	179 on MIPS; this only concerns special instructions available on some
Chris@82	180 MIPS chips.)
Chris@82	181
Chris@82	182 * Removed support for the Cell Broadband Engine. Cell users should
Chris@82	183 use FFTW 3.2.x.
Chris@82	184
Chris@82	185 * New convenience functions fftw_alloc_real and fftw_alloc_complex
Chris@82	186 to use fftw_malloc for real and complex arrays without typecasts
Chris@82	187 or sizeof.
Chris@82	188
Chris@82	189 * New convenience functions fftw_export_wisdom_to_filename and
Chris@82	190 fftw_import_wisdom_from_filename that export/import wisdom
Chris@82	191 to a file, which don't require you to open/close the file yourself.
Chris@82	192
Chris@82	193 * New function fftw_cost to return FFTW's internal cost metric for
Chris@82	194 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
Chris@82	195 suggestion.
Chris@82	196
Chris@82	197 * The --enable-sse2 configure flag now works in both double and single
Chris@82	198 precision (and is equivalent to --enable-sse in the latter case).
Chris@82	199
Chris@82	200 * Remove --enable-portable-binary flag: we now produce portable binaries
Chris@82	201 by default.
Chris@82	202
Chris@82	203 * Remove the automatic detection of native architecture flag for gcc
Chris@82	204 which was introduced in fftw-3.1, since new gcc supports -mtune=native.
Chris@82	205 Remove the --with-gcc-arch flag; if you want to specify a particular
Chris@82	206 arch to configure, use ./configure CC="gcc -mtune=...".
Chris@82	207
Chris@82	208 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
Chris@82	209
Chris@82	210 * Fixed build failure when srand48 declaration is missing;
Chris@82	211 thanks to Ralf Wildenhues for the bug report.
Chris@82	212
Chris@82	213 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
Chris@82	214 is equivalent to no timelimit in all cases. Thanks to William Andrew
Chris@82	215 Burnson for the bug report.
Chris@82	216
Chris@82	217 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
Chris@82	218 too large a buffer.
Chris@82	219
Chris@82	220 FFTW 3.2.2
Chris@82	221
Chris@82	222 * Improve performance of some copy operations of complex arrays on
Chris@82	223 x86 machines.
Chris@82	224
Chris@82	225 * Add configure flag to disable alloca(), which is broken in mingw64.
Chris@82	226
Chris@82	227 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
Chris@82	228 between fftw-3.1.3 and 3.2. This regression has now been fixed.
Chris@82	229
Chris@82	230 FFTW 3.2.1
Chris@82	231
Chris@82	232 * Performance improvements for some multidimensional r2c/c2r transforms;
Chris@82	233 thanks to Eugene Miloslavsky for his benchmark reports.
Chris@82	234
Chris@82	235 * Compile with icc on MacOS X, use better icc compiler flags.
Chris@82	236
Chris@82	237 * Compilation fixes for systems where snprintf is defined as a macro;
Chris@82	238 thanks to Marcus Mae for the bug report.
Chris@82	239
Chris@82	240 * Fortran documentation now recommends not using dfftw_execute,
Chris@82	241 because of reports of problems with various Fortran compilers;
Chris@82	242 it is better to use dfftw_execute_dft etcetera.
Chris@82	243
Chris@82	244 * Some documentation clarifications, e.g. of the fact that --enable-openmp
Chris@82	245 and --enable-threads are mutually exclusive (thanks to Long To),
Chris@82	246 and document slightly odd behavior of plan_guru_r2r in Fortran
Chris@82	247 (thanks to Alexander Pozdneev).
Chris@82	248
Chris@82	249 * FAQ was accidentally omitted from 3.2 tarball.
Chris@82	250
Chris@82	251 * Remove some extraneous (harmless) files accidentally included in
Chris@82	252 a subdirectory of the 3.2 tarball.
Chris@82	253
Chris@82	254 FFTW 3.2
Chris@82	255
Chris@82	256 * Worked around apparent glibc bug that leads to rare hangs when freeing
Chris@82	257 semaphores.
Chris@82	258
Chris@82	259 * Fixed segfault due to unaligned access in certain obscure problems
Chris@82	260 that use SSE and multiple threads.
Chris@82	261
Chris@82	262 * MPI transforms not included, as they are still in alpha; the alpha
Chris@82	263 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
Chris@82	264
Chris@82	265 FFTW 3.2alpha3
Chris@82	266
Chris@82	267 * Performance improvements for sizes with factors of 5 and 10.
Chris@82	268
Chris@82	269 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Chris@82	270 Emmenlauer and Phil Dumont.
Chris@82	271
Chris@82	272 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
Chris@82	273
Chris@82	274 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
Chris@82	275 for the suggestions.
Chris@82	276
Chris@82	277 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
Chris@82	278 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
Chris@82	279
Chris@82	280 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
Chris@82	281 from working in single precision (thanks to Eric A. Borisch for the report).
Chris@82	282
Chris@82	283 * Added 'make check' for MPI code (which still fails in a couple corner
Chris@82	284 cases, but should be much better than in alpha2).
Chris@82	285
Chris@82	286 * Many other small fixes.
Chris@82	287
Chris@82	288 FFTW 3.2alpha2
Chris@82	289
Chris@82	290 * Support for the Cell processor, donated by IBM Research; see README.Cell
Chris@82	291 and the Cell section of the manual.
Chris@82	292
Chris@82	293 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
Chris@82	294 function with the same semantics, but which takes fftw_iodim64 instead of
Chris@82	295 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
Chris@82	296 ptrdiff_t integer types as parameters, which is a 64-bit type on
Chris@82	297 64-bit machines. This is only useful for specifying very large transforms
Chris@82	298 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
Chris@82	299 regardless of what API you choose.)
Chris@82	300
Chris@82	301 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
Chris@82	302 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
Chris@82	303 distributed transpose operations, with 1d block distributions.
Chris@82	304 (This is an alpha preview: routines have not been exhaustively
Chris@82	305 tested, documentation is incomplete, and some functionality is
Chris@82	306 missing, e.g. Fortran support.) See mpi/README and also the MPI
Chris@82	307 section of the manual.
Chris@82	308
Chris@82	309 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
Chris@82	310
Chris@82	311 * Rewritten multi-threaded support for better performance by
Chris@82	312 re-using a fixed pool of threads rather than continually
Chris@82	313 respawning and joining (which nowadays is much slower).
Chris@82	314
Chris@82	315 * Support for MIPS paired-single SIMD instructions, donated by
Chris@82	316 Codesourcery.
Chris@82	317
Chris@82	318 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
Chris@82	319 available and return NULL otherwise.
Chris@82	320
Chris@82	321 * Removed k7 support, which only worked in 32-bit mode and is
Chris@82	322 becoming obsolete. Use --enable-sse instead.
Chris@82	323
Chris@82	324 * Added --with-g77-wrappers configure option to force inclusion
Chris@82	325 of g77 wrappers, in addition to whatever is needed for the
Chris@82	326 detected Fortran compilers. This is mainly intended for GNU/Linux
Chris@82	327 distros switching to gfortran that wish to include both
Chris@82	328 gfortran and g77 support in FFTW.
Chris@82	329
Chris@82	330 * In manual, renamed "guru execute" functions to "new-array execute"
Chris@82	331 functions, to reduce confusion with the guru planner interface.
Chris@82	332 (The programming interface is unchanged.)
Chris@82	333
Chris@82	334 * Add missing __declspec attribute to threads API functions when compiling
Chris@82	335 for Windows; thanks to Robert O. Morris for the bug report.
Chris@82	336
Chris@82	337 * Fixed missing return value from dfftw_init_threads in Fortran;
Chris@82	338 thanks to Markus Wetzstein for the bug report.
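
The motivation for the 64-bit guru API in the 3.2alpha2 notes above can be sketched in Python. The IoDim class below is a hypothetical mirror of the n/is/os fields of fftw_iodim, and needs_guru64 is an illustrative helper, not part of FFTW. The sketch assumes a 32-bit C int, which holds on common platforms but is not guaranteed.

```python
from dataclasses import dataclass
from typing import Iterable


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement integer of that width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


@dataclass
class IoDim:
    """Hypothetical mirror of one guru dimension: size and in/out strides."""
    n: int
    is_: int  # input stride ("is" is a Python keyword, hence the underscore)
    os: int   # output stride


def needs_guru64(dims: Iterable[IoDim], int_bits: int = 32) -> bool:
    """True if any size or stride overflows an int of int_bits (assumed 32)."""
    return any(
        not fits_signed(value, int_bits)
        for d in dims
        for value in (d.n, d.is_, d.os)
    )


if __name__ == "__main__":
    small = [IoDim(n=1024, is_=1, os=1)]
    huge = [IoDim(n=2**31, is_=1, os=1)]  # one element more than INT_MAX
    print(needs_guru64(small), needs_guru64(huge))
```

Strides matter as much as sizes here: a modest dimension with a very large stride would overflow a 32-bit field just the same.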
Chris@82	339
Chris@82	340 FFTW 3.1.3
Chris@82	341
Chris@82	342 * Bug fix: FFTW computes incorrect results when the user plans both
Chris@82	343 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
Chris@82	344 by incorrect sharing of twiddle-factor tables between the two
Chris@82	345 transforms, and only occurs when both are used. Thanks to Paul
Chris@82	346 A. Valiant for the bug report.
Chris@82	347
Chris@82	348 FFTW 3.1.2
Chris@82	349
Chris@82	350 * Correct bug in configure script: --enable-portable-binary option was ignored!
Chris@82	351 Thanks to Andrew Salamon for the bug report.
Chris@82	352
Chris@82	353 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
Chris@82	354 either if we are using gcc. Thanks to Guy Moebs for the bug report.
Chris@82	355
Chris@82	356 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
Chris@82	357 and suggest a workaround. configure script now detects Core/Duo arch.
Chris@82	358
Chris@82	359 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
Chris@82	360 thanks to Markus Dittrich.
Chris@82	361
Chris@82	362 FFTW 3.1.1
Chris@82	363
Chris@82	364 * Performance improvements for Intel EM64T.
Chris@82	365
Chris@82	366 * Performance improvements for large-size transforms with SIMD.
Chris@82	367
Chris@82	368 * Cycle counter support for Intel icc and Visual C++ on x86-64.
Chris@82	369
Chris@82	370 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
Chris@82	371
Chris@82	372 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
Chris@82	373
Chris@82	374 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
Chris@82	375
Chris@82	376 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
Chris@82	377 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
Chris@82	378
Chris@82	379 FFTW 3.1
Chris@82	380
Chris@82	381 * Faster FFTW_ESTIMATE planner.
Chris@82	382
Chris@82	383 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
Chris@82	384
Chris@82	385 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
Chris@82	386
Chris@82	387 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
Chris@82	388
Chris@82	389 * Faster in-place non-square transpositions (FFTW uses these internally
Chris@82	390 for in-place FFTs, and you can also perform them explicitly using
Chris@82	391 the guru interface).
Chris@82	392
Chris@82	393 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
Chris@82	394 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
Chris@82	395
Chris@82	396 * SIMD support for split complex arrays.
Chris@82	397
Chris@82	398 * Much faster Altivec/VMX performance.
Chris@82	399
Chris@82	400 * New fftw_set_timelimit function to specify a (rough) upper bound to the
Chris@82	401 planning time (does not affect ESTIMATE mode).
Chris@82	402
Chris@82	403 * Removed --enable-3dnow support; use --enable-k7 instead.
Chris@82	404
Chris@82	405 * FMA (fused multiply-add) version is now included in "standard" FFTW,
Chris@82	406 and is enabled with --enable-fma (the default on PowerPC and Itanium).
Chris@82	407
Chris@82	408 * Automatic detection of native architecture flag for gcc. New
Chris@82	409 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
Chris@82	410 for people distributing compiled binaries of FFTW (see manual).
Chris@82	411
Chris@82	412 * Automatic detection of Altivec under Linux with gcc 3.4 (so that
Chris@82	413 same binary should work on both Altivec and non-Altivec PowerPCs).
Chris@82	414
Chris@82	415 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Chris@82	416 Solaris/Intel.
Chris@82	417
Chris@82	418 * Various documentation clarifications.
Chris@82	419
Chris@82	420 * 64-bit clean. (Fixes a bug affecting the split guru planner on
Chris@82	421 64-bit machines, reported by David Necas.)
Chris@82	422
Chris@82	423 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
Chris@82	424 non-SSE machines (causing a crash) for --enable-sse binaries.
Chris@82	425
Chris@82	426 * Fixed bug that caused HC2R transforms to destroy the input in
Chris@82	427 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
Chris@82	428
Chris@82	429 * Fixed bug where wisdom would be lost under rare circumstances,
Chris@82	430 causing excessive planning time.
Chris@82	431
Chris@82	432 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
Chris@82	433
Chris@82	434 * Fixed accidentally exported symbol that prohibited simultaneous
Chris@82	435 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
Chris@82	436
Chris@82	437 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
Chris@82	438
Chris@82	439 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
Chris@82	440
Chris@82	441 * Fix build failure if no Fortran compiler is found (thanks to Charles
Chris@82	442 Radley for the bug report).
Chris@82	443
Chris@82	444 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
Chris@82	445 detection of icc architecture flag (e.g. -xW).
Chris@82	446
Chris@82	447 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
Chris@82	448
Chris@82	449 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
Chris@82	450
Chris@82	451 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
Chris@82	452 but its malloc is 16-byte aligned).
Chris@82	453
Chris@82	454 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
Chris@82	455 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
Chris@82	456 reports/fixes). Added x86-64 cycle counter for PGI compilers,
Chris@82	457 courtesy Cristiano Calonaci.
Chris@82	458
Chris@82	459 * Fix compilation problem in test program due to C99 conflict.
Chris@82	460
Chris@82	461 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
Chris@82	462 Manuel Guerrero).
Chris@82	463
Chris@82	464 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
Chris@82	465
Chris@82	466 * Work around Visual C++ (version 6/7) bug in SSE compilation;
Chris@82	467 thanks to Eddie Yee for his detailed report.
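
The 3.1 notes above mention Bluestein's algorithm for prime-size DFTs. The following Python sketch shows the textbook formulation, which rewrites a length-n DFT as a convolution and evaluates it with power-of-two FFTs. It is an illustration only and makes no claim about how FFTW implements the algorithm. All function names are hypothetical.

```python
import cmath


def fft_pow2(x, sign=-1):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    sign=-1 gives the forward transform, sign=+1 the unnormalized inverse.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_pow2(x[0::2], sign)
    odd = fft_pow2(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bluestein_dft(x):
    """Forward DFT of any length n, using jk = (j^2 + k^2 - (k-j)^2) / 2."""
    n = len(x)
    # Chirp w[k] = exp(-i*pi*k^2/n); k*k is reduced mod 2n to limit rounding error.
    w = [cmath.exp(-1j * cmath.pi * ((k * k) % (2 * n)) / n) for k in range(n)]
    m = 1
    while m < 2 * n - 1:  # padded length that avoids wrap-around
        m *= 2
    a = [x[k] * w[k] for k in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = w[0].conjugate()
    for k in range(1, n):
        b[k] = b[m - k] = w[k].conjugate()  # symmetric kernel, wrapped circularly
    fa, fb = fft_pow2(a), fft_pow2(b)
    conv = fft_pow2([p * q for p, q in zip(fa, fb)], sign=+1)
    return [w[k] * conv[k] / m for k in range(n)]


def naive_dft(x):
    """O(n^2) reference DFT used to check the result."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


if __name__ == "__main__":
    data = [complex(j, -j % 3) for j in range(7)]  # prime length 7
    err = max(abs(p - q) for p, q in zip(bluestein_dft(data), naive_dft(data)))
    print("max abs difference:", err)
```

The cost is dominated by three power-of-two FFTs of length m, which is why the approach stays O(n log n) even when n is prime.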
Chris@82	468
Chris@82	469 Changes from FFTW 3.1 beta 2:
Chris@82	470
Chris@82	471 * Several minor compilation fixes.
Chris@82	472
Chris@82	473 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
Chris@82	474 fftw_set_timelimit function. Make wisdom work with time-limited plans.
Chris@82	475
Chris@82	476 Changes from FFTW 3.1 beta 1:
Chris@82	477
Chris@82	478 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
Chris@82	479
Chris@82	480 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
Chris@82	481
Chris@82	482 * Further speed improvements for Altivec/VMX.
Chris@82	483
Chris@82	484 * Further speed improvements for non-square transpositions.
Chris@82	485
Chris@82	486 * Many minor tweaks.
Chris@82	487
Chris@82	488 FFTW 3.0.1
Chris@82	489
Chris@82	490 * Some speed improvements in SIMD code.
Chris@82	491
Chris@82	492 * --without-cycle-counter option is removed. If no cycle counter is found,
Chris@82	493 then the estimator is always used. A --with-slow-timer option is provided
Chris@82	494 to force the use of lower-resolution timers.
Chris@82	495
Chris@82	496 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
Chris@82	497
Chris@82	498 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
Chris@82	499
Chris@82	500 * Added S390 cycle counter, courtesy of James Treacy.
Chris@82	501
Chris@82	502 * Added missing static keyword that prevented simultaneous linkage
Chris@82	503 of different-precision versions; thanks to Rasmus Larsen for the bug report.
Chris@82	504
Chris@82	505 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
Chris@82	506
Chris@82	507 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
Chris@82	508
Chris@82	509 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
Chris@82	510 preprocessor limits; thanks to Peter Vouras for the bug report.
Chris@82	511
Chris@82	512 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
Chris@82	513 thanks to Nicolas Decoster for the patch.
Chris@82	514
Chris@82	515 * Added 'make smallcheck' target in tests/ directory, at the request of
Chris@82	516 James Treacy.
Chris@82	517
Chris@82	518 FFTW 3.0
Chris@82	519
Chris@82	520 Major goals of this release:
Chris@82	521
Chris@82	522 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
Chris@82	523
Chris@82	524 * Complete rewrite, to make it easier to add new algorithms and transforms.
Chris@82	525
Chris@82	526 * New API, to support more general semantics.
Chris@82	527
Chris@82	528 Other enhancements:
Chris@82	529
Chris@82	530 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
Chris@82	531 (With special thanks to Franz Franchetti for many experimental prototypes
Chris@82	532 and to Stefan Kral for the vectorizing generator from fftwgel.)
Chris@82	533
Chris@82	534 * True in-place 1d transforms of large sizes (as well as compressed
Chris@82	535 twiddle tables for additional memory/cache savings).
Chris@82	536
Chris@82	537 * More arbitrary placement of real & imaginary data, e.g. including
Chris@82	538 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
Chris@82	539
Chris@82	540 * Efficient prime-size transforms of real data.
Chris@82	541
Chris@82	542 * Multidimensional transforms can operate on a subset of a larger matrix,
Chris@82	543 and/or transform selected dimensions of a multidimensional array.
Chris@82	544
Chris@82	545 * By popular demand, simultaneous linking to double precision (fftw),
Chris@82	546 single precision (fftwf), and long-double precision (fftwl) versions
Chris@82	547 of FFTW is now supported.
Chris@82	548
Chris@82	549 * Cycle counters (on all modern CPUs) are exploited to speed planning.
Chris@82	550
Chris@82	551 * Efficient transforms of real even/odd arrays, a.k.a. discrete
Chris@82	552 cosine/sine transforms (types I-IV). (Currently work via pre/post
Chris@82	553 processing of real transforms, ala FFTPACK, so are not optimal.)
Chris@82	554
Chris@82	555 * DHTs (Discrete Hartley Transforms), again via post-processing
Chris@82	556 of real transforms (and thus suboptimal, for now).
Chris@82	557
Chris@82	558 * Support for linking to just those parts of FFTW that you need,
Chris@82	559 greatly reducing the size of statically linked programs when
Chris@82	560 only a limited set of transform sizes/types are required.
Chris@82	561
Chris@82	562 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
Chris@82	563 with a command-line tool (fftw-wisdom) to generate/update it.
Chris@82	564
Chris@82	565 * Fortran API can be used with both g77 and non-g77 compilers
Chris@82	566 simultaneously.
Chris@82	567
Chris@82	568 * Multi-threaded version has optional OpenMP support.
Chris@82	569
Chris@82	570 * Authors' good looks have greatly improved with age.
Chris@82	571
Chris@82	572 Changes from 3.0beta3:
Chris@82	573
Chris@82	574 * Separate FMA distribution to better exploit fused multiply-add instructions
Chris@82	575 on PowerPC (and possibly other) architectures.
Chris@82	576
Chris@82	577 * Performance improvements via some inlining tweaks.
Chris@82	578
Chris@82	579 * fftw_flops now returns double arguments, not int, to avoid overflows
Chris@82	580 for large sizes.
Chris@82	581
Chris@82	582 * Workarounds for automake bugs.
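
The overflow that motivated the fftw_flops change in the list above can be estimated with a short Python sketch. It uses the conventional nominal count of 5 N log2 N operations for a power-of-two complex transform, which is an assumption for illustration and not the value fftw_flops reports. It also assumes a 32-bit C int. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value of a 32-bit signed C int (assumed width)


def nominal_flops(log2_n: int) -> int:
    """Nominal 5 * N * log2(N) operation count for N = 2**log2_n."""
    if log2_n < 0:
        raise ValueError("log2_n must be non-negative")
    return 5 * (1 << log2_n) * log2_n


def first_overflowing_exponent(limit: int = INT32_MAX) -> int:
    """Smallest exponent e such that the nominal count for 2**e exceeds limit."""
    e = 0
    while nominal_flops(e) <= limit:
        e += 1
    return e


if __name__ == "__main__":
    e = first_overflowing_exponent()
    count = nominal_flops(e)
    print("first overflowing size: 2 **", e)
    print("count:", count, "fits in int32:", count <= INT32_MAX)
    # A double represents integers exactly up to 2**53, so the count survives.
    print("exact as double:", float(count) == count)
```

Under these assumptions the count stops fitting in an int well within the range of transform sizes that a 64-bit machine can hold in memory.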
Chris@82	583
Chris@82	584 Changes from 3.0beta2:
Chris@82	585
Chris@82	586 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
Chris@82	587 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
Chris@82	588 we replaced it with a slower routine that is more accurate.
Chris@82	589
Chris@82	590 * The guru planner and execute functions now have two variants, one that
Chris@82	591 takes complex arguments and one that takes separate real/imag pointers.
Chris@82	592
Chris@82	593 * Execute and planner routines now automatically align the stack on x86,
Chris@82	594 in case the calling program is misaligned.
Chris@82	595
Chris@82	596 * README file for test program.
Chris@82	597
Chris@82	598 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
Chris@82	599
Chris@82	600 * Eliminated internal fftw_threads_init function, which some people were
Chris@82	601 calling accidentally instead of the fftw_init_threads API function.
Chris@82	602
Chris@82	603 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
Chris@82	604
Chris@82	605 * Support AMD x86-64 SIMD and cycle counter.
Chris@82	606
Chris@82	607 * Support SSE2 intrinsics in forthcoming gcc 3.3.
Chris@82	608
Chris@82	609 Changes from 3.0beta1:
Chris@82	610
Chris@82	611 * Faster in-place 1d transforms of non-power-of-two sizes.
Chris@82	612
Chris@82	613 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
Chris@82	614 transforms.
Chris@82	615
Chris@82	616 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
Chris@82	617 default distribution only includes hard-coded size-8 DCT-II/III, however.
Chris@82	618
Chris@82	619 * Many minor improvements to the manual. Added section on using the
Chris@82	620 codelet generator to customize and enhance FFTW.
Chris@82	621
Chris@82	622 * The default 'make check' should now only take a few minutes; for more
Chris@82	623 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
Chris@82	624
Chris@82	625 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
Chris@82	626 the latter uses stdout.
Chris@82	627
Chris@82	628 * Fixed ability to compile with a C++ compiler.
Chris@82	629
Chris@82	630 * Fixed support for C99 complex type under glibc.
Chris@82	631
Chris@82	632 * Fixed problems with alloca under MinGW, AIX.
Chris@82	633
Chris@82	634 * Workaround for gcc/SPARC bug.
Chris@82	635
Chris@82	636 * Fixed multi-threaded initialization failure on IRIX due to lack of
Chris@82	637 user-accessible PTHREAD_SCOPE_SYSTEM there.
