sv-dependency-builds: src/fftw-3.3.5/NEWS annotate

annotate src/fftw-3.3.5/NEWS @ 73:02caadb7509e

Rebuild with --disable-stack-protector for mingw32

author	Chris Cannam
date	Fri, 25 Jan 2019 14:31:07 +0000
parents	2cd0e3b3e1fd
children

rev	line source
Chris@42	1 FFTW 3.3.5
Chris@42	2
Chris@42	3 * New SIMD support:
Chris@42	4 - Power8 VSX instructions in single and double precision.
Chris@42	5 To use, add --enable-vsx to configure.
Chris@42	6 - Support for AVX2 (256-bit FMA instructions).
Chris@42	7 To use, add --enable-avx2 to configure.
Chris@42	8 - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
Chris@42	9 This code is expected to work but the FFTW maintainers do not have
Chris@42	10 hardware to test it.
Chris@42	11 - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
Chris@42	12 - Double precision Neon SIMD for aarch64.
Chris@42	13 This code is expected to work but the FFTW maintainers do not have
Chris@42	14 hardware to test it.
Chris@42	15 - Generic SIMD support using gcc vector intrinsics.
Chris@42	16 * Add fftw_make_planner_thread_safe() API
Chris@42	17 * fix #18 (disable float128 for CUDACC)
Chris@42	18 * fix #19: missing Fortran interface for fftwq_alloc_real
Chris@42	19 * fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
Chris@42	20 * fix: Avoid segfaults due to double free in MPI transpose
Chris@42	21
Chris@42	22 * Special note for distribution maintainers: Although FFTW supports a
Chris@42	23 zillion SIMD instruction sets, enabling them all at the same time is
Chris@42	24 a bad idea, because it increases the planning time for minimal gain.
Chris@42	25 We recommend that general-purpose x86 distributions only enable SSE2
Chris@42	26 and perhaps AVX. Users who care about the last ounce of performance
Chris@42	27 should recompile FFTW themselves.
Chris@42	28
Chris@42	29 FFTW 3.3.4
Chris@42	30
Chris@42	31 * New functions fftw_alignment_of (to check whether two arrays are
Chris@42	32 equally aligned for the purposes of applying a plan) and fftw_sprint_plan
Chris@42	33 (to output a description of plan to a string).
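The idea behind the alignment check can be sketched in plain C. This is a minimal illustration, not FFTW's implementation: the helper names `alignment_offset` and `equally_aligned` and the 16-byte boundary are assumptions made for the example. In real code one would presumably compare the values returned by fftw_alignment_of for the two arrays.

```c
#include <stdint.h>

/* Assumed SIMD boundary for this sketch; the value FFTW uses depends on
   how the library was configured. */
enum { ASSUMED_SIMD_ALIGNMENT = 16 };

/* Hypothetical stand-in for an alignment query: the offset of p from the
   previous ASSUMED_SIMD_ALIGNMENT-byte boundary. */
int alignment_offset(const void *p)
{
    return (int)((uintptr_t)p % ASSUMED_SIMD_ALIGNMENT);
}

/* In this model, a plan created for array a may be applied to array b
   only if both arrays sit at the same offset from the boundary. */
int equally_aligned(const void *a, const void *b)
{
    return alignment_offset(a) == alignment_offset(b);
}
```

With 8-byte doubles, `buf` and `buf + 2` are 16 bytes apart and therefore equally aligned in this model, while `buf` and `buf + 1` are not.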
Chris@42	34
Chris@42	35 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
Chris@42	36 bug report.
Chris@42	37
Chris@42	38 * Fixed manual to work with texinfo-5.
Chris@42	39
Chris@42	40 * Increased timing interval on x86_64 to reduce timing errors.
Chris@42	41
Chris@42	42 * Default to Win32 threads, not pthreads, if both are present.
Chris@42	43
Chris@42	44 * Various build-script fixes.
Chris@42	45
Chris@42	46 FFTW 3.3.3
Chris@42	47
Chris@42	48 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
Chris@42	49 bug report and patch, and to Graham Dennis for the bug report).
Chris@42	50
Chris@42	51 * Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
Chris@42	52 appears to speed up even ARM processors with a 64-bit NEON pipe.
Chris@42	53
Chris@42	54 * Speed improvements for single-precision AVX.
Chris@42	55
Chris@42	56 * Speed up planner on machines without "official" cycle counters, such as ARM.
Chris@42	57
Chris@42	58 FFTW 3.3.2
Chris@42	59
Chris@42	60 * Removed an archaic stack-alignment hack that was failing with
Chris@42	61 gcc-4.7/i386.
Chris@42	62
Chris@42	63 * Added stack-alignment hack necessary for gcc on Windows/i386. We
Chris@42	64 will regret this in ten years (see previous change).
Chris@42	65
Chris@42	66 * Fix incompatibility with Intel icc which pretends to be gcc
Chris@42	67 but does not support quad precision.
Chris@42	68
Chris@42	69 * make libfftw{threads,mpi} depend upon libfftw when using libtool;
Chris@42	70 this is consistent with most other libraries and simplifies the life
Chris@42	71 of various distributors of GNU/Linux.
Chris@42	72
Chris@42	73 FFTW 3.3.1
Chris@42	74
Chris@42	75 * Changes since 3.3.1-beta1:
Chris@42	76
Chris@42	77 - Reduced planning time in estimate mode for sizes with large
Chris@42	78 prime factors.
Chris@42	79
Chris@42	80 - Added AVX autodetection under Visual Studio. Thanks Carsten
Chris@42	81 Steger for submitting the necessary code.
Chris@42	82
Chris@42	83 - Modern Fortran interface now uses a separate fftw3l.f03 interface
Chris@42	84 file for the long double interface, which is not supported by
Chris@42	85 some Fortran compilers. Provided new fftw3q.f03 interface file
Chris@42	86 to access the quadruple-precision FFTW routines with recent
Chris@42	87 versions of gcc/gfortran.
Chris@42	88
Chris@42	89 * Added support for the NEON extensions to the ARM ISA. (Note to beta
Chris@42	90 users: an ARM cycle counter is not yet implemented; please contact
Chris@42	91 fftw@fftw.org if you know how to do it right.)
Chris@42	92
Chris@42	93 * MPI code now compiles even if mpicc is a C++ compiler; thanks to
Chris@42	94 Kyle Spyksma for the bug report.
Chris@42	95
Chris@42	96 FFTW 3.3
Chris@42	97
Chris@42	98 * Changes since 3.3-beta1:
Chris@42	99
Chris@42	100 - Compiling OpenMP support (--enable-openmp) now installs a
Chris@42	101 fftw3_omp library, instead of fftw3_threads, so that OpenMP
Chris@42	102 and POSIX threads (--enable-threads) libraries can be built
Chris@42	103 and installed at the same time.
Chris@42	104
Chris@42	105 - Various minor compilation fixes, corrections of manual typos, and
Chris@42	106 improvements to the benchmark test program.
Chris@42	107
Chris@42	108 * Add support for the AVX extensions to x86 and x86-64. The AVX code
Chris@42	109 works with 16-byte alignment (as opposed to 32-byte alignment),
Chris@42	110 so there is no ABI change compared to FFTW 3.2.2.
Chris@42	111
Chris@42	112 * Added Fortran 2003 interface, which should be usable on most modern
Chris@42	113 Fortran compilers (e.g. gfortran) and provides type-checked access
Chris@42	114 to the C FFTW interface. (The legacy Fortran-77 interface is
Chris@42	115 still included also.)
Chris@42	116
Chris@42	117 * Added MPI distributed-memory transforms. Compared to 3.3alpha,
Chris@42	118 the major changes in the MPI transforms are:
Chris@42	119 - Fixed some deadlock and crashing bugs.
Chris@42	120 - Added Fortran 2003 interface.
Chris@42	121 - Added new-array execute functions for MPI plans.
Chris@42	122 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
Chris@42	123 thanks to Jonathan Bentz for the bug report.
Chris@42	124 - Expanded documentation.
Chris@42	125 - 'make check' now runs MPI tests
Chris@42	126 - Some ABI changes - not binary-compatible with 3.3alpha MPI.
Chris@42	127
Chris@42	128 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
Chris@42	129 x86-64, and Itanium). The new routines use the fftwq_ prefix.
Chris@42	130
Chris@42	131 * Removed support for MIPS paired-single instructions due to lack of
Chris@42	132 available hardware for testing. Users who want this functionality
Chris@42	133 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
Chris@42	134 on MIPS; this only concerns special instructions available on some
Chris@42	135 MIPS chips.)
Chris@42	136
Chris@42	137 * Removed support for the Cell Broadband Engine. Cell users should
Chris@42	138 use FFTW 3.2.x.
Chris@42	139
Chris@42	140 * New convenience functions fftw_alloc_real and fftw_alloc_complex
Chris@42	141 to use fftw_malloc for real and complex arrays without typecasts
Chris@42	142 or sizeof.
Chris@42	143
Chris@42	144 * New convenience functions fftw_export_wisdom_to_filename and
Chris@42	145 fftw_import_wisdom_from_filename that export/import wisdom
Chris@42	146 to a file, which don't require you to open/close the file yourself.
Chris@42	147
Chris@42	148 * New function fftw_cost to return FFTW's internal cost metric for
Chris@42	149 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
Chris@42	150 suggestion.
Chris@42	151
Chris@42	152 * The --enable-sse2 configure flag now works in both double and single
Chris@42	153 precision (and is equivalent to --enable-sse in the latter case).
Chris@42	154
Chris@42	155 * Remove --enable-portable-binary flag: we now produce portable binaries
Chris@42	156 by default.
Chris@42	157
Chris@42	158 * Remove the automatic detection of native architecture flag for gcc
Chris@42	159 which was introduced in fftw-3.1, since new gcc supports -mtune=native.
Chris@42	160 Remove the --with-gcc-arch flag; if you want to specify a particular
Chris@42	161 arch to configure, use ./configure CC="gcc -mtune=...".
Chris@42	162
Chris@42	163 * --with-our-malloc16 configure flag is now renamed --with-our-malloc.
Chris@42	164
Chris@42	165 * Fixed build failure when srand48 declaration is missing;
Chris@42	166 thanks to Ralf Wildenhues for the bug report.
Chris@42	167
Chris@42	168 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
Chris@42	169 is equivalent to no timelimit in all cases. Thanks to William Andrew
Chris@42	170 Burnson for the bug report.
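The rule described above (any negative limit behaves like no limit at all) can be modelled as follows. This is a sketch under stated assumptions, not FFTW's planner code: the function name `time_limit_exceeded` and its parameters are hypothetical.

```c
/* Hypothetical model of the time-limit rule: returns 1 when planning
   should stop, 0 when it may continue. */
int time_limit_exceeded(double elapsed_seconds, double limit_seconds)
{
    if (limit_seconds < 0.0)
        return 0;   /* negative limit: treated as "no limit", never expires */
    return elapsed_seconds >= limit_seconds;
}
```

Checking the sign first means every negative value behaves the same way, which is the "in all cases" part of the fix.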
Chris@42	171
Chris@42	172 * Fixed stack-overflow problem on OpenBSD caused by using alloca with
Chris@42	173 too large a buffer.
Chris@42	174
Chris@42	175 FFTW 3.2.2
Chris@42	176
Chris@42	177 * Improve performance of some copy operations of complex arrays on
Chris@42	178 x86 machines.
Chris@42	179
Chris@42	180 * Add configure flag to disable alloca(), which is broken in mingw64.
Chris@42	181
Chris@42	182 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower
Chris@42	183 between fftw-3.1.3 and 3.2. This regression has now been fixed.
Chris@42	184
Chris@42	185 FFTW 3.2.1
Chris@42	186
Chris@42	187 * Performance improvements for some multidimensional r2c/c2r transforms;
Chris@42	188 thanks to Eugene Miloslavsky for his benchmark reports.
Chris@42	189
Chris@42	190 * Compile with icc on MacOS X, use better icc compiler flags.
Chris@42	191
Chris@42	192 * Compilation fixes for systems where snprintf is defined as a macro;
Chris@42	193 thanks to Marcus Mae for the bug report.
Chris@42	194
Chris@42	195 * Fortran documentation now recommends not using dfftw_execute,
Chris@42	196 because of reports of problems with various Fortran compilers;
Chris@42	197 it is better to use dfftw_execute_dft etcetera.
Chris@42	198
Chris@42	199 * Some documentation clarifications, e.g. of the fact that --enable-openmp
Chris@42	200 and --enable-threads are mutually exclusive (thanks to Long To),
Chris@42	201 and document slightly odd behavior of plan_guru_r2r in Fortran
Chris@42	202 (thanks to Alexander Pozdneev).
Chris@42	203
Chris@42	204 * FAQ was accidentally omitted from 3.2 tarball.
Chris@42	205
Chris@42	206 * Remove some extraneous (harmless) files accidentally included in
Chris@42	207 a subdirectory of the 3.2 tarball.
Chris@42	208
Chris@42	209 FFTW 3.2
Chris@42	210
Chris@42	211 * Worked around apparent glibc bug that leads to rare hangs when freeing
Chris@42	212 semaphores.
Chris@42	213
Chris@42	214 * Fixed segfault due to unaligned access in certain obscure problems
Chris@42	215 that use SSE and multiple threads.
Chris@42	216
Chris@42	217 * MPI transforms not included, as they are still in alpha; the alpha
Chris@42	218 versions of the MPI transforms have been moved to FFTW 3.3alpha1.
Chris@42	219
Chris@42	220 FFTW 3.2alpha3
Chris@42	221
Chris@42	222 * Performance improvements for sizes with factors of 5 and 10.
Chris@42	223
Chris@42	224 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Chris@42	225 Emmenlauer and Phil Dumont.
Chris@42	226
Chris@42	227 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
Chris@42	228
Chris@42	229 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
Chris@42	230 for the suggestions.
Chris@42	231
Chris@42	232 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
Chris@42	233 counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
Chris@42	234
Chris@42	235 * Fixed incorrect type prefix in MPI code that prevented wisdom routines
Chris@42	236 from working in single precision (thanks to Eric A. Borisch for the report).
Chris@42	237
Chris@42	238 * Added 'make check' for MPI code (which still fails in a couple corner
Chris@42	239 cases, but should be much better than in alpha2).
Chris@42	240
Chris@42	241 * Many other small fixes.
Chris@42	242
Chris@42	243 FFTW 3.2alpha2
Chris@42	244
Chris@42	245 * Support for the Cell processor, donated by IBM Research; see README.Cell
Chris@42	246 and the Cell section of the manual.
Chris@42	247
Chris@42	248 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
Chris@42	249 function with the same semantics, but which takes fftw_iodim64 instead of
Chris@42	250 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
Chris@42	251 ptrdiff_t integer types as parameters, which is a 64-bit type on
Chris@42	252 64-bit machines. This is only useful for specifying very large transforms
Chris@42	253 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
Chris@42	254 regardless of what API you choose.)
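Why the wider type matters can be shown with two illustrative structs. The types `sketch_iodim` and `sketch_iodim64` and the helper `fits_in_iodim` are hypothetical mirrors written for this example, not the definitions from fftw3.h; the field names n, is and os (length, input stride, output stride) are assumed to follow the manual.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical mirrors of the two dimension descriptors. */
typedef struct { int n, is, os; } sketch_iodim;
typedef struct { ptrdiff_t n, is, os; } sketch_iodim64;

/* Returns 1 and fills *out if the 64-bit description can be narrowed to
   the int-based one without loss; returns 0 if any field is out of range. */
int fits_in_iodim(const sketch_iodim64 *d, sketch_iodim *out)
{
    if (d->n < 0 || d->n > INT_MAX)
        return 0;
    if (d->is < INT_MIN || d->is > INT_MAX)
        return 0;
    if (d->os < INT_MIN || d->os > INT_MAX)
        return 0;
    out->n = (int)d->n;
    out->is = (int)d->is;
    out->os = (int)d->os;
    return 1;
}
```

On a typical 64-bit machine a length of INT_MAX + 1 is rejected by `fits_in_iodim`, which is the kind of transform that only the 64-bit descriptor can express.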
Chris@42	255
Chris@42	256 * Experimental MPI support. Complex one- and multi-dimensional FFTs,
Chris@42	257 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
Chris@42	258 distributed transpose operations, with 1d block distributions.
Chris@42	259 (This is an alpha preview: routines have not been exhaustively
Chris@42	260 tested, documentation is incomplete, and some functionality is
Chris@42	261 missing, e.g. Fortran support.) See mpi/README and also the MPI
Chris@42	262 section of the manual.
Chris@42	263
Chris@42	264 * Significantly faster r2c/c2r transforms, especially on machines with SIMD.
Chris@42	265
Chris@42	266 * Rewritten multi-threaded support for better performance by
Chris@42	267 re-using a fixed pool of threads rather than continually
Chris@42	268 respawning and joining (which nowadays is much slower).
Chris@42	269
Chris@42	270 * Support for MIPS paired-single SIMD instructions, donated by
Chris@42	271 Codesourcery.
Chris@42	272
Chris@42	273 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
Chris@42	274 available and return NULL otherwise.
Chris@42	275
Chris@42	276 * Removed k7 support, which only worked in 32-bit mode and is
Chris@42	277 becoming obsolete. Use --enable-sse instead.
Chris@42	278
Chris@42	279 * Added --with-g77-wrappers configure option to force inclusion
Chris@42	280 of g77 wrappers, in addition to whatever is needed for the
Chris@42	281 detected Fortran compilers. This is mainly intended for GNU/Linux
Chris@42	282 distros switching to gfortran that wish to include both
Chris@42	283 gfortran and g77 support in FFTW.
Chris@42	284
Chris@42	285 * In manual, renamed "guru execute" functions to "new-array execute"
Chris@42	286 functions, to reduce confusion with the guru planner interface.
Chris@42	287 (The programming interface is unchanged.)
Chris@42	288
Chris@42	289 * Add missing __declspec attribute to threads API functions when compiling
Chris@42	290 for Windows; thanks to Robert O. Morris for the bug report.
Chris@42	291
Chris@42	292 * Fixed missing return value from dfftw_init_threads in Fortran;
Chris@42	293 thanks to Markus Wetzstein for the bug report.
Chris@42	294
Chris@42	295 FFTW 3.1.3
Chris@42	296
Chris@42	297 * Bug fix: FFTW computes incorrect results when the user plans both
Chris@42	298 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
Chris@42	299 by incorrect sharing of twiddle-factor tables between the two
Chris@42	300 transforms, and only occurs when both are used. Thanks to Paul
Chris@42	301 A. Valiant for the bug report.
Chris@42	302
Chris@42	303 FFTW 3.1.2
Chris@42	304
Chris@42	305 * Correct bug in configure script: --enable-portable-binary option was ignored!
Chris@42	306 Thanks to Andrew Salamon for the bug report.
Chris@42	307
Chris@42	308 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
Chris@42	309 either if we are using gcc. Thanks to Guy Moebs for the bug report.
Chris@42	310
Chris@42	311 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
Chris@42	312 and suggest a workaround. configure script now detects Core/Duo arch.
Chris@42	313
Chris@42	314 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
Chris@42	315 thanks to Markus Dittrich.
Chris@42	316
Chris@42	317 FFTW 3.1.1
Chris@42	318
Chris@42	319 * Performance improvements for Intel EM64T.
Chris@42	320
Chris@42	321 * Performance improvements for large-size transforms with SIMD.
Chris@42	322
Chris@42	323 * Cycle counter support for Intel icc and Visual C++ on x86-64.
Chris@42	324
Chris@42	325 * In fftw-wisdom tool, replaced obsolete --impatient with --measure.
Chris@42	326
Chris@42	327 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
Chris@42	328
Chris@42	329 * Windows DLL support for Fortran API (added missing __declspec(dllexport)).
Chris@42	330
Chris@42	331 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
Chris@42	332 CPUs lacking a CPUID instruction; thanks to Eric Korpela.
Chris@42	333
Chris@42	334 FFTW 3.1
Chris@42	335
Chris@42	336 * Faster FFTW_ESTIMATE planner.
Chris@42	337
Chris@42	338 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
Chris@42	339
Chris@42	340 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
Chris@42	341
Chris@42	342 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
Chris@42	343
Chris@42	344 * Faster in-place non-square transpositions (FFTW uses these internally
Chris@42	345 for in-place FFTs, and you can also perform them explicitly using
Chris@42	346 the guru interface).
Chris@42	347
Chris@42	348 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well
Chris@42	349 as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
Chris@42	350
Chris@42	351 * SIMD support for split complex arrays.
Chris@42	352
Chris@42	353 * Much faster Altivec/VMX performance.
Chris@42	354
Chris@42	355 * New fftw_set_timelimit function to specify a (rough) upper bound to the
Chris@42	356 planning time (does not affect ESTIMATE mode).
Chris@42	357
Chris@42	358 * Removed --enable-3dnow support; use --enable-k7 instead.
Chris@42	359
Chris@42	360 * FMA (fused multiply-add) version is now included in "standard" FFTW,
Chris@42	361 and is enabled with --enable-fma (the default on PowerPC and Itanium).
Chris@42	362
Chris@42	363 * Automatic detection of native architecture flag for gcc. New
Chris@42	364 configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
Chris@42	365 for people distributing compiled binaries of FFTW (see manual).
Chris@42	366
Chris@42	367 * Automatic detection of Altivec under Linux with gcc 3.4 (so that
Chris@42	368 same binary should work on both Altivec and non-Altivec PowerPCs).
Chris@42	369
Chris@42	370 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Chris@42	371 Solaris/Intel.
Chris@42	372
Chris@42	373 * Various documentation clarifications.
Chris@42	374
Chris@42	375 * 64-bit clean. (Fixes a bug affecting the split guru planner on
Chris@42	376 64-bit machines, reported by David Necas.)
Chris@42	377
Chris@42	378 * Fixed Debian bug #259612: inadvertent use of SSE instructions on
Chris@42	379 non-SSE machines (causing a crash) for --enable-sse binaries.
Chris@42	380
Chris@42	381 * Fixed bug that caused HC2R transforms to destroy the input in
Chris@42	382 certain cases, even if the user specified FFTW_PRESERVE_INPUT.
Chris@42	383
Chris@42	384 * Fixed bug where wisdom would be lost under rare circumstances,
Chris@42	385 causing excessive planning time.
Chris@42	386
Chris@42	387 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
Chris@42	388
Chris@42	389 * Fixed accidentally exported symbol that prohibited simultaneous
Chris@42	390 linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
Chris@42	391
Chris@42	392 * Support Win32 threads under MinGW (thanks to Alessio Massaro).
Chris@42	393
Chris@42	394 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
Chris@42	395
Chris@42	396 * Fix build failure if no Fortran compiler is found (thanks to Charles
Chris@42	397 Radley for the bug report).
Chris@42	398
Chris@42	399 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
Chris@42	400 detection of icc architecture flag (e.g. -xW).
Chris@42	401
Chris@42	402 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
Chris@42	403
Chris@42	404 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
Chris@42	405
Chris@42	406 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
Chris@42	407 but its malloc is 16-byte aligned).
Chris@42	408
Chris@42	409 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
Chris@42	410 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
Chris@42	411 reports/fixes). Added x86-64 cycle counter for PGI compilers,
Chris@42	412 courtesy Cristiano Calonaci.
Chris@42	413
Chris@42	414 * Fix compilation problem in test program due to C99 conflict.
Chris@42	415
Chris@42	416 * Portability fix for import_system_wisdom with djgpp (thanks to Juan
Chris@42	417 Manuel Guerrero).
Chris@42	418
Chris@42	419 * Fixed compilation failure on MacOS 10.3 due to getopt conflict.
Chris@42	420
Chris@42	421 * Work around Visual C++ (version 6/7) bug in SSE compilation;
Chris@42	422 thanks to Eddie Yee for his detailed report.
Chris@42	423
Chris@42	424 Changes from FFTW 3.1 beta 2:
Chris@42	425
Chris@42	426 * Several minor compilation fixes.
Chris@42	427
Chris@42	428 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
Chris@42	429 fftw_set_timelimit function. Make wisdom work with time-limited plans.
Chris@42	430
Chris@42	431 Changes from FFTW 3.1 beta 1:
Chris@42	432
Chris@42	433 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
Chris@42	434
Chris@42	435 * Fixed more 64-bit problems, thanks to John Pavel for the bug report.
Chris@42	436
Chris@42	437 * Further speed improvements for Altivec/VMX.
Chris@42	438
Chris@42	439 * Further speed improvements for non-square transpositions.
Chris@42	440
Chris@42	441 * Many minor tweaks.
Chris@42	442
Chris@42	443 FFTW 3.0.1
Chris@42	444
Chris@42	445 * Some speed improvements in SIMD code.
Chris@42	446
Chris@42	447 * --without-cycle-counter option is removed. If no cycle counter is found,
Chris@42	448 then the estimator is always used. A --with-slow-timer option is provided
Chris@42	449 to force the use of lower-resolution timers.
Chris@42	450
Chris@42	451 * Several fixes for compilation under Visual C++, with help from Stefane Ruel.
Chris@42	452
Chris@42	453 * Added x86 cycle counter for Visual C++, with help from Morten Nissov.
Chris@42	454
Chris@42	455 * Added S390 cycle counter, courtesy of James Treacy.
Chris@42	456
Chris@42	457 * Added missing static keyword that prevented simultaneous linkage
Chris@42	458 of different-precision versions; thanks to Rasmus Larsen for the bug report.
Chris@42	459
Chris@42	460 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
Chris@42	461
Chris@42	462 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
Chris@42	463
Chris@42	464 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
Chris@42	465 preprocessor limits; thanks to Peter Vouras for the bug report.
Chris@42	466
Chris@42	467 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
Chris@42	468 thanks to Nicolas Decoster for the patch.
Chris@42	469
Chris@42	470 * Added 'make smallcheck' target in tests/ directory, at the request of
Chris@42	471 James Treacy.
Chris@42	472
Chris@42	473 FFTW 3.0
Chris@42	474
Chris@42	475 Major goals of this release:
Chris@42	476
Chris@42	477 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
Chris@42	478
Chris@42	479 * Complete rewrite, to make it easier to add new algorithms and transforms.
Chris@42	480
Chris@42	481 * New API, to support more general semantics.
Chris@42	482
Chris@42	483 Other enhancements:
Chris@42	484
Chris@42	485 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
Chris@42	486 (With special thanks to Franz Franchetti for many experimental prototypes
Chris@42	487 and to Stefan Kral for the vectorizing generator from fftwgel.)
Chris@42	488
Chris@42	489 * True in-place 1d transforms of large sizes (as well as compressed
Chris@42	490 twiddle tables for additional memory/cache savings).
Chris@42	491
Chris@42	492 * More arbitrary placement of real & imaginary data, e.g. including
Chris@42	493 interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
Chris@42	494
Chris@42	495 * Efficient prime-size transforms of real data.
Chris@42	496
Chris@42	497 * Multidimensional transforms can operate on a subset of a larger matrix,
Chris@42	498 and/or transform selected dimensions of a multidimensional array.
Chris@42	499
Chris@42	500 * By popular demand, simultaneous linking to double precision (fftw),
Chris@42	501 single precision (fftwf), and long-double precision (fftwl) versions
Chris@42	502 of FFTW is now supported.
Chris@42	503
Chris@42	504 * Cycle counters (on all modern CPUs) are exploited to speed planning.
Chris@42	505
Chris@42	506 * Efficient transforms of real even/odd arrays, a.k.a. discrete
Chris@42	507 cosine/sine transforms (types I-IV). (Currently work via pre/post
Chris@42	508 processing of real transforms, ala FFTPACK, so are not optimal.)
Chris@42	509
Chris@42	510 * DHTs (Discrete Hartley Transforms), again via post-processing
Chris@42	511 of real transforms (and thus suboptimal, for now).
Chris@42	512
Chris@42	513 * Support for linking to just those parts of FFTW that you need,
Chris@42	514 greatly reducing the size of statically linked programs when
Chris@42	515 only a limited set of transform sizes/types are required.
Chris@42	516
Chris@42	517 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
Chris@42	518 with a command-line tool (fftw-wisdom) to generate/update it.
Chris@42	519
Chris@42	520 * Fortran API can be used with both g77 and non-g77 compilers
Chris@42	521 simultaneously.
Chris@42	522
Chris@42	523 * Multi-threaded version has optional OpenMP support.
Chris@42	524
Chris@42	525 * Authors' good looks have greatly improved with age.
Chris@42	526
Chris@42	527 Changes from 3.0beta3:
Chris@42	528
Chris@42	529 * Separate FMA distribution to better exploit fused multiply-add instructions
Chris@42	530 on PowerPC (and possibly other) architectures.
Chris@42	531
Chris@42	532 * Performance improvements via some inlining tweaks.
Chris@42	533
Chris@42	534 * fftw_flops now returns double arguments, not int, to avoid overflows
Chris@42	535 for large sizes.
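The overflow concern is easy to reproduce with a rough operation count. The sketch below uses the common textbook estimate of 5 n log2(n) operations for a power-of-two size n; that formula and the names `estimated_flops` and `fits_in_int` are assumptions for illustration and are not what fftw_flops reports.

```c
#include <limits.h>

/* Rough estimate 5 * n * log2(n) for a power-of-two n, returned as a
   double so that large sizes do not overflow. */
double estimated_flops(unsigned long long n)
{
    int log2n = 0;
    /* Integer log2: largest k with 2^k <= n (guarded against shift overflow). */
    while (log2n < 62 && (1ULL << (log2n + 1)) <= n)
        ++log2n;
    return 5.0 * (double)n * (double)log2n;
}

/* Returns 1 if the count would still fit in an int. */
int fits_in_int(double count)
{
    return count <= (double)INT_MAX;
}
```

For n = 1024 the estimate is 51200, which fits comfortably in an int; for n = 2^27 it is about 1.8e10, which exceeds a 32-bit INT_MAX.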
Chris@42	536
Chris@42	537 * Workarounds for automake bugs.
Chris@42	538
Chris@42	539 Changes from 3.0beta2:
Chris@42	540
Chris@42	541 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
Chris@42	542 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
Chris@42	543 we replaced it with a slower routine that is more accurate.
Chris@42	544
Chris@42	545 * The guru planner and execute functions now have two variants, one that
Chris@42	546 takes complex arguments and one that takes separate real/imag pointers.
Chris@42	547
Chris@42	548 * Execute and planner routines now automatically align the stack on x86,
Chris@42	549 in case the calling program is misaligned.
Chris@42	550
Chris@42	551 * README file for test program.
Chris@42	552
Chris@42	553 * Fixed bugs in the combination of SIMD with multi-threaded transforms.
Chris@42	554
Chris@42	555 * Eliminated internal fftw_threads_init function, which some people were
Chris@42	556 calling accidentally instead of the fftw_init_threads API function.
Chris@42	557
Chris@42	558 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
Chris@42	559
Chris@42	560 * Support AMD x86-64 SIMD and cycle counter.
Chris@42	561
Chris@42	562 * Support SSE2 intrinsics in forthcoming gcc 3.3.
Chris@42	563
Chris@42	564 Changes from 3.0beta1:
Chris@42	565
Chris@42	566 * Faster in-place 1d transforms of non-power-of-two sizes.
Chris@42	567
Chris@42	568 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
Chris@42	569 transforms.
Chris@42	570
Chris@42	571 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
Chris@42	572 default distribution only includes hard-coded size-8 DCT-II/III, however.
Chris@42	573
Chris@42	574 * Many minor improvements to the manual. Added section on using the
Chris@42	575 codelet generator to customize and enhance FFTW.
Chris@42	576
Chris@42	577 * The default 'make check' should now only take a few minutes; for more
Chris@42	578 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
Chris@42	579
Chris@42	580 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
Chris@42	581 the latter uses stdout.
Chris@42	582
Chris@42	583 * Fixed ability to compile with a C++ compiler.
Chris@42	584
Chris@42	585 * Fixed support for C99 complex type under glibc.
Chris@42	586
Chris@42	587 * Fixed problems with alloca under MinGW, AIX.
Chris@42	588
Chris@42	589 * Workaround for gcc/SPARC bug.
Chris@42	590
Chris@42	591 * Fixed multi-threaded initialization failure on IRIX due to lack of
Chris@42	592 user-accessible PTHREAD_SCOPE_SYSTEM there.
