sv-dependency-builds: src/fftw-3.3.8/NEWS @ 82:d0c2a83c1364
Add FFTW 3.3.8 source, and a Linux build
author: Chris Cannam
date: Tue, 19 Nov 2019 14:52:55 +0000

FFTW 3.3.8:

* Fixed AVX, AVX2 for gcc-8.

  By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code
  assumed that the compiler honors the distinction between +0 and -0,
  but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math.
  This release ensures that FFTW works with gcc-8 -ffast-math, and
  removes -ffast-math from the default CFLAGS for good measure.
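The +0/-0 distinction matters in SIMD code because -0.0 is the only double whose bit pattern is the sign bit alone, which makes it a convenient mask for flipping signs with XOR. The entry does not say exactly which construct broke, so the scalar C sketch below is an assumption about the kind of code affected: it shows that XOR-ing with the bits of -0.0 negates a value, and that if a compiler is free to treat -0.0 as +0.0 (which -ffast-math permits), the same mask becomes all zeros and the negation silently disappears. The helper names are hypothetical, and IEEE 754 doubles are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Copy a double's bits into an integer (memcpy avoids aliasing problems). */
uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Inverse of bits_of. */
double from_bits(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* XOR the bits of x with the bits of mask.
 * If mask is -0.0 (only the sign bit set), the sign of x flips.
 * If a compiler has replaced -0.0 by +0.0 (all bits clear), the XOR
 * does nothing and x comes back unchanged: the negation is lost. */
double xor_with_mask(double x, double mask)
{
    return from_bits(bits_of(x) ^ bits_of(mask));
}
```

Note that -0.0 == 0.0 is true in C, so an ordinary comparison cannot tell the two masks apart; only the bit pattern differs.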

FFTW 3.3.7:

* Experimental support for CMake.

  The primary build mechanism for FFTW remains GNU autoconf/automake.
  CMake support is meant to offer an easy way to compile FFTW on
  Windows, and as such it does not cover all the features of the
  automake build system, such as exotic cycle counters,
  cross-compiling, or building binaries for a mixture of ISAs
  (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.

* Fixes for armv7a cycle counter.
* Official support for aarch64, now that we have hardware to test it.
* Tweak usage of FMA instructions in a way that favors newer processors
  (Skylake and Ryzen) over older processors (Haswell).
* tests/bench: use 64-bit precision to compute mflops.

FFTW 3.3.6-pl2:

* Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.

FFTW 3.3.6-pl1:

* Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
  shared libraries of the form libfftw3.so.2.6.6 instead of
  libfftw3.so.3.*.
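The file name is derived from libtool's -version-info current:revision:age triple. On Linux, libtool names the library lib<name>.so.<current-age>.<age>.<revision>, and the library's soname, which linked programs record, is the shorter lib<name>.so.<current-age>; so a wrong age alone changes which library those programs look for. The sketch below encodes that naming rule with a hypothetical function name. The triples in the test are illustrations that reproduce the two file names above, not a statement of what the FFTW 3.3.6 build actually used.

```c
#include <stdio.h>
#include <string.h>

/* Write into buf the Linux file name that libtool derives from
 * -version-info current:revision:age, namely
 *     lib<name>.so.<current - age>.<age>.<revision>
 * Returns the snprintf result, or -1 for invalid input. */
int libtool_linux_libname(char *buf, size_t size, const char *name,
                          unsigned current, unsigned revision, unsigned age)
{
    if (name == NULL || strlen(name) == 0 || age > current)
        return -1;   /* libtool itself rejects age > current */
    return snprintf(buf, size, "lib%s.so.%u.%u.%u",
                    name, current - age, age, revision);
}
```

With a hypothetical triple of 8:6:5 the major number is 3; raising age to 6 drops it to 2, which is the kind of mismatch the entry describes.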

FFTW 3.3.6:

* The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
  work, and 3.3.6 fixes it. Sorry about that.
* Compilation fixes for IBM XLC
* Compilation fixes for threads on Windows
* Fix SIMD autodetection on amd64 when (_MSC_VER > 1500)

FFTW 3.3.5:

* New SIMD support:
  - Power8 VSX instructions in single and double precision.
    To use, add --enable-vsx to configure.
  - Support for AVX2 (256-bit FMA instructions).
    To use, add --enable-avx2 to configure.
  - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
  - Double precision Neon SIMD for aarch64.
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - Generic SIMD support using gcc vector intrinsics
* Add fftw_make_planner_thread_safe() API
* Fix #18 (disable float128 for CUDACC)
* Fix #19: missing Fortran interface for fftwq_alloc_real
* Fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
* Fix: avoid segfaults due to a double free in MPI transpose

* Special note for distribution maintainers: Although FFTW supports a
  zillion SIMD instruction sets, enabling them all at the same time is
  a bad idea, because it increases the planning time for minimal gain.
  We recommend that general-purpose x86 distributions only enable SSE2
  and perhaps AVX. Users who care about the last ounce of performance
  should recompile FFTW themselves.

FFTW 3.3.4

* New functions fftw_alignment_of (to check whether two arrays are
  equally aligned for the purposes of applying a plan) and fftw_sprint_plan
  (to output a description of a plan to a string).

* Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
  bug report.

* Fixed manual to work with texinfo-5.

* Increased timing interval on x86_64 to reduce timing errors.

* Default to Win32 threads, not pthreads, if both are present.

* Various build-script fixes.
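fftw_alignment_of is meant for the new-array execute functions: a plan may only be applied to new arrays whose alignment matches that of the arrays it was planned with. Conceptually the check is the address modulo the SIMD alignment. The sketch below assumes a 16-byte alignment and uses hypothetical names; FFTW's own function picks its modulus according to the SIMD extensions it was built with, so the constant here is an assumption, not FFTW's value.

```c
#include <stdint.h>

/* Assumed SIMD alignment in bytes (16 suits SSE2; builds with wider
 * vector extensions may need a larger value). */
#define SKETCH_ALIGNMENT 16

/* Hypothetical stand-in for fftw_alignment_of: arrays are "equally
 * aligned" when their addresses leave the same remainder modulo the
 * SIMD alignment. */
int sketch_alignment_of(const double *p)
{
    return (int)((uintptr_t)p % SKETCH_ALIGNMENT);
}

/* Nonzero if a plan made for array a may be reused on array b,
 * as far as alignment is concerned. */
int equally_aligned(const double *a, const double *b)
{
    return sketch_alignment_of(a) == sketch_alignment_of(b);
}
```

Two pointers 16 bytes apart inside one buffer compare as equally aligned; pointers 8 bytes apart (one double) do not.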

FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc, which pretends to be gcc
  but does not support quad precision.

* Make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks to Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided a new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  still included also.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests.
  - Some ABI changes: not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to a named file without requiring you to open and close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of the native-architecture flag for gcc,
  which was introduced in fftw-3.1, since newer gcc versions support
  -mtune=native. Remove the --with-gcc-arch flag; if you want to specify
  a particular arch to configure, use ./configure CC="gcc -mtune=...".

* The --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.
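fftw_alloc_real(n) and fftw_alloc_complex(n) are thin typed wrappers around fftw_malloc, so callers write neither sizeof nor a cast. The sketch below reproduces that shape under stated assumptions: the names are hypothetical, C11 aligned_alloc stands in for fftw_malloc, C99 double complex stands in for fftw_complex, and the 16-byte alignment is assumed rather than taken from FFTW.

```c
#include <complex.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment; FFTW's own fftw_malloc chooses this internally. */
#define SKETCH_ALIGN 16

/* Allocate n elements of elem_size bytes, SKETCH_ALIGN-aligned.
 * aligned_alloc wants a size that is a multiple of the alignment, so round up. */
static void *sketch_malloc(size_t n, size_t elem_size)
{
    if (elem_size != 0 && n > (SIZE_MAX - SKETCH_ALIGN) / elem_size)
        return NULL;                      /* n * elem_size would overflow */
    size_t bytes = n * elem_size;
    bytes = (bytes + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
    if (bytes == 0)
        bytes = SKETCH_ALIGN;             /* avoid a zero-size request */
    return aligned_alloc(SKETCH_ALIGN, bytes);
}

/* Hypothetical analogues of fftw_alloc_real / fftw_alloc_complex:
 * typed results, no casts or sizeof at the call site. */
double *sketch_alloc_real(size_t n)
{
    return sketch_malloc(n, sizeof(double));
}

double complex *sketch_alloc_complex(size_t n)
{
    return sketch_malloc(n, sizeof(double complex));
}
```

Memory from these sketches is released with free(); memory from the real fftw_alloc_* functions must be released with fftw_free.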

FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, and use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and documentation of the slightly odd behavior of plan_guru_r2r in
  Fortran (thanks to Alexander Pozdneev).

* The FAQ was accidentally omitted from the 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms are not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that its
  parameters have type ptrdiff_t, which is a 64-bit type on 64-bit
  machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create a plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.
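The motivation for plan_guru64 is integer range: each dimension of a large multidimensional transform may be modest while the total element count, and hence the strides and offsets computed from it, exceeds what a 32-bit int can hold. The sketch below uses a hypothetical struct with the same three fields as fftw_iodim64 (n, is, os, all ptrdiff_t) to describe a row-major 2048 x 2048 x 2048 array, whose 2^33 elements do not fit in an int. It assumes a platform where ptrdiff_t is 64 bits.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as fftw_iodim64: length, input stride, output stride. */
typedef struct {
    ptrdiff_t n;   /* length of this dimension */
    ptrdiff_t is;  /* input stride, in elements */
    ptrdiff_t os;  /* output stride, in elements */
} iodim64_sketch;

/* Total number of elements described by rank dims, or -1 on overflow. */
ptrdiff_t total_elements(const iodim64_sketch *dims, int rank)
{
    ptrdiff_t total = 1;
    for (int i = 0; i < rank; ++i) {
        if (dims[i].n < 0 || (dims[i].n != 0 && total > PTRDIFF_MAX / dims[i].n))
            return -1;
        total *= dims[i].n;
    }
    return total;
}

/* Nonzero if a count would fit in the int fields of the original fftw_iodim. */
int fits_in_int(ptrdiff_t count)
{
    return count >= 0 && count <= INT_MAX;
}
```

Each individual stride here (at most 2048 * 2048) still fits in an int; it is the product over all dimensions that does not.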

FFTW 3.1.3

* Bug fix: FFTW computed incorrect results when the user planned both
  REDFT11 and RODFT11 transforms of certain sizes. The bug was caused
  by incorrect sharing of twiddle-factor tables between the two
  transforms, and only occurred when both were used. Thanks to Paul
  A. Valiant for the bug report.

FFTW 3.1.2

* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc. Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround. The configure script now detects the Core/Duo arch.

* Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound on the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  the same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* The FAQ notes a bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.
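fftw_set_timelimit takes a limit in seconds, and the FFTW 3.3 entry above states the rule for negative values: any negative limit means no limit at all (the documented constant for this is FFTW_NO_TIMELIMIT, defined as -1.0 in fftw3.h). The check below is a hypothetical sketch of that rule, not FFTW's planner code; the function and macro names are made up for illustration.

```c
#include <stdbool.h>

/* Mirrors the documented value of FFTW_NO_TIMELIMIT; named differently
 * so this sketch does not depend on fftw3.h. */
#define SKETCH_NO_TIMELIMIT (-1.0)

/* Should planning stop, given the seconds spent so far and the user's limit?
 * Any negative limit, not just exactly -1.0, means "no limit". */
bool planning_time_exceeded(double elapsed_seconds, double limit_seconds)
{
    if (limit_seconds < 0.0)
        return false;
    return elapsed_seconds >= limit_seconds;
}
```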

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* The --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added a missing static keyword whose absence prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (These currently work via pre/post
  processing of real transforms, à la FFTPACK, so they are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.
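Simultaneous linking works because each precision exports differently prefixed symbols (fftw_, fftwf_, fftwl_, and, since 3.3, fftwq_), so the linker never sees two definitions of the same name. One common way to produce such families from a single source is preprocessor token pasting; the sketch below applies it to a hypothetical sum routine with hypothetical mylib prefixes. It illustrates the technique only and is not FFTW's build machinery.

```c
#include <stddef.h>

/* Paste a prefix onto a base name: CONCAT(mylibf_, sum) -> mylibf_sum. */
#define CONCAT(prefix, name) prefix##name

/* One generic body, instantiated once per precision. Each instance gets its
 * own prefix and real type R, so the three definitions have distinct
 * external names and can be linked into the same program. */
#define DEFINE_SUM(prefix, R)                      \
    R CONCAT(prefix, sum)(const R *v, size_t n)    \
    {                                              \
        R total = 0;                               \
        for (size_t i = 0; i < n; ++i)             \
            total += v[i];                         \
        return total;                              \
    }

DEFINE_SUM(mylib_, double)        /* defines mylib_sum  */
DEFINE_SUM(mylibf_, float)        /* defines mylibf_sum */
DEFINE_SUM(mylibl_, long double)  /* defines mylibl_sum */
```

The same reasoning explains the 3.0.1 fix about a missing static keyword: any internal helper that is not static, and not prefixed, would be defined once per precision and collide at link time.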

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns its counts in double arguments, not int, to avoid
  overflows for large sizes.

* Workarounds for automake bugs.

Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* Added a README file for the test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW and AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.