comparison src/fftw-3.3.5/NEWS @ 127:7867fa7e1b6b
Current fftw source
author | Chris Cannam <cannam@all-day-breakfast.com> |
---|---|
date | Tue, 18 Oct 2016 13:40:26 +0100 |
parents | |
children |
126:4a7071416412 | 127:7867fa7e1b6b |
---|---|
1 FFTW 3.3.5: | |
2 | |
3 * New SIMD support: | |
4 - Power8 VSX instructions in single and double precision. | |
5 To use, add --enable-vsx to configure. | |
6 - Support for AVX2 (256-bit FMA instructions). | |
7 To use, add --enable-avx2 to configure. | |
8 - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi) | |
9 This code is expected to work but the FFTW maintainers do not have | |
10 hardware to test it. | |
11 - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma) | |
12 - Double precision Neon SIMD for aarch64. | |
13 This code is expected to work but the FFTW maintainers do not have | |
14 hardware to test it. | |
15 - generic SIMD support using gcc vector intrinsics | |
16 * Add fftw_make_planner_thread_safe() API | |
17 * fix #18 (disable float128 for CUDACC) | |
18 * fix #19: missing Fortran interface for fftwq_alloc_real | |
19 * fix #21 (don't use float128 on Portland compilers, which pretend to be gcc) | |
20 * fix: Avoid segfaults due to double free in MPI transpose | |
21 | |
22 * Special note for distribution maintainers: Although FFTW supports a | |
23 zillion SIMD instruction sets, enabling them all at the same time is | |
24 a bad idea, because it increases the planning time for minimal gain. | |
25 We recommend that general-purpose x86 distributions only enable SSE2 | |
26 and perhaps AVX. Users who care about the last ounce of performance | |
27 should recompile FFTW themselves. | |
28 | |
29 FFTW 3.3.4 | |
30 | |
31 * New functions fftw_alignment_of (to check whether two arrays are | |
32 equally aligned for the purposes of applying a plan) and fftw_sprint_plan | |
33 (to output a description of plan to a string). | |
34 | |
35 * Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the | |
36 bug report. | |
37 | |
38 * Fixed manual to work with texinfo-5. | |
39 | |
40 * Increased timing interval on x86_64 to reduce timing errors. | |
41 | |
42 * Default to Win32 threads, not pthreads, if both are present. | |
43 | |
44 * Various build-script fixes. | |
45 | |
46 FFTW 3.3.3 | |
47 | |
48 * Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the | |
49 bug report and patch, and to Graham Dennis for the bug report). | |
50 | |
51 * Use 128-bit ARM NEON instructions instead of 64-bit ones. This change | |
52 appears to speed up even ARM processors with a 64-bit NEON pipe. | |
53 | |
54 * Speed improvements for single-precision AVX. | |
55 | |
56 * Speed up planner on machines without "official" cycle counters, such as ARM. | |
57 | |
58 FFTW 3.3.2 | |
59 | |
60 * Removed an archaic stack-alignment hack that was failing with | |
61 gcc-4.7/i386. | |
62 | |
63 * Added stack-alignment hack necessary for gcc on Windows/i386. We | |
64 will regret this in ten years (see previous change). | |
65 | |
66 * Fix incompatibility with Intel icc which pretends to be gcc | |
67 but does not support quad precision. | |
68 | |
69 * make libfftw{threads,mpi} depend upon libfftw when using libtool; | |
70 this is consistent with most other libraries and simplifies the life | |
71 of various distributors of GNU/Linux. | |
72 | |
73 FFTW 3.3.1 | |
74 | |
75 * Changes since 3.3.1-beta1: | |
76 | |
77 - Reduced planning time in estimate mode for sizes with large | |
78 prime factors. | |
79 | |
80 - Added AVX autodetection under Visual Studio. Thanks Carsten | |
81 Steger for submitting the necessary code. | |
82 | |
83 - Modern Fortran interface now uses a separate fftw3l.f03 interface | |
84 file for the long double interface, which is not supported by | |
85 some Fortran compilers. Provided new fftw3q.f03 interface file | |
86 to access the quadruple-precision FFTW routines with recent | |
87 versions of gcc/gfortran. | |
88 | |
89 * Added support for the NEON extensions to the ARM ISA. (Note to beta | |
90 users: an ARM cycle counter is not yet implemented; please contact | |
91 fftw@fftw.org if you know how to do it right.) | |
92 | |
93 * MPI code now compiles even if mpicc is a C++ compiler; thanks to | |
94 Kyle Spyksma for the bug report. | |
95 | |
96 FFTW 3.3 | |
97 | |
98 * Changes since 3.3-beta1: | |
99 | |
100 - Compiling OpenMP support (--enable-openmp) now installs a | |
101 fftw3_omp library, instead of fftw3_threads, so that OpenMP | |
102 and POSIX threads (--enable-threads) libraries can be built | |
103 and installed at the same time. | |
104 | |
105 - Various minor compilation fixes, corrections of manual typos, and | |
106 improvements to the benchmark test program. | |
107 | |
108 * Add support for the AVX extensions to x86 and x86-64. The AVX code | |
109 works with 16-byte alignment (as opposed to 32-byte alignment), | |
110 so there is no ABI change compared to FFTW 3.2.2. | |
111 | |
112 * Added Fortran 2003 interface, which should be usable on most modern | |
113 Fortran compilers (e.g. gfortran) and provides type-checked access | |
114 to the C FFTW interface. (The legacy Fortran-77 interface is | |
115 still included also.) | |
116 | |
117 * Added MPI distributed-memory transforms. Compared to 3.3alpha, | |
118 the major changes in the MPI transforms are: | |
119 - Fixed some deadlock and crashing bugs. | |
120 - Added Fortran 2003 interface. | |
121 - Added new-array execute functions for MPI plans. | |
122 - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24; | |
123 thanks to Jonathan Bentz for the bug report. | |
124 - Expanded documentation. | |
125 - 'make check' now runs MPI tests | |
126 - Some ABI changes - not binary-compatible with 3.3alpha MPI. | |
127 | |
128 * Add support for quad-precision __float128 in gcc 4.6 or later (on x86, | |
129 x86-64, and Itanium). The new routines use the fftwq_ prefix. | |
130 | |
131 * Removed support for MIPS paired-single instructions due to lack of | |
132 available hardware for testing. Users who want this functionality | |
133 should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works | |
134 on MIPS; this only concerns special instructions available on some | |
135 MIPS chips.) | |
136 | |
137 * Removed support for the Cell Broadband Engine. Cell users should | |
138 use FFTW 3.2.x. | |
139 | |
140 * New convenience functions fftw_alloc_real and fftw_alloc_complex | |
141 to use fftw_malloc for real and complex arrays without typecasts | |
142 or sizeof. | |
143 | |
144 * New convenience functions fftw_export_wisdom_to_filename and | |
145 fftw_import_wisdom_from_filename that export/import wisdom | |
146 to a file, which don't require you to open/close the file yourself. | |
147 | |
148 * New function fftw_cost to return FFTW's internal cost metric for | |
149 a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the | |
150 suggestion. | |
151 | |
152 * The --enable-sse2 configure flag now works in both double and single | |
153 precision (and is equivalent to --enable-sse in the latter case). | |
154 | |
155 * Remove --enable-portable-binary flag: we now produce portable binaries | |
156 by default. | |
157 | |
158 * Remove the automatic detection of native architecture flag for gcc | |
159 which was introduced in fftw-3.1, since new gcc supports -mtune=native. | |
160 Remove the --with-gcc-arch flag; if you want to specify a particular | |
161 arch to configure, use ./configure CC="gcc -mtune=...". | |
162 | |
163 * --with-our-malloc16 configure flag is now renamed --with-our-malloc. | |
164 | |
165 * Fixed build failure when srand48 declaration is missing; | |
166 thanks to Ralf Wildenhues for the bug report. | |
167 | |
168 * Fixed bug in fftw_set_timelimit: ensure that a negative timelimit | |
169 is equivalent to no timelimit in all cases. Thanks to William Andrew | |
170 Burnson for the bug report. | |
171 | |
172 * Fixed stack-overflow problem on OpenBSD caused by using alloca with | |
173 too large a buffer. | |
174 | |
175 FFTW 3.2.2 | |
176 | |
177 * Improve performance of some copy operations of complex arrays on | |
178 x86 machines. | |
179 | |
180 * Add configure flag to disable alloca(), which is broken in mingw64. | |
181 | |
182 * Planning in FFTW_ESTIMATE mode for r2r transforms became slower | |
183 between fftw-3.1.3 and 3.2. This regression has now been fixed. | |
184 | |
185 FFTW 3.2.1 | |
186 | |
187 * Performance improvements for some multidimensional r2c/c2r transforms; | |
188 thanks to Eugene Miloslavsky for his benchmark reports. | |
189 | |
190 * Compile with icc on MacOS X, use better icc compiler flags. | |
191 | |
192 * Compilation fixes for systems where snprintf is defined as a macro; | |
193 thanks to Marcus Mae for the bug report. | |
194 | |
195 * Fortran documentation now recommends not using dfftw_execute, | |
196 because of reports of problems with various Fortran compilers; | |
197 it is better to use dfftw_execute_dft etcetera. | |
198 | |
199 * Some documentation clarifications, e.g. of the fact that --enable-openmp | |
200 and --enable-threads are mutually exclusive (thanks to Long To), | |
201 and document slightly odd behavior of plan_guru_r2r in Fortran | |
202 (thanks to Alexander Pozdneev). | |
203 | |
204 * FAQ was accidentally omitted from 3.2 tarball. | |
205 | |
206 * Remove some extraneous (harmless) files accidentally included in | |
207 a subdirectory of the 3.2 tarball. | |
208 | |
209 FFTW 3.2 | |
210 | |
211 * Worked around apparent glibc bug that leads to rare hangs when freeing | |
212 semaphores. | |
213 | |
214 * Fixed segfault due to unaligned access in certain obscure problems | |
215 that use SSE and multiple threads. | |
216 | |
217 * MPI transforms not included, as they are still in alpha; the alpha | |
218 versions of the MPI transforms have been moved to FFTW 3.3alpha1. | |
219 | |
220 FFTW 3.2alpha3 | |
221 | |
222 * Performance improvements for sizes with factors of 5 and 10. | |
223 | |
224 * Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario | |
225 Emmenlauer and Phil Dumont. | |
226 | |
227 * Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code. | |
228 | |
229 * Performance improvements in Cell code for N < 32k, thanks to Jan Wagner | |
230 for the suggestions. | |
231 | |
232 * Cycle counter for Sun x86_64 compiler, and compilation fix in cycle | |
233 counter for AIX/xlc (thanks to Jeff Haferman for the bug report). | |
234 | |
235 * Fixed incorrect type prefix in MPI code that prevented wisdom routines | |
236 from working in single precision (thanks to Eric A. Borisch for the report). | |
237 | |
238 * Added 'make check' for MPI code (which still fails in a couple corner | |
239 cases, but should be much better than in alpha2). | |
240 | |
241 * Many other small fixes. | |
242 | |
243 FFTW 3.2alpha2 | |
244 | |
245 * Support for the Cell processor, donated by IBM Research; see README.Cell | |
246 and the Cell section of the manual. | |
247 | |
248 * New 64-bit API: for every "plan_guru" function there is a new "plan_guru64" | |
249 function with the same semantics, but which takes fftw_iodim64 instead of | |
250 fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes | |
251 ptrdiff_t integer types as parameters, which is a 64-bit type on | |
252 64-bit machines. This is only useful for specifying very large transforms | |
253 on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere | |
254 regardless of what API you choose.) | |
255 | |
256 * Experimental MPI support. Complex one- and multi-dimensional FFTs, | |
257 multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and | |
258 distributed transpose operations, with 1d block distributions. | |
259 (This is an alpha preview: routines have not been exhaustively | |
260 tested, documentation is incomplete, and some functionality is | |
261 missing, e.g. Fortran support.) See mpi/README and also the MPI | |
262 section of the manual. | |
263 | |
264 * Significantly faster r2c/c2r transforms, especially on machines with SIMD. | |
265 | |
266 * Rewritten multi-threaded support for better performance by | |
267 re-using a fixed pool of threads rather than continually | |
268 respawning and joining (which nowadays is much slower). | |
269 | |
270 * Support for MIPS paired-single SIMD instructions, donated by | |
271 Codesourcery. | |
272 | |
273 * FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is | |
274 available and return NULL otherwise. | |
275 | |
276 * Removed k7 support, which only worked in 32-bit mode and is | |
277 becoming obsolete. Use --enable-sse instead. | |
278 | |
279 * Added --with-g77-wrappers configure option to force inclusion | |
280 of g77 wrappers, in addition to whatever is needed for the | |
281 detected Fortran compilers. This is mainly intended for GNU/Linux | |
282 distros switching to gfortran that wish to include both | |
283 gfortran and g77 support in FFTW. | |
284 | |
285 * In manual, renamed "guru execute" functions to "new-array execute" | |
286 functions, to reduce confusion with the guru planner interface. | |
287 (The programming interface is unchanged.) | |
288 | |
289 * Add missing __declspec attribute to threads API functions when compiling | |
290 for Windows; thanks to Robert O. Morris for the bug report. | |
291 | |
292 * Fixed missing return value from dfftw_init_threads in Fortran; | |
293 thanks to Markus Wetzstein for the bug report. | |
294 | |
295 FFTW 3.1.3 | |
296 | |
297 * Bug fix: FFTW computes incorrect results when the user plans both | |
298 REDFT11 and RODFT11 transforms of certain sizes. The bug is caused | |
299 by incorrect sharing of twiddle-factor tables between the two | |
300 transforms, and only occurs when both are used. Thanks to Paul | |
301 A. Valiant for the bug report. | |
302 | |
303 FFTW 3.1.2 | |
304 | |
305 * Correct bug in configure script: --enable-portable-binary option was ignored! | |
306 Thanks to Andrew Salamon for the bug report. | |
307 | |
308 * Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use | |
309 either if we are using gcc. Thanks to Guy Moebs for the bug report. | |
310 | |
311 * Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken, | |
312 and suggest a workaround. configure script now detects Core/Duo arch. | |
313 | |
314 * Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304, | |
315 thanks to Markus Dittrich. | |
316 | |
317 FFTW 3.1.1 | |
318 | |
319 * Performance improvements for Intel EM64T. | |
320 | |
321 * Performance improvements for large-size transforms with SIMD. | |
322 | |
323 * Cycle counter support for Intel icc and Visual C++ on x86-64. | |
324 | |
325 * In fftw-wisdom tool, replaced obsolete --impatient with --measure. | |
326 | |
327 * Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas. | |
328 | |
329 * Windows DLL support for Fortran API (added missing __declspec(dllexport)). | |
330 | |
331 * SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486 | |
332 CPUs lacking a CPUID instruction; thanks to Eric Korpela. | |
333 | |
334 FFTW 3.1 | |
335 | |
336 * Faster FFTW_ESTIMATE planner. | |
337 | |
338 * New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size. | |
339 | |
340 * "4-step" algorithm for faster FFTs of very large sizes (> 2^18). | |
341 | |
342 * Faster in-place real-data DFTs (for R2HC and HC2R r2r formats). | |
343 | |
344 * Faster in-place non-square transpositions (FFTW uses these internally | |
345 for in-place FFTs, and you can also perform them explicitly using | |
346 the guru interface). | |
347 | |
348 * Faster prime-size DFTs: implemented Bluestein's algorithm, as well | |
349 as a zero-padded Rader variant to limit recursive use of Rader's algorithm. | |
350 | |
351 * SIMD support for split complex arrays. | |
352 | |
353 * Much faster Altivec/VMX performance. | |
354 | |
355 * New fftw_set_timelimit function to specify a (rough) upper bound to the | |
356 planning time (does not affect ESTIMATE mode). | |
357 | |
358 * Removed --enable-3dnow support; use --enable-k7 instead. | |
359 | |
360 * FMA (fused multiply-add) version is now included in "standard" FFTW, | |
361 and is enabled with --enable-fma (the default on PowerPC and Itanium). | |
362 | |
363 * Automatic detection of native architecture flag for gcc. New | |
364 configure options: --enable-portable-binary and --with-gcc-arch=<arch>, | |
365 for people distributing compiled binaries of FFTW (see manual). | |
366 | |
367 * Automatic detection of Altivec under Linux with gcc 3.4 (so that | |
368 same binary should work on both Altivec and non-Altivec PowerPCs). | |
369 | |
370 * Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX, | |
371 Solaris/Intel. | |
372 | |
373 * Various documentation clarifications. | |
374 | |
375 * 64-bit clean. (Fixes a bug affecting the split guru planner on | |
376 64-bit machines, reported by David Necas.) | |
377 | |
378 * Fixed Debian bug #259612: inadvertent use of SSE instructions on | |
379 non-SSE machines (causing a crash) for --enable-sse binaries. | |
380 | |
381 * Fixed bug that caused HC2R transforms to destroy the input in | |
382 certain cases, even if the user specified FFTW_PRESERVE_INPUT. | |
383 | |
384 * Fixed bug where wisdom would be lost under rare circumstances, | |
385 causing excessive planning time. | |
386 | |
387 * FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2. | |
388 | |
389 * Fixed accidentally exported symbol that prohibited simultaneous | |
390 linking to double/single multithreaded FFTW (thanks to Alessio Massaro). | |
391 | |
392 * Support Win32 threads under MinGW (thanks to Alessio Massaro). | |
393 | |
394 * Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod. | |
395 | |
396 * Fix build failure if no Fortran compiler is found (thanks to Charles | |
397 Radley for the bug report). | |
398 | |
399 * Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic | |
400 detection of icc architecture flag (e.g. -xW). | |
401 | |
402 * Fixed compilation with OpenMP on AIX (thanks to Greg Bauer). | |
403 | |
404 * Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski). | |
405 | |
406 * Incorporated patch from FreeBSD ports (FreeBSD does not have memalign, | |
407 but its malloc is 16-byte aligned). | |
408 | |
409 * Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc, | |
410 MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for | |
411 reports/fixes). Added x86-64 cycle counter for PGI compilers, | |
412 courtesy Cristiano Calonaci. | |
413 | |
414 * Fix compilation problem in test program due to C99 conflict. | |
415 | |
416 * Portability fix for import_system_wisdom with djgpp (thanks to Juan | |
417 Manuel Guerrero). | |
418 | |
419 * Fixed compilation failure on MacOS 10.3 due to getopt conflict. | |
420 | |
421 * Work around Visual C++ (version 6/7) bug in SSE compilation; | |
422 thanks to Eddie Yee for his detailed report. | |
423 | |
424 Changes from FFTW 3.1 beta 2: | |
425 | |
426 * Several minor compilation fixes. | |
427 | |
428 * Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with | |
429 fftw_set_timelimit function. Make wisdom work with time-limited plans. | |
430 | |
431 Changes from FFTW 3.1 beta 1: | |
432 | |
433 * Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback. | |
434 | |
435 * Fixed more 64-bit problems, thanks to John Pavel for the bug report. | |
436 | |
437 * Further speed improvements for Altivec/VMX. | |
438 | |
439 * Further speed improvements for non-square transpositions. | |
440 | |
441 * Many minor tweaks. | |
442 | |
443 FFTW 3.0.1 | |
444 | |
445 * Some speed improvements in SIMD code. | |
446 | |
447 * --without-cycle-counter option is removed. If no cycle counter is found, | |
448 then the estimator is always used. A --with-slow-timer option is provided | |
449 to force the use of lower-resolution timers. | |
450 | |
451 * Several fixes for compilation under Visual C++, with help from Stefane Ruel. | |
452 | |
453 * Added x86 cycle counter for Visual C++, with help from Morten Nissov. | |
454 | |
455 * Added S390 cycle counter, courtesy of James Treacy. | |
456 | |
457 * Added missing static keyword that prevented simultaneous linkage | |
458 of different-precision versions; thanks to Rasmus Larsen for the bug report. | |
459 | |
460 * Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson. | |
461 | |
462 * Support -xopenmp flag for SunOS; thanks to John Lou for the bug report. | |
463 | |
464 * Compilation with HP/UX cc requires -Wp,-H128000 flag to increase | |
465 preprocessor limits; thanks to Peter Vouras for the bug report. | |
466 | |
467 * Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script; | |
468 thanks to Nicolas Decoster for the patch. | |
469 | |
470 * Added 'make smallcheck' target in tests/ directory, at the request of | |
471 James Treacy. | |
472 | |
473 FFTW 3.0 | |
474 | |
475 Major goals of this release: | |
476 | |
477 * Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below). | |
478 | |
479 * Complete rewrite, to make it easier to add new algorithms and transforms. | |
480 | |
481 * New API, to support more general semantics. | |
482 | |
483 Other enhancements: | |
484 | |
485 * SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec). | |
486 (With special thanks to Franz Franchetti for many experimental prototypes | |
487 and to Stefan Kral for the vectorizing generator from fftwgel.) | |
488 | |
489 * True in-place 1d transforms of large sizes (as well as compressed | |
490 twiddle tables for additional memory/cache savings). | |
491 | |
492 * More arbitrary placement of real & imaginary data, e.g. including | |
493 interleaved (as in FFTW 2.x) as well as separate real/imag arrays. | |
494 | |
495 * Efficient prime-size transforms of real data. | |
496 | |
497 * Multidimensional transforms can operate on a subset of a larger matrix, | |
498 and/or transform selected dimensions of a multidimensional array. | |
499 | |
500 * By popular demand, simultaneous linking to double precision (fftw), | |
501 single precision (fftwf), and long-double precision (fftwl) versions | |
502 of FFTW is now supported. | |
503 | |
504 * Cycle counters (on all modern CPUs) are exploited to speed planning. | |
505 | |
506 * Efficient transforms of real even/odd arrays, a.k.a. discrete | |
507 cosine/sine transforms (types I-IV). (Currently work via pre/post | |
508 processing of real transforms, à la FFTPACK, so are not optimal.) | |
509 | |
510 * DHTs (Discrete Hartley Transforms), again via post-processing | |
511 of real transforms (and thus suboptimal, for now). | |
512 | |
513 * Support for linking to just those parts of FFTW that you need, | |
514 greatly reducing the size of statically linked programs when | |
515 only a limited set of transform sizes/types are required. | |
516 | |
517 * Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along | |
518 with a command-line tool (fftw-wisdom) to generate/update it. | |
519 | |
520 * Fortran API can be used with both g77 and non-g77 compilers | |
521 simultaneously. | |
522 | |
523 * Multi-threaded version has optional OpenMP support. | |
524 | |
525 * Authors' good looks have greatly improved with age. | |
526 | |
527 Changes from 3.0beta3: | |
528 | |
529 * Separate FMA distribution to better exploit fused multiply-add instructions | |
530 on PowerPC (and possibly other) architectures. | |
531 | |
532 * Performance improvements via some inlining tweaks. | |
533 | |
534 * fftw_flops now returns double arguments, not int, to avoid overflows | |
535 for large sizes. | |
536 | |
537 * Workarounds for automake bugs. | |
538 | |
539 Changes from 3.0beta2: | |
540 | |
541 * The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in | |
542 FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so | |
543 we replaced it with a slower routine that is more accurate. | |
544 | |
545 * The guru planner and execute functions now have two variants, one that | |
546 takes complex arguments and one that takes separate real/imag pointers. | |
547 | |
548 * Execute and planner routines now automatically align the stack on x86, | |
549 in case the calling program is misaligned. | |
550 | |
551 * README file for test program. | |
552 | |
553 * Fixed bugs in the combination of SIMD with multi-threaded transforms. | |
554 | |
555 * Eliminated internal fftw_threads_init function, which some people were | |
556 calling accidentally instead of the fftw_init_threads API function. | |
557 | |
558 * Check for -openmp flag (Intel C compiler) when --enable-openmp is used. | |
559 | |
560 * Support AMD x86-64 SIMD and cycle counter. | |
561 | |
562 * Support SSE2 intrinsics in forthcoming gcc 3.3. | |
563 | |
564 Changes from 3.0beta1: | |
565 | |
566 * Faster in-place 1d transforms of non-power-of-two sizes. | |
567 | |
568 * SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT | |
569 transforms. | |
570 | |
571 * Added support for hard-coded DCT/DST/DHT codelets of small sizes; the | |
572 default distribution only includes hard-coded size-8 DCT-II/III, however. | |
573 | |
574 * Many minor improvements to the manual. Added section on using the | |
575 codelet generator to customize and enhance FFTW. | |
576 | |
577 * The default 'make check' should now only take a few minutes; for more | |
578 strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'. | |
579 | |
580 * fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where | |
581 the latter uses stdout. | |
582 | |
583 * Fixed ability to compile with a C++ compiler. | |
584 | |
585 * Fixed support for C99 complex type under glibc. | |
586 | |
587 * Fixed problems with alloca under MinGW, AIX. | |
588 | |
589 * Workaround for gcc/SPARC bug. | |
590 | |
591 * Fixed multi-threaded initialization failure on IRIX due to lack of | |
592 user-accessible PTHREAD_SCOPE_SYSTEM there. |
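
The 3.2alpha2 entry on the 64-bit API says that fftw_iodim64 differs from fftw_iodim only in using ptrdiff_t fields, which matters for very large transforms on 64-bit machines. The following is a minimal sketch of that distinction, not FFTW code: it does not include fftw3.h, and the type names iodim32_sketch and iodim64_sketch and both helper functions are hypothetical stand-ins introduced here for illustration. The field names n, is and os are assumed to match the size, input stride and output stride of the real descriptors.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical local mirrors of the two dimension descriptors.
   The real types are declared in fftw3.h; these exist only for this sketch. */
typedef struct { int n, is, os; } iodim32_sketch;        /* int fields */
typedef struct { ptrdiff_t n, is, os; } iodim64_sketch;  /* ptrdiff_t fields */

/* Returns 1 if a single ptrdiff_t value is representable as an int. */
int fits_in_int(ptrdiff_t v)
{
    return v >= (ptrdiff_t)INT_MIN && v <= (ptrdiff_t)INT_MAX;
}

/* Tries to narrow a 64-bit style descriptor to the int-based one.
   Returns 0 on success, or -1 if any field would be truncated, in which
   case *out is left untouched and only the 64-bit descriptor can
   describe the dimension. */
int narrow_iodim(const iodim64_sketch *in, iodim32_sketch *out)
{
    assert(in != NULL && out != NULL);
    if (!fits_in_int(in->n) || !fits_in_int(in->is) || !fits_in_int(in->os))
        return -1;
    out->n  = (int)in->n;
    out->is = (int)in->is;
    out->os = (int)in->os;
    return 0;
}
```

On a platform where ptrdiff_t is wider than int, a descriptor whose size or stride exceeds INT_MAX makes narrow_iodim return -1, which is the situation the "plan_guru64" functions are described as addressing. Where ptrdiff_t and int have the same width, every descriptor narrows successfully and the two forms are interchangeable.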