src/fftw-3.3.3/NEWS @ 95:89f5e221ed7b
Add FFTW3
author: Chris Cannam <cannam@all-day-breakfast.com>
date: Wed, 20 Mar 2013 15:35:50 +0000

FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc, which pretends to be gcc
  but does not support quad precision.

* Make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks to Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided a new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  still included also.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests.
  - Some ABI changes: not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to a file without requiring you to open/close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc
  which was introduced in fftw-3.1, since new gcc supports -mtune=native.
  Remove the --with-gcc-arch flag; if you want to specify a particular
  arch to configure, use ./configure CC="gcc -mtune=...".

* --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.

FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and document slightly odd behavior of plan_guru_r2r in Fortran
  (thanks to Alexander Pozdneev).

* FAQ was accidentally omitted from the 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
  ptrdiff_t integer types as parameters, which is a 64-bit type on
  64-bit machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create a plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound on the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  the same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (These currently work via pre/post
  processing of real transforms, à la FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns its operation counts in double arguments, not int,
  to avoid overflows for large sizes.

* Workarounds for automake bugs.

Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.