comparison src/fftw-3.3.5/NEWS @ 127:7867fa7e1b6b

Current fftw source
author Chris Cannam <cannam@all-day-breakfast.com>
date Tue, 18 Oct 2016 13:40:26 +0100
FFTW 3.3.5:

* New SIMD support:
  - Power8 VSX instructions in single and double precision.
    To use, add --enable-vsx to configure.
  - Support for AVX2 (256-bit FMA instructions).
    To use, add --enable-avx2 to configure.
  - Experimental support for AVX512 and KCVI. (--enable-avx512, --enable-kcvi)
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
  - Double precision Neon SIMD for aarch64.
    This code is expected to work but the FFTW maintainers do not have
    hardware to test it.
  - generic SIMD support using gcc vector intrinsics
* Add fftw_make_planner_thread_safe() API
* fix #18 (disable float128 for CUDACC)
* fix #19: missing Fortran interface for fftwq_alloc_real
* fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
* fix: Avoid segfaults due to double free in MPI transpose

* Special note for distribution maintainers: Although FFTW supports a
  zillion SIMD instruction sets, enabling them all at the same time is
  a bad idea, because it increases the planning time for minimal gain.
  We recommend that general-purpose x86 distributions only enable SSE2
  and perhaps AVX. Users who care about the last ounce of performance
  should recompile FFTW themselves.

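To illustrate what a thread-safe planner means in practice, the sketch below serializes calls to a stand-in "planner" with a mutex. All names here (planner_lock_sketch, plans_created_sketch, create_plan_serialized_sketch) are hypothetical and are not part of FFTW, and this is not a description of how fftw_make_planner_thread_safe() is implemented. It only shows the kind of locking that callers otherwise have to arrange themselves around planner calls.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lock guarding all "planner" calls in this sketch. */
static pthread_mutex_t planner_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for the shared state a planner mutates while planning. */
static int plans_created_sketch = 0;

/* Hypothetical planner entry point: only one thread at a time may run
   the body, so the shared counter is never updated concurrently.
   Returns a 1-based id for the "plan" that was created. */
int create_plan_serialized_sketch(int n)
{
    int id;

    assert(n > 0);                      /* transform size must be positive */
    pthread_mutex_lock(&planner_lock_sketch);
    id = ++plans_created_sketch;        /* "planning" touches shared state */
    pthread_mutex_unlock(&planner_lock_sketch);
    return id;
}
```

Executing an already-created plan would not need this lock in such a design; only plan creation goes through the serialized path.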
FFTW 3.3.4

* New functions fftw_alignment_of (to check whether two arrays are
  equally aligned for the purposes of applying a plan) and fftw_sprint_plan
  (to output a description of plan to a string).

* Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
  bug report.

* Fixed manual to work with texinfo-5.

* Increased timing interval on x86_64 to reduce timing errors.

* Default to Win32 threads, not pthreads, if both are present.

* Various build-script fixes.

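The idea behind fftw_alignment_of can be sketched in plain C: two arrays are "equally aligned" when their addresses have the same remainder modulo the SIMD alignment unit. The helper names and the 16-byte modulus below are assumptions made for illustration; the value FFTW actually uses depends on how the library was built, so real code should call fftw_alignment_of itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment unit for this sketch; not taken from FFTW. */
#define SKETCH_ALIGNMENT 16

/* Hypothetical helper: byte offset of p within an alignment unit.
   A result of 0 means p sits on a SKETCH_ALIGNMENT boundary. */
int alignment_of_sketch(const void *p)
{
    assert(p != NULL);
    return (int)((uintptr_t)p % SKETCH_ALIGNMENT);
}

/* Hypothetical helper: nonzero if a plan created for `planned` could,
   as far as alignment goes, also be applied to `other`. */
int equally_aligned_sketch(const void *planned, const void *other)
{
    return alignment_of_sketch(planned) == alignment_of_sketch(other);
}
```

With 8-byte doubles, stepping two elements into an array keeps the same offset, while stepping one element changes it.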
FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc which pretends to be gcc
  but does not support quad precision.

* make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  still included also.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests
  - Some ABI changes - not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to a file, which don't require you to open/close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc
  which was introduced in fftw-3.1, since new gcc supports -mtune=native.
  Remove the --with-gcc-arch flag; if you want to specify a particular
  arch to configure, use ./configure CC="gcc -mtune=...".

* --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.

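The convenience that fftw_alloc_real adds over fftw_malloc (an element count instead of a byte count, and a typed pointer instead of void *) can be sketched with standard C11 only. The names malloc_aligned_sketch and alloc_real_sketch and the 16-byte alignment are assumptions for illustration; they are not FFTW functions, and memory obtained from FFTW's own allocators should be released with fftw_free rather than free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment for this sketch; FFTW chooses its own value. */
#define SKETCH_ALIGNMENT 16

/* Hypothetical stand-in for an aligned malloc: returns storage for at
   least nbytes bytes on a SKETCH_ALIGNMENT boundary, or NULL. */
void *malloc_aligned_sketch(size_t nbytes)
{
    size_t rounded;

    if (nbytes == 0)
        nbytes = 1;  /* avoid an implementation-defined zero-size request */
    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    rounded = (nbytes + SKETCH_ALIGNMENT - 1) / SKETCH_ALIGNMENT
              * SKETCH_ALIGNMENT;
    return aligned_alloc(SKETCH_ALIGNMENT, rounded);
}

/* Hypothetical analogue of fftw_alloc_real: the caller passes a number
   of doubles, so no sizeof or cast is needed at the call site. */
double *alloc_real_sketch(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(double));  /* guard the multiplication */
    return (double *)malloc_aligned_sketch(n * sizeof(double));
}
```

A call such as alloc_real_sketch(1024) then replaces the longer (double *) malloc_aligned_sketch(1024 * sizeof(double)) form.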
FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and document slightly odd behavior of plan_guru_r2r in Fortran
  (thanks to Alexander Pozdneev).

* FAQ was accidentally omitted from 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
  ptrdiff_t integer types as parameters, which is a 64-bit type on
  64-bit machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.

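The difference between the two guru dimension descriptors described in the 64-bit API entry above can be sketched with two mirror structs. The type and function names below are hypothetical; the real fftw_iodim and fftw_iodim64 definitions live in fftw3.h. The check shows why int fields cannot describe a dimension or stride beyond INT_MAX, whereas ptrdiff_t fields can on a 64-bit machine.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical mirror of the int-based descriptor:
   n = dimension size, is = input stride, os = output stride. */
typedef struct { int n, is, os; } iodim_sketch;

/* Hypothetical mirror of the 64-bit descriptor: same fields, but
   ptrdiff_t, which is a 64-bit type on 64-bit machines. */
typedef struct { ptrdiff_t n, is, os; } iodim64_sketch;

/* Returns nonzero if every field of *d could be stored in the
   int-based descriptor without loss. */
int fits_in_iodim_sketch(const iodim64_sketch *d)
{
    assert(d != NULL);
    return d->n  >= INT_MIN && d->n  <= INT_MAX &&
           d->is >= INT_MIN && d->is <= INT_MAX &&
           d->os >= INT_MIN && d->os <= INT_MAX;
}
```

A size of 1024 passes the check on any platform; a size of 2^32 fails it wherever ptrdiff_t is wider than int, which is the case the 64-bit variants exist for.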
FFTW 3.1.3

* Bug fix: FFTW computes incorrect results when the user plans both
  REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
  by incorrect sharing of twiddle-factor tables between the two
  transforms, and only occurs when both are used. Thanks to Paul
  A. Valiant for the bug report.

FFTW 3.1.2

* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc. Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround. configure script now detects Core/Duo arch.

* Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (Currently work via pre/post
  processing of real transforms, ala FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns double arguments, not int, to avoid overflows
  for large sizes.

* Workarounds for automake bugs.

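The overflow that motivated the fftw_flops change is easy to reproduce in a sketch. The code below uses the conventional 5 N log2 N estimate for a power-of-two complex FFT as a stand-in; this formula and both function names are assumptions for illustration and are not how FFTW counts operations. It also assumes a 32-bit int, which is typical but not guaranteed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical estimate 5 * N * log2(N) for N = 2^log2n, accumulated
   in double so that large sizes do not wrap around. */
double flops_estimate_sketch(int log2n)
{
    double n = 1.0;
    int i;

    assert(log2n >= 0 && log2n < 62);
    for (i = 0; i < log2n; ++i)
        n *= 2.0;                 /* exact: powers of two fit in double */
    return 5.0 * n * (double)log2n;
}

/* Returns nonzero if the estimate could not be stored in an int. */
int overflows_int_sketch(double flops)
{
    return flops > (double)INT_MAX;
}
```

For N = 2^10 the estimate is 51200 and fits comfortably in an int; for N = 2^27 it is roughly 1.8e10, which exceeds a 32-bit INT_MAX and would overflow an int counter.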
Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.