comparison src/fftw-3.3.8/NEWS @ 82:d0c2a83c1364

Add FFTW 3.3.8 source, and a Linux build
author Chris Cannam
date Tue, 19 Nov 2019 14:52:55 +0000

FFTW 3.3.8:

* Fixed AVX, AVX2 for gcc-8.

  By default, FFTW 3.3.7 was broken with gcc-8. AVX and AVX2 code
  assumed that the compiler honors the distinction between +0 and -0,
  but gcc-8 -ffast-math does not. The default CFLAGS included -ffast-math.
  This release ensures that FFTW works with gcc-8 -ffast-math, and
  removes -ffast-math from the default CFLAGS for good measure.

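The 3.3.8 fix turns on the sign of zero. SIMD code often negates values by XOR-ing them with a mask in which only the IEEE 754 sign bit is set, and the natural way to spell that mask is the constant -0.0. A compiler that ignores the sign of zero, as gcc-8 does under -ffast-math, may fold the mask to +0.0 (all bits clear), which silently turns negation into a no-op. The scalar sketch below illustrates the trick with memcpy bit casts; it is not FFTW code, negate_via_sign_mask is a hypothetical name, and it assumes a 64-bit IEEE 754 double and a build without -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bits of -0.0: only the IEEE 754 sign bit should be set. */
static uint64_t sign_mask(void)
{
    const double negative_zero = -0.0;   /* -ffast-math may treat this as +0.0 */
    uint64_t bits;
    memcpy(&bits, &negative_zero, sizeof bits);
    /* Fires if the compiler folded -0.0 into +0.0. */
    assert(bits == UINT64_C(0x8000000000000000));
    return bits;
}

/* Hypothetical helper: negate x by flipping its sign bit with the mask. */
double negate_via_sign_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= sign_mask();
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With a correct mask, negate_via_sign_mask(0.0) yields -0.0; if the mask had been folded to zero, every "negation" would return its input unchanged, which is the kind of silent miscompilation the assertion is there to catch.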
FFTW 3.3.7:

* Experimental support for CMake.

  The primary build mechanism for FFTW remains GNU autoconf/automake.
  CMake support is meant to offer an easy way to compile FFTW on
  Windows, and as such it does not cover all the features of the
  automake build system, such as exotic cycle counters,
  cross-compiling, or building binaries for a mixture of ISAs
  (e.g., amd64 vs amd64+avx vs amd64+avx2). Patches are welcome.

* Fixes for armv7a cycle counter.
* Official support for aarch64, now that we have hardware to test it.
* Tweak usage of FMA instructions in a way that favors newer processors
  (Skylake and Ryzen) over older processors (Haswell).
* tests/bench: use 64-bit precision to compute mflops.

FFTW 3.3.6-pl2:

* Bugfix: MPI Fortran-03 headers were missing in FFTW 3.3.6-pl1.

FFTW 3.3.6-pl1:

* Bugfix: FFTW 3.3.6 had the wrong libtool version number, and generated
  shared libraries of the form libfftw3.so.2.6.6 instead of
  libfftw3.so.3.*.

FFTW 3.3.6:

* The fftw_make_planner_thread_safe() API introduced in 3.3.5 didn't
  work, and 3.3.6 fixes it. Sorry about that.
* compilation fixes for IBM XLC
* compilation fixes for threads on Windows
* fix SIMD autodetection on amd64 when (_MSC_VER > 1500)

FFTW 3.3.5:

* New SIMD support:
  - Power8 VSX instructions in single and double precision.
    To use, add --enable-vsx to configure.
  - Support for AVX2 (256-bit FMA instructions).
    To use, add --enable-avx2 to configure.
  - Experimental support for AVX512 and KCVI (--enable-avx512, --enable-kcvi).
    This code is expected to work, but the FFTW maintainers do not have
    hardware to test it.
  - Support for AVX128/FMA (for some AMD machines) (--enable-avx128-fma)
  - Double-precision Neon SIMD for aarch64.
    This code is expected to work, but the FFTW maintainers do not have
    hardware to test it.
  - generic SIMD support using gcc vector intrinsics
* Add fftw_make_planner_thread_safe() API
* fix #18 (disable float128 for CUDACC)
* fix #19: missing Fortran interface for fftwq_alloc_real
* fix #21 (don't use float128 on Portland compilers, which pretend to be gcc)
* fix: Avoid segfaults due to double free in MPI transpose

* Special note for distribution maintainers: Although FFTW supports a
  zillion SIMD instruction sets, enabling them all at the same time is
  a bad idea, because it increases the planning time for minimal gain.
  We recommend that general-purpose x86 distributions only enable SSE2
  and perhaps AVX. Users who care about the last ounce of performance
  should recompile FFTW themselves.

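The 3.3.5 entry mentions generic SIMD support built on gcc vector intrinsics. The sketch below is not FFTW code; it only illustrates the underlying GCC/Clang language feature (vector extensions): a vector_size typedef lets ordinary + and * operate on two doubles at once, and the compiler maps them to whatever SIMD instructions the target offers, or to scalar code. The function name axpy_v2df is hypothetical, and the block requires GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two doubles packed into one vector value (16 bytes). */
typedef double v2df __attribute__((vector_size(16)));

/* y[i] += a * x[i], processing two elements per vector operation. */
void axpy_v2df(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    const v2df va = {a, a};               /* broadcast the scalar */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df vx, vy;
        memcpy(&vx, x + i, sizeof vx);    /* loads that tolerate misalignment */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                    /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; ++i)                    /* scalar tail when n is odd */
        y[i] += a * x[i];
}
```

The same source compiles on any target GCC supports, which is what makes this style "generic": the vector width is fixed in the typedef rather than tied to one instruction set.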
FFTW 3.3.4

* New functions fftw_alignment_of (to check whether two arrays are
  equally aligned for the purposes of applying a plan) and fftw_sprint_plan
  (to output a description of a plan to a string).

* Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
  bug report.

* Fixed manual to work with texinfo-5.

* Increased timing interval on x86_64 to reduce timing errors.

* Default to Win32 threads, not pthreads, if both are present.

* Various build-script fixes.

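fftw_alignment_of, new in 3.3.4, answers a narrow question: would a plan created for one array also be valid for another, given that SIMD code may depend on where the data starts relative to an alignment boundary? The hypothetical helpers below sketch that idea by comparing each address modulo an alignment passed by the caller. The real function takes only a pointer and uses the alignment FFTW was built for, which depends on the enabled SIMD extensions, so the explicit align parameter here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical analogue: offset of p past the previous multiple of align. */
static unsigned alignment_mod(const void *p, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    return (unsigned)((uintptr_t)p & (align - 1));
}

/* Nonzero if a and b sit at the same offset relative to align. */
int equally_aligned(const void *a, const void *b, uintptr_t align)
{
    return alignment_mod(a, align) == alignment_mod(b, align);
}
```

On a build whose SIMD code needs 16-byte alignment, an array starting at buf + 1 (8 bytes in) would therefore not be a safe substitute for one starting at a 16-byte-aligned buf, while buf + 2 would be.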
FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc, which pretends to be gcc
  but does not support quad precision.

* make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks to Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  also still included.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests.
  - Some ABI changes; not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to a file and don't require you to open/close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc
  which was introduced in fftw-3.1, since new gcc supports -mtune=native.
  Remove the --with-gcc-arch flag; if you want to specify a particular
  arch to configure, use ./configure CC="gcc -mtune=...".

* --with-our-malloc16 configure flag is now renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.

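The 3.3 convenience allocators replace a cast plus sizeof arithmetic, such as (double *) fftw_malloc(sizeof(double) * n), with a single typed call, fftw_alloc_real(n). The sketch below reproduces only that calling pattern, on top of C11 aligned_alloc rather than FFTW's allocator; alloc_real_aligned and the 32-byte alignment are hypothetical choices. Memory from the real fftw_alloc_* functions must be released with fftw_free, not free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGNMENT 32   /* assumed; enough for 256-bit AVX loads */

/* Typed, aligned allocation of n doubles; release with free().
   Returns NULL for n == 0, on size overflow, or on allocation failure. */
double *alloc_real_aligned(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(double))
        return NULL;
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc requires the size to be a multiple of the alignment. */
    size_t rounded = (bytes + SKETCH_ALIGNMENT - 1) / SKETCH_ALIGNMENT * SKETCH_ALIGNMENT;
    if (rounded < bytes)       /* the rounding itself overflowed */
        return NULL;
    double *p = aligned_alloc(SKETCH_ALIGNMENT, rounded);
    assert(p == NULL || (uintptr_t)p % SKETCH_ALIGNMENT == 0);
    return p;
}
```

The caller writes double *in = alloc_real_aligned(n); with no cast and no sizeof, which is the ergonomic point of the FFTW functions.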
FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and a note documenting the slightly odd behavior of plan_guru_r2r in
  Fortran (thanks to Alexander Pozdneev).

* FAQ was accidentally omitted from 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
  ptrdiff_t integer types as parameters, which is a 64-bit type on
  64-bit machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create a plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.

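The plan_guru64 entry in 3.2alpha2 comes down to integer width. Each dimension of a guru plan is described by a length n and input/output strides is and os, counted in elements; fftw_iodim64 holds all three as ptrdiff_t. The sketch mirrors that layout in a local stand-in struct (iodim64_sketch, a hypothetical name, so the example compiles without fftw3.h) and fills in contiguous row-major strides. A 2048 x 2048 x 2048 array has 2^33 elements, so both the total count and the outermost stride of 2048 * 2048 would sit uncomfortably close to, or beyond, what a 32-bit int can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in with the same fields as fftw_iodim64. */
typedef struct {
    ptrdiff_t n;    /* length of this dimension */
    ptrdiff_t is;   /* input stride, in elements */
    ptrdiff_t os;   /* output stride, in elements */
} iodim64_sketch;

/* Fill dims[0..rank-1] with contiguous row-major strides for sizes n[],
   using identical input and output strides; returns the element count. */
ptrdiff_t fill_row_major(int rank, const ptrdiff_t *n, iodim64_sketch *dims)
{
    ptrdiff_t stride = 1;
    for (int k = rank - 1; k >= 0; --k) {
        /* Reject non-positive sizes and products that would overflow. */
        assert(n[k] > 0 && stride <= PTRDIFF_MAX / n[k]);
        dims[k].n = n[k];
        dims[k].is = stride;
        dims[k].os = stride;
        stride *= n[k];
    }
    return stride;   /* product of all n[k] */
}
```

On a 64-bit machine the returned count for sizes {2048, 2048, 2048} is 2^33, which is exactly the kind of value the original int-based fftw_iodim cannot represent.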
FFTW 3.1.3

* Bug fix: FFTW computes incorrect results when the user plans both
  REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
  by incorrect sharing of twiddle-factor tables between the two
  transforms, and only occurs when both are used. Thanks to Paul
  A. Valiant for the bug report.

FFTW 3.1.2

* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc. Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround. configure script now detects Core/Duo arch.

* Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that the
  same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems; thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.

FFTW 3.0.1

* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (These currently work via pre/post
  processing of real transforms, à la FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns its operation counts as double, not int, to avoid
  overflows for large sizes.

* Workarounds for automake bugs.

Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.