FFTW 3.3.4

* New functions fftw_alignment_of (to check whether two arrays are
  equally aligned for the purposes of applying a plan) and fftw_sprint_plan
  (to output a description of a plan to a string).

* Bugfix in fftw-wisdom-to-conf; thanks to Florian Oppermann for the
  bug report.

* Fixed manual to work with texinfo-5.

* Increased timing interval on x86_64 to reduce timing errors.

* Default to Win32 threads, not pthreads, if both are present.

* Various build-script fixes.
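
Two arrays count as "equally aligned" when their addresses have the same offset modulo the SIMD alignment FFTW was built for. That is what makes it safe to apply an existing plan to new arrays through the new-array execute functions. The standalone C sketch below mirrors the idea without linking FFTW. The names alignment_of_sketch, equally_aligned, and SIMD_ALIGN_SKETCH are hypothetical, and the 16-byte alignment is an assumption (the real value depends on how FFTW was configured).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed SIMD alignment in bytes; the real value depends on how FFTW
   was configured (e.g. AVX builds may use a larger one). */
#define SIMD_ALIGN_SKETCH 16

/* Hypothetical stand-in for fftw_alignment_of: byte offset of p
   modulo the assumed SIMD alignment. */
static int alignment_of_sketch(const double *p)
{
    assert(p != NULL);
    return (int) ((uintptr_t) p % SIMD_ALIGN_SKETCH);
}

/* Two arrays are interchangeable for a plan's alignment purposes
   when their offsets modulo the alignment agree. */
static int equally_aligned(const double *a, const double *b)
{
    return alignment_of_sketch(a) == alignment_of_sketch(b);
}
```

In real code the corresponding test is fftw_alignment_of(a) == fftw_alignment_of(b) (casting fftw_complex pointers to double *), checked before passing new arrays to a new-array execute function such as fftw_execute_dft.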

FFTW 3.3.3

* Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the
  bug report and patch, and to Graham Dennis for the bug report).

* Use 128-bit ARM NEON instructions instead of 64-bit ones. This change
  appears to speed up even ARM processors with a 64-bit NEON pipe.

* Speed improvements for single-precision AVX.

* Speed up planner on machines without "official" cycle counters, such as ARM.

FFTW 3.3.2

* Removed an archaic stack-alignment hack that was failing with
  gcc-4.7/i386.

* Added stack-alignment hack necessary for gcc on Windows/i386. We
  will regret this in ten years (see previous change).

* Fix incompatibility with Intel icc, which pretends to be gcc
  but does not support quad precision.

* Make libfftw{threads,mpi} depend upon libfftw when using libtool;
  this is consistent with most other libraries and simplifies the life
  of various distributors of GNU/Linux.

FFTW 3.3.1

* Changes since 3.3.1-beta1:

  - Reduced planning time in estimate mode for sizes with large
    prime factors.

  - Added AVX autodetection under Visual Studio. Thanks to Carsten
    Steger for submitting the necessary code.

  - Modern Fortran interface now uses a separate fftw3l.f03 interface
    file for the long double interface, which is not supported by
    some Fortran compilers. Provided new fftw3q.f03 interface file
    to access the quadruple-precision FFTW routines with recent
    versions of gcc/gfortran.

* Added support for the NEON extensions to the ARM ISA. (Note to beta
  users: an ARM cycle counter is not yet implemented; please contact
  fftw@fftw.org if you know how to do it right.)

* MPI code now compiles even if mpicc is a C++ compiler; thanks to
  Kyle Spyksma for the bug report.

FFTW 3.3

* Changes since 3.3-beta1:

  - Compiling OpenMP support (--enable-openmp) now installs a
    fftw3_omp library, instead of fftw3_threads, so that OpenMP
    and POSIX threads (--enable-threads) libraries can be built
    and installed at the same time.

  - Various minor compilation fixes, corrections of manual typos, and
    improvements to the benchmark test program.

* Add support for the AVX extensions to x86 and x86-64. The AVX code
  works with 16-byte alignment (as opposed to 32-byte alignment),
  so there is no ABI change compared to FFTW 3.2.2.

* Added Fortran 2003 interface, which should be usable on most modern
  Fortran compilers (e.g. gfortran) and provides type-checked access
  to the C FFTW interface. (The legacy Fortran-77 interface is
  still included also.)

* Added MPI distributed-memory transforms. Compared to 3.3alpha,
  the major changes in the MPI transforms are:
  - Fixed some deadlock and crashing bugs.
  - Added Fortran 2003 interface.
  - Added new-array execute functions for MPI plans.
  - Eliminated use of large MPI tags, since Cray MPI requires tags < 2^24;
    thanks to Jonathan Bentz for the bug report.
  - Expanded documentation.
  - 'make check' now runs MPI tests.
  - Some ABI changes: not binary-compatible with 3.3alpha MPI.

* Add support for quad-precision __float128 in gcc 4.6 or later (on x86,
  x86-64, and Itanium). The new routines use the fftwq_ prefix.

* Removed support for MIPS paired-single instructions due to lack of
  available hardware for testing. Users who want this functionality
  should continue using FFTW 3.2.x. (Note that FFTW 3.3 still works
  on MIPS; this only concerns special instructions available on some
  MIPS chips.)

* Removed support for the Cell Broadband Engine. Cell users should
  use FFTW 3.2.x.

* New convenience functions fftw_alloc_real and fftw_alloc_complex
  to use fftw_malloc for real and complex arrays without typecasts
  or sizeof.

* New convenience functions fftw_export_wisdom_to_filename and
  fftw_import_wisdom_from_filename that export/import wisdom
  to/from a named file, so you don't have to open/close the file yourself.

* New function fftw_cost to return FFTW's internal cost metric for
  a given plan; thanks to Rhys Ulerich and Nathanael Schaeffer for the
  suggestion.

* The --enable-sse2 configure flag now works in both double and single
  precision (and is equivalent to --enable-sse in the latter case).

* Remove --enable-portable-binary flag: we now produce portable binaries
  by default.

* Remove the automatic detection of native architecture flag for gcc
  which was introduced in fftw-3.1, since newer gcc versions support
  -mtune=native. Remove the --with-gcc-arch flag; if you want to specify
  a particular arch to configure, use ./configure CC="gcc -mtune=...".

* The --with-our-malloc16 configure flag has been renamed --with-our-malloc.

* Fixed build failure when srand48 declaration is missing;
  thanks to Ralf Wildenhues for the bug report.

* Fixed bug in fftw_set_timelimit: ensure that a negative timelimit
  is equivalent to no timelimit in all cases. Thanks to William Andrew
  Burnson for the bug report.

* Fixed stack-overflow problem on OpenBSD caused by using alloca with
  too large a buffer.
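
fftw_alloc_real(n) and fftw_alloc_complex(n) take an element count and return correctly typed, SIMD-aligned storage, so the caller no longer writes a cast and a sizeof around fftw_malloc. The C sketch below reproduces that convenience without FFTW. The *_sketch names and complex_sketch are hypothetical, and C11 aligned_alloc with a 16-byte alignment stands in for fftw_malloc as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN_SKETCH 16          /* assumed alignment, not FFTW's constant */

typedef double complex_sketch[2];     /* same layout as the default fftw_complex */

/* Stand-in for fftw_malloc: SIMD-aligned storage via C11 aligned_alloc,
   whose size argument is rounded up to a multiple of the alignment. */
static void *aligned_malloc_sketch(size_t nbytes)
{
    size_t rounded;
    if (nbytes > SIZE_MAX - (SIMD_ALIGN_SKETCH - 1))
        return NULL;
    rounded = (nbytes + SIMD_ALIGN_SKETCH - 1) / SIMD_ALIGN_SKETCH * SIMD_ALIGN_SKETCH;
    if (rounded == 0)
        rounded = SIMD_ALIGN_SKETCH;
    void *p = aligned_alloc(SIMD_ALIGN_SKETCH, rounded);
    assert(p == NULL || (uintptr_t) p % SIMD_ALIGN_SKETCH == 0);
    return p;
}

/* Typed wrappers in the spirit of fftw_alloc_real / fftw_alloc_complex:
   the caller passes an element count, with no cast and no sizeof. */
static double *alloc_real_sketch(size_t n)
{
    if (n > SIZE_MAX / sizeof(double))
        return NULL;
    return aligned_malloc_sketch(sizeof(double) * n);
}

static complex_sketch *alloc_complex_sketch(size_t n)
{
    if (n > SIZE_MAX / sizeof(complex_sketch))
        return NULL;
    return aligned_malloc_sketch(sizeof(complex_sketch) * n);
}
```

Memory from the sketch is released with free. With the real library the equivalent calls would be double *in = fftw_alloc_real(n); and fftw_complex *out = fftw_alloc_complex(n / 2 + 1); released with fftw_free.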

FFTW 3.2.2

* Improve performance of some copy operations of complex arrays on
  x86 machines.

* Add configure flag to disable alloca(), which is broken in mingw64.

* Planning in FFTW_ESTIMATE mode for r2r transforms became slower
  between fftw-3.1.3 and 3.2. This regression has now been fixed.

FFTW 3.2.1

* Performance improvements for some multidimensional r2c/c2r transforms;
  thanks to Eugene Miloslavsky for his benchmark reports.

* Compile with icc on MacOS X, use better icc compiler flags.

* Compilation fixes for systems where snprintf is defined as a macro;
  thanks to Marcus Mae for the bug report.

* Fortran documentation now recommends not using dfftw_execute,
  because of reports of problems with various Fortran compilers;
  it is better to use dfftw_execute_dft etcetera.

* Some documentation clarifications, e.g. of the fact that --enable-openmp
  and --enable-threads are mutually exclusive (thanks to Long To),
  and document slightly odd behavior of plan_guru_r2r in Fortran
  (thanks to Alexander Pozdneev).

* FAQ was accidentally omitted from the 3.2 tarball.

* Remove some extraneous (harmless) files accidentally included in
  a subdirectory of the 3.2 tarball.

FFTW 3.2

* Worked around apparent glibc bug that leads to rare hangs when freeing
  semaphores.

* Fixed segfault due to unaligned access in certain obscure problems
  that use SSE and multiple threads.

* MPI transforms are not included, as they are still in alpha; the alpha
  versions of the MPI transforms have been moved to FFTW 3.3alpha1.

FFTW 3.2alpha3

* Performance improvements for sizes with factors of 5 and 10.

* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
  Emmenlauer and Phil Dumont.

* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.

* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
  for the suggestions.

* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
  counter for AIX/xlc (thanks to Jeff Haferman for the bug report).

* Fixed incorrect type prefix in MPI code that prevented wisdom routines
  from working in single precision (thanks to Eric A. Borisch for the report).

* Added 'make check' for MPI code (which still fails in a couple of corner
  cases, but should be much better than in alpha2).

* Many other small fixes.

FFTW 3.2alpha2

* Support for the Cell processor, donated by IBM Research; see README.Cell
  and the Cell section of the manual.

* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
  function with the same semantics, but which takes fftw_iodim64 instead of
  fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that its fields
  are of type ptrdiff_t (instead of int), which is a 64-bit type on
  64-bit machines. This is only useful for specifying very large transforms
  on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
  regardless of what API you choose.)

* Experimental MPI support. Complex one- and multi-dimensional FFTs,
  multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
  distributed transpose operations, with 1d block distributions.
  (This is an alpha preview: routines have not been exhaustively
  tested, documentation is incomplete, and some functionality is
  missing, e.g. Fortran support.) See mpi/README and also the MPI
  section of the manual.

* Significantly faster r2c/c2r transforms, especially on machines with SIMD.

* Rewritten multi-threaded support for better performance by
  re-using a fixed pool of threads rather than continually
  respawning and joining (which nowadays is much slower).

* Support for MIPS paired-single SIMD instructions, donated by
  Codesourcery.

* FFTW_WISDOM_ONLY planner flag, to create a plan only if wisdom is
  available and return NULL otherwise.

* Removed k7 support, which only worked in 32-bit mode and is
  becoming obsolete. Use --enable-sse instead.

* Added --with-g77-wrappers configure option to force inclusion
  of g77 wrappers, in addition to whatever is needed for the
  detected Fortran compilers. This is mainly intended for GNU/Linux
  distros switching to gfortran that wish to include both
  gfortran and g77 support in FFTW.

* In manual, renamed "guru execute" functions to "new-array execute"
  functions, to reduce confusion with the guru planner interface.
  (The programming interface is unchanged.)

* Add missing __declspec attribute to threads API functions when compiling
  for Windows; thanks to Robert O. Morris for the bug report.

* Fixed missing return value from dfftw_init_threads in Fortran;
  thanks to Markus Wetzstein for the bug report.
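
The int fields of fftw_iodim cap every size and stride at INT_MAX, whereas fftw_iodim64 uses ptrdiff_t fields, which are 64 bits wide on 64-bit machines. The standalone sketch below uses hypothetical mirror structs (iodim_sketch, iodim64_sketch), not the real fftw3.h types, and assumes a 64-bit ptrdiff_t; it shows which dimensions can only be expressed with the 64-bit descriptor.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* This sketch assumes a 64-bit ptrdiff_t, as on 64-bit machines. */
_Static_assert(sizeof(ptrdiff_t) >= 8, "sketch assumes a 64-bit ptrdiff_t");

/* Hypothetical mirrors of the guru dimension descriptors; the real
   fftw_iodim (int fields) and fftw_iodim64 (ptrdiff_t fields) are
   declared in fftw3.h. n is the size, is/os the input/output strides. */
typedef struct { int n, is, os; } iodim_sketch;
typedef struct { ptrdiff_t n, is, os; } iodim64_sketch;

static int fits_int(ptrdiff_t x)
{
    return x >= INT_MIN && x <= INT_MAX;
}

/* Narrow a 64-bit descriptor to the int-based one. Returns 1 on success,
   or 0 (leaving *out untouched) when a size or stride is out of int range,
   i.e. when only the 64-bit descriptor can express this dimension. */
static int narrow_iodim(const iodim64_sketch *in, iodim_sketch *out)
{
    assert(in != NULL && out != NULL);
    if (!fits_int(in->n) || !fits_int(in->is) || !fits_int(in->os))
        return 0;
    out->n = (int) in->n;
    out->is = (int) in->is;
    out->os = (int) in->os;
    return 1;
}
```

For instance, a one-dimensional transform of 3,000,000,000 points cannot be described with fftw_iodim at all; with the real library it would go through a plan_guru64 planner such as fftw_plan_guru64_dft.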

FFTW 3.1.3

* Bug fix: FFTW computes incorrect results when the user plans both
  REDFT11 and RODFT11 transforms of certain sizes. The bug is caused
  by incorrect sharing of twiddle-factor tables between the two
  transforms, and only occurs when both are used. Thanks to Paul
  A. Valiant for the bug report.

FFTW 3.1.2

* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc. Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround. configure script now detects Core/Duo arch.

* Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.

FFTW 3.1.1

* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.

FFTW 3.1

* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using
  the guru interface).

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  the same binary should work on both Altivec and non-Altivec PowerPCs).

* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
  Solaris/Intel.

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan
  Manuel Guerrero).

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.

Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.

Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.

* Many minor tweaks.
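
The FFTW manual describes fftw_set_timelimit(seconds) as making the planner try a progressively wider range of algorithms until the limit is reached, then return the best plan available; a negative value (fftw3.h defines FFTW_NO_TIMELIMIT as -1.0, the default) means no limit. The toy C sketch below models only that stopping rule. plan_with_limit, time_exceeded, NO_TIMELIMIT_SKETCH, and the per-candidate costs are hypothetical, and nothing here reflects FFTW's actual search order or timing.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the documented convention: a negative limit (fftw3.h defines
   FFTW_NO_TIMELIMIT as -1.0) means planning time is unbounded. */
#define NO_TIMELIMIT_SKETCH (-1.0)

static int time_exceeded(double elapsed, double limit)
{
    return limit >= 0.0 && elapsed > limit;
}

/* Toy planner (hypothetical): candidates are tried in order, and cost[i]
   is both the time spent measuring candidate i and its measured run time
   (a simplification). The fastest candidate tried so far is kept. The
   first candidate is always tried; later ones only while time remains. */
static size_t plan_with_limit(const double *cost, size_t ncand, double limit)
{
    size_t best = 0;
    double elapsed = 0.0;
    assert(cost != NULL && ncand > 0);
    for (size_t i = 0; i < ncand; ++i) {
        if (i > 0 && time_exceeded(elapsed, limit))
            break;                 /* budget spent: keep best so far */
        elapsed += cost[i];
        if (cost[i] < cost[best])
            best = i;
    }
    return best;
}
```

With cost = {3.0, 2.0, 1.0, 0.5}, a limit of 4.0 stops after the second candidate and returns index 1, while any negative limit explores all four and returns index 3. The real call is simply fftw_set_timelimit(4.0) before creating a plan in a mode other than FFTW_ESTIMATE.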

FFTW 3.0.1

* Some speed improvements in SIMD code.

* The --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
  James Treacy.

FFTW 3.0

Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

Other enhancements:

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (These currently work via pre/post
  processing of real transforms, as in FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers
  simultaneously.

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.

Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now reports its counts as double rather than int, to avoid
  overflows for large sizes.

* Workarounds for automake bugs.
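
fftw_flops reports a plan's additions, multiplications, and fused multiply-adds through double * out-parameters. A rough calculation shows why int was too small: the conventional nominal count for a complex transform of size n is 5 n log2(n), which for n = 2^28 is 5 * 2^28 * 28 = 37,580,963,840, more than 17 times INT_MAX on platforms with 32-bit int. The sketch below does that arithmetic in double; nominal_flops_pow2 and fits_in_int are hypothetical helpers, and 5 n log2(n) is the usual benchmark convention rather than FFTW's exact count.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: nominal flop count 5 n log2(n) for n = 2^k,
   accumulated in double so large sizes cannot overflow as an int would. */
static double nominal_flops_pow2(unsigned k)
{
    double n = 1.0;
    assert(k < 1000);              /* keep 2^k finite in double */
    for (unsigned i = 0; i < k; ++i)
        n *= 2.0;                  /* n = 2^k, exact in double */
    return 5.0 * n * (double) k;
}

/* Would this nonnegative count still fit in an int? */
static int fits_in_int(double x)
{
    return x >= 0.0 && x <= (double) INT_MAX;
}
```

The real call fills three doubles: fftw_flops(plan, &add, &mul, &fma).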

Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.

Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
  transforms.

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.