comparison src/ext/kissfft/TIPS @ 174:5ed6e970541b
Remote location for this subrepo is unresponsive, include sources in this repo instead
author | Chris Cannam <c.cannam@qmul.ac.uk> |
---|---|
date | Fri, 17 Jul 2015 15:48:01 +0100 |
Speed:
    * If you want to use multiple cores, then compile with -openmp or -fopenmp (see your compiler docs).
      Realize that larger FFTs will reap more benefit than smaller FFTs. This generally uses more CPU time, but
      less wall time.

    * Experiment with compiler flags.
      Special thanks to Oscar Lesta. He suggested some compiler flags
      for gcc that make a big difference. They shave 10-15% off
      execution time on some systems. Try some combination of:
            -march=pentiumpro
            -ffast-math
            -fomit-frame-pointer

    * If the input data has no imaginary component, use the kiss_fftr code under tools/.
      Real FFTs are roughly twice as fast as complex FFTs of the same length.

    * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
      then you might want to experiment with the USE_SIMD code. See README.simd.


Reducing code size:
    * Remove some of the butterflies. There are currently butterflies optimized for radices
      2, 3, 4 and 5. You can still use FFT sizes that contain other factors; they just
      won't be quite as fast. You can decide for yourself whether to keep radix 2 or 4.
      If you do some work in this area, let me know what you find.

    * For platforms where ROM/code space is more plentiful than RAM,
      consider creating a hardcoded kiss_fft_state. In other words, decide which
      FFT size(s) you want and make a structure with the correct factors and twiddles.

    * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
      on embedded targets. He writes: "I'm happy to help anyone who is trying to implement KISSFFT on a micro."

      Some of these were rolled into the mainline code base:
          - using long casts to promote intermediate results of short*short multiplication
          - delaying allocation of buffers that are sometimes unused.
      In some cases, it may be desirable to limit capability in order to better suit the target:
          - predefining the twiddle tables for the desired FFT size.