comparison ext/kissfft/TIPS @ 409:1f1999b0f577
Bring in kissfft into this repo (formerly a subrepo, but the remote is not responding)
| author | Chris Cannam <c.cannam@qmul.ac.uk> |
|---|---|
| date | Tue, 21 Jul 2015 07:34:15 +0100 |
| 408:5316fa4b0f33 | 409:1f1999b0f577 |
|---|---|
| 1 Speed: | |
| 2 * If you want to use multiple cores, then compile with -openmp or -fopenmp (see your compiler docs). | |
| 3 Larger FFTs will benefit more than smaller FFTs. Multithreading generally uses more total CPU time, but | |
| 4 less wall-clock time. | |
| 5 | |
| 6 * experiment with compiler flags | |
| 7 Special thanks to Oscar Lesta. He suggested some compiler flags | |
| 8 for gcc that make a big difference. They shave 10-15% off | |
| 9 execution time on some systems. Try some combination of: | |
| 10 -march=pentiumpro | |
| 11 -ffast-math | |
| 12 -fomit-frame-pointer | |
| 13 | |
| 14 * If the input data has no imaginary component, use the kiss_fftr code under tools/. | |
| 15 Real FFTs are roughly twice as fast as complex FFTs. | |
| 16 | |
| 17 * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine, | |
| 18 then you might want to experiment with the USE_SIMD code. See README.simd | |
| 19 | |
| 20 | |
| 21 Reducing code size: | |
| 22 * remove some of the butterflies. There are currently butterflies optimized for radices | |
| 23 2,3,4,5. You can still use FFT sizes that contain | |
| 24 other factors; they just won't be quite as fast. You can decide for yourself | |
| 25 whether to keep radix 2 or 4. If you do some work in this area, let me | |
| 26 know what you find. | |
| 27 | |
| 28 * For platforms where ROM/code space is more plentiful than RAM, | |
| 29 consider creating a hardcoded kiss_fft_state. In other words, decide which | |
| 30 FFT size(s) you want and make a structure with the correct factors and twiddles. | |
| 31 | |
| 32 * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation | |
| 33 on embedded targets. "I'm happy to help anyone who is trying to implement KISSFFT on a micro" | |
| 34 | |
| 35 Some of these were rolled into the mainline code base: | |
| 36 - using long casts to promote intermediate results of short*short multiplication | |
| 37 - delaying allocation of buffers that are sometimes unused. | |
| 38 In some cases, it may be desirable to limit capability in order to better suit the target: | |
| 39 - predefining the twiddle tables for the desired FFT size. | |
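
The last two embedded-target suggestions (promoting short*short products to a wider type, and predefining the twiddle table) can be sketched together for a fixed-point build. This is only an illustration under stated assumptions: the names cpx16, twiddles8, q15_mul and cpx16_mul are hypothetical and are not part of kissfft, the FFT size is fixed at 8, and samples are assumed to be Q15 (16-bit, 15 fractional bits).

```c
#include <stdint.h>

/* Hypothetical types and names for illustration; not taken from kissfft. */
typedef struct { int16_t r, i; } cpx16;

#define NFFT 8

/* Twiddles exp(-2*pi*j*k/8) for k = 0..7 in Q15, computed offline and
 * stored as const data so they can live in ROM instead of RAM.
 * 32767 stands in for 1.0, which Q15 cannot represent exactly. */
static const cpx16 twiddles8[NFFT] = {
    { 32767,      0}, { 23170, -23170}, {     0, -32767}, {-23170, -23170},
    {-32767,      0}, {-23170,  23170}, {     0,  32767}, { 23170,  23170}
};

/* Q15 multiply with rounding. Both operands are widened to 32 bits
 * BEFORE the multiply, so the product cannot overflow on targets where
 * int is only 16 bits wide. Assumes an arithmetic right shift for
 * negative values (true of common compilers, but implementation-defined)
 * and that callers avoid -32768 * -32768, which would not fit in Q15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (int16_t)((prod + (1 << 14)) >> 15);
}

/* Complex multiply built on q15_mul. The caller is assumed to keep
 * magnitudes small enough that the final add/subtract stays in range. */
static cpx16 cpx16_mul(cpx16 a, cpx16 b)
{
    cpx16 out;
    out.r = (int16_t)(q15_mul(a.r, b.r) - q15_mul(a.i, b.i));
    out.i = (int16_t)(q15_mul(a.r, b.i) + q15_mul(a.i, b.r));
    return out;
}
```

With a table like this, no cos/sin calls or heap allocation are needed at start-up; the trade-off is that the code only supports the FFT size(s) chosen at build time.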
