comparison constant-q-cpp/src/ext/kissfft/TIPS @ 366:5d0a2ebb4d17

Bring dependent libraries in to repo
author Chris Cannam
date Fri, 24 Jun 2016 14:47:45 +0100
365:112766f4c34b 366:5d0a2ebb4d17
1 Speed:
2 * If you want to use multiple cores, compile with -openmp or -fopenmp (see your compiler docs).
3 Larger FFTs benefit more from this than smaller FFTs. Running on multiple cores generally uses more total CPU time, but
4 less wall-clock time.
5
6 * experiment with compiler flags
7 Special thanks to Oscar Lesta. He suggested some compiler flags
8 for gcc that make a big difference. They shave 10-15% off
9 execution time on some systems. Try some combination of:
10 -march=pentiumpro
11 -ffast-math
12 -fomit-frame-pointer
13
14 * If the input data has no imaginary component, use the kiss_fftr code under tools/.
15 Real FFTs are roughly twice as fast as complex FFTs of the same length.
16
17 * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
18 then you might want to experiment with the USE_SIMD code. See README.simd
19
20
21 Reducing code size:
22 * Remove some of the butterflies. There are currently butterflies optimized for radices
23 2,3,4,5. You can still use FFT sizes that contain
24 other factors; they just won't be quite as fast. You can decide for yourself
25 whether to keep radix 2 or 4. If you do some work in this area, let me
26 know what you find.
27
28 * For platforms where ROM/code space is more plentiful than RAM,
29 consider creating a hardcoded kiss_fft_state. In other words, decide which
30 FFT size(s) you want and make a structure with the correct factors and twiddles.
31
32 * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
33 on embedded targets. "I'm happy to help anyone who is trying to implement KISSFFT on a micro"
34
35 Some of these were rolled into the mainline code base:
36 - using long casts to promote intermediate results of short*short multiplication
37 - delaying allocation of buffers that are sometimes unused.
38 In some cases, it may be desirable to limit capability in order to better suit the target:
39 - predefining the twiddle tables for the desired fft size.
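
The cast item in the list above matters on targets where int is only 16 bits wide: a short*short product is then computed in 16 bits and can overflow before it is stored. A minimal sketch of the idea for a Q15 fixed-point multiply follows. It uses int32_t from <stdint.h> rather than long (a substitution made here for clarity), and the name q15_mul is hypothetical, not a kissfft function.

```c
#include <stdint.h>

/* Q15 fixed-point multiply. One operand is widened BEFORE the
 * multiplication, so the product is computed in 32 bits even on a
 * target whose int is 16 bits wide.
 * Assumptions: no saturation is applied (-32768 * -32768 does not fit
 * in the result), and right-shifting a negative value is
 * implementation-defined in C, although most compilers shift
 * arithmetically. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t wide = (int32_t)a * b;   /* cast forces 32-bit arithmetic */
    return (int16_t)(wide >> 15);    /* rescale from Q30 back to Q15  */
}
```

Without the cast, the expression a * b would be evaluated in plain int, which is only guaranteed to be 16 bits wide and is exactly that on many small microcontrollers.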