Speed:
    * If you want to use multiple cores, compile with -openmp or -fopenmp (see your compiler docs).
        Larger FFTs benefit more from this than smaller FFTs. Threading generally uses more total
        CPU time, but less wall-clock time.

    * Experiment with compiler flags.
        Special thanks to Oscar Lesta, who suggested some gcc flags that make a big
        difference: they shave 10-15% off execution time on some systems. Try some
        combination of:
                -march=pentiumpro
                -ffast-math
                -fomit-frame-pointer

    * If the input data has no imaginary component, use the kiss_fftr code under tools/.
        Real FFTs are roughly twice as fast as complex FFTs.

    * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
        you might want to experiment with the USE_SIMD code. See README.simd.


Reducing code size:
    * Remove some of the butterflies. There are currently butterflies optimized for radices
        2, 3, 4, and 5. You can still use FFT sizes that contain other factors; they just
        won't be quite as fast. You can decide for yourself whether to keep radix 2 or 4.
        If you do some work in this area, let me know what you find.

    * For platforms where ROM/code space is more plentiful than RAM,
        consider creating a hardcoded kiss_fft_state. In other words, decide which
        FFT size(s) you want and make a structure with the correct factors and twiddles.

    * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
        on embedded targets. "I'm happy to help anyone who is trying to implement KISSFFT on a micro."

        Some of these were rolled into the mainline code base:
            - using long casts to promote intermediate results of short*short multiplication
            - delaying allocation of buffers that are sometimes unused
        In some cases, it may be desirable to limit capability in order to better suit the target:
            - predefining the twiddle tables for the desired FFT size
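Before removing butterflies, it can help to see which radices your FFT sizes actually decompose into. kissfft factors the length by pulling out 4s first, then 2s, then odd factors; `factorize` below is a hypothetical sketch of that ordering (not the library's code) that lists the radices for a given size, so you can check whether any of them would fall through to the generic, slower butterfly. `has_specialized_butterfly` simply encodes the radices 2, 3, 4 and 5 named in the tip.

```c
#include <assert.h>

/* Split n into radices: 4 first, then 2, then odd candidates 3, 5, 7, ...
   Writes the factors to out[] and returns how many there are. */
int factorize(int n, int out[], int max_out)
{
    int count = 0, p = 4;
    assert(n > 0);
    while (n > 1) {
        while (n % p != 0) {
            if (p == 4)      p = 2;
            else if (p == 2) p = 3;
            else             p += 2;
            if ((long long)p * p > n) p = n;   /* what remains of n is prime */
        }
        assert(count < max_out);
        out[count++] = p;
        n /= p;
    }
    return count;
}

/* Radices that have a dedicated butterfly according to the tip above. */
int has_specialized_butterfly(int radix)
{
    return radix == 2 || radix == 3 || radix == 4 || radix == 5;
}
```

For example, 1000 splits into 4, 2, 5, 5, 5, all of which have dedicated butterflies, while 14 splits into 2 and 7, and the radix-7 stage would need the generic code.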
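For the hardcoded-state and predefined-twiddle suggestions, the key point is that a fixed size lets both the factorization and the twiddles be computed offline and stored as `const` data, which embedded toolchains typically place in ROM/flash rather than RAM. The sketch below does not reproduce the `kiss_fft_state` layout; it is a standalone radix-2 FFT fixed at N = 8 (factors 2 x 2 x 2) whose twiddle table is a compile-time constant. `cpx`, `twiddles8` and `fft8` are hypothetical names.

```c
typedef struct { float r, i; } cpx;   /* hypothetical complex type */

#define FFT_N 8

/* Twiddles W_8^k = exp(-2*pi*i*k/8) for k = 0..3, computed offline.
   Declaring the table static const lets the linker keep it out of RAM. */
static const cpx twiddles8[FFT_N / 2] = {
    {  1.0f,         0.0f        },
    {  0.70710678f, -0.70710678f },
    {  0.0f,        -1.0f        },
    { -0.70710678f, -0.70710678f },
};

/* In-place iterative radix-2 FFT of length 8 that uses only the fixed
   table: no allocation and no trig calls at run time. */
void fft8(cpx x[FFT_N])
{
    /* Bit-reversal permutation (3-bit indices: swaps 1<->4 and 3<->6). */
    for (unsigned i = 0, j = 0; i < FFT_N; ++i) {
        if (i < j) { cpx t = x[i]; x[i] = x[j]; x[j] = t; }
        unsigned bit = FFT_N >> 1;
        while (j & bit) { j ^= bit; bit >>= 1; }
        j |= bit;
    }
    /* Butterfly stages of length 2, 4, 8. */
    for (unsigned len = 2; len <= FFT_N; len <<= 1) {
        unsigned step = FFT_N / len;   /* stride into the twiddle table */
        for (unsigned start = 0; start < FFT_N; start += len)
            for (unsigned k = 0; k < len / 2; ++k) {
                cpx w = twiddles8[k * step];
                cpx *a = &x[start + k], *b = &x[start + k + len / 2];
                cpx t = { b->r * w.r - b->i * w.i, b->r * w.i + b->i * w.r };
                b->r = a->r - t.r;  b->i = a->i - t.i;
                a->r += t.r;        a->i += t.i;
            }
    }
}
```

The trade-off is the one the tip describes: the code handles exactly one size, in exchange for needing no RAM for twiddles and no setup call.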
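The "long casts" item matters for fixed-point builds. C promotes `short` operands to `int` before multiplying, and on many microcontrollers `int` is only 16 bits, so `short*short` can overflow before the result is rescaled. Casting one operand to `long`, which the C standard guarantees is at least 32 bits, keeps the full product. The Q15 helper below is a hypothetical sketch of that idea, not kissfft's own fixed-point macros; `q15_mul` and `FRACBITS` are names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define FRACBITS 15   /* Q15: raw value r represents r / 32768 */

int16_t q15_mul(int16_t a, int16_t b)
{
    /* -1.0 * -1.0 = +1.0 is not representable in Q15. */
    assert(!(a == INT16_MIN && b == INT16_MIN));

    /* Without the cast, a and b are promoted to int, which may be only
       16 bits wide on a micro, so the product could overflow. long is at
       least 32 bits, so it holds any 16x16-bit product. */
    long prod = (long)a * b;

    prod += 1L << (FRACBITS - 1);   /* add half an LSB to round */

    /* Right-shifting a negative long is implementation-defined in C, but
       mainstream compilers perform an arithmetic shift. */
    return (int16_t)(prod >> FRACBITS);
}
```

For instance, `q15_mul(16384, 16384)` computes 0.5 x 0.5 and returns 8192, the Q15 encoding of 0.25.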