qm-dsp: ext/kissfft/TIPS annotate

annotate ext/kissfft/TIPS @ 510:2adcd94c2079

Update test

author	Chris Cannam <cannam@all-day-breakfast.com>
date	Thu, 06 Jun 2019 14:26:46 +0100
parents	1f1999b0f577
children

rev	line source
c@409	1 Speed:
c@409	2 * If you want to use multiple cores, compile with OpenMP enabled (-openmp or -fopenmp; see your compiler docs).
c@409	3 Larger FFTs benefit more than smaller ones. This generally uses more total CPU time, but
c@409	4 less wall-clock time.
c@409	5
c@409	6 * Experiment with compiler flags.
c@409	7 Special thanks to Oscar Lesta. He suggested some compiler flags
c@409	8 for gcc that make a big difference. They shave 10-15% off
c@409	9 execution time on some systems. Try some combination of:
c@409	10 -march=pentiumpro
c@409	11 -ffast-math
c@409	12 -fomit-frame-pointer
c@409	13
c@409	14 * If the input data has no imaginary component, use the kiss_fftr code under tools/.
c@409	15 Real FFTs are roughly twice as fast as complex FFTs of the same size.
c@409	16
c@409	17 * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
c@409	18 then you might want to experiment with the USE_SIMD code. See README.simd.
c@409	19
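The real-input tip above rests on a symmetry: when the input has no imaginary component, the upper half of the spectrum is the complex conjugate of the lower half, so only bins 0..N/2 need to be computed. The following is a minimal sketch of that property and does not use kiss_fftr itself. The names cpx_t, DFT_N, dft4_real and is_conjugate_symmetric are hypothetical and chosen for illustration only; the size is fixed at 4 so that the twiddles are exact integers.

```c
#include <assert.h>

/* Hypothetical complex type for this sketch (kissfft has its own type). */
typedef struct { int r, i; } cpx_t;

enum { DFT_N = 4 };

/* exp(-2*pi*i*m/4) for m = 0..3 is exactly 1, -i, -1, i,
   so the whole transform can be done in integer arithmetic. */
static const cpx_t W4[DFT_N] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

/* Naive 4-point DFT of a purely real input (O(N^2), for illustration only). */
static void dft4_real(const int *in, cpx_t *out)
{
    int k, n;
    assert(in && out);
    for (k = 0; k < DFT_N; ++k) {
        out[k].r = 0;
        out[k].i = 0;
        for (n = 0; n < DFT_N; ++n) {
            const cpx_t w = W4[(k * n) % DFT_N];
            out[k].r += in[n] * w.r;
            out[k].i += in[n] * w.i;
        }
    }
}

/* Returns 1 if x[DFT_N - k] == conj(x[k]) for every k in 1..DFT_N-1. */
static int is_conjugate_symmetric(const cpx_t *x)
{
    int k;
    for (k = 1; k < DFT_N; ++k) {
        if (x[k].r != x[DFT_N - k].r || x[k].i != -x[DFT_N - k].i)
            return 0;
    }
    return 1;
}
```

Calling dft4_real on any real input and then is_conjugate_symmetric on the result should return 1. A real-input FFT exploits this redundancy, which is where the rough factor of two comes from.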
c@409	20
c@409	21 Reducing code size:
c@409	22 * Remove some of the butterflies. There are currently butterflies optimized for radices
c@409	23 2, 3, 4 and 5. You can still use FFT sizes that contain
c@409	24 other factors; they just won't be quite as fast. You can decide for yourself
c@409	25 whether to keep radix 2 or 4. If you do some work in this area, let me
c@409	26 know what you find.
c@409	27
c@409	28 * For platforms where ROM/code space is more plentiful than RAM,
c@409	29 consider creating a hardcoded kiss_fft_state. In other words, decide which
c@409	30 FFT size(s) you want and make a structure with the correct factors and twiddles.
c@409	31
c@409	32 * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
c@409	33 on embedded targets. He writes: "I'm happy to help anyone who is trying to implement KISSFFT on a micro."
c@409	34
c@409	35 Some of these were rolled into the mainline code base:
c@409	36 - using long casts to promote intermediate results of short*short multiplication
c@409	37 - delaying allocation of buffers that are sometimes unused.
c@409	38 In some cases, it may be desirable to limit capability in order to better suit the target:
c@409	39 - predefining the twiddle tables for the desired FFT size.
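Two of the embedded suggestions above, long casts for short*short products and predefined twiddle tables, can be sketched together. This is a minimal illustration under stated assumptions, not kissfft's own code: the names cpx16_t, FFT_N, TWIDDLES_8, q15_mul and cpx_mul_q15 are hypothetical, samples are assumed to be Q15 fixed point stored in a short, and the table values were computed offline for an 8-point transform.

```c
#include <assert.h>

/* Hypothetical Q15 complex type for this sketch. */
typedef struct { short r, i; } cpx16_t;

enum { FFT_N = 8 };

/* exp(-2*pi*i*k/8) for k = 0..7 in Q15, precomputed offline so that no
   table has to be built in RAM at start-up. 1.0 is saturated to 32767. */
static const cpx16_t TWIDDLES_8[FFT_N] = {
    { 32767,      0}, { 23170, -23170}, {     0, -32767}, {-23170, -23170},
    {-32767,      0}, {-23170,  23170}, {     0,  32767}, { 23170,  23170}
};

/* Q15 multiply with rounding. The casts force the product to be computed
   in long: on a target where int is 16 bits, short*short would otherwise
   overflow. Assumes >> on a negative long is an arithmetic shift, and that
   a and b are not both -32768. */
static short q15_mul(short a, short b)
{
    long p = (long)a * (long)b;
    return (short)((p + (1L << 14)) >> 15);
}

/* Complex Q15 multiply, e.g. rotating a sample by a table twiddle.
   Assumes the rounded result fits in a short. */
static cpx16_t cpx_mul_q15(cpx16_t a, cpx16_t b)
{
    cpx16_t c;
    long re = (long)a.r * b.r - (long)a.i * b.i;
    long im = (long)a.r * b.i + (long)a.i * b.r;
    c.r = (short)((re + (1L << 14)) >> 15);
    c.i = (short)((im + (1L << 14)) >> 15);
    return c;
}
```

Declaring the table const lets the toolchain place it in ROM on many embedded targets, trading code space for RAM and start-up time. Whether that trade is worthwhile depends on the target.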
