Speed:
    * If you want to use multiple cores, compile with -openmp or -fopenmp (see your compiler docs).
      Larger FFTs benefit more than smaller FFTs. This generally uses more CPU time, but
      less wall time.

    * Experiment with compiler flags.
      Special thanks to Oscar Lesta, who suggested some gcc flags
      that make a big difference. They shave 10-15% off
      execution time on some systems. Try some combination of:
            -march=pentiumpro
            -ffast-math
            -fomit-frame-pointer

    * If the input data has no imaginary component, use the kiss_fftr code under tools/.
      Real FFTs are roughly twice as fast as complex FFTs of the same length.

    * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
      you might want to experiment with the USE_SIMD code. See README.simd.


Reducing code size:
    * Remove some of the butterflies. There are currently butterflies optimized for radices
      2, 3, 4 and 5. You can still use FFT sizes that contain
      other factors; they just won't be quite as fast. You can decide for yourself
      whether to keep radix 2 or 4. If you do some work in this area, let me
      know what you find.

    * For platforms where ROM/code space is more plentiful than RAM,
      consider creating a hardcoded kiss_fft_state. In other words, decide which
      FFT size(s) you want and make a structure with the correct factors and twiddles.

    * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
      on embedded targets.
      "I'm happy to help anyone who is trying to implement KISSFFT on a micro"

      Some of these were rolled into the mainline code base:
        - using long casts to promote intermediate results of short*short multiplication
        - delaying allocation of buffers that are sometimes unused.
      In some cases, it may be desirable to limit capability in order to better suit the target:
        - predefining the twiddle tables for the desired FFT size.
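
The long-cast suggestion can be illustrated with a small fixed-point multiply.
This is only a sketch, not code taken from kissfft: the name q15_mul and the
Q15 format (15 fractional bits) are assumptions made for the example. In C,
short*short is computed as int, and on targets where int is 16 bits wide the
product overflows. Casting one operand to long forces an intermediate of at
least 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not a kissfft function.
 * Multiplies two Q15 values (15 fractional bits) and rounds the result. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    /* The cast promotes the multiplication to long (at least 32 bits),
     * so the full product is kept even where int is only 16 bits. */
    long product = (long)a * b;

    /* Add half an LSB for rounding, then scale back to Q15.
     * Right-shifting a negative value is implementation-defined in C;
     * most compilers use an arithmetic shift. */
    return (int16_t)((product + (1L << 14)) >> 15);
}
```

For example, q15_mul(16384, 16384) multiplies 0.5 by 0.5 and yields 8192,
which is 0.25 in Q15.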
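
The idea of predefining twiddle tables for a single FFT size might look
roughly like the sketch below. The type cpx, the table twiddles_8 and the
function dft8 are hypothetical names chosen for this example; they are not
the kissfft structures, and the sketch assumes the usual forward-transform
convention w[k] = exp(-2*pi*j*k/N). A plain DFT loop stands in for the
butterflies so that the table lookup is easy to follow.

```c
#include <stddef.h>

/* Hypothetical complex type for illustration; not the kissfft type. */
typedef struct { float r; float i; } cpx;

#define FIXED_NFFT 8

/* Forward twiddles w[k] = exp(-2*pi*j*k/8), written out by hand so the
 * table can be placed in ROM (const) instead of being computed into RAM. */
static const cpx twiddles_8[FIXED_NFFT] = {
    {  1.0f,         0.0f        },
    {  0.70710678f, -0.70710678f },
    {  0.0f,        -1.0f        },
    { -0.70710678f, -0.70710678f },
    { -1.0f,         0.0f        },
    { -0.70710678f,  0.70710678f },
    {  0.0f,         1.0f        },
    {  0.70710678f,  0.70710678f },
};

/* Direct 8-point DFT that only reads the constant table.
 * This is O(N^2) and is meant to show the lookup, not to be fast. */
static void dft8(const cpx *in, cpx *out)
{
    for (size_t k = 0; k < FIXED_NFFT; ++k) {
        cpx acc = { 0.0f, 0.0f };
        for (size_t n = 0; n < FIXED_NFFT; ++n) {
            /* w^(k*n) repeats with period N, so index modulo N. */
            const cpx w = twiddles_8[(k * n) % FIXED_NFFT];
            acc.r += in[n].r * w.r - in[n].i * w.i;
            acc.i += in[n].r * w.i + in[n].i * w.r;
        }
        out[k] = acc;
    }
}
```

The trade-off is the one described above: the table costs code space and
fixes the FFT size at build time, but no RAM or startup time is spent on
computing twiddles.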