Speed:
    * If you want to use multiple cores, compile with -openmp or -fopenmp (see your compiler docs).
      Larger FFTs benefit more than smaller ones. This generally uses more CPU time but
      less wall time.

    * Experiment with compiler flags.
      Special thanks to Oscar Lesta, who suggested some gcc flags
      that make a big difference. They shave 10-15% off
      execution time on some systems. Try some combination of:
            -march=pentiumpro
            -ffast-math
            -fomit-frame-pointer

    * If the input data has no imaginary component, use the kiss_fftr code under tools/.
      Real FFTs are roughly twice as fast as complex FFTs.

    * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
      you might want to experiment with the USE_SIMD code. See README.simd.


Reducing code size:
    * Remove some of the butterflies. There are currently butterflies optimized for radices
      2, 3, 4, and 5. You can still use FFT sizes that contain
      other factors; they just won't be quite as fast. You can decide for yourself
      whether to keep radix 2 or 4. If you do some work in this area, let me
      know what you find.

    * For platforms where ROM/code space is more plentiful than RAM,
      consider creating a hardcoded kiss_fft_state. In other words, decide which
      FFT size(s) you want and make a structure with the correct factors and twiddles.

    * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
      on embedded targets.
      "I'm happy to help anyone who is trying to implement KISSFFT on a micro"

      Some of these were rolled into the mainline code base:
        - using long casts to promote intermediate results of short*short multiplication
        - delaying allocation of buffers that are sometimes unused.
      In some cases, it may be desirable to limit capability in order to better suit the target:
        - predefining the twiddle tables for the desired fft size.