| Implementation | Result | Time (first half) | Time (second half) | Rate (second half) |
|---|---|---|---|---|
| Nayuki | | | | |
| Nockert | | | | |
| Dntj | | | | |
| Cross | | | | |
| KissFFT | | | | |

If 2150 iterations of a size-2048 real-to-complex FFT take less than 10 seconds, then a high-quality real-time phase vocoder may be (just) feasible.
A phase vocoder must use overlapped, windowed FFTs (the frame size can be chosen, within limits), inverse FFTs, and Cartesian-to-polar conversion to obtain the phases from which the instantaneous frequency is calculated.
A reasonable estimate of the CPU cost of the whole process is around 10x the cost of simple non-overlapping short-time forward Fourier transforms across the signal.
2150 frames of 2048 samples correspond to roughly 100 seconds of non-overlapped audio at 44.1 kHz, so if the benchmark takes less than 10 seconds, the full vocoder should need less than about 100 seconds to process 100 seconds of audio, which in theory is just fast enough.
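
The arithmetic behind that threshold can be checked with a short sketch. The function names and the example timings passed to `realtime_load` are hypothetical, chosen only for illustration; the 10x factor is the rough estimate given above, not a measured value.

```python
SAMPLE_RATE = 44100        # Hz, as stated in the text
FFT_SIZE = 2048            # samples per frame
ITERATIONS = 2150          # number of forward FFTs in the benchmark
VOCODER_COST_FACTOR = 10   # rough estimate from the text, not a measurement


def audio_seconds(iterations=ITERATIONS, fft_size=FFT_SIZE, sample_rate=SAMPLE_RATE):
    """Duration of audio covered by non-overlapping frames."""
    return iterations * fft_size / sample_rate


def realtime_load(benchmark_seconds, cost_factor=VOCODER_COST_FACTOR):
    """Estimated fraction of real time a full phase vocoder would need.

    A value below 1.0 means the vocoder would run faster than real time.
    """
    return benchmark_seconds * cost_factor / audio_seconds()


print(round(audio_seconds(), 2))  # 99.85 seconds of audio

# Hypothetical benchmark timings (not taken from the table above):
print(realtime_load(5.0) < 1.0)   # True: comfortably within real time
print(realtime_load(10.0) < 1.0)  # False: 10 s is marginally over budget
```

Because 2150 frames cover slightly less than 100 seconds, a benchmark time of exactly 10 seconds lands just above the real-time limit under this estimate, which is why the threshold is "less than 10 seconds".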