Version 13 (Chris Cannam, 2013-10-22 09:19 AM) → Version 14/22 (Chris Cannam, 2013-10-22 02:34 PM)
h1. Summary of results
{{>toc}}
h2. What we're looking at
Here we're only looking at causal methods, so no forward/backward filtering. The question in my head is really whether faster IIR filters are still so much faster as to be worth using in preference to linear-phase methods with better (?) theoretical quality. Of course that would always depend on the application, but it's interesting to compare.
We compared
* @decimate@: the "Decimator":http://code.soundsoftware.ac.uk/projects/qm-dsp/embedded/classDecimator.html implementation in the "qm-dsp":/projects/qm-dsp library, which uses an IIR lowpass filter (perhaps an elliptic filter?) with 8 coefficient pairs;
* @decimate_b@: the "DecimatorB":http://code.soundsoftware.ac.uk/projects/qm-dsp/embedded/classDecimatorB.html class in the "qm-dsp":/projects/qm-dsp library, which uses a Butterworth IIR lowpass filter of order 6;
* @resample_hq@, @resample_mq@, @resample_lq@: the "Resampler":http://code.soundsoftware.ac.uk/projects/qm-dsp/embedded/classResampler.html implementation in the "qm-dsp":/projects/qm-dsp library, which uses a lengthy Kaiser-windowed sinc filter, at three different quality settings;
* @src@: the @sndfile-resample@ program, which uses "libsamplerate":http://mega-nerd.com/SRC/, a well-trusted resampler that also uses a Kaiser-windowed sinc implementation, at its default quality setting;
* @zoh@: the sndfile-resample zero-order hold resampler, which just takes every Nth sample without any filtering, serving as a baseline.
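The @zoh@ baseline described above amounts to plain subsampling. A minimal sketch in Python shows the operation and why it is only a baseline: with no lowpass filter, content above the new Nyquist frequency aliases. The function name @zoh_decimate@ is hypothetical, and this is not the sndfile-resample code.

```python
def zoh_decimate(samples, factor):
    """Keep every factor-th sample, starting with the first.

    No lowpass filtering is applied, so this is a speed baseline only.
    (Hypothetical helper for illustration, not taken from libsamplerate.)
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return samples[::factor]


# A ramp decimated by 4 keeps samples 0, 4, 8, 12.
ramp = list(range(16))
print(zoh_decimate(ramp, 4))

# A tone at the input Nyquist frequency (+1, -1, +1, ...) decimated by 2
# collapses to a constant: the tone has aliased down to DC.
nyquist_tone = [1, -1] * 4
print(zoh_decimate(nyquist_tone, 2))
```

The filtered methods in the list differ from this only in what they do before discarding samples, which is where their extra cost comes from.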
h2. Speed
Measured on a Core i3-3229Y low-voltage CPU. The frames-per-second figures are from a run over 5292000 input frames and the Kfps figures from a run over 11520000 input frames. Both rates count input frames.
All code is 64-bit. The qm-dsp implementations (@resample_*@, @decimate@ and @decimate_b@) were compiled with -O3 -ffast-math, while the libsamplerate implementations (@src@ and @zoh@) were standard Ubuntu packages, so probably built with -O2. This is likely to make a very significant difference, so these results are more useful for comparison among the qm-dsp implementations than between qm-dsp and libsamplerate.
The decimate implementation supports factors up to 8 only, so 16x, 32x and 64x are handled in two passes.
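How the larger factors were divided between the two passes is not recorded here. As a sketch only, one plausible split is to run the largest supported factor first and the remainder second. The helper name @split_factor@ and the choice of split are assumptions, not a description of the benchmark code.

```python
def split_factor(factor, max_single=8):
    """Return the per-pass decimation factors for an overall factor.

    Assumption for illustration: the first pass uses the largest supported
    factor and the second pass covers the remainder. The split actually used
    in the benchmark is not stated in the text.
    """
    if factor <= max_single:
        return [factor]
    if factor % max_single != 0 or factor // max_single > max_single:
        raise ValueError("factor cannot be covered in two passes")
    return [max_single, factor // max_single]


for f in (2, 8, 16, 32, 64):
    print(f, split_factor(f))
```

Under this assumption 16x runs as 8 then 2, 32x as 8 then 4, and 64x as 8 then 8, so the second pass always sees at most one eighth of the input frames.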
All implementations use libsndfile for audio file I/O, so that should not be a factor in overall speed.
These frames-per-second figures look terribly precise, but I imagine there's a good 10% margin of error from run-to-run variation.
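The rate figures in the tables are simply input frames divided by clock time in seconds. A short Python check, using one row from each run, reproduces two of the tabulated values and shows what a 10% margin looks like. The function name is hypothetical.

```python
def frames_per_second(input_frames, clock_seconds):
    """Input frames processed per second of wall-clock time."""
    return input_frames / clock_seconds


# decimate at factor 2, run over 5292000 input frames, 0.372 s
fps = frames_per_second(5292000, 0.372)
print(int(fps))

# zoh at factor 2, run over 11520000 input frames, 0.187 s, in thousands (Kfps)
kfps = frames_per_second(11520000, 0.187) / 1000
print(int(kfps))

# A 10% margin of error on the first figure
low, high = fps * 0.9, fps * 1.1
print(int(low), int(high))
```

With a margin that wide, differences such as @decimate@ against @decimate_b@ at factor 2 are well within the noise.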
h3. Implementations by decimation factor
The Kfps figures are for 11520000 input frames.
h4. Factor 2
|Implementation|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|zoh|61604|0.187|8939189|0.592|
|decimate_b|52602|0.219|14187667|0.373|
|decimate|52363|0.220|14225806|0.372|
|resample_lq|17668|0.652|3732016|1.418|
|resample_mq|9365|1.230|1856842|2.850|
|resample_hq|4768|2.416|989158|5.350|
|src|2176|5.294|516141|10.253|
h4. Factor 4
|Implementation|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|zoh|93658|0.123|17070967|0.310|
|decimate|58181|0.198|14659279|0.361|
|decimate_b|47213|0.244|13465648|0.393|
|resample_lq|19896|0.579|4285020|1.235|
|resample_mq|9982|1.154|2186776|2.420|
|resample_hq|4965|2.320|1056287|5.010|
|src|2292|5.026|610099|8.674|
h4. Factor 8
|Implementation|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|zoh|128000|0.090|26328358|0.201|
|decimate|60952|0.189|13926315|0.380|
|decimate_b|44651|0.258|12027272|0.440|
|resample_lq|21215|0.543|4895467|1.081|
|resample_mq|10331|1.115|2470588|2.142|
|resample_hq|3480|3.310|1166409|4.537|
|src|2361|4.879|614919|8.606|
h4. Factor 16
|Implementation|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|zoh|160000|0.072|33493670|0.158|
|decimate|46080|0.250|12721153|0.416|
|decimate_b|43636|0.264|11141052|0.475|
|resample_lq|22068|0.522|5093358|1.039|
|resample_mq|7700|1.496|2515209|2.104|
|resample_hq|3529|3.264|1182041|4.477|
|src|2119|5.435|668857|7.912|
h4. Factor 32
|Implementation|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|zoh|182857|0.063|41669291|0.127|
|decimate|53333|0.216|14498630|0.365|
|decimate_b|42825|0.269|12540284|0.422|
|resample_lq|21021|0.548|5318592|0.995|
|resample_mq|7379|1.561|2312937|2.288|
|resample_hq|3443|3.345|1148936|4.606|
|src|2179|5.286|670467|7.893|
h4. Factor 64
|Implementation|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|zoh|188852|0.061|42000000|0.126|
|decimate|53581|0.215|13397468|0.395|
|decimate_b|42666|0.270|11972850|0.442|
|resample_lq|16202|0.711|5040000|1.050|
|resample_mq|7417|1.553|2365668|2.237|
|resample_hq|3489|3.301|1232704|4.293|
|src|2423|4.753|636057|8.320|
h3. Decimation factors by implementation
The Kfps figures are for 11520000 input frames.
h4. Implementation zoh
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|64|188852|0.061|42000000|0.126|
|32|182857|0.063|41669291|0.127|
|16|160000|0.072|33493670|0.158|
|8|128000|0.090|26328358|0.201|
|4|93658|0.123|17070967|0.310|
|2|61604|0.187|8939189|0.592|
h4. Implementation decimate
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|8|60952|0.189|13926315|0.380|
|4|58181|0.198|14659279|0.361|
|64|53581|0.215|13397468|0.395|
|32|53333|0.216|14498630|0.365|
|2|52363|0.220|14225806|0.372|
|16|46080|0.250|12721153|0.416|
h4. Implementation decimate_b
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|2|52602|0.219|14187667|0.373|
|4|47213|0.244|13465648|0.393|
|8|44651|0.258|12027272|0.440|
|16|43636|0.264|11141052|0.475|
|32|42825|0.269|12540284|0.422|
|64|42666|0.270|11972850|0.442|
h4. Implementation resample_hq
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|4|4965|2.320|1056287|5.010|
|2|4768|2.416|989158|5.350|
|16|3529|3.264|1182041|4.477|
|64|3489|3.301|1232704|4.293|
|8|3480|3.310|1166409|4.537|
|32|3443|3.345|1148936|4.606|
h4. Implementation resample_mq
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|8|10331|1.115|2470588|2.142|
|4|9982|1.154|2186776|2.420|
|2|9365|1.230|1856842|2.850|
|16|7700|1.496|2515209|2.104|
|64|7417|1.553|2365668|2.237|
|32|7379|1.561|2312937|2.288|
h4. Implementation resample_lq
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|16|22068|0.522|5093358|1.039|
|8|21215|0.543|4895467|1.081|
|32|21021|0.548|5318592|0.995|
|4|19896|0.579|4285020|1.235|
|2|17668|0.652|3732016|1.418|
|64|16202|0.711|5040000|1.050|
h4. Implementation src
|Factor|Kfps (11520000 frames)|Clock time (s)|Frames per second (5292000 frames)|Clock time (s)|
|64|2423|4.753|636057|8.320|
|8|2361|4.879|614919|8.606|
|4|2292|5.026|610099|8.674|
|32|2179|5.286|670467|7.893|
|2|2176|5.294|516141|10.253|
|16|2119|5.435|668857|7.912|
h3. Resampler filter lengths
Filter lengths the qm-dsp Resamplers decided to use:
|Factor|Length (hq)|Length (mq)|Length (lq)|
|2|643|291|119|
|4|1285|579|237|
|8|2567|1155|471|
|16|5131|2307|939|
|32|10261|4613|1877|
|64|20519|9223|3751|
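The lengths above roughly double each time the factor doubles. That is what one would expect from a Kaiser-windowed design whose transition band is a fixed fraction of the output bandwidth. The Python sketch below uses Kaiser's standard length estimate to illustrate the scaling. The attenuation and transition-width values are made up for illustration and are not the qm-dsp Resampler's actual parameters, and @kaiser_taps@ is a hypothetical name.

```python
import math


def kaiser_taps(attenuation_db, transition_width):
    """Estimate the tap count of a Kaiser-windowed FIR lowpass filter.

    Uses Kaiser's empirical formula. transition_width is expressed as a
    fraction of the input Nyquist frequency (between 0 and 1).
    """
    order = (attenuation_db - 7.95) / (2.285 * math.pi * transition_width)
    return int(math.ceil(order + 1))


# Illustrative assumptions only (not qm-dsp's settings): 100 dB stopband
# attenuation, transition band equal to 10% of the output Nyquist frequency.
ATTENUATION_DB = 100.0
RELATIVE_WIDTH = 0.1

for factor in (2, 4, 8, 16, 32, 64):
    # The output Nyquist frequency shrinks by the factor, so the transition
    # band (as a fraction of the input Nyquist frequency) shrinks with it.
    width = RELATIVE_WIDTH / factor
    print(factor, kaiser_taps(ATTENUATION_DB, width))
```

Because the estimate is inversely proportional to the transition width, halving the output bandwidth about doubles the tap count, matching the trend in the table (for example 643 to 1285 taps from factor 2 to factor 4 at the hq setting).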