view testdata/timing/results.txt @ 115:fbf9b824aaf3 bqvec-openmp
Report on last couple of tests
author | Chris Cannam |
---|---|
date | Wed, 07 May 2014 09:48:56 +0100 |
parents | 2169e7a448c5 |
children | c3c768ac4340 |
Thinkpad T540p i5-4330M @2.80GHz with 16GB RAM, plugged in
Arch Linux, gcc 4.8.2

Using sonic-annotator v1.0 (commit:41c4de1e05d8), release build

Debug flags: -g -fPIC
Release flags: -O3 -ffast-math -msse -mfpmath=sse -ftree-vectorize -fPIC
Release flags for qm-dsp also include -fomit-frame-pointer

The input file is 1-channel 16-bit PCM at 44100Hz, duration 0m43.5s.


DEBUG/RELEASE:

commit:ce64d11ef336, release build of Silvet, release build of qm-dsp

real 1m44.456s
user 1m44.343s
sys 0m0.210s

commit:ce64d11ef336, debug build of Silvet, release build of qm-dsp

real 14m16.124s
user 14m16.907s
sys 0m0.217s

commit:ce64d11ef336, release build of Silvet, debug build of qm-dsp

real 1m55.204s
user 1m55.053s
sys 0m0.253s

Subsequent tests use release builds of both.


VAMP FEATURE SUPPRESSION:

commit:7133f78ccbf6, as commit:ce64d11ef336 but with CQ output feature
return commented out

real 1m46.162s
user 1m46.093s
sys 0m0.157s

commit:78a7bf247016, as commit:ce64d11ef336 but with CQ output and FCQ
output feature return commented out

real 1m45.206s
user 1m45.153s
sys 0m0.147s

conclusion: no advantage in removing these


DEBUG PRINTOUTS:

commit:f3bf6503e6c6, as commit:ce64d11ef336 but with debug printouts
removed

real 1m43.744s
user 1m43.657s
sys 0m0.203s

conclusion: obviously we want to remove these eventually, but might as
well keep in during testing


EM ITERATIONS:

commit:5314d3361dfb, as commit:ce64d11ef336 but with only 6 EM
iterations instead of 12

real 0m59.055s
user 0m58.897s
sys 0m0.193s

conclusion: EM dominates the time taken, not CQ or note forming


CQ DECIMATOR CONFIGURATION:

Uncommitted revision (because changes are in CQ subrepo) that is as
commit:ce64d11ef336 but with resampler SNR=30 and BW=0.04 instead of
SNR=60 and BW=0.02

real 1m43.176s
user 1m43.067s
sys 0m0.190s

conclusion: supports the previous test


OPENMP:

commit:62b7be1226d5, as commit:ce64d11ef336 but with OpenMP parallel
"for" in the main EM iteration loop (4 cores)

real 0m56.400s
user 2m59.740s
sys 0m0.237s


EM TWEAKS:

commit:a0dedcbfa628, as commit:ce64d11ef336 but with variables hoisted
out of loops and consts added wherever applicable

real 1m44.548s
user 1m44.460s
sys 0m0.183s

conclusion: compiler already knows this stuff

commit:64b08cc12da0, as commit:ce64d11ef336 but with loops merged so
as theoretically to reduce intermediate calculations

real 3m46.969s
user 3m46.850s
sys 0m0.220s

commit:6075e92d63ab, as commit:64b08cc12da0 but with innermost loop
reverted to three loops with simple bodies instead of one with a more
complex body

real 1m44.767s
user 1m44.490s
sys 0m0.190s

commit:97b77e7cb94c, as commit:6075e92d63ab but with templates stored
as doubles instead of floats (doubling the size of the plugin binary)

real 1m40.135s
user 1m39.820s
sys 0m0.230s

commit:a6e136aaa202, as commit:97b77e7cb94c but with target vectors &
grids initialised to epsilon instead of copied & then overwritten
(this one also makes the intention clearer I think so is worth doing)

real 1m39.277s
user 1m39.000s
sys 0m0.183s

commit:840c0d703bbb, as commit:a6e136aaa202 but using single-precision
floats for all EM code (and templates). This is probably not wise
without separately testing the quality of the results but it's
interesting to compare

real 1m29.003s
user 1m28.697s
sys 0m0.197s


BQVEC:

commit:81eaba98985b, as commit:a6e136aaa202 but converted to use bqvec
for basic allocation etc; processing logic unchanged

real 1m37.320s
user 1m36.863s
sys 0m0.240s

commit:891cbcf1e4d2, as commit:81eaba98985b but with some calculations
vectorised [note: has silly bug]

real 1m24.961s
user 1m24.663s
sys 0m0.177s

commit:853b2d750688, as commit:891cbcf1e4d2 but with silly bug fixed

real 1m26.876s
user 1m26.387s
sys 0m0.267s

commit:9ecad4c9c2a2, as commit:853b2d750688 but using a couple of
bqvec calls in expectation function

real 1m9.153s
user 1m8.837s
sys 0m0.187s

(this seems unlikely -- what have I broken?)

commit:8259193b3b16, as commit:9ecad4c9c2a2 but avoiding some
allocations

real 1m10.631s
user 1m10.327s
sys 0m0.180s

(still broken?)

commit:19f6832fdc8a, as commit:9ecad4c9c2a2 but with the arguments to
v_add_with_gain supplied in the right order (that's what I'd broken!)

real 1m28.957s
user 1m28.437s
sys 0m0.213s


BQVEC and OPENMP:

commit:ac750e222ad3, result of merging openmp branch
commit:62b7be1226d into bqvec branch commit:19f6832fdc8a

real 0m44.650s
user 2m19.997s
sys 0m0.343s

commit:c4eae816bdb3, as commit:ac750e222ad3 but with some logic to
make using the shifts optional (though on by default). Performance
*should* be unchanged here.

real 0m43.979s
user 2m19.297s
sys 0m0.360s

commit:b2f0967cb8d1, as commit:c4eae816bdb3 but storing the templates
as float arrays and then pulling them out into individual
one-per-shift-factor double arrays each of which is explicitly
allocated with the proper alignment. Uses more memory, and the code is
ugly, but gets aligned starts for slightly more of the vector ops.

real 0m50.856s
user 2m44.937s
sys 0m0.463s
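

COMPARING THE TIMINGS (illustrative):

The figures above are easier to compare once they are converted to seconds and expressed relative to the baseline. The sketch below is not part of Silvet or its test scripts; the function names, dictionary labels, and the choice of runs are assumptions made for illustration. It computes the wall-clock speedup over commit:ce64d11ef336, the user/real ratio (a rough indication of how many cores were kept busy), and the run time as a multiple of the 43.5s input. It also estimates the share of time spent in EM from the 12-iteration and 6-iteration runs, under the assumption that EM cost is linear in the iteration count and everything else is fixed.

```python
import re

# "real" and "user" times copied from the notes above.
# The labels are chosen for this sketch, not taken from the repository.
RUNS = {
    "baseline (ce64d11ef336)": ("1m44.456s", "1m44.343s"),
    "6 EM iterations (5314d3361dfb)": ("0m59.055s", "0m58.897s"),
    "openmp (62b7be1226d5)": ("0m56.400s", "2m59.740s"),
    "bqvec (19f6832fdc8a)": ("1m28.957s", "1m28.437s"),
    "bqvec + openmp (ac750e222ad3)": ("0m44.650s", "2m19.997s"),
}

INPUT_DURATION = 43.5  # seconds of audio, as stated at the top of the notes


def parse_time(text):
    """Convert a `time`-style string such as '1m44.456s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarise(runs, baseline_key):
    """Return per-run metrics relative to the chosen baseline run."""
    base_real = parse_time(runs[baseline_key][0])
    rows = {}
    for label, (real, user) in runs.items():
        real_s = parse_time(real)
        user_s = parse_time(user)
        rows[label] = {
            "real_s": real_s,
            "speedup": base_real / real_s,         # wall-clock gain over baseline
            "cpu_per_wall": user_s / real_s,       # ~ average number of busy cores
            "x_input": real_s / INPUT_DURATION,    # run time per second of audio
        }
    return rows


def em_share(full_real, half_real):
    """Estimate the fraction of run time spent in EM.

    Assumes total = fixed + n * per_iteration, so halving the iteration
    count removes exactly half of the EM time.
    """
    full_s = parse_time(full_real)
    half_s = parse_time(half_real)
    return 2 * (full_s - half_s) / full_s


if __name__ == "__main__":
    for label, row in summarise(RUNS, "baseline (ce64d11ef336)").items():
        print(f"{label:32s} {row['real_s']:8.3f}s  "
              f"speedup {row['speedup']:.2f}x  "
              f"user/real {row['cpu_per_wall']:.2f}  "
              f"{row['x_input']:.2f}x input duration")
    share = em_share("1m44.456s", "0m59.055s")
    print(f"estimated EM share of baseline run: {share:.1%}")
```

Under these assumptions the OpenMP run comes out at roughly 1.85x the baseline with a user/real ratio of about 3.2, and the estimated EM share of the baseline run is roughly 87%, which is consistent with the conclusion recorded under EM ITERATIONS. The user/real ratio is only a coarse indicator, since it also counts time spent in the non-parallel parts of the run.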