Chris@72: 
Chris@72: Thinkpad T540p i5-4330M @2.80GHz with 16GB RAM, plugged in
Chris@72: Arch Linux, gcc 4.8.2
Chris@72: Using sonic-annotator v1.0 (commit:41c4de1e05d8), release build
Chris@72: 
Chris@72: Debug flags: -g -fPIC
Chris@72: Release flags: -O3 -ffast-math -msse -mfpmath=sse -ftree-vectorize -fPIC
Chris@72: 
Chris@73: Release flags for qm-dsp also include -fomit-frame-pointer
Chris@73: 
Chris@73: The input file is 1-channel 16-bit PCM at 44100Hz, duration 0m43.5s.
Chris@72: 
Chris@72: 
Chris@72: DEBUG/RELEASE:
Chris@72: 
Chris@72: commit:ce64d11ef336, release build of Silvet, release build of qm-dsp
Chris@72: 
Chris@73: real	1m44.456s
Chris@73: user	1m44.343s
Chris@73: sys	0m0.210s
Chris@72: 
Chris@72: commit:ce64d11ef336, debug build of Silvet, release build of qm-dsp
Chris@72: 
Chris@73: real	14m16.124s
Chris@73: user	14m16.907s
Chris@73: sys	0m0.217s
Chris@72: 
Chris@72: commit:ce64d11ef336, release build of Silvet, debug build of qm-dsp
Chris@72: 
Chris@73: real	1m55.204s
Chris@73: user	1m55.053s
Chris@73: sys	0m0.253s
Chris@72: 
Chris@72: Subsequent tests use release builds of both.
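
The ratios behind these three runs can be checked with a short script. This is a minimal sketch, not part of the original test setup: parse_time is a hypothetical helper that assumes durations in the "XmY.YYYs" form printed by the shell's time builtin, and the figures are the "real" times recorded above.

```python
import re

def parse_time(text):
    """Convert a duration such as '1m44.456s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# 'real' times copied from the three DEBUG/RELEASE runs
both_release = parse_time("1m44.456s")
silvet_debug = parse_time("14m16.124s")
qmdsp_debug = parse_time("1m55.204s")

# Slowdown of each debug configuration relative to the all-release build
silvet_slowdown = silvet_debug / both_release
qmdsp_slowdown = qmdsp_debug / both_release

print(f"debug Silvet: {silvet_slowdown:.2f}x slower")
print(f"debug qm-dsp: {qmdsp_slowdown:.2f}x slower")
```

On these figures a debug build of Silvet is roughly eight times slower than the release build, while a debug build of qm-dsp costs about 10%.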
Chris@72: 
Chris@72: 
Chris@73: VAMP FEATURE SUPPRESSION:
Chris@73: 
Chris@78: commit:7133f78ccbf6, as commit:ce64d11ef336 but with CQ output feature
Chris@78: return commented out
Chris@78: 
Chris@78: real	1m46.162s
Chris@78: user	1m46.093s
Chris@78: sys	0m0.157s
Chris@78: 
Chris@78: commit:78a7bf247016, as commit:ce64d11ef336 but with CQ output and FCQ
Chris@78: output feature return commented out
Chris@78: 
Chris@78: real	1m45.206s
Chris@78: user	1m45.153s
Chris@78: sys	0m0.147s
Chris@78: 
Chris@78: conclusion: no advantage in removing these
Chris@78: 
Chris@78: 
Chris@78: DEBUG PRINTOUTS:
Chris@78: 
Chris@78: commit:f3bf6503e6c6, as commit:ce64d11ef336 but with debug printouts
Chris@78: removed
Chris@78: 
Chris@78: real	1m43.744s
Chris@78: user	1m43.657s
Chris@78: sys	0m0.203s
Chris@78: 
Chris@82: conclusion: obviously we want to remove these eventually, but might as
Chris@78: well keep them in during testing
Chris@78: 
Chris@82: 
Chris@82: EM ITERATIONS:
Chris@82: 
Chris@82: commit:5314d3361dfb, as commit:ce64d11ef336 but with only 6 EM
Chris@82: iterations instead of 12
Chris@82: 
Chris@82: real	0m59.055s
Chris@82: user	0m58.897s
Chris@82: sys	0m0.193s
Chris@82: 
Chris@82: conclusion: EM dominates the time taken, not CQ or note forming
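
A rough way to see how much of the run is EM: the sketch below assumes that run time is a fixed cost plus a constant cost per EM iteration (an assumption, not something measured here) and fits that model through the 12-iteration and 6-iteration "real" times above.

```python
# 'real' times in seconds: 1m44.456s with 12 iterations, 0m59.055s with 6
t12 = 104.456
t6 = 59.055

# Assumed model: time = fixed + per_iteration * iterations
per_iteration = (t12 - t6) / (12 - 6)
fixed = t12 - 12 * per_iteration
em_share = 12 * per_iteration / t12

print(f"per EM iteration: {per_iteration:.2f}s")
print(f"everything else:  {fixed:.2f}s")
print(f"EM share at 12 iterations: {em_share:.0%}")
```

Under that linear assumption each EM iteration costs about 7.6s and EM accounts for roughly 87% of the 12-iteration run.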
Chris@82: 
Chris@82: 
Chris@82: CQ DECIMATOR CONFIGURATION:
Chris@82: 
Chris@82: Uncommitted revision (because changes are in CQ subrepo) that is as
Chris@82: commit:ce64d11ef336 but with resampler SNR=30 and BW=0.04 instead of
Chris@82: SNR=60 and BW=0.02
Chris@82: 
Chris@82: real	1m43.176s
Chris@82: user	1m43.067s
Chris@82: sys	0m0.190s
Chris@82: 
Chris@82: conclusion: consistent with the previous test; CQ is not where the
Chris@82: time goes
Chris@82: 
Chris@108: 
Chris@108: OPENMP:
Chris@108: 
Chris@108: commit:62b7be1226d5, as commit:ce64d11ef336 but with OpenMP parallel
Chris@108: "for" in the main EM iteration loop (4 hardware threads)
Chris@108: 
Chris@108: real	0m56.400s
Chris@108: user	2m59.740s
Chris@108: sys	0m0.237s
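
The gap between "real" and "user" in this run shows how much of the parallel hardware was kept busy. A sketch of that arithmetic, using the timings above and the commit:ce64d11ef336 release baseline; the variable names are illustrative only.

```python
# Seconds, from the OpenMP run (real 0m56.400s, user 2m59.740s, sys 0m0.237s)
omp_real, omp_user, omp_sys = 56.400, 179.740, 0.237
# Seconds, from the serial baseline (real 1m44.456s, user 1m44.343s, sys 0m0.210s)
base_real, base_user, base_sys = 104.456, 104.343, 0.210

# Wall-clock gain over the serial baseline
speedup = base_real / omp_real
# Average number of threads busy over the run
busy_threads = (omp_user + omp_sys) / omp_real
# Total CPU time relative to the serial baseline
cpu_overhead = (omp_user + omp_sys) / (base_user + base_sys)

print(f"wall-clock speedup: {speedup:.2f}x")
print(f"average busy threads: {busy_threads:.2f}")
print(f"CPU time vs baseline: {cpu_overhead:.2f}x")
```

On these figures wall-clock time drops by about 1.85x while total CPU time grows by about 1.7x, with a little over three threads busy on average.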
Chris@108: 
Chris@108: 
Chris@108: EM TWEAKS:
Chris@108: 
Chris@108: commit:a0dedcbfa628, as commit:ce64d11ef336 but with variables hoisted
Chris@108: out of loops and consts added wherever applicable
Chris@108: 
Chris@108: real	1m44.548s
Chris@108: user	1m44.460s
Chris@108: sys	0m0.183s
Chris@108: 
Chris@108: conclusion: compiler already knows this stuff
Chris@108: 
Chris@108: commit:64b08cc12da0, as commit:ce64d11ef336 but with loops merged so
Chris@108: as to reduce intermediate calculations, in theory
Chris@108: 
Chris@108: real	3m46.969s
Chris@108: user	3m46.850s
Chris@108: sys	0m0.220s
Chris@108: 
Chris@108: commit:6075e92d63ab, as commit:64b08cc12da0 but with innermost loop
Chris@108: reverted to three loops with simple bodies instead of one with a more
Chris@108: complex body
Chris@108: 
Chris@108: real	1m44.767s
Chris@108: user	1m44.490s
Chris@108: sys	0m0.190s
Chris@108: 
Chris@108: commit:97b77e7cb94c, as commit:6075e92d63ab but with templates stored
Chris@108: as doubles instead of floats (doubling the size of the plugin binary)
Chris@108: 
Chris@108: real	1m40.135s
Chris@108: user	1m39.820s
Chris@108: sys	0m0.230s
Chris@108: 
Chris@108: commit:a6e136aaa202, as commit:97b77e7cb94c but with target vectors &
Chris@108: grids initialised to epsilon instead of copied & then overwritten
Chris@108: (this one also makes the intention clearer I think so is worth doing)
Chris@108: 
Chris@108: real	1m39.277s
Chris@108: user	1m39.000s
Chris@108: sys	0m0.183s
Chris@108: 
Chris@108: commit:840c0d703bbb, as commit:a6e136aaa202 but using single-precision
Chris@108: floats for all EM code (and templates). This is probably not wise
Chris@108: without separately testing the quality of the results but it's
Chris@108: interesting to compare
Chris@108: 
Chris@108: real	1m29.003s
Chris@108: user	1m28.697s
Chris@108: sys	0m0.197s
Chris@108: 
Chris@118: commit:91bb029a847a, as commit:a6e136aaa202 but with the series of
Chris@118: calculations reordered to match that in the recent bqvec code
Chris@118: commit:b2f0967cb8d1. Just testing whether it is the replacement of
Chris@118: std::vector or the reordering of vector operations that was saving the
Chris@118: time in bqvec branch.
Chris@118: 
Chris@118: real	2m52.922s
Chris@118: user	2m52.480s
Chris@118: sys	0m0.263s
Chris@118: 
Chris@108: 
Chris@108: BQVEC:
Chris@108: 
Chris@108: commit:81eaba98985b, as commit:a6e136aaa202 but converted to use bqvec
Chris@108: for basic allocation etc; processing logic unchanged
Chris@108: 
Chris@108: real	1m37.320s
Chris@108: user	1m36.863s
Chris@108: sys	0m0.240s
Chris@108: 
Chris@108: commit:891cbcf1e4d2, as commit:81eaba98985b but with some calculations
Chris@108: vectorised [note: has silly bug]
Chris@108: 
Chris@108: real	1m24.961s
Chris@108: user	1m24.663s
Chris@108: sys	0m0.177s
Chris@108: 
Chris@108: commit:853b2d750688, as commit:891cbcf1e4d2 but with silly bug fixed
Chris@108: 
Chris@108: real	1m26.876s
Chris@108: user	1m26.387s
Chris@108: sys	0m0.267s
Chris@108: 
Chris@108: commit:9ecad4c9c2a2, as commit:853b2d750688 but using a couple of
Chris@108: bqvec calls in expectation function
Chris@108: 
Chris@108: real	1m9.153s
Chris@108: user	1m8.837s
Chris@108: sys	0m0.187s
Chris@108: 
Chris@108: (this seems unlikely -- what have I broken?)
Chris@108: 
Chris@108: commit:8259193b3b16, as commit:9ecad4c9c2a2 but avoiding some
Chris@108: allocations
Chris@108: 
Chris@108: real	1m10.631s
Chris@108: user	1m10.327s
Chris@108: sys	0m0.180s
Chris@108: 
Chris@108: (still broken?)
Chris@108: 
Chris@108: commit:19f6832fdc8a, as commit:9ecad4c9c2a2 but with the arguments to
Chris@108: v_add_with_gain supplied in the right order (that's what I'd broken!)
Chris@108: 
Chris@108: real	1m28.957s
Chris@108: user	1m28.437s
Chris@108: sys	0m0.213s
Chris@108: 
Chris@108: 
Chris@108: BQVEC and OPENMP:
Chris@108: 
Chris@108: commit:ac750e222ad3, result of merging openmp branch
Chris@108: commit:62b7be1226d into bqvec branch commit:19f6832fdc8a
Chris@108: 
Chris@108: real	0m44.650s
Chris@108: user	2m19.997s
Chris@108: sys	0m0.343s
Chris@118: 
Chris@118: commit:c4eae816bdb3, as commit:ac750e222ad3 but with some logic to
Chris@118: make using the shifts optional (though on by default). Performance
Chris@118: *should* be unchanged here.
Chris@118: 
Chris@118: real	0m43.979s
Chris@118: user	2m19.297s
Chris@118: sys	0m0.360s
Chris@118: 
Chris@118: commit:b2f0967cb8d1, as commit:c4eae816bdb3 but storing the templates
Chris@118: as float arrays and then pulling them out into individual
Chris@118: one-per-shift-factor double arrays each of which is explicitly
Chris@118: allocated with the proper alignment. Uses more memory, and the code is
Chris@118: ugly, but gets aligned starts for slightly more of the vector ops.
Chris@118: 
Chris@118: real	0m50.856s
Chris@118: user	2m44.937s
Chris@118: sys	0m0.463s
Chris@118: 
Chris@120: commit:6890dea115c3, as commit:c4eae816bdb3 but with a loop factored out
Chris@120: 
Chris@120: real	0m40.565s
Chris@120: user	2m3.883s
Chris@120: sys	0m0.307s
Chris@120: 
Chris@124: commit:230920148ee5, as commit:6890dea115c3 but with a simpler openmp loop
Chris@124: 
Chris@124: real	0m39.761s
Chris@124: user	2m3.093s
Chris@124: sys	0m0.347s
Chris@124: 
Chris@129: commit:df05f855f63b, same code as commit:230920148ee5 but with bqvec
Chris@129: as subrepo (just checking)
Chris@129: 
Chris@129: real	0m40.799s
Chris@129: user	2m2.603s
Chris@129: sys	0m0.313s
Chris@129: 
Chris@129: commit:df05f855f63b again, with OMP_NUM_THREADS=1
Chris@129: 
Chris@129: real	1m18.265s
Chris@129: user	1m17.707s
Chris@129: sys	0m0.223s
Chris@129: 
Chris@131: 
Chris@131: FURTHER WORK FOLLOWING BQVEC and OPENMP MERGE:
Chris@131: 
Chris@131: commit:f25b8e7de0ed, as commit:df05f855f63b but not processing
Chris@131: templates that are out of range for an instrument (since they should
Chris@131: be all zeros anyway)
Chris@131: 
Chris@131: real	0m23.640s
Chris@131: user	0m59.903s
Chris@131: sys	0m0.277s
Chris@131: 
Chris@133: commit:f25b8e7de0ed in draft mode (no shifts):
Chris@133: 
Chris@133: real	0m13.970s
Chris@133: user	0m21.670s
Chris@133: sys	0m0.260s
Chris@133: 
Chris@132: 
Chris@132: COMPARATIVE TIMINGS from OTHER COMPUTERS:
Chris@132: 
Chris@132: Thinkpad T40p Pentium-M (Centrino) @1.6GHz with 1.5GB RAM, plugged in
Chris@132: Arch Linux, gcc 4.8.2
Chris@132: Using sonic-annotator v0.8, release build
Chris@132: 
Chris@132: Release flags: -O3 -ffast-math -msse -mfpmath=sse -ftree-vectorize -fPIC
Chris@132: 
Chris@132: commit:f25b8e7de0ed
Chris@132: 
Chris@132: real	6m54.670s
Chris@132: user	6m31.817s
Chris@132: sys	0m14.753s
Chris@132: 
Chris@132: commit:ce64d11ef336
Chris@132: 
Chris@132: real	9m0.637s
Chris@132: user	8m16.800s
Chris@132: sys	0m25.877s
Chris@132: 
Chris@132: commit:f25b8e7de0ed with -msse2 -march=pentium-m added to compiler flags:
Chris@132: 
Chris@132: real	7m4.231s
Chris@132: user	6m41.760s
Chris@132: sys	0m13.807s
Chris@132: 
Chris@132: commit:f25b8e7de0ed in draft mode (no shifts):
Chris@132: 
Chris@132: real	3m30.218s
Chris@132: user	3m10.527s
Chris@132: sys	0m15.887s
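
For comparison, the gain from commit:ce64d11ef336 to commit:f25b8e7de0ed can be worked out per machine. A sketch using the "real" times recorded in these notes; the dictionary layout is just for illustration.

```python
# 'real' times in seconds for the two commits on each machine
timings = {
    "T540p": {"ce64d11ef336": 104.456, "f25b8e7de0ed": 23.640},
    "T40p": {"ce64d11ef336": 540.637, "f25b8e7de0ed": 414.670},
}

# Wall-clock speedup of the later commit over the earlier one, per machine
speedups = {
    machine: runs["ce64d11ef336"] / runs["f25b8e7de0ed"]
    for machine, runs in timings.items()
}

for machine, factor in speedups.items():
    print(f"{machine}: {factor:.2f}x faster")
```

That gives about 4.4x on the T540p against about 1.3x on the T40p. The Pentium-M is a single-core CPU, so the OpenMP changes would not be expected to help there.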
Chris@132: