annotate testdata/timing/results.txt @ 120:ab1d8efbb7b5 bqvec-openmp

More results
author Chris Cannam
date Wed, 07 May 2014 10:44:11 +0100
parents 36f58a539125
children 8bbb4a17f783
rev   line source
Chris@72 1
Chris@72 2 Thinkpad T540p i5-4330M @2.80GHz with 16GB RAM, plugged in
Chris@72 3 Arch Linux, gcc 4.8.2
Chris@72 4 Using sonic-annotator v1.0 (commit:41c4de1e05d8), release build
Chris@72 5
Chris@72 6 Debug flags: -g -fPIC
Chris@72 7 Release flags: -O3 -ffast-math -msse -mfpmath=sse -ftree-vectorize -fPIC
Chris@72 8
Chris@73 9 Release flags for qm-dsp also include -fomit-frame-pointer
Chris@73 10
Chris@73 11 The input file is 1-channel 16-bit PCM at 44100Hz, duration 0m43.5s.
Chris@72 12
Chris@72 13
Chris@72 14 DEBUG/RELEASE:
Chris@72 15
Chris@72 16 commit:ce64d11ef336, release build of Silvet, release build of qm-dsp
Chris@72 17
Chris@73 18 real 1m44.456s
Chris@73 19 user 1m44.343s
Chris@73 20 sys 0m0.210s
Chris@72 21
Chris@72 22 commit:ce64d11ef336, debug build of Silvet, release build of qm-dsp
Chris@72 23
Chris@73 24 real 14m16.124s
Chris@73 25 user 14m16.907s
Chris@73 26 sys 0m0.217s
Chris@72 27
Chris@72 28 commit:ce64d11ef336, release build of Silvet, debug build of qm-dsp
Chris@72 29
Chris@73 30 real 1m55.204s
Chris@73 31 user 1m55.053s
Chris@73 32 sys 0m0.253s
Chris@72 33
Chris@72 34 Subsequent tests use release builds of both.
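
The three runs above are easier to compare once the "time" output is converted to seconds. The following is a minimal Python sketch of that arithmetic, not part of the test setup. The helper name to_seconds is made up for illustration, and the inputs are the "real" figures and the input duration quoted above.

```python
import re

def to_seconds(stamp):
    """Convert a shell `time` value such as '1m44.456s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time format: " + stamp)
    return int(match.group(1)) * 60 + float(match.group(2))

# "real" times copied from the three runs above
release_both = to_seconds("1m44.456s")
debug_silvet = to_seconds("14m16.124s")
debug_qmdsp = to_seconds("1m55.204s")
audio_length = 43.5  # input duration in seconds (0m43.5s)

print("debug Silvet slowdown: %.2fx" % (debug_silvet / release_both))
print("debug qm-dsp slowdown: %.2fx" % (debug_qmdsp / release_both))
print("release run vs input duration: %.2fx" % (release_both / audio_length))
```

On these figures a debug build of Silvet is roughly 8.2 times slower than the release build, a debug build of qm-dsp costs about 10%, and the all-release run takes about 2.4 times the duration of the input.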
Chris@72 35
Chris@72 36
Chris@73 37 VAMP FEATURE SUPPRESSION:
Chris@73 38
Chris@78 39 commit:7133f78ccbf6, as commit:ce64d11ef336 but with CQ output feature
Chris@78 40 return commented out
Chris@78 41
Chris@78 42 real 1m46.162s
Chris@78 43 user 1m46.093s
Chris@78 44 sys 0m0.157s
Chris@78 45
Chris@78 46 commit:78a7bf247016, as commit:ce64d11ef336 but with CQ output and FCQ
Chris@78 47 output feature return commented out
Chris@78 48
Chris@78 49 real 1m45.206s
Chris@78 50 user 1m45.153s
Chris@78 51 sys 0m0.147s
Chris@78 52
Chris@78 53 conclusion: no advantage in removing these
Chris@78 54
Chris@78 55
Chris@78 56 DEBUG PRINTOUTS:
Chris@78 57
Chris@78 58 commit:f3bf6503e6c6, as commit:ce64d11ef336 but with debug printouts
Chris@78 59 removed
Chris@78 60
Chris@78 61 real 1m43.744s
Chris@78 62 user 1m43.657s
Chris@78 63 sys 0m0.203s
Chris@78 64
Chris@82 65 conclusion: obviously we want to remove these eventually, but we might
Chris@78 66 as well keep them in during testing
Chris@78 67
Chris@82 68
Chris@82 69 EM ITERATIONS:
Chris@82 70
Chris@82 71 commit:5314d3361dfb, as commit:ce64d11ef336 but with only 6 EM
Chris@82 72 iterations instead of 12
Chris@82 73
Chris@82 74 real 0m59.055s
Chris@82 75 user 0m58.897s
Chris@82 76 sys 0m0.193s
Chris@82 77
Chris@82 78 conclusion: EM dominates the time taken, not CQ or note forming
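
One rough way to read this result is to assume the run time is a fixed cost plus a constant cost per EM iteration. That linear model is an assumption, not something measured here. Under it, the 12-iteration baseline (commit:ce64d11ef336) and the 6-iteration run give:

```python
# "real" times in seconds: 1m44.456s with 12 EM iterations, 0m59.055s with 6
t12 = 104.456
t6 = 59.055

# Assumed model: total = fixed + per_iteration * iterations
per_iteration = (t12 - t6) / (12 - 6)
fixed = t6 - 6 * per_iteration
em_share = 12 * per_iteration / t12

print("per EM iteration: %.2f s" % per_iteration)
print("everything else:  %.2f s" % fixed)
print("EM share at 12 iterations: %.0f%%" % (100 * em_share))
```

That works out to about 7.6 s per iteration and about 13.7 s for everything else, so EM would account for roughly 87% of the baseline run if the model holds.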
Chris@82 79
Chris@82 80
Chris@82 81 CQ DECIMATOR CONFIGURATION:
Chris@82 82
Chris@82 83 Uncommitted revision (because changes are in CQ subrepo) that is as
Chris@82 84 commit:ce64d11ef336 but with resampler SNR=30 and BW=0.04 instead of
Chris@82 85 SNR=60 and BW=0.02
Chris@82 86
Chris@82 87 real 1m43.176s
Chris@82 88 user 1m43.067s
Chris@82 89 sys 0m0.190s
Chris@82 90
Chris@82 91 conclusion: almost no change, which supports the previous test (EM, not CQ, dominates)
Chris@82 92
Chris@108 93
Chris@108 94 OPENMP:
Chris@108 95
Chris@108 96 commit:62b7be1226d5, as commit:ce64d11ef336 but with OpenMP parallel
Chris@108 97 "for" in the main EM iteration loop (4 hardware threads on 2 cores)
Chris@108 98
Chris@108 99 real 0m56.400s
Chris@108 100 user 2m59.740s
Chris@108 101 sys 0m0.237s
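
The gap between "real" and "user" shows how much of the machine the parallel loop kept busy. A small Python sketch of that arithmetic follows, using the figures above and the release baseline from commit:ce64d11ef336. The variable names are illustrative only.

```python
# Serial baseline: real 1m44.456s, user 1m44.343s, sys 0m0.210s
serial_real = 104.456
serial_cpu = 104.343 + 0.210

# OpenMP run: real 0m56.400s, user 2m59.740s, sys 0m0.237s
omp_real = 56.400
omp_cpu = 179.740 + 0.237

speedup = serial_real / omp_real     # wall-clock improvement
busy_threads = omp_cpu / omp_real    # average threads' worth of CPU in use
cpu_overhead = omp_cpu / serial_cpu  # total CPU time relative to serial

print("wall-clock speedup: %.2fx" % speedup)
print("average busy threads: %.2f" % busy_threads)
print("CPU time vs serial: %.2fx" % cpu_overhead)
```

On these figures the OpenMP run finishes about 1.85 times sooner in wall-clock terms, keeps about 3.2 threads busy on average, and spends about 1.72 times the CPU time of the serial run. The figures alone do not say where that extra CPU time goes.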
Chris@108 102
Chris@108 103
Chris@108 104 EM TWEAKS:
Chris@108 105
Chris@108 106 commit:a0dedcbfa628, as commit:ce64d11ef336 but with variables hoisted
Chris@108 107 out of loops and consts added wherever applicable
Chris@108 108
Chris@108 109 real 1m44.548s
Chris@108 110 user 1m44.460s
Chris@108 111 sys 0m0.183s
Chris@108 112
Chris@108 113 conclusion: no change, the compiler already does this stuff
Chris@108 114
Chris@108 115 commit:64b08cc12da0, as commit:ce64d11ef336 but with loops merged,
Chris@108 116 which should in theory reduce intermediate calculations
Chris@108 117
Chris@108 118 real 3m46.969s
Chris@108 119 user 3m46.850s
Chris@108 120 sys 0m0.220s
Chris@108 121
Chris@108 122 commit:6075e92d63ab, as commit:64b08cc12da0 but with innermost loop
Chris@108 123 reverted to three loops with simple bodies instead of one with a more
Chris@108 124 complex body
Chris@108 125
Chris@108 126 real 1m44.767s
Chris@108 127 user 1m44.490s
Chris@108 128 sys 0m0.190s
Chris@108 129
Chris@108 130 commit:97b77e7cb94c, as commit:6075e92d63ab but with templates stored
Chris@108 131 as doubles instead of floats (doubling the size of the plugin binary)
Chris@108 132
Chris@108 133 real 1m40.135s
Chris@108 134 user 1m39.820s
Chris@108 135 sys 0m0.230s
Chris@108 136
Chris@108 137 commit:a6e136aaa202, as commit:97b77e7cb94c but with target vectors and
Chris@108 138 grids initialised to epsilon instead of copied and then overwritten
Chris@108 139 (this one also makes the intention clearer, I think, so is worth doing)
Chris@108 140
Chris@108 141 real 1m39.277s
Chris@108 142 user 1m39.000s
Chris@108 143 sys 0m0.183s
Chris@108 144
Chris@108 145 commit:840c0d703bbb, as commit:a6e136aaa202 but using single-precision
Chris@108 146 floats for all EM code (and templates). This is probably not wise
Chris@108 147 without separately testing the quality of the results but it's
Chris@108 148 interesting to compare
Chris@108 149
Chris@108 150 real 1m29.003s
Chris@108 151 user 1m28.697s
Chris@108 152 sys 0m0.197s
Chris@108 153
Chris@118 154 commit:91bb029a847a, as commit:a6e136aaa202 but with the series of
Chris@118 155 calculations reordered to match that in the recent bqvec code
Chris@118 156 commit:b2f0967cb8d1. This tests whether it was the replacement of
Chris@118 157 std::vector or the reordering of vector operations that saved the
Chris@118 158 time in the bqvec branch.
Chris@118 159
Chris@118 160 real 2m52.922s
Chris@118 161 user 2m52.480s
Chris@118 162 sys 0m0.263s
Chris@118 163
Chris@108 164
Chris@108 165 BQVEC:
Chris@108 166
Chris@108 167 commit:81eaba98985b, as commit:a6e136aaa202 but converted to use bqvec
Chris@108 168 for basic allocation etc; processing logic unchanged
Chris@108 169
Chris@108 170 real 1m37.320s
Chris@108 171 user 1m36.863s
Chris@108 172 sys 0m0.240s
Chris@108 173
Chris@108 174 commit:891cbcf1e4d2, as commit:81eaba98985b but with some calculations
Chris@108 175 vectorised [note: has silly bug]
Chris@108 176
Chris@108 177 real 1m24.961s
Chris@108 178 user 1m24.663s
Chris@108 179 sys 0m0.177s
Chris@108 180
Chris@108 181 commit:853b2d750688, as commit:891cbcf1e4d2 but with silly bug fixed
Chris@108 182
Chris@108 183 real 1m26.876s
Chris@108 184 user 1m26.387s
Chris@108 185 sys 0m0.267s
Chris@108 186
Chris@108 187 commit:9ecad4c9c2a2, as commit:853b2d750688 but using a couple of
Chris@108 188 bqvec calls in expectation function
Chris@108 189
Chris@108 190 real 1m9.153s
Chris@108 191 user 1m8.837s
Chris@108 192 sys 0m0.187s
Chris@108 193
Chris@108 194 (this seems unlikely -- what have I broken?)
Chris@108 195
Chris@108 196 commit:8259193b3b16, as commit:9ecad4c9c2a2 but avoiding some
Chris@108 197 allocations
Chris@108 198
Chris@108 199 real 1m10.631s
Chris@108 200 user 1m10.327s
Chris@108 201 sys 0m0.180s
Chris@108 202
Chris@108 203 (still broken?)
Chris@108 204
Chris@108 205 commit:19f6832fdc8a, as commit:9ecad4c9c2a2 but with the arguments to
Chris@108 206 v_add_with_gain supplied in the right order (that's what I'd broken!)
Chris@108 207
Chris@108 208 real 1m28.957s
Chris@108 209 user 1m28.437s
Chris@108 210 sys 0m0.213s
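
To illustrate how a call with arguments in the wrong order can run without complaint and still compute the wrong thing, here is a Python stand-in for an add-with-gain vector operation. It assumes the semantics dst[i] += src[i] * gain with the destination passed first, so check the bqvec headers for the real signature. Which arguments were actually swapped in the commits above is not recorded here. The sketch swaps destination and source as one possibility, and all array names are made up.

```python
def v_add_with_gain(dst, src, gain, count):
    """Stand-in for an add-with-gain vector op: dst[i] += src[i] * gain."""
    for i in range(count):
        dst[i] += src[i] * gain

# Intended call: accumulate the scaled template into the accumulator
accumulator = [1.0, 1.0, 1.0]
template = [0.5, 0.25, 0.0]
v_add_with_gain(accumulator, template, 2.0, 3)
print(accumulator)

# Swapped call: no error is raised, but the template is modified
# and the accumulator is left untouched
wrong_accumulator = [1.0, 1.0, 1.0]
wrong_template = [0.5, 0.25, 0.0]
v_add_with_gain(wrong_template, wrong_accumulator, 2.0, 3)
print(wrong_accumulator, wrong_template)
```

Nothing in the call itself flags the mistake, which fits the pattern above where only the implausible timing hinted that something was broken.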
Chris@108 211
Chris@108 212
Chris@108 213 BQVEC AND OPENMP:
Chris@108 214
Chris@108 215 commit:ac750e222ad3, result of merging openmp branch
Chris@108 216 commit:62b7be1226d into bqvec branch commit:19f6832fdc8a
Chris@108 217
Chris@108 218 real 0m44.650s
Chris@108 219 user 2m19.997s
Chris@108 220 sys 0m0.343s
Chris@118 221
Chris@118 222 commit:c4eae816bdb3, as commit:ac750e222ad3 but with some logic to
Chris@118 223 make using the shifts optional (though on by default). Performance
Chris@118 224 *should* be unchanged here.
Chris@118 225
Chris@118 226 real 0m43.979s
Chris@118 227 user 2m19.297s
Chris@118 228 sys 0m0.360s
Chris@118 229
Chris@118 230 commit:b2f0967cb8d1, as commit:c4eae816bdb3 but storing the templates
Chris@118 231 as float arrays and then pulling them out into individual
Chris@118 232 one-per-shift-factor double arrays each of which is explicitly
Chris@118 233 allocated with the proper alignment. Uses more memory, and the code is
Chris@118 234 ugly, but gets aligned starts for slightly more of the vector ops.
Chris@118 235
Chris@118 236 real 0m50.856s
Chris@118 237 user 2m44.937s
Chris@118 238 sys 0m0.463s
Chris@118 239
Chris@120 240 commit:6890dea115c3, as commit:c4eae816bdb3 but with a loop factored out
Chris@120 241
Chris@120 242 real 0m40.565s
Chris@120 243 user 2m3.883s
Chris@120 244 sys 0m0.307s
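
For orientation, this last run can be set against the starting point. A short Python sketch follows, comparing the "real" time of commit:6890dea115c3 with the release baseline of commit:ce64d11ef336 and with the 43.5 s input duration, all taken from the figures recorded in this file. The variable names are illustrative.

```python
baseline_real = 104.456  # 1m44.456s, commit:ce64d11ef336
final_real = 40.565      # 0m40.565s, commit:6890dea115c3
audio_length = 43.5      # seconds of input audio

overall_speedup = baseline_real / final_real
realtime_ratio = final_real / audio_length

print("speedup over baseline: %.1fx" % overall_speedup)
print("processing time / input duration: %.2f" % realtime_ratio)
```

That is a wall-clock speedup of about 2.6 times over the baseline, and a processing time of about 0.93 of the input duration, so this run is slightly faster than real time on this machine.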
Chris@120 245