Thinkpad T540p i5-4330M @2.80GHz with 16GB RAM, plugged in
Arch Linux, gcc 4.8.2
Using sonic-annotator v1.0 (commit:41c4de1e05d8), release build

Debug flags: -g -fPIC
Release flags: -O3 -ffast-math -msse -mfpmath=sse -ftree-vectorize -fPIC

Release flags for qm-dsp also include -fomit-frame-pointer

The input file is 1-channel 16-bit PCM at 44100Hz, duration 0m43.5s.


DEBUG/RELEASE:

commit:ce64d11ef336, release build of Silvet, release build of qm-dsp

real 1m44.456s
user 1m44.343s
sys 0m0.210s

commit:ce64d11ef336, debug build of Silvet, release build of qm-dsp

real 14m16.124s
user 14m16.907s
sys 0m0.217s

commit:ce64d11ef336, release build of Silvet, debug build of qm-dsp

real 1m55.204s
user 1m55.053s
sys 0m0.253s

Subsequent tests use release builds of both.

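To compare the three builds above at a glance, the "real" times can be converted to seconds and divided by the release/release baseline. This is only an illustrative sketch: the helper name parse_time and the dictionary labels are made up here and are not part of Silvet or sonic-annotator.

```python
import re

def parse_time(text):
    """Convert a shell `time` value such as '14m16.124s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Wall-clock ("real") times copied from the three runs above.
runs = {
    "release Silvet, release qm-dsp": "1m44.456s",
    "debug Silvet, release qm-dsp": "14m16.124s",
    "release Silvet, debug qm-dsp": "1m55.204s",
}

baseline = parse_time(runs["release Silvet, release qm-dsp"])
slowdown = {name: parse_time(value) / baseline for name, value in runs.items()}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.2f}x")
```

On these figures the debug build of Silvet is roughly 8.2x slower than the release build, while a debug build of qm-dsp costs only about 1.10x.
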

VAMP FEATURE SUPPRESSION:

commit:7133f78ccbf6, as commit:ce64d11ef336 but with CQ output feature
return commented out

real 1m46.162s
user 1m46.093s
sys 0m0.157s

commit:78a7bf247016, as commit:ce64d11ef336 but with CQ output and FCQ
output feature return commented out

real 1m45.206s
user 1m45.153s
sys 0m0.147s

conclusion: no advantage in removing these


DEBUG PRINTOUTS:

commit:f3bf6503e6c6, as commit:ce64d11ef336 but with debug printouts
removed

real 1m43.744s
user 1m43.657s
sys 0m0.203s

conclusion: obviously we want to remove these eventually, but might as
well keep them in during testing


EM ITERATIONS:

commit:5314d3361dfb, as commit:ce64d11ef336 but with only 6 EM
iterations instead of 12

real 0m59.055s
user 0m58.897s
sys 0m0.193s

conclusion: EM dominates the time taken, not CQ or note forming

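The conclusion above can be put into rough numbers from the two runs (12 and 6 iterations). The sketch below assumes that run time is a fixed cost plus a constant cost per EM iteration. That linear model is an assumption made here for illustration, not something the measurements prove.

```python
# Wall-clock ("real") times in seconds, from the runs above.
t_12_iterations = 1 * 60 + 44.456   # commit:ce64d11ef336, 12 EM iterations
t_6_iterations = 0 * 60 + 59.055    # commit:5314d3361dfb, 6 EM iterations

# Assumed linear model: total = fixed + iterations * per_iteration.
per_iteration = (t_12_iterations - t_6_iterations) / (12 - 6)
fixed = t_6_iterations - 6 * per_iteration
em_share = 12 * per_iteration / t_12_iterations

print(f"per EM iteration: {per_iteration:.2f}s")
print(f"everything else:  {fixed:.2f}s")
print(f"EM share at 12 iterations: {em_share:.0%}")
```

Under that assumption each EM iteration costs about 7.6s, everything else (CQ, note forming, I/O) about 13.7s, and EM accounts for roughly 87% of the 12-iteration run.
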

CQ DECIMATOR CONFIGURATION:

Uncommitted revision (because changes are in CQ subrepo) that is as
commit:ce64d11ef336 but with resampler SNR=30 and BW=0.04 instead of
SNR=60 and BW=0.02

real 1m43.176s
user 1m43.067s
sys 0m0.190s

conclusion: supports the previous test (a cheaper CQ resampler barely
changes the total, so CQ is not where the time goes)


OPENMP:

commit:62b7be1226d5, as commit:ce64d11ef336 but with OpenMP parallel
"for" in the main EM iteration loop (4 cores)

real 0m56.400s
user 2m59.740s
sys 0m0.237s

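In the OpenMP run the "user" time is much larger than the "real" time, because user time is summed over all threads. A small sketch of what the figures imply follows. The thread count of 4 is taken from the "(4 cores)" remark above, and the variable names are invented for this illustration.

```python
# Times in seconds, from the runs above.
baseline_real = 1 * 60 + 44.456   # commit:ce64d11ef336, serial
baseline_user = 1 * 60 + 44.343
openmp_real = 0 * 60 + 56.400     # commit:62b7be1226d5, OpenMP
openmp_user = 2 * 60 + 59.740
threads = 4                       # "4 cores", as stated in the notes

speedup = baseline_real / openmp_real        # wall-clock improvement
busy_cores = openmp_user / openmp_real       # average cores kept busy
efficiency = speedup / threads               # fraction of ideal scaling
cpu_overhead = openmp_user / baseline_user   # extra total CPU work

print(f"speedup:      {speedup:.2f}x")
print(f"busy cores:   {busy_cores:.2f}")
print(f"efficiency:   {efficiency:.0%}")
print(f"CPU overhead: {cpu_overhead:.2f}x")
```

So the wall-clock time improves by about 1.85x with just over three cores busy on average, while the total CPU time spent rises by about 1.72x.
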

EM TWEAKS:

commit:a0dedcbfa628, as commit:ce64d11ef336 but with variables hoisted
out of loops and consts added wherever applicable

real 1m44.548s
user 1m44.460s
sys 0m0.183s

conclusion: compiler already knows this stuff

commit:64b08cc12da0, as commit:ce64d11ef336 but with loops merged so
as theoretically to reduce intermediate calculations

real 3m46.969s
user 3m46.850s
sys 0m0.220s

commit:6075e92d63ab, as commit:64b08cc12da0 but with innermost loop
reverted to three loops with simple bodies instead of one with a more
complex body

real 1m44.767s
user 1m44.490s
sys 0m0.190s

commit:97b77e7cb94c, as commit:6075e92d63ab but with templates stored
as doubles instead of floats (doubling the size of the plugin binary)

real 1m40.135s
user 1m39.820s
sys 0m0.230s

commit:a6e136aaa202, as commit:97b77e7cb94c but with target vectors &
grids initialised to epsilon instead of copied & then overwritten
(this one also makes the intention clearer I think so is worth doing)

real 1m39.277s
user 1m39.000s
sys 0m0.183s

commit:840c0d703bbb, as commit:a6e136aaa202 but using single-precision
floats for all EM code (and templates). This is probably not wise
without separately testing the quality of the results but it's
interesting to compare

real 1m29.003s
user 1m28.697s
sys 0m0.197s

commit:91bb029a847a, as commit:a6e136aaa202 but with the series of
calculations reordered to match that in the recent bqvec code
commit:b2f0967cb8d1. Just testing whether it is the replacement of
std::vector or the reordering of vector operations that was saving the
time in bqvec branch.

real 2m52.922s
user 2m52.480s
sys 0m0.263s


BQVEC:

commit:81eaba98985b, as commit:a6e136aaa202 but converted to use bqvec
for basic allocation etc; processing logic unchanged

real 1m37.320s
user 1m36.863s
sys 0m0.240s

commit:891cbcf1e4d2, as commit:81eaba98985b but with some calculations
vectorised [note: has silly bug]

real 1m24.961s
user 1m24.663s
sys 0m0.177s

commit:853b2d750688, as commit:891cbcf1e4d2 but with silly bug fixed

real 1m26.876s
user 1m26.387s
sys 0m0.267s

commit:9ecad4c9c2a2, as commit:853b2d750688 but using a couple of
bqvec calls in expectation function

real 1m9.153s
user 1m8.837s
sys 0m0.187s

(this seems unlikely -- what have I broken?)

commit:8259193b3b16, as commit:9ecad4c9c2a2 but avoiding some
allocations

real 1m10.631s
user 1m10.327s
sys 0m0.180s

(still broken?)

commit:19f6832fdc8a, as commit:9ecad4c9c2a2 but with the arguments to
v_add_with_gain supplied in the right order (that's what I'd broken!)

real 1m28.957s
user 1m28.437s
sys 0m0.213s


BQVEC and OPENMP:

commit:ac750e222ad3, result of merging openmp branch
commit:62b7be1226d into bqvec branch commit:19f6832fdc8a

real 0m44.650s
user 2m19.997s
sys 0m0.343s

commit:c4eae816bdb3, as commit:ac750e222ad3 but with some logic to
make using the shifts optional (though on by default). Performance
*should* be unchanged here.

real 0m43.979s
user 2m19.297s
sys 0m0.360s

commit:b2f0967cb8d1, as commit:c4eae816bdb3 but storing the templates
as float arrays and then pulling them out into individual
one-per-shift-factor double arrays each of which is explicitly
allocated with the proper alignment. Uses more memory, and the code is
ugly, but gets aligned starts for slightly more of the vector ops.

real 0m50.856s
user 2m44.937s
sys 0m0.463s

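As a rough consistency check on the merged result (commit:ac750e222ad3), the speedups of the bqvec and OpenMP branches measured separately can be multiplied together and compared with the speedup actually observed after the merge. Treating the two gains as independent is an assumption made for this sketch. The two changes may well interact, so the product is only a ballpark figure.

```python
# Wall-clock ("real") times in seconds, from the runs above.
baseline = 1 * 60 + 44.456     # commit:ce64d11ef336
bqvec_only = 1 * 60 + 28.957   # commit:19f6832fdc8a
openmp_only = 0 * 60 + 56.400  # commit:62b7be1226d5
combined = 0 * 60 + 44.650     # commit:ac750e222ad3

bqvec_speedup = baseline / bqvec_only
openmp_speedup = baseline / openmp_only

# Assumed independence: the combined gain is the product of the two.
predicted = bqvec_speedup * openmp_speedup
observed = baseline / combined

print(f"bqvec alone:  {bqvec_speedup:.2f}x")
print(f"OpenMP alone: {openmp_speedup:.2f}x")
print(f"predicted:    {predicted:.2f}x")
print(f"observed:     {observed:.2f}x")
```

The merged build comes out at about 2.34x the baseline, a little better than the 2.17x the simple product would suggest.
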