h1. Speed

h2. Aims

We want to make the plugin as fast as possible, but I think there's a case to be made for providing fast and slow modes (see [[Possibilities for Plugin Parameters]]).

In "fast" mode we should aim to produce a reasonable transcription in faster than real time on any computer from the past five years or so. "Slow" mode has no particular speed constraint: it should simply be as fast an implementation as we can manage of the best results we can readily obtain.

See the "timing":/projects/silvet/repository/show/testdata/timing directory in the repo for timing tests. These are all carried out on a Thinkpad T540p with an Intel i5-4330M under 64-bit Linux.

h2. Work so far

 * commit:ce64d11ef336, pre-optimisation (release build): takes 104 seconds to process a 43.5-second file. (For reference, a debug build takes over 850 seconds.)

 * Experiments to test where the time is spent:
 ** commit:78a7bf247016 removing the unused Vamp plugin outputs: no more than 1% difference
 ** commit:f3bf6503e6c6 removing debug printouts: no more than 1% difference
 ** Adjusting the CQ resampler parameters to allow a lower SNR: no more than 1% difference
 ** commit:5314d3361dfb halving the number of EM iterations: reduces runtime by 43% (to 59 sec). If the cost is linear in the iteration count, the 45 sec saved by dropping half the iterations implies around 90 sec for the full set, so EM must be taking around 86% of the total (the loop structure is sketched below)
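
For context, the dominant cost is the EM stage, which runs a fixed number of iterations for every constant-Q column; the experiment above simply halved that constant. A minimal sketch of the loop structure, using placeholder names (Estimator, iterationCount) rather than the plugin's real classes:

<pre><code class="cpp">
// Sketch only: a fixed number of EM updates per constant-Q column.
// Estimator and iterationCount are placeholders, not the plugin's own names.
#include <vector>

struct Estimator {
    void iterate(const std::vector<double> &column) { /* one EM update (omitted) */ }
};

void processColumns(const std::vector<std::vector<double>> &columns)
{
    const int iterationCount = 20;   // illustrative value; halving it gave the 43% saving
    for (const auto &col : columns) {
        Estimator em;                // fresh estimate for each column
        for (int i = 0; i < iterationCount; ++i) {
            em.iterate(col);         // dominant cost: roughly 86% of the total runtime
        }
    }
}
</code></pre>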

 * Optimising EM:
 ** commit:97b77e7cb94c storing the templates as double instead of single-precision floats saves around 4% overall, for 100 sec
 ** (Alternatively, commit:840c0d703bbb storing them as floats and using single-precision arithmetic throughout saves around 14%, but presumably produces different results -- not pursued at this point)
 ** commit:19f6832fdc8a using the bqvec library for raw vector allocation and manipulation instead of std::vector saves a further 10%, for 89 sec (see the sketch after this list)
 ** A couple of experiments to get the template arrays better aligned failed
 ** commit:6890dea115c3 factoring out a further loop saves another 11%, for 78 sec
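
Roughly the flavour of the bqvec change: keep the template data in one flat raw allocation so the inner loops run over contiguous memory, instead of a vector of vectors. A sketch only, assuming bqvec's allocate_and_zero/deallocate helpers from Allocators.h; the plugin's real template structure and dimensions will differ:

<pre><code class="cpp">
// Illustrative only: a flat raw buffer for the templates in place of nested
// std::vectors. Member names and dimensions are assumptions, not Silvet's own.
// (Copying is not handled in this sketch.)
#include <bqvec/Allocators.h>

struct Templates {
    int count;      // number of templates
    int binCount;   // bins per template
    double *data;   // flat [count * binCount] array allocated via bqvec

    Templates(int count_, int binCount_) :
        count(count_), binCount(binCount_),
        data(breakfastquay::allocate_and_zero<double>(count_ * binCount_)) { }

    ~Templates() {
        breakfastquay::deallocate(data);
    }

    double *at(int i) { return data + i * binCount; }   // contiguous row access
};
</code></pre>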

 * Multi-threading:
 ** commit:df05f855f63b using OpenMP for the loop through columns when calling out to EM halves the runtime again (to 41 sec total), though now consuming 122 sec of "user" time (see the sketch after this list)
 ** the same code with OMP_NUM_THREADS=1 now runs in 78 sec
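
The parallelisation is the standard OpenMP pattern: the columns are processed independently, so the outer loop can be annotated directly. A minimal sketch with placeholder names (processColumn, processAllColumns), not the plugin's own code:

<pre><code class="cpp">
// Minimal illustration of the OpenMP change: parallelise the independent
// per-column EM calls. Names here are placeholders, not the Silvet code.
#include <vector>

void processColumn(std::vector<double> &column)
{
    // run the EM iterations for this column (omitted)
}

void processAllColumns(std::vector<std::vector<double>> &columns)
{
    // Each column is independent, so the loop parallelises cleanly. Wall-clock
    // time drops, but total CPU ("user") time rises with the thread count.
    #pragma omp parallel for
    for (int i = 0; i < (int)columns.size(); ++i) {
        processColumn(columns[i]);
    }
}
</code></pre>

The pragma only takes effect when built with OpenMP enabled (e.g. -fopenmp for gcc or clang); OMP_NUM_THREADS then controls the thread count at run time, which is how the single-threaded figure above was obtained.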

That work was merged to default, for a new baseline time of 41 sec.

 * Optimising EM again:
 ** commit:f25b8e7de0ed not processing templates that are out of range for an instrument: cuts the runtime to 24 sec, about 58% of the previous 41 sec baseline (see the sketch below)
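
The idea behind the range restriction, roughly: each instrument pack only covers part of the pitch range, so the EM update loops can skip the templates for notes the instrument cannot play. A sketch with assumed names (lowestNote, highestNote, updateTemplate are illustrative, not necessarily the plugin's own):

<pre><code class="cpp">
// Sketch only: restrict the per-template loop to the instrument's range.
// Member and function names here are assumptions, not the plugin's own.
struct InstrumentPack {
    int templateCount;  // total number of pitch templates
    int lowestNote;     // first template in range for this instrument
    int highestNote;    // last template in range for this instrument
};

static void updateTemplate(int n) { /* one EM update for template n (omitted) */ }

void updateAllTemplates(const InstrumentPack &pack)
{
    // Previously every template 0..templateCount-1 was updated on every
    // iteration; skipping the out-of-range ones removes their cost entirely.
    for (int n = pack.lowestNote; n <= pack.highestNote; ++n) {
        updateTemplate(n);
    }
}
</code></pre>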

h2. Other possibilities

 * Compare the quality of results using float arithmetic to those using doubles
 * Adaptively select the number of EM iterations -- if the process is converging more quickly, break off sooner (how to measure convergence? one possible shape is sketched after this list)
 * Optimise the constant-Q -- it wasn't a significant part of the runtime to start with, but is presumably becoming more significant now
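
On the adaptive iteration count: one plausible shape is to measure how much the estimate changes between iterations and stop once the relative change falls below a tolerance, capped at the current fixed maximum. A sketch only; the distance measure, the tolerance value, and the names are all assumptions:

<pre><code class="cpp">
// Sketch of adaptive stopping for EM: iterate until the estimate stops moving
// (relative change below a tolerance) or the usual maximum is reached.
// The 1e-4 tolerance and all names here are illustrative assumptions.
#include <vector>
#include <cmath>

static void emIterate(std::vector<double> &estimate)
{
    // one EM update of the estimate (omitted)
}

void runAdaptiveEM(std::vector<double> &estimate, int maxIterations)
{
    const double tolerance = 1e-4;
    std::vector<double> previous = estimate;

    for (int i = 0; i < maxIterations; ++i) {
        emIterate(estimate);

        double change = 0.0, magnitude = 0.0;
        for (size_t j = 0; j < estimate.size(); ++j) {
            change += std::fabs(estimate[j] - previous[j]);
            magnitude += std::fabs(estimate[j]);
        }
        if (magnitude > 0.0 && change < tolerance * magnitude) {
            break;   // converged early: skip the remaining iterations
        }
        previous = estimate;
    }
}
</code></pre>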