Chris Cannam, 2014-05-08 12:22 PM
1 | 1 | Chris Cannam | h1. Speed |
---|---|---|---|
2 | 1 | Chris Cannam | |
3 | 7 | Chris Cannam | h2. Aims |
4 | 7 | Chris Cannam | |
5 | 1 | Chris Cannam | We want to make the plugin as fast as possible, but I think there's a case to be made for providing fast and slow modes (see [[Possibilities for Plugin Parameters]]). |
6 | 1 | Chris Cannam | |
7 | 1 | Chris Cannam | In "fast" mode the aim should be to produce a reasonable transcription faster than real time on any computer from the past 5 years or so. "Slow" mode has no particular speed constraint: it should simply be as fast an implementation as possible of the best results we can easily achieve. |
8 | 1 | Chris Cannam | |
9 | 20 | Chris Cannam | See the "timing":/projects/silvet/repository/show/testdata/timing directory in the repo for timing tests, summarised below. See the end of the results file, and "slower computers" below, for some figures from older hardware. |
10 | 1 | Chris Cannam | |
11 | 1 | Chris Cannam | h2. Work so far |
12 | 20 | Chris Cannam | |
13 | 20 | Chris Cannam | Timings in this section are from the reference computer: a Thinkpad T540p, 2-core+HT 64-bit Intel i5-4330M under 64-bit Linux. |
14 | 1 | Chris Cannam | |
15 | 6 | Chris Cannam | * commit:ce64d11ef336, pre-optimisation (release build) takes 104 seconds to process a 43.5-second file. (For reference, a debug build takes over 850 seconds.) |
16 | 2 | Chris Cannam | |
17 | 5 | Chris Cannam | * Experiments to test where the time is spent: |
18 | 6 | Chris Cannam | ** commit:78a7bf247016 removing the unused Vamp plugin outputs: no more than 1% difference |
19 | 6 | Chris Cannam | ** commit:f3bf6503e6c6 removing debug printouts: no more than 1% difference |
20 | 5 | Chris Cannam | ** Adjusting the CQ resampler parameters to allow a lower SNR: no more than 1% difference |
21 | 6 | Chris Cannam | ** commit:5314d3361dfb halving the number of EM iterations: reduces runtime by 43% (to 59 sec). If EM runtime is linear in the number of iterations, halving them removes half of EM's share of the total, so EM must be taking around 86% of the total. |
22 | 5 | Chris Cannam | |
23 | 1 | Chris Cannam | * Optimising EM: |
24 | 6 | Chris Cannam | ** commit:97b77e7cb94c storing the templates as double instead of single-precision floats saves around 4% overall, for 100 sec |
25 | 6 | Chris Cannam | ** (Alternatively, commit:840c0d703bbb storing them as floats and using single-precision arithmetic throughout saves around 14%, but presumably produces different results -- not pursued at this point) |
26 | 6 | Chris Cannam | ** commit:19f6832fdc8a using bqvec library for raw vector allocation and manipulation instead of std::vector saves a further 10%, for 89 sec |
27 | 5 | Chris Cannam | ** A couple of experiments to try to get the template arrays better aligned failed |
28 | 6 | Chris Cannam | ** commit:6890dea115c3 factoring out a further loop saves another 11%, for 78 sec |
29 | 5 | Chris Cannam | |
30 | 5 | Chris Cannam | * Multi-threading: |
31 | 6 | Chris Cannam | ** commit:df05f855f63b using OpenMP for the loop through columns when calling out to EM halves the runtime again (for 41s total), though now consuming 122s "user" time |
32 | 6 | Chris Cannam | ** the same code with OMP_NUM_THREADS=1 now runs in 78 sec |
33 | 8 | Chris Cannam | |
34 | 9 | Chris Cannam | That work was merged to default, for a new baseline time of 41s. |
35 | 9 | Chris Cannam | |
36 | 9 | Chris Cannam | * Optimising EM again: |
37 | 1 | Chris Cannam | ** commit:f25b8e7de0ed not processing templates that are out of range for an instrument: cuts the runtime to 24 sec (about 58% of the previous 41 sec), or 41 sec single-threaded |
38 | 15 | Chris Cannam | |
39 | 15 | Chris Cannam | h2. Slower computers |
40 | 15 | Chris Cannam | |
41 | 21 | Chris Cannam | Thinkpad T40p, single-core 32-bit 1.6GHz Pentium-M. This is almost a decade old and quite a lot slower than any reasonable target for real-time performance. |
42 | 15 | Chris Cannam | |
43 | 15 | Chris Cannam | * commit:ce64d11ef336 (104s on reference computer): 541 sec |
44 | 23 | Chris Cannam | * commit:f25b8e7de0ed (24s on reference computer or 41s single-threaded): 415 sec (only 23% faster, or less than 11% of real-time speed) |
45 | 23 | Chris Cannam | * commit:f25b8e7de0ed in draft mode (no shifts) (14s on reference computer): 210 sec (21% of real-time speed) |
46 | 23 | Chris Cannam | |
47 | 22 | Chris Cannam | |
48 | 9 | Chris Cannam | |
49 | 8 | Chris Cannam | h2. Other possibilities |
50 | 8 | Chris Cannam | |
51 | 8 | Chris Cannam | * Compare the quality of results using float arithmetic to those using doubles |
52 | 1 | Chris Cannam | * Adaptively select the number of EM iterations -- if the process is converging more quickly, break off sooner (how to measure convergence?) |
53 | 11 | Chris Cannam | * Optimise the constant-Q -- it wasn't a very significant part of the runtime to start with, but is presumably becoming more significant now |
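
The adaptive iteration idea listed above leaves open how convergence should be measured. As one possible illustration only, the sketch below stops an iterative update once the largest change in the estimate between successive iterations falls below a tolerance, with a hard cap on the iteration count. The names @maxAbsChange@ and @iterateUntilStable@, the @step@ callback and the tolerance value are all hypothetical and are not taken from the Silvet code. Whether this kind of change measure is meaningful for the plugin's EM is exactly the open question.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not from the Silvet source): largest absolute
// element-wise change between two estimates of the same length.
double maxAbsChange(const std::vector<double> &prev,
                    const std::vector<double> &curr)
{
    double change = 0.0;
    for (std::size_t i = 0; i < prev.size(); ++i) {
        change = std::max(change, std::fabs(curr[i] - prev[i]));
    }
    return change;
}

// Applies `step` (standing in for one EM update, modifying the estimate
// in place) at most maxIterations times, breaking off early once the
// estimate changes by less than `tolerance` in a single iteration.
// Returns the number of iterations actually performed.
int iterateUntilStable(std::vector<double> &estimate,
                       const std::function<void(std::vector<double> &)> &step,
                       int maxIterations,
                       double tolerance)
{
    int done = 0;
    while (done < maxIterations) {
        // Assumption: copying the estimate is cheap relative to one update.
        const std::vector<double> previous = estimate;
        step(estimate);
        ++done;
        if (maxAbsChange(previous, estimate) < tolerance) {
            break;
        }
    }
    return done;
}
```

With a fixed iteration count as the cap, this can only reduce the number of updates, never increase it. The extra copy and comparison per iteration add some cost of their own, so any saving would need checking against the timing tests.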