Chris Cannam, 2014-05-07 01:48 PM

h1. Speed

We want to make the plugin as fast as possible, but I think there's a case to be made for providing fast and slow modes (see [[Possibilities for Plugin Parameters]]).

In "fast" mode we should aim to produce a reasonable transcription faster than real time on any computer from the past five years or so. "Slow" mode has no particular speed constraint: it should simply be as fast an implementation as possible of the best results we can readily achieve.

See the "timing":/projects/silvet/repository/show/testdata/timing directory in the repository for timing tests. All of these were carried out on a ThinkPad T540p with an Intel i5-4330M CPU running 64-bit Linux.
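
For anyone reproducing such measurements from inside a test program, here is a minimal C++ sketch that records both wall-clock time and process CPU time for one call. The @Timing@ struct and @timeIt@ function are hypothetical names, not taken from the repository, and the sketch assumes the work to be timed can be wrapped in a callable.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Hypothetical helper (not from the Silvet repository): wall-clock and
// process CPU time for a single call.
struct Timing {
    double wallSec; // elapsed real ("wall-clock") time
    double cpuSec;  // CPU time used by the whole process; with glibc this
                    // covers all threads and includes both user and system time
};

Timing timeIt(const std::function<void()> &work) {
    using Clock = std::chrono::steady_clock;
    const std::clock_t cpuStart = std::clock();
    const Clock::time_point wallStart = Clock::now();
    work();
    const Clock::time_point wallEnd = Clock::now();
    const std::clock_t cpuEnd = std::clock();

    Timing t;
    t.wallSec = std::chrono::duration<double>(wallEnd - wallStart).count();
    t.cpuSec = static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    return t;
}
```

For a single-threaded run the two figures should be close; for a multi-threaded run CPU time can exceed wall-clock time, which is the gap between the elapsed and "user" figures reported for the OpenMP result.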

Work so far:

* Before optimisation, commit:ce64d11ef336 (release build) takes 104 sec to process a 43.5-second file, i.e. roughly 2.4 times the duration of the audio. (For reference, a debug build takes over 850 sec.)
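
One way to make the "faster than real time" goal for fast mode measurable is a real-time factor: processing time divided by audio duration, where a value below 1 means faster than real time. A small sketch; both function names are illustrative only.

```cpp
#include <cassert>

// Real-time factor: seconds of processing per second of audio.
// Values below 1.0 mean the transcription runs faster than real time.
double realTimeFactor(double processingSec, double audioSec) {
    assert(audioSec > 0.0);
    return processingSec / audioSec;
}

// Hypothetical check for the stated "fast" mode goal: strictly faster than real time.
bool meetsFastModeGoal(double processingSec, double audioSec) {
    return realTimeFactor(processingSec, audioSec) < 1.0;
}
```

For the pre-optimisation release build, @realTimeFactor(104.0, 43.5)@ is about 2.39; the 41 sec multi-threaded figure on the same machine gives about 0.94.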

* Experiments to find out where the time is spent:
** Removing the unused Vamp plugin outputs: no more than 1% difference
** Removing debug printouts: no more than 1% difference
** Adjusting the CQ resampler parameters to allow a lower SNR: no more than 1% difference
** Halving the number of EM iterations: reduces runtime by 43% (to 59 sec). If EM time scales linearly with the number of iterations, halving them saves half the EM time, so EM must account for around 86% (2 × 43%) of the total.
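
The 86% estimate rests on a simple model: total time is a fixed non-EM part plus an EM part proportional to the iteration count. If halving the iterations takes the runtime from T to T', the EM part must have been 2(T − T'). A sketch of that arithmetic (the function name is illustrative):

```cpp
#include <cassert>

// Model: total = nonEm + em, where em scales linearly with EM iterations.
// Halving iterations gives halvedSec = nonEm + em / 2, so em = 2 * (fullSec - halvedSec).
// Returns the estimated fraction of the full runtime spent in EM.
double emFractionFromHalving(double fullSec, double halvedSec) {
    assert(fullSec > 0.0 && halvedSec > 0.0 && halvedSec <= fullSec);
    return 2.0 * (fullSec - halvedSec) / fullSec;
}
```

With the measured 104 sec and 59 sec this gives about 0.865, matching the rounded figure of 86%. If EM cost were not linear in the iteration count (for example, if early iterations were cheaper), the estimate would be off accordingly.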

* Optimising EM:
** Storing the templates as double-precision instead of single-precision floats saves around 4% overall, for 100 sec
** (Alternatively, storing them as floats and using single-precision arithmetic throughout saves around 14%, but presumably produces different results; not pursued at this point)
** Using the bqvec library for raw vector allocation and manipulation instead of std::vector saves a further 10%, for 89 sec
** A couple of experiments aimed at getting the template arrays better aligned were unsuccessful
** Factoring out a further loop saves another 11%, for 78 sec
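
The following is not the Silvet EM code. It is a generic illustration, assuming the hot loop multiplies stored template values by per-note activations and accumulates in double precision, of why storing templates as @double@ and working over raw contiguous arrays can help: the inner loop needs no float-to-double conversion and walks memory sequentially. The names @reconstruct@, @templates@ and @activations@ are hypothetical, and bqvec's own functions are deliberately not used so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical EM-style reconstruction step (not Silvet's actual code):
// estimate[bin] = sum over notes of activations[note] * templates[note][bin].
// Templates are stored as double in one contiguous row-major buffer, so the
// inner loop needs no float->double conversion and runs over raw pointers.
void reconstruct(const double *templates,   // nNotes * nBins values
                 const double *activations, // nNotes values
                 double *estimate,          // nBins values (output)
                 std::size_t nNotes, std::size_t nBins) {
    for (std::size_t b = 0; b < nBins; ++b) {
        estimate[b] = 0.0;
    }
    for (std::size_t n = 0; n < nNotes; ++n) {
        const double a = activations[n];           // constant across the inner loop
        const double *row = templates + n * nBins; // this note's template
        for (std::size_t b = 0; b < nBins; ++b) {
            estimate[b] += a * row[b];             // sequential, vectorisable access
        }
    }
}
```

Callers can keep their data in @std::vector<double>@ and pass @.data()@; the function itself only sees raw pointers, which is the style a raw-vector library encourages.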

* Multi-threading:
** Using OpenMP to parallelise the loop through columns when calling out to EM roughly halves the runtime again (to 41 sec total), though the process now consumes 122 sec of "user" CPU time
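
A sketch of the multi-threading pattern, assuming each column's EM can be computed independently of the others, which is what makes the parallel loop safe. @processColumn@ is a hypothetical stand-in for the per-column EM call. If the code is compiled without OpenMP enabled (e.g. without @-fopenmp@ on GCC), the pragma is ignored and the loop runs serially with the same results.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for running EM on one column: here it just sums the
// column so that the example is self-contained.
double processColumn(const std::vector<double> &column) {
    double sum = 0.0;
    for (double v : column) {
        sum += v;
    }
    return sum;
}

// Process all columns, in parallel when compiled with OpenMP enabled.
// Each iteration reads only its own input column and writes only its own
// output slot, so no locking is needed.
std::vector<double> processAllColumns(const std::vector<std::vector<double>> &columns) {
    const int n = static_cast<int>(columns.size());
    std::vector<double> results(columns.size());
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        results[i] = processColumn(columns[i]);
    }
    return results;
}
```

Wall-clock time falls while CPU time rises because several threads run at once: 122 sec of user time over 41 sec elapsed corresponds to roughly three busy threads on average, on a CPU (the i5-4330M) with two cores and four hardware threads.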