Chris Cannam, 2014-07-23 03:52 PM

h1. Piano Evaluation for Level Normalisation

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

This page tests a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.

h3. Input files

|Filename|Approx. signal max|
|@31.wav@|0.57|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|0.12|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|0.33|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|0.13|
|@mz_333_1MINp_align.wav@|0.10|

The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter. But for the threshold to be meaningful, input levels need to be approximately predictable.

h3. Methods

|Name|Hg revision|Description|
|@norm@|commit:d721a17f3e14|Normalise to 0.50 max before running plugin (can't do this in plugin: it's here as the reference case)|
|@as-is@|commit:d721a17f3e14|No normalisation|
|@to-date@|commit:d9b688700819|Track max signal level _so far_, adjust each sample so that max is at 0.50|
|@r2@,@r3@,@r4@,@r5@,@r6@|commit:b5a8836dd2a4|Preprocess with "Flatten Dynamics":/projects/flattendynamics at 0.02, 0.03, 0.04, 0.05, 0.06 target RMS levels respectively|
|@q8@|commit:4ac067799e0b|With "Flatten Dynamics second attempt":/projects/flattendynamics/wiki/Wiki with max RMS targeted to 0.08|
|@t4@|commit:d67fae2bb29e|With Flatten Dynamics attempt 2a with max RMS targeted to 0.04|
|@u4@|commit:70773820e719|With Flatten Dynamics attempt 2b with max RMS targeted to 0.04|
|@s5@|commit:1d5258a37cdd|Drop back to slightly simpler version (see discussion below)|
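
As a rough illustration of the difference between the @norm@ and @to-date@ rows above, the following Python sketch applies each idea to a list of samples. It is not the plugin code: the function names are made up for this page, and the handling of silent input is an assumption.

```python
def normalise_whole(samples, target=0.5):
    """Like @norm@: scale the complete signal so that its peak sits at target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # assumption: leave pure silence untouched
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=0.5):
    """Like @to-date@: scale each sample by the largest level seen so far."""
    out = []
    peak = 0.0
    for s in samples:
        peak = max(peak, abs(s))
        # Only the past is known here, so early samples are scaled by a
        # smaller peak than the one the whole-signal version would use.
        out.append(s * target / peak if peak > 0.0 else 0.0)
    return out
```

For the input @[0.1, -0.2, 0.05]@ the whole-signal version gives approximately @[0.25, -0.5, 0.125]@, while the running version gives approximately @[0.5, -0.5, 0.125]@: anything that arrives before the eventual peak is boosted more than it would be in the reference case.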

h3. Results

Reporting only the note onset F-measure for the first 30 seconds of each piece.

|Filename|@norm@|@as-is@|@to-date@|@r2@|@r3@|@r4@|@r5@|@r6@|@q8@|@t4@|@u4@|@s5@|
|@31.wav@|50|33|40|45|47|48|45|43|42|49|45|45|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|15|62|64|85|87|87|86|81|86|87|88|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|31|31|11|25|31|32|31|32|34|35|33|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|16|61|50|57|67|74|75|70|69|68|71|
|@mz_333_1MINp_align.wav@|66|3|58|42|60|64|66|63|66|63|65|66|

The precision (_proportion of correct onsets among detected onsets, or 1 minus the false discovery rate_) and recall (_proportion of correctly-detected onsets among all ground-truth onsets, or true-positive rate_) vary as you would hope:

 * when the resulting audio level is quieter than the @norm@ case, precision is high and recall is low, but the F-measure is worse than the @norm@ case
 * when the resulting audio level is louder than the @norm@ case, precision is low and recall is high, and the F-measure is still worse than the @norm@ case
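
The precision, recall and F-measure definitions above can be sketched in Python as follows. This page does not say how onsets are matched to the ground truth or how the F-measure is weighted, so the sketch starts from already-matched counts and assumes the usual balanced F-measure; the function name is made up for illustration.

```python
def onset_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from matched onset counts."""
    detected = true_positives + false_positives    # everything the plugin reported
    reference = true_positives + false_negatives   # everything in the ground truth
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / reference if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall (assumed).
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, 8 correct onsets with 2 spurious and 8 missed gives precision 0.8, recall 0.5 and an F-measure of about 0.62, the "quiet input" pattern described in the first bullet.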

This suggests that our threshold (which happens to be 6) is moderately well-suited to the @norm@ case, at least for optimising F-measure (though this might not be the most perceptually useful measure).

The best results above (apart from @norm@) seem to be @r5@ and @u4@. Let's try to refine the parameters for each of those and see if any patterns emerge.

h4. Flatten Dynamics fine-tuning

The adjustable parameters within @r5@, with their defaults, are:

|Parameter|Description|Default|
|@historySeconds@|Length of RMS window|4.0 sec|
|@catchUpSeconds@|Length of gain slide window|0.5 sec|
|@targetRMS@|Target RMS value|0.05|
|@maxGain@|Hard limit on gain|20.0|

The @targetRMS@ is the one we have been varying across @r2@, @r3@ etc; for @r5@ it is fixed at 0.05. We don't need to test @maxGain@ variation.
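
To make the roles of these four parameters concrete, here is a minimal Python sketch of a gain control driven by a sliding RMS window. It is an assumption about how parameters of this kind could interact, not the Flatten Dynamics code: the function name is hypothetical, and a simple exponential approach stands in for whatever gain slide the real implementation uses.

```python
import math
from collections import deque


def flatten_sketch(samples, rate, history_seconds=4.0, catch_up_seconds=0.5,
                   target_rms=0.05, max_gain=20.0):
    """Scale samples so the RMS over the recent history approaches target_rms."""
    history_len = max(1, int(history_seconds * rate))
    catch_up_len = max(1, int(catch_up_seconds * rate))
    window = deque()      # squared samples currently inside the RMS window
    sum_squares = 0.0
    gain = 1.0
    out = []
    for s in samples:
        window.append(s * s)
        sum_squares += s * s
        if len(window) > history_len:
            sum_squares -= window.popleft()
        # Guard against tiny negative values caused by floating-point drift.
        rms = math.sqrt(max(sum_squares, 0.0) / len(window))
        # Gain that would hit the target exactly, capped by the hard limit.
        wanted = min(max_gain, target_rms / rms) if rms > 0.0 else max_gain
        # Move part of the way toward the wanted gain on every sample.
        gain += (wanted - gain) / catch_up_len
        out.append(s * gain)
    return out
```

With a steady input at level 0.01 the gain settles at 5, bringing the output to 0.05; with a steady input at 0.001 the hard limit of 20 takes over and the output only reaches 0.02.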

Here @r5hNcM@ represents the @r5@ method with @historySeconds@ = N and @catchUpSeconds@ = M/10, so @r5@ is the same as @r5h4c05@. The @r5@ test was run again for this table, hence the variation from the results above.
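
If the variant names need to be decoded in a script, the convention above could be handled along these lines. The helper name is made up for illustration.

```python
import re


def parse_r5_variant(name):
    """Decode an r5hNcM name into (historySeconds, catchUpSeconds)."""
    match = re.fullmatch(r"r5h(\d+)c(\d+)", name)
    if match is None:
        raise ValueError(f"not an r5hNcM name: {name!r}")
    history_seconds = float(match.group(1))
    catch_up_seconds = int(match.group(2)) / 10  # M is in tenths of a second
    return history_seconds, catch_up_seconds
```

For instance, @parse_r5_variant("r5h4c05")@ yields the defaults 4.0 and 0.5, and @r5h4c10@ decodes to a catch-up time of 1.0 seconds.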

|Filename|@norm@|@r5@|@r5h2c05@|@r5h5c05@|@r5h6c05@|@r5h8c05@|@r5h4c01@|@r5h4c10@|
|@31.wav@|50|47|38|47|48|46|46|53|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|87|87|87|87|88|86|88|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|32|33|32|29|31|32|31|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|73|66|72|76|73|73|73|
|@mz_333_1MINp_align.wav@|66|66|64|64|66|63|65|66|

The adjustable parameters within @u4@, with their defaults, are:

|Parameter|Description|Default|
|@longTermSeconds@|Length of long-term RMS window|4.0 sec|
|@shortTermSeconds@|Length of short-term RMS window|1.0 sec|
|@catchUpSeconds@|Length of gain slide window|0.2 sec|
|@targetMaxRMS@|Target max RMS value|0.04|
|@rmsMaxDecay@|Fallback multiplier for max RMS per sample|0.999|
|@squashFactor@|Exponent to skew 0 to 1 range toward top of range|0.3|
|@maxGain@|Hard limit on gain|20.0|
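
The @squashFactor@ description suggests a power-law mapping. As a minimal sketch, assuming the exponent is applied directly to a value between 0 and 1 (which value @u4@ actually squashes is not stated on this page, and the function name is hypothetical):

```python
def squash(x, squash_factor=0.3):
    """Raise a value in the 0 to 1 range to the power squash_factor."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    # Exponents below 1 push mid-range values up toward 1;
    # an exponent of exactly 1 leaves the value unchanged.
    return x ** squash_factor
```

With the default of 0.3 an input of 0.5 maps to about 0.81, while a factor of 1.0 leaves it at 0.5, so a factor of 1.0 amounts to switching the squash off.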

Start by varying @squashFactor@ (the column headings give its value) with the others at defaults:

|Filename|@norm@|@r5@|0.1|0.3|0.5|1.0|
|@31.wav@|50|47|42|40|41|45|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|87|81|82|82|85|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|32|29|30|33|30|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|73|59|64|68|63|
|@mz_333_1MINp_align.wav@|66|66|65|67|64|59|

The 0.3 results are mostly worse than the @u4@ results obtained earlier, by 4 to 5 points on four of the five files, even though this is the same code. Variance is evidently high.

I don't think @u4@ is showing good enough results to justify its complexity over the global-only @r5@ code, and the squash factor seems to offer little.

Let's supersede the @u@-series with an @s@-series that uses the long-term window (only) from @r5@, but with some decay in the max RMS value to account for pieces that alternate between loud and soft. Parameters:

|Parameter|Description|Default|
|@historySeconds@|Length of long-term RMS window|4.0 sec|
|@catchUpSeconds@|Length of gain slide window|0.2 sec|
|@targetMaxRMS@|Target max RMS value|0.05|
|@rmsMaxDecay@|Fallback multiplier for max RMS per sample|0.999|
|@maxGain@|Hard limit on gain|20.0|
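
One way to read the @rmsMaxDecay@ row is as a running maximum that is multiplied by the decay factor at every step, so that a loud passage stops dominating once the music has gone quiet. The Python sketch below assumes exactly that reading; it is not taken from the Flatten Dynamics source, and the function name is hypothetical.

```python
def decaying_max(rms_values, decay=0.999):
    """Track the max of a series, letting the held max fall back over time."""
    tracked = 0.0
    out = []
    for rms in rms_values:
        # Either the new value takes over, or the old max decays a little.
        tracked = max(rms, tracked * decay)
        out.append(tracked)
    return out
```

With @decay=1.0@ the maximum never falls back, so a single loud moment fixes the level for the rest of the piece. With 0.999 the held value halves after roughly 690 steps.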

We have not yet tuned the target RMS for this, never mind the other parameters. Here is the effect of varying target RMS:

|Filename|@norm@|@r5@|@s3@|@s4@|@s5@|@s6@|@s7@|
|@31.wav@|50|47|45|46|42|44|45|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|87|84|84|83|81|76|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|32|21|31|33|30|30|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|73|57|64|68|66|63|
|@mz_333_1MINp_align.wav@|66|66|56|60|63|63|63|

Varying the fallback multiplier (@rmsMaxDecay@) for @s5@, where the @s5@ column uses the default of 0.999:

|Filename|@norm@|@r5@|@s5@|0.9|0.99|1.0|
|@31.wav@|50|47|42|44|45|47|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|87|83|83|84|83|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|32|33|31|29|1(??)|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|73|68|67|69|57|
|@mz_333_1MINp_align.wav@|66|66|63|63|63|57|

h4. For different piano template sets

The above results are all generated using four piano templates, numbered 1-3 plus @pianorwc@.

Here are results using the @norm@ and @as-is@ methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.

The template turns out not to make an enormous difference, perhaps because these recordings contain nothing but piano?

|Filename|@norm@/all|@as-is@/all|@norm@/3of4|@as-is@/3of4|@norm@/1|@as-is@/1|@norm@/2|@as-is@/2|@norm@/3|@as-is@/3|@norm@/rwc|@as-is@/rwc|
|@31.wav@|50|33|51|30|50|34|44|42|50|32|56|36|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|15|86|16|86|24|75|20|73|10|71|18|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|31|32|32|31|22|29|31|35|34|32|28|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|16|71|19|71|12|68|14|72|17|70|15|
|@mz_333_1MINp_align.wav@|66|3|68|1|63|4|67|2|67|1|63|3|

h4. For "generic" template set

The above results all use template sets with only piano templates in them.

Here are results using the @norm@ and @as-is@ methods, but with the full set of instrument templates (four pianos plus all the rest).

|Filename|@norm@|@as-is@|
|@31.wav@|49|37|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|79|34|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|31|28|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|67|16|
|@mz_333_1MINp_align.wav@|63|5|

h4. Cross-checking with non-piano test data

The results need to be roughly comparable with those obtained from pre-normalised data using other datasets as well as the piano one. Here is a subset of the TRIOS dataset. The @norm@ result is the one obtained from the plugin prior to doing this work, using pre-normalised data.

The @mirex@ result is the one from the MIREX 2012 submission in MATLAB. Note that this always uses all instrument templates, while the plugin results are based on selecting the "right" instrument for the piece (which is assumed to be the best, though we aren't actually testing that here).

|File|@mirex@|@norm@|@u4@|@s5@|
|mozart/piano|60|64|56|59|
|mozart/viola|33|37|35|39|
|mozart/mix|51|58|55|52|
|mozart/clarinet|74|80|86|89|
|lussier/piano|45|52|63|59|
|lussier/mix|36|43|40|38|
|lussier/bassoon|43|75|80|79|
|lussier/trumpet|43|46|51|47|
|take_five/piano|61|46|69|64|
|take_five/mix|62|73|69|70|
|take_five/saxophone|78|80|84|86|