Chris Cannam, 2014-07-16 07:05 PM

h1. Piano Evaluation for Level Normalisation

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

Testing here uses a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.

h3. Input files

|Filename|Signal max approx|
|@31.wav@|0.57|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|0.12|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|0.33|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|0.13|
|@mz_333_1MINp_align.wav@|0.10|

The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need approximately predictable input levels.
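To illustrate why a fixed threshold needs predictable input levels, here is a toy sketch. It assumes, purely for illustration, that detection compares a per-frame peak level against a fixed threshold; the plugin's actual detection function is not described on this page, and @count_detections@, the frame values and the threshold of 0.15 are all hypothetical.

```python
def count_detections(frame_peaks, threshold):
    """Count frames whose peak level exceeds a fixed threshold.

    Toy stand-in for a detector; not the plugin's real logic.
    """
    return sum(1 for p in frame_peaks if p > threshold)


# Hypothetical per-frame peak levels for the same material at two gains.
loud = [0.50, 0.30, 0.20, 0.05]
quiet = [p * 0.2 for p in loud]  # same material, five times quieter

THRESHOLD = 0.15  # hypothetical fixed internal threshold

print(count_detections(loud, THRESHOLD))
print(count_detections(quiet, THRESHOLD))
```

With the same threshold, the quieter copy of the same material produces no detections at all in this toy model, which is the kind of behaviour that level normalisation is meant to avoid.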

h3. Methods

|Name|Hg revision|Description|
|@as-is@|commit:d721a17f3e14|No normalisation|
|@norm@|commit:d721a17f3e14|Normalise to 0.50 max before running plugin (can't do this in plugin)|
|@to-date@|commit:d9b688700819|Track max signal level _so far_, adjust each sample so that max is at 0.50|
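The @norm@ and @to-date@ regimes in the table above can be sketched as follows. This is one reading of the descriptions, not the code in the listed revisions: the function names are hypothetical, samples are assumed to be floats in the range -1 to 1, and "max" is assumed to mean the largest absolute sample value.

```python
def normalise_whole(samples, target=0.5):
    """'norm' sketch: scale the whole signal so its peak |sample| is target.

    Requires the complete signal up front.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=0.5):
    """'to-date' sketch: scale each sample by target / (peak seen so far)."""
    out = []
    peak = 0.0
    for s in samples:
        peak = max(peak, abs(s))
        # Leave samples untouched until a non-zero level has been seen.
        out.append(s * target / peak if peak > 0.0 else s)
    return out


signal = [0.01, -0.02, 0.1, -0.05]  # hypothetical quiet input
print(normalise_whole(signal))
print(normalise_to_date(signal))
```

In this sketch the running peak can never exceed the global peak, so the @to-date@ output is never quieter than the @norm@ output, and material before the loudest point is boosted more than it would be under @norm@.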

h3. Results

Only the note onset F-measure (as a percentage) for the first 30 seconds of each piece is reported.

|Filename|@norm@|@as-is@|@to-date@|
|@31.wav@|50|33|40|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|15|62|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|31|31|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|16|61|
|@mz_333_1MINp_align.wav@|66|3|58|

The precision (_proportion of correct onsets among detected onsets, or 1 minus the false-discovery rate_) and recall (_proportion of correctly-detected onsets among all ground-truth onsets, or true-positive rate_) vary as you would hope:

* when the resulting audio level is quieter than in the @norm@ case, precision is high and recall is low, but the F-measure is worse than in the @norm@ case
* when the resulting audio level is louder than in the @norm@ case, precision is low and recall is high, and the F-measure is still worse than in the @norm@ case

This suggests that our threshold is moderately well-suited to the @norm@ case, at least for optimising F-measure (though this might not be the most perceptually useful measure).
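For reference, precision, recall and F-measure as defined above can be computed from onset counts as in the sketch below. How detected onsets were matched against the ground truth (for example, the timing tolerance) is not stated on this page, so the counts are taken as given; the function name and the example counts are hypothetical.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, F-measure) from onset counts.

    true_pos:  detected onsets that match a ground-truth onset
    false_pos: detected onsets with no ground-truth match
    false_neg: ground-truth onsets that were not detected
    """
    detected = true_pos + false_pos
    reference = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / reference if reference else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts: high precision but low recall, as for a quiet input.
p, r, f = precision_recall_f(true_pos=40, false_pos=10, false_neg=40)
print(round(p * 100), round(r * 100), round(f * 100))
```

Because the F-measure is a harmonic mean, it is pulled towards the lower of the two values, which is why both the too-quiet and too-loud cases score worse than the @norm@ case.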