Chris Cannam, 2014-07-16 07:05 PM
Piano Evaluation for Level Normalisation
Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).
We test with a small set of piano recordings, quickly evaluating performance over the first 30 seconds of each under a number of different normalisation / level management regimes.
Input files
Filename | Signal max (approx.) |
31.wav | 0.57 |
MAPS_MUS-bach_846_AkPnBcht.wav | 0.12 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 0.33 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 0.13 |
mz_333_1MINp_align.wav | 0.10 |
The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need approximately predictable input levels.
Methods
Name | Hg revision | Description |
as-is | d721a17f3e14 | No normalisation |
norm | d721a17f3e14 | Normalise to 0.50 max before running plugin (can't do this in plugin) |
to-date | d9b688700819 | Track max signal level so far, adjust each sample so that max is at 0.50 |
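As a rough illustration of how the norm and to-date regimes differ, here is a minimal Python sketch. It is only one reading of the descriptions in the table, not the code in either Hg revision: the function names, the use of plain lists of float samples, and the handling of silent input are all assumptions made for the example.

```python
TARGET_PEAK = 0.50  # target maximum level, as given in the Methods table


def normalise_whole_file(samples, target=TARGET_PEAK):
    """'norm' (assumed reading): scale the whole signal so its absolute
    peak equals `target`. Needs the complete signal up front, which is
    why it cannot be done inside a streaming plugin."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=TARGET_PEAK):
    """'to-date' (assumed reading): track the largest absolute sample
    seen so far and scale each sample by target / that running peak."""
    out = []
    peak_so_far = 0.0
    for s in samples:
        peak_so_far = max(peak_so_far, abs(s))
        if peak_so_far == 0.0:
            out.append(0.0)  # nothing but silence so far
        else:
            out.append(s * target / peak_so_far)
    return out


# Hypothetical quiet signal whose loudest sample arrives late.
quiet = [0.01, -0.02, 0.10, -0.05]
whole = normalise_whole_file(quiet)
running = normalise_to_date(quiet)
```

With this made-up input, the whole-file version leaves the early samples quiet relative to the eventual peak, whereas the running version pushes every new maximum up to the target level, so the opening of a recording comes out louder than it would under norm.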
Results
Reporting only the note onset F-measure for the first 30 seconds of each piece.
Filename | norm | as-is | to-date |
31.wav | 50 | 33 | 40 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 15 | 62 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 31 | 31 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 16 | 61 |
mz_333_1MINp_align.wav | 66 | 3 | 58 |
The precision (proportion of correct onsets among detected onsets, or 1 minus the false discovery rate) and recall (proportion of correctly detected onsets among all ground-truth onsets, or true-positive rate) vary as you would hope:
- when the resulting audio level is quieter than the norm case, precision is high and recall is low, but the F-measure is worse than in the norm case
- when the resulting audio level is louder than the norm case, precision is low and recall is high, and the F-measure is still worse than in the norm case
This suggests that our threshold is moderately well-suited to the norm case, at least for optimising F-measure (though this might not be the most perceptually useful measure).
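For reference, the relationship between precision, recall and F-measure used above can be sketched as follows. The function name is made up for this example and the onset counts are hypothetical, chosen only to show the trade-off; they are not taken from the evaluation. Matching detected onsets to ground-truth onsets (and the time tolerance that requires) is assumed to have been done already.

```python
def onset_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from onset match counts.

    precision = TP / (TP + FP): correct onsets among detected onsets
    recall    = TP / (TP + FN): correct onsets among ground-truth onsets
    f_measure = harmonic mean of precision and recall
    """
    detected = true_positives + false_positives
    ground_truth = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / ground_truth if ground_truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for a piece with 100 ground-truth onsets.
too_quiet = onset_scores(20, 2, 80)    # few detections, mostly correct
balanced = onset_scores(70, 30, 30)    # threshold suits the input level
too_loud = onset_scores(90, 200, 10)   # most notes found, many spurious

for name, (p, r, f) in [("too quiet", too_quiet),
                        ("balanced", balanced),
                        ("too loud", too_loud)]:
    print(f"{name}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Because the F-measure is a harmonic mean, it is pulled towards the lower of the two scores, so both the high-precision/low-recall case and the low-precision/high-recall case score below the balanced one.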