
Version 21 (Chris Cannam, 2014-07-17 10:49 AM) → Version 22/47 (Chris Cannam, 2014-07-17 10:56 AM)

h1. Piano Evaluation for Level Normalisation

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

This page tests a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.

h3. Input files

|Filename|Signal max approx|
|@31.wav@|0.57|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|0.12|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|0.33|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|0.13|
|@mz_333_1MINp_align.wav@|0.10|

The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need approximately predictable input levels.

h3. Methods

|Name|Hg revision|Description|
|@norm@|commit:d721a17f3e14|Normalise to 0.50 max before running plugin (this can't be done in the plugin; it is the reference case)|
|@as-is@|commit:d721a17f3e14|No normalisation|
|@to-date@|commit:d9b688700819|Track max signal level _so far_, adjust each sample so that max is at 0.50|
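
As a rough illustration of the difference between the @norm@ and @to-date@ regimes, here is a minimal sketch in Python. It is not the code from the revisions listed above: the function names are hypothetical, and it assumes sample-by-sample processing of a plain list of floats, whereas the plugin may well track the level per processing block.

```python
TARGET_PEAK = 0.5  # target maximum level, taken from the Methods table


def normalise_whole(samples, target=TARGET_PEAK):
    """'norm' regime: scale the whole signal so that its peak is at target.

    This needs the complete signal before any output can be produced,
    which is why it cannot be done inside a streaming plugin.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=TARGET_PEAK):
    """'to-date' regime: scale each sample by the peak seen so far."""
    out = []
    peak_so_far = 0.0
    for s in samples:
        peak_so_far = max(peak_so_far, abs(s))
        # Until a non-zero sample has been seen, the output stays at zero.
        out.append(s * target / peak_so_far if peak_so_far > 0.0 else 0.0)
    return out
```

In this sketch the @to-date@ output is always at least as loud as the @norm@ output, because the running peak can never exceed the final peak. Early quiet passages are therefore boosted more than they would be under @norm@.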

h3. Results

Reporting only the note onset F-measure for the first 30 seconds of each piece.

|Filename|@norm@|@as-is@|@to-date@|
|@31.wav@|50|33|40|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|15|62|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|31|31|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|16|61|
|@mz_333_1MINp_align.wav@|66|3|58|

The precision (_proportion of correct onsets among detected onsets, or 1 minus the false discovery rate_) and recall (_proportion of correctly-detected onsets among all ground-truth onsets, or true-positive rate_) vary as you would hope:

* when the resulting audio level is quieter than the @norm@ case, precision is high and recall is low but the F-measure is worse than the @norm@ case
* when the resulting audio level is louder than the @norm@ case, precision is low and recall is high and the F-measure is still worse than the @norm@ case

This suggests that our threshold (which happens to be 6) is moderately well-suited to the @norm@ case, at least to optimise F-measure (this might not be the most perceptually useful measure though).
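
For reference, the precision, recall and F-measure discussed above can be computed from two lists of onset times roughly as follows. This is only a sketch: the function name is hypothetical, the 50 ms matching tolerance is an assumption (this page does not say which tolerance was used), and the simple greedy matching here may differ in detail from the evaluation tool that produced the tables.

```python
def onset_scores(detected, reference, tolerance=0.05):
    """Return (precision, recall, f_measure) for onset times in seconds.

    A detected onset counts as correct if it lies within `tolerance`
    seconds of a reference onset that has not already been matched.
    The 0.05 s default is an assumption, not a value from this page.
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = true_pos = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tolerance:
            true_pos += 1  # matched pair: consume both onsets
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection too early for this reference: false positive
        else:
            j += 1  # reference onset passed without a detection: miss
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

The tables on this page report the F-measure as a percentage, so the third value returned here would be multiplied by 100 for comparison.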



h4. For different piano template sets

The above results are all generated using four piano templates, numbered 1-3 plus @pianorwc@.

Here are results using the @norm@ and @as-is@ methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.

The choice of template turns out not to make an enormous difference, perhaps because these recordings contain nothing but piano?

|Filename|@norm@/all|@as-is@/all|@norm@/3of4|@as-is@/3of4|@norm@/1|@as-is@/1|@norm@/2|@as-is@/2|@norm@/3|@as-is@/3|@norm@/rwc|@as-is@/rwc|
|@31.wav@|50|33|51|30|50|34|44|42|50|32|56|36|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|87|15|86|16|86|24|75|20|73|10|71|18|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|33|31|32|32|31|22|29|31|35|34|32|28|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|73|16|71|19|71|12|68|14|72|17|70|15|
|@mz_333_1MINp_align.wav@|66|3|68|1|63|4|67|2|67|1|63|3|

h4. For "generic" template set

The above results all use template sets with only piano templates in them.

Here are results using the @norm@ and @as-is@ methods, but with the full set of instrument templates (four pianos plus all the rest).

|Filename|@norm@|@as-is@|
|@31.wav@|49|37|
|@MAPS_MUS-bach_846_AkPnBcht.wav@|79|34|
|@MAPS_MUS-chpn_op7_1_ENSTDkAm.wav@|31|28|
|@MAPS_MUS-scn15_7_SptkBGAm.wav@|67|16|
|@mz_333_1MINp_align.wav@|63|5|