Chris Cannam, 2014-07-17 10:21 AM


Piano Evaluation for Level Normalisation

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

This page tests a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.

Input files

Filename                          Signal max (approx.)
31.wav                            0.57
MAPS_MUS-bach_846_AkPnBcht.wav    0.12
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  0.33
MAPS_MUS-scn15_7_SptkBGAm.wav     0.13
mz_333_1MINp_align.wav            0.10

The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need approximately predictable input levels.

Methods

Name     Hg revision   Description
norm     d721a17f3e14  Normalise to 0.50 max before running plugin (can't do this in plugin, it's the reference case)
as-is    d721a17f3e14  No normalisation
to-date  d9b688700819  Track max signal level so far, adjust each sample so that max is at 0.50
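
A minimal sketch of the difference between the norm and to-date methods, in plain Python rather than the plugin's own code. The function names, the per-sample processing, and the handling of silence are assumptions for illustration only; the actual revisions may work per block or limit the gain in ways not shown here.

```python
TARGET_PEAK = 0.5  # target maximum level, as in the "norm" and "to-date" rows


def normalise_whole_file(samples, target=TARGET_PEAK):
    """'norm' (hypothetical helper): scale so the absolute peak equals target.

    This needs the whole signal in advance, which is why it cannot be done
    inside a plugin that receives audio incrementally.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=TARGET_PEAK):
    """'to-date' (hypothetical helper): scale each sample by target divided
    by the largest absolute level seen so far."""
    out = []
    peak_so_far = 0.0
    for s in samples:
        peak_so_far = max(peak_so_far, abs(s))
        if peak_so_far > 0.0:
            out.append(s * target / peak_so_far)
        else:
            out.append(0.0)  # leading silence stays silent
    return out
```

In this naive form the two methods agree only from the point where the loudest sample has been seen; before that, to-date applies a larger gain than norm, so early material comes out louder than in the reference case.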

Results

Only the note onset F-measure for the first 30 seconds of each piece is reported.

Filename                          norm  as-is  to-date
31.wav                            50    33     40
MAPS_MUS-bach_846_AkPnBcht.wav    87    15     62
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  33    31     31
MAPS_MUS-scn15_7_SptkBGAm.wav     73    16     61
mz_333_1MINp_align.wav            66    3      58

Precision (the proportion of detected onsets that are correct, or 1 minus the false-discovery rate) and recall (the proportion of ground-truth onsets that are correctly detected, i.e. the true-positive rate) vary as you would expect:

  • when the resulting audio level is quieter than the norm case, precision is high and recall is low but the F-measure is worse than the norm case
  • when the resulting audio level is louder than the norm case, precision is low and recall is high and the F-measure is still worse than the norm case

This suggests that our threshold (which happens to be 6) is moderately well-suited to the norm case, at least to optimise F-measure (this might not be the most perceptually useful measure though).
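
For reference, the relationship between precision, recall and F-measure described above can be sketched as follows. This is not the evaluation code used for the tables: the function name, the 50 ms matching tolerance, the greedy one-to-one matching, and the fact that pitch is ignored are all assumptions made for illustration.

```python
def onset_f_measure(detected, reference, tolerance=0.05):
    """Return (precision, recall, f_measure) for onset times in seconds.

    Hypothetical helper: a detected onset counts as correct if it lies
    within `tolerance` seconds of a not-yet-matched reference onset.
    """
    detected = sorted(detected)
    reference = sorted(reference)
    matched = 0
    i = j = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tolerance:
            matched += 1  # true positive
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection with no reference onset nearby: false positive
        else:
            j += 1  # reference onset with no detection nearby: miss
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Because the F-measure is the harmonic mean of the two, it is pulled towards whichever of precision and recall is lower, which is why both the too-quiet and the too-loud cases score below the norm case.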

For different piano template sets

The above results are all generated using four piano templates, numbered 1-3 plus pianorwc. Here are results using the norm and as-is methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.

Filename                          norm/all  as-is/all  norm/3of4  as-is/3of4  norm/1  as-is/1  norm/2  as-is/2  norm/3  as-is/3  norm/rwc  as-is/rwc
31.wav                            50        33         51         30
MAPS_MUS-bach_846_AkPnBcht.wav    87        15         86         16
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  33        31         32         32
MAPS_MUS-scn15_7_SptkBGAm.wav     73        16         71         19
mz_333_1MINp_align.wav            66        3          68         1