Chris Cannam, 2014-07-22 02:15 PM


Piano Evaluation for Level Normalisation

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

We test with a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.

Input files

Filename                          Signal max (approx.)
31.wav                            0.57
MAPS_MUS-bach_846_AkPnBcht.wav    0.12
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  0.33
MAPS_MUS-scn15_7_SptkBGAm.wav     0.13
mz_333_1MINp_align.wav            0.10

The plugin has one internal threshold parameter, which can be lowered to find quieter notes, at the expense of more false positives. We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need approximately predictable input levels.

Methods

Name            Hg revision   Description
norm            d721a17f3e14  Normalise to 0.50 max before running plugin (this can't be done inside the plugin; it is here as the reference case)
as-is           d721a17f3e14  No normalisation
to-date         d9b688700819  Track max signal level so far, adjust each sample so that max is at 0.50
r2,r3,r4,r5,r6  b5a8836dd2a4  Preprocess with Flatten Dynamics at 0.02, 0.03, 0.04, 0.05, 0.06 target RMS levels respectively
s8              4ac067799e0b  Preprocess with Flatten Dynamics (second attempt), max RMS targeted to 0.08
t4              d67fae2bb29e  Preprocess with Flatten Dynamics (attempt 2a), max RMS targeted to 0.04
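
As a rough illustration of how the norm and to-date methods in the table above differ, here is a minimal Python sketch. It is not the plugin's code: the function names, the list-of-floats signal representation and the handling of silence are assumptions made for this example. Only the 0.50 target comes from the table.

```python
TARGET_PEAK = 0.50  # target maximum level, taken from the methods table


def normalise_whole(samples, target=TARGET_PEAK):
    """'norm' (sketch): scale the whole signal so its absolute peak equals target.

    This needs the entire signal up front, which is why it cannot be
    done inside a plugin that receives audio block by block.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all silence: nothing to scale (assumed behaviour)
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=TARGET_PEAK):
    """'to-date' (sketch): scale each sample by target / (peak seen so far)."""
    out = []
    peak = 0.0
    for s in samples:
        peak = max(peak, abs(s))
        # Before any non-zero sample arrives there is no gain to apply.
        out.append(s * target / peak if peak > 0.0 else s)
    return out
```

Under these assumptions, the to-date variant boosts a quiet opening to the full target level until a louder sample arrives, so the early part of a recording comes out louder than it would with whole-file normalisation.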

Results

Only the note onset F-measure (in percent) for the first 30 seconds of each piece is reported.

Filename                          norm  as-is  to-date  r2  r3  r4  r5  r6  s8
31.wav                            50    33     40       45  47  48  45  43  42
MAPS_MUS-bach_846_AkPnBcht.wav    87    15     62       64  85  87  87  86  81
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  33    31     31       11  25  31  32  31  32
MAPS_MUS-scn15_7_SptkBGAm.wav     73    16     61       50  57  67  74  75  70
mz_333_1MINp_align.wav            66    3      58       42  60  64  66  63  66

The precision (proportion of detected onsets that are correct, or 1 minus the false discovery rate) and recall (proportion of ground-truth onsets that are correctly detected, or true-positive rate) vary as you would hope:

  • when the resulting audio level is quieter than the norm case, precision is high and recall is low but the F-measure is worse than the norm case
  • when the resulting audio level is louder than the norm case, precision is low and recall is high and the F-measure is still worse than the norm case

This suggests that our threshold (which happens to be 6) is moderately well-suited to the norm case, at least to optimise F-measure (this might not be the most perceptually useful measure though).
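
For reference, the relationship between precision, recall and F-measure can be sketched as below. The function name and the example counts are hypothetical and are not taken from the evaluation above; they only illustrate the quiet-versus-loud trade-off described in the two bullet points.

```python
def onset_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from onset match counts."""
    detected = true_positives + false_positives    # all onsets the plugin reported
    reference = true_positives + false_negatives   # all ground-truth onsets
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / reference if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: a quiet input yields few detections, mostly correct;
# a loud input yields many detections, many of them spurious.
quiet = onset_scores(true_positives=30, false_positives=5, false_negatives=70)
loud = onset_scores(true_positives=90, false_positives=150, false_negatives=10)
```

With these made-up counts the quiet case has the higher precision and the loud case the higher recall, and neither extreme gives a good harmonic mean.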

For different piano template sets

The above results are all generated using four piano templates, numbered 1-3 plus pianorwc.

Here are results using the norm and as-is methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.

The template turns out not to make an enormous difference -- perhaps because these recordings contain nothing but piano?

Filename                          norm/all  as-is/all  norm/3of4  as-is/3of4  norm/1  as-is/1  norm/2  as-is/2  norm/3  as-is/3  norm/rwc  as-is/rwc
31.wav                            50        33         51         30          50      34       44      42       50      32       56        36
MAPS_MUS-bach_846_AkPnBcht.wav    87        15         86         16          86      24       75      20       73      10       71        18
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  33        31         32         32          31      22       29      31       35      34       32        28
MAPS_MUS-scn15_7_SptkBGAm.wav     73        16         71         19          71      12       68      14       72      17       70        15
mz_333_1MINp_align.wav            66        3          68         1           63      4        67      2        67      1        63        3

For "generic" template set

The above results all use template sets with only piano templates in them.

Here are results using the norm and as-is methods, but with the full set of instrument templates (four pianos plus all the rest).

Filename                          norm  as-is
31.wav                            49    37
MAPS_MUS-bach_846_AkPnBcht.wav    79    34
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav  31    28
MAPS_MUS-scn15_7_SptkBGAm.wav     67    16
mz_333_1MINp_align.wav            63    5