Piano Evaluation for Level Normalisation » History » Version 40

Chris Cannam, 2014-07-23 02:40 PM


Piano Evaluation for Level Normalisation

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

We tested a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation and level-management regimes.

Input files

Filename                           Signal max (approx)
31.wav                             0.57
MAPS_MUS-bach_846_AkPnBcht.wav     0.12
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav   0.33
MAPS_MUS-scn15_7_SptkBGAm.wav      0.13
mz_333_1MINp_align.wav             0.10

The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the cost, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need the input levels to be approximately predictable.

Methods

Name            Hg revision   Description
norm            d721a17f3e14  Normalise to 0.50 max before running plugin (can't do this in plugin: it's here as the reference case)
as-is           d721a17f3e14  No normalisation
to-date         d9b688700819  Track max signal level so far, adjust each sample so that max is at 0.50
r2,r3,r4,r5,r6  b5a8836dd2a4  Preprocess with Flatten Dynamics at 0.02, 0.03, 0.04, 0.05, 0.06 target RMS levels respectively
s8              4ac067799e0b  With Flatten Dynamics second attempt with max RMS targeted to 0.08
t4              d67fae2bb29e  With Flatten Dynamics attempt 2a with max RMS targeted to 0.04
u4              70773820e719  With Flatten Dynamics attempt 2b with max RMS targeted to 0.04
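
As a rough illustration of the difference between the norm and to-date methods, here is a minimal Python sketch. It is not the code from the revisions listed above: the function names are hypothetical, and the reading of to-date as "gain = 0.50 / largest absolute sample seen so far" is an assumption based only on the one-line description in the table.

```python
TARGET_PEAK = 0.5  # the 0.50 target quoted for both "norm" and "to-date"


def normalise_whole_file(samples, target=TARGET_PEAK):
    """'norm' (sketch): scale the whole signal so its absolute peak equals target.

    This needs the complete signal up front, which is why it cannot be done
    inside a streaming plugin.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=TARGET_PEAK):
    """'to-date' (assumed reading): scale each sample by target / peak so far."""
    out = []
    peak_so_far = 0.0
    for s in samples:
        peak_so_far = max(peak_so_far, abs(s))
        # Before any non-zero sample has arrived there is nothing to scale by.
        gain = target / peak_so_far if peak_so_far > 0.0 else 1.0
        out.append(s * gain)
    return out
```

Under this reading, every sample that sets a new maximum comes out at exactly the target level, so a quiet opening is boosted much more than it would be in the norm case.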

Results

We report only the note onset F-measure for the first 30 seconds of each piece.

Filename norm as-is to-date r2 r3 r4 r5 r6 s8 t4 u4
31.wav 50 33 40 45 47 48 45 43 42 49 45
MAPS_MUS-bach_846_AkPnBcht.wav 87 15 62 64 85 87 87 86 81 86 87
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav 33 31 31 11 25 31 32 31 32 34 35
MAPS_MUS-scn15_7_SptkBGAm.wav 73 16 61 50 57 67 74 75 70 69 68
mz_333_1MINp_align.wav 66 3 58 42 60 64 66 63 66 63 65

The precision (proportion of correct onsets among detected onsets, or 1 minus the false-discovery rate) and recall (proportion of correctly-detected onsets among all ground-truth onsets, or true-positive rate) vary as you would hope:

  • when the resulting audio level is quieter than in the norm case, precision is high and recall is low, but the F-measure is worse than in the norm case
  • when the resulting audio level is louder than in the norm case, precision is low and recall is high, and the F-measure is still worse than in the norm case
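
For reference, the relationship between the three measures can be sketched as follows. This uses the standard definition of F-measure as the harmonic mean of precision and recall; the function name and the onset counts in the usage note are invented for illustration and are not taken from the evaluation runs.

```python
def onset_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from onset match counts."""
    detected = true_positives + false_positives    # everything the plugin reported
    reference = true_positives + false_negatives   # everything in the ground truth
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / reference if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With made-up counts, `onset_scores(30, 3, 70)` (few detections, mostly correct) and `onset_scores(90, 210, 10)` (many detections, mostly spurious) both give a lower F-measure than the balanced `onset_scores(60, 40, 40)`, which is the pattern described in the two bullet points above.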

This suggests that our threshold (which happens to be 6) is moderately well-suited to the norm case, at least to optimise F-measure (this might not be the most perceptually useful measure though).

The best results (apart from norm) above seem to be r5 and u4. Let's try to refine the parameters for each of those and see if any patterns emerge.
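
One simple way to compare the methods is the unweighted mean F-measure across the five files. The page does not say how "best" was judged, so treating the mean as the criterion is an assumption, and with only five files small differences between methods should not be over-read. The figures below are copied from the results table above; the function names are hypothetical.

```python
METHODS = ["norm", "as-is", "to-date", "r2", "r3", "r4", "r5", "r6", "s8", "t4", "u4"]

# Onset F-measures from the results table, one row per input file.
F_MEASURES = {
    "31.wav":                           [50, 33, 40, 45, 47, 48, 45, 43, 42, 49, 45],
    "MAPS_MUS-bach_846_AkPnBcht.wav":   [87, 15, 62, 64, 85, 87, 87, 86, 81, 86, 87],
    "MAPS_MUS-chpn_op7_1_ENSTDkAm.wav": [33, 31, 31, 11, 25, 31, 32, 31, 32, 34, 35],
    "MAPS_MUS-scn15_7_SptkBGAm.wav":    [73, 16, 61, 50, 57, 67, 74, 75, 70, 69, 68],
    "mz_333_1MINp_align.wav":           [66, 3, 58, 42, 60, 64, 66, 63, 66, 63, 65],
}


def mean_by_method(table=F_MEASURES, methods=METHODS):
    """Unweighted mean F-measure per method (an assumed summary statistic)."""
    n_files = len(table)
    return {
        name: sum(row[i] for row in table.values()) / n_files
        for i, name in enumerate(methods)
    }


def ranked(means):
    """Method names ordered from highest to lowest mean F-measure."""
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    means = mean_by_method()
    for name in ranked(means):
        print(f"{name:8s} {means[name]:.1f}")
```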

Flatten Dynamics fine-tuning

The adjustable parameters within r5, with their defaults, are

Parameter       Description                  Default
historySeconds  Length of RMS window         4.0 sec
catchUpSeconds  Length of gain slide window  0.5 sec
targetRMS       Target RMS value             0.05
maxGain         Hard limit on gain           20.0

The targetRMS is the one we have been varying across r2, r3 etc -- for r5 it is fixed at 0.05. We don't need to test maxGain variation.
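
To make the role of the four parameters concrete, here is a minimal Python sketch of a gain control driven by them. It is not the Flatten Dynamics implementation: the starting gain of 1.0, the handling of silence, and the one-pole smoothing used for the gain slide are all assumptions made for illustration. Only the parameter names and defaults come from the table above.

```python
import math
from collections import deque


def flatten_dynamics(samples, sample_rate,
                     history_seconds=4.0, catch_up_seconds=0.5,
                     target_rms=0.05, max_gain=20.0):
    """Sketch: steer the running RMS of a mono signal towards target_rms."""
    history_len = max(1, int(round(history_seconds * sample_rate)))
    catch_up_len = max(1, int(round(catch_up_seconds * sample_rate)))

    window = deque()      # squared samples inside the RMS history window
    sum_squares = 0.0
    gain = 1.0            # assumed starting gain
    out = []

    for s in samples:
        window.append(s * s)
        sum_squares += s * s
        if len(window) > history_len:
            sum_squares -= window.popleft()

        rms = math.sqrt(max(sum_squares, 0.0) / len(window))
        if rms > 0.0:
            wanted = min(target_rms / rms, max_gain)  # maxGain is a hard limit
        else:
            wanted = gain                             # assumed: hold gain in silence

        # Assumed gain slide: close a fixed fraction of the gap per sample,
        # so the gain catches up over roughly catch_up_seconds.
        gain += (wanted - gain) / catch_up_len
        out.append(s * gain)

    return out
```

For a steady input at amplitude 0.01 with the defaults, the gain in this sketch settles at 0.05 / 0.01 = 5; for an input at 0.001 it is capped at maxGain = 20, so the output level stays below the target.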

Here r5hNcM represents the r5 method with historySeconds = N and catchUpSeconds = M/10, so r5 is the same as r5h4c05. The r5 test was run again for this table, hence the small differences from the r5 results above.
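
The naming scheme can be decoded mechanically. The helper below is a hypothetical illustration of the convention just described, not part of the evaluation scripts.

```python
import re

# r5, optionally followed by h<N>c<M> as described above.
_R5_NAME = re.compile(r"^r5(?:h(\d+)c(\d+))?$")


def decode_r5_variant(name):
    """Map a name such as 'r5h4c05' to its historySeconds and catchUpSeconds."""
    match = _R5_NAME.match(name)
    if match is None:
        raise ValueError(f"not an r5 variant name: {name!r}")
    if match.group(1) is None:
        # Plain 'r5' means the defaults, i.e. the same as r5h4c05.
        return {"historySeconds": 4.0, "catchUpSeconds": 0.5}
    return {
        "historySeconds": float(match.group(1)),
        "catchUpSeconds": int(match.group(2)) / 10.0,
    }
```

For example, `decode_r5_variant("r5h8c05")` gives an 8-second history with the default 0.5-second catch-up, and `decode_r5_variant("r5h4c10")` gives the default history with a 1-second catch-up.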

Filename norm r5 r5h2c05 r5h5c05 r5h6c05 r5h8c05 r5h4c01 r5h4c10
31.wav 50 47 38 47 48 46
MAPS_MUS-bach_846_AkPnBcht.wav 87 87 87 87 87 88
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav 33 32 33 32 29 31
MAPS_MUS-scn15_7_SptkBGAm.wav 73 73 66 72 76 73
mz_333_1MINp_align.wav 66 66 64 64 66 63

For different piano template sets

The above results are all generated using four piano templates, numbered 1-3 plus pianorwc.

Here are results using the norm and as-is methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.

The template turns out not to make an enormous difference -- perhaps because these recordings contain nothing but piano?

Filename norm/all as-is/all norm/3of4 as-is/3of4 norm/1 as-is/1 norm/2 as-is/2 norm/3 as-is/3 norm/rwc as-is/rwc
31.wav 50 33 51 30 50 34 44 42 50 32 56 36
MAPS_MUS-bach_846_AkPnBcht.wav 87 15 86 16 86 24 75 20 73 10 71 18
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav 33 31 32 32 31 22 29 31 35 34 32 28
MAPS_MUS-scn15_7_SptkBGAm.wav 73 16 71 19 71 12 68 14 72 17 70 15
mz_333_1MINp_align.wav 66 3 68 1 63 4 67 2 67 1 63 3

For "generic" template set

The above results all use template sets with only piano templates in them.

Here are results using the norm and as-is methods, but with the full set of instrument templates (four pianos plus all the rest).

Filename norm as-is
31.wav 49 37
MAPS_MUS-bach_846_AkPnBcht.wav 79 34
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav 31 28
MAPS_MUS-scn15_7_SptkBGAm.wav 67 16
mz_333_1MINp_align.wav 63 5

Cross-checking with non-piano test data

The results need to be roughly comparable with those obtained from pre-normalised data using other datasets as well as the piano one. Here is a subset of the TRIOS dataset. The norm result is that obtained from the plugin prior to doing this work, using pre-normalised data.

The mirex result is that from the MIREX 2012 submission in MATLAB. Note that this always uses all instrument templates, while the plugin results are based on selecting the "right" instrument for the piece (which is assumed to be the best choice, though we aren't actually testing that here).

File mirex norm u4
mozart/piano 60 64 56
mozart/viola 33 37 35
mozart/mix 51 58 55
mozart/clarinet 74 80 86
lussier/piano 45 52 63
lussier/mix 36 43 40
lussier/bassoon 43 75 80
lussier/trumpet 43 46 51
take_five/piano 61 46 69
take_five/mix 62 73 69
take_five/saxophone 78 80 84