Piano Evaluation for Level Normalisation » History » Version 45
Chris Cannam, 2014-07-23 03:40 PM
Piano Evaluation for Level Normalisation
Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).
We test a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.
Input files
Filename | Signal max (approx.) |
31.wav | 0.57 |
MAPS_MUS-bach_846_AkPnBcht.wav | 0.12 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 0.33 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 0.13 |
mz_333_1MINp_align.wav | 0.10 |
The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter. But for this threshold to be meaningful, input levels need to be approximately predictable.
Methods
Name | Hg revision | Description |
norm | d721a17f3e14 | Normalise to 0.50 max before running plugin (can't do this in plugin: it's here as the reference case) |
as-is | d721a17f3e14 | No normalisation |
to-date | d9b688700819 | Track max signal level so far, adjust each sample so that max is at 0.50 |
r2, r3, r4, r5, r6 | b5a8836dd2a4 | Preprocess with Flatten Dynamics at 0.02, 0.03, 0.04, 0.05, 0.06 target RMS levels respectively |
s8 | 4ac067799e0b | With Flatten Dynamics second attempt with max RMS targeted to 0.08 |
t4 | d67fae2bb29e | With Flatten Dynamics attempt 2a with max RMS targeted to 0.04 |
u4 | 70773820e719 | With Flatten Dynamics attempt 2b with max RMS targeted to 0.04 |
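
To make the difference between the norm and to-date methods concrete, here is a minimal Python sketch of both. It is an illustration only, not the code at the revisions listed above: the function names are made up for this page, and the handling of leading silence (gain left at 1) is an assumption.

```python
def normalise_to_peak(samples, target_peak=0.5):
    """Whole-file normalisation (the 'norm' idea): needs the complete signal up front."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target_peak=0.5):
    """Causal variant (the 'to-date' idea): scale by the largest level seen so far."""
    out = []
    peak_so_far = 0.0
    for s in samples:
        peak_so_far = max(peak_so_far, abs(s))
        # Assumption: leave the gain at 1 until something non-zero has arrived.
        gain = target_peak / peak_so_far if peak_so_far > 0.0 else 1.0
        out.append(s * gain)
    return out
```

With the causal version, every sample that sets a new maximum comes out at exactly the target level, so a quiet opening is boosted far more than it would be by whole-file normalisation. That is one plausible reason for to-date scoring differently from norm.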
Results
Reporting only the note onset F-measure for the first 30 seconds of each piece.
Filename | norm | as-is | to-date | r2 | r3 | r4 | r5 | r6 | s8 | t4 | u4 |
31.wav | 50 | 33 | 40 | 45 | 47 | 48 | 45 | 43 | 42 | 49 | 45 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 15 | 62 | 64 | 85 | 87 | 87 | 86 | 81 | 86 | 87 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 31 | 31 | 11 | 25 | 31 | 32 | 31 | 32 | 34 | 35 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 16 | 61 | 50 | 57 | 67 | 74 | 75 | 70 | 69 | 68 |
mz_333_1MINp_align.wav | 66 | 3 | 58 | 42 | 60 | 64 | 66 | 63 | 66 | 63 | 65 |
The precision (proportion of correct onsets among detected onsets, or 1 minus the proportion of detections that are false positives) and recall (proportion of correctly-detected onsets among all ground-truth onsets, or true-positive rate) vary as you would hope:
- when the resulting audio level is quieter than the norm case, precision is high and recall is low but the F-measure is worse than the norm case
- when the resulting audio level is louder than the norm case, precision is low and recall is high and the F-measure is still worse than the norm case
This suggests that our threshold (which happens to be 6) is moderately well-suited to the norm case, at least to optimise F-measure (this might not be the most perceptually useful measure though).
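
For reference, precision, recall and F-measure as defined above can be computed from two lists of onset times roughly as follows. This is a sketch under assumptions: the 50 ms tolerance and the greedy nearest-match pairing are illustrative choices, not necessarily what the evaluation on this page used, and the function names are hypothetical.

```python
def count_matches(detected, reference, tolerance=0.05):
    """Greedily pair each reference onset with the nearest unused detection
    within +/- tolerance seconds; returns the number of pairs (true positives)."""
    detected = sorted(detected)
    used = [False] * len(detected)
    matches = 0
    for ref in sorted(reference):
        best = None
        for i, det in enumerate(detected):
            if used[i] or abs(det - ref) > tolerance:
                continue
            if best is None or abs(det - ref) < abs(detected[best] - ref):
                best = i
        if best is not None:
            used[best] = True
            matches += 1
    return matches


def onset_scores(detected, reference, tolerance=0.05):
    """Return (precision, recall, f_measure), each in the range 0..1."""
    tp = count_matches(detected, reference, tolerance)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

The tables on this page report the F-measure as a percentage, i.e. the third value multiplied by 100. Because F is the harmonic mean of precision and recall, it drops when either one is low, which is why both the too-quiet and the too-loud cases score below norm.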
The best results (apart from norm) above seem to be r5 and u4. Let's try to refine the parameters for each of those and see if any patterns emerge.
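
One rough way to compare methods is the unweighted mean F-measure across the five files. The sketch below copies the numbers from the results table above; the choice of a plain mean as the summary statistic is an assumption, and with only five files a difference of a point or so should not be over-read.

```python
from statistics import mean

# F-measures from the results table, in file order:
# 31, bach_846, chpn_op7_1, scn15_7, mz_333.
results = {
    "norm":    [50, 87, 33, 73, 66],
    "as-is":   [33, 15, 31, 16, 3],
    "to-date": [40, 62, 31, 61, 58],
    "r2":      [45, 64, 11, 50, 42],
    "r3":      [47, 85, 25, 57, 60],
    "r4":      [48, 87, 31, 67, 64],
    "r5":      [45, 87, 32, 74, 66],
    "r6":      [43, 86, 31, 75, 63],
    "s8":      [42, 81, 32, 70, 66],
    "t4":      [49, 86, 34, 69, 63],
    "u4":      [45, 87, 35, 68, 65],
}

# Mean score per method, then method names ordered from best to worst mean.
means = {name: mean(scores) for name, scores in results.items()}
ranking = sorted(means, key=means.get, reverse=True)

for name in ranking:
    print(f"{name:8s} {means[name]:5.1f}")
```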
Flatten Dynamics fine-tuning
The adjustable parameters within r5, with their defaults, are:
Parameter | Description | Default |
historySeconds | Length of RMS window | 4.0 sec |
catchUpSeconds | Length of gain slide window | 0.5 sec |
targetRMS | Target RMS value | 0.05 |
maxGain | Hard limit on gain | 20.0 |
The targetRMS is the one we have been varying across r2, r3 etc; for r5 it is fixed at 0.05. We don't need to test maxGain variation.
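
As a rough picture of how these four parameters could interact, here is a minimal Python sketch of a gain follower. It is not the Flatten Dynamics code itself: the class name, the one-pole gain slide and the behaviour during silence are all assumptions made for illustration. Only the parameter names and defaults come from the table above.

```python
import math
from collections import deque


class FlattenDynamicsSketch:
    """Hypothetical gain follower: running RMS -> target gain -> smoothed gain."""

    def __init__(self, sample_rate, history_seconds=4.0, catch_up_seconds=0.5,
                 target_rms=0.05, max_gain=20.0):
        self.history_len = max(1, int(round(history_seconds * sample_rate)))
        self.catch_up_len = max(1, int(round(catch_up_seconds * sample_rate)))
        self.target_rms = target_rms
        self.max_gain = max_gain
        self.squares = deque()   # squared samples inside the RMS window
        self.sum_squares = 0.0
        self.gain = 1.0

    def process_sample(self, x):
        # Update the running RMS over the last history_len samples.
        self.squares.append(x * x)
        self.sum_squares += x * x
        if len(self.squares) > self.history_len:
            self.sum_squares -= self.squares.popleft()
        rms = math.sqrt(max(self.sum_squares, 0.0) / len(self.squares))

        # Gain that would bring the windowed RMS to the target, hard-limited.
        # Assumption: during digital silence the gain is simply held.
        if rms > 0.0:
            target_gain = min(self.target_rms / rms, self.max_gain)
            # Slide toward the target over roughly catch_up_len samples.
            self.gain += (target_gain - self.gain) / self.catch_up_len
        return x * self.gain
```

In this sketch, a longer historySeconds makes the level estimate steadier but slower to react, while catchUpSeconds controls how abruptly the gain itself moves. Those are the two settings varied in the table that follows.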
Here r5hNcM represents the r5 method with historySeconds = N and catchUpSeconds = M/10. So r5 is the same as r5h4c05. The r5 test was run again, hence the small differences from the results above.
Filename | norm | r5 | r5h2c05 | r5h5c05 | r5h6c05 | r5h8c05 | r5h4c01 | r5h4c10 |
31.wav | 50 | 47 | 38 | 47 | 48 | 46 | 46 | 53 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 87 | 87 | 87 | 87 | 88 | 86 | 88 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 32 | 33 | 32 | 29 | 31 | 32 | 31 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 73 | 66 | 72 | 76 | 73 | 73 | 73 |
mz_333_1MINp_align.wav | 66 | 66 | 64 | 64 | 66 | 63 | 65 | 66 |
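
The r5hNcM column names can be decoded mechanically. The helper below is a hypothetical convenience for reading the table, following the naming rule stated on this page (historySeconds = N, catchUpSeconds = M/10, with plain r5 meaning the defaults).

```python
import re


def parse_r5_variant(name):
    """Return (historySeconds, catchUpSeconds) for a name such as 'r5h4c05'."""
    if name == "r5":
        return 4.0, 0.5  # defaults, equivalent to r5h4c05
    match = re.fullmatch(r"r5h(\d+)c(\d+)", name)
    if match is None:
        raise ValueError(f"not an r5 variant name: {name!r}")
    history_seconds = float(match.group(1))
    catch_up_seconds = int(match.group(2)) / 10
    return history_seconds, catch_up_seconds
```

For example, r5h4c10 keeps the 4-second RMS window but doubles the gain slide window to 1.0 seconds.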
The adjustable parameters within u4, with their defaults, are:
Parameter | Description | Default |
longTermSeconds | Length of long-term RMS window | 4.0 sec |
shortTermSeconds | Length of short-term RMS window | 1.0 sec |
catchUpSeconds | Length of gain slide window | 0.2 sec |
targetMaxRMS | Target RMS value | 0.04 |
rmsMaxDecay | Fallback multiplier for max RMS per sample | 0.999 |
squashFactor | Exponent to skew the [0,1] range toward the top of the range | 0.3 |
maxGain | Hard limit on gain | 20.0 |
Start by varying squashFactor with others at defaults:
Filename | norm | r5 | squashFactor=0.1 | squashFactor=0.3 | squashFactor=0.5 | squashFactor=1.0 |
31.wav | 50 | 47 | 42 | 40 | 41 | 45 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 87 | 81 | 82 | 82 | 85 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 32 | 29 | 30 | 33 | 30 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 73 | 59 | 64 | 68 | 63 |
mz_333_1MINp_align.wav | 66 | 66 | 65 | 67 | 64 | 59 |
The 0.3 results are 4 to 5 points worse than the u4 results obtained earlier on four of the five files (even though this is the same code). Variance is evidently high.
I don't think u4 is showing good enough results to justify its complexity over the global-only r5 code, and the squash factor seems to offer little.
Let's supersede the u-series with an s-series that uses the long-term window (only) from r5 but with some decay in max RMS value to account for pieces that alternate between loud and soft. Parameters:
Parameter | Description | Default |
historySeconds | Length of long-term RMS window | 4.0 sec |
catchUpSeconds | Length of gain slide window | 0.2 sec |
targetMaxRMS | Target RMS value | 0.05 |
rmsMaxDecay | Fallback multiplier for max RMS per sample | 0.999 |
maxGain | Hard limit on gain | 20.0 |
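
The role of rmsMaxDecay can be sketched as follows. This is a guess at the mechanism based only on the parameter descriptions above, not the actual s-series code: the function name is hypothetical, it takes a precomputed sequence of windowed RMS values, and it leaves out the catchUpSeconds gain slide entirely.

```python
def decaying_max_gains(rms_values, target_max_rms=0.05,
                       rms_max_decay=0.999, max_gain=20.0):
    """For each windowed RMS value, return the gain implied by a decaying maximum."""
    gains = []
    max_rms = 0.0
    for rms in rms_values:
        # The remembered maximum falls back by a constant factor per sample,
        # unless the current RMS is already at or above it.
        max_rms = max(rms, max_rms * rms_max_decay)
        # Assumption: before any signal has been seen, use the gain limit.
        gain = target_max_rms / max_rms if max_rms > 0.0 else max_gain
        gains.append(min(gain, max_gain))
    return gains
```

With a multiplier of 1.0 the maximum never decays, so one loud passage fixes the gain for the rest of the piece. Values below 1.0 let the gain recover during later quiet passages, at a rate set by how far below 1.0 the multiplier is.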
We have not yet adjusted this for target RMS, never mind the others. Here's target RMS variation:
Filename | norm | r5 | s3 | s4 | s5 | s6 | s7 |
31.wav | 50 | 47 | 45 | 46 | 42 | 44 | 45 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 87 | 84 | 84 | 83 | 81 | 76 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 32 | 21 | 31 | 33 | 30 | 30 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 73 | 57 | 64 | 68 | 66 | 63 |
mz_333_1MINp_align.wav | 66 | 66 | 56 | 60 | 63 | 63 | 63 |
Varying the fallback multiplier (rmsMaxDecay) for s5, where the s5 column uses the default of 0.999:
Filename | norm | r5 | s5 | rmsMaxDecay=0.9 | rmsMaxDecay=0.99 | rmsMaxDecay=1.0 |
31.wav | 50 | 47 | 42 | 44 | 45 | 47 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 87 | 83 | 83 | 84 | 83 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 32 | 33 | 31 | 29 | 1(??) |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 73 | 68 | 67 | 69 | 57 |
mz_333_1MINp_align.wav | 66 | 66 | 63 | 63 | 63 | 57 |
For different piano template sets
The above results are all generated using four piano templates, numbered 1-3 plus pianorwc.
Here are results using the norm and as-is methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.
The template turns out not to make an enormous difference -- perhaps because these recordings contain nothing but piano?
Filename | norm/all | as-is/all | norm/3of4 | as-is/3of4 | norm/1 | as-is/1 | norm/2 | as-is/2 | norm/3 | as-is/3 | norm/rwc | as-is/rwc |
31.wav | 50 | 33 | 51 | 30 | 50 | 34 | 44 | 42 | 50 | 32 | 56 | 36 |
MAPS_MUS-bach_846_AkPnBcht.wav | 87 | 15 | 86 | 16 | 86 | 24 | 75 | 20 | 73 | 10 | 71 | 18 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 33 | 31 | 32 | 32 | 31 | 22 | 29 | 31 | 35 | 34 | 32 | 28 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 73 | 16 | 71 | 19 | 71 | 12 | 68 | 14 | 72 | 17 | 70 | 15 |
mz_333_1MINp_align.wav | 66 | 3 | 68 | 1 | 63 | 4 | 67 | 2 | 67 | 1 | 63 | 3 |
For "generic" template set
The above results all use template sets with only piano templates in them.
Here are results using the norm and as-is methods, but with the full set of instrument templates (four pianos plus all the rest).
Filename | norm | as-is |
31.wav | 49 | 37 |
MAPS_MUS-bach_846_AkPnBcht.wav | 79 | 34 |
MAPS_MUS-chpn_op7_1_ENSTDkAm.wav | 31 | 28 |
MAPS_MUS-scn15_7_SptkBGAm.wav | 67 | 16 |
mz_333_1MINp_align.wav | 63 | 5 |
Cross-checking with non-piano test data
The results need to be roughly comparable with those obtained from pre-normalised data using other datasets as well as the piano one. Here is a subset of the TRIOS dataset. The norm result is that obtained from the plugin prior to doing this work, using pre-normalised data.
The mirex result is that from the MIREX 2012 submission in MATLAB, but note that this always uses all instrument templates while the plugin results are based on selecting the "right" instrument for the piece (which is assumed to be the best, though we aren't actually testing that here).
File | mirex | norm | u4 |
mozart/piano | 60 | 64 | 56 |
mozart/viola | 33 | 37 | 35 |
mozart/mix | 51 | 58 | 55 |
mozart/clarinet | 74 | 80 | 86 |
lussier/piano | 45 | 52 | 63 |
lussier/mix | 36 | 43 | 40 |
lussier/bassoon | 43 | 75 | 80 |
lussier/trumpet | 43 | 46 | 51 |
take_five/piano | 61 | 46 | 69 |
take_five/mix | 62 | 73 | 69 |
take_five/saxophone | 78 | 80 | 84 |