Piano Evaluation for Level Normalisation

Silvet Note Transcription


Input files

Methods

Results

Flatten Dynamics fine-tuning

For different piano template sets

For "generic" template set

Cross-checking with non-piano test data

Chris Cannam, 2014-07-23 03:40 PM

Lack of normalisation for Vamp plugin inputs is a problem when analysing quiet recordings (see #1028).

Testing uses a small set of piano recordings, quickly evaluating performance across the first 30 seconds of each under a number of different normalisation / level management regimes.

Filename	Signal max approx
`31.wav`	0.57
`MAPS_MUS-bach_846_AkPnBcht.wav`	0.12
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	0.33
`MAPS_MUS-scn15_7_SptkBGAm.wav`	0.13
`mz_333_1MINp_align.wav`	0.10

The plugin has one internal threshold parameter, which can be lowered to find quieter notes (at the expense, of course, of more false positives). We don't really want to expose this (or any continuous control) as a parameter, but for the threshold to be meaningful we need approximately predictable input levels.

Name	Hg revision	Description
`norm`	d721a17f3e14	Normalise to 0.50 max before running plugin (can't do this in plugin: it's here as the reference case)
`as-is`	d721a17f3e14	No normalisation
`to-date`	d9b688700819	Track max signal level so far, adjust each sample so that max is at 0.50
`r2`,`r3`,`r4`,`r5`,`r6`	b5a8836dd2a4	Preprocess with Flatten Dynamics at 0.02, 0.03, 0.04, 0.05, 0.06 target RMS levels respectively
`s8`	4ac067799e0b	With Flatten Dynamics second attempt with max RMS targeted to 0.08
`t4`	d67fae2bb29e	With Flatten Dynamics attempt 2a with max RMS targeted to 0.04
`u4`	70773820e719	With Flatten Dynamics attempt 2b with max RMS targeted to 0.04
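The `norm` and `to-date` rows can be illustrated with a minimal sketch. This is not the code from the revisions listed above: the function names are hypothetical, and the handling of silent input is an assumption.

```python
TARGET_PEAK = 0.50  # target maximum level, as given in the table above


def normalise_whole_file(samples, target=TARGET_PEAK):
    """`norm` (sketch): scale the whole signal so its absolute peak equals target.

    Needs the entire file up front, which is why it cannot run inside a
    streaming plugin.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # assumption: leave silent input untouched
    gain = target / peak
    return [s * gain for s in samples]


def normalise_to_date(samples, target=TARGET_PEAK):
    """`to-date` (sketch): scale each sample by the peak seen so far."""
    out = []
    running_peak = 0.0
    for s in samples:
        running_peak = max(running_peak, abs(s))
        # assumption: unity gain until the first non-zero sample arrives
        gain = target / running_peak if running_peak > 0.0 else 1.0
        out.append(s * gain)
    return out
```

In this sketch the `to-date` variant boosts every new running maximum to the full target level, so a quiet opening is amplified much more than it would be under `norm`. That may be one reason the two differ in the results, though this page does not test that.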

Reporting only the note onset F-measure for the first 30 seconds of each piece.

Filename	`norm`	`as-is`	`to-date`	`r2`	`r3`	`r4`	`r5`	`r6`	`s8`	`t4`	`u4`
`31.wav`	50	33	40	45	47	48	45	43	42	49	45
`MAPS_MUS-bach_846_AkPnBcht.wav`	87	15	62	64	85	87	87	86	81	86	87
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	33	31	31	11	25	31	32	31	32	34	35
`MAPS_MUS-scn15_7_SptkBGAm.wav`	73	16	61	50	57	67	74	75	70	69	68
`mz_333_1MINp_align.wav`	66	3	58	42	60	64	66	63	66	63	65

The precision (proportion of correct onsets among detected onsets, i.e. 1 minus the false-discovery rate) and recall (proportion of correctly detected onsets among all ground-truth onsets, i.e. the true-positive rate) vary as you would hope:

when the resulting audio level is quieter than the norm case, precision is high and recall is low but the F-measure is worse than the norm case
when the resulting audio level is louder than the norm case, precision is low and recall is high and the F-measure is still worse than the norm case
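The trade-off described above can be made concrete with the standard definitions of precision, recall and F-measure. The onset counts in the usage note are hypothetical, chosen only to show the pattern; they are not taken from these tests.

```python
def onset_scores(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from onset match counts."""
    detected = true_positives + false_positives    # all onsets the plugin reported
    reference = true_positives + false_negatives   # all ground-truth onsets
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / reference if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For instance, `onset_scores(70, 30, 30)` is a balanced case with precision, recall and F-measure all equal. `onset_scores(40, 5, 60)` (few detections, mostly correct) and `onset_scores(90, 110, 10)` (many detections, many spurious) both yield a lower F-measure than the balanced case.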

This suggests that our threshold (which happens to be 6) is moderately well-suited to the norm case, at least to optimise F-measure (this might not be the most perceptually useful measure though).

The best results (apart from norm) above seem to be r5 and u4. Let's try to refine the parameters for each of those and see if any patterns emerge.
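One rough way to compare the methods is to average each column of the F-measure table. The sketch below copies the figures from that table; the unweighted mean over five files is an assumption about how to summarise them, and it need not agree with a judgement made by eye.

```python
METHODS = ["norm", "as-is", "to-date", "r2", "r3", "r4", "r5", "r6", "s8", "t4", "u4"]

# Note onset F-measures (first 30 seconds), copied from the table above.
SCORES = {
    "31.wav":                           [50, 33, 40, 45, 47, 48, 45, 43, 42, 49, 45],
    "MAPS_MUS-bach_846_AkPnBcht.wav":   [87, 15, 62, 64, 85, 87, 87, 86, 81, 86, 87],
    "MAPS_MUS-chpn_op7_1_ENSTDkAm.wav": [33, 31, 31, 11, 25, 31, 32, 31, 32, 34, 35],
    "MAPS_MUS-scn15_7_SptkBGAm.wav":    [73, 16, 61, 50, 57, 67, 74, 75, 70, 69, 68],
    "mz_333_1MINp_align.wav":           [66, 3, 58, 42, 60, 64, 66, 63, 66, 63, 65],
}


def mean_by_method(methods, scores):
    """Unweighted mean F-measure per method across all files."""
    totals = [0] * len(methods)
    for row in scores.values():
        for i, value in enumerate(row):
            totals[i] += value
    return {m: t / len(scores) for m, t in zip(methods, totals)}


if __name__ == "__main__":
    means = mean_by_method(METHODS, SCORES)
    for method, mean in sorted(means.items(), key=lambda kv: -kv[1]):
        print(f"{method:8s} {mean:5.1f}")
```

With only five files and evidently high run-to-run variance, differences of a point or two in these means should not be read as significant.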

The adjustable parameters within r5, with their defaults, are

Parameter	Description	Default
`historySeconds`	Length of RMS window	4.0 sec
`catchUpSeconds`	Length of gain slide window	0.5 sec
`targetRMS`	Target RMS value	0.05
`maxGain`	Hard limit on gain	20.0

The targetRMS is the one we have been varying across r2, r3 etc -- for r5 it is fixed at 0.05. We don't need to test maxGain variation.
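To show how these four parameters might interact, here is a minimal sketch of a level-flattening processor. It is a guess at the general shape, not the Flatten Dynamics code: the class name, the one-pole gain smoothing and the behaviour on silence are all assumptions.

```python
import math
from collections import deque


class FlattenDynamicsSketch:
    """Hypothetical illustration of the r-series parameters, not the real code."""

    def __init__(self, sample_rate, history_seconds=4.0, catch_up_seconds=0.5,
                 target_rms=0.05, max_gain=20.0):
        self.history_len = max(1, int(round(history_seconds * sample_rate)))
        self.catch_up_len = max(1, int(round(catch_up_seconds * sample_rate)))
        self.target_rms = target_rms
        self.max_gain = max_gain
        self.history = deque()   # squared samples inside the RMS window
        self.sum_squares = 0.0
        self.gain = 1.0

    def process_sample(self, x):
        # Update the running RMS over the last history_len samples.
        self.history.append(x * x)
        self.sum_squares += x * x
        if len(self.history) > self.history_len:
            self.sum_squares -= self.history.popleft()
        rms = math.sqrt(max(self.sum_squares, 0.0) / len(self.history))

        # Gain that would bring the windowed RMS to the target, hard-limited.
        if rms > 0.0:
            target_gain = min(self.target_rms / rms, self.max_gain)
        else:
            target_gain = self.gain  # assumption: hold the gain during silence

        # Assumption: slide toward the target gain with a one-pole smoother
        # whose time constant is the catch-up window.
        self.gain += (target_gain - self.gain) / self.catch_up_len
        return x * self.gain
```

Under these assumptions a steady input with RMS 0.01 settles at a gain of 5 (0.05 / 0.01), while an input with RMS 0.001 would need a gain of 50 and is held at the `maxGain` limit of 20.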

Here r5hNcM represents the r5 method with historySeconds = N and catchUpSeconds = M/10. So r5 is the same as r5h4c05. The r5 test was run again, hence variation from above results.
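The naming scheme can be decoded mechanically. The helper below is hypothetical and exists only to spell out the convention described above.

```python
import re

# r5, optionally followed by h<N>c<M>: historySeconds = N, catchUpSeconds = M/10
_R5_VARIANT = re.compile(r"^r5(?:h(\d+)c(\d+))?$")


def parse_r5_variant(name):
    """Map a name such as 'r5h4c05' to its historySeconds / catchUpSeconds."""
    match = _R5_VARIANT.match(name)
    if match is None:
        raise ValueError(f"not an r5 variant name: {name!r}")
    if match.group(1) is None:
        # Plain 'r5' uses the defaults, i.e. it is the same as 'r5h4c05'.
        return {"historySeconds": 4.0, "catchUpSeconds": 0.5}
    return {
        "historySeconds": float(match.group(1)),
        "catchUpSeconds": int(match.group(2)) / 10.0,
    }
```

So `r5h4c10` means a 4 second RMS window with a 1.0 second gain slide, and `r5h8c05` an 8 second window with the default 0.5 second slide.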

Filename	`norm`	`r5`	`r5h2c05`	`r5h5c05`	`r5h6c05`	`r5h8c05`	`r5h4c01`	`r5h4c10`
`31.wav`	50	47	38	47	48	46	46	53
`MAPS_MUS-bach_846_AkPnBcht.wav`	87	87	87	87	87	88	86	88
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	33	32	33	32	29	31	32	31
`MAPS_MUS-scn15_7_SptkBGAm.wav`	73	73	66	72	76	73	73	73
`mz_333_1MINp_align.wav`	66	66	64	64	66	63	65	66

The adjustable parameters within u4, with their defaults, are

Parameter	Description	Default
`longTermSeconds`	Length of long-term RMS window	4.0 sec
`shortTermSeconds`	Length of short-term RMS window	1.0 sec
`catchUpSeconds`	Length of gain slide window	0.2 sec
`targetMaxRMS`	Target RMS value	0.04
`rmsMaxDecay`	Fallback multiplier for max RMS per sample	0.999
`squashFactor`	Exponent to skew 0,1 range toward top of range	0.3
`maxGain`	Hard limit on gain	20.0

Start by varying squashFactor with others at defaults:

The 0.3 results are far worse than the u4 results obtained earlier (even though this is the same code). Variance is evidently high.

I don't think u4 is showing good enough results to justify its complexity over the global-only r5 code, and the squash factor seems to offer little.

Let's supersede the u-series with an s-series that uses the long-term window (only) from r5 but with some decay in max RMS value to account for pieces that go loud-soft alternately. Parameters:

Parameter	Description	Default
`historySeconds`	Length of long-term RMS window	4.0 sec
`catchUpSeconds`	Length of gain slide window	0.2 sec
`targetMaxRMS`	Target RMS value	0.05
`rmsMaxDecay`	Fallback multiplier for max RMS per sample	0.999
`maxGain`	Hard limit on gain	20.0
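The `rmsMaxDecay` idea can be sketched as a peak-hold tracker on the RMS value that falls back by a fixed multiplier each step. This is an interpretation of "fallback multiplier for max RMS per sample", not the actual code, and both function names are hypothetical.

```python
import math


def track_max_rms(rms_values, decay=0.999):
    """Follow the maximum RMS seen so far, letting it fall back by `decay` each step."""
    tracked = []
    current = 0.0
    for rms in rms_values:
        # A new louder value replaces the held maximum; otherwise the held
        # maximum shrinks geometrically. decay = 1.0 gives a pure running max.
        current = max(rms, current * decay)
        tracked.append(current)
    return tracked


def decay_half_life_samples(decay):
    """Number of steps for the held maximum to halve, for 0 < decay < 1."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be strictly between 0 and 1")
    return math.log(0.5) / math.log(decay)
```

If "per sample" is taken literally, a multiplier of 0.999 halves the held maximum in roughly 693 samples, which is a short time at typical audio sample rates. A multiplier of 1.0 never releases the maximum at all.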

We have not yet tuned the target RMS for this, never mind the other parameters. Here is the target RMS variation, where sN uses a target RMS of 0.0N (as in the r-series):

Filename	`norm`	`r5`	`s3`	`s4`	`s5`	`s6`	`s7`
`31.wav`	50	47	45	46	42	44	45
`MAPS_MUS-bach_846_AkPnBcht.wav`	87	87	84	84	83	81	76
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	33	32	21	31	33	30	30
`MAPS_MUS-scn15_7_SptkBGAm.wav`	73	73	57	64	68	66	63
`mz_333_1MINp_align.wav`	66	66	56	60	63	63	63

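Varying the fallback multiplier (rmsMaxDecay) for s5. The numeric column headings are the multiplier values; the s5 column uses the default of 0.999: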

Filename	`norm`	`r5`	`s5`	0.9	0.99	1.0
`31.wav`	50	47	42	44	45	47
`MAPS_MUS-bach_846_AkPnBcht.wav`	87	87	83	83	84	83
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	33	32	33	31	29	1(??)
`MAPS_MUS-scn15_7_SptkBGAm.wav`	73	73	68	67	69	57
`mz_333_1MINp_align.wav`	66	66	63	63	63	57

The above results are all generated using four piano templates, numbered 1-3 plus pianorwc.

Here are results using the norm and as-is methods, but with different sets of piano templates: first with three templates (1-3) and then with each template in turn as the only one.

The template turns out not to make an enormous difference -- perhaps because these recordings contain nothing but piano?

Filename	`norm`/all	`as-is`/all	`norm`/3of4	`as-is`/3of4	`norm`/1	`as-is`/1	`norm`/2	`as-is`/2	`norm`/3	`as-is`/3	`norm`/rwc	`as-is`/rwc
`31.wav`	50	33	51	30	50	34	44	42	50	32	56	36
`MAPS_MUS-bach_846_AkPnBcht.wav`	87	15	86	16	86	24	75	20	73	10	71	18
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	33	31	32	32	31	22	29	31	35	34	32	28
`MAPS_MUS-scn15_7_SptkBGAm.wav`	73	16	71	19	71	12	68	14	72	17	70	15
`mz_333_1MINp_align.wav`	66	3	68	1	63	4	67	2	67	1	63	3

The above results all use template sets with only piano templates in them.

Here are results using the norm and as-is methods, but with the full set of instrument templates (four pianos plus all the rest).

Filename	`norm`	`as-is`
`31.wav`	49	37
`MAPS_MUS-bach_846_AkPnBcht.wav`	79	34
`MAPS_MUS-chpn_op7_1_ENSTDkAm.wav`	31	28
`MAPS_MUS-scn15_7_SptkBGAm.wav`	67	16
`mz_333_1MINp_align.wav`	63	5

The results need to be roughly comparable with those obtained from pre-normalised data using other datasets as well as the piano one. Here is a subset of the TRIOS dataset. The norm result is that obtained from the plugin prior to doing this work, using pre-normalised data.

The mirex result is that from the MIREX 2012 submission in MATLAB, but note that this always uses all instrument templates while the plugin results are based on selecting the "right" instrument for the piece (which is assumed to be the best, though we aren't actually testing that here).

File	`mirex`	`norm`	`u4`
mozart/piano	60	64	56
mozart/viola	33	37	35
mozart/mix	51	58	55
mozart/clarinet	74	80	86
lussier/piano	45	52	63
lussier/mix	36	43	40
lussier/bassoon	43	75	80
lussier/trumpet	43	46	51
take_five/piano	61	46	69
take_five/mix	62	73	69
take_five/saxophone	78	80	84
