
Chris Cannam, 2014-07-21 08:12 AM

h1. Flatten Dynamics

This differs from a "musical" dynamics compressor in that it should be fairly drastic, it doesn't need to be especially musical, and it aims to scale everything so that the overall RMS level across the whole file is quite predictable.

Generally speaking, a plugin that uses this to flatten its input will probably also want to use the reported gain to un-flatten its output.
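
As a minimal sketch of the un-flattening step, assume the flattener reports one gain per input sample. Under that assumption a downstream stage can undo the scaling by dividing by that gain. The helper `unflatten` and this per-sample interface are hypothetical, not the plugin's actual API:

```python
def unflatten(values, gains):
    """Undo per-sample flattening by dividing each value by the gain that
    was applied to the corresponding input sample (hypothetical helper)."""
    if len(values) != len(gains):
        raise ValueError("values and gains must have the same length")
    return [v / g for v, g in zip(values, gains)]


# Usage: flattened samples and the gains the flattener reported for them
flattened = [0.5, -0.2, 0.1]
gains = [2.0, 4.0, 0.5]
print(unflatten(flattened, gains))  # [0.25, -0.05, 0.2]
```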

Trying this out in the "Piano Evaluation of the Silvet Note Transcription plugin":/projects/silvet/wiki/Piano_Evaluation_for_Level_Normalisation

I use the term "level" below where, in implementation terms, I'm using RMS; some other sort of averaged level calculation might do.
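
One cheap way to compute an RMS level over a moving window is to keep a running sum of squares and subtract each sample's square as it leaves the window. The sketch below works that way. The class name `MovingRMS` is hypothetical. Averaging over only the samples seen so far, until the first window fills, is an assumption of this sketch:

```python
import math
from collections import deque


class MovingRMS:
    """RMS level of the most recent `window` samples, updated in O(1) per
    sample with a running sum of squares (hypothetical helper)."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1 sample")
        self.window = window
        self.squares = deque()
        self.total = 0.0

    def push(self, x):
        """Add one sample and return the RMS of the current window."""
        sq = x * x
        self.squares.append(sq)
        self.total += sq
        if len(self.squares) > self.window:
            # Drop the oldest sample's contribution once the window is full
            self.total -= self.squares.popleft()
        # Clamp tiny negative totals caused by floating-point drift
        self.total = max(self.total, 0.0)
        # Until the window fills, average over the samples seen so far
        return math.sqrt(self.total / len(self.squares))


# Usage: with a 4-sample window, the level falls to zero once the
# window contains only silence
rms = MovingRMS(window=4)
levels = [rms.push(x) for x in [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]]
print(levels[3], levels[-1])  # 1.0 0.0
```

For the 4-second level used below, `window` would be 4 times the sample rate.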

h3. First attempt

As of commit:e36fe9312ad4

The aim is just to make the level across a few seconds of audio tend toward some target.

We have a target level T (for example 0.05). Start with an initial gain G equal to 1.

At each sample:

* Update the calculation of the level over the past 4 seconds of audio
* Find the gain G' that would be necessary to make that level equal to T (i.e. T / level)
* Update our stored gain G by moving it 1/N of the distance from G to G' (where N is the number of samples in 0.5 seconds)
* Return the sample scaled by G
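
The per-sample loop above might look roughly like the following Python sketch. The function name, parameter names and defaults are hypothetical; the defaults just mirror the numbers in the list. Holding the gain steady when the level is exactly zero is an assumption, since T / level is undefined in silence:

```python
import math
from collections import deque


def flatten_first_attempt(samples, sample_rate, target=0.05,
                          level_seconds=4.0, smoothing_seconds=0.5):
    """Sketch of the per-sample gain-tracking loop ("first attempt").

    Returns (output, gains), where gains[i] is the gain G applied to
    samples[i]. All names and defaults here are illustrative.
    """
    window = max(1, int(level_seconds * sample_rate))  # samples in 4 s
    n = max(1, int(smoothing_seconds * sample_rate))   # N: samples in 0.5 s
    squares = deque()
    total = 0.0
    gain = 1.0  # initial gain G
    output, gains = [], []
    for x in samples:
        # Update the RMS level over the most recent `window` samples
        squares.append(x * x)
        total += x * x
        if len(squares) > window:
            total -= squares.popleft()
        level = math.sqrt(max(total, 0.0) / len(squares))
        # G' = T / level; in exact silence, hold the current gain (assumption)
        target_gain = target / level if level > 0.0 else gain
        # Move G one N-th of the way from G toward G'
        gain += (target_gain - gain) / n
        # Return the sample scaled by the updated G
        output.append(x * gain)
        gains.append(gain)
    return output, gains


# Usage: a square wave of amplitude 0.5 (so RMS 0.5) at a toy 100 Hz rate
sr = 100
signal = [0.5 if i % 2 == 0 else -0.5 for i in range(20 * sr)]
out, gains = flatten_first_attempt(signal, sr)
print(round(gains[-1], 6))  # 0.1
```

Because G moves 1/N of the remaining distance on each sample, after N samples about (1 - 1/N)^N ≈ 1/e, or roughly 37%, of a step change in G' is still outstanding. N therefore behaves like a 0.5-second time constant. Nothing in this loop bounds G, so a long quiet passage can drive the gain very high.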

h3. Possible alternative

Aim to scale the maximum level across the whole input, measured in a moving window a few seconds long, to our target T. Because the input arrives in real time, we can only do this for the maximum seen so far.
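
Here is a sketch of the max-so-far part. The class name `MaxSoFarGain` is hypothetical. Only complete windows count towards the maximum, which is one possible design choice; it stops a single loud opening sample from dominating. As a consequence, the gain stays at 1 until the first window has filled:

```python
import math
from collections import deque


class MaxSoFarGain:
    """Global gain T / (highest windowed RMS level seen so far).

    Hypothetical helper: only complete windows count towards the maximum,
    and the gain stays at 1.0 until the first window has filled.
    """

    def __init__(self, window, target=0.05):
        self.window = window
        self.target = target
        self.squares = deque()
        self.total = 0.0
        self.max_level = 0.0

    def push(self, x):
        """Add one sample and return the current global gain."""
        self.squares.append(x * x)
        self.total += x * x
        if len(self.squares) > self.window:
            self.total -= self.squares.popleft()
        if len(self.squares) == self.window:
            level = math.sqrt(max(self.total, 0.0) / self.window)
            self.max_level = max(self.max_level, level)
        return self.gain()

    def gain(self):
        # No non-silent full window yet: leave the signal unscaled
        if self.max_level == 0.0:
            return 1.0
        return self.target / self.max_level


# Usage: the gain follows the loudest window so far, never the current one
tracker = MaxSoFarGain(window=4)
gains = [tracker.push(x) for x in [0.1] * 4 + [0.2] * 4 + [0.1] * 4]
# 1.0 until the window fills, then falls to about 0.25 and holds there
print([round(g, 3) for g in gains])
```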

Meanwhile, aim to scale each individual sample according to the local level, i.e. that of the past one or two seconds at most. This part should behave more like a compressor: some sort of knee'd or sigmoid curve that takes the difference between the locally averaged recent level and the target level, maps that difference through the curve, and then applies the resulting gain.
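
The page doesn't pin down the curve. One possible choice, shown here as a hypothetical sketch, has three steps:

* measure the deviation of the local level from T in dB
* squash that deviation with `tanh`, so the output level stays within ±`range_db` dB of the target
* return the gain that makes up the difference

`sigmoid_gain`, `range_db` and `floor` are illustrative names. Returning unity gain below `floor` (treated as silence) is an assumption:

```python
import math


def sigmoid_gain(local_level, target=0.05, range_db=6.0, floor=1e-6):
    """Gain for one sample, given the RMS level of the last second or two.

    Hypothetical curve: the deviation d = 20*log10(level / target) is
    squashed to range_db * tanh(d / range_db), so the output level stays
    within +/- range_db dB of the target. The returned gain moves the
    level from d to the squashed value.
    """
    if local_level <= floor:
        return 1.0  # treat as silence: leave unscaled (assumption)
    d = 20.0 * math.log10(local_level / target)
    squashed = range_db * math.tanh(d / range_db)
    return 10.0 ** ((squashed - d) / 20.0)


# Usage: levels near the target are barely touched; distant ones are pulled in
for level in (0.05, 0.06, 0.5, 0.005):
    g = sigmoid_gain(level)
    print(f"level {level}: gain {g:.3f}, output level {level * g:.4f}")
```

Near the target the curve is almost linear, so small deviations pass through nearly unchanged, while large ones are pulled back toward T. A hard-kneed alternative would instead leave levels within some threshold alone and apply a fixed ratio beyond it.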