Chris Cannam, 2014-07-22 08:35 AM
1 | 3 | Chris Cannam | h1. Flatten Dynamics |
---|---|---|---|
2 | 1 | Chris Cannam | |
3 | 3 | Chris Cannam | This differs from a "musical" dynamics compressor in that it should be fairly drastic, it doesn't need to sound especially musical, and it should scale everything so that the overall RMS level across the whole file is quite predictable. |
4 | 3 | Chris Cannam | |
5 | 4 | Chris Cannam | Generally speaking, a plugin using this to flatten out its input will probably also want to use its reported gain to un-flatten its output. |
6 | 3 | Chris Cannam | |
7 | 1 | Chris Cannam | Trying this out in the "Piano Evaluation of the Silvet Note Transcription plugin":/projects/silvet/wiki/Piano_Evaluation_for_Level_Normalisation |
8 | 1 | Chris Cannam | |
9 | 3 | Chris Cannam | I use the term "level" below; in implementation terms I'm using RMS, but some other sort of averaged level calculation might do. |
10 | 3 | Chris Cannam | |
11 | 1 | Chris Cannam | h3. First attempt |
12 | 2 | Chris Cannam | |
13 | 1 | Chris Cannam | As of commit:e36fe9312ad4 |
14 | 1 | Chris Cannam | |
15 | 3 | Chris Cannam | The aim is just to make the level across a few seconds of audio tend toward some target. |
16 | 1 | Chris Cannam | |
17 | 3 | Chris Cannam | We have a target level T (for example, 0.05). Start with an initial gain G equal to 1. |
18 | 1 | Chris Cannam | |
19 | 1 | Chris Cannam | At each sample: |
20 | 1 | Chris Cannam | |
21 | 3 | Chris Cannam | * Update the calculated level of the past 4 seconds of audio |
22 | 3 | Chris Cannam | * Find the gain G' that would be necessary to make that level equal to T (i.e. G' = T / level) |
23 | 1 | Chris Cannam | * Update our stored gain G to move it 1/N of the distance from G to G' (where N is the number of samples in 0.5 seconds) |
24 | 1 | Chris Cannam | * Return the sample scaled by G |
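
The per-sample steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the commit: the class name @Flattener@, its parameter names, the choice to average over only the samples seen so far while the window is still filling, and the decision to hold the gain during digital silence are all assumptions made for the example.

```python
import math
from collections import deque


class Flattener:
    """Hypothetical sketch of the first-attempt gain update (not the plugin's code)."""

    def __init__(self, sample_rate, target=0.05, window_sec=4.0, ramp_sec=0.5):
        self.target = target                                      # T
        self.window_len = max(1, int(window_sec * sample_rate))   # 4 s of samples
        self.n = max(1, int(ramp_sec * sample_rate))              # N: 0.5 s of samples
        self.history = deque()                                    # squared samples in window
        self.sum_squares = 0.0
        self.gain = 1.0                                           # G, starts at 1

    def process(self, sample):
        # Update the RMS level of the past window_len samples.
        sq = sample * sample
        self.history.append(sq)
        self.sum_squares += sq
        if len(self.history) > self.window_len:
            self.sum_squares -= self.history.popleft()
        # Assumption: before the window fills, average over what we have so far.
        level = math.sqrt(max(self.sum_squares, 0.0) / len(self.history))

        # Assumption: leave G unchanged on silence, where T / level is undefined.
        if level > 0.0:
            wanted = self.target / level                   # G'
            self.gain += (wanted - self.gain) / self.n     # move 1/N of the way to G'

        # Return the scaled sample along with the gain that was applied.
        return sample * self.gain, self.gain
```

Returning the gain alongside the scaled sample is one way a caller could later un-flatten its own output, by dividing by the gain that was reported for the corresponding input.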
25 | 3 | Chris Cannam | |
26 | 6 | Chris Cannam | h3. Second attempt |
27 | 3 | Chris Cannam | |
28 | 3 | Chris Cannam | Aim to get the maximum level across the whole input, measured in a moving window a few seconds long, scaled to our target T. Because the input arrives in real time, we can only do this for the maximum so far. |
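
As a rough illustration of the maximum-so-far idea only, the sketch below tracks the highest windowed RMS level seen up to the current sample and derives the gain that would scale that level to T. It is not taken from the commit mentioned on this page: the class name @RunningMaxLevel@, the 3-second default window, and the choice to treat audio before the start of the input as silence (dividing by the full window length) are assumptions.

```python
import math
from collections import deque


class RunningMaxLevel:
    """Hypothetical sketch: gain that maps the loudest window so far to the target."""

    def __init__(self, sample_rate, target=0.05, window_sec=3.0):
        self.target = target                                      # T
        self.window_len = max(1, int(window_sec * sample_rate))
        self.history = deque()                                    # squared samples in window
        self.sum_squares = 0.0
        self.max_level = 0.0                                      # highest level seen so far

    def process(self, sample):
        # Maintain a running sum of squares over the moving window.
        sq = sample * sample
        self.history.append(sq)
        self.sum_squares += sq
        if len(self.history) > self.window_len:
            self.sum_squares -= self.history.popleft()

        # Assumption: divide by the full window length, so that the part of the
        # window before the input started counts as silence.
        level = math.sqrt(max(self.sum_squares, 0.0) / self.window_len)

        # The maximum can only grow, since we cannot look ahead in real time.
        self.max_level = max(self.max_level, level)

        # Assumption: use unity gain until some non-silent audio has been seen.
        if self.max_level > 0.0:
            return self.target / self.max_level
        return 1.0
```

Because the maximum never decreases, the gain returned here only ever falls over the course of the input; the local, compressor-like scaling would be a separate stage applied on top of it.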
29 | 3 | Chris Cannam | |
30 | 3 | Chris Cannam | Meanwhile aim to get each individual sample scaled according to the local level, that of the past one or two seconds at most. This should be more like a compressor, some sort of knee'd or sigmoid curve that finds the difference between the locally-averaged recent level and the target level, scales this on the curve, then applies the resulting gain. |
31 | 5 | Chris Cannam | |
32 | 5 | Chris Cannam | Some experiments on this in commit:6b732542a34c which I'm testing out a bit in Silvet. |