Version 4 - History - Wiki - Flatten Dynamics


Chris Cannam, 2014-07-21 08:12 AM

h1. Flatten Dynamics

This differs from a "musical" dynamics compressor. It should be fairly drastic and doesn't need to sound especially musical. It should also scale everything so that the overall RMS level across the whole file is quite predictable.

Generally speaking, a plugin that uses this to flatten out its input will probably also want to use the gain this reports to un-flatten its own output.
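
Here is a minimal sketch of the un-flattening step. It makes some hypothetical assumptions:

* the flattener reports the gain it applied at each point;
* the downstream plugin's output values line up one-to-one with those gains;
* those values scale linearly with input amplitude (for example, levels estimated from the flattened signal).

Under those assumptions, dividing each value by its gain restores the original scale. The name `unflatten` is illustrative only.

```python
def unflatten(flattened_values, gains):
    """Undo flattening by dividing each value by the gain applied at that point.

    Hypothetical helper: assumes one reported gain per value, all nonzero,
    and values that scale linearly with the input amplitude.
    """
    if len(flattened_values) != len(gains):
        raise ValueError("expected exactly one gain per value")
    return [value / gain for value, gain in zip(flattened_values, gains)]


# Usage: values that were scaled by gains of 2 and 4 go back to their originals.
restored = unflatten([1.0, -0.8], [2.0, 4.0])
print(restored)  # [0.5, -0.2]
```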

I'm trying this out in the "Piano Evaluation of the Silvet Note Transcription plugin":/projects/silvet/wiki/Piano_Evaluation_for_Level_Normalisation

Below, I use the term "level" where, in implementation terms, I'm using RMS. Some other sort of averaged level calculation might do instead.
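
To make "level" concrete, here is a minimal sketch of an RMS level over a sliding window of the most recent samples. It assumes plain float samples. The class name `RunningRMS` is hypothetical, and so is the running sum of squares; neither is taken from the plugin's code.

```python
import math
from collections import deque


class RunningRMS:
    """RMS level of the most recent `window` samples (hypothetical helper)."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be a positive sample count")
        self.window = window
        self.squares = deque()
        self.sum_sq = 0.0

    def update(self, x):
        """Add one sample and return the RMS of the current window."""
        sq = x * x
        self.squares.append(sq)
        self.sum_sq += sq
        if len(self.squares) > self.window:
            # Drop the oldest sample's contribution from the running sum.
            self.sum_sq -= self.squares.popleft()
        # max() guards against tiny negative sums from floating-point drift.
        return math.sqrt(max(self.sum_sq, 0.0) / len(self.squares))


rms = RunningRMS(window=2)
print([rms.update(x) for x in [3.0, 4.0, 0.0]])
```

Until the window has filled, this sketch averages over however many samples it has seen so far.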

h3. First attempt

This describes the implementation as of commit:e36fe9312ad4

The aim is just to make the level over a few seconds of audio tend toward some target.

We have a target level T (for example 0.05), and we start with an initial gain G equal to 1.

At each sample:

* Update the level calculation for the past 4 seconds of audio
* Find the gain G' that would be needed to make that level equal to T (i.e. G' = T / level)
* Move the stored gain G 1/N of the distance from G toward G', where N is the number of samples in 0.5 seconds
* Return the input sample scaled by G
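
The per-sample loop above can be sketched as follows. This is a minimal, hypothetical Python rendering, not the plugin's actual code. It takes a list of float samples, keeps a running RMS over the last 4 seconds, and eases G toward G' = T / level by 1/N of the difference per sample. Keeping G unchanged when the window is silent is an assumption, since the description doesn't say what happens at zero level. The function and parameter names are illustrative.

```python
import math
from collections import deque


def flatten_first_attempt(samples, sample_rate, target=0.05,
                          level_seconds=4.0, smoothing_seconds=0.5):
    """Hypothetical sketch of the gain-following loop; returns (output, gains)."""
    window = max(1, int(level_seconds * sample_rate))
    n = max(1, int(smoothing_seconds * sample_rate))  # N: 0.5 s in samples
    squares = deque()
    sum_sq = 0.0
    gain = 1.0  # G starts at 1
    output, gains = [], []
    for x in samples:
        # Update the RMS level of the past `level_seconds` of audio.
        squares.append(x * x)
        sum_sq += x * x
        if len(squares) > window:
            sum_sq -= squares.popleft()
        level = math.sqrt(max(sum_sq, 0.0) / len(squares))
        # G' = T / level; if the window is silent, keep the current gain
        # (an assumption: the description doesn't cover zero level).
        target_gain = target / level if level > 0.0 else gain
        # Move G 1/N of the distance toward G'.
        gain += (target_gain - gain) / n
        output.append(x * gain)
        gains.append(gain)
    return output, gains


out, g = flatten_first_attempt([0.1] * 200, sample_rate=10)
print(g[0], g[-1], out[-1])
```

This sketch doesn't limit G, so very quiet passages can push the gain very high. The returned `gains` list is what a downstream plugin would use to un-flatten its output.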

h3. Possible alternative

Aim to scale the input so that its maximum level across the whole input, measured in a moving window a few seconds long, matches our target T. Because the input arrives in real time, we can only do this using the maximum so far.
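
Here is a minimal sketch of the max-so-far idea. It rests on two assumptions: a 3-second window stands in for "a few seconds", and the gain stays at 1 until some non-silent input has arrived. The gain is T divided by the largest windowed RMS level observed so far, so it can only stay the same or fall as the input goes on. The class and method names are hypothetical.

```python
import math
from collections import deque


class MaxSoFarScaler:
    """Hypothetical sketch: gain = T / (largest windowed RMS level seen so far)."""

    def __init__(self, sample_rate, target=0.05, window_seconds=3.0):
        self.target = target
        self.window = max(1, int(window_seconds * sample_rate))
        self.squares = deque()
        self.sum_sq = 0.0
        self.max_level = 0.0

    def next_gain(self, x):
        """Feed one sample; return the gain that maps the max-so-far level to T."""
        self.squares.append(x * x)
        self.sum_sq += x * x
        if len(self.squares) > self.window:
            self.sum_sq -= self.squares.popleft()
        level = math.sqrt(max(self.sum_sq, 0.0) / len(self.squares))
        self.max_level = max(self.max_level, level)
        # Assumed behaviour: leave the signal alone until anything is heard.
        return self.target / self.max_level if self.max_level > 0.0 else 1.0


scaler = MaxSoFarScaler(sample_rate=2, window_seconds=1.0)
print([scaler.next_gain(x) for x in [0.1, 0.1, 0.2, 0.2, 0.0, 0.0]])
```

While the first window is still filling, the level is averaged over fewer samples. As a result, an early loud burst can set a high maximum in this sketch.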

Meanwhile, aim to scale each individual sample according to the local level, meaning that of the past one or two seconds at most. This part should work more like a compressor. Using some sort of curve with a knee, or a sigmoid curve, find the difference between the locally averaged recent level and the target level, map that difference through the curve, and then apply the resulting gain.
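
One way to read the compressor-like part is sketched below. It rests on several assumptions:

* levels are compared in decibels;
* the curve is a tanh, which is one sigmoid shape among many;
* the local window is 1 second;
* the correction saturates at a hypothetical `max_correction_db` of 12 dB.

With this curve, small differences from the target are corrected almost completely, while large ones are pulled back by at most 12 dB. The function names are illustrative.

```python
import math
from collections import deque


def sigmoid_gain(level, target, max_correction_db=12.0):
    """Gain that pulls `level` toward `target` along a tanh curve in dB.

    Hypothetical choice of curve: small differences are almost fully
    corrected, large ones saturate at +/- max_correction_db.
    """
    if level <= 0.0:
        return 1.0  # nothing to measure, so leave silence alone
    diff_db = 20.0 * math.log10(level / target)
    gain_db = -max_correction_db * math.tanh(diff_db / max_correction_db)
    return 10.0 ** (gain_db / 20.0)


def compress_locally(samples, sample_rate, target=0.05, local_seconds=1.0,
                     max_correction_db=12.0):
    """Scale each sample by the sigmoid gain for its local RMS level."""
    window = max(1, int(local_seconds * sample_rate))
    squares = deque()
    sum_sq = 0.0
    output = []
    for x in samples:
        squares.append(x * x)
        sum_sq += x * x
        if len(squares) > window:
            sum_sq -= squares.popleft()
        level = math.sqrt(max(sum_sq, 0.0) / len(squares))
        output.append(x * sigmoid_gain(level, target, max_correction_db))
    return output


print(sigmoid_gain(0.1, 0.05), sigmoid_gain(0.01, 0.05))
```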