This differs from a "musical" dynamics compressor in three ways: it should be fairly drastic, it does not need to sound especially musical, and it should scale everything so that the overall RMS level is reasonably predictable across the whole file.
Note that in practice, any plugin using this to flatten out its input should also use its reported gain to un-flatten its output.
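As a rough illustration of the un-flattening step, here is a minimal sketch. It assumes the plugin's output is an amplitude-like value (for instance a level estimate for a note) that scales linearly with the input level. The function name `unflatten` and the linearity assumption are hypothetical and not taken from the actual plugin.

```python
def unflatten(value, gain):
    """Undo the flattening gain on an amplitude-like output value.

    Assumption: the value scales linearly with input level, so dividing
    by the gain that was applied to the input restores the original scale.
    """
    if gain <= 0.0:
        raise ValueError("gain must be positive")
    return value / gain


# An input of level 0.5 flattened with gain 0.1 gives 0.05;
# un-flattening a value derived from it recovers the original scale.
original_scale = unflatten(0.05, 0.1)
```

An output that does not scale linearly with level (a power or a dB value, say) would need a different correction.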
This is being tried out in the Piano Evaluation of the Silvet Note Transcription plugin.
I use the term "level" below; in implementation terms this is RMS, though some other sort of averaged level calculation might do equally well.
As of e36fe9312ad4
The aim is just to make the level across a few seconds of audio tend toward some target.
We have a target level T (for example, 0.05). Start with an initial gain G equal to 1.
At each sample:
- Update the level calculation for the past 4 seconds of audio
- Find the gain G' that would be necessary to make that level equal to T (i.e. T / level)
- Update our stored gain G to move it 1/N of the distance from G to G' (where N is the number of samples in 0.5 seconds)
- Return the sample scaled by G
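The per-sample steps above could be sketched roughly as follows. This is a minimal illustration, not the code at the revision named above: the class and parameter names are made up, and the handling of silence (hold the gain while the level is zero) and of the start-up period (RMS over however many samples have arrived so far) are assumptions.

```python
import math
from collections import deque


class FlattenDynamics:
    """Per-sample gain follower that pulls the recent RMS level toward a target."""

    def __init__(self, sample_rate, target=0.05,
                 history_seconds=4.0, catchup_seconds=0.5):
        self.target = target
        # Window length for the level calculation (4 seconds by default)
        self.history_len = max(1, int(round(history_seconds * sample_rate)))
        # N: number of samples in the catch-up time (0.5 seconds by default)
        self.catchup_len = max(1, int(round(catchup_seconds * sample_rate)))
        self.squares = deque()
        self.sum_squares = 0.0
        self.gain = 1.0  # G starts at 1

    def process_sample(self, x):
        # Update the running sum of squares over the history window
        sq = x * x
        self.squares.append(sq)
        self.sum_squares += sq
        if len(self.squares) > self.history_len:
            self.sum_squares -= self.squares.popleft()
        # max() guards against a slightly negative sum from rounding error
        level = math.sqrt(max(self.sum_squares, 0.0) / len(self.squares))
        if level > 0.0:  # assumption: hold the gain during digital silence
            wanted = self.target / level                          # G' = T / level
            self.gain += (wanted - self.gain) / self.catchup_len  # move 1/N of the way
        return x * self.gain


# Usage: a square wave of amplitude 0.5 has RMS 0.5, so G tends toward 0.1
flattener = FlattenDynamics(sample_rate=100)
output = [flattener.process_sample(0.5 if i % 2 == 0 else -0.5)
          for i in range(2000)]
```

A running sum of squares keeps the cost per sample constant, at the price of some floating-point drift over long inputs; periodically recomputing the sum from the stored squares would be one way to limit that.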
Aim to get the maximum level across the whole input, measured in a moving window a few seconds long, scaled to our target T. Because the input arrives in real time, we can only do this for the maximum so far.
Meanwhile aim to get each individual sample scaled according to the local level, that of the past one or two seconds at most. This should be more like a compressor, some sort of knee'd or sigmoid curve that finds the difference between the locally-averaged recent level and the target level, scales this on the curve, then applies the resulting gain.
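One way the two aims might fit together is sketched below. Much of this is assumption rather than anything specified above: the two gains are simply multiplied, the local stage sees the level after global scaling, the sigmoid is a tanh curve applied to the level difference in dB, and the 12 dB limit and all names are invented for illustration.

```python
import math
from collections import deque


class MovingRMS:
    """RMS over the most recent `length` samples (fewer while filling up)."""

    def __init__(self, length):
        self.length = length
        self.squares = deque()
        self.total = 0.0

    def update(self, x):
        sq = x * x
        self.squares.append(sq)
        self.total += sq
        if len(self.squares) > self.length:
            self.total -= self.squares.popleft()
        return math.sqrt(max(self.total, 0.0) / len(self.squares))


def soft_gain(level, target, max_change_db=12.0):
    """Sigmoid curve: the gain needed to reach the target, expressed in dB
    and softly limited by tanh to within +/- max_change_db."""
    if level <= 0.0:
        return 1.0  # assumption: leave silence alone
    wanted_db = 20.0 * math.log10(target / level)
    limited_db = max_change_db * math.tanh(wanted_db / max_change_db)
    return 10.0 ** (limited_db / 20.0)


class TwoStageFlattener:
    def __init__(self, sample_rate, target=0.05,
                 long_seconds=4.0, short_seconds=1.0):
        self.target = target
        self.long_rms = MovingRMS(max(1, int(round(long_seconds * sample_rate))))
        self.short_rms = MovingRMS(max(1, int(round(short_seconds * sample_rate))))
        self.max_long_level = 0.0  # maximum-so-far of the long-window level

    def process_sample(self, x):
        long_level = self.long_rms.update(x)
        short_level = self.short_rms.update(x)
        self.max_long_level = max(self.max_long_level, long_level)
        if self.max_long_level <= 0.0:
            return x  # nothing but silence seen so far
        # Stage 1: scale the loudest long-window level seen so far to the target
        global_gain = self.target / self.max_long_level
        # Stage 2: compressor-like correction from the local (short-window) level
        local_gain = soft_gain(short_level * global_gain, self.target)
        return x * global_gain * local_gain


# Usage: a steady square wave of amplitude 0.5 ends up at the target level
stage = TwoStageFlattener(sample_rate=100)
steady = [stage.process_sample(0.5 if i % 2 == 0 else -0.5) for i in range(500)]
```

One caveat with this sketch: while the long window is still filling, a single loud sample can dominate the maximum so far, so a real implementation would probably want to wait for a full window before trusting it.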