## Wiki » History » Version 3

*Chris Cannam, 2014-07-18 08:23 PM*

1 | 3 | Chris Cannam | h1. Flatten Dynamics |
---|---|---|---|

2 | 1 | Chris Cannam | |

3 | 3 | Chris Cannam | This differs from a "musical" dynamics compressor in three ways: it should be fairly drastic, it doesn't need to sound especially musical, and it should scale everything so that the overall RMS level is quite predictable across the whole file. |

4 | 3 | Chris Cannam | |

5 | 3 | Chris Cannam | Note that in practice, any plugin using this to flatten out its input should also use its reported gain to un-flatten its output. |

6 | 3 | Chris Cannam | |

7 | 1 | Chris Cannam | Trying this out in the "Piano Evaluation of the Silvet Note Transcription plugin":/projects/silvet/wiki/Piano_Evaluation_for_Level_Normalisation |

8 | 1 | Chris Cannam | |

9 | 3 | Chris Cannam | I use the term "level" below; in implementation terms this means RMS, though some other sort of averaged level calculation might do. |

10 | 3 | Chris Cannam | |

11 | 1 | Chris Cannam | h3. First attempt |

12 | 2 | Chris Cannam | |

13 | 1 | Chris Cannam | As of commit:e36fe9312ad4 |

14 | 1 | Chris Cannam | |

15 | 3 | Chris Cannam | The aim is just to make the level across a few seconds of audio tend toward some target. |

16 | 1 | Chris Cannam | |

17 | 3 | Chris Cannam | We have a target level T (for example, 0.05). Start with an initial gain G equal to 1. |

18 | 1 | Chris Cannam | |

19 | 1 | Chris Cannam | At each sample: |

20 | 1 | Chris Cannam | |

21 | 3 | Chris Cannam | * Update the calculated level of the past 4 seconds of audio |

22 | 3 | Chris Cannam | * Find the gain G' that would be necessary to make that level equal to T (i.e. T / level) |

23 | 1 | Chris Cannam | * Update our stored gain G to move it 1/N of the distance from G to G' (where N is the number of samples in 0.5 seconds) |

24 | 1 | Chris Cannam | * Return the sample scaled by G |
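The four steps above can be sketched roughly as follows. This is only an illustration of the description, not the code at the commit mentioned; the class name `FlattenDynamics`, its parameter names, and the handling of silence and of the first few seconds (before a full 4 seconds of history exists) are all assumptions.

```python
import math
from collections import deque


class FlattenDynamics:
    """Hypothetical sketch of the per-sample gain follower described above."""

    def __init__(self, sample_rate, target=0.05,
                 history_seconds=4.0, ramp_seconds=0.5):
        self.target = target                                      # T
        self.gain = 1.0                                           # G
        self.history_len = max(1, int(history_seconds * sample_rate))
        self.ramp_len = max(1, int(ramp_seconds * sample_rate))   # N
        self.history = deque()          # squared samples in the window
        self.sum_squares = 0.0          # running sum of self.history

    def process(self, sample):
        # Update the RMS level of the past few seconds of audio.
        sq = sample * sample
        self.history.append(sq)
        self.sum_squares += sq
        if len(self.history) > self.history_len:
            self.sum_squares -= self.history.popleft()
        # Assumption: until the window fills, average over what we have.
        level = math.sqrt(max(self.sum_squares, 0.0) / len(self.history))

        # Assumption: silence tells us nothing, so leave G alone then.
        if level > 0.0:
            wanted = self.target / level                  # G'
            # Move G 1/N of the distance from G to G'.
            self.gain += (wanted - self.gain) / self.ramp_len

        # Return the sample scaled by G.
        return sample * self.gain
```

A caller would feed samples one at a time and could read `gain` afterwards, for example in order to un-flatten its own output. Note that nothing here caps the gain, so a very quiet passage would be boosted a great deal.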

25 | 3 | Chris Cannam | |

26 | 3 | Chris Cannam | h3. Possible alternative |

27 | 3 | Chris Cannam | |

28 | 3 | Chris Cannam | Aim to get the maximum level across the whole input, measured in a moving window a few seconds long, scaled to our target T. Because the input arrives in real time, we can only do this for the maximum so far. |
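One way the maximum-so-far idea might look, as a sketch only. `MaxLevelTracker` and its methods are hypothetical names, and dividing by the full window length even before the window has filled is an assumption made here so that a short burst at the very start doesn't count as a sustained level.

```python
import math
from collections import deque


class MaxLevelTracker:
    """Hypothetical helper: highest windowed RMS level seen so far."""

    def __init__(self, window_len):
        self.window_len = window_len    # window length in samples
        self.window = deque()           # squared samples in the window
        self.sum_squares = 0.0
        self.max_level = 0.0

    def update(self, sample):
        sq = sample * sample
        self.window.append(sq)
        self.sum_squares += sq
        if len(self.window) > self.window_len:
            self.sum_squares -= self.window.popleft()
        # Assumption: always divide by the full window length.
        level = math.sqrt(max(self.sum_squares, 0.0) / self.window_len)
        self.max_level = max(self.max_level, level)
        return self.max_level

    def gain(self, target):
        # Gain that would scale the loudest window so far to the target.
        if self.max_level > 0.0:
            return target / self.max_level
        return 1.0
```

Because the maximum can only go up, the gain from this part can only go down as the input proceeds.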

29 | 3 | Chris Cannam | |

30 | 3 | Chris Cannam | Meanwhile aim to get each individual sample scaled according to the local level, that of the past one or two seconds at most. This should be more like a compressor: some sort of knee'd or sigmoid curve that finds the difference between the locally averaged recent level and the target level, scales it on the curve, then applies the resulting gain. |
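As a rough illustration of the sigmoid option, the sketch below takes the difference between the target and the local level in dB, squashes it with `tanh`, and converts the result back to a linear gain. The choice of `tanh`, the use of dB, and the `max_change_db` parameter are all assumptions; the text above doesn't settle on a curve.

```python
import math


def local_gain(local_level, target, max_change_db=12.0):
    """Gain for one sample from a tanh (sigmoid-shaped) curve.

    max_change_db is a hypothetical parameter: the largest boost or
    cut, in dB, that the curve can approach.
    """
    if local_level <= 0.0:
        return 1.0  # silence: nothing to measure, leave the sample alone
    # Difference between the target and the recent local level, in dB.
    diff_db = 20.0 * math.log10(target / local_level)
    # Near-linear for small differences, flattening out towards
    # +/- max_change_db for large ones.
    gain_db = max_change_db * math.tanh(diff_db / max_change_db)
    return 10.0 ** (gain_db / 20.0)
```

With these settings a local level ten times the target is cut by about 11 dB instead of the full 20 dB, so the correction is strong but bounded.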