Method and implementation notes
- Main reference. A Shift-Invariant Latent Variable Model for Automatic Music Transcription
- MIREX 2012: a simplified method, without the HMM block. Multiple-F0 Estimation and Note Tracking for MIREX 2012 using a Shift-Invariant Latent Variable Model
- Updated in 2013 as Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model
- For more recent work in MATLAB, see the project Automatic Music Transcription using efficient Multi-Source SI-PLCA.
We are aiming to produce an "online" implementation (i.e. not requiring all audio before processing starts) in Vamp plugin form, based on the MIREX 2012 method.
About the method
The CMJ paper uses a hidden Markov model for note clustering, but the MIREX submission used a simple thresholding method. The thresholding method worked quite well, so we should probably use that.
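As a rough illustration of what thresholding-based note tracking could look like, here is a minimal sketch. It is not the MIREX submission's code: the function name `track_notes`, the `threshold` and `min_frames` parameters, and the list-of-lists layout of the pitch activations are all assumptions made for the example.

```python
def track_notes(activations, threshold, min_frames=1):
    """Turn a pitch-activation matrix into note events by simple thresholding.

    activations: list of frames, each a list of non-negative activation
    values, one per pitch (hypothetical layout, chosen for this sketch).
    Returns (pitch_index, onset_frame, offset_frame) tuples, with the
    offset frame exclusive.
    """
    if not activations:
        return []
    n_pitches = len(activations[0])
    n_frames = len(activations)
    notes = []
    onset = [None] * n_pitches  # onset frame of each currently sounding pitch

    for t, frame in enumerate(activations):
        for p in range(n_pitches):
            active = frame[p] > threshold
            if active and onset[p] is None:
                onset[p] = t  # note starts
            elif not active and onset[p] is not None:
                # Note ends; keep it only if it lasted long enough.
                if t - onset[p] >= min_frames:
                    notes.append((p, onset[p], t))
                onset[p] = None

    # Close any notes still sounding at the end of the input.
    for p in range(n_pitches):
        if onset[p] is not None and n_frames - onset[p] >= min_frames:
            notes.append((p, onset[p], n_frames))

    return sorted(notes, key=lambda n: (n[1], n[0]))
```

Because the only state carried between frames is the per-pitch onset, a tracker of this shape would fit an online implementation; an HMM-based tracker generally needs more context.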
The published method has a convolution stage to handle fine pitch variations with a 20 cent resolution. This makes the process many times slower, gaining maybe 2% in overall performance, so we should probably omit it to begin with. The diagrams in Benetos 2013 illustrate this stage.
There is no accommodation for percussion; one might preprocess to remove broadband percussive events.
There is no accommodation for typical qualities of vocal performance (portamento, vibrato etc).
Constant-Q transform: the existing code uses Anssi's MATLAB toolbox, which is substantially better than the Constant-Q in the qm-dsp library. A good first step would be to write a good new C++ Constant-Q implementation. See this project for that work.
- We could potentially parameterise the sparsity level on z ("sz" variable in MATLAB) as a rough correspondence with number of simultaneous notes
- The MIREX method also eliminated any polyphony > 4 by dropping weaker notes
- No particular temporal constraints -- the template has no "duration" -- so the input can be broken up as required and could even be processed completely frame-by-frame, at the cost of having to reinitialise EM at each frame
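The polyphony limit mentioned above could be applied per frame along the following lines. This is only a sketch under assumptions: the notes do not say whether the MIREX code drops weaker notes per frame or per detected note, and `limit_polyphony` and `max_polyphony` are hypothetical names.

```python
def limit_polyphony(frame, max_polyphony=4):
    """Zero all but the max_polyphony strongest activations in one frame.

    frame: list of activation values, one per pitch (assumed layout).
    Ties are resolved in favour of the lower pitch index.
    """
    if max_polyphony <= 0:
        return [0.0] * len(frame)
    # Pitch indices ordered by descending activation strength.
    ranked = sorted(range(len(frame)), key=lambda p: (-frame[p], p))
    keep = set(ranked[:max_polyphony])
    return [v if p in keep else 0.0 for p, v in enumerate(frame)]
```

Since it looks at one frame at a time, a step like this would not conflict with frame-by-frame processing.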
How to test
- Constant-Q -- compare with Anssi's MATLAB
- Random initialisers for EM mean the method doesn't always produce the same output, but runs generally converge to within roughly 1% of each other
- Can compare pitch-activations (equation 12 in the paper, "z" variable in the MATLAB code)
- Test data: Trios dataset (in C4DM datasets) + MIREX development dataset + RWC + MAPS
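Because of the random initialisation, a comparison of pitch activations needs a tolerance rather than exact equality. The sketch below shows one possible check. The choice of a relative L1 difference is an assumption (the notes do not say how the 1% figure is measured), and `relative_difference` and `activations_match` are hypothetical names.

```python
def relative_difference(reference, candidate):
    """Relative L1 difference between two activation matrices.

    Both arguments are lists of frames, each a list of activation values.
    Returns sum(|ref - cand|) / sum(|ref|).
    """
    if len(reference) != len(candidate):
        raise ValueError("frame counts differ")
    num = 0.0
    den = 0.0
    for ref_frame, cand_frame in zip(reference, candidate):
        if len(ref_frame) != len(cand_frame):
            raise ValueError("pitch counts differ")
        for r, c in zip(ref_frame, cand_frame):
            num += abs(r - c)
            den += abs(r)
    if den == 0.0:
        # An all-zero reference matches only an all-zero candidate.
        return 0.0 if num == 0.0 else float("inf")
    return num / den


def activations_match(reference, candidate, tolerance=0.01):
    """True if the candidate is within the given relative tolerance."""
    return relative_difference(reference, candidate) <= tolerance
```

The reference here could be the "z" output exported from the MATLAB code for the same audio, with the tolerance loosened if repeated MATLAB runs themselves differ by more than 1%.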
See also: Joachim's work and comparison of different CQT methods: http://www.eecs.qmul.ac.uk/~jga/eusipco2012.html