Implementation Notes » History » Version 3
Chris Cannam, 2014-02-13 12:59 PM
1 | 1 | Chris Cannam | h1. Method and implementation notes |
---|---|---|---|
2 | 1 | Chris Cannam | |
3 | 1 | Chris Cannam | h2. Publications |
4 | 1 | Chris Cannam | |
5 | 1 | Chris Cannam | * Main reference. "A Shift-Invariant Latent Variable Model for Automatic Music Transcription":http://www.mitpressjournals.org/doi/abs/10.1162/COMJ_a_00146 |
6 | 1 | Chris Cannam | * MIREX 2012: a simplified method, without the HMM block. "Multiple-F0 Estimation and Note Tracking for MIREX 2012 using a Shift-Invariant Latent Variable Model":http://www.music-ir.org/mirex/abstracts/2012/BD1.pdf |
7 | 2 | Chris Cannam | * Updated in 2013 as "Multiple-instrument polyphonic music transcription using a temporally constrained shift-invariant model":http://openaccess.city.ac.uk/2155/ |
8 | 2 | Chris Cannam | * For more recent work in MATLAB, see the project "Automatic Music Transcription using efficient Multi-Source SI-PLCA (+GPU support)":/projects/amt_mssiplca_fast. |
9 | 1 | Chris Cannam | |
10 | 1 | Chris Cannam | We are aiming to produce an "online" implementation (i.e. not requiring all audio before processing starts) in Vamp plugin form, based on the MIREX 2012 method. |
11 | 1 | Chris Cannam | |
12 | 1 | Chris Cannam | h2. About the method |
13 | 1 | Chris Cannam | |
14 | 3 | Chris Cannam | The basic flow is audio -> "Constant-Q transform":/projects/constant-q-cpp -> "Probabilistic Latent Component Analysis":http://www.cs.illinois.edu/~paris/pubs/plca-report.pdf using "Expectation-Maximization":http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm -> some note-clustering algorithm. |
15 | 1 | Chris Cannam | |
16 | 1 | Chris Cannam | The CMJ paper uses a hidden Markov model for note clustering, but the MIREX submission used a simple thresholding method. The thresholding method worked quite well, so we should probably use that. |
17 | 1 | Chris Cannam | |
18 | 1 | Chris Cannam | The published method has a convolution stage to handle fine pitch variations with a 20 cent resolution. This makes the process many times slower for a gain of perhaps 2% in overall performance, so we should probably omit it to begin with. The diagrams in "Benetos 2013":http://openaccess.city.ac.uk/2155/ illustrate this stage. |
19 | 1 | Chris Cannam | |
20 | 1 | Chris Cannam | There is no accommodation for percussion; one might preprocess to remove broadband percussive events. |
21 | 1 | Chris Cannam | |
22 | 1 | Chris Cannam | There is no accommodation for typical qualities of vocal performance (melisma, vibrato etc). |
23 | 1 | Chris Cannam | |
24 | 1 | Chris Cannam | h2. Implementation notes |
25 | 1 | Chris Cannam | |
26 | 1 | Chris Cannam | *Constant-Q transform*: the existing code uses Anssi's "MATLAB toolbox":/projects/constant-q-toolbox. This is substantially better than the Constant-Q in the "qm-dsp library":/projects/qm-dsp. A good first step would be to write a good new C++ Constant-Q implementation. See "this project":/projects/constant-q-cpp for that work. |
27 | 1 | Chris Cannam | |
28 | 1 | Chris Cannam | *Other notes*: |
29 | 1 | Chris Cannam | |
30 | 1 | Chris Cannam | * We could potentially expose the sparsity level on z ("sz" variable in MATLAB) as a parameter, as a rough proxy for the number of simultaneous notes |
31 | 1 | Chris Cannam | * The MIREX method also capped polyphony at 4 by dropping the weaker notes |
32 | 1 | Chris Cannam | * There are no particular temporal constraints (the template has no "duration"), so input can be broken up as required. It could even be processed completely frame-by-frame, at the cost of having to reinitialise EM at each frame |
33 | 1 | Chris Cannam | |
34 | 1 | Chris Cannam | h2. How to test |
35 | 1 | Chris Cannam | |
36 | 1 | Chris Cannam | * Constant-Q: compare output with that of Anssi's MATLAB toolbox |
37 | 1 | Chris Cannam | * Random initialisers for EM mean the method doesn't always produce the same output, but repeated runs generally agree to within about 1% |
38 | 1 | Chris Cannam | * Can compare pitch-activations (equation 12 in the paper, "z" variable in the MATLAB code) |
39 | 1 | Chris Cannam | * Test data: Trios dataset (in C4DM datasets) + MIREX development dataset + RWC + MAPS |
40 | 1 | Chris Cannam | |
41 | 1 | Chris Cannam | See also: Joachim's work and comparison of different CQT methods: http://www.eecs.qmul.ac.uk/~jga/eusipco2012.html |
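
The frame-by-frame processing, thresholding and polyphony cap described in the notes above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the MATLAB code or the published model: it uses plain PLCA with fixed, pre-normalised pitch templates and leaves out instrument sources, shift-invariance and sparsity. The names @estimate_activations@ and @pick_notes@, the toy templates and the threshold value are all hypothetical.

```python
import random


def estimate_activations(frame, templates, iterations=50, rng=None):
    """Run EM on one spectral frame with fixed templates (simplified PLCA).

    frame:     non-negative magnitudes, one per frequency bin
    templates: one list per pitch, each summing to 1 over frequency bins
    Returns one activation per pitch, scaled by the frame energy.
    """
    rng = rng or random.Random()
    n_pitches = len(templates)
    energy = sum(frame)
    if energy == 0:
        return [0.0] * n_pitches

    # Random initialisation (EM is reinitialised for every frame).
    h = [rng.random() + 1e-3 for _ in range(n_pitches)]
    total = sum(h)
    h = [x / total for x in h]

    for _ in range(iterations):
        new_h = [0.0] * n_pitches
        for f, v in enumerate(frame):
            if v == 0:
                continue
            # E-step: posterior over pitches for this frequency bin.
            joint = [h[p] * templates[p][f] for p in range(n_pitches)]
            norm = sum(joint)
            if norm == 0:
                continue
            # M-step accumulation: posteriors weighted by observed energy.
            for p in range(n_pitches):
                new_h[p] += v * joint[p] / norm
        total = sum(new_h)
        if total == 0:
            break
        h = [x / total for x in new_h]

    # Scale by energy so one threshold is comparable across frames.
    return [x * energy for x in h]


def pick_notes(activations, threshold, max_polyphony=4):
    """Keep pitches at or above threshold, dropping all but the strongest."""
    candidates = [(a, p) for p, a in enumerate(activations) if a >= threshold]
    candidates.sort(reverse=True)
    return sorted(p for _, p in candidates[:max_polyphony])


# Toy example: three non-overlapping pitch templates over six bins.
templates = [
    [0.7, 0.3, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.7, 0.3],
]
# Frame made of pitch 0 at strength 2 plus pitch 2 at strength 1.
frame = [1.4, 0.6, 0.0, 0.0, 0.7, 0.3]

activations = estimate_activations(frame, templates, rng=random.Random(0))
notes = pick_notes(activations, threshold=0.5)
print(notes)  # → [0, 2]
```

With non-overlapping templates, as here, the result does not depend on the random initialisation. With realistic overlapping templates, different seeds would be expected to give slightly different activations, which is why the test notes above suggest comparing within a tolerance rather than exactly.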