Version 1 - History - Implementation Notes - Silvet Note Transcription

h1. Method and implementation notes

Chris Cannam

h2. Publications

 * Main reference: "A Shift-Invariant Latent Variable Model for Automatic Music Transcription":http://www.mitpressjournals.org/doi/abs/10.1162/COMJ_a_00146
 * MIREX 2012: a simplified method, without the HMM block. "Multiple-F0 Estimation and Note Tracking for MIREX 2012 using a Shift-Invariant Latent Variable Model":http://www.music-ir.org/mirex/abstracts/2012/BD1.pdf
 * Updated in 2013. See "Automatic Music Transcription using efficient Multi-Source SI-PLCA (+GPU support)":/projects/amt_mssiplca_fast for a MATLAB implementation.

We are aiming to produce an "online" implementation (i.e. one that does not require all of the audio before processing starts) in Vamp plugin form, based on the MIREX 2012 method.

h2. About the method

The basic flow is audio -> "Constant-Q transform":/projects/constant-q-cpp -> "Probabilistic Latent Component Analysis":http://www.cs.illinois.edu/~paris/pubs/plca-report.pdf using "Expectation-Maximization":http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm -> some note-clustering algorithm.
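
As a rough illustration of the PLCA/EM step only, here is a minimal sketch in plain Python. It assumes a single constant-Q frame and a fixed set of per-pitch spectral templates, and it omits the shift-invariance, instrument sources and sparsity of the published model. The function name @plca_frame@ and its arguments are hypothetical, not taken from the MATLAB code.

```python
def plca_frame(spectrum, templates, iterations=50):
    """Estimate pitch activations for one constant-Q frame by EM.

    spectrum:  non-negative magnitudes, one per frequency bin
    templates: templates[p][f], one spectral template per pitch,
               each assumed to be normalised to sum to 1
    Returns one activation per pitch; the activations sum to 1
    unless the frame is silent.
    """
    n_pitches = len(templates)
    n_bins = len(spectrum)
    # Uniform start. The real method uses random initialisers; a fixed
    # start keeps this sketch deterministic.
    activations = [1.0 / n_pitches] * n_pitches
    for _ in range(iterations):
        # Current model of the frame: a mixture of the templates.
        model = [sum(activations[p] * templates[p][f] for p in range(n_pitches))
                 for f in range(n_bins)]
        updated = []
        for p in range(n_pitches):
            # E-step posterior and M-step accumulation folded together.
            acc = 0.0
            for f in range(n_bins):
                if model[f] > 0.0:
                    acc += spectrum[f] * activations[p] * templates[p][f] / model[f]
            updated.append(acc)
        total = sum(updated)
        if total == 0.0:
            break  # silent frame: leave the activations as they are
        activations = [u / total for u in updated]
    return activations
```

Because this version works on one frame at a time with no temporal state, it also shows why the input could be processed frame by frame, at the cost of restarting EM for every frame.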

The CMJ paper uses a hidden Markov model for note clustering, but the MIREX submission used a simple thresholding method. The thresholding method worked quite well, so we should probably use that.
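
A thresholding stage of this kind could look something like the sketch below. This is not the MIREX code: the function name, the @min_frames@ parameter and the threshold value are all assumptions made for illustration.

```python
def threshold_notes(activations, threshold, min_frames=1):
    """Turn a pitch-activation matrix into (pitch, onset, offset) events.

    activations: activations[t][p], strength of pitch p at frame t
    threshold:   level above which a pitch counts as sounding
    min_frames:  notes shorter than this many frames are discarded
    Onsets and offsets are frame indices; offsets are exclusive.
    """
    notes = []
    if not activations:
        return notes
    n_frames = len(activations)
    n_pitches = len(activations[0])
    for p in range(n_pitches):
        onset = None
        for t in range(n_frames):
            active = activations[t][p] > threshold
            if active and onset is None:
                onset = t  # note starts here
            elif not active and onset is not None:
                if t - onset >= min_frames:
                    notes.append((p, onset, t))
                onset = None
        # Close any note still sounding at the end of the input.
        if onset is not None and n_frames - onset >= min_frames:
            notes.append((p, onset, n_frames))
    return notes
```

For example, @threshold_notes(z, 0.5, min_frames=3)@ would report only notes whose activation stays above 0.5 for at least three consecutive frames.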

The published method has a convolution stage to handle fine pitch variations at 20-cent resolution. This makes the process many times slower while gaining maybe 2% in overall performance, so we should probably omit it to begin with. The diagrams in "Benetos 2013":http://openaccess.city.ac.uk/2155/ illustrate this stage.
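
To put the 20-cent figure in context: with a 100-cent equal-tempered semitone, 20 cents gives five positions per semitone, or 60 per octave. The sketch below shows only that arithmetic. The function names are hypothetical, and how many shifts the published method actually evaluates is not stated here.

```python
def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval in cents."""
    return 2.0 ** (cents / 1200.0)


def shift_offsets_cents(resolution_cents=20):
    """Pitch offsets, in cents, spanning one semitone around a nominal pitch.

    Assumes the resolution divides 100 cents into an odd number of
    positions, as 20 cents does (five positions).
    """
    positions = int(round(100 / resolution_cents))
    half = positions // 2
    return [k * resolution_cents for k in range(-half, half + 1)]


def shifted_frequencies(centre_hz, resolution_cents=20):
    """Frequencies of the shifted positions around a nominal pitch."""
    return [centre_hz * cents_to_ratio(c)
            for c in shift_offsets_cents(resolution_cents)]
```

Each extra position is another shifted copy of every template to fit, which is consistent with the convolution stage being much slower than the unshifted model.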

There is no accommodation for percussion; one might preprocess to remove broadband percussive events.

There is no accommodation for typical qualities of vocal performance (melisma, vibrato etc).

h2. Implementation notes

*Constant-Q transform*: the existing code uses Anssi's "MATLAB toolbox":/projects/constant-q-toolbox. This is substantially better than the Constant-Q in the "qm-dsp library":/projects/qm-dsp. A good first step would be to write a good new C++ Constant-Q implementation. See "this project":/projects/constant-q-cpp for that work.

*Other notes*:

 * We could potentially parameterise the sparsity level on z ("sz" variable in MATLAB) as a rough correspondence with the number of simultaneous notes
 * The MIREX method also eliminated any polyphony > 4 by dropping weaker notes
 * No particular temporal constraints -- the template has no "duration" -- meaning input can be broken up as required. It could even be processed completely frame-by-frame, at the cost of having to reinitialise EM at each frame
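
The first two points could be sketched as follows. Both functions are illustrative assumptions rather than ports of the MATLAB code: @sparsify@ uses the common PLCA device of raising a distribution to a power and renormalising, which may or may not match exactly how "sz" is applied, and @limit_polyphony@ is one simple way of dropping weaker notes per frame.

```python
def sparsify(activations, sz):
    """Sharpen one frame of pitch activations.

    Raises each activation to the power sz and renormalises so the
    frame sums to 1. With sz > 1 strong pitches gain weight at the
    expense of weak ones; sz == 1 only normalises.
    """
    powered = [a ** sz for a in activations]
    total = sum(powered)
    if total == 0.0:
        return list(activations)  # silent frame: nothing to sharpen
    return [a / total for a in powered]


def limit_polyphony(frame, max_notes=4):
    """Keep the max_notes strongest activations in a frame, zero the rest."""
    if max_notes <= 0:
        return [0.0] * len(frame)
    ranked = sorted(range(len(frame)), key=lambda p: frame[p], reverse=True)
    keep = set(ranked[:max_notes])
    return [a if p in keep else 0.0 for p, a in enumerate(frame)]
```

A larger @sz@ pushes each frame towards fewer active pitches, which is the rough correspondence with the number of simultaneous notes suggested above.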

h2. How to test

 * Constant-Q -- compare with Anssi's MATLAB
 * Random initialisers for EM mean the method doesn't always produce the same output, but it generally converges to within, say, 1%
 * Can compare pitch activations (equation 12 in the paper, "z" variable in the MATLAB code)
 * Test data: Trios dataset (in C4DM datasets) + MIREX development dataset + RWC + MAPS
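
Since the output is not bit-for-bit repeatable, a comparison of pitch activations needs a tolerance. The sketch below is one possible check; the choice of measure (summed absolute difference relative to the reference) and the function names are assumptions, and "within 1%" might reasonably be defined differently.

```python
def relative_difference(reference, candidate):
    """Summed absolute difference divided by the summed reference magnitude.

    Both arguments are activation matrices indexed [frame][pitch] and
    must have the same shape.
    """
    if len(reference) != len(candidate):
        raise ValueError("frame counts differ")
    num = 0.0
    den = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("pitch counts differ")
        for r, c in zip(ref_row, cand_row):
            num += abs(r - c)
            den += abs(r)
    if den == 0.0:
        return 0.0 if num == 0.0 else float("inf")
    return num / den


def activations_match(reference, candidate, tolerance=0.01):
    """True if the candidate is within the given relative tolerance."""
    return relative_difference(reference, candidate) <= tolerance
```

Here @reference@ would be the "z" matrix exported from the MATLAB code and @candidate@ the equivalent output of the new implementation for the same audio.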

See also: Joachim's work and comparison of different CQT methods: http://www.eecs.qmul.ac.uk/~jga/eusipco2012.html
