<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link rel="stylesheet" media="screen" type="text/css" href="/screen.css"/>
<link rel="icon" type="image/png" href="/images/waveform.png"/>
<link rel="shortcut icon" type="image/png" href="/images/waveform.png"/>
<title>QM Vamp Plugins: User Documentation</title>
<meta name="robots" content="index"/>
</head>
<body>
<h1 id="header"><span>Vamp Plugins</span></h1>

<h2>QM Vamp Plugins</h2>

<p>The QM Vamp Plugin set is a library of Vamp audio feature
extraction plugins developed at the <a
href="http://c4dm.eecs.qmul.ac.uk/">Centre for Digital Music</a> at
Queen Mary, University of London. These plugins are provided as a
single library file, made available in source and binary form for
Windows, OS X, and Linux via the <a
href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/">SoundSoftware
code site</a> (see <a
href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">download
page</a>).
</p>
<p>For more information about Vamp plugins, see <a href="http://www.vamp-plugins.org/">http://www.vamp-plugins.org/</a>.
</p>

<div class="toc2">1. <a href="#qm-onsetdetector">Note Onset Detector</a></div>
<div class="toc2">2. <a href="#qm-tempotracker">Tempo and Beat Tracker</a></div>
<div class="toc2">3. <a href="#qm-barbeattracker">Bar and Beat Tracker</a></div>
<div class="toc2">4. <a href="#qm-keydetector">Key Detector</a></div>
<div class="toc2">5. <a href="#qm-tonalchange">Tonal Change</a></div>
<div class="toc2">6. <a href="#qm-adaptivespectrogram">Adaptive Spectrogram</a></div>
<div class="toc2">7. <a href="#qm-transcription">Polyphonic Transcription</a></div>
<div class="toc2">8. <a href="#qm-segmenter">Segmenter</a></div>
<div class="toc2">9. <a href="#qm-similarity">Similarity</a></div>
<div class="toc2">10. <a href="#qm-dwt">Discrete Wavelet Transform</a></div>
<div class="toc2">11. <a href="#qm-constantq">Constant-Q Spectrogram</a></div>
<div class="toc2">12. <a href="#qm-chromagram">Chromagram</a></div>
<div class="toc2">13. <a href="#qm-mfcc">Mel-Frequency Cepstral Coefficients</a></div>

<a name="qm-onsetdetector"></a><h2>1. Note Onset Detector</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-onsetdetector</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Note Onset Detector analyses a single channel of audio and estimates
the onset times of notes within the music – that is, the times at
which notes and other audible events begin.
</p>
<p>It calculates an onset likelihood function for each spectral frame,
and picks peaks in a smoothed version of this function. The plugin is
non-causal, returning all results at the end of processing.
</p>
<h3>Parameters</h3>

<p><b>Onset Detection Function Type</b> – The method used to calculate the
onset likelihood function. The most versatile method is the default,
"Complex Domain" (see reference, Duxbury et al 2003). "Spectral
Difference" may be appropriate for percussive recordings, "Phase
Deviation" for non-percussive music, and "Broadband Energy Rise" (see
reference, Barry et al 2005) for identifying percussive onsets in
mixed music.
</p>
<p><b>Onset Detector Sensitivity</b> – Sensitivity level for peak detection
in the onset likelihood function. The higher the sensitivity, the
more onsets will (rightly or wrongly) be detected. The peak picker
does not have a simple threshold level; instead, this parameter
controls the required "steepness" of the slopes in the smoothed
detection function either side of a peak value, in order for that peak
to be accepted as an onset.
</p>
<p><b>Adaptive Whitening</b> – This option evens out the temporal and
frequency variation in the signal, which can yield improved
performance in onset detection, for example in audio with big
variations in dynamics.
</p>
<h3>Outputs</h3>

<p><b>Note Onsets</b> – The detected note onset times, returned as a single
feature with timestamp but no value for each detected note.
</p>
<p><b>Onset Detection Function</b> – The raw note onset likelihood function
that was calculated as the first step of the detection process.
</p>
<p><b>Smoothed Detection Function</b> – The note onset likelihood function
following median filtering. This is the function from which
sufficiently steep peak values are picked and classified as onsets.
</p>
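<p>As a usage illustration, the following minimal sketch runs this plugin
from Python and collects the Note Onsets output. It assumes that the
<code>vamp</code> host module and <code>librosa</code> are installed and
that the QM Vamp Plugins library is on the Vamp plugin path; the output
identifier is assumed, and this is a sketch of one possible host rather
than part of the plugin itself.
</p>
<pre>
# Minimal sketch: run the Note Onset Detector from a Python host.
# Assumes the "vamp" host module and "librosa" are installed, and that
# the QM Vamp Plugins library is installed on the Vamp plugin path.
import librosa
import vamp

audio, rate = librosa.load("music.wav", sr=None, mono=True)

# Output identifier assumed to be "onsets"; omit it to take the plugin's
# first output. Each returned feature has a timestamp but no values.
result = vamp.collect(audio, rate, "qm-vamp-plugins:qm-onsetdetector", "onsets")
for feature in result["list"][:10]:
    print(feature["timestamp"])
</pre>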
<h3>References and Credits</h3>

<p><b>Basic detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
Proceedings of the 6th Conference on Digital Audio Effects
(DAFx-03). London, UK. September 2003.
</p>
<p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
Proceedings of the International Computer Music Conference (ICMC'07),
August 2007.
</p>
<p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
</p>
<p>The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan
Pablo Bello and Christian Landone.
</p>

<a name="qm-tempotracker"></a><h2>2. Tempo and Beat Tracker</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-tempotracker</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Tempo and Beat Tracker analyses a single channel of audio and
estimates the positions of metrical beats within the music (the
equivalent of a human listener tapping their foot to the beat).
</p>
<h3>Parameters</h3>

<p><b>Beat Tracking Method</b> – The method used to track beats. The default, "New", uses a hybrid of the "Old" two-state beat tracking model
(see reference Davies 2007) and a dynamic programming method (see reference
Ellis 2007). A more detailed description is given below within the Bar and
Beat Tracker plugin. </p>

<p><b>Onset Detection Function Type</b> – The algorithm used to calculate the
onset likelihood function. The most versatile method is the default,
"Complex Domain" (see reference, Duxbury et al 2003). "Spectral
Difference" may be appropriate for percussive recordings, "Phase
Deviation" for non-percussive music, and "Broadband Energy Rise" (see
reference, Barry et al 2005) for identifying percussive onsets in
mixed music.
</p>
<p><b>Adaptive Whitening</b> – This option evens out the temporal and
frequency variation in the signal, which can yield improved
performance in onset detection, for example in audio with big
variations in dynamics.
</p>
<h3>Outputs</h3>

<p><b>Beats</b> – The estimated beat locations, returned as a single feature,
with timestamp but no value, for each beat, labelled with the
corresponding estimated tempo at that beat.
</p>
<p><b>Onset Detection Function</b> – The raw note onset likelihood function
used in beat estimation.
</p>
<p><b>Tempo</b> – The estimated tempo, returned as a feature each time the
estimated tempo changes, with a single value for the tempo in beats
per minute.
</p>
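<p>The Beats and Tempo outputs are related through the inter-beat interval:
60 divided by the gap in seconds between consecutive beats gives the local
tempo in beats per minute. The following small sketch, using made-up beat
timestamps such as a host might export to CSV, illustrates the conversion.
</p>
<pre>
# Sketch: turn a list of beat timestamps (in seconds) into inter-beat
# intervals and a local tempo estimate in beats per minute.
import numpy as np

beat_times = np.array([0.52, 1.01, 1.50, 2.00, 2.49, 2.98])  # illustrative values

ibis = np.diff(beat_times)        # inter-beat intervals in seconds
local_bpm = 60.0 / ibis           # instantaneous tempo at each beat transition
print("median tempo: %.1f BPM" % np.median(local_bpm))
</pre>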
<h3>References and Credits</h3>

<p><b>Beat tracking method</b>: M. E. P. Davies and M. D. Plumbley.
<i><a href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent beat tracking of musical audio</a></i>. In IEEE
Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3,
pp1009-1020, 2007;<br>M. E. P. Davies and M. D. Plumbley.
<i><a href="http://www.elec.qmul.ac.uk/people/markp/2005/DaviesPlumbley05-icassp.pdf">Beat Tracking With A Two State Model</a></i>. In Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2005), Vol. 3, pp241-244, Philadelphia, USA, March 19-23, 2005;
<br>D. P. W. Ellis. <i>Beat Tracking by Dynamic
Programming</i>. In Journal of New Music Research. Vol. 37, No. 1,
pp51-60, 2007.
</p>
<p><b>Onset detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
Proceedings of the 6th Conference on Digital Audio Effects
(DAFx-03). London, UK. September 2003.
</p>
<p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
Proceedings of the International Computer Music Conference (ICMC'07),
August 2007.
</p>
<p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
</p>
<p>The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies
and Christian Landone.
</p>

<a name="qm-barbeattracker"></a><h2>3. Bar and Beat Tracker</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-barbeattracker</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>Bar and Beat Tracker analyses a single channel of audio and
estimates the positions of bar lines and the resulting counted
metrical beat positions within the music (where the first beat of
each bar is "1", the equivalent of counting in time to the music).
It is closely related to the <a href="#qm-tempotracker">Tempo and
Beat Tracker</a>, producing the same results for beat position as
that plugin's "New" beat tracking method.
</p>

<h3>Method</h3>

<p>The plugin first calculates an onset detection function using the
"Complex Domain" method (see <a href="#qm-tempotracker">Tempo and Beat
Tracker</a>).</p>

<p>The beat tracking method performs two passes over the onset
detection function, first to estimate the tempo contour, and then
given the tempo, to recover the beat locations.</p>

<p>To identify the tempo, the onset detection function is partitioned
into 6-second frames with a 1.5-second increment. The autocorrelation
function of each 6-second onset detection function is found and this
is then passed through a perceptually weighted comb filterbank (see
reference Davies 2007). The successive comb filterbank output signals
are grouped together into a matrix of observations of periodicity
through time. The best path of periodicity through these observations
is found using the Viterbi algorithm, where the transition matrix is
defined as a diagonal Gaussian.</p>

<p>Given the estimates of periodicity, the beat locations are recovered
by applying the dynamic programming algorithm (see reference Ellis
2007). This process involves the calculation of a recursive cumulative
score function and backtrace signal. The cumulative score indicates
the likelihood of a beat existing at each sample of the onset
detection function input, and the backtrace gives the location of the
best previous beat given this point in time. Once the cumulative score
and backtrace have been calculated for the whole input signal, the
best path through beat locations is found by recursively sampling the
backtrace signal from the end of the input signal back to the
beginning. See reference Stark et al. 2009 for a description of the
real-time implementation of the beat tracking algorithm.</p>
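<p>The following is a much simplified, illustrative sketch of the
dynamic-programming beat recovery step described above (after Ellis 2007).
It assumes a single fixed beat period rather than the tempo contour tracked
by the comb filterbank stage, so it is an outline of the recursion only,
not the plugin's actual implementation.
</p>
<pre>
# Illustrative sketch of dynamic-programming beat recovery (after Ellis 2007).
# Not the plugin's code: it assumes one fixed beat period "period", given in
# detection-function samples, instead of the tracked tempo contour.
import numpy as np

def track_beats(odf, period, tightness=100.0):
    n = len(odf)
    score = np.array(odf, dtype=float)   # cumulative score
    backlink = -np.ones(n, dtype=int)    # best previous beat for each sample
    for t in range(n):
        lo, hi = t - int(2 * period), t - int(round(period / 2.0))
        prev = np.arange(max(lo, 0), max(hi, 0))
        if len(prev) == 0:
            continue
        # log-lag penalty favouring gaps close to one beat period
        penalty = -tightness * np.log((t - prev) / float(period)) ** 2
        candidates = score[prev] + penalty
        best = int(np.argmax(candidates))
        score[t] = odf[t] + candidates[best]
        backlink[t] = prev[best]
    # backtrace from the best-scoring sample to recover the beat sequence
    beats = [int(np.argmax(score))]
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return np.array(beats[::-1])
</pre>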

<p>Once the beat locations have been identified, the plugin makes a
second pass over the input audio signal, partitioning it into beat
synchronous frames. The audio within each beat frame is down-sampled
to give a new sampling frequency of 2.8kHz. A beat-synchronous
spectral representation is then calculated within each frame, from
which a measure of beat spectral difference is calculated using
Jensen-Shannon divergence. The bar boundaries are identified as those
beat transitions leading to most consistent spectral change given the
specified number of beats per bar.</p>

<h3>Parameters</h3>

<p><b>Beats per Bar</b> – The number of beats per bar (or measure). The
plugin assumes that the number of beats per bar is fixed throughout
the music.
</p>
<h3>Outputs</h3>

<p><b>Beats</b> – The estimated beat locations, returned as a single feature,
with timestamp but no value, for each beat, labelled with the
number of that beat within the bar (e.g. consecutively 1, 2, 3, 4 for 4 beats to the bar).
</p>
<p><b>Bars</b> – The estimated bar line locations, returned as a single feature,
with timestamp but no value, for each bar.
</p>
<p><b>Beat Count</b> – The estimated beat locations, returned as a single feature,
with timestamp and a value corresponding to the
number of that beat within the bar. This is similar to the Beats output except that it returns a counting function rather than a series of instants.
</p>
<p><b>Beat Spectral Difference</b> – The new-bar likelihood function used in bar line estimation.
</p>

<h3>References and Credits</h3>

<p><b>Beat tracking method</b>: A. M. Stark, M. E. P. Davies and
M. D. Plumbley. <i>Real-time beat-synchronous analysis of musical
audio</i>. In Proceedings of the 12th International Conference
on Digital Audio Effects (DAFx). 2009;<br>M. E. P. Davies and
M. D. Plumbley. <i><a
href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent
beat tracking of musical audio</a></i>. In IEEE Transactions on
Audio, Speech and Language Processing. Vol. 15, No. 3, pp1009-1020,
2007;<br>D. P. W. Ellis. <i>Beat Tracking by Dynamic
Programming</i>. In Journal of New Music Research. Vol. 37, No. 1,
pp51-60, 2007.</p>

<p><b>Bar finding method</b>: M. E. P. Davies and M. D. Plumbley. <i>A
spectral difference approach to extracting downbeats in musical
audio</i>. In Proceedings of 14th European Signal Processing Conference
(EUSIPCO), Italy, 2006.</p>

<p>The Bar and Beat Tracker Vamp plugin was written by Matthew Davies and Adam Stark.
</p>

<a name="qm-keydetector"></a><h2>4. Key Detector</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-keydetector</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Key Detector analyses a single channel of audio and continuously
estimates the key of the music by comparing the degree to which a
block-by-block chromagram correlates to the stored key profiles for
each major and minor key.
</p>
<p>The key profiles are drawn from analysis of Book I of the Well-Tempered
Clavier by J. S. Bach, recorded at A=440 equal temperament.
</p>
<h3>Parameters</h3>

<p><b>Tuning Frequency</b> – The frequency of concert A in the music under
analysis.
</p>
<p><b>Window Length</b> – The number of chroma analysis frames taken into
account for key estimation. This controls how eager the key detector
will be to return short-duration tonal changes as new key changes (the
shorter the window, the more likely it is to detect a new key change).
</p>
<h3>Outputs</h3>

<p><b>Tonic Pitch</b> – The tonic pitch of each estimated key change,
returned as a single-valued feature at the point where the key change
is detected, with value counted from 1 to 12 where C is 1, C# or Db is
2, and so on up to B which is 12.
</p>
<p><b>Key Mode</b> – The major or minor mode of the estimated key, where
major is 0 and minor is 1.
</p>
<p><b>Key</b> – The estimated key for each key change, returned as a
single-valued feature at the point where the key change is detected,
with value counted from 1 to 24 where 1-12 are the major keys and
13-24 are the minor keys, such that C major is 1, C# major is 2, and
so on up to B major which is 12; then C minor is 13, Db minor is 14,
and so on up to B minor which is 24.
</p>
<p><b>Key Strength Plot</b> – A grid representing the ongoing key
"probability" throughout the music. This is returned as a feature for
each chroma frame, containing 25 bins. Bins 1-12 are the major keys
from C upwards; bins 14-25 are the minor keys from C upwards. The
13th bin is unused: it just provides space between the first and
second halves of the feature if displayed in a single plot.
</p>
<p>The outputs are also labelled with pitch or key as text.
</p>
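<p>As a small illustration of the numbering described above, the following
sketch decodes the Key output value (1-24) into a readable key name. The
note-name spellings are illustrative; the plugin's own text labels may
differ for enharmonic notes.
</p>
<pre>
# Sketch: decode the Key output value (1-24) into a key name, following
# the numbering described above (1-12 major from C, 13-24 minor from C).
NOTE_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
              "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def key_name(value):
    value = int(value)
    if value in range(1, 13):
        return NOTE_NAMES[value - 1] + " major"
    if value in range(13, 25):
        return NOTE_NAMES[value - 13] + " minor"
    raise ValueError("key value out of range: %r" % value)

print(key_name(1), "|", key_name(14))   # C major | C#/Db minor
</pre>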
<h3>References and Credits</h3>

<p><b>Method</b>: see K. Noland and M. Sandler. <i><a href="http://www.aes.org/e-lib/browse.cfm?elib=14140">Signal Processing Parameters for Tonality Estimation</a></i>. In Proceedings of Audio Engineering Society
122nd Convention, Vienna, 2007.
</p>
<p>The Key Detector Vamp plugin was written by Katy Noland and Christian
Landone.
</p>

<a name="qm-tonalchange"></a><h2>5. Tonal Change</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-tonalchange</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Tonal Change analyses a single channel of audio, detecting harmonic
changes such as chord boundaries.
</p>
<h3>Parameters</h3>

<p><b>Gaussian smoothing</b> – The window length for the internal smoothing
operation, in chroma analysis frames. This controls how eager the
tonal change detector will be to identify very short-term tonal
changes. The default value of 5 is quite short, and may lead to more
(not always meaningful) results being returned; for many purposes a
larger value, closer to the maximum of 20, may be appropriate.
</p>
<p><b>Chromagram minimum pitch</b> – The MIDI pitch value (0-127) of the
minimum pitch included in the internal chromagram analysis.
</p>
<p><b>Chromagram maximum pitch</b> – The MIDI pitch value (0-127) of the
maximum pitch included in the internal chromagram analysis.
</p>
<p><b>Chromagram tuning frequency</b> – The frequency of concert A in the
music under analysis.
</p>
<h3>Outputs</h3>

<p><b>Transform to 6D Tonal Content Space</b> – A representation of the
musical content in a six-dimensional tonal space onto which the
algorithm maps 12-bin chroma vectors extracted from the audio.
</p>
<p><b>Tonal Change Detection Function</b> – A function representing the
estimated likelihood of a tonal change occurring in each spectral
frame.
</p>
<p><b>Tonal Change Positions</b> – The resulting estimated positions of tonal
changes.
</p>
<h3>References and Credits</h3>

<p><b>Method</b>: C. A. Harte, M. Gasser, and M. Sandler. <i><a href="http://portal.acm.org/citation.cfm?id=1178723.1178727">Detecting harmonic change in musical audio</a></i>. In Proceedings of the 1st ACM workshop on
Audio and Music Computing Multimedia, Santa Barbara, 2006.
</p>
<p>The Tonal Change Vamp plugin was written by Chris Harte and Martin
Gasser.
</p>

<a name="qm-adaptivespectrogram"></a><h2>6. Adaptive Spectrogram</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-adaptivespectrogram</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>Adaptive Spectrogram produces a composite spectrogram from a series
of short-time Fourier transforms at differing resolutions.
Values are selected from these spectrograms by repeated subdivision by
time and frequency in order to maximise an entropy function across
each column.</p>

<h3>Parameters</h3>

<p><b>Number of resolutions</b> – The number of distinct
resolutions to calculate and use. The resolutions will be consecutive
powers of two starting from the smallest resolution specified.</p>

<p><b>Smallest resolution</b> – The smallest of the set of
resolutions to use.</p>

<p><b>Omit alternate resolutions</b> – Causes the plugin to
ignore alternate resolutions (i.e. the smallest resolution multiplied
by 2, 8, 32, etc) when composing a spectrogram. The smallest
resolution specified, and its multiples by 4, 16, etc as applicable,
will be retained. The total number of resolutions actually included
in the resulting spectrogram will therefore be N/2 (for even N) or
(N+1)/2 (for odd N) where N is the value of the "number of
resolutions" parameter. This permits a wider range of resolutions to
be included with less processing, at obvious cost in quality.</p>

<p><b>Multi-threaded processing</b> – Enables multi-threading of
the spectrogram calculation. This usually results in somewhat faster
processing where multiple CPU cores are available.</p>

<p>As an example of the resolution parameters, if the "number of
resolutions" is set to 5, "smallest resolution" to 128, and "omit
alternate resolutions" is not used, the composite spectrogram will be
calculated using spectrograms from 128, 256, 512, 1024, and 2048 point
short-time Fourier transforms (with 50% overlap in each case). With
"omit alternate resolutions" set, the same parameters would result in
spectrograms from 128, 512, and 2048 point STFTs being used.</p>
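<p>The worked example above can be reproduced with a few lines of code;
the following sketch simply enumerates the STFT sizes implied by the three
resolution parameters.
</p>
<pre>
# Sketch: enumerate the STFT sizes implied by the resolution parameters,
# reproducing the worked example above.
def resolutions(count, smallest, omit_alternate=False):
    sizes = [smallest * (2 ** i) for i in range(count)]
    if omit_alternate:
        sizes = sizes[::2]   # keep the smallest, then x4, x16, ...
    return sizes

print(resolutions(5, 128))        # [128, 256, 512, 1024, 2048]
print(resolutions(5, 128, True))  # [128, 512, 2048]
</pre>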

<h3>References and Credits</h3>

<p><b>Method</b>: X. Wen and M. Sandler. <i><a href="http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=ISPECX000003000001000051000001">Composite spectrogram using multiple Fourier transforms</a></i>. IET Signal Processing, 3(1):51-63, 2009.
</p>

<p>The Adaptive Spectrogram Vamp plugin was written by Wen Xue and Chris Cannam.</p>

<a name="qm-transcription"></a><h2>7. Polyphonic Transcription</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-transcription</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>The Polyphonic Transcription plugin estimates a note transcription
using MIDI pitch values from its input audio, returning a feature for
each note (with timestamp and duration) whose value is the MIDI pitch
number. Velocity is not estimated.</p>

<p>Although the published method is described as real-time, the
implementation used in this plugin is non-causal; it buffers its input
and operates on it in a single unit, doing all the real
work after its entire input has been received, and is very memory
intensive. However, it is relatively fast (faster than real-time)
compared to other polyphonic transcription methods.</p>

<p>The plugin works best at a 44.1kHz input sample rate, and is tuned for
piano and guitar music.</p>
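<p>A minimal sketch of collecting this plugin's output from a Python host
follows, printing one (onset time, duration, MIDI pitch) row per note. It
assumes the <code>vamp</code> host module and <code>librosa</code> are
installed, and that the field names follow that module's conventions for
features carrying durations.
</p>
<pre>
# Sketch: collect the transcription as (onset, duration, MIDI pitch) rows.
# Assumes the "vamp" host module and "librosa" are installed; each returned
# feature is assumed to carry a timestamp, a duration and one value (pitch).
import librosa
import vamp

audio, rate = librosa.load("piano.wav", sr=44100, mono=True)
result = vamp.collect(audio, rate, "qm-vamp-plugins:qm-transcription")

for feature in result["list"]:
    print(feature["timestamp"], feature["duration"], int(feature["values"][0]))
</pre>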

<h3>References and Credits</h3>

<p><b>Method</b>: R. Zhou and J. D. Reiss. <i>A Real-Time Polyphonic Music Transcription System</i>. In Proceedings of the Fourth Music Information Retrieval Evaluation eXchange (MIREX), Philadelphia, USA, 2008;<br>R. Zhou and J. D. Reiss. <i>A Real-Time Frame-Based Multiple Pitch Estimation Method Using the Resonator Time Frequency Image</i>. Third Music Information Retrieval Evaluation eXchange (MIREX), Vienna, Austria, 2007.</p>

<p>The Polyphonic Transcription Vamp plugin was written by Ruohua Zhou.</p>

<a name="qm-segmenter"></a><h2>8. Segmenter</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-segmenter</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Segmenter divides a single channel of music up into structurally
consistent segments. It returns a numeric value (the segment type)
for each moment at which a new segment starts.
</p>
<p>For music with clearly tonally distinguishable sections such as verse,
chorus, etc., segments with the same type may be expected to be
similar to one another in some structural sense. For example,
repetitions of the chorus are likely to share a segment type.
</p>
<p>The plugin only attempts to identify similar segments; it does not
attempt to label them. For example, it makes no attempt to tell you
which segment is the chorus.
</p>
<p>Note that this plugin does a substantial amount of processing after
receiving all of the input audio data, before it produces any results.
</p>
<h3>Method</h3>

<p>The method relies upon structural/timbral similarity to obtain the
high-level song structure. This is based on the assumption that the
distributions of timbre features are similar over corresponding
structural elements of the music.
</p>
<p>The algorithm works by obtaining a frequency-domain representation of
the audio signal using a Constant-Q transform, a Chromagram or
Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the
particular feature is selectable as a parameter). The extracted
features are normalised in accordance with the MPEG-7 standard (NASE
descriptor), which means the spectrum is converted to decibel scale
and each spectral vector is normalised by the RMS energy envelope.
The value of this envelope is stored for each processing block of
audio. This is followed by the extraction of 20 principal components
per block using PCA, yielding a sequence of 21 dimensional feature
vectors where the last element in each vector corresponds to the
energy envelope.
</p>
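<p>The normalisation step just described can be summarised in a few lines.
The following is an illustrative sketch only, under the simplifying
assumptions noted in the comments; it is not the plugin's own code.
</p>
<pre>
# Illustrative sketch of the feature normalisation described above: convert
# each spectral frame to decibels, normalise it by its RMS energy, and keep
# the envelope value to append to the PCA-reduced spectral shape later.
# Simplified; not the plugin's own code.
import numpy as np

def nase_normalise(frames, floor_db=-80.0):
    """frames: array of shape (n_frames, n_bins) holding magnitude spectra."""
    power = np.maximum(frames, 1e-10) ** 2
    db = np.maximum(10.0 * np.log10(power), floor_db)
    envelope = np.sqrt(np.mean(db ** 2, axis=1, keepdims=True))   # RMS per frame
    normalised = db / np.maximum(envelope, 1e-10)
    # The plugin then takes 20 principal components of "normalised" and
    # appends "envelope" as a 21st dimension per frame.
    return normalised, envelope
</pre>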
<p>A 40-state Hidden Markov Model is then trained on the whole sequence
of features, with each state of the HMM corresponding to a specific
timbre type. This process partitions the timbre-space of a given track
into 40 possible types. The important assumption of the model is that
the distribution of these features remains consistent over a structural
segment. After training and decoding the HMM, the song is assigned a
sequence of timbre-features according to specific timbre-type
distributions for each possible structural segment.
</p>
<p>The segmentation itself is computed by clustering timbre-type
histograms. A series of histograms are created over a sliding window
which are grouped into M clusters by an adapted soft k-means
algorithm. Each of these clusters will correspond to a specific
segment-type of the analysed song. Reference histograms, iteratively
updated during clustering, describe the timbre distribution for each
segment. The segmentation arises from the final cluster assignments.
</p>
<h3>Parameters</h3>

<p><b>Number of segment-types</b> – The maximum number of clusters
(segment-types) to be returned. The default is 10. Unlike many
clustering algorithms, the constrained clustering used in this plugin
does not produce too many clusters or vary significantly even if this
is set too high. However, this parameter can be useful for limiting
the number of expected segment-types.
</p>
<p><b>Feature Type</b> – The type of spectral feature used for segmentation. The available features are:<ul><li>"Hybrid", the default, which uses a Constant-Q transform (see <a href="#qm-constantq">related
plugin</a>): this is generally effective for modern studio recordings;</li><li> "Chromatic", using a chromagram derived from the Constant-Q feature (see <a href="#qm-chromagram">related plugin</a>): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in
sound;</li><li>"Timbral", using Mel-Frequency
Cepstral Coefficients (see <a href="#qm-mfcc">related plugin</a>), which is more likely to
result in classification by instrumentation rather than musical
content.</li></ul>
</p>
<p><b>Minimum segment duration</b> – The approximate expected minimum
duration for a segment, from 1 to 15 seconds. Changing this parameter
may help the plugin to find musical sections rather than just
following changes in the sound of the music, and also avoid wasting a
segment-type cluster for timbrally distinct but too-short segments.
The default of 4 seconds usually produces good results.
</p>
<h3>Outputs</h3>

<p><b>Segmentation</b> – The estimated segment boundaries, returned as a
single feature with one value at each segment boundary, with the value
representing the segment type number for the segment starting at that
boundary.
</p>
<h3>References and Credits</h3>

<p><b>Method</b>: M. Levy and M. Sandler. <i><a href="http://ieeexplore.ieee.org/iel5/10376/4432632/04432648.pdf?arnumber=4432648">Structural segmentation of musical audio by constrained clustering</a></i>. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.
</p>
<p>Note that this plugin does not implement the beat-synchronous aspect
of the segmentation method described in the paper.
</p>
<p>The Segmenter Vamp plugin was written by Mark Levy. Thanks to George
Fazekas for providing much of this documentation.
</p>
<a name="qm-similarity"></a><h2>9. Similarity</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-similarity</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Similarity treats each channel of its audio input as a separate
"track", and estimates how similar the tracks are to one another using
a selectable similarity measure.
</p>
<p>The plugin also returns the intermediate data used as a basis of the
similarity measure; it can therefore be used on a single channel of
input (with the resulting intermediate data then being applied in some
other similarity or clustering algorithm, for example) if desired, as
well as with multiple inputs.
</p>
<p>Because of the way this plugin handles multiple inputs, by assuming
that each channel represents a separate piece of music, it may not be
appropriate for use directly in a general-purpose host (unless you
actually want to do something like compare two stereo channels for
timbral similarity, which is unlikely).
</p>
<h3>Parameters</h3>

<p><b>Feature Type</b> – The underlying audio feature used for the similarity
measure. The available features are:
<ul><li>"Timbre", in which the distance
between tracks is a symmetrised Kullback-Leibler divergence between
Gaussian-modelled MFCC means and variances across each track, for the
first 20 MFCCs including C0 (see <a href="#qm-mfcc">related plugin</a>);</li><li>"Chroma", which uses Kullback-Leibler divergence of
mean chroma histogram (see <a href="#qm-chromagram">related plugin</a>);</li><li>"Rhythm", using the cosine distance between
"beat spectrum" measures derived from a short sampled section of the
track;</li><li>and combined "Timbre and Rhythm" and "Chroma and Rhythm"
features.</li></ul>
</p>
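<p>For orientation, the following sketch shows a symmetrised
Kullback-Leibler divergence between two tracks' MFCC distributions, each
modelled as a Gaussian, under the simplifying assumption of diagonal
covariances. It illustrates the kind of distance used by the "Timbre"
feature type; the plugin's exact formulation and scaling may differ.
</p>
<pre>
# Sketch: symmetrised Kullback-Leibler divergence between two tracks' MFCC
# distributions, each modelled as a Gaussian with diagonal covariance.
# Illustrative of the "Timbre" distance; the plugin's scaling may differ.
import numpy as np

def symmetrised_kl(mfcc_a, mfcc_b):
    """mfcc_a, mfcc_b: arrays of shape (n_frames, n_coefficients)."""
    mu_a, var_a = mfcc_a.mean(axis=0), mfcc_a.var(axis=0) + 1e-10
    mu_b, var_b = mfcc_b.mean(axis=0), mfcc_b.var(axis=0) + 1e-10
    d = mu_a - mu_b
    kl_ab = 0.5 * np.sum(var_a / var_b + d * d / var_b - 1 - np.log(var_a / var_b))
    kl_ba = 0.5 * np.sum(var_b / var_a + d * d / var_a - 1 - np.log(var_b / var_a))
    return kl_ab + kl_ba
</pre>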
<h3>Outputs</h3>

<p><b>Distance Matrix</b> – A matrix of the distance measures between input
channels, returned as a series of vector features timestamped at
one-second intervals. The distance from channel i to channel j
appears as the j'th bin of the feature at time i.
</p>
<p><b>Distance from First Channel</b> – A single vector feature, timestamped
at time zero, containing the distances between the first input channel
and each of the input channels (including the first channel itself at
bin 0, which should have zero distance).
</p>
<p><b>Ordered Distances from First Channel</b> – A pair of vector features,
at times 0 and 1 second. The feature at time 0 contains the 1-based
indices of the input channels in the order of similarity to the first
input channel (so its first bin should always contain 1, as the first
channel is most similar to itself). The feature at time 1 contains,
in bin n, the distance between the first input channel and the channel
with index found at bin n of the feature at time 0.
</p>
<p><b>Feature Means</b> – A series of vector features containing the mean
values of each of the feature bins across the duration of each of the
input channels. This output returns one feature for each input
channel, timestamped at one-second intervals. The number of bins for
each feature depends on the feature type; it will be 20 for MFCC
features and 12 for chroma features. No features will be returned on
this output if the feature type is purely rhythmic.
</p>
<p><b>Feature Variances</b> – Just as Feature Means, but variances.
</p>
<p><b>Beat Spectra</b> – A series of vector features containing the rhythmic
autocorrelation profiles (beat spectra) for each of the input
channels. This output returns one 512-bin feature for each input
channel, timestamped at one-second intervals. No features will be
returned on this output if the feature type contains no rhythm
component.
</p>
<h3>References and Credits</h3>

<p><b>Timbral similarity</b>: M. Levy and M. Sandler. <i><a href="http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf">Lightweight measures for timbral similarity of musical audio</a></i>. In Proceedings of the 1st
ACM workshop on Audio and Music Computing Multimedia, Santa Barbara,
2006.
</p>
<p><b>Combined rhythmic and timbral similarity</b>: K. Jacobson. <i><a href="http://ismir2006.ismir.net/PAPERS/ISMIR0696_Paper.pdf">A Multifaceted Approach to Music Similarity</a></i>. In Proceedings of the
Seventh International Conference on Music Information Retrieval
(ISMIR), 2006.
</p>
<p>The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and
Chris Cannam.
</p>

<a name="qm-dwt"></a><h2>10. Discrete Wavelet Transform</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-dwt</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>The Discrete Wavelet Transform plugin performs the forward DWT on the
signal. The wavelet coefficients are derived from a fast segmented DWT
algorithm without block end effects. The DWT can be performed with a
selection of wavelet functions, up to the 16th scale.</p>

<p>The wavelet coefficients are returned as feature columns at a rate of
half the sample rate of the signal to be analysed. To simulate
multiresolution in the layer data table, the coefficient values at
higher scales are copied multiple times according to the number of the
scale. For example, at scale 2 each value will appear twice, at scale
3 it will appear four times, and at scale 4 eight times, in order to
simulate the lower resolution at higher scales.</p>
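<p>For orientation, the following sketch uses the PyWavelets package to
compute a forward DWT and expand each scale's coefficients by repetition,
approximating the layout described above. It assumes PyWavelets and NumPy
are available; the plugin itself uses a segmented DWT without block end
effects, so this is an approximation of the output layout only.
</p>
<pre>
# Illustrative sketch: forward DWT with PyWavelets, expanding each scale's
# coefficients by repetition so that every scale row has the same length,
# as described above. Approximates the layout only; the plugin uses a
# segmented DWT without block end effects.
import numpy as np
import pywt

signal = np.random.randn(1024)
levels = 4
coeffs = pywt.wavedec(signal, "db4", level=levels)   # [cA_n, cD_n, ..., cD_1]

rows = []
for scale, detail in enumerate(reversed(coeffs[1:]), start=1):
    # scale 1 kept as-is, scale 2 repeated twice, scale 3 four times, ...
    rows.append(np.repeat(detail, 2 ** (scale - 1))[: len(signal) // 2])
grid = np.vstack(rows)   # one row per scale, each len(signal) // 2 long
print(grid.shape)        # (4, 512)
</pre>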

<h3>Parameters</h3>

<p><b>Scales</b> – Adjusts the number of scales of the DWT. The
processing block size needs to be set to at least 2<sup>n</sup>, where n =
number of scales.</p>

<p><b>Wavelet</b> – Selects the wavelet function to be used for
the transform. Wavelets from the following families are available:
Daubechies, Symlets, Coiflets, Biorthogonal, Meyer.</p>

<h3>References and Credits</h3>

<p><b>Principles</b>: S. Mallat. <i>A theory for multiresolution signal decomposition: the wavelet representation</i>. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989), pp. 674-693;<br>
P. Rajmic and J. Vlach. <i>Real-Time Audio Processing via Segmented Wavelet Transform</i>. In Proceedings of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.</p>

<p>The Discrete Wavelet Transform plugin was written by Thomas Wilmering.</p>

<a name="qm-constantq"></a><h2>11. Constant-Q Spectrogram</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-constantq</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Constant-Q Spectrogram calculates a spectrogram based on a short-time
windowed constant Q spectral transform. This is a spectrogram in
which the ratio of centre frequency to resolution is constant for each
frequency bin. The frequency bins correspond to the frequencies of
"musical notes" rather than being linearly spaced in frequency as they
are for the conventional DFT spectrogram.
</p>
<p>The pitch range and the number of frequency bins per octave may be
adjusted using the plugin's parameters. Note that the plugin's
preferred step and block sizes are defined by these parameters, and
the plugin will not accept any other block size than its preferred
value.
</p>
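<p>The standard constant-Q relations give a feel for what these parameters
imply. The following sketch computes the bin centre frequencies and the Q
factor from a MIDI pitch range, a bins-per-octave setting and a tuning
frequency; it is provided for orientation and is not taken from the plugin
source.
</p>
<pre>
# Sketch: bin centre frequencies and Q factor implied by the parameters.
# Standard constant-Q relations, given for orientation; not taken from the
# plugin source.
import numpy as np

def cq_bin_frequencies(min_pitch, max_pitch, bins_per_octave, tuning_a=440.0):
    f_min = tuning_a * 2.0 ** ((min_pitch - 69) / 12.0)   # MIDI pitch to Hz
    f_max = tuning_a * 2.0 ** ((max_pitch - 69) / 12.0)
    n_bins = int(np.ceil(bins_per_octave * np.log2(f_max / f_min))) + 1
    freqs = f_min * 2.0 ** (np.arange(n_bins) / float(bins_per_octave))
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)      # centre freq / bandwidth
    return freqs, q

freqs, q = cq_bin_frequencies(36, 84, 12)   # C2 to C6, one bin per semitone
print(len(freqs), round(q, 1))              # 49 16.8
</pre>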
<h3>Parameters</h3>

<p><b>Minimum Pitch</b> – The MIDI pitch value (0-127) corresponding to the lowest
frequency to be included in the constant-Q transform.
</p>
<p><b>Maximum Pitch</b> – The MIDI pitch value (0-127) corresponding to the
highest frequency to be included in the constant-Q transform.
</p>
<p><b>Tuning Frequency</b> – The frequency of concert A in the
music under analysis.
</p>
<p><b>Bins per Octave</b> – The number of constant-Q transform bins to be
computed per octave.
</p>
<p><b>Normalized</b> – Whether to normalize each output column to unit
maximum.
</p>
<h3>Outputs</h3>

<p><b>Constant-Q Spectrogram</b> – The calculated spectrogram, as a single
feature per process block containing one bin for each pitch included
in the spectrogram's range.
</p>
<h3>References and Credits</h3>

<p><b>Principle</b>: J. Brown. <i><a href="http://www.wellesley.edu/Physics/brown/pubs/cq1stPaper.pdf">Calculation of a constant Q spectral transform</a></i>. Journal of the Acoustical Society of America, 89(1):
425-434, 1991.
</p>
<p>The Constant-Q Spectrogram Vamp plugin was written by Christian
Landone.
</p>
<a name="qm-chromagram"></a><h2>12. Chromagram</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-chromagram</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Chromagram calculates a constant Q spectral transform (as in the
Constant Q Spectrogram plugin) and then wraps the frequency bin values
into a single octave, with each bin containing the sum of the
magnitudes from the corresponding bin in all octaves. The number of
values in each feature vector returned by the plugin is therefore the
same as the number of bins per octave configured for the underlying
constant Q transform.
</p>
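<p>The octave-wrapping step just described amounts to summing constant-Q
magnitudes modulo the number of bins per octave, as in the following small
sketch (illustrative only).
</p>
<pre>
# Sketch of the octave-wrapping step: sum constant-Q bin magnitudes into a
# single octave of "bins per octave" chroma bins. Illustrative only.
import numpy as np

def wrap_to_chroma(cq_column, bins_per_octave):
    """cq_column: one column of constant-Q magnitudes, lowest bin first."""
    chroma = np.zeros(bins_per_octave)
    for i, magnitude in enumerate(np.abs(cq_column)):
        chroma[i % bins_per_octave] += magnitude
    return chroma
</pre>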
<p>The pitch range and the number of frequency bins per octave for the
transform may be adjusted using the plugin's parameters. Note that
the plugin's preferred step and block sizes depend on these
parameters, and the plugin will not accept any other block size than
its preferred value.
</p>
<h3>Parameters</h3>

<p><b>Minimum Pitch</b> – The MIDI pitch value (0-127) corresponding to the
lowest frequency to be included in the constant-Q transform used in
calculating the chromagram.
</p>
<p><b>Maximum Pitch</b> – The MIDI pitch value (0-127) corresponding to the
highest frequency to be included in the constant-Q transform used in
calculating the chromagram.
</p>
<p><b>Tuning Frequency</b> – The frequency of concert A in the
music under analysis.
</p>
<p><b>Bins per Octave</b> – The number of constant-Q transform bins to be
computed per octave, and thus the total number of bins present in the
resulting chromagram.
</p>
<p><b>Normalized</b> – Whether to normalize each output column. Normalization
may be to unit sum or unit maximum.
</p>
<h3>Outputs</h3>

<p><b>Chromagram</b> – The calculated chromagram, as a single feature per
process block containing the number of bins given in the bins per
octave parameter.
</p>
<h3>References and Credits</h3>

<p>The Chromagram Vamp plugin was written by Christian Landone.
</p>
<a name="qm-mfcc"></a><h2>13. Mel-Frequency Cepstral Coefficients</h2>

<p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-mfcc</code>
<br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc</a>
<br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Mel-Frequency Cepstral Coefficients calculates MFCCs from a single
channel of audio. These coefficients, derived from a cosine transform
of the mapping of an audio spectrum onto a frequency scale modelled on
human auditory response, are widely used in speech recognition, music
classification and other tasks.
</p>
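<p>The processing chain just described (mel-spaced filterbank, logarithm of
the band energies, then a discrete cosine transform) can be sketched as
follows, using librosa and SciPy for brevity. The plugin's own filterbank
and scaling differ in detail, so this illustrates the technique rather
than reproducing the plugin's output.
</p>
<pre>
# Illustrative sketch of the MFCC chain described above: mel filterbank,
# log of the band energies, then a discrete cosine transform. The plugin's
# own filterbank and scaling differ in detail.
import librosa
import numpy as np
import scipy.fftpack

audio, rate = librosa.load("music.wav", sr=None, mono=True)
mel_spectrum = librosa.feature.melspectrogram(y=audio, sr=rate, n_mels=40)
log_mel = np.log(np.maximum(mel_spectrum, 1e-10))
mfccs = scipy.fftpack.dct(log_mel, axis=0, norm="ortho")[:20]  # 20 coefficients, incl. C0
</pre>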
<h3>Parameters</h3>

<p><b>Number of Coefficients</b> – The number of MFCCs to return. Commonly
used values include 13 or the default 20. This number includes C0 if
requested (see Include C0 below).
</p>
<p><b>Power for Mel Amplitude Logs</b> – An optional power value to which the
spectral amplitudes should be raised before applying the cosine
transform. Values greater than 1 may in principle reduce the
contribution of noise to the results. The default is 1.
</p>
<p><b>Include C0</b> – Whether to include the "zero'th" coefficient, which
simply reflects the overall signal power across the Mel frequency
bands.
</p>
<h3>Outputs</h3>

<p><b>Coefficients</b> – The MFCC values, returned as one vector feature per
processing block.
</p>
<p><b>Means of Coefficients</b> – The overall means of the MFCC bins, as a
single vector feature with time 0 that is returned when processing is
complete.
</p>
<h3>References and Credits</h3>

<p><b>MFCCs in music</b>: See B. Logan. <i><a href="http://ismir2000.ismir.net/papers/logan_paper.pdf">Mel-Frequency Cepstral Coefficients for Music Modeling</a></i>. In Proceedings of the First International
Symposium on Music Information Retrieval (ISMIR), 2000.
</p>
<p>The Mel-Frequency Cepstral Coefficients Vamp plugin was written by
Nicolas Chetry and Chris Cannam.
</p>
</body>
</html>