comparison plugin-doc/qm-vamp-plugins.html @ 16:16f8de0dc974 website
* Add doc for QM plugins
author | cannam |
---|---|
date | Fri, 21 Nov 2008 11:41:45 +0000 |
parents | |
children | 90a1fa18d239 |
15:c57ba57f33fa | 16:16f8de0dc974 |
---|---|
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> | |
2 <html> | |
3 <head> | |
4 <link rel="stylesheet" media="screen" type="text/css" href="/screen.css"/> | |
5 <link rel="icon" type="image/png" href="/images/waveform.png"/> | |
6 <link rel="shortcut" type="image/png" href="/images/waveform.png"/> | |
7 <title>QM Vamp Plugins: User Documentation</title> | |
8 <meta name="robots" content="index"/> | |
9 </head> | |
10 <body> | |
11 <h1 id="header"><span>Vamp Plugins</span></h1> | |
12 | |
13 <h2>QM Vamp Plugins</h2> | |
14 | |
15 <p>The QM Vamp Plugin set is a library of Vamp audio feature | |
16 extraction plugins developed at the <a | |
17 href="http://www.elec.qmul.ac.uk/digitalmusic/">Centre for Digital | |
18 Music</a> at Queen Mary, University of London. These plugins are | |
19 provided as a single library file, made available in binary form for | |
20 Windows, OS X, and Linux from the Centre for Digital Music's <a | |
21 href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">download | |
22 page</a>. | |
23 </p> | |
24 <p>For more information about Vamp plugins, see <a href="http://www.vamp-plugins.org/">http://www.vamp-plugins.org/</a>. | |
25 </p> | |
26 | |
27 <div class="toc2">1. <a href="#qm-onsetdetector">Note Onset Detector</a></div> | |
28 <div class="toc2">2. <a href="#qm-tempotracker">Tempo and Beat Tracker</a></div> | |
29 <div class="toc2">3. <a href="#qm-keydetector">Key Detector</a></div> | |
30 <div class="toc2">4. <a href="#qm-tonalchange">Tonal Change</a></div> | |
31 <div class="toc2">5. <a href="#qm-segmenter">Segmenter</a></div> | |
32 <div class="toc2">6. <a href="#qm-similarity">Similarity</a></div> | |
33 <div class="toc2">7. <a href="#qm-constantq">Constant-Q Spectrogram</a></div> | |
34 <div class="toc2">8. <a href="#qm-chromagram">Chromagram</a></div> | |
35 <div class="toc2">9. <a href="#qm-mfcc">Mel-Frequency Cepstral Coefficients</a></div> | |
36 | |
37 <a name="qm-onsetdetector"></a><a name="qm-"></a><h2>1. Note Onset Detector</h2> | |
38 | |
39 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-onsetdetector</code> | |
40 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector</a> | |
41 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
42 </p> | |
43 <p>Note Onset Detector analyses a single channel of audio and estimates | |
44 the onset times of notes within the music – that is, the times at | |
45 which notes and other audible events begin. | |
46 </p> | |
47 <p>It calculates an onset likelihood function for each spectral frame, | |
48 and picks peaks in a smoothed version of this function. The plugin is | |
49 non-causal, returning all results at the end of processing. | |
50 </p> | |
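<p>The slope-based peak picking described above can be sketched as follows. This is only an illustration, not the plugin's actual peak picker: the function name <code>pick_peaks</code>, the <code>min_slope</code> parameter and the sample values are hypothetical, and the smoothed detection function is taken as given. In this sketch a higher sensitivity would correspond to a lower <code>min_slope</code>.</p>

```python
def pick_peaks(smoothed, min_slope=0.1):
    """Return indices of samples that exceed both neighbours by min_slope.

    `smoothed` stands in for the smoothed onset likelihood function,
    one value per spectral frame. `min_slope` is a hypothetical stand-in
    for the plugin's sensitivity control.
    """
    peaks = []
    for i in range(1, len(smoothed) - 1):
        rise = smoothed[i] - smoothed[i - 1]   # slope before the candidate
        fall = smoothed[i] - smoothed[i + 1]   # slope after the candidate
        if rise >= min_slope and fall >= min_slope:
            peaks.append(i)
    return peaks


# Hypothetical smoothed detection function values.
smoothed = [0.0, 0.2, 1.0, 0.3, 0.0, 0.1, 0.2, 0.15, 0.0]
strict = pick_peaks(smoothed, min_slope=0.5)    # only steep peaks
lenient = pick_peaks(smoothed, min_slope=0.04)  # shallow peaks too
```

<p>Lowering <code>min_slope</code> accepts shallower peaks, so more onsets are reported, rightly or wrongly.</p>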
51 <h3>Parameters</h3> | |
52 | |
53 <p><b>Onset Detection Function Type</b> – The method used to calculate the | |
54 onset likelihood function. The most versatile method is the default, | |
55 "Complex Domain" (see reference, Duxbury et al 2003). "Spectral | |
56 Difference" may be appropriate for percussive recordings, "Phase | |
57 Deviation" for non-percussive music, and "Broadband Energy Rise" (see | |
58 reference, Barry et al 2005) for identifying percussive onsets in | |
59 mixed music. | |
60 </p> | |
61 <p><b>Onset Detector Sensitivity</b> – Sensitivity level for peak detection | |
62 in the onset likelihood function. The higher the sensitivity, the | |
63 more onsets will (rightly or wrongly) be detected. The peak picker | |
64 does not have a simple threshold level; instead, this parameter | |
65 controls the required "steepness" of the slopes in the smoothed | |
66 detection function either side of a peak value, in order for that peak | |
67 to be accepted as an onset. | |
68 </p> | |
69 <p><b>Adaptive Whitening</b> – This option evens out the temporal and | |
70 frequency variation in the signal, which can yield improved | |
71 performance in onset detection, for example in audio with big | |
72 variations in dynamics. | |
73 </p> | |
74 <h3>Outputs</h3> | |
75 | |
76 <p><b>Note Onsets</b> – The detected note onset times, returned as a single | |
77 feature with timestamp but no value for each detected note. | |
78 </p> | |
79 <p><b>Onset Detection Function</b> – The raw note onset likelihood function | |
80 that was calculated as the first step of the detection process. | |
81 </p> | |
82 <p><b>Smoothed Detection Function</b> – The note onset likelihood function | |
83 following median filtering. This is the function from which | |
84 sufficiently steep peak values are picked and classified as onsets. | |
85 </p> | |
86 <h3>References and Credits</h3> | |
87 | |
88 <p><b>Basic detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and | |
89 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In | |
90 Proceedings of the 6th Conference on Digital Audio Effects | |
91 (DAFx-03). London, UK. September 2003. | |
92 </p> | |
93 <p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In | |
94 Proceedings of the International Computer Music Conference (ICMC'07), | |
95 August 2007. | |
96 </p> | |
97 <p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and | |
98 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005. | |
99 </p> | |
100 <p>The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan | |
101 Pablo Bello and Christian Landone. | |
102 </p> | |
103 <a name="qm-tempotracker"></a><h2>2. Tempo and Beat Tracker</h2> | |
104 | |
105 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-tempotracker</code> | |
106 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker</a> | |
107 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
108 </p> | |
109 <p>Tempo and Beat Tracker analyses a single channel of audio and | |
110 estimates the positions of metrical beats within the music (the | |
111 equivalent of a human listener tapping their foot to the beat). | |
112 </p> | |
113 <h3>Parameters</h3> | |
114 | |
115 <p><b>Onset Detection Function Type</b> – The method used to calculate the | |
116 onset likelihood function. The most versatile method is the default, | |
117 "Complex Domain" (see reference, Duxbury et al 2003). "Spectral | |
118 Difference" may be appropriate for percussive recordings, "Phase | |
119 Deviation" for non-percussive music, and "Broadband Energy Rise" (see | |
120 reference, Barry et al 2005) for identifying percussive onsets in | |
121 mixed music. | |
122 </p> | |
123 <p><b>Adaptive Whitening</b> – This option evens out the temporal and | |
124 frequency variation in the signal, which can yield improved | |
125 performance in onset detection, for example in audio with big | |
126 variations in dynamics. | |
127 </p> | |
128 <h3>Outputs</h3> | |
129 | |
130 <p><b>Beats</b> – The estimated beat locations, returned as a single feature, | |
131 with timestamp but no value, for each beat, labelled with the | |
132 corresponding estimated tempo at that beat. | |
133 </p> | |
134 <p><b>Onset Detection Function</b> – The raw note onset likelihood function | |
135 used in beat estimation. | |
136 </p> | |
137 <p><b>Tempo</b> – The estimated tempo, returned as a feature each time the | |
138 estimated tempo changes, with a single value for the tempo in beats | |
139 per minute. | |
140 </p> | |
141 <h3>References and Credits</h3> | |
142 | |
143 <p><b>Beat tracking method</b>: M. E. P. Davies and M. D. Plumbley. | |
144 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent beat tracking of musical audio</a></i>. In IEEE | |
145 Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, | |
146 pp1009-1020, 2007. See also M. E. P. Davies and M. D. Plumbley. | |
147 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2005/DaviesPlumbley05-icassp.pdf">Beat Tracking With A Two State Model</a></i>. In Proceedings of the IEEE | |
148 International Conference on Acoustics, Speech and Signal Processing | |
149 (ICASSP 2005), Vol. 3, pp241-244, Philadelphia, USA, March 19-23, 2005. | |
150 </p> | |
151 <p><b>Onset detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and | |
152 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In | |
153 Proceedings of the 6th Conference on Digital Audio Effects | |
154 (DAFx-03). London, UK. September 2003. | |
155 </p> | |
156 <p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In | |
157 Proceedings of the International Computer Music Conference (ICMC'07), | |
158 August 2007. | |
159 </p> | |
160 <p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and | |
161 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005. | |
162 </p> | |
163 <p>The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies | |
164 and Christian Landone. | |
165 </p> | |
166 <a name="qm-keydetector"></a><h2>3. Key Detector</h2> | |
167 | |
168 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-keydetector</code> | |
169 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector</a> | |
170 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
171 </p> | |
172 <p>Key Detector analyses a single channel of audio and continuously | |
173 estimates the key of the music by comparing the degree to which a | |
174 block-by-block chromagram correlates to the stored key profiles for | |
175 each major and minor key. | |
176 </p> | |
177 <p>The key profiles are drawn from analysis of Book I of the | |
178 Well-Tempered Clavier by J. S. Bach, recorded at A=440 equal temperament. | |
179 </p> | |
180 <h3>Parameters</h3> | |
181 | |
182 <p><b>Tuning Frequency</b> – The frequency of concert A in the music under | |
183 analysis. | |
184 </p> | |
185 <p><b>Window Length</b> – The number of chroma analysis frames taken into | |
186 account for key estimation. This controls how eager the key detector | |
187 will be to return short-duration tonal changes as new key changes (the | |
188 shorter the window, the more likely it is to detect a new key change). | |
189 </p> | |
190 <h3>Outputs</h3> | |
191 | |
192 <p><b>Tonic Pitch</b> – The tonic pitch of each estimated key change, | |
193 returned as a single-valued feature at the point where the key change | |
194 is detected, with value counted from 1 to 12 where C is 1, C# or Db is | |
195 2, and so on up to B which is 12. | |
196 </p> | |
197 <p><b>Key Mode</b> – The major or minor mode of the estimated key, where | |
198 major is 0 and minor is 1. | |
199 </p> | |
200 <p><b>Key</b> – The estimated key for each key change, returned as a | |
201 single-valued feature at the point where the key change is detected, | |
202 with value counted from 1 to 24 where 1-12 are the major keys and | |
203 13-24 are the minor keys, such that C major is 1, C# major is 2, and | |
204 so on up to B major which is 12; then C minor is 13, Db minor is 14, | |
205 and so on up to B minor which is 24. | |
206 </p> | |
207 <p><b>Key Strength Plot</b> – A grid representing the ongoing key | |
208 "probability" throughout the music. This is returned as a feature for | |
209 each chroma frame, containing 25 bins. Bins 1-12 are the major keys | |
210 from C upwards; bins 14-25 are the minor keys from C upwards. The | |
211 13th bin is unused: it just provides space between the first and | |
212 second halves of the feature if displayed in a single plot. | |
213 </p> | |
214 <p>The outputs are also labelled with pitch or key as text. | |
215 </p> | |
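<p>The numeric conventions of the Key and Key Strength Plot outputs can be decoded with a small helper such as the one below. This is a sketch for host-side code, not part of the plugin: the function names are hypothetical, and the spellings chosen for the black-note pitches are an assumption (the text above only fixes C as 1, C# or Db as 2, and B as 12).</p>

```python
# Pitch names for tonic values 1..12; enharmonic spellings are an assumption.
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def decode_key(value):
    """Map a Key output value (1-24) to a (tonic name, mode) pair."""
    if not 1 <= value <= 24:
        raise ValueError("key value must be in the range 1..24")
    index = int(value) - 1
    mode = "major" if index < 12 else "minor"
    return PITCH_NAMES[index % 12], mode


def strength_bin_to_key(bin_number):
    """Map a 1-based Key Strength Plot bin (1-25) to a Key value.

    Bin 13 is the unused spacer, for which None is returned.
    """
    if not 1 <= bin_number <= 25:
        raise ValueError("bin number must be in the range 1..25")
    if bin_number == 13:
        return None
    return bin_number if bin_number < 13 else bin_number - 1
```

<p>For example, <code>decode_key(13)</code> gives C minor, and bin 14 of the strength plot corresponds to that same key.</p>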
216 <h3>References and Credits</h3> | |
217 | |
218 <p><b>Method</b>: see K. Noland and M. Sandler. <i><a href="http://www.aes.org/e-lib/browse.cfm?elib=14140">Signal Processing Parameters for Tonality Estimation</a></i>. In Proceedings of Audio Engineering Society | |
219 122nd Convention, Vienna, 2007. | |
220 </p> | |
221 <p>The Key Detector Vamp plugin was written by Katy Noland and Christian | |
222 Landone. | |
223 </p> | |
224 <a name="qm-tonalchange"></a><h2>4. Tonal Change</h2> | |
225 | |
226 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-tonalchange</code> | |
227 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange</a> | |
228 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
229 </p> | |
230 <p>Tonal Change analyses a single channel of audio, detecting harmonic | |
231 changes such as chord boundaries. | |
232 </p> | |
233 <h3>Parameters</h3> | |
234 | |
235 <p><b>Gaussian smoothing</b> – The window length for the internal smoothing | |
236 operation, in chroma analysis frames. This controls how eager the | |
237 tonal change detector will be to identify very short-term tonal | |
238 changes. The default value of 5 is quite short, and may lead to more | |
239 (not always meaningful) results being returned; for many purposes a | |
240 larger value, closer to the maximum of 20, may be appropriate. | |
241 </p> | |
242 <p><b>Chromagram minimum pitch</b> – The MIDI pitch value (0-127) of the | |
243 minimum pitch included in the internal chromagram analysis. | |
244 </p> | |
245 <p><b>Chromagram maximum pitch</b> – The MIDI pitch value (0-127) of the | |
246 maximum pitch included in the internal chromagram analysis. | |
247 </p> | |
248 <p><b>Chromagram tuning frequency</b> – The frequency of concert A in the | |
249 music under analysis. | |
250 </p> | |
251 <h3>Outputs</h3> | |
252 | |
253 <p><b>Transform to 6D Tonal Content Space</b> – A representation of the | |
254 musical content in a six-dimensional tonal space onto which the | |
255 algorithm maps 12-bin chroma vectors extracted from the audio. | |
256 </p> | |
257 <p><b>Tonal Change Detection Function</b> – A function representing the | |
258 estimated likelihood of a tonal change occurring in each spectral | |
259 frame. | |
260 </p> | |
261 <p><b>Tonal Change Positions</b> – The resulting estimated positions of tonal | |
262 changes. | |
263 </p> | |
264 <h3>References and Credits</h3> | |
265 | |
266 <p><b>Method</b>: C. A. Harte, M. Gasser, and M. Sandler. <i><a href="http://portal.acm.org/citation.cfm?id=1178723.1178727">Detecting harmonic change in musical audio</a></i>. In Proceedings of the 1st ACM workshop on | |
267 Audio and Music Computing Multimedia, Santa Barbara, 2006. | |
268 </p> | |
269 <p>The Tonal Change Vamp plugin was written by Chris Harte and Martin | |
270 Gasser. | |
271 </p> | |
272 <a name="qm-segmenter"></a><h2>5. Segmenter</h2> | |
273 | |
274 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-segmenter</code> | |
275 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter</a> | |
276 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
277 </p> | |
278 <p>Segmenter divides a single channel of music up into structurally | |
279 consistent segments. It returns a numeric value (the segment type) | |
280 for each moment at which a new segment starts. | |
281 </p> | |
282 <p>For music with clearly tonally distinguishable sections such as verse, | |
283 chorus, etc., segments with the same type may be expected to be | |
284 similar to one another in some structural sense. For example, | |
285 repetitions of the chorus are likely to share a segment type. | |
286 </p> | |
287 <p>The plugin only attempts to identify similar segments; it does not | |
288 attempt to label them. For example, it makes no attempt to tell you | |
289 which segment is the chorus. | |
290 </p> | |
291 <p>Note that this plugin does a substantial amount of processing after | |
292 receiving all of the input audio data, before it produces any results. | |
293 </p> | |
294 <h3>Method</h3> | |
295 | |
296 <p>The method relies upon structural/timbral similarity to obtain the | |
297 high-level song structure. This is based on the assumption that the | |
298 distributions of timbre features are similar over corresponding | |
299 structural elements of the music. | |
300 </p> | |
301 <p>The algorithm works by obtaining a frequency-domain representation of | |
302 the audio signal using a Constant-Q transform, a Chromagram or | |
303 Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the | |
304 particular feature is selectable as a parameter). The extracted | |
305 features are normalised in accordance with the MPEG-7 standard (NASE | |
306 descriptor), which means the spectrum is converted to decibel scale | |
307 and each spectral vector is normalised by the RMS energy envelope. | |
308 The value of this envelope is stored for each processing block of | |
309 audio. This is followed by the extraction of 20 principal components | |
310 per block using PCA, yielding a sequence of 21-dimensional feature | |
311 vectors where the last element in each vector corresponds to the | |
312 energy envelope. | |
313 </p> | |
314 <p>A 40-state Hidden Markov Model is then trained on the whole sequence | |
315 of features, with each state of the HMM corresponding to a specific | |
316 timbre type. This process partitions the timbre-space of a given track | |
317 into 40 possible types. The important assumption of the model is that | |
318 the distribution of these features remains consistent over a structural | |
319 segment. After training and decoding the HMM, the song is assigned a | |
320 sequence of timbre-features according to specific timbre-type | |
321 distributions for each possible structural segment. | |
322 </p> | |
323 <p>The segmentation itself is computed by clustering timbre-type | |
324 histograms. A series of histograms is created over a sliding window, | |
325 and these are grouped into M clusters by an adapted soft k-means | |
326 algorithm. Each of these clusters will correspond to a specific | |
327 segment-type of the analyzed song. Reference histograms, iteratively | |
328 updated during clustering, describe the timbre distribution for each | |
329 segment. The segmentation arises from the final cluster assignments. | |
330 </p> | |
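<p>The histogram step of the method above can be sketched as follows. This is an illustration only: it assumes the HMM has already been decoded into a sequence of integer state (timbre-type) indices, the function name and the <code>window</code> and <code>hop</code> parameters are hypothetical, and the constrained soft k-means clustering that follows in the real method is not shown.</p>

```python
def state_histograms(states, num_states, window, hop=1):
    """Build normalised timbre-type histograms over a sliding window.

    `states` is a decoded HMM state sequence (integers in 0..num_states-1).
    Each histogram holds the fraction of frames in the window that were
    assigned to each state.
    """
    histograms = []
    for start in range(0, len(states) - window + 1, hop):
        counts = [0] * num_states
        for state in states[start:start + window]:
            counts[state] += 1
        histograms.append([count / window for count in counts])
    return histograms


# Hypothetical decoded sequence using 3 states instead of the plugin's 40.
decoded = [0, 0, 1, 1, 2, 2, 2, 0]
histograms = state_histograms(decoded, num_states=3, window=4, hop=2)
```

<p>Windows that fall within the same kind of section should yield similar histograms, which is what makes them suitable for clustering into segment types.</p>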
331 <h3>Parameters</h3> | |
332 | |
333 <p><b>Number of segment-types</b> – The maximum number of clusters | |
334 (segment-types) to be returned. The default is 10. Unlike many | |
335 clustering algorithms, the constrained clustering used in this plugin | |
336 does not produce too many clusters or vary significantly even if this | |
337 is set too high. However, this parameter can be useful for limiting | |
338 the number of expected segment-types. | |
339 </p> | |
340 <p><b>Feature Type</b> – The type of spectral feature used for segmentation. The available features are:<ul><li>"Hybrid", the default, which uses a Constant-Q transform (see <a href="#qm-constantq">related | |
341 plugin</a>): this is generally effective for modern studio recordings;</li><li> "Chromatic", using a chromagram derived from the Constant-Q feature (see <a href="#qm-chromagram">related plugin</a>): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in | |
342 sound;</li><li>"Timbral", using Mel-Frequency | |
343 Cepstral Coefficients (see <a href="#qm-mfcc">related plugin</a>), which is more likely to | |
344 result in classification by instrumentation rather than musical | |
345 content.</li></ul> | |
346 </p> | |
347 <p><b>Minimum segment duration</b> – The approximate expected minimum | |
348 duration for a segment, from 1 to 15 seconds. Changing this parameter | |
349 may help the plugin to find musical sections rather than just | |
350 following changes in the sound of the music, and also avoid wasting a | |
351 segment-type cluster for timbrally distinct but too-short segments. | |
352 The default of 4 seconds usually produces good results. | |
353 </p> | |
354 <h3>Outputs</h3> | |
355 | |
356 <p><b>Segmentation</b> – The estimated segment boundaries, returned as a | |
357 single feature with one value at each segment boundary, with the value | |
358 representing the segment type number for the segment starting at that | |
359 boundary. | |
360 </p> | |
361 <h3>References and Credits</h3> | |
362 | |
363 <p><b>Method</b>: M. Levy and M. Sandler. <i><a href="http://ieeexplore.ieee.org/iel5/10376/4432632/04432648.pdf?arnumber=4432648">Structural segmentation of musical audio by constrained clustering</a></i>. IEEE Transactions on Audio, Speech, and Language Processing, February 2008. | |
364 </p> | |
365 <p>Note that this plugin does not implement the beat-synchronous aspect | |
366 of the segmentation method described in the paper. | |
367 </p> | |
368 <p>The Segmenter Vamp plugin was written by Mark Levy. Thanks to George | |
369 Fazekas for providing much of this documentation. | |
370 </p> | |
371 <a name="qm-similarity"></a><h2>6. Similarity</h2> | |
372 | |
373 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-similarity</code> | |
374 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity</a> | |
375 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
376 </p> | |
377 <p>Similarity treats each channel of its audio input as a separate | |
378 "track", and estimates how similar the tracks are to one another using | |
379 a selectable similarity measure. | |
380 </p> | |
381 <p>The plugin also returns the intermediate data used as a basis of the | |
382 similarity measure; it can therefore be used on a single channel of | |
383 input (with the resulting intermediate data then being applied in some | |
384 other similarity or clustering algorithm, for example) if desired, as | |
385 well as with multiple inputs. | |
386 </p> | |
387 <p>Because of the way this plugin handles multiple inputs, by assuming | |
388 that each channel represents a separate piece of music, it may not be | |
389 appropriate for use directly in a general-purpose host (unless you | |
390 actually want to do something like compare two stereo channels for | |
391 timbral similarity, which is unlikely). | |
392 </p> | |
393 <h3>Parameters</h3> | |
394 | |
395 <p><b>Feature Type</b> – The underlying audio feature used for the similarity | |
396 measure. The available features are: | |
397 <ul><li>"Timbre", in which the distance | |
398 between tracks is a symmetrised Kullback-Leibler divergence between | |
399 Gaussian-modelled MFCC means and variances across each track, for the | |
400 first 20 MFCCs including C0 (see <a href="#qm-mfcc">related plugin</a>);</li><li>"Chroma", which uses Kullback-Leibler divergence of | |
401 mean chroma histogram (see <a href="#qm-chromagram">related plugin</a>);</li><li>"Rhythm", using the cosine distance between | |
402 "beat spectrum" measures derived from a short sampled section of the | |
403 track;</li><li>and combined "Timbre and Rhythm" and "Chroma and Rhythm" | |
404 features.</li></ul> | |
405 </p> | |
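<p>The two distance measures named above can be sketched as follows, assuming each track is summarised by per-coefficient means and variances (that is, a Gaussian with diagonal covariance). The function names are hypothetical, and the plugin's exact scaling and any further normalisation may differ.</p>

```python
import math


def symmetrised_kl(mean_a, var_a, mean_b, var_b):
    """Symmetrised Kullback-Leibler divergence, KL(a||b) + KL(b||a),
    between two Gaussians with diagonal covariance.

    The log-determinant terms of the two directions cancel in the sum.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        total += va / vb + vb / va - 2.0
        total += (ma - mb) ** 2 * (1.0 / va + 1.0 / vb)
    return 0.5 * total


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

<p>Both measures are zero for identical inputs and grow as the tracks' feature statistics move apart.</p>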
406 <h3>Outputs</h3> | |
407 | |
408 <p><b>Distance Matrix</b> – A matrix of the distance measures between input | |
409 channels, returned as a series of vector features timestamped at | |
410 one-second intervals. The distance from channel i to channel j | |
411 appears as the j'th bin of the feature at time i. | |
412 </p> | |
413 <p><b>Distance from First Channel</b> – A single vector feature, timestamped | |
414 at time zero, containing the distances between the first input channel | |
415 and each of the input channels (including the first channel itself at | |
416 bin 0, which should have zero distance). | |
417 </p> | |
418 <p><b>Ordered Distances from First Channel</b> – A pair of vector features, | |
419 at times 0 and 1 second. The feature at time 0 contains the 1-based | |
420 indices of the input channels in the order of similarity to the first | |
421 input channel (so its first bin should always contain 1, as the first | |
422 channel is most similar to itself). The feature at time 1 contains, | |
423 in bin n, the distance between the first input channel and the channel | |
424 with index found at bin n of the feature at time 0. | |
425 </p> | |
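<p>The layout of the Ordered Distances from First Channel output can be reproduced from a plain list of distances as in the sketch below. The function name is hypothetical; it only illustrates how the two features relate to one another.</p>

```python
def ordered_from_first(distances):
    """Return (indices, sorted_distances) for distances from the first channel.

    `distances[n]` is the distance from the first channel to channel n
    (0-based). `indices` corresponds to the feature at time 0 (1-based
    channel numbers, most similar first); `sorted_distances` corresponds
    to the feature at time 1.
    """
    order = sorted(range(len(distances)), key=lambda n: distances[n])
    indices = [n + 1 for n in order]
    sorted_distances = [distances[n] for n in order]
    return indices, sorted_distances


# Hypothetical distances from channel 1 to channels 1, 2 and 3.
indices, sorted_distances = ordered_from_first([0.0, 2.5, 1.0])
```

<p>Here channel 3 is closer to the first channel than channel 2, so it appears second in the index feature.</p>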
426 <p><b>Feature Means</b> – A series of vector features containing the mean | |
427 values of each of the feature bins across the duration of each of the | |
428 input channels. This output returns one feature for each input | |
429 channel, timestamped at one-second intervals. The number of bins for | |
430 each feature depends on the feature type; it will be 20 for MFCC | |
431 features and 12 for chroma features. No features will be returned on | |
432 this output if the feature type is purely rhythmic. | |
433 </p> | |
434 <p><b>Feature Variances</b> – As Feature Means, but containing the variances rather than the means. | |
435 </p> | |
436 <p><b>Beat Spectra</b> – A series of vector features containing the rhythmic | |
437 autocorrelation profiles (beat spectra) for each of the input | |
438 channels. This output returns one 512-bin feature for each input | |
439 channel, timestamped at one-second intervals. No features will be | |
440 returned on this output if the feature type contains no rhythm | |
441 component. | |
442 </p> | |
443 <h3>References and Credits</h3> | |
444 | |
445 <p><b>Timbral similarity</b>: M. Levy and M. Sandler. <i><a href="http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf">Lightweight measures for timbral similarity of musical audio</a></i>. In Proceedings of the 1st | |
446 ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, | |
447 2006. | |
448 </p> | |
449 <p><b>Combined rhythmic and timbral similarity</b>: K. Jacobson. <i><a href="http://ismir2006.ismir.net/PAPERS/ISMIR0696_Paper.pdf">A Multifaceted Approach to Music Similarity</a></i>. In Proceedings of the | |
450 Seventh International Conference on Music Information Retrieval | |
451 (ISMIR), 2006. | |
452 </p> | |
453 <p>The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and | |
454 Chris Cannam. | |
455 </p> | |
456 <a name="qm-constantq"></a><h2>7. Constant-Q Spectrogram</h2> | |
457 | |
458 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-constantq</code> | |
459 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq</a> | |
460 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
461 </p> | |
462 <p>Constant-Q Spectrogram calculates a spectrogram based on a short-time | |
463 windowed constant Q spectral transform. This is a spectrogram in | |
464 which the ratio of centre frequency to resolution is constant for each | |
465 frequency bin. The frequency bins correspond to the frequencies of | |
466 "musical notes" rather than being linearly spaced in frequency as they | |
467 are for the conventional DFT spectrogram. | |
468 </p> | |
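<p>The relationship between MIDI pitch, tuning frequency, bins per octave and the constant ratio of centre frequency to resolution can be sketched as follows. The function names are hypothetical, the conversion assumes the usual convention that concert A is MIDI pitch 69, and whether the plugin includes a bin exactly at the maximum pitch is an assumption.</p>

```python
def midi_to_hz(pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, assuming concert A is MIDI pitch 69."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)


def constant_q_centres(min_pitch, max_pitch, bins_per_octave=12,
                       tuning_hz=440.0):
    """Geometrically spaced bin centre frequencies between two MIDI pitches."""
    f_min = midi_to_hz(min_pitch, tuning_hz)
    f_max = midi_to_hz(max_pitch, tuning_hz)
    centres = []
    k = 0
    while True:
        f = f_min * 2.0 ** (k / bins_per_octave)
        if f > f_max * (1.0 + 1e-9):  # small tolerance for rounding
            break
        centres.append(f)
        k += 1
    return centres


def quality_factor(bins_per_octave):
    """Ratio of centre frequency to the spacing up to the next bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


centres = constant_q_centres(57, 69, bins_per_octave=12)
```

<p>With 12 bins per octave each bin sits on a semitone, and the ratio returned by <code>quality_factor</code> is the same for every bin, which is the defining property of the transform.</p>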
469 <p>The pitch range and the number of frequency bins per octave may be | |
470 adjusted using the plugin's parameters. Note that the plugin's | |
471 preferred step and block sizes are defined by these parameters, and | |
472 the plugin will not accept any other block size than its preferred | |
473 value. | |
474 </p> | |
475 <h3>Parameters</h3> | |
476 | |
477 <p><b>Minimum Pitch</b> – The MIDI pitch value (0-127) corresponding to the lowest | |
478 frequency to be included in the constant-Q transform. | |
479 </p> | |
480 <p><b>Maximum Pitch</b> – The MIDI pitch value (0-127) corresponding to the | |
481 highest frequency to be included in the constant-Q transform. | |
482 </p> | |
483 <p><b>Tuning Frequency</b> – The frequency of concert A in the | |
484 music under analysis. | |
485 </p> | |
486 <p><b>Bins per Octave</b> – The number of constant-Q transform bins to be | |
487 computed per octave. | |
488 </p> | |
489 <p><b>Normalized</b> – Whether to normalize each output column to unit | |
490 maximum. | |
491 </p> | |
492 <h3>Outputs</h3> | |
493 | |
494 <p><b>Constant-Q Spectrogram</b> – The calculated spectrogram, as a single | |
495 feature per process block containing one bin for each pitch included | |
496 in the spectrogram's range. | |
497 </p> | |
498 <h3>References and Credits</h3> | |
499 | |
500 <p><b>Principle</b>: J. Brown. <i><a href="http://www.wellesley.edu/Physics/brown/pubs/cq1stPaper.pdf">Calculation of a constant Q spectral transform</a></i>. Journal of the Acoustical Society of America, 89(1): | |
501 425-434, 1991. | |
502 </p> | |
503 <p>The Constant-Q Spectrogram Vamp plugin was written by Christian | |
504 Landone. | |
505 </p> | |
506 <a name="qm-chromagram"></a><h2>8. Chromagram</h2> | |
507 | |
508 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-chromagram</code> | |
509 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram</a> | |
510 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
511 </p> | |
512 <p>Chromagram calculates a constant Q spectral transform (as in the | |
513 Constant-Q Spectrogram plugin) and then wraps the frequency bin values | |
514 into a single octave, with each bin containing the sum of the | |
515 magnitudes from the corresponding bin in all octaves. The number of | |
516 values in each feature vector returned by the plugin is therefore the | |
517 same as the number of bins per octave configured for the underlying | |
518 constant Q transform. | |
519 </p> | |
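<p>The octave wrapping described above can be sketched as follows. This is an illustration rather than the plugin's code: the function names are hypothetical, and it assumes the first constant-Q bin is aligned with the first chroma bin. The two normalisation modes mirror the unit sum and unit maximum options of the Normalized parameter.</p>

```python
def fold_to_chroma(cq_magnitudes, bins_per_octave=12):
    """Sum constant-Q bin magnitudes from all octaves into a single octave."""
    chroma = [0.0] * bins_per_octave
    for index, magnitude in enumerate(cq_magnitudes):
        chroma[index % bins_per_octave] += magnitude
    return chroma


def normalise(column, mode="max"):
    """Scale a column to unit sum ("sum") or unit maximum ("max")."""
    scale = sum(column) if mode == "sum" else max(column)
    if scale == 0:
        return list(column)  # leave silent columns unchanged
    return [value / scale for value in column]


# Hypothetical constant-Q column covering two octaves at 3 bins per octave.
chroma = fold_to_chroma([1, 2, 3, 4, 5, 6], bins_per_octave=3)
```

<p>The length of the result always equals the bins per octave setting, regardless of how many octaves the pitch range covers.</p>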
520 <p>The pitch range and the number of frequency bins per octave for the | |
521 transform may be adjusted using the plugin's parameters. Note that | |
522 the plugin's preferred step and block sizes depend on these | |
523 parameters, and the plugin will not accept any other block size than | |
524 its preferred value. | |
525 </p> | |
526 <h3>Parameters</h3> | |
527 | |
528 <p><b>Minimum Pitch</b> – The MIDI pitch value (0-127) corresponding to the | |
529 lowest frequency to be included in the constant-Q transform used in | |
530 calculating the chromagram. | |
531 </p> | |
532 <p><b>Maximum Pitch</b> – The MIDI pitch value (0-127) corresponding to the | |
533 highest frequency to be included in the constant-Q transform used in | |
534 calculating the chromagram. | |
535 </p> | |
536 <p><b>Tuning Frequency</b> – The frequency of concert A in the | |
537 music under analysis. | |
538 </p> | |
539 <p><b>Bins per Octave</b> – The number of constant-Q transform bins to be | |
540 computed per octave, and thus the total number of bins present in the | |
541 resulting chromagram. | |
542 </p> | |
543 <p><b>Normalized</b> – Whether to normalize each output column. Normalization | |
544 may be to unit sum or unit maximum. | |
545 </p> | |
546 <h3>Outputs</h3> | |
547 | |
548 <p><b>Chromagram</b> – The calculated chromagram, as a single feature per | |
549 process block containing the number of bins given in the bins per | |
550 octave parameter. | |
551 </p> | |
552 <h3>References and Credits</h3> | |
553 | |
554 <p>The Chromagram Vamp plugin was written by Christian Landone. | |
555 </p> | |
556 <a name="qm-mfcc"></a><h2>9. Mel-Frequency Cepstral Coefficients</h2> | |
557 | |
558 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-mfcc</code> | |
559 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc</a> | |
560 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a> | |
561 </p> | |
562 <p>Mel-Frequency Cepstral Coefficients calculates MFCCs from a single | |
563 channel of audio. These coefficients, derived from a cosine transform | |
564 of the mapping of an audio spectrum onto a frequency scale modelled on | |
565 human auditory response, are widely used in speech recognition, music | |
566 classification and other tasks. | |
567 </p> | |
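<p>The cosine transform step can be sketched as below, assuming the spectrum has already been mapped onto a set of positive Mel band energies. This is not the plugin's implementation: the function name is hypothetical, the scaling of the transform is one of several conventions in use, and the optional power parameter is not modelled.</p>

```python
import math


def mfcc_from_band_energies(band_energies, num_coefficients=20,
                            include_c0=True):
    """Unnormalised DCT-II of the log Mel band energies.

    The returned list always has `num_coefficients` entries; when
    include_c0 is True the first entry is C0, the sum of the log energies.
    """
    logs = [math.log(energy) for energy in band_energies]
    num_bands = len(logs)
    first = 0 if include_c0 else 1
    coefficients = []
    for n in range(first, first + num_coefficients):
        coefficients.append(sum(
            log_energy * math.cos(math.pi * n * (k + 0.5) / num_bands)
            for k, log_energy in enumerate(logs)))
    return coefficients


# Hypothetical flat spectrum: every band has energy e, so each log is 1.
flat = mfcc_from_band_energies([math.e] * 4, num_coefficients=3)
```

<p>For a flat spectrum only C0 is non-zero, which matches the description of C0 as reflecting overall signal power rather than spectral shape.</p>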
568 <h3>Parameters</h3> | |
569 | |
570 <p><b>Number of Coefficients</b> – The number of MFCCs to return. Commonly | |
571 used values include 13 or the default 20. This number includes C0 if | |
572 requested (see Include C0 below). | |
573 </p> | |
574 <p><b>Power for Mel Amplitude Logs</b> – An optional power value to which the | |
575 spectral amplitudes should be raised before applying the cosine | |
576 transform. Values greater than 1 may in principle reduce the | |
577 contribution of noise to the results. The default is 1. | |
578 </p> | |
579 <p><b>Include C0</b> – Whether to include the "zero'th" coefficient, which | |
580 simply reflects the overall signal power across the Mel frequency | |
581 bands. | |
582 </p> | |
583 <h3>Outputs</h3> | |
584 | |
585 <p><b>Coefficients</b> – The MFCC values, returned as one vector feature per | |
586 processing block. | |
587 </p> | |
588 <p><b>Means of Coefficients</b> – The overall means of the MFCC bins, as a | |
589 single vector feature with time 0 that is returned when processing is | |
590 complete. | |
591 </p> | |
592 <h3>References and Credits</h3> | |
593 | |
594 <p><b>MFCCs in music</b>: See B. Logan. <i><a href="http://ismir2000.ismir.net/papers/logan_paper.pdf">Mel-Frequency Cepstral Coefficients for Music Modeling</a></i>. In Proceedings of the First International | |
595 Symposium on Music Information Retrieval (ISMIR), 2000. | |
596 </p> | |
597 <p>The Mel-Frequency Cepstral Coefficients Vamp plugin was written by | |
598 Nicolas Chetry and Chris Cannam. | |
599 </p> | |
600 <p></p> | |
602 </body> | |
603 </html> |