annotate plugin-doc/qm-vamp-plugins.html @ 101:bb33b0c75481 rdfquery

Remove download.html (now dynamic)
author Chris Cannam
date Tue, 24 Jun 2014 13:32:30 +0100
parents bcb5a0818120
children 678a88672953
rev   line source
cannam@16 1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
cannam@16 2 <html>
cannam@16 3 <head>
cannam@16 4 <link rel="stylesheet" media="screen" type="text/css" href="/screen.css"/>
cannam@16 5 <link rel="icon" type="image/png" href="/images/waveform.png"/>
cannam@16 6 <link rel="shortcut" type="image/png" href="/images/waveform.png"/>
cannam@16 7 <title>QM Vamp Plugins: User Documentation</title>
cannam@16 8 <meta name="robots" content="index"/>
cannam@16 9 </head>
cannam@16 10 <body>
cannam@16 11 <h1 id="header"><span>Vamp Plugins</span></h1>
cannam@16 12
cannam@16 13 <h2>QM Vamp Plugins</h2>
cannam@16 14
cannam@16 15 <p>The QM Vamp Plugin set is a library of Vamp audio feature
cannam@16 16 extraction plugins developed at the <a
cannam@16 17 href="http://www.elec.qmul.ac.uk/digitalmusic/">Centre for Digital
cannam@16 18 Music</a> at Queen Mary, University of London. These plugins are
cannam@16 19 provided as a single library file, made available in binary form for
cannam@16 20 Windows, OS X, and Linux from the Centre for Digital Music's <a
cannam@32 21 href="http://isophonics.net/QMVampPlugins">download
cannam@16 22 page</a>.
cannam@16 23 </p>
cannam@16 24 <p>For more information about Vamp plugins, see <a href="http://www.vamp-plugins.org/">http://www.vamp-plugins.org/</a>.
cannam@16 25 </p>
cannam@16 26
cannam@16 27 <div class="toc2">1. &nbsp;<a href="#qm-onsetdetector">Note Onset Detector</a></div>
cannam@16 28 <div class="toc2">2. &nbsp;<a href="#qm-tempotracker">Tempo and Beat Tracker</a></div>
cannam@29 29 <div class="toc2">3. &nbsp;<a href="#qm-barbeattracker">Bar and Beat Tracker</a></div>
cannam@29 30 <div class="toc2">4. &nbsp;<a href="#qm-keydetector">Key Detector</a></div>
cannam@29 31 <div class="toc2">5. &nbsp;<a href="#qm-tonalchange">Tonal Change</a></div>
cannam@29 32 <div class="toc2">6. &nbsp;<a href="#qm-adaptivespectrogram">Adaptive Spectrogram</a></div>
cannam@29 33 <div class="toc2">7. &nbsp;<a href="#qm-transcription">Polyphonic Transcription</a></div>
cannam@29 34 <div class="toc2">8. &nbsp;<a href="#qm-segmenter">Segmenter</a></div>
cannam@29 35 <div class="toc2">9. &nbsp;<a href="#qm-similarity">Similarity</a></div>
cannam@29 36 <div class="toc2">10. &nbsp;<a href="#qm-dwt">Discrete Wavelet Transform</a></div>
cannam@29 37 <div class="toc2">11. &nbsp;<a href="#qm-constantq">Constant-Q Spectrogram</a></div>
cannam@29 38 <div class="toc2">12. &nbsp;<a href="#qm-chromagram">Chromagram</a></div>
cannam@29 39 <div class="toc2">13. &nbsp;<a href="#qm-mfcc">Mel-Frequency Cepstral Coefficients</a></div>
cannam@16 40
cannam@29 41 <a name="qm-onsetdetector"></a><h2>1. Note Onset Detector</h2>
cannam@16 42
cannam@16 43 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-onsetdetector</code>
cannam@16 44 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector</a>
cannam@16 45 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 46 </p>
cannam@16 47 <p>Note Onset Detector analyses a single channel of audio and estimates
cannam@16 48 the onset times of notes within the music &ndash; that is, the times at
cannam@16 49 which notes and other audible events begin.
cannam@16 50 </p>
cannam@16 51 <p>It calculates an onset likelihood function for each spectral frame,
cannam@16 52 and picks peaks in a smoothed version of this function. The plugin is
cannam@16 53 non-causal, returning all results at the end of processing.
cannam@16 54 </p>
cannam@16 55 <h3>Parameters</h3>
cannam@16 56
cannam@16 57 <p><b>Onset Detection Function Type</b> &ndash; The method used to calculate the
cannam@16 58 onset likelihood function. The most versatile method is the default,
cannam@16 59 "Complex Domain" (see reference, Duxbury et al 2003). "Spectral
cannam@16 60 Difference" may be appropriate for percussive recordings, "Phase
cannam@16 61 Deviation" for non-percussive music, and "Broadband Energy Rise" (see
cannam@16 62 reference, Barry et al 2005) for identifying percussive onsets in
cannam@16 63 mixed music.
cannam@16 64 </p>
cannam@16 65 <p><b>Onset Detector Sensitivity</b> &ndash; Sensitivity level for peak detection
cannam@16 66 in the onset likelihood function. The higher the sensitivity, the
cannam@16 67 more onsets will (rightly or wrongly) be detected. The peak picker
cannam@16 68 does not have a simple threshold level; instead, this parameter
cannam@16 69 controls the required "steepness" of the slopes in the smoothed
cannam@16 70 detection function either side of a peak value, in order for that peak
cannam@16 71 to be accepted as an onset.
cannam@16 72 </p>
cannam@16 73 <p><b>Adaptive Whitening</b> &ndash; This option evens out the temporal and
cannam@16 74 frequency variation in the signal, which can yield improved
cannam@16 75 performance in onset detection, for example in audio with big
cannam@16 76 variations in dynamics.
cannam@16 77 </p>
cannam@16 78 <h3>Outputs</h3>
cannam@16 79
cannam@16 80 <p><b>Note Onsets</b> &ndash; The detected note onset times, returned as a single
cannam@16 81 feature with timestamp but no value for each detected note.
cannam@16 82 </p>
cannam@16 83 <p><b>Onset Detection Function</b> &ndash; The raw note onset likelihood function
cannam@16 84 that was calculated as the first step of the detection process.
cannam@16 85 </p>
cannam@16 86 <p><b>Smoothed Detection Function</b> &ndash; The note onset likelihood function
cannam@16 87 following median filtering. This is the function from which
cannam@16 88 sufficiently steep peak values are picked and classified as onsets.
cannam@16 89 </p>
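<p>The relationship between the raw detection function, the smoothed
function and the picked peaks can be illustrated with a minimal sketch.
This is not the plugin's implementation: the function names, the window
size and the <code>min_slope</code> parameter (standing in for the
sensitivity setting) are assumptions made for illustration only.</p>

```python
from statistics import median


def median_smooth(values, half_width=1):
    """Median-filter a sequence, shrinking the window at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(median(values[lo:hi]))
    return out


def pick_peaks(smoothed, min_slope):
    """Return indices of local maxima whose rise and fall both reach min_slope.

    min_slope is a hypothetical stand-in for the sensitivity parameter:
    a lower value accepts shallower peaks, so more onsets are reported.
    """
    peaks = []
    for i in range(1, len(smoothed) - 1):
        rise = smoothed[i] - smoothed[i - 1]
        fall = smoothed[i] - smoothed[i + 1]
        if rise >= min_slope and fall >= min_slope:
            peaks.append(i)
    return peaks


# A single-frame spike is removed by the median filter.
despiked = median_smooth([0, 0, 5, 0, 0], half_width=1)

# One steep peak (index 2) and one shallow peak (index 6).
smoothed = [0.0, 0.5, 1.0, 0.5, 0.0, 0.1, 0.12, 0.1, 0.0]
strict = pick_peaks(smoothed, min_slope=0.2)
lenient = pick_peaks(smoothed, min_slope=0.01)
```

<p>In this sketch the strict setting keeps only the steep peak, while
the lenient setting also accepts the shallow one, mirroring the
documented effect of raising the sensitivity.</p>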
cannam@16 90 <h3>References and Credits</h3>
cannam@16 91
cannam@16 92 <p><b>Basic detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
cannam@16 93 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
cannam@16 94 Proceedings of the 6th Conference on Digital Audio Effects
cannam@16 95 (DAFx-03). London, UK. September 2003.
cannam@16 96 </p>
cannam@16 97 <p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
cannam@16 98 Proceedings of the International Computer Music Conference (ICMC'07),
cannam@16 99 August 2007.
cannam@16 100 </p>
cannam@16 101 <p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
cannam@16 102 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
cannam@16 103 </p>
cannam@16 104 <p>The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan
cannam@16 105 Pablo Bello and Christian Landone.
cannam@16 106 </p>
cannam@29 107
cannam@16 108 <a name="qm-tempotracker"></a><h2>2. Tempo and Beat Tracker</h2>
cannam@16 109
cannam@16 110 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-tempotracker</code>
cannam@16 111 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker</a>
cannam@16 112 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 113 </p>
cannam@16 114 <p>Tempo and Beat Tracker analyses a single channel of audio and
cannam@16 115 estimates the positions of metrical beats within the music (the
cannam@16 116 equivalent of a human listener tapping their foot to the beat).
cannam@16 117 </p>
cannam@16 118 <h3>Parameters</h3>
cannam@16 119
cannam@46 120 <p><b>Beat Tracking Method</b> &ndash; The method used to track beats. The default, "New", uses a hybrid of the "Old" two-state beat tracking model
cannam@46 121 (see reference Davies 2007) and a dynamic programming method (see reference
cannam@46 122 Ellis 2007). A more detailed description is given below within the Bar and
cannam@46 123 Beat Tracker plugin. </p>
cannam@29 124
cannam@29 125 <p><b>Onset Detection Function Type</b> &ndash; The algorithm used to calculate the
cannam@16 126 onset likelihood function. The most versatile method is the default,
cannam@16 127 "Complex Domain" (see reference, Duxbury et al 2003). "Spectral
cannam@16 128 Difference" may be appropriate for percussive recordings, "Phase
cannam@16 129 Deviation" for non-percussive music, and "Broadband Energy Rise" (see
cannam@16 130 reference, Barry et al 2005) for identifying percussive onsets in
cannam@16 131 mixed music.
cannam@16 132 </p>
cannam@16 133 <p><b>Adaptive Whitening</b> &ndash; This option evens out the temporal and
cannam@16 134 frequency variation in the signal, which can yield improved
cannam@16 135 performance in onset detection, for example in audio with big
cannam@16 136 variations in dynamics.
cannam@16 137 </p>
cannam@16 138 <h3>Outputs</h3>
cannam@16 139
cannam@16 140 <p><b>Beats</b> &ndash; The estimated beat locations, returned as a single feature,
cannam@16 141 with timestamp but no value, for each beat, labelled with the
cannam@16 142 corresponding estimated tempo at that beat.
cannam@16 143 </p>
cannam@16 144 <p><b>Onset Detection Function</b> &ndash; The raw note onset likelihood function
cannam@16 145 used in beat estimation.
cannam@16 146 </p>
cannam@16 147 <p><b>Tempo</b> &ndash; The estimated tempo, returned as a feature each time the
cannam@16 148 estimated tempo changes, with a single value for the tempo in beats
cannam@16 149 per minute.
cannam@16 150 </p>
cannam@16 151 <h3>References and Credits</h3>
cannam@16 152
cannam@16 153 <p><b>Beat tracking method</b>: M. E. P. Davies and M. D. Plumbley.
cannam@16 154 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent beat tracking of musical audio</a></i>. In IEEE
cannam@16 155 Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3,
cannam@46 156 pp1009-1020, 2007;<br>M. E. P. Davies and M. D. Plumbley.
cannam@16 157 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2005/DaviesPlumbley05-icassp.pdf">Beat Tracking With A Two State Model</a></i>. In Proceedings of the IEEE
cannam@16 158 International Conference on Acoustics, Speech and Signal Processing
cannam@46 159 (ICASSP 2005), Vol. 3, pp241-244 Philadelphia, USA, March 19-23, 2005;
cannam@46 160 <br>D. P. W. Ellis. <i>Beat Tracking by Dynamic
cannam@46 161 Programming</i>. In Journal of New Music Research. Vol. 37, No. 1,
cannam@46 162 pp51-60, 2007.
cannam@16 163 </p>
cannam@16 164 <p><b>Onset detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
cannam@16 165 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
cannam@16 166 Proceedings of the 6th Conference on Digital Audio Effects
cannam@16 167 (DAFx-03). London, UK. September 2003.
cannam@16 168 </p>
cannam@16 169 <p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
cannam@16 170 Proceedings of the International Computer Music Conference (ICMC'07),
cannam@16 171 August 2007.
cannam@16 172 </p>
cannam@16 173 <p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
cannam@16 174 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
cannam@16 175 </p>
cannam@16 176 <p>The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies
cannam@16 177 and Christian Landone.
cannam@16 178 </p>
cannam@29 179
cannam@29 180
cannam@29 181 <a name="qm-barbeattracker"></a><h2>3. Bar and Beat Tracker</h2>
cannam@29 182
cannam@29 183 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-barbeattracker</code>
cannam@29 184 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker</a>
cannam@29 185 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@29 186 </p>
cannam@29 187
cannam@29 188 <p>Bar and Beat Tracker analyses a single channel of audio and
cannam@29 189 estimates the positions of bar lines and the resulting counted
cannam@29 190 metrical beat positions within the music (where the first beat of
cannam@29 191 each bar is "1", the equivalent of counting in time to the music).
cannam@29 192 It is closely related to the <a href="#qm-tempotracker">Tempo and
cannam@29 193 Beat Tracker</a>, producing the same results for beat position as
cannam@29 194 that plugin's "New" beat tracking method.
cannam@29 195
cannam@29 196 </p>
cannam@29 197
cannam@29 198 <h3>Method</h3>
cannam@29 199
cannam@29 200 <p>The plugin first calculates an onset detection function using the
cannam@29 201 "Complex Domain" method (see <a href="#qm-tempotracker">Tempo and Beat
cannam@29 202 Tracker</a>).</p>
cannam@29 203
cannam@29 204 <p>The beat tracking method performs two passes over the onset
cannam@29 205 detection function, first to estimate the tempo contour, and then
cannam@29 206 given the tempo, to recover the beat locations.</p>
cannam@29 207
cannam@29 208 <p>To identify the tempo, the onset detection function is partitioned
cannam@29 209 into 6-second frames with a 1.5-second increment. The autocorrelation
cannam@29 210 function of each 6-second onset detection function is found and this
cannam@29 211 is then passed through a perceptually weighted comb filterbank (see
cannam@29 212 reference Davies 2007). The successive comb filterbank output signals
cannam@29 213 are grouped together into a matrix of observations of periodicity
cannam@29 214 through time. The best path of periodicity through these observations
cannam@29 215 is found using the Viterbi algorithm, where the transition matrix is
cannam@29 216 defined as a diagonal Gaussian.</p>
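<p>The framing and autocorrelation steps above can be sketched as
follows. This is a simplified illustration, not the plugin's code: the
detection function rate of 100 values per second, the lag range and all
function names are assumptions, and the comb filterbank and Viterbi
stages are omitted (the strongest autocorrelation lag is used
instead).</p>

```python
def split_frames(df, frame_len, hop):
    """Split df into overlapping frames of frame_len values, advancing by hop."""
    return [df[s:s + frame_len] for s in range(0, len(df) - frame_len + 1, hop)]


def autocorrelation(x):
    """Unnormalised autocorrelation of x for every non-negative lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(n)]


def strongest_lag(acf, min_lag, max_lag):
    """Lag within [min_lag, max_lag] that has the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])


DF_RATE = 100  # hypothetical: detection function values per second

# Synthetic detection function: 12 seconds with an impulse every 0.5 s (120 bpm).
df = [1.0 if i % 50 == 0 else 0.0 for i in range(12 * DF_RATE)]

# 6-second frames with a 1.5-second increment, as described above.
frame_list = split_frames(df, frame_len=6 * DF_RATE, hop=int(1.5 * DF_RATE))

# One tempo estimate per frame; lags 30..100 cover 200 down to 60 bpm.
tempi = [60.0 * DF_RATE / strongest_lag(autocorrelation(f), 30, 100)
         for f in frame_list]
```

<p>Each frame yields one periodicity observation; in the method
described above these observations would be weighted and then linked
through time rather than taken independently.</p>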
cannam@29 217
cannam@29 218 <p>Given the estimates of periodicity, the beat locations are recovered
cannam@29 219 by applying the dynamic programming algorithm (see reference Ellis
cannam@29 220 2007). This process involves the calculation of a recursive cumulative
cannam@29 221 score function and backtrace signal. The cumulative score indicates
cannam@29 222 the likelihood of a beat existing at each sample of the onset
cannam@29 223 detection function input, and the backtrace gives the location of the
cannam@29 224 best previous beat given this point in time. Once the cumulative score
cannam@29 225 and backtrace have been calculated for the whole input signal, the
cannam@29 226 best path through beat locations is found by recursively sampling the
cannam@29 227 backtrace signal from the end of the input signal back to the
cannam@29 228 beginning. See reference Stark et al. 2009 for a description of the
cannam@29 229 real-time implementation of the beat tracking algorithm.</p>
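<p>The cumulative score and backtrace can be illustrated with a short
sketch in the spirit of the dynamic programming approach. It assumes a
single fixed beat period and a log-squared transition penalty with a
hypothetical <code>tightness</code> weight; the plugin's actual
weighting and search ranges may differ.</p>

```python
import math


def track_beats(df, period, tightness=4.0):
    """Recover beat positions from a detection function df.

    period is the assumed beat period in df samples. The function name
    and the tightness weight are illustrative assumptions.
    """
    n = len(df)
    score = list(df)       # cumulative score for a beat at each sample
    backtrace = [-1] * n   # best previous beat for each sample
    for t in range(n):
        best, best_prev = -math.inf, -1
        # Search previous beats between two periods and half a period ago.
        for prev in range(max(0, t - 2 * period), t - period // 2):
            penalty = -tightness * math.log((t - prev) / period) ** 2
            candidate = score[prev] + penalty
            if candidate > best:
                best, best_prev = candidate, prev
        if best_prev >= 0:
            score[t] = df[t] + best
            backtrace[t] = best_prev
    # Start from the best-scoring sample in the final period and walk back.
    t = max(range(n - period, n), key=lambda i: score[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = backtrace[t]
    return beats[::-1]


# Synthetic detection function with impulses every 20 samples.
df = [1.0 if i % 20 == 10 else 0.0 for i in range(100)]
beats = track_beats(df, period=20)
```

<p>The backward walk through <code>backtrace</code> corresponds to the
recursive sampling of the backtrace signal from the end of the input
back to the beginning.</p>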
cannam@29 230
cannam@29 231 <p>Once the beat locations have been identified, the plugin makes a
cannam@29 232 second pass over the input audio signal, partitioning it into beat
cannam@29 233 synchronous frames. The audio within each beat frame is down-sampled
cannam@29 234 to give a new sampling frequency of 2.8 kHz. A beat-synchronous
cannam@29 235 spectral representation is then calculated within each frame, from
cannam@29 236 which a measure of beat spectral difference is calculated using
cannam@29 237 Jensen-Shannon divergence. The bar boundaries are identified as those
cannam@29 238 beat transitions leading to most consistent spectral change given the
cannam@29 239 specified number of beats per bar.</p>
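<p>As an illustration of the spectral difference measure, the
Jensen-Shannon divergence between two magnitude spectra can be computed
as below. This is a generic sketch: normalising each spectrum to sum to
one and using base-2 logarithms are assumptions, and the step that
selects bar boundaries from these values is not shown.</p>

```python
import math


def normalise(spectrum):
    """Scale a non-negative spectrum so that its values sum to one."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p == 0 contribute zero."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(spectrum_a, spectrum_b):
    """Jensen-Shannon divergence between two spectra treated as distributions."""
    p, q = normalise(spectrum_a), normalise(spectrum_b)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


same = js_divergence([1, 2, 3], [1, 2, 3])   # identical spectra
disjoint = js_divergence([1, 0], [0, 1])     # no shared energy
```

<p>With base-2 logarithms the measure is symmetric and bounded between
0 (identical spectra) and 1 (no shared energy), which makes successive
beat-to-beat values directly comparable.</p>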
cannam@29 240
cannam@29 241 <h3>Parameters</h3>
cannam@29 242
cannam@29 243 <p><b>Beats per Bar</b> &ndash; The number of beats per bar (or measure). The
cannam@29 244 plugin assumes that the number of beats per bar is fixed throughout
cannam@29 245 the music.
cannam@29 246 </p>
cannam@29 247 <h3>Outputs</h3>
cannam@29 248
cannam@29 249 <p><b>Beats</b> &ndash; The estimated beat locations, returned as a single feature,
cannam@29 250 with timestamp but no value, for each beat, labelled with the
cannam@29 251 number of that beat within the bar (e.g. consecutively 1, 2, 3, 4 for 4 beats to the bar).
cannam@29 252 </p>
cannam@29 253 <p><b>Bars</b> &ndash; The estimated bar line locations, returned as a single feature,
cannam@29 254 with timestamp but no value, for each bar.
cannam@29 255 </p>
cannam@29 256 <p><b>Beat Count</b> &ndash; The estimated beat locations, returned as a single feature,
cannam@29 257 with timestamp and a value corresponding to the
cannam@29 258 number of that beat within the bar. This is similar to the Beats output except that it returns a counting function rather than a series of instants.
cannam@29 259 </p>
cannam@29 260 <p><b>Beat Spectral Difference</b> &ndash; The new-bar likelihood function used in bar line estimation.
cannam@29 261 </p>
cannam@29 262
cannam@29 263 <h3>References and Credits</h3>
cannam@29 264
cannam@29 265 <p><b>Beat tracking method</b>: A. M. Stark, M. E. P. Davies and
cannam@29 266 M. D. Plumbley. <i>Real-time beat-synchronous analysis of musical
cannam@29 267 audio</i>. To appear in Proceedings of 12th International Conference
cannam@29 268 on Digital Audio Effects (DAFx). 2009;<br>M. E. P. Davies and
cannam@29 269 M. D. Plumbley. <i><a
cannam@29 270 href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent
cannam@29 271 beat tracking of musical audio</a></i>. In IEEE Transactions on
cannam@29 272 Audio, Speech and Language Processing. Vol. 15, No. 3, pp1009-1020,
cannam@29 273 2007;<br>D. P. W. Ellis. <i>Beat Tracking by Dynamic
cannam@29 274 Programming</i>. In Journal of New Music Research. Vol. 37, No. 1,
cannam@29 275 pp51-60, 2007.</p>
cannam@29 276
cannam@29 277 <p><b>Bar finding method</b>: M. E. P. Davies and M. D. Plumbley. <i>A
cannam@29 278 spectral difference approach to extracting downbeats in musical
cannam@29 279 audio</i>. In Proceedings of 14th European Signal Processing Conference
cannam@29 280 (EUSIPCO), Italy, 2006.</p>
cannam@29 281
cannam@29 282 <p>The Bar and Beat Tracker Vamp plugin was written by Matthew Davies and Adam Stark.
cannam@29 283 </p>
cannam@29 284
cannam@29 285
cannam@29 286
cannam@29 287 <a name="qm-keydetector"></a><h2>4. Key Detector</h2>
cannam@16 288
cannam@16 289 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-keydetector</code>
cannam@16 290 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector</a>
cannam@16 291 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 292 </p>
cannam@16 293 <p>Key Detector analyses a single channel of audio and continuously
cannam@16 294 estimates the key of the music by comparing the degree to which a
cannam@16 295 block-by-block chromagram correlates to the stored key profiles for
cannam@16 296 each major and minor key.
cannam@16 297 </p>
cannam@16 298 <p>The key profiles are drawn from analysis of Book I of the
cannam@16 299 Well-Tempered Clavier by J. S. Bach, recorded in equal temperament at A = 440 Hz.
cannam@16 300 </p>
cannam@16 301 <h3>Parameters</h3>
cannam@16 302
cannam@16 303 <p><b>Tuning Frequency</b> &ndash; The frequency of concert A in the music under
cannam@16 304 analysis.
cannam@16 305 </p>
cannam@16 306 <p><b>Window Length</b> &ndash; The number of chroma analysis frames taken into
cannam@16 307 account for key estimation. This controls how eager the key detector
cannam@16 308 will be to return short-duration tonal changes as new key changes (the
cannam@16 309 shorter the window, the more likely it is to detect a new key change).
cannam@16 310 </p>
cannam@16 311 <h3>Outputs</h3>
cannam@16 312
cannam@16 313 <p><b>Tonic Pitch</b> &ndash; The tonic pitch of each estimated key change,
cannam@16 314 returned as a single-valued feature at the point where the key change
cannam@16 315 is detected, with value counted from 1 to 12 where C is 1, C# or Db is
cannam@16 316 2, and so on up to B which is 12.
cannam@16 317 </p>
cannam@16 318 <p><b>Key Mode</b> &ndash; The major or minor mode of the estimated key, where
cannam@16 319 major is 0 and minor is 1.
cannam@16 320 </p>
cannam@16 321 <p><b>Key</b> &ndash; The estimated key for each key change, returned as a
cannam@16 322 single-valued feature at the point where the key change is detected,
cannam@16 323 with value counted from 1 to 24 where 1-12 are the major keys and
cannam@16 324 13-24 are the minor keys, such that C major is 1, C# major is 2, and
cannam@16 325 so on up to B major which is 12; then C minor is 13, Db minor is 14,
cannam@16 326 and so on up to B minor which is 24.
cannam@16 327 </p>
cannam@16 328 <p><b>Key Strength Plot</b> &ndash; A grid representing the ongoing key
cannam@16 329 "probability" throughout the music. This is returned as a feature for
cannam@16 330 each chroma frame, containing 25 bins. Bins 1-12 are the major keys
cannam@16 331 from C upwards; bins 14-25 are the minor keys from C upwards. The
cannam@16 332 13th bin is unused: it just provides space between the first and
cannam@16 333 second halves of the feature if displayed in a single plot.
cannam@16 334 </p>
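<p>The numbering used by the Key and Key Strength Plot outputs can be
decoded as in the following sketch. The helper names and the
enharmonic spellings in <code>PITCH_NAMES</code> are assumptions made
for illustration; the text labels attached by the plugin itself may be
spelled differently.</p>

```python
# Hypothetical spellings; the plugin's own labels may differ.
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def decode_key(value):
    """Map a Key output value (1-24) to a (tonic, mode) pair."""
    if not 1 <= value <= 24:
        raise ValueError("key value must be in 1..24")
    tonic = PITCH_NAMES[(value - 1) % 12]
    mode = "major" if value <= 12 else "minor"
    return tonic, mode


def decode_strength_bin(bin_number):
    """Map a 1-based Key Strength Plot bin (1-25) to a key, or None for bin 13."""
    if bin_number == 13:
        return None  # unused spacer bin between major and minor keys
    if 1 <= bin_number <= 12:
        return decode_key(bin_number)
    if 14 <= bin_number <= 25:
        return decode_key(bin_number - 1)
    raise ValueError("bin number must be in 1..25")
```

<p>For example, <code>decode_key(13)</code> and
<code>decode_strength_bin(14)</code> both refer to C minor, because the
plot inserts one unused bin between the major and minor halves.</p>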
cannam@16 335 <p>The outputs are also labelled with pitch or key as text.
cannam@16 336 </p>
cannam@16 337 <h3>References and Credits</h3>
cannam@16 338
cannam@16 339 <p><b>Method</b>: see K. Noland and M. Sandler. <i><a href="http://www.aes.org/e-lib/browse.cfm?elib=14140">Signal Processing Parameters for Tonality Estimation</a></i>. In Proceedings of Audio Engineering Society
cannam@16 340 122nd Convention, Vienna, 2007.
cannam@16 341 </p>
cannam@16 342 <p>The Key Detector Vamp plugin was written by Katy Noland and Christian
cannam@16 343 Landone.
cannam@16 344 </p>
cannam@29 345
cannam@29 346 <a name="qm-tonalchange"></a><h2>5. Tonal Change</h2>
cannam@16 347
cannam@16 348 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-tonalchange</code>
cannam@16 349 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange</a>
cannam@16 350 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 351 </p>
cannam@16 352 <p>Tonal Change analyses a single channel of audio, detecting harmonic
cannam@16 353 changes such as chord boundaries.
cannam@16 354 </p>
cannam@16 355 <h3>Parameters</h3>
cannam@16 356
cannam@16 357 <p><b>Gaussian smoothing</b> &ndash; The window length for the internal smoothing
cannam@16 358 operation, in chroma analysis frames. This controls how eager the
cannam@16 359 tonal change detector will be to identify very short-term tonal
cannam@16 360 changes. The default value of 5 is quite short, and may lead to more
cannam@16 361 (not always meaningful) results being returned; for many purposes a
cannam@16 362 larger value, closer to the maximum of 20, may be appropriate.
cannam@16 363 </p>
cannam@16 364 <p><b>Chromagram minimum pitch</b> &ndash; The MIDI pitch value (0-127) of the
cannam@16 365 minimum pitch included in the internal chromagram analysis.
cannam@16 366 </p>
cannam@16 367 <p><b>Chromagram maximum pitch</b> &ndash; The MIDI pitch value (0-127) of the
cannam@16 368 maximum pitch included in the internal chromagram analysis.
cannam@16 369 </p>
cannam@16 370 <p><b>Chromagram tuning frequency</b> &ndash; The frequency of concert A in the
cannam@16 371 music under analysis.
cannam@16 372 </p>
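<p>To relate the pitch and tuning parameters to frequencies, the
sketch below converts a MIDI pitch number to a frequency in hertz. It
assumes the common convention that MIDI pitch 69 is concert A and that
the scale is equal-tempered; the function name is hypothetical and the
plugin's internal conversion is not documented here.</p>

```python
def midi_to_hz(pitch, tuning_a=440.0):
    """Frequency in Hz of a MIDI pitch, assuming pitch 69 is concert A
    and twelve-tone equal temperament."""
    return tuning_a * 2.0 ** ((pitch - 69) / 12.0)


concert_a = midi_to_hz(69)             # the tuning frequency itself
octave_up = midi_to_hz(81)             # twelve semitones above concert A
retuned = midi_to_hz(69, tuning_a=442.0)
```

<p>Under these assumptions, raising the tuning frequency shifts every
pitch boundary of the chromagram by the same ratio.</p>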
cannam@16 373 <h3>Outputs</h3>
cannam@16 374
cannam@16 375 <p><b>Transform to 6D Tonal Content Space</b> &ndash; A representation of the
cannam@16 376 musical content in a six-dimensional tonal space onto which the
cannam@16 377 algorithm maps 12-bin chroma vectors extracted from the audio.
cannam@16 378 </p>
cannam@16 379 <p><b>Tonal Change Detection Function</b> &ndash; A function representing the
cannam@16 380 estimated likelihood of a tonal change occurring in each spectral
cannam@16 381 frame.
cannam@16 382 </p>
cannam@16 383 <p><b>Tonal Change Positions</b> &ndash; The resulting estimated positions of tonal
cannam@16 384 changes.
cannam@16 385 </p>
cannam@16 386 <h3>References and Credits</h3>
cannam@16 387
cannam@16 388 <p><b>Method</b>: C. A. Harte, M. Gasser, and M. Sandler. <i><a href="http://portal.acm.org/citation.cfm?id=1178723.1178727">Detecting harmonic change in musical audio</a></i>. In Proceedings of the 1st ACM workshop on
cannam@16 389 Audio and Music Computing Multimedia, Santa Barbara, 2006.
cannam@16 390 </p>
cannam@29 391 <p>The Tonal Change Vamp plugin was written by Chris Harte and Martin
cannam@16 392 Gasser.
cannam@16 393 </p>
cannam@29 394
cannam@29 395
cannam@29 396 <a name="qm-adaptivespectrogram"></a><h2>6. Adaptive Spectrogram</h2>
cannam@29 397
cannam@29 398 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-adaptivespectrogram</code>
cannam@29 399 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram</a>
cannam@29 400 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@29 401 </p>
cannam@29 402
cannam@29 403 <p>Adaptive Spectrogram produces a composite spectrogram from several
cannam@29 404 series of short-time Fourier transforms at differing resolutions.
cannam@29 405 Values are selected from these spectrograms by repeated subdivision by
cannam@29 406 time and frequency in order to maximise an entropy function across
cannam@29 407 each column.</p>
cannam@29 408
cannam@29 409 <h3>Parameters</h3>
cannam@29 410
cannam@29 411 <p><b>Number of resolutions</b> &ndash; The number of distinct
cannam@29 412 resolutions to calculate and use. The resolutions will be consecutive
cannam@29 413 powers of two starting from the smallest resolution specified.</p>
cannam@29 414
cannam@29 415 <p><b>Smallest resolution</b> &ndash; The smallest of the set of
cannam@29 416 resolutions to use.</p>
cannam@29 417
cannam@29 418 <p><b>Omit alternate resolutions</b> &ndash; Causes the plugin to
cannam@29 419 ignore alternate resolutions (i.e. the smallest resolution multiplied
cannam@29 420 by 2, 8, 32, etc) when composing a spectrogram. The smallest
cannam@29 421 resolution specified, and its multiples by 4, 16, etc as applicable,
cannam@29 422 will be retained. The total number of resolutions actually included
cannam@29 423 in the resulting spectrogram will therefore be N/2 (for even N) or
cannam@29 424 (N+1)/2 (for odd N) where N is the value of the "number of
cannam@29 425 resolutions" parameter. This permits a wider range of resolutions to
cannam@29 426 be included with less processing, at obvious cost in quality.</p>
cannam@29 427
cannam@29 428 <p><b>Multi-threaded processing</b> &ndash; Enables multi-threading of
cannam@29 429 the spectrogram calculation. This usually results in somewhat faster
cannam@29 430 processing where multiple CPU cores are available.</p>
cannam@29 431
cannam@29 432 <p>As an example of the resolution parameters, if the "number of
cannam@29 433 resolutions" is set to 5, "smallest resolution" to 128, and "omit
cannam@29 434 alternate resolutions" is not used, the composite spectrogram will be
cannam@29 435 calculated using spectrograms from 128, 256, 512, 1024, and 2048 point
cannam@29 436 short-time Fourier transforms (with 50% overlap in each case). With
cannam@29 437 "omit alternate resolutions" set, the same parameters would result in
cannam@29 438 spectrograms from 128, 512, and 2048 point STFTs being used.</p>
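<p>The resolution parameters described above can be expressed as a
small calculation. The function and argument names are hypothetical;
the sketch only reproduces the arithmetic given in the text.</p>

```python
def resolutions(count, smallest, omit_alternate=False):
    """List the STFT sizes implied by the resolution parameters.

    count is the "number of resolutions", smallest the "smallest
    resolution", and omit_alternate the "omit alternate resolutions"
    option. Names are illustrative, not the plugin's identifiers.
    """
    sizes = [smallest * 2 ** i for i in range(count)]
    # Omitting alternates keeps the smallest size and its multiples by 4, 16, ...
    return sizes[::2] if omit_alternate else sizes


full = resolutions(5, 128)
reduced = resolutions(5, 128, omit_alternate=True)
```

<p>With these inputs the two lists match the example above: five
resolutions from 128 to 2048 points, or three when alternate
resolutions are omitted.</p>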
cannam@29 439
cannam@29 440 <h3>References and Credits</h3>
cannam@29 441
cannam@29 442 <p><b>Method</b>: X. Wen and M. Sandler. <i><a href="http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=ISPECX000003000001000051000001">Composite spectrogram using multiple Fourier transforms</a></i>. IET Signal Processing, 3(1):51-63, 2009.
cannam@29 443 </p>
cannam@29 444
cannam@29 445 <p>The Adaptive Spectrogram Vamp plugin was written by Wen Xue and Chris Cannam.</p>
cannam@29 446
cannam@29 447 <a name="qm-transcription"></a><h2>7. Polyphonic Transcription</h2>
cannam@29 448
cannam@29 449 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-transcription</code>
cannam@29 450 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription</a>
cannam@29 451 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a></p>
cannam@29 452
cannam@29 453 <p>The Polyphonic Transcription plugin estimates a note transcription
cannam@29 454 using MIDI pitch values from its input audio, returning a feature for
cannam@29 455 each note (with timestamp and duration) whose value is the MIDI pitch
cannam@29 456 number. Velocity is not estimated.</p>
cannam@29 457
cannam@29 458 <p>Although the published papers describe the method as
cannam@29 459 real-time, the implementation used in this plugin is non-causal: it
cannam@29 460 buffers its entire input and does all of the real work once that
cannam@29 461 input has been received. It is also very memory
cannam@29 462 intensive. However, it is relatively fast (faster than real-time)
cannam@29 463 compared to other polyphonic transcription methods.</p>
cannam@29 464
cannam@29 465 <p>The plugin works best at 44.1 kHz input sample rate, and is tuned for
cannam@29 466 piano and guitar music.</p>
cannam@29 467
cannam@29 468
cannam@29 469 <h3>References and Credits</h3>
cannam@29 470
cannam@29 471 <p><b>Method</b>: R. Zhou and J. D. Reiss. <i>A Real-Time Polyphonic Music Transcription System</i>. In Proceedings of the Fourth Music Information Retrieval Evaluation eXchange (MIREX), Philadelphia, USA, 2008;<br>R. Zhou and J. D. Reiss. <i>A Real-Time Frame-Based Multiple Pitch Estimation Method Using the Resonator Time Frequency Image</i>. Third Music Information Retrieval Evaluation eXchange (MIREX), Vienna, Austria, 2007.</p>
cannam@29 472
cannam@29 473 <p>The Polyphonic Transcription Vamp plugin was written by Ruohua Zhou.</p>
cannam@29 474
cannam@29 475
cannam@29 476 <a name="qm-segmenter"></a><h2>8. Segmenter</h2>
cannam@16 477
cannam@16 478 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-segmenter</code>
cannam@16 479 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter</a>
cannam@16 480 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 481 </p>
cannam@16 482 <p>Segmenter divides a single channel of music up into structurally
cannam@16 483 consistent segments. It returns a numeric value (the segment type)
cannam@16 484 for each moment at which a new segment starts.
cannam@16 485 </p>
cannam@16 486 <p>For music with clearly tonally distinguishable sections such as verse,
cannam@16 487 chorus, etc., segments with the same type may be expected to be
cannam@16 488 similar to one another in some structural sense. For example,
cannam@16 489 repetitions of the chorus are likely to share a segment type.
cannam@16 490 </p>
cannam@16 491 <p>The plugin only attempts to identify similar segments; it does not
cannam@16 492 attempt to label them. For example, it makes no attempt to tell you
cannam@16 493 which segment is the chorus.
cannam@16 494 </p>
cannam@16 495 <p>Note that this plugin does a substantial amount of processing after
cannam@16 496 receiving all of the input audio data, before it produces any results.
cannam@16 497 </p>
cannam@16 498 <h3>Method</h3>
cannam@16 499
cannam@16 500 <p>The method relies upon structural/timbral similarity to obtain the
cannam@16 501 high-level song structure. This is based on the assumption that the
cannam@16 502 distributions of timbre features are similar over corresponding
cannam@16 503 structural elements of the music.
cannam@16 504 </p>
cannam@16 505 <p>The algorithm works by obtaining a frequency-domain representation of
cannam@16 506 the audio signal using a Constant-Q transform, a Chromagram or
cannam@16 507 Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the
cannam@16 508 particular feature is selectable as a parameter). The extracted
cannam@16 509 features are normalised in accordance with the MPEG-7 standard (NASE
cannam@16 510 descriptor), which means the spectrum is converted to decibel scale
cannam@16 511 and each spectral vector is normalised by the RMS energy envelope.
cannam@16 512 The value of this envelope is stored for each processing block of
cannam@16 513 audio. This is followed by the extraction of 20 principal components
cannam@16 514 per block using PCA, yielding a sequence of 21 dimensional feature
cannam@16 515 vectors where the last element in each vector corresponds to the
cannam@16 516 energy envelope.
cannam@16 517 </p>
cannam@16 518 <p>A 40-state Hidden Markov Model is then trained on the whole sequence
cannam@16 519 of features, with each state of the HMM corresponding to a specific
cannam@16 520 timbre type. This process partitions the timbre-space of a given track
cannam@16 521 into 40 possible types. The important assumption of the model is that
cannam@16 522 the distribution of these features remains consistent over a structural
cannam@16 523 segment. After training and decoding the HMM, the song is assigned a
cannam@16 524 sequence of timbre-features according to specific timbre-type
cannam@16 525 distributions for each possible structural segment.
cannam@16 526 </p>
cannam@16 527 <p>The segmentation itself is computed by clustering timbre-type
cannam@16 528 histograms. A series of histograms is created over a sliding window;
cannam@16 529 these are grouped into M clusters by an adapted soft k-means
cannam@16 530 algorithm. Each of these clusters will correspond to a specific
cannam@16 531 segment-type of the analyzed song. Reference histograms, iteratively
cannam@16 532 updated during clustering, describe the timbre distribution for each
cannam@16 533 segment. The segmentation arises from the final cluster assignments.
cannam@16 534 </p>
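<p>The histogram step described above can be sketched roughly as follows.
This is an illustration only, not the plugin's code: the function name,
the window length and the one-frame hop are assumptions, and the
constrained soft k-means clustering that follows is not shown.</p>

```python
def sliding_histograms(states, n_states, window):
    """Count how often each timbre type occurs in each sliding window.

    states   -- decoded HMM state (timbre type) per frame, values 0..n_states-1
    n_states -- number of timbre types (40 in the description above)
    window   -- window length in frames (hypothetical value, chosen by caller)
    """
    histograms = []
    # Slide the window one frame at a time (the hop size is an assumption).
    for start in range(len(states) - window + 1):
        counts = [0] * n_states
        for state in states[start:start + window]:
            counts[state] += 1
        histograms.append(counts)
    return histograms


# Toy example with 2 timbre types and a window of 2 frames.
example = sliding_histograms([0, 0, 1, 1], n_states=2, window=2)
```

<p>Each histogram would then be treated as one point to be clustered;
frames whose histograms fall in the same cluster share a segment type.</p>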
cannam@16 535 <h3>Parameters</h3>
cannam@16 536
cannam@16 537 <p><b>Number of segment-types</b> &ndash; The maximum number of clusters
cannam@16 538 (segment-types) to be returned. The default is 10. Unlike many
cannam@16 539 clustering algorithms, the constrained clustering used in this plugin
cannam@16 540 does not produce too many clusters or vary significantly even if this
cannam@16 541 is set too high. However, this parameter can be useful for limiting
cannam@16 542 the number of expected segment-types.
cannam@16 543 </p>
cannam@16 544 <p><b>Feature Type</b> &ndash; The type of spectral feature used for segmentation. The available features are:<ul><li>"Hybrid", the default, which uses a Constant-Q transform (see <a href="#qm-constantq">related
cannam@16 545 plugin</a>): this is generally effective for modern studio recordings;</li><li> "Chromatic", using a chromagram derived from the Constant-Q feature (see <a href="#qm-chromagram">related plugin</a>): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in
cannam@16 546 sound;</li><li>"Timbral", using Mel-Frequency
cannam@16 547 Cepstral Coefficients (see <a href="#qm-mfcc">related plugin</a>), which is more likely to
cannam@16 548 result in classification by instrumentation rather than musical
cannam@16 549 content.</li></ul>
cannam@16 550 </p>
cannam@16 551 <p><b>Minimum segment duration</b> &ndash; The approximate expected minimum
cannam@16 552 duration for a segment, from 1 to 15 seconds. Changing this parameter
cannam@16 553 may help the plugin to find musical sections rather than just
cannam@16 554 following changes in the sound of the music, and also avoid wasting a
cannam@16 555 segment-type cluster for timbrally distinct but too-short segments.
cannam@16 556 The default of 4 seconds usually produces good results.
cannam@16 557 </p>
cannam@16 558 <h3>Outputs</h3>
cannam@16 559
cannam@16 560 <p><b>Segmentation</b> &ndash; The estimated segment boundaries, returned as a
cannam@16 561 single feature with one value at each segment boundary, with the value
cannam@16 562 representing the segment type number for the segment starting at that
cannam@16 563 boundary.
cannam@16 564 </p>
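<p>As a sketch of how a host or script might consume this output, the
following turns a list of boundary features into explicit segments. The
(time, type) tuple layout and the <code>end_time</code> argument are
assumptions made for the example; they are not part of the plugin's
interface.</p>

```python
def to_segments(boundaries, end_time):
    """Convert (start_time, segment_type) boundaries into
    (start_time, end_time, segment_type) tuples.

    boundaries -- boundary features sorted by time (hypothetical layout)
    end_time   -- duration of the audio, used to close the last segment
    """
    segments = []
    for i, (start, seg_type) in enumerate(boundaries):
        # A segment ends where the next one starts, or at the end of the audio.
        if i + 1 < len(boundaries):
            end = boundaries[i + 1][0]
        else:
            end = end_time
        segments.append((start, end, seg_type))
    return segments


# Toy example: types 1, 2, 1 could correspond to verse, chorus, verse,
# although the plugin itself does not name the types.
segments = to_segments([(0.0, 1), (31.5, 2), (58.0, 1)], end_time=90.0)
```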
cannam@16 565 <h3>References and Credits</h3>
cannam@16 566
cannam@16 567 <p><b>Method</b>: M. Levy and M. Sandler. <i><a href="http://ieeexplore.ieee.org/iel5/10376/4432632/04432648.pdf?arnumber=4432648">Structural segmentation of musical audio by constrained clustering</a></i>. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.
cannam@16 568 </p>
cannam@16 569 <p>Note that this plugin does not implement the beat-synchronous aspect
cannam@16 570 of the segmentation method described in the paper.
cannam@16 571 </p>
cannam@16 572 <p>The Segmenter Vamp plugin was written by Mark Levy. Thanks to George
cannam@16 573 Fazekas for providing much of this documentation.
cannam@16 574 </p>
cannam@29 575 <a name="qm-similarity"></a><h2>9. Similarity</h2>
cannam@16 576
cannam@16 577 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-similarity</code>
cannam@16 578 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity</a>
cannam@16 579 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 580 </p>
cannam@16 581 <p>Similarity treats each channel of its audio input as a separate
cannam@16 582 "track", and estimates how similar the tracks are to one another using
cannam@16 583 a selectable similarity measure.
cannam@16 584 </p>
cannam@16 585 <p>The plugin also returns the intermediate data used as a basis of the
cannam@16 586 similarity measure; it can therefore be used on a single channel of
cannam@16 587 input (with the resulting intermediate data then being applied in some
cannam@16 588 other similarity or clustering algorithm, for example) if desired, as
cannam@16 589 well as with multiple inputs.
cannam@16 590 </p>
cannam@16 591 <p>Because of the way this plugin handles multiple inputs, by assuming
cannam@16 592 that each channel represents a separate piece of music, it may not be
cannam@16 593 appropriate for use directly in a general-purpose host (unless you
cannam@16 594 actually want to do something like compare two stereo channels for
cannam@16 595 timbral similarity, which is unlikely).
cannam@16 596 </p>
cannam@16 597 <h3>Parameters</h3>
cannam@16 598
cannam@16 599 <p><b>Feature Type</b> &ndash; The underlying audio feature used for the similarity
cannam@16 600 measure. The available features are:
cannam@16 601 <ul><li>"Timbre", in which the distance
cannam@16 602 between tracks is a symmetrised Kullback-Leibler divergence between
cannam@16 603 Gaussian-modelled MFCC means and variances across each track, for the
cannam@16 604 first 20 MFCCs including C0 (see <a href="#qm-mfcc">related plugin</a>);</li><li>"Chroma", which uses Kullback-Leibler divergence of
cannam@16 605 the mean chroma histogram (see <a href="#qm-chromagram">related plugin</a>);</li><li>"Rhythm", using the cosine distance between
cannam@16 606 "beat spectrum" measures derived from a short sampled section of the
cannam@16 607 track;</li><li>and combined "Timbre and Rhythm" and "Chroma and Rhythm"
cannam@16 608 features.</li></ul>
cannam@16 609 </p>
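<p>For reference, a symmetrised Kullback-Leibler divergence between two
Gaussians described by per-coefficient means and variances can be
written as below. This is a minimal sketch assuming independent
(diagonal-covariance) coefficients and the plain sum of the two
directed divergences; the plugin's own scaling or normalisation may
differ.</p>

```python
def symmetrised_kl(mean_p, var_p, mean_q, var_q):
    """KL(p||q) + KL(q||p) for two diagonal-covariance Gaussians.

    Each argument is a sequence with one entry per coefficient
    (for example, 20 entries for 20 MFCCs). Variances must be positive.
    The log-variance terms of the two directed divergences cancel.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        diff_sq = (mp - mq) ** 2
        total += vp / vq + vq / vp + diff_sq * (1.0 / vp + 1.0 / vq) - 2.0
    return 0.5 * total


# Identical models are at distance zero; the measure is symmetric.
d_same = symmetrised_kl([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0])
d_unit = symmetrised_kl([0.0], [1.0], [1.0], [1.0])
```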
cannam@16 610 <h3>Outputs</h3>
cannam@16 611
cannam@16 612 <p><b>Distance Matrix</b> &ndash; A matrix of the distance measures between input
cannam@16 613 channels, returned as a series of vector features timestamped at
cannam@16 614 one-second intervals. The distance from channel i to channel j
cannam@16 615 appears as the j'th bin of the feature at time i.
cannam@16 616 </p>
cannam@16 617 <p><b>Distance from First Channel</b> &ndash; A single vector feature, timestamped
cannam@16 618 at time zero, containing the distances between the first input channel
cannam@16 619 and each of the input channels (including the first channel itself at
cannam@16 620 bin 0, which should have zero distance).
cannam@16 621 </p>
cannam@16 622 <p><b>Ordered Distances from First Channel</b> &ndash; A pair of vector features,
cannam@16 623 at times 0 and 1 second. The feature at time 0 contains the 1-based
cannam@16 624 indices of the input channels in the order of similarity to the first
cannam@16 625 input channel (so its first bin should always contain 1, as the first
cannam@16 626 channel is most similar to itself). The feature at time 1 contains,
cannam@16 627 in bin n, the distance between the first input channel and the channel
cannam@16 628 with index found at bin n of the feature at time 0.
cannam@16 629 </p>
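<p>The relationship between the "Distance from First Channel" output and
the pair of features in this output can be illustrated as follows. The
function is hypothetical and only mirrors the layout described above;
how the plugin breaks ties between equal distances is not specified
here.</p>

```python
def ordered_from_first(distances):
    """Return (indices, sorted_distances) as laid out in the two features.

    distances -- distance from the first channel to every channel,
                 with the first channel itself at position 0
    indices   -- 1-based channel indices, most similar first (time 0)
    sorted_distances -- the matching distances, bin for bin (time 1)
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    indices = [i + 1 for i in order]
    sorted_distances = [distances[i] for i in order]
    return indices, sorted_distances


# Three channels: channel 3 is closer to channel 1 than channel 2 is.
indices, dists = ordered_from_first([0.0, 2.5, 1.0])
```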
cannam@16 630 <p><b>Feature Means</b> &ndash; A series of vector features containing the mean
cannam@16 631 values of each of the feature bins across the duration of each of the
cannam@16 632 input channels. This output returns one feature for each input
cannam@16 633 channel, timestamped at one-second intervals. The number of bins for
cannam@16 634 each feature depends on the feature type; it will be 20 for MFCC
cannam@16 635 features and 12 for chroma features. No features will be returned on
cannam@16 636 this output if the feature type is purely rhythmic.
cannam@16 637 </p>
cannam@16 638 <p><b>Feature Variances</b> &ndash; As for Feature Means, but containing the variances rather than the means.
cannam@16 639 </p>
cannam@16 640 <p><b>Beat Spectra</b> &ndash; A series of vector features containing the rhythmic
cannam@16 641 autocorrelation profiles (beat spectra) for each of the input
cannam@16 642 channels. This output returns one 512-bin feature for each input
cannam@16 643 channel, timestamped at one-second intervals. No features will be
cannam@16 644 returned on this output if the feature type contains no rhythm
cannam@16 645 component.
cannam@16 646 </p>
cannam@16 647 <h3>References and Credits</h3>
cannam@16 648
cannam@16 649 <p><b>Timbral similarity</b>: M. Levy and M. Sandler. <i><a href="http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf">Lightweight measures for timbral similarity of musical audio</a></i>. In Proceedings of the 1st
cannam@16 650 ACM workshop on Audio and Music Computing Multimedia, Santa Barbara,
cannam@16 651 2006.
cannam@16 652 </p>
cannam@16 653 <p><b>Combined rhythmic and timbral similarity</b>: K. Jacobson. <i><a href="http://ismir2006.ismir.net/PAPERS/ISMIR0696_Paper.pdf">A Multifaceted Approach to Music Similarity</a></i>. In Proceedings of the
cannam@16 654 Seventh International Conference on Music Information Retrieval
cannam@16 655 (ISMIR), 2006.
cannam@16 656 </p>
cannam@16 657 <p>The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and
cannam@16 658 Chris Cannam.
cannam@16 659 </p>
cannam@29 660
cannam@29 661
cannam@29 662 <a name="qm-dwt"></a><h2>10. Discrete Wavelet Transform</h2>
cannam@29 663
cannam@29 664 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-dwt</code>
cannam@29 665 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt</a>
cannam@29 666 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@29 667
cannam@29 668 <p>The Discrete Wavelet Transform plugin performs the forward DWT on the
cannam@29 669 signal. The wavelet coefficients are derived from a fast segmented DWT
cannam@29 670 algorithm without block end effects. The DWT can be performed with
cannam@29 671 various functions from a selection of wavelets up to the 16th scale.</p>
cannam@29 672
cannam@29 673 <p>The wavelet coefficients are returned as feature columns at a rate of
cannam@29 674 half the sample rate of the signal to be analysed. To simulate
cannam@29 675 multiresolution in the layer data table, the coefficient values at
cannam@29 676 higher scales are copied multiple times according to the number of the
cannam@29 677 scale. For example, at scale 2 each value will appear twice, at scale
cannam@29 678 3 each value will appear four times, and at scale 4 each value will
cannam@29 679 appear eight times, in order to simulate the lower resolution at
cannam@29 680 higher scales.</p>
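<p>The repetition rule above amounts to copying each coefficient at scale
n a total of 2<sup>n-1</sup> times. A minimal sketch, with
hypothetical function and variable names and made-up coefficient
values:</p>

```python
def expand_scale(coefficients, scale):
    """Repeat each coefficient 2**(scale - 1) times so that every scale
    ends up with the same number of columns as scale 1."""
    repeats = 2 ** (scale - 1)
    return [c for c in coefficients for _ in range(repeats)]


# Made-up coefficients: each scale has half as many values as the previous one.
scale1 = [1.0, 2.0, 3.0, 4.0]
scale2 = [10.0, 20.0]
scale3 = [100.0]

# One tuple per output column, holding one value per scale.
columns = list(zip(expand_scale(scale1, 1),
                   expand_scale(scale2, 2),
                   expand_scale(scale3, 3)))
```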
cannam@29 681
cannam@29 682 <h3>Parameters</h3>
cannam@29 683
cannam@29 684 <p><b>Scales</b> &ndash; Adjusts the number of scales of the DWT. The
cannam@29 685 processing block size needs to be set to at least 2<sup>n</sup>, where n =
cannam@29 686 number of scales.</p>
cannam@29 687
cannam@29 688 <p><b>Wavelet</b> &ndash; Selects the wavelet function to be used for
cannam@29 689 the transform. Wavelets from the following families are available:
cannam@29 690 Daubechies, Symlets, Coiflets, Biorthogonal, Meyer.</p>
cannam@29 691
cannam@29 692 <h3>References and Credits</h3>
cannam@29 693
cannam@29 694 <p><b>Principles</b>: S. Mallat. <i>A theory for multiresolution signal decomposition: the wavelet representation</i>. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989), pp. 674-693;<br>
cannam@29 695 P. Rajmic and J. Vlach. <i>Real-Time Audio Processing via Segmented Wavelet Transform</i>. In Proceedings of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.</p>
cannam@29 696
cannam@29 697 <p>The Discrete Wavelet Transform plugin was written by Thomas Wilmering.</p>
cannam@29 698
cannam@29 699 <a name="qm-constantq"></a><h2>11. Constant-Q Spectrogram</h2>
cannam@16 700
cannam@16 701 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-constantq</code>
cannam@16 702 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq</a>
cannam@16 703 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 704 </p>
cannam@16 705 <p>Constant-Q Spectrogram calculates a spectrogram based on a short-time
cannam@16 706 windowed constant Q spectral transform. This is a spectrogram in
cannam@16 707 which the ratio of centre frequency to resolution is constant for each
cannam@16 708 frequency bin. The frequency bins correspond to the frequencies of
cannam@16 709 "musical notes" rather than being linearly spaced in frequency as they
cannam@16 710 are for the conventional DFT spectrogram.
cannam@16 711 </p>
cannam@16 712 <p>The pitch range and the number of frequency bins per octave may be
cannam@16 713 adjusted using the plugin's parameters. Note that the plugin's
cannam@16 714 preferred step and block sizes are defined by these parameters, and
cannam@16 715 the plugin will not accept any other block size than its preferred
cannam@16 716 value.
cannam@16 717 </p>
cannam@16 718 <h3>Parameters</h3>
cannam@16 719
cannam@16 720 <p><b>Minimum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the lowest
cannam@16 721 frequency to be included in the constant-Q transform.
cannam@16 722 </p>
cannam@16 723 <p><b>Maximum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the
cannam@16 724 highest frequency to be included in the constant-Q transform.
cannam@16 725 </p>
cannam@16 726 <p><b>Tuning Frequency</b> &ndash; The frequency of concert A in the
cannam@16 727 music under analysis.
cannam@16 728 </p>
cannam@16 729 <p><b>Bins per Octave</b> &ndash; The number of constant-Q transform bins to be
cannam@16 730 computed per octave.
cannam@16 731 </p>
cannam@16 732 <p><b>Normalized</b> &ndash; Whether to normalize each output column to unit
cannam@16 733 maximum.
cannam@16 734 </p>
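<p>To see how these parameters interact, the sketch below converts MIDI
pitch values to frequencies using the usual convention that MIDI pitch
69 is concert A, and lists geometrically spaced bin centre frequencies
between the minimum and maximum pitch. The function names, the 440 Hz
default and the exact bin layout are assumptions for illustration; the
plugin's internal layout may differ.</p>

```python
def midi_to_hz(pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, taking MIDI pitch 69 as concert A."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)


def bin_centres(min_pitch, max_pitch, bins_per_octave, tuning_hz=440.0):
    """Centre frequencies spaced at a constant ratio from min to max pitch."""
    f_min = midi_to_hz(min_pitch, tuning_hz)
    # Number of bins spanning the pitch range, including both ends.
    n_bins = (max_pitch - min_pitch) * bins_per_octave // 12 + 1
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Four octaves (MIDI 36 to 84) at 12 bins per octave.
centres = bin_centres(36, 84, 12)
```

<p>Because adjacent centres differ by a constant ratio, the ratio of
centre frequency to bandwidth stays the same for every bin, which is
what "constant Q" refers to.</p>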
cannam@16 735 <h3>Outputs</h3>
cannam@16 736
cannam@16 737 <p><b>Constant-Q Spectrogram</b> &ndash; The calculated spectrogram, as a single
cannam@16 738 feature per process block containing one bin for each pitch included
cannam@16 739 in the spectrogram's range.
cannam@16 740 </p>
cannam@16 741 <h3>References and Credits</h3>
cannam@16 742
cannam@16 743 <p><b>Principle</b>: J. Brown. <i><a href="http://www.wellesley.edu/Physics/brown/pubs/cq1stPaper.pdf">Calculation of a constant Q spectral transform</a></i>. Journal of the Acoustical Society of America, 89(1):
cannam@16 744 425-434, 1991.
cannam@16 745 </p>
cannam@16 746 <p>The Constant-Q Spectrogram Vamp plugin was written by Christian
cannam@16 747 Landone.
cannam@16 748 </p>
cannam@29 749 <a name="qm-chromagram"></a><h2>12. Chromagram</h2>
cannam@16 750
cannam@16 751 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-chromagram</code>
cannam@16 752 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram</a>
cannam@16 753 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 754 </p>
cannam@16 755 <p>Chromagram calculates a constant Q spectral transform (as in the
cannam@16 756 Constant-Q Spectrogram plugin) and then wraps the frequency bin values
cannam@16 757 into a single octave, with each bin containing the sum of the
cannam@16 758 magnitudes from the corresponding bin in all octaves. The number of
cannam@16 759 values in each feature vector returned by the plugin is therefore the
cannam@16 760 same as the number of bins per octave configured for the underlying
cannam@16 761 constant Q transform.
cannam@16 762 </p>
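<p>The wrapping step can be sketched as follows, assuming the constant-Q
bins are ordered from low to high frequency and that bin 0 maps to
chroma bin 0. Which pitch class that first bin represents depends on
the minimum pitch, and the function name is hypothetical.</p>

```python
def fold_to_chroma(cq_magnitudes, bins_per_octave):
    """Sum constant-Q magnitudes from all octaves into a single octave."""
    chroma = [0.0] * bins_per_octave
    for k, magnitude in enumerate(cq_magnitudes):
        # Bins one octave apart share the same position within the octave.
        chroma[k % bins_per_octave] += magnitude
    return chroma


# Two octaves at 3 bins per octave, folded into 3 chroma bins.
chroma = fold_to_chroma([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], bins_per_octave=3)
```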
cannam@16 763 <p>The pitch range and the number of frequency bins per octave for the
cannam@16 764 transform may be adjusted using the plugin's parameters. Note that
cannam@16 765 the plugin's preferred step and block sizes depend on these
cannam@16 766 parameters, and the plugin will not accept any other block size than
cannam@16 767 its preferred value.
cannam@16 768 </p>
cannam@16 769 <h3>Parameters</h3>
cannam@16 770
cannam@16 771 <p><b>Minimum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the
cannam@16 772 lowest frequency to be included in the constant-Q transform used in
cannam@16 773 calculating the chromagram.
cannam@16 774 </p>
cannam@16 775 <p><b>Maximum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the
cannam@16 776 highest frequency to be included in the constant-Q transform used in
cannam@16 777 calculating the chromagram.
cannam@16 778 </p>
cannam@16 779 <p><b>Tuning Frequency</b> &ndash; The frequency of concert A in the
cannam@16 780 music under analysis.
cannam@16 781 </p>
cannam@16 782 <p><b>Bins per Octave</b> &ndash; The number of constant-Q transform bins to be
cannam@16 783 computed per octave, and thus the total number of bins present in the
cannam@16 784 resulting chromagram.
cannam@16 785 </p>
cannam@16 786 <p><b>Normalized</b> &ndash; Whether to normalize each output column. Normalization
cannam@16 787 may be to unit sum or unit maximum.
cannam@16 788 </p>
cannam@16 789 <h3>Outputs</h3>
cannam@16 790
cannam@16 791 <p><b>Chromagram</b> &ndash; The calculated chromagram, as a single feature per
cannam@16 792 process block containing the number of bins given in the bins per
cannam@16 793 octave parameter.
cannam@16 794 </p>
cannam@16 795 <h3>References and Credits</h3>
cannam@16 796
cannam@16 797 <p>The Chromagram Vamp plugin was written by Christian Landone.
cannam@16 798 </p>
cannam@29 799 <a name="qm-mfcc"></a><h2>13. Mel-Frequency Cepstral Coefficients</h2>
cannam@16 800
cannam@16 801 <p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-mfcc</code>
cannam@16 802 <br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc</a>
cannam@16 803 <br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16 804 </p>
cannam@16 805 <p>Mel-Frequency Cepstral Coefficients calculates MFCCs from a single
cannam@16 806 channel of audio. These coefficients, derived from a cosine transform
cannam@16 807 of the mapping of an audio spectrum onto a frequency scale modelled on
cannam@16 808 human auditory response, are widely used in speech recognition, music
cannam@16 809 classification and other tasks.
cannam@16 810 </p>
cannam@16 811 <h3>Parameters</h3>
cannam@16 812
cannam@16 813 <p><b>Number of Coefficients</b> &ndash; The number of MFCCs to return. Commonly
cannam@16 814 used values include 13 or the default 20. This number includes C0 if
cannam@16 815 requested (see Include C0 below).
cannam@16 816 </p>
cannam@16 817 <p><b>Power for Mel Amplitude Logs</b> &ndash; An optional power value to which the
cannam@16 818 spectral amplitudes should be raised before applying the cosine
cannam@16 819 transform. Values greater than 1 may in principle reduce the
cannam@16 820 contribution of noise to the results. The default is 1.
cannam@16 821 </p>
cannam@16 822 <p><b>Include C0</b> &ndash; Whether to include the "zero'th" coefficient, which
cannam@16 823 simply reflects the overall signal power across the Mel frequency
cannam@16 824 bands.
cannam@16 825 </p>
cannam@16 826 <h3>Outputs</h3>
cannam@16 827
cannam@16 828 <p><b>Coefficients</b> &ndash; The MFCC values, returned as one vector feature per
cannam@16 829 processing block.
cannam@16 830 </p>
cannam@16 831 <p><b>Means of Coefficients</b> &ndash; The overall means of the MFCC bins, as a
cannam@16 832 single vector feature with time 0 that is returned when processing is
cannam@16 833 complete.
cannam@16 834 </p>
cannam@16 835 <h3>References and Credits</h3>
cannam@16 836
cannam@16 837 <p><b>MFCCs in music</b>: See B. Logan. <i><a href="http://ismir2000.ismir.net/papers/logan_paper.pdf">Mel-Frequency Cepstral Coefficients for Music Modeling</a></i>. In Proceedings of the First International
cannam@16 838 Symposium on Music Information Retrieval (ISMIR), 2000.
cannam@16 839 </p>
cannam@16 840 <p>The Mel-Frequency Cepstral Coefficients Vamp plugin was written by
cannam@16 841 Nicolas Chetry and Chris Cannam.
cannam@16 842 </p>
cannam@16 843 <p></p>
cannam@16 844 </CONTENTS>
cannam@16 845 </body>
cannam@16 846 </html>