vamp-website: plugin-doc/qm-vamp-plugins.html annotate

annotate plugin-doc/qm-vamp-plugins.html @ 22:d83d60afe81e website

* Update downloads

author	cannam
date	Tue, 09 Dec 2008 11:44:59 +0000
parents	16f8de0dc974
children	90a1fa18d239

rev	line source
cannam@16	1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
cannam@16	2 <html>
cannam@16	3 <head>
cannam@16	4 <link rel="stylesheet" media="screen" type="text/css" href="/screen.css"/>
cannam@16	5 <link rel="icon" type="image/png" href="/images/waveform.png"/>
cannam@16	6 <link rel="shortcut icon" type="image/png" href="/images/waveform.png"/>
cannam@16	7 <title>QM Vamp Plugins: User Documentation</title>
cannam@16	8 <meta name="robots" content="index"/>
cannam@16	9 </head>
cannam@16	10 <body>
cannam@16	11 <h1 id="header"><span>Vamp Plugins</span></h1>
cannam@16	12
cannam@16	13 <h2>QM Vamp Plugins</h2>
cannam@16	14
cannam@16	15 <p>The QM Vamp Plugin set is a library of Vamp audio feature
cannam@16	16 extraction plugins developed at the <a
cannam@16	17 href="http://www.elec.qmul.ac.uk/digitalmusic/">Centre for Digital
cannam@16	18 Music</a> at Queen Mary, University of London. These plugins are
cannam@16	19 provided as a single library file, made available in binary form for
cannam@16	20 Windows, OS X, and Linux from the Centre for Digital Music's <a
cannam@16	21 href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">download
cannam@16	22 page</a>.
cannam@16	23 </p>
cannam@16	24 <p>For more information about Vamp plugins, see <a href="http://www.vamp-plugins.org/">http://www.vamp-plugins.org/</a>.
cannam@16	25 </p>
cannam@16	26
cannam@16	27 <div class="toc2">1.  <a href="#qm-onsetdetector">Note Onset Detector</a></div>
cannam@16	28 <div class="toc2">2.  <a href="#qm-tempotracker">Tempo and Beat Tracker</a></div>
cannam@16	29 <div class="toc2">3.  <a href="#qm-keydetector">Key Detector</a></div>
cannam@16	30 <div class="toc2">4.  <a href="#qm-tonalchange">Tonal Change</a></div>
cannam@16	31 <div class="toc2">5.  <a href="#qm-segmenter">Segmenter</a></div>
cannam@16	32 <div class="toc2">6.  <a href="#qm-similarity">Similarity</a></div>
cannam@16	33 <div class="toc2">7.  <a href="#qm-constantq">Constant-Q Spectrogram</a></div>
cannam@16	34 <div class="toc2">8.  <a href="#qm-chromagram">Chromagram</a></div>
cannam@16	35 <div class="toc2">9.  <a href="#qm-mfcc">Mel-Frequency Cepstral Coefficients</a></div>
cannam@16	36
cannam@16	37 <a name="qm-onsetdetector"></a><h2>1. Note Onset Detector</h2>
cannam@16	38
cannam@16	39 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-onsetdetector</code>
cannam@16	40 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector</a>
cannam@16	41 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	42 </p>
cannam@16	43 <p>Note Onset Detector analyses a single channel of audio and estimates
cannam@16	44 the onset times of notes within the music – that is, the times at
cannam@16	45 which notes and other audible events begin.
cannam@16	46 </p>
cannam@16	47 <p>It calculates an onset likelihood function for each spectral frame,
cannam@16	48 and picks peaks in a smoothed version of this function. The plugin is
cannam@16	49 non-causal, returning all results at the end of processing.
cannam@16	50 </p>
cannam@16	51 <h3>Parameters</h3>
cannam@16	52
cannam@16	53 <p><b>Onset Detection Function Type</b> – The method used to calculate the
cannam@16	54 onset likelihood function. The most versatile method is the default,
cannam@16	55 "Complex Domain" (see reference, Duxbury et al 2003). "Spectral
cannam@16	56 Difference" may be appropriate for percussive recordings, "Phase
cannam@16	57 Deviation" for non-percussive music, and "Broadband Energy Rise" (see
cannam@16	58 reference, Barry et al 2005) for identifying percussive onsets in
cannam@16	59 mixed music.
cannam@16	60 </p>
cannam@16	61 <p><b>Onset Detector Sensitivity</b> – Sensitivity level for peak detection
cannam@16	62 in the onset likelihood function. The higher the sensitivity, the
cannam@16	63 more onsets will (rightly or wrongly) be detected. The peak picker
cannam@16	64 does not have a simple threshold level; instead, this parameter
cannam@16	65 controls the required "steepness" of the slopes in the smoothed
cannam@16	66 detection function either side of a peak value, in order for that peak
cannam@16	67 to be accepted as an onset.
cannam@16	68 </p>
cannam@16	69 <p><b>Adaptive Whitening</b> – This option evens out the temporal and
cannam@16	70 frequency variation in the signal, which can yield improved
cannam@16	71 performance in onset detection, for example in audio with big
cannam@16	72 variations in dynamics.
cannam@16	73 </p>
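<p>As a rough illustration of the "steepness" idea behind the sensitivity parameter, the following Python sketch accepts a peak only if the smoothed function rises and falls by a minimum amount on either side. The function name, the 0 to 100 sensitivity range and its linear mapping to a required slope are assumptions made for this example; the plugin's actual peak picker is not reproduced here.</p>

```python
def pick_onsets(smoothed, sensitivity, max_slope=1.0):
    """Return indices of peaks that are steep enough on both sides.

    Hypothetical mapping: sensitivity 0 requires a slope of max_slope,
    sensitivity 100 requires no particular slope at all.
    """
    required = max_slope * (1.0 - sensitivity / 100.0)
    onsets = []
    for i in range(1, len(smoothed) - 1):
        rise = smoothed[i] - smoothed[i - 1]   # slope leading up to the peak
        fall = smoothed[i] - smoothed[i + 1]   # slope leading away from it
        is_peak = rise > 0 and fall > 0
        if is_peak and rise >= required and fall >= required:
            onsets.append(i)
    return onsets


if __name__ == "__main__":
    function = [0.0, 0.2, 0.0, 0.0, 1.0, 0.0, 0.0]
    print(pick_onsets(function, sensitivity=50))
    print(pick_onsets(function, sensitivity=90))
```

<p>Under these assumptions, raising the sensitivity lowers the required slope, so smaller bumps in the function begin to count as onsets.</p>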
cannam@16	74 <h3>Outputs</h3>
cannam@16	75
cannam@16	76 <p><b>Note Onsets</b> – The detected note onset times, returned as a single
cannam@16	77 feature with timestamp but no value for each detected note.
cannam@16	78 </p>
cannam@16	79 <p><b>Onset Detection Function</b> – The raw note onset likelihood function
cannam@16	80 that was calculated as the first step of the detection process.
cannam@16	81 </p>
cannam@16	82 <p><b>Smoothed Detection Function</b> – The note onset likelihood function
cannam@16	83 following median filtering. This is the function from which
cannam@16	84 sufficiently steep peak values are picked and classified as onsets.
cannam@16	85 </p>
cannam@16	86 <h3>References and Credits</h3>
cannam@16	87
cannam@16	88 <p><b>Basic detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
cannam@16	89 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
cannam@16	90 Proceedings of the 6th Conference on Digital Audio Effects
cannam@16	91 (DAFx-03). London, UK. September 2003.
cannam@16	92 </p>
cannam@16	93 <p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
cannam@16	94 Proceedings of the International Computer Music Conference (ICMC'07),
cannam@16	95 August 2007.
cannam@16	96 </p>
cannam@16	97 <p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
cannam@16	98 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
cannam@16	99 </p>
cannam@16	100 <p>The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan
cannam@16	101 Pablo Bello and Christian Landone.
cannam@16	102 </p>
cannam@16	103 <a name="qm-tempotracker"></a><h2>2. Tempo and Beat Tracker</h2>
cannam@16	104
cannam@16	105 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-tempotracker</code>
cannam@16	106 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker</a>
cannam@16	107 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	108 </p>
cannam@16	109 <p>Tempo and Beat Tracker analyses a single channel of audio and
cannam@16	110 estimates the positions of metrical beats within the music (the
cannam@16	111 equivalent of a human listener tapping their foot to the beat).
cannam@16	112 </p>
cannam@16	113 <h3>Parameters</h3>
cannam@16	114
cannam@16	115 <p><b>Onset Detection Function Type</b> – The method used to calculate the
cannam@16	116 onset likelihood function. The most versatile method is the default,
cannam@16	117 "Complex Domain" (see reference, Duxbury et al 2003). "Spectral
cannam@16	118 Difference" may be appropriate for percussive recordings, "Phase
cannam@16	119 Deviation" for non-percussive music, and "Broadband Energy Rise" (see
cannam@16	120 reference, Barry et al 2005) for identifying percussive onsets in
cannam@16	121 mixed music.
cannam@16	122 </p>
cannam@16	123 <p><b>Adaptive Whitening</b> – This option evens out the temporal and
cannam@16	124 frequency variation in the signal, which can yield improved
cannam@16	125 performance in onset detection, for example in audio with big
cannam@16	126 variations in dynamics.
cannam@16	127 </p>
cannam@16	128 <h3>Outputs</h3>
cannam@16	129
cannam@16	130 <p><b>Beats</b> – The estimated beat locations, returned as a single feature,
cannam@16	131 with timestamp but no value, for each beat, labelled with the
cannam@16	132 corresponding estimated tempo at that beat.
cannam@16	133 </p>
cannam@16	134 <p><b>Onset Detection Function</b> – The raw note onset likelihood function
cannam@16	135 used in beat estimation.
cannam@16	136 </p>
cannam@16	137 <p><b>Tempo</b> – The estimated tempo, returned as a feature each time the
cannam@16	138 estimated tempo changes, with a single value for the tempo in beats
cannam@16	139 per minute.
cannam@16	140 </p>
cannam@16	141 <h3>References and Credits</h3>
cannam@16	142
cannam@16	143 <p><b>Beat tracking method</b>: M. E. P. Davies and M. D. Plumbley.
cannam@16	144 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent beat tracking of musical audio</a></i>. In IEEE
cannam@16	145 Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3,
cannam@16	146 pp1009-1020, 2007. See also M. E. P. Davies and M. D. Plumbley.
cannam@16	147 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2005/DaviesPlumbley05-icassp.pdf">Beat Tracking With A Two State Model</a></i>. In Proceedings of the IEEE
cannam@16	148 International Conference on Acoustics, Speech and Signal Processing
cannam@16	149 (ICASSP 2005), Vol. 3, pp241-244 Philadelphia, USA, March 19-23, 2005.
cannam@16	150 </p>
cannam@16	151 <p><b>Onset detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
cannam@16	152 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
cannam@16	153 Proceedings of the 6th Conference on Digital Audio Effects
cannam@16	154 (DAFx-03). London, UK. September 2003.
cannam@16	155 </p>
cannam@16	156 <p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
cannam@16	157 Proceedings of the International Computer Music Conference (ICMC'07),
cannam@16	158 August 2007.
cannam@16	159 </p>
cannam@16	160 <p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
cannam@16	161 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
cannam@16	162 </p>
cannam@16	163 <p>The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies
cannam@16	164 and Christian Landone.
cannam@16	165 </p>
cannam@16	166 <a name="qm-keydetector"></a><h2>3. Key Detector</h2>
cannam@16	167
cannam@16	168 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-keydetector</code>
cannam@16	169 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector</a>
cannam@16	170 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	171 </p>
cannam@16	172 <p>Key Detector analyses a single channel of audio and continuously
cannam@16	173 estimates the key of the music by comparing the degree to which a
cannam@16	174 block-by-block chromagram correlates to the stored key profiles for
cannam@16	175 each major and minor key.
cannam@16	176 </p>
cannam@16	177 <p>The key profiles are drawn from analysis of Book I of the Well-Tempered
cannam@16	178 Clavier by J. S. Bach, recorded at A=440 equal temperament.
cannam@16	179 </p>
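<p>The profile-correlation idea can be sketched in Python as follows. This is only an illustration: the plugin's Bach-derived profiles are not reproduced here, so the widely published Krumhansl-Kessler probe-tone ratings are swapped in as stand-ins, and the function names and the assumption that chroma bin 0 corresponds to C are inventions of this example.</p>

```python
import math

# Stand-in profiles (Krumhansl-Kessler ratings), NOT the plugin's own profiles.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rotate(profile, tonic):
    """Profile for the key whose tonic is pitch class `tonic` (0 = C)."""
    return [profile[(i - tonic) % 12] for i in range(12)]


def estimate_key(chroma):
    """Return (key number 1..24, list of 24 correlation scores).

    Keys 1-12 are major from C upwards and 13-24 are minor from C upwards.
    """
    scores = [correlation(chroma, rotate(MAJOR, t)) for t in range(12)]
    scores += [correlation(chroma, rotate(MINOR, t)) for t in range(12)]
    best = max(range(24), key=lambda k: scores[k])
    return best + 1, scores


if __name__ == "__main__":
    key, _ = estimate_key(rotate(MINOR, 9))  # a chroma vector shaped like A minor
    print(key)
```

<p>A real detector would apply this to each chromagram block in turn and then smooth the decisions over the configured window length; that step is omitted here.</p>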
cannam@16	180 <h3>Parameters</h3>
cannam@16	181
cannam@16	182 <p><b>Tuning Frequency</b> – The frequency of concert A in the music under
cannam@16	183 analysis.
cannam@16	184 </p>
cannam@16	185 <p><b>Window Length</b> – The number of chroma analysis frames taken into
cannam@16	186 account for key estimation. This controls how eager the key detector
cannam@16	187 will be to return short-duration tonal changes as new key changes (the
cannam@16	188 shorter the window, the more likely it is to detect a new key change).
cannam@16	189 </p>
cannam@16	190 <h3>Outputs</h3>
cannam@16	191
cannam@16	192 <p><b>Tonic Pitch</b> – The tonic pitch of each estimated key change,
cannam@16	193 returned as a single-valued feature at the point where the key change
cannam@16	194 is detected, with value counted from 1 to 12 where C is 1, C# or Db is
cannam@16	195 2, and so on up to B which is 12.
cannam@16	196 </p>
cannam@16	197 <p><b>Key Mode</b> – The major or minor mode of the estimated key, where
cannam@16	198 major is 0 and minor is 1.
cannam@16	199 </p>
cannam@16	200 <p><b>Key</b> – The estimated key for each key change, returned as a
cannam@16	201 single-valued feature at the point where the key change is detected,
cannam@16	202 with value counted from 1 to 24 where 1-12 are the major keys and
cannam@16	203 13-24 are the minor keys, such that C major is 1, C# major is 2, and
cannam@16	204 so on up to B major which is 12; then C minor is 13, Db minor is 14,
cannam@16	205 and so on up to B minor which is 24.
cannam@16	206 </p>
cannam@16	207 <p><b>Key Strength Plot</b> – A grid representing the ongoing key
cannam@16	208 "probability" throughout the music. This is returned as a feature for
cannam@16	209 each chroma frame, containing 25 bins. Bins 1-12 are the major keys
cannam@16	210 from C upwards; bins 14-25 are the minor keys from C upwards. The
cannam@16	211 13th bin is unused: it just provides space between the first and
cannam@16	212 second halves of the feature if displayed in a single plot.
cannam@16	213 </p>
cannam@16	214 <p>The outputs are also labelled with pitch or key as text.
cannam@16	215 </p>
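<p>The numbering schemes used by these outputs can be related to one another with a few lines of Python. The helper names below are hypothetical and exist only for this example; the arithmetic follows the numbering described for the Tonic Pitch, Key Mode, Key and Key Strength Plot outputs.</p>

```python
def decode_key(key_value):
    """Split a Key output value (1..24) into (tonic pitch 1..12, mode).

    Tonic pitch counts from C = 1 up to B = 12; mode is 0 for major
    and 1 for minor.
    """
    key_value = int(key_value)
    if not 1 <= key_value <= 24:
        raise ValueError("key value must be in the range 1..24")
    tonic = (key_value - 1) % 12 + 1
    mode = 0 if key_value <= 12 else 1
    return tonic, mode


def strength_plot_bin(key_value):
    """1-based bin of the 25-bin Key Strength Plot that holds this key.

    Bin 13 is the unused spacer, so the minor keys are shifted up by one.
    """
    key_value = int(key_value)
    if not 1 <= key_value <= 24:
        raise ValueError("key value must be in the range 1..24")
    return key_value if key_value <= 12 else key_value + 1


if __name__ == "__main__":
    print(decode_key(13))         # C minor
    print(strength_plot_bin(13))  # first bin after the spacer
```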
cannam@16	216 <h3>References and Credits</h3>
cannam@16	217
cannam@16	218 <p><b>Method</b>: see K. Noland and M. Sandler. <i><a href="http://www.aes.org/e-lib/browse.cfm?elib=14140">Signal Processing Parameters for Tonality Estimation</a></i>. In Proceedings of Audio Engineering Society
cannam@16	219 122nd Convention, Vienna, 2007.
cannam@16	220 </p>
cannam@16	221 <p>The Key Detector Vamp plugin was written by Katy Noland and Christian
cannam@16	222 Landone.
cannam@16	223 </p>
cannam@16	224 <a name="qm-tonalchange"></a><h2>4. Tonal Change</h2>
cannam@16	225
cannam@16	226 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-tonalchange</code>
cannam@16	227 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange</a>
cannam@16	228 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	229 </p>
cannam@16	230 <p>Tonal Change analyses a single channel of audio, detecting harmonic
cannam@16	231 changes such as chord boundaries.
cannam@16	232 </p>
cannam@16	233 <h3>Parameters</h3>
cannam@16	234
cannam@16	235 <p><b>Gaussian smoothing</b> – The window length for the internal smoothing
cannam@16	236 operation, in chroma analysis frames. This controls how eager the
cannam@16	237 tonal change detector will be to identify very short-term tonal
cannam@16	238 changes. The default value of 5 is quite short, and may lead to more
cannam@16	239 (not always meaningful) results being returned; for many purposes a
cannam@16	240 larger value, closer to the maximum of 20, may be appropriate.
cannam@16	241 </p>
cannam@16	242 <p><b>Chromagram minimum pitch</b> – The MIDI pitch value (0-127) of the
cannam@16	243 minimum pitch included in the internal chromagram analysis.
cannam@16	244 </p>
cannam@16	245 <p><b>Chromagram maximum pitch</b> – The MIDI pitch value (0-127) of the
cannam@16	246 maximum pitch included in the internal chromagram analysis.
cannam@16	247 </p>
cannam@16	248 <p><b>Chromagram tuning frequency</b> – The frequency of concert A in the
cannam@16	249 music under analysis.
cannam@16	250 </p>
cannam@16	251 <h3>Outputs</h3>
cannam@16	252
cannam@16	253 <p><b>Transform to 6D Tonal Content Space</b> – A representation of the
cannam@16	254 musical content in a six-dimensional tonal space onto which the
cannam@16	255 algorithm maps 12-bin chroma vectors extracted from the audio.
cannam@16	256 </p>
cannam@16	257 <p><b>Tonal Change Detection Function</b> – A function representing the
cannam@16	258 estimated likelihood of a tonal change occurring in each spectral
cannam@16	259 frame.
cannam@16	260 </p>
cannam@16	261 <p><b>Tonal Change Positions</b> – The resulting estimated positions of tonal
cannam@16	262 changes.
cannam@16	263 </p>
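<p>For readers who want a feel for the first two outputs, the Python sketch below maps a 12-bin chroma vector to six dimensions and derives a simple change function from the result. It follows the tonal centroid construction as it is commonly stated for the Harte et al. paper cited below (circles of fifths, minor thirds and major thirds), but the radii, angles, function names and the omission of the Gaussian smoothing step should all be treated as assumptions of this example rather than as a description of the plugin's code.</p>

```python
import math

# Assumed radii and angular steps for the three circles
# (fifths, minor thirds, major thirds).
RADII = (1.0, 1.0, 0.5)
ANGLES = (7 * math.pi / 6, 3 * math.pi / 2, 2 * math.pi / 3)


def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to a 6-element tonal centroid."""
    total = sum(abs(c) for c in chroma)
    if total == 0:
        return [0.0] * 6
    centroid = []
    for radius, angle in zip(RADII, ANGLES):
        centroid.append(
            sum(radius * math.sin(l * angle) * c for l, c in enumerate(chroma)) / total)
        centroid.append(
            sum(radius * math.cos(l * angle) * c for l, c in enumerate(chroma)) / total)
    return centroid


def change_function(centroids):
    """Euclidean distance between the centroids on either side of each frame."""
    result = [0.0] * len(centroids)
    for n in range(1, len(centroids) - 1):
        result[n] = math.dist(centroids[n + 1], centroids[n - 1])
    return result


if __name__ == "__main__":
    c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    f_sharp_major = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
    frames = [c_major] * 3 + [f_sharp_major] * 3
    print(change_function([tonal_centroid(f) for f in frames]))
```

<p>Peaks in such a function are candidates for the tonal change positions; in this toy input the function is non-zero only around the point where the chord changes.</p>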
cannam@16	264 <h3>References and Credits</h3>
cannam@16	265
cannam@16	266 <p><b>Method</b>: C. A. Harte, M. Gasser, and M. Sandler. <i><a href="http://portal.acm.org/citation.cfm?id=1178723.1178727">Detecting harmonic change in musical audio</a></i>. In Proceedings of the 1st ACM workshop on
cannam@16	267 Audio and Music Computing Multimedia, Santa Barbara, 2006.
cannam@16	268 </p>
cannam@16	269 <p>The Tonal Change Vamp plugin was written by Chris Harte and Martin
cannam@16	270 Gasser.
cannam@16	271 </p>
cannam@16	272 <a name="qm-segmenter"></a><h2>5. Segmenter</h2>
cannam@16	273
cannam@16	274 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-segmenter</code>
cannam@16	275 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter</a>
cannam@16	276 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	277 </p>
cannam@16	278 <p>Segmenter divides a single channel of music up into structurally
cannam@16	279 consistent segments. It returns a numeric value (the segment type)
cannam@16	280 for each moment at which a new segment starts.
cannam@16	281 </p>
cannam@16	282 <p>For music with clearly tonally distinguishable sections such as verse,
cannam@16	283 chorus, etc., segments with the same type may be expected to be
cannam@16	284 similar to one another in some structural sense. For example,
cannam@16	285 repetitions of the chorus are likely to share a segment type.
cannam@16	286 </p>
cannam@16	287 <p>The plugin only attempts to identify similar segments; it does not
cannam@16	288 attempt to label them. For example, it makes no attempt to tell you
cannam@16	289 which segment is the chorus.
cannam@16	290 </p>
cannam@16	291 <p>Note that this plugin does a substantial amount of processing after
cannam@16	292 receiving all of the input audio data, before it produces any results.
cannam@16	293 </p>
cannam@16	294 <h3>Method</h3>
cannam@16	295
cannam@16	296 <p>The method relies upon structural/timbral similarity to obtain the
cannam@16	297 high-level song structure. This is based on the assumption that the
cannam@16	298 distributions of timbre features are similar over corresponding
cannam@16	299 structural elements of the music.
cannam@16	300 </p>
cannam@16	301 <p>The algorithm works by obtaining a frequency-domain representation of
cannam@16	302 the audio signal using a Constant-Q transform, a Chromagram or
cannam@16	303 Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the
cannam@16	304 particular feature is selectable as a parameter). The extracted
cannam@16	305 features are normalised in accordance with the MPEG-7 standard (NASE
cannam@16	306 descriptor), which means the spectrum is converted to decibel scale
cannam@16	307 and each spectral vector is normalised by the RMS energy envelope.
cannam@16	308 The value of this envelope is stored for each processing block of
cannam@16	309 audio. This is followed by the extraction of 20 principal components
cannam@16	310 per block using PCA, yielding a sequence of 21-dimensional feature
cannam@16	311 vectors where the last element in each vector corresponds to the
cannam@16	312 energy envelope.
cannam@16	313 </p>
cannam@16	314 <p>A 40-state Hidden Markov Model is then trained on the whole sequence
cannam@16	315 of features, with each state of the HMM corresponding to a specific
cannam@16	316 timbre type. This process partitions the timbre-space of a given track
cannam@16	317 into 40 possible types. The important assumption of the model is that
cannam@16	318 the distribution of these features remains consistent over a structural
cannam@16	319 segment. After training and decoding the HMM, the song is assigned a
cannam@16	320 sequence of timbre-features according to specific timbre-type
cannam@16	321 distributions for each possible structural segment.
cannam@16	322 </p>
cannam@16	323 <p>The segmentation itself is computed by clustering timbre-type
cannam@16	324 histograms. A series of histograms is created over a sliding window,
cannam@16	325 and these are grouped into M clusters by an adapted soft k-means
cannam@16	326 algorithm. Each of these clusters will correspond to a specific
cannam@16	327 segment-type of the analyzed song. Reference histograms, iteratively
cannam@16	328 updated during clustering, describe the timbre distribution for each
cannam@16	329 segment. The segmentation arises from the final cluster assignments.
cannam@16	330 </p>
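<p>The histogram-clustering stage can be illustrated with a small Python sketch. It assumes that a sequence of decoded timbre-type (HMM state) labels is already available, and it substitutes a plain hard k-means for the adapted soft k-means described above, so it shows the general shape of the computation rather than the plugin's algorithm. All function names and the window length are choices made for this example.</p>

```python
def sliding_histograms(states, n_states, window):
    """Normalised histogram of state labels for every window position (hop of 1)."""
    hists = []
    for start in range(len(states) - window + 1):
        counts = [0.0] * n_states
        for state in states[start:start + window]:
            counts[state] += 1.0 / window
        hists.append(counts)
    return hists


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_histograms(hists, max_types, iterations=10):
    """Hard k-means stand-in; returns one segment-type label per histogram."""
    # Deterministic start: the first distinct histograms become the references,
    # so fewer than max_types clusters are used if the data does not need them.
    refs = []
    for h in hists:
        if h not in refs:
            refs.append(list(h))
        if len(refs) == max_types:
            break
    labels = [0] * len(hists)
    for _ in range(iterations):
        # Assign each histogram to its nearest reference histogram.
        labels = [min(range(len(refs)), key=lambda c: squared_distance(h, refs[c]))
                  for h in hists]
        # Update each reference histogram to the mean of its members.
        for c in range(len(refs)):
            members = [h for h, label in zip(hists, labels) if label == c]
            if members:
                refs[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def segment_boundaries(labels):
    """(position, segment type) wherever the cluster assignment changes."""
    return [(i, label) for i, label in enumerate(labels)
            if i == 0 or label != labels[i - 1]]


if __name__ == "__main__":
    states = [0] * 20 + [1] * 20 + [0] * 20   # toy "A B A" structure
    labels = cluster_histograms(sliding_histograms(states, 2, 4), max_types=2)
    print(segment_boundaries(labels))
```

<p>The type numbers that come out are arbitrary cluster indices: two segments sharing a number are judged similar, but the number itself carries no meaning.</p>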
cannam@16	331 <h3>Parameters</h3>
cannam@16	332
cannam@16	333 <p><b>Number of segment-types</b> – The maximum number of clusters
cannam@16	334 (segment-types) to be returned. The default is 10. Unlike many
cannam@16	335 clustering algorithms, the constrained clustering used in this plugin
cannam@16	336 does not produce too many clusters or vary significantly even if this
cannam@16	337 is set too high. However, this parameter can be useful for limiting
cannam@16	338 the number of expected segment-types.
cannam@16	339 </p>
cannam@16	340 <p><b>Feature Type</b> – The type of spectral feature used for segmentation. The available features are:<ul><li>"Hybrid", the default, which uses a Constant-Q transform (see <a href="#qm-constantq">related
cannam@16	341 plugin</a>): this is generally effective for modern studio recordings;</li><li> "Chromatic", using a chromagram derived from the Constant-Q feature (see <a href="#qm-chromagram">related plugin</a>): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in
cannam@16	342 sound;</li><li>"Timbral", using Mel-Frequency
cannam@16	343 Cepstral Coefficients (see <a href="#qm-mfcc">related plugin</a>), which is more likely to
cannam@16	344 result in classification by instrumentation rather than musical
cannam@16	345 content.</li></ul>
cannam@16	346 </p>
cannam@16	347 <p><b>Minimum segment duration</b> – The approximate expected minimum
cannam@16	348 duration for a segment, from 1 to 15 seconds. Changing this parameter
cannam@16	349 may help the plugin to find musical sections rather than just
cannam@16	350 following changes in the sound of the music, and also avoid wasting a
cannam@16	351 segment-type cluster for timbrally distinct but too-short segments.
cannam@16	352 The default of 4 seconds usually produces good results.
cannam@16	353 </p>
cannam@16	354 <h3>Outputs</h3>
cannam@16	355
cannam@16	356 <p><b>Segmentation</b> – The estimated segment boundaries, returned as a
cannam@16	357 single feature with one value at each segment boundary, with the value
cannam@16	358 representing the segment type number for the segment starting at that
cannam@16	359 boundary.
cannam@16	360 </p>
cannam@16	361 <h3>References and Credits</h3>
cannam@16	362
cannam@16	363 <p><b>Method</b>: M. Levy and M. Sandler. <i><a href="http://ieeexplore.ieee.org/iel5/10376/4432632/04432648.pdf?arnumber=4432648">Structural segmentation of musical audio by constrained clustering</a></i>. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.
cannam@16	364 </p>
cannam@16	365 <p>Note that this plugin does not implement the beat-synchronous aspect
cannam@16	366 of the segmentation method described in the paper.
cannam@16	367 </p>
cannam@16	368 <p>The Segmenter Vamp plugin was written by Mark Levy. Thanks to George
cannam@16	369 Fazekas for providing much of this documentation.
cannam@16	370 </p>
cannam@16	371 <a name="qm-similarity"></a><h2>6. Similarity</h2>
cannam@16	372
cannam@16	373 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-similarity</code>
cannam@16	374 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity</a>
cannam@16	375 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	376 </p>
cannam@16	377 <p>Similarity treats each channel of its audio input as a separate
cannam@16	378 "track", and estimates how similar the tracks are to one another using
cannam@16	379 a selectable similarity measure.
cannam@16	380 </p>
cannam@16	381 <p>The plugin also returns the intermediate data used as a basis of the
cannam@16	382 similarity measure; it can therefore be used on a single channel of
cannam@16	383 input (with the resulting intermediate data then being applied in some
cannam@16	384 other similarity or clustering algorithm, for example) if desired, as
cannam@16	385 well as with multiple inputs.
cannam@16	386 </p>
cannam@16	387 <p>Because of the way this plugin handles multiple inputs, by assuming
cannam@16	388 that each channel represents a separate piece of music, it may not be
cannam@16	389 appropriate for use directly in a general-purpose host (unless you
cannam@16	390 actually want to do something like compare two stereo channels for
cannam@16	391 timbral similarity, which is unlikely).
cannam@16	392 </p>
cannam@16	393 <h3>Parameters</h3>
cannam@16	394
cannam@16	395 <p><b>Feature Type</b> – The underlying audio feature used for the similarity
cannam@16	396 measure. The available features are:
cannam@16	397 <ul><li>"Timbre", in which the distance
cannam@16	398 between tracks is a symmetrised Kullback-Leibler divergence between
cannam@16	399 Gaussian-modelled MFCC means and variances across each track, for the
cannam@16	400 first 20 MFCCs including C0 (see <a href="#qm-mfcc">related plugin</a>);</li><li>"Chroma", which uses Kullback-Leibler divergence of
cannam@16	401 mean chroma histogram (see <a href="#qm-chromagram">related plugin</a>);</li><li>"Rhythm", using the cosine distance between
cannam@16	402 "beat spectrum" measures derived from a short sampled section of the
cannam@16	403 track;</li><li>and combined "Timbre and Rhythm" and "Chroma and Rhythm"
cannam@16	404 features.</li></ul>
cannam@16	405 </p>
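<p>The two distance measures named above can be written down compactly. The Python sketch below assumes that each track is summarised by per-coefficient means and variances (that is, a diagonal-covariance Gaussian) and symmetrises the Kullback-Leibler divergence by summing both directions; both of these choices, and the function names, are assumptions of this example and may differ in detail from the plugin.</p>

```python
import math


def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp))
    return total


def timbre_distance(track_a, track_b):
    """Symmetrised KL divergence; each track is a (means, variances) pair."""
    return kl_gaussian(*track_a, *track_b) + kl_gaussian(*track_b, *track_a)


def cosine_distance(x, y):
    """1 - cosine similarity, e.g. between two beat spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0
    return 1.0 - dot / (norm_x * norm_y)


if __name__ == "__main__":
    a = ([0.0, 1.0], [1.0, 2.0])
    b = ([0.5, 1.5], [1.5, 1.0])
    print(timbre_distance(a, a), timbre_distance(a, b))
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```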
cannam@16	406 <h3>Outputs</h3>
cannam@16	407
cannam@16	408 <p><b>Distance Matrix</b> – A matrix of the distance measures between input
cannam@16	409 channels, returned as a series of vector features timestamped at
cannam@16	410 one-second intervals. The distance from channel i to channel j
cannam@16	411 appears as the j'th bin of the feature at time i.
cannam@16	412 </p>
cannam@16	413 <p><b>Distance from First Channel</b> – A single vector feature, timestamped
cannam@16	414 at time zero, containing the distances between the first input channel
cannam@16	415 and each of the input channels (including the first channel itself at
cannam@16	416 bin 0, which should have zero distance).
cannam@16	417 </p>
cannam@16	418 <p><b>Ordered Distances from First Channel</b> – A pair of vector features,
cannam@16	419 at times 0 and 1 second. The feature at time 0 contains the 1-based
cannam@16	420 indices of the input channels in the order of similarity to the first
cannam@16	421 input channel (so its first bin should always contain 1, as the first
cannam@16	422 channel is most similar to itself). The feature at time 1 contains,
cannam@16	423 in bin n, the distance between the first input channel and the channel
cannam@16	424 with index found at bin n of the feature at time 0.
cannam@16	425 </p>
cannam@16	426 <p><b>Feature Means</b> – A series of vector features containing the mean
cannam@16	427 values of each of the feature bins across the duration of each of the
cannam@16	428 input channels. This output returns one feature for each input
cannam@16	429 channel, timestamped at one-second intervals. The number of bins for
cannam@16	430 each feature depends on the feature type; it will be 20 for MFCC
cannam@16	431 features and 12 for chroma features. No features will be returned on
cannam@16	432 this output if the feature type is purely rhythmic.
cannam@16	433 </p>
cannam@16	434 <p><b>Feature Variances</b> – Just as Feature Means, but variances.
cannam@16	435 </p>
cannam@16	436 <p><b>Beat Spectra</b> – A series of vector features containing the rhythmic
cannam@16	437 autocorrelation profiles (beat spectra) for each of the input
cannam@16	438 channels. This output returns one 512-bin feature for each input
cannam@16	439 channel, timestamped at one-second intervals. No features will be
cannam@16	440 returned on this output if the feature type contains no rhythm
cannam@16	441 component.
cannam@16	442 </p>
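<p>The relationship between the Distance Matrix, Distance from First Channel and Ordered Distances from First Channel outputs can be shown with a short Python sketch. It assumes the matrix has already been collected into a list of rows, with row i holding the feature returned at time i; the function names are hypothetical.</p>

```python
def distances_from_first(matrix):
    """Row 0 of the distance matrix: distance from channel 1 to every channel."""
    return list(matrix[0])


def ordered_from_first(matrix):
    """Return (1-based channel indices, distances), nearest channel first."""
    row = matrix[0]
    order = sorted(range(len(row)), key=lambda j: row[j])
    indices = [j + 1 for j in order]      # the feature at time 0
    distances = [row[j] for j in order]   # the feature at time 1
    return indices, distances


if __name__ == "__main__":
    matrix = [
        [0.0, 2.5, 1.0],
        [2.5, 0.0, 3.0],
        [1.0, 3.0, 0.0],
    ]
    print(distances_from_first(matrix))
    print(ordered_from_first(matrix))
```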
cannam@16	443 <h3>References and Credits</h3>
cannam@16	444
cannam@16	445 <p><b>Timbral similarity</b>: M. Levy and M. Sandler. <i><a href="http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf">Lightweight measures for timbral similarity of musical audio</a></i>. In Proceedings of the 1st
cannam@16	446 ACM workshop on Audio and Music Computing Multimedia, Santa Barbara,
cannam@16	447 2006.
cannam@16	448 </p>
cannam@16	449 <p><b>Combined rhythmic and timbral similarity</b>: K. Jacobson. <i><a href="http://ismir2006.ismir.net/PAPERS/ISMIR0696_Paper.pdf">A Multifaceted Approach to Music Similarity</a></i>. In Proceedings of the
cannam@16	450 Seventh International Conference on Music Information Retrieval
cannam@16	451 (ISMIR), 2006.
cannam@16	452 </p>
cannam@16	453 <p>The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and
cannam@16	454 Chris Cannam.
cannam@16	455 </p>
cannam@16	456 <a name="qm-constantq"></a><h2>7. Constant-Q Spectrogram</h2>
cannam@16	457
cannam@16	458 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-constantq</code>
cannam@16	459 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq</a>
cannam@16	460 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	461 </p>
cannam@16	462 <p>Constant-Q Spectrogram calculates a spectrogram based on a short-time
cannam@16	463 windowed constant Q spectral transform. This is a spectrogram in
cannam@16	464 which the ratio of centre frequency to resolution is constant for each
cannam@16	465 frequency bin. The frequency bins correspond to the frequencies of
cannam@16	466 "musical notes" rather than being linearly spaced in frequency as they
cannam@16	467 are for the conventional DFT spectrogram.
cannam@16	468 </p>
cannam@16	469 <p>The pitch range and the number of frequency bins per octave may be
cannam@16	470 adjusted using the plugin's parameters. Note that the plugin's
cannam@16	471 preferred step and block sizes are determined by these parameters, and
cannam@16	472 the plugin will not accept any block size other than its preferred
cannam@16	473 value.
cannam@16	474 </p>
cannam@16	475 <h3>Parameters</h3>
cannam@16	476
cannam@16	477 <p><b>Minimum Pitch</b> – The MIDI pitch value (0-127) corresponding to the lowest
cannam@16	478 frequency to be included in the constant-Q transform.
cannam@16	479 </p>
cannam@16	480 <p><b>Maximum Pitch</b> – The MIDI pitch value (0-127) corresponding to the
cannam@16	481 highest frequency to be included in the constant-Q transform.
cannam@16	482 </p>
cannam@16	483 <p><b>Tuning Frequency</b> – The frequency of concert A in the
cannam@16	484 music under analysis.
cannam@16	485 </p>
cannam@16	486 <p><b>Bins per Octave</b> – The number of constant-Q transform bins to be
cannam@16	487 computed per octave.
cannam@16	488 </p>
cannam@16	489 <p><b>Normalized</b> – Whether to normalize each output column to unit
cannam@16	490 maximum.
cannam@16	491 </p>
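<p>As a rough illustration of how the pitch, tuning and bins-per-octave
parameters relate to bin centre frequencies, the following Python sketch
assumes the usual equal-tempered mapping (MIDI pitch 69 is concert A) and
geometrically spaced bins. The function names are hypothetical, and the
plugin's own bin count and edge handling may differ.
</p>

```python
import math


def midi_to_hz(pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, taking MIDI 69 (concert A) as tuning_hz."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)


def constant_q_bin_frequencies(min_pitch, max_pitch, bins_per_octave=12,
                               tuning_hz=440.0):
    """Centre frequencies of geometrically spaced bins from min_pitch upwards.

    Assumption: the range starts at min_pitch and stops just below max_pitch;
    the real plugin may treat the upper edge differently.
    """
    f_min = midi_to_hz(min_pitch, tuning_hz)
    octaves = (max_pitch - min_pitch) / 12.0
    n_bins = int(math.ceil(bins_per_octave * octaves))
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def q_factor(bins_per_octave):
    """Ratio of centre frequency to bandwidth, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

<p>With 12 bins per octave each bin is one semitone wide and the ratio of
centre frequency to resolution is about 17; doubling the bins per octave
roughly doubles that ratio.
</p>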
cannam@16	492 <h3>Outputs</h3>
cannam@16	493
cannam@16	494 <p><b>Constant-Q Spectrogram</b> – The calculated spectrogram, as a single
cannam@16	495 feature per process block containing one bin for each pitch included
cannam@16	496 in the spectrogram's range.
cannam@16	497 </p>
cannam@16	498 <h3>References and Credits</h3>
cannam@16	499
cannam@16	500 <p><b>Principle</b>: J. Brown. <i><a href="http://www.wellesley.edu/Physics/brown/pubs/cq1stPaper.pdf">Calculation of a constant Q spectral transform</a></i>. Journal of the Acoustical Society of America, 89(1):
cannam@16	501 425-434, 1991.
cannam@16	502 </p>
cannam@16	503 <p>The Constant-Q Spectrogram Vamp plugin was written by Christian
cannam@16	504 Landone.
cannam@16	505 </p>
cannam@16	506 <a name="qm-chromagram"></a><h2>8. Chromagram</h2>
cannam@16	507
cannam@16	508 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-chromagram</code>
cannam@16	509 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram</a>
cannam@16	510 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	511 </p>
cannam@16	512 <p>Chromagram calculates a constant Q spectral transform (as in the
cannam@16	513 Constant Q Spectrogram plugin) and then wraps the frequency bin values
cannam@16	514 into a single octave, with each bin containing the sum of the
cannam@16	515 magnitudes from the corresponding bin in all octaves. The number of
cannam@16	516 values in each feature vector returned by the plugin is therefore the
cannam@16	517 same as the number of bins per octave configured for the underlying
cannam@16	518 constant Q transform.
cannam@16	519 </p>
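<p>The octave wrapping described above can be sketched in Python as follows.
This is only an illustration, assuming the constant-Q magnitudes arrive as a
flat list ordered from the lowest bin upwards; <code>fold_to_chroma</code> is
a hypothetical name, not part of the plugin.
</p>

```python
def fold_to_chroma(cq_magnitudes, bins_per_octave=12):
    """Sum constant-Q magnitudes from every octave into a single octave."""
    chroma = [0.0] * bins_per_octave
    for k, magnitude in enumerate(cq_magnitudes):
        # Bin k of the transform falls in position k mod bins_per_octave.
        chroma[k % bins_per_octave] += magnitude
    return chroma
```

<p>For example, two octaves at 3 bins per octave with magnitudes
<code>[1, 2, 3, 4, 5, 6]</code> fold to a 3-bin vector in which the first
value is 1 + 4, the second 2 + 5 and the third 3 + 6.
</p>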
cannam@16	520 <p>The pitch range and the number of frequency bins per octave for the
cannam@16	521 transform may be adjusted using the plugin's parameters. Note that
cannam@16	522 the plugin's preferred step and block sizes depend on these
cannam@16	523 parameters, and the plugin will not accept any block size other than
cannam@16	524 its preferred value.
cannam@16	525 </p>
cannam@16	526 <h3>Parameters</h3>
cannam@16	527
cannam@16	528 <p><b>Minimum Pitch</b> – The MIDI pitch value (0-127) corresponding to the
cannam@16	529 lowest frequency to be included in the constant-Q transform used in
cannam@16	530 calculating the chromagram.
cannam@16	531 </p>
cannam@16	532 <p><b>Maximum Pitch</b> – The MIDI pitch value (0-127) corresponding to the
cannam@16	533 highest frequency to be included in the constant-Q transform used in
cannam@16	534 calculating the chromagram.
cannam@16	535 </p>
cannam@16	536 <p><b>Tuning Frequency</b> – The frequency of concert A in the
cannam@16	537 music under analysis.
cannam@16	538 </p>
cannam@16	539 <p><b>Bins per Octave</b> – The number of constant-Q transform bins to be
cannam@16	540 computed per octave, and thus the total number of bins present in the
cannam@16	541 resulting chromagram.
cannam@16	542 </p>
cannam@16	543 <p><b>Normalized</b> – Whether to normalize each output column. Normalization
cannam@16	544 may be to unit sum or unit maximum.
cannam@16	545 </p>
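<p>The two normalization choices can be illustrated with a short Python
sketch. The function name and the <code>mode</code> strings are hypothetical,
and leaving an all-zero column unchanged is an assumption rather than
documented plugin behaviour.
</p>

```python
def normalize_column(column, mode="max"):
    """Scale one output column to unit maximum ("max") or unit sum ("sum")."""
    scale = max(column) if mode == "max" else sum(column)
    if scale <= 0:
        # Assumption: a silent (all-zero) column is returned unchanged.
        return list(column)
    return [value / scale for value in column]
```

<p>Unit-maximum scaling keeps the strongest bin at 1 in every column, whereas
unit-sum scaling makes each column's values add up to 1.
</p>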
cannam@16	546 <h3>Outputs</h3>
cannam@16	547
cannam@16	548 <p><b>Chromagram</b> – The calculated chromagram, as a single feature per
cannam@16	549 process block containing the number of bins given in the bins per
cannam@16	550 octave parameter.
cannam@16	551 </p>
cannam@16	552 <h3>References and Credits</h3>
cannam@16	553
cannam@16	554 <p>The Chromagram Vamp plugin was written by Christian Landone.
cannam@16	555 </p>
cannam@16	556 <a name="qm-mfcc"></a><h2>9. Mel-Frequency Cepstral Coefficients</h2>
cannam@16	557
cannam@16	558 <p><b>System identifier</b> – <code>vamp:qm-vamp-plugins:qm-mfcc</code>
cannam@16	559 <br><b>RDF URI</b> – <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc</a>
cannam@16	560 <br><b>Links</b> – <a href="#">Back to top of library documentation</a> – <a href="http://www.elec.qmul.ac.uk/digitalmusic/downloads/index.html#qm-vamp-plugins">Download location</a>
cannam@16	561 </p>
cannam@16	562 <p>Mel-Frequency Cepstral Coefficients calculates MFCCs from a single
cannam@16	563 channel of audio. These coefficients, derived from a cosine transform
cannam@16	564 of the mapping of an audio spectrum onto a frequency scale modelled on
cannam@16	565 human auditory response, are widely used in speech recognition, music
cannam@16	566 classification and other tasks.
cannam@16	567 </p>
cannam@16	568 <h3>Parameters</h3>
cannam@16	569
cannam@16	570 <p><b>Number of Coefficients</b> – The number of MFCCs to return. Commonly
cannam@16	571 used values include 13 and the default, 20. This number includes C0 if
cannam@16	572 requested (see Include C0 below).
cannam@16	573 </p>
cannam@16	574 <p><b>Power for Mel Amplitude Logs</b> – An optional power value to which the
cannam@16	575 spectral amplitudes should be raised before applying the cosine
cannam@16	576 transform. Values greater than 1 may in principle reduce the
cannam@16	577 contribution of noise to the results. The default is 1.
cannam@16	578 </p>
cannam@16	579 <p><b>Include C0</b> – Whether to include the zeroth coefficient, which
cannam@16	580 simply reflects the overall signal power across the Mel frequency
cannam@16	581 bands.
cannam@16	582 </p>
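<p>To show how the coefficient count and the C0 option interact, here is a
minimal Python sketch that takes a cosine transform (DCT-II) of log
Mel-band amplitudes. It assumes the Mel-band amplitudes are already
available, omits the power parameter, and uses hypothetical names; it is not
the plugin's implementation.
</p>

```python
import math


def mfcc_from_mel(mel_amplitudes, n_coeffs=20, include_c0=True):
    """Return n_coeffs cepstral coefficients from Mel-band amplitudes.

    Assumption: with include_c0 the result is C0..C(n-1); without it the
    result is C1..Cn, so the vector length is n_coeffs either way.
    """
    # A small floor avoids taking the log of zero for silent bands.
    logs = [math.log(max(a, 1e-10)) for a in mel_amplitudes]
    n_bands = len(logs)
    first = 0 if include_c0 else 1
    coeffs = []
    for i in range(first, first + n_coeffs):
        coeffs.append(sum(
            x * math.cos(math.pi * i * (j + 0.5) / n_bands)
            for j, x in enumerate(logs)
        ))
    return coeffs
```

<p>In this sketch C0 is just the sum of the log band amplitudes, which is why
it tracks overall signal level rather than spectral shape.
</p>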
cannam@16	583 <h3>Outputs</h3>
cannam@16	584
cannam@16	585 <p><b>Coefficients</b> – The MFCC values, returned as one vector feature per
cannam@16	586 processing block.
cannam@16	587 </p>
cannam@16	588 <p><b>Means of Coefficients</b> – The overall means of the MFCC bins, as a
cannam@16	589 single vector feature with time 0 that is returned when processing is
cannam@16	590 complete.
cannam@16	591 </p>
cannam@16	592 <h3>References and Credits</h3>
cannam@16	593
cannam@16	594 <p><b>MFCCs in music</b>: See B. Logan. <i><a href="http://ismir2000.ismir.net/papers/logan_paper.pdf">Mel-Frequency Cepstral Coefficients for Music Modeling</a></i>. In Proceedings of the First International
cannam@16	595 Symposium on Music Information Retrieval (ISMIR), 2000.
cannam@16	596 </p>
cannam@16	597 <p>The Mel-Frequency Cepstral Coefficients Vamp plugin was written by
cannam@16	598 Nicolas Chetry and Chris Cannam.
cannam@16	599 </p>
cannam@16	600 <p></p>
cannam@16	601 </CONTENTS>
cannam@16	602 </body>
cannam@16	603 </html>
