Add a page about the Vamp Plugin Pack, including links to download mirrors
author Chris Cannam
date Tue, 11 Aug 2020 16:41:11 +0100
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <link rel="stylesheet" media="screen" type="text/css" href="/screen.css"/>
    <link rel="icon" type="image/png" href="/images/waveform.png"/>
    <link rel="shortcut" type="image/png" href="/images/waveform.png"/>
    <title>QM Vamp Plugins: User Documentation</title>
    <meta name="robots" content="index"/>
  </head>
  <body>
<h1 id="header"><span>Vamp Plugins</span></h1>

<h2>QM Vamp Plugins</h2>

<p>The QM Vamp Plugin set is a library of Vamp audio feature
extraction plugins developed at the <a
href="http://c4dm.eecs.qmul.ac.uk/">Centre for Digital Music</a> at
Queen Mary, University of London.  These plugins are provided as a
single library file, made available in source and binary form for
Windows, OS X, and Linux via the <a
href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/">SoundSoftware
code site</a> (see the <a
href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">download
page</a>).
</p>
<p>For more information about Vamp plugins, see <a href="http://www.vamp-plugins.org/">http://www.vamp-plugins.org/</a>.
</p>

<div class="toc2">1. &nbsp;<a href="#qm-onsetdetector">Note Onset Detector</a></div>
<div class="toc2">2. &nbsp;<a href="#qm-tempotracker">Tempo and Beat Tracker</a></div>
<div class="toc2">3. &nbsp;<a href="#qm-barbeattracker">Bar and Beat Tracker</a></div>
<div class="toc2">4. &nbsp;<a href="#qm-keydetector">Key Detector</a></div>
<div class="toc2">5. &nbsp;<a href="#qm-tonalchange">Tonal Change</a></div>
<div class="toc2">6. &nbsp;<a href="#qm-adaptivespectrogram">Adaptive Spectrogram</a></div>
<div class="toc2">7. &nbsp;<a href="#qm-transcription">Polyphonic Transcription</a></div>
<div class="toc2">8. &nbsp;<a href="#qm-segmenter">Segmenter</a></div>
<div class="toc2">9. &nbsp;<a href="#qm-similarity">Similarity</a></div>
<div class="toc2">10. &nbsp;<a href="#qm-dwt">Discrete Wavelet Transform</a></div>
<div class="toc2">11. &nbsp;<a href="#qm-constantq">Constant-Q Spectrogram</a></div>
<div class="toc2">12. &nbsp;<a href="#qm-chromagram">Chromagram</a></div>
<div class="toc2">13. &nbsp;<a href="#qm-mfcc">Mel-Frequency Cepstral Coefficients</a></div>

<a name="qm-onsetdetector"></a><h2>1. Note Onset Detector</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-onsetdetector</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Note Onset Detector analyses a single channel of audio and estimates
 the onset times of notes within the music &ndash; that is, the times at
 which notes and other audible events begin.
</p>
<p>It calculates an onset likelihood function for each spectral frame,
 and picks peaks in a smoothed version of this function.  The plugin is
 non-causal, returning all results at the end of processing.
</p>
<h3>Parameters</h3>

<p><b>Onset Detection Function Type</b> &ndash; The method used to calculate the
 onset likelihood function.  The most versatile method is the default,
 "Complex Domain" (see reference, Duxbury et al 2003).  "Spectral
 Difference" may be appropriate for percussive recordings, "Phase
 Deviation" for non-percussive music, and "Broadband Energy Rise" (see
 reference, Barry et al 2005) for identifying percussive onsets in
 mixed music.
</p>
<p><b>Onset Detector Sensitivity</b> &ndash; Sensitivity level for peak detection
 in the onset likelihood function.  The higher the sensitivity, the
 more onsets will (rightly or wrongly) be detected.  The peak picker
 does not have a simple threshold level; instead, this parameter
 controls the required "steepness" of the slopes in the smoothed
 detection function either side of a peak value, in order for that peak
 to be accepted as an onset.
</p>
<p><b>Adaptive Whitening</b> &ndash; This option evens out the temporal and
 frequency variation in the signal, which can yield improved
 performance in onset detection, for example in audio with big
 variations in dynamics.
</p>
<h3>Outputs</h3>

<p><b>Note Onsets</b> &ndash; The detected note onset times, returned as a single
 feature with timestamp but no value for each detected note.
</p>
<p><b>Onset Detection Function</b> &ndash; The raw note onset likelihood function
 that was calculated as the first step of the detection process.
</p>
<p><b>Smoothed Detection Function</b> &ndash; The note onset likelihood function
 following median filtering.  This is the function from which
 sufficiently steep peak values are picked and classified as onsets.
</p>
<h3>References and Credits</h3>

<p><b>Basic detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
 Proceedings of the 6th Conference on Digital Audio Effects
 (DAFx-03). London, UK. September 2003.
</p>
<p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
 Proceedings of the International Computer Music Conference (ICMC'07),
 August 2007.
</p>
<p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
</p>
<p>The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan
 Pablo Bello and Christian Landone.
</p>

<a name="qm-tempotracker"></a><h2>2. Tempo and Beat Tracker</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-tempotracker</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Tempo and Beat Tracker analyses a single channel of audio and
 estimates the positions of metrical beats within the music (the
 equivalent of a human listener tapping their foot to the beat).
</p>
<h3>Parameters</h3>

<p><b>Beat Tracking Method</b> &ndash; The method used to track beats.  The default, "New", uses a hybrid of the "Old" two-state beat tracking model
(see reference Davies 2007) and a dynamic programming method (see reference
Ellis 2007). A more detailed description is given below within the Bar and
Beat Tracker plugin. </p>

<p><b>Onset Detection Function Type</b> &ndash; The algorithm used to calculate the
 onset likelihood function.  The most versatile method is the default,
 "Complex Domain" (see reference, Duxbury et al 2003).  "Spectral
 Difference" may be appropriate for percussive recordings, "Phase
 Deviation" for non-percussive music, and "Broadband Energy Rise" (see
 reference, Barry et al 2005) for identifying percussive onsets in
 mixed music.
</p>
<p><b>Adaptive Whitening</b> &ndash; This option evens out the temporal and
 frequency variation in the signal, which can yield improved
 performance in onset detection, for example in audio with big
 variations in dynamics.
</p>
<h3>Outputs</h3>

<p><b>Beats</b> &ndash; The estimated beat locations, returned as a single feature,
 with timestamp but no value, for each beat, labelled with the
 corresponding estimated tempo at that beat.
</p>
<p><b>Onset Detection Function</b> &ndash; The raw note onset likelihood function
 used in beat estimation.
</p>
<p><b>Tempo</b> &ndash; The estimated tempo, returned as a feature each time the
 estimated tempo changes, with a single value for the tempo in beats
 per minute.
</p>
<h3>References and Credits</h3>

<p><b>Beat tracking method</b>: M. E. P. Davies and M. D. Plumbley.
 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent beat tracking of musical audio</a></i>. In IEEE
 Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3,
 pp. 1009-1020, 2007;<br>M. E. P. Davies and M. D. Plumbley.
 <i><a href="http://www.elec.qmul.ac.uk/people/markp/2005/DaviesPlumbley05-icassp.pdf">Beat Tracking With A Two State Model</a></i>. In Proceedings of the IEEE
 International Conference on Acoustics, Speech and Signal Processing
 (ICASSP 2005), Vol. 3, pp. 241-244, Philadelphia, USA, March 19-23, 2005;
<br>D. P. W. Ellis. <i>Beat Tracking by Dynamic
 Programming</i>. In Journal of New Music Research. Vol. 36, No. 1,
 pp. 51-60, 2007.
</p>
<p><b>Onset detection methods</b>: C. Duxbury, J. P. Bello, M. Davies and
 M. Sandler, <i><a href="http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx81.pdf">Complex domain Onset Detection for Musical Signals</a></i>. In
 Proceedings of the 6th Conference on Digital Audio Effects
 (DAFx-03). London, UK. September 2003.
</p>
<p><b>Adaptive whitening</b>: D. Stowell and M. D. Plumbley, <i><a href="http://www.elec.qmul.ac.uk/digitalmusic/papers/2007/StowellPlumbley07-icmc.pdf">Adaptive whitening for improved real-time audio onset detection</a></i>. In
 Proceedings of the International Computer Music Conference (ICMC'07),
 August 2007.
</p>
<p><b>Percussion onset detector</b>: D. Barry, D. Fitzgerald, E. Coyle and
 B. Lawlor, <i><a href="http://eleceng.dit.ie/papers/15.pdf">Drum Source Separation using Percussive Feature Detection and Spectral Modulation</a></i>. ISSC 2005.
</p>
<p>The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies
 and Christian Landone.
</p>


<a name="qm-barbeattracker"></a><h2>3. Bar and Beat Tracker</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-barbeattracker</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>Bar and Beat Tracker analyses a single channel of audio and
 estimates the positions of bar lines and the resulting counted
 metrical beat positions within the music (where the first beat of
 each bar is "1", the equivalent of counting in time to the music).
 It is closely related to the <a href="#qm-tempotracker">Tempo and
 Beat Tracker</a>, producing the same results for beat position as
 that plugin's "New" beat tracking method.

</p>

<h3>Method</h3>

<p>The plugin first calculates an onset detection function using the
"Complex Domain" method (see <a href="#qm-tempotracker">Tempo and Beat
Tracker</a>).</p>

<p>The beat tracking method performs two passes over the onset
detection function, first to estimate the tempo contour, and then
given the tempo, to recover the beat locations.</p>

<p>To identify the tempo, the onset detection function is partitioned
into 6-second frames with a 1.5-second increment. The autocorrelation
function of each 6-second frame is computed and then passed through a
perceptually weighted comb filterbank (see reference Davies 2007). The
successive comb filterbank output signals are grouped together into a
matrix of observations of periodicity through time. The best path of
periodicity through these observations is found using the Viterbi
algorithm, where the transition matrix is defined as a diagonal
Gaussian.</p>
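
<p>The framing and autocorrelation steps can be sketched in Python as
follows.  This is an illustration only: <code>df_rate</code> (the number
of detection function values per second) and all function names are
assumptions, and the comb filterbank and Viterbi stages are omitted.</p>

```python
def frame_signal(df, df_rate, frame_seconds=6.0, hop_seconds=1.5):
    """Split a detection function into overlapping fixed-length frames."""
    frame_len = int(round(frame_seconds * df_rate))
    hop = max(1, int(round(hop_seconds * df_rate)))
    frames = []
    start = 0
    while len(df) - start >= frame_len:
        frames.append(df[start:start + frame_len])
        start += hop
    return frames


def autocorrelation(frame):
    """Unnormalised autocorrelation for lags 0 .. len(frame) - 1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(n)]


def dominant_lag(frame):
    """Non-zero lag with the largest autocorrelation value."""
    acf = autocorrelation(frame)
    return max(range(1, len(acf)), key=lambda lag: acf[lag])


# Hypothetical detection function: an impulse every 4 values, 2 values per second
df = [1.0, 0.0, 0.0, 0.0] * 9
frames = frame_signal(df, df_rate=2)
lag = dominant_lag(frames[0])
```

<p>In this toy input the strongest periodicity is at a lag of 4 values,
matching the spacing of the impulses.</p>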

<p>Given the estimates of periodicity, the beat locations are recovered
by applying the dynamic programming algorithm (see reference Ellis
2007). This process involves the calculation of a recursive cumulative
score function and backtrace signal. The cumulative score indicates
the likelihood of a beat existing at each sample of the onset
detection function input, and the backtrace gives the location of the
best previous beat given this point in time. Once the cumulative score
and backtrace have been calculated for the whole input signal, the
best path through beat locations is found by recursively sampling the
backtrace signal from the end of the input signal back to the
beginning.  See reference Stark et al. 2009 for a description of the
real-time implementation of the beat tracking algorithm.</p>
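
<p>The cumulative score and backtrace can be sketched in Python as below.
This is a simplified illustration in the spirit of Ellis 2007, not the
plugin's implementation: it assumes a single fixed beat period (the
plugin follows a time-varying tempo estimate), and the function name,
the <code>alpha</code> weight and the search range of half a period to
two periods are assumptions.</p>

```python
import math


def track_beats(df, period, alpha=0.9):
    """Return beat indices for detection function df.

    period is the assumed beat period in samples (at least 2).
    """
    n = len(df)
    score = list(df)       # cumulative score, starts as the local onset strength
    backlink = [-1] * n    # best previous beat for each sample, -1 for none
    for t in range(n):
        best_score, best_prev = 0.0, -1
        first = max(0, t - 2 * period)
        last = t - period // 2
        for prev in range(first, last + 1):
            # Penalise gaps that deviate from the expected period
            penalty = -alpha * math.log((t - prev) / period) ** 2
            candidate = score[prev] + penalty
            if candidate > best_score:
                best_score, best_prev = candidate, prev
        if best_prev != -1:
            score[t] = df[t] + best_score
            backlink[t] = best_prev
    # Start from the best score within the final period and walk backwards
    t = max(range(max(0, n - period), n), key=lambda i: score[i])
    beats = []
    while t != -1:
        beats.append(t)
        t = backlink[t]
    beats.reverse()
    return beats


# Hypothetical detection function with an onset every 4 samples
df = [1.0 if i % 4 == 2 else 0.0 for i in range(40)]
beats = track_beats(df, period=4)
```

<p>A predecessor is only linked when doing so improves the score, so
the chain of beats ends cleanly at the first strong onset rather than
running back to sample zero.</p>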

<p>Once the beat locations have been identified, the plugin makes a
second pass over the input audio signal, partitioning it into beat
synchronous frames. The audio within each beat frame is down-sampled
to give a new sampling frequency of 2.8 kHz. A beat-synchronous
spectral representation is then calculated within each frame, from
which a measure of beat spectral difference is calculated using
Jensen-Shannon divergence. The bar boundaries are identified as those
beat transitions leading to the most consistent spectral change given
the specified number of beats per bar.</p>
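
<p>For reference, the Jensen-Shannon divergence between two spectra can
be computed as in the Python sketch below.  The function names are
illustrative, and normalising each spectrum to sum to one is an
assumption; how the plugin prepares its beat-synchronous spectra is not
described here.</p>

```python
import math


def normalise(spectrum):
    """Scale non-negative values so that they sum to one."""
    total = sum(spectrum)
    return [v / total for v in spectrum]


def kl_divergence(p, q):
    """Kullback-Leibler divergence, skipping zero-probability bins of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(spectrum_a, spectrum_b):
    """Symmetric divergence between two spectra, from 0 up to log(2)."""
    p = normalise(spectrum_a)
    q = normalise(spectrum_b)
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)
```

<p>Identical spectra give a divergence of zero, and spectra with no
energy in common give the maximum value of log(2) in natural-log
units.</p>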

<h3>Parameters</h3>

<p><b>Beats per Bar</b> &ndash; The number of beats per bar (or measure).  The
plugin assumes that the number of beats per bar is fixed throughout
the music.
</p>
<h3>Outputs</h3>

<p><b>Beats</b> &ndash; The estimated beat locations, returned as a single feature,
 with timestamp but no value, for each beat, labelled with the
 number of that beat within the bar (e.g. consecutively 1, 2, 3, 4 for 4 beats to the bar).
</p>
<p><b>Bars</b> &ndash; The estimated bar line locations, returned as a single feature,
 with timestamp but no value, for each bar.
</p>
<p><b>Beat Count</b> &ndash; The estimated beat locations, returned as a single feature,
 with timestamp and a value corresponding to the
 number of that beat within the bar.  This is similar to the Beats output except that it returns a counting function rather than a series of instants.
</p>
<p><b>Beat Spectral Difference</b> &ndash; The new-bar likelihood function used in bar line estimation.
</p>

<h3>References and Credits</h3>

<p><b>Beat tracking method</b>: A. M. Stark, M. E. P. Davies and
 M. D. Plumbley. <i>Real-time beat-synchronous analysis of musical
 audio</i>. To appear in Proceedings of 12th International Conference
 on Digital Audio Effects (DAFx). 2009;<br>M. E. P. Davies and
 M. D. Plumbley.  <i><a
 href="http://www.elec.qmul.ac.uk/people/markp/2007/DaviesPlumbley07-taslp.pdf">Context-dependent
 beat tracking of musical audio</a></i>. In IEEE Transactions on
 Audio, Speech and Language Processing. Vol. 15, No. 3, pp. 1009-1020,
 2007;<br>D. P. W. Ellis. <i>Beat Tracking by Dynamic
 Programming</i>. In Journal of New Music Research. Vol. 36, No. 1,
 pp. 51-60, 2007.</p>

<p><b>Bar finding method</b>: M. E. P. Davies and M. D. Plumbley. <i>A
spectral difference approach to extracting downbeats in musical
audio</i>. In Proceedings of 14th European Signal Processing Conference
(EUSIPCO), Italy, 2006.</p>

<p>The Bar and Beat Tracker Vamp plugin was written by Matthew Davies and Adam Stark.
</p>



<a name="qm-keydetector"></a><h2>4. Key Detector</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-keydetector</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Key Detector analyses a single channel of audio and continuously
 estimates the key of the music by comparing the degree to which a
 block-by-block chromagram correlates to the stored key profiles for
 each major and minor key.
</p>
<p>The key profiles are drawn from analysis of Book I of the
 Well-Tempered Clavier by J. S. Bach, recorded at A=440 equal temperament.
</p>
<h3>Parameters</h3>

<p><b>Tuning Frequency</b> &ndash; The frequency of concert A in the music under
 analysis.
</p>
<p><b>Window Length</b> &ndash; The number of chroma analysis frames taken into
 account for key estimation.  This controls how eager the key detector
 will be to return short-duration tonal changes as new key changes (the
 shorter the window, the more likely it is to detect a new key change).
</p>
<h3>Outputs</h3>

<p><b>Tonic Pitch</b> &ndash; The tonic pitch of each estimated key change,
 returned as a single-valued feature at the point where the key change
 is detected, with value counted from 1 to 12 where C is 1, C# or Db is
 2, and so on up to B which is 12.
</p>
<p><b>Key Mode</b> &ndash; The major or minor mode of the estimated key, where
 major is 0 and minor is 1.
</p>
<p><b>Key</b> &ndash; The estimated key for each key change, returned as a
 single-valued feature at the point where the key change is detected,
 with value counted from 1 to 24 where 1-12 are the major keys and
 13-24 are the minor keys, such that C major is 1, C# major is 2, and
 so on up to B major which is 12; then C minor is 13, Db minor is 14,
 and so on up to B minor which is 24.
</p>
<p><b>Key Strength Plot</b> &ndash; A grid representing the ongoing key
 "probability" throughout the music.  This is returned as a feature for
 each chroma frame, containing 25 bins.  Bins 1-12 are the major keys
 from C upwards; bins 14-25 are the minor keys from C upwards.  The
 13th bin is unused: it just provides space between the first and
 second halves of the feature if displayed in a single plot.
</p>
<p>The outputs are also labelled with pitch or key as text.
</p>
<h3>References and Credits</h3>

<p><b>Method</b>: see K. Noland and M. Sandler. <i><a href="http://www.aes.org/e-lib/browse.cfm?elib=14140">Signal Processing Parameters for Tonality Estimation</a></i>. In Proceedings of Audio Engineering Society
 122nd Convention, Vienna, 2007.
</p>
<p>The Key Detector Vamp plugin was written by Katy Noland and Christian
 Landone.
</p>

<a name="qm-tonalchange"></a><h2>5. Tonal Change</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-tonalchange</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Tonal Change analyses a single channel of audio, detecting harmonic
 changes such as chord boundaries.
</p>
<h3>Parameters</h3>

<p><b>Gaussian smoothing</b> &ndash; The window length for the internal smoothing
 operation, in chroma analysis frames.  This controls how eager the
 tonal change detector will be to identify very short-term tonal
 changes.  The default value of 5 is quite short, and may lead to more
 (not always meaningful) results being returned; for many purposes a
 larger value, closer to the maximum of 20, may be appropriate.
</p>
<p><b>Chromagram minimum pitch</b> &ndash; The MIDI pitch value (0-127) of the
 minimum pitch included in the internal chromagram analysis.
</p>
<p><b>Chromagram maximum pitch</b> &ndash; The MIDI pitch value (0-127) of the
 maximum pitch included in the internal chromagram analysis.
</p>
<p><b>Chromagram tuning frequency</b> &ndash; The frequency of concert A in the
 music under analysis.
</p>
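<p>To relate the MIDI pitch limits to frequencies in hertz, the usual
 equal-temperament conversion can be used.  The sketch below assumes the
 common convention that MIDI pitch 69 is concert A; the function name is
 illustrative and the plugin's internal conversion is not shown here.
</p>

```python
def midi_to_hz(pitch, tuning_hz=440.0):
    """Equal-tempered frequency of a MIDI pitch, with pitch 69 as concert A."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)


# Frequency range covered by hypothetical pitch limits of 36 and 96
low_hz = midi_to_hz(36)
high_hz = midi_to_hz(96)
```

<p>Changing the tuning frequency scales every pitch by the same ratio,
 so a recording tuned to A=415 shifts the whole analysis range down
 accordingly.
</p>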
<h3>Outputs</h3>

<p><b>Transform to 6D Tonal Content Space</b> &ndash; A representation of the
 musical content in a six-dimensional tonal space onto which the
 algorithm maps 12-bin chroma vectors extracted from the audio.
</p>
<p><b>Tonal Change Detection Function</b> &ndash; A function representing the
 estimated likelihood of a tonal change occurring in each spectral
 frame.
</p>
<p><b>Tonal Change Positions</b> &ndash; The resulting estimated positions of tonal
 changes.
</p>
<h3>References and Credits</h3>

<p><b>Method</b>: C. A. Harte, M. Gasser, and M. Sandler. <i><a href="http://portal.acm.org/citation.cfm?id=1178723.1178727">Detecting harmonic change in musical audio</a></i>.  In Proceedings of the 1st ACM workshop on
 Audio and Music Computing Multimedia, Santa Barbara, 2006.
</p>
<p>The Tonal Change Vamp plugin was written by Chris Harte and Martin
 Gasser.
</p>


<a name="qm-adaptivespectrogram"></a><h2>6. Adaptive Spectrogram</h2>

<p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-adaptivespectrogram</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>Adaptive Spectrogram produces a composite spectrogram from a set of
short-time Fourier transform spectrograms at differing resolutions.
Values are selected from these spectrograms by repeated subdivision by
time and frequency in order to maximise an entropy function across
each column.</p>

<h3>Parameters</h3>

<p><b>Number of resolutions</b> &ndash; The number of distinct
resolutions to calculate and use.  The resolutions will be consecutive
powers of two starting from the smallest resolution specified.</p>

<p><b>Smallest resolution</b> &ndash; The smallest of the set of
resolutions to use.</p>

<p><b>Omit alternate resolutions</b> &ndash; Causes the plugin to
ignore alternate resolutions (i.e. the smallest resolution multiplied
by 2, 8, 32, etc) when composing a spectrogram.  The smallest
resolution specified, and its multiples by 4, 16, etc as applicable,
will be retained.  The total number of resolutions actually included
in the resulting spectrogram will therefore be N/2 (for even N) or
(N+1)/2 (for odd N) where N is the value of the "number of
resolutions" parameter.  This permits a wider range of resolutions to
be included with less processing, at obvious cost in quality.</p>

<p><b>Multi-threaded processing</b> &ndash; Enables multi-threading of
the spectrogram calculation.  This usually results in somewhat faster
processing where multiple CPU cores are available.</p>

<p>As an example of the resolution parameters, if the "number of
resolutions" is set to 5, "smallest resolution" to 128, and "omit
alternate resolutions" is not used, the composite spectrogram will be
calculated using spectrograms from 128, 256, 512, 1024, and 2048 point
short-time Fourier transforms (with 50% overlap in each case).  With
"omit alternate resolutions" set, the same parameters would result in
spectrograms from 128, 512, and 2048 point STFTs being used.</p>
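
<p>The same arithmetic can be written as a short Python helper.  The
function name and argument names are illustrative and are not plugin
identifiers.</p>

```python
def stft_resolutions(count, smallest, omit_alternate=False):
    """List the STFT sizes implied by the resolution parameters."""
    sizes = [smallest * 2 ** i for i in range(count)]
    # Omitting alternates keeps the smallest size and its multiples by 4, 16, ...
    return sizes[::2] if omit_alternate else sizes


all_sizes = stft_resolutions(5, 128)
reduced_sizes = stft_resolutions(5, 128, omit_alternate=True)
```

<p>With the values from the example above, <code>all_sizes</code> holds
the five sizes from 128 to 2048 and <code>reduced_sizes</code> holds
128, 512 and 2048.</p>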

<h3>References and Credits</h3>

<p><b>Method</b>: X. Wen and M. Sandler. <i><a href="http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&amp;id=ISPECX000003000001000051000001">Composite spectrogram using multiple Fourier transforms</a></i>.  IET Signal Processing, 3(1):51-63, 2009.
</p>

<p>The Adaptive Spectrogram Vamp plugin was written by Wen Xue and Chris Cannam.</p>

<a name="qm-transcription"></a><h2>7. Polyphonic Transcription</h2>

<p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-transcription</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>The Polyphonic Transcription plugin estimates a note transcription
using MIDI pitch values from its input audio, returning a feature for
each note (with timestamp and duration) whose value is the MIDI pitch
number.  Velocity is not estimated.</p>

<p>Although the published method is described as real-time, the
implementation used in this plugin is non-causal: it buffers its
entire input and does all the real work in a single pass once the
input has been received, which makes it very memory intensive.
However, it is relatively fast (faster than real-time) compared to
other polyphonic transcription methods.</p>

<p>The plugin works best at a 44.1 kHz input sample rate, and is tuned for
piano and guitar music.</p>


<h3>References and Credits</h3>

<p><b>Method</b>: R. Zhou and J. D. Reiss. <i>A Real-Time Polyphonic Music Transcription System</i>. In Proceedings of the Fourth Music Information Retrieval Evaluation eXchange (MIREX), Philadelphia, USA, 2008;<br>R. Zhou and J. D. Reiss. <i>A Real-Time Frame-Based Multiple Pitch Estimation Method Using the Resonator Time Frequency Image</i>. Third Music Information Retrieval Evaluation eXchange (MIREX), Vienna, Austria, 2007.</p>

<p>The Polyphonic Transcription Vamp plugin was written by Ruohua Zhou.</p>


<a name="qm-segmenter"></a><h2>8. Segmenter</h2>

<p><b>System identifier</b> &ndash; <code>vamp:qm-vamp-plugins:qm-segmenter</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Segmenter divides a single channel of music up into structurally
 consistent segments.  It returns a numeric value (the segment type)
 for each moment at which a new segment starts.
</p>
<p>For music with clearly tonally distinguishable sections such as verse,
 chorus, etc., segments with the same type may be expected to be
 similar to one another in some structural sense.  For example,
 repetitions of the chorus are likely to share a segment type.
</p>
<p>The plugin only attempts to identify similar segments; it does not
 attempt to label them.  For example, it makes no attempt to tell you
 which segment is the chorus.
</p>
<p>Note that this plugin does a substantial amount of processing after
 receiving all of the input audio data, before it produces any results.
</p>
<h3>Method</h3>

<p>The method relies upon structural/timbral similarity to obtain the
 high-level song structure.  This is based on the assumption that the
 distributions of timbre features are similar over corresponding
 structural elements of the music.
</p>
<p>The algorithm works by obtaining a frequency-domain representation of
 the audio signal using a Constant-Q transform, a Chromagram or
 Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the
 particular feature is selectable as a parameter).  The extracted
 features are normalised in accordance with the MPEG-7 standard (NASE
 descriptor), which means the spectrum is converted to decibel scale
 and each spectral vector is normalised by the RMS energy envelope.
 The value of this envelope is stored for each processing block of
 audio. This is followed by the extraction of 20 principal components
 per block using PCA, yielding a sequence of 21 dimensional feature
 vectors where the last element in each vector corresponds to the
 energy envelope.
</p>
<p>A 40-state Hidden Markov Model is then trained on the whole sequence
 of features, with each state of the HMM corresponding to a specific
 timbre type. This process partitions the timbre-space of a given track
 into 40 possible types. The important assumption of the model is that
 the distribution of these features remains consistent over a structural
 segment. After training and decoding the HMM, the song is assigned a
 sequence of timbre-features according to specific timbre-type
 distributions for each possible structural segment.
</p>
<p>The segmentation itself is computed by clustering timbre-type
 histograms. A series of histograms is created over a sliding window
 and grouped into M clusters by an adapted soft k-means
 algorithm. Each of these clusters corresponds to a specific
 segment-type of the analysed song. Reference histograms, iteratively
 updated during clustering, describe the timbre distribution for each
 segment. The segmentation arises from the final cluster assignments.
</p>
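<p>The histogram step can be pictured with a small Python sketch.  It
 assumes the decoded HMM state sequence is already available as a list of
 integer labels; the function name, the one-step hop and the
 normalisation are assumptions, and the soft k-means clustering itself
 is not shown.
</p>

```python
from collections import Counter


def sliding_histograms(states, num_types, window):
    """Normalised histogram of timbre-type labels at each window position."""
    histograms = []
    for start in range(len(states) - window + 1):
        counts = Counter(states[start:start + window])
        histograms.append([counts[s] / window for s in range(num_types)])
    return histograms


# Hypothetical decoded state sequence with 3 timbre types
states = [0, 0, 1, 1, 2, 2]
histograms = sliding_histograms(states, num_types=3, window=4)
```

<p>Each histogram describes the mix of timbre types around one point in
 the track; windows falling within the same structural section are
 expected to produce similar histograms.
</p>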
<h3>Parameters</h3>

<p><b>Number of segment-types</b> &ndash; The maximum number of clusters
 (segment-types) to be returned.  The default is 10. Unlike many
 clustering algorithms, the constrained clustering used in this plugin
 does not produce too many clusters or vary significantly even if this
 is set too high. However, this parameter can be useful for limiting
 the number of expected segment-types.
</p>
<p><b>Feature Type</b> &ndash; The type of spectral feature used for segmentation.  The available features are:</p>
<ul>
<li>"Hybrid", the default, which uses a Constant-Q transform (see <a href="#qm-constantq">related plugin</a>): this is generally effective for modern studio recordings;</li>
<li>"Chromatic", using a chromagram derived from the Constant-Q feature (see <a href="#qm-chromagram">related plugin</a>): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in sound;</li>
<li>"Timbral", using Mel-Frequency Cepstral Coefficients (see <a href="#qm-mfcc">related plugin</a>), which is more likely to result in classification by instrumentation rather than musical content.</li>
</ul>
<p><b>Minimum segment duration</b> &ndash; The approximate expected minimum
 duration for a segment, from 1 to 15 seconds.  Changing this parameter
 may help the plugin to find musical sections rather than just
 following changes in the sound of the music, and also avoid wasting a
 segment-type cluster for timbrally distinct but too-short segments.
 The default of 4 seconds usually produces good results.
</p>
<h3>Outputs</h3>

<p><b>Segmentation</b> &ndash; The estimated segment boundaries, returned as a
 single feature with one value at each segment boundary, with the value
 representing the segment type number for the segment starting at that
 boundary.
</p>
<h3>References and Credits</h3>

<p><b>Method</b>: M. Levy and M. Sandler. <i><a href="http://ieeexplore.ieee.org/iel5/10376/4432632/04432648.pdf?arnumber=4432648">Structural segmentation of musical audio by constrained clustering</a></i>. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.
</p>
<p>Note that this plugin does not implement the beat-synchronous aspect
 of the segmentation method described in the paper.
</p>
<p>The Segmenter Vamp plugin was written by Mark Levy.  Thanks to George
 Fazekas for providing much of this documentation.
</p>
<a name="qm-similarity"></a><h2>9. Similarity</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-similarity</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Similarity treats each channel of its audio input as a separate
 "track", and estimates how similar the tracks are to one another using
 a selectable similarity measure.
</p>
<p>The plugin also returns the intermediate data used as a basis of the
 similarity measure; it can therefore be used on a single channel of
 input (with the resulting intermediate data then being applied in some
 other similarity or clustering algorithm, for example) if desired, as
 well as with multiple inputs.
</p>
<p>Because of the way this plugin handles multiple inputs, by assuming
 that each channel represents a separate piece of music, it may not be
 appropriate for use directly in a general-purpose host (unless you
 actually want to do something like compare two stereo channels for
 timbral similarity, which is unlikely).
</p>
<h3>Parameters</h3>

<p><b>Feature Type</b> &ndash; The underlying audio feature used for the similarity
 measure.  The available features are:
</p>
<ul>
<li>"Timbre", in which the distance
 between tracks is a symmetrised Kullback-Leibler divergence between
 Gaussian-modelled MFCC means and variances across each track, for the
 first 20 MFCCs including C0 (see <a href="#qm-mfcc">related plugin</a>);</li>
<li>"Chroma", which uses Kullback-Leibler divergence of
 mean chroma histogram (see <a href="#qm-chromagram">related plugin</a>);</li>
<li>"Rhythm", using the cosine distance between
 "beat spectrum" measures derived from a short sampled section of the
 track;</li>
<li>and combined "Timbre and Rhythm" and "Chroma and Rhythm"
 features.</li>
</ul>
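<p>The two distance measures named above can be sketched as follows.
 This is an illustration only, not the plugin's code: it assumes each
 track is modelled by one Gaussian per feature bin (a mean and a
 variance, with no covariance between bins), and that the symmetrised
 divergence is the plain sum of the two directed divergences.  The
 function names are hypothetical.
</p>
<pre>
```python
import math

def symmetrised_kl(mean_a, var_a, mean_b, var_b):
    """KL(a||b) + KL(b||a) for two diagonal-covariance Gaussians.

    The log-variance terms of the two directed divergences cancel,
    leaving only variance ratios and the squared mean difference.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff2 = (ma - mb) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0
                        + diff2 * (1.0 / va + 1.0 / vb))
    return total

def cosine_distance(x, y):
    """One minus the cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm
```
</pre>
<p>Both functions return zero for identical inputs, which matches the
 zero self-distance the plugin documents for its distance outputs.
</p>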
<h3>Outputs</h3>

<p><b>Distance Matrix</b> &ndash; A matrix of the distance measures between input
 channels, returned as a series of vector features timestamped at
 one-second intervals.  The distance from channel i to channel j
 appears as the j'th bin of the feature at time i.
</p>
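<p>Reassembling the matrix from this output might look like the sketch
 below.  It assumes the host hands back each feature as a (time in
 seconds, list of bin values) pair; that representation and the
 function name are hypothetical, chosen only for illustration.
</p>
<pre>
```python
def distance_matrix(features):
    """Build a square matrix from (time_seconds, bin_values) features.

    Row i is the feature timestamped at i seconds, so matrix[i][j]
    is the distance from channel i to channel j.
    """
    rows = {int(round(t)): list(values) for t, values in features}
    return [rows[i] for i in range(len(rows))]
```
</pre>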
<p><b>Distance from First Channel</b> &ndash; A single vector feature, timestamped
 at time zero, containing the distances between the first input channel
 and each of the input channels (including the first channel itself at
 bin 0, which should have zero distance).
</p>
<p><b>Ordered Distances from First Channel</b> &ndash; A pair of vector features,
 at times 0 and 1 second.  The feature at time 0 contains the 1-based
 indices of the input channels in the order of similarity to the first
 input channel (so its first bin should always contain 1, as the first
 channel is most similar to itself).  The feature at time 1 contains,
 in bin n, the distance between the first input channel and the channel
 with index found at bin n of the feature at time 0.
</p>
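<p>The relationship between the two features of this output can be
 sketched by deriving them from a row of distances.  This is an
 illustration under stated assumptions, not the plugin's code: the
 function name is hypothetical, and ties are broken by channel order,
 which the documentation does not specify.
</p>
<pre>
```python
def ordered_distances(distances_from_first):
    """Return (indices, distances) sorted by increasing distance.

    indices: 1-based channel numbers (the feature at time 0).
    distances: matching distances (the feature at time 1).
    """
    order = sorted(range(len(distances_from_first)),
                   key=lambda i: distances_from_first[i])
    indices = [i + 1 for i in order]
    distances = [distances_from_first[i] for i in order]
    return indices, distances
```
</pre>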
<p><b>Feature Means</b> &ndash; A series of vector features containing the mean
 values of each of the feature bins across the duration of each of the
 input channels.  This output returns one feature for each input
 channel, timestamped at one-second intervals.  The number of bins for
 each feature depends on the feature type; it will be 20 for MFCC
 features and 12 for chroma features.  No features will be returned on
 this output if the feature type is purely rhythmic.
</p>
<p><b>Feature Variances</b> &ndash; As for Feature Means, but containing the
 variances of the feature bins rather than their means.
</p>
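<p>As a rough sketch of what the means and variances outputs summarise,
 the per-bin statistics over a sequence of feature frames can be
 computed as below.  The population variance (dividing by the number of
 frames) is an assumption; the documentation does not say which
 estimator the plugin uses, and the function name is hypothetical.
</p>
<pre>
```python
def feature_means_and_variances(frames):
    """Per-bin mean and population variance over a list of frames.

    frames: non-empty list of equal-length lists of bin values.
    """
    n = len(frames)
    bins = range(len(frames[0]))
    means = [sum(f[b] for f in frames) / n for b in bins]
    variances = [sum((f[b] - means[b]) ** 2 for f in frames) / n
                 for b in bins]
    return means, variances
```
</pre>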
<p><b>Beat Spectra</b> &ndash; A series of vector features containing the rhythmic
 autocorrelation profiles (beat spectra) for each of the input
 channels.  This output returns one 512-bin feature for each input
 channel, timestamped at one-second intervals.  No features will be
 returned on this output if the feature type contains no rhythm
 component.
</p>
<h3>References and Credits</h3>

<p><b>Timbral similarity</b>: M. Levy and M. Sandler. <i><a href="http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf">Lightweight measures for timbral similarity of musical audio</a></i>. In Proceedings of the 1st
 ACM workshop on Audio and Music Computing Multimedia, Santa Barbara,
 2006.
</p>
<p><b>Combined rhythmic and timbral similarity</b>: K. Jacobson. <i><a href="http://ismir2006.ismir.net/PAPERS/ISMIR0696_Paper.pdf">A Multifaceted Approach to Music Similarity</a></i>. In Proceedings of the
 Seventh International Conference on Music Information Retrieval
 (ISMIR), 2006.
</p>
<p>The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and
 Chris Cannam.
</p>


<a name="qm-dwt"></a><h2>10. Discrete Wavelet Transform</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-dwt</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>

<p>The Discrete Wavelet Transform plugin performs the forward DWT on the
signal. The wavelet coefficients are derived from a fast segmented DWT
algorithm without block end effects. The DWT can be performed with
various functions from a selection of wavelets up to the 16th scale.</p>

<p>The wavelet coefficients are returned as feature columns at a rate of
half the sample rate of the signal to be analysed. To simulate
multiresolution in the layer data table, the coefficient values at
higher scales are copied multiple times according to the number of the
scale. For example, at scale 2 each value will appear twice, at scale
3 four times, and at scale 4 eight times, in order to simulate the
lower resolution at higher scales.</p>
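<p>The repetition pattern in the example above doubles with each scale.
 The sketch below reads it as 2<sup>s-1</sup> copies at scale s, which
 is an inference from the three cases given (extended to a single copy
 at scale 1) rather than something stated by the plugin's authors.  The
 function names are hypothetical.
</p>
<pre>
```python
def repeats_for_scale(scale):
    """Number of output columns that share one coefficient at this scale."""
    return 2 ** (scale - 1)

def expand_scale(coefficients, scale):
    """Repeat each coefficient so every scale fills the same number of columns."""
    expanded = []
    for c in coefficients:
        expanded.extend([c] * repeats_for_scale(scale))
    return expanded
```
</pre>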

<h3>Parameters</h3>

<p><b>Scales</b> &ndash; Adjusts the number of scales of the DWT. The
processing block size needs to be set to at least 2<sup>n</sup>, where n
is the number of scales.</p>
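<p>The block size constraint amounts to a simple check, sketched here
 with hypothetical function names.  For example, 10 scales need a block
 of at least 1024 samples.
</p>
<pre>
```python
def minimum_block_size(scales):
    """Smallest processing block size allowed for this number of scales."""
    return 2 ** scales

def block_size_ok(block_size, scales):
    """True if the block is large enough for the requested scales."""
    return block_size >= minimum_block_size(scales)
```
</pre>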

<p><b>Wavelet</b> &ndash; Selects the wavelet function to be used for
the transform.  Wavelets from the following families are available:
Daubechies, Symlets, Coiflets, Biorthogonal, Meyer.</p>

<h3>References and Credits</h3>

<p><b>Principles</b>: S. Mallat. <i>A theory for multiresolution signal decomposition: the wavelet representation</i>. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989), pp. 674-693;<br>
P. Rajmic and J. Vlach. <i>Real-Time Audio Processing via Segmented Wavelet Transform</i>. In Proceedings of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.</p>

<p>The Discrete Wavelet Transform plugin was written by Thomas Wilmering.</p>

<a name="qm-constantq"></a><h2>11. Constant-Q Spectrogram</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-constantq</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Constant-Q Spectrogram calculates a spectrogram based on a short-time
 windowed constant Q spectral transform.  This is a spectrogram in
 which the ratio of centre frequency to resolution is constant for each
 frequency bin.  The frequency bins correspond to the frequencies of
 "musical notes" rather than being linearly spaced in frequency as they
 are for the conventional DFT spectrogram.
</p>
<p>The pitch range and the number of frequency bins per octave may be
 adjusted using the plugin's parameters.  Note that the plugin's
 preferred step and block sizes are defined by these parameters, and
 the plugin will not accept any other block size than its preferred
 value.
</p>
<h3>Parameters</h3>

<p><b>Minimum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the lowest
 frequency to be included in the constant-Q transform.
</p>
<p><b>Maximum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the
 highest frequency to be included in the constant-Q transform.
</p>
<p><b>Tuning Frequency</b> &ndash; The frequency of concert A in the
 music under analysis.
</p>
<p><b>Bins per Octave</b> &ndash; The number of constant-Q transform bins to be
 computed per octave.
</p>
<p><b>Normalized</b> &ndash; Whether to normalize each output column to unit
 maximum.
</p>
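<p>How these parameters relate to bin frequencies can be sketched as
 follows.  The sketch assumes the usual equal-tempered mapping in which
 MIDI pitch 69 is concert A, geometrically spaced bin centres, and a
 bin count that includes the maximum pitch; the plugin's exact bin
 layout is not documented here, and all function names are
 hypothetical.
</p>
<pre>
```python
def midi_to_hz(pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, with pitch 69 tuned to tuning_hz."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)

def q_factor(bins_per_octave):
    """Ratio of centre frequency to bandwidth, the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def centre_frequencies(min_pitch, max_pitch, bins_per_octave=12,
                       tuning_hz=440.0):
    """Geometrically spaced bin centres from min_pitch up to max_pitch."""
    f_min = midi_to_hz(min_pitch, tuning_hz)
    n_bins = (max_pitch - min_pitch) * bins_per_octave // 12 + 1
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
```
</pre>
<p>With 12 bins per octave each bin sits on one semitone; raising the
 bins per octave narrows each bin and raises the Q factor.
</p>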
<h3>Outputs</h3>

<p><b>Constant-Q Spectrogram</b> &ndash; The calculated spectrogram, as a single
 feature per process block containing one bin for each pitch included
 in the spectrogram's range.
</p>
<h3>References and Credits</h3>

<p><b>Principle</b>: J. Brown. <i><a href="http://www.wellesley.edu/Physics/brown/pubs/cq1stPaper.pdf">Calculation of a constant Q spectral transform</a></i>. Journal of the Acoustical Society of America, 89(1):
 425-434, 1991.
</p>
<p>The Constant-Q Spectrogram Vamp plugin was written by Christian
 Landone.
</p>
<a name="qm-chromagram"></a><h2>12. Chromagram</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-chromagram</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Chromagram calculates a constant Q spectral transform (as in the
 Constant Q Spectrogram plugin) and then wraps the frequency bin values
 into a single octave, with each bin containing the sum of the
 magnitudes from the corresponding bin in all octaves.  The number of
 values in each feature vector returned by the plugin is therefore the
 same as the number of bins per octave configured for the underlying
 constant Q transform.
</p>
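<p>The wrapping step described above can be sketched as a sum over
 octaves.  This is a minimal illustration, assuming the constant-Q
 magnitudes arrive as a flat list ordered from the lowest bin upwards;
 the function name is hypothetical.
</p>
<pre>
```python
def wrap_to_chroma(cq_magnitudes, bins_per_octave=12):
    """Fold constant-Q magnitudes into one octave by summing across octaves."""
    chroma = [0.0] * bins_per_octave
    for k, magnitude in enumerate(cq_magnitudes):
        chroma[k % bins_per_octave] += magnitude
    return chroma
```
</pre>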
<p>The pitch range and the number of frequency bins per octave for the
 transform may be adjusted using the plugin's parameters.  Note that
 the plugin's preferred step and block sizes depend on these
 parameters, and the plugin will not accept any other block size than
 its preferred value.
</p>
<h3>Parameters</h3>

<p><b>Minimum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the
 lowest frequency to be included in the constant-Q transform used in
 calculating the chromagram.
</p>
<p><b>Maximum Pitch</b> &ndash; The MIDI pitch value (0-127) corresponding to the
 highest frequency to be included in the constant-Q transform used in
 calculating the chromagram.
</p>
<p><b>Tuning Frequency</b> &ndash; The frequency of concert A in the
 music under analysis.
</p>
<p><b>Bins per Octave</b> &ndash; The number of constant-Q transform bins to be
 computed per octave, and thus the total number of bins present in the
 resulting chromagram.
</p>
<p><b>Normalized</b> &ndash; Whether to normalize each output column. Normalization
 may be to unit sum or unit maximum.
</p>
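<p>The two normalization modes can be sketched as below.  This assumes
 non-negative bin values and leaves an all-zero column unchanged; how
 the plugin itself treats silent columns is not documented here, and
 the function names are hypothetical.
</p>
<pre>
```python
def normalize_unit_sum(column):
    """Scale a column so that its values add up to 1."""
    total = sum(column)
    return [v / total for v in column] if total > 0 else list(column)

def normalize_unit_max(column):
    """Scale a column so that its largest value is 1."""
    peak = max(column, default=0.0)
    return [v / peak for v in column] if peak > 0 else list(column)
```
</pre>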
<h3>Outputs</h3>

<p><b>Chromagram</b> &ndash; The calculated chromagram, as a single feature per
 process block containing the number of bins given in the bins per
 octave parameter.
</p>
<h3>References and Credits</h3>

<p>The Chromagram Vamp plugin was written by Christian Landone.
</p>
<a name="qm-mfcc"></a><h2>13. Mel-Frequency Cepstral Coefficients</h2>

<p><b>System identifier</b> &ndash;    <code>vamp:qm-vamp-plugins:qm-mfcc</code>
<br><b>RDF URI</b> &ndash; <a href="http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc">http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc</a>
<br><b>Links</b> &ndash; <a href="#">Back to top of library documentation</a> &ndash; <a href="https://code.soundsoftware.ac.uk/projects/qm-vamp-plugins/files">Download location</a>
</p>
<p>Mel-Frequency Cepstral Coefficients calculates MFCCs from a single
 channel of audio.  These coefficients, derived from a cosine transform
 of the mapping of an audio spectrum onto a frequency scale modelled on
 human auditory response, are widely used in speech recognition, music
 classification and other tasks.
</p>
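<p>The two ingredients mentioned above, a perceptual frequency scale
 and a cosine transform, can be sketched as follows.  This is not the
 plugin's implementation: the mel formula is one commonly used variant,
 the transform is an unnormalised DCT-II of log band energies, and the
 function names are hypothetical.
</p>
<pre>
```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def cepstral_coefficients(mel_energies, n_coeffs):
    """Unnormalised DCT-II of the log of positive mel band energies."""
    n = len(mel_energies)
    logs = [math.log(e) for e in mel_energies]
    return [sum(logs[m] * math.cos(math.pi * c * (m + 0.5) / n)
                for m in range(n))
            for c in range(n_coeffs)]
```
</pre>
<p>In this sketch coefficient 0 is simply the sum of the log band
 energies, so it tracks overall level rather than spectral shape.
</p>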
<h3>Parameters</h3>

<p><b>Number of Coefficients</b> &ndash; The number of MFCCs to return.  Commonly
 used values include 13 or the default 20.  This number includes C0 if
 requested (see Include C0 below).
</p>
<p><b>Power for Mel Amplitude Logs</b> &ndash; An optional power value to which the
 spectral amplitudes should be raised before applying the cosine
 transform.  Values greater than 1 may in principle reduce the
 contribution of noise to the results.  The default is 1.
</p>
<p><b>Include C0</b> &ndash; Whether to include the "zero'th" coefficient, which
 simply reflects the overall signal power across the Mel frequency
 bands.
</p>
<h3>Outputs</h3>

<p><b>Coefficients</b> &ndash; The MFCC values, returned as one vector feature per
 processing block.
</p>
<p><b>Means of Coefficients</b> &ndash; The overall means of the MFCC bins, as a
 single vector feature with time 0 that is returned when processing is
 complete.
</p>
<h3>References and Credits</h3>

<p><b>MFCCs in music</b>: See B. Logan. <i><a href="http://ismir2000.ismir.net/papers/logan_paper.pdf">Mel-Frequency Cepstral Coefficients for Music Modeling</a></i>. In Proceedings of the First International
 Symposium on Music Information Retrieval (ISMIR), 2000.
</p>
<p>The Mel-Frequency Cepstral Coefficients Vamp plugin was written by
 Nicolas Chetry and Chris Cannam.
</p>
</CONTENTS>
</body>
</html>