QM Vamp Plugins: User Documentation

# HG changeset patch
# User cannam
# Date 1227267705 0
# Node ID 16f8de0dc974ffdef5fc16ddd7b467fb5dd2ca39
# Parent c57ba57f33fa22a49a1f74462e4d9a3601683e39
* Add doc for QM plugins

diff -r c57ba57f33fa -r 16f8de0dc974 plugin-doc/qm-vamp-plugins.html
--- /dev/null  Thu Jan 01 00:00:00 1970 +0000
+++ b/plugin-doc/qm-vamp-plugins.html  Fri Nov 21 11:41:45 2008 +0000
@@ -0,0 +1,603 @@

QM Vamp Plugins

The QM Vamp Plugin set is a library of Vamp audio feature extraction plugins developed at the Centre for Digital Music at Queen Mary, University of London. These plugins are provided as a single library file, made available in binary form for Windows, OS/X, and Linux from the Centre for Digital Music's download page.

For more information about Vamp plugins, see http://www.vamp-plugins.org/ .

1. Note Onset Detector

2. Tempo and Beat Tracker

3. Key Detector

4. Tonal Change

5. Segmenter

6. Similarity

7. Constant-Q Spectrogram

8. Chromagram

9. Mel-Frequency Cepstral Coefficients

1. Note Onset Detector

System identifier – vamp:qm-vamp-plugins:qm-onsetdetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector

Note Onset Detector analyses a single channel of audio and estimates the onset times of notes within the music – that is, the times at which notes and other audible events begin.

It calculates an onset likelihood function for each spectral frame, and picks peaks in a smoothed version of this function. The plugin is non-causal, returning all results at the end of processing.

Parameters

Onset Detection Function Type – The method used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al 2005) for identifying percussive onsets in mixed music.

Onset Detector Sensitivity – Sensitivity level for peak detection in the onset likelihood function. The higher the sensitivity, the more onsets will (rightly or wrongly) be detected. The peak picker does not have a simple threshold level; instead, this parameter controls the required "steepness" of the slopes in the smoothed detection function either side of a peak value, in order for that peak to be accepted as an onset.

Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can yield improved performance in onset detection, for example in audio with big variations in dynamics.

Outputs

Note Onsets – The detected note onset times, returned as a single feature with timestamp but no value for each detected note.

Onset Detection Function – The raw note onset likelihood function that was calculated as the first step of the detection process.

Smoothed Detection Function – The note onset likelihood function following median filtering. This is the function from which sufficiently steep peak values are picked and classified as onsets.
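The relationship between the raw function, the smoothed function and the picked onsets can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function names, the window width and the `min_rise` steepness rule are assumptions chosen only to show median filtering followed by slope-based peak picking.

```python
from statistics import median


def median_smooth(values, width=3):
    """Median-filter a sequence with a centred window (shorter at the edges)."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def pick_peaks(values, min_rise=0.1):
    """Return indices of local maxima whose slopes on both sides are steep enough.

    min_rise is a hypothetical stand-in for the sensitivity control: a lower
    value accepts shallower peaks, so more onsets are reported.
    """
    peaks = []
    for i in range(1, len(values) - 1):
        rise = values[i] - values[i - 1]
        fall = values[i] - values[i + 1]
        if rise >= min_rise and fall >= min_rise:
            peaks.append(i)
    return peaks


# Illustrative detection function values, one per spectral frame.
detection_function = [0.0, 0.1, 0.3, 0.9, 0.4, 0.1, 0.0, 0.2, 0.25, 0.2, 0.0]
smoothed = median_smooth(detection_function, width=3)
onset_frames = pick_peaks(smoothed, min_rise=0.05)
```

Lowering `min_rise` in this sketch plays the role of raising the sensitivity: shallower peaks in the smoothed function start to count as onsets.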

References and Credits

Basic detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan Pablo Bello and Christian Landone.

2. Tempo and Beat Tracker

System identifier – vamp:qm-vamp-plugins:qm-tempotracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker

Tempo and Beat Tracker analyses a single channel of audio and estimates the positions of metrical beats within the music (the equivalent of a human listener tapping their foot to the beat).

Parameters

Outputs

Beats – The estimated beat locations, returned as a single feature, with timestamp but no value, for each beat, labelled with the corresponding estimated tempo at that beat.

Onset Detection Function – The raw note onset likelihood function used in beat estimation.

Tempo – The estimated tempo, returned as a feature each time the estimated tempo changes, with a single value for the tempo in beats per minute.
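The shape of the Tempo output, one value each time the tempo changes, can be illustrated from a list of beat times. The sketch below is an assumption-laden illustration only: the plugin estimates tempo with its own method, whereas this code simply converts the spacing between beats to beats per minute. The name `tempo_changes` and the `tolerance_bpm` threshold are hypothetical.

```python
def tempo_changes(beat_times, tolerance_bpm=1.0):
    """Return (time, bpm) pairs, one each time the local tempo changes.

    The local tempo is 60 divided by the interval (in seconds) to the next
    beat. A new pair is emitted only when the tempo differs from the last
    reported one by more than tolerance_bpm.
    """
    changes = []
    for previous, current in zip(beat_times, beat_times[1:]):
        interval = current - previous
        if interval <= 0:
            continue  # ignore repeated or out-of-order timestamps
        bpm = 60.0 / interval
        if not changes or abs(bpm - changes[-1][1]) > tolerance_bpm:
            changes.append((previous, bpm))
    return changes


# Beat times in seconds: three beats half a second apart, then one second apart.
beats = [0.0, 0.5, 1.0, 1.5, 2.5, 3.5]
tempo_track = tempo_changes(beats)
```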

References and Credits

Beat tracking method: M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp. 1009-1020, 2007. See also M. E. P. Davies and M. D. Plumbley. Beat Tracking With A Two State Model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Vol. 3, pp. 241-244, Philadelphia, USA, March 19-23, 2005.

Onset detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies and Christian Landone.

3. Key Detector

System identifier – vamp:qm-vamp-plugins:qm-keydetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector

Key Detector analyses a single channel of audio and continuously estimates the key of the music by comparing the degree to which a block-by-block chromagram correlates to the stored key profiles for each major and minor key.

The key profiles are drawn from analysis of Book I of the Well-Tempered Clavier by J. S. Bach, recorded at A=440 equal temperament.

Parameters

Tuning Frequency – The frequency of concert A in the music under analysis.

Window Length – The number of chroma analysis frames taken into account for key estimation. This controls how eager the key detector will be to return short-duration tonal changes as new key changes (the shorter the window, the more likely it is to detect a new key change).

Outputs

Tonic Pitch – The tonic pitch of each estimated key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 12 where C is 1, C# or Db is 2, and so on up to B which is 12.

Key Mode – The major or minor mode of the estimated key, where major is 0 and minor is 1.

Key – The estimated key for each key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 24 where 1-12 are the major keys and 13-24 are the minor keys, such that C major is 1, C# major is 2, and so on up to B major which is 12; then C minor is 13, C# minor is 14, and so on up to B minor which is 24.

Key Strength Plot – A grid representing the ongoing key "probability" throughout the music. This is returned as a feature for each chroma frame, containing 25 bins. Bins 1-12 are the major keys from C upwards; bins 14-25 are the minor keys from C upwards. The 13th bin is unused: it just provides space between the first and second halves of the feature if displayed in a single plot.

The outputs are also labelled with pitch or key as text.
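The numbering schemes above can be decoded with a few lines of Python. This is a sketch for readers post-processing the numeric values; the function names and the spelling of the pitch names are assumptions, and the text labels the plugin itself attaches may be spelled differently.

```python
# Pitch-class names, indexed so that C is at position 0 (output value 1).
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def decode_key(value):
    """Map a Key output value (1-24) to a (tonic name, mode) pair."""
    if not 1 <= value <= 24:
        raise ValueError("key value must be between 1 and 24")
    index = value - 1
    mode = "major" if index < 12 else "minor"
    return PITCH_NAMES[index % 12], mode


def strength_bin_for_key(value):
    """Map a Key value (1-24) to its 1-based bin in the 25-bin Key Strength Plot.

    Major keys occupy bins 1-12 and minor keys bins 14-25; bin 13 is unused.
    """
    if not 1 <= value <= 24:
        raise ValueError("key value must be between 1 and 24")
    return value if value <= 12 else value + 1
```

For example, `decode_key(13)` yields C minor, which sits in bin 14 of the Key Strength Plot.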

References and Credits

Method: see K. Noland and M. Sandler. Signal Processing Parameters for Tonality Estimation. In Proceedings of Audio Engineering Society 122nd Convention, Vienna, 2007.

The Key Detector Vamp plugin was written by Katy Noland and Christian Landone.

4. Tonal Change

System identifier – vamp:qm-vamp-plugins:qm-tonalchange
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange

Tonal Change analyses a single channel of audio, detecting harmonic changes such as chord boundaries.

Parameters

Gaussian smoothing – The window length for the internal smoothing operation, in chroma analysis frames. This controls how eager the tonal change detector will be to identify very short-term tonal changes. The default value of 5 is quite short, and may lead to more (not always meaningful) results being returned; for many purposes a larger value, closer to the maximum of 20, may be appropriate.

Chromagram minimum pitch – The MIDI pitch value (0-127) of the minimum pitch included in the internal chromagram analysis.

Chromagram maximum pitch – The MIDI pitch value (0-127) of the maximum pitch included in the internal chromagram analysis.

Chromagram tuning frequency – The frequency of concert A in the music under analysis.

Outputs

Transform to 6D Tonal Content Space – A representation of the musical content in a six-dimensional tonal space onto which the algorithm maps 12-bin chroma vectors extracted from the audio.

Tonal Change Detection Function – A function representing the estimated likelihood of a tonal change occurring in each spectral frame.

Tonal Change Positions – The resulting estimated positions of tonal changes.

References and Credits

Method: C. A. Harte, M. Gasser, and M. Sandler. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

The Tonal Change Vamp plugin was written by Chris Harte and Martin Gasser.

5. Segmenter

System identifier – vamp:qm-vamp-plugins:qm-segmenter
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter

Segmenter divides a single channel of music up into structurally consistent segments. It returns a numeric value (the segment type) for each moment at which a new segment starts.

For music with clearly tonally distinguishable sections such as verse, chorus, etc., segments with the same type may be expected to be similar to one another in some structural sense. For example, repetitions of the chorus are likely to share a segment type.

The plugin only attempts to identify similar segments; it does not attempt to label them. For example, it makes no attempt to tell you which segment is the chorus.

Note that this plugin does a substantial amount of processing after receiving all of the input audio data, before it produces any results.

Method

The method relies upon structural/timbral similarity to obtain the high-level song structure. This is based on the assumption that the distributions of timbre features are similar over corresponding structural elements of the music.

The algorithm works by obtaining a frequency-domain representation of the audio signal using a Constant-Q transform, a Chromagram or Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the particular feature is selectable as a parameter). The extracted features are normalised in accordance with the MPEG-7 standard (NASE descriptor), which means the spectrum is converted to decibel scale and each spectral vector is normalised by the RMS energy envelope. The value of this envelope is stored for each processing block of audio. This is followed by the extraction of 20 principal components per block using PCA, yielding a sequence of 21-dimensional feature vectors where the last element in each vector corresponds to the energy envelope.

A 40-state Hidden Markov Model is then trained on the whole sequence of features, with each state of the HMM corresponding to a specific timbre type. This process partitions the timbre-space of a given track into 40 possible types. The important assumption of the model is that the distribution of these features remains consistent over a structural segment. After training and decoding the HMM, the song is assigned a sequence of timbre-features according to specific timbre-type distributions for each possible structural segment.

The segmentation itself is computed by clustering timbre-type histograms. A series of histograms is created over a sliding window, and these are grouped into M clusters by an adapted soft k-means algorithm. Each of these clusters will correspond to a specific segment-type of the analyzed song. Reference histograms, iteratively updated during clustering, describe the timbre distribution for each segment. The segmentation arises from the final cluster assignments.
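The histogram step described above can be sketched in Python as follows. The sketch assumes the HMM has already been decoded into a sequence of integer timbre-type labels, one per processing block; the function name, the window and hop sizes and the normalisation to relative frequencies are illustrative assumptions, and the constrained clustering itself is not shown.

```python
from collections import Counter


def sliding_histograms(states, n_types, window, hop=1):
    """Build one normalised timbre-type histogram per sliding-window position.

    states  -- decoded timbre-type label (0 .. n_types - 1) for each block
    n_types -- number of timbre types (40 in the method described above)
    window  -- number of consecutive blocks covered by each histogram
    hop     -- number of blocks between successive window positions
    """
    histograms = []
    for start in range(0, len(states) - window + 1, hop):
        counts = Counter(states[start:start + window])
        histograms.append([counts.get(t, 0) / window for t in range(n_types)])
    return histograms


# Toy label sequence with three timbre types.
labels = [0, 0, 1, 1, 2, 2]
histograms = sliding_histograms(labels, n_types=3, window=4, hop=2)
```

Windows drawn from the same kind of section tend to produce similar histograms, which is what allows a clustering step to group them into segment types.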

Parameters

Number of segment-types – The maximum number of clusters (segment-types) to be returned. The default is 10. Unlike many clustering algorithms, the constrained clustering used in this plugin does not produce too many clusters or vary significantly even if this is set too high. However, this parameter can be useful for limiting the number of expected segment-types.

Feature Type – The type of spectral feature used for segmentation. The available features are:

"Hybrid", the default, which uses a Constant-Q transform (see the Constant-Q Spectrogram plugin): this is generally effective for modern studio recordings;
"Chromatic", using a chromagram derived from the Constant-Q feature (see the Chromagram plugin): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in sound;
"Timbral", using Mel-Frequency Cepstral Coefficients (see the Mel-Frequency Cepstral Coefficients plugin), which is more likely to result in classification by instrumentation rather than musical content.

Minimum segment duration – The approximate expected minimum duration for a segment, from 1 to 15 seconds. Changing this parameter may help the plugin to find musical sections rather than just following changes in the sound of the music, and also avoid wasting a segment-type cluster for timbrally distinct but too-short segments. The default of 4 seconds usually produces good results.

Outputs

Segmentation – The estimated segment boundaries, returned as a single feature with one value at each segment boundary, with the value representing the segment type number for the segment starting at that boundary.

References and Credits

Method: M. Levy and M. Sandler. Structural segmentation of musical audio by constrained clustering. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.

Note that this plugin does not implement the beat-synchronous aspect of the segmentation method described in the paper.

The Segmenter Vamp plugin was written by Mark Levy. Thanks to George Fazekas for providing much of this documentation.

6. Similarity

System identifier – vamp:qm-vamp-plugins:qm-similarity
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity

Similarity treats each channel of its audio input as a separate "track", and estimates how similar the tracks are to one another using a selectable similarity measure.

The plugin also returns the intermediate data used as a basis of the similarity measure; it can therefore be used on a single channel of input (with the resulting intermediate data then being applied in some other similarity or clustering algorithm, for example) if desired, as well as with multiple inputs.

Because of the way this plugin handles multiple inputs, by assuming that each channel represents a separate piece of music, it may not be appropriate for use directly in a general-purpose host (unless you actually want to do something like compare two stereo channels for timbral similarity, which is unlikely).

Parameters

Feature Type – The underlying audio feature used for the similarity measure. The available features are:

"Timbre", in which the distance between tracks is a symmetrised Kullback-Leibler divergence between Gaussian-modelled MFCC means and variances across each track, for the first 20 MFCCs including C0 (see the Mel-Frequency Cepstral Coefficients plugin);
"Chroma", which uses Kullback-Leibler divergence of mean chroma histogram (see the Chromagram plugin);
"Rhythm", using the cosine distance between "beat spectrum" measures derived from a short sampled section of the track;
and combined "Timbre and Rhythm" and "Chroma and Rhythm" features.
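The two distance measures named above can be written out in a short Python sketch. It assumes each track is summarised by per-coefficient means and variances (a diagonal Gaussian) and that the symmetrised divergence is the sum of the two directed divergences; the plugin may scale or combine the terms differently, and the function names are illustrative.

```python
import math


def kl_diag_gaussian(mean_p, var_p, mean_q, var_q):
    """Kullback-Leibler divergence KL(p || q) between two diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetrised_kl(mean_a, var_a, mean_b, var_b):
    """Symmetrised divergence: KL(a || b) + KL(b || a)."""
    return (kl_diag_gaussian(mean_a, var_a, mean_b, var_b)
            + kl_diag_gaussian(mean_b, var_b, mean_a, var_a))


def cosine_distance(u, v):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```

Both measures are zero for identical inputs and grow as the tracks' feature statistics move apart.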

Outputs

Distance Matrix – A matrix of the distance measures between input channels, returned as a series of vector features timestamped at one-second intervals. The distance from channel i to channel j appears as the j'th bin of the feature at time i.

Distance from First Channel – A single vector feature, timestamped at time zero, containing the distances between the first input channel and each of the input channels (including the first channel itself at bin 0, which should have zero distance).

Ordered Distances from First Channel – A pair of vector features, at times 0 and 1 second. The feature at time 0 contains the 1-based indices of the input channels in the order of similarity to the first input channel (so its first bin should always contain 1, as the first channel is most similar to itself). The feature at time 1 contains, in bin n, the distance between the first input channel and the channel with index found at bin n of the feature at time 0.
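The relationship between the Distance from First Channel output and the two Ordered Distances features can be reproduced with a small Python sketch. The function name is hypothetical, and the code only mirrors the layout described above rather than the plugin's own code.

```python
def ordered_distances(distances_from_first):
    """Derive the two Ordered Distances features from a distance vector.

    Returns (indices, distances): indices holds 1-based channel numbers
    sorted from most to least similar to the first channel, and distances
    holds the matching distance for each position.
    """
    order = sorted(range(len(distances_from_first)),
                   key=lambda i: distances_from_first[i])
    indices = [i + 1 for i in order]
    distances = [distances_from_first[i] for i in order]
    return indices, distances


# Three input channels: the first channel is at distance zero from itself.
indices, distances = ordered_distances([0.0, 2.5, 1.0])
```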

Feature Means – A series of vector features containing the mean values of each of the feature bins across the duration of each of the input channels. This output returns one feature for each input channel, timestamped at one-second intervals. The number of bins for each feature depends on the feature type; it will be 20 for MFCC features and 12 for chroma features. No features will be returned on this output if the feature type is purely rhythmic.

Feature Variances – As for Feature Means, but containing the variances of the feature bins rather than their means.

Beat Spectra – A series of vector features containing the rhythmic autocorrelation profiles (beat spectra) for each of the input channels. This output returns one 512-bin feature for each input channel, timestamped at one-second intervals. No features will be returned on this output if the feature type contains no rhythm component.

References and Credits

Timbral similarity: M. Levy and M. Sandler. Lightweight measures for timbral similarity of musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

Combined rhythmic and timbral similarity: K. Jacobson. A Multifaceted Approach to Music Similarity. In Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.

The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and Chris Cannam.

7. Constant-Q Spectrogram

System identifier – vamp:qm-vamp-plugins:qm-constantq
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq

Constant-Q Spectrogram calculates a spectrogram based on a short-time windowed constant Q spectral transform. This is a spectrogram in which the ratio of centre frequency to resolution is constant for each frequency bin. The frequency bins correspond to the frequencies of "musical notes" rather than being linearly spaced in frequency as they are for the conventional DFT spectrogram.

The pitch range and the number of frequency bins per octave may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes are defined by these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave.

Normalized – Whether to normalize each output column to unit maximum.
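How the pitch range, tuning frequency and bins-per-octave parameters determine the bin frequencies can be sketched in Python. The sketch assumes the standard convention that MIDI pitch 69 is concert A, and that the maximum pitch is included in the range; the plugin's exact bin count and edge handling may differ, and the function names are illustrative.

```python
def midi_to_hz(pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, taking MIDI 69 as concert A (an assumption)."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)


def constant_q_frequencies(min_pitch, max_pitch, bins_per_octave=12,
                           tuning_hz=440.0):
    """Geometrically spaced centre frequencies from min_pitch up to max_pitch."""
    f_min = midi_to_hz(min_pitch, tuning_hz)
    n_bins = int(round((max_pitch - min_pitch) * bins_per_octave / 12.0)) + 1
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def q_factor(bins_per_octave):
    """Ratio of centre frequency to bin spacing, the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# One octave from A3 (MIDI 57) to A4 (MIDI 69) at one bin per semitone.
frequencies = constant_q_frequencies(57, 69, bins_per_octave=12)
```

Because neighbouring bins differ by a fixed frequency ratio, dividing any centre frequency by the gap to the next bin gives the same value, which is the "constant Q" in the name.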

Outputs

Constant-Q Spectrogram – The calculated spectrogram, as a single feature per process block containing one bin for each pitch included in the spectrogram's range.

References and Credits

Principle: J. Brown. Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1): 425-434, 1991.

The Constant-Q Spectrogram Vamp plugin was written by Christian Landone.

8. Chromagram

System identifier – vamp:qm-vamp-plugins:qm-chromagram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram

Chromagram calculates a constant Q spectral transform (as in the Constant-Q Spectrogram plugin) and then wraps the frequency bin values into a single octave, with each bin containing the sum of the magnitudes from the corresponding bin in all octaves. The number of values in each feature vector returned by the plugin is therefore the same as the number of bins per octave configured for the underlying constant Q transform.
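The wrapping step can be sketched in Python as follows. The sketch assumes the constant-Q magnitudes are ordered from lowest to highest frequency and that the first bin starts an octave; the function names and the two normalisation modes shown are illustrative rather than taken from the plugin.

```python
def fold_to_chroma(cq_magnitudes, bins_per_octave=12):
    """Sum constant-Q magnitudes that share a position within the octave."""
    chroma = [0.0] * bins_per_octave
    for index, magnitude in enumerate(cq_magnitudes):
        chroma[index % bins_per_octave] += magnitude
    return chroma


def normalise(column, mode="max"):
    """Scale a column to unit maximum (mode="max") or unit sum (mode="sum")."""
    scale = max(column) if mode == "max" else sum(column)
    if scale == 0:
        return list(column)  # leave a silent column unchanged
    return [value / scale for value in column]


# Two octaves of a toy transform with three bins per octave.
chroma = fold_to_chroma([1, 2, 3, 4, 5, 6], bins_per_octave=3)
unit_max = normalise(chroma, mode="max")
```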

The pitch range and the number of frequency bins per octave for the transform may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes depend on these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform used in calculating the chromagram.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform used in calculating the chromagram.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave, and thus the total number of bins present in the resulting chromagram.

Normalized – Whether to normalize each output column. Normalization may be to unit sum or unit maximum.

Outputs

Chromagram – The calculated chromagram, as a single feature per process block containing the number of bins given in the bins per octave parameter.

References and Credits

The Chromagram Vamp plugin was written by Christian Landone.

9. Mel-Frequency Cepstral Coefficients

System identifier – vamp:qm-vamp-plugins:qm-mfcc
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc

Mel-Frequency Cepstral Coefficients calculates MFCCs from a single channel of audio. These coefficients, derived from a cosine transform of the mapping of an audio spectrum onto a frequency scale modelled on human auditory response, are widely used in speech recognition, music classification and other tasks.

Parameters

Number of Coefficients – The number of MFCCs to return. Commonly used values include 13 or the default 20. This number includes C0 if requested (see Include C0 below).

Power for Mel Amplitude Logs – An optional power value to which the spectral amplitudes should be raised before applying the cosine transform. Values greater than 1 may in principle reduce the contribution of noise to the results. The default is 1.

Include C0 – Whether to include the "zero'th" coefficient, which simply reflects the overall signal power across the Mel frequency bands.
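The effect of the Number of Coefficients and Include C0 parameters can be illustrated with a simplified Python sketch of the final stage of an MFCC calculation. It assumes the per-band Mel energies have already been computed, uses one common Hz-to-Mel formula, applies an unnormalised DCT-II, and leaves out the power parameter; none of these choices is confirmed to match the plugin, and all names are illustrative.

```python
import math


def hz_to_mel(frequency_hz):
    """One widely used Hz-to-Mel mapping (the plugin's may differ)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def dct_ii(values, n_out):
    """First n_out coefficients of an unnormalised DCT-II of values."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc(mel_energies, n_coeffs=20, include_c0=True):
    """Cosine transform of log Mel-band energies, returning n_coeffs values.

    With include_c0 the result starts at C0; without it the result starts
    at C1. Either way exactly n_coeffs values are returned.
    """
    logs = [math.log(max(energy, 1e-10)) for energy in mel_energies]
    if include_c0:
        return dct_ii(logs, n_coeffs)
    return dct_ii(logs, n_coeffs + 1)[1:]
```

In this sketch C0 is the sum of the log band energies, which is why it tracks overall signal power rather than spectral shape.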

Outputs

Coefficients – The MFCC values, returned as one vector feature per processing block.

Means of Coefficients – The overall means of the MFCC bins, as a single vector feature with time 0 that is returned when processing is complete.

References and Credits

MFCCs in music: See B. Logan. Mel-Frequency Cepstral Coefficients for Music Modeling. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000.

The Mel-Frequency Cepstral Coefficients Vamp plugin was written by Nicolas Chetry and Chris Cannam.