QM Vamp Plugins: User Documentation

QM Vamp Plugins

The QM Vamp Plugin set is a library of Vamp audio feature extraction plugins developed at the Centre for Digital Music at Queen Mary, University of London. These plugins are provided as a single library file, made available in binary form for Windows, OS X, and Linux from the Centre for Digital Music's download page.

For more information about Vamp plugins, see http://www.vamp-plugins.org/ .

1. Note Onset Detector
2. Tempo and Beat Tracker
3. Key Detector
4. Tonal Change
5. Segmenter
6. Similarity
7. Constant-Q Spectrogram
8. Chromagram
9. Mel-Frequency Cepstral Coefficients

1. Note Onset Detector

System identifier – vamp:qm-vamp-plugins:qm-onsetdetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector

Note Onset Detector analyses a single channel of audio and estimates the onset times of notes within the music – that is, the times at which notes and other audible events begin.

It calculates an onset likelihood function for each spectral frame, and picks peaks in a smoothed version of this function. The plugin is non-causal, returning all results at the end of processing.

Parameters

Onset Detection Function Type – The method used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al 2005) for identifying percussive onsets in mixed music.

Onset Detector Sensitivity – Sensitivity level for peak detection in the onset likelihood function. The higher the sensitivity, the more onsets will (rightly or wrongly) be detected. The peak picker does not have a simple threshold level; instead, this parameter controls the required "steepness" of the slopes in the smoothed detection function either side of a peak value, in order for that peak to be accepted as an onset.

Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can yield improved performance in onset detection, for example in audio with big variations in dynamics.

Outputs

Note Onsets – The detected note onset times, returned as a single feature with timestamp but no value for each detected note.

Onset Detection Function – The raw note onset likelihood function that was calculated as the first step of the detection process.

Smoothed Detection Function – The note onset likelihood function following median filtering. This is the function from which sufficiently steep peak values are picked and classified as onsets.
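
The relationship between the raw function, the smoothed function and the picked onsets can be illustrated with a rough sketch. This is not the plugin's code: the function names, the `kernel` window length and the `min_slope` steepness test are all assumptions made for illustration, and the plugin's own peak picker is more elaborate.

```python
import numpy as np

def median_smooth(df, kernel=5):
    """Median-filter a detection function (kernel is an assumed, odd window length)."""
    half = kernel // 2
    padded = np.pad(np.asarray(df, dtype=float), half, mode="edge")
    return np.array([np.median(padded[i:i + kernel]) for i in range(len(df))])

def pick_peaks(smoothed, min_slope=0.1):
    """Return indices of local maxima whose rise and fall both exceed min_slope.

    min_slope is a hypothetical stand-in for the sensitivity control:
    a smaller value accepts shallower peaks, so more onsets are reported.
    """
    peaks = []
    for i in range(1, len(smoothed) - 1):
        rise = smoothed[i] - smoothed[i - 1]
        fall = smoothed[i] - smoothed[i + 1]
        if rise > min_slope and fall > min_slope:
            peaks.append(i)
    return peaks
```

In this sketch a higher sensitivity would correspond to a lower `min_slope`; frame indices would still need converting to timestamps using the step size and sample rate.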

References and Credits

Basic detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan Pablo Bello and Christian Landone.

2. Tempo and Beat Tracker

System identifier – vamp:qm-vamp-plugins:qm-tempotracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker

Tempo and Beat Tracker analyses a single channel of audio and estimates the positions of metrical beats within the music (the equivalent of a human listener tapping their foot to the beat).

Parameters

Outputs

Beats – The estimated beat locations, returned as a single feature, with timestamp but no value, for each beat, labelled with the corresponding estimated tempo at that beat.

Onset Detection Function – The raw note onset likelihood function used in beat estimation.

Tempo – The estimated tempo, returned as a feature each time the estimated tempo changes, with a single value for the tempo in beats per minute.
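
As a cross-check on these outputs, a tempo in beats per minute can be related to the spacing of beat timestamps. The helper below is a small illustration only (the name `local_tempo_bpm` is hypothetical, and this is not how the plugin derives its Tempo output): it converts consecutive beat times, in seconds, into instantaneous tempo values.

```python
def local_tempo_bpm(beat_times):
    """Tempo in beats per minute implied by each pair of consecutive beat times (seconds)."""
    return [60.0 / (later - earlier)
            for earlier, later in zip(beat_times, beat_times[1:])]
```

Beats half a second apart, for example, imply 120 beats per minute.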

References and Credits

Beat tracking method: M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp. 1009-1020, 2007. See also M. E. P. Davies and M. D. Plumbley. Beat Tracking With A Two State Model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Vol. 3, pp. 241-244, Philadelphia, USA, March 19-23, 2005.

Onset detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies and Christian Landone.

3. Key Detector

System identifier – vamp:qm-vamp-plugins:qm-keydetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector

Key Detector analyses a single channel of audio and continuously estimates the key of the music by comparing the degree to which a block-by-block chromagram correlates to the stored key profiles for each major and minor key.

The key profiles are drawn from analysis of Book I of the Well Tempered Klavier by J S Bach, recorded at A=440 equal temperament.

Parameters

Tuning Frequency – The frequency of concert A in the music under analysis.

Window Length – The number of chroma analysis frames taken into account for key estimation. This controls how eager the key detector will be to return short-duration tonal changes as new key changes (the shorter the window, the more likely it is to detect a new key change).

Outputs

Tonic Pitch – The tonic pitch of each estimated key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 12 where C is 1, C# or Db is 2, and so on up to B which is 12.

Key Mode – The major or minor mode of the estimated key, where major is 0 and minor is 1.

Key – The estimated key for each key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 24 where 1-12 are the major keys and 13-24 are the minor keys, such that C major is 1, C# major is 2, and so on up to B major which is 12; then C minor is 13, Db minor is 14, and so on up to B minor which is 24.

Key Strength Plot – A grid representing the ongoing key "probability" throughout the music. This is returned as a feature for each chroma frame, containing 25 bins. Bins 1-12 are the major keys from C upwards; bins 14-25 are the minor keys from C upwards. The 13th bin is unused: it just provides space between the first and second halves of the feature if displayed in a single plot.

The outputs are also labelled with pitch or key as text.
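
The numbering schemes above can be decoded with a few lines of code. The sketch below is illustrative only: the helper names are hypothetical, and it assumes bin numbers are counted from 1 as in the description above (a host that exposes bins as a zero-based array would need to add 1 first).

```python
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def decode_key(value):
    """Map a Key output value (1-24) to a (tonic name, mode) pair."""
    if not 1 <= value <= 24:
        raise ValueError("key value must be in the range 1 to 24")
    index = int(value) - 1
    mode = "major" if index < 12 else "minor"
    return PITCH_NAMES[index % 12], mode

def strength_bin_to_key(bin_number):
    """Map a 1-based Key Strength Plot bin (1-25) to a key, or None for the unused bin 13."""
    if bin_number == 13:
        return None
    # Bins 14-25 hold the minor keys, shifted up by one to skip the spacer bin.
    return decode_key(bin_number if bin_number < 13 else bin_number - 1)
```

For instance, `decode_key(13)` gives C minor, matching the Key output description, while bin 14 of the Key Strength Plot also refers to C minor because of the spacer bin.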

References and Credits

Method: see K. Noland and M. Sandler. Signal Processing Parameters for Tonality Estimation. In Proceedings of Audio Engineering Society 122nd Convention, Vienna, 2007.

The Key Detector Vamp plugin was written by Katy Noland and Christian Landone.

4. Tonal Change

System identifier – vamp:qm-vamp-plugins:qm-tonalchange
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange

Tonal Change analyses a single channel of audio, detecting harmonic changes such as chord boundaries.

Parameters

Gaussian smoothing – The window length for the internal smoothing operation, in chroma analysis frames. This controls how eager the tonal change detector will be to identify very short-term tonal changes. The default value of 5 is quite short, and may lead to more (not always meaningful) results being returned; for many purposes a larger value, closer to the maximum of 20, may be appropriate.

Chromagram minimum pitch – The MIDI pitch value (0-127) of the minimum pitch included in the internal chromagram analysis.

Chromagram maximum pitch – The MIDI pitch value (0-127) of the maximum pitch included in the internal chromagram analysis.

Chromagram tuning frequency – The frequency of concert A in the music under analysis.

Outputs

Transform to 6D Tonal Content Space – A representation of the musical content in a six-dimensional tonal space onto which the algorithm maps 12-bin chroma vectors extracted from the audio.

Tonal Change Detection Function – A function representing the estimated likelihood of a tonal change occurring in each spectral frame.

Tonal Change Positions – The resulting estimated positions of tonal changes.
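
To give a feel for what a mapping from a 12-bin chroma vector into a six-dimensional tonal space looks like, here is a minimal sketch. It places each pitch class on three circles (fifths, minor thirds and major thirds) and takes the chroma-weighted average position. The radii and angles used are assumptions based on the general form of such transforms and may not match the plugin's code or the cited paper exactly.

```python
import numpy as np

def tonal_centroid(chroma):
    """Project a 12-bin chroma vector onto an assumed 6-D tonal space."""
    chroma = np.asarray(chroma, dtype=float)
    pitch = np.arange(12)
    radii = (1.0, 1.0, 0.5)                                   # assumed circle radii
    angles = (7 * np.pi / 6, 3 * np.pi / 2, 2 * np.pi / 3)    # assumed step per pitch class
    rows = []
    for radius, angle in zip(radii, angles):
        rows.append(radius * np.sin(pitch * angle))
        rows.append(radius * np.cos(pitch * angle))
    basis = np.vstack(rows)            # shape (6, 12)
    total = np.abs(chroma).sum()       # L1 norm, so loudness does not affect the result
    if total == 0:
        return np.zeros(6)
    return basis @ chroma / total
```

A detection function of the kind described above could then be built from the distance between centroids in neighbouring (smoothed) frames, although that step is not shown here.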

References and Credits

Method: C. A. Harte, M. Gasser, and M. Sandler. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

The Tonal Change Vamp plugin was written by Chris Harte and Martin Gasser.

5. Segmenter

System identifier – vamp:qm-vamp-plugins:qm-segmenter
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter

Segmenter divides a single channel of music up into structurally consistent segments. It returns a numeric value (the segment type) for each moment at which a new segment starts.

For music with clearly tonally distinguishable sections such as verse, chorus, etc., segments with the same type may be expected to be similar to one another in some structural sense. For example, repetitions of the chorus are likely to share a segment type.

The plugin only attempts to identify similar segments; it does not attempt to label them. For example, it makes no attempt to tell you which segment is the chorus.

Note that this plugin does a substantial amount of processing after receiving all of the input audio data, before it produces any results.

Method

The method relies upon structural/timbral similarity to obtain the high-level song structure. This is based on the assumption that the distributions of timbre features are similar over corresponding structural elements of the music.

The algorithm works by obtaining a frequency-domain representation of the audio signal using a Constant-Q transform, a Chromagram or Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the particular feature is selectable as a parameter). The extracted features are normalised in accordance with the MPEG-7 standard (NASE descriptor), which means the spectrum is converted to decibel scale and each spectral vector is normalised by the RMS energy envelope. The value of this envelope is stored for each processing block of audio. This is followed by the extraction of 20 principal components per block using PCA, yielding a sequence of 21-dimensional feature vectors where the last element in each vector corresponds to the energy envelope.
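
The normalisation and dimensionality reduction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's implementation: the function name, the `10 * log10` decibel convention, the floor value and the use of an SVD for PCA are all choices made for the example.

```python
import numpy as np

def normalise_and_reduce(spectra, n_components=20, floor=1e-10):
    """spectra: (blocks, bins) non-negative values. Returns (blocks, n_components + 1)."""
    # Convert to a decibel scale (assumed convention), avoiding log(0).
    db = 10.0 * np.log10(np.maximum(np.asarray(spectra, dtype=float), floor))
    # RMS envelope of each block, kept as the final feature dimension.
    envelope = np.sqrt(np.mean(db ** 2, axis=1))
    normed = db / np.maximum(envelope, floor)[:, None]
    # PCA via SVD of the mean-centred data; rows of vt are principal directions.
    centred = normed - normed.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    projected = centred @ vt[:n_components].T
    return np.hstack([projected, envelope[:, None]])
```

With at least 20 spectral bins and 20 blocks of input, each row of the result has 21 elements: 20 principal components followed by the envelope value for that block.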

A 40-state Hidden Markov Model is then trained on the whole sequence of features, with each state of the HMM corresponding to a specific timbre type. This process partitions the timbre-space of a given track into 40 possible types. The important assumption of the model is that the distribution of these features remains consistent over a structural segment. After training and decoding the HMM, the song is assigned a sequence of timbre-features according to specific timbre-type distributions for each possible structural segment.

The segmentation itself is computed by clustering timbre-type histograms. A series of histograms is created over a sliding window, and these are grouped into M clusters by an adapted soft k-means algorithm. Each of these clusters will correspond to a specific segment-type of the analyzed song. Reference histograms, iteratively updated during clustering, describe the timbre distribution for each segment. The segmentation arises from the final cluster assignments.
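
The histogram step can be pictured with a short sketch. Given a decoded sequence of timbre-type (HMM state) labels, it builds one normalised histogram per sliding-window position. The function name and the `window` length are assumptions for illustration, and the constrained soft k-means clustering that follows is not shown.

```python
import numpy as np

def state_histograms(states, n_states=40, window=5):
    """Normalised histogram of integer state labels in each sliding window."""
    states = np.asarray(states)
    hists = []
    for start in range(len(states) - window + 1):
        counts = np.bincount(states[start:start + window], minlength=n_states)
        hists.append(counts / window)
    return np.array(hists)
```

Each row sums to 1, so windows drawn from the same structural section should produce similar rows, which is what the clustering stage relies on.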

Parameters

Number of segment-types – The maximum number of clusters (segment-types) to be returned. The default is 10. Unlike many clustering algorithms, the constrained clustering used in this plugin does not produce too many clusters or vary significantly even if this is set too high. However, this parameter can be useful for limiting the number of expected segment-types.

Feature Type – The type of spectral feature used for segmentation. The available features are:

"Hybrid", the default, which uses a Constant-Q transform (see related plugin): this is generally effective for modern studio recordings;
"Chromatic", using a chromagram derived from the Constant-Q feature (see related plugin): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in sound;
"Timbral", using Mel-Frequency Cepstral Coefficients (see related plugin), which is more likely to result in classification by instrumentation rather than musical content.

Minimum segment duration – The approximate expected minimum duration for a segment, from 1 to 15 seconds. Changing this parameter may help the plugin to find musical sections rather than just following changes in the sound of the music, and also avoid wasting a segment-type cluster for timbrally distinct but too-short segments. The default of 4 seconds usually produces good results.

Outputs

Segmentation – The estimated segment boundaries, returned as a single feature with one value at each segment boundary, with the value representing the segment type number for the segment starting at that boundary.
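
Because each feature marks only the start of a segment, a host or script usually has to pair each boundary with the next one to recover segment extents. A minimal sketch, assuming the output has already been read into a list of `(start_time, segment_type)` pairs sorted by time (the helper name and input layout are hypothetical):

```python
def boundaries_to_segments(boundaries, total_duration):
    """Turn (start_time, segment_type) pairs into (start, end, segment_type) tuples."""
    segments = []
    for i, (start, seg_type) in enumerate(boundaries):
        # A segment runs until the next boundary, or to the end of the audio.
        end = boundaries[i + 1][0] if i + 1 < len(boundaries) else total_duration
        segments.append((start, end, seg_type))
    return segments
```

Segments that share a type number, such as the first and last in `[(0.0, 1), (10.0, 2), (25.0, 1)]`, are the ones the plugin considers structurally similar.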

References and Credits

Method: M. Levy and M. Sandler. Structural segmentation of musical audio by constrained clustering. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.

Note that this plugin does not implement the beat-synchronous aspect of the segmentation method described in the paper.

The Segmenter Vamp plugin was written by Mark Levy. Thanks to George Fazekas for providing much of this documentation.

6. Similarity

System identifier – vamp:qm-vamp-plugins:qm-similarity
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity

Similarity treats each channel of its audio input as a separate "track", and estimates how similar the tracks are to one another using a selectable similarity measure.

The plugin also returns the intermediate data used as a basis of the similarity measure; it can therefore be used on a single channel of input (with the resulting intermediate data then being applied in some other similarity or clustering algorithm, for example) if desired, as well as with multiple inputs.

Because of the way this plugin handles multiple inputs, by assuming that each channel represents a separate piece of music, it may not be appropriate for use directly in a general-purpose host (unless you actually want to do something like compare two stereo channels for timbral similarity, which is unlikely).

Parameters

Feature Type – The underlying audio feature used for the similarity measure. The available features are:

"Timbre", in which the distance between tracks is a symmetrised Kullback-Leibler divergence between Gaussian-modelled MFCC means and variances across each track, for the first 20 MFCCs including C0 (see related plugin);
"Chroma", which uses Kullback-Leibler divergence of mean chroma histogram (see related plugin);
"Rhythm", using the cosine distance between "beat spectrum" measures derived from a short sampled section of the track;
and combined "Timbre and Rhythm" and "Chroma and Rhythm" features.
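
The two distance measures named above have standard textbook forms, sketched below. This is an illustration rather than the plugin's code: it assumes each track is modelled by a Gaussian with independent dimensions (per-coefficient means and variances), and the function names are hypothetical.

```python
import numpy as np

def symmetrised_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrised KL divergence between two diagonal-covariance Gaussians."""
    mean_p, var_p = np.asarray(mean_p, float), np.asarray(var_p, float)
    mean_q, var_q = np.asarray(mean_q, float), np.asarray(var_q, float)
    diff2 = (mean_p - mean_q) ** 2
    kl_pq = 0.5 * np.sum(np.log(var_q / var_p) + (var_p + diff2) / var_q - 1.0)
    kl_qp = 0.5 * np.sum(np.log(var_p / var_q) + (var_q + diff2) / var_p - 1.0)
    return float(kl_pq + kl_qp)

def cosine_distance(a, b):
    """One minus the cosine of the angle between two vectors (e.g. beat spectra)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Both measures are zero for identical inputs and symmetric in their arguments, which is consistent with a track having zero distance from itself in the outputs described below.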

Outputs

Distance Matrix – A matrix of the distance measures between input channels, returned as a series of vector features timestamped at one-second intervals. The distance from channel i to channel j appears as the j'th bin of the feature at time i.

Distance from First Channel – A single vector feature, timestamped at time zero, containing the distances between the first input channel and each of the input channels (including the first channel itself at bin 0, which should have zero distance).

Ordered Distances from First Channel – A pair of vector features, at times 0 and 1 second. The feature at time 0 contains the 1-based indices of the input channels in the order of similarity to the first input channel (so its first bin should always contain 1, as the first channel is most similar to itself). The feature at time 1 contains, in bin n, the distance between the first input channel and the channel with index found at bin n of the feature at time 0.

Feature Means – A series of vector features containing the mean values of each of the feature bins across the duration of each of the input channels. This output returns one feature for each input channel, timestamped at one-second intervals. The number of bins for each feature depends on the feature type; it will be 20 for MFCC features and 12 for chroma features. No features will be returned on this output if the feature type is purely rhythmic.

Feature Variances – As Feature Means, but containing the variances rather than the means.

Beat Spectra – A series of vector features containing the rhythmic autocorrelation profiles (beat spectra) for each of the input channels. This output returns one 512-bin feature for each input channel, timestamped at one-second intervals. No features will be returned on this output if the feature type contains no rhythm component.
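
The relationship between the Distance from First Channel output and the Ordered Distances from First Channel output can be reproduced in a few lines. The sketch below is an illustration with a hypothetical helper name: it takes the vector of distances from the first channel and returns the two vectors that the ordered output would carry at times 0 and 1.

```python
import numpy as np

def order_by_distance(distances):
    """distances: distance from the first channel to every channel (bin 0 is the first channel).

    Returns (1-based channel indices by increasing distance, distances in that order).
    """
    distances = np.asarray(distances, dtype=float)
    order = np.argsort(distances, kind="stable")
    return (order + 1).tolist(), distances[order].tolist()
```

Since the first channel has zero distance from itself, its index 1 comes first in the ordering.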

References and Credits

Timbral similarity: M. Levy and M. Sandler. Lightweight measures for timbral similarity of musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

Combined rhythmic and timbral similarity: K. Jacobson. A Multifaceted Approach to Music Similarity. In Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.

The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and Chris Cannam.

7. Constant-Q Spectrogram

System identifier – vamp:qm-vamp-plugins:qm-constantq
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq

Constant-Q Spectrogram calculates a spectrogram based on a short-time windowed constant Q spectral transform. This is a spectrogram in which the ratio of centre frequency to resolution is constant for each frequency bin. The frequency bins correspond to the frequencies of "musical notes" rather than being linearly spaced in frequency as they are for the conventional DFT spectrogram.

The pitch range and the number of frequency bins per octave may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes are defined by these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave.

Normalized – Whether to normalize each output column to unit maximum.
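
To see how the pitch, tuning and bins-per-octave parameters relate to actual frequencies, the following sketch computes geometrically spaced bin centres and the constant Q ratio. It uses the standard equal-temperament relation with MIDI note 69 as concert A; the function names are hypothetical, and treating the maximum pitch as inclusive is an assumption rather than a statement about the plugin.

```python
def midi_to_hz(midi_pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, with MIDI note 69 (concert A) at tuning_hz."""
    return tuning_hz * 2.0 ** ((midi_pitch - 69) / 12.0)

def constant_q_centres(min_pitch, max_pitch, bins_per_octave=12, tuning_hz=440.0):
    """Geometrically spaced bin centre frequencies from min_pitch up to max_pitch."""
    f_min = midi_to_hz(min_pitch, tuning_hz)
    f_max = midi_to_hz(max_pitch, tuning_hz)
    centres = []
    k = 0
    while True:
        f = f_min * 2.0 ** (k / bins_per_octave)
        if f > f_max * (1 + 1e-9):   # small tolerance; upper bound assumed inclusive
            break
        centres.append(f)
        k += 1
    return centres

def q_factor(bins_per_octave):
    """Ratio of centre frequency to bin spacing, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

With 12 bins per octave each bin sits on a semitone and the Q ratio is roughly 17; raising the bins per octave narrows the bins and raises Q accordingly.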

Outputs

Constant-Q Spectrogram – The calculated spectrogram, as a single feature per process block containing one bin for each pitch included in the spectrogram's range.

References and Credits

Principle: J. Brown. Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1): 425-434, 1991.

The Constant-Q Spectrogram Vamp plugin was written by Christian Landone.

8. Chromagram

System identifier – vamp:qm-vamp-plugins:qm-chromagram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram

Chromagram calculates a constant Q spectral transform (as in the Constant Q Spectrogram plugin) and then wraps the frequency bin values into a single octave, with each bin containing the sum of the magnitudes from the corresponding bin in all octaves. The number of values in each feature vector returned by the plugin is therefore the same as the number of bins per octave configured for the underlying constant Q transform.
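
The wrapping step amounts to summing bins that share the same position within their octave. A minimal sketch (the function name is hypothetical, and the input is assumed to be one column of constant-Q magnitudes starting at the lowest bin):

```python
import numpy as np

def fold_to_chroma(cq_magnitudes, bins_per_octave=12):
    """Sum constant-Q bin magnitudes across octaves into a single octave."""
    cq = np.asarray(cq_magnitudes, dtype=float)
    chroma = np.zeros(bins_per_octave)
    for i, value in enumerate(cq):
        chroma[i % bins_per_octave] += value
    return chroma
```

Whatever the pitch range, the result always has `bins_per_octave` values, as stated above.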

The pitch range and the number of frequency bins per octave for the transform may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes depend on these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform used in calculating the chromagram.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform used in calculating the chromagram.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave, and thus the total number of bins present in the resulting chromagram.

Normalized – Whether to normalize each output column. Normalization may be to unit sum or unit maximum.
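
The two normalization choices differ only in the scale factor applied to each column. A small sketch, with a hypothetical function name and `mode` argument rather than the plugin's actual parameter values:

```python
import numpy as np

def normalise_column(column, mode="max"):
    """Scale a column to unit maximum (mode="max") or unit sum (mode="sum")."""
    column = np.asarray(column, dtype=float)
    scale = column.max() if mode == "max" else column.sum()
    # Leave all-zero (silent) columns untouched rather than dividing by zero.
    return column / scale if scale > 0 else column
```

Unit-maximum scaling keeps the strongest pitch class at 1, while unit-sum scaling makes each column behave like a distribution over pitch classes.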

Outputs

Chromagram – The calculated chromagram, as a single feature per process block containing the number of bins given in the bins per octave parameter.

References and Credits

The Chromagram Vamp plugin was written by Christian Landone.

9. Mel-Frequency Cepstral Coefficients

System identifier – vamp:qm-vamp-plugins:qm-mfcc
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc

Mel-Frequency Cepstral Coefficients calculates MFCCs from a single channel of audio. These coefficients, derived from a cosine transform of the mapping of an audio spectrum onto a frequency scale modelled on human auditory response, are widely used in speech recognition, music classification and other tasks.

Parameters

Number of Coefficients – The number of MFCCs to return. Commonly used values include 13 or the default 20. This number includes C0 if requested (see Include C0 below).

Power for Mel Amplitude Logs – An optional power value to which the spectral amplitudes should be raised before applying the cosine transform. Values greater than 1 may in principle reduce the contribution of noise to the results. The default is 1.

Include C0 – Whether to include the "zero'th" coefficient, which simply reflects the overall signal power across the Mel frequency bands.
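
How these three parameters interact can be shown with a simplified sketch of the final stage of an MFCC calculation. It assumes the Mel-band amplitudes have already been computed, applies the power before taking logs, and uses an unscaled DCT-II; the function and argument names are hypothetical and the details may differ from the plugin's implementation.

```python
import numpy as np

def mfcc_from_mel(mel_amplitudes, n_coeffs=20, power=1.0, include_c0=True):
    """Cosine transform (unscaled DCT-II) of log Mel-band amplitudes raised to `power`."""
    mel = np.asarray(mel_amplitudes, dtype=float)
    log_mel = np.log(np.maximum(mel, 1e-10) ** power)   # floor avoids log(0)
    n_bands = len(log_mel)
    band = np.arange(n_bands)
    first = 0 if include_c0 else 1      # skip the zeroth coefficient if not requested
    coeffs = [np.sum(log_mel * np.cos(np.pi * k * (band + 0.5) / n_bands))
              for k in range(first, first + n_coeffs)]
    return np.array(coeffs)
```

In this sketch the number of returned values is always `n_coeffs`; including C0 simply shifts which coefficients are returned, and C0 itself is the sum of the log amplitudes across all bands.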

Outputs

Coefficients – The MFCC values, returned as one vector feature per processing block.

Means of Coefficients – The overall means of the MFCC bins, as a single vector feature with time 0 that is returned when processing is complete.

References and Credits

MFCCs in music: See B. Logan. Mel-Frequency Cepstral Coefficients for Music Modeling. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000.

The Mel-Frequency Cepstral Coefficients Vamp plugin was written by Nicolas Chetry and Chris Cannam.