QM Vamp Plugins: User Documentation

QM Vamp Plugins

The QM Vamp Plugin set is a library of Vamp audio feature extraction plugins developed at the Centre for Digital Music at Queen Mary, University of London. These plugins are provided as a single library file, made available in source and binary form for Windows, OS X, and Linux via the SoundSoftware code site (see download page).

For more information about Vamp plugins, see http://www.vamp-plugins.org/ .

1. Note Onset Detector
2. Tempo and Beat Tracker
3. Bar and Beat Tracker
4. Key Detector
5. Tonal Change
6. Adaptive Spectrogram
7. Polyphonic Transcription
8. Segmenter
9. Similarity
10. Discrete Wavelet Transform
11. Constant-Q Spectrogram
12. Chromagram
13. Mel-Frequency Cepstral Coefficients

1. Note Onset Detector

System identifier – vamp:qm-vamp-plugins:qm-onsetdetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector

Note Onset Detector analyses a single channel of audio and estimates the onset times of notes within the music – that is, the times at which notes and other audible events begin.

It calculates an onset likelihood function for each spectral frame, and picks peaks in a smoothed version of this function. The plugin is non-causal, returning all results at the end of processing.

Parameters

Onset Detection Function Type – The method used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al. 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al. 2005) for identifying percussive onsets in mixed music.

Onset Detector Sensitivity – Sensitivity level for peak detection in the onset likelihood function. The higher the sensitivity, the more onsets will (rightly or wrongly) be detected. The peak picker does not have a simple threshold level; instead, this parameter controls how steep the slopes of the smoothed detection function must be on either side of a peak value for that peak to be accepted as an onset.

Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can yield improved performance in onset detection, for example in audio with big variations in dynamics.

Outputs

Note Onsets – The detected note onset times, returned as one feature per detected note, with timestamp but no value.

Onset Detection Function – The raw note onset likelihood function that was calculated as the first step of the detection process.

Smoothed Detection Function – The note onset likelihood function following median filtering. This is the function from which sufficiently steep peak values are picked and classified as onsets.
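
The relationship between the raw detection function, its median-filtered version and the picked onsets can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function names, the filter width, the `lag` used to measure slope and the `min_slope` threshold (a stand-in for the sensitivity parameter) are all assumptions made for illustration.

```python
from statistics import median

def median_smooth(df, width=3):
    """Median-filter a detection function (edges use a shortened window)."""
    half = width // 2
    return [median(df[max(0, i - half): i + half + 1]) for i in range(len(df))]

def pick_peaks(smoothed, min_slope, lag=2):
    """Return indices whose value exceeds the values `lag` steps to either
    side by at least `min_slope`.

    `min_slope` is a hypothetical stand-in for the sensitivity setting:
    lowering it accepts shallower peaks, so more onsets are reported.
    """
    peaks = []
    for i in range(lag, len(smoothed) - lag):
        rise = smoothed[i] - smoothed[i - lag]
        fall = smoothed[i] - smoothed[i + lag]
        if rise >= min_slope and fall >= min_slope:
            peaks.append(i)
    return peaks

# A toy detection function with one strong and one weak event.
raw = [0, 0, 2, 6, 10, 6, 2, 0, 0, 1, 2, 3, 2, 1, 0, 0]
smoothed = median_smooth(raw)
strict = pick_peaks(smoothed, min_slope=2)   # only the strong event
lenient = pick_peaks(smoothed, min_slope=1)  # both events
```

With the stricter setting only the steep peak is kept; relaxing the slope requirement also admits the weaker one, which mirrors the "rightly or wrongly" trade-off described for the sensitivity parameter.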

References and Credits

Basic detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan Pablo Bello and Christian Landone.

2. Tempo and Beat Tracker

System identifier – vamp:qm-vamp-plugins:qm-tempotracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker

Tempo and Beat Tracker analyses a single channel of audio and estimates the positions of metrical beats within the music (the equivalent of a human listener tapping their foot to the beat).

Parameters

Beat Tracking Method – The method used to track beats. The default, "New", uses a hybrid of the "Old" two-state beat tracking model (see reference Davies 2007) and a dynamic programming method (see reference Ellis 2007). A more detailed description is given below, under the Bar and Beat Tracker plugin.

Onset Detection Function Type – The algorithm used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al. 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al. 2005) for identifying percussive onsets in mixed music.

Outputs

Beats – The estimated beat locations, returned as one feature per beat, with timestamp but no value, labelled with the corresponding estimated tempo at that beat.

Onset Detection Function – The raw note onset likelihood function used in beat estimation.

Tempo – The estimated tempo, returned as a feature each time the estimated tempo changes, with a single value for the tempo in beats per minute.
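
As an illustration of how the Beats and Tempo outputs relate, a host could derive a rough tempo curve from beat timestamps alone. The sketch below is an assumption-laden example, not the plugin's method (the plugin reports its own internal tempo estimate): `tempo_changes` and `tolerance_bpm` are hypothetical names, and tempo is simply taken as 60 divided by the inter-beat interval.

```python
def tempo_changes(beat_times, tolerance_bpm=0.5):
    """Return (time, bpm) pairs, one for each point where the tempo implied
    by successive beat intervals differs from the previous value."""
    changes = []
    last_bpm = None
    for prev, cur in zip(beat_times, beat_times[1:]):
        bpm = 60.0 / (cur - prev)          # beats per minute for this interval
        if last_bpm is None or abs(bpm - last_bpm) > tolerance_bpm:
            changes.append((prev, round(bpm, 1)))
            last_bpm = bpm
    return changes

# Beat timestamps in seconds: 0.5 s apart, then 0.75 s apart.
beats = [0.0, 0.5, 1.0, 1.5, 2.25, 3.0, 3.75]
changes = tempo_changes(beats)
```

Here the result contains two entries, 120 BPM from time 0.0 and 80 BPM from time 1.5, which has the same shape as the Tempo output: one value each time the estimate changes.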

References and Credits

Beat tracking method: M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp1009-1020, 2007;
M. E. P. Davies and M. D. Plumbley. Beat Tracking With A Two State Model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Vol. 3, pp241-244, Philadelphia, USA, March 19-23, 2005;
D. P. W. Ellis. Beat Tracking by Dynamic Programming. In Journal of New Music Research. Vol. 37, No. 1, pp51-60, 2007.

Onset detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.

Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.

The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies and Christian Landone.

3. Bar and Beat Tracker

System identifier – vamp:qm-vamp-plugins:qm-barbeattracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker

Bar and Beat Tracker analyses a single channel of audio and estimates the positions of bar lines and the resulting counted metrical beat positions within the music (where the first beat of each bar is "1", the equivalent of counting in time to the music). It is closely related to the Tempo and Beat Tracker, producing the same results for beat position as that plugin's "New" beat tracking method.

Method

The plugin first calculates an onset detection function using the "Complex Domain" method (see Tempo and Beat Tracker).

The beat tracking method performs two passes over the onset detection function: first to estimate the tempo contour, and then, given the tempo, to recover the beat locations.

To identify the tempo, the onset detection function is partitioned into 6-second frames with a 1.5-second increment. The autocorrelation function of each 6-second frame is computed and then passed through a perceptually weighted comb filterbank (see reference Davies 2007). The successive comb filterbank output signals are grouped together into a matrix of observations of periodicity through time. The best path of periodicity through these observations is found using the Viterbi algorithm, where the transition matrix is defined as a diagonal Gaussian.
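
The framing and autocorrelation steps can be sketched as follows. This is a simplified illustration only: it omits the comb filterbank and the Viterbi decoding, and the detection function rate `df_rate` (values per second) is an arbitrary toy value rather than the rate the plugin uses. All names are hypothetical.

```python
def frame_starts(n_values, frame_len, hop):
    """Start indices of every full frame of frame_len values, hop apart."""
    return list(range(0, n_values - frame_len + 1, hop))

def autocorrelation(frame):
    """Unnormalised autocorrelation of one frame, for lags 0..len(frame)-1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(n)]

df_rate = 4                       # assumed toy rate: 4 values per second
frame_len = 6 * df_rate           # 6-second frames
hop = int(1.5 * df_rate)          # 1.5-second increment

# 12 seconds of a toy detection function with an impulse once per second.
df = [1.0, 0.0, 0.0, 0.0] * 12
starts = frame_starts(len(df), frame_len, hop)

acf = autocorrelation(df[starts[0]:starts[0] + frame_len])
# Strongest non-zero lag in the first half of the frame.
best_lag = max(range(1, frame_len // 2), key=acf.__getitem__)
bpm = 60.0 * df_rate / best_lag
```

In this toy case the strongest lag is 4 values (one second), corresponding to 60 BPM. In the plugin, the per-frame periodicity observations are weighted and tracked across frames rather than picked independently as here.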

Given the estimates of periodicity, the beat locations are recovered by applying the dynamic programming algorithm (see reference Ellis 2007). This process involves the calculation of a recursive cumulative score function and backtrace signal. The cumulative score indicates the likelihood of a beat existing at each sample of the onset detection function input, and the backtrace gives the location of the best previous beat given this point in time. Once the cumulative score and backtrace have been calculated for the whole input signal, the best path through beat locations is found by recursively sampling the backtrace signal from the end of the input signal back to the beginning. See reference Stark et al. 2009 for a description of the real-time implementation of the beat tracking algorithm.
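
A minimal sketch of the cumulative score and backtrace idea is given below, in the spirit of the dynamic programming approach. It is an illustration under simplifying assumptions, not the plugin's code: the beat period is fixed (in detection function samples, at least 2) rather than following a tempo contour, and the log-squared transition penalty, the search window and the `tightness` weight are illustrative choices.

```python
import math

def track_beats(df, period, tightness=4.0):
    """Recover beat positions from a detection function, given a fixed
    beat period in samples (assumed >= 2)."""
    n = len(df)
    score = [0.0] * n          # cumulative score at each sample
    backlink = [-1] * n        # best previous beat for each sample (-1: none)
    for t in range(n):
        best, best_prev = 0.0, -1
        # Search previous beats between half a period and two periods back.
        for prev in range(max(0, t - 2 * period), t - period // 2 + 1):
            penalty = tightness * math.log((t - prev) / period) ** 2
            candidate = score[prev] - penalty
            if candidate > best:
                best, best_prev = candidate, prev
        score[t] = df[t] + best
        backlink[t] = best_prev
    # Start from the best-scoring sample in the final period and walk back.
    t = max(range(max(0, n - period), n), key=score.__getitem__)
    beats = []
    while t >= 0:
        beats.append(t)
        t = backlink[t]
    return beats[::-1]

# Toy detection function: impulses every 8 samples, starting at sample 2.
df = [0.0] * 40
for position in (2, 10, 18, 26, 34):
    df[position] = 1.0
beats = track_beats(df, period=8)
```

The penalty is zero when the gap between successive beats equals the expected period and grows as the gap deviates from it, so the backtrace favours evenly spaced beats that also coincide with strong onsets.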

Once the beat locations have been identified, the plugin makes a second pass over the input audio signal, partitioning it into beat-synchronous frames. The audio within each beat frame is down-sampled to give a new sampling frequency of 2.8 kHz. A beat-synchronous spectral representation is then calculated within each frame, from which a measure of beat spectral difference is calculated using Jensen-Shannon divergence. The bar boundaries are identified as those beat transitions leading to the most consistent spectral change given the specified number of beats per bar.
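
For reference, the Jensen-Shannon divergence between two spectra treated as probability distributions can be sketched as below. The `downbeat_phase` helper is a deliberately simplified, hypothetical reading of the bar-finding step (it picks the beat position within the bar with the largest summed spectral difference) and should not be taken as the plugin's exact decision rule.

```python
import math

def normalise(spectrum):
    """Scale a non-negative spectrum so that it sums to 1."""
    total = sum(spectrum)
    return [v / total for v in spectrum]

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions (0 to 1 bit)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def downbeat_phase(differences, beats_per_bar):
    """Beat offset (0-based) whose summed spectral difference is largest."""
    totals = [sum(differences[phase::beats_per_bar])
              for phase in range(beats_per_bar)]
    return max(range(beats_per_bar), key=totals.__getitem__)

same = jensen_shannon(normalise([1, 2, 1]), normalise([2, 4, 2]))
disjoint = jensen_shannon([1.0, 0.0], [0.0, 1.0])

# Toy per-beat spectral differences, largest at every fourth beat from index 1.
diffs = [0.1, 0.9, 0.2, 0.1, 0.2, 0.8, 0.1, 0.1, 0.1, 0.7]
phase = downbeat_phase(diffs, beats_per_bar=4)
```

Identical spectra give a divergence of zero and completely disjoint spectra give the maximum of one bit, which makes the measure convenient as a bounded, symmetric spectral difference.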

Parameters

Beats per Bar – The number of beats per bar (or measure). The plugin assumes that the number of beats per bar is fixed throughout the music.

Outputs

Beats – The estimated beat locations, returned as one feature per beat, with timestamp but no value, labelled with the number of that beat within the bar (e.g. consecutively 1, 2, 3, 4 for 4 beats to the bar).

Bars – The estimated bar line locations, returned as one feature per bar, with timestamp but no value.

Beat Count – The estimated beat locations, returned as one feature per beat, with timestamp and a value corresponding to the number of that beat within the bar. This is similar to the Beats output except that it returns a counting function rather than a series of instants.

Beat Spectral Difference – The new-bar likelihood function used in bar line estimation.

References and Credits

Beat tracking method: A. M. Stark, M. E. P. Davies and M. D. Plumbley. Real-time beat-synchronous analysis of musical audio. To appear in Proceedings of 12th International Conference on Digital Audio Effects (DAFx). 2009;
M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp1009-1020, 2007;
D. P. W. Ellis. Beat Tracking by Dynamic Programming. In Journal of New Music Research. Vol. 37, No. 1, pp51-60, 2007.

Bar finding method: M. E. P. Davies and M. D. Plumbley. A spectral difference approach to extracting downbeats in musical audio. In Proceedings of 14th European Signal Processing Conference (EUSIPCO), Italy, 2006.

The Bar and Beat Tracker Vamp plugin was written by Matthew Davies and Adam Stark.

4. Key Detector

System identifier – vamp:qm-vamp-plugins:qm-keydetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector

Key Detector analyses a single channel of audio and continuously estimates the key of the music by measuring how strongly a block-by-block chromagram correlates with the stored key profiles for each major and minor key.

The key profiles are drawn from analysis of Book I of the Well-Tempered Clavier by J. S. Bach, recorded at A=440 equal temperament.

Parameters

Tuning Frequency – The frequency of concert A in the music under analysis.

Window Length – The number of chroma analysis frames taken into account for key estimation. This controls how eager the key detector will be to return short-duration tonal changes as new key changes (the shorter the window, the more likely it is to detect a new key change).

Outputs

Tonic Pitch – The tonic pitch of each estimated key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 12 where C is 1, C# or Db is 2, and so on up to B which is 12.

Key Mode – The major or minor mode of the estimated key, where major is 0 and minor is 1.

Key – The estimated key for each key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 24 where 1-12 are the major keys and 13-24 are the minor keys, such that C major is 1, C# major is 2, and so on up to B major which is 12; then C minor is 13, C# minor is 14, and so on up to B minor which is 24.

Key Strength Plot – A grid representing the ongoing key "probability" throughout the music. This is returned as a feature for each chroma frame, containing 25 bins. Bins 1-12 are the major keys from C upwards; bins 14-25 are the minor keys from C upwards. The 13th bin is unused: it just provides space between the first and second halves of the feature if displayed in a single plot.

The outputs are also labelled with pitch or key as text.
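
The numbering used by the Key and Key Strength Plot outputs can be decoded with a small helper such as the following sketch. The function names and the particular sharp/flat spellings are illustrative choices; the text labels returned by the plugin itself may be spelled differently.

```python
# Pitch-class spellings are an arbitrary choice made for this example.
PITCHES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def key_name(key_index):
    """Name for a Key output value: 1-12 are major keys, 13-24 minor keys."""
    if not 1 <= key_index <= 24:
        raise ValueError("key index must be in the range 1 to 24")
    tonic = PITCHES[(key_index - 1) % 12]
    mode = "major" if key_index <= 12 else "minor"
    return f"{tonic} {mode}"

def strength_bin_name(bin_number):
    """Name for a 1-based Key Strength Plot bin; the unused bin 13 gives None."""
    if not 1 <= bin_number <= 25:
        raise ValueError("bin number must be in the range 1 to 25")
    if bin_number == 13:
        return None
    return key_name(bin_number if bin_number < 13 else bin_number - 1)

names = [key_name(k) for k in (1, 12, 13, 24)]
```

The only difference between the two numberings is the spacer: bins 14-25 of the strength plot correspond to key values 13-24.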

References and Credits

Method: see K. Noland and M. Sandler. Signal Processing Parameters for Tonality Estimation. In Proceedings of Audio Engineering Society 122nd Convention, Vienna, 2007.

The Key Detector Vamp plugin was written by Katy Noland and Christian Landone.

5. Tonal Change

System identifier – vamp:qm-vamp-plugins:qm-tonalchange
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange

Tonal Change analyses a single channel of audio, detecting harmonic changes such as chord boundaries.

Parameters

Gaussian smoothing – The window length for the internal smoothing operation, in chroma analysis frames. This controls how eager the tonal change detector will be to identify very short-term tonal changes. The default value of 5 is quite short, and may lead to more (not always meaningful) results being returned; for many purposes a larger value, closer to the maximum of 20, may be appropriate.

Chromagram minimum pitch – The MIDI pitch value (0-127) of the minimum pitch included in the internal chromagram analysis.

Chromagram maximum pitch – The MIDI pitch value (0-127) of the maximum pitch included in the internal chromagram analysis.

Chromagram tuning frequency – The frequency of concert A in the music under analysis.

Outputs

Transform to 6D Tonal Content Space – A representation of the musical content in a six-dimensional tonal space onto which the algorithm maps 12-bin chroma vectors extracted from the audio.

Tonal Change Detection Function – A function representing the estimated likelihood of a tonal change occurring in each spectral frame.

Tonal Change Positions – The resulting estimated positions of tonal changes.
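
To give a feel for the six-dimensional tonal space, the sketch below maps a 12-bin chroma vector onto three circles (fifths, minor thirds and major thirds), following the tonal centroid formulation as commonly stated for the referenced method. The radii and angles are quoted from that formulation as an assumption; bin ordering, normalisation and smoothing in the plugin itself may differ, and the function names are hypothetical.

```python
import math

# (radius, angular step per chroma bin) for the circles of fifths,
# minor thirds and major thirds. Values are assumed, not read from the plugin.
CIRCLES = [(1.0, 7 * math.pi / 6), (1.0, 3 * math.pi / 2), (0.5, 2 * math.pi / 3)]

def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to a 6-D point, normalised by its L1 norm."""
    norm = sum(abs(c) for c in chroma)
    if norm == 0:
        return [0.0] * 6
    centroid = []
    for radius, step in CIRCLES:
        centroid.append(sum(radius * math.sin(l * step) * c
                            for l, c in enumerate(chroma)) / norm)
        centroid.append(sum(radius * math.cos(l * step) * c
                            for l, c in enumerate(chroma)) / norm)
    return centroid

def centroid_distance(a, b):
    """Euclidean distance between two 6-D tonal centroids."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

c_only = [1.0] + [0.0] * 11                      # energy in the first bin only
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # bins 0, 4, 7
f_sharp_major = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
point = tonal_centroid(c_only)
far = centroid_distance(tonal_centroid(c_major), tonal_centroid(f_sharp_major))
```

A detection function of the kind described above can then be formed from the distance between centroids of neighbouring (smoothed) frames: harmonically distant chords land far apart in this space, while a repeated chord gives a distance of zero.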

References and Credits

Method: C. A. Harte, M. Gasser, and M. Sandler. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

The Tonal Change Vamp plugin was written by Chris Harte and Martin Gasser.

6. Adaptive Spectrogram

System identifier – vamp:qm-vamp-plugins:qm-adaptivespectrogram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram

Adaptive Spectrogram produces a composite spectrogram from a set of series of short-time Fourier transforms at differing resolutions. Values are selected from these spectrograms by repeated subdivision by time and frequency in order to maximise an entropy function across each column.

Parameters

Number of resolutions – The number of distinct resolutions to calculate and use. The resolutions will be consecutive powers of two starting from the smallest resolution specified.

Smallest resolution – The smallest of the set of resolutions to use.

Omit alternate resolutions – Causes the plugin to ignore alternate resolutions (i.e. the smallest resolution multiplied by 2, 8, 32, etc.) when composing a spectrogram. The smallest resolution specified, and its multiples by 4, 16, etc. as applicable, will be retained. The total number of resolutions actually included in the resulting spectrogram will therefore be N/2 (for even N) or (N+1)/2 (for odd N), where N is the value of the "number of resolutions" parameter. This permits a wider range of resolutions to be included with less processing, at an obvious cost in quality.

Multi-threaded processing – Enables multi-threading of the spectrogram calculation. This usually results in somewhat faster processing where multiple CPU cores are available.

As an example of the resolution parameters, if the "number of resolutions" is set to 5, "smallest resolution" to 128, and "omit alternate resolutions" is not used, the composite spectrogram will be calculated using spectrograms from 128, 256, 512, 1024, and 2048 point short-time Fourier transforms (with 50% overlap in each case). With "omit alternate resolutions" set, the same parameters would result in spectrograms from 128, 512, and 2048 point STFTs being used.
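
The way the three resolution parameters combine can be written out as a short sketch. The function name `resolutions` and its argument names are hypothetical; only the arithmetic is taken from the description above.

```python
def resolutions(count, smallest, omit_alternate=False):
    """STFT sizes implied by the resolution parameters.

    count          -- the "number of resolutions" value (N)
    smallest       -- the "smallest resolution" value, in points
    omit_alternate -- whether "omit alternate resolutions" is set
    """
    sizes = [smallest << i for i in range(count)]   # consecutive powers of two
    return sizes[::2] if omit_alternate else sizes

full = resolutions(5, 128)
reduced = resolutions(5, 128, omit_alternate=True)
```

For the example in the text, `full` is the list 128, 256, 512, 1024, 2048 and `reduced` is 128, 512, 2048; an even count such as 4 leaves N/2 = 2 resolutions when alternate ones are omitted.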

References and Credits

Method: X. Wen and M. Sandler. Composite spectrogram using multiple Fourier transforms. IET Signal Processing, 3(1):51-63, 2009.

The Adaptive Spectrogram Vamp plugin was written by Wen Xue and Chris Cannam.

7. Polyphonic Transcription

System identifier – vamp:qm-vamp-plugins:qm-transcription
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription

The Polyphonic Transcription plugin estimates a note transcription using MIDI pitch values from its input audio, returning a feature for each note (with timestamp and duration) whose value is the MIDI pitch number. Velocity is not estimated.

Although the method is described as real-time in its published description, the implementation used in this plugin is non-causal: it buffers its input and operates on it as a single unit, doing all the real work after its entire input has been received, and is very memory intensive. However, it is relatively fast (faster than real-time) compared to other polyphonic transcription methods.

The plugin works best at 44.1 kHz input sample rate, and is tuned for piano and guitar music.
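
Because each returned feature carries a MIDI pitch number as its value, a host may want to convert it to a frequency or a note name. The helpers below are a generic sketch, not part of the plugin: they assume equal temperament, A above middle C at MIDI pitch 69, and the octave convention in which MIDI pitch 60 is written C4.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_hz(pitch, concert_a=440.0):
    """Equal-tempered frequency of a MIDI pitch number (69 is concert A)."""
    return concert_a * 2.0 ** ((pitch - 69) / 12)

def midi_to_name(pitch):
    """Note name with octave, assuming MIDI pitch 60 is C4."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"

# Hypothetical transcription result: (timestamp s, duration s, MIDI pitch).
notes = [(0.0, 0.5, 60), (0.5, 0.5, 64), (1.0, 1.0, 69)]
named = [(start, midi_to_name(pitch), midi_to_hz(pitch))
         for start, _duration, pitch in notes]
```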

References and Credits

Method: R. Zhou and J. D. Reiss. A Real-Time Polyphonic Music Transcription System. In Proceedings of the Fourth Music Information Retrieval Evaluation eXchange (MIREX), Philadelphia, USA, 2008;
R. Zhou and J. D. Reiss. A Real-Time Frame-Based Multiple Pitch Estimation Method Using the Resonator Time Frequency Image. Third Music Information Retrieval Evaluation eXchange (MIREX), Vienna, Austria, 2007.

The Polyphonic Transcription Vamp plugin was written by Ruohua Zhou.

8. Segmenter

System identifier – vamp:qm-vamp-plugins:qm-segmenter
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter

Segmenter divides a single channel of music up into structurally consistent segments. It returns a numeric value (the segment type) for each moment at which a new segment starts.

For music with clearly tonally distinguishable sections such as verse, chorus, etc., segments with the same type may be expected to be similar to one another in some structural sense. For example, repetitions of the chorus are likely to share a segment type.

The plugin only attempts to identify similar segments; it does not attempt to label them. For example, it makes no attempt to tell you which segment is the chorus.

Note that this plugin does a substantial amount of processing after receiving all of the input audio data, before it produces any results.

Method

The method relies upon structural/timbral similarity to obtain the high-level song structure. This is based on the assumption that the distributions of timbre features are similar over corresponding structural elements of the music.

The algorithm works by obtaining a frequency-domain representation of the audio signal using a Constant-Q transform, a Chromagram or Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the particular feature is selectable as a parameter). The extracted features are normalised in accordance with the MPEG-7 standard (NASE descriptor), which means the spectrum is converted to decibel scale and each spectral vector is normalised by the RMS energy envelope. The value of this envelope is stored for each processing block of audio. This is followed by the extraction of 20 principal components per block using PCA, yielding a sequence of 21-dimensional feature vectors where the last element in each vector corresponds to the energy envelope.

A 40-state Hidden Markov Model is then trained on the whole sequence of features, with each state of the HMM corresponding to a specific timbre type. This process partitions the timbre-space of a given track into 40 possible types. The important assumption of the model is that the distribution of these features remains consistent over a structural segment. After training and decoding the HMM, the song is assigned a sequence of timbre-features according to specific timbre-type distributions for each possible structural segment.

The segmentation itself is computed by clustering timbre-type histograms. A series of histograms is created over a sliding window, and these are grouped into M clusters by an adapted soft k-means algorithm. Each of these clusters will correspond to a specific segment-type of the analysed song. Reference histograms, iteratively updated during clustering, describe the timbre distribution for each segment. The segmentation arises from the final cluster assignments.
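
The histogram step can be pictured with a short sketch: given the decoded sequence of timbre types (HMM state indices), a normalised histogram is computed over each position of a sliding window. The window length, hop and function name here are illustrative assumptions, and the constrained soft k-means clustering that follows is not shown.

```python
def sliding_histograms(states, n_types, window, hop=1):
    """Normalised timbre-type histograms over a sliding window.

    states  -- decoded timbre-type index for each analysis block
    n_types -- number of timbre types (40 in the description above)
    """
    histograms = []
    for start in range(0, len(states) - window + 1, hop):
        counts = [0] * n_types
        for state in states[start:start + window]:
            counts[state] += 1
        histograms.append([c / window for c in counts])
    return histograms

# Toy sequence with 4 timbre types: the first half is dominated by types
# 0 and 1, the second half by types 2 and 3.
states = [0, 0, 1, 0, 0, 1, 2, 3, 2, 2, 3, 2]
hists = sliding_histograms(states, n_types=4, window=6, hop=6)
```

Windows drawn from the same kind of section produce similar histograms even when the exact order of timbre types differs, which is what allows them to be clustered into segment types.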

Parameters

Number of segment-types – The maximum number of clusters (segment-types) to be returned. The default is 10. Unlike many clustering algorithms, the constrained clustering used in this plugin does not produce too many clusters or vary significantly even if this is set too high. However, this parameter can be useful for limiting the number of expected segment-types.

Feature Type – The type of spectral feature used for segmentation. The available features are:

"Hybrid", the default, which uses a Constant-Q transform (see related plugin): this is generally effective for modern studio recordings;
"Chromatic", using a chromagram derived from the Constant-Q feature (see related plugin): this may be preferable for live, acoustic, or older recordings, in which repeated sections may be less consistent in sound;
"Timbral", using Mel-Frequency Cepstral Coefficients (see related plugin), which is more likely to result in classification by instrumentation rather than musical content.

Minimum segment duration – The approximate expected minimum duration for a segment, from 1 to 15 seconds. Changing this parameter may help the plugin to find musical sections rather than just following changes in the sound of the music, and also avoid wasting a segment-type cluster for timbrally distinct but too-short segments. The default of 4 seconds usually produces good results.

Outputs

Segmentation – The estimated segment boundaries, returned as one feature at each segment boundary, with a single value representing the segment type number for the segment starting at that boundary.
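
Since the output gives only a start time and type for each segment, a host typically needs to turn it into intervals. The sketch below does this for a hypothetical list of (start time, segment type) pairs; the helper names and the need to supply the end time of the audio are assumptions of the example.

```python
def to_segments(boundaries, end_time):
    """Convert (start, type) boundary features to (start, end, type) tuples."""
    ends = [start for start, _ in boundaries[1:]] + [end_time]
    return [(start, end, seg_type)
            for (start, seg_type), end in zip(boundaries, ends)]

def group_by_type(segments):
    """Collect the (start, end) intervals belonging to each segment type."""
    groups = {}
    for start, end, seg_type in segments:
        groups.setdefault(seg_type, []).append((start, end))
    return groups

# Hypothetical Segmentation output for a 70-second track.
boundaries = [(0.0, 1), (12.5, 2), (40.0, 1), (55.0, 3)]
segments = to_segments(boundaries, end_time=70.0)
groups = group_by_type(segments)
```

Segments sharing a type number (type 1 in this example) are the ones the plugin considers structurally similar; the numbers themselves carry no meaning such as "verse" or "chorus".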

References and Credits

Method: M. Levy and M. Sandler. Structural segmentation of musical audio by constrained clustering. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.

Note that this plugin does not implement the beat-synchronous aspect of the segmentation method described in the paper.

The Segmenter Vamp plugin was written by Mark Levy. Thanks to George Fazekas for providing much of this documentation.

9. Similarity

System identifier – vamp:qm-vamp-plugins:qm-similarity
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity

Similarity treats each channel of its audio input as a separate "track", and estimates how similar the tracks are to one another using a selectable similarity measure.

The plugin also returns the intermediate data used as a basis of the similarity measure; it can therefore be used on a single channel of input (with the resulting intermediate data then being applied in some other similarity or clustering algorithm, for example) if desired, as well as with multiple inputs.

Because of the way this plugin handles multiple inputs, by assuming that each channel represents a separate piece of music, it may not be appropriate for use directly in a general-purpose host (unless you actually want to do something like compare two stereo channels for timbral similarity, which is unlikely).

Parameters

cannam@16: cannam@16:

Feature Type – The underlying audio feature used for the similarity measure. The available features are:

"Timbre", in which the distance between tracks is a symmetrised Kullback-Leibler divergence between Gaussian-modelled MFCC means and variances across each track, for the first 20 MFCCs including C0 (see the related Mel-Frequency Cepstral Coefficients plugin);
"Chroma", which uses the Kullback-Leibler divergence of the mean chroma histogram (see the related Chromagram plugin);
"Rhythm", which uses the cosine distance between "beat spectrum" measures derived from a short sampled section of the track;
and the combined "Timbre and Rhythm" and "Chroma and Rhythm" features.
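
To make the two named distance measures concrete, here is a minimal Python sketch of a symmetrised Kullback-Leibler divergence between two diagonal-covariance Gaussians and of a cosine distance. It assumes each track is summarised by per-coefficient means and variances; the function names are hypothetical, and the plugin's own scaling and numerical details may differ.

```python
import math


def symmetrised_kl_diag_gaussians(mean_a, var_a, mean_b, var_b):
    """KL(a||b) + KL(b||a) for two diagonal-covariance Gaussians.

    The log-variance terms of the two directions cancel, leaving only
    variance ratios and the squared mean difference per dimension.
    """
    total = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        diff_sq = (ma - mb) ** 2
        total += 0.5 * (va / vb + vb / va - 2.0
                        + diff_sq * (1.0 / va + 1.0 / vb))
    return total


def cosine_distance(x, y):
    """One minus the cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)
```

Both functions return zero for identical inputs, which matches the expectation that a track has zero distance from itself.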

Outputs

Distance Matrix – A matrix of the distance measures between input channels, returned as a series of vector features timestamped at one-second intervals. The distance from channel i to channel j appears as the j'th bin of the feature at time i.
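
As an illustration of this layout, the following Python sketch rebuilds the full matrix from the returned features. It assumes the host delivers each feature as a (timestamp in seconds, list of bin values) pair; that representation and the function name are hypothetical and are not part of the plugin or of any particular host.

```python
def assemble_distance_matrix(features):
    """Return matrix[i][j] = distance from channel i to channel j.

    `features` is an iterable of (timestamp_seconds, values) pairs.
    Sorting by timestamp puts the rows in channel order, since the
    feature for channel i is the one timestamped at second i.
    """
    rows = sorted(features, key=lambda feature: feature[0])
    return [list(values) for _, values in rows]
```

With three input channels the result is a 3 x 3 list of lists whose diagonal holds each channel's distance from itself.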

Distance from First Channel – A single vector feature, timestamped at time zero, containing the distances between the first input channel and each of the input channels (including the first channel itself at bin 0, which should have zero distance).

Ordered Distances from First Channel – A pair of vector features, at times 0 and 1 second. The feature at time 0 contains the 1-based indices of the input channels in the order of similarity to the first input channel (so its first bin should always contain 1, as the first channel is most similar to itself). The feature at time 1 contains, in bin n, the distance between the first input channel and the channel with index found at bin n of the feature at time 0.
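
The relationship between the two features can be sketched in Python as follows. This is only an illustration of the ordering described above, starting from a plain list of distances from the first channel; the function name is hypothetical, and how the plugin breaks ties between equal distances is not specified here.

```python
def ordered_distances(distances_from_first):
    """Return (indices, distances) ordered by similarity to channel 1.

    `indices` holds 1-based channel numbers, nearest first;
    `distances[n]` is the distance to the channel at `indices[n]`.
    """
    order = sorted(range(len(distances_from_first)),
                   key=lambda channel: distances_from_first[channel])
    indices = [channel + 1 for channel in order]
    distances = [distances_from_first[channel] for channel in order]
    return indices, distances
```

For distances `[0.0, 2.5, 1.0]` this gives indices `[1, 3, 2]` and distances `[0.0, 1.0, 2.5]`.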

Feature Means – A series of vector features containing the mean values of each of the feature bins across the duration of each of the input channels. This output returns one feature for each input channel, timestamped at one-second intervals. The number of bins for each feature depends on the feature type; it will be 20 for MFCC features and 12 for chroma features. No features will be returned on this output if the feature type is purely rhythmic.

Feature Variances – As for Feature Means, but containing the variances of the feature bins rather than their means.

Beat Spectra – A series of vector features containing the rhythmic autocorrelation profiles (beat spectra) for each of the input channels. This output returns one 512-bin feature for each input channel, timestamped at one-second intervals. No features will be returned on this output if the feature type contains no rhythm component.

References and Credits

Timbral similarity: M. Levy and M. Sandler. Lightweight measures for timbral similarity of musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.

Combined rhythmic and timbral similarity: K. Jacobson. A Multifaceted Approach to Music Similarity. In Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.

The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and Chris Cannam.

10. Discrete Wavelet Transform

System identifier – vamp:qm-vamp-plugins:qm-dwt
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt

The Discrete Wavelet Transform plugin performs the forward DWT on the signal. The wavelet coefficients are derived from a fast segmented DWT algorithm without block end effects. The DWT can be performed up to the 16th scale, using any of a selection of wavelet functions.

The wavelet coefficients are returned as feature columns at a rate of half the sample rate of the signal to be analysed. To simulate multiresolution in the layer data table, the coefficient values at higher scales are copied multiple times according to the number of the scale. For example, each value appears twice at scale 2, four times at scale 3, and eight times at scale 4, in order to simulate the lower resolution at higher scales.
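
The copying scheme can be sketched in Python as below. This is an illustration of the layout only, not of the transform itself: it assumes the coefficients are already available as one list per scale, with scale s holding half as many values as scale s - 1, and the function names are hypothetical.

```python
def expand_scale(coefficients, scale):
    """Repeat each coefficient 2**(scale - 1) times (scale is 1-based)."""
    repeats = 2 ** (scale - 1)
    return [value for value in coefficients for _ in range(repeats)]


def to_feature_columns(coefficients_by_scale):
    """Build feature columns with one bin per scale.

    coefficients_by_scale[s - 1] is the coefficient list for scale s.
    After expansion every scale has the same length, so the rows can
    be zipped into columns.
    """
    rows = [expand_scale(coefficients, scale)
            for scale, coefficients in enumerate(coefficients_by_scale, start=1)]
    return [list(column) for column in zip(*rows)]
```

For example, scale lists `[1, 2, 3, 4]`, `[5, 6]` and `[7]` expand to four columns: `[1, 5, 7]`, `[2, 5, 7]`, `[3, 6, 7]` and `[4, 6, 7]`.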

Parameters

Scales – Adjusts the number of scales of the DWT. The processing block size needs to be set to at least 2ⁿ, where n = number of scales.
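
As a quick check of this constraint, the following Python sketch computes the smallest acceptable block size for a given number of scales. The function names are hypothetical, and the upper limit of 16 scales is taken from the plugin description above.

```python
def minimum_block_size(scales):
    """Smallest block size allowed for the given number of DWT scales."""
    if not 1 <= scales <= 16:
        raise ValueError("number of scales must be between 1 and 16")
    return 2 ** scales


def block_size_is_sufficient(block_size, scales):
    """True if block_size is at least 2**scales."""
    return block_size >= minimum_block_size(scales)
```

For instance, 10 scales require a block size of at least 1024 samples, and the maximum of 16 scales requires at least 65536.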

Wavelet – Selects the wavelet function to be used for the transform. Wavelets from the following families are available: Daubechies, Symlets, Coiflets, Biorthogonal, Meyer.

References and Credits

Principles: S. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989), pp. 674-693;
P. Rajmic and J. Vlach. Real-Time Audio Processing via Segmented Wavelet Transform. In Proceedings of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.

The Discrete Wavelet Transform plugin was written by Thomas Wilmering.

11. Constant-Q Spectrogram

System identifier – vamp:qm-vamp-plugins:qm-constantq
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq

Constant-Q Spectrogram calculates a spectrogram based on a short-time windowed constant Q spectral transform. This is a spectrogram in which the ratio of centre frequency to resolution is constant for each frequency bin. The frequency bins correspond to the frequencies of "musical notes" rather than being linearly spaced in frequency as they are for the conventional DFT spectrogram.

The pitch range and the number of frequency bins per octave may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes are defined by these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave.

Normalized – Whether to normalize each output column to unit maximum.
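
To show how the pitch range, tuning frequency and bins per octave relate to the frequency bins, here is a Python sketch. It uses the standard equal-temperament mapping in which MIDI pitch 69 is concert A, and the constant-Q ratio from Brown's paper. Whether the plugin includes the bin at the maximum pitch, and how it aligns its bins exactly, are assumptions; the function names are hypothetical.

```python
def midi_pitch_to_hz(pitch, tuning_hz=440.0):
    """Frequency of a MIDI pitch, with pitch 69 tuned to tuning_hz."""
    return tuning_hz * 2.0 ** ((pitch - 69) / 12.0)


def constant_q_bin_frequencies(min_pitch, max_pitch, bins_per_octave,
                               tuning_hz=440.0):
    """Geometrically spaced centre frequencies from min to max pitch.

    Assumes integer MIDI pitches and that both end points are included.
    """
    n_bins = (max_pitch - min_pitch) * bins_per_octave // 12 + 1
    f_min = midi_pitch_to_hz(min_pitch, tuning_hz)
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def quality_factor(bins_per_octave):
    """Ratio of centre frequency to bin spacing, equal for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

With a pitch range of 57 to 69 and 12 bins per octave, this yields 13 centre frequencies running from 220 Hz to 440 Hz.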

Outputs

Constant-Q Spectrogram – The calculated spectrogram, as a single feature per process block containing one bin for each pitch included in the spectrogram's range.

References and Credits

Principle: J. Brown. Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1): 425-434, 1991.

The Constant-Q Spectrogram Vamp plugin was written by Christian Landone.

12. Chromagram

System identifier – vamp:qm-vamp-plugins:qm-chromagram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram

Chromagram calculates a constant Q spectral transform (as in the Constant-Q Spectrogram plugin) and then wraps the frequency bin values into a single octave, with each bin containing the sum of the magnitudes from the corresponding bin in all octaves. The number of values in each feature vector returned by the plugin is therefore the same as the number of bins per octave configured for the underlying constant Q transform.
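
The wrapping step can be sketched in Python as follows. It assumes the constant-Q magnitudes for one frame are given as a flat list ordered from lowest to highest bin, starting at the first bin of an octave; the function name is hypothetical.

```python
def wrap_to_chroma(constant_q_magnitudes, bins_per_octave):
    """Sum constant-Q magnitudes across octaves into one octave.

    Bin k of the transform contributes to chroma bin
    k % bins_per_octave.
    """
    chroma = [0.0] * bins_per_octave
    for k, magnitude in enumerate(constant_q_magnitudes):
        chroma[k % bins_per_octave] += magnitude
    return chroma
```

For two octaves at three bins per octave, magnitudes `[1, 2, 3, 4, 5, 6]` wrap to `[5.0, 7.0, 9.0]`.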

The pitch range and the number of frequency bins per octave for the transform may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes depend on these parameters, and the plugin will not accept any other block size than its preferred value.

Parameters

Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform used in calculating the chromagram.

Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform used in calculating the chromagram.

Tuning Frequency – The frequency of concert A in the music under analysis.

Bins per Octave – The number of constant-Q transform bins to be computed per octave, and thus the total number of bins present in the resulting chromagram.

Normalized – Whether to normalize each output column. Normalization may be to unit sum or unit maximum.
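
The two normalization options can be illustrated with the Python sketch below. It assumes non-negative magnitudes and leaves an all-zero column unchanged; how the plugin itself treats silent columns is not stated here, and the function names are hypothetical.

```python
def normalize_unit_sum(column):
    """Scale a column so that its values sum to 1."""
    total = sum(column)
    if total <= 0:
        return list(column)  # assumption: leave silent columns as they are
    return [value / total for value in column]


def normalize_unit_max(column):
    """Scale a column so that its largest value is 1."""
    peak = max(column)
    if peak <= 0:
        return list(column)  # assumption: leave silent columns as they are
    return [value / peak for value in column]
```

Unit-sum normalization turns `[1.0, 3.0]` into `[0.25, 0.75]`, while unit-maximum normalization turns `[1.0, 4.0]` into `[0.25, 1.0]`.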

Outputs

Chromagram – The calculated chromagram, as a single feature per process block containing the number of bins given in the bins per octave parameter.

References and Credits

The Chromagram Vamp plugin was written by Christian Landone.

13. Mel-Frequency Cepstral Coefficients

System identifier – vamp:qm-vamp-plugins:qm-mfcc
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc

Mel-Frequency Cepstral Coefficients calculates MFCCs from a single channel of audio. These coefficients, derived from a cosine transform of the mapping of an audio spectrum onto a frequency scale modelled on human auditory response, are widely used in speech recognition, music classification and other tasks.

Parameters

Number of Coefficients – The number of MFCCs to return. Commonly used values include 13 or the default 20. This number includes C0 if requested (see Include C0 below).

Power for Mel Amplitude Logs – An optional power value to which the spectral amplitudes should be raised before applying the cosine transform. Values greater than 1 may in principle reduce the contribution of noise to the results. The default is 1.

Include C0 – Whether to include the "zero'th" coefficient, which simply reflects the overall signal power across the Mel frequency bands.
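
The role of the coefficient count and of C0 can be illustrated with a simplified Python sketch of the final cosine-transform step. It assumes the Mel-band energies for one frame have already been computed, uses an unscaled DCT-II of their logarithms, and omits the power parameter; the plugin's own filterbank, scaling and numerical details may differ, and the function name is hypothetical.

```python
import math


def mfcc_from_mel_energies(mel_energies, n_coefficients=20, include_c0=True):
    """Cosine transform (unscaled DCT-II) of log Mel-band energies.

    With include_c0 the result starts at coefficient 0, so C0 counts
    towards n_coefficients; otherwise it starts at coefficient 1.
    """
    floor = 1e-10  # assumption: avoid log(0) for silent bands
    logs = [math.log(max(energy, floor)) for energy in mel_energies]
    n_bands = len(logs)
    first = 0 if include_c0 else 1
    coefficients = []
    for k in range(first, first + n_coefficients):
        coefficients.append(sum(
            value * math.cos(math.pi * k * (n + 0.5) / n_bands)
            for n, value in enumerate(logs)))
    return coefficients
```

In this sketch coefficient 0 is simply the sum of the log band energies, which is why it reflects overall signal power rather than spectral shape.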

Outputs

Coefficients – The MFCC values, returned as one vector feature per processing block.

Means of Coefficients – The overall means of the MFCC bins, as a single vector feature with time 0 that is returned when processing is complete.
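
The relationship between the two outputs can be sketched in Python: the means are per-bin averages over all of the per-block coefficient vectors. The sketch assumes the frames are held as a list of equal-length lists, and the function name is hypothetical.

```python
def coefficient_means(frames):
    """Per-bin mean across all frames (one frame per processing block)."""
    if not frames:
        return []
    n_frames = len(frames)
    return [sum(bin_values) / n_frames for bin_values in zip(*frames)]
```

For two frames `[1.0, 2.0]` and `[3.0, 4.0]` the result is `[2.0, 3.0]`.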

References and Credits

MFCCs in music: See B. Logan. Mel-Frequency Cepstral Coefficients for Music Modeling. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000.

The Mel-Frequency Cepstral Coefficients Vamp plugin was written by Nicolas Chetry and Chris Cannam.