The QM Vamp Plugin set is a library of Vamp audio feature extraction plugins developed at the Centre for Digital Music at Queen Mary, University of London. These plugins are provided as a single library file, made available in source and binary form for Windows, OS/X, and Linux via the SoundSoftware code site (see download page).
For more information about Vamp plugins, see http://www.vamp-plugins.org/ .

System identifier – vamp:qm-vamp-plugins:qm-onsetdetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-onsetdetector
Note Onset Detector analyses a single channel of audio and estimates the onset times of notes within the music, that is, the times at which notes and other audible events begin.
It calculates an onset likelihood function for each spectral frame, and picks peaks in a smoothed version of this function. The plugin is non-causal, returning all results at the end of processing.
Onset Detection Function Type – The method used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al 2005) for identifying percussive onsets in mixed music.
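The following sketch illustrates the default "Complex Domain" function in the spirit of Duxbury et al. 2003. It predicts each bin from the two previous frames, keeping the previous magnitude and extrapolating the phase linearly, and then sums the prediction errors for each frame. It is a minimal Python sketch that assumes the STFT frames are already available as lists of complex numbers. The function name is hypothetical, and the plugin's own windowing, hop size and normalisation may differ.

```python
import cmath


def complex_domain_odf(frames):
    """Complex-domain onset likelihood for a list of STFT frames.

    Each frame is a list of complex bins of equal length. The first two
    frames have no prediction and are given a value of 0.
    """
    odf = []
    for n, frame in enumerate(frames):
        if n < 2:
            odf.append(0.0)
            continue
        prev, prev2 = frames[n - 1], frames[n - 2]
        total = 0.0
        for k, x in enumerate(frame):
            # Predict: keep the previous magnitude and extrapolate the phase
            # linearly, i.e. phase(n-1) + (phase(n-1) - phase(n-2)).
            phase = 2 * cmath.phase(prev[k]) - cmath.phase(prev2[k])
            predicted = abs(prev[k]) * cmath.exp(1j * phase)
            # Deviation from the prediction is taken as evidence of an onset.
            total += abs(x - predicted)
        odf.append(total)
    return odf
```

A steady sinusoid is predicted perfectly and gives values near zero. A sudden change in magnitude or phase produces a positive value.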
Onset Detector Sensitivity – Sensitivity level for peak detection in the onset likelihood function. The higher the sensitivity, the more onsets will (rightly or wrongly) be detected. The peak picker does not use a simple threshold level; instead, this parameter controls how steep the slopes of the smoothed detection function must be on either side of a peak for that peak to be accepted as an onset.
Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can improve onset detection performance, for example in audio with large variations in dynamics.
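The whitening step can be sketched after the published method (Stowell and Plumbley 2007). Each bin is divided by a running per-bin peak magnitude that decays slowly over time and never drops below a floor. The `floor` and `memory` values here are illustrative assumptions rather than the plugin's settings, and the function name is hypothetical.

```python
def adaptive_whiten(frames, floor=1e-3, memory=0.997):
    """Whiten a list of spectral frames (lists of real or complex bins).

    peaks[k] tracks a decaying maximum of |X[k]|. Each bin is divided by it,
    so quiet passages and weak frequency regions are boosted relative to
    loud ones.
    """
    if not frames:
        return []
    peaks = [0.0] * len(frames[0])
    whitened = []
    for frame in frames:
        out = []
        for k, x in enumerate(frame):
            # P_k = max(|X_k|, floor, memory * previous P_k)
            peaks[k] = max(abs(x), floor, memory * peaks[k])
            out.append(x / peaks[k])
        whitened.append(out)
    return whitened
```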
Note Onsets – The detected note onset times, returned as a single feature with timestamp but no value for each detected note.
Onset Detection Function – The raw note onset likelihood function that was calculated as the first step of the detection process.
Smoothed Detection Function – The note onset likelihood function following median filtering. This is the function from which sufficiently steep peak values are picked and classified as onsets.
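The smoothing and peak-picking steps described above can be approximated as follows. A running median produces the smoothed function. A local maximum is then accepted only if the function rises into it and falls away from it by more than a minimum slope, with higher sensitivity corresponding to a smaller slope. This is a simplified sketch under those assumptions. The plugin's actual peak picker, its window sizes and the mapping from the Sensitivity parameter to a slope are not described here, and the function names and `min_slope` parameter are hypothetical.

```python
from statistics import median


def median_smooth(values, half_width=3):
    """Centred running median; the window is truncated at the edges."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


def pick_steep_peaks(values, min_slope):
    """Indices of strict local maxima whose slopes on both sides exceed
    min_slope. A smaller min_slope (higher sensitivity) accepts more peaks.
    Flat-topped peaks are not detected by this simple rule."""
    peaks = []
    for i in range(1, len(values) - 1):
        rise = values[i] - values[i - 1]
        fall = values[i] - values[i + 1]
        if rise > min_slope and fall > min_slope:
            peaks.append(i)
    return peaks
```

A median filter removes an isolated one-sample spike entirely. This is one reason median smoothing suppresses spurious single-frame peaks.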
Basic detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.
Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.
Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.
The Note Onset Detector Vamp plugin was written by Chris Duxbury, Juan Pablo Bello and Christian Landone.

System identifier – vamp:qm-vamp-plugins:qm-tempotracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tempotracker
Tempo and Beat Tracker analyses a single channel of audio and estimates the positions of metrical beats within the music (the equivalent of a human listener tapping their foot to the beat).
Beat Tracking Method – The method used to track beats. The default, "New", uses a hybrid of the "Old" two-state beat tracking model (see reference Davies 2007) and a dynamic programming method (see reference Ellis 2007). A more detailed description is given below, in the documentation for the Bar and Beat Tracker plugin.
Onset Detection Function Type – The algorithm used to calculate the onset likelihood function. The most versatile method is the default, "Complex Domain" (see reference, Duxbury et al 2003). "Spectral Difference" may be appropriate for percussive recordings, "Phase Deviation" for non-percussive music, and "Broadband Energy Rise" (see reference, Barry et al 2005) for identifying percussive onsets in mixed music.
Adaptive Whitening – This option evens out the temporal and frequency variation in the signal, which can improve onset detection performance, for example in audio with large variations in dynamics.
Beats – The estimated beat locations, returned as a single feature, with timestamp but no value, for each beat, labelled with the corresponding estimated tempo at that beat.
Onset Detection Function – The raw note onset likelihood function used in beat estimation.
Tempo – The estimated tempo, returned as a feature each time the estimated tempo changes, with a single value for the tempo in beats per minute.
Beat tracking method: M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp. 1009-1020, 2007;
M. E. P. Davies and M. D. Plumbley. Beat Tracking With A Two State Model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Vol. 3, pp. 241-244, Philadelphia, USA, March 19-23, 2005;
D. P. W. Ellis. Beat Tracking by Dynamic Programming. In Journal of New Music Research. Vol. 36, No. 1, pp. 51-60, 2007.
Onset detection methods: C. Duxbury, J. P. Bello, M. Davies and M. Sandler, Complex domain Onset Detection for Musical Signals. In Proceedings of the 6th Conference on Digital Audio Effects (DAFx-03). London, UK. September 2003.
Adaptive whitening: D. Stowell and M. D. Plumbley, Adaptive whitening for improved real-time audio onset detection. In Proceedings of the International Computer Music Conference (ICMC'07), August 2007.
Percussion onset detector: D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor, Drum Source Separation using Percussive Feature Detection and Spectral Modulation. ISSC 2005.
The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies and Christian Landone.

System identifier – vamp:qm-vamp-plugins:qm-barbeattracker
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker
Bar and Beat Tracker analyses a single channel of audio and estimates the positions of bar lines and the resulting counted metrical beat positions within the music (where the first beat of each bar is "1", the equivalent of counting in time to the music). It is closely related to the Tempo and Beat Tracker, producing the same results for beat position as that plugin's "New" beat tracking method.
The plugin first calculates an onset detection function using the "Complex Domain" method (see Tempo and Beat Tracker).
The beat tracking method performs two passes over the onset detection function: first to estimate the tempo contour, and then, given the tempo, to recover the beat locations.
To identify the tempo, the onset detection function is partitioned into 6-second frames with a 1.5-second increment. The autocorrelation function of each 6-second frame of the onset detection function is found, and this is then passed through a perceptually weighted comb filterbank (see reference Davies 2007). The successive comb filterbank output signals are grouped together into a matrix of observations of periodicity through time. The best path of periodicity through these observations is found using the Viterbi algorithm, where the transition matrix is defined as a diagonal Gaussian.
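The following is a much-simplified sketch of this tempo stage. It assumes the onset detection function is a list of samples at a known frame rate, so the caller converts 6 s and 1.5 s into `frame_len` and `hop` in ODF samples. The sketch frames the function, computes an autocorrelation for each frame, and runs a Viterbi search with a Gaussian transition matrix over a matrix of periodicity scores. The perceptually weighted comb filterbank is omitted, so raw autocorrelation values would stand in for its output. All names and the `sigma` parameter are hypothetical.

```python
import math


def frame_signal(odf, frame_len, hop):
    """Split the ODF into overlapping frames (e.g. 6 s long, 1.5 s apart)."""
    return [odf[s:s + frame_len] for s in range(0, len(odf) - frame_len + 1, hop)]


def autocorrelation(frame, max_lag):
    """Unbiased autocorrelation for lags 1..max_lag (max_lag < len(frame))."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / (n - lag)
            for lag in range(1, max_lag + 1)]


def _log(v, eps=1e-12):
    # Small offset avoids log(0) for zero-valued observations.
    return math.log(v + eps)


def viterbi_path(observations, sigma=1.0):
    """Most likely sequence of periodicity states.

    observations[t][s] >= 0 scores state s (e.g. a beat period) in frame t.
    Unnormalised Gaussian transitions favour small jumps between states:
    log weight -(i - j)^2 / (2 sigma^2).
    """
    n_states = len(observations[0])
    log_trans = [[-((i - j) ** 2) / (2 * sigma ** 2) for j in range(n_states)]
                 for i in range(n_states)]
    score = [_log(o) for o in observations[0]]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best] + log_trans[best][j] + _log(obs[j]))
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Working in the log domain turns products of probabilities into sums. This avoids numerical underflow on long inputs.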
Given the estimates of periodicity, the beat locations are recovered by applying the dynamic programming algorithm (see reference Ellis 2007). This process involves the calculation of a recursive cumulative score function and backtrace signal. The cumulative score indicates the likelihood of a beat existing at each sample of the onset detection function input, and the backtrace gives the location of the best previous beat given this point in time. Once the cumulative score and backtrace have been calculated for the whole input signal, the best path through beat locations is found by recursively sampling the backtrace signal from the end of the input signal back to the beginning. See reference Stark et al. 2009 for a description of the real-time implementation of the beat tracking algorithm.
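The cumulative-score and backtrace recursion can be sketched after the formulation in Ellis 2007. Each ODF sample's score is its onset strength plus the best predecessor score. That predecessor score is penalised by the squared log-ratio of the inter-beat interval to the beat period. For brevity this sketch assumes a single constant beat period in ODF samples, whereas the plugin uses the tempo contour from the Viterbi stage. The `alpha` weighting value and the function name are illustrative assumptions.

```python
import math


def dp_beat_track(odf, period, alpha=100.0):
    """Return beat positions (ODF sample indices) by dynamic programming.

    odf: onset detection function samples; period: expected beat period in
    samples; alpha: weight of tempo consistency against onset strength.
    """
    n = len(odf)
    score = list(odf)      # cumulative score
    backlink = [-1] * n    # best previous beat for each sample (-1 = none)
    far, near = round(2 * period), max(1, round(period / 2))
    for t in range(n):
        best, best_prev = None, -1
        # Consider predecessors between two periods and half a period back.
        for prev in range(max(0, t - far), t - near + 1):
            penalty = -alpha * math.log((t - prev) / period) ** 2
            candidate = score[prev] + penalty
            if best is None or candidate > best:
                best, best_prev = candidate, prev
        if best is not None:
            score[t] = odf[t] + best
            backlink[t] = best_prev
    # Start from the best score within the final beat period, then follow
    # the backtrace towards the beginning of the signal.
    start = max(range(max(0, n - round(period)), n), key=lambda i: score[i])
    beats = [start]
    while backlink[beats[-1]] >= 0:
        beats.append(backlink[beats[-1]])
    return beats[::-1]
```

For example, an ODF with an impulse every 4 samples and `period=4` yields beats at those impulses.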
Once the beat locations have been identified, the plugin makes a second pass over the input audio signal, partitioning it into beat-synchronous frames. The audio within each beat frame is down-sampled to a new sampling frequency of 2.8 kHz. A beat-synchronous spectral representation is then calculated within each frame, from which a measure of beat spectral difference is calculated using Jensen-Shannon divergence. The bar boundaries are identified as those beat transitions leading to the most consistent spectral change given the specified number of beats per bar.
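The next sketch makes the bar-finding step concrete. It computes the Jensen-Shannon divergence between consecutive beat-synchronous spectra. For a fixed number of beats per bar, it then chooses the bar phase whose transitions carry the largest average spectral change. It assumes the beat spectra are already computed as lists of non-negative values per beat. Choosing the phase by mean divergence is a simplified stand-in for the plugin's consistency criterion, and the function names are hypothetical.

```python
import math


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two non-negative
    spectra with non-zero sums; each is first normalised to sum to 1."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def bar_phase(beat_spectra, beats_per_bar):
    """Return the beat offset (0..beats_per_bar-1) whose transitions show the
    largest mean spectral change; beats at that offset are taken as bar starts."""
    # changes[i] is the divergence on the transition into beat i + 1.
    changes = [jensen_shannon(a, b) for a, b in zip(beat_spectra, beat_spectra[1:])]

    def mean_change(offset):
        vals = [changes[i - 1] for i in range(offset, len(beat_spectra), beats_per_bar)
                if i >= 1]
        return sum(vals) / len(vals) if vals else 0.0

    return max(range(beats_per_bar), key=mean_change)
```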
Beats per Bar – The number of beats per bar (or measure). The plugin assumes that the number of beats per bar is fixed throughout the music.
Beats – The estimated beat locations, returned as a single feature, with timestamp but no value, for each beat, labelled with the number of that beat within the bar (e.g. consecutively 1, 2, 3, 4 for 4 beats to the bar).
Bars – The estimated bar line locations, returned as a single feature, with timestamp but no value, for each bar.
Beat Count – The estimated beat locations, returned as a single feature, with timestamp and a value corresponding to the number of that beat within the bar. This is similar to the Beats output except that it returns a counting function rather than a series of instants.
Beat Spectral Difference – The new-bar likelihood function used in bar line estimation.
Beat tracking method: A. M. Stark, M. E. P. Davies and M. D. Plumbley. Real-time beat-synchronous analysis of musical audio. In Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), 2009;
M. E. P. Davies and M. D. Plumbley. Context-dependent beat tracking of musical audio. In IEEE Transactions on Audio, Speech and Language Processing. Vol. 15, No. 3, pp. 1009-1020, 2007;
D. P. W. Ellis. Beat Tracking by Dynamic Programming. In Journal of New Music Research. Vol. 36, No. 1, pp. 51-60, 2007.
Bar finding method: M. E. P. Davies and M. D. Plumbley. A spectral difference approach to extracting downbeats in musical audio. In Proceedings of the 14th European Signal Processing Conference (EUSIPCO), Italy, 2006.
The Bar and Beat Tracker Vamp plugin was written by Matthew Davies and Adam Stark.

System identifier – vamp:qm-vamp-plugins:qm-keydetector
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-keydetector
Key Detector analyses a single channel of audio and continuously estimates the key of the music by comparing the degree to which a block-by-block chromagram correlates to the stored key profiles for each major and minor key.
The key profiles are drawn from analysis of Book I of the Well Tempered Klavier by J S Bach, recorded at A=440 equal temperament.
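The correlation step can be sketched as follows. A 12-bin chroma vector is compared against major and minor profiles rotated to each of the 12 possible tonics, and the best-correlated key wins. The profiles in the test example are made-up toy values for illustration only, because the plugin's profiles derived from the Well Tempered Klavier are not reproduced here. The use of Pearson correlation and the function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def estimate_key(chroma, major_profile, minor_profile):
    """Return (tonic 0-11 with C = 0, mode 0 = major / 1 = minor, correlation).

    Both profiles are 12-bin templates for a key whose tonic is C; they are
    rotated to every other tonic before correlating with the chroma vector.
    """
    best = None
    for mode, profile in ((0, major_profile), (1, minor_profile)):
        for tonic in range(12):
            rotated = [profile[(i - tonic) % 12] for i in range(12)]
            r = pearson(chroma, rotated)
            if best is None or r > best[2]:
                best = (tonic, mode, r)
    return best
```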
Tuning Frequency – The frequency of concert A in the music under analysis.
Window Length – The number of chroma analysis frames taken into account for key estimation. This controls how eager the key detector will be to return short-duration tonal changes as new key changes (the shorter the window, the more likely it is to detect a new key change).
Tonic Pitch – The tonic pitch of each estimated key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 12 where C is 1, C# or Db is 2, and so on up to B which is 12.
Key Mode – The major or minor mode of the estimated key, where major is 0 and minor is 1.
Key – The estimated key for each key change, returned as a single-valued feature at the point where the key change is detected, with value counted from 1 to 24 where 1-12 are the major keys and 13-24 are the minor keys, such that C major is 1, C# major is 2, and so on up to B major which is 12; then C minor is 13, Db minor is 14, and so on up to B minor which is 24.
Key Strength Plot – A grid representing the ongoing key "probability" throughout the music. This is returned as a feature for each chroma frame, containing 25 bins. Bins 1-12 are the major keys from C upwards; bins 14-25 are the minor keys from C upwards. The 13th bin is unused: it just provides space between the first and second halves of the feature if displayed in a single plot.
The outputs are also labelled with pitch or key as text.
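For hosts that post-process these outputs, the numbering scheme above can be captured in a few helper functions. This is a minimal sketch derived only from the numbering described here. The function names and the note spellings in `PITCH_NAMES` are illustrative and may not match the plugin's own text labels.

```python
PITCH_NAMES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_value(tonic_pitch, key_mode):
    """Combine Tonic Pitch (1-12, C = 1) and Key Mode (0 major, 1 minor)
    into a Key value (1-24)."""
    return tonic_pitch + 12 * key_mode


def split_key_value(key):
    """Inverse of key_value: Key (1-24) -> (tonic pitch 1-12, key mode 0/1)."""
    return (key - 1) % 12 + 1, (key - 1) // 12


def key_name(key):
    """Human-readable name for a Key value, e.g. 13 -> 'C minor'."""
    tonic, mode = split_key_value(key)
    return f"{PITCH_NAMES[tonic - 1]} {'minor' if mode else 'major'}"


def strength_plot_bin(key):
    """1-based Key Strength Plot bin for a Key value; bin 13 is the unused gap."""
    return key if key <= 12 else key + 1
```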
Method: see K. Noland and M. Sandler. Signal Processing Parameters for Tonality Estimation. In Proceedings of Audio Engineering Society 122nd Convention, Vienna, 2007.
The Key Detector Vamp plugin was written by Katy Noland and Christian Landone.

System identifier – vamp:qm-vamp-plugins:qm-tonalchange
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-tonalchange
Tonal Change analyses a single channel of audio, detecting harmonic changes such as chord boundaries.
Gaussian smoothing – The window length for the internal smoothing operation, in chroma analysis frames. This controls how eager the tonal change detector will be to identify very short-term tonal changes. The default value of 5 is quite short, and may lead to more (not always meaningful) results being returned; for many purposes a larger value, closer to the maximum of 20, may be appropriate.
Chromagram minimum pitch – The MIDI pitch value (0-127) of the minimum pitch included in the internal chromagram analysis.
Chromagram maximum pitch – The MIDI pitch value (0-127) of the maximum pitch included in the internal chromagram analysis.
Chromagram tuning frequency – The frequency of concert A in the music under analysis.
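The sketch below relates the pitch-range and tuning parameters to frequencies using the standard equal-tempered MIDI convention, in which pitch 69 is concert A. It assumes the plugin follows this convention. The function names are hypothetical.

```python
import math


def midi_to_hz(pitch, tuning=440.0):
    """Equal-tempered frequency of a MIDI pitch, with concert A (MIDI 69) at `tuning` Hz."""
    return tuning * 2 ** ((pitch - 69) / 12)


def hz_to_midi(freq, tuning=440.0):
    """Inverse mapping; returns a fractional MIDI pitch."""
    return 69 + 12 * math.log2(freq / tuning)
```

For example, at A=440 a minimum pitch of 36 corresponds to roughly 65.4 Hz.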
Transform to 6D Tonal Content Space – A representation of the musical content in a six-dimensional tonal space onto which the algorithm maps 12-bin chroma vectors extracted from the audio.
Tonal Change Detection Function – A function representing the estimated likelihood of a tonal change occurring in each spectral frame.
Tonal Change Positions – The resulting estimated positions of tonal changes.
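The tonal-space transform and detection function can be sketched after Harte et al. 2006, in three steps:

1. Each chroma vector is projected onto three circles (fifths, minor thirds, major thirds) to give a 6-D centroid.
2. The centroids are smoothed over time with a Gaussian.
3. The change at frame n is the Euclidean distance between the centroids at frames n-1 and n+1.

Mapping the Gaussian smoothing parameter onto `sigma` is an assumption, as are the function names. The plugin's exact code may differ in these and other details.

```python
import math

# Radii and angles of the circles of fifths, minor thirds and major thirds.
_RADII = (1.0, 1.0, 0.5)
_ANGLES = (7 * math.pi / 6, 3 * math.pi / 2, 2 * math.pi / 3)


def tonal_centroid(chroma):
    """Map a 12-bin chroma vector to the 6-D tonal space (L1-normalised)."""
    total = sum(abs(c) for c in chroma)
    if total == 0:
        return [0.0] * 6
    centroid = []
    for r, a in zip(_RADII, _ANGLES):
        centroid.append(r * sum(c * math.sin(l * a) for l, c in enumerate(chroma)) / total)
        centroid.append(r * sum(c * math.cos(l * a) for l, c in enumerate(chroma)) / total)
    return centroid


def gaussian_smooth(vectors, sigma):
    """Smooth each dimension over time with a truncated, normalised Gaussian."""
    half = max(1, int(3 * sigma))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for t in range(len(vectors)):
        acc = [0.0] * len(vectors[t])
        total = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= t + k < len(vectors):
                acc = [a + w * v for a, v in zip(acc, vectors[t + k])]
                total += w
        smoothed.append([a / total for a in acc])
    return smoothed


def hcdf(centroids):
    """Harmonic change: distance between the centroids either side of each frame."""
    out = [0.0] * len(centroids)
    for n in range(1, len(centroids) - 1):
        out[n] = math.dist(centroids[n - 1], centroids[n + 1])
    return out
```

Peaks in the `hcdf` output would then be candidates for Tonal Change Positions.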
Method: C. A. Harte, M. Gasser, and M. Sandler. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.
The Tonal Change Vamp plugin was written by Chris Harte and Martin Gasser.

System identifier – vamp:qm-vamp-plugins:qm-adaptivespectrogram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-adaptivespectrogram
Adaptive Spectrogram produces a composite spectrogram from a series of short-time Fourier transforms at differing resolutions. Values are selected from these spectrograms by repeated subdivision by time and frequency in order to maximise an entropy function across each column.
Number of resolutions – The number of distinct resolutions to calculate and use. The resolutions will be consecutive powers of two starting from the smallest resolution specified.
Smallest resolution – The smallest of the set of resolutions to use.
Omit alternate resolutions – Causes the plugin to ignore alternate resolutions (i.e. the smallest resolution multiplied by 2, 8, 32, etc.) when composing a spectrogram. The smallest resolution specified, and its multiples by 4, 16, etc. as applicable, will be retained. The total number of resolutions actually included in the resulting spectrogram will therefore be N/2 (for even N) or (N+1)/2 (for odd N), where N is the value of the "number of resolutions" parameter. This permits a wider range of resolutions to be included with less processing, at an obvious cost in quality.
Multi-threaded processing – Enables multi-threading of the spectrogram calculation. This usually results in somewhat faster processing where multiple CPU cores are available.
As an example of the resolution parameters, if "number of resolutions" is set to 5, "smallest resolution" to 128, and "omit alternate resolutions" is not used, the composite spectrogram will be calculated using spectrograms from 128, 256, 512, 1024, and 2048 point short-time Fourier transforms (with 50% overlap in each case). With "omit alternate resolutions" set, the same parameters would result in spectrograms from 128, 512, and 2048 point STFTs being used.
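The resolution arithmetic in this example can be checked with a short sketch. The function name is hypothetical. With 50% overlap, the hop for each transform would be half its size, `n // 2` samples.

```python
def stft_resolutions(count, smallest, omit_alternate=False):
    """FFT sizes combined by the composite spectrogram: `count` consecutive
    powers of two starting at `smallest`, optionally keeping every other one."""
    sizes = [smallest * 2 ** i for i in range(count)]
    if omit_alternate:
        # Keep smallest, smallest * 4, smallest * 16, ...
        sizes = sizes[::2]
    return sizes
```

Taking every other size keeps ceil(N/2) entries. This matches the N/2 (even N) and (N+1)/2 (odd N) counts given above.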
Method: X. Wen and M. Sandler. Composite spectrogram using multiple Fourier transforms. IET Signal Processing, 3(1):51-63, 2009.
The Adaptive Spectrogram Vamp plugin was written by Wen Xue and Chris Cannam.

System identifier – vamp:qm-vamp-plugins:qm-transcription
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-transcription
The Polyphonic Transcription plugin estimates a note transcription from its input audio, returning a feature for each note (with timestamp and duration) whose value is the MIDI pitch number of the note. Velocity is not estimated.
Although the published method is described as real-time, the implementation used in this plugin is non-causal: it buffers its input and operates on it as a single unit, doing all the real work after its entire input has been received, and is very memory intensive. However, it is relatively fast (faster than real-time) compared to other polyphonic transcription methods.
The plugin works best at a 44.1 kHz input sample rate, and is tuned for piano and guitar music.
Method: R. Zhou and J. D. Reiss. A Real-Time Polyphonic Music Transcription System. In Proceedings of the Fourth Music Information Retrieval Evaluation eXchange (MIREX), Philadelphia, USA, 2008;
R. Zhou and J. D. Reiss. A Real-Time Frame-Based Multiple Pitch Estimation Method Using the Resonator Time Frequency Image. Third Music Information Retrieval Evaluation eXchange (MIREX), Vienna, Austria, 2007.
The Polyphonic Transcription Vamp plugin was written by Ruohua Zhou.

System identifier – vamp:qm-vamp-plugins:qm-segmenter
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-segmenter
Segmenter divides a single channel of music up into structurally consistent segments. It returns a numeric value (the segment type) for each moment at which a new segment starts.
For music with clearly tonally distinguishable sections such as verse, chorus, etc., segments with the same type may be expected to be similar to one another in some structural sense. For example, repetitions of the chorus are likely to share a segment type.
The plugin only attempts to identify similar segments; it does not attempt to label them. For example, it makes no attempt to tell you which segment is the chorus.
Note that this plugin does a substantial amount of processing after receiving all of the input audio data, before it produces any results.
The method relies upon structural/timbral similarity to obtain the high-level song structure. This is based on the assumption that the distributions of timbre features are similar over corresponding structural elements of the music.
The algorithm works by obtaining a frequency-domain representation of the audio signal using a Constant-Q transform, a Chromagram or Mel-Frequency Cepstral Coefficients (MFCC) as underlying features (the particular feature is selectable as a parameter). The extracted features are normalised in accordance with the MPEG-7 standard (NASE descriptor), which means the spectrum is converted to decibel scale and each spectral vector is normalised by the RMS energy envelope. The value of this envelope is stored for each processing block of audio. This is followed by the extraction of 20 principal components per block using PCA, yielding a sequence of 21-dimensional feature vectors where the last element in each vector corresponds to the energy envelope.
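The normalisation step can be sketched as follows. The sketch assumes power spectra as input and takes the envelope to be the root-sum-square of the decibel values (the MPEG-7 style norm; an RMS would differ only by a constant factor). Each frame is divided by this envelope, and the envelope is kept as an extra final value. The PCA reduction to 20 components is omitted. The `floor` used to avoid the logarithm of zero and the function name are illustrative assumptions.

```python
import math


def nase_frames(power_spectra, floor=1e-10):
    """NASE-style normalisation of spectral frames.

    Each frame (non-negative powers) is converted to decibels, divided by its
    root-sum-square norm, and the norm is appended as an extra final element,
    so an N-bin frame becomes N + 1 values.
    """
    out = []
    for frame in power_spectra:
        db = [10 * math.log10(max(p, floor)) for p in frame]
        norm = math.sqrt(sum(v * v for v in db))
        normalised = [v / norm for v in db] if norm > 0 else list(db)
        out.append(normalised + [norm])
    return out
```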
A 40-state Hidden Markov Model is then trained on the whole sequence of features, with each state of the HMM corresponding to a specific timbre type. This process partitions the timbre-space of a given track into 40 possible types. The important assumption of the model is that the distribution of these features remains consistent over a structural segment. After training and decoding the HMM, the song is assigned a sequence of timbre-features according to specific timbre-type distributions for each possible structural segment.
The segmentation itself is computed by clustering timbre-type histograms. A series of histograms is created over a sliding window, and these are grouped into M clusters by an adapted soft k-means algorithm. Each of these clusters will correspond to a specific segment-type of the analysed song. Reference histograms, iteratively updated during clustering, describe the timbre distribution for each segment. The segmentation arises from the final cluster assignments.
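The histogram-and-clustering idea can be sketched as below, assuming the HMM has already produced one timbre-type label per frame. For simplicity the sketch substitutes plain hard k-means for the plugin's adapted soft k-means with constraints, so it will not reproduce the plugin's results. The window length, iteration count and function names are hypothetical.

```python
def state_histograms(states, n_states, window):
    """Normalised histogram of HMM state labels in a window centred on each frame."""
    half = window // 2
    hists = []
    for t in range(len(states)):
        seg = states[max(0, t - half):t + half + 1]
        h = [0.0] * n_states
        for s in seg:
            h[s] += 1.0
        hists.append([c / len(seg) for c in h])
    return hists


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(hists, k, iterations=10):
    """Plain hard k-means over histograms (not the plugin's constrained soft variant)."""
    refs = [hists[i * len(hists) // k] for i in range(k)]  # spread-out initial references
    labels = [0] * len(hists)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: _sq_dist(h, refs[j])) for h in hists]
        for j in range(k):
            members = [h for h, lab in zip(hists, labels) if lab == j]
            if members:
                # Update each reference histogram to the mean of its members.
                refs[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def segment_starts(labels):
    """(frame index, segment type) at each point where the label changes."""
    return [(t, lab) for t, lab in enumerate(labels) if t == 0 or lab != labels[t - 1]]
```

The output of `segment_starts` has the same shape as the Segmentation output described below: one segment-type value at each boundary.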
Number of segment-types – The maximum number of clusters (segment-types) to be returned. The default is 10. Unlike many clustering algorithms, the constrained clustering used in this plugin does not produce too many clusters or vary significantly even if this is set too high. However, this parameter can be useful for limiting the number of expected segment-types.
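Feature Type – The type of spectral feature used for segmentation. The available features are based on the Constant-Q transform, the Chromagram, and Mel-Frequency Cepstral Coefficients (MFCC), as described above.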
Minimum segment duration – The approximate expected minimum duration for a segment, from 1 to 15 seconds. Changing this parameter may help the plugin to find musical sections rather than just following changes in the sound of the music, and also avoid wasting a segment-type cluster for timbrally distinct but too-short segments. The default of 4 seconds usually produces good results.
Segmentation – The estimated segment boundaries, returned as a single feature with one value at each segment boundary, with the value representing the segment type number for the segment starting at that boundary.
Method: M. Levy and M. Sandler. Structural segmentation of musical audio by constrained clustering. IEEE Transactions on Audio, Speech, and Language Processing, February 2008.
Note that this plugin does not implement the beat-synchronous aspect of the segmentation method described in the paper.
The Segmenter Vamp plugin was written by Mark Levy. Thanks to George Fazekas for providing much of this documentation.

System identifier – vamp:qm-vamp-plugins:qm-similarity
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-similarity
Similarity treats each channel of its audio input as a separate "track", and estimates how similar the tracks are to one another using a selectable similarity measure.
The plugin also returns the intermediate data used as a basis of the similarity measure; it can therefore be used on a single channel of input (with the resulting intermediate data then being applied in some other similarity or clustering algorithm, for example) if desired, as well as with multiple inputs.
Because of the way this plugin handles multiple inputs, by assuming that each channel represents a separate piece of music, it may not be appropriate for use directly in a general-purpose host (unless you actually want to do something like compare two stereo channels for timbral similarity, which is unlikely).
Feature Type – The underlying audio feature used for the similarity measure. The available features are:
Distance Matrix – A matrix of the distance measures between input channels, returned as a series of vector features timestamped at one-second intervals. The distance from channel i to channel j appears as the j'th bin of the feature at time i.
Distance from First Channel – A single vector feature, timestamped at time zero, containing the distances between the first input channel and each of the input channels (including the first channel itself at bin 0, which should have zero distance).
Ordered Distances from First Channel – A pair of vector features, at times 0 and 1 second. The feature at time 0 contains the 1-based indices of the input channels in the order of similarity to the first input channel (so its first bin should always contain 1, as the first channel is most similar to itself). The feature at time 1 contains, in bin n, the distance between the first input channel and the channel with index found at bin n of the feature at time 0.
Feature Means – A series of vector features containing the mean values of each of the feature bins across the duration of each of the input channels. This output returns one feature for each input channel, timestamped at one-second intervals. The number of bins for each feature depends on the feature type; it will be 20 for MFCC features and 12 for chroma features. No features will be returned on this output if the feature type is purely rhythmic.
Feature Variances – As Feature Means, but containing variances.
Beat Spectra – A series of vector features containing the rhythmic autocorrelation profiles (beat spectra) for each of the input channels. This output returns one 512-bin feature for each input channel, timestamped at one-second intervals. No features will be returned on this output if the feature type contains no rhythm component.
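A host that receives the Distance Matrix can reproduce the ordering convention of the "Ordered Distances from First Channel" output with a few lines. This sketch works from a distance matrix already read from the plugin, with rows indexed by channel as in the Distance Matrix output. It also computes per-bin means and variances of the kind returned by the Feature Means and Feature Variances outputs. The function names, and the use of the population variance, are assumptions.

```python
def ordered_distances_from_first(matrix):
    """Given a square distance matrix (row i = distances from channel i),
    return the 1-based channel indices sorted by distance from the first
    channel, and those distances in the same order."""
    row = matrix[0]
    order = sorted(range(len(row)), key=lambda j: row[j])
    return [j + 1 for j in order], [row[j] for j in order]


def feature_means_and_variances(frames):
    """Per-bin mean and (population) variance across one channel's feature frames."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    variances = [sum((x - m) ** 2 for x in col) / n
                 for col, m in zip(zip(*frames), means)]
    return means, variances
```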
Timbral similarity: M. Levy and M. Sandler. Lightweight measures for timbral similarity of musical audio. In Proceedings of the 1st ACM workshop on Audio and Music Computing Multimedia, Santa Barbara, 2006.
Combined rhythmic and timbral similarity: K. Jacobson. A Multifaceted Approach to Music Similarity. In Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.
The Similarity Vamp plugin was written by Mark Levy, Kurt Jacobson and Chris Cannam.

System identifier – vamp:qm-vamp-plugins:qm-dwt
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-dwt
The Discrete Wavelet Transform plugin performs the forward DWT on the signal. The wavelet coefficients are derived from a fast segmented DWT algorithm without block end effects. The transform can use any of a selection of wavelet functions and can be computed at up to 16 scales.
The wavelet coefficients are returned as feature columns at a rate of half the sample rate of the signal being analysed. To simulate multiresolution in the layer data table, each coefficient value at a higher scale is repeated according to the scale number: at scale s it appears 2^(s-1) times. For example, at scale 2 each value appears twice, at scale 3 four times and at scale 4 eight times, simulating the lower time resolution at higher scales.
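To make the column layout concrete, here is a sketch using the Haar wavelet (Daubechies order 1), chosen only because it is the simplest case. It is a plain multi-level DWT, not the segmented algorithm the plugin uses, and both function names are hypothetical; the point is the repetition of scale-s coefficients across 2^(s-1) columns.

```python
import math


def haar_dwt(signal, scales):
    """Plain multi-level Haar DWT; returns detail coefficients per scale,
    scale 1 (finest) first. Not the plugin's segmented algorithm."""
    if len(signal) % (2 ** scales) != 0:
        raise ValueError("signal length must be a multiple of 2**scales")
    approx = [float(x) for x in signal]
    details = []
    for _ in range(scales):
        pairs = range(0, len(approx), 2)
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
    return details


def multiresolution_columns(details):
    """One column per scale-1 coefficient (i.e. at half the sample rate);
    a scale-s coefficient is repeated in 2**(s - 1) consecutive columns."""
    n_columns = len(details[0])
    return [
        [coeffs[col // 2 ** (s - 1)] for s, coeffs in enumerate(details, start=1)]
        for col in range(n_columns)
    ]
```

For an 8-sample input and 3 scales this yields 4 columns: the scale-2 values each fill two columns and the single scale-3 value fills all four.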
Scales – Adjusts the number of scales of the DWT. The processing block size needs to be set to at least 2^n, where n is the number of scales.
Wavelet – Selects the wavelet function to be used for the transform. Wavelets from the following families are available: Daubechies, Symlets, Coiflets, Biorthogonal, Meyer.
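The block-size constraint on the Scales parameter is simple arithmetic, sketched below. The function names are hypothetical, and the lower bound of one scale is an assumption; the upper bound of 16 comes from the description above.

```python
def minimum_block_size(scales):
    """Smallest processing block size for an n-scale DWT: 2**n."""
    if not 1 <= scales <= 16:  # assumption: valid range is 1..16 scales
        raise ValueError("scales must be between 1 and 16")
    return 2 ** scales


def max_scales_for_block(block_size, limit=16):
    """Largest n with 2**n <= block_size, capped at 16 scales."""
    if block_size < 2:
        raise ValueError("block size must be at least 2")
    # for a positive int, bit_length() - 1 == floor(log2(block_size))
    return min(block_size.bit_length() - 1, limit)
```

So a 1024-sample block allows up to 10 scales, while a 1000-sample block allows only 9.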
Principles: S. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (1989), pp. 674-693;
P. Rajmic and J. Vlach. Real-Time Audio Processing via Segmented Wavelet Transform. In Proceedings of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.
The Discrete Wavelet Transform plugin was written by Thomas Wilmering.
System identifier – vamp:qm-vamp-plugins:qm-constantq
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-constantq
Constant-Q Spectrogram calculates a spectrogram based on a short-time windowed constant-Q spectral transform. This is a spectrogram in which the ratio of centre frequency to resolution is constant for each frequency bin. The frequency bins correspond to the frequencies of "musical notes" rather than being linearly spaced in frequency as they are for the conventional DFT spectrogram.
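The geometric bin spacing and constant Q described above follow from Brown's formulation: with B bins per octave, bin k has centre frequency f_k = f_min · 2^(k/B), and the ratio of each centre frequency to the spacing of adjacent bins is Q = 1 / (2^(1/B) - 1) for every bin. The sketch below computes both. The function names are hypothetical, and treating the upper frequency limit as exclusive is an assumption.

```python
import math


def cq_bin_frequencies(f_min, f_max, bins_per_octave):
    """Geometrically spaced centre frequencies f_k = f_min * 2**(k / B),
    for k = 0 .. ceil(B * log2(f_max / f_min)) - 1 (f_max treated as exclusive)."""
    n_bins = int(math.ceil(bins_per_octave * math.log2(f_max / f_min)))
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


def cq_quality_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k) = 1 / (2**(1/B) - 1), the same for every bin."""
    return 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
```

With 12 bins per octave, Q is about 16.8, so each bin's resolution is roughly 1/17 of its centre frequency, which is what makes the bins line up with semitones.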
The pitch range and the number of frequency bins per octave may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes are defined by these parameters, and the plugin will not accept any block size other than its preferred value.
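One way to see why the block size depends on these parameters: in Brown's formulation the analysis window for bin k has length N_k = Q · fs / f_k, so the lowest bin needs the longest window. The sketch below assumes the block must hold that longest window, rounded up to a power of two for the FFT. This rule and the function name `cq_block_size` are assumptions for illustration, not the plugin's documented behaviour.

```python
import math


def cq_block_size(sample_rate, f_min, bins_per_octave):
    """Hypothetical rule: smallest power of two holding the longest
    (lowest-bin) constant-Q window, N = Q * fs / f_min."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)
    longest_window = math.ceil(q * sample_rate / f_min)
    # (n - 1).bit_length() gives the exponent of the next power of two >= n
    return 1 << (longest_window - 1).bit_length()
```

Under this rule, lowering the minimum pitch or raising the bins per octave lengthens the longest window, which is why both parameters change the preferred block size.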
Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform.
Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform.
Tuning Frequency – The frequency of concert A in the music under analysis.
Bins per Octave – The number of constant-Q transform bins to be computed per octave.
Normalized – Whether to normalize each output column to unit maximum.
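The pitch and tuning parameters combine through the standard equal-temperament mapping from MIDI pitch to frequency, with concert A at MIDI pitch 69: f = tuning · 2^((p - 69)/12). A minimal sketch, assuming the plugin uses this standard mapping; the function names are hypothetical.

```python
def midi_to_frequency(pitch, tuning=440.0):
    """Frequency in Hz of MIDI pitch `pitch`, with concert A (MIDI 69) at `tuning` Hz."""
    if not 0 <= pitch <= 127:
        raise ValueError("MIDI pitch must be in 0-127")
    return tuning * 2 ** ((pitch - 69) / 12)


def cq_frequency_range(min_pitch, max_pitch, tuning=440.0):
    """Lowest and highest frequencies implied by the pitch-range parameters."""
    return midi_to_frequency(min_pitch, tuning), midi_to_frequency(max_pitch, tuning)
```

For example, with the tuning frequency set to 415 Hz every bin shifts down by the same ratio, 415/440, so the bins stay aligned with the notes of music tuned to that pitch.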
Constant-Q Spectrogram – The calculated spectrogram, as a single feature per process block containing one bin for each pitch included in the spectrogram's range.
Principle: J. Brown. Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89(1): 425-434, 1991.
The Constant-Q Spectrogram Vamp plugin was written by Christian Landone.
System identifier – vamp:qm-vamp-plugins:qm-chromagram
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-chromagram
Chromagram calculates a constant-Q spectral transform (as in the Constant-Q Spectrogram plugin) and then wraps the frequency bin values into a single octave, with each bin containing the sum of the magnitudes from the corresponding bin in all octaves. The number of values in each feature vector returned by the plugin is therefore the same as the number of bins per octave configured for the underlying constant-Q transform.
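The octave wrapping amounts to summing constant-Q magnitudes into chroma bin k mod B. The sketch below assumes that constant-Q bin 0 (the lowest pitch) maps to chroma bin 0; the function name is hypothetical.

```python
def fold_to_chroma(cq_values, bins_per_octave):
    """Sum constant-Q magnitudes into one octave: chroma[k % B] += |X[k]|.
    Assumes CQ bin 0 corresponds to chroma bin 0."""
    chroma = [0.0] * bins_per_octave
    for k, value in enumerate(cq_values):
        # abs() gives the magnitude for either real or complex CQ values
        chroma[k % bins_per_octave] += abs(value)
    return chroma
```

With two octaves of three bins each, `[1, 2, 3, 4, 5, 6]` folds to `[5.0, 7.0, 9.0]`: the output length depends only on the bins per octave, not on the pitch range.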
The pitch range and the number of frequency bins per octave for the transform may be adjusted using the plugin's parameters. Note that the plugin's preferred step and block sizes depend on these parameters, and the plugin will not accept any block size other than its preferred value.
Minimum Pitch – The MIDI pitch value (0-127) corresponding to the lowest frequency to be included in the constant-Q transform used in calculating the chromagram.
Maximum Pitch – The MIDI pitch value (0-127) corresponding to the highest frequency to be included in the constant-Q transform used in calculating the chromagram.
Tuning Frequency – The frequency of concert A in the music under analysis.
Bins per Octave – The number of constant-Q transform bins to be computed per octave, and thus the total number of bins present in the resulting chromagram.
Normalized – Whether to normalize each output column. Normalization may be to unit sum or unit maximum.
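The two normalization options can be sketched as follows. The mode strings `"max"` and `"sum"` and the function name are hypothetical labels for illustration (the plugin exposes these choices through its parameter values), and leaving an all-zero column unchanged is an assumption.

```python
def normalize_column(column, mode):
    """Scale a non-negative chroma column to unit maximum ("max") or unit sum ("sum").
    An all-zero column is returned unchanged to avoid dividing by zero."""
    if mode == "max":
        scale = max(column)
    elif mode == "sum":
        scale = sum(column)
    else:
        raise ValueError("mode must be 'max' or 'sum'")
    if scale == 0:
        return list(column)
    return [value / scale for value in column]
```

Unit-maximum scaling keeps the strongest pitch class at 1, whereas unit-sum scaling makes each column resemble a distribution over pitch classes.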
Chromagram – The calculated chromagram, as a single feature per process block containing the number of bins given in the Bins per Octave parameter.
The Chromagram Vamp plugin was written by Christian Landone.
System identifier – vamp:qm-vamp-plugins:qm-mfcc
RDF URI – http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-mfcc
Mel-Frequency Cepstral Coefficients calculates MFCCs from a single channel of audio. These coefficients, derived from a cosine transform of the mapping of an audio spectrum onto a frequency scale modelled on human auditory response, are widely used in speech recognition, music classification and other tasks.
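The mapping of a spectrum onto the Mel scale can be sketched with triangular filters whose edges are equally spaced in mel. This uses the common HTK-style formula mel = 2595 · log10(1 + f/700), which is an assumption: the plugin's filterbank may use a different Mel approximation, band count or frequency range. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; the plugin may use another mapping)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_low, f_high):
    """n_bands + 2 edge frequencies equally spaced in mel; triangular band i
    rises from edges[i] to a peak at edges[i + 1] and falls to edges[i + 2]."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_bands + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 2)]


def mel_band_energies(magnitudes, sample_rate, edges):
    """Weight a one-sided magnitude spectrum (bins 0..N/2) by each triangle."""
    n_fft = 2 * (len(magnitudes) - 1)
    energies = []
    for lo, centre, hi in zip(edges, edges[1:], edges[2:]):
        total = 0.0
        for k, mag in enumerate(magnitudes):
            f = k * sample_rate / n_fft  # centre frequency of FFT bin k
            if lo < f <= centre:
                total += mag * (f - lo) / (centre - lo)
            elif centre < f < hi:
                total += mag * (hi - f) / (hi - centre)
        energies.append(total)
    return energies
```

Because the edges are equally spaced in mel, the bands widen in Hz as frequency rises, giving fine resolution at low frequencies and coarse resolution at high ones.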
Number of Coefficients – The number of MFCCs to return. Commonly used values include 13 or the default 20. This number includes C0 if requested (see Include C0 below).
Power for Mel Amplitude Logs – An optional power value to which the spectral amplitudes should be raised before applying the cosine transform. Values greater than 1 may in principle reduce the contribution of noise to the results. The default is 1.
Include C0 – Whether to include the zeroth coefficient, which simply reflects the overall signal power across the Mel frequency bands.
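How the three parameters interact can be sketched as follows, starting from Mel band energies. The details here are assumptions, not the plugin's documented implementation: the power is applied before a base-10 logarithm, a small constant `eps` is added to avoid log(0), and an unnormalized DCT-II is used. The function name is hypothetical. Note that the count always equals Number of Coefficients; Include C0 only decides whether the list starts at C0 or C1.

```python
import math


def mfcc_from_mel_energies(mel_energies, n_coeffs=20, power=1.0,
                           include_c0=False, eps=1e-10):
    """Sketch: raise each mel amplitude to `power`, take log10 (with a small
    eps to avoid log(0)), apply an unnormalized DCT-II, and return n_coeffs
    values starting at C0 if include_c0 is True, otherwise at C1."""
    logs = [math.log10(e ** power + eps) for e in mel_energies]
    m = len(logs)
    first = 0 if include_c0 else 1
    return [
        # DCT-II: c_n = sum_j logs[j] * cos(pi * n * (j + 0.5) / m)
        sum(logs[j] * math.cos(math.pi * n * (j + 0.5) / m) for j in range(m))
        for n in range(first, first + n_coeffs)
    ]
```

With a flat set of band energies, every coefficient except C0 is zero, which illustrates why C0 carries only the overall level. Because `eps` is added after the power, the power is not just a constant scale factor on the logs: it changes how quiet bands compare with the floor.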
Coefficients – The MFCC values, returned as one vector feature per processing block.
Means of Coefficients – The overall means of the MFCC bins, as a single vector feature with time 0 that is returned when processing is complete.
MFCCs in music: See B. Logan. Mel-Frequency Cepstral Coefficients for Music Modeling. In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000.
The Mel-Frequency Cepstral Coefficients Vamp plugin was written by Nicolas Chetry and Chris Cannam.