Mercurial > hg > qm-vamp-plugins
changeset 51:64e4c8aec287
* docs
author | Chris Cannam <c.cannam@qmul.ac.uk> |
---|---|
date | Wed, 30 Jan 2008 12:42:04 +0000 |
parents | df7a0bc46592 |
children | 4fe04e706839 |
files | README.txt plugins/ChromagramPlugin.cpp plugins/ConstantQSpectrogram.cpp |
diffstat | 3 files changed, 95 insertions(+), 11 deletions(-) |
--- a/README.txt Fri Jan 25 17:56:49 2008 +0000
+++ b/README.txt Wed Jan 30 12:42:04 2008 +0000
@@ -202,15 +202,24 @@
 The Segmenter plugin divides a single channel of music up into
 structurally consistent segments. Its single output contains a
-numeric value (the segment classifier) for each moment at which a new
-segment starts. For music with clearly tonally distinguishable
-sections such as verse, chorus, etc., the segments with the same
-classifier number are expected to be similar to one another in some
-structural sense (e.g. all repetitions of the chorus).
+numeric value (the segment type) for each moment at which a new
+segment starts.
-Note that this plugin consumes a significant amount of processing time
-after receiving all of the input audio data, before it produces any
-result.
+For music with clearly tonally distinguishable sections such as verse,
+chorus, etc., the segments with the same type may be expected to be
+similar to one another in some structural sense (e.g. repetitions of
+the chorus).
+
+The type of feature used in segmentation can be selected using the
+Feature Type parameter. The default Hybrid (Constant-Q) is generally
+effective for modern studio recordings, while the Chromatic option may
+be preferable for live, acoustic, or older recordings, in which
+repeated sections may be less consistent in sound. Also available is
+a timbral (MFCC) feature, which is more likely to result in
+classification by instrumentation rather than musical content.
+
+Note that this plugin does a substantial amount of processing after
+receiving all of the input audio data, before it produces any results.
 Similarity
@@ -230,6 +239,49 @@
 In Proceedings of the Seventh International Conference on Music Information Retrieval (ISMIR), 2006.
+The Similarity plugin treats each channel of its audio input as a
+separate "track", and estimates how similar the tracks are to one
+another using a selectable similarity measure.
+
+The plugin also returns the intermediate data used as a basis of the
+similarity measure; it can therefore be used on a single channel of
+input (with the resulting intermediate data then being applied in some
+other similarity or clustering algorithm, for example) if desired, as
+well as with multiple inputs.
+
+The underlying audio features used for the similarity measure can be
+selected using the Feature Type parameter. The available features are
+Timbre (in which the distance between tracks is a symmetrised
+Kullback-Leibler divergence between Gaussian-modelled MFCC means and
+variances across each track); Chroma (KL divergence of mean chroma
+histogram); Rhythm (cosine distance between "beat spectrum" measures
+derived from a short sampled section of the track); and combined
+"Timbre and Rhythm" and "Chroma and Rhythm".
+
+The plugin has six outputs: a matrix of the distances between input
+channels; a vector containing the distances between the first input
+channel and each of the input channels; a pair of vectors containing
+the indices of the input channels in the order of their similarity to
+the first input channel, and the distances between the first input
+channel and each of those channels; the means of the underlying
+feature bins (MFCCs or chroma); the variances of the underlying
+feature bins; and the beat spectra used for the rhythmic feature.
+
+Because Vamp does not have the capability to return features in matrix
+form explicitly, the matrix output is returned as a series of vector
+features timestamped at one-second intervals. Likewise, the
+underlying feature outputs contain one vector feature per input
+channel, timestamped at one-second intervals (so the feature for the
+first channel is at time 0, and so on). Examining the features that
+the plugin actually returns, when run on some test data, may make this
+arrangement more clear.
+
+Note that the underlying feature values are only returned if the
+relevant feature type is selected. That is, the means and variances
+outputs are valid provided the pure rhythm feature is not selected;
+the beat spectra output is valid provided rhythm is included in the
+selected feature type.
+
 Constant-Q Spectrogram
 ----------------------
@@ -243,6 +295,18 @@
 Journal of the Acoustical Society of America, 89(1): 425-434, 1991.
+The Constant-Q Spectrogram plugin calculates a spectrogram based on a
+short-time windowed constant Q spectral transform. This is a
+spectrogram in which the ratio of centre frequency to resolution is
+constant for each frequency bin. The frequency bins correspond to the
+frequencies of "musical notes" rather than being linearly spaced in
+frequency as they are for the conventional DFT spectrogram.
+
+The pitch range and the number of frequency bins per octave may be
+adjusted using the plugin's parameters. Note that the plugin's
+preferred step and block sizes depend on these parameters, and the
+plugin will not accept any other block size.
+
 Chromagram
 ----------
@@ -251,7 +315,18 @@
 Authors: Christian Landone
 Category: Visualisation
-References:
+The Chromagram plugin calculates a constant Q spectral transform (as
+above) and then wraps the frequency bin values into a single octave,
+with each bin containing the sum of the magnitudes from the
+corresponding bin in all octaves. The number of values in each
+feature vector returned by the plugin is therefore the same as the
+number of bins per octave configured for the underlying constant Q
+transform.
+
+The pitch range and the number of frequency bins per octave for the
+transform may be adjusted using the plugin's parameters. Note that
+the plugin's preferred step and block sizes depend on these
+parameters, and the plugin will not accept any other block size.
 Mel-Frequency Cepstral Coefficients
@@ -266,4 +341,9 @@
 In Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000.
+The Mel-Frequency Cepstral Coefficients plugin calculates MFCCs from a
+single channel of audio, returning one MFCC vector from each process
+call. It also returns the overall means of the coefficient values
+across the length of the audio input, as a separate output at the end
+of processing.
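
As a reading aid for the Chromagram paragraph added to README.txt in this diff, the wrapping step might be sketched as follows. This is a minimal illustration only, assuming the constant-Q magnitudes are ordered from lowest to highest bin with a fixed number of bins per octave; `foldToChroma` and its parameters are hypothetical names and are not taken from the plugin source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the plugin source): fold a vector of
// constant-Q bin magnitudes into a single octave by summing the
// magnitudes of corresponding bins across all octaves.
std::vector<double> foldToChroma(const std::vector<double> &cqMagnitudes,
                                 int binsPerOctave)
{
    assert(binsPerOctave > 0);
    const std::size_t bpo = static_cast<std::size_t>(binsPerOctave);

    // One output value per bin within the octave.
    std::vector<double> chroma(bpo, 0.0);

    for (std::size_t i = 0; i < cqMagnitudes.size(); ++i) {
        // Bin i sits at position (i mod bpo) within its octave.
        chroma[i % bpo] += cqMagnitudes[i];
    }
    return chroma;
}
```

The output length always equals the configured bins per octave, which matches the README's statement about the size of each returned feature vector.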
--- a/plugins/ChromagramPlugin.cpp Fri Jan 25 17:56:49 2008 +0000
+++ b/plugins/ChromagramPlugin.cpp Wed Jan 30 12:42:04 2008 +0000
@@ -222,6 +222,8 @@
     m_step = m_chromagram->getHopSize();
     m_block = m_chromagram->getFrameSize();
+    //!!! stepSize != m_step should not be an error
+
     if (stepSize != m_step || blockSize != m_block) {
         delete m_chromagram;
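
The `//!!!` note added in this hunk says that a mismatched step size should not by itself be treated as an error. One way such a relaxed check might look is sketched below. `sizesAcceptable` is a hypothetical helper that does not appear in this changeset, and rejecting a zero step is an added assumption rather than anything the diff states.

```cpp
#include <cstddef>

// Hypothetical helper (not part of this changeset): decide whether the
// host-supplied sizes can be used. Only the block size is required to
// match the preferred value; a differing step size is tolerated.
bool sizesAcceptable(std::size_t stepSize, std::size_t blockSize,
                     std::size_t preferredBlock)
{
    // Assumption: a zero step can never advance through the input.
    if (stepSize == 0) {
        return false;
    }
    // The block size must still match exactly, as the README notes.
    return blockSize == preferredBlock;
}
```

This stays consistent with the README text in the same changeset, which says only that the plugin will not accept any other block size.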
--- a/plugins/ConstantQSpectrogram.cpp Fri Jan 25 17:56:49 2008 +0000
+++ b/plugins/ConstantQSpectrogram.cpp Wed Jan 30 12:42:04 2008 +0000
@@ -68,7 +68,7 @@
 string
 ConstantQSpectrogram::getDescription() const
 {
-    return "Extract a constant pitch bin spectrogram from the input audio";
+    return "Extract a spectrogram with constant ratio of centre frequency to resolution from the input audio";
 }
 string
@@ -139,7 +139,7 @@
     desc.unit = "";
     desc.minValue = 0;
     desc.maxValue = 1;
-    desc.defaultValue = 1;
+    desc.defaultValue = 0;
     desc.isQuantized = true;
     desc.quantizeStep = 1;
     list.push_back(desc);
@@ -214,6 +214,8 @@
     m_step = m_cq->gethop();
     m_block = m_cq->getfftlength();
+    //!!! stepSize != m_step should not be an error
+
     if (stepSize != m_step || blockSize != m_block) {
         delete m_cq;
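
The revised description string refers to a constant ratio of centre frequency to resolution. A rough sketch of that relationship follows, assuming geometrically spaced bins and taking the resolution of a bin to be the spacing to the next bin up. The function and parameter names are illustrative and are not taken from the plugin.

```cpp
#include <cassert>
#include <cmath>

// Illustrative only: centre frequency of bin k when bins are spaced
// geometrically, binsPerOctave to the octave, starting at minFreq.
double cqCentreFrequency(double minFreq, int binsPerOctave, int k)
{
    assert(minFreq > 0.0 && binsPerOctave > 0 && k >= 0);
    return minFreq * std::pow(2.0, static_cast<double>(k) / binsPerOctave);
}

// Illustrative only: the ratio Q of centre frequency to resolution,
// with resolution assumed to be the gap to the next bin. Because the
// gap scales with the centre frequency, Q does not depend on k.
double cqQ(int binsPerOctave)
{
    assert(binsPerOctave > 0);
    return 1.0 / (std::pow(2.0, 1.0 / binsPerOctave) - 1.0);
}
```

With 12 bins per octave, adjacent bins are a semitone apart and Q comes out at roughly 16.8 for every bin, which is the sense in which the bins line up with musical notes rather than being linearly spaced.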