--- a/README.txt	Fri Jan 25 17:56:49 2008 +0000
+++ b/README.txt	Wed Jan 30 12:42:04 2008 +0000
@@ -202,15 +202,24 @@

 The Segmenter plugin divides a single channel of music up into
 structurally consistent segments.  Its single output contains a
-numeric value (the segment classifier) for each moment at which a new
-segment starts.  For music with clearly tonally distinguishable
-sections such as verse, chorus, etc., the segments with the same
-classifier number are expected to be similar to one another in some
-structural sense (e.g. all repetitions of the chorus).
+numeric value (the segment type) for each moment at which a new
+segment starts.

-Note that this plugin consumes a significant amount of processing time
-after receiving all of the input audio data, before it produces any
-result.
+For music with clearly tonally distinguishable sections such as verse,
+chorus, etc., the segments with the same type may be expected to be
+similar to one another in some structural sense (e.g. repetitions of
+the chorus).
+
+The type of feature used in segmentation can be selected using the
+Feature Type parameter.  The default Hybrid (Constant-Q) is generally
+effective for modern studio recordings, while the Chromatic option may
+be preferable for live, acoustic, or older recordings, in which
+repeated sections may be less consistent in sound.  Also available is
+a timbral (MFCC) feature, which is more likely to result in
+classification by instrumentation rather than musical content.
+
+Note that this plugin does a substantial amount of processing after
+receiving all of the input audio data, before it produces any results.


 Similarity
@@ -230,6 +239,49 @@
 		In Proceedings of the Seventh International Conference on
 		Music Information Retrieval (ISMIR), 2006.

+The Similarity plugin treats each channel of its audio input as a
+separate "track", and estimates how similar the tracks are to one
+another using a selectable similarity measure.
+
+The plugin also returns the intermediate data used as a basis of the
+similarity measure; it can therefore be used on a single channel of
+input (with the resulting intermediate data then being applied in some
+other similarity or clustering algorithm, for example) if desired, as
+well as with multiple inputs.
+
+The underlying audio features used for the similarity measure can be
+selected using the Feature Type parameter.  The available features are
+Timbre (in which the distance between tracks is a symmetrised
+Kullback-Leibler divergence between Gaussian-modelled MFCC means and
+variances across each track); Chroma (KL divergence of mean chroma
+histogram); Rhythm (cosine distance between "beat spectrum" measures
+derived from a short sampled section of the track); and combined
+"Timbre and Rhythm" and "Chroma and Rhythm".
+
+The plugin has six outputs: a matrix of the distances between input
+channels; a vector containing the distances between the first input
+channel and each of the input channels; a pair of vectors containing
+the indices of the input channels in the order of their similarity to
+the first input channel, and the distances between the first input
+channel and each of those channels; the means of the underlying
+feature bins (MFCCs or chroma); the variances of the underlying
+feature bins; and the beat spectra used for the rhythmic feature.
+
+Because Vamp does not have the capability to return features in matrix
+form explicitly, the matrix output is returned as a series of vector
+features timestamped at one-second intervals.  Likewise, the
+underlying feature outputs contain one vector feature per input
+channel, timestamped at one-second intervals (so the feature for the
+first channel is at time 0, the second at 1 second, and so on).
+Examining the features that the plugin actually returns, when run on
+some test data, may make this arrangement clearer.
+
+Note that the underlying feature values are only returned if the
+relevant feature type is selected.  That is, the means and variances
+outputs are valid provided the pure rhythm feature is not selected;
+the beat spectra output is valid provided rhythm is included in the
+selected feature type.
+

 Constant-Q Spectrogram
 ----------------------
@@ -243,6 +295,18 @@
 		Journal of the Acoustical Society of America, 89(1):
 		425-434, 1991.

+The Constant-Q Spectrogram plugin calculates a spectrogram based on a
+short-time windowed constant Q spectral transform.  This is a
+spectrogram in which the ratio of centre frequency to resolution is
+constant for each frequency bin.  The frequency bins correspond to the
+frequencies of "musical notes" rather than being linearly spaced in
+frequency as they are for the conventional DFT spectrogram.
+
+The pitch range and the number of frequency bins per octave may be
+adjusted using the plugin's parameters.  Note that the plugin's
+preferred step and block sizes depend on these parameters, and the
+plugin will not accept any other block size.
+

 Chromagram
 ----------
@@ -251,7 +315,18 @@
 Authors:	Christian Landone
 Category:	Visualisation

-References:
+The Chromagram plugin calculates a constant Q spectral transform (as
+above) and then wraps the frequency bin values into a single octave,
+with each bin containing the sum of the magnitudes from the
+corresponding bin in all octaves.  The number of values in each
+feature vector returned by the plugin is therefore the same as the
+number of bins per octave configured for the underlying constant Q
+transform.
+
+The pitch range and the number of frequency bins per octave for the
+transform may be adjusted using the plugin's parameters.  Note that
+the plugin's preferred step and block sizes depend on these
+parameters, and the plugin will not accept any other block size.


 Mel-Frequency Cepstral Coefficients
@@ -266,4 +341,9 @@
 		In Proceedings of the First International Symposium on Music
 		Information Retrieval (ISMIR), 2000.

+The Mel-Frequency Cepstral Coefficients plugin calculates MFCCs from a
+single channel of audio, returning one MFCC vector from each process
+call.  It also returns the overall means of the coefficient values
+across the length of the audio input, as a separate output at the end
+of processing.
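
Note on the Chromagram text added above: the wrapping it describes (each output bin is the sum of the magnitudes of the corresponding bin in every octave) can be sketched as follows. This is a minimal illustration, not the plugin's code. The name foldIntoOctave is hypothetical, and the sketch assumes the constant Q bins are ordered from lowest to highest frequency with the same number of bins in each octave.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the plugin sources): fold constant Q
// magnitudes into a single octave by adding bin k of every octave into
// chroma bin k. Assumes bins run from lowest to highest frequency.
std::vector<double> foldIntoOctave(const std::vector<double> &cqMagnitudes,
                                   std::size_t binsPerOctave)
{
    assert(binsPerOctave > 0);
    std::vector<double> chroma(binsPerOctave, 0.0);
    for (std::size_t i = 0; i < cqMagnitudes.size(); ++i) {
        // Bins one octave apart share the same index modulo binsPerOctave.
        chroma[i % binsPerOctave] += cqMagnitudes[i];
    }
    return chroma;
}
```

The result always has binsPerOctave values, which matches the README's statement that the feature vector length equals the number of bins per octave configured for the transform.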
--- a/plugins/ChromagramPlugin.cpp	Fri Jan 25 17:56:49 2008 +0000
+++ b/plugins/ChromagramPlugin.cpp	Wed Jan 30 12:42:04 2008 +0000
@@ -222,6 +222,8 @@
     m_step = m_chromagram->getHopSize();
     m_block = m_chromagram->getFrameSize();

+    //!!! stepSize != m_step should not be an error
+
     if (stepSize != m_step ||
         blockSize != m_block) {
         delete m_chromagram;
--- a/plugins/ConstantQSpectrogram.cpp	Fri Jan 25 17:56:49 2008 +0000
+++ b/plugins/ConstantQSpectrogram.cpp	Wed Jan 30 12:42:04 2008 +0000
@@ -68,7 +68,7 @@
 string
 ConstantQSpectrogram::getDescription() const
 {
-    return "Extract a constant pitch bin spectrogram from the input audio";
+    return "Extract a spectrogram with constant ratio of centre frequency to resolution from the input audio";
 }

 string
@@ -139,7 +139,7 @@
     desc.unit = "";
     desc.minValue = 0;
     desc.maxValue = 1;
-    desc.defaultValue = 1;
+    desc.defaultValue = 0;
     desc.isQuantized = true;
     desc.quantizeStep = 1;
     list.push_back(desc);
@@ -214,6 +214,8 @@
     m_step = m_cq->gethop();
     m_block = m_cq->getfftlength();

+    //!!! stepSize != m_step should not be an error
+
     if (stepSize != m_step ||
         blockSize != m_block) {
         delete m_cq;
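
Note on the revised description string ("constant ratio of centre frequency to resolution"): the sketch below shows why that ratio is the same for every bin when the bins are geometrically spaced. It is an illustration under stated assumptions, not the plugin's implementation. The function names are hypothetical, and "resolution" is taken here to mean the spacing between adjacent bin centre frequencies, which may differ from the bandwidth definition the plugin actually uses.

```cpp
#include <cassert>
#include <cmath>

// Centre frequency of a bin, assuming geometric spacing with
// binsPerOctave bins per octave above minFrequency (hypothetical helper).
double centreFrequency(double minFrequency, int binsPerOctave, int bin)
{
    assert(binsPerOctave > 0);
    return minFrequency * std::pow(2.0, double(bin) / binsPerOctave);
}

// Ratio of centre frequency to the spacing up to the next bin. The
// centre frequency cancels out, so the ratio depends only on
// binsPerOctave and is therefore constant across bins.
double qualityFactor(int binsPerOctave)
{
    assert(binsPerOctave > 0);
    return 1.0 / (std::pow(2.0, 1.0 / binsPerOctave) - 1.0);
}
```

With 12 bins per octave this ratio is about 16.8, and raising the number of bins per octave raises it, which is one reason the required block size grows with that parameter.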