--- a/README.txt	Wed Jan 30 12:42:04 2008 +0000
+++ b/README.txt	Wed Jan 30 13:33:23 2008 +0000
@@ -6,7 +6,7 @@
 Version 1.4.

 For more information about Vamp plugins, see http://www.vamp-plugins.org/
-and http://www.sonicvisualiser.org/.
+and http://www.sonicvisualiser.org/ .


 License
@@ -64,10 +64,15 @@
 This plugin set includes the following plugins:

    * Note onset detector
+
    * Beat tracker and tempo estimator
+
    * Key estimator and tonal change detector
+
    * Segmenter, to divide a track into a consistent sequence of segments
+
    * Timbral and rhythmic similarity between audio tracks
+
    * Chromagram, constant-Q spectrogram, and MFCC calculation plugins

 More details about the plugins follow.
@@ -76,11 +81,11 @@
 Note Onset Detector
 -------------------

-Identifier:	qm-onsetdetector
-Authors:	Chris Duxbury, Juan Pablo Bello and Christian Landone
-Category:	Time > Onsets
+ Identifier:	qm-onsetdetector
+ Authors:	Chris Duxbury, Juan Pablo Bello and Christian Landone
+ Category:	Time > Onsets

-References:	C. Duxbury, J. P. Bello, M. Davies and M. Sandler.
+ References:	C. Duxbury, J. P. Bello, M. Davies and M. Sandler.
 		Complex domain Onset Detection for Musical Signals.
 		In Proceedings of the 6th Conference on Digital Audio
 		Effects (DAFx-03). London, UK. September 2003.
@@ -111,11 +116,11 @@
 Tempo and Beat Tracker
 ----------------------

-Identifier:	qm-tempotracker
-Authors:	Matthew Davies and Christian Landone
-Category:       Time > Tempo
+ Identifier:	qm-tempotracker
+ Authors:	Matthew Davies and Christian Landone
+ Category:      Time > Tempo

-References:	M. E. P. Davies and M. D. Plumbley.
+ References:	M. E. P. Davies and M. D. Plumbley.
 		Context-dependent beat tracking of musical audio.
 		Technical Report C4DM-TR-06-02. 5 April 2006.

@@ -136,14 +141,14 @@
 Key Detector
 ------------

-Identifier:	qm-keydetector
-Authors:	Katy Noland and Christian Landone
-Category:	Key and Tonality
+ Identifier:	qm-keydetector
+ Authors:	Katy Noland and Christian Landone
+ Category:	Key and Tonality

-References:	K. Noland and M. Sandler.
+ References:	K. Noland and M. Sandler.
 		Signal Processing Parameters for Tonality Estimation.
-		In Proceedings of Audio Engineering Society 122nd Convention,
-		Vienna, 2007.
+		In Proceedings of Audio Engineering Society 122nd
+                Convention, Vienna, 2007.

 The Key Detector plugin analyses a single channel of audio and
 continuously estimates the key of the music.
@@ -163,11 +168,11 @@
 Tonal Change
 ------------

-Identifier:	qm-tonalchange
-Authors:	Chris Harte and Martin Gasser
-Category:	Key and Tonality
+ Identifier:	qm-tonalchange
+ Authors:	Chris Harte and Martin Gasser
+ Category:	Key and Tonality

-References:	C. A. Harte, M. Gasser, and M. Sandler.
+ References:	C. A. Harte, M. Gasser, and M. Sandler.
 		Detecting harmonic change in musical audio.
 		In Proceedings of the 1st ACM workshop on Audio and Music
 		Computing Multimedia, Santa Barbara, 2006.
@@ -190,11 +195,11 @@
 Segmenter
 ---------

-Identifier:	qm-segmenter
-Authors:	Mark Levy
-Category:	Classification
+ Identifier:	qm-segmenter
+ Authors:	Mark Levy
+ Category:	Classification

-References:	M. Levy and M. Sandler.
+ References:	M. Levy and M. Sandler.
 		Structural segmentation of musical audio by constrained
 		clustering.
 		IEEE Transactions on Audio, Speech, and Language Processing,
@@ -225,11 +230,11 @@
 Similarity
 ----------

-Identifier:	qm-similarity
-Authors:	Mark Levy, Kurt Jacobson and Chris Cannam
-Category:	Classification
+ Identifier:	qm-similarity
+ Authors:	Mark Levy, Kurt Jacobson and Chris Cannam
+ Category:	Classification

-References:	M. Levy and M. Sandler.
+ References:	M. Levy and M. Sandler.
 		Lightweight measures for timbral similarity of musical audio.
 		In Proceedings of the 1st ACM workshop on Audio and Music
 		Computing Multimedia, Santa Barbara, 2006.
@@ -286,11 +291,11 @@
 Constant-Q Spectrogram
 ----------------------

-Identifier:	qm-constantq
-Authors:	Christian Landone
-Category:	Visualisation
+ Identifier:	qm-constantq
+ Authors:	Christian Landone
+ Category:	Visualisation

-References:	J. Brown.
+ References:	J. Brown.
 		Calculation of a constant Q spectral transform.
 		Journal of the Acoustical Society of America, 89(1):
 		425-434, 1991.
@@ -311,9 +316,9 @@
 Chromagram
 ----------

-Identifier:	qm-chromagram
-Authors:	Christian Landone
-Category:	Visualisation
+ Identifier:	qm-chromagram
+ Authors:	Christian Landone
+ Category:	Visualisation

 The Chromagram plugin calculates a constant Q spectral transform (as
 above) and then wraps the frequency bin values into a single octave,
@@ -332,11 +337,11 @@
 Mel-Frequency Cepstral Coefficients
 -----------------------------------

-Identifier:	qm-mfcc
-Authors:	Nicolas Chetry and Chris Cannam
-Category:	Low Level Features
+ Identifier:	qm-mfcc
+ Authors:	Nicolas Chetry and Chris Cannam
+ Category:	Low Level Features

-References:	B. Logan.
+ References:	B. Logan.
 		Mel-Frequency Cepstral Coefficients for Music Modeling.
 		In Proceedings of the First International Symposium on Music
 		Information Retrieval (ISMIR), 2000.
--- a/plugins/BeatTrack.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/BeatTrack.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -219,7 +219,9 @@
 {
     size_t theoretical = getPreferredStepSize() * 2;

-    //!!! need power of 2
+    // I think this is not necessarily going to be a power of two, and
+    // the host might have a problem with that, but I'm not sure we
+    // can do much about it here
     return theoretical;
 }
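
Note on the BeatTrack.cpp hunk above: the new comment observes that twice the preferred step size need not be a power of two. As a rough illustration only, a host (or a plugin that chose to round) could test and round a size as sketched below. The helper names isPowerOfTwo and nextPowerOfTwo are hypothetical and appear in neither this changeset nor the Vamp SDK.

```cpp
#include <cstddef>

// Hypothetical helper: true when n is a non-zero power of two.
bool isPowerOfTwo(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: smallest power of two that is >= n (1 for n == 0).
// Assumes n is small enough that the result fits in std::size_t,
// which holds for any realistic audio block size.
std::size_t nextPowerOfTwo(std::size_t n)
{
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

Whether rounding is appropriate depends on the plugin; the changeset itself leaves the returned value untouched.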
--- a/plugins/ChromagramPlugin.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/ChromagramPlugin.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -98,7 +98,7 @@
     desc.identifier = "minpitch";
     desc.name = "Minimum Pitch";
     desc.unit = "MIDI units";
-    //!!! descriptions
+    desc.description = "MIDI pitch corresponding to the lowest frequency to be included in the chromagram";
     desc.minValue = 0;
     desc.maxValue = 127;
     desc.defaultValue = 12;
@@ -109,6 +109,7 @@
     desc.identifier = "maxpitch";
     desc.name = "Maximum Pitch";
     desc.unit = "MIDI units";
+    desc.description = "MIDI pitch corresponding to the highest frequency to be included in the chromagram";
     desc.minValue = 0;
     desc.maxValue = 127;
     desc.defaultValue = 96;
@@ -119,6 +120,7 @@
     desc.identifier = "tuning";
     desc.name = "Tuning Frequency";
     desc.unit = "Hz";
+    desc.description = "Frequency of concert A";
     desc.minValue = 420;
     desc.maxValue = 460;
     desc.defaultValue = 440;
@@ -128,6 +130,7 @@
     desc.identifier = "bpo";
     desc.name = "Bins per Octave";
     desc.unit = "bins";
+    desc.description = "Number of constant-Q transform bins per octave, and the number of bins for the chromagram outputs";
     desc.minValue = 2;
     desc.maxValue = 36;
     desc.defaultValue = 12;
@@ -138,6 +141,7 @@
     desc.identifier = "normalization";
     desc.name = "Normalization";
     desc.unit = "";
+    desc.description = "Normalization for each chromagram output column";
     desc.minValue = 0;
     desc.maxValue = 2;
     desc.defaultValue = 2;
@@ -207,9 +211,6 @@
     if (channels < getMinChannelCount() ||
 	channels > getMaxChannelCount()) return false;

-    std::cerr << "ChromagramPlugin::initialise: step " << stepSize << ", block "
-	      << blockSize << std::endl;
-
     m_chromagram = new Chromagram(m_config);
     m_binsums = vector<double>(m_config.BPO);

@@ -222,15 +223,17 @@
     m_step = m_chromagram->getHopSize();
     m_block = m_chromagram->getFrameSize();

-    //!!! stepSize != m_step should not be an error
-
-    if (stepSize != m_step ||
-        blockSize != m_block) {
+    if (blockSize != m_block) {
+        std::cerr << "ChromagramPlugin::initialise: ERROR: supplied block size " << blockSize << " differs from required block size " << m_block << ", initialise failing" << std::endl;
         delete m_chromagram;
         m_chromagram = 0;
         return false;
     }

+    if (stepSize != m_step) {
+        std::cerr << "ChromagramPlugin::initialise: NOTE: supplied step size " << stepSize << " differs from expected step size " << m_step << " (for block size = " << m_block << ")" << std::endl;
+    }
+
     return true;
 }

@@ -276,6 +279,7 @@
     d.identifier = "chromagram";
     d.name = "Chromagram";
     d.unit = "";
+    d.description = "Output of chromagram, as a single vector per process block";
     d.hasFixedBinCount = true;
     d.binCount = m_config.BPO;

@@ -304,7 +308,7 @@

     d.identifier = "chromameans";
     d.name = "Chroma Means";
-    //!!! descriptions
+    d.description = "Mean values of chromagram bins across the duration of the input audio";
     d.sampleType = OutputDescriptor::FixedSampleRate;
     d.sampleRate = 1;
     list.push_back(d);
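
Note on the ChromagramPlugin.cpp hunks above: initialise now rejects only a mismatched block size and merely reports a mismatched step size. The policy can be sketched in isolation as follows. The SizeCheck enum and the checkSizes function are hypothetical names used for illustration; the plugin itself keeps this logic inline in initialise.

```cpp
#include <cstddef>
#include <iostream>

// Outcome of comparing host-supplied sizes with the plugin's own values.
enum class SizeCheck { Ok, StepDiffers, BlockDiffers };

// Hypothetical helper mirroring the policy in the hunk above:
// a block size mismatch is fatal, a step size mismatch is only noted.
SizeCheck checkSizes(std::size_t stepSize, std::size_t blockSize,
                     std::size_t expectedStep, std::size_t requiredBlock)
{
    if (blockSize != requiredBlock) {
        std::cerr << "ERROR: supplied block size " << blockSize
                  << " differs from required block size " << requiredBlock
                  << std::endl;
        return SizeCheck::BlockDiffers;
    }
    if (stepSize != expectedStep) {
        std::cerr << "NOTE: supplied step size " << stepSize
                  << " differs from expected step size " << expectedStep
                  << std::endl;
        return SizeCheck::StepDiffers;
    }
    return SizeCheck::Ok;
}
```

Under this sketch a caller would treat only SizeCheck::BlockDiffers as a reason to fail initialisation.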
--- a/plugins/ConstantQSpectrogram.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/ConstantQSpectrogram.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -98,6 +98,7 @@
     desc.identifier = "minpitch";
     desc.name = "Minimum Pitch";
     desc.unit = "MIDI units";
+    desc.description = "MIDI pitch corresponding to the lowest frequency to be included in the constant-Q transform";
     desc.minValue = 0;
     desc.maxValue = 127;
     desc.defaultValue = 36;
@@ -108,6 +109,7 @@
     desc.identifier = "maxpitch";
     desc.name = "Maximum Pitch";
     desc.unit = "MIDI units";
+    desc.description = "MIDI pitch corresponding to the highest frequency to be included in the constant-Q transform";
     desc.minValue = 0;
     desc.maxValue = 127;
     desc.defaultValue = 84;
@@ -118,6 +120,7 @@
     desc.identifier = "tuning";
     desc.name = "Tuning Frequency";
     desc.unit = "Hz";
+    desc.description = "Frequency of concert A";
     desc.minValue = 420;
     desc.maxValue = 460;
     desc.defaultValue = 440;
@@ -127,6 +130,7 @@
     desc.identifier = "bpo";
     desc.name = "Bins per Octave";
     desc.unit = "bins";
+    desc.description = "Number of constant-Q transform bins per octave";
     desc.minValue = 2;
     desc.maxValue = 36;
     desc.defaultValue = 12;
@@ -137,6 +141,7 @@
     desc.identifier = "normalized";
     desc.name = "Normalized";
     desc.unit = "";
+    desc.description = "Whether to normalize each output column to unit maximum";
     desc.minValue = 0;
     desc.maxValue = 1;
     desc.defaultValue = 0;
@@ -203,9 +208,6 @@
     if (channels < getMinChannelCount() ||
 	channels > getMaxChannelCount()) return false;

-    std::cerr << "ConstantQSpectrogram::initialise: step " << stepSize << ", block "
-	      << blockSize << std::endl;
-
     setupConfig();

     m_cq = new ConstantQ(m_config);
@@ -214,15 +216,17 @@
     m_step = m_cq->gethop();
     m_block = m_cq->getfftlength();

-    //!!! stepSize != m_step should not be an error
-
-    if (stepSize != m_step ||
-        blockSize != m_block) {
+    if (blockSize != m_block) {
+        std::cerr << "ConstantQSpectrogram::initialise: ERROR: supplied block size " << blockSize << " differs from required block size " << m_block << ", initialise failing" << std::endl;
         delete m_cq;
         m_cq = 0;
         return false;
     }

+    if (stepSize != m_step) {
+        std::cerr << "ConstantQSpectrogram::initialise: NOTE: supplied step size " << stepSize << " differs from expected step size " << m_step << " (for block size = " << m_block << ")" << std::endl;
+    }
+
     return true;
 }

@@ -268,6 +272,7 @@
     d.identifier = "constantq";
     d.name = "Constant-Q Spectrogram";
     d.unit = "";
+    d.description = "Output of constant-Q transform, as a single vector per process block";
     d.hasFixedBinCount = true;
     d.binCount = m_bins;
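
Note on the ConstantQSpectrogram.cpp hunks above: the new descriptions express the pitch range in MIDI units and the tuning as the frequency of concert A. The conversion the plugin performs internally is not part of this diff; the sketch below uses the standard equal-temperament relation, assuming concert A is MIDI note 69. The function name midiPitchToHz is hypothetical.

```cpp
#include <cmath>

// Frequency in Hz of a MIDI pitch under twelve-tone equal temperament,
// assuming MIDI note 69 (concert A) sounds at tuningHz.
// This is the textbook formula, not code taken from the plugin.
double midiPitchToHz(double midiPitch, double tuningHz = 440.0)
{
    return tuningHz * std::pow(2.0, (midiPitch - 69.0) / 12.0);
}
```

With the defaults shown in the diff (minpitch 36, maxpitch 84, tuning 440 Hz), this relation places the analysed range from roughly 65 Hz up to roughly 1047 Hz.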
--- a/plugins/KeyDetect.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/KeyDetect.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -81,6 +81,7 @@
     ParameterDescriptor desc;
     desc.identifier = "tuning";
     desc.name = "Tuning Frequency";
+    desc.description = "Frequency of concert A";
     desc.unit = "Hz";
     desc.minValue = 420;
     desc.maxValue = 460;
@@ -91,6 +92,7 @@
     desc.identifier = "length";
     desc.name = "Window Length";
     desc.unit = "chroma frames";
+    desc.description = "Number of chroma analysis frames per key estimation";
     desc.minValue = 1;
     desc.maxValue = 30;
     desc.defaultValue = 10;
@@ -110,7 +112,7 @@
     if (param == "length") {
         return m_length;
     }
-    std::cerr << "WARNING: KeyDetect::getParameter: unknown parameter \""
+    std::cerr << "WARNING: KeyDetector::getParameter: unknown parameter \""
               << param << "\"" << std::endl;
     return 0.0;
 }
@@ -123,7 +125,7 @@
     } else if (param == "length") {
         m_length = int(value + 0.1);
     } else {
-        std::cerr << "WARNING: KeyDetect::setParameter: unknown parameter \""
+        std::cerr << "WARNING: KeyDetector::setParameter: unknown parameter \""
                   << param << "\"" << std::endl;
     }
 }
@@ -147,7 +149,7 @@
     m_blockSize = m_getKeyMode->getBlockSize();

     if (stepSize != m_stepSize || blockSize != m_blockSize) {
-        std::cerr << "KeyDetector::initialise: step/block sizes "
+        std::cerr << "KeyDetector::initialise: ERROR: step/block sizes "
                   << stepSize << "/" << blockSize << " differ from required "
                   << m_stepSize << "/" << m_blockSize << std::endl;
         delete m_getKeyMode;
@@ -191,6 +193,7 @@
     d.identifier = "tonic";
     d.name = "Tonic Pitch";
     d.unit = "";
+    d.description = "Tonic of the estimated key (from C = 1 to B = 12)";
     d.hasFixedBinCount = true;
     d.binCount = 1;
     d.hasKnownExtents = true;
@@ -204,6 +207,7 @@
     d.identifier = "mode";
     d.name = "Key Mode";
     d.unit = "";
+    d.description = "Major or minor mode of the estimated key (major = 0, minor = 1)";
     d.hasFixedBinCount = true;
     d.binCount = 1;
     d.hasKnownExtents = true;
@@ -211,13 +215,13 @@
     d.minValue = 0;
     d.maxValue = 1;
     d.quantizeStep = 1;
-    d.binNames.push_back("Major = 0, Minor = 1");
     d.sampleType = OutputDescriptor::OneSamplePerStep;
     list.push_back(d);

     d.identifier = "key";
     d.name = "Key";
     d.unit = "";
+    d.description = "Estimated key (from C major = 1 to B major = 12 and C minor = 13 to B minor = 24)";
     d.hasFixedBinCount = true;
     d.binCount = 1;
     d.hasKnownExtents = true;
@@ -225,7 +229,6 @@
     d.minValue = 1;
     d.maxValue = 24;
     d.quantizeStep = 1;
-    d.binNames.erase(d.binNames.begin(),d.binNames.end());
     d.sampleType = OutputDescriptor::OneSamplePerStep;
     list.push_back(d);
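
Note on the KeyDetect.cpp hunks above: the new descriptions define the numeric encoding of the three outputs (tonic 1 to 12, mode 0 or 1, key 1 to 24). A client could decode the key output roughly as sketched below. The names KeyEstimate, decodeKey and keyName are hypothetical. The sketch also assumes that values between the documented endpoints ascend chromatically in semitones, and the spelling of accidentals is arbitrary; neither detail is stated in the diff.

```cpp
#include <string>

// Tonic in 1..12 (C = 1 to B = 12) and mode (major = 0, minor = 1),
// following the output descriptions added in the hunk above.
struct KeyEstimate { int tonic; int mode; };

// Split a key value in 1..24 into tonic and mode.
// Values 1..12 are major keys, 13..24 the corresponding minor keys.
KeyEstimate decodeKey(int key)
{
    KeyEstimate k;
    k.mode = (key > 12) ? 1 : 0;
    k.tonic = (key > 12) ? key - 12 : key;
    return k;
}

// Human-readable name; assumes chromatic semitone ordering from C to B.
std::string keyName(int key)
{
    static const char *names[] = { "C", "C#", "D", "Eb", "E", "F",
                                   "F#", "G", "Ab", "A", "Bb", "B" };
    if (key < 1 || key > 24) return "unknown";
    KeyEstimate k = decodeKey(key);
    return std::string(names[k.tonic - 1]) + (k.mode == 0 ? " major" : " minor");
}
```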
--- a/plugins/MFCCPlugin.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/MFCCPlugin.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -90,7 +90,7 @@
     desc.identifier = "nceps";
     desc.name = "Number of Coefficients";
     desc.unit = "";
-    //!!! descriptions -- "including C0 if requested"
+    desc.description = "Number of MFCCs to return, starting from C0 if \"Include C0\" is specified or from C1 otherwise";
     desc.minValue = 1;
     desc.maxValue = 40;
     desc.defaultValue = 20;
@@ -101,6 +101,7 @@
     desc.identifier = "logpower";
     desc.name = "Power for Mel Amplitude Logs";
     desc.unit = "";
+    desc.description = "Power to raise the amplitude log values to before applying DCT.  Values greater than 1 may reduce contribution of noise";
     desc.minValue = 0;
     desc.maxValue = 5;
     desc.defaultValue = 1;
@@ -111,7 +112,7 @@
     desc.identifier = "wantc0";
     desc.name = "Include C0";
     desc.unit = "";
-    //!!! description
+    desc.description = "Whether to include the C0 (energy level) coefficient in the returned results";
     desc.minValue = 0;
     desc.maxValue = 1;
     desc.defaultValue = 1;
@@ -217,6 +218,7 @@
     d.identifier = "coefficients";
     d.name = "Coefficients";
     d.unit = "";
+    d.description = "MFCC values";
     d.hasFixedBinCount = true;
     d.binCount = m_bins;
     d.hasKnownExtents = false;
@@ -226,7 +228,7 @@

     d.identifier = "means";
     d.name = "Means of Coefficients";
-    //!!! descriptions
+    d.description = "Mean values of MFCCs across duration of audio input";
     d.sampleType = OutputDescriptor::FixedSampleRate;
     d.sampleRate = 1;
     list.push_back(d);
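
Note on the MFCCPlugin.cpp hunks above: the new "nceps" description says the returned coefficients start from C0 when "Include C0" is set and from C1 otherwise. How the plugin arranges this internally is not shown in the diff; the sketch below only illustrates the described selection on an already computed coefficient vector. The function name selectCoefficients is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given coefficients ordered C0, C1, C2, ...,
// return up to nceps of them, starting at C0 if wantC0 is true
// and at C1 otherwise.
std::vector<double> selectCoefficients(const std::vector<double> &all,
                                       std::size_t nceps, bool wantC0)
{
    std::size_t first = wantC0 ? 0 : 1;
    std::vector<double> out;
    for (std::size_t i = first; i < all.size() && out.size() < nceps; ++i) {
        out.push_back(all[i]);
    }
    return out;
}
```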
--- a/plugins/SimilarityPlugin.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/SimilarityPlugin.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -122,12 +122,14 @@
 bool
 SimilarityPlugin::initialise(size_t channels, size_t stepSize, size_t blockSize)
 {
-    if (channels < getMinChannelCount() ||
-	channels > getMaxChannelCount()) return false;
+    if (channels < getMinChannelCount()) return false;
+
+    // Using more than getMaxChannelCount is not actually a problem
+    // for us.  Using "incorrect" step and block sizes would be fine
+    // for timbral or chroma similarity, but will break rhythmic
+    // similarity, so we'd better enforce these.

     if (stepSize != getPreferredStepSize()) {
-        //!!! actually this perhaps shouldn't be an error... similarly
-        //using more than getMaxChannelCount channels
         std::cerr << "SimilarityPlugin::initialise: supplied step size "
                   << stepSize << " differs from required step size "
                   << getPreferredStepSize() << std::endl;
--- a/plugins/TonalChangeDetect.cpp	Wed Jan 30 12:42:04 2008 +0000
+++ b/plugins/TonalChangeDetect.cpp	Wed Jan 30 13:33:23 2008 +0000
@@ -87,8 +87,7 @@

 std::string TonalChangeDetect::getDescription() const
 {
-    //!!!
-    return "";
+    return "Detect and return the positions of harmonic changes such as chord boundaries";
 }

 std::string TonalChangeDetect::getMaker() const