changeset 48:3b4572153ce3

* Similarity -> single user control rather than separate weighting
* Key detector -> correct reported min/max values for outputs
* Start some documentation
author Chris Cannam <c.cannam@qmul.ac.uk>
date Mon, 21 Jan 2008 18:05:28 +0000
parents f8c5f11e60a6
children fc88b465548a
files README.txt plugins/BeatTrack.cpp plugins/KeyDetect.cpp plugins/SimilarityPlugin.cpp qm-vamp-plugins.cat qm-vamp-plugins.pro
diffstat 6 files changed, 320 insertions(+), 45 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README.txt	Mon Jan 21 18:05:28 2008 +0000
@@ -0,0 +1,225 @@
+
+QM Vamp Plugins
+===============
+
+Vamp audio feature extraction plugins from Queen Mary, University of London.
+Version 1.4.
+
+For more information about Vamp plugins, see http://www.vamp-plugins.org/
+and http://www.sonicvisualiser.org/.
+
+
+New In This Release
+===================
+
+This release contains a new plugin to estimate timbral and rhythmic
+similarity between multiple audio tracks, a plugin for structural
+segmentation of music audio, and a Mel-frequency cepstral coefficients
+calculation plugin.
+
+This release also includes fixes to the existing key detector and
+chromagram plugins.
+
+
+To Install
+==========
+
+Installation depends on your operating system.
+
+    Windows -> Copy qm-vamp-plugins.dll and qm-vamp-plugins.cat to
+               C:\Program Files\Vamp Plugins\
+
+    OS X    -> Copy qm-vamp-plugins.dylib and qm-vamp-plugins.cat to
+               $HOME/Library/Audio/Plug-Ins/Vamp/
+	       or
+	       /Library/Audio/Plug-Ins/Vamp/
+
+    Linux   -> Copy qm-vamp-plugins.so and qm-vamp-plugins.cat to
+               $HOME/vamp/
+	       or
+	       /usr/local/lib/vamp/
+	       or 
+	       /usr/lib/vamp/
+
+
+Plugins Included
+================
+
+This plugin set includes the following plugins:
+
+   * Note onset detector
+   * Beat tracker and tempo estimator
+   * Key estimator and tonal change detector
+   * Segmenter, to divide a track into a consistent sequence of segments
+   * Timbral and rhythmic similarity between audio tracks
+   * Chromagram, constant-Q spectrogram, and MFCC calculation plugins
+
+More details about the plugins follow.
+
+
+Note Onset Detector
+-------------------
+
+Identifier:	qm-onsetdetector
+Authors:	Chris Duxbury, Juan Pablo Bello and Christian Landone
+Category:	Time > Onsets
+
+References:	C. Duxbury, J. P. Bello, M. Davies and M. Sandler.
+		Complex domain Onset Detection for Musical Signals.
+		In Proceedings of the 6th Conference on Digital Audio
+		Effects (DAFx-03). London, UK. September 2003.
+
+		D. Stowell and M. D. Plumbley.
+		Adaptive whitening for improved real-time audio onset
+		detection.
+		In Proceedings of the International Computer Music
+		Conference (ICMC'07), August 2007.
+
+		D. Barry, D. Fitzgerald, E. Coyle and B. Lawlor.
+		Drum Source Separation using Percussive Feature
+		Detection and Spectral Modulation.
+		ISSC 2005
+
+This plugin analyses a single channel of audio and estimates the
+locations of note onsets within the music.
+
+It has three outputs: the note onset positions, the onset detection
+function used in estimating onset positions, and a smoothed version of
+the detection function that is used in the peak-picking phase.
+
+
+Tempo and Beat Tracker
+----------------------
+
+Identifier:	qm-tempotracker
+Authors:	Matthew Davies and Christian Landone
+Category:	Time > Tempo
+
+References:	M. E. P. Davies and M. D. Plumbley.
+		Context-dependent beat tracking of musical audio.
+		Technical Report C4DM-TR-06-02. 5 April 2006.
+
+		M. E. P. Davies and M. D. Plumbley.
+		Beat Tracking With A Two State Model.
+		In Proceedings of the IEEE International Conference 
+		on Acoustics, Speech and Signal Processing (ICASSP 2005),
+		Vol. 3, pp. 241-244, Philadelphia, USA, March 19-23, 2005.
+
+This plugin analyses a single channel of audio and estimates the
+locations of metrical beats and the resulting tempo of the music.
+
+It has three outputs: the beat positions, an ongoing estimate of tempo
+where available, and the onset detection function used in estimating
+beat positions.
+
+
+
+Key Detector
+------------
+
+Identifier:	qm-keydetector
+Authors:	Katy Noland and Christian Landone
+Category:	Key and Tonality
+
+References:	K. Noland and M. Sandler.
+		Signal Processing Parameters for Tonality Estimation.
+		In Proceedings of Audio Engineering Society 122nd Convention,
+		Vienna, 2007.
+
+This plugin analyses a single channel of audio and continuously
+estimates the key of the music.
+
+It has three outputs: the tonic pitch of the key; a major or minor
+mode flag; and key (combining the tonic and major/minor into a single
+value).  These outputs have the values:
+
+  Tonic pitch: C = 1, C#/Db = 2, ..., B = 12
+  Major/minor mode: major = 0, minor = 1
+  Key: C major = 1, C#/Db major = 2, ..., B major = 12
+       C minor = 13, C#/Db minor = 14, ..., B minor = 24
+
+The outputs are also labelled with pitch or key as text.
+
+
+Tonal Change
+------------
+
+Identifier:	qm-tonalchange
+Authors:	Chris Harte and Martin Gasser
+Category:	Key and Tonality
+
+References:	C. A. Harte and M. Sandler.
+		Automatic chord identification using a quantised chromagram.
+		In Proceedings of the 118th Convention of the Audio
+		Engineering Society, Barcelona, Spain, May 28-31 2005.
+
+
+Segmenter
+---------
+
+Identifier:	qm-segmenter
+Authors:	Mark Levy
+Category:	Classification
+
+References:	M. Levy and M. Sandler.
+		Structural segmentation of musical audio by constrained
+		clustering.
+		IEEE Transactions on Audio, Speech, and Language Processing,
+		February 2008.
+
+
+
+Similarity
+----------
+
+Identifier:	qm-similarity
+Authors:	Mark Levy, Kurt Jacobson and Chris Cannam
+Category:	Classification
+
+References:	M. Levy and M. Sandler.
+		Lightweight measures for timbral similarity of musical audio.
+		In Proceedings of the 1st ACM workshop on Audio and Music
+		Computing Multimedia, Santa Barbara, 2006.
+
+		K. Jacobson.
+		A Multifaceted Approach to Music Similarity.
+		In Proceedings of the Seventh International Conference on Music
+		Information Retrieval (ISMIR), 2006.
+
+
+Constant-Q Spectrogram
+----------------------
+
+Identifier:	qm-constantq
+Authors:	Christian Landone
+Category:	Visualisation
+
+References:	J. Brown.
+		Calculation of a constant Q spectral transform.
+		Journal of the Acoustical Society of America, 89(1):
+		425-434, 1991.
+
+
+Chromagram
+----------
+
+Identifier:	qm-chromagram
+Authors:	Christian Landone
+Category:	Visualisation
+
+References:	
+
+
+Mel-Frequency Cepstral Coefficients
+-----------------------------------
+
+Identifier:	qm-mfcc
+Authors:	Nicolas Chetry and Chris Cannam
+Category:	Low Level Features
+
+References:	B. Logan.
+		Mel-Frequency Cepstral Coefficients for Music Modeling.
+		In Proceedings of the First International Symposium on Music
+		Information Retrieval (ISMIR), 2000.
+
+
--- a/plugins/BeatTrack.cpp	Fri Jan 18 18:11:01 2008 +0000
+++ b/plugins/BeatTrack.cpp	Mon Jan 21 18:05:28 2008 +0000
@@ -87,7 +87,7 @@
 string
 BeatTracker::getCopyright() const
 {
-    return "Copyright (c) 2006-2007 - All Rights Reserved";
+    return "Copyright (c) 2006-2008 - All Rights Reserved";
 }
 
 BeatTracker::ParameterList
--- a/plugins/KeyDetect.cpp	Fri Jan 18 18:11:01 2008 +0000
+++ b/plugins/KeyDetect.cpp	Mon Jan 21 18:05:28 2008 +0000
@@ -196,8 +196,8 @@
     d.binCount = 1;
     d.hasKnownExtents = true;
     d.isQuantized = true;
-    d.minValue = 0;
-    d.maxValue = 11;
+    d.minValue = 1;
+    d.maxValue = 12;
     d.quantizeStep = 1;
     d.sampleType = OutputDescriptor::OneSamplePerStep;
     list.push_back(d);
@@ -223,8 +223,8 @@
     d.binCount = 1;
     d.hasKnownExtents = true;
     d.isQuantized = true;
-    d.minValue = 0;
-    d.maxValue = 23;
+    d.minValue = 1;
+    d.maxValue = 24;
     d.quantizeStep = 1;
     d.binNames.erase(d.binNames.begin(),d.binNames.end());
     d.sampleType = OutputDescriptor::OneSamplePerStep;
@@ -328,8 +328,8 @@
 const char *
 KeyDetector::getKeyName(int index)
 {
-	// Keys are numbered with 1 => C, 12 => B
-	// This is based on chromagram base set to a C in qm-dsp's GetKeyMode.cpp
+    // Keys are numbered with 1 => C, 12 => B
+    // This is based on chromagram base set to a C in qm-dsp's GetKeyMode.cpp
     static const char *names[] = {
         "C", "C# / Db", "D", "D# / Eb",
         "E", "F", "F# / Gb", "G",
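
The 1-based numbering that the corrected minValue/maxValue now report (tonic 1..12, key 1..24, as described in README.txt) can be unpacked as in the sketch below. It is an illustration only, not code from the plugin: `KeyInfo`, `decodeKey` and `describeKey` are hypothetical helpers, and the note names follow the README's spelling rather than the plugin's own name table.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type, not part of the plugin.
struct KeyInfo {
    int tonic;  // 1 = C, 2 = C#/Db, ..., 12 = B
    int mode;   // 0 = major, 1 = minor
};

// Split the combined "key" output value (1..24) into tonic and mode.
// Values 1..12 are major keys, 13..24 are the corresponding minor keys.
inline KeyInfo decodeKey(int key)
{
    assert(key >= 1 && key <= 24);
    KeyInfo info;
    info.mode = (key > 12) ? 1 : 0;
    info.tonic = (key > 12) ? key - 12 : key;
    return info;
}

// Build a text label such as "C major" or "B minor".
// The name table here is illustrative and assumed, not copied from the plugin.
inline std::string describeKey(int key)
{
    static const char *names[] = {
        "C", "C#/Db", "D", "D#/Eb", "E", "F",
        "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"
    };
    KeyInfo info = decodeKey(key);
    return std::string(names[info.tonic - 1])
        + (info.mode == 0 ? " major" : " minor");
}
```

With this numbering a host can index a 12-entry name table with `tonic - 1`, which is why the reported extents need to start at 1 rather than 0.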
--- a/plugins/SimilarityPlugin.cpp	Fri Jan 18 18:11:01 2008 +0000
+++ b/plugins/SimilarityPlugin.cpp	Mon Jan 21 18:05:28 2008 +0000
@@ -39,7 +39,7 @@
     m_chromagram(0),
     m_decimator(0),
     m_featureColumnSize(20),
-    m_rhythmWeighting(0.f),
+    m_rhythmWeighting(0.5f),
     m_rhythmClipDuration(4.f), // seconds
     m_rhythmClipOrigin(40.f), // seconds
     m_rhythmClipFrameSize(0),
@@ -290,17 +290,20 @@
     ParameterDescriptor desc;
     desc.identifier = "featureType";
     desc.name = "Feature Type";
-    desc.description = "Audio feature used for similarity measure.  Timbral: use the first 20 MFCCs (19 plus C0).  Chromatic: use 12 bin-per-octave chroma.";
+    desc.description = "Audio feature used for similarity measure.  Timbral: use the first 20 MFCCs (19 plus C0).  Chromatic: use 12 bin-per-octave chroma.  Rhythmic: compare beat spectra of short regions.";
     desc.unit = "";
     desc.minValue = 0;
-    desc.maxValue = 1;
-    desc.defaultValue = 0;
+    desc.maxValue = 4;
+    desc.defaultValue = 1;
     desc.isQuantized = true;
     desc.quantizeStep = 1;
-    desc.valueNames.push_back("Timbral (MFCC)");
-    desc.valueNames.push_back("Chromatic (Chroma)");
+    desc.valueNames.push_back("Timbre");
+    desc.valueNames.push_back("Timbre and Rhythm");
+    desc.valueNames.push_back("Chroma");
+    desc.valueNames.push_back("Chroma and Rhythm");
+    desc.valueNames.push_back("Rhythm only");
     list.push_back(desc);	
-	
+/*
     desc.identifier = "rhythmWeighting";
     desc.name = "Influence of Rhythm";
     desc.description = "Proportion of similarity measure made up from rhythmic similarity component, from 0 (entirely timbral or chromatic) to 100 (entirely rhythmic).";
@@ -308,11 +311,10 @@
     desc.minValue = 0;
     desc.maxValue = 100;
     desc.defaultValue = 0;
-    desc.isQuantized = true;
-    desc.quantizeStep = 1;
+    desc.isQuantized = false;
     desc.valueNames.clear();
     list.push_back(desc);	
-	
+*/
     return list;
 }
 
@@ -320,11 +322,28 @@
 SimilarityPlugin::getParameter(std::string param) const
 {
     if (param == "featureType") {
-        if (m_type == TypeMFCC) return 0;
-        else if (m_type == TypeChroma) return 1;
-        else return 0;
-    } else if (param == "rhythmWeighting") {
-        return nearbyint(m_rhythmWeighting * 100.0);
+
+        if (m_rhythmWeighting > m_allRhythm) {
+            return 4;
+        }
+
+        switch (m_type) {
+
+        case TypeMFCC:
+            if (m_rhythmWeighting < m_noRhythm) return 0;
+            else return 1;
+            break;
+
+        case TypeChroma:
+            if (m_rhythmWeighting < m_noRhythm) return 2;
+            else return 3;
+            break;
+        }            
+
+        return 1;
+
+//    } else if (param == "rhythmWeighting") {
+//        return nearbyint(m_rhythmWeighting * 100.0);
     }
 
     std::cerr << "WARNING: SimilarityPlugin::getParameter: unknown parameter \""
@@ -336,15 +355,27 @@
 SimilarityPlugin::setParameter(std::string param, float value)
 {
     if (param == "featureType") {
+
         int v = int(value + 0.1);
-        Type prevType = m_type;
-        if (v == 0) m_type = TypeMFCC;
-        else if (v == 1) m_type = TypeChroma;
-        if (m_type != prevType) m_blockSize = 0;
+
+        Type newType = m_type;
+
+        switch (v) {
+        case 0: newType = TypeMFCC; m_rhythmWeighting = 0.0f; break;
+        case 1: newType = TypeMFCC; m_rhythmWeighting = 0.5f; break;
+        case 2: newType = TypeChroma; m_rhythmWeighting = 0.0f; break;
+        case 3: newType = TypeChroma; m_rhythmWeighting = 0.5f; break;
+        case 4: newType = TypeMFCC; m_rhythmWeighting = 1.f; break;
+        }
+
+        if (newType != m_type) m_blockSize = 0;
+
+        m_type = newType;
         return;
-    } else if (param == "rhythmWeighting") {
-        m_rhythmWeighting = value / 100;
-        return;
+
+//    } else if (param == "rhythmWeighting") {
+//        m_rhythmWeighting = value / 100;
+//        return;
     }
 
     std::cerr << "WARNING: SimilarityPlugin::setParameter: unknown parameter \""
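
The single featureType control above folds two pieces of state (feature type and rhythm weighting) into one value from 0 to 4. The sketch below restates that mapping outside the plugin class so the round trip is easy to check. `Setting`, `fromFeatureType` and `toFeatureType` are hypothetical names, and because the values of m_noRhythm and m_allRhythm are not shown in this changeset, the thresholds 0.009 and 0.991 are assumed placeholders.

```cpp
#include <cassert>

enum Type { TypeMFCC, TypeChroma };

// Hypothetical stand-in for the plugin's m_type / m_rhythmWeighting pair.
struct Setting {
    Type type;
    float rhythmWeighting;
};

// Assumed threshold values; the real ones are defined outside this diff.
const float noRhythm = 0.009f;
const float allRhythm = 0.991f;

// Mirror of the setParameter switch: unknown values leave the state as-is.
inline Setting fromFeatureType(float value, Setting current)
{
    int v = int(value + 0.1);
    switch (v) {
    case 0: current.type = TypeMFCC;   current.rhythmWeighting = 0.0f; break;
    case 1: current.type = TypeMFCC;   current.rhythmWeighting = 0.5f; break;
    case 2: current.type = TypeChroma; current.rhythmWeighting = 0.0f; break;
    case 3: current.type = TypeChroma; current.rhythmWeighting = 0.5f; break;
    case 4: current.type = TypeMFCC;   current.rhythmWeighting = 1.0f; break;
    }
    return current;
}

// Mirror of the getParameter logic: rhythm-only wins over the feature type.
inline int toFeatureType(const Setting &s)
{
    if (s.rhythmWeighting > allRhythm) return 4;
    if (s.type == TypeMFCC) return (s.rhythmWeighting < noRhythm) ? 0 : 1;
    return (s.rhythmWeighting < noRhythm) ? 2 : 3;
}
```

One consequence of this design is that "Rhythm only" (value 4) always stores TypeMFCC, so an earlier Chroma selection is not remembered once the rhythm-only mode has been chosen.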
@@ -629,22 +660,40 @@
         v[i] = variance;
     }
 
-    // "Despite the fact that MFCCs extracted from music are clearly
-    // not Gaussian, [14] showed, somewhat surprisingly, that a
-    // similarity function comparing single Gaussians modelling MFCCs
-    // for each track can perform as well as mixture models.  A great
-    // advantage of using single Gaussians is that a simple closed
-    // form exists for the KL divergence." -- Mark Levy, "Lightweight
-    // measures for timbral similarity of musical audio"
-    // (http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf)
-
-    KLDivergence kld;
     FeatureMatrix distances(m_channels);
 
-    for (int i = 0; i < m_channels; ++i) {
-        for (int j = 0; j < m_channels; ++j) {
-            double d = kld.distance(m[i], v[i], m[j], v[j]);
-            distances[i].push_back(d);
+    if (m_type == TypeMFCC) {
+
+        // "Despite the fact that MFCCs extracted from music are
+        // clearly not Gaussian, [14] showed, somewhat surprisingly,
+        // that a similarity function comparing single Gaussians
+        // modelling MFCCs for each track can perform as well as
+        // mixture models.  A great advantage of using single
+        // Gaussians is that a simple closed form exists for the KL
+        // divergence." -- Mark Levy, "Lightweight measures for
+        // timbral similarity of musical audio"
+        // (http://www.elec.qmul.ac.uk/easaier/papers/mlevytimbralsimilarity.pdf)
+
+        KLDivergence kld;
+
+        for (int i = 0; i < m_channels; ++i) {
+            for (int j = 0; j < m_channels; ++j) {
+                double d = kld.distanceGaussian(m[i], v[i], m[j], v[j]);
+                distances[i].push_back(d);
+            }
+        }
+
+    } else {
+
+        // Chroma are histograms already
+
+        KLDivergence kld;
+
+        for (int i = 0; i < m_channels; ++i) {
+            for (int j = 0; j < m_channels; ++j) {
+                double d = kld.distanceDistribution(m[i], m[j], true);
+                distances[i].push_back(d);
+            }
         }
     }
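
The quoted comment above refers to a simple closed form for the KL divergence between single Gaussians. As a rough illustration, a symmetrised KL divergence between two diagonal-covariance Gaussians, each given as a mean vector and a variance vector, could be written as below. This is a sketch under stated assumptions and is not claimed to match qm-dsp's `KLDivergence::distanceGaussian`; the function name `symmetrisedKLGaussian` and the small variance guard `eps` are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Symmetrised KL divergence KL(p||q) + KL(q||p) for diagonal Gaussians.
// The log-variance terms of the two directions cancel, leaving
//   0.5 * sum_k [ v1/v2 + v2/v1 - 2 + (m1 - m2)^2 * (1/v1 + 1/v2) ].
inline double symmetrisedKLGaussian(const std::vector<double> &m1,
                                    const std::vector<double> &v1,
                                    const std::vector<double> &m2,
                                    const std::vector<double> &v2)
{
    assert(m1.size() == v1.size());
    assert(m2.size() == v2.size());
    assert(m1.size() == m2.size());

    const double eps = 1e-20;  // assumed guard against zero variance
    double d = 0.0;

    for (std::size_t k = 0; k < m1.size(); ++k) {
        double a = v1[k] + eps;
        double b = v2[k] + eps;
        double dm = m1[k] - m2[k];
        d += a / b + b / a - 2.0;
        d += dm * dm * (1.0 / a + 1.0 / b);
    }

    return d / 2.0;
}
```

The result is zero for identical models and symmetric in its arguments, which is what a pairwise distance matrix such as `distances` needs.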
     
--- a/qm-vamp-plugins.cat	Fri Jan 18 18:11:01 2008 +0000
+++ b/qm-vamp-plugins.cat	Mon Jan 21 18:05:28 2008 +0000
@@ -4,5 +4,6 @@
 vamp:qm-vamp-plugins:qm-constantq::Visualisation
 vamp:qm-vamp-plugins:qm-tonalchange::Key and Tonality
 vamp:qm-vamp-plugins:qm-keydetector::Key and Tonality
-vamp:qm-vamp-plugins:qm-segmenter::Key and Tonality
+vamp:qm-vamp-plugins:qm-segmenter::Classification
 vamp:qm-vamp-plugins:qm-similarity::Classification
+vamp:qm-vamp-plugins:qm-mfcc::Low Level Features
--- a/qm-vamp-plugins.pro	Fri Jan 18 18:11:01 2008 +0000
+++ b/qm-vamp-plugins.pro	Mon Jan 21 18:05:28 2008 +0000
@@ -1,7 +1,7 @@
 
 TEMPLATE = lib
 
-CONFIG += plugin warn_on release
+CONFIG += plugin warn_on debug
 CONFIG -= qt
 
 linux-g++:QMAKE_CXXFLAGS_RELEASE += -DNDEBUG -O3 -march=pentium4 -msse -msse2