changeset 28:e1b69a7360a5 abstract
added segmentation paragraph.
author | luisf <luis.figueira@eecs.qmul.ac.uk> |
---|---|
date | Fri, 06 Sep 2013 11:32:30 +0100 |
parents | 2920df6dc246 |
children | 473d83d0865c |
files | vamp-plugins_abstract/qmvamp-mirex2013.bib vamp-plugins_abstract/qmvamp-mirex2013.tex |
diffstat | 2 files changed, 30 insertions(+), 10 deletions(-) |
--- a/vamp-plugins_abstract/qmvamp-mirex2013.bib	Fri Sep 06 11:15:56 2013 +0100
+++ b/vamp-plugins_abstract/qmvamp-mirex2013.bib	Fri Sep 06 11:32:30 2013 +0100
@@ -8,22 +8,34 @@
   year = {2007}
 }

- @inproceedings{dan2007a,
+@inproceedings{dan2007a,
   author = {Dan Stowell and Mark D. Plumbley},
   title = {Adaptive whitening for improved real-time audio onset detection},
   booktitle = {Proceedings of the International Computer Music Conference (ICMC'07)},
   year = {2007}
 }

- @inproceedings{chris2003a,
+@inproceedings{chris2003a,
   author = {Chris Duxbury and Juan Pablo Bello and Mike Davies and Mark Sandler},
   title = {Complex Domain Onset Detection for Musical Signals},
   booktitle = {Proceedings of the 6th Int. Conference on Digital Audio Effects (DAFx-03)},
   year = {2003}
 }
+
 @inproceedings{dan2005a,
   author = {Dan Barry and Derry Fitzgerald and Eugene Coyle and Bob Lawlor},
   title = {Drum Source Separation using Percussive Feature Detection and Spectral Modulation},
   booktitle = {ISSC 2005},
   year = {2005}
 }
+
+@article{mark2008a,
+  author = {Mark Levy and Mark Sandler},
+  title = {Structural Segmentation of Musical Audio by Constrained Clustering},
+  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
+  month = {February},
+  number = {2},
+  pages = {318-326},
+  volume = {16},
+  year = {2008}
+}
--- a/vamp-plugins_abstract/qmvamp-mirex2013.tex	Fri Sep 06 11:15:56 2013 +0100
+++ b/vamp-plugins_abstract/qmvamp-mirex2013.tex	Fri Sep 06 11:32:30 2013 +0100
@@ -49,20 +49,14 @@
 describe vamp\ldots

 describe rationale supporting submission\ldots

-\section{Audio Beat Tracking}
-
-\subsection{Tempo and Beat Tracker}
+\section{Audio Beat Tracking and Audio Tempo Estimation}

 The Tempo and Beat Tracker\cite{matthew2007a} VAMP plugin analyses a single channel of audio and estimates the positions of metrical beats within the music (the equivalent of a human listener tapping their foot to the beat).

 The Tempo and Beat Tracker VAMP plugin was written by Matthew Davies and Christian Landone.

-\section{Audio Chord Estimation}
-
 \section{Audio Key Detection}

-\subsection{Key Detector}
-
 [Need reference] The Key Detector VAMP plugin analyses a single channel of audio and continuously estimates the key of the music by comparing the degree to which a block-by-block chromagram correlates to the stored key profiles for each major and minor key.

@@ -81,9 +75,23 @@

 \section{Audio Structural Segmentation}

+The Segmenter VAMP plugin divides a single channel of music into structurally consistent segments. It returns a numeric value (the segment type) for each moment at which a new segment starts.

+For music with clearly tonally distinguishable sections such as verse, chorus, etc., segments with the same type may be expected to be similar to one another in some structural sense. For example, repetitions of the chorus are likely to share a segment type.

-\section{Audio Tempo Estimation}
+The plugin only attempts to identify similar segments; it does not attempt to label them. For example, it makes no attempt to tell you which segment is the chorus.
+
+Note that this plugin does a substantial amount of processing after receiving all of the input audio data, before it produces any results.
+
+\subsection{Method}
+
+The method, described in~\cite{mark2008a}, relies upon structural/timbral similarity to obtain the high-level song structure. This is based on the assumption that the distributions of timbre features are similar over corresponding structural elements of the music.
+
+The algorithm works by obtaining a frequency-domain representation of the audio signal, using a Constant-Q transform, a chromagram or Mel-Frequency Cepstral Coefficients (MFCCs) as the underlying features (the particular feature is selectable as a parameter). The extracted features are normalised in accordance with the MPEG-7 standard (NASE descriptor), meaning that the spectrum is converted to a decibel scale and each spectral vector is normalised by the RMS energy envelope. The value of this envelope is stored for each processing block of audio. This is followed by the extraction of 20 principal components per block using PCA, yielding a sequence of 21-dimensional feature vectors in which the last element of each vector corresponds to the energy envelope.
+
+A 40-state Hidden Markov Model is then trained on the whole sequence of features, with each state of the HMM corresponding to a specific timbre type. This process partitions the timbre space of a given track into 40 possible types. The important assumption of the model is that the distribution of these features remains consistent over a structural segment. After training and decoding the HMM, the song is assigned a sequence of timbre features according to the specific timbre-type distributions for each possible structural segment.
+
+The segmentation itself is computed by clustering timbre-type histograms. A series of histograms is created over a sliding window, and these histograms are grouped into M clusters by an adapted soft k-means algorithm. Each of these clusters corresponds to a specific segment type of the analysed song. Reference histograms, iteratively updated during clustering, describe the timbre distribution for each segment. The segmentation arises from the final cluster assignments.

 \bibliography{qmvamp-mirex2013}
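The Key Detector paragraph in the diff above describes estimating key by correlating a block-by-block chromagram against stored major and minor key profiles. The following is a minimal sketch of that idea, not the plugin's actual code: the use of librosa for the chromagram, the hop size, the helper name `estimate_keys`, and the assumption that the profile vectors are supplied by the caller are all illustrative choices.

```python
# Illustrative sketch only: correlate each chroma block against 24 rotated
# major/minor key profiles and keep the best-matching key per block.
import numpy as np
import librosa  # assumed available; any chromagram source would do

KEY_NAMES = [note + mode for mode in ("maj", "min")
             for note in ("C", "C#", "D", "D#", "E", "F",
                          "F#", "G", "G#", "A", "A#", "B")]

def estimate_keys(audio_path, major_profile, minor_profile, hop_length=8192):
    """Return one key estimate per chroma block (placeholder helper)."""
    y, sr = librosa.load(audio_path, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop_length)  # (12, n_blocks)

    # Build 24 candidate profiles by rotating the templates to every tonic.
    profiles = np.array([np.roll(p, k)
                         for p in (major_profile, minor_profile)
                         for k in range(12)])                                # (24, 12)

    keys = []
    for block in chroma.T:
        # Correlation between this block's chroma vector and each candidate.
        corr = [np.corrcoef(block, prof)[0, 1] for prof in profiles]
        keys.append(KEY_NAMES[int(np.argmax(corr))])
    return keys
```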
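The Method subsection added by this changeset outlines the Segmenter's front end: a frequency-domain representation, NASE-style normalisation, 20 principal components plus the energy envelope, and a 40-state HMM whose states act as timbre types. The sketch below walks through those steps under stated assumptions; librosa, scikit-learn and hmmlearn stand in for the plugin's own implementation, a Constant-Q magnitude spectrogram is picked from the selectable features, and the NASE normalisation is deliberately simplified.

```python
# Sketch of the feature/HMM stage described in the Method subsection.
# Not the plugin's code: third-party libraries approximate each step.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from hmmlearn.hmm import GaussianHMM

def timbre_type_sequence(audio_path, n_pca=20, n_timbre_types=40):
    """Return one timbre-type label (0..n_timbre_types-1) per analysis block."""
    y, sr = librosa.load(audio_path, mono=True)

    # Frequency-domain representation; the plugin can use Constant-Q, chroma
    # or MFCCs -- a Constant-Q magnitude spectrogram is used here.
    spec = np.abs(librosa.cqt(y, sr=sr)).T                       # (n_blocks, n_bins)

    # Simplified NASE-style normalisation: decibel scale, then divide each
    # spectral vector by its RMS value and keep that envelope separately.
    spec_db = librosa.amplitude_to_db(spec)
    rms = np.sqrt(np.mean(spec_db ** 2, axis=1, keepdims=True)) + 1e-9
    normed = spec_db / rms

    # 20 principal components per block + energy envelope -> 21-dim vectors.
    reduced = PCA(n_components=n_pca).fit_transform(normed)
    vectors = np.hstack([reduced, rms])

    # 40-state HMM trained on the whole sequence; after decoding, each block
    # is labelled with the state (timbre type) it most likely belongs to.
    hmm = GaussianHMM(n_components=n_timbre_types,
                      covariance_type="diag", n_iter=20)
    hmm.fit(vectors)
    return hmm.predict(vectors)
```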
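The final paragraph of the new section describes clustering sliding-window histograms of timbre types into M segment types with an adapted soft k-means that iteratively updates reference histograms. The sketch below substitutes scikit-learn's standard KMeans for that adapted algorithm; the window length, the value of M (`n_segment_types`) and the helper name `segment_types` are placeholders.

```python
# Sketch of the histogram-clustering stage; plain k-means replaces the
# adapted soft k-means described in the abstract.
import numpy as np
from sklearn.cluster import KMeans

def segment_types(timbre_types, n_segment_types=4, window=20, n_timbre_types=40):
    """Assign a segment type to each block from its local timbre-type histogram."""
    # Normalised histogram of timbre types over a window centred on each block.
    hists = []
    for i in range(len(timbre_types)):
        lo = max(0, i - window // 2)
        hi = min(len(timbre_types), i + window // 2 + 1)
        hist = np.bincount(timbre_types[lo:hi], minlength=n_timbre_types).astype(float)
        hists.append(hist / hist.sum())
    hists = np.array(hists)

    # Group the histograms into M clusters, one per segment type.
    labels = KMeans(n_clusters=n_segment_types, n_init=10).fit_predict(hists)

    # Report the block index at which each new segment starts, plus its type.
    boundaries = [(0, int(labels[0]))]
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            boundaries.append((i, int(labels[i])))
    return boundaries
```

Chained with the previous sketch, `segment_types(timbre_type_sequence("track.wav"))` would give a rough, unofficial approximation of the segment-type output described in the abstract.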