changeset 47:a8767b4d3be8 abstract
TeX source wordwrap
author:   Chris Cannam
date:     Fri, 06 Sep 2013 21:13:34 +0100
parents:  5e560b524692
children: 17f1ca2b9f86
files:    vamp-plugins_abstract/qmvamp-mirex2013.tex
diffstat: 1 files changed, 108 insertions(+), 25 deletions(-)
--- a/vamp-plugins_abstract/qmvamp-mirex2013.tex	Fri Sep 06 19:54:27 2013 +0100
+++ b/vamp-plugins_abstract/qmvamp-mirex2013.tex	Fri Sep 06 21:13:34 2013 +0100

@@ -38,35 +38,62 @@

%
\begin{abstract}
In this submission we intend to test several Vamp plugins for various
tasks. Most of these plugins were developed a few years ago and are
no longer state-of-the-art. All the methods and algorithms implemented
in this set of plugins are described in the literature and referenced
throughout this paper.
\end{abstract}
%
\section{Introduction}\label{sec:introduction}

The Vamp plugin format\footnote{http://vamp-plugins.org/} was
developed at the Centre for Digital Music (C4DM) at Queen Mary,
University of London, during 2005--2006 and published as an open
specification, alongside the Sonic
Visualiser~\cite{sonicvisualise2010} audio analysis application, in
response to a desire to publish algorithms developed at the Centre in
a form in which they could be immediately useful to people outside
this research field.
In subsequent years the Vamp plugin format has become a moderately
popular means of distributing methods from the Centre and other
research groups. Some dozens of Vamp plugins are now available from
groups such as the MTG at UPF in Barcelona, the SMC at INESC in Porto,
the BBC, and others, as well as from the Centre for Digital Music.

The plugins in this submission are provided as a single library file,
made available in binary form for Windows, OS/X, and Linux from the
Centre for Digital Music's download
page\footnote{http://vamp-plugins.org/plugin-doc/qm-vamp-plugins.html}.
All plugins are fully open source --- the source code is available on
the SoundSoftware
site\footnote{http://code.soundsoftware.ac.uk/projects/qm-vamp-plugins}.

(For a complete overview of this submission across all of the tasks
and plugins it covers, please see the relevant repository at the
SoundSoftware
site\footnote{http://code.soundsoftware.ac.uk/projects/mirex2013}.)
\section{Audio Beat Tracking}

\subsection{Tempo and Beat Tracker Plugin}
\label{tempo_and_beat_tracker}

The Tempo and Beat Tracker~\cite{matthew2007a} Vamp plugin analyses a
single channel of audio and estimates the positions of metrical beats
within the music (the equivalent of a human listener tapping their
foot to the beat).

The Tempo and Beat Tracker Vamp plugin was written by Matthew Davies
and Christian Landone.

\subsection{BeatRoot Plugin}

The BeatRoot Vamp Plugin is an open source Vamp plugin library that
implements the BeatRoot beat-tracking method of Simon
Dixon\cite{!!!!}.

This plugin library is available online as a free, open source
download from the Centre for Digital Music at Queen Mary, University
of London.

@@ -78,9 +105,16 @@

\section{Audio Key Detection}

The Key Detector Vamp plugin analyses a single channel of audio and
continuously estimates the key of the music by comparing the degree to
which a block-by-block chromagram correlates with the stored key
profiles for each major and minor key.

This plugin uses the correlation method described in
\cite{krumhansl1990} and \cite{gomez2006}, but with different tone
profiles. The key profiles used in this implementation are drawn from
analysis of Book I of the Well-Tempered Clavier by J.~S.~Bach,
recorded at A=440 equal temperament, as described
in \cite{noland2007signal}.
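As an illustration of this correlation approach, the following sketch
scores a 12-bin chromagram block against all 24 rotated key profiles
and returns the best match. Note the profile values here are the
classic Krumhansl--Kessler profiles, used as stand-ins: the plugin
itself uses the Bach-derived profiles of \cite{noland2007signal},
which are not reproduced here.

```python
import numpy as np

# Krumhansl--Kessler major/minor key profiles (stand-ins for the
# plugin's Bach-derived profiles), indexed by pitch class from C.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def estimate_key(chroma):
    """Correlate a 12-bin chroma vector against all 24 rotated
    major/minor profiles; return (tonic pitch class, mode)."""
    best_r, best = -2.0, (0, 'major')
    for mode, profile in (('major', MAJOR), ('minor', MINOR)):
        for tonic in range(12):
            # Pearson correlation with the profile rotated to this tonic
            r = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if r > best_r:
                best_r, best = r, (tonic, mode)
    return best
```

For example, a chroma vector that is itself a rotation of the major
profile is identified accordingly: `estimate_key(np.roll(MAJOR, 7))`
returns `(7, 'major')`.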
The Key Detector Vamp plugin was written by Katy Noland and Christian
Landone.

@@ -149,35 +183,84 @@

\subsection{Note Onset Detector Plugin}

The Note Onset Detector Vamp plugin analyses a single channel of audio
and estimates the onset times of notes within the music -- that is,
the times at which notes and other audible events begin.

It calculates an onset likelihood function for each spectral frame,
and picks peaks in a smoothed version of this function. The plugin is
non-causal, returning all results at the end of processing.

Please refer to~\cite{chris2003a} for the basic detection methods.
The Adaptive Whitening technique is described in~\cite{dan2007a}, and
the Percussion Onset detector in~\cite{dan2005a}.

\subsection{OnsetDS Plugin}

OnsetDS is an onset detector that uses Dan Stowell's OnsetsDS
library\footnote{http://onsetsds.sourceforge.net/}, described
in~\cite{dan2007a}.
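To illustrate the general shape of causal, frame-by-frame onset
detection of this kind, here is a minimal sketch using
half-wave-rectified spectral flux against an adaptive threshold.
This is an illustration of the real-time constraint only, not the
actual OnsetsDS algorithm: only past state is kept, so each frame can
be flagged as soon as it arrives.

```python
import numpy as np

class CausalOnsetDetector:
    """Causal, frame-by-frame onset flagger (illustrative sketch,
    not the OnsetsDS algorithm): keeps only past state, so it can
    run in real time with one decision per incoming frame."""

    def __init__(self, decay=0.9, factor=2.0):
        self.prev_mag = None    # magnitude spectrum of previous frame
        self.baseline = 0.0     # running estimate of typical flux
        self.decay = decay      # forgetting factor for the baseline
        self.factor = factor    # flux must exceed factor * baseline

    def process(self, frame):
        """Feed one frame of samples; return True if an onset is flagged."""
        mag = np.abs(np.fft.rfft(np.hanning(len(frame)) * frame))
        if self.prev_mag is None:
            self.prev_mag = mag
            return False
        # Half-wave-rectified spectral flux against the previous frame:
        # only increases in energy per bin count towards an onset.
        flux = float(np.maximum(mag - self.prev_mag, 0.0).sum())
        self.prev_mag = mag
        hit = flux > self.factor * self.baseline and flux > 1e-6
        # Update the adaptive threshold after the comparison, so the
        # triggering frame is judged against the pre-onset baseline.
        self.baseline = self.decay * self.baseline + (1 - self.decay) * flux
        return hit
```

Feeding silence followed by a steady tone flags exactly one onset at
the frame where the tone begins; the sustained tone that follows
produces near-zero flux and is not re-flagged.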
The purpose of OnsetsDS is to provide FFT-based onset detection that
works efficiently in real time and detects onsets reasonably well
across a broad variety of musical signals, with a fast reaction time.

It is not specialised for any particular type of signal, nor is it
particularly tailored towards non-real-time use (in a non-real-time
setting there are additional steps that could improve precision). Its
efficiency and fast reaction are designed with general real-time
musical applications in mind.

\section{Audio Structural Segmentation}

\subsection{QM Segmenter Plugin}

The Segmenter Vamp plugin divides a single channel of music up into
structurally consistent segments. It returns a numeric value (the
segment type) for each moment at which a new segment starts.

For music with clearly tonally distinguishable sections such as verse,
chorus, etc., segments with the same type may be expected to be
similar to one another in some structural sense. For example,
repetitions of the chorus are likely to share a segment type.
The method, described in~\cite{mark2008a}, relies upon
structural/timbral similarity to obtain the high-level song
structure. This is based on the assumption that the distributions of
timbre features are similar over corresponding structural elements of
the music.

The algorithm works by obtaining a frequency-domain representation of
the audio signal, using a constant-Q transform, a chromagram, or
Mel-frequency cepstral coefficients (MFCCs) as the underlying features
(the particular feature is selectable as a parameter). The extracted
features are normalised in accordance with the MPEG-7 standard (NASE
descriptor), which means the spectrum is converted to a decibel scale
and each spectral vector is normalised by the RMS energy envelope. The
value of this envelope is stored for each processing block of
audio. This is followed by the extraction of 20 principal components
per block using PCA, yielding a sequence of 21-dimensional feature
vectors in which the last element of each vector corresponds to the
energy envelope.
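A rough sketch of that front end follows, under simplifying
assumptions (magnitude spectra as input, PCA computed via SVD of the
centred feature matrix): decibel conversion, per-block RMS
normalisation, reduction to 20 components, and the stored energy
envelope appended as the 21st element.

```python
import numpy as np

def nase_pca(spectra, n_components=20, eps=1e-10):
    """Sketch of the described front end. `spectra` is a (blocks x
    bins) array of magnitude spectra; returns (blocks x 21) feature
    vectors whose last element is the RMS energy envelope."""
    db = 20.0 * np.log10(np.abs(spectra) + eps)       # decibel scale
    env = np.sqrt((db ** 2).mean(axis=1))             # RMS envelope per block
    nase = db / (env[:, None] + eps)                  # NASE-style normalisation
    centred = nase - nase.mean(axis=0)
    # PCA via SVD: rows of vt are principal directions
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    comps = centred @ vt[:n_components].T             # 20 components per block
    return np.hstack([comps, env[:, None]])           # 21-dimensional vectors
```

Given, say, 50 blocks of 64-bin spectra, the output is a 50 x 21
matrix ready to be fed to the HMM stage described next.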
A 40-state hidden Markov model is then trained on the whole sequence
of features, with each state of the HMM corresponding to a specific
timbre type. This process partitions the timbre space of a given track
into 40 possible types. The important assumption of the model is that
the distribution of these features remains consistent over a
structural segment. After training and decoding the HMM, the song is
assigned a sequence of timbre features according to specific
timbre-type distributions for each possible structural segment.
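The decoding step — assigning each block its most likely timbre state
— is standard Viterbi decoding, which can be sketched as follows
(training of the HMM parameters is omitted; the per-frame log emission
probabilities are assumed to be given):

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence for an HMM, in log domain.
    log_emis: (T x S) per-frame log emission probabilities;
    log_trans: (S x S) log transition matrix;
    log_init: (S,) log initial state distribution."""
    T, S = log_emis.shape
    delta = log_init + log_emis[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans # score of each i -> j move
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    # trace the best path backwards from the final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With sticky transitions and emissions that clearly favour one state
and then another, the decoded path switches state exactly once at the
boundary, which is the behaviour the segmenter relies on.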
The segmentation itself is computed by clustering timbre-type
histograms. A series of histograms is created over a sliding window
and grouped into M clusters by an adapted soft k-means
algorithm. Each of these clusters corresponds to a specific
segment type of the analysed song. Reference histograms, iteratively
updated during clustering, describe the timbre distribution for each
segment. The segmentation arises from the final cluster assignments.

\subsection{Segmentino}