changeset 74:176a0f04a499

Update abstract
author Chris Cannam
date Wed, 06 Jul 2016 14:29:54 +0100
parents a70e744f91e0
children 3b9882204565
files vamp-plugins_abstract/Makefile vamp-plugins_abstract/qmvamp-mirex2016.bib vamp-plugins_abstract/qmvamp-mirex2016.tex
diffstat 3 files changed, 528 insertions(+), 1 deletions(-)
--- a/vamp-plugins_abstract/Makefile	Wed Jul 06 13:53:12 2016 +0100
+++ b/vamp-plugins_abstract/Makefile	Wed Jul 06 14:29:54 2016 +0100
@@ -1,4 +1,4 @@
-all: qmvamp-mirex2013.pdf qmvamp-mirex2014.pdf qmvamp-mirex2015.pdf
+all: qmvamp-mirex2013.pdf qmvamp-mirex2014.pdf qmvamp-mirex2015.pdf qmvamp-mirex2016.pdf
 
 qmvamp-mirex2013.pdf: qmvamp-mirex2013.tex qmvamp-mirex2013.bib
 	( echo q | xelatex qmvamp-mirex2013 ) && bibtex qmvamp-mirex2013 && xelatex qmvamp-mirex2013 && xelatex qmvamp-mirex2013
@@ -9,7 +9,11 @@
 qmvamp-mirex2015.pdf: qmvamp-mirex2015.tex qmvamp-mirex2014.bib
 	( echo q | xelatex qmvamp-mirex2015 ) && bibtex qmvamp-mirex2015 && xelatex qmvamp-mirex2015 && xelatex qmvamp-mirex2015
 
+qmvamp-mirex2016.pdf: qmvamp-mirex2016.tex qmvamp-mirex2016.bib
+	( echo q | xelatex qmvamp-mirex2016 ) && bibtex qmvamp-mirex2016 && xelatex qmvamp-mirex2016 && xelatex qmvamp-mirex2016
+
 clean:
 	rm -f qmvamp-mirex2013.bbl qmvamp-mirex2013.aux qmvamp-mirex2013.blg qmvamp-mirex2013.log 
 	rm -f qmvamp-mirex2014.bbl qmvamp-mirex2014.aux qmvamp-mirex2014.blg qmvamp-mirex2014.log 
 	rm -f qmvamp-mirex2015.bbl qmvamp-mirex2015.aux qmvamp-mirex2015.blg qmvamp-mirex2015.log 
+	rm -f qmvamp-mirex2016.bbl qmvamp-mirex2016.aux qmvamp-mirex2016.blg qmvamp-mirex2016.log 
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/vamp-plugins_abstract/qmvamp-mirex2016.bib	Wed Jul 06 14:29:54 2016 +0100
@@ -0,0 +1,157 @@
+@article{matthew2007a,
+  author = {Matthew E. P. Davies and Mark D. Plumbley},
+  title = {Context-Dependent Beat Tracking of Musical Audio},
+  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
+  volume = {15},
+  number = {3},
+  pages = {1009--1020},
+  year = {2007}
+}
+
+@article{ellis2007,
+  author = {D. P. W. Ellis},
+  title = {Beat Tracking by Dynamic Programming},
+  journal = {Journal of New Music Research},
+  volume = {36},
+  number = {1},
+  pages = {51--60},
+  year = {2007}
+}
+
+@inproceedings{matthew2006,
+  author = {Matthew E. P. Davies and Mark D. Plumbley},
+  title = {A spectral difference approach to extracting downbeats in musical audio},
+  booktitle = {Proceedings of the 14th European Signal Processing Conference (EUSIPCO)},
+  year = {2006}
+}
+
+@inproceedings{dan2007a,
+  author = {Dan Stowell and Mark D. Plumbley},
+  title = {Adaptive whitening for improved real-time audio onset detection},
+  booktitle = {Proceedings of the International Computer Music Conference (ICMC'07)},
+  year = {2007}
+}
+
+@inproceedings{chris2003a,
+  author = {Chris Duxbury and Juan Pablo Bello and Mike Davies and Mark Sandler},
+  title = {Complex Domain Onset Detection for Musical Signals},
+  booktitle = {Proceedings of the 6th International Conference on Digital Audio Effects (DAFx-03)},
+  year = {2003}
+}
+
+@inproceedings{dan2005a,
+  author = {Dan Barry and Derry Fitzgerald and Eugene Coyle and Bob Lawlor},
+  title = {Drum Source Separation using Percussive Feature Detection and Spectral Modulation},
+  booktitle = {Proceedings of the Irish Signals and Systems Conference (ISSC 2005)},
+  year = {2005}
+}
+
+@article{mark2008a,
+  author = {Mark Levy and Mark Sandler},
+  title = {Structural Segmentation of Musical Audio by Constrained Clustering},
+  journal = {IEEE Transactions on Audio, Speech, and Language Processing},
+  volume = {16},
+  number = {2},
+  pages = {318--326},
+  month = {February},
+  year = {2008}
+}
+
+@inproceedings{noland2007signal,
+  author = {Katy Noland and Mark Sandler},
+  title = {Signal Processing Parameters for Tonality Estimation},
+  booktitle = {Audio Engineering Society Convention 122},
+  month = {May},
+  year = {2007}
+}
+
+@inproceedings{sonicvisualise2010,
+  author = {Chris Cannam and Christian Landone and Mark Sandler},
+  title = {Sonic Visualiser: An Open Source Application for Viewing, Analysing, and Annotating Music Audio Files},
+  booktitle = {Proceedings of the ACM Multimedia 2010 International Conference},
+  year = {2010}
+}
+
+@book{krumhansl1990,
+  author = {C. L. Krumhansl},
+  title = {Cognitive Foundations of Musical Pitch},
+  publisher = {Oxford University Press},
+  year = {1990}
+}
+
+@article{gomez2006,
+  author = {Emilia G{\'o}mez},
+  title = {Tonal description of polyphonic audio for music content processing},
+  journal = {{INFORMS} Journal on Computing, Special Cluster on Computation in Music},
+  volume = {18},
+  year = {2006}
+}
+
+@inproceedings{mauch:md1:2010,
+  author = {Matthias Mauch and Simon Dixon},
+  title = {{MIREX} 2010: Chord Detection Using a Dynamic Bayesian Network},
+  booktitle = {Music Information Retrieval Evaluation Exchange (MIREX)},
+  year = {2010}
+}
+
+@inproceedings{matthias2010a,
+  author = {Matthias Mauch and Simon Dixon},
+  title = {Approximate Note Transcription for the Improved Identification of Difficult Chords},
+  booktitle = {Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010)},
+  year = {2010}
+}
+
+@inproceedings{matthias2009a,
+  author = {Matthias Mauch and Katy C. Noland and Simon Dixon},
+  title = {Using Musical Structure to Enhance Automatic Chord Transcription},
+  booktitle = {Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR 2009)},
+  pages = {231--236},
+  year = {2009}
+}
+
+@phdthesis{matthiasphd,
+  author = {Matthias Mauch},
+  title = {Automatic Chord Transcription from Audio Using Computational Models of Musical Context},
+  school = {Queen Mary, University of London},
+  year = {2010}
+}
+
+@misc{chris2012a,
+  author = {Chris Cannam},
+  title = {Unit testing: An audio research example},
+  howpublished = {Handout},
+  note = {One of the single-page handouts made available at DAFx and ISMIR 2012 tutorials. See http://www.soundsoftware.ac.uk/handouts-guides for more information.},
+  year = {2012}
+} 
+
+@inproceedings{simon2006a,
+  author = {Simon Dixon},
+  title = {{MIREX} 2006 Audio Beat Tracking Evaluation: BeatRoot},
+  booktitle = {Music Information Retrieval Evaluation Exchange (MIREX)},
+  year = {2006}
+}
+
+@inproceedings{simon2001a,
+  author = {Simon Dixon},
+  title = {An Interactive Beat Tracking and Visualisation System},
+  booktitle = {Proceedings of the 2001 International Computer Music Conference (ICMC'2001)},
+  year = {2001}
+}
+
+@article{emmanouil2012a,
+  author = {Emmanouil Benetos and Simon Dixon},
+  title = {A Shift-Invariant Latent Variable Model for Automatic Music Transcription},
+  journal = {Computer Music Journal},
+  volume = {36},
+  number = {4},
+  pages = {81--94},
+  year = {2012}
+}
+
+@inproceedings{emmanouil2012b,
+  author = {Emmanouil Benetos and Simon Dixon},
+  title = {Multiple-{F0} Estimation and Note Tracking for {MIREX} 2012 using a Shift-Invariant Latent Variable Model},
+  booktitle = {Music Information Retrieval Evaluation Exchange (MIREX)},
+  year = {2012}
+}
+
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/vamp-plugins_abstract/qmvamp-mirex2016.tex	Wed Jul 06 14:29:54 2016 +0100
@@ -0,0 +1,366 @@
+% -----------------------------------------------
+% Template for MIREX 2010
+% (based on ISMIR 2010 template)
+% -----------------------------------------------
+
+\documentclass{article}
+\usepackage{mirex2010,amsmath,cite}
+\usepackage{graphicx}
+
+% Title.
+% ------
+\title{MIREX 2016:\\Vamp Plugins from the Centre for Digital Music}
+
+% Single address
+% To use with only one author or several with the same address
+% ---------------
+\oneauthor
+{Chris Cannam, Emmanouil Benetos, Matthias Mauch, Matthew E. P. Davies,}
+{Simon Dixon, Christian Landone, Katy Noland, and Dan Stowell}
+{Queen Mary, University of London \\ {\em chris.cannam@eecs.qmul.ac.uk}}
+
+\begin{document}
+%
+\maketitle
+%
+\begin{abstract}
+
+In this submission we offer for evaluation several audio feature
+extraction plugins in Vamp format.
+
+Most of these plugins were also submitted to the 2013, 2014, and
+2015 editions of MIREX. One category (Audio Downbeat Estimation)
+sees a new submission, although it is not a new plugin. One plugin
+(Chordino) has been corrected, after the side-effects of an earlier
+bug fix made it perform unexpectedly badly last year. The rest are
+unchanged from 2015, and may offer a useful baseline for comparison
+across years.
+
+Some of these plugins are efficient implementations of relatively
+recent work, while others were developed some years ago and are no
+longer state-of-the-art. The methods implemented in this set of
+plugins are described in the literature and are referenced throughout
+this paper. All of the plugins are written in C++ and have been
+published under open source licences, in most cases the GPL.
+
+\end{abstract}
+%
+\section{Introduction}\label{sec:introduction}
+
+The Vamp plugin format\footnote{http://vamp-plugins.org/} was
+developed at the Centre for Digital Music (C4DM) at Queen Mary,
+University of London, during 2005--2006, in response to a desire to
+publish work in a form that would be immediately useful to people
+outside this research field. The Vamp plugin format was published with
+an open source SDK, alongside the Sonic
+Visualiser~\cite{sonicvisualise2010} audio analysis application which
+provided a useful host for Vamp plugins.
+
+In subsequent years the Vamp format has become a moderately popular
+means of distributing methods from the Centre and other research
+groups. Several dozen Vamp plugins are now available from groups such
+as the Music Technology Group at UPF in Barcelona, the Sound and Music
+Computing group at INESC in Porto, the BBC, and others, as well as
+from the Centre for Digital Music.
+
+The plugins submitted for this evaluation are provided as a set of
+dynamic library files. Those with names starting ``QM'' are all
+provided in a single library file, the QM Vamp Plugins set, made
+available in binary form for Windows, OS X, and Linux from the Centre
+for Digital Music's download
+page.\footnote{http://vamp-plugins.org/plugin-doc/qm-vamp-plugins.html} These
+plugins come from a number of authors who are credited in this
+abstract and in the plugins' accompanying documentation.
+
+In addition to the QM Vamp Plugins set, this submission contains a
+number of separate plugins: the Chordino and Segmentino plugins from
+Matthias Mauch; the BeatRoot Vamp Plugin from Simon Dixon; OnsetsDS
+from Dan Stowell; and the Silvet note transcription plugin from
+Emmanouil Benetos and Chris Cannam.
+
+The plugins are all provided as 64-bit Linux shared objects depending
+on GNU libc 2.15 or newer and GNU libstdc++ 3.4.15 or newer. Sonic
+Annotator v1.1 is also
+required\footnote{http://code.soundsoftware.ac.uk/projects/sonic-annotator/}
+in order to run the task scripts.
+
+For an overview of this submission across all of the tasks and plugins
+it covers, please see the relevant repository at the SoundSoftware
+site.\footnote{http://code.soundsoftware.ac.uk/projects/mirex2013/}
+
+\section{Submissions by MIREX Task}
+
+\subsection{Audio Beat Tracking}
+
+\subsubsection{QM Tempo and Beat Tracker}
+\label{tempo_and_beat_tracker}
+
+The QM Tempo and Beat Tracker~\cite{matthew2007a} Vamp plugin analyses
+a single channel of audio and estimates the positions of metrical
+beats within the music.
+
+This plugin uses the complex-domain onset detection method from~\cite{chris2003a} with a hybrid of the two-state beat tracking model
+proposed in~\cite{matthew2007a} and a dynamic programming method based
+on~\cite{ellis2007}. 
+
+To identify the tempo, the onset detection function is partitioned
+into 6-second frames with a 1.5-second increment. The
+autocorrelation function of each 6-second frame is found, and this
+is then passed through a perceptually weighted comb
+filterbank~\cite{matthew2007a}. The successive comb filterbank output
+signals are grouped together into a matrix of observations of
+periodicity through time. The best path of periodicity through these
+observations is found using the Viterbi algorithm, where the
+transition matrix is defined as a diagonal Gaussian.
+
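+As an illustration of this stage only, the following Python sketch
+follows the same outline under simplified assumptions: a plain comb
+template in place of the perceptually weighted filterbank
+of~\cite{matthew2007a}, and illustrative parameter values
+throughout. It is not the plugin's implementation, which is in C++.
+\begin{verbatim}
+import numpy as np
+
+def periodicity_path(odf, fps, lags=np.arange(20, 120)):
+    """ACF + comb template + Viterbi over 6 s frames."""
+    flen, hop = int(6.0 * fps), int(1.5 * fps)
+    obs = []
+    for i in range(0, len(odf) - flen, hop):
+        seg = odf[i:i + flen] - odf[i:i + flen].mean()
+        acf = np.correlate(seg, seg, 'full')[flen - 1:]
+        # comb template: sum ACF at multiples of each lag
+        obs.append([sum(acf[k * l] for k in (1, 2, 3, 4)
+                        if k * l < flen) for l in lags])
+    obs = np.maximum(np.array(obs), 1e-9)
+    # diagonal-Gaussian transition penalty over lag changes
+    d = lags[None, :] - lags[:, None]
+    logtrans = -0.5 * (d / 4.0) ** 2
+    score, back = np.log(obs[0]), []
+    for o in obs[1:]:
+        step = score[:, None] + logtrans
+        back.append(step.argmax(axis=0))
+        score = step.max(axis=0) + np.log(o)
+    path = [int(score.argmax())]
+    for b in reversed(back):
+        path.append(int(b[path[-1]]))
+    # beat period (in ODF samples) per analysis step
+    return lags[np.array(path[::-1])]
+\end{verbatim}
+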
+Given the estimates of periodicity, the beat locations are recovered
+by applying the dynamic programming algorithm of~\cite{ellis2007}. This
+process involves the calculation of a recursive cumulative score
+function and backtrace signal. The cumulative score indicates the
+likelihood of a beat existing at each sample of the onset detection
+function input, and the backtrace gives the location of the best
+previous beat given this point in time. Once the cumulative score and
+backtrace have been calculated for the whole input signal, the best
+path through beat locations is found by recursively sampling the
+backtrace signal from the end of the input signal back to the
+beginning.
+
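+For illustration, here is a minimal Python sketch of this
+cumulative-score-and-backtrace recursion, in the style
+of~\cite{ellis2007}; the beat period is assumed given (e.g.\ from the
+tempo stage), and the penalty weighting is an illustrative choice,
+not the plugin's:
+\begin{verbatim}
+import numpy as np
+
+def track_beats(odf, period, tightness=5.0):
+    """Ellis-style DP: cumulative score plus backtrace."""
+    n = len(odf)
+    score = np.array(odf, dtype=float)
+    back = np.full(n, -1)
+    for t in range(period // 2, n):
+        lo = max(0, t - 2 * period)
+        hi = max(0, t - period // 2)
+        prev = np.arange(lo, hi)
+        if len(prev) == 0:
+            continue
+        # log-Gaussian penalty for deviating from the period
+        pen = -tightness * np.log((t - prev) / period) ** 2
+        best = np.argmax(score[prev] + pen)
+        score[t] = odf[t] + score[prev][best] + pen[best]
+        back[t] = prev[best]
+    beats, t = [], int(np.argmax(score))
+    while t >= 0:           # follow the backtrace to the start
+        beats.append(t)
+        t = back[t]
+    return beats[::-1]      # beat positions in ODF samples
+\end{verbatim}
+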
+The QM Tempo and Beat Tracker plugin was written by Matthew
+Davies and Christian Landone.
+
+\subsubsection{BeatRoot}
+
+The BeatRoot Vamp plugin\footnote{http://code.soundsoftware.ac.uk/projects/beatroot-vamp/} is an open source Vamp plugin library that
+implements the BeatRoot beat-tracking method of Simon
+Dixon~\cite{simon2001a}. The BeatRoot algorithm has been submitted to
+MIREX evaluation in earlier years~\cite{simon2006a}; this plugin
+consists of the most recent BeatRoot code release, converted from Java
+to C++ and adapted to the plugin format.
+
+The BeatRoot plugin was written by Simon Dixon and Chris Cannam.
+
+\subsection{Audio Key Detection}
+
+\subsubsection{QM Key Detector}
+
+The QM Key Detector Vamp plugin continuously estimates the key of the
+music by comparing the degree to which a block-by-block chromagram
+correlates to stored key profiles for each major and minor key.
+
+This plugin uses the correlation method described
+in~\cite{krumhansl1990} and~\cite{gomez2006}, but with different tone
+profiles. The key profiles used in this implementation are drawn from
+analysis of Book I of the Well-Tempered Clavier by J.~S.~Bach, recorded
+at A=440 equal temperament, as described in~\cite{noland2007signal}.
+
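+As an illustration of profile correlation, the sketch below uses the
+classic major-key profile from~\cite{krumhansl1990} rather than the
+Bach-derived profiles the plugin actually uses, and omits the minor
+keys for brevity:
+\begin{verbatim}
+import numpy as np
+
+# Krumhansl-Schmuckler C major profile (Krumhansl 1990)
+MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
+                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
+
+def estimate_key(chroma):
+    """Correlate a 12-bin chroma vector with all 12
+    rotations of the profile; return the best tonic."""
+    scores = [np.corrcoef(chroma, np.roll(MAJOR, k))[0, 1]
+              for k in range(12)]
+    return int(np.argmax(scores))   # 0 = C, 1 = C#, ...
+\end{verbatim}
+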
+The QM Key Detector plugin was written by Katy Noland and
+Christian Landone.
+
+\subsection{Audio Chord Estimation}
+
+\subsubsection{Chordino}
+
+The Chordino plugin\footnote{http://isophonics.net/nnls-chroma} was developed following Mauch's 2010 work on chord
+extraction, submitted to MIREX in that
+year~\cite{mauch:md1:2010}. While that submission used a C++ chroma
+implementation with a MATLAB dynamic Bayesian network as a chord
+extraction back-end~\cite{matthias2010a}, Chordino is an entirely C++
+implementation that was developed specifically to be made freely
+available as an open-source plugin for general use.
+
+The method for the Chordino plugin has two parts:
+
+{\bf NNLS Chroma} --- NNLS Chroma analyses a single channel of audio
+using frame-wise spectral input from the Vamp host. The spectrum is
+transformed to a log-frequency spectrum (constant-Q) with three bins
+per semitone. On this representation, two processing steps are
+performed: tuning, after which each centre bin (i.e.\ bins 2, 5, 8, \ldots)
+corresponds to a semitone, even if the tuning of the piece deviates
+from 440 Hz standard pitch; and running standardisation: subtraction
+of the running mean, division by the running standard deviation. This
+has a spectral whitening effect.
+
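+A minimal sketch of the running-standardisation step, assuming a
+causal sliding window over frames (the plugin's exact window shape
+and start-up handling may differ):
+\begin{verbatim}
+import numpy as np
+
+def running_standardise(spec, win=20):
+    """Per log-frequency bin, subtract a running mean and
+    divide by a running standard deviation over the last
+    `win` frames; spec has shape (frames, bins)."""
+    spec = np.asarray(spec, float)
+    out = np.zeros_like(spec)
+    for t in range(spec.shape[0]):
+        ctx = spec[max(0, t - win + 1):t + 1]
+        mu, sd = ctx.mean(axis=0), ctx.std(axis=0)
+        out[t] = (spec[t] - mu) / np.maximum(sd, 1e-9)
+    return out   # spectrally whitened spectrogram
+\end{verbatim}
+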
+The processed log-frequency spectrum is then used as an input for NNLS
+approximate transcription using a dictionary of harmonic notes with
+geometrically decaying harmonic magnitudes. The output of the NNLS
+approximate transcription is semitone-spaced. To get the chroma, this
+semitone spectrum is multiplied (element-wise) with the desired
+profile (chroma or bass chroma) and then mapped to 12 bins.
+
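+Schematically, and assuming an illustrative dictionary of
+geometrically decaying harmonic templates together with SciPy's
+off-the-shelf NNLS solver:
+\begin{verbatim}
+import numpy as np
+from scipy.optimize import nnls
+
+def note_dictionary(nbins, nnotes, decay=0.6, nharm=5):
+    """One column per semitone: fundamental plus decaying
+    harmonics, on a semitone-spaced frequency axis."""
+    D = np.zeros((nbins, nnotes))
+    for n in range(nnotes):
+        for h in range(1, nharm + 1):
+            b = n + int(round(12 * np.log2(h)))
+            if b < nbins:
+                D[b, n] += decay ** (h - 1)
+    return D
+
+def nnls_chroma(frame, D):
+    """Solve frame ~ D @ x with x >= 0, then fold the
+    semitone activations down to 12 chroma bins."""
+    x, _ = nnls(D, frame)
+    return np.bincount(np.arange(len(x)) % 12,
+                       weights=x, minlength=12)
+\end{verbatim}
+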
+{\bf Chord transcription} --- A fixed dictionary of chord profiles is
+used to calculate frame-wise chord similarities. A standard
+HMM/Viterbi approach is used to smooth these to provide a chord
+transcription.
+
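+The smoothing step can be sketched as below, assuming a uniform
+self-transition bias in place of Chordino's actual transition model:
+\begin{verbatim}
+import numpy as np
+
+def smooth_chords(sim, stay=0.9):
+    """Viterbi over frame-wise chord similarities,
+    favouring self-transitions to suppress spurious
+    changes; sim has shape (frames, chords)."""
+    nframes, nchords = sim.shape
+    switch = (1.0 - stay) / (nchords - 1)
+    logtrans = np.full((nchords, nchords), np.log(switch))
+    np.fill_diagonal(logtrans, np.log(stay))
+    logobs = np.log(np.maximum(sim, 1e-9))
+    score = logobs[0].copy()
+    back = np.zeros((nframes, nchords), int)
+    for t in range(1, nframes):
+        step = score[:, None] + logtrans
+        back[t] = step.argmax(axis=0)
+        score = step.max(axis=0) + logobs[t]
+    path = [int(score.argmax())]
+    for t in range(nframes - 1, 0, -1):
+        path.append(int(back[t][path[-1]]))
+    return path[::-1]   # one chord index per frame
+\end{verbatim}
+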
+Chordino was written by Matthias Mauch.
+
+\subsection{Audio Onset Detection}
+
+\subsubsection{QM Note Onset Detector}
+
+The QM Note Onset Detector Vamp plugin estimates the onset times of
+notes within the music. It calculates an onset likelihood function for
+each spectral frame, and picks peaks in a smoothed version of this
+function.
+
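+As a sketch only, peak picking of this kind might look as follows;
+the smoothing window and threshold are illustrative values, not the
+plugin's:
+\begin{verbatim}
+import numpy as np
+from scipy.ndimage import uniform_filter1d
+
+def pick_onsets(odf, win=5, delta=0.1):
+    """Report local maxima of the detection function
+    that exceed a smoothed local average by delta."""
+    odf = np.asarray(odf, float)
+    smooth = uniform_filter1d(odf, size=win)
+    return [t for t in range(1, len(odf) - 1)
+            if odf[t] > odf[t - 1]
+            and odf[t] >= odf[t + 1]
+            and odf[t] > smooth[t] + delta]
+\end{verbatim}
+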
+Several onset detection functions are available in this plugin; this
+submission uses the complex-domain method described
+in~\cite{chris2003a}.
+
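+The complex-domain detection function can be sketched as follows, in
+the spirit of~\cite{chris2003a}; windowing and normalisation details
+are simplified assumptions:
+\begin{verbatim}
+import numpy as np
+
+def complex_domain_odf(frames):
+    """For each STFT frame, predict each bin's complex
+    value from the previous frames' magnitude and phase,
+    and sum the deviation from the observed value."""
+    X = np.fft.rfft(frames, axis=1)
+    mag, phase = np.abs(X), np.angle(X)
+    odf = np.zeros(len(X))
+    for t in range(2, len(X)):
+        # linear phase extrapolation of last two frames
+        pred_phase = 2 * phase[t - 1] - phase[t - 2]
+        pred = mag[t - 1] * np.exp(1j * pred_phase)
+        odf[t] = np.sum(np.abs(X[t] - pred))
+    return odf
+\end{verbatim}
+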
+The QM Note Onset Detector plugin was written by Chris Duxbury, Juan
+Pablo Bello and Christian Landone.
+
+\subsubsection{OnsetsDS}
+
+OnsetsDS\footnote{http://code.soundsoftware.ac.uk/projects/vamp-onsetsds-plugin/} is an onset detector plugin wrapping Dan Stowell's OnsetsDS
+library\footnote{http://onsetsds.sourceforge.net/}, described
+in~\cite{dan2007a}.
+
+OnsetsDS was designed to provide an FFT-based onset detector that
+works very efficiently in real time, with a fast reaction time. It is
+not tailored for non-real-time use or for any particular type of
+signal.
+
+The OnsetsDS plugin was written by Dan Stowell and Chris Cannam.
+
+\subsection{Multiple Fundamental Frequency Estimation and Tracking}
+
+\subsubsection{Silvet}
+
+Silvet (for Shift-Invariant Latent Variable
+Transcription)\footnote{http://code.soundsoftware.ac.uk/projects/silvet/}
+is a Vamp plugin for automatic music transcription, using a method
+based on that of~\cite{emmanouil2012a}. It produces a note
+transcription as output, and we have included a script to transform
+this into a framewise output, so that it can be submitted to the
+framewise evaluation as well as the note-tracking evaluation.
+
+Silvet uses a probabilistic latent-variable estimation method to
+decompose a Constant-Q time-frequency matrix into note activations
+using a set of spectral templates learned from recordings of solo
+instruments. The method is thought to perform quite well for clear
+recordings that contain only instruments with a good correspondence to
+the known templates. Silvet does not contain any vocal templates, or
+templates for typical rock or electronic instruments.
+
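+As a rough sketch of the decomposition idea only: the fragment below
+fits fixed templates by multiplicative updates, omitting the shift
+invariance, pitch-shift factor, and instrument-source factors of the
+actual model~\cite{emmanouil2012a}:
+\begin{verbatim}
+import numpy as np
+
+def decompose(V, W, iters=50, seed=0):
+    """Fit V ~ W @ H with nonnegative H by multiplicative
+    updates (KL divergence); W holds fixed note templates
+    (bins x pitches), V is the constant-Q matrix."""
+    rng = np.random.default_rng(seed)
+    H = rng.random((W.shape[1], V.shape[1]))
+    ones = np.ones_like(V)
+    for _ in range(iters):
+        R = np.maximum(W @ H, 1e-9)
+        H *= (W.T @ (V / R)) / np.maximum(W.T @ ones, 1e-9)
+    return H   # note activations: pitches x frames
+\end{verbatim}
+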
+The method implemented in Silvet is very similar to that submitted to
+MIREX in 2012 as the BD1, BD2 and BD3 submissions in the Multiple F0
+Tracking task of that year~\cite{emmanouil2012b}. In common with that
+submission, and unlike the method described in~\cite{emmanouil2012a},
+Silvet uses a simple thresholding method instead of an HMM for note
+identification. However, Silvet follows~\cite{emmanouil2012a}
+rather than~\cite{emmanouil2012b} in including a 5-bin-per-semitone
+pitch shifting parameter.
+
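+The thresholding step can be sketched like this, with illustrative
+threshold and minimum-duration values:
+\begin{verbatim}
+def notes_from_activations(H, thresh=0.1, minlen=3):
+    """Simple thresholding in place of an HMM: a note is
+    reported where its activation stays above threshold
+    for at least minlen frames."""
+    notes = []
+    for p, row in enumerate(H):
+        on = None
+        for t, v in enumerate(row):
+            if v >= thresh and on is None:
+                on = t
+            elif v < thresh and on is not None:
+                if t - on >= minlen:
+                    notes.append((p, on, t))
+                on = None
+        if on is not None and len(row) - on >= minlen:
+            notes.append((p, on, len(row)))
+    return notes   # (pitch, onset frame, offset frame)
+\end{verbatim}
+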
+The Silvet plugin was written by Chris Cannam and Emmanouil Benetos.
+
+\subsubsection{Silvet Live}
+
+The Silvet Live submission uses the Silvet plugin in its ``Live''
+mode. This has somewhat lower latency than the default mode and is
+much faster to run, mainly as a result of using a reduced 12-bin
+chromagram and corresponding instrument templates, making this
+conceptually a very simple method. Results are expected to be
+substantially poorer than those for the default Silvet parameters.
+
+The Silvet plugin was written by Chris Cannam and Emmanouil Benetos.
+
+\subsection{Structural Segmentation}
+
+\subsubsection{QM Segmenter}
+
+The QM Segmenter Vamp plugin divides a single channel of music into
+structurally consistent segments.
+
+The method, described in~\cite{mark2008a}, relies upon timbral or
+pitch similarity to obtain the high-level song structure. This is
+based on the assumption that the distributions of timbre features are
+similar over corresponding structural elements of the music.
+
+The input feature is a frequency-domain representation of the audio
+signal, in this case using a Constant-Q transform for the underlying
+features (though the plugin supports other timbral and pitch
+features). The extracted features are normalised in accordance with
+the MPEG-7 standard (NASE descriptor), and the value of this envelope
+is stored for each processing block of audio. This is followed by the
+extraction of 20 principal components per block using PCA, yielding a
+sequence of 21-dimensional feature vectors where the last element in
+each vector corresponds to the energy envelope.
+
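+Schematically, and with an off-the-shelf SVD in place of the plugin's
+own PCA (the energy-envelope term here is a simplification):
+\begin{verbatim}
+import numpy as np
+
+def segmenter_features(nase, ncomp=20):
+    """Reduce per-block spectral features to ncomp
+    principal components and append a crude energy
+    envelope; nase has shape (blocks, bins)."""
+    X = nase - nase.mean(axis=0)
+    # principal axes via SVD of the centred matrix
+    _, _, Vt = np.linalg.svd(X, full_matrices=False)
+    pcs = X @ Vt[:ncomp].T
+    env = nase.sum(axis=1, keepdims=True)
+    return np.hstack([pcs, env])   # 21-D per block
+\end{verbatim}
+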
+A 40-state Hidden Markov Model is then trained on the whole sequence
+of features, with each state corresponding to a specific timbre
+type. This partitions the timbre-space of a given track into 40
+possible types. After training and decoding the HMM, the song is
+assigned a sequence of timbre features according to specific
+timbre-type distributions for each possible structural segment.
+
+The segmentation itself is computed by clustering timbre-type
+histograms. A series of histograms is created over a sliding window,
+and these are grouped into M clusters by an adapted soft k-means
+algorithm. Reference histograms, iteratively updated during
+clustering, describe the timbre distribution for each segment. The
+segmentation arises from the final cluster assignments.
+
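+A minimal sketch of such a soft k-means step over histograms, with an
+illustrative softness parameter (this is not the plugin's adapted
+variant):
+\begin{verbatim}
+import numpy as np
+
+def soft_kmeans(hists, m=4, beta=10.0, iters=50):
+    """Soft k-means: each histogram gets a soft
+    assignment to m reference histograms, which are
+    iteratively updated; hists has shape (N, bins)."""
+    rng = np.random.default_rng(0)
+    refs = hists[rng.choice(len(hists), m, replace=False)]
+    for _ in range(iters):
+        d = ((hists[:, None, :] - refs[None, :, :]) ** 2
+             ).sum(axis=2)
+        w = np.exp(-beta * d)
+        w /= w.sum(axis=1, keepdims=True)  # soft assignments
+        refs = (w.T @ hists) / w.sum(axis=0)[:, None]
+    return w.argmax(axis=1)   # final cluster per window
+\end{verbatim}
+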
+The QM Segmenter plugin was written by Mark Levy.
+
+\subsubsection{Segmentino}
+
+The Segmentino plugin is a C++ implementation of a segmentation method
+described in Matthias Mauch's paper on using musical structure to
+enhance chord transcription~\cite{matthias2009a} and expanded on in
+Mauch's PhD thesis~\cite{matthiasphd}.
+
+A beat-quantised chroma representation is used to calculate pair-wise
+similarities between beats (more precisely, between beat ``shingles'',
+i.e.\ multi-beat vectors). Based on this first similarity calculation,
+an exhaustive comparison of all possible segments of reasonable length
+in beats is carried out, and segments are added to a segment family if
+they are sufficiently similar to another ``family member''. Once the
+families have been accumulated, they are rated, and the one with the
+highest score becomes the first segmentation group to be
+annotated. This last step is repeated until no more families fit the
+remaining ``holes'' in the song, i.e.\ the parts that have not already
+been assigned to a segment.
+
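+The first similarity calculation can be sketched as below, assuming
+cosine similarity between fixed-width shingles of the beat-quantised
+chroma:
+\begin{verbatim}
+import numpy as np
+
+def shingle_similarity(chroma, width=4):
+    """Cosine similarity between all pairs of multi-beat
+    'shingles'; chroma has shape (beats, 12)."""
+    n = len(chroma) - width + 1
+    sh = np.array([chroma[i:i + width].ravel()
+                   for i in range(n)])
+    norms = np.linalg.norm(sh, axis=1, keepdims=True)
+    sh /= np.maximum(norms, 1e-9)
+    return sh @ sh.T   # n x n similarity matrix
+\end{verbatim}
+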
+This method was developed for ``classic rock'' music, and therefore
+assumes a few characteristics that are not necessarily found in other
+music: repetition of harmonic sequences that coincide with structural
+segments in a song; a steady beat; segments of a certain length; and
+corresponding segments having the same length in beats.
+
+The Segmentino plugin was written by Matthias Mauch and Massimiliano
+Zanoni.
+
+\subsection{Audio Tempo Estimation}
+
+\subsubsection{QM Tempo and Beat Tracker}
+
+For this task we submit the same plugin as that used in the Audio Beat
+Tracking task in section~\ref{tempo_and_beat_tracker}.
+
+\subsection{Audio Downbeat Estimation}
+
+\subsubsection{QM Bar and Beat Tracker}
+
+The QM Bar and Beat Tracker~\cite{matthew2006} Vamp plugin estimates
+the positions of bar lines and metrical beat positions.
+
+The plugin first uses the method of the QM Tempo and Beat Tracker (see
+section~\ref{tempo_and_beat_tracker}) to estimate beat locations. Once
+these have been identified, the plugin makes a second pass over the
+input audio signal, partitioning it into beat-synchronous frames. The
+audio within each beat frame is down-sampled to give a new sampling
+frequency of 2.8 kHz. A beat-synchronous spectral representation is
+then calculated within each frame, from which a measure of beat
+spectral difference is calculated using the Jensen-Shannon
+divergence. The bar boundaries are identified as those beat
+transitions leading to the most consistent spectral change given the
+specified number of beats per bar.
+
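+As an illustration, the following sketch computes the Jensen-Shannon
+divergence between adjacent beat spectra and selects the bar phase
+with the most consistent spectral change; the selection rule here is
+a simplified assumption, not the plugin's exact formulation:
+\begin{verbatim}
+import numpy as np
+
+def js_divergence(p, q):
+    """Jensen-Shannon divergence between two spectral
+    distributions, normalised to sum to 1."""
+    p, q = p / p.sum(), q / q.sum()
+    m = 0.5 * (p + q)
+    kl = lambda a, b: np.sum(
+        a * np.log2(np.maximum(a, 1e-12)
+                    / np.maximum(b, 1e-12)))
+    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
+
+def downbeat_phase(beat_spectra, bpb=4):
+    """Pick the bar phase whose beat transitions show
+    the most consistent spectral change."""
+    diffs = [js_divergence(beat_spectra[i],
+                           beat_spectra[i + 1])
+             for i in range(len(beat_spectra) - 1)]
+    scores = [np.mean(diffs[phase::bpb])
+              for phase in range(bpb)]
+    return int(np.argmax(scores))  # first downbeat offset
+\end{verbatim}
+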
+The plugin expects the number of beats per bar as a parameter, rather
+than attempting to estimate it from the music. For the purposes of
+this submission the number of beats per bar is fixed at 4.
+
+\bibliography{qmvamp-mirex2016}
+
+\end{document}