changeset 19:94c9700135db

Substantial updates -- I think the "real-world limitations" section is OK now
author Chris Cannam
date Sun, 25 Sep 2011 21:01:17 +0100
parents 6cb784212511
children cfd78be7676b
files cannam.tex refs.bib
diffstat 2 files changed, 75 insertions(+), 79 deletions(-)
--- a/cannam.tex	Sun Sep 25 12:58:26 2011 +0100
+++ b/cannam.tex	Sun Sep 25 21:01:17 2011 +0100
@@ -35,13 +35,12 @@
 %
 \begin{abstract}
 Although researchers are increasingly aware of the need to publish and
-maintain software code alongside their results, several practical
-barriers prevent this from happening in many cases.  We examine these
-barriers and describe the incremental approach to overcoming them used
-by the Sound Software project, an effort to improve software practice
-in the UK audio and music research community.
+maintain software code alongside their results, practical barriers
+prevent this from happening in many cases.  We examine these barriers,
+propose an incremental approach to overcoming some of them, and
+describe the Sound Software project, an effort to improve software
+practice in the UK audio and music research community.
 
-TODO: The above is still not good, rewrite again
 
 TODO: Replace all [citation needed] with citations!
 
@@ -56,9 +55,9 @@
 \label{sec:intro}
 
 Much research in audio and music informatics involves the development
-of new software and the evaluation of methods against earlier work
-also implemented in software.  Both of these are sometimes problematic
-in practice.
+of new software and the evaluation of new methods against earlier work
+also implemented in software.  Both of these can be problematic in
+practice.
 
 First, researchers in the audio and music research community ---
 including those in the group represented by the present authors, the
@@ -99,11 +98,11 @@
 Recent years have seen some moves to promote this philosophy across
 the signal processing research community. A special session was
 organised at ICASSP 2007 and special issues of IEEE Signal Processing
-Magazine and Computing in Science and Engineering concerning this
-subject both appeared in 2009~\cite{vandewalle2009}. The IEEE Signal
-Processing society now encourages Reproducible Research, allowing
-links from the online journal repository IEEEXplore to the code and
-data associated with a publication (TODO: check this). 
+Magazine and Computing in Science and Engineering on this subject both
+appeared in 2009~\cite{vandewalle2009}. The IEEE Signal Processing
+society now encourages Reproducible Research, allowing links from the
+online journal repository IEEEXplore to the code and data associated
+with a publication (TODO: check this).
 
 Actions such as these promote the idea that research results in signal
 processing should be presented not simply as a printed paper, but as a
@@ -122,18 +121,6 @@
 \section{Understanding real-world limitations on software practice}
 \label{sec:researchsoft}
 
-[We are going to propose four barriers to reuse -- lack of education
-  \& confidence; lack of tools \& facilities; lack of incentive,
-  because code is not measured as publications are; platform
-  incompatibilities and code rot. We need to give the facts and
-  figures supporting these as barriers and then identify them.]
-
-A study by Hannay et al \cite{gwilson2009} found a great deal of
-variation in the level of understanding of standard software
-engineering concepts by scientists, and found that for developing and
-using scientific software, informal self-study or learning from peers
-was commonplace.
-
 In order to better understand the reality faced by the audio and music
 research community, we conducted an online survey on software usage
 and development~\cite{ssamrsurvey}.  This survey opened in October
@@ -149,22 +136,14 @@
 Although 80\% of respondents reported developing software themselves
 during research and 40\% of those said that they took steps to ensure
 reproducibility of their publications, the accompanying comments
-showed that this did not necessarily involve the publication of code.
+showed that this did not necessarily involve publication of code.
 Respondents referred to using standard, publicly-available datasets
 and calibration procedures when performing measurements; to
 documenting code and data so that they or other researchers in their
-group could reproduce the results later; or to making code available
-on personal request.  All of these are worthwhile actions but they do
-not suggest widespread use of a full reproducible compendium approach.
-
-Developing collaboratively even within a research group also seems to
-be the exception rather than the rule.  51\% of respondents who
-developed software said that their code did not leave their own
-computer, and 59\% said they did not use version control software
-(typically used in software practice to facilitate collaborative
-development).  This is also consistent with the finding by Hannay et
-al that scientists usually developed and used software on their own
-desktop computers rather than dedicated processing servers.
+group could reproduce the results later; or to making code and data
+available on personal request.  These are worthwhile actions, but they
+do not suggest widespread conscious use of reproducible research
+methods.
 
 Our respondents cited as obstacles to the publication of code lack of
 time, copyright restrictions, and the potential for future commercial
@@ -174,34 +153,43 @@
 independence and competition, and quality concerns as typical
 inhibiting factors for open sharing of data and code.
 
+Besides these practical obstacles, undertaking reproducible research
+takes effort early in the research cycle.  This happens before the
+benefits are necessarily apparent and while the value of the research
+is still unclear. Once results have been produced and a paper written
+there is little apparent incentive to make the research reproducible,
+and assessments such as the English ``Research Excellence
+Framework''~\cite{ref} typically do not identify software code among
+assessed research outputs.
 
-[This suggests that there are cultural and technical [lack of
-    facilities \& awareness of how to use them] barriers... this makes
-  intuitive sense because of the following]
+A possible reason why reproducibility efforts do not happen earlier is
+that standard software engineering practices that could facilitate
+open dissemination of code, such as collaborative development and the
+use of public code repositories, are not widely used by researchers,
+who are often self-trained in software development.  A study by Hannay
+et al \cite{gwilson2009} found that for developing and using
+scientific software, informal self-study or learning from peers was
+commonplace.  The same study found that scientists usually developed
+and used software on their own desktop computers rather than servers
+provided for the purpose of running scientific software.  In our
+survey, 51\% of respondents who developed software said that their
+code did not leave their own computer, and 59\% said they did not use
+version control software.
 
-Undertaking reproducible research takes effort early in the research
-cycle.  This happens before the benefits are necessarily apparent and
-while the value of the reserch is still unclear, and can be perceived
-as delaying the production of ``real'' research. Once research results
-have been produced and a paper written, there is little apparent
-incentive to make the research reproducible.
+Not only does software often go un-published; software that is
+published often does not work for future users.  Partly this is
+because of platform variations.  In many of the fields within this
+community, researchers lack the skills or desire to grapple with
+someone else's code.  Where they do work with code, they use a variety
+of platforms and batch and real-time environments: respondents to our
+survey named MATLAB and numerous of its toolboxes, Max/MSP, C++ and
+OpenFrameworks, Juce, HTK and MPTK, SuperCollider, Python, Clojure and
+R among technologies used.  And even code that can be run may produce
+incorrect results \cite{merali2010} owing to limited testing.
 
-[Furthermore there is a barrier because of platform incompatibilities
-  as follows]
-
-In many of the fields within this community, researchers lack the
-skills or desire to write their own code or to make someone else's
-code work. Where they do write their own code, they work on different
-platforms and use a wide variety of batch and real-time environments.
-We found a variety of environments and toolkits used, including MATLAB
-and numerous MATLAB toolboxes, C++, Max/MSP, OpenFrameworks, Juce, HTK
-and MPTK, SuperCollider, Clojure and R. Recent publications from our
-group have also made use of Python \cite{fazekas} and
-Prolog~\cite{raimond}.
-
-As a consequence of the lack of publication and variety of platforms
-used, software developed in earlier research is not always readily
-available to later researchers.  For example, in the well-known
+As a consequence of these obstacles to publication and the variety of
+platforms used, software developed in earlier research is often
+unavailable to later researchers.  For example, in the well-known
 subject of beat tracking, the method of Scheirer et al~\cite{scheirer}
 was developed in \CC{} for a legacy platform and is now only available
 informally; Goto et al~\cite{goto} was written for a parallel
@@ -209,7 +197,7 @@
 Hainsworth \cite{hainsworth} was written in MATLAB with a
 Windows-specific DLL component and only runs on a single platform;
 Klapuri et al \cite{klapuri} is not widely distributed; and methods
-from several other researchers have not been published as code.
+from several other researchers have not been published in code.
 
 \section{Sustainable software: a bottom-up approach}
 \label{sec:philosophy}
@@ -366,8 +354,8 @@
 widely available to researchers in other fields related to audio and
 music such as computational musicology or music therapy.
 
-\subsubsection{Sonic Visualiser and Vamp Plugins}
-\label{subsubsec:sv}
+\subsubsection{Plugins}
+\label{sec:plugins}
 
 Sonic Visualiser was developed at the Centre for Digital Music from
 2005 onwards as a visualisation and analysis tool for audio
@@ -424,18 +412,18 @@
 
 
 \subsection{Chordino and NNLS Chroma}
-\label{subsubsec:chordino}
+\label{sec:chordino}
 
 % Note that in this case the author did _not_ follow a RR methodology, and the code is not referred to in the paper.  The link between code and publication must be made after the fact.
 
-In~\cite{mauch2010}, Mauch describes a method for improving the
-automatic recognition of chords whose fundamental frequencies are
-easily confused with other partials.  This is a traditional
-publication which makes no reference to any published code or test
-data.  Although no formal attempt was made initially toward
-reproducibility, some independent evaluation was carried out through
-the submission of a MATLAB implementation of the method to the annual
-MIREX evaluation exchange~\cite{mirex}.
+Mauch describes in~\cite{mauch2010} a method for improving automatic
+recognition of chords whose fundamental frequencies are easily
+confused with other partials.  This is a traditional publication which
+appeared without accompanying code or test data.  Although no formal
+attempt was made initially toward reproducibility, some independent
+evaluation was carried out through the submission of a MATLAB
+implementation of the method to the annual MIREX evaluation
+exchange~\cite{mirex}.
 
 Following this publication, we worked with the author of the paper and
 code to develop a C++ implementation of the method and turn it into a
@@ -460,7 +448,8 @@
   provenance etc]
 
 [2. Version control and related facilities -- provide, encourage
-  people to use, make it simple]
+  people to use, make it simple -- use our facilities if you like --
+  encourage people to use it for papers as well!]
 
 [3. Don't feel bad]
 
--- a/refs.bib	Sun Sep 25 12:58:26 2011 +0100
+++ b/refs.bib	Sun Sep 25 21:01:17 2011 +0100
@@ -57,7 +57,7 @@
 @article{vandewalle2009,
 author = {P. Vandewalle and J. Kovacevic and M. Vetterli},
 year = 2009,
-title = {Reproducible Research in Signal Processing - What, why, and how},
+title = {Reproducible Research in Signal Processing---What, why, and how},
 journal = {IEEE Signal Processing Magazine},
 volume = 26,
 number = 3,
@@ -71,6 +71,13 @@
 howpublished = {http://www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies},
 }
 
+@misc{ref,
+author = {Higher Education Funding Council for England},
+title = "{Assessment framework and guidance on submissions}",
+year = 2011,
+howpublished = {http://www.hefce.ac.uk/research/ref/pubs/2011/02\_11/},
+}
+
 @misc{ssamrsurvey,
 author = {I. Damnjanovic and L. Figueira and C. Cannam and M. Plumbley},
 title = "{SoundSoftware.ac.uk Survey Report}",
@@ -90,7 +97,7 @@
 
 @article{mirex,
 author = {J. Stephen Downie},
-title = {The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research},
+title = {The music information retrieval evaluation exchange (2005--2007): A window into music information retrieval research},
 journal = {Acoustical Science and Technology},
 volume = 29,
 number = 4,