changeset 15:d166744ca3d8

Some initial edits following feedback from Mark P
author Chris Cannam
date Fri, 23 Sep 2011 18:04:56 +0100
parents 7b4c0e52878e
children e5c387d04f6e
files cannam.tex
diffstat 1 files changed, 92 insertions(+), 84 deletions(-)
--- a/cannam.tex	Thu Sep 22 17:04:49 2011 +0100
+++ b/cannam.tex	Fri Sep 23 18:04:56 2011 +0100
@@ -3,7 +3,7 @@
 \def\CC{{C\nolinebreak[4]\hspace{-.05em}\raisebox{.4ex}{\tiny\bf ++}}}
 \raggedbottom
 
-\title{The title of the scientific paper: A long, explanatory subtitle}
+\title{Sound Software: Towards Sustainable and Reusable Software in Audio and Music Research}
 
 \name{Chris Cannam, Luis Figueira and Mark Plumbley}
 \address{Centre for Digital Music,\\
@@ -57,60 +57,59 @@
 audio and music research community --- including within the group
 represented by the present authors, the Centre for Digital Music
 (C4DM) at Queen Mary University of London --- come from a wide range
-of backgrounds including signal processing, electronics, computer
-science, music, information sciences, dance and performance, and data
-sonification.  In many of these fields, researchers do not have the
-skills or desire to become involved in traditional software
+of backgrounds besides signal processing, including electronics,
+computer science, music, information sciences, dance and performance,
+and data sonification.  In many of these fields, researchers do not
+have the skills or desire to become involved in traditional software
 development practice or in publication of code.
 
 Second, there are many technical and logistical reasons why software
 developed during earlier research is no longer available for
-evaluation and subsequent development even if it has been published,
-including platform incompatibilities and obsolescence, or legal
-limitations on distribution or reuse.
+evaluation and subsequent development even if it has been
+published. These include platform incompatibilities and obsolescence,
+or legal limitations on distribution or reuse.
 
-In this paper we discuss some of these practical constraints on
+In this paper we will discuss some of these practical constraints on
 application of reproducible research principles for software code, and
-propose an incremental approach toward better practice.
+explore an incremental approach toward better practice.  Finally, we will make some recommendations for [...]
 
-\section{Background}
-\label{sec:background}
-
-\subsection{Reproducible Research}
-\label{subsec:rr}
+\section{Reproducible Research}
+\label{sec:rr}
 
 Some researchers have come to realize that traditional methods of
 disseminating research outputs are no longer sufficient for
 computational science research, because the algorithms and parameters
-involved are complex enough that the description in the paper is no
+involved are often so complex that the description in the paper is no
 longer sufficient to reproduce the results.  Donoho and colleagues at
 Stanford have, since the mid-1990s, aimed to carry out ``Reproducible
 Research'' by providing the paper, source code, and data, sufficient
 for other researchers to reproduce the same results
-\cite{buckheit1995}.  Recent years have seen some moves to promote
-this philosophy across the signal processing research community. A
-special session was organised at the ICASSP 2007 international signal
-processing conference, and special issues of IEEE Signal Processing
+\cite{buckheit1995}.
+
+Recent years have seen some moves to promote this philosophy across
+the signal processing research community. A special session was
+organised at ICASSP 2007 and special issues of IEEE Signal Processing
 Magazine and Computing in Science and Engineering concerning this
 subject both appeared in 2009~\cite{vandewalle2009}. The IEEE Signal
 Processing Society now encourages Reproducible Research, allowing
 links from the online journal repository IEEE Xplore to the code and
-data so that other researchers can reproduce the results. Actions such
-as these promote the idea that research results in signal processing
-should be presented not simply as a printed paper, but as a compendium
-including the paper, research data, and code.  Vandewalle et al have
-also created a Reproducible Research Repository\footnote{\tt http://rr.epfl.ch/},
-designed to promote reproducible research by requiring the authors of
-a paper to upload the code and data used in the experiments. Readers
-can also comment on a publication and evaluate the reproducibility of
-the work.
+data so that other researchers can reproduce the results.
 
-\subsection{Real-world limitations on software practice}
-\label{subsec:researchsoft}
+Actions such as these promote the idea that research results in signal
+processing should be presented not simply as a printed paper, but as a
+{\it compendium} [citation needed] including the paper, research data,
+and code.  Vandewalle et al.\ have also created a Reproducible Research
+Repository\footnote{\tt http://rr.epfl.ch/}, designed to promote
+reproducible research by requiring the authors of a paper to upload
+the code and data used in the experiments. Readers can also comment on
+a publication and evaluate the reproducibility of the work.
 
 Although the Reproducible Research principle provides a comprehensive
-solution to the problem of code dissemination, in practice take-up in
-this field appears limited.  
+solution to the problem of code dissemination, our experience has been
+that take-up in the audio and music research field is limited.
+
+\section{Understanding real-world limitations on software practice}
+\label{sec:researchsoft}
 
 In order to better understand the reality faced by the audio and music
 research community, we conducted an online survey on software usage
@@ -128,35 +127,41 @@
 ensure reproducibility in their publications or that they only made
 code or data available on request.  Obstacles cited included lack of
 time, copyright restrictions, and the potential for commercial use of
-the code.  A broader case study by the UK Research Information Network
-into science research across several subject areas also identified
+the code.
+
+A broader study into science research across several subject areas by
+the UK Research Information Network \cite{rin2010} also identified
 lack of evidence of benefits, cultures of independence and
-competition, and concerns about quality as typical factors inhibiting
-open sharing of data and code \cite{rin2010}. Intuitively, undertaking
-reproducible research takes effort early in the research cycle, before
-the benefits are necessarily apparent and while the value of the
-reserch is still unclear, and can be perceived as delaying the
-production of ``real'' research. Once research results have been
-produced and a paper written, there is little apparent incentive to
-make the research reproducible.
+competition, and quality concerns as typical inhibiting factors for
+open sharing of data and code.
+
+Undertaking reproducible research takes effort early in the research
+cycle.  This effort comes before the benefits are necessarily apparent
+and while the value of the research is still unclear, so it can be
+perceived as delaying the production of ``real'' research. Once
+research results have been produced and a paper written, there is
+little apparent incentive to make the research reproducible.
+
+A 2009 study \cite{gwilson2009} found a great deal of variation in
+scientists' understanding of standard software engineering concepts,
+and found that scientists commonly learned to develop and use
+scientific software through informal self-study or from their peers.
+
+The same study found that scientists typically developed and used
+software on their personal computers rather than on dedicated servers.
+This is consistent with our own survey, in which most respondents kept
+code on their own machines and did not develop collaboratively
+\cite{ssamrsurvey}.
 
 In many of the fields within this community, researchers lack the
 skills or desire to write their own code or to make someone else's
-code work, and where they do write their own code, they work on
-different platforms and use a wide variety of batch and real-time
-environments.  A study in 2009 found a great deal of variation in the
-level of understanding of standard software engineering concepts by
-scientists, and found that for developing and using scientific
-software, informal self-study or learning from peers was commonplace
-\cite{gwilson2009}.  The study found that scientists typically
-developed and used software on their personal computers rather than
-dedicated servers, and our own survey also found that most respondents
-kept code on their own machines and did not develop collaboratively
-\cite{ssamrsurvey}.  We found a variety of environments and toolkits
-used, including MATLAB and numerous MATLAB toolboxes, C++, Max/MSP,
-OpenFrameworks, Juce, HTK and MPTK, SuperCollider, Clojure and
-R. Recent publications from our group have also made use of Python
-\cite{fazekas} and Prolog~\cite{raimond}.
+code work. Where they do write their own code, they work on different
+platforms and use a wide variety of batch and real-time environments.
+We found a variety of environments and toolkits used, including MATLAB
+and numerous MATLAB toolboxes, C++, Max/MSP, OpenFrameworks, Juce, HTK
+and MPTK, SuperCollider, Clojure and R. Recent publications from our
+group have also made use of Python \cite{fazekas} and
+Prolog~\cite{raimond}.
 
 As a consequence of the lack of publication and the variety of platforms
 used, software developed in earlier research is not always readily
@@ -169,15 +174,16 @@
 component and only runs on a single platform; Klapuri et al.
 \cite{klapuri} was written in MATLAB with a platform-specific
 extension and is not widely distributed for reasons of commercial
-confidentiality; and methods from Cemgil, Laroche, Alonso, and Peeters
-have not been published as code.
+confidentiality; and methods from several other researchers have not
+been published as code.
 
 \section{Sustainable software: a bottom-up approach}
 \label{sec:philosophy}
 
-Our approach is to facilitate incremental improvements to the way
-software is managed during research, by identifying practical barriers
-to software reuse and providing means to reduce or eliminate them.
+Our approach to this problem is to facilitate incremental improvements
+to the way software is managed during research, by identifying
+practical barriers to software reuse and providing means to reduce or
+eliminate them.
 
 While we support the goal of reproducible research and aim to
 encourage open publication of code and data linked with paper
@@ -205,7 +211,7 @@
 facilities and tools to support such development; and reusability
 problems caused by platform incompatibilities.
 
-\subsection{Education and Confidence with Code}
+\subsection{Barrier: Lack of education and confidence with code}
 
 introductory note here: the barrier is that people lack software
 development skills
@@ -229,7 +235,7 @@
 started work on tutorial material on various subjects (todo: what can
 we say about this?)
 
-\subsection{Facilities and Tools}
+\subsection{Barrier: Lack of facilities and tools}
 
 Researchers will not make use of version control and collaborative
 development facilities if they are unaware that they exist.  An
@@ -240,28 +246,30 @@
 institution could provide all reported failure.  This is consistent
 with the experience in our own group, where version control has been
 used sporadically and set up in an ad-hoc fashion, and also with
-feedback provided to our survey.  Attendees at the Autumn School also
-reported difficulty getting started with the complex user interfaces
-available for version control.  Nonetheless, version control was
-identified by attendees in debriefing as the most compelling subject
-taught during the course, suggesting that lack of awareness may be the
-main barrier to uptake.
+feedback provided to our survey.
+
+Attendees at the Autumn School also reported difficulty getting
+started with the complex user interfaces available for version
+control.  Nonetheless, version control was identified by attendees in
+debriefing as the most compelling subject taught during the course,
+suggesting that lack of awareness may be the main barrier to uptake.
 
 \subsubsection{SoundSoftware Code Site}
 \label{sec:codesite}
 
-We provide the SoundSoftware code site\footnote{\tt http://code.soundsoftware.ac.uk/} as a facility which audio and music
-researchers in the UK may use for collaborative development and as a
-version control and code hosting facility, if their institution is
-unable to help them or if they have a need to work with researchers at
-other institutions who would not be permitted access to their
-institution's facilities.  Of course the existence of this site also
-addresses the shortcomings in our own group's former ad-hoc use of
-version control.  The site is implemented using a custom version of
-the Redmine [citation needed] project management application, with
-Mercurial version control.  Any UK researcher in the field is
-permitted to register and to start their own collaborative projects
-using the version control, wiki, issue tracking, and other services
+We provide the SoundSoftware code site\footnote{\tt
+  http://code.soundsoftware.ac.uk/} as a facility which audio and
+music researchers in the UK may use for version control, code hosting
+and collaborative development, if their institution is unable to help
+them or if they need to work with researchers at other institutions
+who would not be permitted access to their own institution's
+facilities.  The existence of this site also addresses
+the shortcomings in our own group's former ad-hoc use of version
+control.  The site is implemented using our own custom version of the
+Redmine\footnote{\tt http://redmine.org/} project management
+application, with Mercurial version control.  Any UK researcher in the
+field can register and start their own collaborative projects using
+the version control, wiki, issue tracking, and other services
 provided.
 
 Four aspects of our code site contribute to sustainability and utility
@@ -299,7 +307,7 @@
 interface that we could teach easily to researchers across multiple
 operating system platforms.
 
-\subsection{Platforms and Reuse}
+\subsection{Barrier: Platform incompatibilities}
 
 introductory note here: the barrier is that software that is published
 is not always usable