Mercurial > hg > soundsoftware-icassp-2012
changeset 19:94c9700135db
Substantial updates -- I think the "real-world limitations" section is OK now
author | Chris Cannam |
---|---|
date | Sun, 25 Sep 2011 21:01:17 +0100 |
parents | 6cb784212511 |
children | cfd78be7676b |
files | cannam.tex refs.bib |
diffstat | 2 files changed, 75 insertions(+), 79 deletions(-) |
--- a/cannam.tex Sun Sep 25 12:58:26 2011 +0100
+++ b/cannam.tex Sun Sep 25 21:01:17 2011 +0100
@@ -35,13 +35,12 @@
 %
 \begin{abstract}
 Although researchers are increasingly aware of the need to publish and
-maintain software code alongside their results, several practical
-barriers prevent this from happening in many cases. We examine these
-barriers and describe the incremental approach to overcoming them used
-by the Sound Software project, an effort to improve software practice
-in the UK audio and music research community.
+maintain software code alongside their results, practical barriers
+prevent this from happening in many cases. We examine these barriers,
+propose an incremental approach to overcoming some of them, and
+describe the Sound Software project, an effort to improve software
+practice in the UK audio and music research community.

-TODO: The above is still not good, rewrite again

 TODO: Replace all [citation needed] with citations!

@@ -56,9 +55,9 @@
 \label{sec:intro}

 Much research in audio and music informatics involves the development
-of new software and the evaluation of methods against earlier work
-also implemented in software. Both of these are sometimes problematic
-in practice.
+of new software and the evaluation of new methods against earlier work
+also implemented in software. Both of these can be problematic in
+practice.

 First, researchers in the audio and music research community ---
 including those in the group represented by the present authors, the
@@ -99,11 +98,11 @@
 Recent years have seen some moves to promote this philosophy across
 the signal processing research community. A special session was
 organised at ICASSP 2007 and special issues of IEEE Signal Processing
-Magazine and Computing in Science and Engineering concerning this
-subject both appeared in 2009~\cite{vandewalle2009}. The IEEE Signal
-Processing society now encourages Reproducible Research, allowing
-links from the online journal repository IEEEXplore to the code and
-data associated with a publication (TODO: check this).
+Magazine and Computing in Science and Engineering on this subject both
+appeared in 2009~\cite{vandewalle2009}. The IEEE Signal Processing
+society now encourages Reproducible Research, allowing links from the
+online journal repository IEEEXplore to the code and data associated
+with a publication (TODO: check this).

 Actions such as these promote the idea that research results in signal
 processing should be presented not simply as a printed paper, but as a
@@ -122,18 +121,6 @@
 \section{Understanding real-world limitations on software practice}
 \label{sec:researchsoft}

-[We are going to propose four barriers to reuse -- lack of education
- \& confidence; lack of tools \& facilities; lack of incentive,
- because code is not measured as publications are; platform
- incompatibilities and code rot. We need to give the facts and
- figures supporting these as barriers and then identify them.]
-
-A study by Hannay et al \cite{gwilson2009} found a great deal of
-variation in the level of understanding of standard software
-engineering concepts by scientists, and found that for developing and
-using scientific software, informal self-study or learning from peers
-was commonplace.
-
 In order to better understand the reality faced by the audio and music
 research community, we conducted an online survey on software usage
 and development~\cite{ssamrsurvey}. This survey opened in October
@@ -149,22 +136,14 @@
 Although 80\% of respondents reported developing software themselves
 during research and 40\% of those said that they took steps to ensure
 reproducibility of their publications, the accompanying comments
-showed that this did not necessarily involve the publication of code.
+showed that this did not necessarily involve publication of code.
 Respondents referred to using standard, publicly-available datasets
 and calibration procedures when performing measurements; to
 documenting code and data so that they or other researchers in their
-group could reproduce the results later; or to making code available
-on personal request. All of these are worthwhile actions but they do
-not suggest widespread use of a full reproducible compendium approach.
-
-Developing collaboratively even within a research group also seems to
-be the exception rather than the rule. 51\% of respondents who
-developed software said that their code did not leave their own
-computer, and 59\% said they did not use version control software
-(typically used in software practice to facilitate collaborative
-development). This is also consistent with the finding by Hannay et
-al that scientists usually developed and used software on their own
-desktop computers rather than dedicated processing servers.
+group could reproduce the results later; or to making code and data
+available on personal request. These are worthwhile actions, but they
+do not suggest widespread conscious use of reproducible research
+methods.

 Our respondents cited as obstacles to the publication of code lack of
 time, copyright restrictions, and the potential for future commercial
@@ -174,34 +153,43 @@
 independence and competition, and quality concerns as typical
 inhibiting factors for open sharing of data and code.

+Besides these practical obstacles, undertaking reproducible research
+takes effort early in the research cycle. This happens before the
+benefits are necessarily apparent and while the value of the research
+is still unclear. Once results have been produced and a paper written
+there is little apparent incentive to make the research reproducible,
+and assessments such as the English ``Research Excellence
+Framework''~\cite{ref} typically do not identify software code among
+assessed research outputs.

-[This suggests that there are cultural and technical [lack of
- facilities \& awareness of how to use them] barriers... this makes
- intuitive sense because of the following]
+A possible reason why reproducibility efforts do not happen earlier is
+that standard software engineering practices that could facilitate
+open dissemination of code, such as collaborative development and the
+use of public code repositories, are not widely used by researchers,
+who are often self-trained in software development. A study by Hannay
+et al \cite{gwilson2009} found that for developing and using
+scientific software, informal self-study or learning from peers was
+commonplace. The same study found that scientists usually developed
+and used software on their own desktop computers rather than servers
+provided for the purpose of running scientific software. In our
+survey, 51\% of respondents who developed software said that their
+code did not leave their own computer, and 59\% said they did not use
+version control software.

-Undertaking reproducible research takes effort early in the research
-cycle. This happens before the benefits are necessarily apparent and
-while the value of the reserch is still unclear, and can be perceived
-as delaying the production of ``real'' research. Once research results
-have been produced and a paper written, there is little apparent
-incentive to make the research reproducible.
+Not only does software often go un-published; software that is
+published often does not work for future users. Partly this is
+because of platform variations. In many of the fields within this
+community, researchers lack the skills or desire to grapple with
+someone else's code. Where they do work with code, they use a variety
+of platforms and batch and real-time environments: respondents to our
+survey named MATLAB and numerous of its toolboxes, Max/MSP, C++ and
+OpenFrameworks, Juce, HTK and MPTK, SuperCollider, Python, Clojure and
+R among technologies used. And even code that can be run may produce
+incorrect results \cite{merali2010} owing to limited testing.

-[Furthermore there is a barrier because of platform incompatibilities
- as follows]
-
-In many of the fields within this community, researchers lack the
-skills or desire to write their own code or to make someone else's
-code work. Where they do write their own code, they work on different
-platforms and use a wide variety of batch and real-time environments.
-We found a variety of environments and toolkits used, including MATLAB
-and numerous MATLAB toolboxes, C++, Max/MSP, OpenFrameworks, Juce, HTK
-and MPTK, SuperCollider, Clojure and R. Recent publications from our
-group have also made use of Python \cite{fazekas} and
-Prolog~\cite{raimond}.
-
-As a consequence of the lack of publication and variety of platforms
-used, software developed in earlier research is not always readily
-available to later researchers. For example, in the well-known
+As a consequence of these obstacles to publication and the variety of
+platforms used, software developed in earlier research is often
+unavailable to later researchers. For example, in the well-known
 subject of beat tracking, the method of Scheirer et al~\cite{scheirer}
 was developed in \CC{} for a legacy platform and is now only available
 informally; Goto et al~\cite{goto} was written for a parallel
@@ -209,7 +197,7 @@
 Hainsworth \cite{hainsworth} was written in MATLAB with a
 Windows-specific DLL component and only runs on a single platform;
 Klapuri et al \cite{klapuri} is not widely distributed; and methods
-from several other researchers have not been published as code.
+from several other researchers have not been published in code.

 \section{Sustainable software: a bottom-up approach}
 \label{sec:philosophy}
@@ -366,8 +354,8 @@
 widely available to researchers in other fields related to audio and
 music such as computational musicology or music therapy.

-\subsubsection{Sonic Visualiser and Vamp Plugins}
-\label{subsubsec:sv}
+\subsubsection{Plugins}
+\label{sec:plugins}

 Sonic Visualiser was developed at the Centre for Digital Music from
 2005 onwards as a visualisation and analysis tool for audio
@@ -424,18 +412,18 @@

 \subsection{Chordino and NNLS Chroma}
-\label{subsubsec:chordino}
+\label{sec:chordino}
 % Note that in this case the author did _not_ follow a RR methodology,
 and the code is not referred to in the paper. The link between code
 and publication must be made after the fact.
-In~\cite{mauch2010}, Mauch describes a method for improving the
-automatic recognition of chords whose fundamental frequencies are
-easily confused with other partials. This is a traditional
-publication which makes no reference to any published code or test
-data. Although no formal attempt was made initially toward
-reproducibility, some independent evaluation was carried out through
-the submission of a MATLAB implementation of the method to the annual
-MIREX evaluation exchange~\cite{mirex}.
+Mauch describes in~\cite{mauch2010} a method for improving automatic
+recognition of chords whose fundamental frequencies are easily
+confused with other partials. This is a traditional publication which
+appeared without accompanying code or test data. Although no formal
+attempt was made initially toward reproducibility, some independent
+evaluation was carried out through the submission of a MATLAB
+implementation of the method to the annual MIREX evaluation
+exchange~\cite{mirex}.

 Following this publication, we worked with the author of the paper and
 code to develop a C++ implementation of the method and turn it into a
@@ -460,7 +448,8 @@
  provenance etc]

 [2. Version control and related facilities -- provide, encourage
- people to use, make it simple]
+ people to use, make it simple -- use our facilities if you like --
+ encourage people to use it for papers as well!]

 [3. Don't feel bad]

--- a/refs.bib Sun Sep 25 12:58:26 2011 +0100
+++ b/refs.bib Sun Sep 25 21:01:17 2011 +0100
@@ -57,7 +57,7 @@
 @article{vandewalle2009,
 author = {P. Vandewalle and J. Kovacevic and M. Vetterli},
 year = 2009,
-title = {Reproducible Research in Signal Processing - What, why, and how},
+title = {Reproducible Research in Signal Processing---What, why, and how},
 journal = {IEEE Signal Processing Magazine},
 volume = 26,
 number = 3,
@@ -71,6 +71,13 @@
 howpublished = {http://www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies},
 }

+@misc{ref,
+author = {Higher Education Funding Council for England},
+title = "{Assessment framework and guidance on submissions}",
+year = 2011,
+howpublished = {http://www.hefce.ac.uk/research/ref/pubs/2011/02\_11/},
+}
+
 @misc{ssamrsurvey,
 author = {I. Damnjanovic and L. Figueira and C. Cannam and M. Plumbley},
 title = "{SoundSoftware.ac.uk Survey Report}",
@@ -90,7 +97,7 @@

 @article{mirex,
 author = {J. Stephen Downie},
-title = {The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research},
+title = {The music information retrieval evaluation exchange (2005--2007): A window into music information retrieval research},
 journal = {Acoustical Science and Technology},
 volume = 29,
 number = 4,