Mercurial > hg > soundsoftware-icassp-2012
changeset 38:ccd2d07c4c26
Many edits to first few sections; some work on code site
author | Chris Cannam |
---|---|
date | Mon, 26 Sep 2011 12:31:05 +0100 |
parents | 826f81501918 |
children | 93c0dc50d311 |
files | cannam.tex |
diffstat | 1 files changed, 128 insertions(+), 122 deletions(-) |
--- a/cannam.tex Mon Sep 26 11:23:42 2011 +0100
+++ b/cannam.tex Mon Sep 26 12:31:05 2011 +0100
@@ -6,8 +6,9 @@
 \title{Sound Software: Towards Software Reuse in Audio and Music Research}

 \name{Chris Cannam, Luis Figueira and Mark Plumbley}
-\address{Centre for Digital Music,\\
- Queen Mary University of London\\\{first.lastname@eecs.qmul.ac.uk\}}
+\address{Queen Mary University of London\\
+Centre for Digital Music\\
+ \{chris.cannam, luis.figueira, mark.plumbley\}@eecs.qmul.ac.uk}
 %
 % For example:
 % ------------
@@ -55,9 +56,9 @@
 \label{sec:intro}

 Much research in audio and music informatics involves the development
-of new software and the evaluation of new methods against earlier work
-also implemented in software. Both of these can be problematic in
-practice.
+of new computational methods implemented in software and the
+evaluation of new methods against earlier work also implemented in
+software. Both of these can be problematic in practice.

 First, researchers in the audio and music research community ---
 including those in the group represented by the present authors, the
@@ -75,21 +76,22 @@
 platform incompatibilities and obsolescence, or legal limitations on
 distribution or reuse.

-In this paper we will discuss some of the practical constraints on
+In this paper we discuss some of the practical constraints on
 application of reproducible research principles in connection with
-reuse of research software. We will then explore an incremental
-approach toward better practice. Finally, we will make some early
-recommendations for research groups that wish to improve software
-development practice in their work.
+reuse of research software. We then explore an incremental approach
+toward better practice. Finally we make some early recommendations
+for research groups that wish to improve software development practice
+in their work.

 \section{Reproducible Research}
 \label{sec:rr}

 Some researchers have come to realize that traditional methods of
-disseminating research outputs are no longer sufficient for
-computational science research, because the algorithms and parameters
-involved are often so complex that the description in the paper is no
-longer sufficient to reproduce the results. Dohono and colleagues at
+disseminating research outputs based on the published paper only are
+no longer sufficient for computational science research. The
+algorithms and parameters involved are often so complex that the
+description in the paper is no longer sufficient to reproduce the
+results. As an alternative approach, Dohono and colleagues at
 Stanford have, since the mid 1990’s, aimed to carry out ``Reproducible
 Research'' by providing the paper, source code, and data, sufficient
 for other researchers to reproduce the same results
@@ -98,21 +100,21 @@
 Recent years have seen some moves to promote this philosophy across
 the signal processing research community. A special session was
 organised at ICASSP 2007 and special issues of IEEE Signal Processing
-Magazine and Computing in Science and Engineering on this subject both
-appeared in 2009~\cite{vandewalle2009}. The IEEE Signal Processing
-society now encourages Reproducible Research, allowing links from the
-online journal repository IEEEXplore to the code and data associated
-with a publication (TODO: check this).
+Magazine~\cite{vandewalle2009} and Computing in Science and
+Engineering [citation needed] on this subject both appeared in
+2009. The IEEE Signal Processing society now encourages Reproducible
+Research, allowing links from the online journal repository IEEEXplore
+to the code and data associated with a publication (TODO: check this).

 Actions such as these promote the idea that research results in signal
 processing should be presented not simply as a printed paper, but as a
-{\it compendium} [citation needed] including the paper, research data,
-and code. Vandewalle et al [citation needed] have also created a
-Reproducible Research Repository\footnote{\tt http://rr.epfl.ch/},
-designed to promote reproducible research by requiring the authors of
-a paper to upload the code and data used in the experiments. Readers
-can then comment on a publication and evaluate the reproducibility of
-the work.
+{\it compendium} [citation needed -- Victoria Stodden?] including the
+paper, research data, and code. Vandewalle et al [citation needed]
+also created a Reproducible Research Repository\footnote{\tt
+ http://rr.epfl.ch/}, designed to promote reproducible research by
+requiring the authors of a paper to upload the code and data used in
+the experiments. Readers can then comment on a publication and
+evaluate the reproducibility of the work.

 Although the Reproducible Research principle proposes a comprehensive
 solution to the problem of code dissemination, our experience has been
@@ -131,7 +133,7 @@
 further examination as well as some broad numerical results. The
 survey closed in April 2011, with 54 complete and 23 partially
 complete responses. There were responses from at least 16 different
-institutions.
+institutions. A number of common issues were reported.

 Although 80\% of respondents reported developing software themselves
 during research and 40\% of those said that they took steps to ensure
@@ -163,11 +165,11 @@
 assessed research outputs.

 A possible reason why reproducibility efforts do not happen earlier is
-that standard software engineering practices that could facilitate
-open dissemination of code, such as collaborative development and the
-use of public code repositories, are not widely used by researchers,
-who are often self-trained in software development. A study by Hannay
-et al \cite{gwilson2009} found that for developing and using
+that researchers are often self-trained in software development and so
+make little use of standard software engineering practices that could
+facilitate open dissemination of code, such as collaborative
+development and the use of public code repositories. A study by
+Hannay et al~\cite{gwilson2009} found that for developing and using
 scientific software, informal self-study or learning from peers was
 commonplace. The same study found that scientists usually developed
 and used software on their own desktop computers rather than servers
@@ -177,27 +179,20 @@
 version control software.

 Not only does software often go un-published; software that is
-published often does not work for future users. Partly this is
-because of platform variations. In many of the fields within this
+published is often unavailable for future users because of platform
+incompatibilities. For example, in the well-known subject of beat
+tracking, the method of Scheirer et al~\cite{scheirer} was written for
+a legacy platform and is only available by informal means; Goto et
+al~\cite{goto} was written for a parallel architecture no longer in
+wide use and never publicly released; and Hainsworth \cite{hainsworth}
+was written in MATLAB with a Windows-specific DLL component and only
+runs on a single platform. In many of the fields within this
 community, researchers lack the skills or desire to grapple with code
 if it will not immediately run on a platform they have available.
 Where they produce code, they use a variety of platforms and batch and
 real-time environments: respondents to our survey named MATLAB and
 numerous of its toolboxes, Max/MSP, C++ and OpenFrameworks, Juce, HTK
-and MPTK, SuperCollider, Python, Clojure and R among technologies
-used.
-
-As a consequence of these obstacles to publication and the variety of
-platforms used, software developed in earlier research is often
-unavailable to later researchers. For example, in the well-known
-subject of beat tracking, the method of Scheirer et al~\cite{scheirer}
-was developed in \CC{} for a legacy platform and is now only available
-informally; Goto et al~\cite{goto} was written for a parallel
-architecture no longer in wide use and never publicly released;
-Hainsworth \cite{hainsworth} was written in MATLAB with a
-Windows-specific DLL component and only runs on a single platform;
-Klapuri et al \cite{klapuri} is not widely distributed; and methods
-from several other researchers have not been published in code.
+and MPTK, SuperCollider, Python, and Clojure among technologies used.

 \section{Sustainable software: a bottom-up approach}
 \label{sec:philosophy}
@@ -211,10 +206,10 @@
 While we support the goal of reproducible research and aim to
 encourage open publication of code and data linked with paper
 publications, we believe that this goal is more easily approached
-incrementally because researchers will appreciate improvements to
-software development practice regardless of their beliefs or
-intentions with regard to reproducible research. By helping
-researchers to feel comfortable with managing provenance and
+incrementally. We maintain that researchers will appreciate
+improvements to software development practice, regardless of their
+beliefs or intentions with regard to reproducible research. By
+helping researchers to feel comfortable with managing provenance and
 versioning for software, with collaborative development of code, and
 with the perception of code as something that may readily be reused,
 we aim to prepare ground in which open and reproducible publication
@@ -224,24 +219,27 @@
 Software that is to be used needs maintenance, and any proposal to
 help researchers reuse software more easily needs to address the
 problem that such software is not always in a reusable state. Our
-direct concern therefore is sustainability and reusability rather than
-reproducibility.
+direct concern therefore is {\it sustainability} and {\it reusability}
+rather than {\it reproducibility}.

-We cannot address all possible barriers to software publication and
-reuse, but following section \ref{sec:researchsoft} we identify four
-that may be approachable: lack of confidence in code quality and of
-comfort with collaborative development; lack of facilities and tools
-to support such development; lack of incentive to distribute software
-given the academic focus on paper publications; and reusability
-problems caused by platform incompatibilities.
+While we cannot address all possible barriers to software publication
+and reuse, following section \ref{sec:researchsoft} we identify four
+specific barriers that we consider to be approachable: lack of
+education and confidence with code; lack of facilities and tools to
+support collaborative development; lack of incentive to distribute
+software given the academic focus on paper publications; and
+reusability problems caused by platform incompatibilities.

 \subsection{Barrier to reuse: Lack of education and confidence with code}
 \label{sec:lackofeducation}

 In section \ref{sec:researchsoft} we noted that research software
-developers are largely self-trained, and Merali~\cite{merali2010}
+developers are largely self-trained. Merali~\cite{merali2010}
 provides a number of examples of unfortunate outcomes caused by lack
-of education and experience in software development.
+of education and experience in software development. Although
+software development is a deep subject, our belief is that worthwhile
+improvements to normal working practice can follow relatively small
+amounts of training.

 In November 2010 we organised an Autumn School for researchers,
 presented by Dr Greg Wilson and based on the Software Carpentry
@@ -255,52 +253,49 @@
  http://soundsoftware.ac.uk/autumnschool2010video}

 A subsequent online poll of attendees~\cite{autumnschoolsurvey}
-suggests that training in even the most basic software development
-skills may be well received by and beneficial to researchers.
-Attendees identified program design, testing and validation, and
-provenance and reproducibility as particularly valuable areas covered,
-and these are areas in which the simplest possible introductions to
-program structure, test-driven development, and version control can
-provide sufficient provocation for the researcher to re-evaluate their
-own practices.
+supports the view that training in even the most basic software
+development skills may be well received by, and beneficial to,
+researchers. Attendees identified program design, testing and
+validation, and provenance and reproducibility as particularly
+valuable areas covered. These are areas in which the simplest
+possible introductions to program structure, test-driven development,
+and version control can provide sufficient provocation for the
+researcher to re-evaluate their own practices.

 \subsection{Barrier to reuse: Lack of facilities and tools}
 \label{sec:lackoffacilities}

-Researchers will not make use of version control and collaborative
-development facilities that are not available to them, or of whose
-existence they are not aware. Few of the attendees at our Autumn
-School (section \ref{sec:lackofeducation}) were aware of such
-facilities being provided by their institutions, and in our survey
-(section \ref{sec:researchsoft}) only a minority of respondents made
-use of them. This is consistent with experience in our own research
-group, where version control has been used only sporadically.
-
-Attendees at the Autumn School also reported difficulty during the
-course in getting started with the complex user interfaces available
-for version control. Nonetheless, version control was amongst the
-areas identified subsequently as most valuable, suggesting that lack
-of awareness may be the main barrier to uptake.
-
-\subsubsection{SoundSoftware Code Site}
+\subsubsection{Facilities for code hosting and version control}
 \label{sec:codesite}

-We provide the SoundSoftware code site\footnote{\tt
- http://code.soundsoftware.ac.uk/} as a facility which audio and
-music researchers in the UK may use for collaborative development and
-as a version control and code hosting facility, if their institution
-is unable to help them or if they have a need to work with researchers
-at other institutions who would not be permitted access to their
-institution's facilities. The existence of this site also addresses
-shortcomings in our own group's use of version control
-(section~\ref{sec:lackoffacilities}). The site is implemented using a
-custom version of the Redmine\footnote{\tt http://redmine.org/}
-project management application, together with Mercurial version
-control. Any UK researcher in the field can register and start their
-own projects.
+Researchers will not make use of version control and collaborative
+development facilities that are not available to them, or that they do
+not know about. Few of the attendees at our Autumn School (section~\ref{sec:lackofeducation}) were aware of such facilities being
+provided by their institutions, and in our survey (section~\ref{sec:researchsoft}) only a minority of respondents said they used
+them. This is consistent with experience in our own research group,
+where version control has been used only sporadically.

-Four aspects of our code site contribute to sustainability and utility
-for researchers:
+To address this issue, we developed the SoundSoftware code
+site\footnote{\tt http://code.soundsoftware.ac.uk/} as a service which
+audio and music researchers in the UK may use for collaborative
+development and as a version control and code hosting facility. The
+site is designed to help researchers whose institutions have no
+suitable facility or who need to collaborate with individuals at other
+institutions in a way that their own facilities do not
+support. Research groups as a whole may also make use of the site, and
+at the C4DM we use it to provide version control to our own
+researchers.
+
+The site is implemented using a custom version of the
+Redmine\footnote{\tt http://redmine.org/} project management
+application, together with Mercurial version control. Any UK
+researcher in the field can register and start their own projects.
+
+We designed three aspects of the code site to contribute to
+sustainability and code reuse for researchers, distinguishing this
+site from general-purpose code hosting facilities such as Google
+Code\footnote{\tt http://code.google.com/} or GitHub\footnote{\tt
+ http://github.com/}:

 % Figures as of 24th Sept 2011:
 %
@@ -318,32 +313,42 @@

 \begin{enumerate}
 \item {\em Focus} --- The focus of the site on audio and music
- research may make it easier to locate and obtain code.
+ research is intended to make researchers who do not think of
+ themselves as software developers feel that they are among peers,
+ and to make it easier to locate and obtain relevant code.
 \item {\em Linking publications with code} --- Users can associate
- publication records with their projects so that readers can
- immediately see what publications are related to the code.
+ publication records with their projects, so that readers can
+ immediately see what publications are related to the code (see
+ section~\ref{sec:lackofincentives}).
 \item {\em Public and private projects} --- Projects can be entirely
  public, or private to a group of collaborating researchers; work can
- also be started privately and made public later. At the time of
- writing, 57\% of projects hosted at the site are private, and the
- average of the numbers of members per private project is 1.97.
-\item {\em Tracking external projects} --- Researchers who use code
- hosting or project management facilities elsewhere can also make use
- of our site as a nexus for relevant projects, as the site does not
- require that code is hosted with it and can also track external
- repositories.
+ also be started privately and made public later. We believe that
+ supporting private projects helps users become comfortable with the
+ site. At the time of writing, 57\% of projects hosted at the site
+ are private and even for private projects the average number of
+ members is almost 2.
 \end{enumerate}

-We hope these features together will encourage researchers to employ
-collaborative development early in their work, and to place themselves
-in a situation in which the outcomes of their work can be used in a
-sustainable way with relatively little extra effort. That said, our
-primary goal is to encourage researchers to make best use of whatever
-facilities are available to them; this site is only one offering.
+The site is also capable of tracking external projects. Researchers
+who use code hosting or project management facilities elsewhere can
+also make use of our site as a nexus for relevant projects,
+registering their projects at our site and making it point to their
+own hosting.

-\subsubsection{EasyMercurial}
+We believe these features together will encourage researchers to
+employ collaborative development early in their work, and to place
+themselves in a situation in which the outcomes of their work can be
+used in a sustainable way with relatively little extra effort.
+
+\subsubsection{User interfaces for version control}
 \label{sec:easyhg}

+Attendees at the Autumn School also reported difficulty during the
+course in getting started with the complex user interfaces available
+for version control. Nonetheless, version control was amongst the
+areas identified subsequently as most valuable, suggesting that lack
+of awareness may be the main barrier to uptake.
+
 Our attempt to address the difficulties faced in learning version
 control user interfaces is EasyMercurial,\footnote{\tt
  http://easyhg.org} an application we developed based on existing
@@ -352,6 +357,7 @@
 platforms.

 \subsection{Barrier to reuse: Lack of incentive for publication}
+\label{sec:lackofincentives}

 In section \ref{sec:researchsoft} we noted that software and data are
 not typically recognised as assessable research outputs. Software
@@ -367,6 +373,7 @@
 publications, increasing the citation impact of the work.

 \subsection{Barrier to reuse: Platform incompatibilities}
+\label{sec:platforms}

 We observed in \ref{sec:researchsoft} that researchers in this field
 choose to use many platforms and programming languages to carry out
@@ -479,8 +486,7 @@
 {\bf Provide version control software and hosting} and encourage
 researchers to use it. Version control is useful when writing papers
 in formats such as \LaTeX{}, as well as for code. If you are in the
-UK and cannot or prefer not to provide hosting facilities, consider
-using our code site (see section~\ref{sec:codesite}).
+UK, consider using our code site (see section~\ref{sec:codesite}).

 The benefits of version control---managing software history and
 enabling collaborative development---are sufficiently abstract that it