changeset 18:6cb784212511

More small updates
author Chris Cannam
date Sun, 25 Sep 2011 12:58:26 +0100
parents 56627b8fcf4d
children 94c9700135db
files cannam.tex
diffstat 1 files changed, 60 insertions(+), 39 deletions(-)
--- a/cannam.tex	Sun Sep 25 12:20:00 2011 +0100
+++ b/cannam.tex	Sun Sep 25 12:58:26 2011 +0100
@@ -112,7 +112,7 @@
 Reproducible Research Repository\footnote{\tt http://rr.epfl.ch/},
 designed to promote reproducible research by requiring the authors of
 a paper to upload the code and data used in the experiments. Readers
-can also comment on a publication and evaluate the reproducibility of
+can then comment on a publication and evaluate the reproducibility of
 the work.
 
 Although the Reproducible Research principle proposes a comprehensive
@@ -122,8 +122,9 @@
 \section{Understanding real-world limitations on software practice}
 \label{sec:researchsoft}
 
-[We are going to propose three barriers to reuse -- lack of education
-  \& confidence; lack of tools \& facilities; platform
+[We are going to propose four barriers to reuse -- lack of education
+  \& confidence; lack of tools \& facilities; lack of incentive,
+  because code is not measured as publications are; platform
   incompatibilities and code rot. We need to give the facts and
   figures supporting these as barriers and then identify them.]
 
@@ -145,13 +146,25 @@
 complete responses. There were responses from at least 16 different
 institutions.
 
-Although 44\% of respondents said that they took steps to ensure
-reproducibility of their work, their accompanying comments suggested
-various interpretations of the meaning of reproducibility.  A common
-theme was that code would be made available on personal request; some
-respondents said that they documented code in order to be able to
-reproduce the results themselves, or that they were planning to
-publish software or data rather than having actually done so.
+Although 80\% of respondents reported developing software themselves
+during research and 40\% of those said that they took steps to ensure
+reproducibility of their publications, the accompanying comments
+showed that this did not necessarily involve the publication of code.
+Respondents referred to using standard, publicly-available datasets
+and calibration procedures when performing measurements; to
+documenting code and data so that they or other researchers in their
+group could reproduce the results later; or to making code available
+on personal request.  All of these are worthwhile actions but they do
+not suggest widespread use of a full reproducible compendium approach.
+
+Developing collaboratively even within a research group also seems to
+be the exception rather than the rule.  51\% of respondents who
+developed software said that their code did not leave their own
+computer, and 59\% said they did not use version control software
+(typically used in software practice to facilitate collaborative
+development).  This is also consistent with the finding by Hannay et
+al that scientists usually developed and used software on their own
+desktop computers rather than dedicated processing servers.
 
 Our respondents cited as obstacles to the publication of code lack of
 time, copyright restrictions, and the potential for future commercial
@@ -161,11 +174,6 @@
 independence and competition, and quality concerns as typical
 inhibiting factors for open sharing of data and code.
 
-Our survey found that most respondents kept code on their own machines
-and did not develop collaboratively. This is consistent with the
-Hannay study, which found that scientists typically developed and used
-software on their personal computers rather than dedicated servers
-(TODO: check this, does it say anything about sharing code?).
 
 [This suggests that there are cultural and technical [lack of
     facilities \& awareness of how to use them] barriers... this makes
@@ -196,14 +204,12 @@
 available to later researchers.  For example, in the well-known
 subject of beat tracking, the method of Scheirer et al~\cite{scheirer}
 was developed in \CC{} for a legacy platform and is now only available
-informally; Goto et al~\cite{goto} was written for a now-defunct
-parallel architecture and never publicly released; Hainsworth
-\cite{hainsworth} was written in MATLAB with a non-portable DLL
-component and only runs on a single platform; Klapuri et al
-\cite{klapuri} was written in MATLAB with a platform-specific
-extension and is not widely distributed for reasons of commercial
-confidentiality; and methods from several other researchers have not
-been published as code.
+informally; Goto et al~\cite{goto} was written for a parallel
+architecture no longer in wide use and never publicly released;
+Hainsworth~\cite{hainsworth} was written in MATLAB with a
+Windows-specific DLL component and only runs on a single platform;
+Klapuri et al~\cite{klapuri} is not widely distributed; and methods
+from several other researchers have not been published as code.
 
 \section{Sustainable software: a bottom-up approach}
 \label{sec:philosophy}
@@ -298,9 +304,7 @@
 section \ref{sec:lackoffacilities}.  The site is implemented using our
 own custom version of the Redmine\footnote{\tt http://redmine.org/}
 project management application, with Mercurial version control.  Any
-UK researcher in the field can register and start their own
-collaborative projects using the version control, wiki, issue
-tracking, and other services provided.
+UK researcher in the field can register and start their own projects.
 
 Four aspects of our code site contribute to sustainability and utility
 for researchers:
@@ -329,7 +333,7 @@
   public, or private to a group of collaborating researchers; work can
   also be started privately and made public later.  At the time of
   writing, 57\% of projects hosted at the site are private, and the
-  average number of members in private projects is 1.97.
+  mean number of members per private project is 1.97.
 \item {\em Tracking external projects} --- Researchers who use code
   hosting or project management facilities elsewhere can also make use
   of our site as a nexus for relevant projects, as the site does not
@@ -355,8 +359,12 @@
 
 \subsection{Barrier to reuse: Platform incompatibilities}
 
-introductory note here: the barrier is that software that is published
-is not always usable
+We observed in section \ref{sec:researchsoft} that researchers in this field
+choose to use many platforms and programming languages to carry out
+their work.  Although the most common (MATLAB) is widely available in
+signal processing groups, it is a commercial platform that is not
+widely available to researchers in other fields related to audio and
+music such as computational musicology or music therapy.
 
 \subsubsection{Sonic Visualiser and Vamp Plugins}
 \label{subsubsec:sv}
@@ -431,20 +439,33 @@
 
 Following this publication, we worked with the author of the paper and
 code to develop a C++ implementation of the method and turn it into a
-highly usable Vamp plugin for chord estimation, named Chordino.  This
-code has been made available at
-http://code.soundsoftware.ac.uk/projects/nnls-chroma --- a Web page
-which also links the code with its associated publication.  Although
-it has been updated since publication, the plugin includes a mode in
-which it uses the same method as that submitted to the MIREX
-evaluation and as a consequence, although this paper still lacks true
-``one-click'' reproducibility, a high degree of openness and effective
-reusability have been achieved even though the process did not begin
-until after the initial publication.
+Vamp plugin for chord estimation, named Chordino.  This code and its
+revision history are available through our code site\footnote{\tt
+  http://code.soundsoftware.ac.uk/projects/nnls-chroma} and thereby
+linked with the associated publication.  Although the code has been
+updated since release, the plugin includes a mode in which it uses the
+same method as that submitted to the MIREX evaluation.  As a
+consequence, although this paper still lacks a true reproducibility
+compendium, a high degree of openness and effective reusability have
+been achieved even though the process did not begin until after the
+initial publication.
 
 %\subsection{Auditory Image Models}
 %\label{subsubsec:aim}
 
+\section{Recommendations}
+\label{sec:recommendations}
+
+[1. Software development training -- pick small battles -- testing,
+  provenance etc]
+
+[2. Version control and related facilities -- provide, encourage
+  people to use, make it simple]
+
+[3. Don't feel bad]
+
+[4. ???]
+
 \section{Conclusions and Future Work}
 \label{sec:conclusions}