Outline notes for ICASSP 2012 paper submission¶
General form of a paper:
- What problem we tried to solve
- How we tried to solve it
- How well it worked
What is the problem here?¶
(Can we add references here!?)
- Research in this field involves developing software
- That software typically is not published
- Consequently it's hard to get hold of reference implementations of significant algorithms or to reproduce results from papers
We did a survey -- but can we package its results in a way that provides any sense of scientific rigour?
It found:
- People use lots of different languages and environments
- Many people don't share their code
- A surprising number asserted that they did not intend to publish any code, and that their code never left their own computer
We also observed that our own facilities were not ideal and not being used to best advantage.
We also made some observations at the Autumn School (though this is getting a bit circular, since the Autumn School was itself one of the things we have been trying to do):
- Many attendees had never used version control before
- When its benefits were shown to them, they were generally very receptive & positive (version control was identified as a good point in the programme)
- But they found it trickier than we had expected to get going with the Subversion client used in the workshop (see the sketch after this list for the kind of workflow involved)
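The workshop exercises themselves are not reproduced in these notes; purely as an illustration (the repository URL and file name below are hypothetical), the command-line workflow that the Subversion client wraps amounts to something like:

    svn checkout https://example.org/svn/project/trunk project    # hypothetical repository URL
    cd project
    svn add onset_detector.m                                      # hypothetical file name; start tracking it
    svn commit -m "Add first version of onset detector"           # record the change in the shared repository
    svn update                                                    # bring in collaborators' changes

Even this small cycle introduces several new concepts at once (working copy, commit, update/merge), which may be part of why attendees found it harder to get started than we expected.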
What have we been trying to do?¶
- Promote collaborative development from the outset -- if we can encourage people to work together on code even just as much as they would normally work together on a paper, then we will increase their comfort with disclosing code later. This faces some tricky cultural obstacles, though (e.g. the need to convince supervisors etc. that you did your own work on your own)
- Provide facilities and services that people can use and educate them to make best use of them (or of any facilities they already have)
- Get hands-on, taking care of code that people really want to use
What have we done so far?¶
- Autumn School
- Code repository site
- EasyMercurial (a simple graphical interface to the Mercurial version control system; see the sketch below for the underlying workflow it exposes)
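For context, EasyMercurial presents in graphical form roughly the same small set of core operations as the Mercurial command line. A minimal sketch (project and file names are hypothetical):

    hg init myproject                     # hypothetical project name; create a new repository
    cd myproject
    hg add beat_tracker.py                # hypothetical file name; start tracking it
    hg commit -m "Initial beat tracker"   # record a local change
    hg pull -u                            # fetch and apply collaborators' changes
    hg push                               # publish local changes to the shared repository

The aim is that newcomers can carry out this cycle without first having to learn command-line syntax.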
How well has it worked?¶
What will we do next?¶
- More learning materials
- Follow-ups to Autumn School
- Visits to other UK research institutions
Thoughts from Mark¶
Concrete examples are always nice.
We should aim to make a case the reader can take away: "I could use this facility", "I could copy this facility", or "I could apply this method".
General scientific paper principle: Background (what other people have tried, and why it is lacking); our solution; evaluation. For background we can discuss the general philosophy of software development and how software is actually developed in academia (with stats).
The Reproducible Research movement shows a similar intention but does not address quite the same problem -- it is somewhat orthogonal. If a researcher does not elect to make their results available in a full RR form, they nonetheless still have a problem with how their software is actually made -- we can make incremental improvements to software development practice (a bottom-up approach) and can help people get better results no matter how they publish their work. Sustainability and reusability are relevant even if you do not intend ever to publish your code openly.
[However, encouraging openness in software development should lead to greater comfort with the idea of reproducible research as well.]
In future work, we can consider how to handle really big data -- and how to handle copyright etc -- in a way that is less ad-hoc than is typically the case at present.
It is important to explain how our project is different -- Reproducible Research is a tool, not an end in itself. We aim for sustainability by tying together code versioning, documentation, code availability, etc. The idea of a UK-wide community is fundamental to achieving our goals.
Reproducible Research References¶
EPFL page on Reproducible Research -- some papers worth reading: http://reproducibleresearch.net/index.php/RR_links
- WaveLab and Reproducible Research (J. B. Buckheit and D. L. Donoho)
- Dept. of Statistics, Stanford University, Tech. Rep. 474, 1995.
- http://www-stat.stanford.edu/~donoho/Reports/1995/wavelab.pdf
- Reproducible Research: The bottom line (de Leeuw, Jan)
- Department of Statistics, UCLA, Department of Statistics Papers, March 2001
- http://escholarship.org/uc/item/9050x4r4#page-1
- Pushing Science into Signal Processing (M. Barni and F. Perez-Gonzalez)
- IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 119–120, July 2005.
- http://www.gts.tsc.uvigo.es/~fperez/docs/pushing_science.pdf
- Sharing Detailed Research Data Is Associated with Increased Citation Rate (H. A. Piwowar, R. S. Day, and D. B. Fridsma)
- PLoS ONE, March 2007
- http://www.plosone.org/article/info:doi/10.1371/journal.pone.0000308
- How to encourage and publish reproducible research (J. Kovačević)
- Reproducible Research in Signal Processing - What, why, and how (P. Vandewalle, J. Kovačević and M. Vetterli)
- IEEE Signal Processing Magazine, May 2009
- http://rr.epfl.ch/17/
- SHARE: a web portal for creating and sharing executable research papers (P. Van Gorp and S. Mazanek)
- Proc. International Conference on Computational Science, 2011
- Evaluating research impact through open access to scholarly communication (T. D. Brody)
- Ph.D. dissertation, School of Electronics and Computer Science, Univ. Southampton, May 2006
- http://eprints.ecs.soton.ac.uk/13313/