SoDaMaT Project » History » Version 17

Steve Welburn, 2012-06-07 02:31 PM

h1. Sound Data Management Training (SoDaMaT)

{{>toc}}

h2. Overview

Sound Data Management Training (SoDaMaT) is an eight-month project to create and evaluate discipline-specific data management training material for digital music and audio research. The materials are aimed at postgraduate research students (MSc and PhD), research staff (postdoctoral researchers, co-investigators and principal investigators), and academic staff. The project will run at the Centre for Digital Music (C4DM) at Queen Mary University of London (QMUL) from June 2012 to January 2013, in collaboration with the "QMUL Learning Institute":http://www.learninginstitute.qmul.ac.uk/ .

The immediate objectives of the SoDaMaT project are:
# to develop specific training material on data management planning for research projects, targeting research and academic staff in digital music and audio research;
# to develop training material covering the different aspects of research data management, including subject-specific topics such as music copyright, for postgraduate students and for research and academic staff in the area of digital music and audio;
# to collaborate with institutional partners ("QMUL Learning Institute":http://www.learninginstitute.qmul.ac.uk/), other projects ("SoundSoftware.ac.uk":http://soundsoftware.ac.uk/), and discipline-specific societies (Digital Music Research Network, "International Society for Music Information Retrieval":http://www.ismir.net/ ) to test the training material in postgraduate courses, workshops, and tutorials, and to collect feedback on its quality and impact;
# to collaborate with institutional partners ("QMUL Learning Institute":http://www.learninginstitute.qmul.ac.uk/ ; the School of Electronic Engineering and Computer Science (EECS); QMUL IT Services) to embed the training material into postgraduate curricula and Continuous Professional Development courses, ensuring the long-term sustainability of the project's results and their generalisation to other similar disciplines.

The requirements will be scoped, and the training materials will be trialled, within the Centre for Digital Music (C4DM), part of the School of Electronic Engineering and Computer Science (EECS) at Queen Mary University of London.

Beyond designing, producing, and evaluating discipline-specific training material, the project has a wider objective: to promote good practice in research data management through education and awareness, both within Queen Mary University of London and across UK and overseas research institutions in the digital music and audio area.

h2. Background

A "survey":http://rdm.c4dm.eecs.qmul.ac.uk/blog/DAF-interviews on data management practices among researchers and students at C4DM, conducted during the JISC-funded "Sustainable Management of Digital Music Research Data" (SMDMRD) project, showed very low awareness of the importance of research data management as part of the research workflow. Although many researchers organise their data in folders and perform semi-regular backups, when asked specifically "Do you have a particular strategy for data management?", the majority answered no. Through our links with other groups via the EPSRC-funded Sound Software project (see Collaborations section), we have good reason to believe the situation is similar in many other music and audio research groups. The results of the survey point to the need to raise awareness of the benefits of research data management, such as a potential increase in citations, a better understanding of (and compliance with) the data management requirements set by funding bodies (e.g. EPSRC), and more sustainable and reproducible research.

The SMDMRD project defined a set of data management policies and created a pilot data management system for C4DM. This was a pioneering effort within QMUL, and a collaboration with QMUL IT Services has recently been established to adapt the results of the SMDMRD project, define institutional policies, and build an institutional research data repository. Policies can raise awareness among research staff and students by imposing rules of conduct, and tools such as a data repository make it easier to adhere to them. Nevertheless, policies give only a general idea of why research data management is important, and the enthusiasm for using a data management system can easily fade if a culture of data management is not established. These observations point to the need for continuous, embedded, sustainable data management training, with a strong focus on promoting the benefits of research data management, ideally from the early stages of a researcher's career.

h2. The Digital Music and Audio Researcher Profile

A wealth of material for training researchers in data management has been produced by previous JISC-funded projects such as "Incremental":http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmi/incremental.aspx and those in the "RDMTrain":http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmtrain.aspx programme. The "Research Data Management Skills Support Initiative":http://www.dcc.ac.uk/training/data-management-courses-and-training/skills-frameworks (DaMSSI) collected and compared the results from the discipline-specific data management training projects in the "RDMTrain":http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmtrain.aspx programme. Its "final report":http://www.dcc.ac.uk/webfm_send/532 concluded that "participants respond well to discipline-specific examples and the opportunity to discuss issues with tutors and others in similar disciplines" and that "a discipline-specific approach is more likely to engage students - in many cases principles are the same across disciplines but are more interesting to students if these principles can be seen in the students' own context". "DaMSSI":http://www.dcc.ac.uk/training/data-management-courses-and-training/skills-frameworks also produced three discipline-specific researcher profiles (in the social sciences, in clinical psychology, and in archaeology) and two generic data profiles (the conservator and the data manager). We believe that researchers at the Centre for Digital Music, and researchers in similar laboratories or institutions, do not fit any of these profiles.

The Centre for Digital Music at Queen Mary University of London is one of the leading research centres in the field of audio and music technology and signal processing. C4DM makes use of a variety of data as research inputs - most obviously audio datasets - and produces a variety of types of data as research outputs. These outputs include: (i) manually annotated feature data ("reference annotations") such as expert chord and key transcriptions of existing music recordings which are used as comparative data for evaluating research work, and (ii) automatically produced annotations such as those accompanying the publication of methods for audio feature analysis.

The primary targets for the training material are postgraduate research students and research and academic staff in C4DM, who work across a range of areas including music informatics, machine listening, audio engineering and interaction. A common use-case in C4DM research is to run a newly-developed analysis algorithm on a set of audio examples and evaluate the algorithm by comparing its output with that of a human annotator. Results are then compared with published results using the same input data to determine whether the newly proposed approach improves on the state of the art.
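
As a rough illustration of this use-case, the sketch below scores an algorithm's chord labels against a reference annotation. The function name @frame_accuracy@, the example label sequences, and the choice of a simple frame-wise accuracy are all assumptions made for this example; real evaluations would normally rely on the metrics and datasets established in the community.

```python
from typing import Sequence


def frame_accuracy(reference: Sequence[str], estimate: Sequence[str]) -> float:
    """Return the fraction of frames where the estimated label matches the reference.

    Both sequences are assumed to hold one label per analysis frame and to
    cover the same excerpt, so they must have the same length.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same number of frames")
    if not reference:
        raise ValueError("cannot score an empty annotation")
    matches = sum(1 for ref, est in zip(reference, estimate) if ref == est)
    return matches / len(reference)


# Hypothetical data: a human annotator's chord labels and an algorithm's output.
reference_chords = ["C", "C", "F", "F", "G", "G", "C", "C"]
estimated_chords = ["C", "C", "F", "G", "G", "G", "C", "Am"]

accuracy = frame_accuracy(reference_chords, estimated_chords)
print(f"Frame-wise chord accuracy: {accuracy:.2f}")
```

A score like this is only comparable with published results when the reference annotations and the input audio set are the same, which is one reason why managing and sharing both matters.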

The type of data used in digital music and audio research poses some challenges that need to be addressed in discipline-specific training material. These challenges include:
# Copyright: the copyright status of digital music data is often difficult to establish. For example, the owner of internally generated data might be unclear, or data purchased or downloaded from outside might have special licence requirements that must be adhered to. To avoid unnecessary risk, researchers often choose not to publish such data. Addressing this aspect in detail and emphasising the use of less restrictive licences (e.g. Creative Commons, Open Data Commons) could lead to a larger amount of data being published in public repositories.
# Metadata: the line between data and metadata is often unclear. For example, descriptive metadata (e.g. a song's title, author, year of publication, or key) may be used as data in another context. The training material will focus on defining what data and metadata are, on the importance of metadata standards, and on their use, together with standard protocols such as "OAI-PMH":http://www.openarchives.org/OAI/openarchivesprotocol.html and "SWORD":http://swordapp.org/, to exchange data among repositories.
# Ethical approval and participant agreement: experimental work based on human responses (e.g. perceptual listening tests) requires ethical approval. A lack of information and experience on this topic leads people to write ethics forms that prohibit the release of data, preventing other researchers from reproducing or extending their results, even when the data could be safely released in anonymised form with the participants' consent. The material will include information on how ethical approval works, how to obtain it, and how sensitive data can be published.

In addition to the recommendations from "DaMSSI":http://www.dcc.ac.uk/training/data-management-courses-and-training/skills-frameworks , the need for specific training material for digital music and audio researchers is justified by at least two further factors. First, most of the researchers are computer scientists or electrical engineers and have advanced IT skills. Second, the data is very heterogeneous, rapidly changing, and relatively small in size, so it is usually managed by its creator. Thus, the clear separation between the data creator and the data manager/librarian/scientist, drawn both in the profiles produced by "DaMSSI":http://www.dcc.ac.uk/training/data-management-courses-and-training/skills-frameworks and in Pryor and Donnelly (2009, p. 165), becomes blurred: all the different aspects can be, and often are, taken care of by the same person.

h2. Evaluation

Close attention will be paid to evaluating the quality of the training material and its impact on research practice. By taking advantage of the established collaborations, the material will be tested in different settings, including postgraduate courses, internal and external seminars and workshops, and tutorials at international conferences. A tutorial proposal has already been submitted, in collaboration with the Sound Software project, to the "2012 ISMIR conference":http://ismir2012.ismir.net/ (8-12 October in Porto, Portugal). The "International Society for Music Information Retrieval":http://www.ismir.net/ (ISMIR) fosters the exchange of ideas among members whose activities, though diverse, stem from a common interest in music information retrieval. A tutorial proposal will also be submitted to "DAFx-12":http://dafx12.york.ac.uk/ (Digital Audio Effects conference, 17-21 September in York).

The "QMUL Learning Institute":http://www.learninginstitute.qmul.ac.uk/ will provide support and know-how in evaluation methodologies and analysis. Feedback will be collected using: (i) anonymous questionnaires after the tutorials/workshops, tailored to the specific audience; (ii) online questionnaires; (iii) standard course evaluation for postgraduate modules; (iv) focus groups interviewed a few months after the training to establish its longer-term impact. The feedback will be used to iteratively improve the material. Revised versions of all training materials will be available by the end of the project.

h2. Sustainability

We aim to achieve sustainability in the longer term both in the digital music and audio research community, and within QMUL. Our goals are:
# *to make discipline-specific training sustainable in the digital music and audio research community.* Awareness will be raised by presenting the material in collaboration with the Sound Software project at similar UK research institutions, and at discipline-specific conferences (ISMIR and DAFx). Training material will be made available for reuse through the Jorum repository.
# *to set an example within QMUL.* The project will be used as an example by the "QMUL Learning Institute":http://www.learninginstitute.qmul.ac.uk/ , the School of Electronic Engineering and Computer Science, and IT Services to expand the data management training to other disciplines by adapting the material and methodologies, starting from related research areas such as Signal Processing, and more generally Electronic Engineering and Computer Science. Data management training will be integrated into postgraduate curricula: every PhD student is expected to take part in approximately 210 hours of development activities (including research methods courses) over the course of their studies, and the points gained are mapped against the four domains of the "Vitae":http://www.vitae.ac.uk/ /RCUK "Researcher Development Framework":http://www.vitae.ac.uk/researchers/428241/Researcher-Development-Framework.html . Material for Continuous Professional Development courses for research and academic staff will also be adapted to other disciplines, and all face-to-face training will be complemented by online training material.

h2. [[Workplan]]

The work of the project is divided into four work packages (WP):
* [[Workplan#WP1-Training-Material-Design|WP1 Training Material Design]]
** [[Workplan#WP1.1-Research-Of-Available-Resources|WP1.1 Research Of Available Resources]]
** [[Workplan#WP1.2-Online-Training-Material|WP1.2 Online Training Material]]
** [[Workplan#WP1.3-Research-Staff-Material|WP1.3 Research Staff Material]]
** [[Workplan#WP1.4-Post-Graduate-Course-Material|WP1.4 Post-Graduate Course Material]]
* [[Workplan#WP2-Test-and-evaluation|WP2 Test and evaluation]]
* [[Workplan#WP3-Embedding|WP3 Embedding]]
* [[Workplan#WP4-Communication-and-Management|WP4 Communication and Management]]

h2. References

??Pryor, G. and Donnelly, M. (2009). "Skilling up to do data: whose role, whose responsibility, whose career?":http://www.ijdc.net/index.php/ijdc/article/view/126 The International Journal of Digital Curation. Vol. 4(2), pp. 158-170.??