Version 3 (Steve Welburn, 2012-06-07 01:21 PM) → Version 4/65 (Steve Welburn, 2012-06-07 01:28 PM)

h1. Sound Data Management Training (SoDaMaT)


h2. Overview

Sound Data Management Training (SoDaMaT) is an eight-month project to create and evaluate discipline-specific data management training material for digital music and audio research. The materials will be targeted to: postgraduate research students (MSc and PhD); research staff (postdoctoral researchers, CIs, PIs); and academic staff. The project is to run at the Centre for Digital Music (C4DM) at Queen Mary University of London (QMUL) from June 2012 to January 2013, in collaboration with the QMUL Learning Institute.

The immediate objectives of the SoDaMaT project are:
# to develop specific training material on data management planning for research projects, targeting research and academic staff in digital music and audio research;
# to develop training material covering the different aspects of research data management, including subject-specific topics such as music copyrights, for postgraduate students, research and academic staff in the area of digital music and audio;
# to collaborate with institutional partners (QMUL Learning Institute), other projects, and discipline-specific societies (Digital Music Research Network, International Society for Music Information Retrieval) to test the training material in postgraduate courses, workshops, and tutorials, and to collect feedback on its quality and impact;
# to collaborate with institutional partners (QMUL Learning Institute; QMUL School of Electronic Engineering and Computer Science; QMUL IT Services) to embed the training material into postgraduate curricula and Continuous Professional Development courses, ensuring the long-term sustainability of the project's results and their generalisation to other similar disciplines.

The requirements will be scoped, and the training materials will be trialled, within the Centre for Digital Music (C4DM), part of the School of Electronic Engineering and Computer Science (EECS) at Queen Mary University of London.

In addition to designing, producing, and evaluating discipline-specific training material, a wider objective is to promote good practice in research data management through education and awareness both within Queen Mary University of London, and across UK and overseas research institutions in the digital music and audio area.

h2. Background

A survey on data management practices among researchers and students at C4DM, conducted during the JISC-funded "Sustainable Management of Digital Music Research Data" (SMDMRD) project, showed very low awareness of the importance of research data management as part of the research workflow. Although many researchers organise their data in folders and perform semi-regular backups, to the specific question "Do you have a particular strategy for data management?", the majority responded negatively. Through our links with other groups via the EPSRC-funded Sound Software project (see Collaborations section), we have good reason to believe the situation is similar in many other music and audio research groups. The results of the survey point to the need for raising awareness of the benefits of research data management, such as a potential increase in citations, understanding and meeting data management requirements set by funding bodies (e.g. EPSRC), and producing sustainable and reproducible research.

The SMDMRD project defined a set of data management policies and created a pilot data management system for C4DM. This was a pioneering effort within QMUL, and a collaboration with QMUL IT Services has recently been established to adapt the results of the SMDMRD project, define institutional policies, and build an institutional research data repository. Policies can be used to raise awareness among research staff and students by imposing rules of conduct, and adherence to such policies is supported by tools like a data repository. Nevertheless, policies only give a general idea of why research data management is important, and the enthusiasm for using a data management system can easily fade if a culture of data management is not established. These facts point to the need for continuous, embedded, sustainable data management training, with a strong focus on promoting the benefits of research data management, ideally from the early stages of a researcher's career.

h2. The Digital Music and Audio Researcher Profile

A wealth of material for training researchers in data management has been produced by previous JISC-funded projects such as Incremental and those in the RDMTrain programme. The DaMSSI project, which collected and compared the results from discipline-specific data management training projects in the RDMTrain programme, concluded in its final report that "participants respond well to discipline-specific examples and the opportunity to discuss issues with tutors and others in similar disciplines" and that "a discipline-specific approach is more likely to engage students---in many cases principles are the same across disciplines but are more interesting to students if these principles can be seen in the students' own context". The DaMSSI project also produced three discipline-specific researcher profiles (in the social sciences, clinical psychology, and archaeology) and two generic data profiles (the conservator and the data manager). We believe that researchers at the Centre for Digital Music, and researchers in similar laboratories or institutions, do not fit the above-mentioned profiles.

The Centre for Digital Music at Queen Mary University of London is one of the leading research centres in the field of audio and music technology and signal processing. C4DM makes use of a variety of data as research inputs---most obviously audio datasets---and produces a variety of types of data as research outputs. These outputs include: (i) manually annotated feature data ("reference annotations") such as expert chord and key transcriptions of existing music recordings which are used as comparative data for evaluating research work, and (ii) automatically produced annotations such as those accompanying the publication of methods for audio feature analysis.

The primary targets for the training material to be produced by the proposed project are postgraduate research students, and research and academic staff in C4DM, who perform research over a range of areas including music informatics, machine listening, audio engineering and interaction. A common use-case in C4DM research is to run a newly developed analysis algorithm on a set of audio examples and evaluate the algorithm by comparing its output with that of a human annotator. Results are then compared with published results using the same input data to determine whether the newly proposed approach improves on the state of the art.
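
The evaluation use-case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of C4DM's actual tooling: the track identifiers, the key labels, the @accuracy@ helper, and the @baseline@ value are all hypothetical placeholders.

```python
# Hypothetical reference annotations (from a human annotator) and algorithm
# output, keyed by track identifier. All names and values are illustrative.
reference = {"track01": "C:maj", "track02": "A:min", "track03": "G:maj", "track04": "E:min"}
estimated = {"track01": "C:maj", "track02": "A:min", "track03": "D:maj", "track04": "E:min"}


def accuracy(reference, estimated):
    """Fraction of reference tracks whose estimated label matches the annotation."""
    if not reference:
        raise ValueError("reference annotations must not be empty")
    correct = sum(1 for track, label in reference.items() if estimated.get(track) == label)
    return correct / len(reference)


score = accuracy(reference, estimated)
baseline = 0.5  # placeholder standing in for a previously published result on the same data
print(f"accuracy = {score:.2f}; improves on baseline: {score > baseline}")
```

A comparison of this kind is only repeatable by others if the input data identifiers, the reference annotations, and the algorithm output are all kept and shared together, which is one reason the training material treats annotations as research data in their own right.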

The type of data used in digital music and audio research poses some challenges that need to be addressed in discipline-specific training material. These challenges include:
# Copyright: the copyright status of digital music data is often difficult to establish. For example, the owner of internally generated data might be unclear, or data purchased or downloaded from outside might have special license requirements that must be adhered to. To avoid unnecessary risk, researchers therefore often refrain from publishing their data. Addressing this aspect in detail and emphasising the use of less restrictive licenses (e.g. Creative Commons, Open Data Commons) could lead to a larger amount of data being published in public repositories.
# Metadata: the line between data and metadata is often unclear. For example, descriptive metadata (e.g. a song's title, author, year of publication, or key) may be used as data in another context. The training material will focus on defining what data and metadata are, on the importance of metadata standards, and on their use, together with standard protocols such as OAI-PMH and SWORD, to exchange data among repositories.
# Ethical approval and participant agreement: experimental work based on human responses (e.g. perceptual listening tests) requires ethical approval. The lack of information and experience on this topic leads people to write ethics forms that prohibit the release of data, preventing other researchers from reproducing or extending their results, when the data could be safely released with the participants' consent if anonymised. The material will include information on how ethical approval works, how to obtain it, and how sensitive data can be published.
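
As a small illustration of the repository protocols mentioned under Metadata, the sketch below builds an OAI-PMH @ListRecords@ request URL. The @verb@, @metadataPrefix@ and @set@ arguments and the @oai_dc@ (Dublin Core) format come from the OAI-PMH specification; the endpoint address and the @list_records_url@ helper are assumptions made for this example only.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint; a real repository publishes its own base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective-harvesting argument
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL)
print(url)
```

The request itself only asks for metadata records in an agreed format, which is why consistent use of a metadata standard matters more for exchange between repositories than the details of any one client.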

In addition to the recommendations from the DaMSSI project, the need for specific training material for digital music and audio researchers is justified by at least two additional factors. First, most of the researchers are either computer scientists or electrical engineers and have advanced IT skills. Second, the data is very heterogeneous, rapidly changing, and relatively small in size. As a result, it is usually managed by its creator. Thus, the clear separation between the data creator and the data manager/librarian/scientist, pointed out by the profiles produced by the DaMSSI project as well as in Pryor and Donnelly (2009, p. 165), becomes blurred: all the different aspects can be, and often are, taken care of by the same person.

h2. Evaluation

Close attention will be paid to evaluating the quality of the training material and its impact on research practice. By taking advantage of the established collaborations, the material will be tested in different situations, including postgraduate courses, internal and external seminars and workshops, and tutorials at international conferences. A tutorial proposal has already been submitted, in collaboration with the Sound Software project, to the 2012 ISMIR conference (8-12 October in Porto, Portugal). The International Society for Music Information Retrieval (ISMIR) fosters the exchange of ideas among members whose activities, though diverse, stem from a common interest in music information retrieval. A tutorial proposal will also be submitted to DAFx-12 (Digital Audio Effects conference, 17-21 September in York).

The QMUL Learning Institute will provide support and know-how in evaluation methodologies and analysis. Feedback will be collected using: (i) anonymous questionnaires after the tutorials/workshops, tailored to the specific audience; (ii) online questionnaires; (iii) standard course evaluation for postgraduate modules; (iv) focus groups interviewed a few months after the training to establish the longer-term impact of the training. The feedback will be used to iteratively improve the material. Revised versions of all training materials will be available by the end of the project.

h2. Sustainability

We aim to achieve sustainability in the longer term both in the digital music and audio research community, and within QMUL. Our goals are:
# *to make discipline-specific training sustainable in the digital music and audio research community.* Awareness will be raised by presenting the material in collaboration with the Sound Software project at similar UK research institutions, and at discipline-specific conferences (ISMIR and DAFx). Training material will be made available for reuse through the Jorum repository.
# *to set an example within QMUL.* The project will be used as an example by the QMUL Learning Institute, the School of Electronic Engineering and Computer Science, and the IT Services to expand the data management training to other disciplines by adapting the material and methodologies, starting from related research areas such as Signal Processing, and more generally Electronic Engineering and Computer Science. Data management training will be integrated in postgraduate curricula: every PhD student is expected to take part in approximately 210 hours of development activities (including research methods courses) over the course of their studies and the points gained are mapped against the four domains of the Vitae/RCUK Researcher Development Framework. Material for Continuous Professional Development courses for research and academic staff will also be adapted to other disciplines, and all face-to-face training will be complemented by online training material.

h2. [[Workplan]]

The work of the project is divided into four work packages (WP):
* [[Workplan#WP1 Training Material Design|WP1 Training Material Design]] Although the basic principles of data management apply both to postgraduate students and to research and academic staff, we distinguish between the two groups (WP1.3 and WP1.4): a PhD student starting a project and a PI writing a grant proposal might want to focus on different aspects of data management. The online material (WP1.2) will cover all aspects and be relevant to both groups.
** [[Workplan#WP1.1 Research Of Available Resources|WP1.1 Research Of Available Resources]]
** [[Workplan#WP1.2 Online Training Material|WP1.2 Online Training Material]]
** [[Workplan#WP1.3 Research Staff Material|WP1.3 Research Staff Material]]
** [[Workplan#WP1.4 Post-Graduate Course Material|WP1.4 Post-Graduate Course Material]]
* [[Workplan#WP2 Test and evaluation|WP2 Test and evaluation]]
* [[Workplan#WP3 Embedding|WP3 Embedding]]
* [[Workplan#WP4 Communication and Management|WP4 Communication and Management]]

h2. References

??Pryor, G. and Donnelly, M. (2009). Skilling up to do data: whose role, whose responsibility, whose career? The International Journal of Digital Curation. Vol. 4(2), pp. 158--170.??