Steve Welburn, 2012-11-20 12:42 PM


Sound Data Management Training (SoDaMaT)

(for general information re. Research Data Management please see the parent project Wiki)

1 SoDaMaT Overview

2 SoDaMaT Background

The Digital Music and Audio Researcher Profile

A wealth of material for training researchers in data management has been produced by previous JISC-funded projects such as Incremental and those in the RDMTrain programme. The Research Data Management Skills Support Initiative (DaMSSI) collected and compared the results from discipline-specific data management training projects in the RDMTrain programme. Its final report concluded that "participants respond well to discipline-specific examples and the opportunity to discuss issues with tutors and others in similar disciplines" and that "a discipline-specific approach is more likely to engage students - in many cases principles are the same across disciplines but are more interesting to students if these principles can be seen in the students' own context". DaMSSI also produced three discipline-specific researcher profiles - in the social sciences, in clinical psychology, and in archaeology - and two generic data profiles - the conservator and the data manager. We believe that researchers at the Centre for Digital Music, and researchers in similar laboratories or institutions, do not fit any of these profiles.

The Centre for Digital Music (C4DM) at QMUL is one of the leading research centres in the field of audio and music technology and signal processing. C4DM makes use of a variety of data as research inputs - most obviously audio datasets - and produces a variety of types of data as research outputs. These outputs include:
  1. manually annotated feature data ("reference annotations") such as expert chord and key transcriptions of existing music recordings which are used as comparative data for evaluating research work;
  2. automatically produced annotations such as those accompanying the publication of methods for audio feature analysis.

The primary targets for the training material to be produced by the proposed project are postgraduate research students, and research and academic staff in C4DM, who perform research over a range of areas including music informatics, machine listening, audio engineering and interaction. A common use-case in C4DM research is to run a newly-developed analysis algorithm on a set of audio examples and evaluate the algorithm by comparing its output with that of a human annotator. Results are then compared with published results using the same input data to determine whether the newly proposed approach improves on the state of the art.
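The evaluation use-case described above can be sketched in a few lines of Python. This is a minimal illustration only, not a method used by C4DM: the `Segment` type, the function names, the fixed sampling step `hop`, and the example chord labels are all assumptions made for the sketch. It samples time points across the reference annotation and reports the fraction at which the algorithm's label agrees with the human annotator's label.

```python
from collections import namedtuple

# A labelled time segment: start and end in seconds, plus a label such as a chord name.
# (Hypothetical structure chosen for this sketch.)
Segment = namedtuple("Segment", ["start", "end", "label"])


def label_at(segments, t):
    """Return the label of the segment covering time t, or None if none covers it."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.label
    return None


def overlap_accuracy(reference, estimate, hop=0.1):
    """Fraction of sampled time points at which estimate and reference labels agree.

    Points are sampled every `hop` seconds, at the midpoint of each step,
    across the time span covered by the reference annotation.
    """
    if not reference:
        return 0.0
    start = min(seg.start for seg in reference)
    end = max(seg.end for seg in reference)
    n_points = int(round((end - start) / hop))
    if n_points == 0:
        return 0.0
    matches = 0
    for i in range(n_points):
        t = start + (i + 0.5) * hop
        ref_label = label_at(reference, t)
        if ref_label is not None and ref_label == label_at(estimate, t):
            matches += 1
    return matches / n_points


# Invented example: the annotator hears the chord change at 2 s,
# the algorithm places it one second too early.
reference = [Segment(0.0, 2.0, "C"), Segment(2.0, 4.0, "G")]
estimate = [Segment(0.0, 1.0, "C"), Segment(1.0, 4.0, "G")]
print(overlap_accuracy(reference, estimate))  # prints 0.75
```

Publishing the reference annotations alongside the score lets other researchers repeat exactly this comparison on the same input data.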

The type of data used in digital music and audio research poses some challenges that need to be addressed in discipline-specific training material. These challenges include:
  1. Copyright: the copyright status of digital music data is often difficult to establish. For example, the owner of internally generated data might be unclear, or data purchased or downloaded from outside might have special license requirements that must be adhered to. To avoid unnecessary risk, researchers often refrain from publishing such data. Addressing this aspect in detail, and emphasising the use of less restrictive licenses (e.g. Creative Commons, Open Data Commons), could lead to a larger amount of data being published in public repositories.
  2. Metadata: the line between data and metadata is often unclear. For example, descriptive metadata (e.g. a song's title, author, year of publication, or key) may be used as data in another context. The training material will focus on defining what data and metadata are, on the importance of metadata standards, and on their use, together with standard protocols such as OAI-PMH and SWORD, to exchange data among repositories.
  3. Ethical approval and participant agreement: experimental work based on human responses (e.g. perceptual listening tests) requires ethical approval. A lack of information and experience on this topic leads people to write ethics forms that prohibit the release of data, even when the data could safely be released in anonymised form with the participants' consent. This prevents other researchers from reproducing or extending the results. The material will include information on how ethical approval works, how to obtain it, and information about publication of sensitive data.
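As an illustration of the metadata exchange mentioned in point 2, the sketch below builds an OAI-PMH `ListRecords` request URL. The `verb`, `metadataPrefix`, `set` and `from` arguments and the `oai_dc` (Dublin Core) prefix are defined by the OAI-PMH protocol; the function name, the repository address and the set name are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint.

    base_url is the repository's OAI-PMH endpoint (hypothetical here).
    set_spec and from_date are optional selective-harvesting arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
url = build_list_records_url("https://repository.example.org/oai",
                             set_spec="audio", from_date="2012-01-01")
print(url)
```

A harvester would fetch this URL and parse the XML response. Because the request format is the same for every compliant repository, metadata can be exchanged without repository-specific code.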

In addition to the recommendations from DaMSSI, the need for specific training material for digital music and audio researchers is justified by at least two additional factors. First, most of the researchers are either computer scientists or electrical engineers and have advanced IT skills. Second, the data is very heterogeneous, rapidly changing, and relatively small in size. As a result, it is usually managed by its creator. Thus, the clear separation between the data creator and the data manager/librarian/scientist, pointed out by the profiles produced by DaMSSI as well as by Pryor and Donnelly (2009, p. 165), becomes blurred: all the different aspects can be, and often are, taken care of by the same person.

Evaluation

Strong attention will be paid to evaluating the quality of the training material and its impact on research practice. By taking advantage of the established collaborations, the material will be tested in different situations, including postgraduate courses, internal and external seminars and workshops, and tutorials at international conferences. The International Society for Music Information Retrieval (ISMIR) fosters the exchange of ideas among members whose activities, though diverse, stem from a common interest in music information retrieval. A tutorial proposal has been submitted in collaboration with the Sound Software project to the 2012 ISMIR conference (8-12 October in Porto, Portugal). A tutorial proposal will also be submitted to DAFx-12 (Digital Audio Effects conference, 17-21 September in York).

The QMUL Learning Institute will provide support and know-how in evaluation methodologies and analysis.

Feedback will be collected using:
  1. anonymous questionnaires after the tutorials/workshops, tailored to the specific audience;
  2. online questionnaires;
  3. standard course evaluation for postgraduate modules;
  4. focus groups interviewed a few months after the training to establish the longer-term impact of the training.

The feedback will be used to iteratively improve the material. Revised versions of all training materials will be available by the end of the project.

Sustainability

We aim to achieve sustainability in the longer term both in the digital music and audio research community, and within QMUL. Our goals are:
  1. to make discipline-specific training sustainable in the digital music and audio research community. Awareness will be raised by presenting the material in collaboration with the Sound Software project at similar UK research institutions, and at discipline-specific conferences (ISMIR and DAFx). Training material will be made available for reuse through the Jorum repository.
  2. to set an example within QMUL. The project will be used as an example by the QMUL Learning Institute, the School of Electronic Engineering and Computer Science, and IT Services to expand data management training to other disciplines by adapting the material and methodologies, starting from related research areas such as Signal Processing, and more generally Electronic Engineering and Computer Science. Data management training will be integrated in postgraduate curricula: every PhD student is expected to take part in approximately 210 hours of development activities (including research methods courses) over the course of their studies, and the points gained are mapped against the four domains of the Vitae/RCUK Researcher Development Framework. Material for Continuous Professional Development courses for research and academic staff will also be adapted to other disciplines, and all face-to-face training will be complemented by online training material.

Workplan

The work of the project is divided into four work packages (WP):

An overview of the intended content of the work packages is here

Training the Trainers

Additional Notes

References

Pryor, G. and Donnelly, M. (2009). Skilling up to do data: whose role, whose responsibility, whose career? The International Journal of Digital Curation, Vol. 4(2), pp. 158-170.

Research Data Management Skills Support Initiative (DaMSSI) final report
