Sound Data Management Training Project (SoDaMaT)¶
Sound Data Management Training (SoDaMaT) is an eight-month project to create and evaluate discipline-specific data management training material for digital music and audio research. The materials will be targeted to: postgraduate research students (MSc and PhD); research staff (postdoctoral researchers, CIs, PIs); and academic staff. The project is to run at the Centre for Digital Music (C4DM) at Queen Mary University of London (QMUL) from June 2012 to January 2013, in collaboration with the QMUL Learning Institute.
The immediate objectives of the SoDaMaT project are:
- to develop specific training material on data management planning for research projects, targeting research and academic staff in digital music and audio research;
- to develop training material covering the different aspects of research data management, including subject-specific topics such as music copyrights, for postgraduate students, research and academic staff in the area of digital music and audio;
- to collaborate with institutional partners at QMUL (The Learning Institute), other projects (SoundSoftware.ac.uk), and discipline-specific societies (Digital Music Research Network, International Society for Music Information Retrieval) to test the training material in postgraduate courses, workshops, and tutorials, and to collect feedback on its quality and impact;
- to collaborate with institutional partners at QMUL (Learning Institute; School of Electronic Engineering and Computer Science; IT Services) to embed the training material into postgraduate curricula and Continuous Professional Development courses to assure the long-term sustainability and generalisation of the project's results to other similar disciplines.
In addition to designing, producing, and evaluating discipline-specific training material, a wider objective is to promote good practice in research data management through education and awareness both within QMUL, and across UK and overseas research institutions in the digital music and audio area.
A survey on data management practices among researchers and students at C4DM, conducted during the JISC-funded Sustainable Management of Digital Music Research Data (SMDMRD) project, showed very low awareness of the importance of research data management as part of the research workflow. Although many researchers organise their data in folders and perform semi-regular backups, to the specific question "Do you have a particular strategy for data management", the majority responded negatively. Through our links with other groups via the EPSRC-funded Sound Software project (see Collaborations section), we have good reason to believe the situation is similar in many other music and audio research groups. The results of the survey point to the need for raising awareness of the benefits of research data management, such as a potential increase in citations, understanding and meeting data management requirements set by funding bodies (e.g. EPSRC), and producing sustainable and reproducible research.
The SMDMRD project defined a set of data management policies and created a pilot data management system for C4DM. This was a pioneering effort within QMUL, and a collaboration with the QMUL IT Services has been recently established to adapt the results of the SMDMRD project to define institutional policies, and build an institutional research data repository. Policies can be used to raise awareness among research staff and students by imposing rules of conduct, and adherence to such policies is supported by tools like a data repository. Nevertheless, policies only give a general idea of why research data management is important, and the enthusiasm for using a data management system can easily fade if a culture for data management is not established. These facts point to the need for continuous, embedded, sustainable data management training, with strong focus on promoting the benefits of research data management, ideally from the early stages of a researcher's career.
The Digital Music and Audio Researcher Profile¶
A wealth of material for training researchers in data management has been produced by previous JISC-funded projects such as Incremental and those in the RDMTrain programme. The Research Data Management Skills Support Initiative (DaMSSI), which collected and compared the results from discipline-specific data management training projects in the RDMTrain programme, concluded in its final report that "participants respond well to discipline-specific examples and the opportunity to discuss issues with tutors and others in similar disciplines" and that "a discipline-specific approach is more likely to engage students - in many cases principles are the same across disciplines but are more interesting to students if these principles can be seen in the students' own context". DaMSSI also produced three discipline-specific researcher profiles - in the social sciences, in clinical psychology, and in archaeology - and two generic data profiles - the conservator and the data manager. We believe that researchers at the Centre for Digital Music, and researchers in similar laboratories or institutions, do not fit any of these profiles.
The Centre for Digital Music (C4DM) at QMUL is one of the leading research centres in the field of audio and music technology and signal processing. C4DM makes use of a variety of data as research inputs - most obviously audio datasets - and produces a variety of types of data as research outputs. These outputs include:
- manually annotated feature data ("reference annotations") such as expert chord and key transcriptions of existing music recordings which are used as comparative data for evaluating research work;
- automatically produced annotations such as those accompanying the publication of methods for audio feature analysis.
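To illustrate how reference annotations serve as comparative data, the following is a minimal sketch, not a method used by C4DM. It assumes annotations are stored as non-overlapping `(start, end, label)` segments in seconds; the function name `agreement` and the chord labels are hypothetical, chosen only for illustration. It measures the fraction of the annotated time on which an automatic annotation carries the same label as the reference.

```python
from typing import List, Tuple

# A segment is (start_seconds, end_seconds, label).
# Assumption: segments within one annotation do not overlap each other.
Segment = Tuple[float, float, str]


def agreement(reference: List[Segment], estimate: List[Segment]) -> float:
    """Fraction of the reference duration on which both annotations agree."""
    total = sum(end - start for start, end, _ in reference)
    matched = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            # Length of the time interval shared by the two segments.
            overlap = min(r_end, e_end) - max(r_start, e_start)
            if overlap > 0 and r_label == e_label:
                matched += overlap
    return matched / total if total > 0 else 0.0


# Hypothetical data: an expert chord transcription and an algorithm's output.
reference = [(0.0, 2.0, "C"), (2.0, 4.0, "G"), (4.0, 6.0, "Am")]
estimate = [(0.0, 2.5, "C"), (2.5, 4.0, "G"), (4.0, 6.0, "F")]

print(f"agreement: {agreement(reference, estimate):.3f}")
```

A score like this is only comparable with published results when the same reference annotations and input audio are used, which is one reason such annotations need to be managed and published carefully.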
The primary targets for the training material to be produced by the project are postgraduate research students, and research and academic staff in C4DM, who perform research over a range of areas including music informatics, machine listening, audio engineering and interaction. A common use-case in C4DM research is to run a newly-developed analysis algorithm on a set of audio examples and evaluate the algorithm by comparing its output with that of a human annotator. Results are then compared with published results using the same input data to determine whether the newly proposed approach makes any improvement on the state of the art.
The type of data used in digital music and audio research poses some challenges that need to be addressed in discipline-specific training material. These challenges include:
- Copyright: the copyright status of digital music data is often difficult to establish. For example, the owner of internally generated data might be unclear, or data purchased or downloaded from outside might have special license requirements that must be adhered to. To avoid unnecessary risk, researchers therefore often refrain from publishing their data. Addressing this aspect in detail and emphasising the use of less restrictive licenses (e.g. Creative Commons, Open Data Commons) could lead to a larger amount of data being published in public repositories.
- Metadata: the line between data and metadata is often unclear. For example, descriptive metadata (e.g. a song's title, author, year of publication, or key) is in another context used as data. The training material will focus on defining what data and metadata are, on the importance of metadata standards, and on their use, together with standard protocols such as OAI-PMH and SWORD, to exchange data among repositories.
- Ethical approval and participant agreement: experimental work based on human responses (e.g. perceptual listening tests) requires ethical approval. The lack of information and experience on this topic leads people to write ethics forms that prohibit the release of data, preventing other researchers from reproducing or extending their results, even though the data could be safely released with the participants' consent if anonymised. The material will include information on how ethical approval works, how to obtain it, and how sensitive data can be published.
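As an illustration of the metadata point above, the following is a minimal sketch of a descriptive metadata record in unqualified Dublin Core, the metadata format that every OAI-PMH repository is required to support (`oai_dc`). It is not taken from the project's material: the helper name `dublin_core_record` and all field values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by Dublin Core and by the OAI-PMH oai_dc format.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def dublin_core_record(fields: dict) -> str:
    """Serialise a dict of Dublin Core element names and values as oai_dc XML."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description; none of these values come from the project.
record = dublin_core_record({
    "title": "Example chord annotations",
    "creator": "A. Researcher",
    "date": "2012",
    "type": "Dataset",
    "rights": "Creative Commons Attribution",
})
print(record)
```

Note that a field such as a song's key would be metadata in a record like this, yet the same value is research data when it is the output of a key-detection experiment.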
In addition to the recommendations from DaMSSI, the need for specific training material for digital music and audio researchers is justified by at least two additional factors. First, most of the researchers are either computer scientists or electrical engineers and have advanced IT skills. Second, the data is very heterogeneous, rapidly changing, and relatively small in size. As a result, it is usually managed by its creator. Thus, the clear separation between the data creator and the data manager/librarian/scientist, pointed out by the profiles produced by DaMSSI as well as in Pryor and Donnelly (2009, p. 165), becomes blurred: all the different aspects can be, and often are, taken care of by the same person.
Strong attention will be paid to evaluating the quality of the training material and its impact on research practice. By taking advantage of the established collaborations, the material will be tested in different situations, including postgraduate courses, internal and external seminars and workshops, and tutorials at international conferences. The International Society for Music Information Retrieval (ISMIR) serves the purposes of fostering the exchange of ideas between and among members whose activities, though diverse, stem from a common interest in music information retrieval. A tutorial proposal has been submitted in collaboration with the Sound Software project to the 2012 ISMIR conference (8-12 October in Porto, Portugal). A tutorial proposal will also be submitted to DAFx-12 (Digital Audio Effects conference, 17-21 September in York).
The QMUL Learning Institute will provide support and know-how in evaluation methodologies and analysis.
Feedback will be collected using:
- anonymous questionnaires after the tutorials/workshops, tailored to the specific audience;
- online questionnaires;
- standard course evaluation for postgraduate modules;
- focus groups interviewed a few months after the training to establish the longer-term impact of the training.
The feedback will be used to iteratively improve the material. Revised versions of all training materials will be available by the end of the project.
Sustainability¶
We aim to achieve sustainability in the longer term both in the digital music and audio research community, and within QMUL. Our goals are:
- to make discipline-specific training sustainable in the digital music and audio research community. Awareness will be raised by presenting the material in collaboration with the Sound Software project at similar UK research institutions, and at discipline-specific conferences (ISMIR and DAFx). Training material will be made available for reuse through the Jorum repository.
- to set an example within QMUL. The project will be used as an example by the QMUL Learning Institute, the School of Electronic Engineering and Computer Science, and the IT Services to expand the data management training to other disciplines by adapting the material and methodologies, starting from related research areas such as Signal Processing, and more generally Electronic Engineering and Computer Science. Data management training will be integrated in postgraduate curricula: every PhD student is expected to take part in approximately 210 hours of development activities (including research methods courses) over the course of their studies and the points gained are mapped against the four domains of the Vitae/RCUK Researcher Development Framework. Material for Continuous Professional Development courses for research and academic staff will also be adapted to other disciplines, and all face-to-face training will be complemented by online training material.
- WP1 Training Material Design
- WP2 Test and evaluation
- WP3 Embedding
- WP4 Communication and Management
An overview of the intended content of the work packages follows.
WP1 Training Material Design¶
Although the basic principles of data management are valid for both postgraduate students, and research and academic staff, we decided to make a distinction between the two groups (WP1.3 and WP1.4): a PhD student starting on their project and a PI writing a grant proposal might want to focus on different aspects of data management. The online material (WP1.2) will cover all aspects and be relevant to both groups.
Results from previous projects (e.g. JISC RDMTrain programme, Research Data Management Skills Support Initiative (DaMSSI), Incremental), as well as available material from the DCC and other institutions, will be studied and evaluated. Disciplines will be compared, and the parts of the available material that need to be adapted to appeal to researchers in the area of digital music and audio research will be identified. In order to integrate the material into the Vitae/RCUK Researcher Development Framework, used to assign credits by the QMUL Learning Institute, the recently released "Information-handling Lens" will also be analysed.
The Incremental project recommends in its final report (page 21) to "create a collection of webpages to help researchers find tools and assistance". Examples of such online material will include FAQs, fact-sheets, online step-by-step guides (e.g. on creating a data management plan for PIs writing a project proposal), and short instructional videos (e.g. on how to deposit a data set into a repository, from metadata collection to choosing a license). The online material will target both new members of staff who could not participate in face-to-face training, and those who need quick reference material or want to learn in greater depth after a seminar. It will also contain information on where to get help for different problems (e.g. copyrights, technical) inside the institution. The online material will be prepared first because it should already be in place when face-to-face training is given.
The online materials have been prepared in the form of a wiki, and are part of this site.
WP1.3 Research Staff Material¶
Material will be designed that targets research and academic staff involved in funded research projects, although the basic principles will be relevant to students as well. Experience from the Sound Software project showed that different material is useful at different stages of a project. We will thus create a range of training materials to cover some of these stages, to be presented in different formats (e.g. short seminars, tutorials, workshops), and to be complemented by online material. Examples include, but are not limited to:
- a five-minute "executive" pitch on the benefits of data management;
- hands-on workshops for CIs and PIs on data management planning for research projects;
- conference tutorials giving an overview of research data management;
- material for short seminars with in-depth analysis of single aspects of data management such as available tools, policies, and discipline-specific challenges.
This material will be presented at internal seminars, discipline-specific conferences and, in collaboration with the Sound Software project, at other institutions in the UK working in the area of digital music and audio research.
WP1.4 Post-Graduate Course Material¶
Discipline-focused material for face-to-face training sessions will be designed. The material will cover the basics of good data management practice, point out its benefits, and touch on discipline-specific challenges such as copyrights and licenses, with discipline-specific examples based on the recently developed C4DM Data Management System. In addition, as suggested by the DaMSSI project final report (Conclusions, page 15, paragraph 5), the students will be instructed to create a Data Management Plan for their PhD projects, to be included in their Research Proposal. The material should be sufficient to cover at most one or two sessions in a module. For more in-depth study, the students will then be referred to the online material. The material will be tested first with postgraduate students at C4DM, and then at other research groups in the Digital Music Research Network.
- D1.1 Summary and analysis of material already available.
- D1.2 First draft of the online material.
- D1.3 First draft of the research staff material.
- D1.4 First draft of the postgraduate course material.
- D1.5 Updated version of the research staff material.
- D1.6 Updated version of the online reference material.
- D1.7 Updated version of the postgraduate course material.
WP2 Test and evaluation¶
With the support of the QMUL Learning Institute, and drawing on its prior experience from other projects, workshop questionnaires will be developed to evaluate the effectiveness of the training material and its delivery. The feedback obtained will be used to inform future activities.
The online material will be released as early as possible during the project. Continuous online evaluation will be used to collect feedback and make the appropriate changes.
WP2.3 Feedback collection and analysis - research staff material¶
The workshop material will be tested at various institutions across the UK, and at the ISMIR 2012 conference, where questionnaires will be handed out at the end of each session.
WP2.4 Feedback collection and analysis - postgraduate course material¶
Feedback for the postgraduate course material will be collected through the standard course evaluation procedures in place at QMUL.
- D2.1 Questionnaires for evaluating training material (all types).
- D2.2 Summary of the collected feedback for the online material and recommendations for improvement.
- D2.3 Summary of the collected feedback for the research staff material and recommendations for improvement.
- D2.4 Summary of the collected feedback for the postgraduate course material and recommendations for improvement.
This work package organised the various workshops, courses and seminars in collaboration with the partners.
- Sound Software 2012 presentation
- DAFx 2012 presentation
- ISMIR 2012 Presentation
- JISCMRD Training Projects Launch Workshop presentation
- D3.1 Final report on embedding.
WP4 Communication and Management¶
WP4.1 Project management¶
The project will be managed on a day-to-day basis by the PI, with project meetings held weekly to assess progress and problems. This has been our practice throughout the Sound Software project and previous JISC-funded projects. The CIs will participate in the management process to ensure compatibility and continuity with the requirements of the Sound Software project from a management and technical perspective respectively.
The project results will be disseminated through blog posts, Twitter, and official reports on the project's website. Results will also be presented at discipline-specific conferences (ISMIR, DAFx), and to other similar UK-based research institutions via the partnership with the Sound Software project.
- SoDaMaT blog
- Sound Software 2012 presentation
- DAFx 2012 presentation
- ISMIR 2012 Presentation
- JISCMRD Training Projects Launch Workshop presentation
- D4.1 Project site and feed.
- D4.2 Final report and publication of the material in the Jorum repository.
Pryor, G. and Donnelly, M. (2009). Skilling up to do data: whose role, whose responsibility, whose career? The International Journal of Digital Curation. Vol. 4(2), pp. 158-170.