Steve Welburn, 2012-09-26 02:43 PM
SoDaMaT Wiki¶
This page contains the full content of the SoDaMaT Wiki.
Sound Data Management Training (SoDaMaT)¶
- SoDaMaT Wiki
- Sound Data Management Training (SoDaMaT)
- Workplan
- WP1 Training Material Design
- WP1.1 Research Of Available Resources
- DataTrain
- DATUM for Health
- DMTpsych
- Incremental
- MANTRA
- Project CAIRO - Managing Creative Arts Research Data
- WP2.1 Evaluation Strategy Design
- Sound Data Management Training
(for general information re. Research Data Management please see the parent project Wiki)
Overview¶
Sound Data Management Training (SoDaMaT) is an eight-month project to create and evaluate discipline-specific data management training material for digital music and audio research. The materials will be targeted at postgraduate research students (MSc and PhD), research staff (postdoctoral researchers, CIs, PIs), and academic staff. The project is to run at the Centre for Digital Music (C4DM) at Queen Mary University of London (QMUL) from June 2012 to January 2013, in collaboration with the QMUL Learning Institute.
The immediate objectives of the SoDaMaT project are:
- to develop specific training material on data management planning for research projects, targeting research and academic staff in digital music and audio research;
- to develop training material covering the different aspects of research data management, including subject-specific topics such as music copyrights, for postgraduate students, research and academic staff in the area of digital music and audio;
- to collaborate with institutional partners at QMUL (The Learning Institute), other projects (SoundSoftware.ac.uk), and discipline-specific societies (Digital Music Research Network, International Society for Music Information Retrieval) to test the training material in postgraduate courses, workshops, and tutorials, and to collect feedback on its quality and impact;
- to collaborate with institutional partners at QMUL (Learning Institute; School of Electronic Engineering and Computer Science; IT Services) to embed the training material into postgraduate curricula and Continuous Professional Development courses to assure the long-term sustainability and generalisation of the project's results to other similar disciplines.
The requirements will be scoped, and the training materials will be trialled, within the Centre for Digital Music (C4DM), part of the School of Electronic Engineering and Computer Science at QMUL.
In addition to designing, producing, and evaluating discipline-specific training material, a wider objective is to promote good practice in research data management through education and awareness both within QMUL, and across UK and overseas research institutions in the digital music and audio area.
Background¶
A survey on data management practices among researchers and students at C4DM, conducted during the JISC-funded Sustainable Management of Digital Music Research Data (SMDMRD) project, showed very low awareness of the importance of research data management as part of the research workflow. Although many researchers organise their data in folders and perform semi-regular backups, the majority responded negatively to the specific question "Do you have a particular strategy for data management". Through our links with other groups via the EPSRC-funded Sound Software project (see Collaborations section), we have good reason to believe the situation is similar in many other music and audio research groups. The results of the survey point to the need for raising awareness of the benefits of research data management, such as a potential increase in citations, understanding and meeting data management requirements set by funding bodies (e.g. EPSRC), and producing sustainable and reproducible research.
The SMDMRD project defined a set of data management policies and created a pilot data management system for C4DM. This was a pioneering effort within QMUL, and a collaboration with the QMUL IT Services has recently been established to adapt the results of the SMDMRD project to define institutional policies and build an institutional research data repository. Policies can be used to raise awareness among research staff and students by imposing rules of conduct, and adherence to such policies is supported by tools like a data repository. Nevertheless, policies only give a general idea of why research data management is important, and the enthusiasm for using a data management system can easily fade if a culture of data management is not established. These facts point to the need for continuous, embedded, sustainable data management training, with a strong focus on promoting the benefits of research data management, ideally from the early stages of a researcher's career.
The Digital Music and Audio Researcher Profile¶
A wealth of material for training researchers in data management has been produced by previous JISC-funded projects such as Incremental and those in the RDMTrain programme. The Research Data Management Skills Support Initiative (DaMSSI), which collected and compared the results from discipline-specific data management training projects in the RDMTrain programme, concluded in its final report that "participants respond well to discipline-specific examples and the opportunity to discuss issues with tutors and others in similar disciplines" and that "a discipline-specific approach is more likely to engage students - in many cases principles are the same across disciplines but are more interesting to students if these principles can be seen in the students' own context". DaMSSI also produced three discipline-specific researcher profiles - in the social sciences, in clinical psychology, and in archaeology - and two generic data profiles - the conservator and the data manager. We believe that researchers at the Centre for Digital Music, and researchers in similar laboratories or institutions, do not fit the above-mentioned profiles.
The Centre for Digital Music (C4DM) at QMUL is one of the leading research centres in the field of audio and music technology and signal processing. C4DM makes use of a variety of data as research inputs - most obviously audio datasets - and produces a variety of types of data as research outputs. These outputs include:
- manually annotated feature data ("reference annotations") such as expert chord and key transcriptions of existing music recordings which are used as comparative data for evaluating research work;
- automatically produced annotations such as those accompanying the publication of methods for audio feature analysis.
The primary targets for the training material to be produced by the proposed project are postgraduate research students, and research and academic staff in C4DM, who perform research over a range of areas including music informatics, machine listening, audio engineering and interaction. A common use-case in C4DM research is to run a newly-developed analysis algorithm on a set of audio examples and evaluate the algorithm by comparing its output with that of a human annotator. Results are then compared with published results using the same input data to determine whether the newly proposed approach makes any improvement on the state of the art.
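The use case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not C4DM's actual tooling: the function name `label_accuracy`, the chord labels, and the baseline figure are all hypothetical, and real evaluations usually weight segments by duration rather than counting labels.

```python
def label_accuracy(estimated, reference):
    """Fraction of positions where the estimated label equals the reference annotation."""
    if len(estimated) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("reference annotation is empty")
    matches = sum(1 for est, ref in zip(estimated, reference) if est == ref)
    return matches / len(reference)


# Hypothetical per-beat chord labels for one recording.
reference_annotation = ["C", "C", "F", "G", "C", "Am", "F", "G"]  # human annotator
algorithm_output = ["C", "C", "F", "G", "C", "C", "F", "G7"]      # new algorithm

accuracy = label_accuracy(algorithm_output, reference_annotation)

# Hypothetical score of a previously published method on the same input data.
published_baseline = 0.70

print(f"accuracy = {accuracy:.2f}")
print("improves on published baseline:", accuracy > published_baseline)
```

Keeping the reference annotations, the algorithm output, and the published figures together with the audio identifiers is what makes such a comparison reproducible by others.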
The type of data used in digital music and audio research poses some challenges that need to be addressed in discipline-specific training material. These challenges include:
- Copyright: the copyright status of digital music data is often difficult to establish. For example, the owner of internally generated data might be unclear, or data purchased or downloaded from outside might have special license requirements that must be adhered to. As a result, researchers often refrain from publishing data in order to avoid unnecessary risk. Addressing this aspect in detail and emphasising the use of less restrictive licenses (e.g. Creative Commons, Open Data Commons) could lead to a larger amount of data being published in public repositories.
- Metadata: the line between data and metadata is often unclear. For example, descriptive metadata (e.g. a song's title, author, year of publication, or key) may be used as data in another context. The training material will focus on defining what data and metadata are, on the importance of metadata standards, and on their use, together with standard protocols such as OAI-PMH and SWORD, to exchange data among repositories.
- Ethical approval and participant agreement: experimental work based on human responses (e.g. perceptual listening tests) requires ethical approval. The lack of information and experience on this topic leads people to write ethics forms that prohibit the release of data, preventing other researchers from reproducing or extending their results, when data could be safely released with the participants' consent if anonymised. The material will include information on how ethical approval works, how to obtain it, and information about publication of sensitive data.
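As an illustration of the metadata exchange mentioned under "Metadata", the sketch below builds an OAI-PMH harvesting request. `ListRecords` and the `oai_dc` (Dublin Core) metadata prefix are part of the OAI-PMH protocol. The repository address and the helper name `build_oai_request` are assumptions made for this example only.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb, and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

# Ask the repository for all records, described in unqualified Dublin Core.
url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

Fetching this URL from a repository that supports OAI-PMH would return an XML document containing the metadata records, which another repository or a script can then parse.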
In addition to the recommendations from DaMSSI, the need for specific training material for digital music and audio researchers is justified by at least two additional factors. First, most of the researchers are either computer scientists or electrical engineers and have advanced IT skills. Second, the data is very heterogeneous, rapidly changing, and relatively small in size. As a result, it is usually managed by its creator. Thus, the clear separation between the data creator and the data manager/librarian/scientist, pointed out by the profiles produced by DaMSSI as well as by Pryor and Donnelly (2009, p. 165), becomes blurred: all the different aspects can be, and often are, taken care of by the same person.
Evaluation¶
Close attention will be paid to evaluating the quality of the training material and its impact on research practice. By taking advantage of the established collaborations, the material will be tested in different situations, including postgraduate courses, internal and external seminars and workshops, and tutorials at international conferences. The International Society for Music Information Retrieval (ISMIR) fosters the exchange of ideas among members whose activities, though diverse, stem from a common interest in music information retrieval. A tutorial proposal has been submitted in collaboration with the Sound Software project to the 2012 ISMIR conference (8-12 October in Porto, Portugal). A tutorial proposal will also be submitted to DAFx-12 (Digital Audio Effects conference, 17-21 September in York).
The QMUL Learning Institute will provide support and know-how in evaluation methodologies and analysis.
Feedback will be collected using:
- anonymous questionnaires after the tutorials/workshops, tailored to the specific audience;
- online questionnaires;
- standard course evaluation for postgraduate modules;
- focus groups interviewed a few months after the training to establish the longer-term impact of the training.
The feedback will be used to iteratively improve the material. Revised versions of all training materials will be available by the end of the project.
Sustainability¶
We aim to achieve sustainability in the longer term both in the digital music and audio research community, and within QMUL. Our goals are:
- to make discipline-specific training sustainable in the digital music and audio research community. Awareness will be raised by presenting the material in collaboration with the Sound Software project at similar UK research institutions, and at discipline-specific conferences (ISMIR and DAFx). Training material will be made available for reuse through the Jorum repository.
- to set an example within QMUL. The project will be used as an example by the QMUL Learning Institute, the School of Electronic Engineering and Computer Science, and the IT Services to expand the data management training to other disciplines by adapting the material and methodologies, starting from related research areas such as Signal Processing, and more generally Electronic Engineering and Computer Science. Data management training will be integrated in postgraduate curricula: every PhD student is expected to take part in approximately 210 hours of development activities (including research methods courses) over the course of their studies and the points gained are mapped against the four domains of the Vitae/RCUK Researcher Development Framework. Material for Continuous Professional Development courses for research and academic staff will also be adapted to other disciplines, and all face-to-face training will be complemented by online training material.
Workplan¶
The work of the project is divided into four work packages (WP):
- WP1 Training Material Design
- WP1.1 Research Of Available Resources
- WP1.2 Online Training Material
- WP1.3 Research Staff Material (see ISMIR 2012 Presentation)
- WP1.4 Post-Graduate Course Material (see Software Carpentry Presentation (February 2013))
- WP2 Test and evaluation
- WP3 Embedding
- WP4 Communication and Management
An overview of the intended content of the work packages is given in the Workplan section below.
Training the Trainers¶
Additional Notes¶
References¶
Pryor, G. and Donnelly, M. (2009). Skilling up to do data: whose role, whose responsibility, whose career? The International Journal of Digital Curation, 4(2), pp. 158-170.
Research Data Management Skills Support Initiative (DaMSSI) final report
Workplan¶
The work of the project is divided into four work packages (WP).
WP1 Training Material Design¶
Although the basic principles of data management are valid for postgraduate students as well as for research and academic staff, we decided to make a distinction between the two groups (WP1.3 and WP1.4): a PhD student starting on their project and a PI writing a grant proposal might want to focus on different aspects of data management. The online material (WP1.2) will cover all aspects and be relevant to both groups.
WP1.1 Research Of Available Resources¶
Results from previous projects (e.g. JISC RDMTrain programme, Research Data Management Skills Support Initiative (DaMSSI), Incremental), as well as available material from the DCC and other institutions, will be studied and evaluated. Disciplines will be compared, and the parts of the available material that need to be adapted to appeal to researchers in the area of digital music and audio research will be identified. In order to integrate the material into the Vitae/RCUK Researcher Development Framework, used to assign credits by the QMUL Learning Institute, the recently released "Information-handling Lens" will also be analysed.
WP1.2 Online Training Material¶
The Incremental project recommends in its final report (page 21) that institutions "create a collection of webpages to help researchers find tools and assistance". Examples will include FAQs, fact-sheets, online step-by-step guides (e.g. on creating a data management plan for PIs writing a project proposal), and short instructional videos (e.g. on how to deposit a data set into a repository, from metadata collection to choosing a license). The online material will target both new members of staff who could not participate in face-to-face training, and those who need quick reference material or want to learn in greater depth after a seminar. It will also contain information on where to get help for different problems (e.g. copyrights, technical) inside the institution. The online material will be prepared first because it should already be in place when face-to-face training is given.
The online materials have been prepared in the form of a wiki, and are part of this site.
WP1.3 Research Staff Material¶
Material will be designed that targets research and academic staff involved in funded research projects, although the basic principles will be relevant to students as well. Experience from the Sound Software project showed that different material is useful at different stages of a project. We will thus create a range of training materials to cover some of these stages, to be presented in different formats (e.g. short seminars, tutorials, workshops), and to be complemented by online material. Examples include, but are not limited to:
- a five-minute "executive" pitch on the benefits of data management;
- hands-on workshops for CIs and PIs on data management planning for research projects;
- conference tutorials giving an overview of research data management;
- material for short seminars with in-depth analysis of single aspects of data management such as available tools, policies, and discipline-specific challenges.
This material will be presented at internal seminars, discipline-specific conferences and, in collaboration with the Sound Software project, at other institutions in the UK working in the area of digital music and audio research.
WP1.4 Post-Graduate Course Material¶
Discipline-focused material for face-to-face training sessions will be designed. The material will cover the basics of good data management practice, point out its benefits, and touch on discipline-specific challenges such as copyrights and licenses, with discipline-specific examples based on the recently developed C4DM Data Management System. Also, as suggested by the DaMSSI project final report (Conclusions, page 15, paragraph 5), the students will be instructed to create a Data Management Plan for their PhD projects, to be included in their Research Proposal. The material should be sufficient to cover at most one or two sessions in a module. For more in-depth study, the students will then be referred to the online material. The material will be tested first with postgraduate students at C4DM, and then at other research groups in the Digital Music Research Network.
WP1 Deliverables¶
- D1.1 Summary and analysis of material already available.
- D1.2 First draft of the online material.
- D1.3 First draft of the research staff material.
- D1.4 First draft of the postgraduate course material.
- D1.5 Updated version of the research staff material.
- D1.6 Updated version of the online reference material.
- D1.7 Updated version of the postgraduate course material.
WP2 Test and evaluation¶
WP2.1 Evaluation strategies design¶
With the support of the QMUL Learning Institute, and building on its prior experience from other projects, workshop questionnaires will be developed to evaluate the effectiveness of the training material and its delivery. The feedback obtained will be used to inform future activities.
WP2.2 Feedback collection and analysis - online material¶
The online material will be released as early as possible during the project. Continuous online evaluation will be used to collect feedback and make the appropriate changes.
WP2.3 Feedback collection and analysis - research staff material¶
The workshop material will be tested at various institutions across the UK, and at the ISMIR 2012 conference, where questionnaires will be handed out at the end of each session.
WP2.4 Feedback collection and analysis - postgraduate course material¶
Feedback for the postgraduate course material will be collected through the standard course evaluation procedures in place at QMUL.
WP2 Deliverables¶
- D2.1 Questionnaires for evaluating training material (all types).
- D2.2 Summary of the collected feedback for the online material and recommendations for improvement.
- D2.3 Summary of the collected feedback for the research staff material and recommendations for improvement.
- D2.4 Summary of the collected feedback for the postgraduate course material and recommendations for improvement.
WP3 Embedding¶
This work package organised the various workshops, courses and seminars in collaboration with the partners.
- Sound Software 2012 presentation
- DAFx 2012 presentation
- ISMIR 2012 Presentation
- JISCMRD Training Projects Launch Workshop presentation
WP3 Deliverables¶
- D3.1 Final report on embedding.
WP4 Communication and Management¶
WP4.1 Project management¶
The project will be managed on a day-to-day basis by the PI, with project meetings held weekly to assess progress and problems. This has been our practice throughout the Sound Software project and previous JISC-funded projects. The CIs will participate in the management process to ensure compatibility and continuity with the requirements of the Sound Software project from a management and technical perspective respectively.
WP4.2 Dissemination¶
The project results will be disseminated through blog posts, Twitter, and official reports on the project's website. Results will also be presented at discipline-specific conferences (ISMIR, DAFx), and to other similar UK-based research institutions via the partnership with the Sound Software project.
- SoDaMaT blog
- Sound Software 2012 presentation
- DAFx 2012 presentation
- ISMIR 2012 Presentation
- JISCMRD Training Projects Launch Workshop presentation
WP4 Deliverables¶
- D4.1 Project site and feed.
- D4.2 Final report and publication of the material in the Jorum repository.
References¶
WP1.1 Research Of Available Resources¶
- Previous JISC Projects
- Vitae Researcher Development Framework
- DCC and Other Institutions
- Legislation
- Resources For Learning Materials
- QMUL resources for e-Learning
- Links
Results from previous projects (e.g. JISC RDMTrain programme, Research Data Management Skills Support Initiative (DaMSSI), Incremental), as well as available material from the DCC and other institutions, will be studied and evaluated. Disciplines will be compared, and the parts of the available material that need to be adapted to appeal to researchers in the area of digital music and audio research will be identified. In order to integrate the material into the Vitae/RCUK Researcher Development Framework, used to assign credits by the QMUL Learning Institute, the recently released "Information-handling Lens" will also be analysed.
Previous JISC Projects with Data Management Training Outputs¶
Many materials relating to data management are available through Jorum, including audio interviews, PowerPoint presentations, factsheets, videos and more. Many of these are outputs of previous JISC-funded projects, and we consider some of those here.
The JISC RDMTrain programme funded five discipline-specific research data management training projects in 2010-2011.
Two projects produced online courses:
- Project CAIRO - Managing Creative Arts Research Data (4 short units)
- MANTRA - for geosciences, social and political sciences and clinical psychology (a very detailed self-guided course)
The other three produced materials for taught sessions:
- DATUM for Health - 3 sessions
- DMTpsych - 6 sessions
- DataTrain - different versions for archaeology (4 sessions) and social anthropology (3 modules targeted at different audiences)
Further materials are available on Jorum:
- Three slideshows at varying levels of detail including materials targeted at post-doc researchers (Jorum)
- Research Data Management Factsheet (Jorum)
- Research Information Management Guides (Jorum)
- Research Information Management: Organising Humanities Material (Jorum)
- Research Information Management: Tools for the Humanities (Jorum)
The Incremental project at Glasgow and Cambridge was also part of the JISC Research Data Management Infrastructure programme. Incremental aimed to develop a data management infrastructure by examining existing practices and requirements at the institutions, piloting tools and services to enable data management (examples of proposed outputs included "templates, training, best practice guidelines, and policy") and embedding those outputs within the institutions. In addition, the project aimed to disseminate the results to the wider research community. During the course of the project many training resources were produced and these have been published on Jorum. We provide a summary of some resources on our page on Incremental.
Vitae Researcher Development Framework¶
The Vitae Researcher Development Framework (RDF) categorises the knowledge, behaviours and attributes of researchers and uses this as a foundation to guide the development of researcher skills.
In April 2012, Vitae published an information literacy component for the RDF.
Information literacy is an umbrella term which encompasses concepts such as digital, visual and media literacies, academic literacy, information handling, information skills, data curation and data management. Interacting with information is at the very heart of research and informed researchers are both consumers and producers of information.
The RDF component included an information literacy lens - mapping information literacy skills onto the RDF researcher model - and an Informed Researcher Booklet giving guidelines to researchers on evaluating and improving their information literacy.
The RDF Information Literacy lens is largely based on the Society of College, National and University Libraries (SCONUL) 7 Pillars Of Information Literacy.
Other Vitae RDF Lenses and the more general SCONUL 7 Pillars Of Wisdom may also be of interest.
JISC and the Research Information Network (RIN) co-funded the Research Data Management Skills Support Initiative (DaMSSI) at the DCC. This aimed to examine how the Vitae RDF and the SCONUL 7 Pillars Of Information Literacy could be used to improve the planning of data management training, contributing to the development of the Vitae Information Literacy Lens and Informed Researcher Booklet, above.
The DaMSSI outputs were:
- DaMSSI project plan
- JISC RDMTrain projects mapped against the RDF
- JISC RDMTrain projects mapped to the Digital Curation Lifecycle Model
- Career profiles
- DaMSSI final report
Of particular interest to the current project are the mappings of previous RDM training projects onto the RDF and the Digital Curation Lifecycle Model.
DCC and Other Institutions¶
Several UK universities have published materials relating to data management and, in particular, data management training.
In addition, universities in other countries have also published materials:
- Carleton
- Columbia
- Melbourne / Melbourne - RDM for Researchers
- Dartmouth - online course
- MIT
- New Hampshire
- Responsible Conduct In Data Management (from US Office Of Research Integrity / U. of Illinois)
Although the legal and funder requirements for these organisations will differ from the UK situation, the underlying principles for data management are still the same.
Some UK research councils have published policies regarding data management, data sharing and data curation:
- the Engineering and Physical Sciences Research Council (EPSRC) have a published policy framework for research data
- the Medical Research Council (MRC) have guidelines on data sharing including their data policy and information on data management plans
- the Economic and Social Research Council (ESRC) also have a research data policy
- although the Arts and Humanities Research Council (AHRC) web pages don't include data management information, their Funding Guide (PDF downloadable from AHRC web-site) does specify requirements for research data publication.
We are in the process of summarising the main Research Council Requirements.
Other organisations have also produced materials related to data management training:
- the Digital Curation Centre (DCC) provide information, training and other resources for the digital curation lifecycle
- the UK Data Archive have published best practices for creating, preparing, storing and sharing data
- JISC Digital Media provides information on using digital media including specific guides to managing still images, moving images and audio as well as cross-media digital media use.
- the World Agroforestry Centre created a one-week course on Research Data Management, and the manual for this (published 2002) is available from the University of Reading Statistical Services Centre
- Ecosystem Services for Poverty Alleviation (ESPA)
- British Library Preservation Advisory Centre (PAC)
- NCDCR Electronic Records documentation - includes file naming
- the Research Information Network (RIN) provide information on data management and curation
- the DataONE distributed framework for environmental sciences has PowerPoint training materials on data management
- Data Intelligence for Librarians
Legislation¶
Resources For Learning Materials¶
QMUL resources for e-Learning¶
Resources available at QMUL for e-Learning include:
- Moodle - a Course Management System (CMS), also known as a Learning Management System (LMS) or a Virtual Learning Environment (VLE).
- Mahara - an open-source e-portfolio system
- Articulate for developing online/e-Learning materials
- qReview lecture capture system
- Adobe Connect web-conferencing
- Bristol Online Surveys (QMUL) for developing surveys
Links¶
Doctoral Training Centres as catalysts for research data management
RDM training for Postgraduates and Doctoral Training Centres
Open Exeter PGR Workshop on Data Management
DataTrain¶
DataTrain for archaeology and for social anthropology
Modules are available on Jorum:
Archaeology¶
Licensed CC-BY-NC-SA
Structure of course:
Modules:
- Creating and managing research data in archaeology: an overview
- Data lifecycles and management plans
- Working with digital data
- Rights and digital data
- E-Theses and supplementary digital data
- Archiving digital data
- Post-Graduate data management plans
- Project and professional data: data management on post-doctoral research projects and beyond
The teaching modules were run as a trial course in March 2011, as part of a post-graduate course in Digital Skills for Dissertation and Publications, Department of Archaeology, University of Cambridge. The data management course comprised 4 x 2 hour sessions:
- Creating and Managing Data
  - Defining post-graduate research data
- Working with Digital Data
  - File structure, naming, and formats
  - E-theses and supplementary digital data
  - Post-Graduate Data Management Plans
- Project and Professional Data
  - Data management for larger research projects
- Archiving and Re-using Data
  - Depositing digital data
  - Intellectual Property Rights and research data
The slides and notes have been kept as simple and as straightforward as possible. They are not meant to be exhaustive in the information they contain. Rather, they provide an overview of the general issues regarding data management.
Each module has been designed to take approximately 30 minutes to complete. Six of the eight presentations have between 10 and 16 slides (including front title and end acknowledgement slides). The two longer modules are Module 3: Working with Digital Data; and Module 8: Project and Professional Data.
Module 3 (Working with Digital Data) has 38 slides, many of which contain a lot of information on different file types and formats. This information has been summarised from the Archaeology Data Service’s Guides to Good Practice, and content most relevant to post-graduate students is presented in a straightforward way. Rather than spending an hour presenting Module 3 in detail (and boring the students to death), it is suggested that the slides be presented as a ‘lightning tour’ of the practical issues of working with digital data. The slides can then be made available for future reference by the students as a handout.
Module 8 (Project and Professional Data) provides an introduction to data management at a higher level of research, including writing AHRC Technical Appendices. While this can be run as a stand-alone session, given that this is the desired career path of many doctoral students, and the fact that many doctoral students carry out their research as part of larger projects, the aim of the module is to round off the post-graduate course by looking forward beyond the submission of a PhD Thesis.
Comments regarding discipline-specific nature (from notes for part 1 of course):
Can archaeology be considered in any way a special case in terms of how we create, manage, and archive digital data?
The simple answer is no. The issues of how best to manage digital data and safeguard its preservation in the long term are broadly the same across all disciplines.
The same goes for individual archaeological projects. Even though some might think that their own project is a special case in terms of complicated digital data, or for the fact that they will produce very little in the way of digital data, at the heart of it, the same issues apply, just on a larger or smaller scale.
A key issue which does vary from discipline to discipline is that of what are private data and what are public data. This does arise in archaeology particularly in regard to sensitive data of site or artefact locations, or sensitive personal data collected during the course of a research project.
What perhaps sets archaeology apart from other disciplines is the appreciation of the historical significance of what we do. And the fact that very often, the practice of archaeology is a destructive process and the physical and digital data obtained represent a unique archive – an experiment that cannot be repeated.
However... primary data is often paper-based. Notes, sketches etc.
One area of discipline specificity is the selection of bodies that provide definitions of good practice and/or archiving facilities (e.g. Archaeology Data Service). Who are these for digital audio research ? AES ? JASA ? ISMIR ? IEEE ? Others ?
Includes details of copyright terms for 8 types of creative works: Literary; Artistic; Sound; Typographic; Broadcasts; Dramatic; Film; and Musical.
For post-grad students, e-Theses are covered. Publishing a digital copy of a thesis makes it "published" and means that all copyright details need to be ironed out.
Part 8 is largely related to resources (arch. specific).
Social Anthropology¶
A different approach...- Basic module - aimed at pre-fieldwork PhD students, fundamentals
- Advanced module - metadata, ethics, IPR, FoI, data protection, tools
- Writing-up module - for PhD students and early stage researchers, includes info on long-term archiving
Can be combined to produce a 1-day course.
Mentions reference management. Line between Research Data Management and Data Management ?
Lots of info. on data capture - digitizing data.
Points to an interesting list of formats from the UK Data Archive (http://www.data-archive.ac.uk)
Posting things on CDs/DVDs might be a good idea for infrequent sharing of large amounts of data. Beware of security issues, which can be sidestepped by encryption (more later); and of decay/damage.
In the Advanced module, examples are drawn from the discipline.
DATUM for Health¶
Comprises 3 sessions:
- Session 1: Introduction To Data Management
- Session 2: Data Curation Lifecycle
- Session 3: Problems and practical strategies and solutions
- Data For Life - Digital Preservation for Health Sciences
Downloadable from Jorum (CC-BY-NC-SA)
Session 1: Introduction To Data Management (Northumbria)¶
- What is research data ?
- Where is your research data ?
- Why manage research data
- a requirement
- to work effectively & efficiently
- to protect it
- for use and/or re-use
- to share it
- for preservation
- because it is good research practice
- How to manage research data
- The research data lifecycle
- Plan / Create / Analyse / Preserve / Share /Use (and repeat...)
- Creating a DMP
Session 2: Data Curation Lifecycle (Northumbria)¶
- What is data curation ?
- Why curate ?
- Requirements
- Rewards
- DCC Data Curation Lifecycle Model
- Conceptualise - planning
- Create - collection & analysis
- Appraise - selection
- Ingest - transferring to a custodian
- Preserve - keeping data over time
- Store - keeping data safe
- Access - finding data
- Transform - generating new data
Session 3: Problems and practical strategies and solutions (Northumbria)¶
- What problems are there ?
- Conflicting considerations
- Resource issues
- anything else ?
- Conflicts
- Confidentiality and sharing
- personal and sensitive data - anonymisation, consent
- Data security and storage
- File and folder names
- Locations
- Email is not secure
- Physical security - destroy USB sticks, shred documents
- Metadata
DMTpsych¶
Postgraduate training for research data management in the psychological sciences
DMTpsych built upon existing research data management materials developed by the Digital Curation Centre (DCC) to create discipline-focused postgraduate training materials that can be embedded into postgraduate research training in the psychological sciences. The materials produced consist of:
- PowerPoint slides to be used in taught research methods courses
- Workbook containing psychology specific guidance on completing the DCC’s Online Data Management Planning Tool (including worked examples)
- A paper copy of the DMPT to be completed by students (actually at DCC)
The lectures are structured thematically to match the existing DCC DMPT with the eight key sections forming the centrepiece of six psychology specific lectures and round table discussions.
Deliverables online
Material available for:
- Overview
- 1. Historical and Conceptual Issues and Best Practice
- 2. Introduction and context to psychology-specific DMPT
- 3a. Access, data sharing and re-use; Legal and ethical issues
Good detail on Data Protection and FoI. Less good on IPR.
- 3b. Data standards and capture methods
- 4. Short-term storage and data management; Deposit and long-term preservation
- 5. Resourcing; Adherence, review and long-term management
- 6. Completion of your own Data Management Plan
- Informed Consent Form
Licensed CC-BY-NC:
This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.
Written from a psychology perspective... but the content isn't particularly psych.
Incremental¶
This project will build on earlier work by HATII and the DCC to support research data management. It will analyse needs at Glasgow and Cambridge across a number of different disciplines; propose a range of tools or services to address those needs; and develop, adapt and pilot these within each institution. Outputs will then be further adapted and prepared for embedding in local infrastructures and wider dissemination via the Digital Curation Centre, Digital Preservation Coalition, and JISC. The project intends to focus on the provision of softer infrastructure (e.g. templates, training, best practice guidelines, and policy).
Includes multimedia files (audio, video)
Funded by JISC 2010-2011
From Jorum (largely CC-BY-NC-SA)¶
- Re-use, sharing, and archiving sensitive research data: a practical overview - slideshow (Jorum)
- How data centres and repositories can help with research data management (Jorum)
- University of Glasgow: bidding for grant funding workflow (Jorum)
- University of Cambridge: bidding for funding workflow (Jorum)
- The university ethics process and how it impacts on making creative work (Jorum)
- The benefits of sharing research data (Jorum)
- Managing music data (Jorum)
- Managing multimedia research data (Jorum)
- Working with digital media files (Jorum)
- Archiving sensitive research data (Jorum)
- Managing sensitive data in performing arts - narrated slideshow (Jorum)
- Re-use, sharing, and archiving sensitive research data: a practical overview - slideshow (Jorum)
- Intellectual Property Rights and Research Data: Focus on copyright - narrated slideshow (Jorum)
- Who owns IPR? - flowchart (Jorum)
- Intellectual property rights (IPR) and the creation and use research materials (Jorum)
- Intellectual property rights and University of Cambridge: Focus on patents and commercialisation - narrated slideshow (Jorum)
- How the Freedom of Information Act (FOI) applies to research data (Jorum)
- FAQ for Freedom of Information and Environmental Information Requests for Research Data - narrated slideshow (Jorum)
- Using the UK Freedom of Information Act: A practical guide for researchers - narrated slideshow (Jorum)
- What options do researchers have when asked to release their data by an FOI request? (Jorum)
- UK research funders' data policies (Jorum)
- Organising files and folders (Jorum)
- Adding metadata to Microsoft Office documents (Jorum)
- Choosing the right digital storage media for you (Jorum)
- Selecting which data to keep (Jorum)
- Common Image Formats (Jorum)
- Selecting which data to keep at University of Glasgow (Jorum)
- Version control across devices (Jorum)
Incremental Project¶
Content produced by the Incremental project is released under Creative Commons licence BY-NC-SA
- Create
- Organise
- Access
- Look After
MANTRA¶
geosciences, social and political sciences and clinical psychology
This course is an Open Educational Resource that may be freely used by anyone.
It is available through an open license for re-using, rebranding, repurposing.
You are free to re-use part or all of this work elsewhere, with or without modification. In order to comply with the attribution requirements of the Creative Commons license (CC-BY), we request that you cite:
- the author/creator: EDINA and Data Library, University of Edinburgh
- the title of the work: Research Data MANTRA [online course]
- the URL where the original work can be found: http://datalib.edina.ac.uk/mantra
Downloadable from Jorum (DSpace!)
Structure:
- Introduction
- Research Data Explained: in depth discussion of what data is and types of data
- Data Management Plans
- Organising Data: File naming and storing
- File Formats
- Documentation and Metadata
- Storage and Security
- Data Protection, Rights and Access (in development / not available)
- Preservation, Sharing and Licensing (in development / not available)
- Recommended Resources
Includes videos. Approx. 20 slides per topic (13-30).
Appears to have been built using Xerte
Project CAIRO - Managing Creative Arts Research Data¶
Online course materials consisting of four units.
Downloadable from Jorum:
- Narrated slideshow (CC-BY-NC-SA)
- Managing Creative Arts Research Data post graduate module (CC-BY-NC-SA)
Introduction to Research Data Management¶
Examines:
- challenges in evidencing art as research
- identifying users
- identifying threats to data
- benefits of retaining data in a usable form
Presents a workflow for arts data management:
Planning -> Creating -> Shaping -> Long-term management
- Who's it for ?
- What documentation is required ?
- Are there stipulations on data management (timescales, repositories, publish, sensitivity) ?
- Is assessment required ? How do we enable it ?
- Are there guidelines we should follow (e.g. institutional) ?
- If we will publish data, do repositories have requirements for formats ?
- collecting permissions as required
- documenting data
- considering file formats
- backups
- selection of data
- extending metadata
- use of sustainable file formats
Long-term management is after-the-research management of data - NB: the nature of this means that it will involve handing data over to a long-term archive. Occasional activities required (e.g. changing file formats)
AHRC rules are that data needs to be kept for 3 years after a project concludes. (see Research Council Requirements)
Creating Research Data¶
Focuses on actions before data is created:
- Planning
- Funder's expectations
- Copyright (researcher as user of copyright material as opposed to as a producer of it). Includes discussion of fair dealing and notes that "This exception does not apply to the copyright in films, sound recordings, broadcasts or typographical arrangements" (see also Legislation)
- Permissions e.g. model release
- Data Protection
HE and FE institutions should ensure that [...] employees and students are aware that, while some exemptions are granted for the use of personal data for research purposes, the majority of the Data Protection Principles must still be conformed to — there is no blanket exemption.
(JISC Data Protection Code of Practice for the HE and FE Sectors (2001))
The simplest way to deal with DPA is to remove personal information. Anonymised data doesn't come under the DPA. So consider carefully whether any personal details in data add to its usefulness or could be removed
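As a rough illustration of removing personal details before data is stored or shared, the sketch below drops directly identifying fields from a participant record. The field names and the `anonymise` helper are hypothetical, chosen only for this example. Whether the remaining fields could still identify someone is a judgement the code cannot make for you.

```python
# Hypothetical field names: adjust to match the columns in your own data.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def anonymise(record, drop=DIRECT_IDENTIFIERS):
    """Return a copy of `record` without the directly identifying fields.

    The original record is left unchanged, so the identifiable version
    can be kept separately (or destroyed) as the project requires.
    """
    return {key: value for key, value in record.items() if key not in drop}


participant = {
    "name": "A. Performer",
    "email": "performer@example.org",
    "instrument": "violin",
    "years_playing": 12,
}
shareable = anonymise(participant)
```

Here `shareable` keeps only the `instrument` and `years_playing` fields.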
Managing Research Data¶
- Understand some of the criteria for retaining research data
- Self-archiving research data
- Understanding file formats
- Appreciate the differences between open and proprietary technologies
- Recognise the importance of metadata
- Identify some long-term challenges relating to research data
- Longevity
- Integrity
- Accessibility
- Use of refreshing (same format, new media) and migration (new format) to overcome these.
- Identifying significant properties that should be preserved for the data
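One way to check integrity after refreshing (same format, new media) is to compare checksums of the files before and after the copy. The sketch below is a minimal example using SHA-256 from the Python standard library. The function names are illustrative and are not taken from the CAIRO materials. It does not apply to migration, where the file contents are expected to change.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files from the old copy that are missing or altered in the new copy."""
    return sorted(
        name
        for name, checksum in old_manifest.items()
        if new_manifest.get(name) != checksum
    )
```

A manifest built before the data is moved can be stored alongside it and compared with a manifest of the new copy. An empty result from `changed_files` suggests the refresh preserved every file.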
Delivering Research Data¶
Identifying issues that come to light after the creation of research data, and overcoming those issues.
- Identify different ways to share research data
- Understand the role of an institutional repository
- Appreciate why some arts researchers use social platforms to deliver work
- Recognise the importance of attribution and copyright permissions
WP2.1 Evaluation Strategy Design¶
In order to evaluate the (training) materials, it is necessary to:
- identify specific (learning) objectives which they aim to meet
- evaluate whether the materials meet those objectives
Additionally, we need to:
- identify the overall purposes of the materials
- evaluate whether the cumulative objectives satisfy these overall purposes
In order to produce the best possible materials, it is necessary to evaluate and then revise the materials. Initial evaluation of the materials will take place once a first draft has been created, but before they are used in training. This will concentrate on the suitability and level of the content. After the initial evaluation and update, the materials will be used in training courses and begin an ongoing series of formative and summative evaluations (i.e. evaluations during and after the training). These evaluations will apply Kirkpatrick's four-level evaluation model [1].
Methods of Evaluation¶
Design review
Pre-course (evaluation of materials)
- Cloze Test to assess readability
- I-Tech Instructional Design & Materials Evaluation Form and Guidelines
- Focus Groups (I-Tech guide) e.g. some of C4DM
- Internal use of materials i.e. C4DM
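For the Cloze Test listed above, a passage from the training material has words removed at regular intervals and readers are asked to fill in the gaps. The sketch below is one possible way to generate and score such a test. Blanking every fifth word and exact-word scoring are assumptions made for the example, not choices recorded in this plan.

```python
def make_cloze(text, every=5, blank="_____"):
    """Blank out every `every`-th word; return the gapped text and the answers.

    Words are split on whitespace, so punctuation stays attached to the
    word it follows.
    """
    words = text.split()
    answers = []
    for index in range(every - 1, len(words), every):
        answers.append(words[index])
        words[index] = blank
    return " ".join(words), answers


def score_cloze(answers, responses):
    """Fraction of blanks filled with the original word (ignoring case)."""
    if not answers:
        return 0.0
    correct = sum(
        given.strip().lower() == expected.strip().lower()
        for expected, given in zip(answers, responses)
    )
    return correct / len(answers)
```

Missing responses simply count as incorrect, because the score is divided by the number of blanks rather than the number of responses.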
On-going evaluation of course
- Informal / Formal Review e.g.:
- Questionnaire to see how easy it is to find relevant material / test users' knowledge
- Focus Groups (I-Tech guide)
- In-course / formative evaluation (see I-Tech)
- Assessment of level of knowledge within training group
- Checking progress with participants
- Trainer assessment - self assessment and from other trainers if possible
- Pre- and Post-course questionnaires - assess change in answers (true/false + multiple-choice)
- Post-course / summative evaluation
- Debriefing of trainer (did it work ? to time ? did it engage people ?)
- Questionnaire for participants (see Sample training evaluation forms)
- Medium-term review of usefulness of course content / adoption of techniques (e.g. 2-3 months after course)
- Reaction - to the course ("motivation" may be more appropriate)
- Pacing, was it enjoyable
- Learning - from the course
- Did the facts get across ?
- Behavior - changes after the course
- Did participants actually manage their data better
- during research ?
- at the end of research ?
- Have data management plans been produced for grant proposals ?
- Results or Impact - to a wider community
- Did they publish data ?
- Was any data loss avoided ?
- reading level
- correctness
- organization
- ease of use
based on the target audience
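The pre- and post-course questionnaires mentioned above can be compared by marking both against the same answer key and looking at the change per participant. The sketch below assumes answers are held as dictionaries keyed by question identifier. That layout, and the function names, are illustrative only.

```python
from statistics import mean


def count_correct(answers, answer_key):
    """Number of questions answered correctly; unanswered questions score zero."""
    return sum(
        answers.get(question) == correct
        for question, correct in answer_key.items()
    )


def score_changes(pre, post, answer_key):
    """Post-course minus pre-course score, for participants who completed both."""
    return {
        participant: count_correct(post[participant], answer_key)
        - count_correct(pre[participant], answer_key)
        for participant in pre
        if participant in post
    }


def mean_change(changes):
    """Average change in score across participants (0.0 if there are none)."""
    return mean(changes.values()) if changes else 0.0
```

A positive mean change is only a rough indication of the "Learning" level. It says nothing about later behaviour, which is what the medium-term review is for.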
Tools and links¶
Instructional System Design approach to training
Free Management Library - Evaluating Training and Results
Lingualinks Implement A Literacy Program - Evaluating Training
Training Works!... ...what you need to know about managing, designing, delivering, and evaluating group-based training
References¶
[1] Kirkpatrick, D. L. (1959). Techniques for evaluating training programs. Journal of the American Society of Training Directors, 13, 3–9.
[2] Kirkpatrick, D. L. (1976). Evaluation of training. In R. L. Craig (Ed.), Training and development handbook: A guide to human resource development (2nd ed., pp. 301–319). New York: McGraw-Hill
Sound Data Management Training¶
Managing research data is basic good practice. It ensures your research data is available to complete the project, reducing risk in the project; and preserves your research for future use after the project is complete, increasing the impact of the project. In addition, good research data management will ensure that: you comply with funder and institutional requirements; and consider the ethical and legal implications related to your research data.
There are many counter-examples showing that poor research data management can result in lost research. Additionally, there are the success stories where good research data management has allowed research to continue after disasters.
We consider three stages of a research project, and the appropriate research data management considerations for each of those stages. The stages are: before the research; during the research; and at the end of the research. In addition, we consider the responsibilities of a Principal Investigator regarding data management.
There is also an alternate view of the content based on individual research data management skills and a summary of data management resources available to C4DM researchers.
These online materials are an output of the JISC-funded Sound Data Management Training (SoDaMaT) project.
Before The Research - Planning Research Data Management¶
A data management plan is an opportunity to think about the resources that will be required during the lifetime of the research project and to make sure that any necessary resources will be available for the project. In addition, it is likely that some form of data management plan will be required as part of a grant proposal.
The main questions the plan will cover are:
- What type of storage do you require ?
Do you need a lot of local disk space to store copies of standard datasets ? Will you be creating data which should be deposited in a long-term archive, or published online ? How will you back up your data ?
- How much storage do you require ?
Does it fit within the standard allocation for backed-up storage ?
- How long will you require the storage for ?
Is data being archived or published ? Does your funder require data publication ?
- How will this storage be provided ?
- the types of data you will be using and creating;
- available existing data management resources;
- funder requirements;
- and relevant policies (e.g. research group, institutional).
- What is the appropriate license under which to publish data ?
- Are there any ethical concerns relating to data management e.g. identifiable participants ?
- Does your research data management plan comply with relevant legislation ?
e.g. Data Protection, Intellectual Property and Freedom of Information
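For the "how much storage" question, a first estimate for uncompressed PCM audio follows directly from the recording format: bytes = duration in seconds x sample rate x bytes per sample x channels. The sketch below applies that arithmetic. The default format (44.1 kHz, 16-bit, stereo) and the number of copies are assumptions for illustration. File headers are ignored, and compressed formats will need less space.

```python
def pcm_bytes(duration_seconds, sample_rate=44100, bit_depth=16, channels=2):
    """Size in bytes of uncompressed PCM audio (headers ignored)."""
    return int(duration_seconds * sample_rate * (bit_depth // 8) * channels)


def storage_estimate(total_hours, copies=2, **audio_format):
    """Bytes needed for `total_hours` of audio kept in `copies` places.

    `copies` counts the working copy plus any backups; extra keyword
    arguments (sample_rate, bit_depth, channels) are passed to pcm_bytes.
    """
    return pcm_bytes(total_hours * 3600, **audio_format) * copies


# One hour of CD-quality stereo audio, in gigabytes (10**9 bytes).
one_hour_gb = pcm_bytes(3600) / 1e9
```

One hour at the default format comes to 635,040,000 bytes, so roughly 0.64 GB per hour per copy.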
A minimal data management plan for a project using standard C4DM/QMUL facilities could say:
During the project, data will be created locally on researchers' machines and will be backed up to the QMUL network. Software will be managed through the code.soundsoftware.ac.uk site which provides a Mercurial version control system and issue tracking. At the end of the project, software will be published through soundsoftware and data will be published on the C4DM Research Data Repository.
For larger proposals, a more complete plan may be required. The Digital Curation Centre have an online tool (DMP Online) for creating data management plans which asks (many) questions related to RCUK principles and builds a long-form plan to match research council requirements.
It is important to review the data management plan during the project as it is likely that actual requirements will differ from initial estimates. Reviewing the data management plan against actual data use will allow you to assess whether additional resources are required before resourcing becomes a critical issue.
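A review of the plan against actual data use could start from something as simple as measuring how much space the project data currently occupies and comparing it with the planned allocation. The sketch below is one way to do that. The 80% warning threshold is an arbitrary assumption, and the function names are illustrative.

```python
import os


def directory_size(root):
    """Total size in bytes of all regular files under `root` (links skipped)."""
    total = 0
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def usage_report(used_bytes, planned_bytes, warn_at=0.8):
    """Compare actual use with the planned allocation.

    `warn_at` is the fraction of the allocation at which the plan
    should be revisited (an assumed value, not a policy).
    """
    fraction = used_bytes / planned_bytes
    return {"fraction_used": fraction, "needs_review": fraction >= warn_at}
```

Running this at regular intervals, and noting the result in the project records, gives some warning before resourcing becomes critical.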
In order to create an appropriate data management plan, it is necessary to consider data management requirements during and after the project.
The Digital Curation Centre (DCC) provide DMP Online, a tool for creating data management plans. The tool can provide a data management questionnaire based on institutional and funder templates and produce a data management plan from the responses. Documents are available describing how to use DMP Online.
During The Research¶
During the course of a piece of research, data management is largely risk mitigation - it makes your research more robust and allows you to continue if something goes wrong.
The two main areas to consider are:
- backing up research data - in case you lose, or corrupt, the main copy of your data;
- documenting data - in case you need to return to it later.
In addition to the immediate benefits during research, applying good research data management practices makes it easier to manage your research data at the end of your research project.
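For researchers who keep their own additional copies, backing up can be as simple as copying the project folder to a dated location and checking that the copy matches. The sketch below uses only the Python standard library. The folder layout and function names are hypothetical, and it is not a substitute for the backed-up network storage.

```python
import filecmp
import shutil
from datetime import date
from pathlib import Path


def back_up(source, backup_root, label=None):
    """Copy `source` into `backup_root` as '<folder name>-<label>'.

    The label defaults to today's date. copytree refuses to copy onto an
    existing folder, so an earlier backup is never overwritten.
    """
    source = Path(source)
    label = label or date.today().isoformat()
    target = Path(backup_root) / f"{source.name}-{label}"
    shutil.copytree(source, target)
    return target


def verify_backup(source, target):
    """Return relative paths that are missing from, or differ in, the backup."""
    source, target = Path(source), Path(target)
    problems = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            relative = path.relative_to(source)
            copy = target / relative
            # shallow=False compares file contents, not just size and date
            if not copy.is_file() or not filecmp.cmp(path, copy, shallow=False):
                problems.append(relative.as_posix())
    return problems
```

An empty list from `verify_backup` means every file in the working folder has an identical copy in the backup.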
We have identified three basic types of research projects, two quantitative (one based on new data, one based on a new algorithm) and one qualitative, and consider the data management techniques appropriate to those workflows. More complex research projects might require a combination of these techniques.
Quantitative research - New Data¶
For this use case, the research workflow involves:
- creating a new dataset
- testing outputs of existing algorithms on the dataset
- publication of results
- selection or creation of underlying (audio) data (the actual audio might be in the dataset or the dataset might reference material - e.g. for copyright reasons)
- creation of ground-truth annotations for the audio and the type of algorithm (e.g. chord sequences for chord estimation, onset times for onset detection)
- software for the algorithms
- the new dataset
- identification of existing datasets against which results will be compared
- results of applying the algorithms to the dataset
- documentation of the testing methodology - e.g. method and algorithm parameters (including any default parameter values).
All of these should be documented and backed up.
Note that if existing algorithms have published results using the same existing datasets and methodology, then results should be directly comparable between the published results and the results for the new dataset. In this case, most of the methodology is already documented and only details specific to the new dataset need to be recorded separately.
If the testing is scripted, then the code used would be sufficient documentation during the research - readable documentation only being required at publication.
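If testing is scripted, the script can also write out the parameters it actually used, including defaults, so that the record does not depend on anyone remembering them. The sketch below is a minimal example. The parameter names, default values, and record layout are hypothetical and belong to no particular algorithm.

```python
import json
from datetime import datetime, timezone

# Hypothetical defaults for an imaginary onset detector.
DEFAULT_PARAMETERS = {"window_size": 1024, "hop_size": 512, "threshold": 0.3}


def run_record(dataset, algorithm, overrides=None, defaults=DEFAULT_PARAMETERS):
    """Describe one test run, with defaults filled in explicitly."""
    parameters = {**defaults, **(overrides or {})}
    return {
        "dataset": dataset,
        "algorithm": algorithm,
        "parameters": parameters,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def save_run_record(record, path):
    """Write the record as JSON next to the results it describes."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

Keeping one such file beside each set of results, and backing both up together, covers the "method and algorithm parameters" item above with very little extra effort.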
Quantitative research - New Algorithm¶
A common use-case in C4DM research is to run a newly-developed analysis algorithm on a set of audio examples and evaluate the algorithm by comparing its output with that of a human annotator. Results are then compared with published results using the same input data to determine whether the newly proposed approach makes any improvement on the state of the art.
Data involved includes:
- software for the algorithm
- an annotated dataset against which the algorithm can be tested
- results of applying the new algorithm and competing algorithms to the dataset
- documentation of the testing methodology
Note that if other algorithms have published results using the same dataset and methodology, then results should be directly comparable between the published results and the results for the new algorithm. In this case, most of the methodology is already documented and only details specific to the new algorithm (e.g. parameters) need to be recorded separately.
Also, if the testing is scripted, then the code used would be sufficient documentation during the research - readable documentation only being required at publication.
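As an example of the kind of scripted test meant here, the sketch below compares estimated onset times with a human annotator's onset times and reports precision, recall and F-measure. The 50 ms tolerance and the simple in-order matching are assumptions for the example. A published evaluation should state whichever tolerance and matching rule it actually used.

```python
def match_onsets(reference, estimated, tolerance=0.05):
    """Count estimates within `tolerance` seconds of a distinct reference onset.

    Both lists are times in seconds. Each reference onset and each
    estimate can be used in at most one match.
    """
    reference = sorted(reference)
    estimated = sorted(estimated)
    matches = 0
    r = e = 0
    while r < len(reference) and e < len(estimated):
        difference = estimated[e] - reference[r]
        if abs(difference) <= tolerance:
            matches += 1
            r += 1
            e += 1
        elif difference < 0:
            e += 1  # estimate is too early for this and all later references
        else:
            r += 1  # reference is too early for this and all later estimates
    return matches


def precision_recall_f(reference, estimated, tolerance=0.05):
    """Precision, recall and F-measure of the estimates against the reference."""
    matches = match_onsets(reference, estimated, tolerance)
    precision = matches / len(estimated) if estimated else 0.0
    recall = matches / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Because the tolerance is an explicit argument, the value used ends up in the script itself, which is part of what makes the code usable as documentation of the methodology.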
Qualitative research¶
An example would be using interviews with performers to evaluate a new instrument design.
The workflow is:
- Gather data for the experiment (e.g. through interviews)
- Analyse data
- Publish data
- the interface design
- Captured audio from performances
- Recorded interviews with performers (possibly audio or video)
- Interview transcripts
Survey participants and interviewees retain copyright over their contributions unless they are specifically assigned to you! In order to have the freedom to publish the content a suitable rights waiver / transfer of copyright / clearance form / licence agreement should be signed. Or agreed on tape. Also, the people (or organisation) recording the event will have copyright on their materials... unless assigned/waived/licensed (e.g. video / photos / sound recordings). Most of this can be dealt with fairly informally for most research, but if you want to publish data then a more formal agreement is sensible. Rather than transferring copyright, an agreement to publish the (possibly edited) materials under a particular license might be appropriate.
Creators of materials (e.g. interviewees) always retain moral rights to their words: they have the right to be named as the author of their content; and they maintain the right to object to derogatory treatment of their material. Note that this means that in order to publish anonymised interviews, you should have an agreement that allows this.
If people are named in interviews (even if they're not the interviewee) then the Data Protection Act might be relevant.
The research might also involve:
- Demographic details of participants
- Identifiable participants (Data Protection)
- Release forms for people taking part
At The End Of The Research¶
Whether you have finished a research project or simply completed an identifiable unit of research (e.g. published a paper based on your research), you should look at:
- Archiving research data
- Publishing research data
- Reviewing the data management plan (possibly for the project final report)
- Summarising the results
- Publishing a relevant sub-set of research data / summarised data to support your paper
- Publishing the paper
Note that the EPSRC data management principles require sources of data to be referenced.
Research Management¶
The data management concerns of a PI will largely revolve around planning and appraisal of data management for research projects: to make sure that they conform with institutional policy and funder requirements; and to ensure that the data management needs of the research project are met.
A data management plan (e.g. for use in a grant proposal) will show that you have considered:
- the costs of preserving your data;
- funder requirements for data preservation and publication;
- institutional data management policy
- and ethical issues surrounding data management (e.g. data relating to human participants).
- legalities (Freedom of Information, Copyright and Data Protection)
- covering the research council requirements
- data management during the project
- data archiving
- data publication
After the project is completed, an appraisal of how the data was managed should be carried out as part of the project's "lessons learned".
Data management training should provide an overview of all the above, and keep PIs informed of any changes in the above that affect data management requirements.
Further Reading Material¶
Additional Notes
Training the Trainers
Legislation
Copyright
Data Protection
Freedom Of information
Research Council Requirements
Resources For Learning Materials
Data Management By Researcher Need
At The End Of The Research
Before The Research
During The Research
Data Management Skills
Archiving research data
Backing up
Documenting data
Managing Software As Data
Publishing research data
And more repositories
MUSHRA
Research Management