Version 83 (Steve Welburn, 2012-09-25 12:11 PM) → Version 84/110 (Steve Welburn, 2012-09-25 12:39 PM)
h1. WP1.2 Online Training Material
{{>toc}}
We consider three stages of a research project, and the appropriate research data management considerations for each of those stages. The stages are:
* [[Before The Research|before the research]];
* [[During The Research|during the research]];
* [[At The End Of The Research|at the end of the research]].
In addition, we consider the [[Research Management|responsibilities of a Principal Investigator]] regarding data management.
{{include(Data Management By Research Stage)}}
h2. Why manage research data?
Funder requirements: http://researchonline.lshtm.ac.uk/208596/
Ponemon reports for Intel on the "Lost Laptop problem" found that ~10% of Education and Research laptops are lost during their lifetime.
PC World study on laptop failure rates: 20-30% of laptops suffer a significant failure.
h3. Failure Trends In A Large Disk Drive Population
FAST '07 paper on "Failure Trends In A Large Disk Drive Population":https://www.usenix.org/conference/fast-07/failure-trends-large-disk-drive-population
The paper is a Google report on over 100,000 consumer-grade disk drives (80-400 GB) produced in or after 2001 and used within Google. Data were collected from December 2005 to August 2006. Drives went through a burn-in process and only those commissioned for use were included in the study, so certain basic defects may well be excluded from the results.
bq. the most accurate definition we can present of a failure event for our study is: a drive is considered to have failed if it was replaced as part of a repairs procedure. Note that this definition implicitly excludes drives that were replaced due to an upgrade.
Approximate failure rates by drive age: ~3% in the first 3 months, ~2% up to 1 year, ~8% at 2 years, ~9% at 3 years, ~6% at 4 years, ~7% at 5 years.
NB: These rates vary with drive model and manufacturer!
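To give a feel for what these rates mean over the life of a project, here is a minimal sketch that combines the yearly figures into a rough survival estimate. It assumes that each figure can be read as the chance that a drive which reached a given year of age fails during that year, and it uses the ~2% figure for the whole of the first year. The paper reports rates per age group, not for a single drive followed over time, so the result is only indicative. The names @AFR_BY_YEAR@ and @survival_probability@ are illustrative.

```python
# Approximate yearly failure rates by drive age, taken from the notes above.
# ASSUMPTION: each rate is treated as the chance that a drive which reached
# the start of that year of age fails during it. The paper gives rates per
# age group, so this is only a rough illustration.
AFR_BY_YEAR = {1: 0.02, 2: 0.08, 3: 0.09, 4: 0.06, 5: 0.07}


def survival_probability(afr_by_year, years):
    """Return the chance that a drive is still working after `years` years."""
    probability = 1.0
    for year in range(1, years + 1):
        # The drive must survive every year in turn.
        probability *= 1.0 - afr_by_year[year]
    return probability


five_year_survival = survival_probability(AFR_BY_YEAR, 5)
print(f"Chance of surviving 5 years: {five_year_survival:.0%}")
print(f"Chance of failing within 5 years: {1 - five_year_survival:.0%}")
```

Under these assumptions, roughly one drive in four fails within five years, which is a reasonable argument for never keeping research data on a single disk.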
In the first 6 months, the risk of failure is highest for drives with low or high utilisation:
* ~10% for high utilisation in the first 3 months
* for 3-year-old drives, ~4-5% chance of failure whatever the utilisation
Temperature also plays a part:
* failures are most likely at low drive temperatures, i.e. < 25 deg. C (on start-up?)
* drives over 2 years old are most likely to fail at high temperatures (could this be the mode of failure?)
Disks with SMART scan errors are 10 times more likely to fail - almost 30% of drives with a SMART scan error failed within 8 months of the error.
* If a drive up to 8 months old gets a scan error, there's a 90% chance of it surviving at least 8 months
* If a drive over 2 years old gets a scan error, there's a 60% chance of it surviving at least 8 months
* If you have more than 1 scan error on a drive, it's significantly less likely to survive
* Similarly for SMART reallocation counts: the annualised failure rate (AFR) is almost 20% if a reallocation occurs in the first 3 months
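Since reallocations are such a strong warning sign, it can be worth checking for them routinely. The sketch below parses attribute text in the layout printed by @smartctl -A@ (from smartmontools) and reports the raw reallocated sector count. The sample text, its values, the function names and the "any reallocation needs attention" policy are all assumptions made for illustration. Attribute names and raw value formats differ between drive models, so treat this as a starting point only.

```python
# Sample text in the layout printed by `smartctl -A`; the values are made up.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       8760
"""


def reallocated_sectors(smart_text):
    """Return the raw Reallocated_Sector_Ct value, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def needs_attention(smart_text):
    """Hypothetical policy: flag the drive as soon as any reallocation is reported."""
    count = reallocated_sectors(smart_text)
    return count is not None and count > 0


print(reallocated_sectors(SAMPLE_OUTPUT))
print(needs_attention(SAMPLE_OUTPUT))
```

A drive flagged in this way is a candidate for an immediate backup check and early replacement, not for waiting until it actually fails.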
Hard drive manufacturers often quote yearly failure rates below 2% [2].
User studies have seen rates as high as 6% [9].
Between 15% and 60% of drives that users return to manufacturers as failed have no defect as far as the manufacturers are concerned [7].
Between 20% and 30% of cases were “no problem found” after analysing failed drives in a study of 3477 disks [11].
Failure rates are known to be highly correlated with drive models, manufacturers and vintages [18].
h2. Overarching concerns
* Human participation - ethics, data protection
* Audio data - copyright
* Storage - where? how? what SLA?
** Short-term resilient storage for work-in-progress
** Long-term archival storage for research data outputs
** Curation of archived data - refreshing media and formats
* Drivers - FoI, RCUK