Sound Data Management Training » History » Version 92

Steve Welburn, 2012-10-23 02:46 PM

h1. WP1.2 Online Training Material

{{>toc}}

We consider three stages of a research project, and the appropriate research data management considerations for each of those stages. The stages are:
* [[Before The Research|before the research]];
* [[During The Research|during the research]];
* [[At The End Of The Research|at the end of the research]].

In addition, we consider the [[Research Management|responsibilities of a Principal Investigator]] regarding data management.

{{include(Data Management By Research Stage)}}

h2. Why manage research data?

Funder requirements: http://researchonline.lshtm.ac.uk/208596/

A Ponemon report for Intel on the "Lost Laptop Problem" found that ~10% of education and research laptops are lost during their lifetime.

A PC World study on laptop failure rates found that 20-30% of laptops suffer a significant failure.

h3. Failure Trends In A Large Disk Drive Population

The study identified that ~13% of hard drives were replaced within 3 years, and 20% within 4 years, because a repair was required!

FAST '07 paper on "Failure Trends In A Large Disk Drive Population":https://www.usenix.org/conference/fast-07/failure-trends-large-disk-drive-population

A Google study of over 100,000 consumer-grade disk drives (80-400 GB), produced in or after 2001 and used within Google, with data collected from December 2005 to August 2006. The drives went through a burn-in process and only those commissioned for use were included in the study, so drives with certain basic defects may well be excluded from the results. Also, the drives were largely used in servers, so they accumulated (relatively) more operating hours than typical desktop or laptop drives.

bq. the most accurate definition we can present of a failure event for our study is: a drive is considered to have failed if it was replaced as part of a repairs procedure. Note that this definition implicitly excludes drives that were replaced due to an upgrade.

Annualised failure rates by drive age: ~3% in the first 3 months, ~2% up to 1 year, ~8% at 2 years, ~9% at 3 years, ~6% at 4 years and ~7% at 5 years.
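
The age-based rates above can be combined into a rough cumulative risk for a single drive. The Python below is a minimal sketch, not the study's method: it assumes (hypothetically) that each year's rate applies independently to the drives that survived the earlier years, and it folds the study's sub-annual first-year bins into a single 2% rate. The names @AGE_AFR@ and @cumulative_failure_probability@ are illustrative only.

```python
from typing import Mapping

# Hypothetical per-year annualised failure rates (AFRs), read loosely from the
# age figures above. ASSUMPTION: the 3-month, 6-month and 1-year bins are
# folded into a single first-year rate of 2%.
AGE_AFR: Mapping[int, float] = {1: 0.02, 2: 0.08, 3: 0.09, 4: 0.06, 5: 0.07}


def cumulative_failure_probability(afr_by_year: Mapping[int, float], years: int) -> float:
    """Probability that one drive is replaced at least once within `years` years.

    ASSUMPTION: each year's AFR applies independently to drives that
    survived all earlier years, so the yearly survival probabilities multiply.
    """
    survival = 1.0
    for year in range(1, years + 1):
        survival *= 1.0 - afr_by_year[year]
    return 1.0 - survival


if __name__ == "__main__":
    for n in range(1, len(AGE_AFR) + 1):
        p = cumulative_failure_probability(AGE_AFR, n)
        print(f"Replaced within {n} year(s): {p:.1%}")
```

With these assumed rates the sketch gives roughly a 28% chance of a drive being replaced within five years. Because of the simplifications (independent years, coarse age bins) the output is only indicative and need not match the 3- and 4-year replacement figures quoted earlier; what it does show is how modest yearly rates compound over the lifetime of a project.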

NB: failure rates vary with drive model and manufacturer!

In the first 6 months, the risk of failure is highest for drives with low or high utilisation!
* ~10% for high utilisation in the first 3 months
* for 3-year-old drives, ~4-5% chance of failure regardless of utilisation
* failures are most likely at low drive temperatures, i.e. below 25 °C (perhaps on start-up?)
* drives over 2 years old are most likely to fail at high temperatures (possibly a different failure mode?)

Disks with SMART scan errors are 10 times more likely to fail than drives without them; almost 30% of drives with a SMART scan error failed within 8 months of the error.
* If a drive up to 8 months old gets a scan error, there's a 90% chance of it surviving at least 8 months
* If a drive over 2 years old gets a scan error, there's a 60% chance of it surviving at least 8 months
* Drives with more than one scan error are significantly less likely to survive
* The same holds for SMART reallocation counts: the annualised failure rate (AFR) is almost 20% if a reallocation occurs in the first 3 months
* ...but over 36% of failed drives had zero counts on all SMART variables

bq. Talagala and Patterson [20] perform a detailed error analysis of 368 SCSI disk drives over an eighteen month period, reporting a failure rate of 1.9%. Results on a larger number of desktop-class ATA drives under deployment at the Internet Archive are presented by Schwarz et al [17]. They report on a 2% failure rate for a population of 2489 disks during 2005, while mentioning that replacement rates have been as high as 6% in the past. Gray and van Ingen [9] cite observed failure rates ranging from 3.3-6% in two large web properties with 22,400 and 15,805 disks respectively. A recent study by Schroeder and Gibson [16] helps shed light into the statistical properties of disk drive failures. The study uses failure data from several large scale deployments, including a large number of SATA drives. They report a significant overestimation of mean time to failure by manufacturers and a lack of infant mortality effects. None of these user studies have attempted to correlate failures with SMART parameters or other environmental factors.

Hard drive manufacturers often quote yearly failure rates below 2% [2].
User studies have seen rates as high as 6% [9].

Between 15% and 60% of drives that users return to manufacturers as failed have no defect as far as the manufacturers are concerned [7].
Between 20% and 30% of failed drives analysed in a study of 3,477 disks were “no problem found” cases [11].

Failure rates are known to be highly correlated with drive models, manufacturers and vintages [18].
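
Manufacturers commonly state reliability as a mean time to failure (MTTF) in hours rather than as a yearly rate. If one assumes a constant failure rate (an exponential lifetime model; this is an assumption for illustration, and the age effects described above show it does not hold exactly), a drive powered on for all 8,760 hours of a year has AFR = 1 - exp(-8760 / MTTF). A minimal Python sketch converting in both directions, with illustrative function names:

```python
import math

# ASSUMPTION: drives are powered on all year (24 x 365 = 8760 hours).
HOURS_PER_YEAR = 24 * 365


def afr_from_mttf(mttf_hours: float) -> float:
    """AFR implied by an MTTF, assuming a constant (exponential) failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def mttf_from_afr(afr: float) -> float:
    """MTTF in hours implied by an AFR; the inverse of afr_from_mttf."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


if __name__ == "__main__":
    # Manufacturer-quoted vs. user-observed yearly rates mentioned above.
    for afr in (0.02, 0.06):
        print(f"AFR {afr:.0%} -> MTTF of about {mttf_from_afr(afr):,.0f} hours")
```

Under this model, the 2% figure quoted by manufacturers corresponds to an MTTF of roughly 434,000 hours, while the 6% seen in user studies corresponds to roughly 142,000 hours.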

"Sharing Detailed Research Data Is Associated with Increased Citation Rate":http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000308

Gleditsch, N.P., C. Metelits and H. Strand. 2003. Posting your data: Will you be scooped or will you be famous? Int. Stud. Perspect. 4:89–97.

Freckleton, R.P., P. Hulme, P. Giller and G. Kerby. 2005. The changing face of applied ecology. J. Appl. Ecol. 42:1–3.

h2. Overarching concerns

http://muse.jhu.edu/journals/mcb/summary/v038/38.4mccullough.html

* Human participation: ethics, data protection
* Audio data: copyright
* Storage: where? how? what service level agreement (SLA)?
** Short-term resilient storage for work-in-progress
** Long-term archival storage for research data outputs
* Curation of archived data: refreshing media and formats
* Drivers: Freedom of Information (FoI), Research Councils UK (RCUK)