Sound Data Management Training » History » Version 93
Steve Welburn, 2012-10-23 02:52 PM
1 | 5 | Steve Welburn | h1. WP1.2 Online Training Material |
---|---|---|---|
2 | 1 | Steve Welburn | |
3 | 9 | Steve Welburn | {{>toc}} |
4 | 9 | Steve Welburn | |
5 | 68 | Steve Welburn | We consider three stages of a research project, and the appropriate research data management considerations for each of those stages. The stages are: |
6 | 75 | Steve Welburn | * [[Before The Research|before the research]]; |
7 | 75 | Steve Welburn | * [[During The Research|during the research]]; |
8 | 75 | Steve Welburn | * [[At The End Of The Research|at the end of the research]]. |
9 | 1 | Steve Welburn | |
10 | 75 | Steve Welburn | In addition, we consider the [[Research Management|responsibilities of a Principal Investigator]] regarding data management. |
11 | 75 | Steve Welburn | |
12 | 77 | Steve Welburn | {{include(Data Management By Research Stage)}} |
13 | 44 | Steve Welburn | |
14 | 82 | Steve Welburn | h2. Why manage research data? |
15 | 82 | Steve Welburn | |
16 | 82 | Steve Welburn | Funder requirements: http://researchonline.lshtm.ac.uk/208596/ |
17 | 82 | Steve Welburn | |
18 | 83 | Steve Welburn | Ponemon reports for Intel on the "Lost Laptop problem" found that ~10% of laptops in education and research are lost during their lifetime. |
19 | 83 | Steve Welburn | |
20 | 83 | Steve Welburn | PC World study on laptop failure rates: 20-30% of laptops suffer a significant failure. |
21 | 83 | Steve Welburn | |
22 | 83 | Steve Welburn | h3. Failure Trends In A Large Disk Drive Population |
23 | 83 | Steve Welburn | |
24 | 88 | Steve Welburn | The study found that ~13% of hard drives were replaced as part of a repair within 3 years, and 20% within 4 years. |
25 | 86 | Steve Welburn | |
26 | 83 | Steve Welburn | FAST '07 paper on "Failure Trends In A Large Disk Drive Population":https://www.usenix.org/conference/fast-07/failure-trends-large-disk-drive-population |
27 | 83 | Steve Welburn | |
28 | 89 | Steve Welburn | A Google report on over 100,000 consumer-grade disk drives, from 80-400 GB, produced in or after 2001 and used within Google. Data were collected December 2005 - August 2006. The drives went through a burn-in process and only those commissioned for use were included in the study, so certain basic defects may well be excluded from the report. Also, the disks were largely used in servers, resulting in many more hours of use than in desktop or laptop computers. |
29 | 83 | Steve Welburn | |
30 | 84 | Steve Welburn | bq. the most accurate definition we can present of a failure event for our study is: a drive is considered to have failed if it was replaced as part of a repairs procedure. Note that this definition implicitly excludes drives that were replaced due to an upgrade. |
31 | 84 | Steve Welburn | |
32 | 84 | Steve Welburn | Annualised failure rate (AFR) by drive age: ~3% in the first 3 months, ~2% up to 1 year, ~8% at 2 years, ~9% at 3 years, ~6% at 4 years, ~7% at 5 years. |
33 | 84 | Steve Welburn | |
34 | 84 | Steve Welburn | NB: Variation with model and manufacturer! |
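
As a rough illustration of what yearly failure rates of this size mean for a research group, the sketch below estimates the chance that at least one of several drives fails during a project. It is not taken from the paper: the function name @p_any_failure@, the group size of 4 drives and the 3-year project length are hypothetical, and it assumes a constant yearly failure rate and independent failures, which the variation with age, model and manufacturer noted above shows to be a simplification.

```python
def p_any_failure(afr: float, n_drives: int = 1, years: float = 1.0) -> float:
    """Probability that at least one of n_drives fails within `years`.

    Illustrative assumptions (not from the paper): the annualised failure
    rate `afr` is constant over time and drives fail independently.
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be a probability between 0 and 1")
    if n_drives < 0 or years < 0:
        raise ValueError("n_drives and years must be non-negative")
    # Chance that a single drive survives the whole period.
    survival_one_drive = (1.0 - afr) ** years
    # Chance that every drive survives, then take the complement.
    return 1.0 - survival_one_drive ** n_drives


if __name__ == "__main__":
    # Hypothetical scenario: 4 drives over a 3-year project, for a few
    # yearly failure rates of the order quoted in these notes.
    for afr in (0.02, 0.06, 0.08):
        risk = p_any_failure(afr, n_drives=4, years=3)
        print(f"AFR {afr:.0%}: chance of at least one failure = {risk:.1%}")
```

Even under these simplifying assumptions, the risk across a handful of drives over a whole project is considerably higher than the single-drive yearly figure, which is the practical argument for resilient storage of work-in-progress.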
35 | 84 | Steve Welburn | |
36 | 84 | Steve Welburn | In the first 6 months, the risk of failure is highest for drives with low or high utilisation. |
37 | 84 | Steve Welburn | * ~10% for high utilisation in the first 3 months |
38 | 84 | Steve Welburn | * for 3-year-old drives, a ~4-5% chance of failure whatever the utilisation |
39 | 84 | Steve Welburn | * failures are most likely at low drive temperatures, i.e. < 25 deg. C (on start-up?) |
40 | 84 | Steve Welburn | * drives over 2 years old are most likely to fail at high temperatures (could be the mode of failure?) |
41 | 84 | Steve Welburn | |
42 | 84 | Steve Welburn | Disks with SMART scan errors are 10 times more likely to fail - almost 30% of drives with a SMART scan error failed within 8 months of the error. |
43 | 84 | Steve Welburn | * If a drive up to 8 months old gets a scan error, there's a 90% chance of it surviving at least 8 months |
44 | 84 | Steve Welburn | * If a drive over 2 years old gets a scan error, there's a 60% chance of it surviving at least 8 months |
45 | 84 | Steve Welburn | * If you have more than 1 scan error on a drive, it's significantly less likely to survive |
46 | 84 | Steve Welburn | * Similarly for SMART reallocation counts: the AFR is almost 20% if a reallocation occurs in the first 3 months |
47 | 85 | Steve Welburn | * ...but over 36% of failed drives had zero counts on all variables |
48 | 85 | Steve Welburn | |
49 | 85 | Steve Welburn | bq. Talagala and Patterson [20] perform a detailed error analysis of 368 SCSI disk drives over an eighteen month period, reporting a failure rate of 1.9%. Results on a larger number of desktop-class ATA drives under deployment at the Internet Archive are presented by Schwarz et al [17]. They report on a 2% failure rate for a population of 2489 disks during 2005, while mentioning that replacement rates have been as high as 6% in the past. Gray and van Ingen [9] cite observed failure rates ranging from 3.3-6% in two large web properties with 22,400 and 15,805 disks respectively. A recent study by Schroeder and Gibson [16] helps shed light into the statistical properties of disk drive failures. The study uses failure data from several large scale deployments, including a large number of SATA drives. They report a significant overestimation of mean time to failure by manufacturers and a lack of infant mortality effects. None of these user studies have attempted to correlate failures with SMART parameters or other environmental factors. |
50 | 85 | Steve Welburn | |
51 | 84 | Steve Welburn | |
52 | 83 | Steve Welburn | Hard drive manufacturers often quote yearly failure rates below 2% [2]. |
53 | 83 | Steve Welburn | User studies have seen rates as high as 6% [9]. |
54 | 83 | Steve Welburn | |
55 | 83 | Steve Welburn | Between 15% and 60% of drives that users return to manufacturers as failed have no defect as far as the manufacturers are concerned [7]. |
56 | 1 | Steve Welburn | In a study of 3477 disks, between 20% and 30% of the failed drives analysed were “no problem found” cases [11]. |
57 | 83 | Steve Welburn | |
58 | 84 | Steve Welburn | Failure rates are known to be highly correlated with drive models, manufacturers and vintages [18]. |
59 | 83 | Steve Welburn | |
60 | 90 | Steve Welburn | Sharing Detailed Research Data Is Associated with Increased Citation Rate |
61 | 90 | Steve Welburn | |
62 | 90 | Steve Welburn | http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000308 |
63 | 90 | Steve Welburn | |
64 | 91 | Steve Welburn | |
65 | 91 | Steve Welburn | Gleditsch, N.P., C. Metelits and H. Strand. 2003. Posting your data: Will you be scooped or will you be famous? Int. Stud. Perspect. 4:89–97. |
68 | 91 | Steve Welburn | |
69 | 91 | Steve Welburn | Freckleton, R.P., P. Hulme, P. Giller and G. Kerby. 2005. The changing face of applied ecology. J. Appl. Ecol. 42:1–3. |
72 | 91 | Steve Welburn | |
73 | 93 | Steve Welburn | Lessons from the JMCB Archive |
74 | 8 | Steve Welburn | |
75 | 1 | Steve Welburn | http://muse.jhu.edu/journals/mcb/summary/v038/38.4mccullough.html |
76 | 93 | Steve Welburn | |
77 | 93 | Steve Welburn | |
78 | 93 | Steve Welburn | h2. Overarching concerns |
79 | 93 | Steve Welburn | |
80 | 92 | Steve Welburn | |
81 | 92 | Steve Welburn | |
82 | 8 | Steve Welburn | Human participation - ethics, data protection |
83 | 10 | Steve Welburn | |
84 | 10 | Steve Welburn | Audio data - copyright |
85 | 20 | Steve Welburn | |
86 | 21 | Steve Welburn | Storage - where? how? under what service level agreement (SLA)? |
87 | 20 | Steve Welburn | |
88 | 21 | Steve Welburn | Short-term resilient storage for work-in-progress |
89 | 1 | Steve Welburn | |
90 | 1 | Steve Welburn | Long-term archival storage for research data outputs |
91 | 21 | Steve Welburn | |
92 | 21 | Steve Welburn | Curation of archived data - refreshing media and formats |
93 | 1 | Steve Welburn | |
94 | 1 | Steve Welburn | Drivers - Freedom of Information (FoI), Research Councils UK (RCUK) |