Evidence Promoting Good Data Management¶
If you have any additional examples that you would like to share, please email them to: rdm.c4dm at gmail.com
The Lost Laptop Problem¶
- 2010 Ponemon Institute report for Intel re. US laptops
- On average, 2.3% of laptops assigned to employees are lost each year
- In education & research that rises to 3.7%, with 10.8% of laptops being lost before the end of their useful life (~3 years, i.e. within one PhD of allocation!)
- 75% lost outside the workplace
- Very similar results from 2011 European report!
Intel 2010, The Billion Dollar Lost Laptop Problem - http://tinyurl.com/8c9m4bn
Intel 2011, The Billion Euro Laptop Problem - http://tinyurl.com/9wpbxn9
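A quick cross-check of the education & research figures above: the sketch below compounds the 3.7% annual loss rate over a 3-year lifetime. It assumes a constant rate and independent years, which is our simplification rather than anything stated in the Intel reports, and `cumulative_loss` is an illustrative name.

```python
def cumulative_loss(annual_rate: float, years: int) -> float:
    """Probability that a laptop is lost at least once within `years`.

    Assumes a constant annual loss rate and independence between years
    (our simplification, not a claim made in the Intel/Ponemon reports).
    """
    return 1.0 - (1.0 - annual_rate) ** years


if __name__ == "__main__":
    # 3.7% per year is the education & research figure quoted above.
    print(f"{cumulative_loss(0.037, 3):.1%}")
```

Under these assumptions the result is about 10.7%, in line with the 10.8% quoted above.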
- 2011 PC World Laptop Reliability Survey of 63,000 readers:
- 22.6% had significant problems during the product's lifetime, of which:
- 19% had OS problems (~1 in 25 of all laptops)
- 18% had HDD problems (~1 in 25 of all laptops)
- 10% had PSU problems (~1 in 50 of all laptops)
PC World 2011 - http://tinyurl.com/876qza5
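The "1 in N of all laptops" figures above come from multiplying the overall problem rate by each fault's share of problem laptops. A minimal sketch of that arithmetic follows; the names `PROBLEM_RATE`, `FAULT_SHARE` and `overall_rates` are ours, and only the percentages come from the survey.

```python
# Share of all laptops that had significant problems (survey figure).
PROBLEM_RATE = 0.226

# Share of *problem* laptops affected by each fault type (survey figures).
FAULT_SHARE = {"OS": 0.19, "HDD": 0.18, "PSU": 0.10}


def overall_rates(problem_rate: float, fault_share: dict) -> dict:
    """Convert per-fault shares into fractions of all laptops."""
    return {name: problem_rate * share for name, share in fault_share.items()}


if __name__ == "__main__":
    for name, rate in overall_rates(PROBLEM_RATE, FAULT_SHARE).items():
        print(f"{name}: {rate:.1%} of all laptops, about 1 in {round(1 / rate)}")
```

This works out at roughly 1 in 23, 1 in 25 and 1 in 44 respectively, which the list above rounds to 1 in 25 and 1 in 50.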
Hard Disk Failures¶
- Failure Trends In A Large Disk Drive Population
- USENIX Conference on File and Storage Technologies 2007 (FAST '07)
- Eduardo Pinheiro & Wolf-Dietrich Weber, Google Inc.
- Data collected from over 100,000 disk drives at Google
- As part of repair procedures:
- ~13% of disk drives replaced over 3 years
- ~20% of disk drives replaced over 4 years
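To put the replacement rates above in context, the sketch below estimates the chance that at least one drive in a small group needs replacing within 3 years. It assumes drives fail independently and that each has the ~13% three-year replacement probability reported above; the group size of 5 and the name `any_replaced` are hypothetical.

```python
def any_replaced(per_drive_prob: float, drives: int) -> float:
    """Probability that at least one of `drives` disks is replaced.

    Assumes independent drives with the same replacement probability
    (a simplification; drives in one machine or batch may fail together).
    """
    return 1.0 - (1.0 - per_drive_prob) ** drives


if __name__ == "__main__":
    # ~13% over 3 years is the Google figure quoted above;
    # 5 drives is a hypothetical research-group size.
    print(f"{any_replaced(0.13, 5):.0%}")
```

For five drives this comes to roughly even odds over three years.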
Data management in the cloud¶
See JISC/DCC document "Curation In The Cloud" - http://tinyurl.com/8nogtmv
Service agreements may give wide-ranging rights to the data service.
Google Terms Of Service¶
1 March 2012 Google Terms of Service: http://tinyurl.com/89dc9fa
When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones. This license continues even if you stop using our Services (for example, for a business listing you have added to Google Maps).
Microsoft Services Agreement¶
19 October 2012 Microsoft Services Agreement: http://tinyurl.com/8e4kucy
When you upload your content to the services, you agree that it may be used, modified, adapted, saved, reproduced, distributed, and displayed to the extent necessary to protect you and to provide, protect and improve Microsoft products and services. For example, we may occasionally use automated means to isolate information from email, chats, or photos in order to help detect and protect against spam and malware, or to improve the services with new features that makes them easier to use. When processing your content, Microsoft takes steps to help preserve your privacy.
BBC Domesday Project¶
- 1986 project to create a modern-day Domesday Book (early crowd-sourcing)
- Used “BBC Master” computers with data on laserdisc
- Collected 147,819 pages of text and 23,225 photos
- Media expiring and obsolete technology put the data at risk!
- Required emulation of software
- Images restored from original masters
- Don't use obscure formats!
- Don't use obscure media!
- Don't rely on technology being available!
- Do keep original source material!
Google images for BBC Domesday
Piwowar, Heather A., Roger S. Day, and Douglas B. Fridsma. Sharing detailed research data is associated with increased citation rate.
PLoS One 2.3 (2007): e308.
Disk Drives Break¶
Laptops Break / Get Broken¶
More To Read¶
Albers, S. Editorial: Well Documented Articles Achieve More Impact
BuR Business Research Journal, Vol. 2, No.2, May 2009
Anderson, Richard G., et al. The role of data/code archives in the future of economic research.
Journal of Economic Methodology 15.1 (2008): 99-119.
Borgman, Christine L. "The conundrum of sharing research data."
Journal of the American Society for Information Science and Technology 63.6 (2012): 1059-1078.
Campbell, Eric G., et al. "Data withholding in academic genetics."
JAMA: the journal of the American Medical Association 287.4 (2002): 473-480.
Evanschitzky, Heiner, et al. Replication research's disturbing trend.
Journal of Business Research 60.4 (2007): 411-415.
Fischer, Beth A., and Michael J. Zigmond. "The essential nature of sharing in science."
Science and engineering ethics 16.4 (2010): 783-799.
Freckleton, R.P., P. Hulme, P. Giller and G. Kerby. 2005. The changing face of applied ecology.
J. Appl. Ecol. 42:1–3.
Gleditsch, N.P., C. Metelits and H. Strand. 2003. Posting your data: Will you be scooped or will you be famous?
Int. Stud. Perspect. 4:89–97.
Lancaster, Larry, and Alan Rowe. Measuring Real World Data Availability.
Proceedings of the LISA 2001 15th Systems Administration Conference. 2001.
McCullough, Bruce D., Kerry Anne McGeary, and Teresa D. Harrison. Lessons from the JMCB Archive.
Journal of Money, Credit, and Banking 38.4 (2006): 1093-1107.
Piwowar, Heather A., and Wendy W. Chapman. "Public sharing of research datasets: a pilot study of associations."
Journal of informetrics 4.2 (2010): 148-156.
Piwowar, Heather A., et al. "Towards a data sharing culture: recommendations for leadership from academic health centers."
PLoS medicine 5.9 (2008): e183.
Schroeder, Bianca, and Garth A. Gibson. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST). 2007.
Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing."
Signal Processing Magazine, IEEE 26.3 (2009): 37-47.
Whitlock, Michael C. "Data archiving in ecology and evolution: best practices."
Trends in ecology & evolution 26.2 (2011): 61-65.
Whitlock, Michael C., et al. "Data archiving."
The American Naturalist 175.2 (2010): 145-146.
Wicherts, Jelte M., Marjan Bakker, and Dylan Molenaar. "Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results."
PLoS One 6.11 (2011): e26828.
Need for an International Repository for Original Research Data
Thatcher
Science 16 August 1929: Vol. 70 no. 1807 pp. 167-168
Research Data in the Digital Age
Daniel Kleppner and Phillip A. Sharp
Science 24 July 2009: Vol. 325 no. 5939 p. 368
Sharing Research Data Urged
Science 16 August 1985: Vol. 229 no. 4714 p. 632