Steve Welburn, 2012-09-04 12:33 PM
1 | 1 | Steve Welburn | h1. Archiving research data |
---|---|---|---|
2 | 1 | Steve Welburn | |
3 | 1 | Steve Welburn | For archival purposes, data needs to be stored in a location that provides facilities for long-term preservation. As well as addressing standard data management concerns (e.g. backup, documentation), the media and the file formats need to be appropriate for long-term use. |
4 | 1 | Steve Welburn | |
5 | 1 | Steve Welburn | This may involve: |
6 | 1 | Steve Welburn | * refreshing the media at suitable intervals by moving data onto new media |
7 | 1 | Steve Welburn | * creating copies of the data in new formats to allow their use (e.g. converting data in closed formats to open formats, updating data to new versions of file formats). |
8 | 2 | Steve Welburn | |
9 | 3 | Steve Welburn | h2. Media |
10 | 3 | Steve Welburn | |
11 | 5 | Steve Welburn | Archive copies of data may be held on the same types of media as used during research. Additionally, write-once media (e.g. CD-R, DVD+/-R, BD-R) may be appropriate. |
12 | 5 | Steve Welburn | |
13 | 5 | Steve Welburn | Removable drives (e.g. USB flash drives, FireWire HDDs) may be used, but there is a risk of hardware failure with these devices - they combine the storage medium with drive electronics, so they are not "just" data storage. |
14 | 5 | Steve Welburn | |
15 | 5 | Steve Welburn | Removable media (e.g. CD-R, tapes) do not carry the same risk of hardware failure, but the media themselves may be damaged or become unusable - the estimated lifetime of an optical disc is 2-100 years. Whether a specific disc will last 2 years or 100 cannot easily be judged, although buying high-quality media rather than cheap packs of 100 discs may help. |
16 | 5 | Steve Welburn | |
17 | 5 | Steve Welburn | With all external / removable options, there is a risk of obsolescence: |
18 | 6 | Steve Welburn | * devices to read removable media may no longer be commonplace (e.g. floppy disk drives, Zip drives) |
19 | 1 | Steve Welburn | * formats used for removable media may no longer be supported (e.g. various formats for DVD-RAM discs) |
20 | 6 | Steve Welburn | * interfaces used for removable drives may no longer be commonplace (e.g. parallel or SCSI ports, PATA/IDE disk drives) |
21 | 6 | Steve Welburn | |
22 | 6 | Steve Welburn | All media decay or become obsolete over time. It is therefore necessary to refresh the media by copying the data to new media at intervals. Doing this regularly reduces the risk of discovering that your archived data is inaccessible. |
23 | 5 | Steve Welburn | |
24 | 3 | Steve Welburn | h2. File Formats |
25 | 4 | Steve Welburn | |
26 | 7 | Steve Welburn | File formats also become obsolete. Although the original data should be archived, it is also recommended that copies of the data are stored in more accessible formats, e.g. storing PDF outputs from LaTeX source, TIFF versions of images, FLAC copies of audio files. The more specific the source format, the stronger the requirement for readable formats! Closed formats (e.g. Microsoft Word documents) are particularly vulnerable to obsolescence. |
27 | 7 | Steve Welburn | |
28 | 7 | Steve Welburn | * LaTeX source - will all the required packages be available if you want to rebuild the document? |
29 | 7 | Steve Welburn | * Images - will the format be available? Is it a closed or patent-encumbered format (e.g. GIF, which was affected by LZW compression patents until they expired)? |
30 | 7 | Steve Welburn | * Audio - is it a lossy format? Will future decoders produce the same audio you expect from the file? |
31 | 7 | Steve Welburn | |
32 | 2 | Steve Welburn | Current audio formats may become obsolete in the future. We therefore recommend that, when archiving audio files, copies of the data are stored in an open lossless format as well as in the original format. We currently recommend using "FLAC":http://flac.sourceforge.net/. |
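
The recommendation to keep accessible copies alongside the originals can be checked mechanically. The following is a minimal sketch, not a definitive tool: the function name @missing_companions@ and the extension mapping are illustrative assumptions (they are not part of this page), and the sketch assumes that a companion copy sits in the same directory and shares the original's base name.

```python
from pathlib import Path

# Assumed mapping (illustrative only): source extension -> extensions that
# count as a more accessible companion copy of the same file.
COMPANION_FORMATS = {
    ".tex": {".pdf"},
    ".doc": {".pdf"},
    ".wav": {".flac"},
    ".mp3": {".flac"},
    ".gif": {".tif", ".tiff"},
}


def missing_companions(archive_dir):
    """Return source files that have no companion copy in the same directory."""
    missing = []
    for path in sorted(Path(archive_dir).rglob("*")):
        wanted = COMPANION_FORMATS.get(path.suffix.lower())
        if wanted is None or not path.is_file():
            continue  # not a format we track, or not a regular file
        # A companion shares the base name and has one of the wanted extensions.
        has_companion = any(
            other.stem == path.stem and other.suffix.lower() in wanted
            for other in path.parent.iterdir()
        )
        if not has_companion:
            missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in missing_companions(target):
        print(f"no accessible copy found for: {path}")
```

Running the script against an archive directory lists the originals that still need a PDF, TIFF or FLAC copy. It only checks that a companion file exists, not that its content is valid or complete, so it may be best treated as a reminder rather than a guarantee.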