Version 17 (Steve Welburn, 2012-11-16 11:48 AM) → Version 18/33 (Steve Welburn, 2012-11-16 11:52 AM)
h1. Backing up
h2. Why back up your data?
* [[Reliability|Hard disks die]]
* Portable devices can be lost or broken
* [[Disasters]] happen
* [[Tales Of Lost Data|Laptops get stolen]]
h2. How to back up data
The core principle is that backup copies of data should be stored in a different location to the main copy.
If you delete your local copy of the data then the original backup becomes the primary copy. Is that copy backed up anywhere?
Suitable locations for backups are:
* A firesafe
* A network copy
** A network drive, e.g. one provided by the institution
** Internet storage (in the cloud)
** A data repository - this could be a public thematic / institutional repository for publishing completed research datasets, or an internal repository for archiving datasets during research
* A portable device / portable media which you keep somewhere other than under your desk
The best backup is the one you do!
Backing up to an external device means that you need access to the device, whereas network drives and "internal" backups are usually more readily available. Pick a routine that fits, e.g. back up every time you're in the office / lab or at home.
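As a rough illustration of the core principle above, the following Python sketch copies a data folder to a second location and then checks that every copied file matches the original. It is only a minimal sketch using the standard library: the function names @sha256_of@ and @back_up@ and the paths in the usage comment are hypothetical, and it assumes Python 3.8 or later.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destination: Path) -> List[str]:
    """Copy `source` into `destination`; return files that failed verification."""
    # dirs_exist_ok lets the same destination be reused for later backups.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatched = []
    for original in source.rglob("*"):
        if original.is_file():
            relative = original.relative_to(source)
            copy = destination / relative
            # A missing or differing copy means the backup cannot be trusted.
            if not copy.is_file() or sha256_of(original) != sha256_of(copy):
                mismatched.append(relative.as_posix())
    return sorted(mismatched)


# Hypothetical usage: both paths are placeholders, not real locations.
# problems = back_up(Path("research-data"), Path("/mnt/network-drive/research-data"))
# print("Backup verified" if not problems else problems)
```

An empty result means every file was copied and verified. Note that this only adds and overwrites files in the destination; it does not remove files that have since been deleted from the source.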
h2. Can't I just put it in the cloud?
You can, but the [[Cloud Service Agreements|service agreement]] with the provider may give them a lot of rights... review the service agreement and decide whether you are happy with it!
Looking at service agreements in November 2012, we found that Google's "terms":http://www.google.com/policies/terms/ let them use your data in any way which will improve their services - including publishing your data and creating derivative works. This is partly a side-effect of Google switching to a single set of terms for all their services. For Microsoft SkyDrive, the Windows Live "services agreement":http://windows.microsoft.com/en-US/windows-live/microsoft-services-agreement is pretty similar.
Apple's iCloud is "better":http://www.apple.com/legal/icloud/en/terms.html as they restrict publication rights to data which you want to make public / share. "Dropbox":https://www.dropbox.com/privacy is relatively good - probably because they just provide storage and aren't mining it to use in all their other services!
Even so, there are issues. Data stored in the cloud is still stored somewhere... you just don't have control over where that location is. Your data may be stored in a country which gives the government the right to access data. Also, the firm that stores your data may still be required to comply with the laws of its home country when the data is stored elsewhere. It is, however, unlikely that Digital Audio research data will be sensitive enough for this to be an issue.
A Forbes article on "Can European Firms Legally Use US Clouds To Store Data":http://www.forbes.com/sites/ciocentral/2012/01/02/can-european-firms-legally-use-u-s-clouds-to-store-data/ stated that:
bq. Both Amazon Web Services and Microsoft have recently acknowledged that they would comply with U.S. government requests to release data stored in their European clouds, even though those clouds are located outside of direct U.S. jurisdiction and would conflict with European laws.
If you are worried about what rights a service provider may have over data you store in the cloud, then consider encrypting it - e.g. using an encrypted .dmg file on a Mac, or using "Truecrypt":http://www.truecrypt.org/ for a cross-platform solution. These create an encrypted "disc" in a file which you can mount and treat like a real disc - but all the content is encrypted. Note that changing data on an encrypted disc may change the entire contents of the disc, so the whole disc may need to be resynced to the cloud storage.
Alternatively, "BoxCryptor":https://www.boxcryptor.com/ or "encFs":http://www.arg0.net/encfs (also available "for Windows":http://tinyurl.com/683ye4q) will encrypt each file in a folder separately, allowing synchronisation to operate more effectively. With BoxCryptor, file names are visible with the standard version; they are only encrypted with the (non-Free) "Unlimited" version.
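To see why encrypting files individually tends to synchronise better than a single encrypted disc file, consider a deliberately simplified model in which a sync client re-uploads any file whose contents have changed. The Python sketch below is an assumption-laden illustration, not a description of how any particular cloud service works (real clients may, for instance, transfer only the changed blocks of a file); the names @manifest@ and @files_to_upload@ are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def manifest(folder: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its contents."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in folder.rglob("*")
        if path.is_file()
    }


def files_to_upload(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose contents changed since `before`."""
    return sorted(name for name, digest in after.items() if before.get(name) != digest)
```

Under this model, a folder of separately encrypted files only flags the files that were actually edited, whereas a folder containing one encrypted disc file flags that whole file after any change inside it.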
"SpiderOak":http://spideroak.com provide "zero knowledge" privacy in which all data is encrypted locally before being submitted to the cloud, and SpiderOak do not have a copy of your decryption key - i.e. they can't actually examine your data.