
Simon Dixon, 2013-02-25 06:39 PM

h1. Documenting data

h2. What should you document?

Document the data so that other people can understand it: what units it is in, how and why it was created, and what it could be used for.

As well as summary documentation for the entire dataset, individual data files should have their own documentation.

h2. How to document data

* Use a suitable directory structure. A single piece of documentation per folder can then summarise the meaning of all the files within it, rather than each file needing its own documentation
* Use meaningful filenames
** The more meaningful the better
** However, they should be succinct
** It may be necessary to refer to an explanation of the filenames to identify their content
** Files may be moved from their original directory structure so filenames should be sufficient to identify a particular file
* If documentation is needed to understand the contents of files, copy it along with the files
* Use standard file formats where possible, preferably open formats, so that files can be reused
* Create README files with textual explanations of file content
* Use the capabilities of file formats for self-documentation
** If you have text files of data, consider including comment lines for explanations
** Fill in author, title, date and comments for file formats that support them (e.g. PDF, Word .doc)
** Consider including <!-- --> comments in XML data
* If data is created algorithmically (by code):
** Consider automatically writing out textual descriptions when the data is created
** Document the values of all the parameters used to create the data
** Remember to document the actual values of parameters for which default values were accepted: the defaults might change between versions of the code
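A per-folder README, combining the advice on directory structure and README files, can be generated rather than written entirely by hand. The following is a minimal Python sketch, assuming one plain-text README.txt per folder and a hand-maintained dictionary of one-line notes per file; the function name write_folder_readme and the example filenames are hypothetical. Files without a note are listed as "(not yet documented)" so that gaps stay visible instead of being silently skipped.

```python
import tempfile
from pathlib import Path


def write_folder_readme(folder, summary, file_notes):
    """Write README.txt in `folder`, summarising every data file it contains.

    `file_notes` maps filenames to one-line descriptions (a hypothetical format).
    """
    folder = Path(folder)
    lines = [summary, "", "Files:"]
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name != "README.txt":
            # Flag files nobody has described yet rather than skipping them
            note = file_notes.get(path.name, "(not yet documented)")
            lines.append(f"  {path.name}: {note}")
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Example: a temporary folder holding two hypothetical data files
with tempfile.TemporaryDirectory() as tmp:
    for name in ["take01_onsets.csv", "take02_onsets.csv"]:
        (Path(tmp) / name).write_text("time_s\n", encoding="utf-8")
    readme = write_folder_readme(
        tmp,
        "Onset times detected in two recordings; all times in seconds.",
        {"take01_onsets.csv": "onsets for take 1"},
    )
    readme_text = readme.read_text(encoding="utf-8")

print(readme_text)
```

Regenerating the README whenever files are added keeps the listing in step with the folder, while the summary line and notes still have to be written by someone who understands the data.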
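For text data files, comment lines at the top of the file can record units, provenance and meaning. Below is a minimal sketch using Python's standard csv module, assuming the common convention that lines beginning with "#" are comments. The csv module has no notion of comments itself, so the reader filters them out before parsing. The function names and the file content are hypothetical.

```python
import csv
import io


def write_documented_csv(stream, notes, columns, rows):
    """Write CSV data preceded by '#' comment lines explaining it."""
    for note in notes:
        stream.write(f"# {note}\n")
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


def read_documented_csv(stream):
    """Return (comment notes, column names, data rows as strings)."""
    notes, data_lines = [], []
    for line in stream:
        if line.startswith("#"):
            notes.append(line[1:].strip())
        else:
            data_lines.append(line)
    reader = csv.reader(data_lines)
    columns = next(reader)
    return notes, columns, list(reader)


# Hypothetical example content: note the units are stated explicitly
buf = io.StringIO()
write_documented_csv(
    buf,
    [
        "Onset times detected in recording take01.wav",
        "Units: time_s in seconds; strength is dimensionless (0 to 1)",
        "Created by a hypothetical onset detector, version 0.3",
    ],
    ["time_s", "strength"],
    [[0.512, 0.83], [1.024, 0.41]],
)
buf.seek(0)
notes, columns, rows = read_documented_csv(buf)
print(notes)
print(columns, rows)
```

Other tools often accept the same convention; for example, numpy.loadtxt treats "#" as its default comment marker, so such files can still be loaded directly.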
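XML comments can be added programmatically when XML data is generated. A minimal sketch with Python's standard xml.etree.ElementTree, where ET.Comment creates a comment node that is serialised as <!-- ... -->; the element names, units and source description are hypothetical. One caveat: ElementTree's default parser discards comments when a file is read back in, so anything essential should also appear in the README.

```python
import xml.etree.ElementTree as ET


def build_annotated_xml(values, units, source):
    """Build an XML string whose leading comment explains the data."""
    root = ET.Element("measurements")
    # Comment text must not contain "--", which XML forbids inside comments
    root.append(ET.Comment(f" Values are in {units}; source: {source} "))
    for i, value in enumerate(values):
        item = ET.SubElement(root, "value", index=str(i))
        item.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical example: two pitch estimates in hertz
xml_text = build_annotated_xml([440.0, 880.0], "Hz", "hypothetical pitch tracker")
print(xml_text)
```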
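When data is created by code, the parameter values, including any defaults that were silently accepted, can be captured automatically at creation time. A minimal Python sketch, assuming the data-creating code is a Python function: inspect.signature(...).bind() matches the call's arguments to parameters, and apply_defaults() fills in every default that the caller did not pass. synthesize_tone is a hypothetical stand-in for your own code, and run_and_document and the fields of the description dictionary are likewise illustrative choices.

```python
import inspect
import json
import math
from datetime import datetime, timezone


def synthesize_tone(duration_s, freq_hz=440.0, sample_rate=8000, amplitude=0.5):
    """Hypothetical data-creating function: samples of a sine tone."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def run_and_document(func, *args, **kwargs):
    """Call func; return (result, description of how the result was made).

    The description records every parameter value actually used,
    including defaults the caller did not specify.
    """
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # record defaults in force for this run
    result = func(*bound.args, **bound.kwargs)
    description = {
        "created_by": f"{func.__module__}.{func.__qualname__}",
        "created_at": datetime.now(timezone.utc).isoformat(),
        "parameters": dict(bound.arguments),
    }
    return result, description


samples, description = run_and_document(synthesize_tone, 0.5, freq_hz=1000.0)
print(json.dumps(description, indent=2))
```

Saving the description alongside the data (for instance with json.dump into a file named after the data file) keeps the record with the data. Because the defaults are recorded as they were when the code ran, a later change to a default in a new version of the code does not alter what was documented.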