Documenting data

What should you document?

You should document the data so that people can understand it: what units the data is in, how and why the data was created, and what the data might be used for.

As well as summary documentation for the entire dataset, individual data files should have their own documentation.

How to document data

  • Use a suitable directory structure. Documentation can then give a summary of all the files within a folder.
  • Use meaningful filenames
    • The more meaningful the better
    • However, they should be succinct
    • If filenames follow a naming scheme, provide an explanation of the scheme so that readers can identify the content of each file
    • Files may be moved from their original directory structure, so a filename should be sufficient on its own to identify a particular file
  • If documentation is required to understand file contents, copy the documentation when copying the files
  • Use standard file formats where possible, and preferably open formats, so that files can be reused
  • Create README files with textual explanations of file content
  • Use the capabilities of file formats for self-documentation
    • If you have text files of data, consider including comment lines for explanations
    • Fill in author, title, date and comments for file formats that support them (e.g. PDF, Word .doc)
    • Consider including <!-- --> comments in XML data
  • If data is created algorithmically or by code:
    • Consider automatically writing out textual descriptions when the data is created
    • Document the values of all the parameters used to create the data
    • Remember to document the actual values of parameters for which default values were accepted - the default values might change with different versions of the code
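
The last three points can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: generate_samples, its parameters (n_points, step, offset), and the helper names are all hypothetical stand-ins for real data-producing code. The sketch writes the data as a text file whose leading comment lines record when and how the data was created, including the values of parameters that were left at their defaults.

```python
import inspect
import json
from datetime import datetime, timezone
from pathlib import Path


def generate_samples(n_points=5, step=0.5, offset=0.0):
    """Hypothetical data generator; stands in for real data-producing code."""
    return [offset + i * step for i in range(n_points)]


def resolved_parameters(func, *args, **kwargs):
    """Return every parameter value for this call, including accepted defaults."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # fill in defaults the caller did not override
    return dict(bound.arguments)


def write_documented_data(path, func, *args, **kwargs):
    """Run func and write its output with self-documenting comment lines."""
    params = resolved_parameters(func, *args, **kwargs)
    values = func(*args, **kwargs)
    with Path(path).open("w", encoding="utf-8") as f:
        # Header comment lines: the description travels with the data.
        f.write(f"# Created: {datetime.now(timezone.utc).isoformat()}\n")
        f.write(f"# Generator: {func.__name__}\n")
        f.write(f"# Parameters: {json.dumps(params, sort_keys=True)}\n")
        f.write("# Columns: index, value (arbitrary units in this sketch)\n")
        for i, v in enumerate(values):
            f.write(f"{i},{v}\n")
    return params
```

For example, write_documented_data("samples.csv", generate_samples, step=0.25) records n_points and offset in the header even though only step was given explicitly, so the file stays interpretable if the defaults change in a later version of the code. Because the description is stored in the data file itself, it is also copied whenever the file is copied.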