SWC MAT 2013 - How we marked the assessed exercise

What we looked for

We gave marks for the following:

  • A README file that contained all the information we asked for
  • Good use of version control, with an evident trail of changes
  • Correct results
  • Readable code: sensible choice of variable names, relevant comments, division into functions that do one thing each
  • A small number of bonus points for nice extras, e.g. describing the expected file format in README or comments, labelling the plot fully, including unit tests, behaving sensibly if the user mistyped a filename on the command line.
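
As an illustration of "behaving sensibly" when a filename is mistyped, here is a minimal sketch. It assumes Python 3, and the names read_lines and main and the wording of the messages are hypothetical choices made for this example, not something the assignment required.

```python
import sys


def read_lines(filename):
    """Return the lines of the named file, or None if it cannot be opened."""
    try:
        with open(filename) as observations:
            return observations.read().splitlines()
    except OSError as error:
        # Covers a mistyped or unreadable filename: report it rather than
        # letting the program stop with a traceback.
        print(f"Cannot read {filename}: {error}", file=sys.stderr)
        return None


def main(arguments):
    """Return 0 on success, 1 if the filename is missing or unreadable."""
    if len(arguments) != 2:
        print(f"Usage: {arguments[0]} <observations-file>", file=sys.stderr)
        return 1
    lines = read_lines(arguments[1])
    if lines is None:
        return 1
    print(f"Read {len(lines)} lines from {arguments[1]}")
    return 0

# In a script, the last line would be: sys.exit(main(sys.argv))
```

Returning a status code from main, rather than exiting inside it, also makes this behaviour easy to check from a unit test.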

We did not give marks for particularly clever, elegant or efficient methods (unless they were also easier to read) or for showing special breadth of knowledge in Python. The primary criterion in reviewing the code was "can I easily understand this?"

Some programs had the set of fish hardcoded in the program itself, while others worked with files containing any kind of fish. Both approaches were fine. However, we did generally give slightly higher marks to programs that would work with files containing any date, i.e. that would not start to fail after 2013.

It was nominally possible to achieve a 30% mark without producing any code (simply from the README and version control) and 80% with a readable implementation of the simplest possible program.

You can find an example solution linked at the bottom of this page.

Some remarks

README

  • We asked for a text file called README.txt -- some people provided a file in RTF (rich text) format instead. I guess this is the default format for whichever text editor was used, but it's not universally easy to read.
  • Quite a few people didn't explain how to run the program.

Version Control

  • We looked for a sensible trail of commits with meaningful commit log messages. Submissions that only had one or two commits, or whose messages were not informative, will have lost some marks as a result. Most people did a good job here.

Results

  • We checked the results against our reference graph, looking in particular for:
    • Handling the fact that January appeared in two separate years, showing the right number of months and including the Jan 2013 counts separately from those of Jan 2012.
    • Summing multiple readings for a fish within a single month (e.g. there were 12 cod in Feb 2012).
    • Excluding the turtles, as requested in the assignment.
    • Correctly handling months in which no observations of a particular fish were made (e.g. mackerel in April 2012). Most of the submissions correctly plotted a value at zero for these fish in these months. One failed to plot these values, but was accepted because it clearly used distinct points for observations, with no point at these months, rather than just lines. (Arguably we should have asked for a bar chart anyway!)
  • A couple of bonus marks were available for a nicely-labelled plot.
  • Some people exported a PNG of the results as well as plotting them (using savefig from Matplotlib). This is a nice idea in practice, for example when the plot is destined for a publication.
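
One way to get a zero plotted for months with no observations is to fill the gaps before plotting. The sketch below is only an illustration: the function name fill_missing_months is hypothetical, the counts are made-up numbers, and it assumes the counts for one fish are already held in a dictionary keyed by month index (1 for Jan 2012, 13 for Jan 2013).

```python
def fill_missing_months(counts_by_month, first_month, last_month):
    """Return one count per month index from first_month to last_month
    inclusive, using 0 where no observation was recorded."""
    return [counts_by_month.get(month, 0)
            for month in range(first_month, last_month + 1)]


# Made-up mackerel counts; month 4 (April 2012) is absent because
# no mackerel were observed in that month.
mackerel = {1: 3, 2: 7, 3: 2, 5: 4}
print(fill_missing_months(mackerel, 1, 5))  # [3, 7, 2, 0, 4]
```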

Code

  • We hoped to see some use of functions to divide up the program into logical chunks.
  • Ideally, functions should have a single purpose and a clear flow from input to output (functions that modify global variables on the side, for example, are better avoided).
  • Ideally, the main part of the program would just get a filename from somewhere and pass it to a function that counts the fish.
  • We hoped to see meaningful variable names, so that a reader can look at a single line and correctly deduce what it does from the words in it.
  • There are many possible ways to extract a "month index" from a textual date with dot separators:
    • Probably the most robust is to split the text on dots, convert the year and month fields separately to ints, and calculate (year - base)*12 + month, where base is a base year such as 2012.
    • Some of the programs took a substring of the text, e.g. line[:7] to extract a string like "2012.05" from the start of the line. This works and is pleasingly simple, but it's a bit fragile: for example, it fails if someone omits the padding 0 from a month number ("2012.5.12").
    • A couple of programs took the first two fields and converted them to a floating-point number for use as a dictionary index (i.e. treating 2012.05 as a single number). This is ingenious but also rather fragile: for example, 2012.1 and 2012.10 become the same number.
  • Python sets a trap in allowing conversion between True/False and numerical types. This caught out one person who wrote a function that returned False if a fish was in fact a turtle, but then used the return value as an index into a fish array. The False value was treated as zero, with the result that the turtles pretended to be marlins (the fish with index 0). This sort of thing can be tough to see while building a program, and is probably most easily caught through reviews or testing.
  • A nice way to avoid having to hard-code the set of acceptable fish is to use a Python dictionary, which is like an array but with a non-numerical index. (Some other languages call dictionaries "hashes" or "maps".) If you extract a fish name from a line, you can then use it as a key in a dictionary whose value is an array of counts of that fish. Although the problem could be solved (equally well in terms of marks) without using dictionaries, this approach is worth a look if you aren't familiar with them. The example solution linked below uses dictionaries.
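
The points above about month indices, dictionaries and the True/False trap can be sketched together as follows. This is not the example solution, only a minimal illustration under stated assumptions: the line format "<year>.<month>.<day> <fish name> <count>" is a guess rather than the format from the assignment, the function names are hypothetical, the sample readings are made up, and a dictionary of dictionaries is used instead of a dictionary of arrays.

```python
BASE_YEAR = 2012  # assumption: the earliest year in the data


def month_index(date_text, base_year=BASE_YEAR):
    """Convert a date such as "2012.05.12" to a month number counted from
    January of base_year (Jan 2012 -> 1, Jan 2013 -> 13)."""
    fields = date_text.split(".")
    year = int(fields[0])
    month = int(fields[1])  # int() accepts both "05" and "5"
    return (year - base_year) * 12 + month


def count_fish(lines, excluded=("turtle",)):
    """Return {fish name: {month index: total count}} for the given lines.

    Assumed line format: "<year>.<month>.<day> <fish name> <count>".
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        date_text, fish, number = fields
        if fish in excluded:
            continue
        monthly = counts.setdefault(fish, {})  # new fish get an empty dict
        index = month_index(date_text)
        monthly[index] = monthly.get(index, 0) + int(number)
    return counts


# Made-up readings: two cod readings in Feb 2012 are summed into one total.
sample = [
    "2012.02.03 cod 5",
    "2012.2.20 cod 7",
    "2012.02.21 turtle 1",
    "2013.01.09 marlin 2",
]
print(count_fish(sample))  # {'cod': {2: 12}, 'marlin': {13: 2}}

# The substring and float shortcuts are fragile:
print("2012.5.12"[:7])                      # 2012.5. (not 2012.05)
print(float("2012.1") == float("2012.10"))  # True: Jan and Oct collide

# The True/False trap: False is silently accepted as index 0.
print(["marlin", "cod"][False])             # marlin
```

Because the fish names are dictionary keys, nothing in count_fish needs to know in advance which fish appear in the file; only the excluded names are listed.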

Commenting

  • We hoped to see a comment describing what each function does, and a comment at the top of the file saying what the program is for.
  • Further comments are good if they make a programming decision clear. It's not necessary to comment every line, and if a line is particularly difficult to understand, it's better to rewrite it than comment it. We didn't give marks for simply including more comments, unless they were relevant.
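
A short sketch of the kind of commenting described above: one comment at the top of the file, one description per function, and one further comment that explains a decision. The file name, the function is_turtle and the choice to ignore case are all assumptions made for the example.

```python
# count_fish.py (hypothetical file name)
# Reads a file of fish observations and plots monthly totals per fish.


def is_turtle(name):
    """Return True if the observed animal is a turtle, which we exclude."""
    # Compare without regard to case or surrounding spaces, so that
    # "Turtle" and "turtle " are treated alike.
    return name.strip().lower() == "turtle"
```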

Example solution

You can find an example solution at this Bitbucket repository.

This shows most of what we expected to see, although it isn't a "perfect" solution -- some other submissions may have scored more highly than this one would. This is simply the solution that one of us happened to come up with.