
h1. Test-driven development outline

*Note*: This section went very badly in the MAT workshop. The following has been reworked subsequently, as a possible outline for future workshops.

We assume that the "intro to Python" section has at least introduced how you would run a Python program and compare the output against an external source of "correct" results; also that the NumPy/audiofile section has shown how to suck in an entire (mono) audio file as a NumPy array.

h2. Motivation

We'll refer first back to the "intro to Python" example, with the text file of dates and observations.

<pre>
Date,Species,Count
2012.04.28,marlin,2
2012.04.28,turtle,1
2012.04.28,shark,3
# I think it was a Marlin... luis
2012.04.27,marlin,4
</pre>

We have our program that prints out the number of marlin.

<pre>
$ python count-marlin.py
2
$
</pre>
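As a reminder, the program might look something like this -- a sketch only, since the real file comes from the intro section; the function name @count_species@ and the data file name @fish.csv@ are assumptions here. Counting the records that mention marlin matches the output above:

<pre>
# count-marlin.py -- hypothetical sketch of the intro-section program
def count_species(filename, species):
    """Count the number of observation records for the given species."""
    n = 0
    for line in open(filename):
        fields = line.strip().split(",")
        # skip the header row, comment lines, and anything malformed
        if len(fields) == 3 and fields[1] == species:
            n += 1
    return n

if __name__ == "__main__":
    print(count_species("fish.csv", "marlin"))
</pre>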

We can check this against some human-generated output, or the result of "grep" or something if the program is simple enough, in order to see whether it produces the right result. But what if we change the program to add a new feature -- will we remember to check all the old behaviour as well and make sure we haven't broken it? What if the program as a whole is so complex and subtle that we don't actually know what its output will be?

We need to do two things:

# automate the tests, and
# make sure we test the individual components that the program is made up of (so we can be confident of its behaviour even when we don't know what the program as a whole should produce)

h2. Automating a test

We start with a simple program that uses @assert@: it calls the fish counter for a known file and checks the output.
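For example -- a sketch, assuming the hypothetical @count_species@ function above lives in a module we can import (so @count-marlin.py@ would need renaming to @count_marlin.py@):

<pre>
# check-count.py -- run it; an AssertionError means the test failed
from count_marlin import count_species

assert count_species("fish.csv", "marlin") == 2
print("test passed")
</pre>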

We're starting to automate things. We can make it more convenient by using @nosetests@, which runs all the functions it finds called @test_@-something in files called @test_@-something in the current directory and subdirectories (searched recursively).

Split the test out into a file named @test_@-something, with the checks in a function named @test_@-something, and run it using @nosetests@.
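Continuing the sketch, in a file called @test_count.py@:

<pre>
# test_count.py -- nosetests finds this file by its name, and runs
# every function in it whose name starts with test_
from count_marlin import count_species

def test_marlin_count():
    assert count_species("fish.csv", "marlin") == 2
</pre>

Now running @nosetests@ in this directory picks the test up and runs it automatically.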

h2. Testing units, and test-driven development

h3. Context

So we have a program that loads data from an audio file, like

<pre>
import scikits.audiolab as al

sfile = al.Sndfile("testfiles/beatbox.wav")
count = sfile.nframes                 # total number of sample frames in the file
samples = sfile.read_frames(count)    # read them all into a NumPy array
</pre>

and then does something with @samples@.

Now, for a lot of methods -- particularly spectral domain ones -- the first thing we want to do is chop @samples@ up into frames of a fixed length (1024 is a popular number), either overlapping or non-overlapping. (Draw diagram on whiteboard.)
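For the non-overlapping case, a minimal NumPy sketch -- assuming @samples@ is the 1-D array read above, and simply discarding any trailing partial frame for now:

<pre>
import numpy as np

hop = 1024
nfull = len(samples) // hop    # number of complete frames
# drop the leftover samples, then reshape into one frame per row
frames = np.reshape(samples[:nfull * hop], (nfull, hop))
</pre>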

In this case the file has 253929 samples. At 1024 samples per frame, how many frames can we expect to get from this file, if the frames are not overlapping?

253929/1024 comes out (integer division) as 247. Is that the right answer? I've no idea!

h3. Stub function

Sketch the frame count function into @framer.py@:

<pre>
def get_frame_count(nsamples, hop):
    """Given the number of samples, return the number of non-overlapping
    frames of length hop we can extract from them."""
    return 0
</pre>

So we have a stub implementation that always returns zero.

h3. Unit tests

Now, before filling in the details, we go and write a test for it. The reason we do this first is that it gives us the opportunity to think about what we really expect our function to do -- it will turn out that we have some decisions to make.

In @test_framer.py@:

<pre>
import framer as fr

def test_get_frame_count():
    assert fr.get_frame_count(0, 10) == 0
    assert fr.get_frame_count(4, 2) == 2
    assert fr.get_frame_count(4, 3) == 2
</pre>

etc. Note that for the last of these, we have to start thinking about what to do with partial frames -- do we zero-pad them and return them? do we return them part-full? do we skip them entirely? (I would pick the first of these myself but it depends on the application.)
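The choice shows up directly in what that last assertion should say -- these two are alternatives, not both true at once:

<pre>
assert fr.get_frame_count(4, 3) == 2   # if the partial frame is kept (zero-padded or part-full)
assert fr.get_frame_count(4, 3) == 1   # if the partial frame is skipped
</pre>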

This is test-driven development -- we are using the test-writing phase to work out what behaviour we really want.

We run @nosetests@, and the test fails -- as it should, since our stub always returns zero.
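The output looks something like this (details vary with the nose version; the traceback is elided here):

<pre>
$ nosetests
F
======================================================================
FAIL: test_framer.test_get_frame_count
----------------------------------------------------------------------
Traceback (most recent call last):
  ...
AssertionError
----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (failures=1)
</pre>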

(Note -- it turns out that @nosetests@ will only pick up code from files that are not executable. On the network file system in the MAT lab, it seems all files are executable. Run @nosetests --exe@ to include them.)

h3. Implementation

*Exercise:* Implement @get_frame_count@ to satisfy the tests.
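(For reference, one possible implementation, if we decide that a trailing partial frame counts as a frame -- which is what the tests above require:)

<pre>
def get_frame_count(nsamples, hop):
    """Return the number of frames of length hop we can extract from
    nsamples samples, counting a final partial frame as one frame."""
    return (nsamples + hop - 1) // hop   # ceiling division
</pre>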

h2. Applying this to our marlin example