Chris Cannam, 2018-06-29 11:32 AM

h1. Using Computers to Analyse Recordings 2017/2018

h3. General outline

# Introductory notes and slides on acoustics and audio (DW)
# Introductory notes and slides on audio features (CC)
# Sonic Visualiser - hands on with waveform and spectrograms (CC)
# Introductory notes and slides on Vamp plugins (CC)
# Sonic Visualiser - hands on with Vamp plugins (CC)

h3. Audio material

 * "piano-scale.wav":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/piano-scale.wav - our simplest example, used for basic waveform, spectrogram, & chroma introductions
 * "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg - severe synthetic pop from the 80s, used to illustrate spectrograms a bit more and demonstrate straightforward timing-related features
 * "Frog Galliard":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/lutemusic426.mp3 (Dowland) - illustrating both timing and pitch/harmonic features; this file may be used throughout the week
 * "King Henry":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/Music/07%20-%20King%20Henry.flac - to show the appearance of vibrato and discuss aspects of pitch estimation, pitch perception, and temperament

h3. What we hope to cover

# Audio waveforms, and what we can learn from them on their own;
# Spectrograms: what is a spectrogram; how the terms "spectral", "spectrum", and "spectrogram" are related, as well as "discrete Fourier transform", "short-time Fourier transform", etc; time/frequency tradeoff; fundamental frequency and harmonic series; linear and logarithmic frequency scales
# Some common higher-level features derived from short-time Fourier analysis: onset detection and tempo estimation; pitch estimation and note segmentation
# Chromagrams: what is a chromagram; how the terms "pitch chroma", "chromagram", and "chroma features" are related; relationship to Constant-Q spectrograms; what use chroma features are; limitations; tuning/parameter considerations (tuning frequency, number of bins per octave, lack of consideration for temperament, etc); what "chroma features" saved to a file might look like compared to how a chromagram looks on screen

h3. Breakdown

(Direct links to the audio files themselves are included here, but for copyright reasons they might disappear at any time after the workshop!)

h5. Introductory notes and slides on audio features

h5. Sonic Visualiser - hands on with waveform and spectrograms

* *Waveform*. To cover:
** audio waveforms, and what we can learn from them on their own

## Start Sonic Visualiser and open "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg. (This is a severe, synthetic 80s pop song by an act called Act, which we're using as a first illustrative piece because so much about it is clear-cut and easily visible in waveform and spectrogram.)
## Click-and-drag through the file using the Navigate tool, and also using the overview at the bottom of the window, to see the scope of the waveform view.
## Play from the start, just to get an idea of what it sounds like.
## Return to the start and zoom in (using the zoom wheel, but noting that the mouse wheel also works).
## Notice the different shapes in the waveform resulting from different types of synthetic percussive sound (low-frequency kick drum / higher-frequency cymbal-type sounds). These can be related intuitively to the direct correspondence between signal voltage and speaker cone deflection.
## Continue until the vocal starts, and observe that we can see very little that can be directly related to the sung pitch, although if we zoom in we can quite clearly see sibilance (these frequencies around 10kHz are pretty much the sweet spot for visibility in a 44.1kHz waveform).
## Start a new session, open "piano-scale.wav":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/piano-scale.wav, and play it.
## Some information can be perceived and measured, roughly, from the waveform here: we can see when the notes start, and we can get a simple fundamental frequency estimate. Zoom in to the first note, switch to Select mode, and drag out one cycle: it is about 170 samples long, so 44100/170 = 259 Hz. The note is a middle C, so the true value should be nearer to 261.6 Hz, but this is a fair approximation. (But this is a very simple example, and in particular it's one where the single note's fundamental frequency dominates the harmonic envelope, so this simple waveform zero-crossing measurement can be carried out without octave errors.)
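The arithmetic in the last step can be written out as a short Python sketch. The function names are illustrative only and are not part of Sonic Visualiser; the reference value for middle C assumes 12-tone equal temperament with A = 440 Hz.

```python
import math

def frequency_from_period(sample_rate_hz, period_samples):
    """Fundamental frequency implied by one waveform cycle of the given length."""
    return sample_rate_hz / period_samples

def cents_between(f_measured, f_reference):
    """Signed pitch difference in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(f_measured / f_reference)

# Values from the step above: a 44.1 kHz file and a cycle of about 170 samples.
estimate = frequency_from_period(44100, 170)

# Assumption: 12tET with A4 = 440 Hz, where middle C lies 9 semitones below A4.
middle_c = 440.0 * 2 ** (-9 / 12)

print(round(estimate, 1))   # 259.4
print(round(middle_c, 2))   # 261.63
print(round(cents_between(estimate, middle_c), 1))  # negative: the estimate is slightly flat
```

Note how coarse the measurement is: a cycle of 169 samples would give about 260.9 Hz and one of 168 samples 262.5 Hz, so a selection that is off by a single sample shifts the estimate by around 1.5 Hz at this pitch.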

* *Spectrogram*. To cover:
** what is a spectrogram?
** how the terms "spectral", "spectrum", and "spectrogram" are related, as well as "discrete Fourier transform", "short-time Fourier transform", etc
** time/frequency tradeoff
** fundamental frequency and harmonic series
** linear and logarithmic frequency scales (frequency vs pitch)

## With the "piano-scale.wav":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/piano-scale.wav file open, call up a plain spectrogram - Pane -> Add Spectrogram (or keyboard shortcut on the G key). This is the standard kind of "audio recorder/editor" spectrogram.
## Observe that the Y axis is frequency, with the full recorded frequency range; the X axis is time, as it is for most layers in Sonic Visualiser. The spectrogram is a simple time-frequency breakdown, the output of a series of short-time Fourier transforms, one for each horizontal step.
## Notice that, for each note, we can see the fundamental frequency most strongly and then the harmonics stacked above it. The harmonics are spaced more widely for higher notes because they are at multiples of the note's fundamental frequency, which is larger.
## The colour scale is a dB scale, which means that it boosts quieter content. For this reason we can see the noise floor (general background noise) even though it may not be audible. We can switch the colour scale to Linear to isolate only the strong frequencies. There isn't really enough detail to measure much here. (NB the default colour scheme is unhelpful to colour-blind users, so it might be worth changing to the Sunset or Ice scheme.)
## Close that pane and open a "melodic-range spectrogram" - Pane -> Add Melodic Range Spectrogram (or M key). Observe the much more limited frequency range and the fact that this spectrogram uses both Linear colour and Sunset scheme by default.
## However, the biggest difference is that the default vertical (Y) scale for this view is logarithmic in frequency, i.e. with equal spacing between 1, 2, 4, 8, 16 etc rather than between 1, 2, 3, 4, 5 etc as in a linear scale. This makes it essentially linear in pitch, with "one octave" (a doubling in fundamental frequency) always being the same distance on the vertical scale. Correspondingly there is now a little representation of a piano keyboard shown at left, with middle C highlighted.
## The above assumes 12-tone equal temperament (12tET) with A=440Hz; we can change at least the latter part of that in the Preferences, and the scale will move immediately when we do so (try it but, at this point, be sure to restore the default).
## Select the Measure tool and show that we can get a frequency readout with harmonic markers. Return to the Navigate tool and contrast with the readout that is displayed as you move the pointer over the pane.
## Close that pane and open the "peak-frequency spectrogram" - Pane -> Add Peak Frequency Spectrogram (or K key). Notice that here we can just wave the Navigate tool over a bin to get an estimate of the instantaneous frequency there.
## New session, open "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg again and open both the plain spectrogram and the melodic-range one -- observe and contrast the various visible elements, in particular vertical lines in full frequency range for noisy percussion, relative invisibility of such broadband sounds in the melodic-range spectrogram, glides in vocal, difficulty of distinguishing harmonic traces from simultaneous notes etc.
## Go to File -> Replace Main Audio, open "King Henry":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/Music/07%20-%20King%20Henry.flac. Note among other things that we need to increase the gain on the melodic-range spectrogram, that the vibrato is visible (so things like vibrato rate could be approximately measured), and that the long reverb makes the notes appear to overlap in places.
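The time/frequency tradeoff, the harmonic series, and the linear versus logarithmic frequency scales covered above come down to a little arithmetic, sketched here in Python. The window sizes are illustrative choices rather than a statement of Sonic Visualiser's settings, the function names are hypothetical, and the pitch conversion assumes 12tET with A = 440 Hz unless told otherwise.

```python
import math

SAMPLE_RATE = 44100  # Hz, matching the 44.1 kHz figure quoted earlier

def bin_spacing_hz(window_size):
    """Frequency distance between adjacent bins of a Fourier transform of this length."""
    return SAMPLE_RATE / window_size

def window_duration_ms(window_size):
    """Stretch of audio, in milliseconds, that one transform window covers."""
    return 1000.0 * window_size / SAMPLE_RATE

def harmonics(f0_hz, count):
    """First `count` members of the harmonic series: integer multiples of f0."""
    return [f0_hz * k for k in range(1, count + 1)]

def midi_pitch(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number (69 = A4, 60 = middle C), assuming 12tET."""
    return 69 + 12 * math.log2(freq_hz / a4_hz)

# Longer windows give finer frequency resolution but coarser time resolution.
for size in (1024, 4096):
    print(size, round(bin_spacing_hz(size), 1), "Hz per bin,",
          round(window_duration_ms(size), 1), "ms per window")

# Harmonics are evenly spaced in Hz (linear scale) but bunch together in pitch (log scale).
for f in harmonics(261.63, 4):
    print(round(f, 2), "Hz -> MIDI pitch", round(midi_pitch(f), 2))
```

Passing a different `a4_hz` to `midi_pitch` mimics changing the tuning frequency in the Preferences: the same frequency then maps to a slightly different position on the pitch scale.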

h5. Introductory notes and slides on Vamp plugins

h5. Sonic Visualiser - hands on with Vamp plugins

* *Quick survey of other common features* including some derived from short-time Fourier analysis. To cover:
** amplitude
** onset detection and tempo estimation

## Start a new session with "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg.
## Feature extraction plugins are found under the Transform menu, so called because it contains things that turn your audio into something else, including both feature extractors and audio effects. You can run a transform and show the output in the same pane as the audio, or in a new one.
## With the audio pane selected, run Transform -> Analysis by Category -> Time -> Tempo -> Bar and Beat Tracker: Beats. This produces a series of beat locations, each labelled with its metrical beat number, and when you play the audio, the beats are played with a tap.
## Create a new pane (Pane -> Add New Pane or shortcut N), and run Transform -> Analysis by Category -> Low Level Features -> Amplitude Follower. This produces an amplitude curve.
## Open another new pane (Pane -> Add New Pane or shortcut N) and run Transform -> Analysis by Category -> Time -> Onsets -> Note Onset Detector: Onsets. Here we have individual note onset positions: it works OK for this sort of music. We can switch playback on and off for the individual feature tracks with the Play toggle on the parameter box at right.
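To give a feel for what an amplitude curve and a list of onsets are as data, here is a deliberately naive Python sketch. It is not the method used by the Amplitude Follower or Note Onset Detector plugins; the frame size, the threshold, and the synthetic test signal are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # Hz, kept low so the synthetic example stays small

def frame_rms(samples, frame_size):
    """RMS level of each consecutive, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return levels

def naive_onsets(levels, rise=0.1):
    """Indices of frames whose level exceeds the previous frame's by more than `rise`."""
    return [i for i in range(1, len(levels)) if levels[i] - levels[i - 1] > rise]

def burst(length):
    """A decaying 440 Hz tone, standing in for a percussive note."""
    return [math.exp(-5.0 * t / length) * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
            for t in range(length)]

# Synthetic signal: silence, a note, silence, another note.
silence = [0.0] * 2000
signal = silence + burst(4000) + silence + burst(4000)

frame_size = 500
levels = frame_rms(signal, frame_size)          # the "amplitude curve"
onset_frames = naive_onsets(levels)             # the "onsets"
onset_times = [i * frame_size / SAMPLE_RATE for i in onset_frames]

print(onset_frames)  # [4, 16]
print(onset_times)   # [0.25, 1.0]
```

A simple level-rise rule like this only copes with clean, well-separated attacks; softer or overlapping notes need the more careful analysis that dedicated plugins provide.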

* *Pitch*. To cover:
** pitch estimation and note segmentation

## Close session, return to "King Henry":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/Music/07%20-%20King%20Henry.flac. Open a melodic-range spectrogram and make sure its pane is current.
## Now run the transform pYIN - Smoothed Pitch Track. A pitch track should appear in a bright colour. Switch its plot type to Discrete Curves and make sure its scale is set to Auto-Align, which means that if it has Hz units, it will be aligned to the same vertical scale as the spectrogram behind it.
## We can of course check this extracted pitch-track visually, but we can also inspect individual values (by mouseover), inspect in bulk (Layer -> Edit Layer Data) including tracking through the data during playback, and export to a file (File -> Export Annotation Layer). Demonstrate the last of these. (Note that the correct layer must be selected for any of these to work!)
## This layer can also be synthesised and played back -- switch on the Play button on layer parameters and try it.
## The same plugin can produce note segmentations (for monophonic audio of this type), so run it again requesting the Notes output. Each note is recorded as having a pitch equal to the median of the underlying pitch track's pitches for the time it spans. This is unlikely to sound so nice when played back, because of both segmentation flaws and difficulties (e.g. glides) and interesting properties of pitch perception (e.g. with vibrato). If this kind of use is of interest to you, consider our other program "Tony":/projects/tony.
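The idea of giving each note the median of the pitch-track values it spans can be sketched in Python as follows. This only illustrates the median step: it is not pYIN's code, and the pitch track here is synthetic (a hypothetical 220 Hz note with vibrato and a few simulated octave errors).

```python
import math
import statistics

def note_pitch_hz(pitch_track, start_s, end_s):
    """Median frequency of the (time, Hz) estimates with start_s <= time < end_s."""
    values = [hz for t, hz in pitch_track if start_s <= t < end_s]
    return statistics.median(values)

# Hypothetical pitch track: one estimate every 10 ms for one second.
STEP_S = 0.01
track = []
for i in range(100):
    hz = 220.0 + 3.0 * math.sin(2 * math.pi * 6.0 * i * STEP_S)  # 6 Hz vibrato, +/- 3 Hz
    if 40 <= i < 45:
        hz *= 2.0  # simulated octave errors in five frames
    track.append((i * STEP_S, hz))

median_hz = note_pitch_hz(track, 0.0, 1.0)
mean_hz = statistics.mean([hz for _, hz in track])

print(round(median_hz, 1))  # stays close to 220
print(round(mean_hz, 1))    # pulled upwards by the octave errors
```

The median stays within the vibrato range around 220 Hz, while the mean is pulled up by roughly 11 Hz, which is one reason a median is a sensible summary for a note. It also shows why the result can sound odd: a single value per note discards the vibrato and any glide entirely.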

* *Chromagrams*. To cover:
** what is a chromagram?
** how the terms "pitch chroma", "chromagram", and "chroma features" are related;
** relationship to Constant-Q spectrograms;
** what use chroma features are, and their limitations;
** tuning/parameter considerations (tuning frequency, number of bins per octave, lack of consideration for temperament, etc);
** what "chroma features" saved to a file might look like compared to how a chromagram looks on screen

## Start a new session with "piano-scale.wav":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/piano-scale.wav.
## Open a "melodic-range spectrogram" pane again - Pane -> Add Melodic Range Spectrogram (or M key). As we saw earlier, this is a series of short-time Fourier transform outputs displayed with a logarithmic-frequency Y axis. The visual resolution of individual "bins" becomes finer and finer as we go vertically up the image, because the underlying spectrogram has a linear-frequency Y axis and we are just warping it for the display.
## Now open a new pane and add a Constant-Q spectrogram from a plugin, via Transform -> Analysis by Category -> Visualisation -> Constant Q Spectrogram (MIDI pitch range) with the default parameters.
## You can see that this is quite similar to the melodic-range spectrogram, but the underlying analysis is quite different -- this spectrogram uses a different time/frequency kernel design so that each output bin has a constant ratio of centre frequency to bandwidth (known as Q), and hence a constant bandwidth in log-frequency, with the bins' time resolution varying instead. So when this one is plotted, no warping is needed -- the output already has a logarithmic relationship with frequency and a linear one with pitch.
## The difference between the "melodic range spectrogram" (standard Fourier transform, with Y axis warped to log-frequency for display) and "constant-Q spectrogram" (transform that already has log-frequency scale) is not always all that significant when it comes to visualisation alone. The standard spectrogram is faster to calculate and has a clearer single mathematical definition, which is why it's built in to Sonic Visualiser while constant-Q spectrograms are only available via plugins.
## However, because a constant-Q spectrogram has a more rigorous and precise correspondence between pitch and output bin, it can be useful especially as a source of data for further use. The chromagram is an example of such a use.
## Open another new pane and add a chromagram via Transform -> Analysis by Category -> Visualisation -> CQ Chromagram. (There are several chromagram plugins, so note the one we're using here.)
## A chromagram consists of a constant-Q spectrogram folded around into a single octave, so that each "bin" of the chromagram contains the sum of the values for that bin's pitch across all octaves within the chromagram's source range. (The term "chroma" refers to the pitch class of a note, so these values are the contributions of notes that share a certain pitch class.)
## Chromagrams can be generated with any number of bins per octave -- a figure of 12 would give us one bin per semitone. This plugin defaults to 36, which can be changed in the plugin configuration dialog. Others will default to 12, and some can't be changed from that.
## One chromagram plugin that always uses 12 bins per octave is NNLS Chroma. Try this in a new pane with Transform -> Analysis by Category -> Visualisation -> NNLS Chroma: Chromagram. This chromagram tries to emphasise plausible fundamental frequencies, de-emphasise harmonics, and reduce overspill between neighbouring bins. That makes it a good chromagram for many uses and often the clearest as the music gets more complex, but there are risks.
## Let's get a more interesting piece of music -- replace the main audio with the "Frog Galliard":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/lutemusic426.mp3. Looking at the chromagram, what key does this appear to be in?
## This one also shows a limitation of the NNLS Chroma: the ringing single notes show up as more than one simultaneous note, which doesn't happen so much with other chroma plots. Maybe the lute is not as harmonic as NNLS Chroma hopes an arbitrary instrument will be.
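The "folding" of a constant-Q spectrogram into a single octave, described above, can be sketched in a few lines of Python. This assumes a single column (one time step) whose lowest bin sits at the start of an octave; real chromagram plugins may also normalise or otherwise post-process the result, so treat it as an illustration of the idea only.

```python
def fold_to_chroma(cq_column, bins_per_octave=12):
    """Sum a constant-Q column (lowest bin first) into one octave of pitch-class bins."""
    chroma = [0.0] * bins_per_octave
    for index, value in enumerate(cq_column):
        chroma[index % bins_per_octave] += value
    return chroma

# Hypothetical three-octave column starting on C, with one bin per semitone.
column = [0.0] * 36
column[0] = 1.0    # C in the lowest octave
column[12] = 0.5   # C one octave higher
column[24] = 0.25  # C two octaves higher
column[7] = 0.4    # G in the lowest octave

print(fold_to_chroma(column))
# [1.75, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0]
```

With 36 bins per octave the same function would produce three chroma bins per semitone, which is why the bins-per-octave setting matters when comparing the output of different plugins or reading exported chroma data.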

* *Examples of other features based on chroma*. To cover:
** key estimation
** chord estimation

## Some further transforms we can try: Chordino chord estimate; the notes corresponding to Chordino's chord estimate (so as to audition whether the chords sound any good); QM key estimator and key-strength plot.
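As a rough indication of how a chord label might be derived from chroma data, the Python sketch below matches a 12-bin chroma vector against simple major and minor triad templates. This is a toy illustration and not how Chordino or the QM key estimator work; the templates, the scoring, and the example chroma values are all assumptions.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root, minor=False):
    """12-bin vector with 1.0 at the root, third and fifth of the triad."""
    third = 3 if minor else 4
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template

def best_triad(chroma):
    """Name of the major or minor triad whose template overlaps most with the chroma."""
    best_name, best_score = None, float("-inf")
    for root in range(12):
        for minor in (False, True):
            template = triad_template(root, minor)
            score = sum(c * t for c, t in zip(chroma, template))
            if score > best_score:
                best_name = NOTE_NAMES[root] + ("m" if minor else "")
                best_score = score
    return best_name

# Hypothetical chroma frame (C first) with most energy on D, F# and A.
chroma = [0.1, 0.0, 0.9, 0.0, 0.1, 0.0, 0.8, 0.0, 0.0, 1.0, 0.0, 0.2]

print(best_triad(chroma))  # D
```

A frame-by-frame match like this is easily confused by harmonics and passing notes, which is part of the motivation for chroma features that try to suppress harmonics and for estimators that smooth their decisions over time.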

## (To add: problem with harmonic interference; bring in Frog Galliard example; uses and "meaning" of chroma; examples illustrating those; exporting chroma data; a reference to Sonic Annotator)