Version 3 - History - Using Computers to Analyse Recordings 2017 - DHOxSS - Using Computers to Analyse Recordings


Chris Cannam, 2017-06-23 03:39 PM

-Chris Cannam
+h1. Using Computers to Analyse Recordings 2017
 Chris Cannam
-Chris Cannam
+h3. General outline
 Chris Cannam
-Chris Cannam
+# Introductory notes and slides on acoustics and audio (DW)
-Chris Cannam
+# Introductory notes and slides on audio features (CC)
-Chris Cannam
+# Sonic Visualiser - hands on with waveform and spectrograms (CC)
-Chris Cannam
+# Introductory notes and slides on Vamp plugins (CC)
-Chris Cannam
+# Sonic Visualiser - hands on with Vamp plugins (CC)
 Chris Cannam
-Chris Cannam
+h3. Audio material
 Chris Cannam
-Chris Cannam
+ * "piano-scale.wav":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/piano-scale.wav - our simplest example, used for basic waveform, spectrogram, & chroma introductions
-Chris Cannam
+ * "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg - severe synthetic pop from the 80s, used to illustrate spectrograms a bit more and demonstrate gross timing-related features
-Chris Cannam
+ * "Frog Galliard":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/lutemusic426.mp3 (Dowland) - illustrating both timing and pitch/harmonic features - file will be used throughout the week, possibly (nb it's in G major)
-Chris Cannam
+ * "King Henry":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/Music/07%20-%20King%20Henry.flac - to show appearance of vibrato, talk about confusions between harmonics and fundamentals, and discuss temperament
-Chris Cannam
+ * a live recording of sung audio? (bring a usb mic?) - illustrating pitch (e.g. pYin) and chroma features
 Chris Cannam
-Chris Cannam
+h3. Sequence of things to cover (some or all of)
 Chris Cannam
-Chris Cannam
+# Waveform, and what we can learn from that alone
-Chris Cannam
+# Spectrogram: what it is; how the terms "spectral", "spectrum", and "spectrogram" are related, as well as "discrete Fourier transform", "short-time Fourier transform", etc; time/frequency tradeoff; fundamental frequency and harmonic series; linear and logarithmic frequency scales
-Chris Cannam
+# Mid-level features derived from short-time Fourier analysis, such as onset detection methods
-Chris Cannam
+# Chromagram: what it is; how the terms "pitch chroma", "chromagram", and "chroma features" are related; relationship to "Constant Q Spectrogram"; what use chroma features are; limitations; tuning/parameter considerations (tuning frequency, number of bins per octave, lack of consideration for temperament, etc); what "chroma features" saved to a file might look like compared to how a chromagram looks on screen
 Chris Cannam
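The time/frequency tradeoff mentioned in the spectrogram item can be made concrete with a little arithmetic. The following is a minimal sketch, not anything Sonic Visualiser itself runs: it assumes a 44100 Hz sample rate (as in the examples used here), and the function name and the window sizes are chosen purely for illustration.

```python
SAMPLE_RATE = 44100  # Hz; assumed, matching the 44.1kHz examples in these notes


def stft_resolution(window_size, sample_rate=SAMPLE_RATE):
    """Return (spacing between frequency bins in Hz, window duration in seconds)."""
    bin_spacing_hz = sample_rate / window_size
    window_duration_s = window_size / sample_rate
    return bin_spacing_hz, window_duration_s


# Illustrative window sizes only: longer windows give finer frequency
# resolution but smear events over a longer stretch of time.
for n in (256, 1024, 4096, 16384):
    hz, secs = stft_resolution(n)
    print(f"window {n:>5}: bins {hz:7.2f} Hz apart, window spans {secs * 1000:6.1f} ms")
```

The product of the two numbers is always 1, which is one way of stating the tradeoff: halving the bin spacing doubles the length of audio each column of the spectrogram has to cover.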
-Chris Cannam
+-> rest is to be revised
 Chris Cannam
 Chris Cannam
-Chris Cannam
+h3. Breakdown of CC sections
 Chris Cannam
-Chris Cannam
+Direct links to the audio files themselves are included here, but for copyright reasons they won't necessarily persist beyond the workshop!
 Chris Cannam
-Chris Cannam
+h5. Sonic Visualiser - hands on with waveform and spectrograms
 Chris Cannam
-Chris Cannam
+* Waveform
-Chris Cannam
+## Start Sonic Visualiser and open "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg (very severe, synthetic 80s pop song by Act)
-Chris Cannam
+## Show dragging through the file using Navigate tool, and also using the overview at bottom
-Chris Cannam
+## Play from the start, just to get an idea what it sounds like
-Chris Cannam
+## Return to the start and zoom in (using the zoom wheel, but noting that the mouse wheel also works)
-Chris Cannam
+## Notice the different shapes in waveform resulting from different types of synthetic percussive sound (low-frequency kick drum / higher frequency cymbal-type sounds) - refer back to Christophe's notes about correspondence between e.g. signal voltage and speaker cone deflection
-Chris Cannam
+## Continue until the vocal starts, and observe that we can see very little that relates to e.g. sung pitch, although if we zoom in we can quite clearly see sibilance (these frequencies around 10kHz are pretty much the sweet spot for visibility in a 44.1kHz waveform)
 Chris Cannam
-Chris Cannam
+* Spectrogram
-Chris Cannam
+## New session, open "piano-scale.wav":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/piano-scale.wav and play it
-Chris Cannam
+## Some information can sort-of be perceived and measured from the waveform here: we can see when the notes start, and can get a simple fundamental frequency estimate. Zoom in to the first note, switch to Select mode, and drag out one cycle: it's about 170 samples, so 44100/170 = 259 Hz. The note is a middle C, so the true value should be nearer to 261.6 Hz, but this is a fair approximation. (But this is a very simple example!)
-Chris Cannam
+## Now open a plain spectrogram - Pane -> Add Spectrogram (or G key). Observe full range on frequency scale; x axis is time, this is a simple time-frequency breakdown.
-Chris Cannam
+## Notice that, for each note, we can see the fundamental frequency most strongly and then the harmonic sequence. The harmonics are spaced more widely for higher notes because they are multiples of the fundamental frequency, which is larger. The noise floor is visible because we're using a dB scale; we can switch to Linear to isolate only the strong frequencies. There isn't really enough detail to measure much here. (NB the default colour scheme is unhelpful to colour-blind users, so it might be worth changing to the Sunset scheme.)
-Chris Cannam
+## Close that pane and open a "melodic-range spectrogram" - Pane -> Add Melodic Range Spectrogram (or M key). Observe the much more limited frequency range and the fact that this spectrogram uses both Linear colour and Sunset scheme by default.
-Chris Cannam
+## Although the higher harmonics quickly disappear off the top of the scale, we can clearly see that the spacing between harmonics is now the same for each note, but the harmonics for a given note get closer together as they go up; and that the semitones are equally-spaced (note spacing corresponding to the major scale intervals). This is because the vertical scale is now logarithmic in frequency, which makes it (fudging the issue a little) linear in pitch. Correspondingly there is now a little representation of a piano keyboard shown at left, with middle C highlighted.
-Chris Cannam
+## The above assumes 12tET with A=440Hz. We can change at least the latter part of that in the Preferences, and the scale will move immediately when we do so (demonstrate but be sure to restore the default)
-Chris Cannam
+## Select the Measure tool and show that we can get a frequency readout with harmonic markers. Return to the Navigate tool and contrast with the readout that is displayed as you move the pointer over the pane.
-Chris Cannam
+## Close that pane and open the "peak-frequency spectrogram" - Pane -> Add Peak Frequency Spectrogram (or K key). Notice that here we can just wave the Navigate tool over a bin to get an estimate of the instantaneous frequency there.
-Chris Cannam
+## New session, open "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg again and open both the plain spectrogram and the melodic-range one -- observe and contrast the various visible elements, in particular vertical lines in full frequency range for noisy percussion, relative invisibility of such broadband sounds in the melodic-range spectrogram, glides in vocal, difficulty of distinguishing harmonic traces from simultaneous notes etc.
-Chris Cannam
+## Go to File -> Replace Main Audio, open "King Henry":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/Music/07%20-%20King%20Henry.flac. Note among other things that we need to increase the gain on the melodic-range spectrogram, that the vibrato is visible and things like vibrato rate could be approximately measured, and that the long reverb makes the notes appear to overlap in places.
 Chris Cannam
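The piano-scale measurement and the logarithmic pitch scale can be tied together in a few lines. This is only a sketch under the same assumptions as the walkthrough (44100 Hz sample rate, 12tET, A=440Hz); the function names are invented for illustration, and the MIDI convention of numbering the A above middle C as 69 is used simply as a convenient pitch scale.

```python
import math


def frequency_from_period(period_samples, sample_rate=44100):
    """Fundamental frequency in Hz from the length of one cycle in samples."""
    return sample_rate / period_samples


def midi_pitch(frequency_hz, tuning_hz=440.0):
    """Fractional pitch on the MIDI scale (A above middle C = 69), assuming 12tET."""
    return 69 + 12 * math.log2(frequency_hz / tuning_hz)


f0 = frequency_from_period(170)      # the hand-measured cycle length
pitch = midi_pitch(f0)
nearest = round(pitch)               # nearest equal-tempered semitone
cents_off = 100 * (pitch - nearest)  # signed deviation from that semitone

print(f"estimated f0 = {f0:.1f} Hz, nearest MIDI pitch = {nearest}, "
      f"offset = {cents_off:+.1f} cents")
```

The nearest semitone comes out as 60, which is middle C, with the hand measurement roughly 15 cents flat. Because the pitch is a logarithm of frequency, doubling the frequency always adds exactly 12 to it, which is why octaves and semitones are evenly spaced on the melodic-range spectrogram. Changing `tuning_hz` shifts every pitch by the same amount, much as the keyboard scale moves when the tuning preference is changed.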
-Chris Cannam
+h5. Introductory notes and slides on audio features
 Chris Cannam
-Chris Cannam
+h5. Sonic Visualiser - hands on with Vamp plugins
 Chris Cannam
-Chris Cannam
+* Quick survey of nominally high-ish level feature extractors
-Chris Cannam
+## Start a new session with "A Friendly Warning":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/A%20Friendly%20Warning.ogg.
-Chris Cannam
+## Feature extraction plugins are found under the Transform menu, so called because it contains things that turn your audio into something else, including both feature extractors and audio effects. You can run a transform and show the output in the same pane as the audio, or in a new one.
-Chris Cannam
+## With the audio pane selected, run Transform -> Analysis by Category -> Time -> Tempo -> Bar and Beat Tracker: Beats. This produces a series of beat locations, each labelled with metrical beat number, and when you play the audio, the beats are played with a tap.
-Chris Cannam
+## Now open a new pane (Pane -> Add New Pane or shortcut N) and run Transform -> Analysis by Category -> Time -> Onsets -> Note Onset Detector: Onsets. Here we have individual note onset positions: it works OK for this sort of music. We can switch playback on and off for the individual feature tracks with the Play toggle on the parameter box at right.
-Chris Cannam
+## Some further transforms we can try: Chordino chord estimate; the notes corresponding to Chordino's chord estimate (so as to audition whether the chords sound any good); QM key estimator and key-strength plot.
 Chris Cannam
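Once beat locations have been extracted, a rough tempo figure follows from the gaps between them. The sketch below is not what the Bar and Beat Tracker does internally; it only shows one simple way to summarise a list of beat times such as might be exported from a beat layer. The beat times are made up for the example.

```python
from statistics import median


def tempo_bpm(beat_times_s):
    """Estimate tempo in beats per minute from beat times given in seconds.

    Uses the median inter-beat interval so that one badly placed beat
    does not skew the result.
    """
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    if not intervals:
        raise ValueError("need at least two beat times")
    return 60.0 / median(intervals)


# Hypothetical beat times in seconds, with one slightly late beat.
example_beats = [0.50, 1.00, 1.50, 2.01, 2.50, 3.00]
print(f"{tempo_bpm(example_beats):.1f} bpm")
```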
-Chris Cannam
+* Pitch
-Chris Cannam
+## Close session, return to "King Henry":https://code.soundsoftware.ac.uk/projects/dhoxss15/repository/raw/data/Music/07%20-%20King%20Henry.flac. Open a melodic-range spectrogram and make sure its pane is current.
-Chris Cannam
+## Now run the transform pYIN - Smoothed Pitch Track. A pitch track should appear in a bright colour. Switch its plot type to Discrete Curves and make sure its scale is set to Auto-Align, which means that if it has Hz units, it will be aligned to the same vertical scale as the spectrogram behind it.
-Chris Cannam
+## We can of course check this extracted pitch-track visually, but we can also inspect individual values (by mouseover), inspect in bulk (Layer -> Edit Layer Data) including tracking through the data during playback, and export to a file (File -> Export Annotation Layer). Demonstrate the export. (Note that the correct layer must be selected for any of these to work!)
-Chris Cannam
+## This layer can also be synthesised and played back -- switch on the Play button on layer parameters and try it.
-Chris Cannam
+## The same plugin can produce note segmentations (for monophonic audio of this type), so run it again requesting the Notes output. Each note is recorded as having a pitch equal to the median of the underlying pitch track's pitches for the time it spans. This is unlikely to sound so nice when played back, because of both segmentation flaws and difficulties (e.g. glides) and interesting properties of pitch perception (e.g. with vibrato). If this kind of use is of interest to you, consider our other program "Tony":/projects/tony.
 Chris Cannam
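The relationship between the pitch track and the note output described above can be illustrated as follows. This is a minimal sketch of the "median over the note's span" idea only, not pYIN's actual code; the function name, the data layout (a list of time/frequency pairs) and the values are all assumptions made for the example.

```python
from statistics import median


def note_pitch_hz(pitch_track, start_s, end_s):
    """Median of the pitch estimates (Hz) that fall within [start_s, end_s)."""
    values = [hz for t, hz in pitch_track if start_s <= t < end_s]
    if not values:
        raise ValueError("no pitch estimates within the note span")
    return median(values)


# Hypothetical (time in seconds, frequency in Hz) pairs: a note with
# vibrato around 220 Hz, followed by the start of a different note.
track = [(0.00, 218.0), (0.01, 221.0), (0.02, 223.0),
         (0.03, 220.0), (0.04, 217.0), (0.05, 247.0)]

print(note_pitch_hz(track, 0.00, 0.05))
```

A single median value discards the vibrato and any glide within the note, which is part of why the note output tends to sound less convincing on playback than the pitch track itself.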
-Chris Cannam
+* Chroma reduction
-Chris Cannam
+## Using the NNLS Chroma plugin - open an empty pane and run this transform with the default parameters. This is a single-octave reduction of frequency content (can explain at arbitrary length).
-Chris Cannam
+## Run the same transform again in a second pane, but this time with different parameters: local tuning, L2 norm, spectral shape = 0.9.
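As a rough illustration of what a "single-octave reduction" with L2 normalisation means, the sketch below folds a handful of spectral peaks into 12 pitch-class bins. It is deliberately naive and is not the NNLS Chroma algorithm; it assumes 12tET with a configurable tuning frequency, numbers the bins from C purely for readability, and uses invented peak values.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(frequency_hz, tuning_hz=440.0):
    """Index 0..11 of the nearest 12tET pitch class, with 0 = C."""
    midi = 69 + 12 * math.log2(frequency_hz / tuning_hz)
    return round(midi) % 12


def chroma_vector(peaks, tuning_hz=440.0):
    """Fold (frequency, magnitude) pairs into 12 bins and L2-normalise."""
    chroma = [0.0] * 12
    for frequency_hz, magnitude in peaks:
        chroma[pitch_class(frequency_hz, tuning_hz)] += magnitude
    norm = math.sqrt(sum(v * v for v in chroma))
    return [v / norm for v in chroma] if norm > 0 else chroma


# Hypothetical peaks: two Cs an octave apart and a weaker G.
peaks = [(261.6, 1.0), (523.3, 0.5), (392.0, 0.4)]
for name, value in zip(NOTE_NAMES, chroma_vector(peaks)):
    print(f"{name:<2} {value:.3f}")
```

Twelve numbers per time frame like these are roughly what "chroma features" saved to a file look like, as opposed to the coloured grid shown on screen. The two Cs land in the same bin, which shows both the use of the reduction (octave invariance) and its main limitation (octave information is lost).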