Chris Cannam, 2015-02-02 12:18 PM

h1. The problem


The problem we're trying to solve is:


We have two (or more) recordings of a particular score. We want to carry out some task such as audio alignment on them, but they differ in pitch (tuning frequency) sufficiently to confuse the feature extractor we want to use. We therefore want to first detect the difference in pitch between the two, so as to compensate for it in our subsequent processing.


This differs from the problem of deducing the tuning frequency of a recording in isolation, because the frequency difference may be quite extreme: more than a semitone. If you ran a tuning-frequency detector (of the type that works by calculating the difference between predominant frequencies and chroma bin centre frequencies) on both recordings, you would believe them closer in frequency than they actually are, because the whole semitones spanned by the difference would be perceived as a change of key rather than tuning frequency.
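
To make the wrap-around effect concrete, here is a minimal Python sketch. It assumes a detector that can only measure deviation from the nearest equal-tempered semitone; the function names are illustrative and not taken from any plugin mentioned on this page.

```python
import math

def cents_between(freq_hz, ref_hz=440.0):
    """Signed distance in cents from ref_hz to freq_hz."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def wrapped_cents(cents):
    """Fold an offset into [-50, 50): the part that a semitone-relative
    tuning detector can see. Whole semitones look like a key change."""
    return (cents + 50.0) % 100.0 - 50.0

true_offset = cents_between(400.0)           # roughly -165 cents
visible_offset = wrapped_cents(true_offset)  # roughly +35 cents
```

Under that assumption, a recording at A=400Hz would look about 35 cents sharp (in a key two semitones lower) rather than 165 cents flat.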


h2. Example


 * "Recording of Bach BWV846":https://code.soundsoftware.ac.uk/attachments/download/1301/BWV846Egarr.mp3 (C major prelude from the Well-Tempered Clavier) by Richard Egarr, harpsichord, at roughly A=400Hz
 * "MIDI rendering":https://code.soundsoftware.ac.uk/attachments/download/1302/PreludeInCMajorBWV846.mp3 at A=440Hz for comparison


h2. Experiments


h3. Iterative chroma comparison


Using the script @iterative-chroma/chromacompare.sh@.


The script extracts chroma means (using the CQ Chromagram) from the first 30 sec of the reference (at 440Hz). It then repeatedly extracts chroma means from the first 30 sec of the test recording, with the chroma tuned to a range of offsets below and above 440Hz (from -400 to +400 cents in 10-cent steps). At each step it calculates the Euclidean distance between the chroma vector just extracted and that from the reference. The tuning frequency yielding the lowest distance is reported.
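
The search loop can be sketched in Python as follows. This is an illustration of the procedure described above, not a transcription of the shell script: @chroma_at@ is a hypothetical callable standing in for a feature-extractor run at a given tuning frequency, and all function names are assumptions.

```python
import math

def cents_to_hz(cents, ref_hz=440.0):
    """Frequency lying `cents` above (or below, if negative) ref_hz."""
    return ref_hz * 2.0 ** (cents / 1200.0)

def euclidean(a, b):
    """Euclidean distance between two equal-length chroma vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_tuning(reference_chroma, chroma_at, lo=-400, hi=400, step=10):
    """Probe tunings from lo to hi cents around 440Hz and return
    (hz, distance) for the probe whose chroma mean is closest to the
    reference.

    chroma_at is a hypothetical callable: tuning in Hz -> chroma mean
    vector for the test recording."""
    best = None
    for cents in range(lo, hi + 1, step):
        hz = cents_to_hz(cents)
        dist = euclidean(reference_chroma, chroma_at(hz))
        if best is None or dist < best[1]:
            best = (hz, dist)
    return best
```

With the default range this evaluates 81 probes, so the run time is dominated by the 81 feature extractions on the test recording.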


This takes 2m11sec to run and reports the best tuning frequency as 548Hz, which is the +380 cent probe and nowhere near the actual tuning.


The closest probe frequencies to the actual tuning are 398.85Hz (-170c) and 401.16Hz (-160c). Both score worse than 440Hz does.


h3. Iterative MATCH path comparison


Using the script @iterative-match/matchcompare.sh@.


As above, except that the score is the overall MATCH path cost between the two files (with the tuning frequency adjusted appropriately for the second one), and the probe with the lowest cost wins.


This takes 4m12sec to run and reports the best tuning frequency as 363.6Hz, which is the -330 cent probe.


Once again the two closest probe frequencies score worse than 440Hz does.


MATCH does actually seem to find a reasonable alignment if you feed it the test file pitch-shifted to A=440Hz along with the reference, so this doesn't seem to be a fault in the aligner. I think I am simply misinterpreting the underlying meaning of the overall path cost.


h3. Spectrum comparator


A plugin written for this purpose, in @spectrum-compare@. It works by calculating a mean harmonic spectrum for each of its two input channels, then repeatedly frequency-scaling one using a multiplicative factor and comparing the values for each rescaled version, within a limited frequency range, with the reference version.
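
A rough Python sketch of this kind of comparison is given below. It is not the plugin's code: the linearly spaced bins, the linear interpolation, the sum-of-squared-differences score, the 10-cent step and all names are assumptions made for illustration.

```python
def rescale(spectrum, factor):
    """Stretch the frequency axis by `factor` using linear interpolation,
    so that energy at bin p moves to bin p * factor."""
    out = []
    last = len(spectrum) - 1
    for i in range(len(spectrum)):
        src = i / factor
        j = int(src)
        if j >= last:
            out.append(0.0)  # nothing to read beyond the original range
        else:
            frac = src - j
            out.append((1.0 - frac) * spectrum[j] + frac * spectrum[j + 1])
    return out

def best_shift(reference, test, lo_bin, hi_bin, max_cents=2400, step=10):
    """Return the shift in cents which, applied to `test`, brings it
    closest to `reference`, scoring by the sum of squared differences
    over bins lo_bin..hi_bin-1."""
    best_cents, best_err = None, None
    for cents in range(-max_cents, max_cents + 1, step):
        scaled = rescale(test, 2.0 ** (cents / 1200.0))
        err = sum((scaled[i] - reference[i]) ** 2
                  for i in range(lo_bin, hi_bin))
        if best_err is None or err < best_err:
            best_cents, best_err = cents, err
    return best_cents
```

On synthetic spectra with a few clean harmonic peaks this recovers a known shift, which suggests that a check against artificial input might help separate arithmetic errors from genuine ambiguity in real recordings.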


It probes shifts of up to 2400 cents in both directions and reports a shift of -1021 cents as the best result. There are various other local minima, but the true difference is nowhere near any of them. This may be down to arithmetic error: it seems hard to believe that there wouldn't be a minimum nearby. Must review.


h3. Separate tuning and key estimation


h4. NNLS Chroma Tuning plugin


Run on the Egarr recording alone, this takes 0.54 sec to produce an estimate of A=445.8Hz. This is strangely fast! But obviously on its own it has no way to tell that the tuning is more than a semitone different.


If concert A actually was 445.8Hz, our actual tuning frequency of 400Hz would be just above G. Can we adjust for the greater-than-a-semitone shift using a key detector as well?
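
The "just above G" observation can be checked with a few lines of Python. This is a small sketch assuming equal temperament; the helper name and the note spellings are arbitrary choices.

```python
import math

NOTE_NAMES = ["A", "Bb", "B", "C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab"]

def nearest_note(freq_hz, concert_a_hz):
    """Return (pitch class name, deviation in cents) for the
    equal-tempered note nearest to freq_hz, given the frequency of A."""
    semitones = 12.0 * math.log2(freq_hz / concert_a_hz)
    nearest = round(semitones)
    return NOTE_NAMES[nearest % 12], 100.0 * (semitones - nearest)

name, deviation = nearest_note(400.0, 445.8)
```

Here @name@ comes out as G and @deviation@ as roughly +12 cents.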


h4. NNLS Chroma and QM Key Detector


The QM Key Detector reports a modal key of C major for the reference and B major for the test piece.


The tuning plugin had reported a tuning frequency of 445.8Hz. If we take this at face value, apply the pitch shift necessary to adjust from 445.8Hz to 440Hz, and then run the key detector, we get a modal key of Bb major.


From 445.8Hz to 440Hz is about 23 cents, so we have shifted the piece down by 23 cents and found it to be a whole tone lower than the reference -- implying that the original tuning frequency was about 177 cents below the reference, or about 397Hz.
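
The arithmetic above can be written as a short Python sketch. The function name and parameter names are illustrative only and are not taken from @keycompare.sh@.

```python
import math

def implied_tuning(estimated_a_hz, key_offset_semitones, ref_hz=440.0):
    """Combine a within-semitone tuning estimate with a key offset.

    estimated_a_hz: tuning reported for the test recording in isolation.
    key_offset_semitones: key of the test recording, after retuning to
        ref_hz, relative to the reference key (negative means lower).
    Returns (offset in cents relative to ref_hz, implied tuning in Hz)."""
    retune_cents = 1200.0 * math.log2(ref_hz / estimated_a_hz)
    offset_cents = 100.0 * key_offset_semitones - retune_cents
    return offset_cents, ref_hz * 2.0 ** (offset_cents / 1200.0)

# Bb major against C major is a key offset of -2 semitones
offset, hz = implied_tuning(445.8, -2)
```

For these inputs @offset@ is about -177 cents and @hz@ about 397Hz. The result simplifies to the estimated frequency multiplied by 2^(key offset / 12), so any error in either the tuning estimate or the detected key passes straight through to the answer.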


That's pretty good, but relying on a key detector feels fragile. We got the right modal key here, but this is an easy piece (in fact it's one of the pieces the key detector's reference templates came from). We could easily have got a complementary key as the modal key and ended up miles out.


I started implementing this in a script (@tuning-and-key/keycompare.sh@), also taking advantage of the fact that the key detector has a tuning frequency parameter. But then I discovered that the tuning frequency estimation from NNLS Chroma was not reliable after all -- it produces a different result (433.753Hz) if you resample the audio first (from 48kHz to 44.1kHz).


Taking the same approach, if we shift the audio up from 433.753Hz to 440Hz and then run the key detector, we get a modal key of A major, three semitones below the reference. We have shifted the piece up by 24.8 cents and found it to be three semitones lower than the reference, so the original tuning frequency must have been 324.8 cents below the reference, or about 364.7Hz.


That's no good -- back to the drawing board.