Chris Cannam, 2015-02-04 10:19 AM

h1. The problem

The problem we're trying to solve is:

We have two (or more) recordings of the same score. We want to carry out some task such as audio alignment on them, but they differ in pitch (tuning frequency) enough to confuse the feature extractor we want to use. We therefore want to detect the difference in pitch between the two first, so that we can compensate for it in our subsequent processing.

This differs from the problem of deducing the tuning frequency of a recording in isolation, because the frequency difference may be quite extreme: more than a semitone. If you ran a tuning-frequency detector (of the type that works by calculating the difference between predominant frequencies and chroma bin centre frequencies) on both recordings, you would believe them closer in frequency than they actually are, because the whole semitones spanned by the difference would be perceived as a change of key rather than of tuning frequency.
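To make the ambiguity concrete, here is a minimal sketch of the arithmetic. It assumes a detector that can only report the deviation from the nearest equal-tempered semitone; the function names are made up for illustration and do not come from any of the plugins discussed on this page.

```python
import math

def cents_between(f_hz, ref_hz=440.0):
    """Signed interval from ref_hz to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def folded_offset(cents):
    """Fold an offset into [-50, 50) cents: all that a detector measuring
    deviation from the nearest semitone is able to report."""
    return (cents + 50.0) % 100.0 - 50.0

# A recording at roughly A=400Hz compared against A=440Hz.
true_offset = cents_between(400.0)               # about -165 cents
apparent_offset = folded_offset(true_offset)     # about +35 cents
apparent_a = 440.0 * 2.0 ** (apparent_offset / 1200.0)  # about 449 Hz
```

So a recording that is really about 165 cents flat looks, to such a detector, like one that is about 35 cents sharp and in a key two semitones lower.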

h2. Example

 * "Recording of Bach BWV846":https://code.soundsoftware.ac.uk/attachments/download/1301/BWV846Egarr.mp3 (C major prelude from the Well-Tempered Clavier) by Richard Egarr, harpsichord, at roughly A=400Hz
 * "MIDI rendering":https://code.soundsoftware.ac.uk/attachments/download/1302/PreludeInCMajorBWV846.mp3 at A=440 for comparison

h2. A Plan

(based on the experiments below)

Our feature is the normalised mean, across the whole input duration, of a 60-bin-per-octave chroma with an adjustable tuning frequency. (Or a matrix of such features at half a dozen intervals through the file?)

Our metric is the Manhattan distance between two feature vectors.

# Calculate the chroma feature from the reference input at A=440
# Calculate it from the other input at A=440
# Rotate the second chroma feature up by successive 1-bin increments, calculating the distance at each rotation, until a local minimum is found. Repeat in the downward direction. This gives an approximate tuning frequency to 20 cent resolution (60 bins per octave is 5 bins per semitone).
# Starting from the approximate tuning frequency, recalculate the second chroma feature at successive tuning frequencies, adjusting upward in 1-cent steps, until a local minimum is found. Repeat in the downward direction. This gives (very slowly) a tuning frequency to 1 cent resolution.
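The coarse search in step 3 can be sketched as follows. This is an illustration only, not the script used in the experiments: it assumes the two chroma means are already available as plain 60-element lists, it normalises to unit sum (an assumption), and every function name here is invented. The final lines run it on a synthetic chroma profile rather than real audio.

```python
import math

def normalise(v):
    """Scale a vector to unit sum (left unchanged if it sums to zero)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def rotate(v, k):
    """Rotate v up by k bins, so that bin i moves to bin i + k (wrapping)."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)

def descend(ref, other, direction):
    """Step 1 bin at a time in one direction until the distance stops falling."""
    best_k, best_d = 0, manhattan(ref, other)
    k = direction
    while abs(k) < len(other):
        d = manhattan(ref, rotate(other, k))
        if d >= best_d:
            break
        best_k, best_d = k, d
        k += direction
    return best_k, best_d

def coarse_rotation(ref, other):
    """Best rotation (in bins) of `other` towards `ref`, trying up then down."""
    up = descend(ref, other, +1)
    down = descend(ref, other, -1)
    return min(up, down, key=lambda r: r[1])[0]

def tuning_from_rotation(k, bins_per_octave=60, ref_hz=440.0):
    """Needing to rotate up by k bins means the input is tuned k bins flat."""
    return ref_hz * 2.0 ** (-k / bins_per_octave)

# Synthetic check: a smooth chroma profile and a copy moved 8 bins down.
ref = normalise([2.0 + math.cos(2.0 * math.pi * i / 60) for i in range(60)])
other = rotate(ref, -8)
k = coarse_rotation(ref, other)
estimate_hz = tuning_from_rotation(k)
```

Note that a stop-at-first-local-minimum search like this only works if the distance curve is reasonably smooth between the starting point and the true offset; the experiments below show secondary minima that could trap it.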

h2. Experiments

h3. Iterative chroma comparison

Using the script @iterative-chroma/chromacompare.sh@.

The script extracts chroma means (using CQ Chromagram) from the first 30 sec of the reference (at 440Hz), and then repeatedly extracts chroma means from the first 30 sec of the test recording with the chroma tuned to various numbers of cents below and above 440Hz (from -400 to 400 in 10 cent steps). At each step it calculates the Euclidean distance between the chroma vector just extracted and that from the reference. The frequency yielding the lowest distance is reported.

This takes 2m11sec to run and reports the best tuning frequency as 548Hz.

The closest probe frequencies to the actual tuning are 398.85Hz (-170c) and 401.16Hz (-160c). Both score worse than 440Hz does.
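For reference, the probe grid and the selection rule look roughly like this in Python. This is a sketch and not a transcription of @chromacompare.sh@: @extract_at@ is a hypothetical caller-supplied function standing in for "run the chroma extractor on the test recording at this tuning frequency and take the mean".

```python
def probe_frequencies(ref_hz=440.0, lo_cents=-400, hi_cents=400, step_cents=10):
    """(cents, Hz) pairs for each probe tuning frequency around ref_hz."""
    return [(c, ref_hz * 2.0 ** (c / 1200.0))
            for c in range(lo_cents, hi_cents + 1, step_cents)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_probe(reference, extract_at, probes):
    """Return the (cents, Hz) probe whose feature vector is closest to the
    reference. extract_at maps a tuning frequency in Hz to a feature vector;
    it is a placeholder for the real feature extraction."""
    return min(probes, key=lambda p: euclidean(reference, extract_at(p[1])))

probes = probe_frequencies()
by_cents = dict(probes)   # e.g. by_cents[-170] and by_cents[-160]
```

The reported 548Hz corresponds to the +380 cent probe, so the wrong answer is more than three semitones above the reference rather than about 165 cents below it.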

Switching to a 36-bin or 60-bin chromagram gets us an estimate of 529.3Hz. That seems very hard to believe: I'm sure those should work! I think I need to take a closer look at this.

OK, I think it's a lack of normalisation...

*Normalising*

Switching to the QM Vamp Plugins chromagram, which has a normalisation option, gets us an estimate of 396.56Hz, with a fairly clear curve having a minimum somewhere between that probe value and the next one (398.85Hz). The built-in option is better than nothing, though we should really be normalising the means rather than taking the mean of the normalised chroma. There is another minimum around 529Hz. Searching a narrower range in 1 cent increments gets us a more precise estimate of 397.24Hz.
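The two orders of operation give different features whenever frame levels vary. A minimal sketch, assuming unit-sum normalisation (the plugin's own normalisation mode may differ) and using made-up function names and toy two-bin frames:

```python
def normalise(v):
    """Scale a vector to unit sum (left unchanged if it sums to zero)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def mean_frames(frames):
    """Bin-wise mean over a list of equal-length chroma frames."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def normalised_mean(frames):
    """Take the mean first, then normalise: loud frames dominate."""
    return normalise(mean_frames(frames))

def mean_of_normalised(frames):
    """Normalise each frame, then take the mean: every frame counts equally."""
    return mean_frames([normalise(f) for f in frames])

# One loud frame and one quiet frame with very different shapes.
frames = [[8.0, 2.0], [0.1, 0.9]]
a = normalised_mean(frames)
b = mean_of_normalised(frames)
```

With these toy frames the normalised mean is weighted towards the loud frame, while the mean of normalised frames gives the quiet frame as much influence as the loud one.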

h3. Iterative MATCH path comparison

Using the script @iterative-match/matchcompare.sh@.

As above, except that the score is the overall MATCH path cost between the two files (with the tuning frequency adjusted appropriately for the second one), and the lowest cost wins.

This takes 4m12sec to run and reports the best tuning frequency as 363.6Hz.

Once again the two closest probe frequencies score worse than 440Hz does.

MATCH does actually seem to find a reasonable alignment if you feed it the test file pitch-shifted to A=440Hz along with the reference, so this doesn't seem to be a fault in the aligner. I think I am simply misinterpreting the underlying meaning of the overall path cost.

*Switching to chroma features*

The iterative MATCH path comparison using chroma features performs much better: it estimates 398.85 Hz, which is the closest of the probe frequencies. This may have potential, although it does require that MATCH alignment works at least roughly on the pieces in question.

h3. Spectrum comparator

A plugin written for this purpose, in @spectrum-compare@. It works by calculating a mean harmonic spectrum for each of its two input channels, then repeatedly frequency-scaling one of them by a multiplicative factor and comparing each rescaled version with the reference version, within a limited frequency range.
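The general idea can be sketched in a few lines of Python. This is not the plugin's code: it works on plain lists of magnitudes rather than a mean harmonic spectrum, uses linear interpolation and a Manhattan distance (both assumptions), and is exercised here on synthetic bumps only. All names, the 5-cent step, and the bin range are invented for the example.

```python
import math

def rescale(spectrum, factor):
    """Stretch a magnitude spectrum along the frequency axis by `factor`.

    Output bin i takes the (linearly interpolated) value the input had at
    bin i / factor; anything beyond the input's last bin is treated as 0.
    """
    last = len(spectrum) - 1
    out = []
    for i in range(len(spectrum)):
        pos = i / factor
        if pos > last:
            out.append(0.0)
            continue
        lo = min(int(pos), last - 1)
        frac = pos - lo
        out.append(spectrum[lo] * (1.0 - frac) + spectrum[lo + 1] * frac)
    return out

def best_shift_cents(reference, other, lo_bin, hi_bin,
                     max_cents=2400, step_cents=5):
    """Shift (in cents) to apply to `other` that best matches `reference`
    within bins [lo_bin, hi_bin)."""
    best_cents, best_dist = 0, float("inf")
    for cents in range(-max_cents, max_cents + 1, step_cents):
        scaled = rescale(other, 2.0 ** (cents / 1200.0))
        dist = sum(abs(reference[i] - scaled[i]) for i in range(lo_bin, hi_bin))
        if dist < best_dist:
            best_cents, best_dist = cents, dist
    return best_cents

def bumps(n_bins, centres, width=4.0):
    """Synthetic spectrum: a sum of Gaussian bumps at the given bins."""
    return [sum(math.exp(-0.5 * ((i - c) / width) ** 2) for c in centres)
            for i in range(n_bins)]

reference = bumps(512, [100, 200, 300])
flat = rescale(reference, 2.0 ** (-165 / 1200.0))   # same shape, 165 cents flat
shift = best_shift_cents(reference, flat, 50, 400)
```

On clean synthetic data like this the search recovers the shift; real spectra from two different instruments are a much harder case.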

It probes shifts up to 2400 cents in both directions and reports a shift of -1021 cents as the best result. There are various other local minima, but the true difference is nowhere near any of them. This may be down to an arithmetic error: it seems hard to believe that there wouldn't be a minimum nearby. Must review.

h3. TempEst

The "TempEst":/projects/tempest plugin tries to estimate temperament and tuning frequency. I haven't read up yet on how it does this. For the Egarr recording it estimates A=415.98 Hz, shifted quarter-comma meantone.

h3. Separate tuning and key estimation

h4. NNLS Chroma Tuning plugin

Run on the Egarr recording alone, this takes 0.54 sec to produce an estimate of A=445.8 Hz. This is strangely fast! But obviously on its own it has no way to tell that the tuning is more than a semitone different.

If concert A really were 445.8Hz, a note at our actual tuning frequency of 400Hz would fall just above G. Can we adjust for the greater-than-a-semitone shift using a key detector as well?

h4. NNLS Chroma and QM Key Detector

The QM Key Detector reports a modal key of C major for the reference and B major for the test piece.

The tuning plugin had reported a tuning frequency of 445.8Hz. If we take this at face value and apply the pitch shift necessary to adjust from 445.8Hz to 440Hz, and then run the key detector, we get a modal key of Bb major.

From 445.8Hz to 440Hz is about 23 cents, so we have shifted the piece down by 23 cents and found it to be a whole tone lower than the reference, implying that the original tuning frequency was about 177 cents below the reference, or about 397Hz.
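The same arithmetic as a small Python helper, in case it is useful for scripting. The function names and argument conventions are invented for illustration; only the numbers come from the text above.

```python
import math

def cents(from_hz, to_hz):
    """Signed interval in cents when moving from from_hz to to_hz."""
    return 1200.0 * math.log2(to_hz / from_hz)

def original_tuning(detected_hz, semitones_below_ref, ref_hz=440.0):
    """Combine a within-semitone tuning estimate with a key offset.

    detected_hz: tuning frequency reported by the tuning estimator.
    semitones_below_ref: how many semitones below the reference key the
        key detector places the piece after retuning it to ref_hz.
    Returns (offset of the original tuning from ref_hz in cents, tuning in Hz).
    """
    applied_shift = cents(detected_hz, ref_hz)   # negative means shifted down
    offset = -100.0 * semitones_below_ref - applied_shift
    return offset, ref_hz * 2.0 ** (offset / 1200.0)

# Tuning estimate 445.8Hz; detected key Bb major against a reference in C major.
offset_cents, tuning_hz = original_tuning(445.8, 2)
```

The result depends entirely on both inputs being right: an error in the tuning estimate moves the answer by a few cents, but an error in the key moves it by whole semitones.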

That's pretty good, but relying on a key detector feels fragile. We got the right modal key here, but this is an easy piece (in fact it's one of the pieces the key detector's reference templates came from). We could easily have got a complementary key as the modal key and ended up miles out.

I started implementing this in a script (@tuning-and-key/keycompare.sh@), also taking advantage of the fact that the key detector has a tuning frequency parameter. But then I discovered that the tuning frequency estimation from NNLS Chroma was not reliable after all: it produces a different result (433.753Hz) if you resample the audio first (from 48 to 44.1 kHz).

Taking the same approach, if we shift the audio up from 433.753Hz to 440Hz and then run the key detector, we get a modal key of A major, three semitones below the reference. We have shifted the piece up by 24.8 cents and found it to be three semitones lower than the reference, so the original tuning frequency must have been 324.8 cents below the reference, or about 364.7Hz.

That's no good: back to the drawing board.