Version 12 - History - Wiki - Tuning Difference

Chris Cannam, 2015-02-04 10:19 AM

h1. The problem

The problem we're trying to solve is:

We have two (or more) recordings of a particular score. We want to carry out some task such as audio alignment on them, but they differ in pitch (tuning frequency) sufficiently to confuse the feature extractor we want to use. We therefore want to first detect the difference in pitch between the two, so as to compensate for it in our subsequent processing.

This differs from the problem of deducing the tuning frequency of a recording in isolation, because the frequency difference may be quite extreme: more than a semitone. If you ran a tuning-frequency detector (of the type that works by calculating the difference between predominant frequencies and chroma bin centre frequencies) on both recordings, you would believe them closer in frequency than they actually are, because the whole semitones spanned by the difference would be perceived as a change of key rather than tuning frequency.
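
To make the folding effect concrete, here is a minimal sketch of the arithmetic. It assumes an idealised detector that reports only the offset from the nearest semitone; the function names are hypothetical and are not taken from any of the plugins discussed on this page.

```python
import math

def cents_between(f_hz, ref_hz=440.0):
    """Signed interval from ref_hz to f_hz, in cents."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def folded_tuning_hz(true_a_hz, ref_hz=440.0):
    """Tuning that an idealised isolated detector could report: only the
    offset from the nearest semitone survives, whole semitones are lost."""
    offset = cents_between(true_a_hz, ref_hz)
    residual = offset - 100.0 * round(offset / 100.0)  # within +/-50 cents
    return ref_hz * 2.0 ** (residual / 1200.0)

true_offset = cents_between(400.0)    # about -165 cents
apparent_a = folded_tuning_hz(400.0)  # about 449 Hz
print(f"true offset {true_offset:.1f} cents, apparent A = {apparent_a:.1f} Hz")
```

Under this assumption a recording at A=400Hz looks like one tuned about 35 cents sharp and played two semitones lower, which is why the difference between two isolated estimates understates the real gap.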

h2. Example

* "Recording of Bach BWV846":https://code.soundsoftware.ac.uk/attachments/download/1301/BWV846Egarr.mp3 (C major prelude from the Well-Tempered Clavier) by Richard Egarr, harpsichord, at roughly A=400Hz
* "MIDI rendering":https://code.soundsoftware.ac.uk/attachments/download/1302/PreludeInCMajorBWV846.mp3 at A=440Hz for comparison

h2. A Plan

(based on the experiments below)

Our feature is the normalised mean, across the whole input duration, of a 60-bin-per-octave chroma with an adjustable tuning frequency. (Or perhaps a matrix of such features taken at half a dozen intervals through the file?)

Our metric is the Manhattan distance between two feature vectors.

# Calculate the chroma feature from the reference input at A=440Hz.
# Calculate it from the other input at A=440Hz.
# Rotate the second chroma feature up by successive 1-bin increments, calculating the distance at each rotation, until a local minimum is found. Repeat in the downward direction. This gives an approximate tuning frequency to 20-cent resolution.
# Starting from the approximate tuning frequency, recalculate the second chroma feature at each tuning frequency, adjusting upward in 1-cent steps until a local minimum is found. Repeat in the downward direction. This gives (very slowly) a tuning frequency to 1-cent resolution.
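
The coarse rotation search (step 3) can be sketched roughly as follows. This is an illustration only: it works on plain lists standing in for mean chroma features, all function names are hypothetical, and taking the lower of the upward and downward minima is an assumption that the plan does not spell out.

```python
def normalise(v):
    """Scale a feature vector so that its bins sum to one."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def rotate(v, n):
    """Rotate chroma bins upward by n positions (n may be negative)."""
    n %= len(v)
    return v[-n:] + v[:-n] if n else list(v)

def search(ref, other, direction):
    """Rotate one bin at a time in one direction until the distance stops falling."""
    best_shift, best_dist = 0, manhattan(ref, other)
    shift = direction
    while abs(shift) < len(ref):
        d = manhattan(ref, rotate(other, shift))
        if d >= best_dist:
            break  # local minimum reached
        best_shift, best_dist = shift, d
        shift += direction
    return best_shift, best_dist

def coarse_rotation_cents(ref, other, bins_per_octave=60):
    """Rotation (in cents) of `other` that best matches `ref`.
    Assumption: keep whichever direction gave the lower distance."""
    up = search(ref, other, +1)
    down = search(ref, other, -1)
    shift = min(up, down, key=lambda r: r[1])[0]
    return shift * 1200.0 / bins_per_octave

# Synthetic feature: one broad peak, then a copy displaced 8 bins downward.
ref = normalise([max(0, 10 - abs(i - 30)) for i in range(60)])
other = rotate(ref, -8)
print(coarse_rotation_cents(ref, other))
```

A positive result means the second feature had to be rotated upward to match, i.e. the second recording sits that many cents below the reference. Step 4 would then refine this by recomputing the chroma at 1-cent tuning steps, which needs the real feature extractor and is not shown here.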

h2. Experiments

h3. Iterative chroma comparison

Using the script @iterative-chroma/chromacompare.sh@.

The script extracts chroma means (using CQ Chromagram) from the first 30 sec of the reference (at 440Hz). It then repeatedly extracts chroma means from the first 30 sec of the test recording, with the chroma tuned to various numbers of cents below and above 440Hz (from -400 to 400 in 10-cent steps). At each step it calculates the Euclidean distance between the chroma vector just extracted and that from the reference. The frequency yielding the lowest distance is reported.

This takes 2m11sec to run and reports the best tuning frequency as 548Hz.

The closest probe frequencies to the actual tuning are 398.85Hz (-170c) and 401.16Hz (-160c). Both score worse than 440Hz does.
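
For reference, the probe frequencies follow from the usual cents-to-frequency conversion. The sketch below only reproduces the probe grid described above; it is not taken from @chromacompare.sh@, and the function name is hypothetical.

```python
def probe_frequency_hz(cents, ref_hz=440.0):
    """Frequency lying `cents` above (or below, if negative) ref_hz."""
    return ref_hz * 2.0 ** (cents / 1200.0)

# -400 to +400 cents in 10-cent steps, as in the experiment above.
probes = {c: probe_frequency_hz(c) for c in range(-400, 401, 10)}

for c in (-170, -160):
    print(f"{c:+d}c -> {probes[c]:.2f}Hz")
```

The two printed values are the probes either side of the true tuning of roughly 400Hz.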

Switching to a 36-bin or 60-bin chromagram gets us an estimate of 529.3Hz. That seems very hard to believe; I'm sure those should work! I think I need to take a closer look at this.

OK, I think it's a lack of normalisation...

*Normalising*

Switching to the QM Vamp Plugins chromagram, which has a normalisation option, gets us an estimate of 396.56Hz, with a fairly clear curve having a minimum somewhere between that probe value and the next one (398.85Hz). (The option is better than nothing, though we should really be normalising the means rather than taking the mean of the normalised chroma.) There is another minimum around 529Hz. Searching a narrower range in 1-cent increments gets us a more precise estimate of 397.24Hz.
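
The difference between the two orders of normalisation is easy to see on a toy example. The sketch below assumes sum-to-one normalisation purely for illustration (I have not checked which norm the plugin option applies), and the two three-bin "frames" are made up.

```python
def l1_normalise(v):
    """Scale a vector so that its bins sum to one (assumed norm, for illustration)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def mean_frames(frames):
    """Bin-wise mean over a list of equal-length frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

frames = [[8.0, 2.0, 0.0],   # loud frame
          [0.0, 0.1, 0.1]]   # quiet frame

norm_of_mean = l1_normalise(mean_frames(frames))                # what we want
mean_of_norm = mean_frames([l1_normalise(f) for f in frames])   # per-frame option

print([round(x, 3) for x in norm_of_mean])
print([round(x, 3) for x in mean_of_norm])
```

Normalising each frame first gives the quiet frame as much weight as the loud one, whereas normalising the mean keeps the energy weighting and only removes the overall level.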

h3. Iterative MATCH path comparison

Using the script @iterative-match/matchcompare.sh@.

As above, except that the score is based on the lowest MATCH overall path cost between the two files (with the tuning frequency adjusted appropriately for the second one).

This takes 4m12sec to run and reports the best tuning frequency as 363.6Hz.

Once again the two closest probe frequencies score worse than 440Hz does.

MATCH does actually seem to find a reasonable alignment if you feed it the test file pitch-shifted to A=440Hz along with the reference, so this doesn't seem to be a fault in the aligner. I think I am simply misinterpreting the underlying meaning of the overall path cost.

*Switching to chroma features*

The iterative MATCH path comparison using chroma features performs much better: it estimates 398.85Hz, which is the closest of the probe frequencies. This may have potential, although it does require that MATCH alignment works at least somewhat on the pieces in question.

h3. Spectrum comparator

A plugin written for this purpose, in @spectrum-compare@. It works by calculating a mean harmonic spectrum for each of its two input channels, then repeatedly frequency-scaling one of them by a multiplicative factor and comparing each rescaled version, within a limited frequency range, with the reference version.

It probes shifts of up to 2400 cents in both directions and reports a shift of -1021 cents as the best result. There are various other local minima, but the true difference is nowhere near any of them. This may be down to arithmetic error: it seems hard to believe that there wouldn't be a minimum nearby. Must review.
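
As a sanity check on the idea itself, here is a rough sketch of the rescale-and-compare loop on a clean synthetic spectrum. It is not the plugin's code: the linear interpolation, the Manhattan distance, the bin range and every name below are assumptions made for illustration.

```python
def rescale(spectrum, factor):
    """Move content at bin b to bin b * factor, using linear interpolation."""
    n = len(spectrum)
    out = []
    for i in range(n):
        src = i / factor
        lo = int(src)
        if lo + 1 >= n:
            out.append(0.0)  # nothing known beyond the top bin
        else:
            frac = src - lo
            out.append((1.0 - frac) * spectrum[lo] + frac * spectrum[lo + 1])
    return out

def best_shift_cents(ref, other, max_cents=2400, lo_bin=20, hi_bin=200):
    """Shift (in cents) which, applied to `other`, best matches `ref`
    within the bin range [lo_bin, hi_bin)."""
    best_cents, best_dist = 0, float("inf")
    for cents in range(-max_cents, max_cents + 1):
        scaled = rescale(other, 2.0 ** (cents / 1200.0))
        dist = sum(abs(a - b)
                   for a, b in zip(ref[lo_bin:hi_bin], scaled[lo_bin:hi_bin]))
        if dist < best_dist:
            best_cents, best_dist = cents, dist
    return best_cents

# Synthetic "harmonic spectrum": triangular peaks at multiples of bin 40.
ref = [sum(max(0.0, 1.0 - abs(i - 40 * h) / 4.0) for h in range(1, 5))
       for i in range(256)]
other = rescale(ref, 2.0 ** (-165 / 1200.0))  # pretend it is 165 cents flat
print(best_shift_cents(ref, other))
```

On clean synthetic input like this the loop should land close to the 165-cent shift that was applied, so the principle seems sound; that supports the suspicion that the -1021 result comes from an error in the implementation or from the messier spectra of real recordings.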

h3. TempEst

The "TempEst":/projects/tempest plugin tries to estimate temperament and tuning frequency. I haven't read up yet on how it does this. For the Egarr recording it estimates A=415.98Hz, shifted quarter-comma meantone.

h3. Separate tuning and key estimation

h4. NNLS Chroma Tuning plugin

Run on the Egarr recording alone, this takes 0.54 sec to produce an estimate of A=445.8Hz. This is strangely fast! But obviously on its own it has no way to tell that the tuning is more than a semitone different.

If concert A actually was 445.8Hz, our actual tuning frequency of 400Hz would be just above G. Can we adjust for the greater-than-a-semitone shift using a key detector as well?

h4. NNLS Chroma and QM Key Detector

The QM Key Detector reports a modal key of C major for the reference and B major for the test piece.

The tuning plugin had reported a tuning frequency of 445.8Hz. If we take this at face value and apply the pitch shift necessary to adjust from 445.8Hz to 440Hz, and then run the key detector, we get a modal key of Bb major.

From 445.8Hz to 440Hz is about 23 cents, so we have shifted the piece down by 23 cents and found it to be a whole tone lower than the reference. This implies that the original tuning frequency was about 177 cents below the reference, or about 397Hz.
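
The arithmetic in the last paragraph can be written out as a small helper. This is only a sketch of the calculation described above; the function names and the convention that a negative key offset means "lower than the reference" are my own.

```python
import math

def cents(from_hz, to_hz):
    """Signed interval from from_hz to to_hz, in cents."""
    return 1200.0 * math.log2(to_hz / from_hz)

def implied_tuning_hz(estimated_a_hz, key_offset_semitones, ref_hz=440.0):
    """Original tuning implied by an isolated tuning estimate plus the key
    detected after retuning, relative to the reference key
    (negative offset = lower than the reference)."""
    retune = cents(estimated_a_hz, ref_hz)  # shift applied before key detection
    total = 100.0 * key_offset_semitones - retune
    return ref_hz * 2.0 ** (total / 1200.0)

# Estimate of 445.8Hz; after retuning the key comes out as Bb,
# two semitones below the reference's C.
print(round(cents(445.8, 440.0), 1))
print(round(implied_tuning_hz(445.8, -2), 1))
```

The first value is the retuning shift (about 23 cents downward) and the second is the implied original tuning, which comes out close to 397Hz.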

That's pretty good, but relying on a key detector feels fragile. We got the right modal key here, but this is an easy piece (in fact it's one of the pieces the key detector's reference templates came from). We could easily have got a complementary key as the modal key and ended up miles out.

I started implementing this in a script (@tuning-and-key/keycompare.sh@), also taking advantage of the fact that the key detector has a tuning frequency parameter. But then I discovered that the tuning frequency estimation from NNLS Chroma was not reliable after all: it produces a different result (433.753Hz) if you resample the audio first (from 48 to 44.1 kHz).

Taking the same approach, if we shift the audio up from 433.753Hz to 440Hz and then run the key detector, we get a modal key of A major, three semitones below the reference. We have shifted the piece up by 24.8 cents and found it to be three semitones lower than the reference, so the original tuning frequency must have been 324.8 cents below the reference, or about 364.7Hz.

That's no good. Back to the drawing board.