Chris Cannam, 2015-02-02 11:58 AM
h1. The problem
The problem we're trying to solve is:
We have two (or more) recordings of a particular score. We want to carry out some task such as audio alignment on them, but they differ in pitch (tuning frequency) sufficiently to confuse the feature extractor we want to use. We therefore want to detect the difference in pitch between the two, and compensate for it.
This differs from the problem of deducing the tuning frequency of a recording in isolation, because the frequency difference may be quite extreme: more than a semitone. If you ran a tuning-frequency detector (of the type that works by calculating the difference between predominant frequencies and chroma bin centre frequencies) on both recordings, you would believe them closer in frequency than they actually are, because the whole semitones spanned by the difference would be perceived as a change of key rather than tuning frequency.