Are pitches discrete? An information-theoretic framework and a corpus study
Human melodies are generally thought of as sequences of discrete, level pitches. Western music theory does have terms for deviations from discreteness (vibrato, glissandi, scoops), but melodies, when recorded in symbolic notation, are typically represented as sequences of discrete pitches. Here we ask: is this an appropriate description of music? Are pitches really discrete? Does the degree of discreteness vary across cultures? Is there something special about the perception of human music, or is discreteness common to other domains (bird song, human speech)? Efforts to answer these questions are hindered by the fact that pitch-discreteness lacks a rigorous quantitative definition.
Our aims are to establish a rigorous information-theoretic framework in which the degree of discreteness can be quantified, to validate this framework using human perceptual data, and to use the framework to study the above questions.
Our framework treats pitch perception as the output of a rate-distortion optimization process, such that an information-theoretic observer (such as a human listening to music) is defined by their time resolution, frequency resolution, and λ, the ratio of the cost of information to the cost of inaccuracy in the pitch percept. Using this framework we study pitch-discreteness in samples of human song and speech from different cultures, and in bird song. We also use this framework to create a segmentation algorithm that decomposes an audio signal into continuous segments, each labelled discrete or non-discrete by the observer. This allows us to establish an information-theoretic observer that approximates a human listener, by comparing the output of the segmentation algorithm to manual transcriptions of human song (Natural History of Song; Mehr et al., 2019), and to automatic transcriptions by a state-of-the-art algorithm (TONY; Mauch et al., 2015).
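To illustrate the core idea, here is a minimal sketch (not the authors' implementation) of how a rate-distortion trade-off can label a pitch-contour segment as discrete or non-discrete. The cost functions, bit counts, and the parameter `lam` (standing in for λ) are illustrative assumptions: a segment is "discrete" when encoding it as a single constant pitch (low rate, some distortion) is cheaper than encoding every sample (high rate, no distortion).

```python
import numpy as np

def cost_discrete(f0, lam, bits_per_value=8):
    # Encode the whole segment as one quantized pitch value:
    # low information rate, distortion = squared deviation from the mean.
    rate = bits_per_value
    distortion = float(np.sum((f0 - f0.mean()) ** 2))
    return lam * rate + distortion

def cost_continuous(f0, lam, bits_per_value=8):
    # Encode every sample of the contour: zero distortion,
    # but the rate grows linearly with segment length.
    rate = bits_per_value * len(f0)
    return lam * rate + 0.0

def classify_segment(f0, lam=1.0):
    """Label a pitch-contour segment 'discrete' if the constant-pitch
    encoding is cheaper under the rate-distortion trade-off."""
    if cost_discrete(f0, lam) <= cost_continuous(f0, lam):
        return "discrete"
    return "non-discrete"

# A steady note (small fluctuations) vs. a glissando (hypothetical data).
rng = np.random.default_rng(0)
steady = 440.0 + rng.normal(0.0, 1.0, 50)
gliss = np.linspace(440.0, 660.0, 50)

print(classify_segment(steady))  # a flat contour favours the discrete code
print(classify_segment(gliss))   # a sweep favours the continuous code
```

Raising `lam` makes information more expensive relative to inaccuracy, so more segments are classified as discrete; this is the sense in which different values of λ define different information-theoretic observers.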
For human song we use 82 vocal songs from the Natural History of Song collection and 100 vocal songs from the Global Jukebox collection. We also use 62 bird songs (Tierney et al., 2011) and 26 human speech recordings (Kuroyanagi et al., 2019).
We show that there is significant agreement between manual transcriptions, TONY, and segmentation according to the information-theoretic observer. We currently have preliminary results that explore variation across cultures and across domains.
We have developed a theoretical framework that can help answer questions about pitch-discreteness, free of bias toward any specific culture or domain. This framework makes several predictions that can be tested through perceptual experiments, providing a robust link between perceptual and corpus studies.
John McBride is a post-doc working in Korea at a biophysics institute. With his precious academic freedom he has shifted focus to music informatics and perception. His main interest is in quantifying what music is, and how it is shaped by biology and culture.