\documentclass[conference]{IEEEtran}
\usepackage{fixltx2e}
\usepackage{cite}
\usepackage[spacing]{microtype}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}


\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}
\newcommand\specint[1]{\frac{1}{2\pi}\int_{-\pi}^\pi #1{S(\omega)} \dd \omega}
%\newcommand\specint[1]{\int_{-1/2}^{1/2} #1{S(f)} \dd f}


%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an\\Information Dynamics Approach}

\author{
\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
\IEEEauthorblockA{Centre for Digital Music\\
Queen Mary University of London\\
Mile End Road, London E1 4NS}}

\maketitle
\begin{abstract}
We describe an information-theoretic approach to the analysis
of music and other sequential data, which emphasises the predictive aspects
of perception, and the dynamic process
of forming and modifying expectations about an unfolding stream of data,
characterising these using the tools of information theory: entropies,
mutual informations, and related quantities.
After reviewing the theoretical foundations,
% we present a new result on predictive information rates in high-order Markov chains, and
we discuss a few emerging areas of application, including
musicological analysis, real-time beat-tracking analysis, and the generation
of musical materials as a cognitively-informed compositional aid.
\end{abstract}


\section{Introduction}
\label{s:Intro}
The relationship between
Shannon's \cite{Shannon48} information theory and music and art in general has been the
subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective states
like uncertainty, surprise, complexity, tension, and interestingness
are closely related to information-theoretic quantities like
entropy, relative entropy, and mutual information.

Music is also an inherently dynamic process,
where listeners build up expectations about what is to happen next,
which may be fulfilled
immediately, after some delay, or modified as the music unfolds.
In this paper, we explore this ``Information Dynamics'' view of music,
discussing the theory behind it and some emerging applications.

\subsection{Expectation and surprise in music}
The idea that the musical experience is strongly shaped by the generation
and playing out of strong and weak expectations was put forward by, amongst others,
music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the
act of listening to music, and which converts it to a source of pleasure,
is \ldots the intellectual satisfaction
which the listener derives from continually following and anticipating
the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.'
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon
that unfolds in time, rather than being apprehended as a static object
presented in its entirety. Meyer argued that the experience depends
on how we change and revise our conceptions \emph{as events happen}, on
how expectation and prediction interact with occurrence, and that, to a
large degree, the way to understand the effect of music is to focus on
this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts
and can be treated mathematically using probability theory.
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
% \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
\citep{SaffranJohnsonAslin1999}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.

% \subsection{Music and information theory}

% With a probabilistic framework for music modelling and prediction in hand,
% we can %are in a position to
% compute various
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
what Berlyne \cite{Berlyne71} called `collative variables' since they are
to do with patterns of occurrence rather than medium-specific details.
Berlyne sought to show that the collative variables are closely related to
perceptual qualities like complexity, tension, interestingness,
and even aesthetic value, not just in music, but in other temporal
or visual media.
The relevance of information theory to music and art has
also been addressed by researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
% information-theoretic quantities like entropy, relative entropy,
% and mutual information.
% and are major determinants of the overall experience.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.

% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}
Our working hypothesis is that, as an intelligent, predictive
agent (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic belief state that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the emerging themes of the piece. As events unfold, it revises
this belief state, which includes predictive
distributions over possible future events.
These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy. These are
what Berlyne \cite{Berlyne71} called `collative variables', since
they are to do with \emph{patterns} of occurrence rather than the details
of which specific things occur; Berlyne developed these ideas of
`information aesthetics' in an experimental setting.
By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

% In addition, when adaptive probabilistic models are used, expectations are
% created mainly in response to \emph{patterns} of occurrence,
% rather than the details of which specific things occur.
One consequence of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The information-theoretic
concepts in terms of which the analysis is framed are universal to all sorts
of data.
Together, these suggest that an information dynamic analysis captures a
high level of \emph{abstraction}, and could be used to
make structural comparisons between different temporal media,
such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a principled way
to address the notion of \emph{subjectivity}, since the analysis is dependent on the
probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
variation in subjects' responses over time are
fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.


\section{Theoretical review}

\subsection{Entropy and information}
\label{s:entro-info}

Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set $\X$ and a
probability distribution representing the observer's beliefs about the
true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy of the random variable, $H(X)$. For a discrete variable
with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
	H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
\end{equation}
% where $\expect{}$ is the expectation operator.
The negative-log-probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and
hence the entropy is the expectation of the surprisingness, $\expect \ell(X)$.

Now suppose that the observer receives some new data $\Data$ that
causes a revision of its beliefs about $X$. The \emph{information}
in this new data \emph{about} $X$ can be quantified as the
relative entropy or
Kullback-Leibler (KL) divergence between the prior and posterior
distributions $p(x)$ and $p(x|\Data)$ respectively:
\begin{equation}
	\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
		= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
	\label{eq:info}
\end{equation}
When there are multiple variables $X_1, X_2$
\etc which the observer believes to be dependent, then the observation of
one may change its beliefs and hence yield information about the
others. The joint and conditional entropies as described in any
textbook on information theory (\eg \cite{CoverThomas}) then quantify
the observer's expected uncertainty about groups of variables given the
values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information
in an observation of $X_2$ about $X_1$ and the expected reduction
in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
	I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
and so the mutual information is symmetric in its arguments. A conditional
form of the mutual information can be formulated analogously:
\begin{equation}
	I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual
informations are conveniently visualised in \emph{information diagrams}
or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
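As a concrete illustration (an illustrative sketch of our own, not part of any
system described later; the function and variable names are arbitrary), the
quantities defined above can be computed for discrete distributions held as
arrays, with results in nats:
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import numpy as np

def entropy(p):
    """H(X) in nats for a probability mass function p (0 log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def kl_divergence(post, prior):
    """D(post || prior): information gained in moving from prior to
    posterior; assumes prior > 0 wherever post > 0."""
    post = np.asarray(post, dtype=float)
    prior = np.asarray(prior, dtype=float)
    nz = post > 0
    return float(np.sum(post[nz] * np.log(post[nz] / prior[nz])))

def mutual_information(pxy):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) from a joint table pxy[i, j]."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) \
        - entropy(pxy.ravel())
\end{lstlisting}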
\begin{fig}{venn-example}
\newcommand\rad{2.2em}%
\newcommand\circo{circle (3.4em)}%
\newcommand\labrad{4.3em}
\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
\newcommand\colsep{\ }
\newcommand\clipin[1]{\clip (#1) \circo;}%
\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
\newcommand\cliptwo[3]{%
\begin{scope}
\clipin{#1};
\clipin{#2};
\clipout{#3};
\fill[black!30] \bound;
\end{scope}
}%
\newcommand\clipone[3]{%
\begin{scope}
\clipin{#1};
\clipout{#2};
\clipout{#3};
\fill[black!15] \bound;
\end{scope}
}%
\begin{tabular}{c@{\colsep}c}
\scalebox{0.9}{%
\begin{tikzpicture}[baseline=0pt]
\coordinate (p1) at (90:\rad);
\coordinate (p2) at (210:\rad);
\coordinate (p3) at (-30:\rad);
\clipone{p1}{p2}{p3};
\clipone{p2}{p3}{p1};
\clipone{p3}{p1}{p2};
\cliptwo{p1}{p2}{p3};
\cliptwo{p2}{p3}{p1};
\cliptwo{p3}{p1}{p2};
\begin{scope}
\clip (p1) \circo;
\clip (p2) \circo;
\clip (p3) \circo;
\fill[black!45] \bound;
\end{scope}
\draw (p1) \circo;
\draw (p2) \circo;
\draw (p3) \circo;
\path
(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
;
\path
(p1) +(140:\labrad) node {$X_1$}
(p2) +(-140:\labrad) node {$X_2$}
(p3) +(-40:\labrad) node {$X_3$};
\end{tikzpicture}%
}
&
\parbox{0.5\linewidth}{
\small
\begin{align*}
I_{1|23} &= H(X_1|X_2,X_3) \\
I_{13|2} &= I(X_1;X_3|X_2) \\
I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
I_{12|3} + I_{123} &= I(X_1;X_2)
\end{align*}
}
\end{tabular}
\caption{
I-diagram of entropies and mutual informations
for three random variables $X_1$, $X_2$ and $X_3$. The areas of
the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
The central area $I_{123}$ is the co-information \cite{McGill1954}.
Some other information measures are indicated in the legend.
}
\end{fig}


\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
random variables, infinite in both directions,
and that $\mu$ is the associated probability measure over all
realisations of the sequence. In the following, $\mu$ will simply serve
as a label for the process.
We can identify a number of information-theoretic
measures meaningful in the context of a sequential observation of the sequence, during
which, at any time $t$, the sequence can be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
We will write the actually observed value of $X_t$ as $x_t$, and
the sequence of observations up to but not including $x_t$ as
$\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.

The in-context surprisingness of the observation $X_t=x_t$ depends on
both $x_t$ and the context $\past{x}_t$:
\begin{equation}
	\ell_t = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed, the observer can compute
the \emph{expected} surprisingness as a measure of its uncertainty about
$X_t$; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the
\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$
can be understood as \emph{subjective} information dynamic measures, since they are
based on the observer's probability model in the context of the actually observed sequence
$\past{x}_t$. They characterise what it is like to be `in the observer's shoes'.
If we view the observer as a purely passive or reactive agent, this would
probably be sufficient, but for active agents such as humans or animals, it is
often necessary to \emph{anticipate} future events in order, for example, to plan the
most effective course of action. It makes sense for such observers to be
concerned about the predictive probability distribution over future events,
$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
\emph{given} the observed past $\past{X}_t=\past{x}_t$.
Referring to the definition of information \eqrf{info}, this is the KL divergence
between prior and posterior distributions over possible futures, which, written out in full, is
\begin{equation}
	\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
		p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
Note that it is quite possible for an event to be surprising but not informative
in a predictive sense.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.
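For a first-order Markov observer (see \secrf{markov}), these subjective
measures are straightforward to compute, since the predictive distribution over
the entire future collapses onto the next symbol. The following sketch
(illustrative code of our own, assuming strictly positive transition
probabilities) returns the surprisingness, expected surprisingness and IPI of a
single observation, in nats:
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import numpy as np

def markov_surprise_and_ipi(a, prev, curr):
    """Surprisingness, expected surprisingness and IPI (nats) for the
    observation X_t = curr given X_{t-1} = prev, under a first-order
    Markov model with a[i, j] = Pr(X_t = i | X_{t-1} = j) > 0."""
    pred1 = a[:, prev]        # p(x_t | past): one-step prediction
    pred2 = (a @ a)[:, prev]  # p(x_{t+1} | past), before seeing curr
    post = a[:, curr]         # p(x_{t+1} | x_t = curr)
    l_t = -np.log(pred1[curr])                   # surprisingness
    expected_l = -np.sum(pred1 * np.log(pred1))  # entropy of prediction
    # For a first-order chain, the KL divergence over infinite futures
    # collapses to a one-step comparison of predictive distributions.
    ipi = np.sum(post * np.log(post / pred2))
    return l_t, expected_l, ipi
\end{lstlisting}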
\subsection{Information measures for stationary random processes}
\label{s:process-info}


\begin{fig}{predinfo-bg}
\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
\newcommand\rad{2em}%
\newcommand\ovoid[1]{%
++(-#1,\rad)
-- ++(2 * #1,0em) arc (90:-90:\rad)
-- ++(-2 * #1,0em) arc (270:90:\rad)
}%
\newcommand\axis{2.75em}%
\newcommand\olap{0.85em}%
\newcommand\offs{3.6em}
\newcommand\colsep{\hspace{5em}}
\newcommand\longblob{\ovoid{\axis}}
\newcommand\shortblob{\ovoid{1.75em}}
\begin{tabular}{c}
\comment{
\subfig{(a) multi-information and entropy rates}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{1.75em}
\newcommand\throw{2.5em}
\coordinate (p1) at (180:1.5em);
\coordinate (p2) at (0:0.3em);
\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
\newcommand\present{(p2) circle (\rc)}
\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
\newcommand\fillclipped[2]{%
\begin{scope}[even odd rule]
\foreach \thing in {#2} {\clip \thing;}
\fill[black!#1] \bound;
\end{scope}%
}%
\fillclipped{30}{\present,\bound \thepast}
\fillclipped{15}{\present,\bound \thepast}
\fillclipped{45}{\present,\thepast}
\draw \thepast;
\draw \present;
\node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
\node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
\path (p2) +(90:3em) node {$X_0$};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\end{tikzpicture}}%
\\[1em]
\subfig{(a) excess entropy}{%
\newcommand\blob{\longblob}
\begin{tikzpicture}
\coordinate (p1) at (-\offs,0em);
\coordinate (p2) at (\offs,0em);
\begin{scope}
\clip (p1) \blob;
\clip (p2) \blob;
\fill[lightgray] (-1,-1) rectangle (1,1);
\end{scope}
\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
\path (0,0) node (future) {$E$};
\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
\end{tikzpicture}%
}%
\\[1em]
}
% \subfig{(b) predictive information rate $b_\mu$}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{2.2em}
\newcommand\throw{2.5em}
\coordinate (p1) at (210:1.5em);
\coordinate (p2) at (90:0.8em);
\coordinate (p3) at (-30:1.5em);
\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
\newcommand\present{(p2) circle (\rc)}
\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
\newcommand\fillclipped[2]{%
\begin{scope}[even odd rule]
\foreach \thing in {#2} {\clip \thing;}
\fill[black!#1] \bound;
\end{scope}%
}%
% \fillclipped{80}{\future,\thepast}
\fillclipped{30}{\present,\future,\bound \thepast}
\fillclipped{15}{\present,\bound \future,\bound \thepast}
\draw \future;
\fillclipped{45}{\present,\thepast}
\draw \thepast;
\draw \present;
\node at (barycentric cs:p2=0.9,p1=-0.17,p3=-0.17) {$r_\mu$};
\node at (barycentric cs:p1=-0.5,p2=1.0,p3=1) {$b_\mu$};
\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
\path (p2) +(140:3.2em) node {$X_0$};
% \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
\path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
\end{tikzpicture}%}%
% \\[0.25em]
\end{tabular}
\caption{
I-diagram illustrating several information measures in
stationary random processes. Each circle or oval represents a random
variable or sequence of random variables relative to time $t=0$. Overlapped areas
correspond to various mutual informations.
The circle represents the `present'. Its total area is
$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
% The small dark
% region below $X_0$ is $\sigma_\mu$ and the excess entropy
% is $E = \rho_\mu + \sigma_\mu$.
}
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary,
then it becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the
process as a whole, rather than on a moment-by-moment basis. Some of these,
such as the entropy rate, are well-known, but others are only recently being
investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

The \emph{entropy rate} of the process is the entropy of the `present'
$X_t$ given the `past':
\begin{equation}
	\label{eq:entro-rate}
	h_\mu = H(X_t|\past{X}_t).
\end{equation}
The entropy rate is a measure of the overall surprisingness
or unpredictability of the process, and gives an indication of the average
level of surprise and uncertainty that would be experienced by an observer
computing the measures of \secrf{surprise-info-seq} on a sequence sampled
from the process.

The \emph{multi-information rate} $\rho_\mu$ \cite{Dubnov2004}
is the mutual
information between the `past' and the `present':
\begin{equation}
	\label{eq:multi-info}
	\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
\end{equation}
It is a measure of how much the preceding context of an observation
helps in predicting or reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983}
is the mutual information between
the entire `past' and the entire `future' plus `present':
\begin{equation}
	E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}
Both the excess entropy and the multi-information rate can be thought
of as measures of \emph{redundancy}, quantifying the extent to which
the same information is to be found in all parts of the sequence.


The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the mutual information between the `present' and the `future' given the
`past':
\begin{equation}
	\label{eq:PIR}
	b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t),
\end{equation}
which can be read as the average reduction
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
%	\IXZ_t
	b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
%	\label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$ is the \emph{residual} \cite{AbdallahPlumbley2010},
or \emph{erasure} \cite{VerduWeissman2006}, entropy rate.
The PIR gives an indication of the average IPI that would be experienced
by an observer processing a sequence sampled from this process.
The relationships between these measures are illustrated in \Figrf{predinfo-bg};
see James et al \cite{JamesEllisonCrutchfield2011} for further discussion.
% in , along with several of the information measures we have discussed so far.

\comment{
James et al \cite{JamesEllisonCrutchfield2011} review several of these
information measures and introduce some new related ones.
In particular they identify $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$,
the mutual information between the past and the future given the present,
as an interesting quantity that measures the predictive benefit of
model-building, that is, maintaining an internal state summarising past
observations in order to make better predictions. It is shown as the
small dark region below the circle in \figrf{predinfo-bg}(c).
By comparing with \figrf{predinfo-bg}(b), we can see that
$\sigma_\mu = E - \rho_\mu$.
}
% They also identify
% $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
% information} rate.


\subsection{First and higher order Markov chains}
\label{s:markov}
% First order Markov chains are the simplest non-trivial models to which information
% dynamics methods can be applied.
In \cite{AbdallahPlumbley2009} we derived
expressions for all the information measures described in \secrf{surprise-info-seq} for
ergodic first order Markov chains (\ie those with a unique stationary
distribution).
% The derivation is greatly simplified by the dependency structure
% of the Markov chain: for the purpose of the analysis, the `past' and `future'
% segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
% and next variables $X_{t-1}$ and $X_{t+1}$ respectively.
We also showed that
the PIR can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
an alphabet $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(\ev(X_t=i)|\ev(X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
with transition matrix $a$, then the PIR is
\begin{equation}
	b_\mu = h(a^2) - h(a),
\end{equation}
where $a^2$ is the transition matrix of the
% `skip one'
Markov chain obtained by jumping two steps at a time
along the original chain.

Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the high order Markov chain. With
an $N$th order model, this is done by forming a new alphabet of size $K^N$
consisting of all possible $N$-tuples of symbols from the base alphabet.
An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
$(x_{t+1},\ldots,x_{t+N})$ from the base model.
% The next
% observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
% block along by one step.
The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$, in terms of which the entropy rate and PIR are
\begin{equation}
	h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. (These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.)

In our experiments with visualising and sonifying sequences sampled from
first order Markov chains \cite{AbdallahPlumbley2009}, we found that
the measures $h_\mu$, $\rho_\mu$ and $b_\mu$ correspond to perceptible
characteristics, and that the transition matrices maximising or minimising
each of these quantities are quite distinct. High entropy rates are associated
with completely uncorrelated sequences with no recognisable temporal structure
(and low $\rho_\mu$ and $b_\mu$).
High values of $\rho_\mu$ are associated with long periodic cycles (and low $h_\mu$
and $b_\mu$). High values of $b_\mu$ are associated with intermediate values
of $\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
temporal structures. These relationships are visible in \figrf{mtriscat} in
\secrf{composition}, where we pick up this thread again, with an application of
information dynamics as a compositional aid.
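A minimal sketch of these computations (our own illustrative code, using the
convention above that columns of $a$ are next-symbol distributions) returns
$h_\mu$, $\rho_\mu$ and $b_\mu = h(a^2) - h(a)$, in bits, for a first order
transition matrix:
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import numpy as np

def stationary(a):
    """Stationary distribution of a column-stochastic matrix a,
    where a[i, j] = Pr(X_t = i | X_{t-1} = j)."""
    vals, vecs = np.linalg.eig(a)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a, pi):
    """h(a) in bits: expected entropy of the next-symbol distribution."""
    with np.errstate(divide='ignore', invalid='ignore'):
        col_ent = -np.nansum(a * np.log2(a), axis=0)
    return float(col_ent @ pi)

def markov_measures(a):
    """Return (h_mu, rho_mu, b_mu) in bits for a first-order chain."""
    pi = stationary(a)
    h = entropy_rate(a, pi)
    marg = -np.sum(pi[pi > 0] * np.log2(pi[pi > 0]))
    rho = marg - h                   # multi-information rate
    b = entropy_rate(a @ a, pi) - h  # PIR: h(a^2) - h(a)
    return h, rho, b
\end{lstlisting}
For example, the uniform $2\times 2$ matrix yields $h_\mu=1$, $\rho_\mu=0$ and
$b_\mu=0$ bits, while a nearly deterministic 2-cycle yields low $h_\mu$ and
$\rho_\mu$ close to 1 bit, consistent with the observations above.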
\section{Information Dynamics in Analysis}

\subsection{Musicological Analysis}
\label{s:minimusic}

In \cite{AbdallahPlumbley2009}, we analysed two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was tracked (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. Some results are summarised in \figrf{twopages}:
the upper four plots show the dynamically evolving subjective information
measures as described in \secrf{surprise-info-seq}, computed using a point
estimate of the current transition matrix; the fifth plot (the `model information rate')
shows the information in each observation about the transition matrix.
In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
is actually a component of the true IPI when the transition
matrix is being learned online, a component that was neglected when we computed the IPI from
the transition matrix as if it were a constant.

The peaks of the surprisingness and both components of the IPI
show good correspondence with the structure of the piece, both as marked in the score
and as analysed by musicologist Keith Potter, who was asked to mark the six
`most surprising moments' of the piece (shown as asterisks in the fifth plot). %%
% \footnote{%
% Note that the boundary marked in the score at around note 5,400 is known to be
% anomalous; on the basis of a listening analysis, some musicologists have
% placed the boundary a few bars later, in agreement with our analysis
% \cite{PotterEtAl2007}.}
%
In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} it:
the boundary strength functions show no tendency to peak at
the boundaries of the piece.

The complete analysis of \emph{Gradus} can be found in \cite{AbdallahPlumbley2009},
but \figrf{metre} illustrates the result of a metrical analysis: the piece was divided
into bars of 32, 64 and 128 notes. In each case, the average surprisingness and
IPI for the first, second, third \etc notes in each bar were computed. The plots
show that the first note of each bar is, on average, significantly more surprising
and informative than the others, up to the 64-note level, whereas at the 128-note
level, the dominant periodicity appears to remain at 64 notes.
\begin{fig}{twopages}
\colfig[0.96]{matbase/fig9471}\\ % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
\vspace*{0.5em}
\colfig[0.97]{matbase/fig13377} % rule based analysis
\caption{Analysis of \emph{Two Pages}.
The thick vertical lines are the part boundaries as indicated in
the score by the composer.
The thin grey lines
indicate changes in the melodic `figures' of which the piece is
constructed. In the `model information rate' panel, the black asterisks
mark the six most surprising moments selected by Keith Potter.
The bottom two panels show two rule-based boundary strength analyses.
All information measures are in nats.
%Note that the boundary marked in the score at around note 5,400 is known to be
%anomalous; on the basis of a listening analysis, some musicologists have
%placed the boundary a few bars later, in agreement with our analysis
%\cite{PotterEtAl2007}.
}
\end{fig}

\begin{fig}{metre}
% \scalebox{1}{%
\begin{tabular}{cc}
\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
% \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
% \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
% \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
% \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
% \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
% \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}

\end{tabular}%
% }
\caption{Metrical analysis by computing average surprisingness and
IPI of notes at different periodicities (\ie hypothetical
bar lengths) and phases (\ie positions within a bar).
}
\end{fig}

\begin{fig*}{drumfig}
% \includegraphics[width=0.9\linewidth]{drum_plots/file9-track.eps}% \\
\includegraphics[width=0.97\linewidth]{figs/file11-track.eps} \\
% \includegraphics[width=0.9\linewidth]{newplots/file8-track.eps}
\caption{Information dynamic analysis derived from audio recordings of
drumming, obtained by applying a Bayesian beat tracking system to the
sequence of detected kick and snare drum events. The grey line shows the system's
varying level of uncertainty (entropy) about the tempo and phase of the
beat grid, while the stem plot shows the amount of information in each
drum event about the beat grid. The entropy drops instantaneously at each
event and rises gradually between events.
}
\end{fig*}

\subsection{Real-valued signals and audio analysis}
Using analogous definitions based on the differential entropy
\cite{CoverThomas}, the methods outlined
in \secrf{surprise-info-seq} and \secrf{process-info}
can be reformulated for random variables taking values in a continuous domain
and thus be applied to expressive parameters of music
such as dynamics, timing and timbre, which are readily quantified on a continuous scale.
%
% \subsection{Audio based content analysis}
% Using analogous definitions of differential entropy, the methods outlined
% in the previous section are equally applicable to continuous random variables.
% In the case of music, where expressive properties such as dynamics, tempo,
% timing and timbre are readily quantified on a continuous scale, the information
% dynamic framework may also be considered.
%
Dubnov \cite{Dubnov2004} considers the class of stationary Gaussian
processes, for which the entropy rate may be obtained analytically
from the power spectral density function $S(\omega)$ of the signal,
and finds that the
multi-information rate can be
expressed as
\begin{equation}
	\rho_\mu = \frac{1}{2} \left( \log \specint{} - \specint{\log}\right).
	\label{eq:mir-sfm}
\end{equation}
Dubnov also notes that $e^{-2\rho_\mu}$ is equivalent to the well-known
\emph{spectral flatness measure}; hence,
Gaussian processes with maximal multi-information rate are those with maximally
non-flat spectra, which are those dominated by a single frequency component.
% These essentially consist of a single
% sinusoidal component and hence are completely predictable once
% the parameters of the sinusoid have been inferred.
% Local stationarity is assumed, which may be achieved by windowing or
% change point detection \cite{Dubnov2008}.
%TODO

We have found (to appear in forthcoming work) that the predictive information rate for autoregressive
Gaussian processes can be expressed as
\begin{equation}
	b_\mu = \frac{1}{2} \left( \log \specint{\frac{1}} - \specint{\log\frac{1}}\right),
\end{equation}
suggesting a sort of duality between $b_\mu$ and $\rho_\mu$ which is consistent with
the duality between multi-information and predictive information rates we discuss in
\cite{AbdallahPlumbley2012}. A consideration of the residual or erasure entropy rate
\cite{VerduWeissman2006}
suggests that this expression applies to Gaussian processes in general, but this is
yet to be confirmed rigorously.

Analysis shows that in stationary autoregressive processes of a given finite order,
$\rho_\mu$ is unbounded, while for moving average processes of a given order, $b_\mu$ is unbounded.
This is a result of the physically unattainable infinite-precision observations which the
theoretical analysis assumes; adding more realistic limitations on the amount of information
that can be extracted from one measurement is one of the aims of our ongoing work in this
area.
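As a numerical illustration of these spectral formulae (an approximate sketch
of our own: the spectral integrals are replaced by simple means over a uniform
frequency grid, and the PSD is assumed to have been estimated elsewhere, \eg by
Welch's method):
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import numpy as np

def gaussian_process_rates(psd):
    """Approximate rho_mu and b_mu (nats) for a stationary Gaussian
    process from samples of its PSD S(omega) on a uniform grid covering
    one full period; (1/2pi) int f(S(w)) dw is approximated by a mean."""
    S = np.asarray(psd, dtype=float)
    mean_S = S.mean()              # (1/2pi) int S(w) dw
    mean_logS = np.log(S).mean()   # (1/2pi) int log S(w) dw
    rho = 0.5 * (np.log(mean_S) - mean_logS)
    # the corresponding expression with 1/S(w), as proposed above:
    mean_invS = (1.0 / S).mean()
    b = 0.5 * (np.log(mean_invS) + mean_logS)
    return rho, b
\end{lstlisting}
For a flat spectrum both quantities are zero, consistent with the observation
that white noise is neither redundant nor predictively informative.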
% We are currently working towards methods for the computation of predictive information
% rate in autoregressive and moving average Gaussian processes
% and processes with power-law (or $1/f$) spectra,
% which have previously been investigated in relation to their aesthetic properties
% \cite{Voss75,TaylorSpeharVan-Donkelaar2011}.

% (fractionally integrated Gaussian noise).
% %(fBm (continuous), fiGn discrete time) possible reference:
% @book{palma2007long,
% title={Long-memory time series: theory and methods},
% author={Palma, W.},
% volume={662},
% year={2007},
% publisher={Wiley-Blackwell}
% }



% mention non-gaussian processes extension Similarly, the predictive information
% rate may be computed using a Gaussian linear formulation CITE. In this view,
% the PIR is a function of the correlation between random innovations supplied
% to the stochastic process. %Dubnov, MacAdams, Reynolds (2006) %Bailes and Dean (2009)

% In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian
% processes. For such processes, the entropy rate may be obtained analytically
% from the power spectral density of the signal, allowing the multi-information
% rate to be subsequently obtained. One aspect demanding further investigation
% involves the comparison of alternative measures of predictability. In the case of the PIR, a Gaussian linear formulation is applicable, indicating that the PIR is a function of the correlation between random innovations supplied to the stochastic process CITE.
% !!! FIXME


\subsection{Beat Tracking}

A probabilistic method for drum tracking was presented by Robertson
\cite{Robertson11c}. The system infers a beat grid (a sequence
of approximately regular beat times) given audio inputs from a
live drummer, for the purpose of synchronising a music
sequencer with the drummer.
The times of kick and snare drum events are obtained
using dedicated microphones for each drum and a percussive onset detector
\cite{puckette98}. These event times are then sent
to the beat tracker, which maintains a belief state in
the form of distributions over the tempo and phase of the beat grid.
Every time an event is received, these distributions are updated
with respect to a probabilistic model which accounts both for tempo and phase
variations and for the emission of drum events at musically plausible times
relative to the beat grid.
%continually updates distributions for tempo and phase on receiving a new
%event time

The use of a probabilistic belief state means we can compute entropies
representing the system's uncertainty about the beat grid, and quantify
the amount of information in each event about the beat grid as the KL divergence
between prior and posterior distributions. Though this is not strictly the
instantaneous predictive information (IPI) as described in \secrf{surprise-info-seq}
(the information gained is not directly about future event times), we can treat
it as a proxy for the IPI, in the manner of the `model information rate'
described in \secrf{minimusic}, which has a similar status.
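To make this concrete, the following sketch (purely illustrative, and not the
actual implementation of \cite{Robertson11c}; the Gaussian timing likelihood
and the discrete tempo--phase grid are our own simplifying assumptions) shows
the kind of computation involved in updating the belief state and measuring the
information carried by a single onset:
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import numpy as np

def beat_event_update(prior, tempos, phases, onset_time, sigma=0.02):
    """Bayesian update of a belief state over a (tempo, phase) grid for
    one onset time; returns the posterior, its entropy, and the
    information (KL divergence from prior to posterior), in nats.
    prior[i, j]: probability of tempo tempos[i] (beats/s), phase
    phases[j] (s); onsets assumed near beat times with Gaussian timing
    error sigma (an arbitrary illustrative likelihood)."""
    T, P = np.meshgrid(tempos, phases, indexing='ij')
    period = 1.0 / T
    # distance from the onset to the nearest beat under each hypothesis
    dist = np.remainder(onset_time - P, period)
    dist = np.minimum(dist, period - dist)
    like = np.exp(-0.5 * (dist / sigma) ** 2)
    post = prior * like
    post /= post.sum()
    nz = post > 0
    entropy = -np.sum(post[nz] * np.log(post[nz]))
    info = np.sum(post[nz] * np.log(post[nz] / prior[nz]))
    return post, entropy, info
\end{lstlisting}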
Though this is not strictly the samer@73: instantaneous predictive information (IPI) as described in \secrf{surprise-info-seq} samer@73: (the information gained is not directly about future event times), we can treat samer@73: it as a proxy for the IPI, in the manner of the `model information rate' samer@73: described in \secrf{minimusic}, which has a similar status. samer@73: samer@73: We carried out the analysis on 16 recordings; an example samer@73: is shown in \figrf{drumfig}. There we can see variations in the samer@73: entropy in the upper graph and the information in each drum event in the lower samer@73: stem plot. At certain points in time, unusually large amounts of information samer@73: arrive; these may be related to fills and other rhythmic irregularities, which samer@73: are often followed by an emphatic return to a steady beat at the beginning samer@73: of the next bar---this is something we are currently investigating. samer@73: We also analysed the pattern of information flow samer@73: on a cyclic metre, much as in \figrf{metre}. All the recordings we samer@73: analysed are audibly in 4/4 metre, but we found no samer@73: evidence of a general tendency for greater amounts of information to arrive samer@73: at metrically strong beats, which suggests that the rhythmic accuracy of the samer@73: drummers does not vary systematically across each bar. It is possible that metrical information samer@73: existing in the pattern of kick and snare events might emerge in an samer@73: analysis using a model that attempts to predict the time and type of samer@73: the next drum event, rather than just inferring the beat grid as the current model does. samer@73: %The analysis of information rates can b samer@73: %considered \emph{subjective}, in that it measures how the drum tracker's samer@73: %probability distributions change, and these are contingent upon the samer@73: %model used as well as external properties in the signal. samer@73: %We expect, samer@73: %however, that following periods of increased uncertainty, such as fills samer@73: %or expressive timing, the information contained in an individual event samer@73: %increases. We also examine whether the information is dependent upon samer@73: %metrical position. samer@73: samer@73: samer@73: \section{Information dynamics as compositional aid} samer@73: \label{s:composition} samer@73: samer@73: The use of stochastic processes in music composition has been widespread for samer@73: decades---for instance Iannis Xenakis applied probabilistic mathematical models samer@73: to the creation of musical materials\cite{Xenakis:1992ul}. While such processes samer@73: can drive the \emph{generative} phase of the creative process, information dynamics samer@73: can serve as a novel framework for a \emph{selective} phase, by samer@73: providing a set of criteria to be used in judging which of the samer@73: generated materials samer@73: are of value. This alternation of generative and selective phases as been samer@73: noted before \cite{Boden1990}. samer@73: % samer@73: Information-dynamic criteria can also be used as \emph{constraints} on the samer@73: generative processes, for example, by specifying a certain temporal profile samer@73: of suprisingness and uncertainty the composer wishes to induce in the listener samer@73: as the piece unfolds. 
%stochastic and algorithmic processes: ; outputs can be filtered to match a set of
%criteria defined in terms of information-dynamical characteristics, such as
%predictability vs unpredictability
%s model, this criteria thus becoming a means of interfacing with the generative processes.

%The tools of information dynamics provide a way to constrain and select musical
%materials at the level of patterns of expectation, implication, uncertainty, and predictability.
In particular, the behaviour of the predictive information rate (PIR) defined in
\secrf{process-info} makes it interesting from a compositional point of view. The definition
of the PIR is such that it is low both for extremely regular processes, such as constant
or periodic sequences, \emph{and} for extremely random processes, where each symbol
is chosen independently of the others, in a kind of `white noise'. In the former case,
the pattern, once established, is completely predictable and therefore there is no
\emph{new} information in subsequent observations. In the latter case, the randomness
and independence of all elements of the sequence means that, though potentially surprising,
each observation carries no information about the ones to come.

Processes with high PIR maintain a certain kind of balance between
predictability and unpredictability in such a way that the observer must continually
pay attention to each new observation as it occurs in order to make the best
possible predictions about the evolution of the sequence. This balance between predictability
and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}),
which summarises the observations of Wundt \cite{Wundt1897} that stimuli are most
pleasing at intermediate levels of novelty or disorder,
where there is a balance between `order' and `chaos'.

Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009}
a similar shape when plotting entropy rate against PIR---this is visible in the
upper envelope of the plot in \figrf{mtriscat}, which is a 3-D scatter plot of
three of the information measures discussed in \secrf{process-info} for several thousand
first-order Markov chain transition matrices generated by a random sampling method.
The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and
predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond
to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences
with no temporal structure. Processes with high PIR are to be found at intermediate
levels of entropy and redundancy.

%It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials.

%For instance a stochastic music generating process could be controlled by modifying
%constraints on its output in terms of predictive information rate or entropy
%rate.
\begin{fig}{wundt}
\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
{\ {\large$\longrightarrow$}\ }
\raisebox{-4em}{\colfig[0.43]{wundt2}}
\caption{
The Wundt curve relating randomness/complexity with
perceived value. Repeated exposure sometimes results
in a move to the left along the curve \cite{Berlyne71}.
}
\end{fig}



\subsection{The Melody Triangle}

These observations led us to construct the `Melody Triangle', a graphical interface for
%for %exploring the melodic patterns generated by each of the Markov chains represented
%as points in \figrf{mtriscat}.
%
%The Melody Triangle is an interface for
the discovery of melodic
materials, where the input---positions within a triangle---maps directly to information-theoretic
properties of the output. % as exemplified in \figrf{mtriscat}.
%The measures---entropy rate, redundancy and
%predictive information rate---form a criteria with which to filter the output
%of the stochastic processes used to generate sequences of notes.
%These measures
%address notions of expectation and surprise in music, and as such the Melody
%Triangle is a means of interfacing with a generative process in terms of the
%predictability of its output.
%
The triangle is populated with first order Markov chain transition
matrices as illustrated in \figrf{mtriscat}.
The distribution of transition matrices in this space forms a relatively thin
curved sheet. Thus, it is a reasonable simplification to project out the
third dimension (the PIR) and present an interface that is just two dimensional.
The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with
the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand
side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}.
This is our `Melody Triangle' and
forms the interface by which the system is controlled.
%Using this interface thus involves a mapping to information space;
The user selects a point within the triangle; this is mapped into the
information space, and the nearest transition matrix is used to generate
a sequence of values which are then sonified either as pitched notes or percussive
sounds. By choosing the position within the triangle, the user can control the
output at the level of its `collative' properties, with access to the variety
of patterns as described above and in \secrf{markov}.
%and information-theoretic criteria related to predictability
%and information flow
Though the interface is 2D, the third dimension (PIR) is implicitly present, as
transition matrices retrieved from
along the centre line of the triangle will tend to have higher PIR.
We hypothesise that, under
the appropriate conditions, these will be perceived as more `interesting' or
`melodic.'
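A minimal sketch of this lookup and sampling step follows (our own illustrative
code, not the actual implementation; the palette of matrices and their
$(h_\mu,\rho_\mu)$ coordinates are assumed to have been precomputed using the
measures of \secrf{markov}):
\begin{lstlisting}[language=Python,basicstyle=\footnotesize\ttfamily]
import numpy as np

def nearest_matrix(point, coords, matrices):
    """Return the transition matrix whose (h_mu, rho_mu) coordinates
    are closest (Euclidean) to the user-selected point.
    coords: array of shape (M, 2); matrices: list of M precomputed
    column-stochastic transition matrices."""
    d2 = np.sum((np.asarray(coords) - np.asarray(point)) ** 2, axis=1)
    return matrices[int(np.argmin(d2))]

def sample_sequence(a, length, rng=None):
    """Sample a symbol sequence from a chain with a[i,j]=Pr(next=i|prev=j)."""
    if rng is None:
        rng = np.random.default_rng()
    k = a.shape[0]
    state = rng.integers(k)
    seq = []
    for _ in range(length):
        state = rng.choice(k, p=a[:, state])
        seq.append(state)
    return seq  # to be mapped onto scale degrees or percussion sounds
\end{lstlisting}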

%The corners correspond to three different extremes of predictability and
%unpredictability, which could be loosely characterised as `periodicity', `noise'
%and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$
%and $b_\mu$) have no discernible pattern;
%those along the `periodicity'
%to `repetition' edge are all cyclic patterns that get shorter as we approach
%the `repetition' corner, until each is just one repeating note. Those along the
%opposite edge consist of independent random notes from non-uniform distributions.
%Areas between the left and right edges will tend to have higher PIR,
%and we hypothesise that, under
%the appropriate conditions, these will be perceived as more `interesting' or
%`melodic.'
%These melodies have some level of unpredictability, but are not completely random.
% Or, conversely, are predictable, but not entirely so.

\begin{fig}{mtriscat}
	\colfig[0.9]{mtriscat}
	\caption{The population of transition matrices in the 3D space of
	entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and PIR ($b_\mu$),
	all in bits.
	The concentrations of points along the redundancy axis correspond
	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
	3, 4, \etc, all the way to period 7 (redundancy 2.8 bits). The colour of each point
	represents its PIR---note that the highest values are found at intermediate entropy
	and redundancy, and that the distribution as a whole forms a curved triangle. Although
	not visible in this plot, the distribution is largely hollow in the middle.}
\end{fig}


%PERHAPS WE SHOULD FOREGO TALKING ABOUT THE
%INSTALLATION VERSION OF THE TRIANGLE?
%feels a bit like a tangent, and could do with the space..
The Melody Triangle exists in two incarnations: a screen-based interface
where a user moves tokens in and around a triangle on screen, and a multi-user
interactive installation in which a Kinect camera tracks individuals in a space and
maps their positions in physical space to the triangle. In the latter, each visitor
who enters the installation generates a melody and can collaborate with their
co-visitors to generate musical textures. This makes the interaction physically engaging
and (as our experience with visitors both young and old has demonstrated) more playful.
%Additionally visitors can change the
%tempo, register, instrumentation and periodicity of their melody with body gestures.
%
The screen-based interface can serve as a compositional tool.
%%A triangle is drawn on the screen, screen space thus mapped to the statistical
%space of the Melody Triangle.
A number of tokens, each representing a
sonification stream or `voice', can be dragged in and around the triangle.
For each token, a sequence of symbols is sampled using the corresponding
transition matrix.
These symbols
are then mapped to notes of a scale or percussive sounds%
\footnote{The sampled sequence could easily be mapped to other musical processes, possibly over
	different time scales, such as chords, dynamics and timbres. It would also be possible
	to map the symbols to visual or other outputs.}%
. Keyboard commands give control over other musical parameters such
as pitch register and inter-onset interval.
%The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.
%
The system is capable of generating quite intricate musical textures when multiple tokens
are in the triangle, but unlike other computer-aided composition tools or programming
environments, the composer exercises control at the abstract level of information-dynamic
properties.
%the interface relating to subjective expectation and predictability.

\begin{fig}{TheTriangle}
	\colfig[0.7]{TheTriangle.pdf}
	\caption{The Melody Triangle}
\end{fig}

\comment{
\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?
Information measures on a stream of symbols can form a feedback mechanism; a
rudimentary `critic' of sorts. For instance, symbol-by-symbol measures of predictive
information rate, entropy rate and redundancy could tell us if a stream of symbols
is currently `boring', either because it is too repetitive, or because it is too
chaotic. Such feedback would be oblivious to long-term and large-scale
structures and any cultural norms (such as style conventions), but
nonetheless could provide a composer with valuable insight into
the short-term properties of a work. This could not only be used for the
evaluation of pre-composed streams of symbols, but could also provide real-time
feedback in an improvisatory setup.
}

\subsection{User trials with the Melody Triangle}
We are currently using the screen-based
Melody Triangle interface to investigate the relationship between the information-dynamic
characteristics of sonified Markov chains and subjective musical preference.
We carried out a pilot study with six participants, who were asked
to use a simplified form of the user interface (a single controllable token,
and no rhythmic, registral or timbral controls) under two conditions:
one where a single sequence was sonified under user control, and another
where an additional sequence was sonified in a different register, as if generated
by a fixed invisible token in one of four regions of the triangle. In addition, subjects
were asked to press a key if they `liked' what they were hearing.

We recorded subjects' behaviour as well as the points which they marked
with a key press.
Some results for three of the subjects are shown in \figrf{mtri-results}. Though
we have not been able to detect any systematic across-subjects preference for any particular
region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour.
Our initial hypothesis, that subjects would linger longer in regions of the triangle
that produced aesthetically preferable sequences, and that this would tend to be towards the
centre line of the triangle for all subjects, was not confirmed. However, it is possible
that the design of the experiment encouraged an initial exploration of the space (sometimes
very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and
how the system works, rather than finding musical patterns. It is also possible that the
system encourages users to create musically interesting output by \emph{moving the token},
rather than by finding a particular spot in the triangle which produces a musically interesting
sequence by itself.

\begin{fig}{mtri-results}
	\def\scat#1{\colfig[0.42]{mtri/#1}}
	\def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
	\begin{tabular}{cc}
%		\subj{a} \\
		\subj{b} \\
		\subj{c} \\
		\subj{d}
	\end{tabular}
	\caption{Dwell times and mark positions from user trials with the
	on-screen Melody Triangle interface, for three subjects. The left-hand column shows
	the positions in a 2D information space (entropy rate vs multi-information rate,
	in bits) where each subject spent their time; the area of each circle is proportional
	to the time spent there. The right-hand column shows the points which subjects
	`liked'; the area of each circle here is proportional to the duration spent at
	that point before it was marked.}
\end{fig}

Comments collected from the subjects
%during and after the experiment
suggest that
the information-dynamic characteristics of the patterns were readily apparent
to most: several noticed the main organisation of the triangle,
with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
notes towards the opposite corner. Some described their systematic exploration of the space.
Two felt that the right side was `more controllable' than the left (a consequence
of their ability to return to a particular distinctive pattern and recognise it
as one heard previously). Two reported that they became bored towards the end,
but another felt there was not enough time to `hear out' the patterns properly.
One subject did not `enjoy' the patterns in the lower region, but another said the lower
central regions were more `melodic' and `interesting'.

We plan to continue the trials with a slightly less restricted user interface in order
to make the experience more enjoyable and thereby give subjects longer to use the interface;
this may allow them to get beyond the initial exploratory phase and give a clearer
picture of their aesthetic preferences. In addition, we plan to conduct a
study under more restrictive conditions, in which subjects will have no control over the patterns
other than to signal (a) which of two alternatives they prefer in a forced-choice
paradigm, and (b) when they are bored of listening to a given sequence.

%\emph{comparable system} Gordon Pask's Musicolor (1953) applied a similar notion
%of boredom in its design.
%The Musicolour would react to audio input through a
%microphone by flashing coloured lights. Rather than a direct mapping of sound
%to light, Pask designed the device to be a partner to a performing musician. It
%would adapt its lighting pattern based on the rhythms and frequencies it would
%hear, quickly `learning' to flash in time with the music. However Pask endowed
%the device with the ability to `be bored'; if the rhythmic and frequency content
%of the input remained the same for too long it would listen for other rhythms
%and frequencies, only lighting when it heard these. As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.


\section{Conclusions}

%We reviewed our information dynamics approach to the modelling of the perception
We have surveyed several emerging areas of application of the methods and
ideas of information dynamics to problems in music analysis, perception
and cognition, including musicological analysis of symbolic music, audio analysis,
rhythm processing, and compositional and creative tasks. The approach has proved
successful in musicological analysis, and though our initial data on
rhythm processing and aesthetic preference are inconclusive, there is still
plenty of work to be done in this area: wherever there are probabilistic models,
information dynamics can shed light on their behaviour.


\section*{Acknowledgments}
This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
GR/S82213/01 and EP/E045235/1 (SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC
Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship EP/G007144/1
(MDP), and EPSRC IDyOM2 EP/H013059/1.
This work is partly funded by the CoSound project, funded by the Danish Agency for Science,
Technology and Innovation.
Thanks also to Marcus Pearce for providing the two rule-based analyses of \emph{Two Pages}.


\bibliographystyle{IEEEtran}
{\bibliography{all,c4dm,nime,andrew}}
\end{document}