\documentclass[conference,a4paper]{IEEEtran}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}
\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}
\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}
\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}
\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}
%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an Information Dynamics Approach}

\author{
\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
\IEEEauthorblockA{Centre for Digital Music\\
Queen Mary University of London\\
Mile End Road, London E1 4NS\\
Email:}}

\maketitle
\begin{abstract}
When we listen to music we continually take in information and use it to build
predictive models of what is going to happen next. Measures derived from
information theory can be related to how we perceive music, which makes an
information-theoretic approach to music cognition a fruitful avenue of research.
In this paper, we review the theoretical foundations of information dynamics
and discuss a few emerging areas of application.
\end{abstract}

\section{Introduction}
\label{s:Intro}

\subsection{Expectation and surprise in music}
One of the effects of listening to music is to create expectations of what is
to come next, which may be fulfilled immediately, after some delay, or not at
all as the case may be. This is the thesis put forward by, amongst others, music
theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but it was
recognised much earlier; for example, it was elegantly put by Hanslick
\cite{Hanslick1854} in the nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the act of
listening to music, and which converts it to a source of pleasure, is \ldots
the intellectual satisfaction which the listener derives from continually
following and anticipating the composer's intentions---now, to see his
expectations fulfilled, and now, to find himself agreeably mistaken.
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon that
`unfolds' in time, rather than being apprehended as a static object presented in
its entirety. Meyer argued that musical experience depends on how we change and
revise our conceptions \emph{as events happen}, on how expectation and prediction
interact with occurrence, and that, to a large degree, the way to understand the
effect of music is to focus on this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts and can be
treated mathematically using probability theory. We suppose that when we listen
to music, expectations are created on the basis of our familiarity with various
styles of music and our ability to detect and learn statistical regularities in
the music as they emerge. There is experimental evidence that human listeners are
able to internalise statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also that
statistical models can form an effective basis for computational analysis of
music, \eg \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.

\comment{
The business of making predictions and assessing surprise is essentially one of
reasoning under conditions of uncertainty and manipulating degrees of belief
about the various propositions which may or may not hold, and, as has been argued
elsewhere \cite{Cox1946,Jaynes27}, best quantified in terms of Bayesian
probability theory.
Thus, we suppose that when we listen to music, expectations are created on the
basis of our familiarity with various stylistic norms that apply to music in
general, the particular style (or styles) of music that seem best to fit the
piece we are listening to, and the emerging structures peculiar to the current
piece. There is experimental evidence that human listeners are able to
internalise statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also that
statistical models can form an effective basis for computational analysis of
music, \eg \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
}

\subsection{Music and information theory}
With a probabilistic framework for music modelling and prediction in hand, we
are in a position to apply Shannon's quantitative information theory
\cite{Shannon48}.
\comment{
which provides us with a number of measures, such as entropy and mutual
information, which are suitable for quantifying states of uncertainty and
surprise, and thus could potentially enable us to build quantitative models of
the listening process described above. They are what Berlyne \cite{Berlyne71}
called `collative variables' since they are to do with patterns of occurrence
rather than medium-specific details. Berlyne sought to show that the collative
variables are closely related to perceptual qualities like complexity, tension,
interestingness, and even aesthetic value, not just in music, but in other
temporal or visual media.
The relevance of information theory to music and art has also been addressed by
researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
The relationship between information theory and music and art in general has
been the subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective states like
uncertainty, surprise, complexity, tension, and interestingness are closely
related to information-theoretic quantities like entropy, relative entropy, and
mutual information.
% and are major determinants of the overall experience.
Berlyne \cite{Berlyne71} called such quantities `collative variables', since
they are to do with patterns of occurrence rather than medium-specific details,
and developed the ideas of `information aesthetics' in an experimental setting.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.
% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.

\subsection{Information dynamic approach}

Bringing the various strands together, our working hypothesis is that as a
listener (to which we will refer as `it') listens to a piece of music, it
maintains a dynamically evolving probabilistic model that enables it to make
predictions about how the piece will continue, relying on both its previous
experience of music and the immediate context of the piece. As events unfold,
it revises its probabilistic belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic measures
such as entropy and relative entropy. By tracing the evolution of these
measures, we obtain a representation which captures much of the significant
structure of the music.

One of the consequences of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the
resulting analysis is in terms of the same units: quantities of information
(bits) and rates of information flow (bits per second). The probabilistic and
information theoretic concepts in terms of which the analysis is framed are
universal to all sorts of data. In addition, when adaptive probabilistic models
are used, expectations are created mainly in response to \emph{patterns} of
occurrence, rather than the details of which specific things occur. Together,
these suggest that an information dynamic analysis captures a high level of
\emph{abstraction}, and could be used to make structural comparisons between
different temporal media, such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a
principled way to address the notion of \emph{subjectivity}, since the analysis
is dependent on the probability model the observer starts off with, which may
depend on prior experience or other factors, and which may change over time.
Thus, inter-subject variability and variation in subjects' responses over time
are fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.

\section{Theoretical review}

\subsection{Entropy and information}
Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer.
We will treat $X$ mathematically as a random variable, with a value to be drawn
from some set $\X$ and a probability distribution representing the observer's
beliefs about the true value of $X$. In this case, the observer's uncertainty
about $X$ can be quantified as the entropy of the random variable, $H(X)$. For a
discrete variable with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
	H(X) = \sum_{x\in\X} -p(x) \log p(x) = \expect{-\log p(X)},
\end{equation}
where $\expect{}$ is the expectation operator. The negative-log-probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and hence the
entropy is the expected surprisingness.

Now suppose that the observer receives some new data $\Data$ that causes a
revision of its beliefs about $X$. The \emph{information} in this new data
\emph{about} $X$ can be quantified as the Kullback-Leibler (KL) divergence
between the prior and posterior distributions $p(x)$ and $p(x|\Data)$
respectively:
\begin{equation}
	\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
		= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
\end{equation}
When there are multiple variables $X_1, X_2$ \etc which the observer believes to
be dependent, then the observation of one may change its beliefs and hence yield
information about the others. The joint and conditional entropies as described
in any textbook on information theory (\eg \cite{CoverThomas}) then quantify the
observer's expected uncertainty about groups of variables given the values of
others. In particular, the \emph{mutual information} $I(X_1;X_2)$ is both the
expected information in an observation of $X_2$ about $X_1$ and the expected
reduction in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
	I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy of $X_1$
given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$ and so the
mutual information is symmetric in its arguments. A conditional form of the
mutual information can be formulated analogously:
\begin{equation}
	I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual informations are
conveniently visualised in Venn diagram-like \emph{information diagrams} or
I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
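As a concrete, if simplistic, illustration of these definitions, the short
Python sketch below computes the entropy, the surprisingness of a particular
value, the KL divergence between a prior and a posterior, and the mutual
information of a joint distribution, all for discrete distributions held as
arrays. It is only a sketch: the function names are ours and purely
illustrative, logarithms are natural (so results are in nats), and the
continuous case is not handled.
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    # H(X) = E[-log p(X)] for a probability vector p (natural log, nats)
    p = np.asarray(p, dtype=float)
    return -np.sum(p[p > 0] * np.log(p[p > 0]))

def surprisingness(p, x):
    # l(x) = -log p(x), the negative log-probability of the observed value x
    return -np.log(p[x])

def information(posterior, prior):
    # D(posterior || prior): information in the new data about X
    post = np.asarray(posterior, dtype=float)
    pri = np.asarray(prior, dtype=float)
    m = post > 0
    return np.sum(post[m] * np.log(post[m] / pri[m]))

def mutual_information(pxy):
    # I(X1;X2) = H(X1) + H(X2) - H(X1,X2) for a joint probability matrix
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

p = np.array([0.5, 0.25, 0.25])            # prior beliefs about X
print(entropy(p), surprisingness(p, 2))    # expected and actual surprise
print(information([0.8, 0.1, 0.1], p))     # information in new data about X
pxy = np.array([[0.4, 0.1], [0.1, 0.4]])   # joint distribution of (X1, X2)
print(mutual_information(pxy))             # I(X1;X2) = H(X1) - H(X1|X2)
\end{lstlisting}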
\begin{fig}{venn-example}
\newcommand\rad{2.2em}%
\newcommand\circo{circle (3.4em)}%
\newcommand\labrad{4.3em}
\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
\newcommand\colsep{\ }
\newcommand\clipin[1]{\clip (#1) \circo;}%
\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
\newcommand\cliptwo[3]{%
	\begin{scope}
		\clipin{#1};
		\clipin{#2};
		\clipout{#3};
		\fill[black!30] \bound;
	\end{scope}
}%
\newcommand\clipone[3]{%
	\begin{scope}
		\clipin{#1};
		\clipout{#2};
		\clipout{#3};
		\fill[black!15] \bound;
	\end{scope}
}%
\begin{tabular}{c@{\colsep}c}
	\begin{tikzpicture}[baseline=0pt]
		\coordinate (p1) at (90:\rad);
		\coordinate (p2) at (210:\rad);
		\coordinate (p3) at (-30:\rad);
		\clipone{p1}{p2}{p3};
		\clipone{p2}{p3}{p1};
		\clipone{p3}{p1}{p2};
		\cliptwo{p1}{p2}{p3};
		\cliptwo{p2}{p3}{p1};
		\cliptwo{p3}{p1}{p2};
		\begin{scope}
			\clip (p1) \circo;
			\clip (p2) \circo;
			\clip (p3) \circo;
			\fill[black!45] \bound;
		\end{scope}
		\draw (p1) \circo;
		\draw (p2) \circo;
		\draw (p3) \circo;
		\path
			(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
			(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
			(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
			(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
			(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
			(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
			(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
			;
		\path
			(p1) +(140:\labrad) node {$X_1$}
			(p2) +(-140:\labrad) node {$X_2$}
			(p3) +(-40:\labrad) node {$X_3$};
	\end{tikzpicture}
	&
	\parbox{0.5\linewidth}{
		\small
		\begin{align*}
			I_{1|23} &= H(X_1|X_2,X_3) \\
			I_{13|2} &= I(X_1;X_3|X_2) \\
			I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
			I_{12|3} + I_{123} &= I(X_1;X_2)
		\end{align*}
	}
\end{tabular}
\caption{
	I-diagram visualisation of entropies and mutual informations for three
	random variables $X_1$, $X_2$ and $X_3$. The areas of the three circles
	represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively. The total shaded
	area is the joint entropy $H(X_1,X_2,X_3)$. The central area $I_{123}$ is
	the co-information \cite{McGill1954}. Some other information measures are
	indicated in the legend.
}
\end{fig}

\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of random variables,
infinite in both directions, and that $\mu$ is the associated probability
measure over all realisations of the sequence---in the following, $\mu$ will
simply serve as a label for the process. We can identify a number of
information-theoretic measures meaningful in the context of a sequential
observation of the sequence, during which, at any time $t$, the sequence of
variables can be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$. The actually observed value of
$X_t$ will be written as $x_t$, and the sequence of observations up to but not
including $x_t$ as $\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.
The in-context surprisingness of the observation $X_t=x_t$ is a function of
both $x_t$ and the context $\past{x}_t$:
\begin{equation}
	\ell(x_t|\past{x}_t) = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed to be $x_t$, the observer can compute its
\emph{expected} surprisingness as a measure of its uncertainty about the very
next event; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is conditional on the
\emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the \emph{variables} $\past{X}_t$
as in the conventional conditional entropy.

The surprisingness $\ell(x_t|\past{x}_t)$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$ are subjective information dynamic measures
since they are based on the observer's subjective probability model in the
context of the actually observed sequence $\past{x}_t$---they characterise what
it is like to `be in the observer's shoes'. If we view the observer as a purely
passive or reactive agent, this would probably be sufficient, but for active
agents such as humans or animals, it is often necessary to \emph{anticipate}
future events in order, for example, to plan the most effective course of
action. It makes sense for such observers to be concerned about the predictive
probability distribution over future events, $p(\fut{x}_t|\past{x}_t)$. When an
observation $\ev(X_t=x_t)$ is made in this context, the \emph{instantaneous
predictive information} (IPI) is the information in the event $\ev(X_t=x_t)$
about the entire future of the sequence $\fut{X}_t$.

\subsection{Information measures for stationary random processes}

The \emph{entropy rate} of the process is the entropy of the next variable
$X_0$ given all the previous ones:
\begin{equation}
	\label{eq:entro-rate}
	h_\mu = H(X_0|\past{X}_0).
\end{equation}
The entropy rate gives a measure of the overall randomness or unpredictability
of the process.

The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's
\cite{Dubnov2006} notation for what he called the `information rate') is the
mutual information between the `past' and the `present':
\begin{equation}
	\label{eq:multi-info}
	\rho_\mu = I(\past{X}_0;X_0) = H(X_0) - h_\mu.
\end{equation}
It is a measure of how much the context of an observation (that is, the
observation of previous elements of the sequence) helps in predicting or
reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983} is the mutual
information between the entire `past' and the entire `future':
\begin{equation}
	E = I(\past{X}_0; X_0,\fut{X}_0).
\end{equation}

\begin{fig}{predinfo-bg}
\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
\newcommand\rad{1.8em}%
\newcommand\ovoid[1]{%
	++(-#1,\rad)
	-- ++(2 * #1,0em) arc (90:-90:\rad)
	-- ++(-2 * #1,0em) arc (270:90:\rad)
}%
\newcommand\axis{2.75em}%
\newcommand\olap{0.85em}%
\newcommand\offs{3.6em}
\newcommand\colsep{\hspace{5em}}
\newcommand\longblob{\ovoid{\axis}}
\newcommand\shortblob{\ovoid{1.75em}}
\begin{tabular}{c@{\colsep}c}
	\subfig{(a) excess entropy}{%
		\newcommand\blob{\longblob}
		\begin{tikzpicture}
			\coordinate (p1) at (-\offs,0em);
			\coordinate (p2) at (\offs,0em);
			\begin{scope}
				\clip (p1) \blob;
				\clip (p2) \blob;
				\fill[lightgray] (-1,-1) rectangle (1,1);
			\end{scope}
			\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
			\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
			\path (0,0) node (future) {$E$};
			\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
			\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
		\end{tikzpicture}%
	}%
	\\[1.25em]
	\subfig{(b) predictive information rate $b_\mu$}{%
		\begin{tikzpicture}%[baseline=-1em]
			\newcommand\rc{2.1em}
			\newcommand\throw{2.5em}
			\coordinate (p1) at (210:1.5em);
			\coordinate (p2) at (90:0.7em);
			\coordinate (p3) at (-30:1.5em);
			\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
			\newcommand\present{(p2) circle (\rc)}
			\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
			\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
			\newcommand\fillclipped[2]{%
				\begin{scope}[even odd rule]
					\foreach \thing in {#2} {\clip \thing;}
					\fill[black!#1] \bound;
				\end{scope}%
			}%
			\fillclipped{30}{\present,\future,\bound \thepast}
			\fillclipped{15}{\present,\bound \future,\bound \thepast}
			\draw \future;
			\fillclipped{45}{\present,\thepast}
			\draw \thepast;
			\draw \present;
			\node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
			\node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
			\node at (barycentric cs:p3=0,p2=1,p1=1.2)
				[shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
			\path (p2) +(140:3em) node {$X_0$};
%			\node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
			\path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
			\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
			\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
			\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
		\end{tikzpicture}}%
	\\[0.5em]
\end{tabular}
\caption{
	I-diagrams for several information measures in stationary random processes.
	Each circle or oval represents a random variable or sequence of random
	variables relative to time $t=0$. Overlapped areas correspond to various
	mutual informations as in \Figrf{venn-example}. In (b), the circle
	represents the `present'. Its total area is $H(X_0)=\rho_\mu+r_\mu+b_\mu$,
	where $\rho_\mu$ is the multi-information rate, $r_\mu$ is the residual
	entropy rate, and $b_\mu$ is the predictive information rate. The entropy
	rate is $h_\mu = r_\mu+b_\mu$.
}
\end{fig}

The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009} is
the average information in one observation about the infinite future given the
infinite past, and is defined as a conditional mutual information:
\begin{equation}
	\label{eq:PIR}
	b_\mu = I(X_0;\fut{X}_0|\past{X}_0)
		= H(\fut{X}_0|\past{X}_0) - H(\fut{X}_0|X_0,\past{X}_0).
\end{equation}
Equation \eqrf{PIR} can be read as the average reduction in uncertainty about
the future on learning $X_0$, given the past.
Due to the symmetry of the mutual information, it can also be written as
\begin{equation}
%	\IXZ_t
	I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
%	\label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$ is the \emph{residual}
\cite{AbdallahPlumbley2010}, or \emph{erasure} \cite{VerduWeissman2006},
entropy rate. These relationships are illustrated in \Figrf{predinfo-bg}, along
with several of the information measures we have discussed so far.

James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive
information rate and also examine some related measures. In particular, they
identify $\sigma_\mu$, the difference between the multi-information rate and
the excess entropy, as an interesting quantity that measures the predictive
benefit of model-building (that is, maintaining an internal state summarising
past observations in order to make better predictions). They also identify
$w_\mu = \rho_\mu + b_\mu$, which they call the \emph{local exogenous
information} rate.

\begin{fig}{wundt}
	\raisebox{-4em}{\colfig[0.43]{wundt}}
%	{\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
	{\ {\large$\longrightarrow$}\ }
	\raisebox{-4em}{\colfig[0.43]{wundt2}}
	\caption{
		The Wundt curve relating randomness/complexity with perceived value.
		Repeated exposure sometimes results in a move to the left along the
		curve \cite{Berlyne71}.
	}
\end{fig}

\subsection{First and higher order Markov chains}
First order Markov chains are the simplest non-trivial models to which
information dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we
derived expressions for all the information measures introduced above for
irreducible stationary Markov chains (\ie those that have a unique stationary
distribution). The derivation is greatly simplified by the dependency structure
of the Markov chain: for the purpose of the analysis, the `past' and `future'
segments $\past{X}_t$ and $\fut{X}_t$ can be reduced to just the previous and
next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that the
predictive information rate can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
the alphabet $\{1,\ldots,K\}$, such that $a_{ij} = \Pr(X_t=i|X_{t-1}=j)$, and
let $h:\reals^{K\times K}\to \reals$ be the entropy rate function such that
$h(a)$ is the entropy rate of a Markov chain with transition matrix $a$, then
the predictive information rate $b(a)$ is
\begin{equation}
	b(a) = h(a^2) - h(a),
\end{equation}
where $a^2$, the transition matrix squared, is the transition matrix of the
`skip one' Markov chain obtained by jumping two steps at a time along the
original chain.
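As a minimal computational sketch of these expressions (not the implementation
used in \cite{AbdallahPlumbley2009}), the Python fragment below computes the
stationary distribution, the entropy rate $h(a)$ and the predictive information
rate $b(a) = h(a^2) - h(a)$ for a transition matrix with
$a_{ij} = \Pr(X_t=i|X_{t-1}=j)$ as above, so that columns sum to one. Function
and variable names are purely illustrative, and results are in nats.
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(a):
    # stationary distribution pi with a @ pi = pi, for a column-stochastic
    # transition matrix a[i, j] = Pr(X_t = i | X_{t-1} = j)
    evals, evecs = np.linalg.eig(a)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()

def entropy_rate(a, pi):
    # h(a) = sum_j pi[j] * H(a[:, j]): expected surprisingness of the next symbol
    logs = np.log(np.where(a > 0, a, 1.0))   # log(1) = 0 where a is zero
    return -np.sum(pi * np.sum(a * logs, axis=0))

def pir(a):
    # b(a) = h(a^2) - h(a); a^2 is the 'skip one' chain (same stationary dist.)
    p = stationary(a)
    return entropy_rate(a @ a, p) - entropy_rate(a, p)

# A noisy 3-state cycle: low entropy rate but non-zero predictive information rate.
a = np.array([[0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1]])
print(entropy_rate(a, stationary(a)), pir(a))
\end{lstlisting}
The stationary distribution and entropy rate functions carry over unchanged to
the sparse first order representation $\hat{a}$ of a higher order chain
introduced next, but, as the next paragraph shows, the PIR of an order-$N$
model is $h(\hat{a}^{N+1}) - N h(\hat{a})$ rather than
$h(\hat{a}^2)-h(\hat{a})$.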
Second and higher order Markov chains can be treated in a similar way by
transforming to a first order representation of the high order Markov chain.
If we are dealing with an $N$th order model, this is done by forming a new
alphabet of size $K^N$ consisting of all possible $N$-tuples of symbols from
the base alphabet of size $K$. An observation $\hat{x}_t$ in this new model
represents a block of $N$ observations $(x_{t+1},\ldots,x_{t+N})$ from the base
model. The next observation $\hat{x}_{t+1}$ represents the block of $N$
observations obtained by shifting the previous block along by one step. The new
Markov chain is parameterised by a sparse $K^N\times K^N$ transition matrix
$\hat{a}$. The entropy rate of the first order system is the same as the
entropy rate of the original order $N$ system, and its PIR is
\begin{equation}
	b({\hat{a}}) = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition
matrix.

\section{Information Dynamics in Analysis}

\begin{fig}{twopages}
	\colfig[0.96]{matbase/fig9471} % update from mbc paper
%	\colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
	\vspace*{1em}
	\colfig[0.97]{matbase/fig13377} % rule based analysis
	\caption{Analysis of \emph{Two Pages}. The thick vertical lines are the
		part boundaries as indicated in the score by the composer. The thin
		grey lines indicate changes in the melodic `figures' of which the
		piece is constructed. In the `model information rate' panel, the
		black asterisks mark the six most surprising moments selected by
		Keith Potter. The bottom panel shows a rule-based boundary strength
		analysis computed using Cambouropoulos' LBDM. All information
		measures are in nats and time is in notes.
	}
\end{fig}

\subsection{Musicological Analysis}
In \cite{AbdallahPlumbley2009}, methods based on the theory described above
were used to analyse two pieces of music in the minimalist style by Philip
Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968). The analysis was done
using a first-order Markov chain model, with the enhancement that the
transition matrix of the model was allowed to evolve dynamically as the notes
were processed, and was tracked (in a Bayesian way) as a \emph{distribution}
over possible transition matrices, rather than a point estimate. The results
are summarised in \figrf{twopages}: the upper four plots show the dynamically
evolving subjective information measures as described in
\secrf{surprise-info-seq}, computed using a point estimate of the current
transition matrix, while the fifth plot (the `model information rate') measures
the information in each observation about the transition matrix. In
\cite{AbdallahPlumbley2010b}, we showed that this `model information rate' is
actually a component of the true IPI in a time-varying Markov chain, which was
neglected when we computed the IPI from point estimates of the transition
matrix as if the transition probabilities were constant.

The peaks of the surprisingness and both components of the predictive
information show good correspondence with the structure of the piece both as
marked in the score and as analysed by musicologist Keith Potter, who was asked
to mark the six `most surprising moments' of the piece (shown as asterisks in
the fifth plot)%
\footnote{%
	Note that the boundary marked in the score at around note 5,400 is known
	to be anomalous; on the basis of a listening analysis, some musicologists
	[ref] have placed the boundary a few bars later, in agreement with our
	analysis.}.
In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} the piece:
the boundary strength functions show no tendency to peak at its boundaries.
\begin{fig}{metre}
%	\scalebox{1}[1]{%
	\begin{tabular}{cc}
		\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
		\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
		\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
%		\colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
%		\colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
%		\colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
%		\colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
%		\colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
%		\colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}
	\end{tabular}%
%	}
	\caption{Metrical analysis by computing average surprisingness and
		informativeness of notes at different periodicities (\ie hypothetical
		bar lengths) and phases (\ie positions within a bar).
	}
\end{fig}

\subsection{Content analysis/Sound Categorisation}
Overview of information-theoretic approaches to music content analysis.
\begin{itemize}
	\item Continuous domain information
	\item Audio based music expectation modelling
	\item Proposed model for Gaussian processes
\end{itemize}
\emph{Peter}

\subsection{Beat Tracking}
\emph{Andrew}

\section{Information dynamics as a compositional aid}

In addition to applying information dynamics to analysis, it is also possible
to apply it to the generation of content, for example to the composition of
musical materials. The outputs of algorithmic or stochastic processes can be
filtered to match a set of criteria defined in terms of the information
dynamics model; these criteria thus become a means of interfacing with the
generative process. For instance, a stochastic music-generating process could
be controlled by modifying constraints on its output in terms of predictive
information rate or entropy rate.

The use of stochastic processes for the composition of musical material has
been widespread for decades -- for instance, Iannis Xenakis applied
probabilistic mathematical models to the creation of musical materials
\cite{Xenakis:1992ul}. Information dynamics can serve as a novel framework for
the exploration of the possibilities of such processes at the high and abstract
level of expectation, randomness and predictability.

\subsection{The Melody Triangle}
The Melody Triangle is an exploratory interface for the discovery of melodic
content, where the input -- positions within a triangle -- directly maps to
information-theoretic measures of the output. The measures -- entropy rate,
redundancy and predictive information rate -- form the criteria with which to
filter the output of the stochastic processes used to generate sequences of
notes. These measures address notions of expectation and surprise in music,
and as such the Melody Triangle is a means of interfacing with a generative
process in terms of the predictability of its output.

The triangle is `populated' with possible parameter values for melody
generators. These are plotted in a 3-d statistical space of redundancy, entropy
rate and predictive information rate. In our case we generated thousands of
transition matrices, representing first-order Markov chains, by a random
sampling method. In Figure~\ref{InfoDynEngine} we see a representation of how
these matrices are distributed in the 3-d statistical space; each one of these
points corresponds to a transition matrix. The distribution of transition
matrices plotted in this space forms an arch shape that is fairly thin.
It thus becomes a reasonable approximation to treat it as a two-dimensional
sheet, and so we stretch out this curved arc into a flat triangle. It is this
triangular sheet that is our `Melody Triangle' and forms the interface by which
the system is controlled. Using this interface thus involves a mapping to
statistical space: a user selects a position within the triangle, and a
corresponding transition matrix is returned. Figure~\ref{TheTriangle} shows how
the triangle maps to different measures of redundancy, entropy rate and
predictive information rate.

The three corners correspond to three different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity',
`noise' and `repetition'. Melodies from the `noise' corner have no discernible
pattern; they have high entropy rate, low predictive information rate and low
redundancy. These melodies are essentially totally random. Melodies along the
`periodicity' to `repetition' edge are all deterministic loops that get shorter
as we approach the `repetition' corner, until each becomes just one repeating
note. It is the areas in between the extremes that provide the more
`interesting' melodies. These melodies have some level of unpredictability, but
are not completely random; or, conversely, they are predictable, but not
entirely so.

The Melody Triangle exists in two incarnations: a standard screen-based
interface where a user moves tokens in and around a triangle on screen, and a
multi-user interactive installation where a Kinect camera tracks individuals in
a space and maps their positions in physical space to the triangle. In the
latter, each visitor entering the installation generates a melody and can
collaborate with their co-visitors to generate musical textures -- a playful
yet informative way to explore expectation and surprise in music.
Additionally, different gestures can be detected to change the tempo, register,
instrumentation and periodicity of the output melody.

\begin{figure}
\centering
\includegraphics[width=\linewidth]{figs/mtriscat}
\caption{The population of transition matrices distributed along three axes of
	redundancy, entropy rate and predictive information rate (all measured in
	bits). The concentrations of points along the redundancy axis correspond
	to Markov chains which are roughly periodic with periods of 2 (redundancy
	1 bit), 3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour
	of each point represents its PIR---note that the highest values are found
	at intermediate entropy and redundancy, and that the distribution as a
	whole makes a curved triangle. Although not visible in this plot, it is
	largely hollow in the middle.
	\label{InfoDynEngine}}
\end{figure}

As a screen-based interface, the Melody Triangle can serve as a composition
tool. A triangle is drawn on the screen, with screen space thus mapped to the
statistical space of the Melody Triangle. A number of round tokens, each
representing a melody, can be dragged in and around the triangle. When a token
is dragged into the triangle, the system starts generating a sequence of
symbols with statistical properties that correspond to the position of the
token. These symbols are then mapped to notes of a scale. Keyboard input allows
control over additional parameters.

\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}
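To make the mapping from token position to output concrete, the Python sketch
below imitates the lookup-and-generate loop just described. It is purely
illustrative rather than the installation's actual code: the precomputed
population is stood in for by three hand-written corner matrices
(`repetition', `periodicity' and `noise'), the lookup is a simple
nearest-neighbour search over nominal corner coordinates, and the pitch
mapping is a fixed C major scale.
\begin{lstlisting}[language=Python]
import numpy as np

# Stand-in for the precomputed population: three 4-state transition matrices
# (column-stochastic, a[i, j] = Pr(next = i | current = j)) placed at nominal
# corner coordinates. The real system uses thousands of sampled matrices
# flattened into the triangle as described above.
CORNERS = {
    (0.0, 0.0): np.eye(4),                      # 'repetition': the same note repeats
    (1.0, 0.0): np.roll(np.eye(4), 1, axis=0),  # 'periodicity': a deterministic cycle
    (0.5, 1.0): np.full((4, 4), 0.25),          # 'noise': uniformly random
}
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]        # C major scale as MIDI note numbers

def matrix_at(point):
    # return the transition matrix whose triangle position is nearest to `point`
    nearest = min(CORNERS, key=lambda c: (c[0] - point[0])**2 + (c[1] - point[1])**2)
    return CORNERS[nearest]

def melody(point, length=16, rng=np.random.default_rng()):
    # token position -> transition matrix -> symbol sequence -> notes of a scale
    a = matrix_at(point)
    state = int(rng.integers(a.shape[0]))
    notes = []
    for _ in range(length):
        state = int(rng.choice(a.shape[0], p=a[:, state]))
        notes.append(SCALE[state % len(SCALE)])
    return notes

print(melody((0.8, 0.1)))   # a token dropped near the 'periodicity' corner
\end{lstlisting}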
In this mode, the Melody Triangle is a compositional tool. It can assist a
composer in the creation not only of melodies but also, by placing multiple
tokens in the triangle, of intricate musical textures. Unlike other computer
aided composition tools or programming environments, here the composer engages
with music on the high and abstract level of expectation, randomness and
predictability.

\section{Musical Preference and Information Dynamics}
We carried out a preliminary study that sought to identify any correlation
between aesthetic preference and the information-theoretic measures of the
Melody Triangle. In this study, participants were asked to use a simplified
version of the screen-based interface in which all they could do was move
tokens around. To help discount visual biases, the axes of the triangle were
randomly rearranged for each participant.\emph{self-plagiarised}

The study was divided into two parts: the first investigated musical preference
with respect to single melodies at different tempos. In the second part of the
study, a background melody was playing and the participants were asked to
continue using the system, under the implicit assumption that they would try to
find a second melody that worked well with the background melody. For each
participant this was done four times, each time with a different background
melody from one of four different areas of the Melody Triangle. For all parts
of the study the participants were asked to signal, by pressing the space bar,
whenever they liked what they were hearing.\emph{self-plagiarised}

\emph{todo - results}

\section{Information Dynamics as Evaluative Feedback Mechanism}

\emph{todo - code the info dyn evaluator :) }

It is possible to use information dynamics measures to develop a kind of
`critic' that would evaluate a stream of symbols. For instance, we could
develop a system to notify us if a stream of symbols is too boring, either
because it is too repetitive or too chaotic. This could be used to evaluate
pre-composed streams of symbols, or even to provide real-time feedback in an
improvisatory setup.

\emph{comparable system}
Gordon Pask's Musicolour (1953) applied a similar notion of boredom in its
design. The Musicolour would react to audio input through a microphone by
flashing coloured lights. Rather than a direct mapping of sound to light, Pask
designed the device to be a partner to a performing musician. It would adapt
its lighting pattern based on the rhythms and frequencies it would hear,
quickly `learning' to flash in time with the music. However, Pask endowed the
device with the ability to `be bored'; if the rhythmic and frequency content of
the input remained the same for too long, it would listen for other rhythms and
frequencies, only lighting when it heard these. As the Musicolour would `get
bored', the musician would have to change and vary their playing, eliciting new
and unexpected outputs in trying to keep the Musicolour interested.

In a similar vein, our \emph{Information Dynamics Critic} (name?) allows for an
evaluative measure of an input stream, but with a more sophisticated notion of
boredom that \dots

\section{Conclusion}

\bibliographystyle{unsrt}
{\bibliography{all,c4dm,nime}}

\end{document}