\documentclass[conference]{IEEEtran}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}

\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}

%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an\\Information Dynamics Approach}

\author{
\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster,}
\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
\IEEEauthorblockA{Centre for Digital Music\\
Queen Mary University of London\\
Mile End Road, London E1 4NS}}

\maketitle
\begin{abstract}
Listeners take in information as they perceive music, and with it they
continually build predictive models of what is going to happen next.
Since there is a close relationship between information measures and the
way we perceive music, an information-theoretic approach to music cognition
is a fruitful avenue of research.
In this paper, we review the theoretical foundations of information dynamics
and discuss a few emerging areas of application.
\end{abstract}


\section{Introduction}
\label{s:Intro}

\subsection{Expectation and surprise in music}
One of the effects of listening to music is to create
expectations of what is to come next, which may be fulfilled
immediately, after some delay, or not at all.
This is the thesis put forward by, amongst others, the music theorists
L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but it was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the
act of listening to music, and which converts it to a source of pleasure,
is \ldots the intellectual satisfaction
which the listener derives from continually following and anticipating
the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.'
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon
that `unfolds' in time, rather than being apprehended as a static object
presented in its entirety. Meyer argued that musical experience depends
on how we change and revise our conceptions \emph{as events happen}, on
how expectation and prediction interact with occurrence, and that, to a
large degree, the way to understand the effect of music is to focus on
this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts
and can be treated mathematically using probability theory.
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.


\comment{
The business of making predictions and assessing surprise is essentially
one of reasoning under conditions of uncertainty and manipulating
degrees of belief about the various propositions which may or may not
hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, is best
quantified in terms of Bayesian probability theory.
Thus, we suppose that
when we listen to music, expectations are created on the basis of our
familiarity with various stylistic norms that apply to music in general,
the particular style (or styles) of music that seem best to fit the piece
we are listening to, and
the emerging structures peculiar to the current piece. There is
experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
}

\subsection{Music and information theory}
With a probabilistic framework for music modelling and prediction in hand,
we are in a position to apply Shannon's quantitative information theory
\cite{Shannon48}.
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
what Berlyne \cite{Berlyne71} called `collative variables' since they are
to do with patterns of occurrence rather than medium-specific details.
Berlyne sought to show that the collative variables are closely related to
perceptual qualities like complexity, tension, interestingness,
and even aesthetic value, not just in music, but in other temporal
or visual media.
The relevance of information theory to music and art has
also been addressed by researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
The relationship between information theory and music and art in general has been the
subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective
states like uncertainty, surprise, complexity, tension, and interestingness
are closely related to
information-theoretic quantities like entropy, relative entropy,
and mutual information.
% and are major determinants of the overall experience.
Berlyne \cite{Berlyne71} called such quantities `collative variables', since
they are to do with patterns of occurrence rather than medium-specific details,
and developed the ideas of `information aesthetics' in an experimental setting.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.

% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}

Bringing the various strands together, our working hypothesis is that as a
listener (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic model that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the immediate context of the piece. As events unfold, it revises
its probabilistic belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of
information-theoretic measures such as entropy and relative entropy.
By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

One of the consequences of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The probabilistic and
information-theoretic concepts in terms of which the analysis is framed are
universal to all sorts of data.
In addition, when adaptive probabilistic models are used, expectations are
created mainly in response to \emph{patterns} of occurrence,
rather than the details of which specific things occur.
Together, these suggest that an information dynamic analysis captures a
high level of \emph{abstraction}, and could be used to
make structural comparisons between different temporal media,
such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a principled way
to address the notion of \emph{subjectivity}, since the analysis is dependent on the
probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
variation in subjects' responses over time are
fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.


\section{Theoretical review}

\subsection{Entropy and information}
\label{s:entro-info}

Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set $\X$ and a
probability distribution representing the observer's beliefs about the
true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy $H(X)$ of the random variable. For a discrete variable
with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
H(X) = \sum_{x\in\X} -p(x) \log p(x). % = \expect{-\log p(X)}
\end{equation}
% where $\expect{}$ is the expectation operator.
The negative log-probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and
hence the entropy is the expectation of the surprisingness, $\expect \ell(X)$.

Now suppose that the observer receives some new data $\Data$ that
causes a revision of its beliefs about $X$.
The \emph{information}
in this new data \emph{about} $X$ can be quantified as the
Kullback-Leibler (KL) divergence between the posterior and prior
distributions $p(x|\Data)$ and $p(x)$ respectively:
\begin{equation}
\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
\label{eq:info}
\end{equation}
When there are multiple variables $X_1, X_2$
\etc which the observer believes to be dependent, then the observation of
one may change its beliefs and hence yield information about the
others. The joint and conditional entropies as described in any
textbook on information theory (\eg \cite{CoverThomas}) then quantify
the observer's expected uncertainty about groups of variables given the
values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information
in an observation of $X_2$ about $X_1$ and the expected reduction
in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$,
and so the mutual information is symmetric in its arguments. A conditional
form of the mutual information can be formulated analogously:
\begin{equation}
I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual
informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
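To make these definitions concrete, the short Python fragment below
(an illustrative sketch of our own; the function names are not part of any
system described in this paper) computes the entropy, the surprisingness of
an observation, the information in some data about $X$, and the mutual
information, for discrete distributions held as NumPy arrays, with all values
in bits.
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    """Entropy H(X) in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def surprisingness(x, p):
    """Surprisingness -log2 p(x) of observing the value x under p."""
    return float(-np.log2(p[x]))

def information(posterior, prior):
    """Information about X in new data: D(p(x|D) || p(x)) in bits."""
    post, pri = np.asarray(posterior, float), np.asarray(prior, float)
    nz = post > 0
    return float(np.sum(post[nz] * np.log2(post[nz] / pri[nz])))

def mutual_information(pxy):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) from a joint distribution pxy[i,j]."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())

# two correlated binary variables
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
print(mutual_information(pxy))   # about 0.278 bits
\end{lstlisting}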
\begin{fig}{venn-example}
\newcommand\rad{2.2em}%
\newcommand\circo{circle (3.4em)}%
\newcommand\labrad{4.3em}
\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
\newcommand\colsep{\ }
\newcommand\clipin[1]{\clip (#1) \circo;}%
\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
\newcommand\cliptwo[3]{%
\begin{scope}
\clipin{#1};
\clipin{#2};
\clipout{#3};
\fill[black!30] \bound;
\end{scope}
}%
\newcommand\clipone[3]{%
\begin{scope}
\clipin{#1};
\clipout{#2};
\clipout{#3};
\fill[black!15] \bound;
\end{scope}
}%
\begin{tabular}{c@{\colsep}c}
\begin{tikzpicture}[baseline=0pt]
\coordinate (p1) at (90:\rad);
\coordinate (p2) at (210:\rad);
\coordinate (p3) at (-30:\rad);
\clipone{p1}{p2}{p3};
\clipone{p2}{p3}{p1};
\clipone{p3}{p1}{p2};
\cliptwo{p1}{p2}{p3};
\cliptwo{p2}{p3}{p1};
\cliptwo{p3}{p1}{p2};
\begin{scope}
\clip (p1) \circo;
\clip (p2) \circo;
\clip (p3) \circo;
\fill[black!45] \bound;
\end{scope}
\draw (p1) \circo;
\draw (p2) \circo;
\draw (p3) \circo;
\path
(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
;
\path
(p1) +(140:\labrad) node {$X_1$}
(p2) +(-140:\labrad) node {$X_2$}
(p3) +(-40:\labrad) node {$X_3$};
\end{tikzpicture}
&
\parbox{0.5\linewidth}{
\small
\begin{align*}
I_{1|23} &= H(X_1|X_2,X_3) \\
I_{13|2} &= I(X_1;X_3|X_2) \\
I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
I_{12|3} + I_{123} &= I(X_1;X_2)
\end{align*}
}
\end{tabular}
\caption{
I-diagram visualisation of entropies and mutual informations
for three random variables $X_1$, $X_2$ and $X_3$. The areas of
the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
The central area $I_{123}$ is the co-information \cite{McGill1954}.
Some other information measures are indicated in the legend.
}
\end{fig}


\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
random variables, infinite in both directions,
and that $\mu$ is the associated probability measure over all
realisations of the sequence---in the following, $\mu$ will simply serve
as a label for the process.
We can identify a number of information-theoretic
measures meaningful in the context of a sequential observation of the sequence, during
which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
We will write the actually observed value of $X_t$ as $x_t$, and
the sequence of observations up to but not including $x_t$ as
$\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.

The in-context surprisingness of the observation $X_t=x_t$ depends on
both $x_t$ and the context $\past{x}_t$:
\begin{equation}
\ell_t = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed to be $x_t$, the observer can compute
its \emph{expected} surprisingness as a measure of its uncertainty about
the very next event; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not on the
\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$
can be understood as \emph{subjective} information dynamic measures, since they are
based on the observer's probability model in the context of the actually observed sequence
$\past{x}_t$---they characterise what it is like to `be in the observer's shoes'.
If we view the observer as a purely passive or reactive agent, this would
probably be sufficient, but for active agents such as humans or animals, it is
often necessary to \emph{anticipate} future events in order, for example, to plan the
most effective course of action. It makes sense for such observers to be
concerned about the predictive probability distribution over future events,
$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
\emph{given} the observed past $\past{X}_t=\past{x}_t$.
Referring to the definition of information \eqrf{info}, this is the KL divergence
between the prior and posterior distributions over possible futures, which, written out in full, is
\begin{equation}
\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of the attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.
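As a concrete illustration, the sketch below computes the surprisingness
$\ell_t$ and the expected surprisingness along an observed sequence. For
simplicity it assumes that the observer's model is a fixed first-order Markov
chain (an assumption made purely for illustration; any model that supplies
$p(x_t|\past{x}_t)$ could be substituted), with a column-stochastic transition
matrix following the convention used later in this paper.
\begin{lstlisting}[language=Python]
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def surprise_trace(seq, a, p0):
    """Surprisingness -log2 p(x_t|past) and expected surprisingness
    (next-symbol entropy) under a fixed first-order Markov model with
    a[i, j] = P(X_t = i | X_{t-1} = j) and initial distribution p0."""
    pred = np.asarray(p0, dtype=float)       # predictive distribution for X_t
    ell, expected = [], []
    for x in seq:
        expected.append(entropy_bits(pred))  # uncertainty before observing
        ell.append(-np.log2(pred[x]))        # surprisingness of what occurred
        pred = a[:, x]                       # revised prediction for X_{t+1}
    return np.array(ell), np.array(expected)

a = np.array([[0.9, 0.2],
              [0.1, 0.8]])                   # columns sum to one
ell, H_next = surprise_trace([0, 0, 1, 1, 0], a, p0=np.array([0.5, 0.5]))
\end{lstlisting}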
\subsection{Information measures for stationary random processes}

\begin{fig}{predinfo-bg}
\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
\newcommand\rad{1.8em}%
\newcommand\ovoid[1]{%
++(-#1,\rad)
-- ++(2 * #1,0em) arc (90:-90:\rad)
-- ++(-2 * #1,0em) arc (270:90:\rad)
}%
\newcommand\axis{2.75em}%
\newcommand\olap{0.85em}%
\newcommand\offs{3.6em}
\newcommand\colsep{\hspace{5em}}
\newcommand\longblob{\ovoid{\axis}}
\newcommand\shortblob{\ovoid{1.75em}}
\begin{tabular}{c@{\colsep}c}
\subfig{(a) excess entropy}{%
\newcommand\blob{\longblob}
\begin{tikzpicture}
\coordinate (p1) at (-\offs,0em);
\coordinate (p2) at (\offs,0em);
\begin{scope}
\clip (p1) \blob;
\clip (p2) \blob;
\fill[lightgray] (-1,-1) rectangle (1,1);
\end{scope}
\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
\path (0,0) node (future) {$E$};
\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
\end{tikzpicture}%
}%
\\[1.25em]
\subfig{(b) predictive information rate $b_\mu$}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{2.1em}
\newcommand\throw{2.5em}
\coordinate (p1) at (210:1.5em);
\coordinate (p2) at (90:0.7em);
\coordinate (p3) at (-30:1.5em);
\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
\newcommand\present{(p2) circle (\rc)}
\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
\newcommand\fillclipped[2]{%
\begin{scope}[even odd rule]
\foreach \thing in {#2} {\clip \thing;}
\fill[black!#1] \bound;
\end{scope}%
}%
\fillclipped{30}{\present,\future,\bound \thepast}
\fillclipped{15}{\present,\bound \future,\bound \thepast}
\draw \future;
\fillclipped{45}{\present,\thepast}
\draw \thepast;
\draw \present;
\node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
\node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
\path (p2) +(140:3em) node {$X_0$};
% \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
\path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
\end{tikzpicture}}%
\\[0.5em]
\end{tabular}
\caption{
I-diagrams for several information measures in
stationary random processes. Each circle or oval represents a random
variable or sequence of random variables relative to time $t=0$. Overlapped areas
correspond to various mutual informations as in \Figrf{venn-example}.
In (b), the circle represents the `present'.
Its total area is
$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
}
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary,
then it becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the
process as a whole rather than on a moment-by-moment basis. Some of these,
such as the entropy rate, are well known, but others have only recently begun
to be investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

The \emph{entropy rate} of the process is the entropy of the next variable
$X_t$ given all the previous ones:
\begin{equation}
\label{eq:entro-rate}
h_\mu = H(X_t|\past{X}_t).
\end{equation}
The entropy rate gives a measure of the overall randomness
or unpredictability of the process.

The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
notation for what he called the `information rate') is the mutual
information between the `past' and the `present':
\begin{equation}
\label{eq:multi-info}
\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
\end{equation}
It is a measure of how much the context of an observation (that is,
the observation of previous elements of the sequence) helps in predicting
or reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983}
is the mutual information between
the entire `past' and the entire `future':
\begin{equation}
E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}

The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the average information in one observation about the infinite future given the infinite past,
and is defined as a conditional mutual information:
\begin{equation}
\label{eq:PIR}
b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
\end{equation}
Equation \eqrf{PIR} can be read as the average reduction
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
% \IXZ_t
I(X_t;\fut{X}_t|\past{X}_t) = h_\mu - r_\mu,
% \label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
is the \emph{residual} \cite{AbdallahPlumbley2010},
or \emph{erasure} \cite{VerduWeissman2006}, entropy rate.
These relationships are illustrated in \Figrf{predinfo-bg}, along with
several of the information measures we have discussed so far.
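Two extreme cases help to fix intuitions about these quantities (the numbers
follow directly from the definitions above). For a sequence of independent
fair coin flips, $h_\mu = 1$ bit, but since neither the past nor the present
helps to predict the future, $\rho_\mu = E = b_\mu = 0$ and $r_\mu = 1$ bit:
the process is maximally random, but none of that randomness is predictively
relevant. Conversely, for a deterministic alternating sequence
$(\ldots,0,1,0,1,\ldots)$ with random initial phase, $H(X_t) = 1$ bit while
$h_\mu = 0$, so $\rho_\mu = E = 1$ bit and $b_\mu = r_\mu = 0$: everything is
predictable from the past, and so individual observations carry no new
information about the future. High values of $b_\mu$ are only found between
these extremes, in processes that are neither completely random nor completely
predictable.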
James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
rate and also examine some related measures. In particular they identify
$\sigma_\mu$, the difference between the multi-information rate and the excess
entropy, as an interesting quantity that measures the predictive benefit of
model-building (that is, maintaining an internal state summarising past
observations in order to make better predictions). They also identify
$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
information} rate.

\begin{fig}{wundt}
\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
{\ {\large$\longrightarrow$}\ }
\raisebox{-4em}{\colfig[0.43]{wundt2}}
\caption{
The Wundt curve relating randomness/complexity with
perceived value. Repeated exposure sometimes results
in a move to the left along the curve \cite{Berlyne71}.
}
\end{fig}


\subsection{First and higher order Markov chains}
First order Markov chains are the simplest non-trivial models to which information
dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
expressions for all the information measures described in \secrf{surprise-info-seq} for
irreducible stationary Markov chains (\ie those that have a unique stationary
distribution). The derivation is greatly simplified by the dependency structure
of the Markov chain: for the purpose of the analysis, the `past' and `future'
segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that
the predictive information rate can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
an alphabet $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(X_t=i|X_{t-1}=j)$, and let $h:\reals^{K\times K}\to \reals$ be
the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
with transition matrix $a$, then the predictive information rate $b(a)$ is
\begin{equation}
b(a) = h(a^2) - h(a),
\end{equation}
where $a^2$, the transition matrix squared, is the transition matrix
of the `skip one' Markov chain obtained by jumping two steps at a time
along the original chain.

Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the high order Markov chain. If we are dealing
with an $N$th order model, this is done by forming a new alphabet of size $K^N$
consisting of all possible $N$-tuples of symbols from the base alphabet.
An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
$(x_{t+1},\ldots,x_{t+N})$ from the base model. The next
observation $\hat{x}_{t+1}$ encodes the block of $N$ observations obtained by shifting
the previous block along by one step. The new Markov chain is parameterised by a
sparse $K^N\times K^N$ transition matrix $\hat{a}$.
Adopting the label $\mu$ for the order $N$ system,
we obtain
\begin{equation}
h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.


\section{Information Dynamics in Analysis}

\begin{fig}{twopages}
\colfig[0.96]{matbase/fig9471} % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
\vspace*{1em}
\colfig[0.97]{matbase/fig13377} % rule based analysis
\caption{Analysis of \emph{Two Pages}.
The thick vertical lines are the part boundaries as indicated in
the score by the composer.
The thin grey lines
indicate changes in the melodic `figures' of which the piece is
constructed. In the `model information rate' panel, the black asterisks
mark the
six most surprising moments selected by Keith Potter.
The bottom panel shows a rule-based boundary strength analysis computed
using Cambouropoulos' LBDM.
All information measures are in nats and time is in notes.
}
\end{fig}

\subsection{Musicological Analysis}
In \cite{AbdallahPlumbley2009}, methods based on the theory described above
were used to analyse two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was tracked (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. The results are summarised in \figrf{twopages}:
the upper four plots show the dynamically evolving subjective information
measures as described in \secrf{surprise-info-seq}, computed using a point
estimate of the current transition matrix, while the fifth plot (the `model information rate')
measures the information in each observation about the transition matrix itself.
In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
is actually a component of the true IPI in
a time-varying Markov chain, which was neglected when we computed the IPI from
point estimates of the transition matrix as if the transition probabilities
were constant.
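The fragment below sketches the kind of adaptive model involved, tracking a
distribution over transition matrices by placing an independent Dirichlet
prior on each column; the `model information' carried by an observation is
then the KL divergence between the updated and previous beliefs about the
relevant column. This particular prior, and all names used here, are
illustrative assumptions rather than a description of the system used in
\cite{AbdallahPlumbley2009}.
\begin{lstlisting}[language=Python]
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(a_new, a_old):
    """KL divergence D(Dir(a_new) || Dir(a_old)) in nats."""
    a_new, a_old = np.asarray(a_new, float), np.asarray(a_old, float)
    return (gammaln(a_new.sum()) - gammaln(a_old.sum())
            - np.sum(gammaln(a_new) - gammaln(a_old))
            + np.sum((a_new - a_old) * (digamma(a_new) - digamma(a_new.sum()))))

def model_information_trace(seq, K, strength=1.0):
    """Information in each observation about the transition matrix, under an
    assumed independent Dirichlet prior on each column of the matrix."""
    alpha = np.full((K, K), strength)   # alpha[:, j]: beliefs about column j
    info, prev = [], None
    for x in seq:
        if prev is not None:
            updated = alpha[:, prev].copy()
            updated[x] += 1.0           # Bayesian update after seeing prev -> x
            info.append(dirichlet_kl(updated, alpha[:, prev]))
            alpha[:, prev] = updated
        prev = x
    return np.array(info)
\end{lstlisting}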
The peaks of the surprisingness and both components of the predictive information
show good correspondence with the structure of the piece, both as marked in the score
and as analysed by the musicologist Keith Potter, who was asked to mark the six
`most surprising moments' of the piece (shown as asterisks in the fifth plot)%
\footnote{%
Note that the boundary marked in the score at around note 5,400 is known to be
anomalous; on the basis of a listening analysis, some musicologists [ref] have
placed the boundary a few bars later, in agreement with our analysis.}.

In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} it: the
boundary strength functions show no tendency to peak at
the boundaries of the piece.


\begin{fig}{metre}
% \scalebox{1}[1]{%
\begin{tabular}{cc}
\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
% \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
% \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
% \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
% \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
% \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
% \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}
\end{tabular}%
% }
\caption{Metrical analysis by computing the average surprisingness and
informativeness of notes at different periodicities (\ie hypothetical
bar lengths) and phases (\ie positions within a bar).
}
\end{fig}

\subsection{Content analysis/Sound Categorisation}
Using analogous definitions based on differential entropy, the methods outlined
in the previous section are equally applicable to continuous random variables.
In the case of music, where expressive properties such as dynamics, tempo,
timing and timbre are readily quantified on a continuous scale, the information
dynamics framework may thus also be applied.

In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian
processes. For such processes, the entropy rate may be obtained analytically
from the power spectral density of the signal, allowing the multi-information
rate to be obtained as well. Local stationarity is assumed, which may be
achieved by windowing or change point detection \cite{Dubnov2008}.
%TODO mention non-gaussian processes extension
Similarly, the predictive information rate may be computed using a Gaussian
linear formulation CITE. In this view, the PIR is a function of the correlation
between the random innovations supplied to the stochastic process.
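To give a flavour of this, the sketch below estimates the multi-information
rate of a signal under a stationary Gaussian model, using the standard
identity $\rho_\mu = -\frac{1}{2}\log \mathrm{SFM}$, where $\mathrm{SFM}$ is
the spectral flatness of the power spectrum (estimated here by Welch's
method). This is an illustrative reconstruction, not Dubnov's implementation.
\begin{lstlisting}[language=Python]
import numpy as np
from scipy.signal import welch

def multi_information_rate(x, fs=1.0, nperseg=1024):
    """Multi-information rate (nats per sample) of a signal modelled as a
    stationary Gaussian process: rho = -0.5 * log(spectral flatness)."""
    _, S = welch(x, fs=fs, nperseg=nperseg)
    S = np.maximum(S, 1e-12)                        # guard against log(0)
    flatness = np.exp(np.mean(np.log(S))) / np.mean(S)
    return -0.5 * np.log(flatness)

rng = np.random.default_rng(0)
x = rng.standard_normal(1 << 16)
print(multi_information_rate(x))                    # white noise: close to zero
y = np.convolve(x, np.ones(8) / 8, mode='valid')    # smoothed, hence predictable
print(multi_information_rate(y))                    # substantially larger
\end{lstlisting}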
%Dubnov, MacAdams, Reynolds (2006)
%Bailes and Dean (2009)

\begin{itemize}
\item Continuous domain information
\item Audio based music expectation modelling
\item Proposed model for Gaussian processes
\end{itemize}
\emph{Peter}


\subsection{Beat Tracking}
\emph{Andrew}


\section{Information dynamics as compositional aid}

In addition to applying information dynamics to analysis, it is also possible
to apply it to the generation of content, such as the composition of musical
materials.
The outputs of algorithmic or stochastic processes can be filtered to match a
set of criteria defined in terms of the information dynamics model; these
criteria thus become a means of interfacing with the generative process.
For instance, a stochastic music generating process could be controlled by
modifying constraints on its output in terms of predictive information rate
or entropy rate.

The use of stochastic processes for the composition of musical material has
been widespread for decades -- for instance, Iannis Xenakis applied
probabilistic mathematical models to the creation of musical
materials \cite{Xenakis:1992ul}.
Information dynamics can serve as a novel framework for exploring the
possibilities of such processes at the high and abstract level of expectation,
randomness and predictability.

\subsection{The Melody Triangle}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{figs/mtriscat}
\caption{The population of transition matrices distributed along three axes of
redundancy, entropy rate and predictive information rate (all measured in bits).
The concentrations of points along the redundancy axis correspond
to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
represents its PIR---note that the highest values are found at intermediate entropy
and redundancy, and that the distribution as a whole makes a curved triangle. Although
not visible in this plot, it is largely hollow in the middle.
\label{InfoDynEngine}}
\end{figure}

The Melody Triangle is an exploratory interface for the discovery of melodic
content, in which the input -- a position within a triangle -- maps directly to
information-theoretic measures of the output.
The measures -- entropy rate, redundancy and predictive information rate --
provide the criteria by which the output of the stochastic processes used to
generate sequences of notes is filtered.
These measures address notions of expectation and surprise in music, and as
such the Melody Triangle is a means of interfacing with a generative process
in terms of the predictability of its output.

The triangle is `populated' with possible parameter values for melody generators.
These are plotted in a 3D statistical space of redundancy, entropy rate and
predictive information rate.
In our case we generated thousands of transition matrices, representing
first-order Markov chains, by a random sampling method; a minimal sketch of
this population-building step is given below.
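In this sketch (our own illustration; the sampling scheme actually used for
the Melody Triangle is not specified here), `redundancy' is taken to be the
multi-information rate $\rho_\mu$, which matches the values quoted in the
caption of figure \ref{InfoDynEngine}, and the PIR is computed via
$b(a) = h(a^2) - h(a)$.
\begin{lstlisting}[language=Python]
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def stationary(a):
    """Stationary distribution of a column-stochastic transition matrix."""
    w, v = np.linalg.eig(a)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    pi = stationary(a)
    return float(sum(pi[j] * H(a[:, j]) for j in range(a.shape[0])))

def measures(a):
    """(redundancy, entropy rate, PIR) in bits for transition matrix a."""
    h = entropy_rate(a)
    rho = H(stationary(a)) - h          # redundancy, taken to be rho_mu
    b = entropy_rate(a @ a) - h         # PIR, using b(a) = h(a^2) - h(a)
    return rho, h, b

rng = np.random.default_rng(1)
K, population = 8, []
for _ in range(5000):
    conc = 10.0 ** rng.uniform(-2, 1)               # near-deterministic to uniform
    a = rng.dirichlet(np.full(K, conc), size=K).T   # columns are distributions
    population.append((a, measures(a)))             # matrix plus its coordinates
\end{lstlisting}
The assumption of irreducibility is not checked here; in practice, degenerate
samples can simply be discarded.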
In figure \ref{InfoDynEngine} we see how these matrices are distributed in the
3D statistical space; each point corresponds to one transition matrix.

The distribution of transition matrices plotted in this space forms an arch
shape that is fairly thin.
It thus becomes a reasonable approximation to pretend that it is just a sheet
in two dimensions, and so we stretch out this curved arc into a flat triangle.
It is this triangular sheet that is our `Melody Triangle' and forms the
interface by which the system is controlled.
Using the interface thus involves a mapping to statistical space: a user
selects a position within the triangle, and a corresponding transition matrix
is returned.
Figure \ref{TheTriangle} shows how the triangle maps to different measures of
redundancy, entropy rate and predictive information rate.

The three corners correspond to different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity',
`noise' and `repetition'.
Melodies from the `noise' corner have no discernible pattern; they have high
entropy rate, low predictive information rate and low redundancy.
These melodies are essentially totally random.
Melodies along the `periodicity' to `repetition' edge are all deterministic
loops that get shorter as we approach the `repetition' corner, until each
becomes just one repeating note.
It is the areas in between the extremes that provide the more `interesting'
melodies: those with some level of unpredictability that are nonetheless not
completely random, or, conversely, those that are predictable but not
entirely so.

\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}

The Melody Triangle exists in two incarnations: a standard screen-based
interface, where a user moves tokens in and around a triangle on screen, and a
multi-user interactive installation, where a Kinect camera tracks individuals
in a space and maps their positions in physical space to the triangle.
In the latter, each visitor entering the installation generates a melody and
can collaborate with co-visitors to generate musical textures -- a playful yet
informative way to explore expectation and surprise in music.
Additionally, different gestures can be detected to change the tempo, register,
instrumentation and periodicity of the output melody.

As a screen-based interface, the Melody Triangle can serve as a composition tool.
A triangle is drawn on the screen, with screen space thus mapped to the
statistical space of the Melody Triangle.
A number of round tokens, each representing a melody, can be dragged in and
around the triangle.
When a token is dragged into the triangle, the system starts generating a
sequence of symbols whose statistical properties correspond to the position of
the token.
These symbols are then mapped to notes of a scale.
Keyboard input allows control over additional parameters.
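The mapping just described, from a selected position to a stored transition
matrix and from that matrix to notes of a scale, can be sketched as follows.
The nearest-neighbour lookup and all names here are our illustrative
assumptions, not a description of the actual implementation.
\begin{lstlisting}[language=Python]
import numpy as np

def select_matrix(position, coords2d, matrices):
    """Return the stored transition matrix whose flattened 2D coordinates are
    nearest to the selected position; how the 3D population is flattened to
    the coordinates in coords2d is not specified here."""
    d2 = np.sum((np.asarray(coords2d) - np.asarray(position)) ** 2, axis=1)
    return matrices[int(np.argmin(d2))]

def melody(a, length, scale, rng=None):
    """Generate notes by running the Markov chain and mapping states to a scale."""
    if rng is None:
        rng = np.random.default_rng()
    K = a.shape[0]
    state = int(rng.integers(K))
    notes = []
    for _ in range(length):
        state = rng.choice(K, p=a[:, state])   # a[i, j] = P(next = i | current = j)
        notes.append(scale[state % len(scale)])
    return notes
\end{lstlisting}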
The Melody Triangle can assist a composer not only in the creation of
melodies; by placing multiple tokens in the triangle, it can also generate
intricate musical textures.
Unlike other computer-aided composition tools or programming environments,
here the composer engages with music on the high and abstract level of
expectation, randomness and predictability.


\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?

Information measures on a stream of symbols can form a feedback mechanism; a
rudimentary `critic' of sorts.
For instance, symbol-by-symbol measures of predictive information rate,
entropy rate and redundancy could tell us whether a stream of symbols is
currently `boring', either because it is too repetitive or because it is too
chaotic.
Such feedback would be oblivious to longer-term and larger-scale structures,
but it could nonetheless provide a composer with valuable insight into the
short-term properties of a work.
This could not only be used for the evaluation of pre-composed streams of
symbols, but could also provide real-time feedback in an improvisatory setup.

\section{Musical Preference and Information Dynamics}
We are carrying out a study to investigate the relationship between musical
preference and the information dynamics models; the experimental interface is
a simplified version of the screen-based Melody Triangle.
Participants are asked to use this music pattern generator under various
experimental conditions in a composition task.
The data collected include usage statistics of the system: where in the
triangle participants place the tokens, how long they leave them there, and
the state of the system when they indicate, by pressing a key, that they like
what they are hearing.
As such, the experiments will help us identify any correlation between the
information-theoretic properties of a stream and its perceived aesthetic
worth.


%\emph{comparable system} Gordon Pask's Musicolour (1953) applied a similar notion
%of boredom in its design. The Musicolour would react to audio input through a
%microphone by flashing coloured lights. Rather than a direct mapping of sound
%to light, Pask designed the device to be a partner to a performing musician. It
%would adapt its lighting pattern based on the rhythms and frequencies it would
%hear, quickly `learning' to flash in time with the music. However Pask endowed
%the device with the ability to `be bored'; if the rhythmic and frequency content
%of the input remained the same for too long it would listen for other rhythms
%and frequencies, only lighting when it heard these. As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.


\section{Conclusion}

\bibliographystyle{unsrt}
{\bibliography{all,c4dm,nime}}
\end{document}