\documentclass[conference]{IEEEtran} \usepackage{fixltx2e} \usepackage{cite} \usepackage[cmex10]{amsmath} \usepackage{graphicx} \usepackage{amssymb} \usepackage{epstopdf} \usepackage{url} \usepackage{listings} %\usepackage[expectangle]{tools} \usepackage{tools} \usepackage{tikz} \usetikzlibrary{calc} \usetikzlibrary{matrix} \usetikzlibrary{patterns} \usetikzlibrary{arrows} \let\citep=\cite \newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}% \newcommand\preals{\reals_+} \newcommand\X{\mathcal{X}} \newcommand\Y{\mathcal{Y}} \newcommand\domS{\mathcal{S}} \newcommand\A{\mathcal{A}} \newcommand\Data{\mathcal{D}} \newcommand\rvm[1]{\mathrm{#1}} \newcommand\sps{\,.\,} \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} \newcommand\Ix{\mathcal{I}} \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} \newcommand\x{\vec{x}} \newcommand\Ham[1]{\mathcal{H}_{#1}} \newcommand\subsets[2]{[#1]^{(k)}} \def\bet(#1,#2){#1..#2} \def\ev(#1=#2){#1\!\!=\!#2} \newcommand\rv[1]{\Omega \to #1} \newcommand\ceq{\!\!=\!} \newcommand\cmin{\!-\!} \newcommand\modulo[2]{#1\!\!\!\!\!\mod#2} \newcommand\sumitoN{\sum_{i=1}^N} \newcommand\sumktoK{\sum_{k=1}^K} \newcommand\sumjtoK{\sum_{j=1}^K} \newcommand\sumalpha{\sum_{\alpha\in\A}} \newcommand\prodktoK{\prod_{k=1}^K} \newcommand\prodjtoK{\prod_{j=1}^K} \newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}} \newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}} \newcommand\parity[2]{P^{#1}_{2,#2}} %\usepackage[parfill]{parskip} \begin{document} \title{Cognitive Music Modelling: an\\Information Dynamics Approach} \author{ \IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster} \IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley} \IEEEauthorblockA{Centre for Digital Music\\ Queen Mary University of London\\ Mile End Road, London E1 4NS}} \maketitle \begin{abstract} We describe an information-theoretic approach to the analysis of music and other sequential data, which emphasises the predictive aspects of perception, and the dynamic process of forming and modifying expectations about an unfolding stream of data, characterising these using the tools of information theory: entropies, mutual informations, and related quantities. After reviewing the theoretical foundations, % we present a new result on predictive information rates in high-order Markov chains, and we discuss a few emerging areas of application, including musicological analysis, real-time beat-tracking analysis, and the generation of musical materials as a cognitively-informed compositional aid. \end{abstract} \section{Introduction} \label{s:Intro} The relationship between Shannon's \cite{Shannon48} information theory and music and art in general has been the subject of some interest since the 1950s \cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}. The general thesis is that perceptible qualities and subjective states like uncertainty, surprise, complexity, tension, and interestingness are closely related to information-theoretic quantities like entropy, relative entropy, and mutual information. Music is also an inherently dynamic process, where listeners build up expectations about what is to happen next, which may be fulfilled immediately, after some delay, or modified as the music unfolds. In this paper, we explore this ``Information Dynamics'' view of music, discussing the theory behind it and some emerging applications. 
\subsection{Expectation and surprise in music}
The idea that the musical experience is strongly shaped by the generation and
playing out of strong and weak expectations was put forward by, amongst others,
music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but
was recognised much earlier; for example, it was elegantly put by Hanslick
\cite{Hanslick1854} in the nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the act of
listening to music, and which converts it to a source of pleasure, is \ldots
the intellectual satisfaction which the listener derives from continually
following and anticipating the composer's intentions---now, to see his
expectations fulfilled, and now, to find himself agreeably mistaken.'
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon that
unfolds in time, rather than being apprehended as a static object presented in
its entirety. Meyer argued that the experience depends on how we change and
revise our conceptions \emph{as events happen}, on how expectation and
prediction interact with occurrence, and that, to a large degree, the way to
understand the effect of music is to focus on this `kinetics' of expectation
and surprise.

Prediction and expectation are essentially probabilistic concepts and can be
treated mathematically using probability theory. We suppose that when we
listen to music, expectations are created on the basis of our familiarity with
various styles of music and our ability to detect and learn statistical
regularities in the music as they emerge. There is experimental evidence that
human listeners are able to internalise statistical knowledge about musical
structure, \eg \citep{SaffranJohnsonAslin1999}, and also that statistical
models can form an effective basis for computational analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.

% \subsection{Music and information theory}
With a probabilistic framework for music modelling and prediction in hand, we
can compute various information-theoretic quantities like entropy, relative
entropy, and mutual information.
% and are major determinants of the overall experience.
Berlyne \cite{Berlyne71} called such quantities `collative variables', since
they are to do with patterns of occurrence rather than medium-specific
details, and developed the ideas of `information aesthetics' in an
experimental setting.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.
% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.

\subsection{Information dynamic approach}
Our working hypothesis is that, as an intelligent, predictive agent (to which
we will refer as `it') listens to a piece of music, it maintains a dynamically
evolving probabilistic belief state that enables it to make predictions about
how the piece will continue, relying on both its previous experience of music
and the emerging themes of the piece. As events unfold, it revises this belief
state, which includes predictive distributions over possible future events.
These distributions can be characterised in terms of a handful of
information-theoretic measures such as entropy and relative entropy. By
tracing the evolution of these measures, we obtain a representation which
captures much of the significant structure of the music.

One consequence of this approach is that regardless of the details of the
sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The information-theoretic
concepts in terms of which the analysis is framed are universal to all sorts
of data. In addition, when adaptive probabilistic models are used,
expectations are created mainly in response to \emph{patterns} of occurrence,
rather than the details of which specific things occur. Together, these
suggest that an information dynamic analysis captures a high level of
\emph{abstraction}, and could be used to make structural comparisons between
different temporal media, such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a
principled way to address the notion of \emph{subjectivity}, since the
analysis is dependent on the probability model the observer starts off with,
which may depend on prior experience or other factors, and which may change
over time. Thus, inter-subject variability and variation in subjects'
responses over time are fundamental to the theory.
%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.

\section{Theoretical review}
\subsection{Entropy and information}
\label{s:entro-info}
Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set $\X$ and a probability distribution
representing the observer's beliefs about the true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy of the random variable $H(X)$.
For a discrete variable with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
  H(X) = \sum_{x\in\X} -p(x) \log p(x),
  % = \expect{-\log p(X)},
\end{equation}
% where $\expect{}$ is the expectation operator.
The negative-log-probability $\ell(x) = -\log p(x)$ of a particular value $x$
can usefully be thought of as the \emph{surprisingness} of the value $x$
should it be observed, and hence the entropy is the expectation of the
surprisingness, $\expect \ell(X)$.

Now suppose that the observer receives some new data $\Data$ that causes a
revision of its beliefs about $X$. The \emph{information} in this new data
\emph{about} $X$ can be quantified as the relative entropy or
Kullback-Leibler (KL) divergence between the prior and posterior
distributions $p(x)$ and $p(x|\Data)$ respectively:
\begin{equation}
  \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
    = \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
  \label{eq:info}
\end{equation}
When there are multiple variables $X_1, X_2$ \etc which the observer believes
to be dependent, then the observation of one may change its beliefs and hence
yield information about the others. The joint and conditional entropies as
described in any textbook on information theory (\eg \cite{CoverThomas}) then
quantify the observer's expected uncertainty about groups of variables given
the values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information in an observation of $X_2$
about $X_1$ and the expected reduction in uncertainty about $X_1$ after
observing $X_2$:
\begin{equation}
  I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy of $X_1$
given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$ and so the
mutual information is symmetric in its arguments. A conditional form of the
mutual information can be formulated analogously:
\begin{equation}
  I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual informations
are conveniently visualised in \emph{information diagrams} or I-diagrams
\cite{Yeung1991} such as the one in \figrf{venn-example}.
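For concreteness, these quantities are straightforward to compute for discrete
distributions. The following minimal Python sketch (purely illustrative,
assuming only \texttt{numpy}; it is not part of any system described later)
evaluates the entropy, the information \eqrf{info} carried by new data, and
the mutual information obtained from a joint probability table:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    """Entropy H(X) in bits of a discrete distribution p (sums to 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def information(posterior, prior):
    """Information in the data about X: KL divergence D(posterior || prior)."""
    post = np.asarray(posterior, dtype=float)
    pri = np.asarray(prior, dtype=float)
    m = post > 0
    return float(np.sum(post[m] * np.log2(post[m] / pri[m])))

def mutual_information(pxy):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) from a joint probability table."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy.ravel())

# surprisingness of a particular outcome x with probability px:  -np.log2(px)
\end{lstlisting}
Base-2 logarithms give values in bits; natural logarithms would give nats, the
unit used in some of the figures below.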
\begin{fig}{venn-example}
  \newcommand\rad{2.2em}%
  \newcommand\circo{circle (3.4em)}%
  \newcommand\labrad{4.3em}
  \newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
  \newcommand\colsep{\ }
  \newcommand\clipin[1]{\clip (#1) \circo;}%
  \newcommand\clipout[1]{\clip \bound (#1) \circo;}%
  \newcommand\cliptwo[3]{%
    \begin{scope}
      \clipin{#1};
      \clipin{#2};
      \clipout{#3};
      \fill[black!30] \bound;
    \end{scope}
  }%
  \newcommand\clipone[3]{%
    \begin{scope}
      \clipin{#1};
      \clipout{#2};
      \clipout{#3};
      \fill[black!15] \bound;
    \end{scope}
  }%
  \begin{tabular}{c@{\colsep}c}
    \begin{tikzpicture}[baseline=0pt]
      \coordinate (p1) at (90:\rad);
      \coordinate (p2) at (210:\rad);
      \coordinate (p3) at (-30:\rad);
      \clipone{p1}{p2}{p3};
      \clipone{p2}{p3}{p1};
      \clipone{p3}{p1}{p2};
      \cliptwo{p1}{p2}{p3};
      \cliptwo{p2}{p3}{p1};
      \cliptwo{p3}{p1}{p2};
      \begin{scope}
        \clip (p1) \circo;
        \clip (p2) \circo;
        \clip (p3) \circo;
        \fill[black!45] \bound;
      \end{scope}
      \draw (p1) \circo;
      \draw (p2) \circo;
      \draw (p3) \circo;
      \path
        (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
        (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
        (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
        (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
        (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
        (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
        (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
        ;
      \path
        (p1) +(140:\labrad) node {$X_1$}
        (p2) +(-140:\labrad) node {$X_2$}
        (p3) +(-40:\labrad) node {$X_3$};
    \end{tikzpicture}
    &
    \parbox{0.5\linewidth}{
      \small
      \begin{align*}
        I_{1|23} &= H(X_1|X_2,X_3) \\
        I_{13|2} &= I(X_1;X_3|X_2) \\
        I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
        I_{12|3} + I_{123} &= I(X_1;X_2)
      \end{align*}
    }
  \end{tabular}
  \caption{
    I-diagram of entropies and mutual informations for three random variables
    $X_1$, $X_2$ and $X_3$. The areas of the three circles represent
    $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively. The total shaded area is
    the joint entropy $H(X_1,X_2,X_3)$. The central area $I_{123}$ is the
    co-information \cite{McGill1954}. Some other information measures are
    indicated in the legend.
  }
\end{fig}

\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}
Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of random
variables, infinite in both directions, and that $\mu$ is the associated
probability measure over all realisations of the sequence. In the following,
$\mu$ will simply serve as a label for the process. We can identify a number
of information-theoretic measures meaningful in the context of a sequential
observation of the sequence, during which, at any time $t$, the sequence can
be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
We will write the actually observed value of $X_t$ as $x_t$, and the sequence
of observations up to but not including $x_t$ as $\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.
The in-context surprisingness of the observation $X_t=x_t$ depends on both
$x_t$ and the context $\past{x}_t$:
\begin{equation}
  \ell_t = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed, the observer can compute the
\emph{expected} surprisingness as a measure of its uncertainty about $X_t$;
this may be written as an entropy $H(X_t|\ev(\past{X}_t = \past{x}_t))$, but
note that this is conditional on the \emph{event}
$\ev(\past{X}_t=\past{x}_t)$, not the \emph{variables} $\past{X}_t$ as in the
conventional conditional entropy.

The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$ can be understood as \emph{subjective}
information dynamic measures, since they are based on the observer's
probability model in the context of the actually observed sequence
$\past{x}_t$. They characterise what it is like to be `in the observer's
shoes'. If we view the observer as a purely passive or reactive agent, this
would probably be sufficient, but for active agents such as humans or
animals, it is often necessary to \emph{anticipate} future events in order,
for example, to plan the most effective course of action. It makes sense for
such observers to be concerned about the predictive probability distribution
over future events, $p(\fut{x}_t|\past{x}_t)$.

When an observation $\ev(X_t=x_t)$ is made in this context, the
\emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the
sequence $\fut{X}_t$, \emph{given} the observed past
$\past{X}_t=\past{x}_t$. Referring to the definition of information
\eqrf{info}, this is the KL divergence between prior and posterior
distributions over possible futures, which, written out in full, is
\begin{equation}
  \mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
    p(\fut{x}_t|x_t,\past{x}_t)
    \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
Note that it is quite possible for an event to be surprising but not
informative in a predictive sense.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information
$I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$ conditioned on the observed
past. This could be used, for example, as an estimate of attentional
resources which should be directed at this stream of data, which may be in
competition with other sensory streams.
\subsection{Information measures for stationary random processes} \label{s:process-info} \begin{fig}{predinfo-bg} \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}} \newcommand\rad{1.8em}% \newcommand\ovoid[1]{% ++(-#1,\rad) -- ++(2 * #1,0em) arc (90:-90:\rad) -- ++(-2 * #1,0em) arc (270:90:\rad) }% \newcommand\axis{2.75em}% \newcommand\olap{0.85em}% \newcommand\offs{3.6em} \newcommand\colsep{\hspace{5em}} \newcommand\longblob{\ovoid{\axis}} \newcommand\shortblob{\ovoid{1.75em}} \begin{tabular}{c} \subfig{(a) multi-information and entropy rates}{% \begin{tikzpicture}%[baseline=-1em] \newcommand\rc{1.75em} \newcommand\throw{2.5em} \coordinate (p1) at (180:1.5em); \coordinate (p2) at (0:0.3em); \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} \newcommand\present{(p2) circle (\rc)} \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} \newcommand\fillclipped[2]{% \begin{scope}[even odd rule] \foreach \thing in {#2} {\clip \thing;} \fill[black!#1] \bound; \end{scope}% }% \fillclipped{30}{\present,\bound \thepast} \fillclipped{15}{\present,\bound \thepast} \fillclipped{45}{\present,\thepast} \draw \thepast; \draw \present; \node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$}; \node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; \path (p2) +(90:3em) node {$X_0$}; \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; \end{tikzpicture}}% \\[1em] \subfig{(b) excess entropy}{% \newcommand\blob{\longblob} \begin{tikzpicture} \coordinate (p1) at (-\offs,0em); \coordinate (p2) at (\offs,0em); \begin{scope} \clip (p1) \blob; \clip (p2) \blob; \fill[lightgray] (-1,-1) rectangle (1,1); \end{scope} \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob; \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob; \path (0,0) node (future) {$E$}; \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$}; \end{tikzpicture}% }% \\[1em] \subfig{(c) predictive information rate $b_\mu$}{% \begin{tikzpicture}%[baseline=-1em] \newcommand\rc{2.1em} \newcommand\throw{2.5em} \coordinate (p1) at (210:1.5em); \coordinate (p2) at (90:0.7em); \coordinate (p3) at (-30:1.5em); \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} \newcommand\present{(p2) circle (\rc)} \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}} \newcommand\fillclipped[2]{% \begin{scope}[even odd rule] \foreach \thing in {#2} {\clip \thing;} \fill[black!#1] \bound; \end{scope}% }% \fillclipped{80}{\future,\thepast} \fillclipped{30}{\present,\future,\bound \thepast} \fillclipped{15}{\present,\bound \future,\bound \thepast} \draw \future; \fillclipped{45}{\present,\thepast} \draw \thepast; \draw \present; \node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$}; \node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$}; \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; \path (p2) +(140:3em) node {$X_0$}; % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$}; \path (p3) +(3em,0em) node {\shortstack{infinite\\future}}; \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$}; \end{tikzpicture}}% \\[0.25em] \end{tabular} \caption{ I-diagrams for several information measures in stationary random processes. 
Each circle or oval represents a random variable or sequence of random
variables relative to time $t=0$. Overlapped areas correspond to various
mutual informations. In (a) and (c), the circle represents the `present'.
Its total area is $H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the
multi-information rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is
the predictive information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
The small dark region below $X_0$ in (c) is $\sigma_\mu = E-\rho_\mu$.
}
\end{fig}
If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\dots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary, then it
becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the process
as a whole, rather than on a moment-by-moment basis. Some of these, such as
the entropy rate, are well-known, but others have only recently been
investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

The \emph{entropy rate} of the process is the entropy of the `present' $X_t$
given the `past':
\begin{equation}
  \label{eq:entro-rate}
  h_\mu = H(X_t|\past{X}_t).
\end{equation}
The entropy rate is a measure of the overall surprisingness or
unpredictability of the process, and gives an indication of the average level
of surprise and uncertainty that would be experienced by an observer
computing the measures of \secrf{surprise-info-seq} on a sequence sampled
from the process.

The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's
\cite{Dubnov2006} notation for what he called the `information rate') is the
mutual information between the `past' and the `present':
\begin{equation}
  \label{eq:multi-info}
  \rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
\end{equation}
It is a measure of how much the preceding context of an observation helps in
predicting or reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983} is the mutual
information between the entire `past' and the entire `future' (which here
includes the `present' $X_t$):
\begin{equation}
  E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}
Both the excess entropy and the multi-information rate can be thought of as
measures of \emph{redundancy}, quantifying the extent to which the same
information is to be found in all parts of the sequence.

The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the mutual information between the `present' and the `future' given the
`past':
\begin{equation}
  \label{eq:PIR}
  b_\mu = I(X_t;\fut{X}_t|\past{X}_t)
    = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t),
\end{equation}
which can be read as the average reduction in uncertainty about the future
on learning $X_t$, given the past. Due to the symmetry of the mutual
information, it can also be written as
\begin{equation}
  b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
\end{equation}
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$ is the \emph{residual}
\cite{AbdallahPlumbley2010}, or \emph{erasure} \cite{VerduWeissman2006},
entropy rate. These relationships are illustrated in \Figrf{predinfo-bg},
along with several of the information measures we have discussed so far.
The PIR gives an indication of the average IPI that would be experienced by
an observer processing a sequence sampled from this process.

James et al.\ \cite{JamesEllisonCrutchfield2011} review several of these
information measures and introduce some new related ones. In particular,
they identify $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$, the mutual
information between the past and the future given the present, as an
interesting quantity that measures the predictive benefit of model-building,
that is, maintaining an internal state summarising past observations in
order to make better predictions. It is shown as the small dark region below
the circle in \figrf{predinfo-bg}(c). By comparing with
\figrf{predinfo-bg}(b), we can see that $\sigma_\mu = E - \rho_\mu$.
% They also identify
% $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
% information} rate.

\subsection{First and higher order Markov chains}
\label{s:markov}
% First order Markov chains are the simplest non-trivial models to which information
% dynamics methods can be applied.
In \cite{AbdallahPlumbley2009} we derived expressions for all the information
measures described in \secrf{surprise-info-seq} for ergodic first order
Markov chains (\ie chains that have a unique stationary distribution).
% The derivation is greatly simplified by the dependency structure
% of the Markov chain: for the purpose of the analysis, the `past' and `future'
% segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
% and next variables $X_{t-1}$ and $X_{t+1}$ respectively.
We also showed that the PIR can be expressed simply in terms of entropy
rates: if we let $a$ denote the $K\times K$ transition matrix of a Markov
chain over an alphabet of $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(\ev(X_t=i)|\ev(X_{t-1}=j))$, and let
$h:\reals^{K\times K}\to \reals$ be the entropy rate function such that
$h(a)$ is the entropy rate of a Markov chain with transition matrix $a$,
then the PIR is
\begin{equation}
  b_\mu = h(a^2) - h(a),
\end{equation}
where $a^2$, the transition matrix squared, is the transition matrix of the
`skip one' Markov chain obtained by jumping two steps at a time along the
original chain.

Second and higher order Markov chains can be treated in a similar way by
transforming to a first order representation of the high order Markov chain.
With an $N$th order model, this is done by forming a new alphabet of size
$K^N$ consisting of all possible $N$-tuples of symbols from the base
alphabet. An observation $\hat{x}_t$ in this new model encodes a block of $N$
observations $(x_{t+1},\ldots,x_{t+N})$ from the base model.
% The next
% observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
% block along by one step.
The new Markov chain is parameterised by a sparse $K^N\times K^N$ transition
matrix $\hat{a}$, in terms of which the entropy rate and the PIR are
\begin{equation}
  h_\mu = h(\hat{a}), \qquad
  b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition
matrix. Other information measures can also be computed for the high-order
Markov chain, including the multi-information rate $\rho_\mu$ and the excess
entropy $E$. (These are identical for first order Markov chains, but for
order $N$ chains, $E$ can be up to $N$ times larger than $\rho_\mu$.)
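As a concrete illustration of these expressions (and not of the exact code
used in \cite{AbdallahPlumbley2009}), the following Python sketch, assuming
\texttt{numpy} and the column-stochastic convention for $a$ given above,
computes $h_\mu$, $\rho_\mu$ and $b_\mu$ in bits for an ergodic first-order
chain:
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(a):
    """Stationary distribution pi of a column-stochastic transition matrix a,
    where a[i, j] = Pr(X_t = i | X_{t-1} = j) and each column sums to 1."""
    w, v = np.linalg.eig(a)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()

def entropy_rate(a):
    """h(a) = sum_j pi_j H(a[:, j]), the entropy rate in bits."""
    pi = stationary(a)
    plogp = -a * np.log2(np.where(a > 0, a, 1.0))   # zero where a is zero
    return float(np.dot(plogp.sum(axis=0), pi))

def info_measures(a):
    """Return (h_mu, rho_mu, b_mu) for an ergodic first-order Markov chain."""
    pi = stationary(a)
    h = entropy_rate(a)
    H0 = float(-np.sum(pi[pi > 0] * np.log2(pi[pi > 0])))  # H(X_t)
    b = entropy_rate(a @ a) - h        # b_mu = h(a^2) - h(a)
    return h, H0 - h, b                # rho_mu = H(X_t) - h_mu
\end{lstlisting}
For an order-$N$ chain, the same functions can be applied to the block
transition matrix $\hat{a}$, with the PIR obtained as
$h(\hat{a}^{N+1}) - N h(\hat{a})$ as above.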
In our experiments with visualising and sonifying sequences sampled from
first order Markov chains \cite{AbdallahPlumbley2009}, we found that the
measures $h_\mu$, $\rho_\mu$ and $b_\mu$ correspond to perceptible
characteristics, and that the transition matrices maximising or minimising
each of these quantities are quite distinct. High entropy rates are
associated with completely uncorrelated sequences with no recognisable
temporal structure (and low $\rho_\mu$ and $b_\mu$). High values of
$\rho_\mu$ are associated with long periodic cycles (and low $h_\mu$ and
$b_\mu$). High values of $b_\mu$ are associated with intermediate values of
$\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
temporal structures. These relationships are visible in \figrf{mtriscat} in
\secrf{composition}, where we pick up this thread again, with an application
of information dynamics as a compositional aid.

\section{Information Dynamics in Analysis}

\subsection{Musicological Analysis}
\label{s:minimusic}

\begin{fig}{twopages}
  \colfig[0.96]{matbase/fig9471}\\ % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
  \vspace*{0.5em}
  \colfig[0.97]{matbase/fig13377} % rule based analysis
  \caption{Analysis of \emph{Two Pages}.
    The thick vertical lines are the part boundaries as indicated in the
    score by the composer. The thin grey lines indicate changes in the
    melodic `figures' of which the piece is constructed. In the `model
    information rate' panel, the black asterisks mark the six most
    surprising moments selected by Keith Potter. The bottom two panels show
    two rule-based boundary strength analyses. All information measures are
    in nats. Note that the boundary marked in the score at around note 5,400
    is known to be anomalous; on the basis of a listening analysis, some
    musicologists have placed the boundary a few bars later, in agreement
    with our analysis \cite{PotterEtAl2007}.
  }
\end{fig}

In \cite{AbdallahPlumbley2009}, we analysed two pieces of music in the
minimalist style by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus}
(1968). The analysis was done using a first-order Markov chain model, with
the enhancement that the transition matrix of the model was allowed to evolve
dynamically as the notes were processed, and was tracked (in a Bayesian way)
as a \emph{distribution} over possible transition matrices, rather than a
point estimate. Some results are summarised in \figrf{twopages}: the upper
four plots show the dynamically evolving subjective information measures as
described in \secrf{surprise-info-seq}, computed using a point estimate of
the current transition matrix; the fifth plot (the `model information rate')
shows the information in each observation about the transition matrix. In
\cite{AbdallahPlumbley2010b}, we showed that this `model information rate' is
actually a component of the true IPI when the transition matrix is being
learned online, and was neglected when we computed the IPI from the
transition matrix as if it were a constant. The peaks of the surprisingness
and both components of the IPI show good correspondence with the structure of
the piece, both as marked in the score and as analysed by musicologist Keith
Potter, who was asked to mark the six `most surprising moments' of the piece
(shown as asterisks in the fifth plot).
In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not clearly \emph{segment}
it: the boundary strength functions show no tendency to peak at the
boundaries of the piece.

The complete analysis of \emph{Gradus} can be found in
\cite{AbdallahPlumbley2009}, but \figrf{metre} illustrates the result of a
metrical analysis: the piece was divided into bars of 32, 64 and 128 notes.
In each case, the average surprisingness and IPI for the first, second,
third \etc notes in each bar were computed. The plots show that the first
note of each bar is, on average, significantly more surprising and
informative than the others, up to the 64-note level, whereas at the
128-note level, the dominant periodicity appears to remain at 64 notes.

\begin{fig}{metre}
% \scalebox{1}{%
  \begin{tabular}{cc}
    \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
    \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
    \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
  \end{tabular}%
% }
  \caption{Metrical analysis by computing average surprisingness and IPI of
    notes at different periodicities (\ie hypothetical bar lengths) and
    phases (\ie positions within a bar).
  }
\end{fig}
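The averaging underlying \figrf{metre} is elementary; a minimal sketch of the
kind of computation involved (in Python, assuming \texttt{numpy}; the
variable names are hypothetical) is:
\begin{lstlisting}[language=Python]
import numpy as np

def metrical_profile(values, bar_length):
    """Average a per-note measure (e.g. surprisingness or IPI) over positions
    within a hypothetical bar of the given length."""
    values = np.asarray(values, dtype=float)
    n = (len(values) // bar_length) * bar_length
    return values[:n].reshape(-1, bar_length).mean(axis=0)

# e.g. metrical_profile(surprisingness, 64)[0] is the mean surprisingness of
# the first note of each 64-note bar.
\end{lstlisting}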
\subsection{Real-valued signals and audio analysis}
Using analogous definitions based on the differential entropy
\cite{CoverThomas}, the methods outlined in \secrf{surprise-info-seq} and
\secrf{process-info} can be reformulated for random variables taking values
in a continuous domain. Information-dynamic methods may thus be applied to
expressive parameters of music such as dynamics, timing and timbre, which
are readily quantified on a continuous scale.
% \subsection{Audio based content analysis}
% Using analogous definitions of differential entropy, the methods outlined
% in the previous section are equally applicable to continuous random variables.
% In the case of music, where expressive properties such as dynamics, tempo,
% timing and timbre are readily quantified on a continuous scale, the information
% dynamic framework may also be considered.

Dubnov \cite{Dubnov2006} considers the class of stationary Gaussian
processes, for which the entropy rate may be obtained analytically from the
power spectral density of the signal. Dubnov found that the
multi-information rate (which he refers to as `information rate') can be
expressed as a function of the \emph{spectral flatness measure}. Thus, for a
given variance, Gaussian processes with maximal multi-information rate are
those with maximally non-flat spectra. These essentially consist of a single
sinusoidal component and hence are completely predictable once the
parameters of the sinusoid have been inferred.
% Local stationarity is assumed, which may be achieved by windowing or
% change point detection \cite{Dubnov2008}.
%TODO
We are currently working towards methods for the computation of predictive
information rate in some restricted classes of Gaussian processes including
finite-order autoregressive models and processes with power-law (or $1/f$)
spectra, which have previously been investigated in relation to their
aesthetic properties \cite{Voss75,TaylorSpeharVan-Donkelaar2011}.
% (fractionally integrated Gaussian noise).
% %(fBm (continuous), fiGn discrete time) possible reference:
% @book{palma2007long,
%   title={Long-memory time series: theory and methods},
%   author={Palma, W.},
%   volume={662},
%   year={2007},
%   publisher={Wiley-Blackwell}
% }
%Dubnov, MacAdams, Reynolds (2006)
%Bailes and Dean (2009)
In the case of the PIR, a Gaussian linear formulation is applicable,
indicating that the PIR is a function of the correlation between random
innovations supplied to the stochastic process CITE. % !!! FIXME

\subsection{Beat Tracking}
A probabilistic method for drum tracking was presented by Robertson
\cite{Robertson11c}. The system infers a beat grid (a sequence of
approximately regular beat times) given audio inputs from a live drummer,
for the purpose of synchronising a music sequencer with the drummer. The
times of kick and snare drum events are obtained using dedicated microphones
for each drum and a percussive onset detector \cite{puckette98}. These event
times are then sent to the beat tracker, which maintains a probabilistic
belief state in the form of distributions over the tempo and phase of the
beat grid. Every time an event is received, these distributions are updated
with respect to a probabilistic model which accounts both for tempo and
phase variations and for the emission of drum events at musically plausible
times relative to the beat grid.
%continually updates distributions for tempo and phase on receiving a new
%event time

The use of a probabilistic belief state means we can compute entropies
representing the system's uncertainty about the beat grid, and quantify the
amount of information in each event about the beat grid as the KL divergence
between prior and posterior distributions. Though this is not strictly the
instantaneous predictive information (IPI) as described in
\secrf{surprise-info-seq} (the information gained is not directly about
future event times), we can treat it as a proxy for the IPI, in the manner of
the `model information rate' described in \secrf{minimusic}, which has a
similar status.
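To make the nature of this computation concrete, the following toy Python
sketch (our simplification for illustration only, assuming \texttt{numpy}; it
is \emph{not} the model of \cite{Robertson11c}, and all grids, likelihoods
and parameters here are assumptions) maintains a discretised belief state
over tempo and phase, and reports the entropy of the belief and the
information gained from each detected event:
\begin{lstlisting}[language=Python]
import numpy as np

# Toy grid of (tempo, phase) hypotheses; a simplified illustration only.
tempi  = np.linspace(1.5, 3.0, 60)                    # beats per second
phases = np.linspace(0.0, 1.0, 60, endpoint=False)    # phase offset in beats
T, P = np.meshgrid(tempi, phases, indexing='ij')

def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def likelihood(t, sigma=0.03):
    """Pr(drum event at time t | tempo, phase): events fall near beat times."""
    frac = np.abs(((t * T + P) + 0.5) % 1.0 - 0.5)     # distance to nearest beat
    return np.exp(-0.5 * (frac / (sigma * T)) ** 2) + 1e-9

def update(belief, t):
    """Bayesian update on one event; returns the posterior, its entropy, and
    the information in the event (KL divergence from prior to posterior)."""
    post = belief * likelihood(t)
    post /= post.sum()
    info = float(np.sum(post * np.log2(post / belief)))
    return post, entropy_bits(post), info

belief = np.full(T.shape, 1.0 / T.size)                # uniform prior
for t in [0.0, 0.52, 1.01, 1.49]:                      # example onset times (s)
    belief, H, info = update(belief, t)
\end{lstlisting}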
\begin{fig*}{drumfig}
% \includegraphics[width=0.9\linewidth]{drum_plots/file9-track.eps}% \\
  \includegraphics[width=0.97\linewidth]{figs/file11-track.eps} \\
% \includegraphics[width=0.9\linewidth]{newplots/file8-track.eps}
  \caption{Information dynamic analysis derived from audio recordings of
  drumming, obtained by applying a Bayesian beat tracking system to the
  sequence of detected kick and snare drum events. The grey line shows the
  system's varying level of uncertainty (entropy) about the tempo and phase
  of the beat grid, while the stem plot shows the amount of information in
  each drum event about the beat grid. The entropy drops instantaneously at
  each event and rises gradually between events.
  }
\end{fig*}

We carried out the analysis on 16 recordings; an example is shown in
\figrf{drumfig}. There we can see variations in the entropy in the upper
graph and the information in each drum event in the lower stem plot. At
certain points in time, unusually large amounts of information arrive; these
may be related to fills and other rhythmic irregularities, which are often
followed by an emphatic return to a steady beat at the beginning of the next
bar---this is something we are currently investigating.

We also analysed the pattern of information flow on a cyclic metre, much as
in \figrf{metre}. All the recordings we analysed are audibly in 4/4 metre,
but we found no evidence of a general tendency for greater amounts of
information to arrive at metrically strong beats, which suggests that the
rhythmic accuracy of the drummers does not vary systematically across each
bar. It is possible that metrical information existing in the pattern of kick
and snare events might emerge in an information dynamic analysis using a
model that attempts to predict the time and type of the next drum event,
rather than just inferring the beat grid as the current model does.

\section{Information dynamics as compositional aid}
\label{s:composition}
The use of stochastic processes in music composition has been widespread for
decades---for instance Iannis Xenakis applied probabilistic mathematical
models to the creation of musical materials \cite{Xenakis:1992ul}. While such
processes can drive the \emph{generative} phase of the creative process,
information dynamics can serve as a novel framework for a \emph{selective}
phase, by providing a set of criteria to be used in judging which of the
generated materials are of value. This alternation of generative and
selective phases has been noted before \cite{Boden1990}.

Information-dynamic criteria can also be used as \emph{constraints} on the
generative processes, for example, by specifying a certain temporal profile
of surprisingness and uncertainty the composer wishes to induce in the
listener as the piece unfolds.
In particular, the behaviour of the predictive information rate (PIR) defined
in \secrf{process-info} makes it interesting from a compositional point of
view. The definition of the PIR is such that it is low both for extremely
regular processes, such as constant or periodic sequences, \emph{and} for
extremely random processes, where each symbol is chosen independently of the
others, in a kind of `white noise'. In the former case, the pattern, once
established, is completely predictable and therefore there is no \emph{new}
information in subsequent observations. In the latter case, the randomness
and independence of all elements of the sequence means that, though
potentially surprising, each observation carries no information about the
ones to come. Processes with high PIR maintain a certain kind of balance
between predictability and unpredictability in such a way that the observer
must continually pay attention to each new observation as it occurs in order
to make the best possible predictions about the evolution of the sequence.

This balance between predictability and unpredictability is reminiscent of
the inverted `U' shape of the Wundt curve (see \figrf{wundt}), which
summarises the observations of Wundt \cite{Wundt1897} that stimuli are most
pleasing at intermediate levels of novelty or disorder, where there is a
balance between `order' and `chaos'. Using the methods of \secrf{markov}, we
found \cite{AbdallahPlumbley2009} a similar shape when plotting entropy rate
against PIR---this is visible in the upper envelope of the scatter plot in
\figrf{mtriscat}, which is a 3-D scatter plot of three of the information
measures discussed in \secrf{process-info} for several thousand first-order
Markov chain transition matrices generated by a random sampling method. The
coordinates of the `information space' are entropy rate ($h_\mu$), redundancy
($\rho_\mu$), and predictive information rate ($b_\mu$). The points along the
`redundancy' axis correspond to periodic Markov chains. Those along the
`entropy' axis produce uncorrelated sequences with no temporal structure.
Processes with high PIR are to be found at intermediate levels of entropy and
redundancy. These observations led us to construct the `Melody Triangle', a
graphical interface for exploring the melodic patterns generated by each of
the Markov chains represented as points in \figrf{mtriscat}.
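Purely to illustrate how such a population of transition matrices can be
generated and mapped into the information space (the Dirichlet sampling
scheme below is an assumption made for the example, not necessarily the
method used in \cite{AbdallahPlumbley2009}), one might write, reusing the
\texttt{info\_measures} helper from the earlier sketch:
\begin{lstlisting}[language=Python]
import numpy as np
# info_measures(a) is the helper from the earlier Markov chain sketch.

def random_transition_matrix(K, alpha, rng):
    """K x K column-stochastic matrix with Dirichlet(alpha) columns; small
    alpha gives sparse, more structured chains."""
    return rng.dirichlet(alpha * np.ones(K), size=K).T

rng = np.random.default_rng(0)
points = [info_measures(random_transition_matrix(8, rng.uniform(0.05, 2.0), rng))
          for _ in range(5000)]
h, rho, b = map(np.array, zip(*points))
# A 3-D scatter of (h, rho, b), coloured by b, gives a picture of the kind
# shown in the mtriscat figure.
\end{lstlisting}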
\begin{fig}{wundt}
  \raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
  {\ {\large$\longrightarrow$}\ }
  \raisebox{-4em}{\colfig[0.43]{wundt2}}
  \caption{
    The Wundt curve relating randomness/complexity with perceived value.
    Repeated exposure sometimes results in a move to the left along the
    curve \cite{Berlyne71}.
  }
\end{fig}

\subsection{The Melody Triangle}
The Melody Triangle is an interface for the discovery of melodic materials,
where the input---a position within a triangle---directly maps to
information-theoretic properties of the output.
%The measures---entropy rate, redundancy and
%predictive information rate---form a criteria with which to filter the output
%of the stochastic processes used to generate sequences of notes.
%These measures
%address notions of expectation and surprise in music, and as such the Melody
%Triangle is a means of interfacing with a generative process in terms of the
%predictability of its output.
The triangle is populated with first order Markov chain transition matrices
as illustrated in \figrf{mtriscat}. The distribution of transition matrices
in this space forms a relatively thin curved sheet. Thus, it is a reasonable
simplification to project out the third dimension (the PIR) and present an
interface that is just two dimensional. The right-angled triangle is rotated,
reflected and stretched to form an equilateral triangle with the
$h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the
left-hand side, and the `entropy rate' axis down the right, as shown in
\figrf{TheTriangle}. This is our `Melody Triangle' and forms the interface by
which the system is controlled.
%Using this interface thus involves a mapping to information space;
The user selects a point within the triangle; this is mapped into the
information space, and the nearest transition matrix is used to generate a
sequence of values which are then sonified either as pitched notes or
percussive sounds. By choosing the position within the triangle, the user can
control the output at the level of its `collative' properties, with access to
the variety of patterns described above and in \secrf{markov}.
%and information-theoretic criteria related to predictability
%and information flow
Though the interface is 2D, the third dimension (PIR) is implicitly present,
as transition matrices retrieved from along the centre line of the triangle
will tend to have higher PIR. We hypothesise that, under the appropriate
conditions, these will be perceived as more `interesting' or `melodic.'
%The corners correspond to three different extremes of predictability and
%unpredictability, which could be loosely characterised as `periodicity', `noise'
%and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$
%and $b_\mu$) have no discernible pattern; those along the `periodicity'
%to `repetition' edge are all cyclic patterns that get shorter as we approach
%the `repetition' corner, until each is just one repeating note. Those along the
%opposite edge consist of independent random notes from non-uniform distributions.
%These melodies have some level of unpredictability, but are not completely random.
% Or, conversely, are predictable, but not entirely so.

The Melody Triangle exists in two incarnations: a screen-based interface
where a user moves tokens in and around a triangle on screen, and a
multi-user interactive installation where a Kinect camera tracks individuals
in a space and maps their positions in physical space to the triangle.
In the latter, each visitor that enters the installation generates a melody
and can collaborate with their co-visitors to generate musical textures. This
makes the interaction physically engaging and (as our experience with
visitors both young and old has demonstrated) more playful.
%Additionally visitors can change the
%tempo, register, instrumentation and periodicity of their melody with body gestures.

\begin{fig}{mtriscat}
  \colfig[0.9]{mtriscat}
  \caption{The population of transition matrices in the 3D space of entropy
  rate ($h_\mu$), redundancy ($\rho_\mu$) and PIR ($b_\mu$), all in bits.
  The concentrations of points along the redundancy axis correspond to
  Markov chains which are roughly periodic with periods of 2 (redundancy 1
  bit), 3, 4, \etc all the way to period 7 (redundancy 2.8 bits). The colour
  of each point represents its PIR---note that the highest values are found
  at intermediate entropy and redundancy, and that the distribution as a
  whole makes a curved triangle. Although not visible in this plot, it is
  largely hollow in the middle.}
\end{fig}

The screen-based interface can serve as a compositional tool.
%%A triangle is drawn on the screen, screen space thus mapped to the statistical
%space of the Melody Triangle.
A number of tokens, each representing a sonification stream or `voice', can
be dragged in and around the triangle. For each token, a sequence of symbols
is sampled using the corresponding transition matrix. These symbols are then
mapped to notes of a scale or percussive sounds%
\footnote{The sampled sequence could easily be mapped to other musical
processes, possibly over different time scales, such as chords, dynamics and
timbres. It would also be possible to map the symbols to visual or other
outputs.}%
. Keyboard commands give control over other musical parameters such as pitch
register and inter-onset interval.
%The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.
The system is capable of generating quite intricate musical textures when
multiple tokens are in the triangle, but unlike other computer-aided
composition tools or programming environments, the composer exercises
control at the abstract level of information-dynamic properties.
%the interface relating to subjective expectation and predictability.

\begin{fig}{TheTriangle}
  \colfig[0.7]{TheTriangle.pdf}
  \caption{The Melody Triangle}
\end{fig}
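The core of the mapping from a token position to a melody can be sketched as
follows (hypothetical Python, assuming \texttt{numpy}; \texttt{coords} and
\texttt{matrices} stand for the population of sampled matrices and their
information-space coordinates from the earlier sketch, and the scale mapping
is an assumption---the actual system also handles register, timing and
multiple voices):
\begin{lstlisting}[language=Python]
import numpy as np
# `matrices` is a list of sampled transition matrices and `coords` an array
# of their (h_mu, rho_mu) coordinates; both names are assumptions.

def nearest_matrix(point, coords, matrices):
    """Transition matrix whose (h, rho) coordinates lie closest to the point
    selected in the triangle."""
    return matrices[int(np.argmin(np.sum((coords - point) ** 2, axis=1)))]

def sample_sequence(a, length, rng):
    """Sample a symbol sequence from a column-stochastic transition matrix."""
    x, seq = rng.integers(a.shape[0]), []
    for _ in range(length):
        x = rng.choice(a.shape[0], p=a[:, x])
        seq.append(int(x))
    return seq

scale = [60, 62, 64, 65, 67, 69, 71, 72]       # MIDI notes of a major scale
rng = np.random.default_rng(1)
a = nearest_matrix(np.array([1.2, 0.8]), coords, matrices)
notes = [scale[s % len(scale)] for s in sample_sequence(a, 32, rng)]
\end{lstlisting}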
\comment{
\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?
Information measures on a stream of symbols can form a feedback mechanism; a
rudimentary `critic' of sorts. For instance, a symbol-by-symbol measure of
predictive information rate, entropy rate and redundancy could tell us if a
stream of symbols is currently `boring', either because it is too repetitive,
or because it is too chaotic. Such feedback would be oblivious to long term
and large scale structures and any cultural norms (such as style
conventions), but nonetheless could provide a composer with valuable insight
on the short term properties of a work. This could not only be used for the
evaluation of pre-composed streams of symbols, but could also provide
real-time feedback in an improvisatory setup.
}

\subsection{User trials with the Melody Triangle}
We are currently in the process of using the screen-based Melody Triangle
user interface to investigate the relationship between the
information-dynamic characteristics of sonified Markov chains and subjective
musical preference. We carried out a pilot study with six participants, who
were asked to use a simplified form of the user interface (a single
controllable token, and no rhythmic, registral or timbral controls) under
two conditions: one where a single sequence was sonified under user control,
and another where an additional sequence was sonified in a different
register, as if generated by a fixed invisible token in one of four regions
of the triangle. In addition, subjects were asked to press a key if they
`liked' what they were hearing. We recorded subjects' behaviour as well as
the points which they marked with a key press. Some results for two of the
subjects are shown in \figrf{mtri-results}.

Though we have not been able to detect any systematic across-subjects
preference for any particular region of the triangle, subjects do seem to
exhibit distinct kinds of exploratory behaviour. Our initial hypothesis, that
subjects would linger longer in regions of the triangle that produced
aesthetically preferable sequences, and that this would tend to be towards
the centre line of the triangle for all subjects, was not confirmed. However,
it is possible that the design of the experiment encouraged an initial
exploration of the space (sometimes very systematic, as for subject c) aimed
at \emph{understanding}
%the parameter space and
how the system works, rather than finding musical patterns. It is also
possible that the system encourages users to create musically interesting
output by \emph{moving the token}, rather than finding a particular spot in
the triangle which produces a musically interesting sequence by itself.

\begin{fig}{mtri-results}
  \def\scat#1{\colfig[0.42]{mtri/#1}}
  \def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
  \begin{tabular}{cc}
%   \subj{a} \\
%   \subj{b} \\
    \subj{c} \\
    \subj{d}
  \end{tabular}
  \caption{Dwell times and mark positions from user trials with the
  on-screen Melody Triangle interface, for two subjects. The left-hand
  column shows the positions in a 2D information space (entropy rate vs
  multi-information rate in bits) where each spent their time; the area of
  each circle is proportional to the time spent there. The right-hand column
  shows the points which subjects `liked'; the area of the circles here is
  proportional to the duration spent at that point before the point was
  marked.}
\end{fig}

Comments collected from the subjects
%during and after the experiment
suggest that the information-dynamic characteristics of the patterns were
readily apparent to most: several noticed the main organisation of the
triangle, with repetitive notes at the top, cyclic patterns along one edge,
and unpredictable notes towards the opposite corner. Some described their
systematic exploration of the space. Two felt that the right side was `more
controllable' than the left (a consequence of their ability to return to a
particular distinctive pattern and recognise it as one heard previously).
Two reported that they became bored towards the end, but another felt there
wasn't enough time to `hear out' the patterns properly. One subject did not
`enjoy' the patterns in the lower region, but another said the lower central
regions were more `melodic' and `interesting'.
We plan to continue the trials with a slightly less restricted user interface
in order to make the experience more enjoyable and thereby give subjects
longer to use the interface; this may allow them to get beyond the initial
exploratory phase and give a clearer picture of their aesthetic preferences.
In addition, we plan to conduct a study under more restrictive conditions,
where subjects will have no control over the patterns other than to signal
(a) which of two alternatives they prefer in a forced choice paradigm, and
(b) when they are bored of listening to a given sequence.

%\emph{comparable system} Gordon Pask's Musicolor (1953) applied a similar notion
%of boredom in its design. The Musicolour would react to audio input through a
%microphone by flashing coloured lights. Rather than a direct mapping of sound
%to light, Pask designed the device to be a partner to a performing musician. It
%would adapt its lighting pattern based on the rhythms and frequencies it would
%hear, quickly `learning' to flash in time with the music. However Pask endowed
%the device with the ability to `be bored'; if the rhythmic and frequency content
%of the input remained the same for too long it would listen for other rhythms
%and frequencies, only lighting when it heard these. As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.

\section{Conclusions}
% !!! FIXME
%We reviewed our information dynamics approach to the modelling of the perception
We have looked at several emerging areas of application of the methods and
ideas of information dynamics to various problems in music analysis,
perception and cognition, including musicological analysis of symbolic music,
audio analysis, rhythm processing, and compositional and creative tasks. The
approach has proved successful in musicological analysis, and though our
initial data on rhythm processing and aesthetic preference are inconclusive,
there is still plenty of work to be done in this area: wherever there are
probabilistic models, information dynamics can shed light on their behaviour.

\section*{Acknowledgments}
This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
GR/S82213/01 and EP/E045235/1 (SA), an EPSRC DTA Studentship (PF), an
RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship
EP/G007144/1 (MDP) and EPSRC IDyOM2 EP/H013059/1.
This work is partly funded by the CoSound project, funded by the Danish
Agency for Science, Technology and Innovation. Thanks also to Marcus Pearce
for providing the two rule-based analyses of \emph{Two Pages}.

\bibliographystyle{IEEEtran}
{\bibliography{all,c4dm,nime,andrew}}

\end{document}