changeset 73:56508a08924a

Camera ready version.
author samer
date Mon, 16 Apr 2012 15:33:42 +0100
parents 9135f6fb1a68
children 90901fd611d1
files final.pdf final.tex
diffstat 2 files changed, 1236 insertions(+), 0 deletions(-)
Binary file final.pdf has changed
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/final.tex	Mon Apr 16 15:33:42 2012 +0100
@@ -0,0 +1,1236 @@
+\documentclass[conference]{IEEEtran}
+\usepackage{fixltx2e}
+\usepackage{cite}
+\usepackage[spacing]{microtype}
+\usepackage[cmex10]{amsmath}
+\usepackage{graphicx}
+\usepackage{amssymb}
+\usepackage{epstopdf}
+\usepackage{url}
+\usepackage{listings}
+%\usepackage[expectangle]{tools}
+\usepackage{tools}
+\usepackage{tikz}
+\usetikzlibrary{calc}
+\usetikzlibrary{matrix}
+\usetikzlibrary{patterns}
+\usetikzlibrary{arrows}
+
+\let\citep=\cite
+\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
+\newcommand\preals{\reals_+}
+\newcommand\X{\mathcal{X}}
+\newcommand\Y{\mathcal{Y}}
+\newcommand\domS{\mathcal{S}}
+\newcommand\A{\mathcal{A}}
+\newcommand\Data{\mathcal{D}}
+\newcommand\rvm[1]{\mathrm{#1}}
+\newcommand\sps{\,.\,}
+\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
+\newcommand\Ix{\mathcal{I}}
+\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
+\newcommand\x{\vec{x}}
+\newcommand\Ham[1]{\mathcal{H}_{#1}}
+\newcommand\subsets[2]{[#1]^{(k)}}
+\def\bet(#1,#2){#1..#2}
+
+
+\def\ev(#1=#2){#1\!\!=\!#2}
+\newcommand\rv[1]{\Omega \to #1}
+\newcommand\ceq{\!\!=\!}
+\newcommand\cmin{\!-\!}
+\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}
+
+\newcommand\sumitoN{\sum_{i=1}^N}
+\newcommand\sumktoK{\sum_{k=1}^K}
+\newcommand\sumjtoK{\sum_{j=1}^K}
+\newcommand\sumalpha{\sum_{\alpha\in\A}}
+\newcommand\prodktoK{\prod_{k=1}^K}
+\newcommand\prodjtoK{\prod_{j=1}^K}
+
+\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
+\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
+\newcommand\parity[2]{P^{#1}_{2,#2}}
+\newcommand\specint[1]{\frac{1}{2\pi}\int_{-\pi}^\pi #1{S(\omega)} \dd \omega}
+%\newcommand\specint[1]{\int_{-1/2}^{1/2} #1{S(f)} \dd f}
+
+
+%\usepackage[parfill]{parskip}
+
+\begin{document}
+\title{Cognitive Music Modelling: an\\Information Dynamics Approach}
+
+\author{
+	\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
+	\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
+	\IEEEauthorblockA{Centre for Digital Music\\
+		Queen Mary University of London\\
+		Mile End Road, London E1 4NS}}
+
+\maketitle
+\begin{abstract}
+	We describe an information-theoretic approach to the analysis
+	of music and other sequential data, which emphasises the predictive aspects
+	of perception, and the dynamic process
+	of forming and modifying expectations about an unfolding stream of data,
+	characterising these using the tools of information theory: entropies,
+	mutual informations, and related quantities.
+	After reviewing the theoretical foundations, 
+%	we present a new result on predictive information rates in high-order Markov chains, and 
+	we discuss a few emerging areas of application, including
+	musicological analysis, real-time beat-tracking analysis, and the generation
+	of musical materials as a cognitively-informed compositional aid.
+\end{abstract}
+
+
+\section{Introduction}
+\label{s:Intro}
+	The relationship between
+	Shannon's \cite{Shannon48} information theory and music and art in general has been the
+	subject of some interest since the 1950s 
+	\cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}. 
+	The general thesis is that perceptible qualities and subjective states
+	like uncertainty, surprise, complexity, tension, and interestingness
+	are closely related to information-theoretic quantities like
+	entropy, relative entropy, and mutual information.
+
+	Music is also an inherently dynamic process, 
+	where listeners build up expectations about what is to happen next,
+	which may be fulfilled
+	immediately, after some delay, or modified as the music unfolds.
+	In this paper, we explore this ``Information Dynamics'' view of music,
+	discussing the theory behind it and some emerging applications.
+
+	\subsection{Expectation and surprise in music}
+	The idea that the musical experience is strongly shaped by the generation
+	and playing out of strong and weak expectations was put forward by, amongst others, 
+	music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
+	recognised much earlier; for example, 
+	it was elegantly put by Hanslick \cite{Hanslick1854} in the
+	nineteenth century:
+	\begin{quote}
+			`The most important factor in the mental process which accompanies the
+			act of listening to music, and which converts it to a source of pleasure, 
+			is \ldots the intellectual satisfaction 
+			which the listener derives from continually following and anticipating 
+			the composer's intentions---now, to see his expectations fulfilled, and 
+			now, to find himself agreeably mistaken.'
+			%It is a matter of course that 
+			%this intellectual flux and reflux, this perpetual giving and receiving 
+			%takes place unconsciously, and with the rapidity of lightning-flashes.'
+	\end{quote}
+	An essential aspect of this is that music is experienced as a phenomenon
+	that unfolds in time, rather than being apprehended as a static object
+	presented in its entirety. Meyer argued that the experience depends
+	on how we change and revise our conceptions \emph{as events happen}, on
+	how expectation and prediction interact with occurrence, and that, to a
+	large degree, the way to understand the effect of music is to focus on
+	this `kinetics' of expectation and surprise.
+
+  Prediction and expectation are essentially probabilistic concepts
+  and can be treated mathematically using probability theory.
+  We suppose that when we listen to music, expectations are created on the basis 
+	of our familiarity with various styles of music and our ability to
+	detect and learn statistical regularities in the music as they emerge.
+	There is experimental evidence that human listeners are able to internalise
+	statistical knowledge about musical structure, \eg
+%	\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
+	\citep{SaffranJohnsonAslin1999}, and also
+	that statistical models can form an effective basis for computational
+	analysis of music, \eg
+	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
+
+%	\subsection{Music and information theory}
+
+%	With a probabilistic framework for music modelling and prediction in hand,
+%	we can %are in a position to 
+%	compute various
+\comment{
+	which provides us with a number of measures, such as entropy
+  and mutual information, which are suitable for quantifying states of
+  uncertainty and surprise, and thus could potentially enable us to build
+  quantitative models of the listening process described above.  They are
+  what Berlyne \cite{Berlyne71} called `collative variables' since they are
+  to do with patterns of occurrence rather than medium-specific details.
+  Berlyne sought to show that the collative variables are closely related to
+  perceptual qualities like complexity, tension, interestingness,
+  and even aesthetic value, not just in music, but in other temporal
+  or visual media.
+  The relevance of information theory to music and art has
+  also been addressed by researchers from the 1950s onwards
+  \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
+}
+%	information-theoretic quantities like entropy, relative entropy,
+%	and mutual information.
+%	and are major determinants of the overall experience.
+%	Berlyne's `new experimental aesthetics', the `information-aestheticians'.
+
+%	Listeners then experience greater or lesser levels of surprise
+%	in response to departures from these norms. 
+%	By careful manipulation
+%	of the material, the composer can thus define, and induce within the
+%	listener, a temporal programme of varying
+%	levels of uncertainty, ambiguity and surprise. 
+
+
+\subsection{Information dynamic approach}
+	Our working hypothesis is that, as an intelligent, predictive
+	agent (to which we will refer as `it') listens to a piece of music, it maintains
+	a dynamically evolving probabilistic belief state that enables it to make predictions
+	about how the piece will continue, relying on both its previous experience
+	of music and the emerging themes of the piece.  As events unfold, it revises
+	this belief state, which includes predictive
+	distributions over possible future events.  These 
+%	distributions and changes in distributions 
+	can be characterised in terms of a handful of information-theoretic
+	measures such as entropy and relative entropy,
+	which Berlyne \cite{Berlyne71} called `collative variables', since
+	they are to do with \emph{patterns} of occurrence rather than the details
+	of which specific things occur; Berlyne developed these ideas of
+	`information aesthetics' in an experimental setting.
+	By tracing the 
+	evolution of these measures, we obtain a representation which captures much
+	of the significant structure of the music.
+	
+%	In addition, when adaptive probabilistic models are used, expectations are
+%	created mainly in response to \emph{patterns} of occurence, 
+%	rather the details of which specific things occur.
+	One consequence of this approach is that regardless of the details of
+	the sensory input or even which sensory modality is being processed, the resulting 
+	analysis is in terms of the same units: quantities of information (bits) and
+	rates of information flow (bits per second). The information-theoretic
+	concepts in terms of which the analysis is framed are universal to all sorts 
+	of data.
+	Together, these suggest that an information dynamic analysis captures a
+	high level of \emph{abstraction}, and could be used to 
+	make structural comparisons between different temporal media,
+	such as music, film, animation, and dance.
+%	analyse and compare information 
+%	flow in different temporal media regardless of whether they are auditory, 
+%	visual or otherwise. 
+
+	Another consequence is that the information dynamic approach gives us a principled way
+	to address the notion of \emph{subjectivity}, since the analysis is dependent on the 
+	probability model the observer starts off with, which may depend on prior experience 
+	or other factors, and which may change over time. Thus, inter-subject variability and 
+	variation in subjects' responses over time are 
+	fundamental to the theory. 
+			
+	%modelling the creative process, which often alternates between generative
+	%and selective or evaluative phases \cite{Boden1990}, and would have
+	%applications in tools for computer aided composition.
+
+
+\section{Theoretical review}
+
+	\subsection{Entropy and information}
+	\label{s:entro-info}
+
+	Let $X$ denote some variable whose value is initially unknown to our 
+	hypothetical observer. We will treat $X$ mathematically as a random variable,
+	with a value to be drawn from some set $\X$ and a 
+	probability distribution representing the observer's beliefs about the 
+	true value of $X$.
+	In this case, the observer's uncertainty about $X$ can be quantified
+	as the entropy of the random variable $H(X)$. For a discrete variable
+	with probability mass function $p:\X \to [0,1]$, this is
+	\begin{equation}
+		H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
+	\end{equation}
+%	where $\expect{}$ is the expectation operator. 
+	The negative-log-probability
+	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
+	the \emph{surprisingness} of the value $x$ should it be observed, and
+	hence the entropy is the expectation of the surprisingness, $\expect \ell(X)$.
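+
+	As a concrete illustration (a minimal numerical sketch, not part of the
+	formal development; the belief distribution is invented), the surprisingness
+	of each outcome and the entropy as its expectation can be computed as follows:
+	\begin{lstlisting}[language=Python]
+import numpy as np
+
+# A hypothetical belief distribution over four possible values of X.
+p = np.array([0.5, 0.25, 0.125, 0.125])
+
+# Surprisingness (negative log-probability) of each value, in bits.
+surprisingness = -np.log2(p)
+
+# Entropy = expected surprisingness.
+H = np.sum(p * surprisingness)
+print(surprisingness)  # [1. 2. 3. 3.]
+print(H)               # 1.75 bits
+	\end{lstlisting}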
+
+	Now suppose that the observer receives some new data $\Data$ that
+	causes a revision of its beliefs about $X$. The \emph{information}
+	in this new data \emph{about} $X$ can be quantified as the 
+	relative entropy or
+	Kullback-Leibler (KL) divergence between the prior and posterior
+	distributions $p(x)$ and $p(x|\Data)$ respectively:
+	\begin{equation}
+		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
+			= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
+			\label{eq:info}
+	\end{equation}
+	When there are multiple variables $X_1, X_2$ 
+	\etc which the observer believes to be dependent, then the observation of 
+	one may change its beliefs and hence yield information about the
+	others. The joint and conditional entropies as described in any
+	textbook on information theory (\eg \cite{CoverThomas}) then quantify
+	the observer's expected uncertainty about groups of variables given the
+	values of others. In particular, the \emph{mutual information}
+	$I(X_1;X_2)$ is both the expected information
+	in an observation of $X_2$ about $X_1$ and the expected reduction
+	in uncertainty about $X_1$ after observing $X_2$:
+	\begin{equation}
+		I(X_1;X_2) = H(X_1) - H(X_1|X_2),
+	\end{equation}
+	where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
+	of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
+	and so the mutual information is symmetric in its arguments. A conditional
+	form of the mutual information can be formulated analogously:
+	\begin{equation}
+		I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
+	\end{equation}
+	These relationships between the various entropies and mutual
+	informations are conveniently visualised in \emph{information diagrams}
+	or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
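+
+	To make these definitions concrete, the following sketch (with an invented
+	prior, posterior and joint distribution) evaluates the information of
+	\eqrf{info} and a mutual information directly from their definitions:
+	\begin{lstlisting}[language=Python]
+import numpy as np
+
+def kl(p, q):
+    """KL divergence D(p||q) in bits, for strictly positive distributions."""
+    return np.sum(p * np.log2(p / q))
+
+# Information in data D about X: KL from prior to posterior beliefs.
+prior = np.array([0.5, 0.25, 0.125, 0.125])
+posterior = np.array([0.1, 0.1, 0.4, 0.4])
+info_in_data = kl(posterior, prior)
+
+# Mutual information I(X1;X2) computed from a joint distribution.
+joint = np.array([[0.3, 0.1],      # rows index X1, columns index X2
+                  [0.1, 0.5]])
+p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
+mi = np.sum(joint * np.log2(joint / np.outer(p1, p2)))
+print(info_in_data, mi)
+	\end{lstlisting}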
+
+	\begin{fig}{venn-example}
+		\newcommand\rad{2.2em}%
+		\newcommand\circo{circle (3.4em)}%
+		\newcommand\labrad{4.3em}
+		\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
+		\newcommand\colsep{\ }
+		\newcommand\clipin[1]{\clip (#1) \circo;}%
+		\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
+		\newcommand\cliptwo[3]{%
+			\begin{scope}
+				\clipin{#1};
+				\clipin{#2};
+				\clipout{#3};
+				\fill[black!30] \bound;
+			\end{scope}
+		}%
+		\newcommand\clipone[3]{%
+			\begin{scope}
+				\clipin{#1};
+				\clipout{#2};
+				\clipout{#3};
+				\fill[black!15] \bound;
+			\end{scope}
+		}%
+		\begin{tabular}{c@{\colsep}c}
+			\scalebox{0.9}{%
+			\begin{tikzpicture}[baseline=0pt]
+				\coordinate (p1) at (90:\rad);
+				\coordinate (p2) at (210:\rad);
+				\coordinate (p3) at (-30:\rad);
+				\clipone{p1}{p2}{p3};
+				\clipone{p2}{p3}{p1};
+				\clipone{p3}{p1}{p2};
+				\cliptwo{p1}{p2}{p3};
+				\cliptwo{p2}{p3}{p1};
+				\cliptwo{p3}{p1}{p2};
+            \begin{scope}
+               \clip (p1) \circo;
+               \clip (p2) \circo;
+               \clip (p3) \circo;
+               \fill[black!45] \bound;
+            \end{scope}
+				\draw (p1) \circo;
+				\draw (p2) \circo;
+				\draw (p3) \circo;
+				\path 
+					(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
+					(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
+					(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
+					(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
+					(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
+					(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
+					(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
+					;
+				\path
+					(p1) +(140:\labrad) node {$X_1$}
+					(p2) +(-140:\labrad) node {$X_2$}
+					(p3) +(-40:\labrad) node {$X_3$};
+			\end{tikzpicture}%
+			}
+			&
+			\parbox{0.5\linewidth}{
+				\small
+				\begin{align*}
+					I_{1|23} &= H(X_1|X_2,X_3) \\
+					I_{13|2} &= I(X_1;X_3|X_2) \\
+					I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
+					I_{12|3} + I_{123} &= I(X_1;X_2) 
+				\end{align*}
+			}
+		\end{tabular}
+		\caption{
+		I-diagram of entropies and mutual informations
+		for three random variables $X_1$, $X_2$ and $X_3$. The areas of 
+		the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
+		The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
+		The central area $I_{123}$ is the co-information \cite{McGill1954}.
+		Some other information measures are indicated in the legend.
+		}
+	\end{fig}
+
+
+	\subsection{Surprise and information in sequences}
+	\label{s:surprise-info-seq}
+
+	Suppose that  $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
+	random variables, infinite in both directions, 
+	and that $\mu$ is the associated probability measure over all 
+	realisations of the sequence. In the following, $\mu$ will simply serve
+	as a label for the process. We can identify a number of information-theoretic
+	measures meaningful in the context of a sequential observation of the sequence, during
+	which, at any time $t$, the sequence can be divided into a `present' $X_t$, a `past' 
+	$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' 
+	$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
+	We will write the actually observed value of $X_t$ as $x_t$, and
+	the sequence of observations up to but not including $x_t$ as 
+	$\past{x}_t$.
+%	Since the sequence is assumed stationary, we can without loss of generality,
+%	assume that $t=0$ in the following definitions.
+
+	The in-context surprisingness of the observation $X_t=x_t$ depends on 
+	both $x_t$ and the context $\past{x}_t$:
+	\begin{equation}
+		\ell_t = - \log p(x_t|\past{x}_t).
+	\end{equation}
+	However, before $X_t$ is observed, the observer can compute
+	the \emph{expected} surprisingness as a measure of its uncertainty about
+	$X_t$; this may be written as an entropy 
+	$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
+	conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the
+	\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.
+
+	The surprisingness $\ell_t$ and expected surprisingness
+	$H(X_t|\ev(\past{X}_t=\past{x}_t))$
+	can be understood as \emph{subjective} information dynamic measures, since they are 
+	based on the observer's probability model in the context of the actually observed sequence
+	$\past{x}_t$. They characterise what it is like to be `in the observer's shoes'.
+	If we view the observer as a purely passive or reactive agent, this would
+	probably be sufficient, but for active agents such as humans or animals, it is
+	often necessary to \emph{anticipate} future events in order, for example, to plan the
+	most effective course of action. It makes sense for such observers to be
+	concerned about the predictive probability distribution over future events,
+	$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
+	the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
+	is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
+	\emph{given} the observed past $\past{X}_t=\past{x}_t$.
+	Referring to the definition of information \eqrf{info}, this is the KL divergence
+	between prior and posterior distributions over possible futures, which written out in full, is
+	\begin{equation}
+		\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*} 
+					p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
+	\end{equation}
+	where the sum is to be taken over the set of infinite sequences $\X^*$.
+	Note that it is quite possible for an event to be surprising but not informative
+	in a predictive sense.
+	As with the surprisingness, the observer can compute its \emph{expected} IPI
+	at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
+	conditioned on the observed past. This could be used, for example, as an estimate
+	of attentional resources which should be directed at this stream of data, which may
+	be in competition with other sensory streams.
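+
+	As a simple worked case (a sketch only; the transition matrix is invented),
+	if the observer's model is a fixed first order Markov chain, the predictive
+	distribution over the entire future depends only on the most recent symbol,
+	and the IPI reduces to a KL divergence between next-symbol distributions:
+	\begin{lstlisting}[language=Python]
+import numpy as np
+
+# Column-stochastic transition matrix: a[i, j] = Pr(X_t = i | X_{t-1} = j).
+a = np.array([[0.7, 0.2, 0.1],
+              [0.2, 0.6, 0.3],
+              [0.1, 0.2, 0.6]])
+a2 = a @ a                       # two-step transition probabilities
+
+def surprisingness(x_t, x_prev):
+    """In-context surprisingness -log p(x_t | x_{t-1}), in bits."""
+    return -np.log2(a[x_t, x_prev])
+
+def ipi(x_t, x_prev):
+    """IPI for a first order Markov observer: KL between next-symbol
+    beliefs after and before observing x_t."""
+    post = a[:, x_t]             # p(x_{t+1} | x_t)
+    prior = a2[:, x_prev]        # p(x_{t+1} | x_{t-1}), with x_t unknown
+    return np.sum(post * np.log2(post / prior))
+
+print(surprisingness(2, 0), ipi(2, 0))
+	\end{lstlisting}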
+
+	\subsection{Information measures for stationary random processes}
+	\label{s:process-info}
+
+
+ 	\begin{fig}{predinfo-bg}
+		\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
+		\newcommand\rad{2em}%
+		\newcommand\ovoid[1]{%
+			++(-#1,\rad) 
+			-- ++(2 * #1,0em) arc (90:-90:\rad)
+ 			-- ++(-2 * #1,0em) arc (270:90:\rad) 
+		}%
+		\newcommand\axis{2.75em}%
+		\newcommand\olap{0.85em}%
+		\newcommand\offs{3.6em}
+		\newcommand\colsep{\hspace{5em}}
+		\newcommand\longblob{\ovoid{\axis}}
+		\newcommand\shortblob{\ovoid{1.75em}}
+		\begin{tabular}{c}
+\comment{
+			\subfig{(a) multi-information and entropy rates}{%
+				\begin{tikzpicture}%[baseline=-1em]
+					\newcommand\rc{1.75em}
+					\newcommand\throw{2.5em}
+					\coordinate (p1) at (180:1.5em);
+					\coordinate (p2) at (0:0.3em);
+					\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
+					\newcommand\present{(p2) circle (\rc)}
+					\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
+					\newcommand\fillclipped[2]{%
+						\begin{scope}[even odd rule]
+							\foreach \thing in {#2} {\clip \thing;}
+							\fill[black!#1] \bound;
+						\end{scope}%
+					}%
+					\fillclipped{30}{\present,\bound \thepast}
+					\fillclipped{15}{\present,\bound \thepast}
+					\fillclipped{45}{\present,\thepast}
+					\draw \thepast;
+					\draw \present;
+					\node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
+					\node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
+					\path (p2) +(90:3em) node {$X_0$};
+					\path (p1) +(-3em,0em) node  {\shortstack{infinite\\past}};
+					\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
+				\end{tikzpicture}}%
+			\\[1em]
+			\subfig{(a) excess entropy}{%
+				\newcommand\blob{\longblob}
+				\begin{tikzpicture}
+					\coordinate (p1) at (-\offs,0em);
+					\coordinate (p2) at (\offs,0em);
+					\begin{scope}
+						\clip (p1) \blob;
+						\clip (p2) \blob;
+						\fill[lightgray] (-1,-1) rectangle (1,1);
+					\end{scope}
+					\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
+					\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
+					\path (0,0) node (future) {$E$};
+					\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
+					\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
+				\end{tikzpicture}%
+			}%
+			\\[1em]
+}
+%			\subfig{(b) predictive information rate $b_\mu$}{%
+				\begin{tikzpicture}%[baseline=-1em]
+					\newcommand\rc{2.2em}
+					\newcommand\throw{2.5em}
+					\coordinate (p1) at (210:1.5em);
+					\coordinate (p2) at (90:0.8em);
+					\coordinate (p3) at (-30:1.5em);
+					\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
+					\newcommand\present{(p2) circle (\rc)}
+					\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
+					\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
+					\newcommand\fillclipped[2]{%
+						\begin{scope}[even odd rule]
+							\foreach \thing in {#2} {\clip \thing;}
+							\fill[black!#1] \bound;
+						\end{scope}%
+					}%
+%					\fillclipped{80}{\future,\thepast}
+					\fillclipped{30}{\present,\future,\bound \thepast}
+					\fillclipped{15}{\present,\bound \future,\bound \thepast}
+					\draw \future;
+					\fillclipped{45}{\present,\thepast}
+					\draw \thepast;
+					\draw \present;
+					\node at (barycentric cs:p2=0.9,p1=-0.17,p3=-0.17) {$r_\mu$};
+					\node at (barycentric cs:p1=-0.5,p2=1.0,p3=1) {$b_\mu$};
+					\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
+					\path (p2) +(140:3.2em) node {$X_0$};
+	%            \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
+					\path (p3) +(3em,0em) node  {\shortstack{infinite\\future}};
+					\path (p1) +(-3em,0em) node  {\shortstack{infinite\\past}};
+					\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
+					\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
+				\end{tikzpicture}%}%
+%				\\[0.25em]
+		\end{tabular}
+		\caption{
+		I-diagram illustrating several information measures in
+		stationary random processes. Each circle or oval represents a random
+		variable or sequence of random variables relative to time $t=0$. Overlapped areas
+		correspond to various mutual informations.
+		The circle represents the `present'. Its total area is
+		$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
+		rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
+		information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. 
+%		The small dark
+%		region  below $X_0$ is $\sigma_\mu$ and the excess entropy 
+%		is $E = \rho_\mu + \sigma_\mu$.
+		}
+	\end{fig}
+
+	If we step back, out of the observer's shoes as it were, and consider the
+	random process $(\ldots,X_{-1},X_0,X_1,\dots)$ as a statistical ensemble of 
+	possible realisations, and furthermore assume that it is stationary,
+	then it becomes possible to define a number of information-theoretic measures,
+	closely related to those described above, but which characterise the
+	process as a whole, rather than on a moment-by-moment basis. Some of these,
+	such as the entropy rate, are well-known, but others have only recently
+	been investigated. (In the following, the assumption of stationarity means that
+	the measures defined below are independent of $t$.)
+
+	The \emph{entropy rate} of the process is the entropy of the `present'
+	$X_t$ given the `past':
+	\begin{equation}
+		\label{eq:entro-rate}
+		h_\mu = H(X_t|\past{X}_t).
+	\end{equation}
+	The entropy rate is a measure of the overall surprisingness
+	or unpredictability of the process, and gives an indication of the average
+	level of surprise and uncertainty that would be experienced by an observer
+	computing the measures of \secrf{surprise-info-seq} on a sequence sampled 
+	from the process.
+
+	The \emph{multi-information rate} $\rho_\mu$ \cite{Dubnov2004}
+	is the mutual
+	information between the `past' and the `present':
+	\begin{equation}
+		\label{eq:multi-info}
+			\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
+	\end{equation}
+	It is a measure of how much the preceding context of an observation 
+	helps in predicting or reducing the surprisingness of the current observation.
+
+	The \emph{excess entropy} \cite{CrutchfieldPackard1983} 
+	is the mutual information between 
+	the entire `past' and the entire `future' plus `present':
+	\begin{equation}
+		E = I(\past{X}_t; X_t,\fut{X}_t).
+	\end{equation}
+	Both the excess entropy and the multi-information rate can be thought
+	of as measures of \emph{redundancy}, quantifying the extent to which 
+	the same information is to be found in all parts of the sequence.
+	
+
+	The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}  
+	is the mutual information between the `present' and the `future' given the 
+	`past':
+	\begin{equation}
+		\label{eq:PIR}
+		b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t),
+	\end{equation}
+	which can be read as the average reduction
+	in uncertainty about the future on learning $X_t$, given the past. 
+	Due to the symmetry of the mutual information, it can also be written
+	as 
+	\begin{equation}
+%		\IXZ_t 
+b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
+%		\label{<++>}
+	\end{equation}
+%	If $X$ is stationary, then 
+	where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$, 
+	is the \emph{residual} \cite{AbdallahPlumbley2010},
+	or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
+	The PIR gives an indication of the average IPI that would be experienced
+	by an observer processing a sequence sampled from this process.
+	The relationships between these various measures are illustrated in \Figrf{predinfo-bg};
+	see James et al \cite{JamesEllisonCrutchfield2011} for further discussion.
+%	in , along with several of the information measures we have discussed so far.
+
+\comment{
+	James et al v\cite{JamesEllisonCrutchfield2011} review several of these
+	information measures and introduce some new related ones.
+	In particular they identify the $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$, 
+	the mutual information between the past and the future given the present,
+	as an interesting quantity that measures the predictive benefit of
+	model-building, that is, maintaining an internal state summarising past 
+	observations in order to make better predictions. It is shown as the
+	small dark region below the circle in \figrf{predinfo-bg}(c).
+	By comparing with \figrf{predinfo-bg}(b), we can see that 
+	$\sigma_\mu = E - \rho_\mu$.
+}
+%	They also identify
+%	$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
+%	information} rate.
+
+
+	\subsection{First and higher order Markov chains}
+	\label{s:markov}
+%	First order Markov chains are the simplest non-trivial models to which information 
+%	dynamics methods can be applied. 
+	In \cite{AbdallahPlumbley2009} we derived
+	expressions for all the information measures described in \secrf{surprise-info-seq} for
+	ergodic first order Markov chains (\ie those that have a unique stationary
+	distribution). 
+%	The derivation is greatly simplified by the dependency structure
+%	of the Markov chain: for the purpose of the analysis, the `past' and `future'
+%	segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
+%	and next variables $X_{t-1}$ and $X_{t+1}$ respectively. 
+	We also showed that 
+	the PIR can be expressed simply in terms of entropy rates:
+	if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
+	an alphabet $\{1,\ldots,K\}$, such that
+	$a_{ij} = \Pr(\ev(X_t=i)|\ev(X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
+	the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
+	with transition matrix $a$, then the PIR is
+	\begin{equation}
+		b_\mu = h(a^2) - h(a),
+	\end{equation}
+	where $a^2$ is the transition matrix of the 
+%	`skip one' 
+	Markov chain obtained by jumping two steps at a time
+	along the original chain.
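+
+	As a sketch of the computation (the transition matrix below is invented,
+	and ergodicity is assumed), $h_\mu$, $\rho_\mu$ and $b_\mu$ for a first
+	order chain can be obtained directly from these relations:
+	\begin{lstlisting}[language=Python]
+import numpy as np
+
+def stationary(a):
+    """Stationary distribution of a column-stochastic matrix (assumes ergodicity)."""
+    vals, vecs = np.linalg.eig(a)
+    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
+    return pi / pi.sum()
+
+def entropy_rate(a, pi):
+    """h(a) = -sum_j pi_j sum_i a_ij log a_ij, in bits (0 log 0 = 0)."""
+    with np.errstate(divide='ignore', invalid='ignore'):
+        terms = np.where(a > 0, a * np.log2(a), 0.0)
+    return -np.sum(pi * terms.sum(axis=0))
+
+a = np.array([[0.7, 0.2, 0.1],
+              [0.2, 0.6, 0.3],
+              [0.1, 0.2, 0.6]])
+pi = stationary(a)
+h_mu = entropy_rate(a, pi)                    # entropy rate h(a)
+rho_mu = -np.sum(pi * np.log2(pi)) - h_mu     # multi-information rate H(X_0) - h_mu
+b_mu = entropy_rate(a @ a, pi) - h_mu         # PIR: h(a^2) - h(a)
+print(h_mu, rho_mu, b_mu)
+	\end{lstlisting}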
+
+	Second and higher order Markov chains can be treated in a similar way by transforming
+	to a first order representation of the high order Markov chain. With 
+	an $N$th order model, this is done by forming a new alphabet of size $K^N$
+	consisting of all possible $N$-tuples of symbols from the base alphabet. 
+	An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations 
+	$(x_{t+1},\ldots,x_{t+N})$ from the base model. 
+%	The next
+%	observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous 
+%	block along by one step. 
+	The new Markov chain is parameterised by a sparse $K^N\times K^N$
+	transition matrix $\hat{a}$, in terms of which the PIR is
+	\begin{equation}
+		h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
+	\end{equation}
+	where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.  
+	Other information measures can also be computed for the high-order Markov chain, including
+	the multi-information rate $\rho_\mu$ and the excess entropy $E$. (These are identical
+	for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger 
+	than $\rho_\mu$.)
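+
+	In code, the embedding can be sketched as follows for $N=2$ (the order-2
+	model is randomly generated purely for illustration, and ergodicity is assumed):
+	\begin{lstlisting}[language=Python]
+import numpy as np
+from itertools import product
+
+K, N = 3, 2
+rng = np.random.default_rng(0)
+# Invented order-2 model: p_next[i, j, k] = Pr(X_t = k | X_{t-2} = i, X_{t-1} = j).
+p_next = rng.dirichlet(np.ones(K), size=(K, K))
+
+# Embedded first order chain over all N-tuples: transitions are only possible
+# between overlapping blocks, so a_hat is sparse (K^N x K^N, column-stochastic).
+states = list(product(range(K), repeat=N))
+index = {s: n for n, s in enumerate(states)}
+a_hat = np.zeros((K**N, K**N))
+for (i, j) in states:
+    for k in range(K):
+        a_hat[index[(j, k)], index[(i, j)]] = p_next[i, j, k]
+
+def entropy_rate(a):
+    vals, vecs = np.linalg.eig(a)
+    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
+    pi = pi / pi.sum()
+    with np.errstate(divide='ignore', invalid='ignore'):
+        terms = np.where(a > 0, a * np.log2(a), 0.0)
+    return -np.sum(pi * terms.sum(axis=0))
+
+h_mu = entropy_rate(a_hat)
+b_mu = entropy_rate(np.linalg.matrix_power(a_hat, N + 1)) - N * h_mu
+print(h_mu, b_mu)
+	\end{lstlisting}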
+
+	In our experiments with visualising and sonifying sequences sampled from
+	first order Markov chains \cite{AbdallahPlumbley2009}, we found that
+	the measures $h_\mu$, $\rho_\mu$ and $b_\mu$ correspond to perceptible
+	characteristics, and that the transition matrices maximising or minimising
+	each of these quantities are quite distinct. High entropy rates are associated
+	with completely uncorrelated sequences with no recognisable temporal structure
+	(and low $\rho_\mu$ and $b_\mu$). 
+	High values of $\rho_\mu$ are associated with long periodic cycles (and low $h_\mu$
+	and $b_\mu$). High values of $b_\mu$ are associated with intermediate values
+	of $\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
+	temporal structures. These relationships are visible in \figrf{mtriscat} in
+	\secrf{composition}, where we pick up this thread again, with an application of 
+	information dynamics in a compositional aid.
+		
+
+\section{Information Dynamics in Analysis}
+
+ 	\subsection{Musicological Analysis}
+	\label{s:minimusic}
+
+	In \cite{AbdallahPlumbley2009}, we analysed two pieces of music in the minimalist style 
+	by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
+	The analysis was done using a first-order Markov chain model, with the
+	enhancement that the transition matrix of the model was allowed to
+	evolve dynamically as the notes were processed, and was tracked (in
+	a Bayesian way) as a \emph{distribution} over possible transition matrices,
+	rather than a point estimate. Some results are summarised in \figrf{twopages}:
+	the  upper four plots show the dynamically evolving subjective information
+	measures as described in \secrf{surprise-info-seq}, computed using a point
+	estimate of the current transition matrix; the fifth plot (the `model information rate')
+	shows the information in each observation about the transition matrix.
+	In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
+	is actually a component of the true IPI when the transition
+	matrix is being learned online, and was neglected when we computed the IPI from
+	the transition matrix as if it were a constant.
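+
+	As a rough illustration of the idea (not the exact scheme of
+	\cite{AbdallahPlumbley2010b}), one can place an independent Dirichlet prior
+	on each column of the transition matrix and measure, at each observed
+	transition, the KL divergence from prior to posterior over that column:
+	\begin{lstlisting}[language=Python]
+import numpy as np
+from scipy.special import gammaln, digamma
+
+def dirichlet_kl(post, prior):
+    """KL( Dir(post) || Dir(prior) ) in nats."""
+    return (gammaln(post.sum()) - gammaln(prior.sum())
+            - np.sum(gammaln(post)) + np.sum(gammaln(prior))
+            + np.sum((post - prior) * (digamma(post) - digamma(post.sum()))))
+
+K = 4
+counts = np.ones((K, K))      # Dirichlet pseudo-counts, one column per context
+sequence = [0, 1, 2, 1, 0, 3, 2, 2, 1, 0]   # an invented note sequence
+
+model_info = []
+for prev, cur in zip(sequence, sequence[1:]):
+    prior = counts[:, prev].copy()
+    counts[cur, prev] += 1    # online Bayesian update of the model
+    model_info.append(dirichlet_kl(counts[:, prev], prior))
+print(model_info)             # information in each observation about the model
+	\end{lstlisting}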
+
+	The peaks of the surprisingness and both components of the IPI 
+	show good correspondence with structure of the piece both as marked in the score
+	and as analysed by musicologist Keith Potter, who was asked to mark the six
+	`most surprising moments' of the piece (shown as asterisks in the fifth plot). %%
+%	\footnote{%
+%	Note that the boundary marked in the score at around note 5,400 is known to be
+%	anomalous; on the basis of a listening analysis, some musicologists have
+%	placed the boundary a few bars later, in agreement with our analysis
+%	\cite{PotterEtAl2007}.}
+%
+	In contrast, the analyses shown in the lower two plots of \figrf{twopages},
+	obtained using two rule-based music segmentation algorithms, while clearly
+	\emph{reflecting} the structure of the piece, do not clearly \emph{segment} it:
+	the boundary strength functions show no tendency to peak at
+	the boundaries marked in the piece.
+	
+	The complete analysis of \emph{Gradus} can be found in \cite{AbdallahPlumbley2009},
+	but \figrf{metre} illustrates the result of a metrical analysis: the piece was divided
+	into bars of 32, 64 and 128 notes. In each case, the average surprisingness and
+	IPI for the first, second, third \etc notes in each bar were computed. The plots
+	show that the first note of each bar is, on average, significantly more surprising
+	and informative than the others, up to the 64-note level, whereas at the 128-note
+	level, the dominant periodicity appears to remain at 64 notes.
+
+    \begin{fig}{twopages}
+      \colfig[0.96]{matbase/fig9471}\\  % update from mbc paper
+%      \colfig[0.97]{matbase/fig72663}\\  % later update from mbc paper (Keith's new picks)
+			\vspace*{0.5em}
+      \colfig[0.97]{matbase/fig13377}  % rule based analysis
+      \caption{Analysis of \emph{Two Pages}.
+      The thick vertical lines are the part boundaries as indicated in
+      the score by the composer.
+      The thin grey lines
+      indicate changes in the melodic `figures' of which the piece is
+      constructed. In the `model information rate' panel, the black asterisks
+      mark the six most surprising moments selected by Keith Potter.
+      The bottom two panels show two rule-based boundary strength analyses.
+      All information measures are in nats.
+	%Note that the boundary marked in the score at around note 5,400 is known to be
+	%anomalous; on the basis of a listening analysis, some musicologists have
+	%placed the boundary a few bars later, in agreement with our analysis
+	\cite{PotterEtAl2007}.
+      }
+    \end{fig}
+
+    \begin{fig}{metre}
+%      \scalebox{1}{%
+        \begin{tabular}{cc}
+       \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
+       \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
+       \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490} 
+%				\colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
+%				\colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
+%				\colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
+%        \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
+%       \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
+%        \colfig[0.48]{matbase/fig9142}  & \colfig[0.48]{matbase/fig27751}
+ 
+        \end{tabular}%
+%     }
+      \caption{Metrical analysis by computing average surprisingness and
+      IPI of notes at different periodicities (\ie hypothetical
+      bar lengths) and phases (\ie positions within a bar).
+      }
+    \end{fig}
+
+\begin{fig*}{drumfig}
+%	\includegraphics[width=0.9\linewidth]{drum_plots/file9-track.eps}% \\
+	\includegraphics[width=0.97\linewidth]{figs/file11-track.eps} \\
+%	\includegraphics[width=0.9\linewidth]{newplots/file8-track.eps}
+	\caption{Information dynamic analysis derived from audio recordings of
+	drumming, obtained by applying a Bayesian beat tracking system to the
+	sequence of detected kick and snare drum events. The grey line shows the system's
+	varying level of uncertainty (entropy) about the tempo and phase of the 
+	beat grid, while the stem plot shows the amount of information in each
+	drum event about the beat grid. The entropy drops instantaneously at each
+	event and rises gradually between events.
+	}
+\end{fig*}
+
+    \subsection{Real-valued signals and audio analysis}
+	 Using analogous definitions based on the differential entropy
+	 \cite{CoverThomas}, the methods outlined
+	 in \secrf{surprise-info-seq} and \secrf{process-info}
+	 can be reformulated for random variables taking values in a continuous domain
+	 and thus be applied to expressive parameters of music
+	 such as dynamics, timing and timbre, which are readily quantified on a continuous scale.
+%	
+%    \subsection{Audio based content analysis}
+%    Using analogous definitions of differential entropy, the methods outlined
+%     in the previous section are equally applicable to continuous random variables.
+%     In the case of music, where expressive properties such as dynamics, tempo,
+%     timing and timbre are readily quantified on a continuous scale, the information
+%     dynamic framework may also be considered.
+%
+	 Dubnov \cite{Dubnov2004} considers the class of stationary Gaussian
+	 processes, for which the entropy rate may be obtained analytically
+	 from the power spectral density function $S(\omega)$ of the signal, 
+	 and finds that the multi-information rate can be expressed as
+		\begin{equation}
+			\rho_\mu = \frac{1}{2} \left( \log \specint{} - \specint{\log}\right).
+			\label{eq:mir-sfm}
+		\end{equation}
+		Dubnov also notes that $e^{-2\rho_\mu}$ is equivalent to the well-known
+		\emph{spectral flatness measure}, and hence,
+	 Gaussian processes with maximal multi-information rate are those with maximally
+	 non-flat spectra, which are those dominated by a single frequency component.
+%	 These essentially consist of a single
+%	 sinusoidal component and hence are completely predictable once
+%	 the parameters of the sinusoid have been inferred.
+%	 Local stationarity is assumed, which may be achieved by windowing or 
+%	 change point detection \cite{Dubnov2008}. 
+	 %TODO
+
+	 We have found (to appear in forthcoming work) that the predictive information rate for autoregressive
+	 Gaussian processes can be expressed as
+		\begin{equation}
+			b_\mu = \frac{1}{2} \left( \log \specint{\frac{1}} - \specint{\log\frac{1}}\right),
+		\end{equation}
+		suggesting a sort of duality between $b_\mu$ and $\rho_\mu$ which is consistent with
+		the duality between multi-information and predictive information rates we discuss in 
+		\cite{AbdallahPlumbley2012}. A consideration of the residual or erasure entropy rate
+		\cite{VerduWeissman2006} 
+		suggests that this expression applies to Gaussian processes in general but this is
+		yet to be confirmed rigorously.
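+
+		As these are ordinary one-dimensional integrals, both expressions are
+		straightforward to evaluate numerically; a sketch (using an AR(1)
+		spectrum purely as an example) is:
+		\begin{lstlisting}[language=Python]
+import numpy as np
+
+# Example: AR(1) process x_t = alpha*x_{t-1} + e_t with unit-variance innovations;
+# its power spectral density is S(w) = 1 / |1 - alpha*exp(-i*w)|^2.
+alpha = 0.8
+w = np.linspace(-np.pi, np.pi, 20001)
+S = 1.0 / (1 + alpha**2 - 2 * alpha * np.cos(w))
+
+def spec_mean(vals):
+    """(1/2pi) times the integral over [-pi, pi], approximated as a grid average."""
+    return np.mean(vals)
+
+rho_mu = 0.5 * (np.log(spec_mean(S)) - spec_mean(np.log(S)))
+b_mu   = 0.5 * (np.log(spec_mean(1 / S)) - spec_mean(np.log(1 / S)))
+# For AR(1) these evaluate (in nats) to -0.5*log(1 - alpha^2) and 0.5*log(1 + alpha^2).
+print(rho_mu, b_mu)
+		\end{lstlisting}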
+
+	Analysis shows that in stationary autoregressive processes of a given finite order, 
+	$\rho_\mu$ is unbounded, while for moving average processes of a given order, $b_\mu$ is unbounded. 
+	This is a result of the physically unattainable infinite-precision observations which the
+	theoretical analysis assumes; adding more realistic limitations on the amount of information
+	that can be extracted from one measurement is one of the aims of our ongoing work in this
+	area.
+%	 We are currently working towards methods for the computation of predictive information
+%	 rate in autorregressive and moving average Gaussian processes 
+%	 and processes with power-law (or $1/f$) spectra,
+%	 which have previously been investegated in relation to their aesthetic properties 
+%	 \cite{Voss75,TaylorSpeharVan-Donkelaar2011}.
+	 
+%	(fractionally integrated Gaussian noise).
+%	 %(fBm (continuous), fiGn discrete time) possible reference:
+%			@book{palma2007long,
+%		  title={Long-memory time series: theory and methods},
+%		  author={Palma, W.},
+%		  volume={662},
+%		  year={2007},
+%		  publisher={Wiley-Blackwell}
+%	}
+
+
+
+%	 mention non-gaussian processes extension Similarly, the predictive information
+%	 rate may be computed using a Gaussian linear formulation CITE. In this view,
+%	 the PIR is a function of the correlation  between random innovations supplied
+%	 to the stochastic process.  %Dubnov, MacAdams, Reynolds (2006) %Bailes and Dean (2009)
+
+%     In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian
+%     processes. For such processes, the entropy rate may be obtained analytically
+%     from the power spectral density of the signal, allowing the multi-information
+%     rate to be subsequently obtained. One aspect demanding further investigation
+%     involves the comparison of alternative measures of predictability. In the case of the PIR, a Gaussian linear formulation is applicable, indicating that the PIR is a function of the correlation  between random innovations supplied to the stochastic process CITE.
+    % !!! FIXME
+
+
+\subsection{Beat Tracking}
+
+A probabilistic method for drum tracking was presented by Robertson 
+\cite{Robertson11c}. The system infers a beat grid (a sequence
+of approximately regular beat times) given audio inputs from a 
+live drummer, for the purpose of synchronising a music 
+sequencer with the drummer. 
+The times of kick and snare drum events are obtained
+using dedicated microphones for each drum and a percussive onset detector 
+\cite{puckette98}. These event times are then sent
+to the beat tracker, which maintains a belief state in
+the form of distributions over the tempo and phase of the beat grid.
+Every time an event is received, these distributions are updated
+with respect to a probabilistic model which accounts both for tempo and phase 
+variations and the emission of drum events at musically plausible times
+relative to the beat grid.
+%continually updates distributions for tempo and phase on receiving a new 
+%event time
+
+The use of a probabilistic belief state means we can compute entropies
+representing the system's uncertainty about the beat grid, and quantify
+the amount of information in each event about the beat grid as the KL divergence
+between prior and posterior distributions. Though this is not strictly the
+instantaneous predictive information (IPI) as described in \secrf{surprise-info-seq}
+(the information gained is not directly about future event times), we can treat
+it as a proxy for the IPI, in the manner of the `model information rate'
+described in \secrf{minimusic}, which has a similar status.
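+
+Purely to illustrate how such entropies and information gains can be computed
+(this is a toy grid-based sketch, not Robertson's system; all details are
+invented for the example), consider a discretised belief over beat period and
+phase, updated on a single event time:
+\begin{lstlisting}[language=Python]
+import numpy as np
+
+# Toy belief state: a grid over beat period and phase offset (both in seconds).
+periods = np.linspace(0.4, 0.7, 60)
+phases = np.linspace(0.0, 0.7, 60)
+P, PH = np.meshgrid(periods, phases, indexing='ij')
+belief = np.ones_like(P) / P.size            # flat prior over the grid
+
+def entropy(b):
+    return -np.sum(b[b > 0] * np.log2(b[b > 0]))
+
+def update(belief, event_time, sigma=0.02):
+    """Bayes update on one drum event: the likelihood peaks when the event
+    falls close to a beat predicted by (period, phase)."""
+    deviation = (event_time - PH + P / 2) % P - P / 2   # distance to nearest beat
+    like = np.exp(-0.5 * (deviation / sigma) ** 2)
+    post = belief * like
+    return post / post.sum()
+
+posterior = update(belief, event_time=2.0)
+info_gain = np.sum(posterior[posterior > 0] *
+                   np.log2(posterior[posterior > 0] / belief[posterior > 0]))
+print(entropy(belief), entropy(posterior), info_gain)   # bits
+\end{lstlisting}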
+
+We carried out the analysis on 16 recordings; an example
+is shown in \figrf{drumfig}. There we can see variations in the
+entropy in the upper graph and the information in each drum event in the lower
+stem plot. At certain points in time, unusually large amounts of information
+arrive; these may be related to fills and other rhythmic irregularities, which
+are often followed by an emphatic return to a steady beat at the beginning
+of the next bar---this is something we are currently investigating. 
+We also analysed the pattern of information flow
+on a cyclic metre, much as in \figrf{metre}. All the recordings we
+analysed are audibly in 4/4 metre, but we found no
+evidence of a general tendency for greater amounts of information to arrive
+at metrically strong beats, which suggests that the rhythmic accuracy of the
+drummers does not vary systematically across each bar. It is possible that metrical information 
+existing  in the pattern of kick and snare events might emerge in an 
+analysis using a model that attempts to predict the time and type of
+the next drum event, rather than just inferring the beat grid as the current model does.
+%The analysis of information rates can b
+%considered \emph{subjective}, in that it measures how the drum tracker's 
+%probability distributions change, and these are contingent upon the 
+%model used as well as external properties in the signal.
+%We expect, 
+%however, that following periods of increased uncertainty, such as fills 
+%or expressive timing, the information contained in an individual event 
+%increases. We also examine whether the information is dependent upon 
+%metrical position.
+
+
+\section{Information dynamics as compositional aid}
+\label{s:composition}
+
+The use of stochastic processes in music composition has been widespread for
+decades---for instance Iannis Xenakis applied probabilistic mathematical models
+to the creation of musical materials \cite{Xenakis:1992ul}. While such processes
+can drive the \emph{generative} phase of the creative process, information dynamics
+can serve as a novel framework for a \emph{selective} phase, by 
+providing a set of criteria to be used in judging which of the 
+generated materials
+are of value. This alternation of generative and selective phases has been
+noted before \cite{Boden1990}.
+%
+Information-dynamic criteria can also be used as \emph{constraints} on the
+generative processes, for example, by specifying a certain temporal profile
+of surprisingness and uncertainty the composer wishes to induce in the listener
+as the piece unfolds.
+%stochastic and algorithmic processes: ; outputs can be filtered to match a set of
+%criteria defined in terms of information-dynamical characteristics, such as
+%predictability vs unpredictability
+%s model, this criteria thus becoming a means of interfacing with the generative processes.  
+
+%The tools of information dynamics provide a way to constrain and select musical
+%materials at the level of patterns of expectation, implication, uncertainty, and predictability.
+In particular, the behaviour of the predictive information rate (PIR) defined in 
+\secrf{process-info} makes it interesting from a compositional point of view. The definition 
+of the PIR is such that it is low both for extremely regular processes, such as constant
+or periodic sequences, \emph{and} low for extremely random processes, where each symbol
+is chosen independently of the others, in a kind of `white noise'. In the former case,
+the pattern, once established, is completely predictable and therefore there is no
+\emph{new} information in subsequent observations. In the latter case, the randomness
+and independence of all elements of the sequence means that, though potentially surprising,
+each observation carries no information about the ones to come.
+
+Processes with high PIR maintain a certain kind of balance between
+predictability and unpredictability in such a way that the observer must continually
+pay attention to each new observation as it occurs in order to make the best
+possible predictions about the evolution of the sequence. This balance between predictability
+and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}), 
+which summarises the observations of Wundt \cite{Wundt1897} that stimuli are most
+pleasing at intermediate levels of novelty or disorder,
+where there is a balance between `order' and `chaos'. 
+
+Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009}
+a similar shape when plotting entropy rate against PIR---this is visible in the
+upper envelope of the plot in \figrf{mtriscat}, which is a 3-D scatter plot of 
+three of the information measures discussed in \secrf{process-info} for several thousand 
+first-order Markov chain transition matrices generated by a random sampling method. 
+The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and
+predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond
+to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences
+with no temporal structure. Processes with high PIR are to be found at intermediate
+levels of entropy and redundancy.
+
+%It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials. 
+
+%For instance a stochastic music generating process could be controlled by modifying
+%constraints on its output in terms of predictive information rate or entropy
+%rate.
+
+  \begin{fig}{wundt}
+    \raisebox{-4em}{\colfig[0.43]{wundt}}
+ %  {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
+    {\ {\large$\longrightarrow$}\ }
+    \raisebox{-4em}{\colfig[0.43]{wundt2}}
+    \caption{
+      The Wundt curve relating randomness/complexity with
+      perceived value. Repeated exposure sometimes results
+      in a move to the left along the curve \cite{Berlyne71}.
+    }
+  \end{fig}
+ 
+
+
+ \subsection{The Melody Triangle}  
+
+These observations led us to construct the `Melody Triangle', a graphical interface for
+%for %exploring the melodic patterns generated by each of the Markov chains represented
+%as points in \figrf{mtriscat}.
+%
+%The Melody Triangle is an interface for 
+the discovery of melodic
+materials, where the input---positions within a triangle---maps directly to
+information-theoretic properties of the output. % as exemplified in \figrf{mtriscat}.
+%The measures---entropy rate, redundancy and
+%predictive information rate---form a criteria with which to filter the output
+%of the stochastic processes used to generate sequences of notes.  
+%These measures
+%address notions of expectation and surprise in music, and as such the Melody
+%Triangle is a means of interfacing with a generative process in terms of the
+%predictability of its output.
+%ยง
+The triangle is populated with first order Markov chain transition
+matrices as illustrated in \figrf{mtriscat}.
+The distribution of transition matrices in this space forms a relatively thin
+curved sheet. Thus, it is a reasonable simplification to project out the 
+third dimension (the PIR) and present an interface that is just two dimensional. 
+The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with
+the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand
+side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}.
+This is our `Melody Triangle' and
+forms the interface by which the system is controlled. 
+%Using this interface thus involves a mapping to information space; 
+The user selects a point within the triangle; this is mapped into the 
+information space and the nearest transition matrix is used to generate
+a sequence of values which are then sonified either as pitched notes or percussive
+sounds. By choosing the position within the triangle, the user can control the
+output at the level of its `collative' properties, with access to the variety
+of patterns as described above and in \secrf{markov}.
+%and information-theoretic criteria related to predictability
+%and information flow
+Though the interface is 2D, the third dimension (PIR) is implicitly present, as 
+transition matrices retrieved from
+along the centre line of the triangle will tend to have higher PIR.
+We hypothesise that, under
+the appropriate conditions, these will be perceived as more `interesting' or 
+`melodic.'
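+
+In outline, and assuming a precomputed population of transition matrices
+together with their information-space coordinates (the arrays and names below
+are hypothetical placeholders), the mapping from a selected point to sound can
+be sketched as:
+\begin{lstlisting}[language=Python]
+import numpy as np
+
+def nearest_matrix(point, coords, matrices):
+    """point: selected (h_mu, rho_mu) position; coords: (M, 2) array of the
+    population's coordinates; matrices: list of M column-stochastic matrices."""
+    return matrices[np.argmin(np.sum((coords - point) ** 2, axis=1))]
+
+def sample_sequence(a, length, rng):
+    """Sample a symbol sequence from a first order Markov chain."""
+    K = a.shape[0]
+    x = [rng.integers(K)]
+    for _ in range(length - 1):
+        x.append(rng.choice(K, p=a[:, x[-1]]))
+    return x
+
+rng = np.random.default_rng(1)
+coords = rng.random((100, 2))                            # placeholder population
+matrices = [rng.dirichlet(np.ones(5), size=5).T for _ in range(100)]
+a = nearest_matrix(np.array([0.7, 0.4]), coords, matrices)
+scale = [60, 62, 65, 67, 70]                             # MIDI notes of a scale
+notes = [scale[s] for s in sample_sequence(a, 16, rng)]  # one sonified 'voice'
+print(notes)
+\end{lstlisting}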
+	
+%The corners correspond to three different extremes of predictability and
+%unpredictability, which could be loosely characterised as `periodicity', `noise'
+%and `repetition'.  Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$
+%and $b_\mu$) have no discernible pattern;
+%those along the `periodicity'
+%to `repetition' edge are all cyclic patterns that get shorter as we approach
+%the `repetition' corner, until each is just one repeating note. Those along the 
+%opposite edge consist of independent random notes from non-uniform distributions. 
+%Areas between the left and right edges will tend to have higher PIR, 
+%and we hypothesise that, under
+%the appropriate conditions, these will be perceived as more `interesting' or 
+%`melodic.'
+%These melodies have some level of unpredictability, but are not completely random.
+% Or, conversely, are predictable, but not entirely so.
+
+ \begin{fig}{mtriscat}
+	\colfig[0.9]{mtriscat}
+	\caption{The population of transition matrices in the 3D space of 
+	entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and PIR ($b_\mu$), 
+	all in bits.
+	The concentrations of points along the redundancy axis correspond
+	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
+	3, 4, \etc all the way to period 7 (redundancy 2.8 bits). The colour of each point
+	represents its PIR---note that the highest values are found at intermediate entropy
+	and redundancy, and that the distribution as a whole makes a curved triangle. Although
+	not visible in this plot, it is largely hollow in the middle.}
+\end{fig}
+
+
+%PERHAPS WE SHOULD FOREGO TALKING ABOUT THE 
+%INSTALLATION VERSION OF THE TRIANGLE? 
+%feels a bit like a tangent, and could do with the space..
+The Melody Triangle exists in two incarnations: a screen-based interface
+where a user moves tokens in and around a triangle on screen, and a multi-user
+interactive installation where a Kinect camera tracks individuals in a space and
+maps their positions in physical space to the triangle.  In the latter, each visitor
+that enters the installation generates a melody and can collaborate with their
+co-visitors to generate musical textures. This makes the interaction physically engaging
+and (as our experience with visitors both young and old has demonstrated) more playful.
+%Additionally visitors can change the 
+%tempo, register, instrumentation and periodicity of their melody with body gestures.
+%
+The screen based interface can serve as a compositional tool.
+%%A triangle is drawn on the screen, screen space thus mapped to the statistical
+%space of the Melody Triangle.  
+A number of tokens, each representing a
+sonification stream or `voice', can be dragged in and around the triangle.
+For each token, a sequence of symbols is sampled using the corresponding
+transition matrix; these symbols
+%statistical properties that correspond to the token's position is generated.  These
+%symbols 
+are then mapped to notes of a scale or percussive sounds%
+\footnote{The sampled sequence could easily be mapped to other musical processes, possibly over
+different time scales, such as chords, dynamics and timbres. It would also be possible
+to map the symbols to visual or other outputs.}%
+.  Keyboard commands give control over other musical parameters such
+as pitch register and inter-onset interval.  
+%The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}.
+%
+The system is capable of generating quite intricate musical textures when multiple tokens
+are in the triangle, but unlike other computer aided composition tools or programming
+environments, the composer exercises control at the abstract level of information-dynamic
+properties.
+%the interface relating to subjective expectation and predictability.
+
+\begin{fig}{TheTriangle}
+	\colfig[0.7]{TheTriangle.pdf}
+	\caption{The Melody Triangle}
+\end{fig}	
+
+\comment{
+\subsection{Information Dynamics as Evaluative Feedback Mechanism}
+%NOT SURE THIS SHOULD BE HERE AT ALL..?
+Information measures on a stream of symbols can form a feedback mechanism; a
+rudimentary `critic' of sorts.  For instance symbol by symbol measure of predictive
+information rate, entropy rate and redundancy could tell us if a stream of symbols
+is currently `boring', either because it is too repetitive, or because it is too
+chaotic.  Such feedback would be oblivious to long term and large scale
+structures and any cultural norms (such as style conventions), but 
+nonetheless could provide a composer with valuable insight on
+the short term properties of a work.  This could not only be used for the
+evaluation of pre-composed streams of symbols, but could also provide real-time
+feedback in an improvisatory setup.
+}
+
+\subsection{User trials with the Melody Triangle}
+We are currently in the process of using the screen-based
+Melody Triangle user interface to investigate the relationship between the information-dynamic
+characteristics of sonified Markov chains and subjective musical preference.
+We carried out a pilot study with six participants, who were asked
+to use a simplified form of the user interface (a single controllable token,
+and no rhythmic, registral or timbral controls) under two conditions:
+one where a single sequence was sonified under user control, and another
+where an additional sequence was sonified in a different register, as if generated
+by a fixed invisible token in one of four regions of the triangle. In addition, subjects
+were asked to press a key if they `liked' what they were hearing.
+
+We recorded subjects' behaviour as well as points which they marked
+with a key press.
+Some results for two of the subjects are shown in \figrf{mtri-results}. Though
+we have not been able to detect any systematic across-subjects preference for any particular
+region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour.
+Our initial hypothesis, that subjects would linger longer in regions of the triangle
+that produced aesthetically preferable sequences, and that this would tend to be towards the
+centre line of the triangle for all subjects, was not confirmed. However, it is possible
+that the design of the experiment encouraged an initial exploration of the space (sometimes
+very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and
+how the system works, rather than finding musical patterns. It is also possible that the
+system encourages users to create musically interesting output by \emph{moving the token},
+rather than finding a particular spot in the triangle which produces a musically interesting
+sequence by itself.
+
+\begin{fig}{mtri-results}
+	\def\scat#1{\colfig[0.42]{mtri/#1}}
+	\def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
+	\begin{tabular}{cc}
+%		\subj{a} \\
+		\subj{b} \\
+		\subj{c} \\
+		\subj{d}
+	\end{tabular}
+	\caption{Dwell times and mark positions from user trials with the
+	on-screen Melody Triangle interface, for three subjects. The left-hand column shows
+	the positions in a 2D information space (entropy rate vs multi-information rate
+	in bits) where each spent their time; the area of each circle is proportional
+	to the time spent there. The right-hand column shows points which subjects
+	`liked'; the area of the circles here is proportional to the duration spent at
+	that point before the point was marked.}
+\end{fig}
+
+Comments collected from the subjects
+%during and after the experiment 
+suggest that
+the information-dynamic characteristics of the patterns were readily apparent
+to most: several noticed the main organisation of the triangle,
+with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
+notes towards the opposite corner. Some described their systematic exploration of the space.
+Two felt that the right side was `more controllable' than the left (a consequence
+of their ability to return to a particular distinctive pattern and recognise it
+as one heard previously). Two reported that they became bored towards the end,
+but another felt there wasn't enough time to `hear out' the patterns properly.
+One subject did not `enjoy' the patterns in the lower region, but another said the lower
+central regions were more `melodic' and `interesting'.
+
+We plan to continue the trials with a slightly less restricted user interface in order
+to make the experience more enjoyable and thereby give subjects longer to use the interface;
+this may allow them to get beyond the initial exploratory phase and give a clearer
+picture of their aesthetic preferences. In addition, we plan to conduct a
+study under more restrictive conditions, where subjects will have no control over the patterns
+other than to signal (a) which of two alternatives they prefer in a forced
+choice paradigm, and (b) when they are bored of listening to a given sequence.
+
+%\emph{comparable system}  Gordon Pask's Musicolor (1953) applied a similar notion
+%of boredom in its design.  The Musicolour would react to audio input through a
+%microphone by flashing coloured lights.  Rather than a direct mapping of sound
+%to light, Pask designed the device to be a partner to a performing musician.  It
+%would adapt its lighting pattern based on the rhythms and frequencies it would
+%hear, quickly `learning' to flash in time with the music.  However Pask endowed
+%the device with the ability to `be bored'; if the rhythmic and frequency content
+%of the input remained the same for too long it would listen for other rhythms
+%and frequencies, only lighting when it heard these.  As the Musicolour would
+%`get bored', the musician would have to change and vary their playing, eliciting
+%new and unexpected outputs in trying to keep the Musicolour interested.
+	
+
+\section{Conclusions}
+
+	% !!! FIXME
+%We reviewed our information dynamics approach to the modelling of the perception
+We have looked at several emerging areas of application of the methods and
+ideas of information dynamics to various problems in music analysis, perception
+and cognition, including musicological analysis of symbolic music, audio analysis,
+rhythm processing and compositional and creative tasks. The approach has proved
+successful in musicological analysis, and though our initial data on
+rhythm processing and aesthetic preference are inconclusive, there is still
+plenty of work to be done in this area: wherever there are probabilistic models,
+information dynamics can shed light on their behaviour. 
+
+
+
+\section*{Acknowledgments}
+This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
+GR/S82213/01 and EP/E045235/1 (SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship EP/G007144/1
+(MDP) and EPSRC IDyOM2 EP/H013059/1.
+This work is partly funded by the CoSound project, funded by the Danish Agency for Science, Technology and Innovation.
+Thanks also to Marcus Pearce for providing the two rule-based analyses of \emph{Two Pages}.
+
+
+\bibliographystyle{IEEEtran}
+{\bibliography{all,c4dm,nime,andrew}}
+\end{document}