\documentclass[conference]{IEEEtran}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}


\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}

%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an\\Information Dynamics Approach}

\author{
	\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
	\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
	\IEEEauthorblockA{Centre for Digital Music\\
		Queen Mary University of London\\
		Mile End Road, London E1 4NS}}

\maketitle
\begin{abstract}
	When perceiving music, people take in information and continually build
	predictive models of what is going to happen next.  There is a close relationship
	between information-theoretic measures and how we perceive music, making an
	information-theoretic approach to music cognition a fruitful avenue of research.
	In this paper, we review the theoretical foundations of information dynamics
	and discuss a few emerging areas of application.
\end{abstract}


\section{Introduction}
\label{s:Intro}
	The relationship between
	Shannon's \cite{Shannon48} information theory and music and art in general has been the
	subject of some interest since the 1950s 
	\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. 
	The general thesis is that perceptible qualities and subjective states
	like uncertainty, surprise, complexity, tension, and interestingness
	are closely related to information-theoretic quantities like
	entropy, relative entropy, and mutual information.

	Music is also an inherently dynamic process, 
	where listeners build up expectations on what is to happen next,
	which are either satisfied or modified as the music unfolds.
	In this paper, we explore this ``Information Dynamics'' view of music,
	discussing the theory behind it and some emerging applications.

	\subsection{Expectation and surprise in music}
	One of the effects of listening to music is to create 
	expectations of what is to come next, which may be fulfilled
	immediately, after some delay, or not at all as the case may be.
	This is the thesis put forward by, amongst others, music theorists 
	L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
	recognised much earlier; for example, 
	it was elegantly put by Hanslick \cite{Hanslick1854} in the
	nineteenth century:
	\begin{quote}
			`The most important factor in the mental process which accompanies the
			act of listening to music, and which converts it to a source of pleasure, 
			is \ldots the intellectual satisfaction 
			which the listener derives from continually following and anticipating 
			the composer's intentions---now, to see his expectations fulfilled, and 
			now, to find himself agreeably mistaken. 
			%It is a matter of course that 
			%this intellectual flux and reflux, this perpetual giving and receiving 
			%takes place unconsciously, and with the rapidity of lightning-flashes.'
	\end{quote}
	An essential aspect of this is that music is experienced as a phenomenon
	that `unfolds' in time, rather than being apprehended as a static object
	presented in its entirety. Meyer argued that musical experience depends
	on how we change and revise our conceptions \emph{as events happen}, on
	how expectation and prediction interact with occurrence, and that, to a
	large degree, the way to understand the effect of music is to focus on
	this `kinetics' of expectation and surprise.

  Prediction and expectation are essentially probabilistic concepts
  and can be treated mathematically using probability theory.
  We suppose that when we listen to music, expectations are created on the basis 
	of our familiarity with various styles of music and our ability to
	detect and learn statistical regularities in the music as they emerge.
	There is experimental evidence that human listeners are able to internalise
	statistical knowledge about musical structure, \eg
	\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
	that statistical models can form an effective basis for computational
	analysis of music, \eg
	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.


\comment{
	The business of making predictions and assessing surprise is essentially
	one of reasoning under conditions of uncertainty and manipulating
	degrees of belief about the various proposition which may or may not
	hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best
	quantified in terms of Bayesian probability theory.
   Thus, we suppose that
	when we listen to music, expectations are created on the basis of our
	familiarity with various stylistic norms that apply to music in general,
	the particular style (or styles) of music that seem best to fit the piece 
	we are listening to, and
	the emerging structures peculiar to the current piece.  There is
	experimental evidence that human listeners are able to internalise
	statistical knowledge about musical structure, \eg
	\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
	that statistical models can form an effective basis for computational
	analysis of music, \eg
	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
}

%	\subsection{Music and information theory}
	With a probabilistic framework for music modelling and prediction in hand,
	we are in a position to compute various
\comment{
	which provides us with a number of measures, such as entropy
  and mutual information, which are suitable for quantifying states of
  uncertainty and surprise, and thus could potentially enable us to build
  quantitative models of the listening process described above.  They are
  what Berlyne \cite{Berlyne71} called `collative variables' since they are
  to do with patterns of occurrence rather than medium-specific details.
  Berlyne sought to show that the collative variables are closely related to
  perceptual qualities like complexity, tension, interestingness,
  and even aesthetic value, not just in music, but in other temporal
  or visual media.
  The relevance of information theory to music and art has
  also been addressed by researchers from the 1950s onwards
  \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
	information-theoretic quantities like entropy, relative entropy,
	and mutual information.
%	and are major determinants of the overall experience.
	Berlyne \cite{Berlyne71} called such quantities `collative variables', since 
	they are to do with patterns of occurrence rather than medium-specific details,
	and developed the ideas of `information aesthetics' in an experimental setting.  
%	Berlyne's `new experimental aesthetics', the `information-aestheticians'.

%	Listeners then experience greater or lesser levels of surprise
%	in response to departures from these norms. 
%	By careful manipulation
%	of the material, the composer can thus define, and induce within the
%	listener, a temporal programme of varying
%	levels of uncertainty, ambiguity and surprise. 


\subsection{Information dynamic approach}
	Bringing the various strands together, our working hypothesis is that as a
	listener (to which we will refer as `it') listens to a piece of music, it maintains
	a dynamically evolving probabilistic model that enables it to make predictions
	about how the piece will continue, relying on both its previous experience
	of music and the immediate context of the piece.  As events unfold, it revises
	its probabilistic belief state, which includes predictive
	distributions over possible future events.  These 
%	distributions and changes in distributions 
	can be characterised in terms of a handful of information-theoretic
	measures such as entropy and relative entropy.  By tracing the
	evolution of these measures, we obtain a representation which captures much
	of the significant structure of the music.
	
	One of the consequences of this approach is that regardless of the details of
	the sensory input or even which sensory modality is being processed, the resulting 
	analysis is in terms of the same units: quantities of information (bits) and
	rates of information flow (bits per second). The probabilistic and information
	theoretic concepts in terms of which the analysis is framed are universal to all sorts 
	of data.
	In addition, when adaptive probabilistic models are used, expectations are
	created mainly in response to \emph{patterns} of occurrence, 
	rather than the details of which specific things occur.
	Together, these suggest that an information dynamic analysis captures a
	high level of \emph{abstraction}, and could be used to 
	make structural comparisons between different temporal media,
	such as music, film, animation, and dance.
%	analyse and compare information 
%	flow in different temporal media regardless of whether they are auditory, 
%	visual or otherwise. 

	Another consequence is that the information dynamic approach gives us a principled way
	to address the notion of \emph{subjectivity}, since the analysis is dependent on the 
	probability model the observer starts off with, which may depend on prior experience 
	or other factors, and which may change over time. Thus, inter-subject variability and 
	variation in subjects' responses over time are 
	fundamental to the theory. 
			
	%modelling the creative process, which often alternates between generative
	%and selective or evaluative phases \cite{Boden1990}, and would have
	%applications in tools for computer aided composition.


\section{Theoretical review}

	\subsection{Entropy and information}
	\label{s:entro-info}

	Let $X$ denote some variable whose value is initially unknown to our 
	hypothetical observer. We will treat $X$ mathematically as a random variable,
	with a value to be drawn from some set $\X$ and a 
	probability distribution representing the observer's beliefs about the 
	true value of $X$.
	In this case, the observer's uncertainty about $X$ can be quantified
	as the entropy $H(X)$ of the random variable. For a discrete variable
	with probability mass function $p:\X \to [0,1]$, this is
	\begin{equation}
		H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
	\end{equation}
%	where $\expect{}$ is the expectation operator. 
	The negative-log-probability
	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
	the \emph{surprisingness} of the value $x$ should it be observed, and
	hence the entropy is the expectation of the surprisingness $\expect \ell(X)$.
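	As a concrete illustration of these definitions (a minimal sketch only; the
	distribution used is hypothetical), the surprisingness of each outcome and the
	entropy of a discrete distribution can be computed as follows:
\begin{lstlisting}[language=Python]
import numpy as np

def surprisingness(p):
    """Negative log-probability (bits) of each outcome."""
    return -np.log2(np.asarray(p, dtype=float))

def entropy(p):
    """Entropy H(X) in bits: the expected surprisingness."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * -np.log2(p)))

# Hypothetical distribution over a 4-symbol alphabet.
p = [0.5, 0.25, 0.125, 0.125]
print(surprisingness(p))  # [1. 2. 3. 3.] bits
print(entropy(p))         # 1.75 bits
\end{lstlisting}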

	Now suppose that the observer receives some new data $\Data$ that
	causes a revision of its beliefs about $X$. The \emph{information}
	in this new data \emph{about} $X$ can be quantified as the 
	Kullback-Leibler (KL) divergence between the prior and posterior
	distributions $p(x)$ and $p(x|\Data)$ respectively:
	\begin{equation}
		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
			= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
			\label{eq:info}
	\end{equation}
	When there are multiple variables $X_1, X_2$ 
	\etc which the observer believes to be dependent, then the observation of 
	one may change its beliefs and hence yield information about the
	others. The joint and conditional entropies as described in any
	textbook on information theory (\eg \cite{CoverThomas}) then quantify
	the observer's expected uncertainty about groups of variables given the
	values of others. In particular, the \emph{mutual information}
	$I(X_1;X_2)$ is both the expected information
	in an observation of $X_2$ about $X_1$ and the expected reduction
	in uncertainty about $X_1$ after observing $X_2$:
	\begin{equation}
		I(X_1;X_2) = H(X_1) - H(X_1|X_2),
	\end{equation}
	where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
	of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
	and so the mutual information is symmetric in its arguments. A conditional
	form of the mutual information can be formulated analogously:
	\begin{equation}
		I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
	\end{equation}
	These relationships between the various entropies and mutual
	informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
	or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
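	To make these definitions concrete, the sketch below (illustrative only,
	with hypothetical numbers) computes the information \eqrf{info} in an
	observation as a KL divergence between prior and posterior beliefs, and the
	mutual information of two discrete variables from their joint distribution:
\begin{lstlisting}[language=Python]
import numpy as np

def kl_divergence(post, prior):
    """D(post || prior) in bits."""
    post, prior = np.asarray(post, float), np.asarray(prior, float)
    m = post > 0
    return float(np.sum(post[m] * np.log2(post[m] / prior[m])))

def mutual_information(joint):
    """I(X1;X2) in bits from a joint probability table p(x1,x2)."""
    joint = np.asarray(joint, float)
    p1 = joint.sum(axis=1, keepdims=True)   # marginal of X1
    p2 = joint.sum(axis=0, keepdims=True)   # marginal of X2
    m = joint > 0
    return float(np.sum(joint[m] * np.log2(joint[m] / (p1 * p2)[m])))

# Hypothetical prior and posterior beliefs about X after observing data D.
prior = [0.25, 0.25, 0.25, 0.25]
post  = [0.70, 0.10, 0.10, 0.10]
print(kl_divergence(post, prior))   # information in D about X

# Hypothetical joint distribution of two dependent binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
print(mutual_information(joint))    # approx. 0.278 bits
\end{lstlisting}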

	\begin{fig}{venn-example}
		\newcommand\rad{2.2em}%
		\newcommand\circo{circle (3.4em)}%
		\newcommand\labrad{4.3em}
		\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
		\newcommand\colsep{\ }
		\newcommand\clipin[1]{\clip (#1) \circo;}%
		\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
		\newcommand\cliptwo[3]{%
			\begin{scope}
				\clipin{#1};
				\clipin{#2};
				\clipout{#3};
				\fill[black!30] \bound;
			\end{scope}
		}%
		\newcommand\clipone[3]{%
			\begin{scope}
				\clipin{#1};
				\clipout{#2};
				\clipout{#3};
				\fill[black!15] \bound;
			\end{scope}
		}%
		\begin{tabular}{c@{\colsep}c}
			\begin{tikzpicture}[baseline=0pt]
				\coordinate (p1) at (90:\rad);
				\coordinate (p2) at (210:\rad);
				\coordinate (p3) at (-30:\rad);
				\clipone{p1}{p2}{p3};
				\clipone{p2}{p3}{p1};
				\clipone{p3}{p1}{p2};
				\cliptwo{p1}{p2}{p3};
				\cliptwo{p2}{p3}{p1};
				\cliptwo{p3}{p1}{p2};
            \begin{scope}
               \clip (p1) \circo;
               \clip (p2) \circo;
               \clip (p3) \circo;
               \fill[black!45] \bound;
            \end{scope}
				\draw (p1) \circo;
				\draw (p2) \circo;
				\draw (p3) \circo;
				\path 
					(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
					(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
					(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
					(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
					(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
					(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
					(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
					;
				\path
					(p1) +(140:\labrad) node {$X_1$}
					(p2) +(-140:\labrad) node {$X_2$}
					(p3) +(-40:\labrad) node {$X_3$};
			\end{tikzpicture}
			&
			\parbox{0.5\linewidth}{
				\small
				\begin{align*}
					I_{1|23} &= H(X_1|X_2,X_3) \\
					I_{13|2} &= I(X_1;X_3|X_2) \\
					I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
					I_{12|3} + I_{123} &= I(X_1;X_2) 
				\end{align*}
			}
		\end{tabular}
		\caption{
		I-diagram visualisation of entropies and mutual informations
		for three random variables $X_1$, $X_2$ and $X_3$. The areas of 
		the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
		The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
		The central area $I_{123}$ is the co-information \cite{McGill1954}.
		Some other information measures are indicated in the legend.
		}
	\end{fig}


	\subsection{Surprise and information in sequences}
	\label{s:surprise-info-seq}

	Suppose that  $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
	random variables, infinite in both directions, 
	and that $\mu$ is the associated probability measure over all 
	realisations of the sequence---in the following, $\mu$ will simply serve
	as a label for the process. We can identify a number of information-theoretic
	measures meaningful in the context of a sequential observation of the sequence, during
	which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past' 
	$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' 
	$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
	We will write the actually observed value of $X_t$ as $x_t$, and
	the sequence of observations up to but not including $x_t$ as 
	$\past{x}_t$.
%	Since the sequence is assumed stationary, we can without loss of generality,
%	assume that $t=0$ in the following definitions.

	The in-context surprisingness of the observation $X_t=x_t$ depends on 
	both $x_t$ and the context $\past{x}_t$:
	\begin{equation}
		\ell_t = - \log p(x_t|\past{x}_t).
	\end{equation}
	However, before $X_t$ is observed to be $x_t$, the observer can compute
	the \emph{expected} surprisingness as a measure of its uncertainty about
	the very next event; this may be written as an entropy 
	$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
	conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not
	\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

	The surprisingness $\ell_t$ and expected surprisingness
	$H(X_t|\ev(\past{X}_t=\past{x}_t))$
	can be understood as \emph{subjective} information dynamic measures, since they are 
	based on the observer's probability model in the context of the actually observed sequence
	$\past{x}_t$---they characterise what it is like to `be in the observer's shoes'.
	If we view the observer as a purely passive or reactive agent, this would
	probably be sufficient, but for active agents such as humans or animals, it is
	often necessary to \emph{anticipate} future events in order, for example, to plan the
	most effective course of action. It makes sense for such observers to be
	concerned about the predictive probability distribution over future events,
	$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
	the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
	is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
	\emph{given} the observed past $\past{X}_t=\past{x}_t$.
	Referring to the definition of information \eqrf{info}, this is the KL divergence
	between prior and posterior distributions over possible futures, which written out in full, is
	\begin{equation}
		\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*} 
					p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
	\end{equation}
	where the sum is to be taken over the set of infinite sequences $\X^*$.
	Note that it is quite possible for an event to be surprising but not informative
	in a predictive sense.
	As with the surprisingness, the observer can compute its \emph{expected} IPI
	at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
	conditioned on the observed past. This could be used, for example, as an estimate
	of attentional resources which should be directed at this stream of data, which may
	be in competition with other sensory streams.
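	For illustration, the following sketch traces the surprisingness $\ell_t$ and
	the expected surprisingness (the entropy of the predictive distribution) along
	an observed sequence, assuming for simplicity that the observer's predictive
	model is a fixed first-order Markov chain with a row-stochastic transition
	matrix; in general the predictive distribution may depend on the whole past,
	and the IPI requires a model-specific computation such as the one given in
	\secrf{markov} for Markov chains.
\begin{lstlisting}[language=Python]
import numpy as np

def sequence_dynamics(a, xs):
    """Surprisingness and expected surprisingness (bits) at each step,
    for an observer using a fixed first-order Markov model.
    a[i, j] = p(x_t = j | x_{t-1} = i); xs = observed symbol sequence."""
    a = np.asarray(a, float)
    ell, expected = [], []
    for prev, cur in zip(xs[:-1], xs[1:]):
        pred = a[prev]                    # predictive distribution given the past
        ell.append(-np.log2(pred[cur]))   # surprisingness of what actually occurred
        expected.append(float(np.sum(pred * -np.log2(pred + 1e-12))))
    return np.array(ell), np.array(expected)

# Hypothetical 2-symbol model that strongly expects repetition.
a = [[0.9, 0.1],
     [0.1, 0.9]]
xs = [0, 0, 0, 1, 1, 0]
ell, expected = sequence_dynamics(a, xs)
print(ell)       # peaks where the sequence switches symbol
print(expected)  # constant here, as both context rows have equal entropy
\end{lstlisting}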

	\subsection{Information measures for stationary random processes}
	\label{s:process-info}


 	\begin{fig}{predinfo-bg}
		\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
		\newcommand\rad{1.8em}%
		\newcommand\ovoid[1]{%
			++(-#1,\rad) 
			-- ++(2 * #1,0em) arc (90:-90:\rad)
 			-- ++(-2 * #1,0em) arc (270:90:\rad) 
		}%
		\newcommand\axis{2.75em}%
		\newcommand\olap{0.85em}%
		\newcommand\offs{3.6em}
		\newcommand\colsep{\hspace{5em}}
		\newcommand\longblob{\ovoid{\axis}}
		\newcommand\shortblob{\ovoid{1.75em}}
		\begin{tabular}{c}
			\subfig{(a) multi-information and entropy rates}{%
				\begin{tikzpicture}%[baseline=-1em]
					\newcommand\rc{1.75em}
					\newcommand\throw{2.5em}
					\coordinate (p1) at (180:1.5em);
					\coordinate (p2) at (0:0.3em);
					\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
					\newcommand\present{(p2) circle (\rc)}
					\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
					\newcommand\fillclipped[2]{%
						\begin{scope}[even odd rule]
							\foreach \thing in {#2} {\clip \thing;}
							\fill[black!#1] \bound;
						\end{scope}%
					}%
					\fillclipped{30}{\present,\bound \thepast}
					\fillclipped{15}{\present,\bound \thepast}
					\fillclipped{45}{\present,\thepast}
					\draw \thepast;
					\draw \present;
					\node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
					\node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
					\path (p2) +(90:3em) node {$X_0$};
					\path (p1) +(-3em,0em) node  {\shortstack{infinite\\past}};
					\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
				\end{tikzpicture}}%
			\\[1.25em]
			\subfig{(b) excess entropy}{%
				\newcommand\blob{\longblob}
				\begin{tikzpicture}
					\coordinate (p1) at (-\offs,0em);
					\coordinate (p2) at (\offs,0em);
					\begin{scope}
						\clip (p1) \blob;
						\clip (p2) \blob;
						\fill[lightgray] (-1,-1) rectangle (1,1);
					\end{scope}
					\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
					\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
					\path (0,0) node (future) {$E$};
					\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
					\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
				\end{tikzpicture}%
			}%
			\\[1.25em]
			\subfig{(c) predictive information rate $b_\mu$}{%
				\begin{tikzpicture}%[baseline=-1em]
					\newcommand\rc{2.1em}
					\newcommand\throw{2.5em}
					\coordinate (p1) at (210:1.5em);
					\coordinate (p2) at (90:0.7em);
					\coordinate (p3) at (-30:1.5em);
					\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
					\newcommand\present{(p2) circle (\rc)}
					\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
					\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
					\newcommand\fillclipped[2]{%
						\begin{scope}[even odd rule]
							\foreach \thing in {#2} {\clip \thing;}
							\fill[black!#1] \bound;
						\end{scope}%
					}%
					\fillclipped{80}{\future,\thepast}
					\fillclipped{30}{\present,\future,\bound \thepast}
					\fillclipped{15}{\present,\bound \future,\bound \thepast}
					\draw \future;
					\fillclipped{45}{\present,\thepast}
					\draw \thepast;
					\draw \present;
					\node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
					\node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
					\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
					\path (p2) +(140:3em) node {$X_0$};
	%            \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
					\path (p3) +(3em,0em) node  {\shortstack{infinite\\future}};
					\path (p1) +(-3em,0em) node  {\shortstack{infinite\\past}};
					\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
					\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
				\end{tikzpicture}}%
				\\[0.5em]
		\end{tabular}
		\caption{
		I-diagrams for several information measures in
		stationary random processes. Each circle or oval represents a random
		variable or sequence of random variables relative to time $t=0$. Overlapped areas
		correspond to various mutual informations as in \Figrf{venn-example}.
		In (c), the circle represents the `present'. Its total area is
		$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
		rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
		information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. The small dark
		region  below $X_0$ in (c) is $\sigma_\mu = E-\rho_\mu$.
		}
	\end{fig}

	If we step back, out of the observer's shoes as it were, and consider the
	random process $(\ldots,X_{-1},X_0,X_1,\dots)$ as a statistical ensemble of 
	possible realisations, and furthermore assume that it is stationary,
	then it becomes possible to define a number of information-theoretic measures,
	closely related to those described above, but which characterise the
	process as a whole, rather than on a moment-by-moment basis. Some of these,
	such as the entropy rate, are well-known, but others are only recently being
	investigated. (In the following, the assumption of stationarity means that
	the measures defined below are independent of $t$.)

	The \emph{entropy rate} of the process is the entropy of the next variable
	$X_t$ given all the previous ones:
	\begin{equation}
		\label{eq:entro-rate}
		h_\mu = H(X_t|\past{X}_t).
	\end{equation}
	The entropy rate is a measure of the overall surprisingness
	or unpredictability of the process, and gives an indication of the average
	level of surprise and uncertainty that would be experienced by an observer
	processing a sequence sampled from the process using the methods of
	\secrf{surprise-info-seq}.

	The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
	notation for what he called the `information rate') is the mutual
	information between the `past' and the `present':
	\begin{equation}
		\label{eq:multi-info}
			\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
	\end{equation}
	It is a measure of how much the context of an observation (that is,
	the observation of previous elements of the sequence) helps in predicting
	or reducing the surprisingness of the current observation.

	The \emph{excess entropy} \cite{CrutchfieldPackard1983} 
	is the mutual information between 
	the entire `past' and the entire `future':
	\begin{equation}
		E = I(\past{X}_t; X_t,\fut{X}_t).
	\end{equation}
	Both the excess entropy and the multi-information rate can be thought
	of as measures of \emph{redundancy}, quantifying the extent to which 
	the same information is to be found in all parts of the sequence.
	

	The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}  
	is the mutual information between the present and the infinite future given the infinite
	past:
	\begin{equation}
		\label{eq:PIR}
		b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
	\end{equation}
	Equation \eqrf{PIR} can be read as the average reduction
	in uncertainty about the future on learning $X_t$, given the past. 
	Due to the symmetry of the mutual information, it can also be written
	as 
	\begin{equation}
%		\IXZ_t 
b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
%		\label{<++>}
	\end{equation}
%	If $X$ is stationary, then 
	where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
	is the \emph{residual} \cite{AbdallahPlumbley2010},
	or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
	These relationships are illustrated in \Figrf{predinfo-bg}, along with
	several of the information measures we have discussed so far.
	The PIR gives an indication of the average IPI that would be experienced
	by an observer processing a sequence sampled from this process.


	James et al.\ \cite{JamesEllisonCrutchfield2011} review several of these
	information measures and introduce some new related ones.
	In particular, they identify $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$, 
	the mutual information between the past and the future given the present,
	as an interesting quantity that measures the predictive benefit of
	model-building (that is, maintaining an internal state summarising past 
	observations in order to make better predictions). It is shown as the
	small dark region below the circle in \figrf{predinfo-bg}(c).
	By comparing with \figrf{predinfo-bg}(b), we can see that 
	$\sigma_\mu = E - \rho_\mu$.
%	They also identify
%	$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
%	information} rate.


	\subsection{First and higher order Markov chains}
	\label{s:markov}
	First order Markov chains are the simplest non-trivial models to which information 
	dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
	expressions for all the information measures described in \secrf{process-info} for
	irreducible stationary Markov chains (\ie that have a unique stationary
	distribution). The derivation is greatly simplified by the dependency structure
	of the Markov chain: for the purpose of the analysis, the `past' and `future'
	segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
	and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that 
	the predictive information rate can be expressed simply in terms of entropy rates:
	if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
	an alphabet of $\{1,\ldots,K\}$, such that
	$a_{ij} = \Pr(\ev(X_t=i)|\ev(X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
	the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
	with transition matrix $a$, then the predictive information rate $b(a)$ is
	\begin{equation}
		b(a) = h(a^2) - h(a),
	\end{equation}
	where $a^2$, the transition matrix squared, is the transition matrix
	of the `skip one' Markov chain obtained by jumping two steps at a time
	along the original chain.
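	A minimal numerical sketch of this computation is given below, using a
	row-stochastic convention ($a_{ij}$ here being the probability of moving from
	state $i$ to state $j$, the transpose of the convention above); the example
	matrix is hypothetical.
\begin{lstlisting}[language=Python]
import numpy as np

def stationary_dist(a):
    """Stationary distribution of a row-stochastic transition matrix."""
    vals, vecs = np.linalg.eig(a.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    """h(a): entropy rate (bits) of the stationary Markov chain."""
    a = np.asarray(a, float)
    pi = stationary_dist(a)
    with np.errstate(divide='ignore'):
        logs = np.where(a > 0, np.log2(a), 0.0)
    return float(-np.sum(pi[:, None] * a * logs))

def pir(a):
    """Predictive information rate b(a) = h(a^2) - h(a), in bits."""
    a = np.asarray(a, float)
    return entropy_rate(a @ a) - entropy_rate(a)

# Hypothetical 3-state chain: mostly cyclic, with occasional deviations.
a = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1]])
print(entropy_rate(a), pir(a))
\end{lstlisting}
	As a quick check, a deterministic permutation matrix and a uniform transition
	matrix both give $b(a)=0$, consistent with the observations on extreme
	processes in \secrf{composition}.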

	Second and higher order Markov chains can be treated in a similar way by transforming
	to a first order representation of the high order Markov chain. If we are dealing
	with an $N$th order model, this is done by forming a new alphabet of size $K^N$
	consisting of all possible $N$-tuples of symbols from the base alphabet. 
	An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations 
	$(x_{t+1},\ldots,x_{t+N})$ from the base model. The next
	observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous 
	block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
	transition matrix $\hat{a}$. Adopting the label $\mu$ for the order $N$ system, 
	we obtain:
	\begin{equation}
		h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
	\end{equation}
	where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.  
	Other information measures can also be computed for the high-order Markov chain, including
	the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
	for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger 
	than $\rho_\mu$.
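	The construction of the first-order representation can be sketched as follows
	(illustrative only, again using a row-stochastic convention; the conditional
	distribution used here is hypothetical):
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def block_transition_matrix(cond, K, N):
    """First-order transition matrix over blocks of N base symbols.
    cond(ctx, y) = p(next symbol = y | previous N symbols = ctx)."""
    blocks = list(itertools.product(range(K), repeat=N))
    index = {b: i for i, b in enumerate(blocks)}
    a_hat = np.zeros((K**N, K**N))
    for b in blocks:
        for y in range(K):
            succ = b[1:] + (y,)     # shift the block forward by one step
            a_hat[index[b], index[succ]] = cond(b, y)
    return a_hat

# Hypothetical order-2 binary model: prefer to differ from the symbol
# observed two steps back.
def cond(ctx, y):
    return 0.9 if y != ctx[0] else 0.1

a_hat = block_transition_matrix(cond, K=2, N=2)
print(a_hat.shape)   # (4, 4), with at most K nonzero entries per row
\end{lstlisting}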

	Different kinds of Markov chain maximise the different measures: $h_\mu$ is greatest
	for uncorrelated `white' sequences with no temporal structure, while $\rho_\mu$ and $E$
	are greatest for periodic chains; chains with high $b_\mu$ lie between these extremes.
	We return to this in \secrf{composition}.
		

\section{Information Dynamics in Analysis}

    \begin{fig}{twopages}
      \colfig[0.96]{matbase/fig9471}  % update from mbc paper
%      \colfig[0.97]{matbase/fig72663}\\  % later update from mbc paper (Keith's new picks)
			\vspace*{1em}
      \colfig[0.97]{matbase/fig13377}  % rule based analysis
      \caption{Analysis of \emph{Two Pages}.
      The thick vertical lines are the part boundaries as indicated in
      the score by the composer.
      The thin grey lines
      indicate changes in the melodic `figures' of which the piece is
      constructed. In the `model information rate' panel, the black asterisks
      mark the
      six most surprising moments selected by Keith Potter.
      The bottom panel shows a rule-based boundary strength analysis computed
      using Cambouropoulos' LBDM.
      All information measures are in nats and time is in notes.
      }
    \end{fig}

 	\subsection{Musicological Analysis}
	In \cite{AbdallahPlumbley2009}, methods based on the theory described above
	were used to analyse two pieces of music in the minimalist style 
	by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
	The analysis was done using a first-order Markov chain model, with the
	enhancement that the transition matrix of the model was allowed to
	evolve dynamically as the notes were processed, and was tracked (in
	a Bayesian way) as a \emph{distribution} over possible transition matrices,
	rather than a point estimate. The results are summarised in \figrf{twopages}:
	the  upper four plots show the dynamically evolving subjective information
	measures as described in \secrf{surprise-info-seq} computed using a point
	estimate of the current transition matrix, but the fifth plot (the `model information rate')
	measures the information in each observation about the transition matrix.
	In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
	is actually a component of the true IPI in
	a time-varying Markov chain, which was neglected when we computed the IPI from
	point estimates of the transition matrix as if the transition probabilities
	were constant.

	The peaks of the surprisingness and both components of the predictive information 
	show good correspondence with structure of the piece both as marked in the score
	and as analysed by musicologist Keith Potter, who was asked to mark the six
	`most surprising moments' of the piece (shown as asterisks in the fifth plot)%
	\footnote{%
	Note that the boundary marked in the score at around note 5,400 is known to be
	anomalous; on the basis of a listening analysis, some musicologists [ref] have
	placed the boundary a few bars later, in agreement with our analysis.}.

	In contrast, the analyses shown in the lower two plots of \figrf{twopages},
	obtained using two rule-based music segmentation algorithms, while clearly
	\emph{reflecting} the structure of the piece, do not \emph{segment} the piece,
	with no tendency to peaking of the boundary strength function at
	the boundaries in the piece.
	
	The complete analysis of \emph{Gradus} can be found in \cite{AbdallahPlumbley2009},
	but \figrf{metre} illustrates the result of a metrical analysis: the piece was divided
	into bars of 32, 64 and 128 notes. In each case, the average surprisingness and
	IPI for the first, second, third \etc notes in each bar were computed. The plots
	show that the first note of each bar is, on average, significantly more surprising
	and informative than the others, up to the 64-note level, whereas at the 128-note
	level, the dominant periodicity appears to remain at 64 notes.

    \begin{fig}{metre}
%      \scalebox{1}[1]{%
        \begin{tabular}{cc}
       \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
       \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
       \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490} 
%				\colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
%				\colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
%				\colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
%        \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
%       \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
%        \colfig[0.48]{matbase/fig9142}  & \colfig[0.48]{matbase/fig27751}
 
        \end{tabular}%
%     }
      \caption{Metrical analysis by computing average surprisingness and
      informativeness of notes at different periodicities (\ie hypothetical
      bar lengths) and phases (\ie positions within a bar).
      }
    \end{fig}

    \subsection{Content analysis/Sound Categorisation}
	 Using the analogous definition of differential entropy, the methods outlined
	 in the previous section are equally applicable to continuous random variables.
	 In the case of music, where expressive properties such as dynamics, tempo,
	 timing and timbre are readily quantified on a continuous scale, the information
	 dynamic framework may thus also be applied.

	 In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian
	 processes. For such processes, the entropy rate may be obtained analytically
	 from the power spectral density of the signal, allowing the multi-information
	 rate to be subsequently obtained. Local stationarity is assumed, which may
	 be achieved by windowing or change point detection \cite{Dubnov2008}.
	 %TODO mention non-Gaussian processes extension
	 Similarly, the predictive information rate may be computed using a Gaussian
	 linear formulation CITE. In this view, the PIR is a function of the correlation
	 between random innovations supplied to the stochastic process.
	 %Dubnov, MacAdams, Reynolds (2006)
	 %Bailes and Dean (2009)
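	 For reference, a standard result for a discrete-time stationary Gaussian process
	 with power spectral density $S(\omega)$, $\omega \in [-\pi,\pi]$, gives the
	 differential entropy rate (see \eg \cite{CoverThomas}); Dubnov's formulation
	 may differ in detail, so the following is indicative only:
	 \begin{equation}
		 h_\mu = \frac{1}{2}\log(2\pi e) + \frac{1}{4\pi}\int_{-\pi}^{\pi} \log S(\omega)\,d\omega,
	 \end{equation}
	 and hence, since $H(X_t) = \frac{1}{2}\log(2\pi e\sigma^2)$ with
	 $\sigma^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\,d\omega$,
	 \begin{equation}
		 \rho_\mu = H(X_t) - h_\mu
			 = \frac{1}{2}\log\!\left(\frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\,d\omega\right)
			 - \frac{1}{4\pi}\int_{-\pi}^{\pi}\log S(\omega)\,d\omega,
	 \end{equation}
	 that is, minus one half the logarithm of the spectral flatness of the process.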

		[ Continuous domain information ]
		[Audio based music expectation modelling]
		[ Gaussian processes]


\subsection{Beat Tracking}

A probabilistic method for drum tracking was presented by Robertson 
\cite{Robertson11c}. The algorithm is used to synchronise a music 
sequencer to a live drummer. The expected beat time of the sequencer is 
represented by a click track, and the algorithm takes as input event 
times for discrete kick and snare drum events relative to this click 
track. These are obtained using dedicated microphones for each drum and 
using a percussive onset detector (Puckette 1998). The drum tracker 
continually updates distributions for tempo and phase on receiving a new 
event time. We can thus quantify the information contributed by an event 
by measuring the difference between the system's prior distribution and 
the posterior distribution using the Kullback-Leibler divergence.

Here, we have calculated the KL divergence and entropy for kick and 
snare events in sixteen files. The analysis of information rates can be 
considered \emph{subjective}, in that it measures how the drum tracker's 
probability distributions change, and these are contingent upon the 
model used as well as external properties in the signal. We expect, 
however, that following periods of increased uncertainty, such as fills 
or expressive timing, the information contained in an individual event 
increases. We also examine whether the information is dependent upon 
metrical position.
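
As an illustration of this measurement (a minimal sketch, not Robertson's
implementation; the tracker's actual state representation may differ), suppose
the tracker maintains discretised distributions over tempo and phase bins; the
information contributed by an event is then the KL divergence between the
posterior and prior grids:
\begin{lstlisting}[language=Python]
import numpy as np

def event_information(prior, posterior):
    """KL divergence (bits) between the tracker's posterior and prior
    distributions over discretised (tempo, phase) bins."""
    prior = np.asarray(prior, float).ravel()
    posterior = np.asarray(posterior, float).ravel()
    m = posterior > 0
    return float(np.sum(posterior[m] * np.log2(posterior[m] / prior[m])))

# Hypothetical 3x4 grids of tempo-phase probabilities before and after
# a kick-drum event; the event sharpens the tracker's beliefs.
prior = np.full((3, 4), 1 / 12)
posterior = np.array([[0.02, 0.02, 0.02, 0.02],
                      [0.02, 0.40, 0.30, 0.02],
                      [0.02, 0.12, 0.02, 0.02]])
print(event_information(prior, posterior))
\end{lstlisting}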


\section{Information dynamics as compositional aid}
\label{s:composition}

The use of stochastic processes in music composition has been widespread for
decades---for instance Iannis Xenakis applied probabilistic mathematical models
to the creation of musical materials~\cite{Xenakis:1992ul}. While such processes
can drive the \emph{generative} phase of the creative process, information dynamics
can serve as a novel framework for a \emph{selective} phase, by 
providing a set of criteria to be used in judging which of the 
generated materials
are of value. This alternation of generative and selective phases has been
noted by art theorist Margaret Boden \cite{Boden1990}.

Information-dynamic criteria can also be used as \emph{constraints} on the
generative processes, for example, by specifying a certain temporal profile
of surprisingness and uncertainty the composer wishes to induce in the listener
as the piece unfolds.
%stochastic and algorithmic processes: ; outputs can be filtered to match a set of
%criteria defined in terms of information-dynamical characteristics, such as
%predictability vs unpredictability
%s model, this criteria thus becoming a means of interfacing with the generative processes.  

The tools of information dynamics provide a way to constrain and select musical
materials at the level of patterns of expectation, implication, uncertainty, and predictability.
In particular, the behaviour of the predictive information rate (PIR) defined in 
\secrf{process-info} makes it interesting from a compositional point of view. The definition 
of the PIR is such that it is low both for extremely regular processes, such as constant
or periodic sequences, \emph{and} for extremely random processes, where each symbol
is chosen independently of the others, in a kind of `white noise'. In the former case,
the pattern, once established, is completely predictable and therefore there is no
\emph{new} information in subsequent observations. In the latter case, the randomness
and independence of all elements of the sequence means that, though potentially surprising,
each observation carries no information about the ones to come.

Processes with high PIR maintain a certain kind of balance between
predictability and unpredictability in such a way that the observer must continually
pay attention to each new observation as it occurs in order to make the best
possible predictions about the evolution of the sequence. This balance between predictability
and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}), 
which summarises the observations of Wundt that the greatest aesthetic value in art
is to be found at intermediate levels of disorder, where there is a balance between
`order' and `chaos'. 

Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009}
a similar shape when plotting entropy rate against PIR---this is visible in the
upper envelope of the scatter plot in \figrf{mtriscat}, which is a 3-D scatter plot of 
three of the information measures discussed in \secrf{process-info} for several thousand 
first-order Markov chain transition matrices generated by a random sampling method. 
The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and
predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond
to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences
with no temporal structure. Processes with high PIR are to be found at intermediate
levels of entropy and redundancy.
These observations led us to construct the `Melody Triangle' as a graphical interface
for exploring the melodic patterns generated by each of the Markov chains represented
as points in \figrf{mtriscat}.
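
The population of matrices for \figrf{mtriscat} can be generated along the
following lines (a sketch only: here rows are drawn from Dirichlet distributions
with randomly chosen concentration parameters, which may differ from the sampling
method actually used; the helper functions are those of the sketch in \secrf{markov}):
\begin{lstlisting}[language=Python]
import numpy as np
# stationary_dist, entropy_rate and pir as in the earlier Markov chain sketch.

rng = np.random.default_rng(0)
K = 8
points = []
for _ in range(2000):
    # Concentration parameter from near-deterministic to near-uniform rows.
    alpha = 10 ** rng.uniform(-2, 1)
    a = rng.dirichlet(alpha * np.ones(K), size=K)
    pi = stationary_dist(a)
    h = entropy_rate(a)                           # h_mu
    rho = float(-np.sum(pi * np.log2(pi))) - h    # rho_mu = H(X_t) - h_mu
    points.append((h, rho, pir(a)))               # (h_mu, rho_mu, b_mu)
points = np.array(points)   # one coordinate triple per sampled matrix
\end{lstlisting}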

  \begin{fig}{wundt}
    \raisebox{-4em}{\colfig[0.43]{wundt}}
 %  {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
    {\ {\large$\longrightarrow$}\ }
    \raisebox{-4em}{\colfig[0.43]{wundt2}}
    \caption{
      The Wundt curve relating randomness/complexity with
      perceived value. Repeated exposure sometimes results
      in a move to the left along the curve \cite{Berlyne71}.
    }
  \end{fig}
 

%It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials. 

%For instance a stochastic music generating process could be controlled by modifying
%constraints on its output in terms of predictive information rate or entropy
%rate.



 \subsection{The Melody Triangle}  

The Melody Triangle is an exploratory interface for the discovery of melodic
content, where the input---positions within a triangle---directly map to information
theoretic measures of the output.  The measures---entropy rate, redundancy and
predictive information rate---form criteria with which to filter the output
of the stochastic processes used to generate sequences of notes.  These measures
address notions of expectation and surprise in music, and as such the Melody
Triangle is a means of interfacing with a generative process in terms of the
predictability of its output.

The triangle is `populated' with first order Markov chain transition
matrices as illustrated in \figrf{mtriscat}.

 \begin{fig}{mtriscat}
	\colfig{mtriscat}
	\caption{The population of transition matrices distributed along three axes of 
	redundancy, entropy rate and predictive information rate (all measured in bits).
	The concentrations of points along the redundancy axis correspond
	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
	3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
	represents its PIR---note that the highest values are found at intermediate entropy
	and redundancy, and that the distribution as a whole makes a curved triangle. Although
	not visible in this plot, it is largely hollow in the middle.}
\end{fig}

The distribution of transition matrices plotted in this space forms an arch shape
that is fairly thin.  It thus becomes a reasonable approximation to pretend that
it is just a sheet in two dimensions; and so we stretch out this curved arc into
a flat triangle.  It is this triangular sheet that is our `Melody Triangle' and
forms the interface by which the system is controlled.  Using this interface
thus involves a mapping to information space; a user selects a position within
the triangle, and a corresponding transition matrix is returned.  
\Figrf{TheTriangle} shows how the triangle maps to different measures of redundancy,
entropy rate and predictive information rate.
	

The three corners correspond to three different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity', `noise'
and `repetition'.  Melodies from the `noise' corner have no discernible pattern;
they have high entropy rate, low predictive information rate and low redundancy.
These melodies are essentially totally random.  Melodies along the `periodicity'
to `repetition' edge are all deterministic loops that get shorter as we approach
the `repetition' corner, until it becomes just one repeating note.  It is the
areas in between the extremes that provide the more `interesting' melodies.
These melodies have some level of unpredictability, but are not completely random;
or, conversely, they are predictable, but not entirely so.

\begin{fig}{TheTriangle}
	\colfig[0.9]{TheTriangle.pdf}
	\caption{The Melody Triangle}
\end{fig}	

%PERHAPS WE SHOULD FOREGO TALKING ABOUT THE 
%INSTALLATION VERSION OF THE TRIANGLE? 
%feels a bit like a tangent, and could do with the space..
The Melody Triangle exists in two incarnations: a standard screen-based interface
where a user moves tokens in and around a triangle on screen, and a multi-user
interactive installation where a Kinect camera tracks individuals in a space and
maps their positions in physical space to the triangle.  In the latter each visitor
that enters the installation generates a melody and can collaborate with their
co-visitors to generate musical textures---a playful yet informative way to
explore expectation and surprise in music.  Additionally visitors can change the 
tempo, register, instrumentation and periodicity of their melody with body gestures.

As a screen-based interface the Melody Triangle can serve as a composition tool.
A triangle is drawn on the screen, with screen space thus mapped to the statistical
space of the Melody Triangle.  A number of tokens, each representing a
melody, can be dragged in and around the triangle.  For each token, a sequence of symbols with
statistical properties that correspond to the token's position is generated.  These
symbols are then mapped to notes of a scale%
\footnote{However they could just as well be mapped to any other property, such
as intervals, chords, dynamics and timbres.  It is even possible to map the
symbols to non-sonic outputs, such as colours.  The possibilities afforded by
the Melody Triangle in these other domains remains to be investigated.}.
Additionally keyboard commands give control over other musical parameters.  

The Melody Triangle can generate intricate musical textures when multiple tokens
are in the triangle.  Unlike other computer aided composition tools or programming
environments, here the composer engages with music on a high and abstract level,
with the interface relating to subjective expectation and predictability.




\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?
	
\begin{fig}{mtri-results}
	\def\scat#1{\colfig[0.42]{mtri/#1}}
	\def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
	\begin{tabular}{cc}
		\subj{a} \\
		\subj{b} \\
		\subj{c} \\
		\subj{d}
	\end{tabular}
	\caption{Dwell times and mark positions from user trials with the
	on-screen Melody Triangle interface. The left-hand column shows
	the positions in a 2D information space (entropy rate vs multi-information rate,
	in bits) where subjects spent their time; the area of each circle is proportional
	to the time spent there. The right-hand column shows the points which subjects
	`liked'.}
\end{fig}

Information measures on a stream of symbols can form a feedback mechanism; a
rudimentary `critic' of sorts.  For instance, symbol-by-symbol measures of predictive
information rate, entropy rate and redundancy could tell us if a stream of symbols
is currently `boring', either because it is too repetitive, or because it is too
chaotic.  Such feedback would be oblivious to long term and large scale
structures and any cultural norms (such as style conventions), but 
nonetheless could provide a composer with valuable insight on
the short term properties of a work.  This could not only be used for the
evaluation of pre-composed streams of symbols, but could also provide real-time
feedback in an improvisatory setup.
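
A rudimentary version of such a critic might look like the following sketch
(illustrative only; the window length, smoothing and thresholds are arbitrary
choices): it fits a smoothed bigram model to a window of recent symbols, computes
the resulting entropy rate and redundancy, and flags the stream when it looks
either too repetitive or too chaotic.
\begin{lstlisting}[language=Python]
import numpy as np

def window_critic(window, K, low=0.2, high=0.9, smooth=0.01):
    """Crude 'boredom' judgement from a window of recent symbols in 0..K-1.
    Fits a bigram model with additive smoothing, then computes the entropy
    rate h and redundancy rho of the fitted first-order chain (in bits)."""
    counts = np.full((K, K), smooth)
    for prev, cur in zip(window[:-1], window[1:]):
        counts[prev, cur] += 1
    a = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic estimate
    pi = counts.sum(axis=1) / counts.sum()          # empirical context weights
    h = float(-np.sum(pi[:, None] * a * np.log2(a)))
    rho = float(-np.sum(pi * np.log2(pi))) - h
    if h < low * np.log2(K):
        verdict = 'boring: too repetitive'
    elif h > high * np.log2(K) and rho < 0.1:
        verdict = 'boring: too chaotic'
    else:
        verdict = 'ok'
    return h, rho, verdict

print(window_critic([0, 0, 0, 0, 0, 0, 0, 0], K=4))       # long run of one symbol
rng = np.random.default_rng(1)
print(window_critic(list(rng.integers(0, 4, 64)), K=4))   # uniformly random symbols
\end{lstlisting}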

\section{Musical Preference and Information Dynamics}
We are carrying out a study to investigate the relationship between musical
preference and the information dynamics models; the experimental interface is a
simplified version of the screen-based Melody Triangle.  Participants are asked
to use this music pattern generator under various experimental conditions in a
composition task.  The data collected includes usage statistics of the system:
where in the triangle they place the tokens, how long they leave them there and
the state of the system when users, by pressing a key, indicate that they like
what they are hearing.  As such the experiments will help us identify any
correlation between the information theoretic properties of a stream and its
perceived aesthetic worth.

Some initial results for four subjects are shown in \figrf{mtri-results}. Though
subjects seem to exhibit distinct kinds of exploratory behaviour, we have
not been able to show any systematic across-subjects preference for any particular
region of the triangle.

Subjects' comments: several noticed the main organisation of the triangle:
repetitive notes at the top, cyclic patterns along the right edge, and unpredictable
notes towards the bottom left (a,c,f). Some did systematic exploration.
Some felt that the right side was more `controllable' than the left (a,f)---a direct consequence
of their ability to return to a particular periodic pattern and recognise it
as one heard previously. Some (a,e) felt the trial was too long and became
bored towards the end.
One subject (f) felt there wasn't enough time to get to hear out the patterns properly.
One subject (b) didn't enjoy the lower region whereas another (d) said the lower
regions were more `melodic' and `interesting'.

%\emph{comparable system}  Gordon Pask's Musicolor (1953) applied a similar notion
%of boredom in its design.  The Musicolour would react to audio input through a
%microphone by flashing coloured lights.  Rather than a direct mapping of sound
%to light, Pask designed the device to be a partner to a performing musician.  It
%would adapt its lighting pattern based on the rhythms and frequencies it would
%hear, quickly `learning' to flash in time with the music.  However Pask endowed
%the device with the ability to `be bored'; if the rhythmic and frequency content
%of the input remained the same for too long it would listen for other rhythms
%and frequencies, only lighting when it heard these.  As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.
	

\section{Conclusion}
We outlined our information dynamics approach to the modelling of the perception
of music.  This approach models the subjective assessments of an observer that
updates its probabilistic model of a process dynamically as events unfold.  We
outlined `time-varying' information measures, including a novel `predictive
information rate' that characterises the surprisingness and predictability of
musical patterns.


We have outlined how information dynamics can serve in three different forms of
analysis: musicological analysis, sound categorisation and beat tracking.

We have described the `Melody Triangle', a novel system that enables a user/composer
to discover musical content in terms of the information theoretic properties of
the output, and considered how information dynamics could be used to provide
evaluative feedback on a composition or improvisation.  Finally, we outlined a
pilot study that used the Melody Triangle as an experimental interface to help
determine if there are any correlations between aesthetic preference and information
dynamics measures.


\section{Acknowledgments}
This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
GR/S82213/01 and EP/E045235/1(SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship, EP/G007144/1
(MDP) and EPSRC IDyOM2 EP/H013059/1.
This work is partly funded by the CoSound project, funded by the Danish Agency for Science, Technology and Innovation.


\bibliographystyle{abbrv}
{\bibliography{all,c4dm,nime,andrew}}
\end{document}