\documentclass[conference,a4paper]{IEEEtran}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}


\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}

%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an Information Dynamics Approach}

\author{
	\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
	\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
	\IEEEauthorblockA{Centre for Digital Music\\
		Queen Mary University of London\\
		Mile End Road, London E1 4NS\\
		Email:}}

\maketitle
\begin{abstract}
	When we listen to music, we continually take in information and use it to
	build predictive models of what is going to happen next.  Information-theoretic
	measures computed from such models are closely related to how we perceive
	music, which makes an information-theoretic approach to music cognition a
	fruitful avenue of research.
	In this paper, we review the theoretical foundations of information dynamics
	and discuss a few emerging areas of application.
\end{abstract}


\section{Introduction}
\label{s:Intro}

	\subsection{Expectation and surprise in music}
	One of the effects of listening to music is to create 
	expectations of what is to come next, which may be fulfilled
	immediately, after some delay, or not at all as the case may be.
	This is the thesis put forward by, amongst others, music theorists 
	L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
	recognised much earlier; for example, 
	it was elegantly put by Hanslick \cite{Hanslick1854} in the
	nineteenth century:
	\begin{quote}
			`The most important factor in the mental process which accompanies the
			act of listening to music, and which converts it to a source of pleasure, 
			is \ldots the intellectual satisfaction 
			which the listener derives from continually following and anticipating 
			the composer's intentions---now, to see his expectations fulfilled, and 
			now, to find himself agreeably mistaken.' 
			%It is a matter of course that 
			%this intellectual flux and reflux, this perpetual giving and receiving 
			%takes place unconsciously, and with the rapidity of lightning-flashes.'
	\end{quote}
	An essential aspect of this is that music is experienced as a phenomenon
	that `unfolds' in time, rather than being apprehended as a static object
	presented in its entirety. Meyer argued that musical experience depends
	on how we change and revise our conceptions \emph{as events happen}, on
	how expectation and prediction interact with occurrence, and that, to a
	large degree, the way to understand the effect of music is to focus on
	this `kinetics' of expectation and surprise.

  Prediction and expectation are essentially probabilistic concepts
  and can be treated mathematically using probability theory.
  We suppose that when we listen to music, expectations are created on the basis 
	of our familiarity with various styles of music and our ability to
	detect and learn statistical regularities in the music as they emerge.
	There is experimental evidence that human listeners are able to internalise
	statistical knowledge about musical structure, \eg
	\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
	that statistical models can form an effective basis for computational
	analysis of music, \eg
	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.


\comment{
	The business of making predictions and assessing surprise is essentially
	one of reasoning under conditions of uncertainty and manipulating
	degrees of belief about the various proposition which may or may not
	hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best
	quantified in terms of Bayesian probability theory.
   Thus, we suppose that
	when we listen to music, expectations are created on the basis of our
	familiarity with various stylistic norms that apply to music in general,
	the particular style (or styles) of music that seem best to fit the piece 
	we are listening to, and
	the emerging structures peculiar to the current piece.  There is
	experimental evidence that human listeners are able to internalise
	statistical knowledge about musical structure, \eg
	\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
	that statistical models can form an effective basis for computational
	analysis of music, \eg
	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
}

	\subsection{Music and information theory}
	With a probabilistic framework for music modelling and prediction in hand,
	we are in a position to apply Shannon's quantitative information theory 
	\cite{Shannon48}. 
\comment{
	which provides us with a number of measures, such as entropy
  and mutual information, which are suitable for quantifying states of
  uncertainty and surprise, and thus could potentially enable us to build
  quantitative models of the listening process described above.  They are
  what Berlyne \cite{Berlyne71} called `collative variables' since they are
  to do with patterns of occurrence rather than medium-specific details.
  Berlyne sought to show that the collative variables are closely related to
  perceptual qualities like complexity, tension, interestingness,
  and even aesthetic value, not just in music, but in other temporal
  or visual media.
  The relevance of information theory to music and art has
  also been addressed by researchers from the 1950s onwards
  \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
	The relationship between information theory and music and art in general has been the 
	subject of some interest since the 1950s 
	\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. 
	The general thesis is that perceptible qualities and subjective
	states like uncertainty, surprise, complexity, tension, and interestingness
	are closely related to 
	information-theoretic quantities like entropy, relative entropy,
	and mutual information.
%	and are major determinants of the overall experience.
	Berlyne \cite{Berlyne71} called such quantities `collative variables', since 
	they are to do with patterns of occurrence rather than medium-specific details,
	and developed the ideas of `information aesthetics' in an experimental setting.  
%	Berlyne's `new experimental aesthetics', the `information-aestheticians'.

%	Listeners then experience greater or lesser levels of surprise
%	in response to departures from these norms. 
%	By careful manipulation
%	of the material, the composer can thus define, and induce within the
%	listener, a temporal programme of varying
%	levels of uncertainty, ambiguity and surprise. 


\subsection{Information dynamic approach}

	Bringing the various strands together, our working hypothesis is that as a
	listener (to which we will refer as `it') listens to a piece of music, it maintains
	a dynamically evolving probabilistic model that enables it to make predictions
	about how the piece will continue, relying on both its previous experience
	of music and the immediate context of the piece.  As events unfold, it revises
	its probabilistic belief state, which includes predictive
	distributions over possible future events.  These 
%	distributions and changes in distributions 
	can be characterised in terms of a handful of information-theoretic
	measures such as entropy and relative entropy.  By tracing the 
	evolution of these measures, we obtain a representation which captures much
	of the significant structure of the music.
	
	One of the consequences of this approach is that regardless of the details of
	the sensory input or even which sensory modality is being processed, the resulting 
	analysis is in terms of the same units: quantities of information (bits) and
	rates of information flow (bits per second). The probabilistic and information
	theoretic concepts in terms of which the analysis is framed are universal to all sorts 
	of data.
	In addition, when adaptive probabilistic models are used, expectations are
	created mainly in response to \emph{patterns} of occurrence, 
	rather than the details of which specific things occur.
	Together, these suggest that an information dynamic analysis captures a
	high level of \emph{abstraction}, and could be used to 
	make structural comparisons between different temporal media,
	such as music, film, animation, and dance.
%	analyse and compare information 
%	flow in different temporal media regardless of whether they are auditory, 
%	visual or otherwise. 

	Another consequence is that the information dynamic approach gives us a principled way
	to address the notion of \emph{subjectivity}, since the analysis is dependent on the 
	probability model the observer starts off with, which may depend on prior experience 
	or other factors, and which may change over time. Thus, inter-subject variability and 
	variation in subjects' responses over time are 
	fundamental to the theory. 
			
	%modelling the creative process, which often alternates between generative
	%and selective or evaluative phases \cite{Boden1990}, and would have
	%applications in tools for computer aided composition.


\section{Theoretical review}

	\subsection{Entropy and information}
	Let $X$ denote some variable whose value is initially unknown to our 
	hypothetical observer. We will treat $X$ mathematically as a random variable,
	with a value to be drawn from some set $\A$ and a 
	probability distribution representing the observer's beliefs about the 
	true value of $X$.
	In this case, the observer's uncertainty about $X$ can be quantified
	as the entropy of the random variable $H(X)$. For a discrete variable
	with probability mass function $p:\A \to [0,1]$, this is
	\begin{equation}
		H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
	\end{equation}
	where $\expect{}$ is the expectation operator. The negative-log-probability
	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
	the \emph{surprisingness} of the value $x$ should it be observed, and
	hence the entropy is the expected surprisingness.

	Now suppose that the observer receives some new data $\Data$ that
	causes a revision of its beliefs about $X$. The \emph{information}
	in this new data \emph{about} $X$ can be quantified as the 
	Kullback-Leibler (KL) divergence between the prior and posterior
	distributions $p(x)$ and $p(x|\Data)$ respectively:
	\begin{equation}
		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
			= \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
	\end{equation}
	When there are multiple variables $X_1, X_2$ 
	\etc which the observer believes to be dependent, then the observation of 
	one may change its beliefs and hence yield information about the
	others. The joint and conditional entropies as described in any
	textbook on information theory (\eg \cite{CoverThomas}) then quantify
	the observer's expected uncertainty about groups of variables given the
	values of others. In particular, the \emph{mutual information}
	$I(X_1;X_2)$ is both the expected information
	in an observation of $X_2$ about $X_1$ and the expected reduction
	in uncertainty about $X_1$ after observing $X_2$:
	\begin{equation}
		I(X_1;X_2) = H(X_1) - H(X_1|X_2),
	\end{equation}
	where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
	of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
	and so the mutual information is symmetric in its arguments. A conditional
	form of the mutual information can be formulated analogously:
	\begin{equation}
		I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
	\end{equation}
	These relationships between the various entropies and mutual
	informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
	or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
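	As a concrete illustration of these definitions, the short Python sketch below
	(our own minimal example using NumPy; the function names and the toy joint
	distribution are not taken from any of the cited works) computes the entropy of a
	discrete distribution, the KL divergence between a posterior and a prior, and
	the mutual information of a pair of variables from their joint probability table:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    """H(X) = E[-log2 p(X)] of a discrete distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def kl_divergence(posterior, prior):
    """D(posterior || prior): the information in the data about X."""
    q, p = np.asarray(posterior, float), np.asarray(prior, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))

def mutual_information(pxy):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2), from a joint table pxy."""
    pxy = np.asarray(pxy, dtype=float)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# toy example: two correlated binary variables
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
print(mutual_information(pxy))  # about 0.278 bits
\end{lstlisting}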

	\begin{fig}{venn-example}
		\newcommand\rad{2.2em}%
		\newcommand\circo{circle (3.4em)}%
		\newcommand\labrad{4.3em}
		\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
		\newcommand\colsep{\ }
		\newcommand\clipin[1]{\clip (#1) \circo;}%
		\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
		\newcommand\cliptwo[3]{%
			\begin{scope}
				\clipin{#1};
				\clipin{#2};
				\clipout{#3};
				\fill[black!30] \bound;
			\end{scope}
		}%
		\newcommand\clipone[3]{%
			\begin{scope}
				\clipin{#1};
				\clipout{#2};
				\clipout{#3};
				\fill[black!15] \bound;
			\end{scope}
		}%
		\begin{tabular}{c@{\colsep}c}
			\begin{tikzpicture}[baseline=0pt]
				\coordinate (p1) at (90:\rad);
				\coordinate (p2) at (210:\rad);
				\coordinate (p3) at (-30:\rad);
				\clipone{p1}{p2}{p3};
				\clipone{p2}{p3}{p1};
				\clipone{p3}{p1}{p2};
				\cliptwo{p1}{p2}{p3};
				\cliptwo{p2}{p3}{p1};
				\cliptwo{p3}{p1}{p2};
            \begin{scope}
               \clip (p1) \circo;
               \clip (p2) \circo;
               \clip (p3) \circo;
               \fill[black!45] \bound;
            \end{scope}
				\draw (p1) \circo;
				\draw (p2) \circo;
				\draw (p3) \circo;
				\path 
					(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
					(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
					(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
					(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
					(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
					(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
					(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
					;
				\path
					(p1) +(140:\labrad) node {$X_1$}
					(p2) +(-140:\labrad) node {$X_2$}
					(p3) +(-40:\labrad) node {$X_3$};
			\end{tikzpicture}
			&
			\parbox{0.5\linewidth}{
				\small
				\begin{align*}
					I_{1|23} &= H(X_1|X_2,X_3) \\
					I_{13|2} &= I(X_1;X_3|X_2) \\
					I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
					I_{12|3} + I_{123} &= I(X_1;X_2) 
				\end{align*}
			}
		\end{tabular}
		\caption{
		I-diagram visualisation of entropies and mutual informations
		for three random variables $X_1$, $X_2$ and $X_3$. The areas of 
		the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
		The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
		The central area $I_{123}$ is the co-information \cite{McGill1954}.
		Some other information measures are indicated in the legend.
		}
	\end{fig}


	\subsection{Entropy and information in sequences}

	Suppose that  $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a stationary sequence of
	random variables, infinite in both directions, 
	and that $\mu$ is the associated shift-invariant probability measure over all 
	configurations of the sequence---in the following, $\mu$ will simply serve
	as a label for the process. We can identify a number of information-theoretic
	measures meaningful in the context of a sequential observation of the sequence, during
	which, at any time $t$, there is `present' $X_t$, a `past' 
	$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' 
	$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
	Since the sequence is assumed stationary, we can without loss of generality,
	assume that $t=0$ in the following definitions.

	The \emph{entropy rate} of the process is the entropy of the next variable
	$X_0$ given all the previous ones:
	\begin{equation}
		\label{eq:entro-rate}
		h_\mu = H(X_0|\past{X}_0).
	\end{equation}
	The entropy rate gives a measure of the overall randomness
	or unpredictability of the process.

	The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
	notation for what he called the `information rate') is the mutual
	information between the `past' and the `present':
	\begin{equation}
		\label{eq:multi-info}
			\rho_\mu = I(\past{X}_0;X_0) = H(X_0) - h_\mu.
	\end{equation}
	It is a measure of how much the context of an observation (that is,
	the observation of previous elements of the sequence) helps in predicting
	or reducing the surprisingness of the current observation.

	The \emph{excess entropy} \cite{CrutchfieldPackard1983} 
	is the mutual information between 
	the entire `past' and the entire `future':
	\begin{equation}
		E = I(\past{X}_0; X_0,\fut{X}_0).
	\end{equation}
	


 	\begin{fig}{predinfo-bg}
		\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
		\newcommand\rad{1.8em}%
		\newcommand\ovoid[1]{%
			++(-#1,\rad) 
			-- ++(2 * #1,0em) arc (90:-90:\rad)
 			-- ++(-2 * #1,0em) arc (270:90:\rad) 
		}%
		\newcommand\axis{2.75em}%
		\newcommand\olap{0.85em}%
		\newcommand\offs{3.6em}
		\newcommand\colsep{\hspace{5em}}
		\newcommand\longblob{\ovoid{\axis}}
		\newcommand\shortblob{\ovoid{1.75em}}
		\begin{tabular}{c@{\colsep}c}
			\subfig{(a) excess entropy}{%
				\newcommand\blob{\longblob}
				\begin{tikzpicture}
					\coordinate (p1) at (-\offs,0em);
					\coordinate (p2) at (\offs,0em);
					\begin{scope}
						\clip (p1) \blob;
						\clip (p2) \blob;
						\fill[lightgray] (-1,-1) rectangle (1,1);
					\end{scope}
					\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
					\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
					\path (0,0) node (future) {$E$};
					\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
					\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
				\end{tikzpicture}%
			}%
			\\[1.25em]
			\subfig{(b) predictive information rate $b_\mu$}{%
				\begin{tikzpicture}%[baseline=-1em]
					\newcommand\rc{2.1em}
					\newcommand\throw{2.5em}
					\coordinate (p1) at (210:1.5em);
					\coordinate (p2) at (90:0.7em);
					\coordinate (p3) at (-30:1.5em);
					\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
					\newcommand\present{(p2) circle (\rc)}
					\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
					\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
					\newcommand\fillclipped[2]{%
						\begin{scope}[even odd rule]
							\foreach \thing in {#2} {\clip \thing;}
							\fill[black!#1] \bound;
						\end{scope}%
					}%
					\fillclipped{30}{\present,\future,\bound \thepast}
					\fillclipped{15}{\present,\bound \future,\bound \thepast}
					\draw \future;
					\fillclipped{45}{\present,\thepast}
					\draw \thepast;
					\draw \present;
					\node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
					\node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
					\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
					\path (p2) +(140:3em) node {$X_0$};
	%            \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
					\path (p3) +(3em,0em) node  {\shortstack{infinite\\future}};
					\path (p1) +(-3em,0em) node  {\shortstack{infinite\\past}};
					\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
					\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
				\end{tikzpicture}}%
				\\[0.5em]
		\end{tabular}
		\caption{
		I-diagrams for several information measures in
		stationary random processes. Each circle or oval represents a random
		variable or sequence of random variables relative to time $t=0$. Overlapped areas
		correspond to various mutual information as in \Figrf{venn-example}.
		In (b), the circle represents the `present'. Its total area is
		$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
		rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
		information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
		}
	\end{fig}

	The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}  
	is the average information in one observation about the infinite future given the infinite past,
	and is defined as a conditional mutual information: 
	\begin{equation}
		\label{eq:PIR}
		b_\mu = I(X_0;\fut{X}_0|\past{X}_0) = H(\fut{X}_0|\past{X}_0) - H(\fut{X}_0|X_0,\past{X}_0).
	\end{equation}
	Equation \eqrf{PIR} can be read as the average reduction
	in uncertainty about the future on learning $X_0$, given the past. 
	Due to the symmetry of the mutual information, it can also be written
	as 
	\begin{equation}
%		\IXZ_t 
		I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
%		\label{<++>}
	\end{equation}
%	If $X$ is stationary, then 
	where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$
	is the \emph{residual} \cite{AbdallahPlumbley2010},
	or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
	These relationships are illustrated in \Figrf{predinfo-bg}, along with
	several of the information measures we have discussed so far.
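	To make these quantities concrete, consider two extreme binary processes (a
	worked example of our own, not taken from the cited works): a deterministic
	alternating sequence $\ldots,0,1,0,1,\ldots$ with a random phase, and an
	i.i.d.\ sequence of fair coin flips. For the alternating process the past
	determines everything, so
	\[
		h_\mu = 0, \quad \rho_\mu = H(X_0) = 1 \text{ bit}, \quad
		b_\mu = r_\mu = 0, \quad E = 1 \text{ bit},
	\]
	while for the fair coin the past is of no help at all:
	\[
		h_\mu = H(X_0) = 1 \text{ bit}, \quad \rho_\mu = b_\mu = 0, \quad
		r_\mu = 1 \text{ bit}, \quad E = 0.
	\]
	Neither extreme has any predictive information rate: a positive $b_\mu$
	requires a process that is neither fully deterministic nor fully independent.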


	\subsection{Other sequential information measures}

	James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
	rate and also examine some related measures. In particular, they identify
	$\sigma_\mu$, the difference between the multi-information rate and the excess
	entropy, as an interesting quantity that measures the predictive benefit of
	model-building (that is, maintaining an internal state summarising past 
	observations in order to make better predictions). They also identify
	$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
	information} rate.

	\subsection{First order Markov chains}
	These are the simplest non-trivial models to which information dynamics methods
	can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information
	rate can be expressed simply in terms of the entropy rate of the Markov chain.
	If we let $a$ denote the transition matrix of the Markov chain, and $h_a$ its
	entropy rate, then its predictive information rate $b_a$ is
	\begin{equation}
		b_a = h_{a^2} - h_a,
	\end{equation}
	where $a^2 = aa$, the transition matrix squared, is the transition matrix
	of the `skip one' Markov chain obtained by leaving out every other observation.
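	As a sketch of how this can be computed in practice, the following Python
	fragment (our own minimal implementation, assuming a row-stochastic, ergodic
	transition matrix; it is not code from \cite{AbdallahPlumbley2009}) evaluates
	the entropy rate and the predictive information rate of a first-order Markov chain:
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(P):
    """Stationary distribution of a row-stochastic matrix P (assumed ergodic)."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def entropy_rate(P):
    """h_a = -sum_i pi_i sum_j P_ij log2 P_ij, in bits per symbol."""
    pi = stationary(P)
    logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
    return float(-np.sum(pi[:, None] * P * logP))

def predictive_information_rate(P):
    """b_a = h_{a^2} - h_a, where P @ P is the `skip one' chain."""
    return entropy_rate(P @ P) - entropy_rate(P)

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(entropy_rate(P), predictive_information_rate(P))
\end{lstlisting}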

	\subsection{Higher order Markov chains}
	Second and higher order Markov chains can be treated in a similar way by transforming
	the high order Markov chain into a first order representation. If we are dealing
	with an $N$th order model, this is done by forming a new alphabet of possible observations 
	consisting of all possible $N$-tuples of symbols from the base alphabet. An observation
	in this new model represents a block of $N$ observations from the base model. The next
	observation represents the block of $N$ obtained by shifting the previous block along
	by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
	transition matrix $\hat{a}$, where $K$ is the size of the base alphabet.
	Its predictive information rate is then
	\begin{equation}
		b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}},
	\end{equation}
	where $\hat{a}^{N+1}$ is the $(N+1)$th power of the transition matrix.
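	The construction of the first-order representation can be sketched as follows
	(again our own illustration; \texttt{entropy\_rate} is the function from the
	previous listing, and the conditional distribution \texttt{cond} stands in for
	whatever $N$th order model is being analysed):
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def block_transition_matrix(cond, K, N):
    """First-order representation of an N-th order chain over K symbols.

    cond(context, y) gives P(next = y | previous N symbols = context).
    The states are all K**N contexts; a transition appends the new symbol
    and drops the oldest one, so each row has at most K nonzero entries.
    """
    states = list(itertools.product(range(K), repeat=N))
    index = {s: i for i, s in enumerate(states)}
    A = np.zeros((K**N, K**N))
    for s in states:
        for y in range(K):
            A[index[s], index[s[1:] + (y,)]] = cond(s, y)
    return A

# toy 2nd-order model: tend to repeat the symbol heard two steps earlier
cond = lambda ctx, y: 0.9 if y == ctx[0] else 0.1
A = block_transition_matrix(cond, K=2, N=2)
b = entropy_rate(np.linalg.matrix_power(A, 3)) - 2 * entropy_rate(A)
\end{lstlisting}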
		

  \begin{fig}{wundt}
    \raisebox{-4em}{\colfig[0.43]{wundt}}
 %  {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
    {\ {\large$\longrightarrow$}\ }
    \raisebox{-4em}{\colfig[0.43]{wundt2}}
    \caption{
      The Wundt curve relating randomness/complexity with
      perceived value. Repeated exposure sometimes results
      in a move to the left along the curve \cite{Berlyne71}.
    }
  \end{fig}


\section{Information Dynamics in Analysis}

 	\subsection{Musicological Analysis}
	In \cite{AbdallahPlumbley2009}, methods based on the theory described above
	were used to analyse two pieces of music in the minimalist style 
	by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
	The analysis was done using a first-order Markov chain model, with the
	enhancement that the transition matrix of the model was allowed to
	evolve dynamically as the notes were processed, and was estimated (in
	a Bayesian way) as a \emph{distribution} over possible transition matrices,
	rather than a point estimate. [Bayesian surprise, other component of IPI].
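	A minimal sketch of such a dynamically evolving model (our own illustration,
	not the implementation used in \cite{AbdallahPlumbley2009}, which includes
	further components) is a first-order Markov model with an independent Dirichlet
	prior on each row of the transition matrix: the pseudo-counts are incremented
	as note-to-note transitions are observed, and the surprisal of each incoming
	note is evaluated under the current predictive distribution.
\begin{lstlisting}[language=Python]
import numpy as np

class DirichletMarkovModel:
    """Bayesian first-order Markov model with per-row Dirichlet priors."""
    def __init__(self, K, prior=1.0):
        self.alpha = np.full((K, K), prior)  # Dirichlet pseudo-counts
        self.prev = None

    def predictive(self, state):
        """Predictive distribution over the next symbol, given the current one."""
        row = self.alpha[state]
        return row / row.sum()

    def observe(self, symbol):
        """Return the surprisal (in nats) of `symbol`, then update the counts."""
        surprisal = None
        if self.prev is not None:
            p = self.predictive(self.prev)
            surprisal = -np.log(p[symbol])
            self.alpha[self.prev, symbol] += 1
        self.prev = symbol
        return surprisal
\end{lstlisting}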
	
    \begin{fig}{twopages}
      \colfig[0.96]{matbase/fig9471}  % update from mbc paper
%      \colfig[0.97]{matbase/fig72663}\\  % later update from mbc paper (Keith's new picks)
			\vspace*{1em}
      \colfig[0.97]{matbase/fig13377}  % rule based analysis
      \caption{Analysis of \emph{Two Pages}.
      The thick vertical lines are the part boundaries as indicated in
      the score by the composer.
      The thin grey lines
      indicate changes in the melodic `figures' of which the piece is
      constructed. In the `model information rate' panel, the black asterisks
      mark the
      six most surprising moments selected by Keith Potter.
      The bottom panel shows a rule-based boundary strength analysis computed
      using Cambouropoulos' LBDM.
      All information measures are in nats and time is in notes.
      }
    \end{fig}

    \begin{fig}{metre}
%      \scalebox{1}[1]{%
        \begin{tabular}{cc}
       \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
       \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
       \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490} 
%				\colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
%				\colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
%				\colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
%        \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
%       \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
%        \colfig[0.48]{matbase/fig9142}  & \colfig[0.48]{matbase/fig27751}
 
        \end{tabular}%
%     }
      \caption{Metrical analysis by computing average surprisingness and
      informativeness of notes at different periodicities (\ie hypothetical
      bar lengths) and phases (\ie positions within a bar).
      }
    \end{fig}

	\subsection{Content analysis/Sound Categorisation}
        Overview of information-theoretic approaches to music content analysis.
        \begin{itemize}
            \item Continuous domain information 
						\item Audio based music expectation modelling
            \item Proposed model for Gaussian processes      
        \end{itemize}
    \emph{Peter}


\subsection{Beat Tracking}
 \emph{Andrew}  


\section{Information dynamics as compositional aid}

In addition to applying information dynamics to analysis, it is also possible to apply it to the generation of content, such as the composition of musical materials. 
The outputs of algorithmic or stochastic processes can be filtered to match a set of criteria defined in terms of the information dynamics model, with these criteria thus becoming a means of interfacing with the generative process.  
For instance, a stochastic music-generating process could be controlled by modifying constraints on its output in terms of predictive information rate or entropy rate.    

The use of stochastic processes for the composition of musical material has been widespread for decades -- for instance, Iannis Xenakis applied probabilistic mathematical models to the creation of musical materials \cite{Xenakis:1992ul}.   
Information dynamics can serve as a novel framework for exploring the possibilities of such processes at the high and abstract level of expectation, randomness and predictability.

 \subsection{The Melody Triangle}  
The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input -- positions within a triangle -- maps directly to information-theoretic measures of the output.  
These measures -- entropy rate, redundancy and predictive information rate -- form the criteria by which the output of the stochastic processes used to generate sequences of notes is filtered. 
They address notions of expectation and surprise in music, and as such the Melody Triangle is a means of interfacing with a generative process in terms of the predictability of its output.       
 	
The triangle is `populated' with possible parameter values for melody generators. 
These are plotted in a three-dimensional statistical space of redundancy, entropy rate and predictive information rate. 
In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method. 
In figure \ref{InfoDynEngine} we see how these matrices are distributed in the statistical space; each point corresponds to a transition matrix.
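A plausible way to generate such a population (our own sketch, not necessarily the sampling scheme actually used; it reuses the \texttt{stationary}, \texttt{entropy\_rate} and \texttt{predictive\_information\_rate} functions from the earlier listing) is to draw row-stochastic matrices from a sparse Dirichlet distribution and compute their coordinates in the statistical space:
\begin{lstlisting}[language=Python]
import numpy as np
# stationary, entropy_rate, predictive_information_rate as defined earlier

def marginal_entropy(P):
    """H(X_0): entropy of the stationary distribution, in bits."""
    pi = stationary(P)
    pi = pi[pi > 1e-12]
    return float(-np.sum(pi * np.log2(pi)))

rng = np.random.default_rng(0)
K = 8            # alphabet size (an assumption for this sketch)
points = []
for _ in range(5000):
    # sparse-ish random rows; nearly reducible samples may need filtering
    P = rng.dirichlet(np.full(K, 0.1), size=K)
    h = entropy_rate(P)
    rho = marginal_entropy(P) - h    # redundancy (multi-information rate)
    points.append((rho, h, predictive_information_rate(P)))
\end{lstlisting}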


 
	
The distribution of transition matrices plotted in this space forms a fairly thin arch shape.  
It is thus a reasonable approximation to treat it as a two-dimensional sheet, which we stretch out into a flat triangle.  
It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled.  
Using this interface thus involves a mapping to statistical space: a user selects a position within the triangle, and a corresponding transition matrix is returned.  
Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate.

	


The three corners correspond to different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'.  
Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. 
These melodies are essentially totally random.  
Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until each becomes just one repeating note. 
It is the areas in between the extremes that provide the more `interesting' melodies. 
These melodies have some level of unpredictability, but are not completely random; 
or, conversely, are predictable, but not entirely so.  

The Melody Triangle exists in two incarnations: a standard screen-based interface where a user moves tokens in and around a triangle on screen, and a multi-user interactive installation where a Kinect camera tracks individuals in a space and maps their positions in physical space to the triangle.
In the latter, each visitor entering the installation generates a melody and can collaborate with co-visitors to generate musical textures -- a playful yet informative way to explore expectation and surprise in music.  
Additionally, different gestures can be detected to change the tempo, register, instrumentation and periodicity of the output melody.  

\begin{figure}
	\centering
	\includegraphics[width=\linewidth]{figs/mtriscat}
	\caption{The population of transition matrices distributed along three axes of 
	redundancy, entropy rate and predictive information rate (all measured in bits).
	The concentrations of points along the redundancy axis correspond
	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
	3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
	represents its PIR---note that the highest values are found at intermediate entropy
	and redundancy, and that the distribution as a whole makes a curved triangle. Although
	not visible in this plot, it is largely hollow in the middle. 
	\label{InfoDynEngine}}
\end{figure}

As a screen-based interface, the Melody Triangle can serve as a composition tool.
A triangle is drawn on the screen, with screen space thus mapped to the statistical space of the Melody Triangle.
A number of round tokens, each representing a melody, can be dragged in and around the triangle.  
When a token is dragged into the triangle, the system starts generating a sequence of symbols with statistical properties that correspond to the position of the token.  
These symbols are then mapped to notes of a scale. 
Keyboard input allows for control over additional parameters.  

 \begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}	

In this mode, the Melody Triangle is a compositional tool.  
It can assist a composer in the creation not only of melodies but also, by placing multiple tokens in the triangle, of intricate musical textures.    
Unlike other computer-aided composition tools or programming environments, here the composer engages with music at the high and abstract level of expectation, randomness and predictability.   	


\section{Musical Preference and Information Dynamics}
We carried out a preliminary study that sought to identify any correlation between
aesthetic preference and the information-theoretic measures of the Melody
Triangle.  In this study participants were asked to use the screen-based interface,
simplified so that all they could do was move tokens around.  To help
discount visual biases, the axes of the triangle were randomly rearranged
for each participant. \emph{self-plagiarised}

The study was divided into two parts. The first investigated musical preference
with respect to single melodies at different tempos.  In the second part of the
study, a background melody was playing and the participants were asked to continue
playing with the system under the implicit assumption that they would try to find
a second melody that works well with the background melody.  For each participant
this was done four times, each with a different background melody from four
different areas of the Melody Triangle.  For all parts of the study the participants
were asked to signal, by pressing the space bar, whenever they liked what they
were hearing. \emph{self-plagiarised}

\emph{todo - results}

\section{Information Dynamics as Evaluative Feedback Mechanism}

\emph{todo - code the info dyn evaluator :) }
	
It is possible to use information dynamics measures to develop a kind of `critic'
that would evaluate a stream of symbols.  For instance, we could develop a system
to notify us if a stream of symbols is too boring, either because it is too
repetitive or too chaotic.  This could be used to evaluate pre-composed
streams of symbols, or to provide real-time feedback in an
improvisatory setup.
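As a toy illustration of what such a critic might look like (a hypothetical sketch of our own, not an implementation of the system discussed here; it reuses \texttt{entropy\_rate} and \texttt{predictive\_information\_rate} from the earlier listing), one could fit a first-order Markov model to a sliding window of recent symbols and flag the stream when its entropy rate or predictive information rate falls outside a chosen range:
\begin{lstlisting}[language=Python]
from collections import deque
import numpy as np
# entropy_rate and predictive_information_rate as defined earlier

class BoredomCritic:
    """Sliding-window critic with arbitrary illustrative thresholds."""
    def __init__(self, K, window=64, h_low=0.2, b_low=0.1):
        self.K, self.window = K, window
        self.h_low, self.b_low = h_low, b_low
        self.buf = deque(maxlen=window)

    def update(self, symbol):
        self.buf.append(symbol)
        if len(self.buf) < self.window:
            return None
        seq = list(self.buf)
        counts = np.ones((self.K, self.K))   # add-one smoothing
        for i, j in zip(seq[:-1], seq[1:]):
            counts[i, j] += 1
        P = counts / counts.sum(axis=1, keepdims=True)
        h, b = entropy_rate(P), predictive_information_rate(P)
        if h < self.h_low:
            return 'too repetitive'
        if b < self.b_low and h > 0.9 * np.log2(self.K):
            return 'too chaotic'
        return 'interesting'
\end{lstlisting}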

\emph{comparable system}  Gordon Pask's Musicolour (1953) applied a similar notion
of boredom in its design.  The Musicolour would react to audio input through a
microphone by flashing coloured lights.  Rather than a direct mapping of sound
to light, Pask designed the device to be a partner to a performing musician.  It
would adapt its lighting pattern based on the rhythms and frequencies it would
hear, quickly `learning' to flash in time with the music.  However, Pask endowed
the device with the ability to `be bored': if the rhythmic and frequency content
of the input remained the same for too long, it would listen for other rhythms
and frequencies, only lighting when it heard these.  As the Musicolour would
`get bored', the musician would have to change and vary their playing, eliciting
new and unexpected outputs in trying to keep the Musicolour interested.
	
In a similar vein, our \emph{Information Dynamics Critic} (name?) allows for an
evaluative measure of an input stream, but with a more sophisticated
notion of boredom than \dots


   

\section{Conclusion}

\bibliographystyle{unsrt}
{\bibliography{all,c4dm,nime}}
\end{document}