diff draft.tex @ 41:9d03f05b6528

More sec 2, moved some figures.
author samer
date Thu, 15 Mar 2012 12:04:07 +0000
parents 3ec2037c4107
children 1161caf0bdda
--- a/draft.tex	Thu Mar 15 01:06:02 2012 +0000
+++ b/draft.tex	Thu Mar 15 12:04:07 2012 +0000
@@ -1,4 +1,4 @@
-\documentclass[conference,a4paper]{IEEEtran}
+\documentclass[conference]{IEEEtran}
 \usepackage{cite}
 \usepackage[cmex10]{amsmath}
 \usepackage{graphicx}
@@ -53,15 +53,14 @@
 %\usepackage[parfill]{parskip}
 
 \begin{document}
-\title{Cognitive Music Modelling: an Information Dynamics Approach}
+\title{Cognitive Music Modelling: an\\Information Dynamics Approach}
 
 \author{
 	\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
 	\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
 	\IEEEauthorblockA{Centre for Digital Music\\
 		Queen Mary University of London\\
-		Mile End Road, London E1 4NS\\
-		Email:}}
+		Mile End Road, London E1 4NS}}
 
 \maketitle
 \begin{abstract}
@@ -226,6 +225,8 @@
 \section{Theoretical review}
 
 	\subsection{Entropy and information}
+	\label{s:entro-info}
+
 	Let $X$ denote some variable whose value is initially unknown to our 
 	hypothetical observer. We will treat $X$ mathematically as a random variable,
 	with a value to be drawn from some set $\X$ and a 
@@ -235,12 +236,13 @@
 	as the entropy of the random variable $H(X)$. For a discrete variable
 	with probability mass function $p:\X \to [0,1]$, this is
 	\begin{equation}
-		H(X) = \sum_{x\in\X} -p(x) \log p(x) = \expect{-\log p(X)},
+		H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
 	\end{equation}
-	where $\expect{}$ is the expectation operator. The negative-log-probability
+%	where $\expect{}$ is the expectation operator. 
+	The negative-log-probability
 	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
 	the \emph{surprisingness} of the value $x$ should it be observed, and
-	hence the entropy is the expected surprisingness.
+	hence the entropy is the expectation of the surprisingness $\expect \ell(X)$.
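+	For example, if $X$ is uniformly distributed over an alphabet of $K$
+	symbols, every outcome is equally surprising, with $\ell(x) = \log K$,
+	and so $H(X) = \log K$, the largest value possible for an alphabet of
+	that size; at the other extreme, a variable whose value is certain in
+	advance has zero entropy.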
 
 	Now suppose that the observer receives some new data $\Data$ that
 	causes a revision of its beliefs about $X$. The \emph{information}
@@ -250,6 +252,7 @@
 	\begin{equation}
 		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
 			= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
+			\label{eq:info}
 	\end{equation}
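+	For example, if the observer's prior beliefs about a binary $X$ are
+	uniform and the data $\Data$ leave it certain that $X=1$, then
+	$\mathcal{I}_{\Data\to X} = \log 2$: exactly one bit when logarithms
+	are taken to base 2.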
 	When there are multiple variables $X_1, X_2$ 
 	\etc which the observer believes to be dependent, then the observation of 
@@ -367,16 +370,16 @@
 	which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past' 
 	$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' 
 	$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
-	The actually observed value of $X_t$ will be written as $x_t$, and
+	We will write the actually observed value of $X_t$ as $x_t$, and
 	the sequence of observations up to but not including $x_t$ as 
 	$\past{x}_t$.
 %	Since the sequence is assumed stationary, we can without loss of generality,
 %	assume that $t=0$ in the following definitions.
 
-	The in-context surprisingness of the observation $X_t=x_t$ is a function
-	of both $x_t$ and the context $\past{x}_t$:
+	The in-context surprisingness of the observation $X_t=x_t$ depends on 
+	both $x_t$ and the context $\past{x}_t$:
 	\begin{equation}
-		\ell(x_t|\past{x}_t) = - \log p(x_t|\past{x}_t).
+		\ell_t = - \log p(x_t|\past{x}_t).
 	\end{equation}
 	However, before $X_t$ is observed to be $x_t$, the observer can compute
 	its \emph{expected} surprisingness as a measure of its uncertainty about
@@ -385,10 +388,10 @@
 	conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not
 	\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.
 
-	The surprisingness $\ell(x_t|\past{x}_t)$ and expected surprisingness
+	The surprisingness $\ell_t$ and expected surprisingness
 	$H(X_t|\ev(\past{X}_t=\past{x}_t))$
-	are subjective information dynamic measures since they are based on its
-	subjective probability model in the context of the actually observed sequence
+	can be understood as \emph{subjective} information dynamic measures, since they are 
+	based on the observer's probability model in the context of the actually observed sequence
 	$\past{x}_t$---they characterise what it is like to `be in the observer's shoes'.
 	If we view the observer as a purely passive or reactive agent, this would
 	probably be sufficient, but for active agents such as humans or animals, it is
@@ -396,39 +399,24 @@
 	most effective course of action. It makes sense for such observers to be
 	concerned about the predictive probability distribution over future events,
 	$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
-	the \emph{instantaneous predictive information} (IPI) is the information in the
-	event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$.
+	the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
+	is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
+	\emph{given} the observed past $\past{X}_t=\past{x}_t$.
+	Referring to the definition of information \eqrf{info}, this is the KL divergence
+	between prior and posterior distributions over possible futures, which, written out in full, is
+	\begin{equation}
+		\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*} 
+					p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
+	\end{equation}
+	where the sum is to be taken over the set of infinite sequences $\X^*$.
+	As with the surprisingness, the observer can compute its \emph{expected} IPI
+	at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
+	conditioned on the observed past. This could be used, for example, as an estimate
+	of the attentional resources that should be directed at this stream of data, which may
+	be in competition with other sensory streams.
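+	More explicitly, writing $\mathcal{I}_t(x)$ for the value that $\mathcal{I}_t$
+	would take were $X_t$ observed to be $x$, the expected IPI is the average of
+	$\mathcal{I}_t(x)$ under the observer's predictive distribution over the present:
+	\begin{equation}
+		I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t)) = \sum_{x\in\X} p(x|\past{x}_t)\, \mathcal{I}_t(x).
+	\end{equation}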
 
 	\subsection{Information measures for stationary random processes}
 
-	The \emph{entropy rate} of the process is the entropy of the next variable
-	$X_t$ given all the previous ones.
-	\begin{equation}
-		\label{eq:entro-rate}
-		h_\mu = H(X_0|\past{X}_0).
-	\end{equation}
-	The entropy rate gives a measure of the overall randomness
-	or unpredictability of the process.
-
-	The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
-	notation for what he called the `information rate') is the mutual
-	information between the `past' and the `present':
-	\begin{equation}
-		\label{eq:multi-info}
-			\rho_\mu(t) = I(\past{X}_t;X_t) = H(X_0) - h_\mu.
-	\end{equation}
-	It is a measure of how much the context of an observation (that is,
-	the observation of previous elements of the sequence) helps in predicting
-	or reducing the suprisingness of the current observation.
-
-	The \emph{excess entropy} \cite{CrutchfieldPackard1983} 
-	is the mutual information between 
-	the entire `past' and the entire `future':
-	\begin{equation}
-		E = I(\past{X}_0; X_0,\fut{X}_0).
-	\end{equation}
-	
-
 
  	\begin{fig}{predinfo-bg}
 		\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
@@ -510,12 +498,50 @@
 		}
 	\end{fig}
 
+	If we step back, out of the observer's shoes as it were, and consider the
+	random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of 
+	possible realisations, and furthermore assume that it is stationary,
+	then it becomes possible to define a number of information-theoretic measures,
+	closely related to those described above, but which characterise the
+	process as a whole, rather than on a moment-by-moment basis. Some of these,
+	such as the entropy rate, are well known, but others have only recently been
+	investigated. (In the following, the assumption of stationarity means that
+	the measures defined below are independent of $t$.)
+
+	The \emph{entropy rate} of the process is the entropy of the next variable
+	$X_t$ given all the previous ones.
+	\begin{equation}
+		\label{eq:entro-rate}
+		h_\mu = H(X_t|\past{X}_t).
+	\end{equation}
+	The entropy rate gives a measure of the overall randomness
+	or unpredictability of the process.
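+	For example, a sequence of independent symbols drawn uniformly from an
+	alphabet of size $K$ attains the maximal entropy rate $h_\mu = \log K$,
+	while any deterministic periodic sequence has $h_\mu = 0$.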
+
+	The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
+	notation for what he called the `information rate') is the mutual
+	information between the `past' and the `present':
+	\begin{equation}
+		\label{eq:multi-info}
+			\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
+	\end{equation}
+	It is a measure of how much the context of an observation (that is,
+	the observation of previous elements of the sequence) helps in predicting
+	or reducing the surprisingness of the current observation.
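+	For a sequence of independent, identically distributed symbols the context
+	is of no help and $\rho_\mu = 0$, whereas for a deterministic periodic
+	sequence the context removes all uncertainty about the present, so that
+	$\rho_\mu = H(X_t)$.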
+
+	The \emph{excess entropy} \cite{CrutchfieldPackard1983} 
+	is the mutual information between 
+	the entire `past' and the entire `future':
+	\begin{equation}
+		E = I(\past{X}_t; X_t,\fut{X}_t).
+	\end{equation}
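+	For example, a deterministic periodic process with (minimal) period $P$ has
+	$E = \log P$: the past tells us exactly where we are in the cycle, and this
+	phase information is all that the past shares with the future.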
+	
+
 	The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}  
 	is the average information in one observation about the infinite future given the infinite past,
 	and is defined as a conditional mutual information: 
 	\begin{equation}
 		\label{eq:PIR}
-		b_\mu = I(X_0;\fut{X}_0|\past{X}_0) = H(\fut{X}_0|\past{X}_0) - H(\fut{X}_0|X_0,\past{X}_0).
+		b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
 	\end{equation}
 	Equation \eqrf{PIR} can be read as the average reduction
 	in uncertainty about the future on learning $X_t$, given the past. 
@@ -523,11 +549,11 @@
 	as 
 	\begin{equation}
 %		\IXZ_t 
-		I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
+		I(X_t;\fut{X}_t|\past{X}_t) = h_\mu - r_\mu,
 %		\label{<++>}
 	\end{equation}
 %	If $X$ is stationary, then 
-	where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$, 
+	where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$, 
 	is the \emph{residual} \cite{AbdallahPlumbley2010},
 	or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
 	These relationships are illustrated in \Figrf{predinfo-bg}, along with
@@ -559,11 +585,11 @@
 	\subsection{First and higher order Markov chains}
 	First order Markov chains are the simplest non-trivial models to which information 
 	dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
-	expressions for all the information measures introduced [above] for
+	expressions for all the information measures described in \secrf{surprise-info-seq} for
 	irreducible stationary Markov chains (\ie that have a unique stationary
 	distribution). The derivation is greatly simplified by the dependency structure
 	of the Markov chain: for the purpose of the analysis, the `past' and `future'
-	segments $\past{X}_t$ and $\fut{X})_t$ can be reduced to just the previous
+	segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
 	and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that 
 	the predictive information rate can be expressed simply in terms of entropy rates:
 	if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
@@ -581,17 +607,21 @@
 	Second and higher order Markov chains can be treated in a similar way by transforming
 	to a first order representation of the high order Markov chain. If we are dealing
 	with an $N$th order model, this is done by forming a new alphabet of size $K^N$
-	consisting of all possible $N$-tuples of symbols from the base alphabet of size $K$. 
-	An observation $\hat{x}_t$ in this new model represents a block of $N$ observations 
+	consisting of all possible $N$-tuples of symbols from the base alphabet. 
+	An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations 
 	$(x_{t+1},\ldots,x_{t+N})$ from the base model. The next
-	observation $\hat{x}_{t+1}$ represents the block of $N$ obtained by shifting the previous 
+	observation $\hat{x}_{t+1}$ encodes the block of $N$ observations obtained by shifting the previous 
 	block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
-	transition matrix $\hat{a}$. The entropy rate of the first order system is the same
-	as the entropy rate of the original order $N$ system, and its PIR is
+	transition matrix $\hat{a}$. Adopting the label $\mu$ for the order $N$ system, 
+	we obtain:
 	\begin{equation}
-		b({\hat{a}}) = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
+		h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
 	\end{equation}
 	where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.  
+	Other information measures can also be computed for the high-order Markov chain, including
+	the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
+	for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger 
+	than $\rho_\mu$.
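+	As a concrete illustration, the following minimal sketch (assuming NumPy,
+	natural logarithms, and an irreducible, row-stochastic transition matrix;
+	the helper names are purely illustrative) computes $h_\mu$ and $b_\mu$ for
+	a chain given in its first order representation:
+\begin{verbatim}
+import numpy as np
+
+def stationary(a):
+    # left eigenvector of a for eigenvalue 1,
+    # normalised to sum to 1
+    vals, vecs = np.linalg.eig(a.T)
+    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
+    return pi / pi.sum()
+
+def entropy_rate(a, pi=None):
+    # h(a) = -sum_i pi_i sum_j a_ij log a_ij  (0 log 0 = 0)
+    pi = stationary(a) if pi is None else pi
+    with np.errstate(divide='ignore', invalid='ignore'):
+        terms = np.where(a > 0, a * np.log(a), 0.0)
+    return float(-pi @ terms.sum(axis=1))
+
+def predictive_info_rate(a, N=1):
+    # b = h(a^(N+1)) - N h(a), as in the equation above
+    pi = stationary(a)
+    aN1 = np.linalg.matrix_power(a, N + 1)
+    return entropy_rate(aN1, pi) - N * entropy_rate(a, pi)
+\end{verbatim}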
 		
 
 \section{Information Dynamics in Analysis}
@@ -701,38 +731,6 @@
 Information dynamics can serve as a novel framework for the exploration of the possibilities of such processes at the high and abstract level of expectation, randomness and predictability.
 
  \subsection{The Melody Triangle}  
-The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input -- positions within a triangle -- directly map to information theoretic measures of the output.  
-The measures -- entropy rate, redundancy and predictive information rate -- form a criteria with which to filter the output of the stochastic processes used to generate sequences of notes. 
-These measures address notions of expectation and surprise in music, and as such the Melody Triangle is a means of interfacing with a generative process in terms of the predictability of its output.       
- 	
-The triangle is `populated' with possible parameter values for melody generators. 
-These are plotted in a 3d statistical space of redundancy, entropy rate and predictive information rate. 
- In our case we generated thousands of transition matrixes, representing first-order Markov chains, by a random sampling method. 
- In figure \ref{InfoDynEngine} we see a representation of how these matrixes are distributed in the 3d statistical space; each one of these points corresponds to a transition matrix.
-
-
- 
-	
-The distribution of transition matrixes plotted in this space forms an arch shape that is fairly thin.  
-It thus becomes a reasonable approximation to pretend that it is just a sheet in two dimensions; and so we stretch out this curved arc into a flat triangle.  
-It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled.  
-Using this interface thus involves a mapping to statistical space; a user selects a position within the triangle, and a corresponding transition matrix is returned.  
-Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate.
-
-	
-
-
-Each corner corresponds to three different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'.  
-Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. 
-These melodies are essentially totally random.  
-A melody along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until it becomes just one repeating note. 
-It is the areas in between the extremes that provide the more `interesting' melodies. 
-These melodies have some level of unpredictability, but are not completely random. 
- Or, conversely, are predictable, but not entirely so.  
-
-The Melody Triangle exists in two incarnations; a standard screen based interface where a user moves tokens in and around a triangle on screen, and a multi-user interactive installation where a Kinect camera tracks individuals in a space and maps their positions in physical space to the triangle.
-In the latter visitors entering the installation generates a melody, and could collaborate with their co-visitors to generate musical textures -- a playful yet informative way to explore expectation and surprise in music.  
-Additionally different gestures could be detected to change the tempo, register, instrumentation and periodicity of the output melody.  
 
 \begin{figure}
 	\centering
@@ -748,6 +746,46 @@
 	\label{InfoDynEngine}}
 \end{figure}
 
+The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input -- positions within a triangle -- directly map to information theoretic measures of the output.  
+The measures -- entropy rate, redundancy and predictive information rate -- form the criteria with which to filter the output of the stochastic processes used to generate sequences of notes. 
+These measures address notions of expectation and surprise in music, and as such the Melody Triangle is a means of interfacing with a generative process in terms of the predictability of its output.       
+ 	
+The triangle is `populated' with possible parameter values for melody generators. 
+These are plotted in a 3d statistical space of redundancy, entropy rate and predictive information rate. 
+ In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method. 
+ In figure \ref{InfoDynEngine} we see a representation of how these matrices are distributed in the 3d statistical space; each one of these points corresponds to a transition matrix.
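+One simple way to populate such a space -- not necessarily the exact sampling scheme used here -- is to draw each row of a candidate transition matrix from a Dirichlet distribution, where a small concentration parameter favours sparse, strongly peaked rows; the information coordinates of each sample can then be computed with the measures described earlier (the helper names in the comments below are illustrative placeholders):
+\begin{verbatim}
+import numpy as np
+
+def sample_transition_matrix(K, alpha=0.1, rng=None):
+    # each row is an independent Dirichlet(alpha,...,alpha)
+    # draw; small alpha gives near-deterministic rows,
+    # large alpha gives nearly uniform ones
+    rng = np.random.default_rng() if rng is None else rng
+    return rng.dirichlet(alpha * np.ones(K), size=K)
+
+# e.g. a population of candidate generators over 8 symbols,
+# whose coordinates (entropy rate, redundancy, PIR) can then
+# be computed with helpers like those sketched earlier:
+matrices = [sample_transition_matrix(8) for _ in range(5000)]
+\end{verbatim}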
+
+
+ 
+	
+The distribution of transition matrices plotted in this space forms an arch shape that is fairly thin.  
+It thus becomes a reasonable approximation to treat it as a two-dimensional sheet, and so we stretch out this curved arc into a flat triangle.  
+It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled.  
+Using this interface thus involves a mapping to statistical space; a user selects a position within the triangle, and a corresponding transition matrix is returned.  
+Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate.
+
+	
+
+
+The three corners correspond to different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'.  
+Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. 
+These melodies are essentially totally random.  
+Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until they become just one repeating note. 
+It is the areas in between the extremes that provide the more `interesting' melodies. 
+These melodies have some level of unpredictability, but are not completely random. 
+ Or, conversely, they are predictable, but not entirely so.  
+
+ \begin{figure}
+\centering
+\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
+\caption{The Melody Triangle\label{TheTriangle}}
+\end{figure}	
+
+
+The Melody Triangle exists in two incarnations: a standard screen-based interface where a user moves tokens in and around a triangle on screen, and a multi-user interactive installation where a Kinect camera tracks individuals in a space and maps their positions in physical space to the triangle.
+In the latter, each visitor entering the installation generates a melody, and can collaborate with co-visitors to generate musical textures -- a playful yet informative way to explore expectation and surprise in music.  
+Additionally, different gestures can be detected to change the tempo, register, instrumentation and periodicity of the output melody.  
+
 As a screen-based interface, the Melody Triangle can serve as a composition tool.
 A triangle is drawn on the screen, with screen space thus mapped to the statistical space of the Melody Triangle.
 A number of round tokens, each representing a melody, can be dragged in and around the triangle.  
@@ -755,12 +793,6 @@
 These symbols are then mapped to notes of a scale. 
  Keyboard input allows for control over additional parameters.  
 
- \begin{figure}
-\centering
-\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
-\caption{The Melody Triangle\label{TheTriangle}}
-\end{figure}	
-
 The Melody Triangle can assist a composer not only in the creation of melodies but also, by placing multiple tokens in the triangle, in the generation of intricate musical textures.    
 Unlike other computer aided composition tools or programming environments, here the composer engages with music on the high and abstract level of expectation, randomness and predictability.