comparison draft.tex @ 41:9d03f05b6528
More sec 2, moved some figures.
author | samer |
date | Thu, 15 Mar 2012 12:04:07 +0000 |
parents | 3ec2037c4107 |
children | 1161caf0bdda |
40:3ec2037c4107 | 41:9d03f05b6528 |
---|---|
1 \documentclass[conference,a4paper]{IEEEtran} | 1 \documentclass[conference]{IEEEtran} |
2 \usepackage{cite} | 2 \usepackage{cite} |
3 \usepackage[cmex10]{amsmath} | 3 \usepackage[cmex10]{amsmath} |
4 \usepackage{graphicx} | 4 \usepackage{graphicx} |
5 \usepackage{amssymb} | 5 \usepackage{amssymb} |
6 \usepackage{epstopdf} | 6 \usepackage{epstopdf} |
51 \newcommand\parity[2]{P^{#1}_{2,#2}} | 51 \newcommand\parity[2]{P^{#1}_{2,#2}} |
52 | 52 |
53 %\usepackage[parfill]{parskip} | 53 %\usepackage[parfill]{parskip} |
54 | 54 |
55 \begin{document} | 55 \begin{document} |
56 \title{Cognitive Music Modelling: an Information Dynamics Approach} | 56 \title{Cognitive Music Modelling: an\\Information Dynamics Approach} |
57 | 57 |
58 \author{ | 58 \author{ |
59 \IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster} | 59 \IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster} |
60 \IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley} | 60 \IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley} |
61 \IEEEauthorblockA{Centre for Digital Music\\ | 61 \IEEEauthorblockA{Centre for Digital Music\\ |
62 Queen Mary University of London\\ | 62 Queen Mary University of London\\ |
63 Mile End Road, London E1 4NS\\ | 63 Mile End Road, London E1 4NS}} |
64 Email:}} | |
65 | 64 |
66 \maketitle | 65 \maketitle |
67 \begin{abstract} | 66 \begin{abstract} |
68 People take in information when perceiving music. With it they continually | 67 People take in information when perceiving music. With it they continually |
69 build predictive models of what is going to happen. There is a relationship | 68 build predictive models of what is going to happen. There is a relationship |
224 | 223 |
225 | 224 |
226 \section{Theoretical review} | 225 \section{Theoretical review} |
227 | 226 |
228 \subsection{Entropy and information} | 227 \subsection{Entropy and information} |
228 \label{s:entro-info} | |
229 | |
229 Let $X$ denote some variable whose value is initially unknown to our | 230 Let $X$ denote some variable whose value is initially unknown to our |
230 hypothetical observer. We will treat $X$ mathematically as a random variable, | 231 hypothetical observer. We will treat $X$ mathematically as a random variable, |
231 with a value to be drawn from some set $\X$ and a | 232 with a value to be drawn from some set $\X$ and a |
232 probability distribution representing the observer's beliefs about the | 233 probability distribution representing the observer's beliefs about the |
233 true value of $X$. | 234 true value of $X$. |
234 In this case, the observer's uncertainty about $X$ can be quantified | 235 In this case, the observer's uncertainty about $X$ can be quantified |
235 as the entropy of the random variable $H(X)$. For a discrete variable | 236 as the entropy of the random variable $H(X)$. For a discrete variable |
236 with probability mass function $p:\X \to [0,1]$, this is | 237 with probability mass function $p:\X \to [0,1]$, this is |
237 \begin{equation} | 238 \begin{equation} |
238 H(X) = \sum_{x\in\X} -p(x) \log p(x) = \expect{-\log p(X)}, | 239 H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)}, |
239 \end{equation} | 240 \end{equation} |
240 where $\expect{}$ is the expectation operator. The negative-log-probability | 241 % where $\expect{}$ is the expectation operator. |
242 The negative-log-probability | |
241 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as | 243 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as |
242 the \emph{surprisingness} of the value $x$ should it be observed, and | 244 the \emph{surprisingness} of the value $x$ should it be observed, and |
243 hence the entropy is the expected surprisingness. | 245 hence the entropy is the expectation of the surprisingness $\expect \ell(X)$. |
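As a minimal numerical sketch of this definition (the function name, the choice of base-2 logarithms and the example distributions are illustrative assumptions, not taken from the text):

import numpy as np

def entropy(p, base=2.0):
    # H(X) = sum_x -p(x) log p(x); zero-probability outcomes contribute nothing
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(base))

print(entropy([0.5, 0.5]))   # a fair coin: 1 bit of uncertainty
print(entropy([0.9, 0.1]))   # a biased coin: about 0.47 bits

The second value is lower because, on average, the observer would be less surprised by the outcome of the biased coin.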
244 | 246 |
245 Now suppose that the observer receives some new data $\Data$ that | 247 Now suppose that the observer receives some new data $\Data$ that |
246 causes a revision of its beliefs about $X$. The \emph{information} | 248 causes a revision of its beliefs about $X$. The \emph{information} |
247 in this new data \emph{about} $X$ can be quantified as the | 249 in this new data \emph{about} $X$ can be quantified as the |
248 Kullback-Leibler (KL) divergence between the prior and posterior | 250 Kullback-Leibler (KL) divergence between the prior and posterior |
249 distributions $p(x)$ and $p(x|\Data)$ respectively: | 251 distributions $p(x)$ and $p(x|\Data)$ respectively: |
250 \begin{equation} | 252 \begin{equation} |
251 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X}) | 253 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X}) |
252 = \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}. | 254 = \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}. |
255 \label{eq:info} | |
253 \end{equation} | 256 \end{equation} |
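A small sketch of this computation follows (the variable names and the example prior and posterior are assumptions chosen purely for illustration):

import numpy as np

def information(posterior, prior, base=2.0):
    # D(p_{X|D} || p_X): information in the data about X, in bits for base 2
    post = np.asarray(posterior, dtype=float)
    pri = np.asarray(prior, dtype=float)
    mask = post > 0                      # terms with zero posterior mass vanish
    return float((post[mask] * np.log(post[mask] / pri[mask])).sum() / np.log(base))

prior = [0.25, 0.25, 0.25, 0.25]         # beliefs about X before the data arrive
posterior = [0.70, 0.10, 0.10, 0.10]     # revised beliefs after observing the data
print(information(posterior, prior))     # about 0.64 bits gained about X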
254 When there are multiple variables $X_1, X_2$ | 257 When there are multiple variables $X_1, X_2$ |
255 \etc which the observer believes to be dependent, then the observation of | 258 \etc which the observer believes to be dependent, then the observation of |
256 one may change its beliefs and hence yield information about the | 259 one may change its beliefs and hence yield information about the |
257 others. The joint and conditional entropies as described in any | 260 others. The joint and conditional entropies as described in any |
365 as a label for the process. We can identify a number of information-theoretic | 368 as a label for the process. We can identify a number of information-theoretic |
366 measures meaningful in the context of a sequential observation of the sequence, during | 369 measures meaningful in the context of a sequential observation of the sequence, during |
367 which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past' | 370 which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past' |
368 $\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' | 371 $\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' |
369 $\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$. | 372 $\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$. |
370 The actually observed value of $X_t$ will be written as $x_t$, and | 373 We will write the actually observed value of $X_t$ as $x_t$, and |
371 the sequence of observations up to but not including $x_t$ as | 374 the sequence of observations up to but not including $x_t$ as |
372 $\past{x}_t$. | 375 $\past{x}_t$. |
373 % Since the sequence is assumed stationary, we can without loss of generality, | 376 % Since the sequence is assumed stationary, we can without loss of generality, |
374 % assume that $t=0$ in the following definitions. | 377 % assume that $t=0$ in the following definitions. |
375 | 378 |
376 The in-context surprisingness of the observation $X_t=x_t$ is a function | 379 The in-context surprisingness of the observation $X_t=x_t$ depends on |
377 of both $x_t$ and the context $\past{x}_t$: | 380 both $x_t$ and the context $\past{x}_t$: |
378 \begin{equation} | 381 \begin{equation} |
379 \ell(x_t|\past{x}_t) = - \log p(x_t|\past{x}_t). | 382 \ell_t = - \log p(x_t|\past{x}_t). |
380 \end{equation} | 383 \end{equation} |
381 However, before $X_t$ is observed to be $x_t$, the observer can compute | 384 However, before $X_t$ is observed to be $x_t$, the observer can compute |
382 its \emph{expected} surprisingness as a measure of its uncertainty about | 385 its \emph{expected} surprisingness as a measure of its uncertainty about |
383 the very next event; this may be written as an entropy | 386 the very next event; this may be written as an entropy |
384 $H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is | 387 $H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is |
385 conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not | 388 conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not |
386 \emph{variables} $\past{X}_t$ as in the conventional conditional entropy. | 389 \emph{variables} $\past{X}_t$ as in the conventional conditional entropy. |
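To make these two quantities concrete, the sketch below assumes (purely for illustration; the text does not fix a model at this point) that the observer's predictive model is a first-order Markov chain, so that the relevant part of the observed past is just the previous symbol:

import numpy as np

a = np.array([[0.7, 0.2],                # a[i, j] = Pr(X_t = i | X_{t-1} = j)
              [0.3, 0.8]])

def surprisingness(x_t, x_prev):
    # ell_t = -log p(x_t | observed past), in bits
    return float(-np.log2(a[x_t, x_prev]))

def expected_surprisingness(x_prev):
    # H(X_t | past = observed past): uncertainty about the very next symbol
    p = a[:, x_prev]
    return float(-(p * np.log2(p)).sum())

print(surprisingness(0, x_prev=1))       # a rare continuation is surprising (about 2.3 bits)
print(expected_surprisingness(1))        # expected surprise before X_t is seen (about 0.72 bits)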
387 | 390 |
388 The surprisingness $\ell(x_t|\past{x}_t)$ and expected surprisingness | 391 The surprisingness $\ell_t$ and expected surprisingness |
389 $H(X_t|\ev(\past{X}_t=\past{x}_t))$ | 392 $H(X_t|\ev(\past{X}_t=\past{x}_t))$ |
390 are subjective information dynamic measures since they are based on its | 393 can be understood as \emph{subjective} information dynamic measures, since they are |
391 subjective probability model in the context of the actually observed sequence | 394 based on the observer's probability model in the context of the actually observed sequence |
392 $\past{x}_t$---they characterise what it is like to `be in the observer's shoes'. | 395 $\past{x}_t$---they characterise what it is like to `be in the observer's shoes'. |
393 If we view the observer as a purely passive or reactive agent, this would | 396 If we view the observer as a purely passive or reactive agent, this would |
394 probably be sufficient, but for active agents such as humans or animals, it is | 397 probably be sufficient, but for active agents such as humans or animals, it is |
395 often necessary to \emph{anticipate} future events in order, for example, to plan the | 398 often necessary to \emph{anticipate} future events in order, for example, to plan the |
396 most effective course of action. It makes sense for such observers to be | 399 most effective course of action. It makes sense for such observers to be |
397 concerned about the predictive probability distribution over future events, | 400 concerned about the predictive probability distribution over future events, |
398 $p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context, | 401 $p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context, |
399 the \emph{instantaneous predictive information} (IPI) is the information in the | 402 the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$ |
400 event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$. | 403 is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$, |
404 \emph{given} the observed past $\past{X}_t=\past{x}_t$. | |
405 Referring to the definition of information \eqrf{info}, this is the KL divergence | |
406 between prior and posterior distributions over possible futures, which written out in full, is | |
407 \begin{equation} | |
408 \mathcal{I}_t = \sum_{\fut{x}_t \in \X^*} | |
409 p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) }, | |
410 \end{equation} | |
411 where the sum is to be taken over the set of infinite sequences $\X^*$. | |
412 As with the surprisingness, the observer can compute its \emph{expected} IPI | |
413 at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$ | |
414 conditioned on the observed past. This could be used, for example, as an estimate | |
415 of attentional resources which should be directed at this stream of data, which may | |
416 be in competition with other sensory streams. | |
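As an illustrative special case (an assumption here, although it anticipates the Markov-chain treatment later in the paper), if the observer's model is a first-order Markov chain then the Markov property collapses the infinite future to the single next symbol, and the expected IPI given an observed past $x_{t-1}=j$ reduces to $I(X_t;X_{t+1}|\ev(X_{t-1}=j))$, which can be evaluated directly:

import numpy as np

a = np.array([[0.7, 0.2],                 # a[i, j] = Pr(X_t = i | X_{t-1} = j)
              [0.3, 0.8]])

def expected_ipi(j):
    # I(X_t; X_{t+1} | X_{t-1} = j) in bits, for a first-order Markov model
    a2 = a @ a                            # two-step predictive distribution p(x_{t+1} | x_{t-1})
    total = 0.0
    for i in range(a.shape[0]):           # average over possible observations X_t = i
        kl = np.sum(a[:, i] * np.log2(a[:, i] / a2[:, j]))
        total += a[i, j] * kl
    return float(total)

print(expected_ipi(0))                    # one candidate measure of how much attention
print(expected_ipi(1))                    # this stream deserves in its current context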
401 | 417 |
402 \subsection{Information measures for stationary random processes} | 418 \subsection{Information measures for stationary random processes} |
403 | |
404 The \emph{entropy rate} of the process is the entropy of the next variable | |
405 $X_t$ given all the previous ones. | |
406 \begin{equation} | |
407 \label{eq:entro-rate} | |
408 h_\mu = H(X_0|\past{X}_0). | |
409 \end{equation} | |
410 The entropy rate gives a measure of the overall randomness | |
411 or unpredictability of the process. | |
412 | |
413 The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006} | |
414 notation for what he called the `information rate') is the mutual | |
415 information between the `past' and the `present': | |
416 \begin{equation} | |
417 \label{eq:multi-info} | |
418 \rho_\mu(t) = I(\past{X}_t;X_t) = H(X_0) - h_\mu. | |
419 \end{equation} | |
420 It is a measure of how much the context of an observation (that is, | |
421 the observation of previous elements of the sequence) helps in predicting | |
422 or reducing the surprisingness of the current observation. |
423 | |
424 The \emph{excess entropy} \cite{CrutchfieldPackard1983} | |
425 is the mutual information between | |
426 the entire `past' and the entire `future': | |
427 \begin{equation} | |
428 E = I(\past{X}_0; X_0,\fut{X}_0). | |
429 \end{equation} | |
430 | |
431 | 419 |
432 | 420 |
433 \begin{fig}{predinfo-bg} | 421 \begin{fig}{predinfo-bg} |
434 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}} | 422 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}} |
435 \newcommand\rad{1.8em}% | 423 \newcommand\rad{1.8em}% |
508 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive | 496 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive |
509 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. | 497 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. |
510 } | 498 } |
511 \end{fig} | 499 \end{fig} |
512 | 500 |
501 If we step back, out of the observer's shoes as it were, and consider the | |
502 random process $(\ldots,X_{-1},X_0,X_1,\dots)$ as a statistical ensemble of | |
503 possible realisations, and furthermore assume that it is stationary, | |
504 then it becomes possible to define a number of information-theoretic measures, | |
505 closely related to those described above, but which characterise the | |
506 process as a whole, rather than on a moment-by-moment basis. Some of these, | |
507 such as the entropy rate, are well-known, but others are only recently being | |
508 investigated. (In the following, the assumption of stationarity means that | |
509 the measures defined below are independent of $t$.) | |
510 | |
511 The \emph{entropy rate} of the process is the entropy of the next variable | |
512 $X_t$ given all the previous ones. | |
513 \begin{equation} | |
514 \label{eq:entro-rate} | |
515 h_\mu = H(X_t|\past{X}_t). | |
516 \end{equation} | |
517 The entropy rate gives a measure of the overall randomness | |
518 or unpredictability of the process. | |
519 | |
520 The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006} | |
521 notation for what he called the `information rate') is the mutual | |
522 information between the `past' and the `present': | |
523 \begin{equation} | |
524 \label{eq:multi-info} | |
525 \rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu. | |
526 \end{equation} | |
527 It is a measure of how much the context of an observation (that is, | |
528 the observation of previous elements of the sequence) helps in predicting | |
529 or reducing the surprisingness of the current observation. |
530 | |
531 The \emph{excess entropy} \cite{CrutchfieldPackard1983} | |
532 is the mutual information between | |
533 the entire `past' and the entire `future': | |
534 \begin{equation} | |
535 E = I(\past{X}_t; X_t,\fut{X}_t). | |
536 \end{equation} | |
537 | |
538 | |
513 The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009} | 539 The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009} |
514 is the average information in one observation about the infinite future given the infinite past, | 540 is the average information in one observation about the infinite future given the infinite past, |
515 and is defined as a conditional mutual information: | 541 and is defined as a conditional mutual information: |
516 \begin{equation} | 542 \begin{equation} |
517 \label{eq:PIR} | 543 \label{eq:PIR} |
518 b_\mu = I(X_0;\fut{X}_0|\past{X}_0) = H(\fut{X}_0|\past{X}_0) - H(\fut{X}_0|X_0,\past{X}_0). | 544 b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t). |
519 \end{equation} | 545 \end{equation} |
520 Equation \eqrf{PIR} can be read as the average reduction | 546 Equation \eqrf{PIR} can be read as the average reduction |
521 in uncertainty about the future on learning $X_t$, given the past. | 547 in uncertainty about the future on learning $X_t$, given the past. |
522 Due to the symmetry of the mutual information, it can also be written | 548 Due to the symmetry of the mutual information, it can also be written |
523 as | 549 as |
524 \begin{equation} | 550 \begin{equation} |
525 % \IXZ_t | 551 % \IXZ_t |
526 I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu, | 552 I(X_t;\fut{X}_t|\past{X}_t) = h_\mu - r_\mu, |
527 % \label{<++>} | 553 % \label{<++>} |
528 \end{equation} | 554 \end{equation} |
529 % If $X$ is stationary, then | 555 % If $X$ is stationary, then |
530 where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$ | 556 where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$ |
531 is the \emph{residual} \cite{AbdallahPlumbley2010}, | 557 is the \emph{residual} \cite{AbdallahPlumbley2010}, |
532 or \emph{erasure} \cite{VerduWeissman2006} entropy rate. | 558 or \emph{erasure} \cite{VerduWeissman2006} entropy rate. |
533 These relationships are illustrated in \Figrf{predinfo-bg}, along with | 559 These relationships are illustrated in \Figrf{predinfo-bg}, along with |
534 several of the information measures we have discussed so far. | 560 several of the information measures we have discussed so far. |
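The sketch below evaluates these process-level quantities numerically for a first-order Markov chain (an illustrative assumption; the general Markov treatment follows in the next subsection). It uses the relations $\rho_\mu = H(X_t) - h_\mu$ and $h_\mu = r_\mu + b_\mu$ quoted above, together with the first-order ($N=1$) case of the entropy-rate expression for $b_\mu$ quoted in the next subsection.

import numpy as np

def stationary(a):
    # stationary distribution pi with a @ pi = pi (columns of a sum to 1)
    vals, vecs = np.linalg.eig(a)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    # h(a): expected entropy of the next symbol under the stationary distribution
    pi = stationary(a)
    cols = -np.sum(a * np.log2(np.where(a > 0, a, 1.0)), axis=0)
    return float(cols @ pi)

a = np.array([[0.7, 0.2],                          # a[i, j] = Pr(X_t = i | X_{t-1} = j)
              [0.3, 0.8]])

pi = stationary(a)
h_mu = entropy_rate(a)                             # overall unpredictability
rho_mu = float(-(pi * np.log2(pi)).sum()) - h_mu   # how much the past helps prediction
b_mu = entropy_rate(a @ a) - h_mu                  # PIR via the N = 1 entropy-rate relation
r_mu = h_mu - b_mu                                 # residual (erasure) entropy rate
print(h_mu, rho_mu, b_mu, r_mu)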
535 | 561 |
557 | 583 |
558 | 584 |
559 \subsection{First and higher order Markov chains} | 585 \subsection{First and higher order Markov chains} |
560 First order Markov chains are the simplest non-trivial models to which information | 586 First order Markov chains are the simplest non-trivial models to which information |
561 dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived | 587 dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived |
562 expressions for all the information measures introduced [above] for | 588 expressions for all the information measures described in \secrf{surprise-info-seq} for |
563 irreducible stationary Markov chains (\ie that have a unique stationary | 589 irreducible stationary Markov chains (\ie that have a unique stationary |
564 distribution). The derivation is greatly simplified by the dependency structure | 590 distribution). The derivation is greatly simplified by the dependency structure |
565 of the Markov chain: for the purpose of the analysis, the `past' and `future' | 591 of the Markov chain: for the purpose of the analysis, the `past' and `future' |
566 segments $\past{X}_t$ and $\fut{X})_t$ can be reduced to just the previous | 592 segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous |
567 and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that | 593 and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that |
568 the predictive information rate can be expressed simply in terms of entropy rates: | 594 the predictive information rate can be expressed simply in terms of entropy rates: |
569 if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over | 595 if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over |
570 an alphabet of $\{1,\ldots,K\}$, such that | 596 an alphabet of $\{1,\ldots,K\}$, such that |
571 $a_{ij} = \Pr(\ev(X_t=i|X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be | 597 $a_{ij} = \Pr(\ev(X_t=i|X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be |
579 along the original chain. | 605 along the original chain. |
580 | 606 |
581 Second and higher order Markov chains can be treated in a similar way by transforming | 607 Second and higher order Markov chains can be treated in a similar way by transforming |
582 to a first order representation of the high order Markov chain. If we are dealing | 608 to a first order representation of the high order Markov chain. If we are dealing |
583 with an $N$th order model, this is done by forming a new alphabet of size $K^N$ | 609 with an $N$th order model, this is done by forming a new alphabet of size $K^N$ |
584 consisting of all possible $N$-tuples of symbols from the base alphabet of size $K$. | 610 consisting of all possible $N$-tuples of symbols from the base alphabet. |
585 An observation $\hat{x}_t$ in this new model represents a block of $N$ observations | 611 An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations |
586 $(x_{t+1},\ldots,x_{t+N})$ from the base model. The next | 612 $(x_{t+1},\ldots,x_{t+N})$ from the base model. The next |
587 observation $\hat{x}_{t+1}$ represents the block of $N$ obtained by shifting the previous | 613 observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous |
588 block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$ | 614 block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$ |
589 transition matrix $\hat{a}$. The entropy rate of the first order system is the same | 615 transition matrix $\hat{a}$. Adopting the label $\mu$ for the order $N$ system, |
590 as the entropy rate of the original order $N$ system, and its PIR is | 616 we obtain: |
591 \begin{equation} | 617 \begin{equation} |
592 b({\hat{a}}) = h({\hat{a}^{N+1}}) - N h({\hat{a}}), | 618 h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}), |
593 \end{equation} | 619 \end{equation} |
594 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix. | 620 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix. |
621 Other information measures can also be computed for the high-order Markov chain, including | |
622 the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical | |
623 for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger | |
624 than $\rho_\mu$. | |
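A sketch of this construction and of the quoted relation follows (the order-2 binary example, the Dirichlet-sampled conditional table and the helper names are assumptions made only for illustration; $h(\cdot)$ is taken here to be the entropy rate of a stationary chain with the given transition matrix):

import itertools
import numpy as np

K, N = 2, 2                                   # base alphabet size and model order
rng = np.random.default_rng(0)

# assumed order-N model: p_cond[ctx][y] = Pr(next symbol = y | previous N symbols = ctx)
contexts = list(itertools.product(range(K), repeat=N))
p_cond = {ctx: rng.dirichlet(np.ones(K)) for ctx in contexts}

# first-order representation over N-tuples: a block can only move to its one-step shift
index = {s: i for i, s in enumerate(contexts)}
a_hat = np.zeros((K**N, K**N))
for s in contexts:
    for y in range(K):
        a_hat[index[s[1:] + (y,)], index[s]] = p_cond[s][y]

def stationary(a):
    vals, vecs = np.linalg.eig(a)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    pi = stationary(a)
    cols = -np.sum(a * np.log2(np.where(a > 0, a, 1.0)), axis=0)
    return float(cols @ pi)

h_mu = entropy_rate(a_hat)                                             # entropy rate of the order-N chain
b_mu = entropy_rate(np.linalg.matrix_power(a_hat, N + 1)) - N * h_mu   # PIR from the quoted relation
print(h_mu, b_mu)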
595 | 625 |
596 | 626 |
597 \section{Information Dynamics in Analysis} | 627 \section{Information Dynamics in Analysis} |
598 | 628 |
599 \begin{fig}{twopages} | 629 \begin{fig}{twopages} |
699 | 729 |
700 The use of stochastic processes for the composition of musical material has been widespread for decades -- for instance, Iannis Xenakis applied probabilistic mathematical models to the creation of musical materials~\cite{Xenakis:1992ul}. | 730 The use of stochastic processes for the composition of musical material has been widespread for decades -- for instance, Iannis Xenakis applied probabilistic mathematical models to the creation of musical materials~\cite{Xenakis:1992ul}. |
701 Information dynamics can serve as a novel framework for the exploration of the possibilities of such processes at the high and abstract level of expectation, randomness and predictability. | 731 Information dynamics can serve as a novel framework for the exploration of the possibilities of such processes at the high and abstract level of expectation, randomness and predictability. |
702 | 732 |
703 \subsection{The Melody Triangle} | 733 \subsection{The Melody Triangle} |
704 The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input -- positions within a triangle -- directly map to information theoretic measures of the output. | |
705 The measures -- entropy rate, redundancy and predictive information rate -- form criteria with which to filter the output of the stochastic processes used to generate sequences of notes. |
706 These measures address notions of expectation and surprise in music, and as such the Melody Triangle is a means of interfacing with a generative process in terms of the predictability of its output. | |
707 | |
708 The triangle is `populated' with possible parameter values for melody generators. | |
709 These are plotted in a 3d statistical space of redundancy, entropy rate and predictive information rate. | |
710 In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method. |
711 In figure \ref{InfoDynEngine} we see a representation of how these matrices are distributed in the 3d statistical space; each one of these points corresponds to a transition matrix. |
712 | |
713 | |
714 | |
715 | |
716 The distribution of transition matrices plotted in this space forms an arch shape that is fairly thin. |
717 It thus becomes a reasonable approximation to pretend that it is just a sheet in two dimensions, and so we stretch out this curved arc into a flat triangle. |
718 It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled. | |
719 Using this interface thus involves a mapping to statistical space; a user selects a position within the triangle, and a corresponding transition matrix is returned. | |
720 Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate. | |
721 | |
722 | |
723 | |
724 | |
725 The three corners correspond to three different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'. |
726 Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. | |
727 These melodies are essentially totally random. | |
728 Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until they become just one repeating note. |
729 It is the areas in between the extremes that provide the more `interesting' melodies. | |
730 These melodies have some level of unpredictability, but are not completely random. | |
731 Or, conversely, they are predictable, but not entirely so. |
732 | |
733 The Melody Triangle exists in two incarnations: a standard screen-based interface where a user moves tokens in and around a triangle on screen, and a multi-user interactive installation where a Kinect camera tracks individuals in a space and maps their positions in physical space to the triangle. |
734 In the latter, each visitor entering the installation generates a melody and can collaborate with co-visitors to generate musical textures -- a playful yet informative way to explore expectation and surprise in music. |
735 Additionally, different gestures can be detected to change the tempo, register, instrumentation and periodicity of the output melody. |
736 | 734 |
737 \begin{figure} | 735 \begin{figure} |
738 \centering | 736 \centering |
739 \includegraphics[width=\linewidth]{figs/mtriscat} | 737 \includegraphics[width=\linewidth]{figs/mtriscat} |
740 \caption{The population of transition matrices distributed along three axes of | 738 \caption{The population of transition matrices distributed along three axes of |
746 and redundancy, and that the distribution as a whole makes a curved triangle. Although | 744 and redundancy, and that the distribution as a whole makes a curved triangle. Although |
747 not visible in this plot, it is largely hollow in the middle. | 745 not visible in this plot, it is largely hollow in the middle. |
748 \label{InfoDynEngine}} | 746 \label{InfoDynEngine}} |
749 \end{figure} | 747 \end{figure} |
750 | 748 |
749 The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input -- positions within a triangle -- directly map to information theoretic measures of the output. | |
750 The measures -- entropy rate, redundancy and predictive information rate -- form criteria with which to filter the output of the stochastic processes used to generate sequences of notes. |
751 These measures address notions of expectation and surprise in music, and as such the Melody Triangle is a means of interfacing with a generative process in terms of the predictability of its output. | |
752 | |
753 The triangle is `populated' with possible parameter values for melody generators. | |
754 These are plotted in a 3d statistical space of redundancy, entropy rate and predictive information rate. | |
755 In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method. |
756 In figure \ref{InfoDynEngine} we see a representation of how these matrices are distributed in the 3d statistical space; each one of these points corresponds to a transition matrix. |
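The population step can be sketched as follows (the Dirichlet sampling scheme, the alphabet size and the reading of `redundancy' as the multi-information rate $\rho_\mu$ defined earlier are assumptions, since the text does not spell these details out):

import numpy as np

rng = np.random.default_rng(1)
K = 8                                        # assumed size of the note alphabet

def stationary(a):
    vals, vecs = np.linalg.eig(a)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    pi = stationary(a)
    cols = -np.sum(a * np.log2(np.where(a > 0, a, 1.0)), axis=0)
    return float(cols @ pi)

def coordinates(a):
    # (redundancy, entropy rate, PIR) used to place one transition matrix in the space
    pi = stationary(a)
    h = entropy_rate(a)
    rho = float(-(pi * np.log2(np.where(pi > 0, pi, 1.0))).sum()) - h
    b = entropy_rate(a @ a) - h
    return rho, h, b

# draw random transition matrices, one Dirichlet-distributed column per current symbol
population = [coordinates(rng.dirichlet(np.full(K, 0.5), size=K).T) for _ in range(1000)]

Each tuple in population gives the coordinates of one candidate melody generator; a scatter plot of such tuples corresponds to the kind of distribution shown in figure \ref{InfoDynEngine}.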
757 | |
758 | |
759 | |
760 | |
761 The distribution of transition matrices plotted in this space forms an arch shape that is fairly thin. |
762 It thus becomes a reasonable approximation to pretend that it is just a sheet in two dimensions, and so we stretch out this curved arc into a flat triangle. |
763 It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled. | |
764 Using this interface thus involves a mapping to statistical space; a user selects a position within the triangle, and a corresponding transition matrix is returned. | |
765 Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate. | |
766 | |
767 | |
768 | |
769 | |
770 The three corners correspond to three different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'. |
771 Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. | |
772 These melodies are essentially totally random. | |
773 Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until they become just one repeating note. |
774 It is the areas in between the extremes that provide the more `interesting' melodies. | |
775 These melodies have some level of unpredictability, but are not completely random. | |
776 Or, conversely, they are predictable, but not entirely so. |
777 | |
778 \begin{figure} | |
779 \centering | |
780 \includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf} | |
781 \caption{The Melody Triangle\label{TheTriangle}} | |
782 \end{figure} | |
783 | |
784 | |
785 The Melody Triangle exists in two incarnations: a standard screen-based interface where a user moves tokens in and around a triangle on screen, and a multi-user interactive installation where a Kinect camera tracks individuals in a space and maps their positions in physical space to the triangle. |
786 In the latter, each visitor entering the installation generates a melody and can collaborate with co-visitors to generate musical textures -- a playful yet informative way to explore expectation and surprise in music. |
787 Additionally, different gestures can be detected to change the tempo, register, instrumentation and periodicity of the output melody. |
788 | |
751 As a screen-based interface, the Melody Triangle can serve as a composition tool. | 789 As a screen-based interface, the Melody Triangle can serve as a composition tool. |
752 A triangle is drawn on the screen, and screen space is thus mapped to the statistical space of the Melody Triangle. | 790 A triangle is drawn on the screen, and screen space is thus mapped to the statistical space of the Melody Triangle. |
753 A number of round tokens, each representing a melody, can be dragged in and around the triangle. | 791 A number of round tokens, each representing a melody, can be dragged in and around the triangle. |
754 When a token is dragged into the triangle, the system starts generating a sequence of symbols with statistical properties that correspond to the position of the token. | 792 When a token is dragged into the triangle, the system starts generating a sequence of symbols with statistical properties that correspond to the position of the token. |
755 These symbols are then mapped to notes of a scale. | 793 These symbols are then mapped to notes of a scale. |
756 Keyboard input allows control over additional parameters. |
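A minimal sketch of this generation and mapping step (the particular transition matrix, the scale and the MIDI note numbers are invented here for illustration; the system's actual mapping may differ):

import numpy as np

rng = np.random.default_rng(2)

a = np.array([[0.1, 0.8, 0.1],            # transition matrix selected via the token's
              [0.6, 0.1, 0.8],            # position; a[i, j] = Pr(next = i | current = j)
              [0.3, 0.1, 0.1]])

scale = [60, 62, 64]                      # symbols mapped onto scale notes (MIDI numbers)

def generate(a, length=16, start=0):
    # sample a symbol sequence from the first-order Markov chain
    seq, x = [], start
    for _ in range(length):
        x = rng.choice(len(a), p=a[:, x])
        seq.append(x)
    return seq

notes = [scale[s] for s in generate(a)]
print(notes)                              # a candidate melody to be played back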
757 | |
758 \begin{figure} | |
759 \centering | |
760 \includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf} | |
761 \caption{The Melody Triangle\label{TheTriangle}} | |
762 \end{figure} | |
763 | 795 |
764 The Melody Triangle can assist a composer in the creation not only of melodies but also, by placing multiple tokens in the triangle, of intricate musical textures. | 796 The Melody Triangle can assist a composer in the creation not only of melodies but also, by placing multiple tokens in the triangle, of intricate musical textures. |
765 Unlike other computer-aided composition tools or programming environments, here the composer engages with music on the high and abstract level of expectation, randomness and predictability. | 797 Unlike other computer-aided composition tools or programming environments, here the composer engages with music on the high and abstract level of expectation, randomness and predictability. |
766 | 798 |
767 | 799 |