comparison draft.tex @ 34:25846c37a08a

New bits, rearranged figure placement.
author samer
date Wed, 14 Mar 2012 13:17:05 +0000
parents a9c8580cb8ca
children 194c7ec7e35d
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.

\section{Theoretical review}

\subsection{Entropy and information}
Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a
probability distribution representing the observer's beliefs about the
true value of $X$.
Given this representation, the observer's uncertainty about $X$ can be quantified
as the entropy of the random variable, $H(X)$. For a discrete variable
with probability mass function $p:\A \to [0,1]$, this is
\begin{equation}
  H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
\end{equation}
where $\expect{}$ is the expectation operator. The negative log-probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and
hence the entropy is the expected surprisingness.
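As a quick numerical illustration (ours, not part of the draft), the following
Python sketch computes the entropy of a small discrete distribution as the
expectation of the surprisingness $-\log_2 p(x)$; the distribution and the
helper name \texttt{entropy} are invented for the example.
\begin{verbatim}
# Minimal sketch: entropy as expected surprisingness, in bits.
import numpy as np

def entropy(p):
    """Entropy in bits of a probability mass function given as an array."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # convention: 0 log 0 = 0
    return float(np.sum(-p[nz] * np.log2(p[nz])))

p = np.array([0.5, 0.25, 0.125, 0.125])
surprisingness = -np.log2(p)        # l(x) = -log p(x) for each value
print(entropy(p))                   # 1.75 bits = sum(p * surprisingness)
\end{verbatim}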

Now suppose that the observer receives some new data $\Data$ that
causes a revision of its beliefs about $X$. The \emph{information}
in this new data \emph{about} $X$ can be quantified as the
Kullback-Leibler (KL) divergence between the prior and posterior
distributions $p(x)$ and $p(x|\Data)$ respectively:
\begin{equation}
  \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
    = \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
\end{equation}
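A small numerical sketch of this quantity (ours; the prior and posterior are
invented for illustration) follows.
\begin{verbatim}
# Sketch: information in data D about X as the KL divergence between
# the posterior p(x|D) and the prior p(x), in bits.
import numpy as np

def kl_divergence(posterior, prior):
    posterior = np.asarray(posterior, dtype=float)
    prior = np.asarray(prior, dtype=float)
    nz = posterior > 0
    return float(np.sum(posterior[nz] * np.log2(posterior[nz] / prior[nz])))

prior     = np.array([0.25, 0.25, 0.25, 0.25])   # beliefs about X before D
posterior = np.array([0.70, 0.10, 0.10, 0.10])   # beliefs about X after D
print(kl_divergence(posterior, prior))           # information gained, in bits
\end{verbatim}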
When there are multiple variables $X_1, X_2$,
\etc, which the observer believes to be dependent, the observation of
one may change its beliefs about, and hence yield information about, the
others. The joint and conditional entropies, as described in any
textbook on information theory (\eg \cite{CoverThomas}), quantify
the observer's expected uncertainty about groups of variables given the
values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information
in an observation of $X_2$ about $X_1$ and the expected reduction
in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
  I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$,
so the mutual information is symmetric in its arguments. A conditional
form of the mutual information can be formulated analogously:
\begin{equation}
  I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual
informations are conveniently visualised in Venn diagram-like \emph{information diagrams},
or I-diagrams \cite{Yeung1991}, such as the one in \figrf{venn-example}.
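The following sketch (ours, with an invented joint probability table) computes
$I(X_1;X_2)$ as $H(X_1) - H(X_1|X_2)$ and checks the symmetry numerically.
\begin{verbatim}
# Sketch: mutual information from a joint probability table, in bits.
import numpy as np

def H(p):
    p = p[p > 0]
    return float(np.sum(-p * np.log2(p)))

# joint distribution p(x1, x2): rows index x1, columns index x2
pxy = np.array([[0.30, 0.10],
                [0.10, 0.50]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H1, H2, H12 = H(px), H(py), H(pxy.ravel())
I_12 = H1 - (H12 - H2)      # H(X1) - H(X1|X2)
I_21 = H2 - (H12 - H1)      # H(X2) - H(X2|X1)
print(I_12, I_21)           # equal, illustrating the symmetry
\end{verbatim}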

\begin{fig}{venn-example}
\newcommand\rad{2.2em}%
\newcommand\circo{circle (3.4em)}%
\newcommand\labrad{4.3em}
The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
The central area $I_{123}$ is the co-information \cite{McGill1954}.
Some other information measures are indicated in the legend.
}
\end{fig}


\subsection{Entropy and information in sequences}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a stationary sequence of
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
% \IXZ_t
  I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
% \label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$ is the \emph{residual}
\cite{AbdallahPlumbley2010}, or \emph{erasure} \cite{VerduWeissman2006},
entropy rate; the PIR is thus the difference between the entropy rate
and the erasure entropy rate, $b_\mu = h_\mu - r_\mu$.
These relationships are illustrated in \Figrf{predinfo-bg}, along with
several of the information measures we have discussed so far.
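As a concrete check of this identity (our sketch, not taken from the draft),
consider a two-state first-order Markov chain, for which both $h_\mu$ and
$r_\mu = H(X_0|X_{-1},X_1)$ can be computed exactly from the transition matrix;
the matrix below is invented for the example, and first-order chains are
treated more generally in the subsection that follows.
\begin{verbatim}
# Sketch: b_mu = h_mu - r_mu for a two-state first-order Markov chain.
import numpy as np

def H(p):
    p = p[p > 0]
    return float(np.sum(-p * np.log2(p)))

A = np.array([[0.9, 0.1],              # A[i, j] = P(X_{t+1} = j | X_t = i)
              [0.4, 0.6]])
w, v = np.linalg.eig(A.T)              # stationary distribution of the chain
pi = np.real(v[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

h = float(-np.sum(pi[:, None] * A * np.log2(A)))     # entropy rate h_mu
                                                     # (all entries positive here)
joint = np.einsum('i,ij,jk->ijk', pi, A, A)          # p(x_{-1}, x_0, x_1)
r = H(joint.ravel()) - H(joint.sum(axis=1).ravel())  # H(X_0 | X_{-1}, X_1)
print(h - r)                                         # the PIR b_mu
\end{verbatim}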


\subsection{Other sequential information measures}

James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
rate and also examine some related measures. In particular, they identify
$\sigma_\mu$, the difference between the multi-information rate and the excess
entropy, as an interesting quantity that measures the predictive benefit of
model-building (that is, maintaining an internal state summarising past
observations in order to make better predictions). They also identify
$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
information} rate.

\subsection{First order Markov chains}
These are the simplest non-trivial models to which information dynamics methods
can be applied. In \cite{AbdallahPlumbley2009}, we showed that the predictive information
rate can be expressed simply in terms of the entropy rate of the Markov chain.
If we let $a$ denote the transition matrix of the Markov chain and $h_a$ its
entropy rate, then its predictive information rate $b_a$ is
\begin{equation}
  b_a = h_{a^2} - h_a,
\end{equation}
where $a^2 = aa$, the transition matrix squared, is the transition matrix
of the `skip one' Markov chain obtained by leaving out every other observation.
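The formula is straightforward to evaluate numerically; the sketch below (ours,
with the helper functions \texttt{stationary} and \texttt{entropy\_rate} and the
example matrix invented for illustration) computes $b_a = h_{a^2} - h_a$.
\begin{verbatim}
# Sketch: PIR of a first-order Markov chain via b_a = h_{a^2} - h_a.
import numpy as np

def stationary(a):
    """Stationary distribution of a row-stochastic transition matrix a."""
    w, v = np.linalg.eig(a.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()

def entropy_rate(a, pi):
    """h = -sum_i pi_i sum_j a_ij log2 a_ij, with 0 log 0 taken as 0."""
    safe_log = np.log2(np.where(a > 0, a, 1.0))
    return float(-np.sum(pi[:, None] * a * safe_log))

a = np.array([[0.9, 0.1],          # a[i, j] = P(next = j | current = i)
              [0.4, 0.6]])
pi = stationary(a)                 # pi is also stationary for a @ a
b_a = entropy_rate(a @ a, pi) - entropy_rate(a, pi)
print(b_a)                         # agrees with the direct h_mu - r_mu check above
\end{verbatim}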

\subsection{Higher order Markov chains}
Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the higher order Markov chain. If we are dealing
with an $N$th order model, this is done by forming a new alphabet of possible observations
consisting of all possible $N$-tuples of symbols from the base alphabet. An observation
in this new model represents a block of $N$ observations from the base model. The next
observation represents the block of $N$ obtained by shifting the previous block along
by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$, where $K$ is the size of the base alphabet. Its
predictive information rate is then
\begin{equation}
  b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}},
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the transition matrix.
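The block construction and the formula can be illustrated in a few lines; the
sketch below (ours) builds $\hat{a}$ for a randomly generated second-order model
with $N=2$ and $K=3$ (both values chosen only for illustration) and evaluates
the expression above.
\begin{verbatim}
# Sketch: first-order (block) representation of a second-order Markov chain
# and its PIR via b = h_{a_hat^(N+1)} - N * h_{a_hat}.
import numpy as np
from itertools import product

K, N = 3, 2
rng = np.random.default_rng(0)

# second-order model: p2[i, j, k] = P(X_t = k | X_{t-2} = i, X_{t-1} = j)
p2 = rng.dirichlet(np.ones(K), size=(K, K))

# block states are N-tuples (x_{t-1}, x_t); transitions must agree on the overlap
states = list(product(range(K), repeat=N))
a_hat = np.zeros((K**N, K**N))
for s, (i, j) in enumerate(states):
    for t, (j2, k) in enumerate(states):
        if j2 == j:                      # overlapping symbol must match
            a_hat[s, t] = p2[i, j, k]

def stationary(a):
    w, v = np.linalg.eig(a.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()

def entropy_rate(a, pi):
    safe_log = np.log2(np.where(a > 0, a, 1.0))
    return float(-np.sum(pi[:, None] * a * safe_log))

pi = stationary(a_hat)
b = entropy_rate(np.linalg.matrix_power(a_hat, N + 1), pi) - N * entropy_rate(a_hat, pi)
print(b)
\end{verbatim}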
\begin{fig}{wundt}
\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
{\ {\large$\longrightarrow$}\ }
perceived value. Repeated exposure sometimes results
in a move to the left along the curve \cite{Berlyne71}.
}
\end{fig}
\section{Information Dynamics in Analysis}

\subsection{Musicological Analysis}
In \cite{AbdallahPlumbley2009}, methods based on the theory described above
were used to analyse two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was estimated (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. [Bayesian surprise, other component of IPI].
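The draft does not spell out the model's details at this point; purely as a
loose illustration (not the scheme of \cite{AbdallahPlumbley2009}, which in
addition lets the matrix evolve over time), the following sketch maintains one
Dirichlet count vector per previous note, a simple kind of distribution over
transition matrices, and reports the surprisingness of each incoming note under
the current predictive distribution. The alphabet size and note sequence are
invented.
\begin{verbatim}
# Illustrative sketch only: Bayesian (Dirichlet) beliefs about a first-order
# transition matrix, updated note by note.
import numpy as np

K = 4                                   # size of the note alphabet (illustrative)
counts = np.ones((K, K))                # Dirichlet pseudo-counts, one row per context

def observe(prev, cur):
    """Update beliefs after the transition prev -> cur and return the
    surprisingness -log2 p(cur | prev) under the current estimate."""
    p = counts[prev] / counts[prev].sum()   # predictive distribution for this context
    counts[prev, cur] += 1                  # posterior update
    return -np.log2(p[cur])

notes = [0, 1, 0, 1, 2, 0, 1, 0, 1, 3]
surprise = [observe(a, b) for a, b in zip(notes, notes[1:])]
print(np.round(surprise, 2))
\end{verbatim}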

\begin{fig}{twopages}
\colfig[0.96]{matbase/fig9471} % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
\vspace*{1em}
Markov chains, by a random sampling method. In figure \ref{InfoDynEngine} we see
a representation of how these matrices are distributed in the three-dimensional statistical
space; each of these points corresponds to a transition
matrix.\emph{self-plagiarised}
When we look at the distribution of transition matrices plotted in this space,
we see that it forms an arch shape that is fairly thin. It thus becomes a
reasonable approximation to pretend that it is just a sheet in two dimensions;
and so we stretch out this curved arc into a flat triangle. It is this triangular
system, or as an interactive installation, it involves a mapping to this statistical
space. When the user, through the interface, selects a position within the
triangle, the corresponding transition matrix is returned. Figure \ref{TheTriangle}
shows how the triangle maps to different measures of redundancy, entropy rate
and predictive information rate.\emph{self-plagiarised}
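A rough sketch of how such a population of points can be generated (ours, not
the system's actual code) follows: random transition matrices are sampled and
each is given three coordinates. Here the redundancy coordinate is taken as the
entropy of the stationary distribution minus the entropy rate, and the PIR uses
$b_a = h_{a^2} - h_a$ from above; the sampling scheme and alphabet size are
illustrative assumptions.
\begin{verbatim}
# Sketch: sample transition matrices and compute (redundancy, entropy rate, PIR).
import numpy as np

def stationary(a):
    w, v = np.linalg.eig(a.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    return pi / pi.sum()

def entropy(p):
    p = p[p > 0]
    return float(np.sum(-p * np.log2(p)))

def entropy_rate(a, pi):
    safe_log = np.log2(np.where(a > 0, a, 1.0))
    return float(-np.sum(pi[:, None] * a * safe_log))

def coordinates(a):
    pi = stationary(a)
    h = entropy_rate(a, pi)
    redundancy = entropy(pi) - h
    pir = entropy_rate(a @ a, pi) - h
    return redundancy, h, pir

rng = np.random.default_rng(1)
K = 8
points = [coordinates(rng.dirichlet(0.1 * np.ones(K), size=K)) for _ in range(1000)]
# `points` can then be plotted in 3-D, or a user's selection in the triangle
# mapped back to the nearest matrix by distance in this space.
\end{verbatim}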
The corners correspond to three different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity', `noise'
and `repetition'. Melodies from the `noise' corner have no discernible pattern;
they have high entropy rate, low predictive information rate and low redundancy.
These melodies are essentially totally random. A melody along the `periodicity'
dom. Or, conversely, that are predictable, but not entirely so. This triangular
space allows for an intuitive exploration of expectation and surprise in temporal
sequences based on a simple model of how one might guess the next event given
the previous one.\emph{self-plagiarised}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{figs/mtriscat}
\caption{The population of transition matrices distributed along three axes of
redundancy, entropy rate and predictive information rate (all measured in bits).
The concentrations of points along the redundancy axis correspond
to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
represents its PIR---note that the highest values are found at intermediate entropy
and redundancy, and that the distribution as a whole makes a curved triangle. Although
not visible in this plot, it is largely hollow in the middle.
\label{InfoDynEngine}}
\end{figure}

Any number of interfaces could be developed for the Melody Triangle. We have
developed two: a standard screen-based interface where a user moves tokens with
a mouse in and around a triangle on screen, and a multi-user interactive

Additionally, the Melody Triangle serves as an effective tool for experimental investigations into musical preference and its relationship to the information dynamics models.

%As the Melody Triangle essentially operates on a stream of symbols, it is possible to apply the Melody Triangle to the design of non-sonic content.

\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}

\section{Musical Preference and Information Dynamics}
We carried out a preliminary study that sought to identify any correlation between
aesthetic preference and the information-theoretic measures of the Melody
Triangle. In this study, participants were asked to use the screen-based interface,
but it was simplified so that all they could do was move tokens around. To help