changeset 34:25846c37a08a
New bits, rearranged figure placement.
author | samer
date | Wed, 14 Mar 2012 13:17:05 +0000
parents | a9c8580cb8ca
children | 194c7ec7e35d
files | draft.pdf draft.tex
diffstat | 2 files changed, 95 insertions(+), 78 deletions(-)
--- a/draft.tex	Wed Mar 14 12:05:04 2012 +0000
+++ b/draft.tex	Wed Mar 14 13:17:05 2012 +0000
@@ -225,6 +225,56 @@
 \section{Theoretical review}
+	\subsection{Entropy and information}
+	Let $X$ denote some variable whose value is initially unknown to our
+	hypothetical observer. We will treat $X$ mathematically as a random variable,
+	with a value to be drawn from some set $\A$ and a
+	probability distribution representing the observer's beliefs about the
+	true value of $X$.
+	In this case, the observer's uncertainty about $X$ can be quantified
+	as the entropy of the random variable $H(X)$. For a discrete variable
+	with probability mass function $p:\A \to [0,1]$, this is
+	\begin{equation}
+		H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
+	\end{equation}
+	where $\expect{}$ is the expectation operator. The negative-log-probability
+	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
+	the \emph{surprisingness} of the value $x$ should it be observed, and
+	hence the entropy is the expected surprisingness.
+
+	Now suppose that the observer receives some new data $\Data$ that
+	causes a revision of its beliefs about $X$. The \emph{information}
+	in this new data \emph{about} $X$ can be quantified as the
+	Kullback-Leibler (KL) divergence between the prior and posterior
+	distributions $p(x)$ and $p(x|\Data)$ respectively:
+	\begin{equation}
+		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
+			= \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
+	\end{equation}
+	When there are multiple variables $X_1, X_2$
+	\etc which the observer believes to be dependent, the observation of
+	one may change its beliefs and hence yield information about the
+	others. The joint and conditional entropies as described in any
+	textbook on information theory (\eg \cite{CoverThomas}) then quantify
+	the observer's expected uncertainty about groups of variables given the
+	values of others. In particular, the \emph{mutual information}
+	$I(X_1;X_2)$ is both the expected information
+	in an observation of $X_2$ about $X_1$ and the expected reduction
+	in uncertainty about $X_1$ after observing $X_2$:
+	\begin{equation}
+		I(X_1;X_2) = H(X_1) - H(X_1|X_2),
+	\end{equation}
+	where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
+	of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$,
+	and so the mutual information is symmetric in its arguments. A conditional
+	form of the mutual information can be formulated analogously:
+	\begin{equation}
+		I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
+	\end{equation}
+	These relationships between the various entropies and mutual
+	informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
+	or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
+
 \begin{fig}{venn-example}
 	\newcommand\rad{2.2em}%
 	\newcommand\circo{circle (3.4em)}%
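The quantities defined in the hunk above translate directly into a few lines of code. The following Python/NumPy sketch is an editorial illustration only (it is not part of the patch; the function names and the example distributions are invented for the purpose): it evaluates the entropy of a discrete distribution, the information $\mathcal{I}_{\Data\to X}$ as the KL divergence from prior to posterior, and the mutual information of a two-variable joint distribution via the identity $I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2)$.

```python
import numpy as np

def entropy(p):
    """H(X) = E[-log2 p(X)] in bits for a discrete distribution p (zero terms dropped)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def kl_divergence(posterior, prior):
    """D(posterior || prior): the information gained about X from the new data."""
    q, p = np.asarray(posterior, dtype=float), np.asarray(prior, dtype=float)
    nz = q > 0
    return float(np.sum(q[nz] * np.log2(q[nz] / p[nz])))

def mutual_information(joint):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(p1) + entropy(p2) - entropy(joint.ravel())

if __name__ == "__main__":
    prior = [0.5, 0.25, 0.25]          # illustrative numbers only
    posterior = [0.8, 0.1, 0.1]
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
    print("H(X)      =", entropy(prior), "bits")
    print("I_{D->X}  =", kl_divergence(posterior, prior), "bits")
    print("I(X1;X2)  =", mutual_information(joint), "bits")
```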
@@ -304,42 +354,6 @@
 	}
 \end{fig}
-	\subsection{Entropy and information}
-	Let $X$ denote some variable whose value is initially unknown to our
-	hypothetical observer. We will treat $X$ mathematically as a random variable,
-	with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a
-	probability distribution representing the observer's beliefs about the
-	true value of $X$.
-	In this case, the observer's uncertainty about $X$ can be quantified
-	as the entropy of the random variable $H(X)$. For a discrete variable
-	with probability mass function $p:\A \to [0,1]$, this is
-	\begin{equation}
-		H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
-	\end{equation}
-	where $\expect{}$ is the expectation operator. The negative-log-probability
-	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
-	the \emph{surprisingness} of the value $x$ should it be observed, and
-	hence the entropy is the expected surprisingness.
-
-	Now suppose that the observer receives some new data $\Data$ that
-	causes a revision of its beliefs about $X$. The \emph{information}
-	in this new data \emph{about} $X$ can be quantified as the
-	Kullback-Leibler (KL) divergence between the prior and posterior
-	distributions $p(x)$ and $p(x|\Data)$ respectively:
-	\begin{equation}
-		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
-			= \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
-	\end{equation}
-	If there are multiple variables $X_1, X_2$
-	\etc which the observer believes to be dependent, then the observation of
-	one may change its beliefs and hence yield information about the
-	others.
-	The relationships between the various joint entropies, conditional
-	entropies, mutual informations and conditional mutual informations
-	can be visualised in Venn diagram-like \emph{information diagrams}
-	or I-diagrams \cite{Yeung1991}, for example, the three-variable
-	I-diagram in \figrf{venn-example}.
-
 \subsection{Entropy and information in sequences}
@@ -477,35 +491,17 @@
 	as
 	\begin{equation}
 		%	\IXZ_t
-		I(X_t;\fut{X}_t|\past{X}_t) = H(X_t|\past{X}_t) - H(X_t|\fut{X}_t,\past{X}_t).
+		I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
 		%	\label{<++>}
 	\end{equation}
 	%	If $X$ is stationary, then
-	Now, in the shift-invariant case, $H(X_t|\past{X}_t)$
-	is the familiar entropy rate $h_\mu$, but $H(X_t|\fut{X}_t,\past{X}_t)$,
-	the conditional entropy of one variable given \emph{all} the others
-	in the sequence, future as well as past, is what
-	we called the \emph{residual entropy rate} $r_\mu$ in \cite{AbdallahPlumbley2010},
-	but was previously identified by Verd{\'u} and Weissman \cite{VerduWeissman2006} as the
-	\emph{erasure entropy rate}.
-	Thus, the PIR is the difference between
-	the entropy rate and the erasure entropy rate: $b_\mu = h_\mu - r_\mu$.
+	where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$
+	is the \emph{residual} \cite{AbdallahPlumbley2010},
+	or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
 	These relationships are illustrated in \Figrf{predinfo-bg}, along with
 	several of the information measures we have discussed so far.
-	\begin{fig}{wundt}
-		\raisebox{-4em}{\colfig[0.43]{wundt}}
-		%	{\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
-		{\ {\large$\longrightarrow$}\ }
-		\raisebox{-4em}{\colfig[0.43]{wundt2}}
-		\caption{
-			The Wundt curve relating randomness/complexity with
-			perceived value. Repeated exposure sometimes results
-			in a move to the left along the curve \cite{Berlyne71}.
-		}
-	\end{fig}
-
 \subsection{Other sequential information measures}

 	James et al \cite{JamesEllisonCrutchfield2011} study the predictive information
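For a first-order Markov chain, the rates in the hunk above reduce to closed-form expressions in the transition matrix: by the Markov property, conditioning on the infinite past or future collapses to conditioning on the neighbouring symbol, so $h_\mu$ is the stationary-weighted average entropy of the rows of $a$, $H(X_{t+1}|X_{t-1})$ is the same average computed from $a^2$, and $b_\mu$ is their difference, with $r_\mu = h_\mu - b_\mu$. The sketch below is an editorial illustration of these formulas, not code from this repository; the redundancy is taken here as the entropy of the stationary distribution minus $h_\mu$.

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution in bits (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def stationary(A):
    """Stationary distribution pi of a row-stochastic matrix A (pi @ A = pi)."""
    w, v = np.linalg.eig(A.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def markov_rates(A):
    """Entropy rate h_mu, PIR b_mu, residual/erasure rate r_mu and redundancy
    rho_mu (all in bits) for a first-order Markov chain with transition matrix A,
    where A[i, j] = P(next = j | current = i) and each row sums to 1."""
    A = np.asarray(A, dtype=float)
    pi = stationary(A)
    n = len(pi)
    h = sum(pi[i] * entropy_bits(A[i]) for i in range(n))     # h_mu = H(X_t | X_{t-1})
    A2 = A @ A                                                # two-step transition matrix
    h2 = sum(pi[i] * entropy_bits(A2[i]) for i in range(n))   # H(X_{t+1} | X_{t-1})
    b = h2 - h                                                # b_mu = h_mu - r_mu
    return {"h_mu": h, "b_mu": b, "r_mu": h - b, "rho_mu": entropy_bits(pi) - h}

if __name__ == "__main__":
    # Illustrative example: a noisy alternation between two states.
    A = np.array([[0.1, 0.9],
                  [0.9, 0.1]])
    print(markov_rates(A))
```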
@@ -544,11 +540,30 @@
 	where $\hat{a}^{N+1}$ is the $N+1$th power of the transition matrix.
+	\begin{fig}{wundt}
+		\raisebox{-4em}{\colfig[0.43]{wundt}}
+		%	{\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
+		{\ {\large$\longrightarrow$}\ }
+		\raisebox{-4em}{\colfig[0.43]{wundt2}}
+		\caption{
+			The Wundt curve relating randomness/complexity with
+			perceived value. Repeated exposure sometimes results
+			in a move to the left along the curve \cite{Berlyne71}.
+		}
+	\end{fig}
+
 \section{Information Dynamics in Analysis}
 \subsection{Musicological Analysis}
-	refer to the work with the analysis of minimalist pieces
+	In \cite{AbdallahPlumbley2009}, methods based on the theory described above
+	were used to analyse two pieces of music in the minimalist style
+	by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
+	The analysis was done using a first-order Markov chain model, with the
+	enhancement that the transition matrix of the model was allowed to
+	evolve dynamically as the notes were processed, and was estimated (in
+	a Bayesian way) as a \emph{distribution} over possible transition matrices,
+	rather than a point estimate. [Bayesian surprise, other component of IPI].

 \begin{fig}{twopages}
 	\colfig[0.96]{matbase/fig9471} % update from mbc paper
@@ -641,20 +656,6 @@
 space; each one of these points corresponds to a transition matrix.\emph{self-plagiarised}
-\begin{figure}
-\centering
-\includegraphics[width=\linewidth]{figs/mtriscat}
-\caption{The population of transition matrices distributed along three axes of
-redundancy, entropy rate and predictive information rate (all measured in bits).
-The concentrations of points along the redundancy axis correspond
-to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
-3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
-represents its PIR---note that the highest values are found at intermediate entropy
-and redundancy, and that the distribution as a whole makes a curved triangle. Although
-not visible in this plot, it is largely hollow in the middle.
-\label{InfoDynEngine}}
-\end{figure}
-
 When we look at the distribution of transition matrixes plotted in this
 space, we see that it forms an arch shape that is fairly thin. It thus becomes a
@@ -669,11 +670,7 @@
 triangle, the corresponding transition matrix is returned. Figure
 \ref{TheTriangle} shows how the triangle maps to different measures of
 redundancy, entropy rate and predictive information rate.\emph{self-plagiarised}
-	\begin{figure}
-\centering
-\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
-\caption{The Melody Triangle\label{TheTriangle}}
-\end{figure}
+
 Each corner corresponds to three different extremes of predictability and
 unpredictability, which could be loosely characterised as `periodicity', `noise'
 and `repetition'. Melodies from the `noise' corner have no discernible pattern;
@@ -688,6 +685,20 @@
 sequences based on a simple model of how one might guess the next event given
 the previous one.\emph{self-plagiarised}
+\begin{figure}
+	\centering
+	\includegraphics[width=\linewidth]{figs/mtriscat}
+	\caption{The population of transition matrices distributed along three axes of
+	redundancy, entropy rate and predictive information rate (all measured in bits).
+	The concentrations of points along the redundancy axis correspond
+	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
+	3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
+	represents its PIR---note that the highest values are found at intermediate entropy
+	and redundancy, and that the distribution as a whole makes a curved triangle. Although
+	not visible in this plot, it is largely hollow in the middle.
+	\label{InfoDynEngine}}
+\end{figure}
+
 Any number of interfaces could be developed for the Melody Triangle. We have
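The population plotted in the figure added above can be reproduced in outline by drawing random transition matrices and computing the same three coordinates for each point. The sketch below is a self-contained editorial illustration; the Dirichlet sampling scheme, the number of states and all names are assumptions, not the procedure used to generate the original figure.

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution in bits (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def coords(A):
    """(redundancy, entropy rate, PIR) in bits for a row-stochastic matrix A."""
    w, v = np.linalg.eig(A.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    A2 = A @ A
    h = sum(pi[i] * entropy_bits(A[i]) for i in range(len(pi)))    # entropy rate h_mu
    h2 = sum(pi[i] * entropy_bits(A2[i]) for i in range(len(pi)))  # H(X_{t+1} | X_{t-1})
    return entropy_bits(pi) - h, h, h2 - h                         # rho_mu, h_mu, b_mu

def sample_population(n_matrices=5000, n_states=8, concentration=0.5, seed=0):
    """Draw random transition matrices (each row Dirichlet-distributed) and
    return an (n_matrices, 3) array of (redundancy, entropy rate, PIR)."""
    rng = np.random.default_rng(seed)
    pts = [coords(rng.dirichlet(np.full(n_states, concentration), size=n_states))
           for _ in range(n_matrices)]
    return np.array(pts)

if __name__ == "__main__":
    pts = sample_population(n_matrices=500)
    print("highest PIR in the sample: %.3f bits" % pts[:, 2].max())
```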
@@ -718,6 +729,12 @@
 %As the Melody Triangle essentially operates on a stream of symbols, it
 it is possible to apply the melody triangle to the design of non-sonic content.
+	\begin{figure}
+\centering
+\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
+\caption{The Melody Triangle\label{TheTriangle}}
+\end{figure}
+
 \section{Musical Preference and Information Dynamics}
 We carried out a preliminary study that sought to identify any correlation
 between aesthetic preference and the information theoretical measures of the Melody