diff draft.tex @ 25:3f08d18c65ce

Updates to section 2.
author samer
date Tue, 13 Mar 2012 16:02:05 +0000
parents 79ede31feb20
children fb1bfe785c05
--- a/draft.tex	Tue Mar 13 11:28:02 2012 +0000
+++ b/draft.tex	Tue Mar 13 16:02:05 2012 +0000
@@ -21,6 +21,7 @@
 \newcommand\Y{\mathcal{Y}}
 \newcommand\domS{\mathcal{S}}
 \newcommand\A{\mathcal{A}}
+\newcommand\Data{\mathcal{D}}
 \newcommand\rvm[1]{\mathrm{#1}}
 \newcommand\sps{\,.\,}
 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
@@ -73,9 +74,10 @@
 \end{abstract}
 
 
-\section{Expectation and surprise in music}
+\section{Introduction}
 \label{s:Intro}
 
+	\subsection{Expectation and surprise in music}
 	One of the effects of listening to music is to create 
 	expectations of what is to come next, which may be fulfilled
 	immediately, after some delay, or not at all as the case may be.
@@ -103,6 +105,20 @@
 	large degree, the way to understand the effect of music is to focus on
 	this `kinetics' of expectation and surprise.
 
+  Prediction and expectation are essentially probabilistic concepts
+  and can be treated mathematically using probability theory.
+  We suppose that when we listen to music, expectations are created on the basis 
+	of our familiarity with various styles of music and our ability to
+	detect and learn statistical regularities in the music as they emerge.
+	There is experimental evidence that human listeners are able to internalise
+	statistical knowledge about musical structure, \eg
+	\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
+	that statistical models can form an effective basis for computational
+	analysis of music, \eg
+	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
+
+
+\comment{
 	The business of making predictions and assessing surprise is essentially
 	one of reasoning under conditions of uncertainty and manipulating
 	degrees of belief about the various proposition which may or may not
@@ -120,10 +136,27 @@
 	that statistical models can form an effective basis for computational
 	analysis of music, \eg
 	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
+}
 
 	\subsection{Music and information theory}
 	With a probabilistic framework for music modelling and prediction in hand,
-	we are in a position to apply quantitative information theory \cite{Shannon48}.
+	we are in a position to apply Shannon's quantitative information theory 
+	\cite{Shannon48}. 
+\comment{
+	which provides us with a number of measures, such as entropy
+  and mutual information, which are suitable for quantifying states of
+  uncertainty and surprise, and thus could potentially enable us to build
+  quantitative models of the listening process described above.  They are
+  what Berlyne \cite{Berlyne71} called `collative variables' since they are
+  to do with patterns of occurrence rather than medium-specific details.
+  Berlyne sought to show that the collative variables are closely related to
+  perceptual qualities like complexity, tension, interestingness,
+  and even aesthetic value, not just in music, but in other temporal
+  or visual media.
+  The relevance of information theory to music and art has
+  also been addressed by researchers from the 1950s onwards
+  \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
+}
 	The relationship between information theory and music and art in general has been the 
 	subject of some interest since the 1950s 
 	\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. 
@@ -146,41 +179,39 @@
 %	levels of uncertainty, ambiguity and surprise. 
 
 
-	Previous work in this area \cite{Berlyne74} treated the various 
-	information theoretic quantities
-	such as entropy as if they were intrinsic properties of the stimulus---subjects
-	were presented with a sequence of tones with `high entropy', or a visual pattern
-	with `low entropy'. These values were determined from some known `objective'
-	probability model of the stimuli,%
-	\footnote{%
-		The notion of objective probabalities and whether or not they can
-		usefully be said to exist is the subject of some debate, with advocates of 
-		subjective probabilities including de Finetti \cite{deFinetti}.} 
-	or from simple statistical analyses such as
-	computing emprical distributions. Our approach is explicitly to consider the role
-	of the observer in perception, and more specifically, to consider estimates of
-	entropy \etc with respect to \emph{subjective} probabilities.
 \subsection{Information dynamic approach}
 
 	Bringing the various strands together, our working hypothesis is that as a
 	listener (to which we will refer as `it') listens to a piece of music, it maintains
-	a dynamically evolving statistical model that enables it to make predictions
+	a dynamically evolving probabilistic model that enables it to make predictions
 	about how the piece will continue, relying on both its previous experience
 	of music and the immediate context of the piece.  As events unfold, it revises
-	its model and hence its probabilistic belief state, which includes predictive
-	distributions over future observations.  These distributions and changes in
-	distributions can be characterised in terms of a handful of information
-	theoretic-measures such as entropy and relative entropy.  By tracing the
+	its probabilistic belief state, which includes predictive
+	distributions over possible future events.  These 
+%	distributions and changes in distributions 
+	can be characterised in terms of a handful of information-theoretic
+	measures such as entropy and relative entropy.  By tracing the 
 	evolution of these measures, we obtain a representation which captures much
-	of the significant structure of the music, but does so at a high level of
-	\emph{abstraction}, since it is sensitive mainly to \emph{patterns} of occurence, 
-	rather the details of which specific things occur or even the sensory modality
-	through which they are detected.  This suggests that the 
-	same approach could, in principle, be used to analyse and compare information 
-	flow in different temporal media regardless of whether they are auditory, 
-	visual or otherwise. 
+	of the significant structure of the music.
+	
+	One of the consequences of this approach is that regardless of the details of
+	the sensory input or even which sensory modality is being processed, the resulting 
+	analysis is in terms of the same units: quantities of information (bits) and
+	rates of information flow (bits per second). The probabilistic and
+	information-theoretic concepts in terms of which the analysis is framed are universal to all sorts 
+	of data.
+	In addition, when adaptive probabilistic models are used, expectations are
+	created mainly in response to \emph{patterns} of occurrence, 
+	rather than the details of which specific things occur.
+	Together, these suggest that an information dynamic analysis captures a
+	high level of \emph{abstraction}, and could be used to 
+	make structural comparisons between different temporal media,
+	such as music, film, animation, and dance.
+%	analyse and compare information 
+%	flow in different temporal media regardless of whether they are auditory, 
+%	visual or otherwise. 
 
-	In addition, the information dynamic approach gives us a principled way
+	Another consequence is that the information dynamic approach gives us a principled way
 	to address the notion of \emph{subjectivity}, since the analysis is dependent on the 
 	probability model the observer starts off with, which may depend on prior experience 
 	or other factors, and which may change over time. Thus, inter-subject variability and 
@@ -195,9 +226,41 @@
 \section{Theoretical review}
 
 	\subsection{Entropy and information in sequences}
-	In this section, we summarise the definitions of some of the relevant quantities
-	in information dynamics and show how they can be computed in some simple probabilistic
-	models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]).
+	Let $X$ denote some variable whose value is initially unknown to our 
+	hypothetical observer. We will treat $X$ mathematically as a random variable,
+	with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a 
+	probability distribution representing the observer's beliefs about the 
+	true value of $X$.
+	In this case, the observer's uncertainty about $X$ can be quantified
+	as the entropy $H(X)$ of the random variable. For a discrete variable
+	with probability mass function $p:\A \to [0,1]$, this is
+	\begin{equation}
+		H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
+	\end{equation}
+	where $\expect{}$ is the expectation operator. The negative-log-probability
+	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
+	the \emph{surprisingness} of the value $x$ should it be observed, and
+	hence the entropy is the expected surprisingness.
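+	As a simple illustration, if the observer's beliefs are uniform over an
+	alphabet of $N$ symbols, then every outcome is equally surprising, with
+	$\ell(x) = \log N$, and so
+	\begin{equation}
+		H(X) = \sum_{x\in\A} \frac{1}{N} \log N = \log N,
+	\end{equation}
+	the maximum possible entropy for a variable with $N$ possible values;
+	any more sharply concentrated belief state has lower entropy.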
+
+	Now suppose that the observer receives some new data $\Data$ that
+	causes a revision of its beliefs about $X$. The \emph{information}
+	in this new data \emph{about} $X$ can be quantified as the 
+	Kullback-Leibler (KL) divergence between the prior and posterior
+	distributions $p(x)$ and $p(x|\Data)$ respectively:
+	\begin{equation}
+		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
+			= \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
+	\end{equation}
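+	For instance, if $X$ is binary with a uniform prior $p(x) = (0.5, 0.5)$
+	and the new data shift the observer's beliefs to the posterior
+	$p(x|\Data) = (0.9, 0.1)$, the information gained is
+	$0.9 \log_2 \frac{0.9}{0.5} + 0.1 \log_2 \frac{0.1}{0.5} \approx 0.53$ bits.
+	Note that this quantity depends only on the change in belief state, not on
+	any presumed `true' value of $X$.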
+	If there are multiple variables $X_1, X_2$ 
+	\etc which the observer believes to be dependent, then the observation of 
+	one may change its beliefs and hence yield information about the
+	others.
+	The relationships between the various joint entropies, conditional
+	entropies, mutual informations and conditional mutual informations
+	can be visualised in Venn diagram-like \emph{information diagrams}
+	or I-diagrams \cite{Yeung1991}, such as the three-variable
+	I-diagram in \figrf{venn-example}.
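+	For instance, for two variables the regions of such a diagram satisfy
+	identities such as
+	\begin{equation}
+		I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2),
+	\end{equation}
+	with the mutual information appearing as the overlap of the two entropy
+	regions.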
+
 
 	\begin{fig}{venn-example}
 		\newcommand\rad{2.2em}%
@@ -362,7 +425,7 @@
 		}
 	\end{fig}
 
-	\paragraph{Predictive information rate}
+	\subsection{Predictive information rate}
 	In previous work \cite{AbdallahPlumbley2009}, we introduced 
 %	examined several
 %	information-theoretic measures that could be used to characterise
@@ -432,6 +495,16 @@
     }
   \end{fig}
 
+	\subsection{Other sequential information measures}
+
+	James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
+	rate and also examine some related measures. In particular, they identify
+	$\sigma_\mu$, the difference between the multi-information rate and the excess
+	entropy, as an interesting quantity that measures the predictive benefit of
+	model-building (that is, maintaining an internal state summarising past 
+	observations in order to make better predictions). They also identify
+	$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
+	information}.
 
 	\subsection{First order Markov chains}
 	These are the simplest non-trivial models to which information dynamics methods