comparison draft.tex @ 25:3f08d18c65ce
Updates to section 2.
author   | samer
date     | Tue, 13 Mar 2012 16:02:05 +0000
parents  | 79ede31feb20
children | fb1bfe785c05
24:79ede31feb20 | 25:3f08d18c65ce |
---|---|
19 \newcommand\preals{\reals_+} | 19 \newcommand\preals{\reals_+} |
20 \newcommand\X{\mathcal{X}} | 20 \newcommand\X{\mathcal{X}} |
21 \newcommand\Y{\mathcal{Y}} | 21 \newcommand\Y{\mathcal{Y}} |
22 \newcommand\domS{\mathcal{S}} | 22 \newcommand\domS{\mathcal{S}} |
23 \newcommand\A{\mathcal{A}} | 23 \newcommand\A{\mathcal{A}} |
24 \newcommand\Data{\mathcal{D}} | |
24 \newcommand\rvm[1]{\mathrm{#1}} | 25 \newcommand\rvm[1]{\mathrm{#1}} |
25 \newcommand\sps{\,.\,} | 26 \newcommand\sps{\,.\,} |
26 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} | 27 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} |
27 \newcommand\Ix{\mathcal{I}} | 28 \newcommand\Ix{\mathcal{I}} |
28 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} | 29 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} |
71 In this paper, we review the theoretical foundations of information dynamics | 72 In this paper, we review the theoretical foundations of information dynamics |
72 and discuss a few emerging areas of application. | 73 and discuss a few emerging areas of application. |
73 \end{abstract} | 74 \end{abstract} |
74 | 75 |
75 | 76 |
76 \section{Expectation and surprise in music} | 77 \section{Introduction} |
77 \label{s:Intro} | 78 \label{s:Intro} |
78 | 79 |
80 \subsection{Expectation and surprise in music} | |
79 One of the effects of listening to music is to create | 81 One of the effects of listening to music is to create |
80 expectations of what is to come next, which may be fulfilled | 82 expectations of what is to come next, which may be fulfilled |
81 immediately, after some delay, or not at all as the case may be. | 83 immediately, after some delay, or not at all as the case may be. |
82 This is the thesis put forward by, amongst others, music theorists | 84 This is the thesis put forward by, amongst others, music theorists |
83 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was | 85 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was |
101 on how we change and revise our conceptions \emph{as events happen}, on | 103 on how we change and revise our conceptions \emph{as events happen}, on |
102 how expectation and prediction interact with occurrence, and that, to a | 104 how expectation and prediction interact with occurrence, and that, to a |
103 large degree, the way to understand the effect of music is to focus on | 105 large degree, the way to understand the effect of music is to focus on |
104 this `kinetics' of expectation and surprise. | 106 this `kinetics' of expectation and surprise. |
105 | 107 |
108 Prediction and expectation are essentially probabilistic concepts | |
109 and can be treated mathematically using probability theory. | |
110 We suppose that when we listen to music, expectations are created on the basis | |
111 of our familiarity with various styles of music and our ability to | |
112 detect and learn statistical regularities in the music as they emerge. |
113 There is experimental evidence that human listeners are able to internalise | |
114 statistical knowledge about musical structure, \eg | |
115 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also | |
116 that statistical models can form an effective basis for computational | |
117 analysis of music, \eg | |
118 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. | |
119 | |
120 | |
121 \comment{ | |
106 The business of making predictions and assessing surprise is essentially | 122 The business of making predictions and assessing surprise is essentially |
107 one of reasoning under conditions of uncertainty and manipulating | 123 one of reasoning under conditions of uncertainty and manipulating |
108 degrees of belief about the various propositions which may or may not | 124 degrees of belief about the various propositions which may or may not |
109 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best | 125 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best |
110 quantified in terms of Bayesian probability theory. | 126 quantified in terms of Bayesian probability theory. |
118 statistical knowledge about musical structure, \eg | 134 statistical knowledge about musical structure, \eg |
119 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also | 135 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also |
120 that statistical models can form an effective basis for computational | 136 that statistical models can form an effective basis for computational |
121 analysis of music, \eg | 137 analysis of music, \eg |
122 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. | 138 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. |
139 } | |
123 | 140 |
124 \subsection{Music and information theory} | 141 \subsection{Music and information theory} |
125 With a probabilistic framework for music modelling and prediction in hand, | 142 With a probabilistic framework for music modelling and prediction in hand, |
126 we are in a position to apply quantitative information theory \cite{Shannon48}. | 143 we are in a position to apply Shannon's quantitative information theory |
144 \cite{Shannon48}. | |
145 \comment{ | |
146 which provides us with a number of measures, such as entropy | |
147 and mutual information, which are suitable for quantifying states of | |
148 uncertainty and surprise, and thus could potentially enable us to build | |
149 quantitative models of the listening process described above. They are | |
150 what Berlyne \cite{Berlyne71} called `collative variables' since they are | |
151 to do with patterns of occurrence rather than medium-specific details. | |
152 Berlyne sought to show that the collative variables are closely related to | |
153 perceptual qualities like complexity, tension, interestingness, | |
154 and even aesthetic value, not just in music, but in other temporal | |
155 or visual media. | |
156 The relevance of information theory to music and art has | |
157 also been addressed by researchers from the 1950s onwards | |
158 \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}. | |
159 } | |
127 The relationship between information theory and music and art in general has been the | 160 The relationship between information theory and music and art in general has been the |
128 subject of some interest since the 1950s | 161 subject of some interest since the 1950s |
129 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. | 162 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. |
130 The general thesis is that perceptible qualities and subjective | 163 The general thesis is that perceptible qualities and subjective |
131 states like uncertainty, surprise, complexity, tension, and interestingness | 164 states like uncertainty, surprise, complexity, tension, and interestingness |
144 % of the material, the composer can thus define, and induce within the | 177 % of the material, the composer can thus define, and induce within the |
145 % listener, a temporal programme of varying | 178 % listener, a temporal programme of varying |
146 % levels of uncertainty, ambiguity and surprise. | 179 % levels of uncertainty, ambiguity and surprise. |
147 | 180 |
148 | 181 |
149 Previous work in this area \cite{Berlyne74} treated the various | |
150 information theoretic quantities | |
151 such as entropy as if they were intrinsic properties of the stimulus---subjects | |
152 were presented with a sequence of tones with `high entropy', or a visual pattern | |
153 with `low entropy'. These values were determined from some known `objective' | |
154 probability model of the stimuli,% | |
155 \footnote{% | |
156 The notion of objective probabilities and whether or not they can |
157 usefully be said to exist is the subject of some debate, with advocates of | |
158 subjective probabilities including de Finetti \cite{deFinetti}.} | |
159 or from simple statistical analyses such as | |
160 computing empirical distributions. Our approach is explicitly to consider the role |
161 of the observer in perception, and more specifically, to consider estimates of | |
162 entropy \etc with respect to \emph{subjective} probabilities. | |
163 \subsection{Information dynamic approach} | 182 \subsection{Information dynamic approach} |
164 | 183 |
165 Bringing the various strands together, our working hypothesis is that as a | 184 Bringing the various strands together, our working hypothesis is that as a |
166 listener (to which we will refer as `it') listens to a piece of music, it maintains | 185 listener (to which we will refer as `it') listens to a piece of music, it maintains |
167 a dynamically evolving statistical model that enables it to make predictions | 186 a dynamically evolving probabilistic model that enables it to make predictions |
168 about how the piece will continue, relying on both its previous experience | 187 about how the piece will continue, relying on both its previous experience |
169 of music and the immediate context of the piece. As events unfold, it revises | 188 of music and the immediate context of the piece. As events unfold, it revises |
170 its model and hence its probabilistic belief state, which includes predictive | 189 its probabilistic belief state, which includes predictive |
171 distributions over future observations. These distributions and changes in | 190 distributions over possible future events. These |
172 distributions can be characterised in terms of a handful of | 191 % distributions and changes in distributions |
173 information-theoretic measures such as entropy and relative entropy. By tracing the | 192 can be characterised in terms of a handful of |
193 information-theoretic measures such as entropy and relative entropy. By tracing the |
174 evolution of these measures, we obtain a representation which captures much | 194 evolution of these measures, we obtain a representation which captures much |
175 of the significant structure of the music, but does so at a high level of | 195 of the significant structure of the music. |
176 \emph{abstraction}, since it is sensitive mainly to \emph{patterns} of occurrence, | 196 |
177 rather than the details of which specific things occur or even the sensory modality | 197 One of the consequences of this approach is that regardless of the details of |
178 through which they are detected. This suggests that the | 198 the sensory input or even which sensory modality is being processed, the resulting |
179 same approach could, in principle, be used to analyse and compare information | 199 analysis is in terms of the same units: quantities of information (bits) and |
180 flow in different temporal media regardless of whether they are auditory, | 200 rates of information flow (bits per second). The probabilistic and |
181 visual or otherwise. | 201 information-theoretic concepts in terms of which the analysis is framed apply to all sorts |
182 | 202 of data. |
183 In addition, the information dynamic approach gives us a principled way | 203 In addition, when adaptive probabilistic models are used, expectations are |
204 created mainly in response to \emph{patterns} of occurrence, |
205 rather than the details of which specific things occur. |
206 Together, these suggest that an information dynamic analysis operates at a |
207 high level of \emph{abstraction}, and could be used to | |
208 make structural comparisons between different temporal media, | |
209 such as music, film, animation, and dance. | |
210 % analyse and compare information | |
211 % flow in different temporal media regardless of whether they are auditory, | |
212 % visual or otherwise. | |
213 | |
214 Another consequence is that the information dynamic approach gives us a principled way | |
184 to address the notion of \emph{subjectivity}, since the analysis is dependent on the | 215 to address the notion of \emph{subjectivity}, since the analysis is dependent on the |
185 probability model the observer starts off with, which may depend on prior experience | 216 probability model the observer starts off with, which may depend on prior experience |
186 or other factors, and which may change over time. Thus, inter-subject variability and | 217 or other factors, and which may change over time. Thus, inter-subject variability and |
187 variation in subjects' responses over time are | 218 variation in subjects' responses over time are |
188 fundamental to the theory. | 219 fundamental to the theory. |
193 | 224 |
194 | 225 |
195 \section{Theoretical review} | 226 \section{Theoretical review} |
196 | 227 |
197 \subsection{Entropy and information in sequences} | 228 \subsection{Entropy and information in sequences} |
198 In this section, we summarise the definitions of some of the relevant quantities | 229 Let $X$ denote some variable whose value is initially unknown to our |
199 in information dynamics and show how they can be computed in some simple probabilistic | 230 hypothetical observer. We will treat $X$ mathematically as a random variable, |
200 models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]). | 231 with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a |
232 probability distribution representing the observer's beliefs about the | |
233 true value of $X$. | |
234 In this case, the observer's uncertainty about $X$ can be quantified | |
235 as the entropy of the random variable $H(X)$. For a discrete variable | |
236 with probability mass function $p:\A \to [0,1]$, this is | |
237 \begin{equation} | |
238 H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)}, | |
239 \end{equation} | |
240 where $\expect{}$ is the expectation operator. The negative-log-probability | |
241 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as | |
242 the \emph{surprisingness} of the value $x$ should it be observed, and | |
243 hence the entropy is the expected surprisingness. | |
244 | |
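As a minimal illustration, the surprisingness of each possible value and the entropy of a discrete belief distribution can be computed in a few lines of Python; the four-symbol alphabet and its probabilities are invented example values.

    import numpy as np

    # Hypothetical belief distribution p(x) over a four-symbol alphabet (example values).
    p = np.array([0.5, 0.25, 0.125, 0.125])

    # Surprisingness l(x) = -log2 p(x), in bits, for each possible value.
    surprisingness = -np.log2(p)            # [1.0, 2.0, 3.0, 3.0]

    # Entropy H(X) = expected surprisingness.
    H = np.sum(p * surprisingness)          # 1.75 bits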
245 Now suppose that the observer receives some new data $\Data$ that | |
246 causes a revision of its beliefs about $X$. The \emph{information} | |
247 in this new data \emph{about} $X$ can be quantified as the | |
248 Kullback-Leibler (KL) divergence between the prior and posterior | |
249 distributions $p(x)$ and $p(x|\Data)$ respectively: | |
250 \begin{equation} | |
251 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X}) | |
252 = \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}. | |
253 \end{equation} | |
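Continuing the same illustrative setup, the information in the data about $X$ is the KL divergence between the observer's posterior and prior beliefs; the posterior below is again an invented example.

    import numpy as np

    # Prior p(x) and posterior p(x|D) beliefs about X (example values).
    prior     = np.array([0.5, 0.25, 0.125, 0.125])
    posterior = np.array([0.1, 0.6, 0.2, 0.1])

    # Information in the data about X: D(p(x|D) || p(x)), in bits.
    info = np.sum(posterior * np.log2(posterior / prior))   # about 0.63 bits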
254 If there are multiple variables $X_1, X_2$ | |
255 \etc which the observer believes to be dependent, then the observation of | |
256 one may change its beliefs and hence yield information about the | |
257 others. | |
258 The relationships between the various joint entropies, conditional | |
259 entropies, mutual informations and conditional mutual informations | |
260 can be visualised in Venn diagram-like \emph{information diagrams} | |
261 or I-diagrams \cite{Yeung1991}; see, for example, the three-variable |
262 I-diagram in \figrf{venn-example}. | |
263 | |
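To illustrate how the regions of such a diagram are obtained, here is a two-variable Python sketch, with an invented joint probability table, computing the joint, marginal and conditional entropies and the mutual information.

    import numpy as np

    # Hypothetical joint pmf p(x, y) over two binary variables (example values).
    pxy = np.array([[0.4, 0.1],
                    [0.1, 0.4]])

    def entropy(p):
        # Entropy in bits of an array of probabilities (zero entries ignored).
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_xy = entropy(pxy)                  # joint entropy H(X,Y)
    H_x  = entropy(pxy.sum(axis=1))      # marginal entropy H(X)
    H_y  = entropy(pxy.sum(axis=0))      # marginal entropy H(Y)
    I_xy = H_x + H_y - H_xy              # mutual information I(X;Y)
    H_x_given_y = H_xy - H_y             # conditional entropy H(X|Y)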
201 | 264 |
202 \begin{fig}{venn-example} | 265 \begin{fig}{venn-example} |
203 \newcommand\rad{2.2em}% | 266 \newcommand\rad{2.2em}% |
204 \newcommand\circo{circle (3.4em)}% | 267 \newcommand\circo{circle (3.4em)}% |
205 \newcommand\labrad{4.3em} | 268 \newcommand\labrad{4.3em} |
360 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive | 423 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive |
361 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. | 424 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. |
362 } | 425 } |
363 \end{fig} | 426 \end{fig} |
364 | 427 |
365 \paragraph{Predictive information rate} | 428 \subsection{Predictive information rate} |
366 In previous work \cite{AbdallahPlumbley2009}, we introduced | 429 In previous work \cite{AbdallahPlumbley2009}, we introduced |
367 % examined several | 430 % examined several |
368 % information-theoretic measures that could be used to characterise | 431 % information-theoretic measures that could be used to characterise |
369 % not only random processes (\ie, an ensemble of possible sequences), | 432 % not only random processes (\ie, an ensemble of possible sequences), |
370 % but also the dynamic progress of specific realisations of such processes. | 433 % but also the dynamic progress of specific realisations of such processes. |
430 perceived value. Repeated exposure sometimes results | 493 perceived value. Repeated exposure sometimes results |
431 in a move to the left along the curve \cite{Berlyne71}. | 494 in a move to the left along the curve \cite{Berlyne71}. |
432 } | 495 } |
433 \end{fig} | 496 \end{fig} |
434 | 497 |
498 \subsection{Other sequential information measures} | |
499 | |
500 James et al.~\cite{JamesEllisonCrutchfield2011} study the predictive information |
501 rate and also examine some related measures. In particular, they identify |
502 $\sigma_\mu$, the difference between the multi-information rate and the excess | |
503 entropy, as an interesting quantity that measures the predictive benefit of | |
504 model-building (that is, maintaining an internal state summarising past | |
505 observations in order to make better predictions). They also identify | |
506 $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous | |
507 information}. | |
435 | 508 |
436 \subsection{First order Markov chains} | 509 \subsection{First order Markov chains} |
437 These are the simplest non-trivial models to which information dynamics methods | 510 These are the simplest non-trivial models to which information dynamics methods |
438 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information | 511 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information |
439 rate can be expressed simply in terms of the entropy rate of the Markov chain. | 512 rate can be expressed simply in terms of the entropy rate of the Markov chain. |
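As a sketch of the entropy rate computation on which that expression rests, the snippet below uses an invented three-state transition matrix; for the predictive information rate expression itself, see \cite{AbdallahPlumbley2009}.

    import numpy as np

    # Hypothetical transition matrix a[i, j] = P(next = j | current = i) (example values).
    a = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])

    # Stationary distribution pi: left eigenvector of a with eigenvalue 1.
    evals, evecs = np.linalg.eig(a.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    pi = pi / pi.sum()

    # Entropy rate h_mu = -sum_i pi_i sum_j a_ij log2 a_ij, in bits per symbol.
    h_mu = -np.sum(pi[:, None] * a * np.log2(a))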