comparison draft.tex @ 18:ca694f7dc3f9

added bib files.
author samer
date Fri, 09 Mar 2012 18:45:55 +0000
parents e47aaea2ac28
children 739b2444a4ac
comparison
equal deleted inserted replaced
17:e47aaea2ac28 18:ca694f7dc3f9
1 \documentclass[conference]{IEEEtran} 1 \documentclass[conference,a4paper]{IEEEtran}
2 \usepackage{cite} 2 \usepackage{cite}
3 \usepackage[cmex10]{amsmath} 3 \usepackage[cmex10]{amsmath}
4 \usepackage{graphicx} 4 \usepackage{graphicx}
5 \usepackage{amssymb} 5 \usepackage{amssymb}
6 \usepackage{epstopdf} 6 \usepackage{epstopdf}
7 \usepackage{url} 7 \usepackage{url}
8 \usepackage{listings} 8 \usepackage{listings}
9 %\usepackage[expectangle]{tools}
9 \usepackage{tools} 10 \usepackage{tools}
11 \usepackage{tikz}
12 \usetikzlibrary{calc}
13 \usetikzlibrary{matrix}
14 \usetikzlibrary{patterns}
15 \usetikzlibrary{arrows}
10 16
11 \let\citep=\cite 17 \let\citep=\cite
12 \def\squash{} 18 \newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
19 \newcommand\preals{\reals_+}
20 \newcommand\X{\mathcal{X}}
21 \newcommand\Y{\mathcal{Y}}
22 \newcommand\domS{\mathcal{S}}
23 \newcommand\A{\mathcal{A}}
24 \newcommand\rvm[1]{\mathrm{#1}}
25 \newcommand\sps{\,.\,}
26 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
27 \newcommand\Ix{\mathcal{I}}
28 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
29 \newcommand\x{\vec{x}}
30 \newcommand\Ham[1]{\mathcal{H}_{#1}}
31 \newcommand\subsets[2]{[#1]^{(#2)}}
32 \def\bet(#1,#2){#1..#2}
33
34
35 \def\ev(#1=#2){#1\!\!=\!#2}
36 \newcommand\rv[1]{\Omega \to #1}
37 \newcommand\ceq{\!\!=\!}
38 \newcommand\cmin{\!-\!}
39 \newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}
40
41 \newcommand\sumitoN{\sum_{i=1}^N}
42 \newcommand\sumktoK{\sum_{k=1}^K}
43 \newcommand\sumjtoK{\sum_{j=1}^K}
44 \newcommand\sumalpha{\sum_{\alpha\in\A}}
45 \newcommand\prodktoK{\prod_{k=1}^K}
46 \newcommand\prodjtoK{\prod_{j=1}^K}
47
48 \newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
49 \newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
50 \newcommand\parity[2]{P^{#1}_{2,#2}}
13 51
14 %\usepackage[parfill]{parskip} 52 %\usepackage[parfill]{parskip}
15 53
16 \begin{document} 54 \begin{document}
17 \title{Cognitive Music Modelling: an Information Dynamics Approach} 55 \title{Cognitive Music Modelling: an Information Dynamics Approach}
23 Queen Mary University of London\\ 61 Queen Mary University of London\\
24 Mile End Road, London E1 4NS\\ 62 Mile End Road, London E1 4NS\\
25 Email:}} 63 Email:}}
26 64
27 \maketitle 65 \maketitle
28 \begin{abstract}People take in information when perceiving music. With it they continually build predictive models of what is going to happen. There is a relationship between information measures and how we perceive music. An information theoretic approach to music cognition is thus a fruitful avenue of research. 66 \begin{abstract}
67 People take in information when perceiving music. With it they continually
68 build predictive models of what is going to happen. There is a relationship
69 between information measures and how we perceive music. An information
70 theoretic approach to music cognition is thus a fruitful avenue of research.
71 In this paper, we review the theoretical foundations of information dynamics
72 and discuss a few emerging areas of application.
29 \end{abstract} 73 \end{abstract}
30 74
31 75
32 \section{Expectation and surprise in music} 76 \section{Expectation and surprise in music}
33 \label{s:Intro} 77 \label{s:Intro}
34 78
35 One of the more salient effects of listening to music is to create 79 One of the effects of listening to music is to create
36 \emph{expectations} of what is to come next, which may be fulfilled 80 expectations of what is to come next, which may be fulfilled
37 immediately, after some delay, or not at all as the case may be. 81 immediately, after some delay, or not at all as the case may be.
38 This is the thesis put forward by, amongst others, music theorists 82 This is the thesis put forward by, amongst others, music theorists
39 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}. 83 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
40 In fact, %the gist of 84 recognised much earlier; for example,
41 this insight predates Meyer quite considerably; for example,
42 it was elegantly put by Hanslick \cite{Hanslick1854} in the 85 it was elegantly put by Hanslick \cite{Hanslick1854} in the
43 nineteenth century: 86 nineteenth century:
44 \begin{quote} 87 \begin{quote}
45 `The most important factor in the mental process which accompanies the 88 `The most important factor in the mental process which accompanies the
46 act of listening to music, and which converts it to a source of pleasure, 89 act of listening to music, and which converts it to a source of pleasure,
47 is %\ldots 90 is \ldots the intellectual satisfaction
48 frequently overlooked. We here refer to the intellectual satisfaction
49 which the listener derives from continually following and anticipating 91 which the listener derives from continually following and anticipating
50 the composer's intentions---now, to see his expectations fulfilled, and 92 the composer's intentions---now, to see his expectations fulfilled, and
51 now, to find himself agreeably mistaken. It is a matter of course that 93 now, to find himself agreeably mistaken.'
52 this intellectual flux and reflux, this perpetual giving and receiving 94 %It is a matter of course that
53 takes place unconsciously, and with the rapidity of lightning-flashes.' 95 %this intellectual flux and reflux, this perpetual giving and receiving
96 %takes place unconsciously, and with the rapidity of lightning-flashes.'
54 \end{quote} 97 \end{quote}
55
56 An essential aspect of this is that music is experienced as a phenomenon 98 An essential aspect of this is that music is experienced as a phenomenon
57 that `unfolds' in time, rather than being apprehended as a static object 99 that `unfolds' in time, rather than being apprehended as a static object
58 presented in its entirety. Meyer argued that musical experience depends 100 presented in its entirety. Meyer argued that musical experience depends
59 on how we change and revise our conceptions \emph{as events happen}, on 101 on how we change and revise our conceptions \emph{as events happen}, on
60 how expectation and prediction interact with occurrence, and that, to a 102 how expectation and prediction interact with occurrence, and that, to a
64 The business of making predictions and assessing surprise is essentially 106 The business of making predictions and assessing surprise is essentially
65 one of reasoning under conditions of uncertainty and manipulating 107 one of reasoning under conditions of uncertainty and manipulating
66 degrees of belief about the various propositions which may or may not 108 degrees of belief about the various propositions which may or may not
67 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, is best 109 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, is best
68 quantified in terms of Bayesian probability theory. 110 quantified in terms of Bayesian probability theory.
69 % Thus, we assume that musical schemata are encoded as probabilistic %
70 %\citep{Meyer56} models, and
71 Thus, we suppose that 111 Thus, we suppose that
72 when we listen to music, expectations are created on the basis of our 112 when we listen to music, expectations are created on the basis of our
73 familiarity with various stylistic norms, that is, models that 113 familiarity with various stylistic norms, that is, models that
74 encode the statistics of music in general, the particular styles of 114 encode the statistics of music in general, the particular styles of
75 music that seem best to fit the piece we happen to be listening to, and 115 music that seem best to fit the piece we happen to be listening to, and
76 the emerging structures peculiar to the current piece. There is 116 the emerging structures peculiar to the current piece. There is
77 experimental evidence that human listeners are able to internalise 117 experimental evidence that human listeners are able to internalise
78 statistical knowledge about musical structure, \eg 118 statistical knowledge about musical structure, \eg
79 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also 119 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
80 that statistical models can form an effective basis for computational 120 that statistical models can form an effective basis for computational
81 % analysis of music, \eg \cite{Pearce2005}.
82 analysis of music, \eg 121 analysis of music, \eg
83 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. 122 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
84 % \cite{Ferrand2002}. Dubnov and Assayag PSTs? 123
85
86 \squash
87 \subsection{Music and information theory} 124 \subsection{Music and information theory}
88 Given a probabilistic framework for music modelling and prediction, 125 Given a probabilistic framework for music modelling and prediction,
89 it is a small step to apply quantitative information theory \cite{Shannon48} to 126 it is a small step to apply quantitative information theory \cite{Shannon48} to
90 the models at hand. 127 the models at hand.
91 The relationship between information theory and music and art in general has been the 128 The relationship between information theory and music and art in general has been the
121 usefully be said to exist is the subject of some debate, with advocates of 158 usefully be said to exist is the subject of some debate, with advocates of
122 subjective probabilities including de Finetti \cite{deFinetti}. 159 subjective probabilities including de Finetti \cite{deFinetti}.
123 Accordingly, we will treat the concept of `true' or `objective' probability 160 Accordingly, we will treat the concept of `true' or `objective' probability
124 models with a grain of salt and not rely on them in our 161 models with a grain of salt and not rely on them in our
125 theoretical development.}% 162 theoretical development.}%
126 % since probabilities are almost always a function of the state of knowledge of the observer
127 or from simple statistical analyses such as 163 or from simple statistical analyses such as
128 computing empirical distributions. Our approach is explicitly to consider the role 164 computing empirical distributions. Our approach is explicitly to consider the role
129 of the observer in perception, and more specifically, to consider estimates of 165 of the observer in perception, and more specifically, to consider estimates of
130 entropy \etc with respect to \emph{subjective} probabilities. 166 entropy \etc with respect to \emph{subjective} probabilities.
131 % !!REV - DONE - explain use of quoted `objective'
132
133 % !!REV - previous work on information theory in music
134 More recent work on using information theoretic concepts to analyse music
135 includes Simon's \cite{Simon2005} assessments of the entropy of
136 Jazz improvisations and Dubnov's
137 \cite{Dubnov2006,DubnovMcAdamsReynolds2006,Dubnov2008}
138 investigations of the `information rate' of musical processes, which is related
139 to the notion of redundancy in a communications channel.
140 Dubnov's work in particular is informed by similar concerns to our own
141 and we will discuss the relationship between it and our work at
142 several points later in this paper
143 (see \secrf{Redundancy}, \secrf{methods} and \secrf{RelatedWork}).
144
145
146 % !!REV - DONE - rephrase, check grammar (now there are too many 'one's!)
147 \squash
148 \subsection{Information dynamic approach} 167 \subsection{Information dynamic approach}
149 168
150 Bringing the various strands together, our working hypothesis is that 169 Bringing the various strands together, our working hypothesis is that
151 as a listener (to which we will refer gender neutrally as `it') 170 as a listener (to which we will refer gender neutrally as `it')
152 listens to a piece of music, it maintains a dynamically evolving statistical 171 listens to a piece of music, it maintains a dynamically evolving statistical
161 By tracing the evolution of these measures, we obtain a representation 180 By tracing the evolution of these measures, we obtain a representation
162 which captures much of the significant structure of the 181 which captures much of the significant structure of the
163 music. 182 music.
164 This approach has a number of features which we list below. 183 This approach has a number of features which we list below.
165 184
166 (1) \emph{Abstraction}: 185 \emph{Abstraction}:
167 Because it is sensitive mainly to \emph{patterns} of occurrence, 186 Because it is sensitive mainly to \emph{patterns} of occurrence,
168 rather than the details of which specific things occur, 187 rather than the details of which specific things occur,
169 it operates at a level of abstraction removed from the details of the sensory 188 it operates at a level of abstraction removed from the details of the sensory
170 experience and the medium through which it was received, suggesting that the 189 experience and the medium through which it was received, suggesting that the
171 same approach could, in principle, be used to analyse and compare information 190 same approach could, in principle, be used to analyse and compare information
172 flow in different temporal media regardless of whether they are auditory, 191 flow in different temporal media regardless of whether they are auditory,
173 visual or otherwise. 192 visual or otherwise.
174 193
175 (2) \emph{Generality}: 194 \emph{Generality}: the approach is applicable to any probabilistic model.
176 This approach does not proscribe which probabilistic models should be used---the 195
177 choice can be guided by standard model selection criteria such as Bayes 196 \emph{Subjectivity}:
178 factors \cite{KassRaftery1995}, \etc
179
180 (3) \emph{Richness}:
181 It may be effective to use a model with time-dependent latent
182 variables, such as a hidden Markov model. In these cases, we can track changes
183 in beliefs about the hidden variables as well as the observed ones, adding
184 another layer of richness to the description while maintaining the same
185 level of abstraction.
186 For example, harmony (\ie, the `current chord') in music is not stated explicitly, but rather
187 must be inferred from the musical surface; nonetheless, a sense of harmonic
188 progression is an important aspect of many styles of music.
189
190 (4) \emph{Subjectivity}:
191 Since the analysis is dependent on the probability model the observer brings to the 197 Since the analysis is dependent on the probability model the observer brings to the
192 problem, which may depend on prior experience or other factors, and which may change 198 problem, which may depend on prior experience or other factors, and which may change
193 over time, inter-subject variability and variation in subjects' responses over time are 199 over time, inter-subject variability and variation in subjects' responses over time are
194 fundamental to the theory. It is essentially a theory of subjective response. 200 fundamental to the theory. It is essentially a theory of subjective response.
195 201
196 % !!REV - clarify aims of paper. 202 %modelling the creative process, which often alternates between generative
197 Having outlined the basic ideas, our aims in pursuing this line of thought 203 %and selective or evaluative phases \cite{Boden1990}, and would have
198 are threefold: firstly, to propose dynamic information-based measures which 204 %applications in tools for computer aided composition.
199 are coherent from a theoretical point of view and consistent with the general 205
200 principles of probabilistic inference, with possible applications in 206
201 regulating machine learning systems; 207 \section{Theoretical review}
202 % when heuristics are required to manage intractible models or limited computational resources. 208
203 secondly, to construct computational models of what human brains are doing 209 In this section, we summarise the definitions of some of the relevant quantities
204 in response to music, on the basis that our brains implement, or at least 210 in information dynamics and show how they can be computed in some simple probabilistic
205 approximate, optimal probabilistic inference under the relevant constraints; 211 models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]).
206 and thirdly, to construct a computational model of a certain restricted 212
207 field of aesthetic judgements (namely judgements related to formal structure) 213 \begin{fig}{venn-example}
208 that may shed light on what makes a stimulus interesting or aesthetically 214 \newcommand\rad{2.2em}%
209 pleasing. This would be of particular relevance to understanding and 215 \newcommand\circo{circle (3.4em)}%
210 modelling the creative process, which often alternates between generative 216 \newcommand\labrad{4.3em}
211 and selective or evaluative phases \cite{Boden1990}, and would have 217 \newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
212 applications in tools for computer aided composition. 218 \newcommand\colsep{\ }
219 \newcommand\clipin[1]{\clip (#1) \circo;}%
220 \newcommand\clipout[1]{\clip \bound (#1) \circo;}%
221 \newcommand\cliptwo[3]{%
222 \begin{scope}
223 \clipin{#1};
224 \clipin{#2};
225 \clipout{#3};
226 \fill[black!30] \bound;
227 \end{scope}
228 }%
229 \newcommand\clipone[3]{%
230 \begin{scope}
231 \clipin{#1};
232 \clipout{#2};
233 \clipout{#3};
234 \fill[black!15] \bound;
235 \end{scope}
236 }%
237 \begin{tabular}{c@{\colsep}c}
238 \begin{tikzpicture}[baseline=0pt]
239 \coordinate (p1) at (90:\rad);
240 \coordinate (p2) at (210:\rad);
241 \coordinate (p3) at (-30:\rad);
242 \clipone{p1}{p2}{p3};
243 \clipone{p2}{p3}{p1};
244 \clipone{p3}{p1}{p2};
245 \cliptwo{p1}{p2}{p3};
246 \cliptwo{p2}{p3}{p1};
247 \cliptwo{p3}{p1}{p2};
248 \begin{scope}
249 \clip (p1) \circo;
250 \clip (p2) \circo;
251 \clip (p3) \circo;
252 \fill[black!45] \bound;
253 \end{scope}
254 \draw (p1) \circo;
255 \draw (p2) \circo;
256 \draw (p3) \circo;
257 \path
258 (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
259 (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
260 (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
261 (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
262 (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
263 (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
264 (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
265 ;
266 \path
267 (p1) +(140:\labrad) node {$X_1$}
268 (p2) +(-140:\labrad) node {$X_2$}
269 (p3) +(-40:\labrad) node {$X_3$};
270 \end{tikzpicture}
271 &
272 \parbox{0.5\linewidth}{
273 \small
274 \begin{align*}
275 I_{1|23} &= H(X_1|X_2,X_3) \\
276 I_{13|2} &= I(X_1;X_3|X_2) \\
277 I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
278 I_{12|3} + I_{123} &= I(X_1;X_2)
279 \end{align*}
280 }
281 \end{tabular}
282 \caption{
283 Venn diagram visualisation of entropies and mutual informations
284 for three random variables $X_1$, $X_2$ and $X_3$. The areas of
285 the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
286 The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
287 The central area $I_{123}$ is the co-information \cite{McGill1954}.
288 Some other information measures are indicated in the legend.
289 }
290 \end{fig}
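The quantities labelled in \Figrf{venn-example} can be computed directly from a joint
distribution over three discrete variables using standard identities between joint
entropies. The following sketch is our own illustration (the array layout and function
names are assumptions, not part of any existing library):
\begin{lstlisting}[language=Python]
import numpy as np

def H(p, keep):
    # Entropy (in bits) of the marginal of the joint array p over
    # the variable indices listed in `keep'; the rest are summed out.
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    m = p.sum(axis=drop)
    m = m[m > 0]
    return float(-np.sum(m * np.log2(m)))

def venn_quantities(p):
    # p[x1, x2, x3] is the joint distribution of (X1, X2, X3).
    H1, H2, H3 = H(p, (0,)), H(p, (1,)), H(p, (2,))
    H12, H13, H23 = H(p, (0, 1)), H(p, (0, 2)), H(p, (1, 2))
    H123 = H(p, (0, 1, 2))
    I_1_23 = H123 - H23                              # I_{1|23} = H(X1|X2,X3)
    I_13_2 = H12 + H23 - H2 - H123                   # I_{13|2} = I(X1;X3|X2)
    I_123  = H1 + H2 + H3 - H12 - H13 - H23 + H123   # co-information
    return I_1_23, I_13_2, I_123
\end{lstlisting}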
291 [Adopting notation of recent Binding information paper.]
292 \subsection{`Anatomy of a bit' stuff}
293 Entropy rates, redundancy, predictive information etc.
294 Information diagrams.
295
296 \begin{fig}{predinfo-bg}
297 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
298 \newcommand\rad{1.8em}%
299 \newcommand\ovoid[1]{%
300 ++(-#1,\rad)
301 -- ++(2 * #1,0em) arc (90:-90:\rad)
302 -- ++(-2 * #1,0em) arc (270:90:\rad)
303 }%
304 \newcommand\axis{2.75em}%
305 \newcommand\olap{0.85em}%
306 \newcommand\offs{3.6em}
307 \newcommand\colsep{\hspace{5em}}
308 \newcommand\longblob{\ovoid{\axis}}
309 \newcommand\shortblob{\ovoid{1.75em}}
310 \begin{tabular}{c@{\colsep}c}
311 \subfig{(a) excess entropy}{%
312 \newcommand\blob{\longblob}
313 \begin{tikzpicture}
314 \coordinate (p1) at (-\offs,0em);
315 \coordinate (p2) at (\offs,0em);
316 \begin{scope}
317 \clip (p1) \blob;
318 \clip (p2) \blob;
319 \fill[lightgray] (-1,-1) rectangle (1,1);
320 \end{scope}
321 \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
322 \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
323 \path (0,0) node (future) {$E$};
324 \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
325 \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
326 \end{tikzpicture}%
327 }%
328 \\[1.25em]
329 \subfig{(b) predictive information rate $b_\mu$}{%
330 \begin{tikzpicture}%[baseline=-1em]
331 \newcommand\rc{2.1em}
332 \newcommand\throw{2.5em}
333 \coordinate (p1) at (210:1.5em);
334 \coordinate (p2) at (90:0.7em);
335 \coordinate (p3) at (-30:1.5em);
336 \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
337 \newcommand\present{(p2) circle (\rc)}
338 \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
339 \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
340 \newcommand\fillclipped[2]{%
341 \begin{scope}[even odd rule]
342 \foreach \thing in {#2} {\clip \thing;}
343 \fill[black!#1] \bound;
344 \end{scope}%
345 }%
346 \fillclipped{30}{\present,\future,\bound \thepast}
347 \fillclipped{15}{\present,\bound \future,\bound \thepast}
348 \draw \future;
349 \fillclipped{45}{\present,\thepast}
350 \draw \thepast;
351 \draw \present;
352 \node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
353 \node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
354 \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
355 \path (p2) +(140:3em) node {$X_0$};
356 % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
357 \path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
358 \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
359 \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
360 \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
361 \end{tikzpicture}}%
362 \\[0.5em]
363 \end{tabular}
364 \caption{
365 Venn diagram representation of several information measures for
366 stationary random processes. Each circle or oval represents a random
367 variable or sequence of random variables relative to time $t=0$. Overlapped areas
368 correspond to various mutual informations, as in \Figrf{venn-example}.
369 In (b), the circle represents the `present'. Its total area is
370 $H(X_0)=H(1)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
371 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
372 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
373 }
374 \end{fig}
375
376 \paragraph{Predictive information rate}
377 In previous work \cite{AbdallahPlumbley2009}, we introduced
378 % examined several
379 % information-theoretic measures that could be used to characterise
380 % not only random processes (\ie, an ensemble of possible sequences),
381 % but also the dynamic progress of specific realisations of such processes.
382 % One of these measures was
383 %
384 the \emph{predictive information rate}
385 (PIR), which is the average information
386 in one observation about the infinite future given the infinite past.
387 If $\past{X}_t=(\ldots,X_{t-2},X_{t-1})$ denotes the variables
388 before time $t$,
389 and $\fut{X}_t = (X_{t+1},X_{t+2},\ldots)$ denotes
390 those after $t$,
391 the PIR at time $t$ is defined as a conditional mutual information:
392 \begin{equation}
393 \label{eq:PIR}
394 \IXZ_t \define I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
395 \end{equation}
396 % (The underline/overline notation follows that of \cite[\S 3]{AbdallahPlumbley2009}.)
397 % Hence, $\Ix_t$ quantifies the \emph{new}
398 % information gained about the future from the observation at time $t$.
399 Equation \eqrf{PIR} can be read as the average reduction
400 in uncertainty about the future on learning $X_t$, given the past.
401 Due to the symmetry of the mutual information, it can also be written
402 as
403 \begin{equation}
404 % \IXZ_t
405 I(X_t;\fut{X}_t|\past{X}_t) = H(X_t|\past{X}_t) - H(X_t|\fut{X}_t,\past{X}_t).
406 % \label{<++>}
407 \end{equation}
408 % If $X$ is stationary, then
409 Now, in the shift-invariant case, $H(X_t|\past{X}_t)$
410 is the familiar entropy rate $h_\mu$, but $H(X_t|\fut{X}_t,\past{X}_t)$,
411 the conditional entropy of one variable given \emph{all} the others
412 in the sequence, future as well as past, is what
413 we called the \emph{residual entropy rate} $r_\mu$ in \cite{AbdallahPlumbley2010};
414 it was previously identified by Verd{\'u} and Weissman \cite{VerduWeissman2006} as the
415 \emph{erasure entropy rate}.
416 % It is not expressible in terms of the block entropy function $H(\cdot)$.
417 It can be defined as the limit
418 \begin{equation}
419 \label{eq:residual-entropy-rate}
420 r_\mu \define \lim_{N\tends\infty} H(X_{\bet(-N,N)}) - H(X_{\bet(-N,-1)},X_{\bet(1,N)}).
421 \end{equation}
422 The second term, $H(X_{\bet(1,N)},X_{\bet(-N,-1)})$,
423 is the joint entropy of two non-adjacent blocks each of length $N$ with a
424 gap between them,
425 and cannot be expressed as a function of block entropies alone.
426 % In order to associate it with the concept of \emph{binding information} which
427 % we will define in \secrf{binding-info}, we
428 Thus, the shift-invariant PIR (which we will write as $b_\mu$) is the difference between
429 the entropy rate and the erasure entropy rate: $b_\mu = h_\mu - r_\mu$.
430 These relationships are illustrated in \Figrf{predinfo-bg}, along with
431 several of the information measures we have discussed so far.
432
433
434 \subsection{First order Markov chains}
435 These are the simplest non-trivial models to which information dynamics methods
436 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information
437 rate can be expressed simply in terms of the entropy rate of the Markov chain.
438 If we let $a$ denote the transition matrix of the Markov chain, and $h_a$ its
439 entropy rate, then its predictive information rate $b_a$ is
440 \begin{equation}
441 b_a = h_{a^2} - h_a,
442 \end{equation}
443 where $a^2 = aa$, the transition matrix squared, is the transition matrix
444 of the `skip one' Markov chain obtained by leaving out every other observation.
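For concreteness, $h_a$ can be evaluated from the standard closed form
$h_a = -\sum_i \pi_i \sum_j a_{ij} \log a_{ij}$, where $\pi$ is the stationary
distribution of the chain, and $b_a$ then follows from the relation above. The sketch
below is a minimal illustration of ours, assuming a row-stochastic transition matrix
and natural logarithms; the function names are ours.
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(a):
    # Stationary distribution: left eigenvector of a with eigenvalue 1.
    evals, evecs = np.linalg.eig(a.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    # h_a = -sum_i pi_i sum_j a_ij log a_ij, with 0 log 0 taken as 0.
    pi = stationary(a)
    logs = np.log(np.where(a > 0, a, 1.0))
    return float(-np.sum(pi[:, None] * a * logs))

def pir_first_order(a):
    # b_a = h_{a^2} - h_a; a @ a is the two-step (`skip one') chain.
    return entropy_rate(a @ a) - entropy_rate(a)
\end{lstlisting}
For example, \texttt{pir\_first\_order(np.array([[0.9, 0.1], [0.1, 0.9]]))} returns
$b_a$ in nats for a sticky two-state chain.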
445
446 \subsection{Higher order Markov chains}
447 Second and higher order Markov chains can be treated in a similar way by transforming
448 to a first order representation of the higher order Markov chain. If we are dealing
449 with an $N$th order model, this is done by forming a new alphabet of possible observations
450 consisting of all possible $N$-tuples of symbols from the base alphabet. An observation
451 in this new model represents a block of $N$ observations from the base model. The next
452 observation represents the block of $N$ observations obtained by shifting the previous block along
453 by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
454 transition matrix $\hat{a}$, where $K$ is the size of the base alphabet. Its predictive information rate is then
455 \begin{equation}
456 b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}},
457 \end{equation}
458 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the transition matrix.
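To make the embedding concrete, the sketch below (our own illustration, for the
second-order case $N=2$, assuming the conditional distribution is supplied as a
$K\times K\times K$ array) builds the sparse first-order transition matrix $\hat{a}$;
the entropy rates can then be computed as in the first-order sketch above.
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def embed_second_order(p):
    # p[i, j, k] = P(next symbol = k | previous two symbols = (i, j)).
    K = p.shape[0]
    states = list(itertools.product(range(K), repeat=2))   # all 2-tuples
    index = {s: n for n, s in enumerate(states)}
    a_hat = np.zeros((K**2, K**2))
    for (i, j) in states:
        for k in range(K):
            # The only transitions allowed are (i, j) -> (j, k).
            a_hat[index[(i, j)], index[(j, k)]] = p[i, j, k]
    return a_hat

# For N = 2 the formula above gives
#   b = entropy_rate(np.linalg.matrix_power(a_hat, 3)) - 2 * entropy_rate(a_hat).
\end{lstlisting}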
459
213 460
214 461
215 \section{Information Dynamics in Analysis} 462 \section{Information Dynamics in Analysis}
216 463
217 \subsection{Musicological Analysis} 464 \subsection{Musicological Analysis}