comparison draft.tex @ 43:3f643e9fead0
Added Andrew's bits, added to fig 2, fixed some spellings, added some section crossrefs.
author: samer
date: Thu, 15 Mar 2012 15:08:46 +0000
parents: 1161caf0bdda
children: 244b74fb707d
conditioned on the observed past. This could be used, for example, as an estimate
of attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.

\subsection{Information measures for stationary random processes}
\label{s:process-info}


\begin{fig}{predinfo-bg}
\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
\newcommand\rad{1.8em}%
[...]
\newcommand\offs{3.6em}
\newcommand\colsep{\hspace{5em}}
\newcommand\longblob{\ovoid{\axis}}
\newcommand\shortblob{\ovoid{1.75em}}
\begin{tabular}{c@{\colsep}c}
\subfig{(a) multi-information and entropy rates}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{1.75em}
\newcommand\throw{2.5em}
\coordinate (p1) at (180:1.5em);
\coordinate (p2) at (0:0.3em);
\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
\newcommand\present{(p2) circle (\rc)}
\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
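% \fillclipped{<grey>}{<paths>}: clip successively to each of the given
% comma-separated paths and fill their intersection with black!<grey>;
% writing \bound before a path selects its complement within the
% bounding frame, via the even odd rule.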
\newcommand\fillclipped[2]{%
\begin{scope}[even odd rule]
\foreach \thing in {#2} {\clip \thing;}
\fill[black!#1] \bound;
\end{scope}%
}%
\fillclipped{30}{\present,\bound \thepast}
\fillclipped{15}{\present,\bound \thepast}
\fillclipped{45}{\present,\thepast}
\draw \thepast;
\draw \present;
\node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
\node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
\path (p2) +(90:3em) node {$X_0$};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\end{tikzpicture}}%
\\[1.25em]
\subfig{(b) excess entropy}{%
\newcommand\blob{\longblob}
\begin{tikzpicture}
\coordinate (p1) at (-\offs,0em);
\coordinate (p2) at (\offs,0em);
\begin{scope}
[...]
\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
\end{tikzpicture}%
}%
\\[1.25em]
\subfig{(c) predictive information rate $b_\mu$}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{2.1em}
\newcommand\throw{2.5em}
\coordinate (p1) at (210:1.5em);
\coordinate (p2) at (90:0.7em);
[...]
\begin{scope}[even odd rule]
\foreach \thing in {#2} {\clip \thing;}
\fill[black!#1] \bound;
\end{scope}%
}%
\fillclipped{80}{\future,\thepast}
\fillclipped{30}{\present,\future,\bound \thepast}
\fillclipped{15}{\present,\bound \future,\bound \thepast}
\draw \future;
\fillclipped{45}{\present,\thepast}
\draw \thepast;
[...]
variable or sequence of random variables relative to time $t=0$. Overlapped areas
correspond to various mutual informations as in \Figrf{venn-example}.
In (c), the circle represents the `present'. Its total area is
$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. The small dark
region below $X_0$ in (c) is $\sigma_\mu = E-\rho_\mu$.
}
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of
[...]
is the mutual information between
the entire `past' and the entire `future':
\begin{equation}
E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}
Both the excess entropy and the multi-information rate can be thought
of as measures of \emph{redundancy}, quantifying the extent to which
the same information is to be found in all parts of the sequence.
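For example, in a process that deterministically cycles through $p$ distinct
symbols, each observation is completely determined by the past, so $h_\mu=0$,
while $\rho_\mu = E = \log p$: the same phase information is recoverable from
any part of the sequence.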


The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the average information in one observation about the infinite future given the infinite past,
and is defined as a conditional mutual information:
[...]
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
% \IXZ_t
b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
% \label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
is the \emph{residual} \cite{AbdallahPlumbley2010},
[...]
James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
rate and also examine some related measures. In particular, they identify
$\sigma_\mu$, the difference between the excess entropy and the multi-information
rate, as an interesting quantity that measures the predictive benefit of
model-building (that is, maintaining an internal state summarising past
observations in order to make better predictions).
% They also identify
% $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
% information} rate.


\subsection{First and higher order Markov chains}
First order Markov chains are the simplest non-trivial models to which information
dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
[...]
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.

[Something about what kinds of Markov chain maximise $h_\mu$ (uncorrelated `white'
sequences, no temporal structure), $\rho_\mu$ and $E$ (periodic), and $b_\mu$. We return
to this in \secrf{composition}.]
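For the first order case, all of these quantities are simple functions of the
transition matrix. The sketch below (a minimal illustration in Python/NumPy,
assuming a row-stochastic transition matrix and measuring information in bits)
computes $h_\mu$ as the expected row entropy under the stationary distribution,
$\rho_\mu = H(X_0) - h_\mu$, and $b_\mu$ as the entropy rate of $\hat{a}^2$
minus that of $\hat{a}$, the $N=1$ case of the $\hat{a}^{N+1}$ expression above.
\begin{verbatim}
import numpy as np

def plogp(p):
    # p * log2(p), with the convention 0 log 0 = 0
    return np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)

def stationary(a):
    # stationary distribution: left eigenvector of a with eigenvalue 1
    w, v = np.linalg.eig(a.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def cond_entropy(a, pi):
    # H(X_{t+1} | X_t) in bits for a row-stochastic kernel a under pi
    return float(-pi @ plogp(a).sum(axis=1))

def info_measures(a):
    pi = stationary(a)
    h_mu = cond_entropy(a, pi)              # entropy rate
    H0 = float(-plogp(pi).sum())            # H(X_0)
    rho_mu = H0 - h_mu                      # multi-information rate (= E at order 1)
    b_mu = cond_entropy(a @ a, pi) - h_mu   # PIR: h of a^2 minus h of a
    r_mu = h_mu - b_mu                      # residual entropy rate
    return h_mu, rho_mu, b_mu, r_mu
\end{verbatim}
As a check on the note above, a permutation matrix (deterministic and periodic)
gives $h_\mu = b_\mu = 0$ with $\rho_\mu$ maximal, while a uniform transition
matrix (an uncorrelated `white' sequence) gives $h_\mu$ maximal with
$\rho_\mu = b_\mu = 0$.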


\section{Information Dynamics in Analysis}

\begin{fig}{twopages}
[...]
\end{itemize}


\subsection{Beat Tracking}

A probabilistic method for drum tracking was presented by Robertson
\cite{Robertson11c}. The algorithm is used to synchronise a music
sequencer to a live drummer. The expected beat time of the sequencer is
represented by a click track, and the algorithm takes as input event
times for discrete kick and snare drum events relative to this click
track. These are obtained using dedicated microphones for each drum
and a percussive onset detector (Puckette 1998). The drum tracker
continually updates distributions for tempo and phase on receiving a new
event time. We can thus quantify the information contributed by an event
by measuring the difference between the system's prior and posterior
distributions using the Kullback-Leibler divergence.

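In outline, the computation on each event is as follows (a schematic sketch
only, assuming a discretised tempo-phase distribution; the representation
used in \cite{Robertson11c} may differ):
\begin{verbatim}
import numpy as np

def normalise(p, eps=1e-12):
    p = p.ravel() + eps   # smooth so that logs and ratios are defined
    return p / p.sum()

def kl_bits(posterior, prior):
    # D_KL(posterior || prior) in bits: information carried by the event
    p, q = normalise(posterior), normalise(prior)
    return float(np.sum(p * np.log2(p / q)))

def entropy_bits(dist):
    # uncertainty of the tracker's current tempo-phase estimate
    p = normalise(dist)
    return float(-np.sum(p * np.log2(p)))
\end{verbatim}
On each kick or snare event, \texttt{kl\_bits(posterior, prior)} is recorded,
along with \texttt{entropy\_bits(prior)} as the uncertainty before the event.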
Here, we have calculated the KL divergence and entropy for kick and
snare events in sixteen files. The analysis of information rates can be
considered \emph{subjective}, in that it measures how the drum tracker's
probability distributions change, and these are contingent on the
model used as well as on properties of the signal itself. We expect,
however, that following periods of increased uncertainty, such as fills
or expressive timing, the information contained in an individual event
increases. We also examine whether the information is dependent upon
metrical position.


\section{Information dynamics as compositional aid}
\label{s:composition}

\begin{fig}{wundt}
\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
{\ {\large$\longrightarrow$}\ }
\raisebox{-4em}{\colfig[0.43]{wundt2}}
\caption{
The Wundt curve relating randomness/complexity with
perceived value. Repeated exposure sometimes results
in a move to the left along the curve \cite{Berlyne71}.
}
\end{fig}

In addition to applying information dynamics to analysis, it is also possible
to apply it to the generation of content, such as the composition of musical
materials. The outputs of algorithmic or stochastic processes can be filtered
to match a set of criteria defined in terms of the information dynamics model,
[...]
address notions of expectation and surprise in music, and as such the Melody
Triangle is a means of interfacing with a generative process in terms of the
predictability of its output.

The triangle is `populated' with possible parameter values for melody generators.
These are plotted in a 3D information space of $\rho_\mu$ (redundancy), $h_\mu$ (entropy rate) and
$b_\mu$ (predictive information rate), as defined in \secrf{process-info}.
In our case we generated thousands of transition matrices, representing first-order
Markov chains, by a random sampling method. Figure~\ref{InfoDynEngine} shows
how these matrices are distributed in this space; each point corresponds to
one transition matrix.

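One simple way to obtain such a sample (a sketch assuming independent
symmetric-Dirichlet rows, which need not be the sampling method actually
used here) is, reusing \texttt{info\_measures} from the sketch above:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def sample_points(n=5000, k=8, alpha=0.1):
    # each row of each matrix drawn from a symmetric Dirichlet; a small
    # alpha gives sparse, near-deterministic rows, spreading the sample
    # across a wide range of behaviours
    mats = rng.dirichlet(alpha * np.ones(k), size=(n, k))
    return np.array([info_measures(a)[:3] for a in mats])  # (h, rho, b)
\end{verbatim}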
The distribution of transition matrices plotted in this space forms an arch
shape that is fairly thin, so it is a reasonable approximation to treat it
as a two-dimensional sheet, which we stretch out from a curved arc into
a flat triangle. It is this triangular sheet that is our `Melody Triangle' and
forms the interface by which the system is controlled. Using this interface
thus involves a mapping to statistical space: a user selects a position within
[...]


\section{Conclusion}

\bibliographystyle{unsrt}
{\bibliography{all,c4dm,nime,andrew}}
\end{document}