changeset 34:25846c37a08a
New bits, rearranged figure placement.
author | samer
date | Wed, 14 Mar 2012 13:17:05 +0000
parents | a9c8580cb8ca
children | 194c7ec7e35d
files | draft.pdf draft.tex
diffstat | 2 files changed, 95 insertions(+), 78 deletions(-)
--- a/draft.tex	Wed Mar 14 12:05:04 2012 +0000
+++ b/draft.tex	Wed Mar 14 13:17:05 2012 +0000
@@ -225,6 +225,56 @@
 \section{Theoretical review}
+	\subsection{Entropy and information}
+	Let $X$ denote some variable whose value is initially unknown to our
+	hypothetical observer. We will treat $X$ mathematically as a random variable,
+	with a value to be drawn from some set $\A$ and a
+	probability distribution representing the observer's beliefs about the
+	true value of $X$.
+	In this case, the observer's uncertainty about $X$ can be quantified
+	as the entropy of the random variable $H(X)$. For a discrete variable
+	with probability mass function $p:\A \to [0,1]$, this is
+	\begin{equation}
+		H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
+	\end{equation}
+	where $\expect{}$ is the expectation operator. The negative-log-probability
+	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
+	the \emph{surprisingness} of the value $x$ should it be observed, and
+	hence the entropy is the expected surprisingness.
+
+	Now suppose that the observer receives some new data $\Data$ that
+	causes a revision of its beliefs about $X$. The \emph{information}
+	in this new data \emph{about} $X$ can be quantified as the
+	Kullback-Leibler (KL) divergence between the prior and posterior
+	distributions $p(x)$ and $p(x|\Data)$ respectively:
+	\begin{equation}
+		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
+			= \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
+	\end{equation}
+	When there are multiple variables $X_1, X_2$
+	\etc which the observer believes to be dependent, the observation of
+	one may change its beliefs and hence yield information about the
+	others. The joint and conditional entropies as described in any
+	textbook on information theory (\eg \cite{CoverThomas}) then quantify
+	the observer's expected uncertainty about groups of variables given the
+	values of others. In particular, the \emph{mutual information}
+	$I(X_1;X_2)$ is both the expected information
+	in an observation of $X_2$ about $X_1$ and the expected reduction
+	in uncertainty about $X_1$ after observing $X_2$:
+	\begin{equation}
+		I(X_1;X_2) = H(X_1) - H(X_1|X_2),
+	\end{equation}
+	where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
+	of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$,
+	and so the mutual information is symmetric in its arguments. A conditional
+	form of the mutual information can be formulated analogously:
+	\begin{equation}
+		I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
+	\end{equation}
+	These relationships between the various entropies and mutual
+	informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
+	or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
+
 \begin{fig}{venn-example}
 	\newcommand\rad{2.2em}%
 	\newcommand\circo{circle (3.4em)}%
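The quantities defined in the hunk above translate directly into a few lines of code. The following Python/NumPy sketch is an editorial illustration only (it is not part of the patch; the function names and the example distributions are invented for the purpose): it evaluates the entropy of a discrete distribution, the information $\mathcal{I}_{\Data\to X}$ as the KL divergence from prior to posterior, and the mutual information of a two-variable joint distribution via the identity $I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2)$.

```python
import numpy as np

def entropy(p):
    """H(X) = E[-log2 p(X)] in bits for a discrete distribution p (zero terms dropped)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def kl_divergence(posterior, prior):
    """D(posterior || prior): the information gained about X from the new data."""
    q, p = np.asarray(posterior, dtype=float), np.asarray(prior, dtype=float)
    nz = q > 0
    return float(np.sum(q[nz] * np.log2(q[nz] / p[nz])))

def mutual_information(joint):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(p1) + entropy(p2) - entropy(joint.ravel())

if __name__ == "__main__":
    prior = [0.5, 0.25, 0.25]          # illustrative numbers only
    posterior = [0.8, 0.1, 0.1]
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
    print("H(X)      =", entropy(prior), "bits")
    print("I_{D->X}  =", kl_divergence(posterior, prior), "bits")
    print("I(X1;X2)  =", mutual_information(joint), "bits")
```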
@@ -304,42 +354,6 @@
 	}
 \end{fig}
-	\subsection{Entropy and information}
-	Let $X$ denote some variable whose value is initially unknown to our
-	hypothetical observer. We will treat $X$ mathematically as a random variable,
-	with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a
-	probability distribution representing the observer's beliefs about the
-	true value of $X$.
-	In this case, the observer's uncertainty about $X$ can be quantified
-	as the entropy of the random variable $H(X)$. For a discrete variable
-	with probability mass function $p:\A \to [0,1]$, this is
-	\begin{equation}
-		H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
-	\end{equation}
-	where $\expect{}$ is the expectation operator. The negative-log-probability
-	$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
-	the \emph{surprisingness} of the value $x$ should it be observed, and
-	hence the entropy is the expected surprisingness.
-
-	Now suppose that the observer receives some new data $\Data$ that
-	causes a revision of its beliefs about $X$. The \emph{information}
-	in this new data \emph{about} $X$ can be quantified as the
-	Kullback-Leibler (KL) divergence between the prior and posterior
-	distributions $p(x)$ and $p(x|\Data)$ respectively:
-	\begin{equation}
-		\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
-			= \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
-	\end{equation}
-	If there are multiple variables $X_1, X_2$
-	\etc which the observer believes to be dependent, then the observation of
-	one may change its beliefs and hence yield information about the
-	others.
-	The relationships between the various joint entropies, conditional
-	entropies, mutual informations and conditional mutual informations
-	can be visualised in Venn diagram-like \emph{information diagrams}
-	or I-diagrams \cite{Yeung1991}, for example, the three-variable
-	I-diagram in \figrf{venn-example}.
-
 \subsection{Entropy and information in sequences}
@@ -477,35 +491,17 @@
 	as
 	\begin{equation}
 		%	\IXZ_t
-		I(X_t;\fut{X}_t|\past{X}_t) = H(X_t|\past{X}_t) - H(X_t|\fut{X}_t,\past{X}_t).
+		I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
 		%	\label{<++>}
 	\end{equation}
 	%	If $X$ is stationary, then
-	Now, in the shift-invariant case, $H(X_t|\past{X}_t)$
-	is the familiar entropy rate $h_\mu$, but $H(X_t|\fut{X}_t,\past{X}_t)$,
-	the conditional entropy of one variable given \emph{all} the others
-	in the sequence, future as well as past, is what
-	we called the \emph{residual entropy rate} $r_\mu$ in \cite{AbdallahPlumbley2010},
-	but was previously identified by Verd{\'u} and Weissman \cite{VerduWeissman2006} as the
-	\emph{erasure entropy rate}.
-	Thus, the PIR is the difference between
-	the entropy rate and the erasure entropy rate: $b_\mu = h_\mu - r_\mu$.
+	where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$
+	is the \emph{residual} \cite{AbdallahPlumbley2010},
+	or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
 	These relationships are illustrated in \Figrf{predinfo-bg}, along with
 	several of the information measures we have discussed so far.
-	\begin{fig}{wundt}
-		\raisebox{-4em}{\colfig[0.43]{wundt}}
-		%	{\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
-		{\ {\large$\longrightarrow$}\ }
-		\raisebox{-4em}{\colfig[0.43]{wundt2}}
-		\caption{
-			The Wundt curve relating randomness/complexity with
-			perceived value. Repeated exposure sometimes results
-			in a move to the left along the curve \cite{Berlyne71}.
-		}
-	\end{fig}
-
 \subsection{Other sequential information measures}

 	James et al \cite{JamesEllisonCrutchfield2011} study the predictive information
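For a first-order Markov chain, the rates in the hunk above reduce to closed-form expressions in the transition matrix: by the Markov property, conditioning on the infinite past or future collapses to conditioning on the neighbouring symbol, so $h_\mu$ is the stationary-weighted average entropy of the rows of $a$, $H(X_{t+1}|X_{t-1})$ is the same average computed from $a^2$, and $b_\mu$ is their difference, with $r_\mu = h_\mu - b_\mu$. The sketch below is an editorial illustration of these formulas, not code from this repository; the redundancy is taken here as the entropy of the stationary distribution minus $h_\mu$.

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution in bits (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def stationary(A):
    """Stationary distribution pi of a row-stochastic matrix A (pi @ A = pi)."""
    w, v = np.linalg.eig(A.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def markov_rates(A):
    """Entropy rate h_mu, PIR b_mu, residual/erasure rate r_mu and redundancy
    rho_mu (all in bits) for a first-order Markov chain with transition matrix A,
    where A[i, j] = P(next = j | current = i) and each row sums to 1."""
    A = np.asarray(A, dtype=float)
    pi = stationary(A)
    n = len(pi)
    h = sum(pi[i] * entropy_bits(A[i]) for i in range(n))     # h_mu = H(X_t | X_{t-1})
    A2 = A @ A                                                # two-step transition matrix
    h2 = sum(pi[i] * entropy_bits(A2[i]) for i in range(n))   # H(X_{t+1} | X_{t-1})
    b = h2 - h                                                # b_mu = h_mu - r_mu
    return {"h_mu": h, "b_mu": b, "r_mu": h - b, "rho_mu": entropy_bits(pi) - h}

if __name__ == "__main__":
    # Illustrative example: a noisy alternation between two states.
    A = np.array([[0.1, 0.9],
                  [0.9, 0.1]])
    print(markov_rates(A))
```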
@@ -544,11 +540,30 @@
 	where $\hat{a}^{N+1}$ is the $N+1$th power of the transition matrix.
+	\begin{fig}{wundt}
+		\raisebox{-4em}{\colfig[0.43]{wundt}}
+		%	{\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
+		{\ {\large$\longrightarrow$}\ }
+		\raisebox{-4em}{\colfig[0.43]{wundt2}}
+		\caption{
+			The Wundt curve relating randomness/complexity with
+			perceived value. Repeated exposure sometimes results
+			in a move to the left along the curve \cite{Berlyne71}.
+		}
+	\end{fig}
+
 \section{Information Dynamics in Analysis}
 \subsection{Musicological Analysis}
-	refer to the work with the analysis of minimalist pieces
+	In \cite{AbdallahPlumbley2009}, methods based on the theory described above
+	were used to analyse two pieces of music in the minimalist style
+	by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
+	The analysis was done using a first-order Markov chain model, with the
+	enhancement that the transition matrix of the model was allowed to
+	evolve dynamically as the notes were processed, and was estimated (in
+	a Bayesian way) as a \emph{distribution} over possible transition matrices,
+	rather than a point estimate. [Bayesian surprise, other component of IPI].

 \begin{fig}{twopages}
 	\colfig[0.96]{matbase/fig9471} % update from mbc paper
@@ -641,20 +656,6 @@
 space; each one of these points corresponds to a transition matrix.\emph{self-plagiarised}
-\begin{figure}
-\centering
-\includegraphics[width=\linewidth]{figs/mtriscat}
-\caption{The population of transition matrices distributed along three axes of
-redundancy, entropy rate and predictive information rate (all measured in bits).
-The concentrations of points along the redundancy axis correspond
-to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
-3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
-represents its PIR---note that the highest values are found at intermediate entropy
-and redundancy, and that the distribution as a whole makes a curved triangle. Although
-not visible in this plot, it is largely hollow in the middle.
-\label{InfoDynEngine}}
-\end{figure}
-
 When we look at the distribution of transition matrixes plotted in this
 space, we see that it forms an arch shape that is fairly thin. It thus becomes a
@@ -669,11 +670,7 @@
 triangle, the corresponding transition matrix is returned. Figure
 \ref{TheTriangle} shows how the triangle maps to different measures of
 redundancy, entropy rate and predictive information rate.\emph{self-plagiarised}
-	\begin{figure}
-\centering
-\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
-\caption{The Melody Triangle\label{TheTriangle}}
-\end{figure}
+
 Each corner corresponds to three different extremes of predictability and
 unpredictability, which could be loosely characterised as `periodicity', `noise'
 and `repetition'. Melodies from the `noise' corner have no discernible pattern;
@@ -688,6 +685,20 @@
 sequences based on a simple model of how one might guess the next event given
 the previous one.\emph{self-plagiarised}
+\begin{figure}
+	\centering
+	\includegraphics[width=\linewidth]{figs/mtriscat}
+	\caption{The population of transition matrices distributed along three axes of
+	redundancy, entropy rate and predictive information rate (all measured in bits).
+	The concentrations of points along the redundancy axis correspond
+	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
+	3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
+	represents its PIR---note that the highest values are found at intermediate entropy
+	and redundancy, and that the distribution as a whole makes a curved triangle. Although
+	not visible in this plot, it is largely hollow in the middle.
+	\label{InfoDynEngine}}
+\end{figure}
+
 Any number of interfaces could be developed for the Melody Triangle. We have
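The population plotted in the figure added above can be reproduced in outline by drawing random transition matrices and computing the same three coordinates for each point. The sketch below is a self-contained editorial illustration; the Dirichlet sampling scheme, the number of states and all names are assumptions, not the procedure used to generate the original figure.

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a discrete distribution in bits (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def coords(A):
    """(redundancy, entropy rate, PIR) in bits for a row-stochastic matrix A."""
    w, v = np.linalg.eig(A.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    A2 = A @ A
    h = sum(pi[i] * entropy_bits(A[i]) for i in range(len(pi)))    # entropy rate h_mu
    h2 = sum(pi[i] * entropy_bits(A2[i]) for i in range(len(pi)))  # H(X_{t+1} | X_{t-1})
    return entropy_bits(pi) - h, h, h2 - h                         # rho_mu, h_mu, b_mu

def sample_population(n_matrices=5000, n_states=8, concentration=0.5, seed=0):
    """Draw random transition matrices (each row Dirichlet-distributed) and
    return an (n_matrices, 3) array of (redundancy, entropy rate, PIR)."""
    rng = np.random.default_rng(seed)
    pts = [coords(rng.dirichlet(np.full(n_states, concentration), size=n_states))
           for _ in range(n_matrices)]
    return np.array(pts)

if __name__ == "__main__":
    pts = sample_population(n_matrices=500)
    print("highest PIR in the sample: %.3f bits" % pts[:, 2].max())
```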
@@ -718,6 +729,12 @@
 %As the Melody Triangle essentially operates on a stream of symbols, it
 it is possible to apply the melody triangle to the design of non-sonic content.
+	\begin{figure}
+\centering
+\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
+\caption{The Melody Triangle\label{TheTriangle}}
+\end{figure}
+
 \section{Musical Preference and Information Dynamics}
 We carried out a preliminary study that sought to identify any correlation
 between aesthetic preference and the information theoretical measures of the Melody