comparison draft.tex @ 25:3f08d18c65ce

Updates to section 2.
author samer
date Tue, 13 Mar 2012 16:02:05 +0000
parents 79ede31feb20
children fb1bfe785c05
24:79ede31feb20 25:3f08d18c65ce
19 \newcommand\preals{\reals_+} 19 \newcommand\preals{\reals_+}
20 \newcommand\X{\mathcal{X}} 20 \newcommand\X{\mathcal{X}}
21 \newcommand\Y{\mathcal{Y}} 21 \newcommand\Y{\mathcal{Y}}
22 \newcommand\domS{\mathcal{S}} 22 \newcommand\domS{\mathcal{S}}
23 \newcommand\A{\mathcal{A}} 23 \newcommand\A{\mathcal{A}}
24 \newcommand\Data{\mathcal{D}}
24 \newcommand\rvm[1]{\mathrm{#1}} 25 \newcommand\rvm[1]{\mathrm{#1}}
25 \newcommand\sps{\,.\,} 26 \newcommand\sps{\,.\,}
26 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} 27 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
27 \newcommand\Ix{\mathcal{I}} 28 \newcommand\Ix{\mathcal{I}}
28 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} 29 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
71 In this paper, we review the theoretical foundations of information dynamics 72 In this paper, we review the theoretical foundations of information dynamics
72 and discuss a few emerging areas of application. 73 and discuss a few emerging areas of application.
73 \end{abstract} 74 \end{abstract}
74 75
75 76
76 \section{Expectation and surprise in music} 77 \section{Introduction}
77 \label{s:Intro} 78 \label{s:Intro}
78 79
80 \subsection{Expectation and surprise in music}
79 One of the effects of listening to music is to create 81 One of the effects of listening to music is to create
80 expectations of what is to come next, which may be fulfilled 82 expectations of what is to come next, which may be fulfilled
81 immediately, after some delay, or not at all as the case may be. 83 immediately, after some delay, or not at all as the case may be.
82 This is the thesis put forward by, amongst others, music theorists 84 This is the thesis put forward by, amongst others, music theorists
83 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was 85 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
101 on how we change and revise our conceptions \emph{as events happen}, on 103 on how we change and revise our conceptions \emph{as events happen}, on
102 how expectation and prediction interact with occurrence, and that, to a 104 how expectation and prediction interact with occurrence, and that, to a
103 large degree, the way to understand the effect of music is to focus on 105 large degree, the way to understand the effect of music is to focus on
104 this `kinetics' of expectation and surprise. 106 this `kinetics' of expectation and surprise.
105 107
108 Prediction and expectation are essentially probabilistic concepts
109 and can be treated mathematically using probability theory.
110 We suppose that when we listen to music, expectations are created on the basis
111 of our familiarity with various styles of music and our ability to
112 detect and learn statistical regularities in the music as they emerge.
113 There is experimental evidence that human listeners are able to internalise
114 statistical knowledge about musical structure, \eg
115 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
116 that statistical models can form an effective basis for computational
117 analysis of music, \eg
118 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
119
120
121 \comment{
106 The business of making predictions and assessing surprise is essentially 122 The business of making predictions and assessing surprise is essentially
107 one of reasoning under conditions of uncertainty and manipulating 123 one of reasoning under conditions of uncertainty and manipulating
108 degrees of belief about the various propositions which may or may not 124 degrees of belief about the various propositions which may or may not
109 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best 125 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best
110 quantified in terms of Bayesian probability theory. 126 quantified in terms of Bayesian probability theory.
118 statistical knowledge about musical structure, \eg 134 statistical knowledge about musical structure, \eg
119 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also 135 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
120 that statistical models can form an effective basis for computational 136 that statistical models can form an effective basis for computational
121 analysis of music, \eg 137 analysis of music, \eg
122 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. 138 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
139 }
123 140
124 \subsection{Music and information theory} 141 \subsection{Music and information theory}
125 With a probabilistic framework for music modelling and prediction in hand, 142 With a probabilistic framework for music modelling and prediction in hand,
126 we are in a position to apply quantitative information theory \cite{Shannon48}. 143 we are in a position to apply Shannon's quantitative information theory
144 \cite{Shannon48}.
145 \comment{
146 which provides us with a number of measures, such as entropy
147 and mutual information, which are suitable for quantifying states of
148 uncertainty and surprise, and thus could potentially enable us to build
149 quantitative models of the listening process described above. They are
150 what Berlyne \cite{Berlyne71} called `collative variables' since they are
151 to do with patterns of occurrence rather than medium-specific details.
152 Berlyne sought to show that the collative variables are closely related to
153 perceptual qualities like complexity, tension, interestingness,
154 and even aesthetic value, not just in music, but in other temporal
155 or visual media.
156 The relevance of information theory to music and art has
157 also been addressed by researchers from the 1950s onwards
158 \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
159 }
127 The relationship between information theory and music and art in general has been the 160 The relationship between information theory and music and art in general has been the
128 subject of some interest since the 1950s 161 subject of some interest since the 1950s
129 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. 162 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
130 The general thesis is that perceptible qualities and subjective 163 The general thesis is that perceptible qualities and subjective
131 states like uncertainty, surprise, complexity, tension, and interestingness 164 states like uncertainty, surprise, complexity, tension, and interestingness
144 % of the material, the composer can thus define, and induce within the 177 % of the material, the composer can thus define, and induce within the
145 % listener, a temporal programme of varying 178 % listener, a temporal programme of varying
146 % levels of uncertainty, ambiguity and surprise. 179 % levels of uncertainty, ambiguity and surprise.
147 180
148 181
149 Previous work in this area \cite{Berlyne74} treated the various
150 information theoretic quantities
151 such as entropy as if they were intrinsic properties of the stimulus---subjects
152 were presented with a sequence of tones with `high entropy', or a visual pattern
153 with `low entropy'. These values were determined from some known `objective'
154 probability model of the stimuli,%
155 \footnote{%
156 The notion of objective probabilities and whether or not they can
157 usefully be said to exist is the subject of some debate, with advocates of
158 subjective probabilities including de Finetti \cite{deFinetti}.}
159 or from simple statistical analyses such as
160 computing empirical distributions. Our approach is explicitly to consider the role
161 of the observer in perception, and more specifically, to consider estimates of
162 entropy \etc with respect to \emph{subjective} probabilities.
163 \subsection{Information dynamic approach} 182 \subsection{Information dynamic approach}
164 183
165 Bringing the various strands together, our working hypothesis is that as a 184 Bringing the various strands together, our working hypothesis is that as a
166 listener (to which we will refer as `it') listens to a piece of music, it maintains 185 listener (to which we will refer as `it') listens to a piece of music, it maintains
167 a dynamically evolving statistical model that enables it to make predictions 186 a dynamically evolving probabilistic model that enables it to make predictions
168 about how the piece will continue, relying on both its previous experience 187 about how the piece will continue, relying on both its previous experience
169 of music and the immediate context of the piece. As events unfold, it revises 188 of music and the immediate context of the piece. As events unfold, it revises
170 its model and hence its probabilistic belief state, which includes predictive 189 its probabilistic belief state, which includes predictive
171 distributions over future observations. These distributions and changes in 190 distributions over possible future events. These
172 distributions can be characterised in terms of a handful of information 191 % distributions and changes in distributions
173 theoretic measures such as entropy and relative entropy. By tracing the 192 can be characterised in terms of a handful of information
193 theoretic measures such as entropy and relative entropy. By tracing the
174 evolution of these measures, we obtain a representation which captures much 194 evolution of these measures, we obtain a representation which captures much
175 of the significant structure of the music, but does so at a high level of 195 of the significant structure of the music.
176 \emph{abstraction}, since it is sensitive mainly to \emph{patterns} of occurrence, 196
177 rather than the details of which specific things occur or even the sensory modality 197 One of the consequences of this approach is that regardless of the details of
178 through which they are detected. This suggests that the 198 the sensory input or even which sensory modality is being processed, the resulting
179 same approach could, in principle, be used to analyse and compare information 199 analysis is in terms of the same units: quantities of information (bits) and
180 flow in different temporal media regardless of whether they are auditory, 200 rates of information flow (bits per second). The probabilistic and information
181 visual or otherwise. 201 theoretic concepts in terms of which the analysis is framed apply equally to any kind
182 202 of data.
183 In addition, the information dynamic approach gives us a principled way 203 In addition, when adaptive probabilistic models are used, expectations are
204 created mainly in response to \emph{patterns} of occurrence,
205 rather than the details of which specific things occur.
206 Together, these suggest that an information dynamic analysis operates at a
207 high level of \emph{abstraction}, and could be used to
208 make structural comparisons between different temporal media,
209 such as music, film, animation, and dance.
210 % analyse and compare information
211 % flow in different temporal media regardless of whether they are auditory,
212 % visual or otherwise.
213
214 Another consequence is that the information dynamic approach gives us a principled way
184 to address the notion of \emph{subjectivity}, since the analysis is dependent on the 215 to address the notion of \emph{subjectivity}, since the analysis is dependent on the
185 probability model the observer starts off with, which may depend on prior experience 216 probability model the observer starts off with, which may depend on prior experience
186 or other factors, and which may change over time. Thus, inter-subject variability and 217 or other factors, and which may change over time. Thus, inter-subject variability and
187 variation in subjects' responses over time are 218 variation in subjects' responses over time are
188 fundamental to the theory. 219 fundamental to the theory.
193 224
194 225
195 \section{Theoretical review} 226 \section{Theoretical review}
196 227
197 \subsection{Entropy and information in sequences} 228 \subsection{Entropy and information in sequences}
198 In this section, we summarise the definitions of some of the relevant quantities 229 Let $X$ denote some variable whose value is initially unknown to our
199 in information dynamics and show how they can be computed in some simple probabilistic 230 hypothetical observer. We will treat $X$ mathematically as a random variable,
200 models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]). 231 with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a
232 probability distribution representing the observer's beliefs about the
233 true value of $X$.
234 In this case, the observer's uncertainty about $X$ can be quantified
235 as the entropy $H(X)$ of the random variable. For a discrete variable
236 with probability mass function $p:\A \to [0,1]$, this is
237 \begin{equation}
238 H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
239 \end{equation}
240 where $\expect{}$ is the expectation operator. The negative-log-probability
241 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
242 the \emph{surprisingness} of the value $x$ should it be observed, and
243 hence the entropy is the expected surprisingness.
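As a toy illustration of these definitions (the distribution below is invented
purely for this example, not taken from any model discussed in the paper), the
following sketch computes the surprisingness $\ell(x) = -\log_2 p(x)$ of each
value and the entropy as the expected surprisingness, in bits:
\begin{verbatim}
import numpy as np

# hypothetical distribution over four possible notes
p = {'C': 0.5, 'E': 0.25, 'G': 0.125, 'B': 0.125}

# surprisingness l(x) = -log2 p(x) of each value
surprise = {x: -np.log2(px) for x, px in p.items()}

# entropy H(X) = expected surprisingness = 1.75 bits for this distribution
H = sum(px * surprise[x] for x, px in p.items())
\end{verbatim}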
244
245 Now suppose that the observer receives some new data $\Data$ that
246 causes a revision of its beliefs about $X$. The \emph{information}
247 in this new data \emph{about} $X$ can be quantified as the
248 Kullback-Leibler (KL) divergence between the prior and posterior
249 distributions $p(x)$ and $p(x|\Data)$ respectively:
250 \begin{equation}
251 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
252 = \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
253 \end{equation}
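A minimal numerical sketch of this quantity (the prior and posterior here are
arbitrary toy numbers, not derived from any model in the text): the information
in $\Data$ about $X$ is the KL divergence $D(p_{X|\Data} \| p_X)$, measured in
bits when base-2 logarithms are used.
\begin{verbatim}
import numpy as np

prior     = np.array([0.25, 0.25, 0.25, 0.25])  # p(x)
posterior = np.array([0.70, 0.10, 0.10, 0.10])  # p(x|D), after observing D

# I_{D -> X} = D( p(.|D) || p ) = sum_x p(x|D) log2( p(x|D) / p(x) )
info = np.sum(posterior * np.log2(posterior / prior))  # about 0.64 bits
\end{verbatim}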
254 If there are multiple variables $X_1, X_2$
255 \etc which the observer believes to be dependent, then the observation of
256 one may change its beliefs and hence yield information about the
257 others.
258 The relationships between the various joint entropies, conditional
259 entropies, mutual informations and conditional mutual informations
260 can be visualised in Venn diagram-like \emph{information diagrams}
261 or I-diagrams \cite{Yeung1991}; see, for example, the three-variable
262 I-diagram in \figrf{venn-example}.
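The identities that such a diagram depicts can also be checked numerically. The
sketch below (the three-variable joint distribution is drawn at random, purely
for illustration) computes a mutual information, a conditional mutual
information and a conditional entropy from marginal entropies alone:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # joint p(x, y, z)

def H(keep):
    # entropy (bits) of the marginal over the axes listed in `keep`
    drop = tuple(a for a in range(3) if a not in keep)
    m = P.sum(axis=drop) if drop else P
    m = m[m > 0]
    return -(m * np.log2(m)).sum()

X, Y, Z = 0, 1, 2
I_XY    = H({X}) + H({Y}) - H({X, Y})                    # I(X;Y)
I_XY_gZ = H({X, Z}) + H({Y, Z}) - H({X, Y, Z}) - H({Z})  # I(X;Y|Z)
H_X_gYZ = H({X, Y, Z}) - H({Y, Z})                       # H(X|Y,Z)
\end{verbatim}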
263
201 264
202 \begin{fig}{venn-example} 265 \begin{fig}{venn-example}
203 \newcommand\rad{2.2em}% 266 \newcommand\rad{2.2em}%
204 \newcommand\circo{circle (3.4em)}% 267 \newcommand\circo{circle (3.4em)}%
205 \newcommand\labrad{4.3em} 268 \newcommand\labrad{4.3em}
360 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive 423 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
361 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. 424 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
362 } 425 }
363 \end{fig} 426 \end{fig}
364 427
365 \paragraph{Predictive information rate} 428 \subsection{Predictive information rate}
366 In previous work \cite{AbdallahPlumbley2009}, we introduced 429 In previous work \cite{AbdallahPlumbley2009}, we introduced
367 % examined several 430 % examined several
368 % information-theoretic measures that could be used to characterise 431 % information-theoretic measures that could be used to characterise
369 % not only random processes (\ie, an ensemble of possible sequences), 432 % not only random processes (\ie, an ensemble of possible sequences),
370 % but also the dynamic progress of specific realisations of such processes. 433 % but also the dynamic progress of specific realisations of such processes.
430 perceived value. Repeated exposure sometimes results 493 perceived value. Repeated exposure sometimes results
431 in a move to the left along the curve \cite{Berlyne71}. 494 in a move to the left along the curve \cite{Berlyne71}.
432 } 495 }
433 \end{fig} 496 \end{fig}
434 497
498 \subsection{Other sequential information measures}
499
500 James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
501 rate and also examine some related measures. In particular, they identify
502 $\sigma_\mu$, the difference between the multi-information rate and the excess
503 entropy, as an interesting quantity that measures the predictive benefit of
504 model-building (that is, maintaining an internal state summarising past
505 observations in order to make better predictions). They also identify
506 $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
507 information}.
435 508
436 \subsection{First order Markov chains} 509 \subsection{First order Markov chains}
437 These are the simplest non-trivial models to which information dynamics methods 510 These are the simplest non-trivial models to which information dynamics methods
438 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information 511 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information
439 rate can be expressed simply in terms of the entropy rate of the Markov chain. 512 rate can be expressed simply in terms of the entropy rate of the Markov chain.
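For concreteness, the sketch below illustrates one such expression. For a
stationary first-order chain with transition matrix $a$, the Markov property
gives $b_\mu = h(a^2) - h(a)$, where $h(\cdot)$ denotes the conditional entropy
of the next state given the current one under the stationary distribution of
$a$, and $a^2$ is the two-step transition matrix. (This particular formulation
is a reconstruction from the Markov property, not a quotation of the cited
result, and the transition matrix is an arbitrary example.)
\begin{verbatim}
import numpy as np

def stationary(a):
    # stationary distribution: left eigenvector of a for eigenvalue 1
    w, v = np.linalg.eig(a.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def h(m, pi):
    # conditional entropy (bits) of the next state given the current
    # state, under transition matrix m and state distribution pi
    logs = np.where(m > 0, np.log2(np.where(m > 0, m, 1.0)), 0.0)
    return -(pi[:, None] * m * logs).sum()

a  = np.array([[0.9, 0.1, 0.0],
               [0.0, 0.9, 0.1],
               [0.1, 0.0, 0.9]])
pi = stationary(a)
h1 = h(a, pi)            # entropy rate h_mu
b  = h(a @ a, pi) - h1   # predictive information rate b_mu
\end{verbatim}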