comparison draft.tex @ 25:3f08d18c65ce
Updates to section 2.
author   | samer
date     | Tue, 13 Mar 2012 16:02:05 +0000
parents  | 79ede31feb20
children | fb1bfe785c05
24:79ede31feb20 | 25:3f08d18c65ce |
---|---|
19 \newcommand\preals{\reals_+} | 19 \newcommand\preals{\reals_+} |
20 \newcommand\X{\mathcal{X}} | 20 \newcommand\X{\mathcal{X}} |
21 \newcommand\Y{\mathcal{Y}} | 21 \newcommand\Y{\mathcal{Y}} |
22 \newcommand\domS{\mathcal{S}} | 22 \newcommand\domS{\mathcal{S}} |
23 \newcommand\A{\mathcal{A}} | 23 \newcommand\A{\mathcal{A}} |
24 \newcommand\Data{\mathcal{D}} | |
24 \newcommand\rvm[1]{\mathrm{#1}} | 25 \newcommand\rvm[1]{\mathrm{#1}} |
25 \newcommand\sps{\,.\,} | 26 \newcommand\sps{\,.\,} |
26 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} | 27 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} |
27 \newcommand\Ix{\mathcal{I}} | 28 \newcommand\Ix{\mathcal{I}} |
28 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} | 29 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} |
71 In this paper, we review the theoretical foundations of information dynamics | 72 In this paper, we review the theoretical foundations of information dynamics |
72 and discuss a few emerging areas of application. | 73 and discuss a few emerging areas of application. |
73 \end{abstract} | 74 \end{abstract} |
74 | 75 |
75 | 76 |
76 \section{Expectation and surprise in music} | 77 \section{Introduction} |
77 \label{s:Intro} | 78 \label{s:Intro} |
78 | 79 |
80 \subsection{Expectation and surprise in music} | |
79 One of the effects of listening to music is to create | 81 One of the effects of listening to music is to create |
80 expectations of what is to come next, which may be fulfilled | 82 expectations of what is to come next, which may be fulfilled |
81 immediately, after some delay, or not at all as the case may be. | 83 immediately, after some delay, or not at all as the case may be. |
82 This is the thesis put forward by, amongst others, music theorists | 84 This is the thesis put forward by, amongst others, music theorists |
83 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was | 85 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was |
101 on how we change and revise our conceptions \emph{as events happen}, on | 103 on how we change and revise our conceptions \emph{as events happen}, on |
102 how expectation and prediction interact with occurrence, and that, to a | 104 how expectation and prediction interact with occurrence, and that, to a |
103 large degree, the way to understand the effect of music is to focus on | 105 large degree, the way to understand the effect of music is to focus on |
104 this `kinetics' of expectation and surprise. | 106 this `kinetics' of expectation and surprise. |
105 | 107 |
108 Prediction and expectation are essentially probabilistic concepts | |
109 and can be treated mathematically using probability theory. | |
110 We suppose that when we listen to music, expectations are created on the basis | |
111 of our familiarity with various styles of music and our ability to | |
112 detect and learn statistical regularities in the music as they emerge. |
113 There is experimental evidence that human listeners are able to internalise | |
114 statistical knowledge about musical structure, \eg | |
115 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also | |
116 that statistical models can form an effective basis for computational | |
117 analysis of music, \eg | |
118 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. | |
119 | |
120 | |
121 \comment{ | |
106 The business of making predictions and assessing surprise is essentially | 122 The business of making predictions and assessing surprise is essentially |
107 one of reasoning under conditions of uncertainty and manipulating | 123 one of reasoning under conditions of uncertainty and manipulating |
108 degrees of belief about the various propositions which may or may not | 124 degrees of belief about the various propositions which may or may not |
109 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best | 125 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best |
110 quantified in terms of Bayesian probability theory. | 126 quantified in terms of Bayesian probability theory. |
118 statistical knowledge about musical structure, \eg | 134 statistical knowledge about musical structure, \eg |
119 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also | 135 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also |
120 that statistical models can form an effective basis for computational | 136 that statistical models can form an effective basis for computational |
121 analysis of music, \eg | 137 analysis of music, \eg |
122 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. | 138 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. |
139 } | |
123 | 140 |
124 \subsection{Music and information theory} | 141 \subsection{Music and information theory} |
125 With a probabilistic framework for music modelling and prediction in hand, | 142 With a probabilistic framework for music modelling and prediction in hand, |
126 we are in a position to apply quantitative information theory \cite{Shannon48}. | 143 we are in a position to apply Shannon's quantitative information theory |
144 \cite{Shannon48}. | |
145 \comment{ | |
146 which provides us with a number of measures, such as entropy | |
147 and mutual information, which are suitable for quantifying states of | |
148 uncertainty and surprise, and thus could potentially enable us to build | |
149 quantitative models of the listening process described above. They are | |
150 what Berlyne \cite{Berlyne71} called `collative variables' since they are | |
151 to do with patterns of occurrence rather than medium-specific details. | |
152 Berlyne sought to show that the collative variables are closely related to | |
153 perceptual qualities like complexity, tension, interestingness, | |
154 and even aesthetic value, not just in music, but in other temporal | |
155 or visual media. | |
156 The relevance of information theory to music and art has | |
157 also been addressed by researchers from the 1950s onwards | |
158 \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}. | |
159 } | |
127 The relationship between information theory and music and art in general has been the | 160 The relationship between information theory and music and art in general has been the |
128 subject of some interest since the 1950s | 161 subject of some interest since the 1950s |
129 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. | 162 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}. |
130 The general thesis is that perceptible qualities and subjective | 163 The general thesis is that perceptible qualities and subjective |
131 states like uncertainty, surprise, complexity, tension, and interestingness | 164 states like uncertainty, surprise, complexity, tension, and interestingness |
144 % of the material, the composer can thus define, and induce within the | 177 % of the material, the composer can thus define, and induce within the |
145 % listener, a temporal programme of varying | 178 % listener, a temporal programme of varying |
146 % levels of uncertainty, ambiguity and surprise. | 179 % levels of uncertainty, ambiguity and surprise. |
147 | 180 |
148 | 181 |
149 Previous work in this area \cite{Berlyne74} treated the various | |
150 information theoretic quantities | |
151 such as entropy as if they were intrinsic properties of the stimulus---subjects | |
152 were presented with a sequence of tones with `high entropy', or a visual pattern | |
153 with `low entropy'. These values were determined from some known `objective' | |
154 probability model of the stimuli,% | |
155 \footnote{% | |
156 The notion of objective probabilities and whether or not they can |
157 usefully be said to exist is the subject of some debate, with advocates of | |
158 subjective probabilities including de Finetti \cite{deFinetti}.} | |
159 or from simple statistical analyses such as | |
160 computing empirical distributions. Our approach is explicitly to consider the role |
161 of the observer in perception, and more specifically, to consider estimates of | |
162 entropy \etc with respect to \emph{subjective} probabilities. | |
163 \subsection{Information dynamic approach} | 182 \subsection{Information dynamic approach} |
164 | 183 |
165 Bringing the various strands together, our working hypothesis is that as a | 184 Bringing the various strands together, our working hypothesis is that as a |
166 listener (to which we will refer as `it') listens to a piece of music, it maintains | 185 listener (to which we will refer as `it') listens to a piece of music, it maintains |
167 a dynamically evolving statistical model that enables it to make predictions | 186 a dynamically evolving probabilistic model that enables it to make predictions |
168 about how the piece will continue, relying on both its previous experience | 187 about how the piece will continue, relying on both its previous experience |
169 of music and the immediate context of the piece. As events unfold, it revises | 188 of music and the immediate context of the piece. As events unfold, it revises |
170 its model and hence its probabilistic belief state, which includes predictive | 189 its probabilistic belief state, which includes predictive |
171 distributions over future observations. These distributions and changes in | 190 distributions over possible future events. These |
172 distributions can be characterised in terms of a handful of | 191 % distributions and changes in distributions |
173 information-theoretic measures such as entropy and relative entropy. By tracing the | 192 can be characterised in terms of a handful of |
193 information-theoretic measures such as entropy and relative entropy. By tracing the |
174 evolution of these measures, we obtain a representation which captures much | 194 evolution of these measures, we obtain a representation which captures much |
175 of the significant structure of the music, but does so at a high level of | 195 of the significant structure of the music. |
176 \emph{abstraction}, since it is sensitive mainly to \emph{patterns} of occurrence, | 196 |
177 rather than the details of which specific things occur or even the sensory modality | 197 One of the consequences of this approach is that regardless of the details of |
178 through which they are detected. This suggests that the | 198 the sensory input or even which sensory modality is being processed, the resulting |
179 same approach could, in principle, be used to analyse and compare information | 199 analysis is in terms of the same units: quantities of information (bits) and |
180 flow in different temporal media regardless of whether they are auditory, | 200 rates of information flow (bits per second). The probabilistic and |
181 visual or otherwise. | 201 information-theoretic concepts in terms of which the analysis is framed apply to all sorts |
182 | 202 of data. |
183 In addition, the information dynamic approach gives us a principled way | 203 In addition, when adaptive probabilistic models are used, expectations are |
204 created mainly in response to \emph{patterns} of occurrence, |
205 rather than the details of which specific things occur. |
206 Together, these suggest that an information dynamic analysis operates at a |
207 high level of \emph{abstraction}, and could be used to | |
208 make structural comparisons between different temporal media, | |
209 such as music, film, animation, and dance. | |
210 % analyse and compare information | |
211 % flow in different temporal media regardless of whether they are auditory, | |
212 % visual or otherwise. | |
213 | |
214 Another consequence is that the information dynamic approach gives us a principled way | |
184 to address the notion of \emph{subjectivity}, since the analysis is dependent on the | 215 to address the notion of \emph{subjectivity}, since the analysis is dependent on the |
185 probability model the observer starts off with, which may depend on prior experience | 216 probability model the observer starts off with, which may depend on prior experience |
186 or other factors, and which may change over time. Thus, inter-subject variability and | 217 or other factors, and which may change over time. Thus, inter-subject variability and |
187 variation in subjects' responses over time are | 218 variation in subjects' responses over time are |
188 fundamental to the theory. | 219 fundamental to the theory. |
193 | 224 |
194 | 225 |
195 \section{Theoretical review} | 226 \section{Theoretical review} |
196 | 227 |
197 \subsection{Entropy and information in sequences} | 228 \subsection{Entropy and information in sequences} |
198 In this section, we summarise the definitions of some of the relevant quantities | 229 Let $X$ denote some variable whose value is initially unknown to our |
199 in information dynamics and show how they can be computed in some simple probabilistic | 230 hypothetical observer. We will treat $X$ mathematically as a random variable, |
200 models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]). | 231 with a value to be drawn from some set (or \emph{alphabet}) $\A$ and a |
232 probability distribution representing the observer's beliefs about the | |
233 true value of $X$. | |
234 In this case, the observer's uncertainty about $X$ can be quantified | |
235 as the entropy of the random variable $H(X)$. For a discrete variable | |
236 with probability mass function $p:\A \to [0,1]$, this is | |
237 \begin{equation} | |
238 H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)}, | |
239 \end{equation} | |
240 where $\expect{}$ is the expectation operator. The negative-log-probability | |
241 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as | |
242 the \emph{surprisingness} of the value $x$ should it be observed, and | |
243 hence the entropy is the expected surprisingness. | |
244 | |
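As a minimal illustration, the surprisingness of each possible value and the entropy of a discrete belief distribution can be computed in a few lines of Python; the four-symbol alphabet and its probabilities are invented example values.

    import numpy as np

    # Hypothetical belief distribution p(x) over a four-symbol alphabet (example values).
    p = np.array([0.5, 0.25, 0.125, 0.125])

    # Surprisingness l(x) = -log2 p(x), in bits, for each possible value.
    surprisingness = -np.log2(p)            # [1.0, 2.0, 3.0, 3.0]

    # Entropy H(X) = expected surprisingness.
    H = np.sum(p * surprisingness)          # 1.75 bits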
245 Now suppose that the observer receives some new data $\Data$ that | |
246 causes a revision of its beliefs about $X$. The \emph{information} | |
247 in this new data \emph{about} $X$ can be quantified as the | |
248 Kullback-Leibler (KL) divergence between the prior and posterior | |
249 distributions $p(x)$ and $p(x|\Data)$ respectively: | |
250 \begin{equation} | |
251 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X}) | |
252 = \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}. | |
253 \end{equation} | |
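Continuing the same illustrative setup, the information in the data about $X$ is the KL divergence between the observer's posterior and prior beliefs; the posterior below is again an invented example.

    import numpy as np

    # Prior p(x) and posterior p(x|D) beliefs about X (example values).
    prior     = np.array([0.5, 0.25, 0.125, 0.125])
    posterior = np.array([0.1, 0.6, 0.2, 0.1])

    # Information in the data about X: D(p(x|D) || p(x)), in bits.
    info = np.sum(posterior * np.log2(posterior / prior))   # about 0.63 bits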
254 If there are multiple variables $X_1, X_2$ | |
255 \etc which the observer believes to be dependent, then the observation of | |
256 one may change its beliefs and hence yield information about the | |
257 others. | |
258 The relationships between the various joint entropies, conditional | |
259 entropies, mutual informations and conditional mutual informations | |
260 can be visualised in Venn diagram-like \emph{information diagrams} | |
261 or I-diagrams \cite{Yeung1991}; see, for example, the three-variable |
262 I-diagram in \figrf{venn-example}. | |
263 | |
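To illustrate how the regions of such a diagram are obtained, here is a two-variable Python sketch, with an invented joint probability table, computing the joint, marginal and conditional entropies and the mutual information.

    import numpy as np

    # Hypothetical joint pmf p(x, y) over two binary variables (example values).
    pxy = np.array([[0.4, 0.1],
                    [0.1, 0.4]])

    def entropy(p):
        # Entropy in bits of an array of probabilities (zero entries ignored).
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H_xy = entropy(pxy)                  # joint entropy H(X,Y)
    H_x  = entropy(pxy.sum(axis=1))      # marginal entropy H(X)
    H_y  = entropy(pxy.sum(axis=0))      # marginal entropy H(Y)
    I_xy = H_x + H_y - H_xy              # mutual information I(X;Y)
    H_x_given_y = H_xy - H_y             # conditional entropy H(X|Y)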
201 | 264 |
202 \begin{fig}{venn-example} | 265 \begin{fig}{venn-example} |
203 \newcommand\rad{2.2em}% | 266 \newcommand\rad{2.2em}% |
204 \newcommand\circo{circle (3.4em)}% | 267 \newcommand\circo{circle (3.4em)}% |
205 \newcommand\labrad{4.3em} | 268 \newcommand\labrad{4.3em} |
360 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive | 423 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive |
361 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. | 424 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. |
362 } | 425 } |
363 \end{fig} | 426 \end{fig} |
364 | 427 |
365 \paragraph{Predictive information rate} | 428 \subsection{Predictive information rate} |
366 In previous work \cite{AbdallahPlumbley2009}, we introduced | 429 In previous work \cite{AbdallahPlumbley2009}, we introduced |
367 % examined several | 430 % examined several |
368 % information-theoretic measures that could be used to characterise | 431 % information-theoretic measures that could be used to characterise |
369 % not only random processes (\ie, an ensemble of possible sequences), | 432 % not only random processes (\ie, an ensemble of possible sequences), |
370 % but also the dynamic progress of specific realisations of such processes. | 433 % but also the dynamic progress of specific realisations of such processes. |
430 perceived value. Repeated exposure sometimes results | 493 perceived value. Repeated exposure sometimes results |
431 in a move to the left along the curve \cite{Berlyne71}. | 494 in a move to the left along the curve \cite{Berlyne71}. |
432 } | 495 } |
433 \end{fig} | 496 \end{fig} |
434 | 497 |
498 \subsection{Other sequential information measures} | |
499 | |
500 James et al.~\cite{JamesEllisonCrutchfield2011} study the predictive information |
501 rate and also examine some related measures. In particular, they identify |
502 $\sigma_\mu$, the difference between the multi-information rate and the excess | |
503 entropy, as an interesting quantity that measures the predictive benefit of | |
504 model-building (that is, maintaining an internal state summarising past | |
505 observations in order to make better predictions). They also identify | |
506 $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous | |
507 information}. | |
435 | 508 |
436 \subsection{First order Markov chains} | 509 \subsection{First order Markov chains} |
437 These are the simplest non-trivial models to which information dynamics methods | 510 These are the simplest non-trivial models to which information dynamics methods |
438 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information | 511 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information |
439 rate can be expressed simply in terms of the entropy rate of the Markov chain. | 512 rate can be expressed simply in terms of the entropy rate of the Markov chain. |
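As a sketch of the entropy rate computation on which that expression rests, the snippet below uses an invented three-state transition matrix; for the predictive information rate expression itself, see \cite{AbdallahPlumbley2009}.

    import numpy as np

    # Hypothetical transition matrix a[i, j] = P(next = j | current = i) (example values).
    a = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])

    # Stationary distribution pi: left eigenvector of a with eigenvalue 1.
    evals, evecs = np.linalg.eig(a.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    pi = pi / pi.sum()

    # Entropy rate h_mu = -sum_i pi_i sum_j a_ij log2 a_ij, in bits per symbol.
    h_mu = -np.sum(pi[:, None] * a * np.log2(a))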