comparison draft.tex @ 18:ca694f7dc3f9
added bib files.
author | samer
date | Fri, 09 Mar 2012 18:45:55 +0000
parents | e47aaea2ac28
children | 739b2444a4ac
17:e47aaea2ac28 | 18:ca694f7dc3f9 |
---|---|
1 \documentclass[conference]{IEEEtran} | 1 \documentclass[conference,a4paper]{IEEEtran} |
2 \usepackage{cite} | 2 \usepackage{cite} |
3 \usepackage[cmex10]{amsmath} | 3 \usepackage[cmex10]{amsmath} |
4 \usepackage{graphicx} | 4 \usepackage{graphicx} |
5 \usepackage{amssymb} | 5 \usepackage{amssymb} |
6 \usepackage{epstopdf} | 6 \usepackage{epstopdf} |
7 \usepackage{url} | 7 \usepackage{url} |
8 \usepackage{listings} | 8 \usepackage{listings} |
9 %\usepackage[expectangle]{tools} | |
9 \usepackage{tools} | 10 \usepackage{tools} |
11 \usepackage{tikz} | |
12 \usetikzlibrary{calc} | |
13 \usetikzlibrary{matrix} | |
14 \usetikzlibrary{patterns} | |
15 \usetikzlibrary{arrows} | |
10 | 16 |
11 \let\citep=\cite | 17 \let\citep=\cite |
12 \def\squash{} | 18 \newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}% |
19 \newcommand\preals{\reals_+} | |
20 \newcommand\X{\mathcal{X}} | |
21 \newcommand\Y{\mathcal{Y}} | |
22 \newcommand\domS{\mathcal{S}} | |
23 \newcommand\A{\mathcal{A}} | |
24 \newcommand\rvm[1]{\mathrm{#1}} | |
25 \newcommand\sps{\,.\,} | |
26 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} | |
27 \newcommand\Ix{\mathcal{I}} | |
28 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}} | |
29 \newcommand\x{\vec{x}} | |
30 \newcommand\Ham[1]{\mathcal{H}_{#1}} | |
31 \newcommand\subsets[2]{[#1]^{(#2)}} |
32 \def\bet(#1,#2){#1..#2} | |
33 | |
34 | |
35 \def\ev(#1=#2){#1\!\!=\!#2} | |
36 \newcommand\rv[1]{\Omega \to #1} | |
37 \newcommand\ceq{\!\!=\!} | |
38 \newcommand\cmin{\!-\!} | |
39 \newcommand\modulo[2]{#1\!\!\!\!\!\mod#2} | |
40 | |
41 \newcommand\sumitoN{\sum_{i=1}^N} | |
42 \newcommand\sumktoK{\sum_{k=1}^K} | |
43 \newcommand\sumjtoK{\sum_{j=1}^K} | |
44 \newcommand\sumalpha{\sum_{\alpha\in\A}} | |
45 \newcommand\prodktoK{\prod_{k=1}^K} | |
46 \newcommand\prodjtoK{\prod_{j=1}^K} | |
47 | |
48 \newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}} | |
49 \newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}} | |
50 \newcommand\parity[2]{P^{#1}_{2,#2}} | |
13 | 51 |
14 %\usepackage[parfill]{parskip} | 52 %\usepackage[parfill]{parskip} |
15 | 53 |
16 \begin{document} | 54 \begin{document} |
17 \title{Cognitive Music Modelling: an Information Dynamics Approach} | 55 \title{Cognitive Music Modelling: an Information Dynamics Approach} |
23 Queen Mary University of London\\ | 61 Queen Mary University of London\\ |
24 Mile End Road, London E1 4NS\\ | 62 Mile End Road, London E1 4NS\\ |
25 Email:}} | 63 Email:}} |
26 | 64 |
27 \maketitle | 65 \maketitle |
28 \begin{abstract}People take in information when perceiving music. With it they continually build predictive models of what is going to happen. There is a relationship between information measures and how we perceive music. An information theoretic approach to music cognition is thus a fruitful avenue of research. | 66 \begin{abstract} |
67 People take in information when perceiving music. With it they continually | |
68 build predictive models of what is going to happen. There is a relationship | |
69 between information measures and how we perceive music. An information | |
70 theoretic approach to music cognition is thus a fruitful avenue of research. | |
71 In this paper, we review the theoretical foundations of information dynamics | |
72 and discuss a few emerging areas of application. | |
29 \end{abstract} | 73 \end{abstract} |
30 | 74 |
31 | 75 |
32 \section{Expectation and surprise in music} | 76 \section{Expectation and surprise in music} |
33 \label{s:Intro} | 77 \label{s:Intro} |
34 | 78 |
35 One of the more salient effects of listening to music is to create | 79 One of the effects of listening to music is to create |
36 \emph{expectations} of what is to come next, which may be fulfilled | 80 expectations of what is to come next, which may be fulfilled |
37 immediately, after some delay, or not at all as the case may be. | 81 immediately, after some delay, or not at all as the case may be. |
38 This is the thesis put forward by, amongst others, music theorists | 82 This is the thesis put forward by, amongst others, music theorists |
39 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}. | 83 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was |
40 In fact, %the gist of | 84 recognised much earlier; for example, |
41 this insight predates Meyer quite considerably; for example, | |
42 it was elegantly put by Hanslick \cite{Hanslick1854} in the | 85 it was elegantly put by Hanslick \cite{Hanslick1854} in the |
43 nineteenth century: | 86 nineteenth century: |
44 \begin{quote} | 87 \begin{quote} |
45 `The most important factor in the mental process which accompanies the | 88 `The most important factor in the mental process which accompanies the |
46 act of listening to music, and which converts it to a source of pleasure, | 89 act of listening to music, and which converts it to a source of pleasure, |
47 is %\ldots | 90 is \ldots the intellectual satisfaction |
48 frequently overlooked. We here refer to the intellectual satisfaction | |
49 which the listener derives from continually following and anticipating | 91 which the listener derives from continually following and anticipating |
50 the composer's intentions---now, to see his expectations fulfilled, and | 92 the composer's intentions---now, to see his expectations fulfilled, and |
51 now, to find himself agreeably mistaken. It is a matter of course that | 93 now, to find himself agreeably mistaken. |
52 this intellectual flux and reflux, this perpetual giving and receiving | 94 %It is a matter of course that |
53 takes place unconsciously, and with the rapidity of lightning-flashes.' | 95 %this intellectual flux and reflux, this perpetual giving and receiving |
96 %takes place unconsciously, and with the rapidity of lightning-flashes.' | |
54 \end{quote} | 97 \end{quote} |
55 | |
56 An essential aspect of this is that music is experienced as a phenomenon | 98 An essential aspect of this is that music is experienced as a phenomenon |
57 that `unfolds' in time, rather than being apprehended as a static object | 99 that `unfolds' in time, rather than being apprehended as a static object |
58 presented in its entirety. Meyer argued that musical experience depends | 100 presented in its entirety. Meyer argued that musical experience depends |
59 on how we change and revise our conceptions \emph{as events happen}, on | 101 on how we change and revise our conceptions \emph{as events happen}, on |
60 how expectation and prediction interact with occurrence, and that, to a | 102 how expectation and prediction interact with occurrence, and that, to a |
64 The business of making predictions and assessing surprise is essentially | 106 The business of making predictions and assessing surprise is essentially |
65 one of reasoning under conditions of uncertainty and manipulating | 107 one of reasoning under conditions of uncertainty and manipulating |
66 degrees of belief about the various propositions which may or may not | 108 degrees of belief about the various propositions which may or may not |
67 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best | 109 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best |
68 quantified in terms of Bayesian probability theory. | 110 quantified in terms of Bayesian probability theory. |
69 % Thus, we assume that musical schemata are encoded as probabilistic % | |
70 %\citep{Meyer56} models, and | |
71 Thus, we suppose that | 111 Thus, we suppose that |
72 when we listen to music, expectations are created on the basis of our | 112 when we listen to music, expectations are created on the basis of our |
73 familiarity with various stylistic norms, that is, using models that | 113 familiarity with various stylistic norms, that is, using models that |
74 encode the statistics of music in general, the particular styles of | 114 encode the statistics of music in general, the particular styles of |
75 music that seem best to fit the piece we happen to be listening to, and | 115 music that seem best to fit the piece we happen to be listening to, and |
76 the emerging structures peculiar to the current piece. There is | 116 the emerging structures peculiar to the current piece. There is |
77 experimental evidence that human listeners are able to internalise | 117 experimental evidence that human listeners are able to internalise |
78 statistical knowledge about musical structure, \eg | 118 statistical knowledge about musical structure, \eg |
79 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also | 119 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also |
80 that statistical models can form an effective basis for computational | 120 that statistical models can form an effective basis for computational |
81 % analysis of music, \eg \cite{Pearce2005}. | |
82 analysis of music, \eg | 121 analysis of music, \eg |
83 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. | 122 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. |
84 % \cite{Ferrand2002}. Dubnov and Assayag PSTs? | 123 |
85 | |
86 \squash | |
87 \subsection{Music and information theory} | 124 \subsection{Music and information theory} |
88 Given a probabilistic framework for music modelling and prediction, | 125 Given a probabilistic framework for music modelling and prediction, |
89 it is a small step to apply quantitative information theory \cite{Shannon48} to | 126 it is a small step to apply quantitative information theory \cite{Shannon48} to |
90 the models at hand. | 127 the models at hand. |
91 The relationship between information theory and music and art in general has been the | 128 The relationship between information theory and music and art in general has been the |
121 usefully be said to exist is the subject of some debate, with advocates of | 158 usefully be said to exist is the subject of some debate, with advocates of |
122 subjective probabilities including de Finetti \cite{deFinetti}. | 159 subjective probabilities including de Finetti \cite{deFinetti}. |
123 Accordingly, we will treat the concept of `true' or `objective' probability | 160 Accordingly, we will treat the concept of `true' or `objective' probability |
124 models with a grain of salt and not rely on them in our | 161 models with a grain of salt and not rely on them in our |
125 theoretical development.}% | 162 theoretical development.}% |
126 % since probabilities are almost always a function of the state of knowledge of the observer | |
127 or from simple statistical analyses such as | 163 or from simple statistical analyses such as |
128 computing empirical distributions. Our approach is explicitly to consider the role | 164 computing empirical distributions. Our approach is explicitly to consider the role |
129 of the observer in perception, and more specifically, to consider estimates of | 165 of the observer in perception, and more specifically, to consider estimates of |
130 entropy \etc with respect to \emph{subjective} probabilities. | 166 entropy \etc with respect to \emph{subjective} probabilities. |
131 % !!REV - DONE - explain use of quoted `objective' | |
132 | |
133 % !!REV - previous work on information theory in music | |
134 More recent work on using information theoretic concepts to analyse music |
135 includes Simon's \cite{Simon2005} assessments of the entropy of | |
136 Jazz improvisations and Dubnov's | |
137 \cite{Dubnov2006,DubnovMcAdamsReynolds2006,Dubnov2008} | |
138 investigations of the `information rate' of musical processes, which is related | |
139 to the notion of redundancy in a communications channel. | |
140 Dubnov's work in particular is informed by similar concerns to our own | |
141 and we will discuss the relationship between it and our work at | |
142 several points later in this paper | |
143 (see \secrf{Redundancy}, \secrf{methods} and \secrf{RelatedWork}). | |
144 | |
145 | |
146 % !!REV - DONE - rephrase, check grammar (now there are too many 'one's!) | |
147 \squash | |
148 \subsection{Information dynamic approach} | 167 \subsection{Information dynamic approach} |
149 | 168 |
150 Bringing the various strands together, our working hypothesis is that | 169 Bringing the various strands together, our working hypothesis is that |
151 as a listener (to which we will refer gender neutrally as `it') | 170 as a listener (to which we will refer gender neutrally as `it') |
152 listens to a piece of music, it maintains a dynamically evolving statistical | 171 listens to a piece of music, it maintains a dynamically evolving statistical |
161 By tracing the evolution of these measures, we obtain a representation | 180 By tracing the evolution of these measures, we obtain a representation |
162 which captures much of the significant structure of the | 181 which captures much of the significant structure of the |
163 music. | 182 music. |
164 This approach has a number of features which we list below. | 183 This approach has a number of features which we list below. |
165 | 184 |
166 (1) \emph{Abstraction}: | 185 \emph{Abstraction}: |
167 Because it is sensitive mainly to \emph{patterns} of occurrence, | 186 Because it is sensitive mainly to \emph{patterns} of occurrence, |
168 rather than the details of which specific things occur, | 187 rather than the details of which specific things occur, |
169 it operates at a level of abstraction removed from the details of the sensory | 188 it operates at a level of abstraction removed from the details of the sensory |
170 experience and the medium through which it was received, suggesting that the | 189 experience and the medium through which it was received, suggesting that the |
171 same approach could, in principle, be used to analyse and compare information | 190 same approach could, in principle, be used to analyse and compare information |
172 flow in different temporal media regardless of whether they are auditory, | 191 flow in different temporal media regardless of whether they are auditory, |
173 visual or otherwise. | 192 visual or otherwise. |
174 | 193 |
175 (2) \emph{Generality}: | 194 \emph{Generality}: applicable to any probabilistic model. |
176 This approach does not proscribe which probabilistic models should be used---the | 195 |
177 choice can be guided by standard model selection criteria such as Bayes | 196 \emph{Subjectivity}: |
178 factors \cite{KassRaftery1995}, \etc | |
179 | |
180 (3) \emph{Richness}: | |
181 It may be effective to use a model with time-dependent latent | |
182 variables, such as a hidden Markov model. In these cases, we can track changes | |
183 in beliefs about the hidden variables as well as the observed ones, adding | |
184 another layer of richness to the description while maintaining the same | |
185 level of abstraction. | |
186 For example, harmony (\ie, the `current chord') in music is not stated explicitly, but rather | |
187 must be inferred from the musical surface; nonetheless, a sense of harmonic | |
188 progression is an important aspect of many styles of music. | |
189 | |
190 (4) \emph{Subjectivity}: | |
191 Since the analysis is dependent on the probability model the observer brings to the | 197 Since the analysis is dependent on the probability model the observer brings to the |
192 problem, which may depend on prior experience or other factors, and which may change | 198 problem, which may depend on prior experience or other factors, and which may change |
193 over time, inter-subject variability and variation in subjects' responses over time are | 199 over time, inter-subject variability and variation in subjects' responses over time are |
194 fundamental to the theory. It is essentially a theory of subjective response. | 200 fundamental to the theory. It is essentially a theory of subjective response. |
195 | 201 |
196 % !!REV - clarify aims of paper. | 202 %modelling the creative process, which often alternates between generative |
197 Having outlined the basic ideas, our aims in pursuing this line of thought | 203 %and selective or evaluative phases \cite{Boden1990}, and would have |
198 are threefold: firstly, to propose dynamic information-based measures which | 204 %applications in tools for computer aided composition. |
199 are coherent from a theoretical point of view and consistent with the general | 205 |
200 principles of probabilistic inference, with possible applications in | 206 |
201 regulating machine learning systems; | 207 \section{Theoretical review} |
202 % when heuristics are required to manage intractible models or limited computational resources. | 208 |
203 secondly, to construct computational models of what human brains are doing | 209 In this section, we summarise the definitions of some of the relevant quantities |
204 in response to music, on the basis that our brains implement, or at least | 210 in information dynamics and show how they can be computed in some simple probabilistic |
205 approximate, optimal probabilistic inference under the relevant constraints; | 211 models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]). |
206 and thirdly, to construct a computational model of a certain restricted | 212 |
207 field of aesthetic judgements (namely judgements related to formal structure) | 213 \begin{fig}{venn-example} |
208 that may shed light on what makes a stimulus interesting or aesthetically | 214 \newcommand\rad{2.2em}% |
209 pleasing. This would be of particular relevance to understanding and | 215 \newcommand\circo{circle (3.4em)}% |
210 modelling the creative process, which often alternates between generative | 216 \newcommand\labrad{4.3em} |
211 and selective or evaluative phases \cite{Boden1990}, and would have | 217 \newcommand\bound{(-6em,-5em) rectangle (6em,6em)} |
212 applications in tools for computer aided composition. | 218 \newcommand\colsep{\ } |
219 \newcommand\clipin[1]{\clip (#1) \circo;}% | |
220 \newcommand\clipout[1]{\clip \bound (#1) \circo;}% | |
221 \newcommand\cliptwo[3]{% | |
222 \begin{scope} | |
223 \clipin{#1}; | |
224 \clipin{#2}; | |
225 \clipout{#3}; | |
226 \fill[black!30] \bound; | |
227 \end{scope} | |
228 }% | |
229 \newcommand\clipone[3]{% | |
230 \begin{scope} | |
231 \clipin{#1}; | |
232 \clipout{#2}; | |
233 \clipout{#3}; | |
234 \fill[black!15] \bound; | |
235 \end{scope} | |
236 }% | |
237 \begin{tabular}{c@{\colsep}c} | |
238 \begin{tikzpicture}[baseline=0pt] | |
239 \coordinate (p1) at (90:\rad); | |
240 \coordinate (p2) at (210:\rad); | |
241 \coordinate (p3) at (-30:\rad); | |
242 \clipone{p1}{p2}{p3}; | |
243 \clipone{p2}{p3}{p1}; | |
244 \clipone{p3}{p1}{p2}; | |
245 \cliptwo{p1}{p2}{p3}; | |
246 \cliptwo{p2}{p3}{p1}; | |
247 \cliptwo{p3}{p1}{p2}; | |
248 \begin{scope} | |
249 \clip (p1) \circo; | |
250 \clip (p2) \circo; | |
251 \clip (p3) \circo; | |
252 \fill[black!45] \bound; | |
253 \end{scope} | |
254 \draw (p1) \circo; | |
255 \draw (p2) \circo; | |
256 \draw (p3) \circo; | |
257 \path | |
258 (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$} | |
259 (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$} | |
260 (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$} | |
261 (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$} | |
262 (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$} | |
263 (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$} | |
264 (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$} | |
265 ; | |
266 \path | |
267 (p1) +(140:\labrad) node {$X_1$} | |
268 (p2) +(-140:\labrad) node {$X_2$} | |
269 (p3) +(-40:\labrad) node {$X_3$}; | |
270 \end{tikzpicture} | |
271 & | |
272 \parbox{0.5\linewidth}{ | |
273 \small | |
274 \begin{align*} | |
275 I_{1|23} &= H(X_1|X_2,X_3) \\ | |
276 I_{13|2} &= I(X_1;X_3|X_2) \\ | |
277 I_{1|23} + I_{13|2} &= H(X_1|X_2) \\ | |
278 I_{12|3} + I_{123} &= I(X_1;X_2) | |
279 \end{align*} | |
280 } | |
281 \end{tabular} | |
282 \caption{ | |
283 Venn diagram visualisation of entropies and mutual informations | |
284 for three random variables $X_1$, $X_2$ and $X_3$. The areas of | |
285 the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively. | |
286 The total shaded area is the joint entropy $H(X_1,X_2,X_3)$. | |
287 The central area $I_{123}$ is the co-information \cite{McGill1954}. | |
288 Some other information measures are indicated in the legend. | |
289 } | |
290 \end{fig} | |
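To make the figure legend concrete, the following short sketch (ours, not part of the draft; the entropy helper \texttt{H} and the Dirichlet-sampled joint distribution are purely illustrative) evaluates these quantities for an arbitrary joint distribution over three binary variables and checks the identity $I_{1|23} + I_{13|2} = H(X_1|X_2)$.

\begin{lstlisting}[language=Python]
import numpy as np

def H(p):
    """Entropy in bits of a joint distribution given as an array of probabilities."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# arbitrary joint distribution p(x1, x2, x3) over three binary variables
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)

H1   = H(p.sum(axis=(1, 2)))        # H(X1)
H2   = H(p.sum(axis=(0, 2)))        # H(X2)
H12  = H(p.sum(axis=2))             # H(X1, X2)
H23  = H(p.sum(axis=0))             # H(X2, X3)
H123 = H(p)                         # H(X1, X2, X3)

I_1g23 = H123 - H23                 # I_{1|23} = H(X1 | X2, X3)
I_13g2 = H12 + H23 - H2 - H123      # I_{13|2} = I(X1; X3 | X2)

# identity from the figure legend: I_{1|23} + I_{13|2} = H(X1 | X2)
assert np.isclose(I_1g23 + I_13g2, H12 - H2)
print(I_1g23, I_13g2, H1 + H2 - H12)   # last value is I(X1; X2)
\end{lstlisting}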
291 [Adopting notation of recent Binding information paper.] | |
292 \subsection{`Anatomy of a bit' stuff} |
293 Entropy rates, redundancy, predictive information etc. | |
294 Information diagrams. | |
295 | |
296 \begin{fig}{predinfo-bg} | |
297 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}} | |
298 \newcommand\rad{1.8em}% | |
299 \newcommand\ovoid[1]{% | |
300 ++(-#1,\rad) | |
301 -- ++(2 * #1,0em) arc (90:-90:\rad) | |
302 -- ++(-2 * #1,0em) arc (270:90:\rad) | |
303 }% | |
304 \newcommand\axis{2.75em}% | |
305 \newcommand\olap{0.85em}% | |
306 \newcommand\offs{3.6em} | |
307 \newcommand\colsep{\hspace{5em}} | |
308 \newcommand\longblob{\ovoid{\axis}} | |
309 \newcommand\shortblob{\ovoid{1.75em}} | |
310 \begin{tabular}{c@{\colsep}c} | |
311 \subfig{(a) excess entropy}{% | |
312 \newcommand\blob{\longblob} | |
313 \begin{tikzpicture} | |
314 \coordinate (p1) at (-\offs,0em); | |
315 \coordinate (p2) at (\offs,0em); | |
316 \begin{scope} | |
317 \clip (p1) \blob; | |
318 \clip (p2) \blob; | |
319 \fill[lightgray] (-1,-1) rectangle (1,1); | |
320 \end{scope} | |
321 \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob; | |
322 \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob; | |
323 \path (0,0) node (future) {$E$}; | |
324 \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; | |
325 \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$}; | |
326 \end{tikzpicture}% | |
327 }% | |
328 \\[1.25em] | |
329 \subfig{(b) predictive information rate $b_\mu$}{% | |
330 \begin{tikzpicture}%[baseline=-1em] | |
331 \newcommand\rc{2.1em} | |
332 \newcommand\throw{2.5em} | |
333 \coordinate (p1) at (210:1.5em); | |
334 \coordinate (p2) at (90:0.7em); | |
335 \coordinate (p3) at (-30:1.5em); | |
336 \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} | |
337 \newcommand\present{(p2) circle (\rc)} | |
338 \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} | |
339 \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}} | |
340 \newcommand\fillclipped[2]{% | |
341 \begin{scope}[even odd rule] | |
342 \foreach \thing in {#2} {\clip \thing;} | |
343 \fill[black!#1] \bound; | |
344 \end{scope}% | |
345 }% | |
346 \fillclipped{30}{\present,\future,\bound \thepast} | |
347 \fillclipped{15}{\present,\bound \future,\bound \thepast} | |
348 \draw \future; | |
349 \fillclipped{45}{\present,\thepast} | |
350 \draw \thepast; | |
351 \draw \present; | |
352 \node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$}; | |
353 \node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$}; | |
354 \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; | |
355 \path (p2) +(140:3em) node {$X_0$}; | |
356 % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$}; | |
357 \path (p3) +(3em,0em) node {\shortstack{infinite\\future}}; | |
358 \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; | |
359 \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; | |
360 \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$}; | |
361 \end{tikzpicture}}% | |
362 \\[0.5em] | |
363 \end{tabular} | |
364 \caption{ | |
365 Venn diagram representation of several information measures for | |
366 stationary random processes. Each circle or oval represents a random | |
367 variable or sequence of random variables relative to time $t=0$. Overlapped areas | |
368 correspond to various mutual informations as in \Figrf{venn-example}. |
369 In (b), the circle represents the `present'. Its total area is |
370 $H(X_0)=H(1)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information | |
371 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive | |
372 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. | |
373 } | |
374 \end{fig} | |
375 | |
376 \paragraph{Predictive information rate} | |
377 In previous work \cite{AbdallahPlumbley2009}, we introduced | |
378 % examined several | |
379 % information-theoretic measures that could be used to characterise | |
380 % not only random processes (\ie, an ensemble of possible sequences), | |
381 % but also the dynamic progress of specific realisations of such processes. | |
382 % One of these measures was | |
383 % | |
384 the \emph{predictive information rate} | |
385 (PIR), which is the average information | |
386 in one observation about the infinite future given the infinite past. | |
387 If $\past{X}_t=(\ldots,X_{t-2},X_{t-1})$ denotes the variables | |
388 before time $t$, | |
389 and $\fut{X}_t = (X_{t+1},X_{t+2},\ldots)$ denotes | |
390 those after $t$, | |
391 the PIR at time $t$ is defined as a conditional mutual information: | |
392 \begin{equation} | |
393 \label{eq:PIR} | |
394 \IXZ_t \define I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t). | |
395 \end{equation} | |
396 % (The underline/overline notation follows that of \cite[\S 3]{AbdallahPlumbley2009}.) | |
397 % Hence, $\Ix_t$ quantifies the \emph{new} | |
398 % information gained about the future from the observation at time $t$. | |
399 Equation \eqrf{PIR} can be read as the average reduction | |
400 in uncertainty about the future on learning $X_t$, given the past. | |
401 Due to the symmetry of the mutual information, it can also be written | |
402 as | |
403 \begin{equation} | |
404 % \IXZ_t | |
405 I(X_t;\fut{X}_t|\past{X}_t) = H(X_t|\past{X}_t) - H(X_t|\fut{X}_t,\past{X}_t). | |
406 % \label{<++>} | |
407 \end{equation} | |
408 % If $X$ is stationary, then | |
409 Now, in the shift-invariant case, $H(X_t|\past{X}_t)$ | |
410 is the familiar entropy rate $h_\mu$, but $H(X_t|\fut{X}_t,\past{X}_t)$, | |
411 the conditional entropy of one variable given \emph{all} the others | |
412 in the sequence, future as well as past, is what | |
413 we called the \emph{residual entropy rate} $r_\mu$ in \cite{AbdallahPlumbley2010}, | |
414 but was previously identified by Verd{\'u} and Weissman \cite{VerduWeissman2006} as the | |
415 \emph{erasure entropy rate}. | |
416 % It is not expressible in terms of the block entropy function $H(\cdot)$. | |
417 It can be defined as the limit | |
418 \begin{equation} | |
419 \label{eq:residual-entropy-rate} | |
420 r_\mu \define \lim_{N\tends\infty} H(X_{\bet(-N,N)}) - H(X_{\bet(-N,-1)},X_{\bet(1,N)}). | |
421 \end{equation} | |
422 The second term, $H(X_{\bet(1,N)},X_{\bet(-N,-1)})$, | |
423 is the joint entropy of two non-adjacent blocks each of length $N$ with a | |
424 gap between them, | |
425 and cannot be expressed as a function of block entropies alone. | |
426 % In order to associate it with the concept of \emph{binding information} which | |
427 % we will define in \secrf{binding-info}, we | |
428 Thus, the shift-invariant PIR (which we will write as $b_\mu$) is the difference between | |
429 the entropy rate and the erasure entropy rate: $b_\mu = h_\mu - r_\mu$. | |
430 These relationships are illustrated in \Figrf{predinfo-bg}, along with | |
431 several of the information measures we have discussed so far. | |
432 | |
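A minimal numerical sketch of these definitions (our own illustration, assuming Python/NumPy; the two-state transition matrix and helper names are hypothetical): for a first-order Markov chain the $N=1$ term of the limit in \eqrf{residual-entropy-rate} is already exact, so $h_\mu$, $r_\mu$ and $b_\mu = h_\mu - r_\mu$ can be computed directly from small joint distributions.

\begin{lstlisting}[language=Python]
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# hypothetical two-state chain; a[i, j] = P(X_{t+1} = j | X_t = i)
a = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# stationary distribution (left eigenvector of a for eigenvalue 1)
evals, evecs = np.linalg.eig(a.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()

h_mu = H(np.einsum('i,ij->ij', pi, a)) - H(pi)     # entropy rate H(X_0 | X_{-1})

# N = 1 term of the residual entropy rate limit -- exact for a first-order chain
p_m0p = np.einsum('i,ij,jk->ijk', pi, a, a)        # P(X_{-1}, X_0, X_1)
p_mp  = np.einsum('i,ik->ik', pi, a @ a)           # P(X_{-1}, X_1)
r_mu  = H(p_m0p) - H(p_mp)                         # residual (erasure) entropy rate

b_mu = h_mu - r_mu                                 # predictive information rate
print(h_mu, r_mu, b_mu)                            # approx. 0.57, 0.46, 0.11 bits
\end{lstlisting}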
433 | |
434 \subsection{First order Markov chains} | |
435 These are the simplest non-trivial models to which information dynamics methods | |
436 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information |
437 rate can be expressed simply in terms of the entropy rate of the Markov chain. | |
438 If we let $a$ denote the transition matrix of the Markov chain, and $h_a$ its |
439 entropy rate, then its predictive information rate $b_a$ is | |
440 \begin{equation} | |
441 b_a = h_{a^2} - h_a, | |
442 \end{equation} | |
443 where $a^2 = aa$, the transition matrix squared, is the transition matrix | |
444 of the `skip one' Markov chain obtained by leaving out every other observation. | |
445 | |
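A sketch of this closed form (again our own illustration; the function names are hypothetical). Applied to the same two-state transition matrix as the previous sketch, it reproduces the same value $b_a \approx 0.11$ bits.

\begin{lstlisting}[language=Python]
import numpy as np

def stationary(P):
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    return pi / pi.sum()

def entropy_rate(P):
    """Entropy rate (bits per symbol) of a stationary chain with row-stochastic P."""
    pi = stationary(P)
    with np.errstate(divide='ignore'):
        logP = np.where(P > 0, np.log2(P), 0.0)
    return float(-np.sum(pi[:, None] * P * logP))

def pir_first_order(a):
    """Predictive information rate b_a = h_{a^2} - h_a of a first-order Markov chain."""
    return entropy_rate(a @ a) - entropy_rate(a)

a = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(entropy_rate(a), pir_first_order(a))   # approx. 0.57 and 0.11 bits
\end{lstlisting}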
446 \subsection{Higher order Markov chains} | |
447 Second and higher order Markov chains can be treated in a similar way by transforming | |
448 to a first order representation of the higher-order Markov chain. If we are dealing |
449 with an $N$th order model, this is done by forming a new alphabet of possible observations |
450 consisting of all possible $N$-tuples of symbols from the base alphabet. An observation | |
451 in this new model represents a block of $N$ observations from the base model. The next | |
452 observation represents the block of $N$ obtained by shifting the previous block along |
453 by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$ |
454 transition matrix $\hat{a}$, where $K$ is the size of the base alphabet. The predictive information rate is then |
455 \begin{equation} | |
456 b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}}, | |
457 \end{equation} | |
458 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the transition matrix. |
459 | |
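This construction can be sketched in a few lines (our illustration, not from the draft; \texttt{block\_transition} and the random second-order example are hypothetical): build the $K^N \times K^N$ matrix $\hat{a}$ over $N$-tuple states, then evaluate $b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}}$.

\begin{lstlisting}[language=Python]
import numpy as np
from itertools import product

def stationary(P):
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    return pi / pi.sum()

def entropy_rate(P):
    pi = stationary(P)
    with np.errstate(divide='ignore'):
        logP = np.where(P > 0, np.log2(P), 0.0)
    return float(-np.sum(pi[:, None] * P * logP))

def block_transition(cond, K, N):
    """First-order transition matrix over N-tuples of a K-symbol alphabet.

    cond[x_1, ..., x_N, y] = P(next symbol = y | previous N symbols = x_1, ..., x_N).
    """
    contexts = list(product(range(K), repeat=N))
    index = {c: i for i, c in enumerate(contexts)}
    A = np.zeros((K**N, K**N))
    for ctx in contexts:
        for y in range(K):
            A[index[ctx], index[ctx[1:] + (y,)]] = cond[ctx + (y,)]
    return A

def pir_higher_order(cond, K, N):
    A = block_transition(cond, K, N)           # \hat{a}
    return entropy_rate(np.linalg.matrix_power(A, N + 1)) - N * entropy_rate(A)

# example: a random second-order (N = 2) binary model
rng = np.random.default_rng(0)
cond = rng.dirichlet(np.ones(2), size=(2, 2))  # shape (2, 2, 2)
print(pir_higher_order(cond, K=2, N=2))
\end{lstlisting}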
213 | 460 |
214 | 461 |
215 \section{Information Dynamics in Analysis} | 462 \section{Information Dynamics in Analysis} |
216 | 463 |
217 \subsection{Musicological Analysis} | 464 \subsection{Musicological Analysis} |