comparison draft.tex @ 18:ca694f7dc3f9

added bib files.
author samer
date Fri, 09 Mar 2012 18:45:55 +0000
parents e47aaea2ac28
children 739b2444a4ac
comparison
equal deleted inserted replaced
17:e47aaea2ac28 18:ca694f7dc3f9
1 \documentclass[conference]{IEEEtran} 1 \documentclass[conference,a4paper]{IEEEtran}
2 \usepackage{cite} 2 \usepackage{cite}
3 \usepackage[cmex10]{amsmath} 3 \usepackage[cmex10]{amsmath}
4 \usepackage{graphicx} 4 \usepackage{graphicx}
5 \usepackage{amssymb} 5 \usepackage{amssymb}
6 \usepackage{epstopdf} 6 \usepackage{epstopdf}
7 \usepackage{url} 7 \usepackage{url}
8 \usepackage{listings} 8 \usepackage{listings}
9 %\usepackage[expectangle]{tools}
9 \usepackage{tools} 10 \usepackage{tools}
11 \usepackage{tikz}
12 \usetikzlibrary{calc}
13 \usetikzlibrary{matrix}
14 \usetikzlibrary{patterns}
15 \usetikzlibrary{arrows}
10 16
11 \let\citep=\cite 17 \let\citep=\cite
12 \def\squash{} 18 \newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
19 \newcommand\preals{\reals_+}
20 \newcommand\X{\mathcal{X}}
21 \newcommand\Y{\mathcal{Y}}
22 \newcommand\domS{\mathcal{S}}
23 \newcommand\A{\mathcal{A}}
24 \newcommand\rvm[1]{\mathrm{#1}}
25 \newcommand\sps{\,.\,}
26 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
27 \newcommand\Ix{\mathcal{I}}
28 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
29 \newcommand\x{\vec{x}}
30 \newcommand\Ham[1]{\mathcal{H}_{#1}}
31 \newcommand\subsets[2]{[#1]^{(#2)}}
32 \def\bet(#1,#2){#1..#2}
33
34
35 \def\ev(#1=#2){#1\!\!=\!#2}
36 \newcommand\rv[1]{\Omega \to #1}
37 \newcommand\ceq{\!\!=\!}
38 \newcommand\cmin{\!-\!}
39 \newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}
40
41 \newcommand\sumitoN{\sum_{i=1}^N}
42 \newcommand\sumktoK{\sum_{k=1}^K}
43 \newcommand\sumjtoK{\sum_{j=1}^K}
44 \newcommand\sumalpha{\sum_{\alpha\in\A}}
45 \newcommand\prodktoK{\prod_{k=1}^K}
46 \newcommand\prodjtoK{\prod_{j=1}^K}
47
48 \newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
49 \newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
50 \newcommand\parity[2]{P^{#1}_{2,#2}}
13 51
14 %\usepackage[parfill]{parskip} 52 %\usepackage[parfill]{parskip}
15 53
16 \begin{document} 54 \begin{document}
17 \title{Cognitive Music Modelling: an Information Dynamics Approach} 55 \title{Cognitive Music Modelling: an Information Dynamics Approach}
23 Queen Mary University of London\\ 61 Queen Mary University of London\\
24 Mile End Road, London E1 4NS\\ 62 Mile End Road, London E1 4NS\\
25 Email:}} 63 Email:}}
26 64
27 \maketitle 65 \maketitle
28 \begin{abstract}People take in information when perceiving music. With it they continually build predictive models of what is going to happen. There is a relationship between information measures and how we perceive music. An information theoretic approach to music cognition is thus a fruitful avenue of research. 66 \begin{abstract}
67 People take in information when perceiving music. With it they continually
68 build predictive models of what is going to happen. There is a relationship
69 between information measures and how we perceive music. An information
70 theoretic approach to music cognition is thus a fruitful avenue of research.
71 In this paper, we review the theoretical foundations of information dynamics
72 and discuss a few emerging areas of application.
29 \end{abstract} 73 \end{abstract}
30 74
31 75
32 \section{Expectation and surprise in music} 76 \section{Expectation and surprise in music}
33 \label{s:Intro} 77 \label{s:Intro}
34 78
35 One of the more salient effects of listening to music is to create 79 One of the effects of listening to music is to create
36 \emph{expectations} of what is to come next, which may be fulfilled 80 expectations of what is to come next, which may be fulfilled
37 immediately, after some delay, or not at all as the case may be. 81 immediately, after some delay, or not at all as the case may be.
38 This is the thesis put forward by, amongst others, music theorists 82 This is the thesis put forward by, amongst others, music theorists
39 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}. 83 L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
40 In fact, %the gist of 84 recognised much earlier; for example,
41 this insight predates Meyer quite considerably; for example,
42 it was elegantly put by Hanslick \cite{Hanslick1854} in the 85 it was elegantly put by Hanslick \cite{Hanslick1854} in the
43 nineteenth century: 86 nineteenth century:
44 \begin{quote} 87 \begin{quote}
45 `The most important factor in the mental process which accompanies the 88 `The most important factor in the mental process which accompanies the
46 act of listening to music, and which converts it to a source of pleasure, 89 act of listening to music, and which converts it to a source of pleasure,
47 is %\ldots 90 is \ldots the intellectual satisfaction
48 frequently overlooked. We here refer to the intellectual satisfaction
49 which the listener derives from continually following and anticipating 91 which the listener derives from continually following and anticipating
50 the composer's intentions---now, to see his expectations fulfilled, and 92 the composer's intentions---now, to see his expectations fulfilled, and
51 now, to find himself agreeably mistaken. It is a matter of course that 93 now, to find himself agreeably mistaken.'
52 this intellectual flux and reflux, this perpetual giving and receiving 94 %It is a matter of course that
53 takes place unconsciously, and with the rapidity of lightning-flashes.' 95 %this intellectual flux and reflux, this perpetual giving and receiving
96 %takes place unconsciously, and with the rapidity of lightning-flashes.'
54 \end{quote} 97 \end{quote}
55
56 An essential aspect of this is that music is experienced as a phenomenon 98 An essential aspect of this is that music is experienced as a phenomenon
57 that `unfolds' in time, rather than being apprehended as a static object 99 that `unfolds' in time, rather than being apprehended as a static object
58 presented in its entirety. Meyer argued that musical experience depends 100 presented in its entirety. Meyer argued that musical experience depends
59 on how we change and revise our conceptions \emph{as events happen}, on 101 on how we change and revise our conceptions \emph{as events happen}, on
60 how expectation and prediction interact with occurrence, and that, to a 102 how expectation and prediction interact with occurrence, and that, to a
64 The business of making predictions and assessing surprise is essentially 106 The business of making predictions and assessing surprise is essentially
65 one of reasoning under conditions of uncertainty and manipulating 107 one of reasoning under conditions of uncertainty and manipulating
66 degrees of belief about the various propositions which may or may not 108 degrees of belief about the various propositions which may or may not
67 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, is best 109 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, is best
68 quantified in terms of Bayesian probability theory. 110 quantified in terms of Bayesian probability theory.
69 % Thus, we assume that musical schemata are encoded as probabilistic %
70 %\citep{Meyer56} models, and
71 Thus, we suppose that 111 Thus, we suppose that
72 when we listen to music, expectations are created on the basis of our 112 when we listen to music, expectations are created on the basis of our
73 familiarity with various stylistic norms, that is, models that 113 familiarity with various stylistic norms, that is, models that
74 encode the statistics of music in general, the particular styles of 114 encode the statistics of music in general, the particular styles of
75 music that seem best to fit the piece we happen to be listening to, and 115 music that seem best to fit the piece we happen to be listening to, and
76 the emerging structures peculiar to the current piece. There is 116 the emerging structures peculiar to the current piece. There is
77 experimental evidence that human listeners are able to internalise 117 experimental evidence that human listeners are able to internalise
78 statistical knowledge about musical structure, \eg 118 statistical knowledge about musical structure, \eg
79 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also 119 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
80 that statistical models can form an effective basis for computational 120 that statistical models can form an effective basis for computational
81 % analysis of music, \eg \cite{Pearce2005}.
82 analysis of music, \eg 121 analysis of music, \eg
83 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. 122 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
84 % \cite{Ferrand2002}. Dubnov and Assayag PSTs? 123
85
86 \squash
87 \subsection{Music and information theory} 124 \subsection{Music and information theory}
88 Given a probabilistic framework for music modelling and prediction, 125 Given a probabilistic framework for music modelling and prediction,
89 it is a small step to apply quantitative information theory \cite{Shannon48} to 126 it is a small step to apply quantitative information theory \cite{Shannon48} to
90 the models at hand. 127 the models at hand.
91 The relationship between information theory and music and art in general has been the 128 The relationship between information theory and music and art in general has been the
121 usefully be said to exist is the subject of some debate, with advocates of 158 usefully be said to exist is the subject of some debate, with advocates of
122 subjective probabilities including de Finetti \cite{deFinetti}. 159 subjective probabilities including de Finetti \cite{deFinetti}.
123 Accordingly, we will treat the concept of `true' or `objective' probability 160 Accordingly, we will treat the concept of `true' or `objective' probability
124 models with a grain of salt and not rely on them in our 161 models with a grain of salt and not rely on them in our
125 theoretical development.}% 162 theoretical development.}%
126 % since probabilities are almost always a function of the state of knowledge of the observer
127 or from simple statistical analyses such as 163 or from simple statistical analyses such as
128 computing empirical distributions. Our approach is explicitly to consider the role 164 computing empirical distributions. Our approach is explicitly to consider the role
129 of the observer in perception, and more specifically, to consider estimates of 165 of the observer in perception, and more specifically, to consider estimates of
130 entropy \etc with respect to \emph{subjective} probabilities. 166 entropy \etc with respect to \emph{subjective} probabilities.
131 % !!REV - DONE - explain use of quoted `objective'
132
133 % !!REV - previous work on information theory in music
134 More recent work on using information theoretic concepts to analyse music
135 includes Simon's \cite{Simon2005} assessments of the entropy of
136 Jazz improvisations and Dubnov's
137 \cite{Dubnov2006,DubnovMcAdamsReynolds2006,Dubnov2008}
138 investigations of the `information rate' of musical processes, which is related
139 to the notion of redundancy in a communications channel.
140 Dubnov's work in particular is informed by similar concerns to our own
141 and we will discuss the relationship between it and our work at
142 several points later in this paper
143 (see \secrf{Redundancy}, \secrf{methods} and \secrf{RelatedWork}).
144
145
146 % !!REV - DONE - rephrase, check grammar (now there are too many 'one's!)
147 \squash
148 \subsection{Information dynamic approach} 167 \subsection{Information dynamic approach}
149 168
150 Bringing the various strands together, our working hypothesis is that 169 Bringing the various strands together, our working hypothesis is that
151 as a listener (to which we will refer gender neutrally as `it') 170 as a listener (to which we will refer gender neutrally as `it')
152 listens to a piece of music, it maintains a dynamically evolving statistical 171 listens to a piece of music, it maintains a dynamically evolving statistical
161 By tracing the evolution of these measures, we obtain a representation 180 By tracing the evolution of these measures, we obtain a representation
162 which captures much of the significant structure of the 181 which captures much of the significant structure of the
163 music. 182 music.
164 This approach has a number of features which we list below. 183 This approach has a number of features which we list below.
165 184
166 (1) \emph{Abstraction}: 185 \emph{Abstraction}:
167 Because it is sensitive mainly to \emph{patterns} of occurrence, 186 Because it is sensitive mainly to \emph{patterns} of occurrence,
168 rather than the details of which specific things occur, 187 rather than the details of which specific things occur,
169 it operates at a level of abstraction removed from the details of the sensory 188 it operates at a level of abstraction removed from the details of the sensory
170 experience and the medium through which it was received, suggesting that the 189 experience and the medium through which it was received, suggesting that the
171 same approach could, in principle, be used to analyse and compare information 190 same approach could, in principle, be used to analyse and compare information
172 flow in different temporal media regardless of whether they are auditory, 191 flow in different temporal media regardless of whether they are auditory,
173 visual or otherwise. 192 visual or otherwise.
174 193
175 (2) \emph{Generality}: 194 \emph{Generality}: the approach is applicable to any probabilistic model.
176 This approach does not proscribe which probabilistic models should be used---the 195
177 choice can be guided by standard model selection criteria such as Bayes 196 \emph{Subjectivity}:
178 factors \cite{KassRaftery1995}, \etc
179
180 (3) \emph{Richness}:
181 It may be effective to use a model with time-dependent latent
182 variables, such as a hidden Markov model. In these cases, we can track changes
183 in beliefs about the hidden variables as well as the observed ones, adding
184 another layer of richness to the description while maintaining the same
185 level of abstraction.
186 For example, harmony (\ie, the `current chord') in music is not stated explicitly, but rather
187 must be inferred from the musical surface; nonetheless, a sense of harmonic
188 progression is an important aspect of many styles of music.
189
190 (4) \emph{Subjectivity}:
191 Since the analysis is dependent on the probability model the observer brings to the 197 Since the analysis is dependent on the probability model the observer brings to the
192 problem, which may depend on prior experience or other factors, and which may change 198 problem, which may depend on prior experience or other factors, and which may change
193 over time, inter-subject variability and variation in subjects' responses over time are 199 over time, inter-subject variability and variation in subjects' responses over time are
194 fundamental to the theory. It is essentially a theory of subjective response. 200 fundamental to the theory. It is essentially a theory of subjective response.
195 201
196 % !!REV - clarify aims of paper. 202 %modelling the creative process, which often alternates between generative
197 Having outlined the basic ideas, our aims in pursuing this line of thought 203 %and selective or evaluative phases \cite{Boden1990}, and would have
198 are threefold: firstly, to propose dynamic information-based measures which 204 %applications in tools for computer aided composition.
199 are coherent from a theoretical point of view and consistent with the general 205
200 principles of probabilistic inference, with possible applications in 206
201 regulating machine learning systems; 207 \section{Theoretical review}
202 % when heuristics are required to manage intractible models or limited computational resources. 208
203 secondly, to construct computational models of what human brains are doing 209 In this section, we summarise the definitions of some of the relevant quantities
204 in response to music, on the basis that our brains implement, or at least 210 in information dynamics and show how they can be computed in some simple probabilistic
205 approximate, optimal probabilistic inference under the relevant constraints; 211 models (namely, first and higher-order Markov chains, and Gaussian processes [Peter?]).
206 and thirdly, to construct a computational model of a certain restricted 212
207 field of aesthetic judgements (namely judgements related to formal structure) 213 \begin{fig}{venn-example}
208 that may shed light on what makes a stimulus interesting or aesthetically 214 \newcommand\rad{2.2em}%
209 pleasing. This would be of particular relevance to understanding and 215 \newcommand\circo{circle (3.4em)}%
210 modelling the creative process, which often alternates between generative 216 \newcommand\labrad{4.3em}
211 and selective or evaluative phases \cite{Boden1990}, and would have 217 \newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
212 applications in tools for computer aided composition. 218 \newcommand\colsep{\ }
219 \newcommand\clipin[1]{\clip (#1) \circo;}%
220 \newcommand\clipout[1]{\clip \bound (#1) \circo;}%
221 \newcommand\cliptwo[3]{%
222 \begin{scope}
223 \clipin{#1};
224 \clipin{#2};
225 \clipout{#3};
226 \fill[black!30] \bound;
227 \end{scope}
228 }%
229 \newcommand\clipone[3]{%
230 \begin{scope}
231 \clipin{#1};
232 \clipout{#2};
233 \clipout{#3};
234 \fill[black!15] \bound;
235 \end{scope}
236 }%
237 \begin{tabular}{c@{\colsep}c}
238 \begin{tikzpicture}[baseline=0pt]
239 \coordinate (p1) at (90:\rad);
240 \coordinate (p2) at (210:\rad);
241 \coordinate (p3) at (-30:\rad);
242 \clipone{p1}{p2}{p3};
243 \clipone{p2}{p3}{p1};
244 \clipone{p3}{p1}{p2};
245 \cliptwo{p1}{p2}{p3};
246 \cliptwo{p2}{p3}{p1};
247 \cliptwo{p3}{p1}{p2};
248 \begin{scope}
249 \clip (p1) \circo;
250 \clip (p2) \circo;
251 \clip (p3) \circo;
252 \fill[black!45] \bound;
253 \end{scope}
254 \draw (p1) \circo;
255 \draw (p2) \circo;
256 \draw (p3) \circo;
257 \path
258 (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
259 (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
260 (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
261 (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
262 (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
263 (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
264 (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
265 ;
266 \path
267 (p1) +(140:\labrad) node {$X_1$}
268 (p2) +(-140:\labrad) node {$X_2$}
269 (p3) +(-40:\labrad) node {$X_3$};
270 \end{tikzpicture}
271 &
272 \parbox{0.5\linewidth}{
273 \small
274 \begin{align*}
275 I_{1|23} &= H(X_1|X_2,X_3) \\
276 I_{13|2} &= I(X_1;X_3|X_2) \\
277 I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
278 I_{12|3} + I_{123} &= I(X_1;X_2)
279 \end{align*}
280 }
281 \end{tabular}
282 \caption{
283 Venn diagram visualisation of entropies and mutual informations
284 for three random variables $X_1$, $X_2$ and $X_3$. The areas of
285 the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
286 The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
287 The central area $I_{123}$ is the co-information \cite{McGill1954}.
288 Some other information measures are indicated in the legend.
289 }
290 \end{fig}
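The quantities labelled in \Figrf{venn-example} can be computed directly from a joint
distribution over three discrete variables using standard identities between joint
entropies. The following sketch is our own illustration (the array layout and function
names are assumptions, not part of any existing library):
\begin{lstlisting}[language=Python]
import numpy as np

def H(p, keep):
    # Entropy (in bits) of the marginal of the joint array p over
    # the variable indices listed in `keep'; the rest are summed out.
    drop = tuple(i for i in range(p.ndim) if i not in keep)
    m = p.sum(axis=drop)
    m = m[m > 0]
    return float(-np.sum(m * np.log2(m)))

def venn_quantities(p):
    # p[x1, x2, x3] is the joint distribution of (X1, X2, X3).
    H1, H2, H3 = H(p, (0,)), H(p, (1,)), H(p, (2,))
    H12, H13, H23 = H(p, (0, 1)), H(p, (0, 2)), H(p, (1, 2))
    H123 = H(p, (0, 1, 2))
    I_1_23 = H123 - H23                              # I_{1|23} = H(X1|X2,X3)
    I_13_2 = H12 + H23 - H2 - H123                   # I_{13|2} = I(X1;X3|X2)
    I_123  = H1 + H2 + H3 - H12 - H13 - H23 + H123   # co-information
    return I_1_23, I_13_2, I_123
\end{lstlisting}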
291 [Adopting notation of recent Binding information paper.]
292 \subsection{`Anatomy of a bit' stuff}
293 Entropy rates, redundancy, predictive information etc.
294 Information diagrams.
295
296 \begin{fig}{predinfo-bg}
297 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
298 \newcommand\rad{1.8em}%
299 \newcommand\ovoid[1]{%
300 ++(-#1,\rad)
301 -- ++(2 * #1,0em) arc (90:-90:\rad)
302 -- ++(-2 * #1,0em) arc (270:90:\rad)
303 }%
304 \newcommand\axis{2.75em}%
305 \newcommand\olap{0.85em}%
306 \newcommand\offs{3.6em}
307 \newcommand\colsep{\hspace{5em}}
308 \newcommand\longblob{\ovoid{\axis}}
309 \newcommand\shortblob{\ovoid{1.75em}}
310 \begin{tabular}{c@{\colsep}c}
311 \subfig{(a) excess entropy}{%
312 \newcommand\blob{\longblob}
313 \begin{tikzpicture}
314 \coordinate (p1) at (-\offs,0em);
315 \coordinate (p2) at (\offs,0em);
316 \begin{scope}
317 \clip (p1) \blob;
318 \clip (p2) \blob;
319 \fill[lightgray] (-1,-1) rectangle (1,1);
320 \end{scope}
321 \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
322 \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
323 \path (0,0) node (future) {$E$};
324 \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
325 \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
326 \end{tikzpicture}%
327 }%
328 \\[1.25em]
329 \subfig{(b) predictive information rate $b_\mu$}{%
330 \begin{tikzpicture}%[baseline=-1em]
331 \newcommand\rc{2.1em}
332 \newcommand\throw{2.5em}
333 \coordinate (p1) at (210:1.5em);
334 \coordinate (p2) at (90:0.7em);
335 \coordinate (p3) at (-30:1.5em);
336 \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
337 \newcommand\present{(p2) circle (\rc)}
338 \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
339 \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
340 \newcommand\fillclipped[2]{%
341 \begin{scope}[even odd rule]
342 \foreach \thing in {#2} {\clip \thing;}
343 \fill[black!#1] \bound;
344 \end{scope}%
345 }%
346 \fillclipped{30}{\present,\future,\bound \thepast}
347 \fillclipped{15}{\present,\bound \future,\bound \thepast}
348 \draw \future;
349 \fillclipped{45}{\present,\thepast}
350 \draw \thepast;
351 \draw \present;
352 \node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
353 \node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
354 \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
355 \path (p2) +(140:3em) node {$X_0$};
356 % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
357 \path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
358 \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
359 \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
360 \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
361 \end{tikzpicture}}%
362 \\[0.5em]
363 \end{tabular}
364 \caption{
365 Venn diagram representation of several information measures for
366 stationary random processes. Each circle or oval represents a random
367 variable or sequence of random variables relative to time $t=0$. Overlapped areas
368 correspond to various mutual informations, as in \Figrf{venn-example}.
369 In (b), the circle represents the `present'. Its total area is
370 $H(X_0)=H(1)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
371 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
372 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
373 }
374 \end{fig}
375
376 \paragraph{Predictive information rate}
377 In previous work \cite{AbdallahPlumbley2009}, we introduced
378 % examined several
379 % information-theoretic measures that could be used to characterise
380 % not only random processes (\ie, an ensemble of possible sequences),
381 % but also the dynamic progress of specific realisations of such processes.
382 % One of these measures was
383 %
384 the \emph{predictive information rate}
385 (PIR), which is the average information
386 in one observation about the infinite future given the infinite past.
387 If $\past{X}_t=(\ldots,X_{t-2},X_{t-1})$ denotes the variables
388 before time $t$,
389 and $\fut{X}_t = (X_{t+1},X_{t+2},\ldots)$ denotes
390 those after $t$,
391 the PIR at time $t$ is defined as a conditional mutual information:
392 \begin{equation}
393 \label{eq:PIR}
394 \IXZ_t \define I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
395 \end{equation}
396 % (The underline/overline notation follows that of \cite[\S 3]{AbdallahPlumbley2009}.)
397 % Hence, $\Ix_t$ quantifies the \emph{new}
398 % information gained about the future from the observation at time $t$.
399 Equation \eqrf{PIR} can be read as the average reduction
400 in uncertainty about the future on learning $X_t$, given the past.
401 Due to the symmetry of the mutual information, it can also be written
402 as
403 \begin{equation}
404 % \IXZ_t
405 I(X_t;\fut{X}_t|\past{X}_t) = H(X_t|\past{X}_t) - H(X_t|\fut{X}_t,\past{X}_t).
406 % \label{<++>}
407 \end{equation}
408 % If $X$ is stationary, then
409 Now, in the shift-invariant case, $H(X_t|\past{X}_t)$
410 is the familiar entropy rate $h_\mu$, but $H(X_t|\fut{X}_t,\past{X}_t)$,
411 the conditional entropy of one variable given \emph{all} the others
412 in the sequence, future as well as past, is what
413 we called the \emph{residual entropy rate} $r_\mu$ in \cite{AbdallahPlumbley2010};
414 it was previously identified by Verd{\'u} and Weissman \cite{VerduWeissman2006} as the
415 \emph{erasure entropy rate}.
416 % It is not expressible in terms of the block entropy function $H(\cdot)$.
417 It can be defined as the limit
418 \begin{equation}
419 \label{eq:residual-entropy-rate}
420 r_\mu \define \lim_{N\tends\infty} H(X_{\bet(-N,N)}) - H(X_{\bet(-N,-1)},X_{\bet(1,N)}).
421 \end{equation}
422 The second term, $H(X_{\bet(1,N)},X_{\bet(-N,-1)})$,
423 is the joint entropy of two non-adjacent blocks each of length $N$ with a
424 gap between them,
425 and cannot be expressed as a function of block entropies alone.
426 % In order to associate it with the concept of \emph{binding information} which
427 % we will define in \secrf{binding-info}, we
428 Thus, the shift-invariant PIR (which we will write as $b_\mu$) is the difference between
429 the entropy rate and the erasure entropy rate: $b_\mu = h_\mu - r_\mu$.
430 These relationships are illustrated in \Figrf{predinfo-bg}, along with
431 several of the information measures we have discussed so far.
432
433
434 \subsection{First order Markov chains}
435 These are the simplest non-trivial models to which information dynamics methods
436 can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information
437 rate can be expressed simply in terms of the entropy rate of the Markov chain.
438 If we let $a$ denote the transition matrix of the Markov chain, and $h_a$ its
439 entropy rate, then its predictive information rate $b_a$ is
440 \begin{equation}
441 b_a = h_{a^2} - h_a,
442 \end{equation}
443 where $a^2 = aa$, the transition matrix squared, is the transition matrix
444 of the `skip one' Markov chain obtained by leaving out every other observation.
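For concreteness, $h_a$ can be evaluated from the standard closed form
$h_a = -\sum_i \pi_i \sum_j a_{ij} \log a_{ij}$, where $\pi$ is the stationary
distribution of the chain, and $b_a$ then follows from the relation above. The sketch
below is a minimal illustration of ours, assuming a row-stochastic transition matrix
and natural logarithms; the function names are ours.
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(a):
    # Stationary distribution: left eigenvector of a with eigenvalue 1.
    evals, evecs = np.linalg.eig(a.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return pi / pi.sum()

def entropy_rate(a):
    # h_a = -sum_i pi_i sum_j a_ij log a_ij, with 0 log 0 taken as 0.
    pi = stationary(a)
    logs = np.log(np.where(a > 0, a, 1.0))
    return float(-np.sum(pi[:, None] * a * logs))

def pir_first_order(a):
    # b_a = h_{a^2} - h_a; a @ a is the two-step (`skip one') chain.
    return entropy_rate(a @ a) - entropy_rate(a)
\end{lstlisting}
For example, \texttt{pir\_first\_order(np.array([[0.9, 0.1], [0.1, 0.9]]))} returns
$b_a$ in nats for a sticky two-state chain.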
445
446 \subsection{Higher order Markov chains}
447 Second and higher order Markov chains can be treated in a similar way by transforming
448 to a first order representation of the higher order Markov chain. If we are dealing
449 with an $N$th order model, this is done by forming a new alphabet of possible observations
450 consisting of all possible $N$-tuples of symbols from the base alphabet. An observation
451 in this new model represents a block of $N$ observations from the base model. The next
452 observation represents the block of $N$ observations obtained by shifting the previous block along
453 by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
454 transition matrix $\hat{a}$, where $K$ is the size of the base alphabet. Its predictive information rate is then
455 \begin{equation}
456 b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}},
457 \end{equation}
458 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the transition matrix.
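To make the embedding concrete, the sketch below (our own illustration, for the
second-order case $N=2$, assuming the conditional distribution is supplied as a
$K\times K\times K$ array) builds the sparse first-order transition matrix $\hat{a}$;
the entropy rates can then be computed as in the first-order sketch above.
\begin{lstlisting}[language=Python]
import itertools
import numpy as np

def embed_second_order(p):
    # p[i, j, k] = P(next symbol = k | previous two symbols = (i, j)).
    K = p.shape[0]
    states = list(itertools.product(range(K), repeat=2))   # all 2-tuples
    index = {s: n for n, s in enumerate(states)}
    a_hat = np.zeros((K**2, K**2))
    for (i, j) in states:
        for k in range(K):
            # The only transitions allowed are (i, j) -> (j, k).
            a_hat[index[(i, j)], index[(j, k)]] = p[i, j, k]
    return a_hat

# For N = 2 the formula above gives
#   b = entropy_rate(np.linalg.matrix_power(a_hat, 3)) - 2 * entropy_rate(a_hat).
\end{lstlisting}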
459
213 460
214 461
215 \section{Information Dynamics in Analysis} 462 \section{Information Dynamics in Analysis}
216 463
217 \subsection{Musicological Analysis} 464 \subsection{Musicological Analysis}