\documentclass[conference]{IEEEtran}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}


\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}

%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an\\Information Dynamics Approach}

\author{
\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster,}
\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
\IEEEauthorblockA{Centre for Digital Music\\
Queen Mary University of London\\
Mile End Road, London E1 4NS}}

\maketitle
\begin{abstract}
People take in information when perceiving music, and with it they
continually build predictive models of what is going to happen next.
Since information measures are closely related to how we perceive
music, an information-theoretic approach to music cognition is a
fruitful avenue of research. In this paper, we review the theoretical
foundations of information dynamics and discuss a few emerging areas
of application.
\end{abstract}


\section{Introduction}
\label{s:Intro}

\subsection{Expectation and surprise in music}
One of the effects of listening to music is to create
expectations of what is to come next, which may be fulfilled
immediately, after some delay, or not at all as the case may be.
This is the thesis put forward by, amongst others, music theorists
L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but the idea was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the
act of listening to music, and which converts it to a source of pleasure,
is \ldots the intellectual satisfaction
which the listener derives from continually following and anticipating
the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.'
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon
that `unfolds' in time, rather than being apprehended as a static object
presented in its entirety. Meyer argued that musical experience depends
on how we change and revise our conceptions \emph{as events happen}, on
how expectation and prediction interact with occurrence, and that, to a
large degree, the way to understand the effect of music is to focus on
this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts
and can be treated mathematically using probability theory.
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.


\comment{
The business of making predictions and assessing surprise is essentially
one of reasoning under conditions of uncertainty and manipulating
degrees of belief about the various propositions which may or may not
hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best
quantified in terms of Bayesian probability theory.
Thus, we suppose that
when we listen to music, expectations are created on the basis of our
familiarity with various stylistic norms that apply to music in general,
the particular style (or styles) of music that seem best to fit the piece
we are listening to, and
the emerging structures peculiar to the current piece. There is
experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
}

\subsection{Music and information theory}
With a probabilistic framework for music modelling and prediction in hand,
we are in a position to apply Shannon's quantitative information theory
\cite{Shannon48}.
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
what Berlyne \cite{Berlyne71} called `collative variables' since they are
to do with patterns of occurrence rather than medium-specific details.
Berlyne sought to show that the collative variables are closely related to
perceptual qualities like complexity, tension, interestingness,
and even aesthetic value, not just in music, but in other temporal
or visual media.
The relevance of information theory to music and art has
also been addressed by researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
The relationship between information theory and music and art in general has been the
subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective
states like uncertainty, surprise, complexity, tension, and interestingness
are closely related to
information-theoretic quantities like entropy, relative entropy,
and mutual information.
% and are major determinants of the overall experience.
Berlyne \cite{Berlyne71} called such quantities `collative variables', since
they are to do with patterns of occurrence rather than medium-specific details,
and developed the ideas of `information aesthetics' in an experimental setting.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.

% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}

Bringing the various strands together, our working hypothesis is that as a
listener (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic model that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the immediate context of the piece. As events unfold, it revises
its probabilistic belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy. By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

One of the consequences of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The probabilistic and
information-theoretic concepts in terms of which the analysis is framed are
universal to all sorts of data.
In addition, when adaptive probabilistic models are used, expectations are
created mainly in response to \emph{patterns} of occurrence,
rather than the details of which specific things occur.
Together, these suggest that an information dynamic analysis captures a
high level of \emph{abstraction}, and could be used to
make structural comparisons between different temporal media,
such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a principled way
to address the notion of \emph{subjectivity}, since the analysis is dependent on the
probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
variation in subjects' responses over time are
fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.


\section{Theoretical review}

\subsection{Entropy and information}
\label{s:entro-info}

Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set $\X$ and a
probability distribution representing the observer's beliefs about the
true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy of the random variable, $H(X)$. For a discrete variable
with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
	H(X) = \sum_{x\in\X} -p(x) \log p(x). % = \expect{-\log p(X)},
\end{equation}
% where $\expect{}$ is the expectation operator.
The negative log-probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and
hence the entropy is the expectation of the surprisingness, $\expect{\ell(X)}$.

Now suppose that the observer receives some new data $\Data$ that
causes a revision of its beliefs about $X$. The \emph{information}
in this new data \emph{about} $X$ can be quantified as the
Kullback-Leibler (KL) divergence between the prior and posterior
distributions $p(x)$ and $p(x|\Data)$ respectively:
\begin{equation}
	\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
		= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
	\label{eq:info}
\end{equation}
When there are multiple variables $X_1, X_2$
\etc which the observer believes to be dependent, then the observation of
one may change its beliefs and hence yield information about the
others. The joint and conditional entropies as described in any
textbook on information theory (\eg \cite{CoverThomas}) then quantify
the observer's expected uncertainty about groups of variables given the
values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information
in an observation of $X_2$ about $X_1$ and the expected reduction
in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
	I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$,
and so the mutual information is symmetric in its arguments. A conditional
form of the mutual information can be formulated analogously:
\begin{equation}
	I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual
informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
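As a concrete illustration, the following minimal Python sketch (ours, not
drawn from any of the cited implementations; distributions are assumed to be
given as NumPy arrays) computes the entropy, the information \eqrf{info} in
new data about $X$, and the mutual information of a joint distribution:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    """H(X) in bits for a discrete distribution p."""
    p = p[p > 0]                      # 0 log 0 = 0 by convention
    return float(-np.sum(p * np.log2(p)))

def information(post, prior):
    """D(post || prior): information in new data about X."""
    m = post > 0
    return float(np.sum(post[m] * np.log2(post[m] / prior[m])))

def mutual_information(pxy):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2), from the joint pxy."""
    return (entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0))
            - entropy(pxy.ravel()))

prior = np.array([0.5, 0.25, 0.25])
post = np.array([0.8, 0.1, 0.1])      # revised beliefs after seeing D
print(entropy(prior), information(post, prior))
\end{lstlisting}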
\begin{fig}{venn-example}
	\newcommand\rad{2.2em}%
	\newcommand\circo{circle (3.4em)}%
	\newcommand\labrad{4.3em}
	\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
	\newcommand\colsep{\ }
	\newcommand\clipin[1]{\clip (#1) \circo;}%
	\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
	\newcommand\cliptwo[3]{%
		\begin{scope}
			\clipin{#1};
			\clipin{#2};
			\clipout{#3};
			\fill[black!30] \bound;
		\end{scope}
	}%
	\newcommand\clipone[3]{%
		\begin{scope}
			\clipin{#1};
			\clipout{#2};
			\clipout{#3};
			\fill[black!15] \bound;
		\end{scope}
	}%
	\begin{tabular}{c@{\colsep}c}
		\begin{tikzpicture}[baseline=0pt]
			\coordinate (p1) at (90:\rad);
			\coordinate (p2) at (210:\rad);
			\coordinate (p3) at (-30:\rad);
			\clipone{p1}{p2}{p3};
			\clipone{p2}{p3}{p1};
			\clipone{p3}{p1}{p2};
			\cliptwo{p1}{p2}{p3};
			\cliptwo{p2}{p3}{p1};
			\cliptwo{p3}{p1}{p2};
			\begin{scope}
				\clip (p1) \circo;
				\clip (p2) \circo;
				\clip (p3) \circo;
				\fill[black!45] \bound;
			\end{scope}
			\draw (p1) \circo;
			\draw (p2) \circo;
			\draw (p3) \circo;
			\path
				(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
				(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
				(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
				(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
				(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
				(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
				(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
				;
			\path
				(p1) +(140:\labrad) node {$X_1$}
				(p2) +(-140:\labrad) node {$X_2$}
				(p3) +(-40:\labrad) node {$X_3$};
		\end{tikzpicture}
		&
		\parbox{0.5\linewidth}{
			\small
			\begin{align*}
				I_{1|23} &= H(X_1|X_2,X_3) \\
				I_{13|2} &= I(X_1;X_3|X_2) \\
				I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
				I_{12|3} + I_{123} &= I(X_1;X_2)
			\end{align*}
		}
	\end{tabular}
	\caption{
		I-diagram visualisation of entropies and mutual informations
		for three random variables $X_1$, $X_2$ and $X_3$. The areas of
		the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
		The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
		The central area $I_{123}$ is the co-information \cite{McGill1954}.
		Some other information measures are indicated in the legend.
	}
\end{fig}


\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
random variables, infinite in both directions,
and that $\mu$ is the associated probability measure over all
realisations of the sequence---in the following, $\mu$ will simply serve
as a label for the process. We can identify a number of information-theoretic
measures meaningful in the context of a sequential observation of the sequence, during
which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
We will write the actually observed value of $X_t$ as $x_t$, and
the sequence of observations up to but not including $x_t$ as
$\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.

The in-context surprisingness of the observation $X_t=x_t$ depends on
both $x_t$ and the context $\past{x}_t$:
\begin{equation}
	\ell_t = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed to be $x_t$, the observer can compute
its \emph{expected} surprisingness as a measure of its uncertainty about
the very next event; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the
\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$
can be understood as \emph{subjective} information dynamic measures, since they are
based on the observer's probability model in the context of the actually observed sequence
$\past{x}_t$---they characterise what it is like to `be in the observer's shoes'.
If we view the observer as a purely passive or reactive agent, this would
probably be sufficient, but for active agents such as humans or animals, it is
often necessary to \emph{anticipate} future events in order, for example, to plan the
most effective course of action. It makes sense for such observers to be
concerned about the predictive probability distribution over future events,
$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
\emph{given} the observed past $\past{X}_t=\past{x}_t$.
Referring to the definition of information \eqrf{info}, this is the KL divergence
between prior and posterior distributions over possible futures, which, written out in full, is
\begin{equation}
	\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
		p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of the attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.

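To illustrate, for an observer whose model is a first-order Markov chain (a
simplifying assumption for the sketch; the definitions above apply to
arbitrary predictive models), the surprisingness and expected surprisingness
can be traced along a symbol sequence in a few lines of Python:
\begin{lstlisting}[language=Python]
import numpy as np

def trace_surprise(seq, trans, init):
    """l_t = -log2 p(x_t|past) and expected surprisingness
    H(X_t | observed past) along a sequence of integer symbols.
    trans[i, j] = Pr(X_t = i | X_{t-1} = j); init is the
    observer's initial predictive distribution."""
    ell, exp_ell = [], []
    p = init                     # predictive distribution over X_t
    for x in seq:
        q = p[p > 0]
        exp_ell.append(float(-np.sum(q * np.log2(q))))
        ell.append(float(-np.log2(p[x])))
        p = trans[:, x]          # condition on the new observation
    return ell, exp_ell
\end{lstlisting}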
\subsection{Information measures for stationary random processes}
\label{s:process-info}


\begin{fig}{predinfo-bg}
	\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
	\newcommand\rad{1.8em}%
	\newcommand\ovoid[1]{%
		++(-#1,\rad)
		-- ++(2 * #1,0em) arc (90:-90:\rad)
		-- ++(-2 * #1,0em) arc (270:90:\rad)
	}%
	\newcommand\axis{2.75em}%
	\newcommand\olap{0.85em}%
	\newcommand\offs{3.6em}
	\newcommand\colsep{\hspace{5em}}
	\newcommand\longblob{\ovoid{\axis}}
	\newcommand\shortblob{\ovoid{1.75em}}
	\begin{tabular}{c@{\colsep}c}
		\subfig{(a) multi-information and entropy rates}{%
			\begin{tikzpicture}%[baseline=-1em]
				\newcommand\rc{1.75em}
				\newcommand\throw{2.5em}
				\coordinate (p1) at (180:1.5em);
				\coordinate (p2) at (0:0.3em);
				\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
				\newcommand\present{(p2) circle (\rc)}
				\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
				\newcommand\fillclipped[2]{%
					\begin{scope}[even odd rule]
						\foreach \thing in {#2} {\clip \thing;}
						\fill[black!#1] \bound;
					\end{scope}%
				}%
				\fillclipped{30}{\present,\bound \thepast}
				\fillclipped{15}{\present,\bound \thepast}
				\fillclipped{45}{\present,\thepast}
				\draw \thepast;
				\draw \present;
				\node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
				\node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
				\path (p2) +(90:3em) node {$X_0$};
				\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
				\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
			\end{tikzpicture}}%
		\\[1.25em]
		\subfig{(b) excess entropy}{%
			\newcommand\blob{\longblob}
			\begin{tikzpicture}
				\coordinate (p1) at (-\offs,0em);
				\coordinate (p2) at (\offs,0em);
				\begin{scope}
					\clip (p1) \blob;
					\clip (p2) \blob;
					\fill[lightgray] (-1,-1) rectangle (1,1);
				\end{scope}
				\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
				\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
				\path (0,0) node (future) {$E$};
				\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
				\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
			\end{tikzpicture}%
		}%
		\\[1.25em]
		\subfig{(c) predictive information rate $b_\mu$}{%
			\begin{tikzpicture}%[baseline=-1em]
				\newcommand\rc{2.1em}
				\newcommand\throw{2.5em}
				\coordinate (p1) at (210:1.5em);
				\coordinate (p2) at (90:0.7em);
				\coordinate (p3) at (-30:1.5em);
				\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
				\newcommand\present{(p2) circle (\rc)}
				\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
				\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
				\newcommand\fillclipped[2]{%
					\begin{scope}[even odd rule]
						\foreach \thing in {#2} {\clip \thing;}
						\fill[black!#1] \bound;
					\end{scope}%
				}%
				\fillclipped{80}{\future,\thepast}
				\fillclipped{30}{\present,\future,\bound \thepast}
				\fillclipped{15}{\present,\bound \future,\bound \thepast}
				\draw \future;
				\fillclipped{45}{\present,\thepast}
				\draw \thepast;
				\draw \present;
				\node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
				\node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
				\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
				\path (p2) +(140:3em) node {$X_0$};
				% \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
				\path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
				\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
				\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
				\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
			\end{tikzpicture}}%
		\\[0.5em]
	\end{tabular}
	\caption{
		I-diagrams for several information measures in
		stationary random processes. Each circle or oval represents a random
		variable or sequence of random variables relative to time $t=0$. Overlapped areas
		correspond to various mutual informations as in \Figrf{venn-example}.
		In (a) and (c), the circle represents the `present'. Its total area is
		$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
		rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
		information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. The small dark
		region below $X_0$ in (c) is $\sigma_\mu = E-\rho_\mu$.
	}
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary,
then it becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the
process as a whole, rather than on a moment-by-moment basis. Some of these,
such as the entropy rate, are well-known, but others have only recently been
investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

The \emph{entropy rate} of the process is the entropy of the next variable
$X_t$ given all the previous ones:
\begin{equation}
	\label{eq:entro-rate}
	h_\mu = H(X_t|\past{X}_t).
\end{equation}
The entropy rate gives a measure of the overall randomness
or unpredictability of the process.

The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
notation for what he called the `information rate') is the mutual
information between the `past' and the `present':
\begin{equation}
	\label{eq:multi-info}
	\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
\end{equation}
It is a measure of how much the context of an observation (that is,
the observation of previous elements of the sequence) helps in predicting
or reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983}
is the mutual information between
the entire `past' and the entire `future':
\begin{equation}
	E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}
Both the excess entropy and the multi-information rate can be thought
of as measures of \emph{redundancy}, quantifying the extent to which
the same information is to be found in all parts of the sequence.


The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the average information in one observation about the infinite future given the infinite past,
and is defined as a conditional mutual information:
\begin{equation}
	\label{eq:PIR}
	b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
\end{equation}
Equation \eqrf{PIR} can be read as the average reduction
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
% \IXZ_t
	b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
% \label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
is the \emph{residual} \cite{AbdallahPlumbley2010},
or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
These relationships are illustrated in \Figrf{predinfo-bg}, along with
several of the information measures we have discussed so far.


James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
rate and also examine some related measures. In particular, they identify
$\sigma_\mu = E - \rho_\mu$, the difference between the excess entropy and the
multi-information rate, as an interesting quantity that measures the predictive
benefit of model-building (that is, of maintaining an internal state summarising
past observations in order to make better predictions).
% They also identify
% $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
% information} rate.


\subsection{First and higher order Markov chains}
First order Markov chains are the simplest non-trivial models to which information
dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
expressions for all the information measures described in \secrf{process-info} for
irreducible stationary Markov chains (\ie those that have a unique stationary
distribution). The derivation is greatly simplified by the dependency structure
of the Markov chain: for the purpose of the analysis, the `past' and `future'
segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that
the predictive information rate can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
the alphabet $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(\ev(X_t=i|X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
with transition matrix $a$, then the predictive information rate $b(a)$ is
\begin{equation}
	b(a) = h(a^2) - h(a),
\end{equation}
where $a^2$, the transition matrix squared, is the transition matrix
of the `skip one' Markov chain obtained by jumping two steps at a time
along the original chain.

Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the high order Markov chain. If we are dealing
with an $N$th order model, this is done by forming a new alphabet of size $K^N$
consisting of all possible $N$-tuples of symbols from the base alphabet.
An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
$(x_{t+1},\ldots,x_{t+N})$ from the base model. The next
observation $\hat{x}_{t+1}$ encodes the block of $N$ observations obtained by shifting the previous
block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$. Adopting the label $\mu$ for the order $N$ system,
we obtain:
\begin{equation}
	h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.

Different kinds of Markov chain maximise the different measures: $h_\mu$ is
maximised by uncorrelated `white' sequences with no temporal structure, and
$\rho_\mu$ and $E$ by periodic chains, while $b_\mu$ is highest for chains
that are neither completely regular nor completely random. We return to
this in \secrf{composition}.

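The first-order case is compact enough to state in code. The following Python
sketch (ours; the helper names are hypothetical, and the chain is assumed
irreducible and given as a column-stochastic NumPy array with
$a_{ij} = \Pr(\ev(X_t=i|X_{t-1}=j))$) computes $h_\mu$, $\rho_\mu$ and
$b_\mu = h(a^2) - h(a)$:
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(a):
    """Stationary distribution: eigenvector of a for eigenvalue 1."""
    vals, vecs = np.linalg.eig(a)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def h(a):
    """Entropy rate h(a) in bits: the entropy of each column of a,
    averaged under the stationary distribution."""
    pi = stationary(a)
    logs = np.log2(np.where(a > 0, a, 1))    # log 0 terms become 0
    return float(-np.sum(a * logs, axis=0) @ pi)

def info_measures(a):
    pi = stationary(a)
    hmu = h(a)
    rho = float(-np.sum(pi * np.log2(pi))) - hmu  # H(X_t) - h_mu
    b = h(a @ a) - hmu                            # b(a) = h(a^2) - h(a)
    return hmu, rho, b
\end{lstlisting}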
\section{Information Dynamics in Analysis}

\begin{fig}{twopages}
	\colfig[0.96]{matbase/fig9471} % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
	\vspace*{1em}
	\colfig[0.97]{matbase/fig13377} % rule based analysis
	\caption{Analysis of \emph{Two Pages}.
		The thick vertical lines are the part boundaries as indicated in
		the score by the composer.
		The thin grey lines
		indicate changes in the melodic `figures' of which the piece is
		constructed. In the `model information rate' panel, the black asterisks
		mark the
		six most surprising moments selected by Keith Potter.
		The bottom panel shows a rule-based boundary strength analysis computed
		using Cambouropoulos' LBDM.
		All information measures are in nats and time is in notes.
	}
\end{fig}

\subsection{Musicological Analysis}
In \cite{AbdallahPlumbley2009}, methods based on the theory described above
were used to analyse two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was tracked (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. The results are summarised in \figrf{twopages}:
the upper four plots show the dynamically evolving subjective information
measures as described in \secrf{surprise-info-seq} computed using a point
estimate of the current transition matrix, but the fifth plot (the `model information rate')
measures the information in each observation about the transition matrix.
In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
is actually a component of the true IPI in
a time-varying Markov chain, which was neglected when we computed the IPI from
point estimates of the transition matrix as if the transition probabilities
were constant.

The peaks of the surprisingness and both components of the predictive information
show good correspondence with the structure of the piece, both as marked in the score
and as analysed by musicologist Keith Potter, who was asked to mark the six
`most surprising moments' of the piece (shown as asterisks in the fifth plot)%
\footnote{%
	Note that the boundary marked in the score at around note 5,400 is known to be
	anomalous; on the basis of a listening analysis, some musicologists [ref] have
	placed the boundary a few bars later, in agreement with our analysis.}.

In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} the piece:
the boundary strength function shows no tendency to peak at
the boundaries in the piece.


\begin{fig}{metre}
% \scalebox{1}[1]{%
	\begin{tabular}{cc}
		\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
		\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
		\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
% \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
% \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
% \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
% \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
% \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
% \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}
	\end{tabular}%
% }
	\caption{Metrical analysis by computing the average surprisingness and
		informativeness of notes at different periodicities (\ie hypothetical
		bar lengths) and phases (\ie positions within a bar).
	}
\end{fig}

\subsection{Content analysis/Sound Categorisation}
Using analogous definitions of differential entropy, the methods outlined
in the previous section are equally applicable to continuous random variables.
In the case of music, where expressive properties such as dynamics, tempo,
timing and timbre are readily quantified on a continuous scale, the information
dynamics framework may thus also be applied.

In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian
processes. For such processes, the entropy rate may be obtained analytically
from the power spectral density of the signal, allowing the multi-information
rate to be subsequently obtained. Local stationarity is assumed, which may
be achieved by windowing or change point detection \cite{Dubnov2008}.
%TODO: mention extension to non-Gaussian processes
Similarly, the predictive information
rate may be computed using a Gaussian linear formulation CITE. In this view,
the PIR is a function of the correlation between random innovations supplied
to the stochastic process.
%Dubnov, MacAdams, Reynolds (2006)
%Bailes and Dean (2009)

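Concretely, the entropy rate of a stationary Gaussian process is given by the
Kolmogorov--Szeg\H{o} formula,
$h_\mu = \frac{1}{2}\log(2\pi e) + \frac{1}{4\pi}\int_{-\pi}^{\pi}\log S(\omega)\,\mathrm{d}\omega$,
from which $\rho_\mu = H(X_0) - h_\mu$ follows. A minimal sketch (ours,
assuming the PSD is strictly positive and sampled on a uniform grid over
$[-\pi,\pi)$, \eg from a windowed periodogram) is:
\begin{lstlisting}[language=Python]
import numpy as np

def gaussian_rates(psd):
    """h_mu and rho_mu (in nats) of a stationary Gaussian process,
    given samples of its power spectral density over [-pi, pi)."""
    log_gm = np.mean(np.log(psd))        # log geometric mean of S
    var = np.mean(psd)                   # variance: (1/2pi) int S
    h = 0.5 * np.log(2 * np.pi * np.e) + 0.5 * log_gm
    rho = 0.5 * (np.log(var) - log_gm)   # H(X_0) - h_mu
    return h, rho
\end{lstlisting}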
% outline notes, covered by the prose above:
% \begin{itemize}
% \item Continuous domain information
% \item Audio based music expectation modelling
% \item Proposed model for Gaussian processes
% \end{itemize}


\subsection{Beat Tracking}

A probabilistic method for drum tracking was presented by Robertson
\cite{Robertson11c}. The algorithm is used to synchronise a music
sequencer to a live drummer. The expected beat time of the sequencer is
represented by a click track, and the algorithm takes as input event
times for discrete kick and snare drum events relative to this click
track. These are obtained using dedicated microphones for each drum and
a percussive onset detector (Puckette 1998). The drum tracker
continually updates distributions for tempo and phase on receiving a new
event time. We can thus quantify the information contributed by an event
by measuring the difference between the system's prior distribution and
the posterior distribution using the Kullback-Leibler divergence.

Here, we have calculated the KL divergence and entropy for kick and
snare events in sixteen files. The analysis of information rates can be
considered \emph{subjective}, in that it measures how the drum tracker's
probability distributions change, and these are contingent upon the
model used as well as external properties in the signal. We expect,
however, that following periods of increased uncertainty, such as fills
or expressive timing, the information contained in an individual event
increases. We also examine whether the information is dependent upon
metrical position.
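Schematically (the belief representation in \cite{Robertson11c} may differ in
detail), if the tracker's belief state is a discretised joint distribution
over tempo and phase, the information contributed by an event is the KL
divergence of \eqrf{info}:
\begin{lstlisting}[language=Python]
import numpy as np

def event_information(prior, likelihood):
    """Bits gained about (tempo, phase) from one drum event.
    prior and likelihood are arrays over a tempo-phase grid."""
    post = prior * likelihood            # Bayesian update ...
    post /= post.sum()                   # ... and renormalise
    m = post > 0
    return float(np.sum(post[m] * np.log2(post[m] / prior[m])))
\end{lstlisting}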

\section{Information dynamics as compositional aid}
\label{s:composition}

\begin{fig}{wundt}
	\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
	{\ {\large$\longrightarrow$}\ }
	\raisebox{-4em}{\colfig[0.43]{wundt2}}
	\caption{
		The Wundt curve relating randomness/complexity with
		perceived value. Repeated exposure sometimes results
		in a move to the left along the curve \cite{Berlyne71}.
	}
\end{fig}

The use of stochastic processes in music composition has been widespread for
decades---for instance, Iannis Xenakis applied probabilistic mathematical models
to the creation of musical materials \cite{Xenakis:1992ul}. Information dynamics
can serve as a novel framework for exploring the possibilities of stochastic and
algorithmic processes: outputs can be filtered to match a set of criteria defined
in terms of the information dynamics model, these criteria thus becoming a means
of interfacing with the generative processes. This allows a composer to explore
musical possibilities at the high and abstract level of expectation, randomness
and predictability.


%It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials.

%For instance a stochastic music generating process could be controlled by modifying
%constraints on its output in terms of predictive information rate or entropy
%rate.



\subsection{The Melody Triangle}

\begin{figure}
	\centering
	\includegraphics[width=\linewidth]{figs/mtriscat}
	\caption{The population of transition matrices distributed along three axes of
		redundancy, entropy rate and predictive information rate (all measured in bits).
		The concentrations of points along the redundancy axis correspond
		to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
		3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
		represents its PIR---note that the highest values are found at intermediate entropy
		and redundancy, and that the distribution as a whole makes a curved triangle. Although
		not visible in this plot, it is largely hollow in the middle.
		\label{InfoDynEngine}}
\end{figure}

The Melody Triangle is an exploratory interface for the discovery of melodic
content, where the input---a position within a triangle---maps directly to information-theoretic
measures of the output. The measures---entropy rate, redundancy and
predictive information rate---form the criteria with which to filter the output
of the stochastic processes used to generate sequences of notes. These measures
address notions of expectation and surprise in music, and as such the Melody
Triangle is a means of interfacing with a generative process in terms of the
predictability of its output.

The triangle is `populated' with possible parameter values for melody generators.
These are plotted in a 3-D information space of $\rho_\mu$ (redundancy), $h_\mu$ (entropy rate) and
$b_\mu$ (predictive information rate), as defined in \secrf{process-info}.
In our case we generated thousands of transition matrices, representing first-order
Markov chains, by a random sampling method. In Figure~\ref{InfoDynEngine} we
see a representation of how these matrices are distributed in the 3-D statistical
space; each one of these points corresponds to a transition matrix.

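The sampling scheme is incidental to the interface; as one plausible scheme
(an illustrative assumption on our part, not a description of the deployed
system), matrices with a wide spread of information `coordinates' can be drawn
from sparse Dirichlet distributions and located in the space using the
measures of \secrf{process-info}:
\begin{lstlisting}[language=Python]
import numpy as np
# info_measures() as in the earlier Markov chain sketch
K, N = 8, 5000
coords = []
for _ in range(N):
    # each column is Dirichlet(0.1,...,0.1): mostly sparse columns
    a = np.random.dirichlet(np.full(K, 0.1), size=K).T
    coords.append(info_measures(a))  # (h_mu, rho_mu, b_mu)
coords = np.array(coords)            # one 3-D point per matrix
\end{lstlisting}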
The distribution of transition matrices plotted in this space forms an arch shape
that is fairly thin. It thus becomes a reasonable approximation to pretend that
it is just a sheet in two dimensions, and so we stretch out this curved arc into
a flat triangle. It is this triangular sheet that is our `Melody Triangle' and
forms the interface by which the system is controlled. Using this interface
thus involves a mapping to statistical space: a user selects a position within
the triangle, and a corresponding transition matrix is returned. Figure
\ref{TheTriangle} shows how the triangle maps to different measures of redundancy,
entropy rate and predictive information rate.


Each corner corresponds to one of three different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity', `noise'
and `repetition'. Melodies from the `noise' corner have no discernible pattern;
they have high entropy rate, low predictive information rate and low redundancy.
These melodies are essentially totally random. Melodies along the `periodicity'
to `repetition' edge are all deterministic loops that get shorter as we approach
the `repetition' corner, until each becomes just one repeating note. It is the
areas in between the extremes that provide the more `interesting' melodies.
These melodies have some level of unpredictability, but are not completely random;
or, conversely, they are predictable, but not entirely so.

\begin{figure}
	\centering
	\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
	\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}

%PERHAPS WE SHOULD FOREGO TALKING ABOUT THE
%INSTALLATION VERSION OF THE TRIANGLE?
%feels a bit like a tangent, and could do with the space..
The Melody Triangle exists in two incarnations: a standard screen-based interface
where a user moves tokens in and around a triangle on screen, and a multi-user
interactive installation where a Kinect camera tracks individuals in a space and
maps their positions in physical space to the triangle. In the latter, each visitor
that enters the installation generates a melody and can collaborate with their
co-visitors to generate musical textures---a playful yet informative way to
explore expectation and surprise in music. Additionally, visitors can change the
tempo, register, instrumentation and periodicity of their melody with body gestures.

As a screen-based interface, the Melody Triangle can serve as a composition tool.
A triangle is drawn on the screen, with screen space thus mapped to the statistical
space of the Melody Triangle. A number of tokens, each representing a
melody, can be dragged in and around the triangle. For each token, a sequence of symbols with
statistical properties that correspond to the token's position is generated. These
symbols are then mapped to notes of a scale\footnote{However, they could just as well be mapped to any other property, such as intervals, chords, dynamics and timbres. It is even possible to map the symbols to non-sonic outputs, such as colours. The possibilities afforded by the Melody Triangle in these other domains remain to be investigated.}.
Additionally, keyboard commands give control over other musical parameters.

The Melody Triangle can generate intricate musical textures when multiple tokens are in the triangle.
Unlike other computer-aided composition tools or programming environments, here the composer engages with music on a high and abstract level, the interface relating to subjective expectation and predictability.




\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?


Information measures on a stream of symbols can form a feedback mechanism; a
rudimentary `critic' of sorts. For instance, symbol-by-symbol measures of predictive
information rate, entropy rate and redundancy could tell us if a stream of symbols
is currently `boring', either because it is too repetitive, or because it is too
chaotic. Such feedback would be oblivious to long-term and large-scale
structures and any cultural norms (such as style conventions), but
could nonetheless provide a composer with valuable insight into
the short-term properties of a work. This could not only be used for the
evaluation of pre-composed streams of symbols, but could also provide real-time
feedback in an improvisatory setup.

\section{Musical Preference and Information Dynamics}
We are carrying out a study to investigate the relationship between musical
preference and the information dynamics models, with the experimental interface
being a simplified version of the screen-based Melody Triangle. Participants are asked
to use this music pattern generator under various experimental conditions in a
composition task. The data collected include usage statistics of the system:
where in the triangle participants place the tokens, how long they leave them there, and
the state of the system when users, by pressing a key, indicate that they like
what they are hearing. As such, the experiments will help us identify any
correlation between the information-theoretic properties of a stream and its
perceived aesthetic worth.


%\emph{comparable system} Gordon Pask's Musicolour (1953) applied a similar notion
%of boredom in its design. The Musicolour would react to audio input through a
%microphone by flashing coloured lights. Rather than a direct mapping of sound
%to light, Pask designed the device to be a partner to a performing musician. It
%would adapt its lighting pattern based on the rhythms and frequencies it would
%hear, quickly `learning' to flash in time with the music. However Pask endowed
%the device with the ability to `be bored'; if the rhythmic and frequency content
%of the input remained the same for too long it would listen for other rhythms
%and frequencies, only lighting when it heard these. As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.


\section{Conclusion}

We have reviewed the theoretical foundations of information dynamics: subjective,
moment-by-moment measures of surprise and predictive information, and their
process-level counterparts for stationary random processes, including Markov
chains. We have also outlined emerging applications of these measures in
musicological analysis, content analysis, beat tracking, compositional tools
such as the Melody Triangle, and the experimental study of musical preference.


\section*{Acknowledgments}
This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE), GR/S82213/01 and EP/E045235/1 (SA), an EPSRC Leadership Fellowship, EP/G007144/1 (MDP) and EPSRC IDyOM2 EP/H013059/1.

\bibliographystyle{unsrt}
{\bibliography{all,c4dm,nime,andrew}}
\end{document}