\documentclass[conference]{IEEEtran}
\usepackage{cite}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(k)}}
\def\bet(#1,#2){#1..#2}


\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}

%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an\\Information Dynamics Approach}

\author{
\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
\IEEEauthorblockA{Centre for Digital Music\\
Queen Mary University of London\\
Mile End Road, London E1 4NS}}

\maketitle
\begin{abstract}
People take in information when perceiving music, and with it they continually
build predictive models of what is going to happen. There is a relationship
between information measures and how we perceive music, which makes an
information-theoretic approach to music cognition a fruitful avenue of research.
In this paper, we review the theoretical foundations of information dynamics
and discuss a few emerging areas of application.
\end{abstract}


\section{Introduction}
\label{s:Intro}

\subsection{Expectation and surprise in music}
One of the effects of listening to music is to create
expectations of what is to come next, which may be fulfilled
immediately, after some delay, or not at all as the case may be.
This is the thesis put forward by, amongst others, the music theorists
L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but it was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the
act of listening to music, and which converts it to a source of pleasure,
is \ldots the intellectual satisfaction
which the listener derives from continually following and anticipating
the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon
that `unfolds' in time, rather than being apprehended as a static object
presented in its entirety. Meyer argued that musical experience depends
on how we change and revise our conceptions \emph{as events happen}, on
how expectation and prediction interact with occurrence, and that, to a
large degree, the way to understand the effect of music is to focus on
this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts
and can be treated mathematically using probability theory.
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.


\comment{
The business of making predictions and assessing surprise is essentially
one of reasoning under conditions of uncertainty and manipulating
degrees of belief about the various proposition which may or may not
hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best
quantified in terms of Bayesian probability theory.
Thus, we suppose that
when we listen to music, expectations are created on the basis of our
familiarity with various stylistic norms that apply to music in general,
the particular style (or styles) of music that seem best to fit the piece
we are listening to, and
the emerging structures peculiar to the current piece. There is
experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
\citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
}

\subsection{Music and information theory}
With a probabilistic framework for music modelling and prediction in hand,
we are in a position to apply Shannon's quantitative information theory
\cite{Shannon48}.
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
what Berlyne \cite{Berlyne71} called `collative variables' since they are
to do with patterns of occurrence rather than medium-specific details.
Berlyne sought to show that the collative variables are closely related to
perceptual qualities like complexity, tension, interestingness,
and even aesthetic value, not just in music, but in other temporal
or visual media.
The relevance of information theory to music and art has
also been addressed by researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
The relationship between information theory and music and art in general has been the
subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective
states like uncertainty, surprise, complexity, tension, and interestingness
are closely related to
information-theoretic quantities like entropy, relative entropy,
and mutual information.
% and are major determinants of the overall experience.
Berlyne \cite{Berlyne71} called such quantities `collative variables', since
they are to do with patterns of occurrence rather than medium-specific details,
and developed the ideas of `information aesthetics' in an experimental setting.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.

% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}

Bringing the various strands together, our working hypothesis is that as a
listener (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic model that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the immediate context of the piece. As events unfold, it revises
its probabilistic belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy. By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

One of the consequences of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The probabilistic and information
theoretic concepts in terms of which the analysis is framed are universal to all sorts
of data.
In addition, when adaptive probabilistic models are used, expectations are
created mainly in response to \emph{patterns} of occurrence,
rather than the details of which specific things occur.
Together, these suggest that an information dynamic analysis captures a
high level of \emph{abstraction}, and could be used to
make structural comparisons between different temporal media,
such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a principled way
to address the notion of \emph{subjectivity}, since the analysis is dependent on the
probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
variation in subjects' responses over time are
fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.


\section{Theoretical review}

\subsection{Entropy and information}
\label{s:entro-info}

Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set $\X$ and a
probability distribution representing the observer's beliefs about the
true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy of the random variable $H(X)$. For a discrete variable
with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
\end{equation}
% where $\expect{}$ is the expectation operator.
The negative log probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and
hence the entropy is the expectation of the surprisingness $\expect \ell(X)$.

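As a concrete numerical illustration (not part of the formal development;
the distribution below is an arbitrary example), the following Python
sketch computes the surprisingness of each value and the entropy as the
expectation of surprisingness, using base-2 logarithms so that the results
are in bits:
\begin{lstlisting}[language=Python]
import numpy as np

# An arbitrary example distribution over a four-symbol alphabet.
p = np.array([0.5, 0.25, 0.125, 0.125])

# Surprisingness (negative log probability, in bits) of each value.
ell = -np.log2(p)          # [1, 2, 3, 3]

# Entropy: the expected surprisingness.
H = np.sum(p * ell)        # 1.75 bits
\end{lstlisting}
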
Now suppose that the observer receives some new data $\Data$ that
causes a revision of its beliefs about $X$. The \emph{information}
in this new data \emph{about} $X$ can be quantified as the
Kullback-Leibler (KL) divergence between the prior and posterior
distributions $p(x)$ and $p(x|\Data)$ respectively:
\begin{equation}
\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
\label{eq:info}
\end{equation}
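To illustrate \eqrf{info}, the following sketch computes the information
in some data about $X$ from a prior and a posterior distribution; both
distributions are made-up examples, not derived from any particular model:
\begin{lstlisting}[language=Python]
import numpy as np

def information(posterior, prior):
    """KL divergence D(posterior || prior) in bits."""
    mask = posterior > 0
    return np.sum(posterior[mask] * np.log2(posterior[mask] / prior[mask]))

prior = np.array([0.25, 0.25, 0.25, 0.25])   # beliefs about X before the data
post  = np.array([0.70, 0.10, 0.10, 0.10])   # beliefs about X after the data

print(information(post, prior))   # ~0.64 bits of information about X
\end{lstlisting}
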
When there are multiple variables $X_1, X_2$
\etc which the observer believes to be dependent, then the observation of
one may change its beliefs and hence yield information about the
others. The joint and conditional entropies as described in any
textbook on information theory (\eg \cite{CoverThomas}) then quantify
the observer's expected uncertainty about groups of variables given the
values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information
in an observation of $X_2$ about $X_1$ and the expected reduction
in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
and so the mutual information is symmetric in its arguments. A conditional
form of the mutual information can be formulated analogously:
\begin{equation}
I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual
informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.

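All of the quantities in \figrf{venn-example} can be computed directly from
a joint distribution. As a minimal sketch (the two-variable joint below is
an arbitrary example), the mutual information can be obtained from marginal
and joint entropies:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Arbitrary example joint distribution p(x1, x2) over two binary variables.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])

H1  = entropy(pxy.sum(axis=1))       # H(X1)
H2  = entropy(pxy.sum(axis=0))       # H(X2)
H12 = entropy(pxy.ravel())           # H(X1, X2)

I = H1 - (H12 - H2)                  # I(X1;X2) = H(X1) - H(X1|X2)
print(I)                             # ~0.28 bits, symmetric in X1 and X2
\end{lstlisting}
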
\begin{fig}{venn-example}
\newcommand\rad{2.2em}%
\newcommand\circo{circle (3.4em)}%
\newcommand\labrad{4.3em}
\newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
\newcommand\colsep{\ }
\newcommand\clipin[1]{\clip (#1) \circo;}%
\newcommand\clipout[1]{\clip \bound (#1) \circo;}%
\newcommand\cliptwo[3]{%
\begin{scope}
\clipin{#1};
\clipin{#2};
\clipout{#3};
\fill[black!30] \bound;
\end{scope}
}%
\newcommand\clipone[3]{%
\begin{scope}
\clipin{#1};
\clipout{#2};
\clipout{#3};
\fill[black!15] \bound;
\end{scope}
}%
\begin{tabular}{c@{\colsep}c}
\begin{tikzpicture}[baseline=0pt]
\coordinate (p1) at (90:\rad);
\coordinate (p2) at (210:\rad);
\coordinate (p3) at (-30:\rad);
\clipone{p1}{p2}{p3};
\clipone{p2}{p3}{p1};
\clipone{p3}{p1}{p2};
\cliptwo{p1}{p2}{p3};
\cliptwo{p2}{p3}{p1};
\cliptwo{p3}{p1}{p2};
\begin{scope}
\clip (p1) \circo;
\clip (p2) \circo;
\clip (p3) \circo;
\fill[black!45] \bound;
\end{scope}
\draw (p1) \circo;
\draw (p2) \circo;
\draw (p3) \circo;
\path
(barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
(barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
(barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
(barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
(barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
(barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
(barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
;
\path
(p1) +(140:\labrad) node {$X_1$}
(p2) +(-140:\labrad) node {$X_2$}
(p3) +(-40:\labrad) node {$X_3$};
\end{tikzpicture}
&
\parbox{0.5\linewidth}{
\small
\begin{align*}
I_{1|23} &= H(X_1|X_2,X_3) \\
I_{13|2} &= I(X_1;X_3|X_2) \\
I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
I_{12|3} + I_{123} &= I(X_1;X_2)
\end{align*}
}
\end{tabular}
\caption{
I-diagram visualisation of entropies and mutual informations
for three random variables $X_1$, $X_2$ and $X_3$. The areas of
the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
The central area $I_{123}$ is the co-information \cite{McGill1954}.
Some other information measures are indicated in the legend.
}
\end{fig}


\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
random variables, infinite in both directions,
and that $\mu$ is the associated probability measure over all
realisations of the sequence---in the following, $\mu$ will simply serve
as a label for the process. We can identify a number of information-theoretic
measures meaningful in the context of a sequential observation of the sequence, during
which, at any time $t$, the sequence of variables can be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
We will write the actually observed value of $X_t$ as $x_t$, and
the sequence of observations up to but not including $x_t$ as
$\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.

The in-context surprisingness of the observation $X_t=x_t$ depends on
both $x_t$ and the context $\past{x}_t$:
\begin{equation}
\ell_t = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed to be $x_t$, the observer can compute
its \emph{expected} surprisingness as a measure of its uncertainty about
the very next event; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not on the
\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

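The distinction between these two subjective measures can be made concrete
with a small example. The sketch below uses a first-order Markov model, so
that the predictive distribution depends only on the previous symbol; the
transition matrix and the observed sequence are arbitrary illustrative
choices, with the convention $a_{ij} = \Pr(X_t=i|X_{t-1}=j)$ used for the
Markov chains discussed below:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical model: a[i, j] = P(X_t = i | X_{t-1} = j).
a = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

x = [0, 0, 1, 2, 2, 0]                 # an arbitrary observed sequence

for t in range(1, len(x)):
    pred = a[:, x[t-1]]                # predictive distribution given the context
    ell_t = -np.log2(pred[x[t]])       # surprisingness of what actually occurred
    H_t = entropy(pred)                # expected surprisingness before observing
    print(t, round(ell_t, 3), round(H_t, 3))
\end{lstlisting}
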
The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$
can be understood as \emph{subjective} information dynamic measures, since they are
based on the observer's probability model in the context of the actually observed sequence
$\past{x}_t$---they characterise what it is like to `be in the observer's shoes'.
If we view the observer as a purely passive or reactive agent, this would
probably be sufficient, but for active agents such as humans or animals, it is
often necessary to \emph{anticipate} future events in order, for example, to plan the
most effective course of action. It makes sense for such observers to be
concerned about the predictive probability distribution over future events,
$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
\emph{given} the observed past $\past{X}_t=\past{x}_t$.
Referring to the definition of information \eqrf{info}, this is the KL divergence
between prior and posterior distributions over possible futures, which, written out in full, is
\begin{equation}
\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of the attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.

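For a first-order Markov model, the expected IPI can be computed in closed
form, because once the next symbol is known the remainder of the future
carries no further information about $X_t$; the infinite future then
collapses to $X_{t+1}$. The following sketch (with the same arbitrary
transition matrix convention as above) computes the expected IPI given
that the observed past ends in symbol $j$:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical model: a[i, j] = P(X_t = i | X_{t-1} = j).
a = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
a2 = a @ a                             # two-step transition matrix

def expected_ipi(j):
    # I(X_t; future | past ends in j)
    #   = H(X_{t+1} | X_{t-1}=j) - E[ H(X_{t+1} | X_t) | X_{t-1}=j ]
    return entropy(a2[:, j]) - sum(a[i, j] * entropy(a[:, i])
                                   for i in range(a.shape[0]))

print([round(expected_ipi(j), 3) for j in range(3)])
\end{lstlisting}
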
\subsection{Information measures for stationary random processes}


\begin{fig}{predinfo-bg}
\newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
\newcommand\rad{1.8em}%
\newcommand\ovoid[1]{%
++(-#1,\rad)
-- ++(2 * #1,0em) arc (90:-90:\rad)
-- ++(-2 * #1,0em) arc (270:90:\rad)
}%
\newcommand\axis{2.75em}%
\newcommand\olap{0.85em}%
\newcommand\offs{3.6em}
\newcommand\colsep{\hspace{5em}}
\newcommand\longblob{\ovoid{\axis}}
\newcommand\shortblob{\ovoid{1.75em}}
\begin{tabular}{c@{\colsep}c}
\subfig{(a) excess entropy}{%
\newcommand\blob{\longblob}
\begin{tikzpicture}
\coordinate (p1) at (-\offs,0em);
\coordinate (p2) at (\offs,0em);
\begin{scope}
\clip (p1) \blob;
\clip (p2) \blob;
\fill[lightgray] (-1,-1) rectangle (1,1);
\end{scope}
\draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
\draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
\path (0,0) node (future) {$E$};
\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
\end{tikzpicture}%
}%
\\[1.25em]
\subfig{(b) predictive information rate $b_\mu$}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{2.1em}
\newcommand\throw{2.5em}
\coordinate (p1) at (210:1.5em);
\coordinate (p2) at (90:0.7em);
\coordinate (p3) at (-30:1.5em);
\newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
\newcommand\present{(p2) circle (\rc)}
\newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
\newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
\newcommand\fillclipped[2]{%
\begin{scope}[even odd rule]
\foreach \thing in {#2} {\clip \thing;}
\fill[black!#1] \bound;
\end{scope}%
}%
\fillclipped{30}{\present,\future,\bound \thepast}
\fillclipped{15}{\present,\bound \future,\bound \thepast}
\draw \future;
\fillclipped{45}{\present,\thepast}
\draw \thepast;
\draw \present;
\node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
\node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
\node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
\path (p2) +(140:3em) node {$X_0$};
% \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
\path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
\end{tikzpicture}}%
\\[0.5em]
\end{tabular}
\caption{
I-diagrams for several information measures in
stationary random processes. Each circle or oval represents a random
variable or sequence of random variables relative to time $t=0$. Overlapped areas
correspond to various mutual informations as in \Figrf{venn-example}.
In (b), the circle represents the `present'. Its total area is
$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
}
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary,
then it becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the
process as a whole, rather than on a moment-by-moment basis. Some of these,
such as the entropy rate, are well known, but others have only recently been
investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

The \emph{entropy rate} of the process is the entropy of the next variable
$X_t$ given all the previous ones:
\begin{equation}
\label{eq:entro-rate}
h_\mu = H(X_t|\past{X}_t).
\end{equation}
The entropy rate gives a measure of the overall randomness
or unpredictability of the process.

The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
notation for what he called the `information rate') is the mutual
information between the `past' and the `present':
\begin{equation}
\label{eq:multi-info}
\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
\end{equation}
It is a measure of how much the context of an observation (that is,
the observation of previous elements of the sequence) helps in predicting
or reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983}
is the mutual information between
the entire `past' and the entire `future':
\begin{equation}
E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}


The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the average information in one observation about the infinite future given the infinite past,
and is defined as a conditional mutual information:
\begin{equation}
\label{eq:PIR}
b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t).
\end{equation}
Equation \eqrf{PIR} can be read as the average reduction
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
% \IXZ_t
I(X_t;\fut{X}_t|\past{X}_t) = h_\mu - r_\mu,
% \label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
is the \emph{residual} \cite{AbdallahPlumbley2010},
or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
These relationships are illustrated in \Figrf{predinfo-bg}, along with
several of the information measures we have discussed so far.


James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
rate and also examine some related measures. In particular they identify
$\sigma_\mu$, the difference between the multi-information rate and the excess
entropy, as an interesting quantity that measures the predictive benefit of
model-building (that is, maintaining an internal state summarising past
observations in order to make better predictions). They also identify
$w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
information} rate.

\begin{fig}{wundt}
\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
{\ {\large$\longrightarrow$}\ }
\raisebox{-4em}{\colfig[0.43]{wundt2}}
\caption{
The Wundt curve relating randomness/complexity with
perceived value. Repeated exposure sometimes results
in a move to the left along the curve \cite{Berlyne71}.
}
\end{fig}


\subsection{First and higher order Markov chains}
First order Markov chains are the simplest non-trivial models to which information
dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
expressions for all the information measures described in \secrf{surprise-info-seq} for
irreducible stationary Markov chains (\ie those that have a unique stationary
distribution). The derivation is greatly simplified by the dependency structure
of the Markov chain: for the purpose of the analysis, the `past' and `future'
segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
and next variables $X_{t-1}$ and $X_{t+1}$ respectively. We also showed that
the predictive information rate can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
an alphabet of $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(\ev(X_t=i|X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
with transition matrix $a$, then the predictive information rate $b(a)$ is
\begin{equation}
b(a) = h(a^2) - h(a),
\end{equation}
where $a^2$, the transition matrix squared, is the transition matrix
of the `skip one' Markov chain obtained by jumping two steps at a time
along the original chain.

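A minimal numerical sketch of these closed-form expressions is given below;
the transition matrix is again an arbitrary example, and the stationary
distribution is obtained as the eigenvector of $a$ with eigenvalue one:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def stationary(a):
    """Stationary distribution of a column-stochastic transition matrix."""
    vals, vecs = np.linalg.eig(a)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()

def entropy_rate(a):
    """h(a): expected entropy of the next symbol given the previous one."""
    pi = stationary(a)
    return sum(pi[j] * entropy(a[:, j]) for j in range(len(pi)))

# Arbitrary example, a[i, j] = P(X_t = i | X_{t-1} = j).
a = np.array([[0.8, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

h = entropy_rate(a)
b = entropy_rate(a @ a) - h            # PIR: b(a) = h(a^2) - h(a)
rho = entropy(stationary(a)) - h       # multi-information rate H(X_t) - h_mu
print(round(h, 3), round(b, 3), round(rho, 3))
\end{lstlisting}
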
Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the high order Markov chain. If we are dealing
with an $N$th order model, this is done by forming a new alphabet of size $K^N$
consisting of all possible $N$-tuples of symbols from the base alphabet.
An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
$(x_{t+1},\ldots,x_{t+N})$ from the base model. The next
observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$. Adopting the label $\mu$ for the order $N$ system,
we obtain:
\begin{equation}
h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.
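
As a consistency check, the block-chain construction can be verified
numerically: since $h_\mu$ and $b_\mu$ are properties of the process rather
than of any particular representation, embedding an ordinary first-order
chain as an order-$N$ block chain should reproduce the values given by the
direct first-order formulas. The sketch below (with the same arbitrary
example matrix as before, and $N=2$) does exactly this:
\begin{lstlisting}[language=Python]
from itertools import product
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def entropy_rate(m):
    vals, vecs = np.linalg.eig(m)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    return sum(pi[j] * entropy(m[:, j]) for j in range(len(pi)))

K, N = 3, 2
a = np.array([[0.8, 0.2, 0.1],       # a[i, j] = P(X_t = i | X_{t-1} = j)
              [0.1, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

# Build the sparse K^N x K^N transition matrix over blocks of N symbols.
blocks = list(product(range(K), repeat=N))
idx = {u: n for n, u in enumerate(blocks)}
a_hat = np.zeros((K**N, K**N))
for u in blocks:                     # u encodes (x_{t+1}, ..., x_{t+N})
    for y in range(K):               # next symbol of the underlying chain
        a_hat[idx[u[1:] + (y,)], idx[u]] = a[y, u[-1]]

h_mu = entropy_rate(a_hat)                                       # equals h(a)
b_mu = entropy_rate(np.linalg.matrix_power(a_hat, N + 1)) - N * h_mu
print(round(h_mu, 4), round(b_mu, 4))
print(round(entropy_rate(a), 4),
      round(entropy_rate(a @ a) - entropy_rate(a), 4))
\end{lstlisting}
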


\section{Information Dynamics in Analysis}

\begin{fig}{twopages}
\colfig[0.96]{matbase/fig9471} % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
\vspace*{1em}
\colfig[0.97]{matbase/fig13377} % rule based analysis
\caption{Analysis of \emph{Two Pages}.
The thick vertical lines are the part boundaries as indicated in
the score by the composer.
The thin grey lines
indicate changes in the melodic `figures' of which the piece is
constructed. In the `model information rate' panel, the black asterisks
mark the
six most surprising moments selected by Keith Potter.
The bottom panel shows a rule-based boundary strength analysis computed
using Cambouropoulos' LBDM.
All information measures are in nats and time is in notes.
}
\end{fig}

\subsection{Musicological Analysis}
In \cite{AbdallahPlumbley2009}, methods based on the theory described above
were used to analyse two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was tracked (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. The results are summarised in \figrf{twopages}:
the upper four plots show the dynamically evolving subjective information
measures as described in \secrf{surprise-info-seq}, computed using a point
estimate of the current transition matrix, but the fifth plot (the `model information rate')
measures the information in each observation about the transition matrix.
In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
is actually a component of the true IPI in
a time-varying Markov chain, which was neglected when we computed the IPI from
point estimates of the transition matrix as if the transition probabilities
were constant.

The peaks of the surprisingness and both components of the predictive information
show good correspondence with the structure of the piece, both as marked in the score
and as analysed by the musicologist Keith Potter, who was asked to mark the six
`most surprising moments' of the piece (shown as asterisks in the fifth plot)%
\footnote{%
Note that the boundary marked in the score at around note 5,400 is known to be
anomalous; on the basis of a listening analysis, some musicologists [ref] have
placed the boundary a few bars later, in agreement with our analysis.}.

In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} the piece:
the boundary strength functions show no clear tendency to peak at
the boundaries in the piece.


\begin{fig}{metre}
% \scalebox{1}[1]{%
\begin{tabular}{cc}
\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
% \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
% \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
% \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
% \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
% \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
% \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}

\end{tabular}%
% }
\caption{Metrical analysis by computing average surprisingness and
informativeness of notes at different periodicities (\ie hypothetical
bar lengths) and phases (\ie positions within a bar).
}
\end{fig}

\subsection{Content analysis/Sound Categorisation}
Using analogous definitions of differential entropy, the methods outlined in the previous section are equally applicable to continuous random variables. In the case of music, where expressive properties such as dynamics, tempo, timing and timbre are readily quantified on a continuous scale, the information dynamics framework may therefore also be applied in the continuous domain.

In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian processes. For such processes, the entropy rate may be obtained analytically from the power spectral density of the signal, allowing the multi-information rate to be subsequently obtained. Local stationarity is assumed, which may be achieved by windowing or change point detection \cite{Dubnov2008}. %TODO mention non-gaussian processes extension
Similarly, the predictive information rate may be computed using a Gaussian linear formulation CITE. In this view, the PIR is a function of the correlation between random innovations supplied to the stochastic process.
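As an illustrative sketch of the spectral route to the multi-information rate
(using the Kolmogorov--Szeg\H{o} formula for the entropy rate of a stationary
Gaussian process, so that $\rho_\mu$ reduces to half the log ratio of the
arithmetic to the geometric mean of the power spectral density; the spectra
below are arbitrary examples and normalisation conventions may differ):
\begin{lstlisting}[language=Python]
import numpy as np

def multi_information_rate(psd):
    """rho (in nats) of a stationary Gaussian process from PSD samples.

    rho = H(X_0) - h_mu = 0.5*(log(mean(S)) - mean(log(S))),
    i.e. minus half the log spectral flatness.
    """
    psd = np.asarray(psd, dtype=float)
    return 0.5 * (np.log(psd.mean()) - np.mean(np.log(psd)))

# A flat spectrum (white noise): the past tells us nothing about the present.
print(multi_information_rate(np.ones(512)))                  # ~0
# A strongly coloured spectrum: the past is much more informative.
print(multi_information_rate(1.0 / (1.0 + np.arange(512))))  # > 0
\end{lstlisting}
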
%Dubnov, MacAdams, Reynolds (2006)
%Bailes and Dean (2009)

\begin{itemize}
\item Continuous domain information
\item Audio based music expectation modelling
\item Proposed model for Gaussian processes
\end{itemize}
\emph{Peter}


\subsection{Beat Tracking}
\emph{Andrew}


\section{Information dynamics as compositional aid}

In addition to applying information dynamics to analysis, it is also possible to apply it to the generation of content, such as the composition of musical materials.
The outputs of algorithmic or stochastic processes can be filtered to match a set of criteria defined in terms of the information dynamics model, and these criteria thus become a means of interfacing with the generative process.
For instance, a stochastic music generating process could be controlled by modifying constraints on its output in terms of predictive information rate or entropy rate.

The use of stochastic processes for the composition of musical material has been widespread for decades -- for instance, Iannis Xenakis applied probabilistic mathematical models to the creation of musical materials \cite{Xenakis:1992ul}.
Information dynamics can serve as a novel framework for the exploration of the possibilities of such processes at the high and abstract level of expectation, randomness and predictability.

\subsection{The Melody Triangle}

\begin{figure}
\centering
\includegraphics[width=\linewidth]{figs/mtriscat}
\caption{The population of transition matrices distributed along three axes of
redundancy, entropy rate and predictive information rate (all measured in bits).
The concentrations of points along the redundancy axis correspond
to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
represents its PIR---note that the highest values are found at intermediate entropy
and redundancy, and that the distribution as a whole makes a curved triangle. Although
not visible in this plot, it is largely hollow in the middle.
\label{InfoDynEngine}}
\end{figure}

The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input -- positions within a triangle -- directly maps to information-theoretic measures of the output.
The measures -- entropy rate, redundancy and predictive information rate -- form a set of criteria with which to filter the output of the stochastic processes used to generate sequences of notes.
These measures address notions of expectation and surprise in music, and as such the Melody Triangle is a means of interfacing with a generative process in terms of the predictability of its output.

The triangle is `populated' with possible parameter values for melody generators.
These are plotted in a three-dimensional statistical space of redundancy, entropy rate and predictive information rate.
In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method.
In figure \ref{InfoDynEngine} we see a representation of how these matrices are distributed in this statistical space; each one of these points corresponds to a transition matrix.
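A minimal sketch of how such a population might be generated and placed in
this space is given below; here `redundancy' is taken to be the
multi-information rate $\rho_\mu = H(X_t) - h_\mu$, and the alphabet size,
sample count and Dirichlet concentration are illustrative choices rather
than those actually used:
\begin{lstlisting}[language=Python]
import numpy as np

K, NUM = 8, 5000
rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def measures(a):
    """(redundancy, entropy rate, PIR) in bits for transition matrix a."""
    vals, vecs = np.linalg.eig(a)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    h = sum(pi[j] * entropy(a[:, j]) for j in range(K))
    h2 = sum(pi[j] * entropy((a @ a)[:, j]) for j in range(K))
    return entropy(pi) - h, h, h2 - h

points = []
for _ in range(NUM):
    # Each column (next-note distribution) is drawn from a sparse Dirichlet.
    a = rng.dirichlet(np.full(K, 0.1), size=K).T
    points.append(measures(a))
points = np.array(points)   # scatter the three columns to obtain a plot
                            # qualitatively like the figure above
\end{lstlisting}
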

The distribution of transition matrices plotted in this space forms an arch shape that is fairly thin.
It thus becomes a reasonable approximation to pretend that it is just a sheet in two dimensions; and so we stretch out this curved arc into a flat triangle.
It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled.
Using this interface thus involves a mapping to statistical space; a user selects a position within the triangle, and a corresponding transition matrix is returned.
Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate.

Each corner corresponds to one of three different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'.
Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy.
These melodies are essentially totally random.
Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until each becomes just one repeating note.
It is the areas in between the extremes that provide the more `interesting' melodies.
These melodies have some level of unpredictability, but are not completely random.
Or, conversely, they are predictable, but not entirely so.

\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}


The Melody Triangle exists in two incarnations: a standard screen-based interface, where a user moves tokens in and around a triangle on screen, and a multi-user interactive installation, where a Kinect camera tracks individuals in a space and maps their positions in physical space to the triangle.
In the latter, each visitor entering the installation generates a melody, and can collaborate with their co-visitors to generate musical textures -- a playful yet informative way to explore expectation and surprise in music.
Additionally, different gestures can be detected to change the tempo, register, instrumentation and periodicity of the output melody.

As a screen-based interface, the Melody Triangle can serve as a composition tool.
A triangle is drawn on the screen, with screen space thus mapped to the statistical space of the Melody Triangle.
A number of round tokens, each representing a melody, can be dragged in and around the triangle.
When a token is dragged into the triangle, the system starts generating a sequence of symbols with statistical properties that correspond to the position of the token.
These symbols are then mapped to notes of a scale.
Keyboard input allows for control over additional parameters.

The Melody Triangle can assist a composer in the creation not only of melodies but also, by placing multiple tokens in the triangle, of intricate musical textures.
Unlike other computer-aided composition tools or programming environments, here the composer engages with music on the high and abstract level of expectation, randomness and predictability.



\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?


Information measures on a stream of symbols can form a feedback mechanism; a rudimentary `critic' of sorts.
For instance, symbol-by-symbol measures of predictive information rate, entropy rate and redundancy could tell us whether a stream of symbols is currently `boring', either because it is too repetitive or because it is too chaotic.
Such feedback would be oblivious to longer-term and larger-scale structures, but it could nonetheless provide a composer with valuable insight into the short-term properties of a work.
This could not only be used for the evaluation of pre-composed streams of symbols, but could also provide real-time feedback in an improvisatory setup.
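A crude sketch of such a `critic' is given below: it estimates the entropy
rate and redundancy of the recent past from bigram counts over a sliding
window (the window length, smoothing and alphabet size are all arbitrary
illustrative choices):
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def critic(window, K):
    """Short-term entropy rate and redundancy from a window of symbols."""
    counts = np.ones((K, K))                # add-one smoothing
    for prev, cur in zip(window, window[1:]):
        counts[cur, prev] += 1
    a = counts / counts.sum(axis=0)         # a[i, j] ~ P(next = i | prev = j)
    pi = counts.sum(axis=1) / counts.sum()  # marginal symbol distribution
    h = sum(pi[j] * entropy(a[:, j]) for j in range(K))
    return h, entropy(pi) - h               # entropy rate, redundancy

# A looping pattern reads as redundant; a random one reads as noisy.
print(critic([0, 1, 2, 3] * 8, 4))
print(critic(list(np.random.default_rng(1).integers(0, 4, 32)), 4))
\end{lstlisting}
A stream could then be flagged as `boring' when either quantity stays at an
extreme for too long, though this says nothing about longer-term structure.
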

\section{Musical Preference and Information Dynamics}
We are carrying out a study to investigate the relationship between musical preference and the information dynamics models, with the experimental interface being a simplified version of the screen-based Melody Triangle.
Participants are asked to use this music pattern generator under various experimental conditions in a composition task.
The data collected include usage statistics of the system: where in the triangle participants place the tokens, how long they leave them there, and the state of the system when they indicate, by pressing a key, that they like what they are hearing.
As such, the experiments will help us identify any correlation between the information-theoretic properties of a stream and its perceived aesthetic worth.


%\emph{comparable system} Gordon Pask's Musicolor (1953) applied a similar notion
%of boredom in its design. The Musicolour would react to audio input through a
%microphone by flashing coloured lights. Rather than a direct mapping of sound
%to light, Pask designed the device to be a partner to a performing musician. It
%would adapt its lighting pattern based on the rhythms and frequencies it would
%hear, quickly `learning' to flash in time with the music. However Pask endowed
%the device with the ability to `be bored'; if the rhythmic and frequency content
%of the input remained the same for too long it would listen for other rhythms
%and frequencies, only lighting when it heard these. As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.


\section{Conclusion}

\bibliographystyle{unsrt}
{\bibliography{all,c4dm,nime}}
\end{document}