\documentclass[conference]{IEEEtran}
\usepackage{fixltx2e}
\usepackage{cite}
\usepackage[spacing]{microtype}
\usepackage[cmex10]{amsmath}
\usepackage{graphicx}
\usepackage{amssymb}
\usepackage{epstopdf}
\usepackage{url}
\usepackage{listings}
%\usepackage[expectangle]{tools}
\usepackage{tools}
\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{matrix}
\usetikzlibrary{patterns}
\usetikzlibrary{arrows}

\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\newcommand\preals{\reals_+}
\newcommand\X{\mathcal{X}}
\newcommand\Y{\mathcal{Y}}
\newcommand\domS{\mathcal{S}}
\newcommand\A{\mathcal{A}}
\newcommand\Data{\mathcal{D}}
\newcommand\rvm[1]{\mathrm{#1}}
\newcommand\sps{\,.\,}
\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
\newcommand\Ix{\mathcal{I}}
\newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
\newcommand\x{\vec{x}}
\newcommand\Ham[1]{\mathcal{H}_{#1}}
\newcommand\subsets[2]{[#1]^{(#2)}}
\def\bet(#1,#2){#1..#2}


\def\ev(#1=#2){#1\!\!=\!#2}
\newcommand\rv[1]{\Omega \to #1}
\newcommand\ceq{\!\!=\!}
\newcommand\cmin{\!-\!}
\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}

\newcommand\sumitoN{\sum_{i=1}^N}
\newcommand\sumktoK{\sum_{k=1}^K}
\newcommand\sumjtoK{\sum_{j=1}^K}
\newcommand\sumalpha{\sum_{\alpha\in\A}}
\newcommand\prodktoK{\prod_{k=1}^K}
\newcommand\prodjtoK{\prod_{j=1}^K}

\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
\newcommand\parity[2]{P^{#1}_{2,#2}}
\newcommand\specint[1]{\frac{1}{2\pi}\int_{-\pi}^\pi #1{S(\omega)} \dd \omega}
%\newcommand\specint[1]{\int_{-1/2}^{1/2} #1{S(f)} \dd f}


%\usepackage[parfill]{parskip}

\begin{document}
\title{Cognitive Music Modelling: an\\Information Dynamics Approach}

\author{
\IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
\IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
\IEEEauthorblockA{Centre for Digital Music\\
Queen Mary University of London\\
Mile End Road, London E1 4NS}}

\maketitle
\begin{abstract}
We describe an information-theoretic approach to the analysis
of music and other sequential data, which emphasises the predictive aspects
of perception, and the dynamic process
of forming and modifying expectations about an unfolding stream of data,
characterising these using the tools of information theory: entropies,
mutual informations, and related quantities.
After reviewing the theoretical foundations,
% we present a new result on predictive information rates in high-order Markov chains, and
we discuss a few emerging areas of application, including
musicological analysis, real-time beat-tracking analysis, and the generation
of musical materials as a cognitively informed compositional aid.
\end{abstract}


\section{Introduction}
\label{s:Intro}
The relationship between
Shannon's \cite{Shannon48} information theory and music and art in general has been the
subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective states
like uncertainty, surprise, complexity, tension, and interestingness
are closely related to information-theoretic quantities like
entropy, relative entropy, and mutual information.

Music is also an inherently dynamic process,
in which listeners build up expectations about what is to happen next,
which may be fulfilled
immediately or after some delay, or modified as the music unfolds.
In this paper, we explore this ``Information Dynamics'' view of music,
discussing the theory behind it and some emerging applications.

\subsection{Expectation and surprise in music}
The idea that the musical experience is strongly shaped by the generation
and playing out of strong and weak expectations was put forward by, amongst others,
music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the
act of listening to music, and which converts it to a source of pleasure,
is \ldots the intellectual satisfaction
which the listener derives from continually following and anticipating
the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.'
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.'
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon
that unfolds in time, rather than being apprehended as a static object
presented in its entirety. Meyer argued that the experience depends
on how we change and revise our conceptions \emph{as events happen}, on
how expectation and prediction interact with occurrence, and that, to a
large degree, the way to understand the effect of music is to focus on
this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts
and can be treated mathematically using probability theory.
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
% \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
\citep{SaffranJohnsonAslin1999}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.

% \subsection{Music and information theory}

% With a probabilistic framework for music modelling and prediction in hand,
% we can %are in a position to
% compute various
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
what Berlyne \cite{Berlyne71} called `collative variables' since they are
to do with patterns of occurrence rather than medium-specific details.
Berlyne sought to show that the collative variables are closely related to
perceptual qualities like complexity, tension, interestingness,
and even aesthetic value, not just in music, but in other temporal
or visual media.
The relevance of information theory to music and art has
also been addressed by researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
% information-theoretic quantities like entropy, relative entropy,
% and mutual information.
% and are major determinants of the overall experience.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.

% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}
Our working hypothesis is that, as an intelligent, predictive
agent (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic belief state that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the emerging themes of the piece. As events unfold, it revises
this belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy.
Berlyne \cite{Berlyne71} called such measures `collative variables', since
they are to do with \emph{patterns} of occurrence, rather than the details
of which specific things occur,
and developed the ideas of `information aesthetics' in an experimental setting.
By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

% In addition, when adaptive probabilistic models are used, expectations are
% created mainly in response to \emph{patterns} of occurrence,
% rather the details of which specific things occur.
One consequence of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The information-theoretic
concepts in terms of which the analysis is framed are universal to all sorts
of data.
Together, these suggest that an information dynamic analysis captures a
high level of \emph{abstraction}, and could be used to
make structural comparisons between different temporal media,
such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a principled way
to address the notion of \emph{subjectivity}, since the analysis is dependent on the
probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
variation in subjects' responses over time are
fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.


\section{Theoretical review}

\subsection{Entropy and information}
\label{s:entro-info}

Let $X$ denote some variable whose value is initially unknown to our
hypothetical observer. We will treat $X$ mathematically as a random variable,
with a value to be drawn from some set $\X$ and a
probability distribution representing the observer's beliefs about the
true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy $H(X)$ of the random variable. For a discrete variable
with probability mass function $p:\X \to [0,1]$, this is
\begin{equation}
H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
\end{equation}
% where $\expect{}$ is the expectation operator.
The negative-log-probability
$\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
the \emph{surprisingness} of the value $x$ should it be observed, and
hence the entropy is the expectation of the surprisingness, $\expect \ell(X)$.
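As a small worked illustration, consider a variable with three possible values.
The distribution below is an assumption chosen for convenience rather than taken from any data,
and logarithms are taken to base 2 so that the units are bits.
With $\X=\{a,b,c\}$, $p(a)=1/2$ and $p(b)=p(c)=1/4$:

```latex
\begin{align*}
\ell(a) &= -\log_2 \tfrac{1}{2} = 1 \text{ bit}, &
\ell(b) &= \ell(c) = -\log_2 \tfrac{1}{4} = 2 \text{ bits}, \\
H(X) &= \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 = 1.5 \text{ bits}.
\end{align*}
```

The rarer values are individually more surprising, while the entropy of 1.5 bits
lies below the $\log_2 3 \approx 1.58$ bits of a uniform distribution over three values.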

Now suppose that the observer receives some new data $\Data$ that
causes a revision of its beliefs about $X$. The \emph{information}
in this new data \emph{about} $X$ can be quantified as the
relative entropy or
Kullback-Leibler (KL) divergence between the prior and posterior
distributions $p(x)$ and $p(x|\Data)$ respectively:
\begin{equation}
\mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
= \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
\label{eq:info}
\end{equation}
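To make \eqrf{info} concrete, here is a minimal worked case under assumed numbers
(they are not drawn from the applications discussed in this paper):
the prior is uniform over four values, $p(x)=1/4$, and the data $\Data$ rules out two of them,
leaving a posterior of $1/2$ on each of the remaining two.
Using base-2 logarithms and the usual convention $0\log 0 = 0$:

```latex
\begin{align*}
\mathcal{I}_{\Data\to X}
  &= \tfrac{1}{2}\log_2\frac{1/2}{1/4} + \tfrac{1}{2}\log_2\frac{1/2}{1/4} + 0 + 0 \\
  &= \tfrac{1}{2} + \tfrac{1}{2} = 1 \text{ bit}.
\end{align*}
```

The data therefore carries one bit of information about $X$, which in this particular case
also equals the drop in entropy from 2 bits to 1 bit; the two quantities need not agree in general.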
When there are multiple variables $X_1, X_2$
\etc which the observer believes to be dependent, then the observation of
one may change its beliefs and hence yield information about the
others. The joint and conditional entropies as described in any
textbook on information theory (\eg \cite{CoverThomas}) then quantify
the observer's expected uncertainty about groups of variables given the
values of others. In particular, the \emph{mutual information}
$I(X_1;X_2)$ is both the expected information
in an observation of $X_2$ about $X_1$ and the expected reduction
in uncertainty about $X_1$ after observing $X_2$:
\begin{equation}
I(X_1;X_2) = H(X_1) - H(X_1|X_2),
\end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
and so the mutual information is symmetric in its arguments. A conditional
form of the mutual information can be formulated analogously:
\begin{equation}
I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
\end{equation}
These relationships between the various entropies and mutual
informations are conveniently visualised in \emph{information diagrams}
or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
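A small numerical check of the mutual information identity, under an assumed toy model:
let $X_2$ be a fair binary variable and let $X_1$ be a copy of $X_2$ that is flipped with
probability $1/4$. Then $X_1$ is also uniform, and with base-2 logarithms:

```latex
\begin{align*}
H(X_1) &= 1 \text{ bit}, \\
H(X_1|X_2) &= -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4}
            \approx 0.811 \text{ bits}, \\
I(X_1;X_2) &= H(X_1) - H(X_1|X_2) \approx 0.189 \text{ bits}.
\end{align*}
```

By the symmetry of this model, the same value is obtained from $H(X_2) - H(X_2|X_1)$.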

\begin{fig}{venn-example}
  \newcommand\rad{2.2em}%
  \newcommand\circo{circle (3.4em)}%
  \newcommand\labrad{4.3em}
  \newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
  \newcommand\colsep{\ }
  \newcommand\clipin[1]{\clip (#1) \circo;}%
  \newcommand\clipout[1]{\clip \bound (#1) \circo;}%
  \newcommand\cliptwo[3]{%
    \begin{scope}
      \clipin{#1};
      \clipin{#2};
      \clipout{#3};
      \fill[black!30] \bound;
    \end{scope}
  }%
  \newcommand\clipone[3]{%
    \begin{scope}
      \clipin{#1};
      \clipout{#2};
      \clipout{#3};
      \fill[black!15] \bound;
    \end{scope}
  }%
  \begin{tabular}{c@{\colsep}c}
    \scalebox{0.9}{%
    \begin{tikzpicture}[baseline=0pt]
      \coordinate (p1) at (90:\rad);
      \coordinate (p2) at (210:\rad);
      \coordinate (p3) at (-30:\rad);
      \clipone{p1}{p2}{p3};
      \clipone{p2}{p3}{p1};
      \clipone{p3}{p1}{p2};
      \cliptwo{p1}{p2}{p3};
      \cliptwo{p2}{p3}{p1};
      \cliptwo{p3}{p1}{p2};
      \begin{scope}
        \clip (p1) \circo;
        \clip (p2) \circo;
        \clip (p3) \circo;
        \fill[black!45] \bound;
      \end{scope}
      \draw (p1) \circo;
      \draw (p2) \circo;
      \draw (p3) \circo;
      \path
        (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
        (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
        (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
        (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
        (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
        (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
        (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
      ;
      \path
        (p1) +(140:\labrad) node {$X_1$}
        (p2) +(-140:\labrad) node {$X_2$}
        (p3) +(-40:\labrad) node {$X_3$};
    \end{tikzpicture}%
    }
    &
    \parbox{0.5\linewidth}{
      \small
      \begin{align*}
        I_{1|23} &= H(X_1|X_2,X_3) \\
        I_{13|2} &= I(X_1;X_3|X_2) \\
        I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
        I_{12|3} + I_{123} &= I(X_1;X_2)
      \end{align*}
    }
  \end{tabular}
  \caption{
  I-diagram of entropies and mutual informations
  for three random variables $X_1$, $X_2$ and $X_3$. The areas of
  the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
  The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
  The central area $I_{123}$ is the co-information \cite{McGill1954}.
  Some other information measures are indicated in the legend.
  }
\end{fig}
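One point worth checking by hand is that the central region of \figrf{venn-example} need not be positive.
The following sketch uses an assumed toy model: $X_1$ and $X_2$ are independent fair bits and
$X_3 = X_1 \oplus X_2$ is their exclusive-or. It reads $I_{12|3}$ as $I(X_1;X_2|X_3)$,
by analogy with the legend entry for $I_{13|2}$, and uses base-2 logarithms:

```latex
\begin{align*}
I(X_1;X_2) &= 0, \\
I_{12|3} = I(X_1;X_2|X_3) &= H(X_1|X_3) - H(X_1|X_2,X_3) = 1 - 0 = 1 \text{ bit}, \\
I_{123} &= I(X_1;X_2) - I_{12|3} = -1 \text{ bit}.
\end{align*}
```

Here $X_1$ is independent of $X_3$ alone, so $H(X_1|X_3)=1$ bit, but is fully determined by $X_2$ and $X_3$ together.
The areas of an I-diagram are therefore best read as signed quantities rather than literal areas.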


\subsection{Surprise and information in sequences}
\label{s:surprise-info-seq}

Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
random variables, infinite in both directions,
and that $\mu$ is the associated probability measure over all
realisations of the sequence. In the following, $\mu$ will simply serve
as a label for the process. We can identify a number of information-theoretic
measures meaningful in the context of a sequential observation of the sequence, during
which, at any time $t$, the sequence can be divided into a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
We will write the actually observed value of $X_t$ as $x_t$, and
the sequence of observations up to but not including $x_t$ as
$\past{x}_t$.
% Since the sequence is assumed stationary, we can without loss of generality,
% assume that $t=0$ in the following definitions.

The in-context surprisingness of the observation $X_t=x_t$ depends on
both $x_t$ and the context $\past{x}_t$:
\begin{equation}
\ell_t = - \log p(x_t|\past{x}_t).
\end{equation}
However, before $X_t$ is observed, the observer can compute
the \emph{expected} surprisingness as a measure of its uncertainty about
$X_t$; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the
\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$
can be understood as \emph{subjective} information dynamic measures, since they are
based on the observer's probability model in the context of the actually observed sequence
$\past{x}_t$. They characterise what it is like to be `in the observer's shoes'.
If we view the observer as a purely passive or reactive agent, this would
probably be sufficient, but for active agents such as humans or animals, it is
often necessary to \emph{anticipate} future events in order, for example, to plan the
most effective course of action. It makes sense for such observers to be
concerned about the predictive probability distribution over future events,
$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
\emph{given} the observed past $\past{X}_t=\past{x}_t$.
Referring to the definition of information \eqrf{info}, this is the KL divergence
between prior and posterior distributions over possible futures, which, written out in full, is
\begin{equation}
\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
Note that it is quite possible for an event to be surprising but not informative
in a predictive sense.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.
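The remark that an event can be surprising without being predictively informative can be checked
in the simplest case. As a sketch, assume (purely for illustration) that the observer's model treats the $X_t$
as independent and identically distributed with a fixed, known distribution $p$, so that nothing
remains to be learned from the data. Then conditioning on $x_t$ leaves the distribution over futures unchanged:

```latex
\begin{align*}
p(\fut{x}_t|x_t,\past{x}_t) &= p(\fut{x}_t|\past{x}_t)
  \quad\text{for every } \fut{x}_t, \\
\mathcal{I}_t &= \sum_{\fut{x}_t \in \X^*} p(\fut{x}_t|\past{x}_t) \log 1 = 0, \\
\ell_t &= -\log p(x_t) > 0 \quad\text{whenever } p(x_t) < 1.
\end{align*}
```

Under this assumed model a rare symbol has high surprisingness, yet every observation has zero IPI.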

\subsection{Information measures for stationary random processes}
\label{s:process-info}


\begin{fig}{predinfo-bg}
  \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
  \newcommand\rad{2em}%
  \newcommand\ovoid[1]{%
    ++(-#1,\rad)
    -- ++(2 * #1,0em) arc (90:-90:\rad)
    -- ++(-2 * #1,0em) arc (270:90:\rad)
  }%
  \newcommand\axis{2.75em}%
  \newcommand\olap{0.85em}%
  \newcommand\offs{3.6em}
  \newcommand\colsep{\hspace{5em}}
  \newcommand\longblob{\ovoid{\axis}}
  \newcommand\shortblob{\ovoid{1.75em}}
  \begin{tabular}{c}
  \comment{
    \subfig{(a) multi-information and entropy rates}{%
    \begin{tikzpicture}%[baseline=-1em]
      \newcommand\rc{1.75em}
      \newcommand\throw{2.5em}
      \coordinate (p1) at (180:1.5em);
      \coordinate (p2) at (0:0.3em);
      \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
      \newcommand\present{(p2) circle (\rc)}
      \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
      \newcommand\fillclipped[2]{%
        \begin{scope}[even odd rule]
          \foreach \thing in {#2} {\clip \thing;}
          \fill[black!#1] \bound;
        \end{scope}%
      }%
      \fillclipped{30}{\present,\bound \thepast}
      \fillclipped{15}{\present,\bound \thepast}
      \fillclipped{45}{\present,\thepast}
      \draw \thepast;
      \draw \present;
      \node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
      \node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
      \path (p2) +(90:3em) node {$X_0$};
      \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
      \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
    \end{tikzpicture}}%
    \\[1em]
    \subfig{(a) excess entropy}{%
    \newcommand\blob{\longblob}
    \begin{tikzpicture}
      \coordinate (p1) at (-\offs,0em);
      \coordinate (p2) at (\offs,0em);
      \begin{scope}
        \clip (p1) \blob;
        \clip (p2) \blob;
        \fill[lightgray] (-1,-1) rectangle (1,1);
      \end{scope}
      \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
      \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
      \path (0,0) node (future) {$E$};
      \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
      \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
    \end{tikzpicture}%
    }%
    \\[1em]
  }
  % \subfig{(b) predictive information rate $b_\mu$}{%
    \begin{tikzpicture}%[baseline=-1em]
      \newcommand\rc{2.2em}
      \newcommand\throw{2.5em}
      \coordinate (p1) at (210:1.5em);
      \coordinate (p2) at (90:0.8em);
      \coordinate (p3) at (-30:1.5em);
      \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
      \newcommand\present{(p2) circle (\rc)}
      \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
      \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
      \newcommand\fillclipped[2]{%
        \begin{scope}[even odd rule]
          \foreach \thing in {#2} {\clip \thing;}
          \fill[black!#1] \bound;
        \end{scope}%
      }%
      % \fillclipped{80}{\future,\thepast}
      \fillclipped{30}{\present,\future,\bound \thepast}
      \fillclipped{15}{\present,\bound \future,\bound \thepast}
      \draw \future;
      \fillclipped{45}{\present,\thepast}
      \draw \thepast;
      \draw \present;
      \node at (barycentric cs:p2=0.9,p1=-0.17,p3=-0.17) {$r_\mu$};
      \node at (barycentric cs:p1=-0.5,p2=1.0,p3=1) {$b_\mu$};
      \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
      \path (p2) +(140:3.2em) node {$X_0$};
      % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
      \path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
      \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
      \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
      \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
    \end{tikzpicture}%}%
  % \\[0.25em]
  \end{tabular}
  \caption{
  I-diagram illustrating several information measures in
  stationary random processes. Each circle or oval represents a random
  variable or sequence of random variables relative to time $t=0$. Overlapped areas
  correspond to various mutual informations.
  The circle represents the `present'. Its total area is
  $H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
  rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
  information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
  % The small dark
  % region below $X_0$ is $\sigma_\mu$ and the excess entropy
  % is $E = \rho_\mu + \sigma_\mu$.
  }
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\dots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary,
then it becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the
process as a whole, rather than on a moment-by-moment basis. Some of these,
such as the entropy rate, are well-known, but others are only recently being
investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

|
samer@73
|
549 The \emph{entropy rate} of the process is the entropy of the `present'
|
samer@73
|
550 $X_t$ given the `past':
|
samer@73
|
551 \begin{equation}
|
samer@73
|
552 \label{eq:entro-rate}
|
samer@73
|
553 h_\mu = H(X_t|\past{X}_t).
|
samer@73
|
554 \end{equation}
|
samer@73
|
555 The entropy rate is a measure of the overall surprisingness
|
samer@73
|
556 or unpredictability of the process, and gives an indication of the average
|
samer@73
|
557 level of surprise and uncertainty that would be experienced by an observer
|
samer@73
|
558 computing the measures of \secrf{surprise-info-seq} on a sequence sampled
|
samer@73
|
559 from the process.
|
samer@73
|
560
|
samer@73
|
561 The \emph{multi-information rate} $\rho_\mu$ \cite{Dubnov2004}
|
samer@73
|
562 is the mutual
|
samer@73
|
563 information between the `past' and the `present':
|
samer@73
|
564 \begin{equation}
|
samer@73
|
565 \label{eq:multi-info}
|
samer@73
|
566 \rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
|
samer@73
|
567 \end{equation}
|
samer@73
|
568 It is a measure of how much the preceding context of an observation
|
samer@73
|
569 helps in predicting or reducing the surprisingness of the current observation.
|
samer@73
|
570
|
samer@73
|
571 The \emph{excess entropy} \cite{CrutchfieldPackard1983}
|
samer@73
|
572 is the mutual information between
|
samer@73
|
573 the entire `past' and the entire `future' plus `present':
|
samer@73
|
574 \begin{equation}
|
samer@73
|
575 E = I(\past{X}_t; X_t,\fut{X}_t).
|
samer@73
|
576 \end{equation}
|
samer@73
|
577 Both the excess entropy and the multi-information rate can be thought
|
samer@73
|
578 of as measures of \emph{redundancy}, quantifying the extent to which
|
samer@73
|
579 the same information is to be found in all parts of the sequence.
|
samer@73
|
580
|
samer@73
|
581
|
samer@73
|
582 The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
|
samer@73
|
583 is the mutual information between the `present' and the `future' given the
|
samer@73
|
584 `past':
|
samer@73
|
585 \begin{equation}
|
samer@73
|
586 \label{eq:PIR}
|
samer@73
|
587 b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t),
|
samer@73
|
588 \end{equation}
|
samer@73
|
589 which can be read as the average reduction
|
samer@73
|
590 in uncertainty about the future on learning $X_t$, given the past.
|
samer@73
|
591 Due to the symmetry of the mutual information, it can also be written
|
samer@73
|
592 as
|
samer@73
|
593 \begin{equation}
|
samer@73
|
594 % \IXZ_t
|
samer@73
|
595 b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
|
samer@73
|
596 % \label{<++>}
|
samer@73
|
597 \end{equation}
|
samer@73
|
598 % If $X$ is stationary, then
|
samer@73
|
599 where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$,
|
samer@73
|
600 is the \emph{residual} \cite{AbdallahPlumbley2010},
|
samer@73
|
601 or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
|
samer@73
|
602 The PIR gives an indication of the average IPI that would be experienced
|
samer@73
|
603 by an observer processing a sequence sampled from this process.
|
samer@73
|
604 The relationships between these various measures are illustrated in \Figrf{predinfo-bg};
|
samer@73
|
605 see James et al.\ \cite{JamesEllisonCrutchfield2011} for further discussion.
|
samer@73
|
606 % in , along with several of the information measures we have discussed so far.
|
samer@73
|
607
|
samer@73
|
608 \comment{
|
samer@73
|
609 James et al.\ \cite{JamesEllisonCrutchfield2011} review several of these
|
samer@73
|
610 information measures and introduce some new related ones.
|
samer@73
|
611 In particular they identify $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$,
|
samer@73
|
612 the mutual information between the past and the future given the present,
|
samer@73
|
613 as an interesting quantity that measures the predictive benefit of
|
samer@73
|
614 model-building, that is, maintaining an internal state summarising past
|
samer@73
|
615 observations in order to make better predictions. It is shown as the
|
samer@73
|
616 small dark region below the circle in \figrf{predinfo-bg}(c).
|
samer@73
|
617 By comparing with \figrf{predinfo-bg}(b), we can see that
|
samer@73
|
618 $\sigma_\mu = E - \rho_\mu$.
|
samer@73
|
619 }
|
samer@73
|
620 % They also identify
|
samer@73
|
621 % $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
|
samer@73
|
622 % information} rate.
|
samer@73
|
623
|
samer@73
|
624
|
samer@73
|
625 \subsection{First and higher order Markov chains}
|
samer@73
|
626 \label{s:markov}
|
samer@73
|
627 % First order Markov chains are the simplest non-trivial models to which information
|
samer@73
|
628 % dynamics methods can be applied.
|
samer@73
|
629 In \cite{AbdallahPlumbley2009} we derived
|
samer@73
|
630 expressions for all the information measures described in \secrf{surprise-info-seq} for
|
samer@73
|
631 ergodic first order Markov chains (\ie that have a unique stationary
|
samer@73
|
632 distribution).
|
samer@73
|
633 % The derivation is greatly simplified by the dependency structure
|
samer@73
|
634 % of the Markov chain: for the purpose of the analysis, the `past' and `future'
|
samer@73
|
635 % segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
|
samer@73
|
636 % and next variables $X_{t-1}$ and $X_{t+1}$ respectively.
|
samer@73
|
637 We also showed that
|
samer@73
|
638 the PIR can be expressed simply in terms of entropy rates:
|
samer@73
|
639 if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
|
samer@73
|
640 an alphabet $\{1,\ldots,K\}$, such that
|
samer@73
|
641 $a_{ij} = \Pr(\ev(X_t=i|\ev(X_{t-1}=j)))$, and let $h:\reals^{K\times K}\to \reals$ be
|
samer@73
|
642 the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
|
samer@73
|
643 with transition matrix $a$, then the PIR is
|
samer@73
|
644 \begin{equation}
|
samer@73
|
645 b_\mu = h(a^2) - h(a),
|
samer@73
|
646 \end{equation}
|
samer@73
|
647 where $a^2$ is the transition matrix of the
|
samer@73
|
648 % `skip one'
|
samer@73
|
649 Markov chain obtained by jumping two steps at a time
|
samer@73
|
650 along the original chain.
|
samer@73
|
651
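As a hedged illustration of how this expression can be evaluated (a sketch based on the standard formula for the entropy rate of a stationary Markov chain, not a reproduction of the cited derivation), let $\pi$ denote the stationary distribution, satisfying $a\pi=\pi$; the symbol $\pi$ is introduced here purely for illustration. Then
\begin{equation}
	h(a) = -\sum_{j=1}^K \pi_j \sum_{i=1}^K a_{ij} \log a_{ij}.
\end{equation}
For a symmetric two-state chain that changes state with probability $p$, we have $\pi=(1/2,1/2)$ and $h(a)=H_2(p)$, where $H_2(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function. The two-step matrix $a^2$ changes state with probability $2p(1-p)$, so that
\begin{equation}
	b_\mu = H_2\bigl(2p(1-p)\bigr) - H_2(p),
\end{equation}
which tends to zero as $p\to 0$ (a nearly constant sequence), is exactly zero at $p=1/2$ (an uncorrelated sequence), and is positive for $0<p<1/2$.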
|
samer@73
|
652 Second and higher order Markov chains can be treated in a similar way by transforming
|
samer@73
|
653 to a first order representation of the high order Markov chain. With
|
samer@73
|
654 an $N$th order model, this is done by forming a new alphabet of size $K^N$
|
samer@73
|
655 consisting of all possible $N$-tuples of symbols from the base alphabet.
|
samer@73
|
656 An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
|
samer@73
|
657 $(x_{t+1},\ldots,x_{t+N})$ from the base model.
|
samer@73
|
658 % The next
|
samer@73
|
659 % observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
|
samer@73
|
660 % block along by one step.
|
samer@73
|
661 The new Markov chain is parameterised by a sparse $K^N\times K^N$
|
samer@73
|
662 transition matrix $\hat{a}$, in terms of which the entropy rate and PIR are
|
samer@73
|
663 \begin{equation}
|
samer@73
|
664 h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
|
samer@73
|
665 \end{equation}
|
samer@73
|
666 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
|
samer@73
|
667 Other information measures can also be computed for the high-order Markov chain, including
|
samer@73
|
668 the multi-information rate $\rho_\mu$ and the excess entropy $E$. (These are identical
|
samer@73
|
669 for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
|
samer@73
|
670 than $\rho_\mu$.)
|
samer@73
|
671
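To make the construction concrete, consider the case $N=2$ as an illustrative sketch (the index names $i,j,k,l$ are chosen here for illustration, and the conditioning convention follows that of $a_{ij}$ above). The state $\hat{x}_t = (x_{t+1},x_{t+2})$ is followed by $\hat{x}_{t+1} = (x_{t+2},x_{t+3})$, so the two blocks must agree on the symbol they share:
\begin{equation}
	\hat{a}_{(k,l),(i,j)}
	= \Pr\bigl(\hat{x}_{t+1}=(k,l) \,\big|\, \hat{x}_t=(i,j)\bigr)
	= \delta_{kj}\, \Pr(x_{t+3}=l \,|\, x_{t+1}=i,\, x_{t+2}=j),
\end{equation}
where $\delta_{kj}$ is the Kronecker delta. Each column of $\hat{a}$ therefore has at most $K$ non-zero entries out of $K^2$. For general $N$, at most $K^{N+1}$ of the $K^{2N}$ entries of $\hat{a}$ can be non-zero, which is the sense in which the matrix is sparse.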
|
samer@73
|
672 In our experiments with visualising and sonifying sequences sampled from
|
samer@73
|
673 first order Markov chains \cite{AbdallahPlumbley2009}, we found that
|
samer@73
|
674 the measures $h_\mu$, $\rho_\mu$ and $b_\mu$ correspond to perceptible
|
samer@73
|
675 characteristics, and that the transition matrices maximising or minimising
|
samer@73
|
676 each of these quantities are quite distinct. High entropy rates are associated
|
samer@73
|
677 with completely uncorrelated sequences with no recognisable temporal structure
|
samer@73
|
678 (and low $\rho_\mu$ and $b_\mu$).
|
samer@73
|
679 High values of $\rho_\mu$ are associated with long periodic cycles (and low $h_\mu$
|
samer@73
|
680 and $b_\mu$). High values of $b_\mu$ are associated with intermediate values
|
samer@73
|
681 of $\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
|
samer@73
|
682 temporal structures. These relationships are visible in \figrf{mtriscat} in
|
samer@73
|
683 \secrf{composition}, where we pick up this thread again, with an application of
|
samer@73
|
684 information dynamics in a compositional aid.
|
samer@73
|
685
|
samer@73
|
686
|
samer@73
|
687 \section{Information Dynamics in Analysis}
|
samer@73
|
688
|
samer@73
|
689 \subsection{Musicological Analysis}
|
samer@73
|
690 \label{s:minimusic}
|
samer@73
|
691
|
samer@73
|
692 In \cite{AbdallahPlumbley2009}, we analysed two pieces of music in the minimalist style
|
samer@73
|
693 by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
|
samer@73
|
694 The analysis was done using a first-order Markov chain model, with the
|
samer@73
|
695 enhancement that the transition matrix of the model was allowed to
|
samer@73
|
696 evolve dynamically as the notes were processed, and was tracked (in
|
samer@73
|
697 a Bayesian way) as a \emph{distribution} over possible transition matrices,
|
samer@73
|
698 rather than a point estimate. Some results are summarised in \figrf{twopages}:
|
samer@73
|
699 the upper four plots show the dynamically evolving subjective information
|
samer@73
|
700 measures as described in \secrf{surprise-info-seq}, computed using a point
|
samer@73
|
701 estimate of the current transition matrix; the fifth plot (the `model information rate')
|
samer@73
|
702 shows the information in each observation about the transition matrix.
|
samer@73
|
703 In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
|
samer@73
|
704 is actually a component of the true IPI when the transition
|
samer@73
|
705 matrix is being learned online, and was neglected when we computed the IPI from
|
samer@73
|
706 the transition matrix as if it were a constant.
|
samer@73
|
707
|
samer@73
|
708 The peaks of the surprisingness and both components of the IPI
|
samer@73
|
709 show good correspondence with the structure of the piece both as marked in the score
|
samer@73
|
710 and as analysed by musicologist Keith Potter, who was asked to mark the six
|
samer@73
|
711 `most surprising moments' of the piece (shown as asterisks in the fifth plot). %%
|
samer@73
|
712 % \footnote{%
|
samer@73
|
713 % Note that the boundary marked in the score at around note 5,400 is known to be
|
samer@73
|
714 % anomalous; on the basis of a listening analysis, some musicologists have
|
samer@73
|
715 % placed the boundary a few bars later, in agreement with our analysis
|
samer@73
|
716 % \cite{PotterEtAl2007}.}
|
samer@73
|
717 %
|
samer@73
|
718 In contrast, the analyses shown in the lower two plots of \figrf{twopages},
|
samer@73
|
719 obtained using two rule-based music segmentation algorithms, while clearly
|
samer@73
|
720 \emph{reflecting} the structure of the piece, do not \emph{segment} the piece,
|
samer@73
|
721 with no tendency to peaking of the boundary strength function at
|
samer@73
|
722 the boundaries in the piece.
|
samer@73
|
723
|
samer@73
|
724 The complete analysis of \emph{Gradus} can be found in \cite{AbdallahPlumbley2009},
|
samer@73
|
725 but \figrf{metre} illustrates the result of a metrical analysis: the piece was divided
|
samer@73
|
726 into bars of 32, 64 and 128 notes. In each case, the average surprisingness and
|
samer@73
|
727 IPI for the first, second, third \etc notes in each bar were computed. The plots
|
samer@73
|
728 show that the first note of each bar is, on average, significantly more surprising
|
samer@73
|
729 and informative than the others, up to the 64-note level, whereas at the 128-note
|
samer@73
|
730 level, the dominant periodicity appears to remain at 64 notes.
|
samer@73
|
731
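The averaging step can be written compactly as follows. This is a sketch in notation introduced here for illustration only: $s_n$ is the surprisingness (or IPI) of the $n$th note, $P$ is the hypothetical bar length, $k$ is the position within the bar, and $M$ is the number of complete bars of length $P$ in the piece.
\begin{equation}
	\bar{s}_P(k) = \frac{1}{M} \sum_{m=0}^{M-1} s_{mP+k}, \qquad k = 1,\ldots,P.
\end{equation}
A metrical accent at period $P$ appears as a value of $\bar{s}_P(1)$ that stands clear of the other positions. If the underlying period is $P$, an analysis at bar length $2P$ would be expected to show comparable peaks at $k=1$ and $k=P+1$ rather than a single dominant peak.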
|
samer@73
|
732 \begin{fig}{twopages}
|
samer@73
|
733 \colfig[0.96]{matbase/fig9471}\\ % update from mbc paper
|
samer@73
|
734 % \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
|
samer@73
|
735 \vspace*{0.5em}
|
samer@73
|
736 \colfig[0.97]{matbase/fig13377} % rule based analysis
|
samer@73
|
737 \caption{Analysis of \emph{Two Pages}.
|
samer@73
|
738 The thick vertical lines are the part boundaries as indicated in
|
samer@73
|
739 the score by the composer.
|
samer@73
|
740 The thin grey lines
|
samer@73
|
741 indicate changes in the melodic `figures' of which the piece is
|
samer@73
|
742 constructed. In the `model information rate' panel, the black asterisks
|
samer@73
|
743 mark the six most surprising moments selected by Keith Potter.
|
samer@73
|
744 The bottom two panels show two rule-based boundary strength analyses.
|
samer@73
|
745 All information measures are in nats.
|
samer@73
|
746 %Note that the boundary marked in the score at around note 5,400 is known to be
|
samer@73
|
747 %anomalous; on the basis of a listening analysis, some musicologists have
|
samer@73
|
748 %placed the boundary a few bars later, in agreement with our analysis
|
samer@73
|
749 \cite{PotterEtAl2007}.
|
samer@73
|
750 }
|
samer@73
|
751 \end{fig}
|
samer@73
|
752
|
samer@73
|
753 \begin{fig}{metre}
|
samer@73
|
754 % \scalebox{1}{%
|
samer@73
|
755 \begin{tabular}{cc}
|
samer@73
|
756 \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
|
samer@73
|
757 \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
|
samer@73
|
758 \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
|
samer@73
|
759 % \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
|
samer@73
|
760 % \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
|
samer@73
|
761 % \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
|
samer@73
|
762 % \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
|
samer@73
|
763 % \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
|
samer@73
|
764 % \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}
|
samer@73
|
765
|
samer@73
|
766 \end{tabular}%
|
samer@73
|
767 % }
|
samer@73
|
768 \caption{Metrical analysis by computing average surprisingness and
|
samer@73
|
769 IPI of notes at different periodicities (\ie hypothetical
|
samer@73
|
770 bar lengths) and phases (\ie positions within a bar).
|
samer@73
|
771 }
|
samer@73
|
772 \end{fig}
|
samer@73
|
773
|
samer@73
|
774 \begin{fig*}{drumfig}
|
samer@73
|
775 % \includegraphics[width=0.9\linewidth]{drum_plots/file9-track.eps}% \\
|
samer@73
|
776 \includegraphics[width=0.97\linewidth]{figs/file11-track.eps} \\
|
samer@73
|
777 % \includegraphics[width=0.9\linewidth]{newplots/file8-track.eps}
|
samer@73
|
778 \caption{Information dynamic analysis derived from audio recordings of
|
samer@73
|
779 drumming, obtained by applying a Bayesian beat tracking system to the
|
samer@73
|
780 sequence of detected kick and snare drum events. The grey line shows the system's
|
samer@73
|
781 varying level of uncertainty (entropy) about the tempo and phase of the
|
samer@73
|
782 beat grid, while the stem plot shows the amount of information in each
|
samer@73
|
783 drum event about the beat grid. The entropy drops instantaneously at each
|
samer@73
|
784 event and rises gradually between events.
|
samer@73
|
785 }
|
samer@73
|
786 \end{fig*}
|
samer@73
|
787
|
samer@73
|
788 \subsection{Real-valued signals and audio analysis}
|
samer@73
|
789 Using analogous definitions based on the differential entropy
|
samer@73
|
790 \cite{CoverThomas}, the methods outlined
|
samer@73
|
791 in \secrf{surprise-info-seq} and \secrf{process-info}
|
samer@73
|
792 can be reformulated for random variables taking values in a continuous domain
|
samer@73
|
793 and thus be applied to expressive parameters of music
|
samer@73
|
794 such as dynamics, timing and timbre, which are readily quantified on a continuous scale.
|
samer@73
|
795 %
|
samer@73
|
796 % \subsection{Audio based content analysis}
|
samer@73
|
797 % Using analogous definitions of differential entropy, the methods outlined
|
samer@73
|
798 % in the previous section are equally applicable to continuous random variables.
|
samer@73
|
799 % In the case of music, where expressive properties such as dynamics, tempo,
|
samer@73
|
800 % timing and timbre are readily quantified on a continuous scale, the information
|
samer@73
|
801 % dynamic framework may also be considered.
|
samer@73
|
802 %
|
samer@73
|
803 Dubnov \cite{Dubnov2004} considers the class of stationary Gaussian
|
samer@73
|
804 processes, for which the entropy rate may be obtained analytically
|
samer@73
|
805 from the power spectral density function $S(\omega)$ of the signal,
|
samer@73
|
806 and found that the
|
samer@73
|
807 multi-information rate can be
|
samer@73
|
808 expressed as
|
samer@73
|
809 \begin{equation}
|
samer@73
|
810 \rho_\mu = \frac{1}{2} \left( \log \specint{} - \specint{\log}\right).
|
samer@73
|
811 \label{eq:mir-sfm}
|
samer@73
|
812 \end{equation}
|
samer@73
|
813 Dubnov also notes that $e^{-2\rho_\mu}$ is equivalent to the well-known
|
samer@73
|
814 \emph{spectral flatness measure}, and hence,
|
samer@73
|
815 Gaussian processes with maximal multi-information rate are those with maximally
|
samer@73
|
816 non-flat spectra, which are those dominated by a single frequency component.
|
samer@73
|
817 % These essentially consist of a single
|
samer@73
|
818 % sinusoidal component and hence are completely predictable once
|
samer@73
|
819 % the parameters of the sinusoid have been inferred.
|
samer@73
|
820 % Local stationarity is assumed, which may be achieved by windowing or
|
samer@73
|
821 % change point detection \cite{Dubnov2008}.
|
samer@73
|
822 %TODO
|
samer@73
|
823
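The connection with spectral flatness can be spelled out as follows. This is a sketch which assumes that the spectral averages in (\ref{eq:mir-sfm}) denote normalised integrals over $\omega\in[-\pi,\pi]$, as is usual for a discrete-time process; that normalisation is an assumption made here for illustration. Exponentiating (\ref{eq:mir-sfm}) gives
\begin{equation}
	e^{-2\rho_\mu} =
	\frac{\exp\left( \frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(\omega)\, d\omega \right)}
	     {\frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\, d\omega},
\end{equation}
which is the ratio of the geometric mean of the spectrum to its arithmetic mean. By the inequality of arithmetic and geometric means, this ratio lies between 0 and 1, and equals 1 (so that $\rho_\mu=0$) only when $S(\omega)$ is constant, that is, for white noise.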
|
samer@73
|
824 We have found (to appear in forthcoming work) that the predictive information rate for autoregressive
|
samer@73
|
825 Gaussian processes can be expressed as
|
samer@73
|
826 \begin{equation}
|
samer@73
|
827 b_\mu = \frac{1}{2} \left( \log \specint{\frac{1}} - \specint{\log\frac{1}}\right),
|
samer@73
|
828 \end{equation}
|
samer@73
|
829 suggesting a sort of duality between $b_\mu$ and $\rho_\mu$ which is consistent with
|
samer@73
|
830 the duality between multi-information and predictive information rates we discuss in
|
samer@73
|
831 \cite{AbdallahPlumbley2012}. A consideration of the residual or erasure entropy rate
|
samer@73
|
832 \cite{VerduWeissman2006}
|
samer@73
|
833 suggests that this expression applies to Gaussian processes in general but this is
|
samer@73
|
834 yet to be confirmed rigorously.
|
samer@73
|
835
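As a worked check of the two spectral expressions, consider a first order autoregressive process $X_t = \phi X_{t-1} + \epsilon_t$ with $|\phi|<1$ and Gaussian innovations $\epsilon_t$ of variance $\sigma^2$. The symbols $\phi$, $\epsilon_t$ and $\sigma^2$ are introduced here for illustration, and the spectral averages are assumed to be normalised integrals over $\omega\in[-\pi,\pi]$. The power spectral density is $S(\omega) = \sigma^2/|1-\phi e^{-i\omega}|^2$, for which the average of $\log S$ is $\log\sigma^2$, the average of $S$ is $\sigma^2/(1-\phi^2)$, and the average of $1/S$ is $(1+\phi^2)/\sigma^2$. Substituting these values gives
\begin{equation}
	\rho_\mu = -\tfrac{1}{2}\log(1-\phi^2), \qquad
	b_\mu = \tfrac{1}{2}\log(1+\phi^2).
\end{equation}
Both values agree with a direct calculation from conditional variances: the variance of $X_t$ is $\sigma^2/(1-\phi^2)$ marginally and $\sigma^2$ given the past, while the variance of $X_{t+1}$ given $X_{t-1}$ is $(1+\phi^2)\sigma^2$ and falls to $\sigma^2$ once $X_t$ is known. In this example $\rho_\mu$ grows without bound as $|\phi|\to 1$, whereas $b_\mu$ stays below $\tfrac{1}{2}\log 2$.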
|
samer@73
|
836 Analysis shows that in stationary autoregressive processes of a given finite order,
|
samer@73
|
837 $\rho_\mu$ is unbounded, while for moving average processes of a given order, $b_\mu$ is unbounded.
|
samer@73
|
838 This is a result of the physically unattainable infinite precision observations which the
|
samer@73
|
839 theoretical analysis assumes; adding more realistic limitations on the amount of information
|
samer@73
|
840 that can be extracted from one measurement is one of the aims of our ongoing work in this
|
samer@73
|
841 area.
|
samer@73
|
842 % We are currently working towards methods for the computation of predictive information
|
samer@73
|
843 % rate in autorregressive and moving average Gaussian processes
|
samer@73
|
844 % and processes with power-law (or $1/f$) spectra,
|
samer@73
|
845 % which have previously been investegated in relation to their aesthetic properties
|
samer@73
|
846 % \cite{Voss75,TaylorSpeharVan-Donkelaar2011}.
|
samer@73
|
847
|
samer@73
|
848 % (fractionally integrated Gaussian noise).
|
samer@73
|
849 % %(fBm (continuous), fiGn discrete time) possible reference:
|
samer@73
|
850 % @book{palma2007long,
|
samer@73
|
851 % title={Long-memory time series: theory and methods},
|
samer@73
|
852 % author={Palma, W.},
|
samer@73
|
853 % volume={662},
|
samer@73
|
854 % year={2007},
|
samer@73
|
855 % publisher={Wiley-Blackwell}
|
samer@73
|
856 % }
|
samer@73
|
857
|
samer@73
|
858
|
samer@73
|
859
|
samer@73
|
860 % mention non-gaussian processes extension Similarly, the predictive information
|
samer@73
|
861 % rate may be computed using a Gaussian linear formulation CITE. In this view,
|
samer@73
|
862 % the PIR is a function of the correlation between random innovations supplied
|
samer@73
|
863 % to the stochastic process. %Dubnov, MacAdams, Reynolds (2006) %Bailes and Dean (2009)
|
samer@73
|
864
|
samer@73
|
865 % In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian
|
samer@73
|
866 % processes. For such processes, the entropy rate may be obtained analytically
|
samer@73
|
867 % from the power spectral density of the signal, allowing the multi-information
|
samer@73
|
868 % rate to be subsequently obtained. One aspect demanding further investigation
|
samer@73
|
869 % involves the comparison of alternative measures of predictability. In the case of the PIR, a Gaussian linear formulation is applicable, indicating that the PIR is a function of the correlation between random innovations supplied to the stochastic process CITE.
|
samer@73
|
870 % !!! FIXME
|
samer@73
|
871
|
samer@73
|
872
|
samer@73
|
873 \subsection{Beat Tracking}
|
samer@73
|
874
|
samer@73
|
875 A probabilistic method for drum tracking was presented by Robertson
|
samer@73
|
876 \cite{Robertson11c}. The system infers a beat grid (a sequence
|
samer@73
|
877 of approximately regular beat times) given audio inputs from a
|
samer@73
|
878 live drummer, for the purpose of synchronising a music
|
samer@73
|
879 sequencer with the drummer.
|
samer@73
|
880 The times of kick and snare drum events are obtained
|
samer@73
|
881 using dedicated microphones for each drum and a percussive onset detector
|
samer@73
|
882 \cite{puckette98}. These event times are then sent
|
samer@73
|
883 to the beat tracker, which maintains a belief state in
|
samer@73
|
884 the form of distributions over the tempo and phase of the beat grid.
|
samer@73
|
885 Every time an event is received, these distributions are updated
|
samer@73
|
886 with respect to a probabilistic model which accounts both for tempo and phase
|
samer@73
|
887 variations and the emission of drum events at musically plausible times
|
samer@73
|
888 relative to the beat grid.
|
samer@73
|
889 %continually updates distributions for tempo and phase on receiving a new
|
samer@73
|
890 %event time
|
samer@73
|
891
|
samer@73
|
892 The use of a probabilistic belief state means we can compute entropies
|
samer@73
|
893 representing the system's uncertainty about the beat grid, and quantify
|
samer@73
|
894 the amount of information in each event about the beat grid as the KL divergence
|
samer@73
|
895 between prior and posterior distributions. Though this is not strictly the
|
samer@73
|
896 instantaneous predictive information (IPI) as described in \secrf{surprise-info-seq}
|
samer@73
|
897 (the information gained is not directly about future event times), we can treat
|
samer@73
|
898 it as a proxy for the IPI, in the manner of the `model information rate'
|
samer@73
|
899 described in \secrf{minimusic}, which has a similar status.
|
samer@73
|
900
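In symbols, a minimal sketch of these two quantities is as follows. The notation is introduced here for illustration: $\theta$ stands for the tempo and phase of the beat grid, $p(\theta|e_{1:n})$ for the belief state after the $n$th drum event, and $p(\theta|e_{1:n-1})$ for the belief state immediately before it. The direction of the divergence, from prior to posterior, is an assumption rather than something taken from \cite{Robertson11c}.
\begin{equation}
	H_n = -\int p(\theta|e_{1:n}) \log p(\theta|e_{1:n}) \, d\theta,
	\qquad
	D_n = \int p(\theta|e_{1:n}) \log \frac{p(\theta|e_{1:n})}{p(\theta|e_{1:n-1})} \, d\theta.
\end{equation}
The divergence $D_n$ is non-negative and is zero only when the event leaves the belief state unchanged, so it can be read as the amount by which the $n$th event revises the system's beliefs about the beat grid.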
|
samer@73
|
901 We carried out the analysis on 16 recordings; an example
|
samer@73
|
902 is shown in \figrf{drumfig}. There we can see variations in the
|
samer@73
|
903 entropy in the upper graph and the information in each drum event in the lower
|
samer@73
|
904 stem plot. At certain points in time, unusually large amounts of information
|
samer@73
|
905 arrive; these may be related to fills and other rhythmic irregularities, which
|
samer@73
|
906 are often followed by an emphatic return to a steady beat at the beginning
|
samer@73
|
907 of the next bar---this is something we are currently investigating.
|
samer@73
|
908 We also analysed the pattern of information flow
|
samer@73
|
909 on a cyclic metre, much as in \figrf{metre}. All the recordings we
|
samer@73
|
910 analysed are audibly in 4/4 metre, but we found no
|
samer@73
|
911 evidence of a general tendency for greater amounts of information to arrive
|
samer@73
|
912 at metrically strong beats, which suggests that the rhythmic accuracy of the
|
samer@73
|
913 drummers does not vary systematically across each bar. It is possible that metrical information
|
samer@73
|
914 existing in the pattern of kick and snare events might emerge in an
|
samer@73
|
915 analysis using a model that attempts to predict the time and type of
|
samer@73
|
916 the next drum event, rather than just inferring the beat grid as the current model does.
|
samer@73
|
917 %The analysis of information rates can b
|
samer@73
|
918 %considered \emph{subjective}, in that it measures how the drum tracker's
|
samer@73
|
919 %probability distributions change, and these are contingent upon the
|
samer@73
|
920 %model used as well as external properties in the signal.
|
samer@73
|
921 %We expect,
|
samer@73
|
922 %however, that following periods of increased uncertainty, such as fills
|
samer@73
|
923 %or expressive timing, the information contained in an individual event
|
samer@73
|
924 %increases. We also examine whether the information is dependent upon
|
samer@73
|
925 %metrical position.
|
samer@73
|
926
|
samer@73
|
927
|
samer@73
|
928 \section{Information dynamics as compositional aid}
|
samer@73
|
929 \label{s:composition}
|
samer@73
|
930
|
samer@73
|
931 The use of stochastic processes in music composition has been widespread for
|
samer@73
|
932 decades---for instance Iannis Xenakis applied probabilistic mathematical models
|
samer@73
|
933 to the creation of musical materials~\cite{Xenakis:1992ul}. While such processes
|
samer@73
|
934 can drive the \emph{generative} phase of the creative process, information dynamics
|
samer@73
|
935 can serve as a novel framework for a \emph{selective} phase, by
|
samer@73
|
936 providing a set of criteria to be used in judging which of the
|
samer@73
|
937 generated materials
|
samer@73
|
938 are of value. This alternation of generative and selective phases has been
|
samer@73
|
939 noted before \cite{Boden1990}.
|
samer@73
|
940 %
|
samer@73
|
941 Information-dynamic criteria can also be used as \emph{constraints} on the
|
samer@73
|
942 generative processes, for example, by specifying a certain temporal profile
|
samer@73
|
943 of surprisingness and uncertainty the composer wishes to induce in the listener
|
samer@73
|
944 as the piece unfolds.
|
samer@73
|
945 %stochastic and algorithmic processes: ; outputs can be filtered to match a set of
|
samer@73
|
946 %criteria defined in terms of information-dynamical characteristics, such as
|
samer@73
|
947 %predictability vs unpredictability
|
samer@73
|
948 %s model, this criteria thus becoming a means of interfacing with the generative processes.
|
samer@73
|
949
|
samer@73
|
950 %The tools of information dynamics provide a way to constrain and select musical
|
samer@73
|
951 %materials at the level of patterns of expectation, implication, uncertainty, and predictability.
|
samer@73
|
952 In particular, the behaviour of the predictive information rate (PIR) defined in
|
samer@73
|
953 \secrf{process-info} makes it interesting from a compositional point of view. The definition
|
samer@73
|
954 of the PIR is such that it is low both for extremely regular processes, such as constant
|
samer@73
|
955 or periodic sequences, \emph{and} low for extremely random processes, where each symbol
|
samer@73
|
956 is chosen independently of the others, in a kind of `white noise'. In the former case,
|
samer@73
|
957 the pattern, once established, is completely predictable and therefore there is no
|
samer@73
|
958 \emph{new} information in subsequent observations. In the latter case, the randomness
|
samer@73
|
959 and independence of all elements of the sequence means that, though potentially surprising,
|
samer@73
|
960 each observation carries no information about the ones to come.
|
samer@73
|
961
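These two extremes can be checked directly against the relations $\rho_\mu = H(X_t) - h_\mu$ and $b_\mu = h_\mu - r_\mu$ given in \secrf{process-info}; the following is an illustrative sketch rather than a new result. For independent, identically distributed symbols, conditioning on the past or the future changes nothing, so $h_\mu$ and $r_\mu$ both equal $H(X_t)$. For a periodic sequence, the present is fully determined by the past, so $h_\mu=0$, and since $0 \le r_\mu \le h_\mu$, $r_\mu=0$ as well. Hence
\begin{equation}
	\begin{array}{lll}
	\mbox{independent symbols:} & h_\mu = r_\mu = H(X_t), & \rho_\mu = 0,\; b_\mu = 0, \\
	\mbox{periodic sequence:} & h_\mu = r_\mu = 0, & \rho_\mu = H(X_t),\; b_\mu = 0.
	\end{array}
\end{equation}
In both cases the PIR vanishes, but for opposite reasons: in the first there is nothing to carry forward, and in the second there is nothing new to learn.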
|
samer@73
|
962 Processes with high PIR maintain a certain kind of balance between
|
samer@73
|
963 predictability and unpredictability in such a way that the observer must continually
|
samer@73
|
964 pay attention to each new observation as it occurs in order to make the best
|
samer@73
|
965 possible predictions about the evolution of the sequence. This balance between predictability
|
samer@73
|
966 and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}),
|
samer@73
|
967 which summarises the observations of Wundt \cite{Wundt1897} that stimuli are most
|
samer@73
|
968 pleasing at intermediate levels of novelty or disorder,
|
samer@73
|
969 where there is a balance between `order' and `chaos'.
|
samer@73
|
970
|
samer@73
|
971 Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009}
|
samer@73
|
972 a similar shape when plotting entropy rate against PIR; this is visible in the
|
samer@73
|
973 upper envelope of the plot in \figrf{mtriscat}, which is a 3-D scatter plot of
|
samer@73
|
974 three of the information measures discussed in \secrf{process-info} for several thousand
|
samer@73
|
975 first-order Markov chain transition matrices generated by a random sampling method.
|
samer@73
|
976 The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and
|
samer@73
|
977 predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond
|
samer@73
|
978 to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences
|
samer@73
|
979 with no temporal structure. Processes with high PIR are to be found at intermediate
|
samer@73
|
980 levels of entropy and redundancy.
|
samer@73
|
981
|
samer@73
|
982 %It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials.
|
samer@73
|
983
|
samer@73
|
984 %For instance a stochastic music generating process could be controlled by modifying
|
samer@73
|
985 %constraints on its output in terms of predictive information rate or entropy
|
samer@73
|
986 %rate.
|
samer@73
|
987
|
samer@73
|
988 \begin{fig}{wundt}
|
samer@73
|
989 \raisebox{-4em}{\colfig[0.43]{wundt}}
|
samer@73
|
990 % {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
|
samer@73
|
991 {\ {\large$\longrightarrow$}\ }
|
samer@73
|
992 \raisebox{-4em}{\colfig[0.43]{wundt2}}
|
samer@73
|
993 \caption{
|
samer@73
|
994 The Wundt curve relating randomness/complexity with
|
samer@73
|
995 perceived value. Repeated exposure sometimes results
|
samer@73
|
996 in a move to the left along the curve \cite{Berlyne71}.
|
samer@73
|
997 }
|
samer@73
|
998 \end{fig}
|
samer@73
|
999
|
samer@73
|
1000
|
samer@73
|
1001
|
samer@73
|
1002 \subsection{The Melody Triangle}
|
samer@73
|
1003
|
samer@73
|
1004 These observations led us to construct the `Melody Triangle', a graphical interface for
|
samer@73
|
1005 %for %exploring the melodic patterns generated by each of the Markov chains represented
|
samer@73
|
1006 %as points in \figrf{mtriscat}.
|
samer@73
|
1007 %
|
samer@73
|
1008 %The Melody Triangle is an interface for
|
samer@73
|
1009 the discovery of melodic
|
samer@73
|
1010 materials, where the input---positions within a triangle---maps directly to
|
samer@73
|
1011 information-theoretic properties of the output. % as exemplified in \figrf{mtriscat}.
|
samer@73
|
1012 %The measures---entropy rate, redundancy and
|
samer@73
|
1013 %predictive information rate---form a criteria with which to filter the output
|
samer@73
|
1014 %of the stochastic processes used to generate sequences of notes.
|
samer@73
|
1015 %These measures
|
samer@73
|
1016 %address notions of expectation and surprise in music, and as such the Melody
|
samer@73
|
1017 %Triangle is a means of interfacing with a generative process in terms of the
|
samer@73
|
1018 %predictability of its output.
|
samer@73
|
1019 %§
|
samer@73
|
1020 The triangle is populated with first-order Markov chain transition
|
samer@73
|
1021 matrices as illustrated in \figrf{mtriscat}.
|
samer@73
|
1022 The distribution of transition matrices in this space forms a relatively thin
|
samer@73
|
1023 curved sheet. Thus, it is a reasonable simplification to project out the
|
samer@73
|
1024 third dimension (the PIR) and present an interface that is just two-dimensional.
|
samer@73
|
1025 The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with
|
samer@73
|
1026 the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand
|
samer@73
|
1027 side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}.
|
samer@73
|
1028 This is our `Melody Triangle' and
|
samer@73
|
1029 forms the interface by which the system is controlled.
|
samer@73
|
1030 %Using this interface thus involves a mapping to information space;
|
samer@73
|
1031 The user selects a point within the triangle; this is mapped into the
|
samer@73
|
1032 information space and the nearest transition matrix is used to generate
|
samer@73
|
1033 a sequence of values which are then sonified either as pitched notes or percussive
|
samer@73
|
1034 sounds. By choosing the position within the triangle, the user can control the
|
samer@73
|
1035 output at the level of its `collative' properties, with access to the variety
|
samer@73
|
1036 of patterns as described above and in \secrf{markov}.
|
samer@73
|
1037 %and information-theoretic criteria related to predictability
|
samer@73
|
1038 %and information flow
|
samer@73
|
1039 Though the interface is 2D, the third dimension (PIR) is implicitly present, as
|
samer@73
|
1040 transition matrices retrieved from
|
samer@73
|
1041 along the centre line of the triangle will tend to have higher PIR.
|
samer@73
|
1042 We hypothesise that, under
|
samer@73
|
1043 the appropriate conditions, these will be perceived as more `interesting' or
|
samer@73
|
1044 `melodic.'
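
The geometry described above can be sketched as follows. The notation ($K$, $u$, $v$, $T$, $L$, $R$, $\mathcal{M}$, $h^*$, $\rho^*$) is introduced here purely for illustration, and the choices of normalisation and distance are assumptions rather than a description of the implemented system. For a stationary process over $K$ symbols, $h_\mu + \rho_\mu = H(X_t) \le \log_2 K$, so the projected population lies within a right-angled triangle in the $(\rho_\mu, h_\mu)$ plane. Writing $u = \rho_\mu / \log_2 K$ and $v = h_\mu / \log_2 K$, one affine map consistent with the layout above sends this region to an equilateral triangle with top vertex $T$, bottom-left vertex $L$ and bottom-right vertex $R$:
\begin{equation}
	p = T + u\,(L - T) + v\,(R - T), \qquad u \ge 0,\quad v \ge 0,\quad u + v \le 1 .
\end{equation}
A point $p$ selected by the user is mapped back to $(u,v)$, and hence to target values $(h^*, \rho^*)$, by solving this $2\times 2$ linear system. Assuming Euclidean distance in the projected information space, the matrix retrieved from the population $\mathcal{M}$ would then be
\begin{equation}
	\hat{a} = \arg\min_{a \in \mathcal{M}}
	\left[ \bigl(h_\mu(a) - h^*\bigr)^2 + \bigl(\rho_\mu(a) - \rho^*\bigr)^2 \right] .
\end{equation}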
|
samer@73
|
1045
|
samer@73
|
1046 %The corners correspond to three different extremes of predictability and
|
samer@73
|
1047 %unpredictability, which could be loosely characterised as `periodicity', `noise'
|
samer@73
|
1048 %and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$
|
samer@73
|
1049 %and $b_\mu$) have no discernible pattern;
|
samer@73
|
1050 %those along the `periodicity'
|
samer@73
|
1051 %to `repetition' edge are all cyclic patterns that get shorter as we approach
|
samer@73
|
1052 %the `repetition' corner, until each is just one repeating note. Those along the
|
samer@73
|
1053 %opposite edge consist of independent random notes from non-uniform distributions.
|
samer@73
|
1054 %Areas between the left and right edges will tend to have higher PIR,
|
samer@73
|
1055 %and we hypothesise that, under
|
samer@73
|
1056 %the appropriate conditions, these will be perceived as more `interesting' or
|
samer@73
|
1057 %`melodic.'
|
samer@73
|
1058 %These melodies have some level of unpredictability, but are not completely random.
|
samer@73
|
1059 % Or, conversely, are predictable, but not entirely so.
|
samer@73
|
1060
|
samer@73
|
1061 \begin{fig}{mtriscat}
|
samer@73
|
1062 \colfig[0.9]{mtriscat}
|
samer@73
|
1063 \caption{The population of transition matrices in the 3D space of
|
samer@73
|
1064 entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and PIR ($b_\mu$),
|
samer@73
|
1065 all in bits.
|
samer@73
|
1066 The concentrations of points along the redundancy axis correspond
|
samer@73
|
1067 to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
|
samer@73
|
1068 3, 4, \etc all the way to period 7 (redundancy 2.8 bits). The colour of each point
|
samer@73
|
1069 represents its PIR---note that the highest values are found at intermediate entropy
|
samer@73
|
1070 and redundancy, and that the distribution as a whole makes a curved triangle. Although
|
samer@73
|
1071 not visible in this plot, it is largely hollow in the middle.}
|
samer@73
|
1072 \end{fig}
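
The redundancy values quoted in the caption of \figrf{mtriscat} can be checked with a short calculation for an idealised, exactly periodic chain (an assumption made here for illustration; the sampled chains are only roughly periodic). A chain that cycles deterministically through $N$ states has a uniform stationary distribution over those states and zero entropy rate, so
\begin{equation}
	\rho_\mu = H(X_t) - h_\mu = \log_2 N - 0 = \log_2 N .
\end{equation}
This gives $\log_2 2 = 1$ bit for period 2, $\log_2 4 = 2$ bits for period 4, and $\log_2 7 \approx 2.81$ bits for period 7, in agreement with the values of 1 and 2.8 bits noted in the caption.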
|
samer@73
|
1073
|
samer@73
|
1074
|
samer@73
|
1075 %PERHAPS WE SHOULD FOREGO TALKING ABOUT THE
|
samer@73
|
1076 %INSTALLATION VERSION OF THE TRIANGLE?
|
samer@73
|
1077 %feels a bit like a tangent, and could do with the space..
|
samer@73
|
1078 The Melody Triangle exists in two incarnations: a screen-based interface
|
samer@73
|
1079 where a user moves tokens in and around a triangle on screen, and a multi-user
|
samer@73
|
1080 interactive installation where a Kinect camera tracks individuals in a space and
|
samer@73
|
1081 maps their positions in physical space to the triangle. In the latter, each visitor
|
samer@73
|
1082 that enters the installation generates a melody and can collaborate with their
|
samer@73
|
1083 co-visitors to generate musical textures. This makes the interaction physically engaging
|
samer@73
|
1084 and (as our experience with visitors both young and old has demonstrated) more playful.
|
samer@73
|
1085 %Additionally visitors can change the
|
samer@73
|
1086 %tempo, register, instrumentation and periodicity of their melody with body gestures.
|
samer@73
|
1087 %
|
samer@73
|
1088 The screen-based interface can serve as a compositional tool.
|
samer@73
|
1089 %%A triangle is drawn on the screen, screen space thus mapped to the statistical
|
samer@73
|
1090 %space of the Melody Triangle.
|
samer@73
|
1091 A number of tokens, each representing a
|
samer@73
|
1092 sonification stream or `voice', can be dragged in and around the triangle.
|
samer@73
|
1093 For each token, a sequence of symbols is sampled using the corresponding
|
samer@73
|
1094 transition matrix, which
|
samer@73
|
1095 %statistical properties that correspond to the token's position is generated. These
|
samer@73
|
1096 %symbols
|
samer@73
|
1097 are then mapped to notes of a scale or percussive sounds%
|
samer@73
|
1098 \footnote{The sampled sequence could easily be mapped to other musical processes, possibly over
|
samer@73
|
1099 different time scales, such as chords, dynamics and timbres. It would also be possible
|
samer@73
|
1100 to map the symbols to visual or other outputs.}%
|
samer@73
|
1101 . Keyboard commands give control over other musical parameters such
|
samer@73
|
1102 as pitch register and inter-onset interval.
|
samer@73
|
1103 %The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}.
|
samer@73
|
1104 %
|
samer@73
|
1105 The system is capable of generating quite intricate musical textures when multiple tokens
|
samer@73
|
1106 are in the triangle, but unlike other computer-aided composition tools or programming
|
samer@73
|
1107 environments, the composer exercises control at the abstract level of information-dynamic
|
samer@73
|
1108 properties.
|
samer@73
|
1109 %the interface relating to subjective expectation and predictability.
|
samer@73
|
1110
|
samer@73
|
1111 \begin{fig}{TheTriangle}
|
samer@73
|
1112 \colfig[0.7]{TheTriangle.pdf}
|
samer@73
|
1113 \caption{The Melody Triangle}
|
samer@73
|
1114 \end{fig}
|
samer@73
|
1115
|
samer@73
|
1116 \comment{
|
samer@73
|
1117 \subsection{Information Dynamics as Evaluative Feedback Mechanism}
|
samer@73
|
1118 %NOT SURE THIS SHOULD BE HERE AT ALL..?
|
samer@73
|
1119 Information measures on a stream of symbols can form a feedback mechanism; a
|
samer@73
|
1120 rudimentary `critic' of sorts. For instance symbol by symbol measure of predictive
|
samer@73
|
1121 information rate, entropy rate and redundancy could tell us if a stream of symbols
|
samer@73
|
1122 is currently `boring', either because it is too repetitive, or because it is too
|
samer@73
|
1123 chaotic. Such feedback would be oblivious to long term and large scale
|
samer@73
|
1124 structures and any cultural norms (such as style conventions), but
|
samer@73
|
1125 nonetheless could provide a composer with valuable insight on
|
samer@73
|
1126 the short term properties of a work. This could not only be used for the
|
samer@73
|
1127 evaluation of pre-composed streams of symbols, but could also provide real-time
|
samer@73
|
1128 feedback in an improvisatory setup.
|
samer@73
|
1129 }
|
samer@73
|
1130
|
samer@73
|
1131 \subsection{User trials with the Melody Triangle}
|
samer@73
|
1132 We are currently in the process of using the screen-based
|
samer@73
|
1133 Melody Triangle user interface to investigate the relationship between the information-dynamic
|
samer@73
|
1134 characteristics of sonified Markov chains and subjective musical preference.
|
samer@73
|
1135 We carried out a pilot study with six participants, who were asked
|
samer@73
|
1136 to use a simplified form of the user interface (a single controllable token,
|
samer@73
|
1137 and no rhythmic, registral or timbral controls) under two conditions:
|
samer@73
|
1138 one where a single sequence was sonified under user control, and another
|
samer@73
|
1139 where an additional sequence was sonified in a different register, as if generated
|
samer@73
|
1140 by a fixed invisible token in one of four regions of the triangle. In addition, subjects
|
samer@73
|
1141 were asked to press a key if they `liked' what they were hearing.
|
samer@73
|
1142
|
samer@73
|
1143 We recorded subjects' behaviour as well as points which they marked
|
samer@73
|
1144 with a key press.
|
samer@73
|
1145 Some results for three of the subjects are shown in \figrf{mtri-results}. Though
|
samer@73
|
1146 we have not been able to detect any systematic across-subjects preference for any particular
|
samer@73
|
1147 region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour.
|
samer@73
|
1148 Our initial hypothesis, that subjects would linger longer in regions of the triangle
|
samer@73
|
1149 that produced aesthetically preferable sequences, and that this would tend to be towards the
|
samer@73
|
1150 centre line of the triangle for all subjects, was not confirmed. However, it is possible
|
samer@73
|
1151 that the design of the experiment encouraged an initial exploration of the space (sometimes
|
samer@73
|
1152 very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and
|
samer@73
|
1153 how the system works, rather than finding musical patterns. It is also possible that the
|
samer@73
|
1154 system encourages users to create musically interesting output by \emph{moving the token},
|
samer@73
|
1155 rather than finding a particular spot in the triangle which produces a musically interesting
|
samer@73
|
1156 sequence by itself.
|
samer@73
|
1157
|
samer@73
|
1158 \begin{fig}{mtri-results}
|
samer@73
|
1159 \def\scat#1{\colfig[0.42]{mtri/#1}}
|
samer@73
|
1160 \def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
|
samer@73
|
1161 \begin{tabular}{cc}
|
samer@73
|
1162 % \subj{a} \\
|
samer@73
|
1163 \subj{b} \\
|
samer@73
|
1164 \subj{c} \\
|
samer@73
|
1165 \subj{d}
|
samer@73
|
1166 \end{tabular}
|
samer@73
|
1167 \caption{Dwell times and mark positions from user trials with the
|
samer@73
|
1168 on-screen Melody Triangle interface, for three subjects. The left-hand column shows
|
samer@73
|
1169 the positions in a 2D information space (entropy rate vs multi-information rate
|
samer@73
|
1170 in bits) where each spent their time; the area of each circle is proportional
|
samer@73
|
1171 to the time spent there. The right-hand column shows points which subjects
|
samer@73
|
1172 `liked'; the area of the circles here is proportional to the duration spent at
|
samer@73
|
1173 that point before the point was marked.}
|
samer@73
|
1174 \end{fig}
|
samer@73
|
1175
|
samer@73
|
1176 Comments collected from the subjects
|
samer@73
|
1177 %during and after the experiment
|
samer@73
|
1178 suggest that
|
samer@73
|
1179 the information-dynamic characteristics of the patterns were readily apparent
|
samer@73
|
1180 to most: several noticed the main organisation of the triangle,
|
samer@73
|
1181 with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
|
samer@73
|
1182 notes towards the opposite corner. Some described their systematic exploration of the space.
|
samer@73
|
1183 Two felt that the right side was `more controllable' than the left (a consequence
|
samer@73
|
1184 of their ability to return to a particular distinctive pattern and recognise it
|
samer@73
|
1185 as one heard previously). Two reported that they became bored towards the end,
|
samer@73
|
1186 but another felt there wasn't enough time to `hear out' the patterns properly.
|
samer@73
|
1187 One subject did not `enjoy' the patterns in the lower region, but another said the lower
|
samer@73
|
1188 central regions were more `melodic' and `interesting'.
|
samer@73
|
1189
|
samer@73
|
1190 We plan to continue the trials with a slightly less restricted user interface in order
|
samer@73
|
1191 to make the experience more enjoyable and thereby give subjects longer to use the interface;
|
samer@73
|
1192 this may allow them to get beyond the initial exploratory phase and give a clearer
|
samer@73
|
1193 picture of their aesthetic preferences. In addition, we plan to conduct a
|
samer@73
|
1194 study under more restrictive conditions, where subjects will have no control over the patterns
|
samer@73
|
1195 other than to signal (a) which of two alternatives they prefer in a forced
|
samer@73
|
1196 choice paradigm, and (b) when they are bored of listening to a given sequence.
|
samer@73
|
1197
|
samer@73
|
1198 %\emph{comparable system} Gordon Pask's Musicolor (1953) applied a similar notion
|
samer@73
|
1199 %of boredom in its design. The Musicolour would react to audio input through a
|
samer@73
|
1200 %microphone by flashing coloured lights. Rather than a direct mapping of sound
|
samer@73
|
1201 %to light, Pask designed the device to be a partner to a performing musician. It
|
samer@73
|
1202 %would adapt its lighting pattern based on the rhythms and frequencies it would
|
samer@73
|
1203 %hear, quickly `learning' to flash in time with the music. However Pask endowed
|
samer@73
|
1204 %the device with the ability to `be bored'; if the rhythmic and frequency content
|
samer@73
|
1205 %of the input remained the same for too long it would listen for other rhythms
|
samer@73
|
1206 %and frequencies, only lighting when it heard these. As the Musicolour would
|
samer@73
|
1207 %`get bored', the musician would have to change and vary their playing, eliciting
|
samer@73
|
1208 %new and unexpected outputs in trying to keep the Musicolour interested.
|
samer@73
|
1209
|
samer@73
|
1210
|
samer@73
|
1211 \section{Conclusions}
|
samer@73
|
1212
|
samer@73
|
1213 % !!! FIXME
|
samer@73
|
1214 %We reviewed our information dynamics approach to the modelling of the perception
|
samer@73
|
1215 We have looked at several emerging areas of application of the methods and
|
samer@73
|
1216 ideas of information dynamics to various problems in music analysis, perception
|
samer@73
|
1217 and cognition, including musicological analysis of symbolic music, audio analysis,
|
samer@73
|
1218 rhythm processing and compositional and creative tasks. The approach has proved
|
samer@73
|
1219 successful in musicological analysis, and though our initial data on
|
samer@73
|
1220 rhythm processing and aesthetic preference are inconclusive, there is still
|
samer@73
|
1221 plenty of work to be done in this area: wherever there are probabilistic models,
|
samer@73
|
1222 information dynamics can shed light on their behaviour.
|
samer@73
|
1223
|
samer@73
|
1224
|
samer@73
|
1225
|
samer@73
|
1226 \section*{Acknowledgments}
|
samer@73
|
1227 This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
|
samer@73
|
1228 GR/S82213/01 and EP/E045235/1 (SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship, EP/G007144/1
|
samer@73
|
1229 (MDP) and EPSRC IDyOM2 EP/H013059/1.
|
samer@73
|
1230 This work is also partly supported by the CoSound project, funded by the Danish Agency for Science, Technology and Innovation.
|
samer@73
|
1231 Thanks also to Marcus Pearce for providing the two rule-based analyses of \emph{Two Pages}.
|
samer@73
|
1232
|
samer@73
|
1233
|
samer@73
|
1234 \bibliographystyle{IEEEtran}
|
samer@73
|
1235 {\bibliography{all,c4dm,nime,andrew}}
|
samer@73
|
1236 \end{document}
|