annotate draft.tex @ 64:a18a4b0517e8

Finished sec 3B.
author samer
date Sat, 17 Mar 2012 01:00:06 +0000
parents 2cd533f149b7
children 9d7e5f690f28
rev   line source
samer@41 1 \documentclass[conference]{IEEEtran}
samer@59 2 \usepackage{fixltx2e}
samer@4 3 \usepackage{cite}
samer@4 4 \usepackage[cmex10]{amsmath}
samer@4 5 \usepackage{graphicx}
samer@4 6 \usepackage{amssymb}
samer@4 7 \usepackage{epstopdf}
samer@4 8 \usepackage{url}
samer@4 9 \usepackage{listings}
samer@18 10 %\usepackage[expectangle]{tools}
samer@9 11 \usepackage{tools}
samer@18 12 \usepackage{tikz}
samer@18 13 \usetikzlibrary{calc}
samer@18 14 \usetikzlibrary{matrix}
samer@18 15 \usetikzlibrary{patterns}
samer@18 16 \usetikzlibrary{arrows}
samer@9 17
samer@9 18 \let\citep=\cite
samer@33 19 \newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
samer@18 20 \newcommand\preals{\reals_+}
samer@18 21 \newcommand\X{\mathcal{X}}
samer@18 22 \newcommand\Y{\mathcal{Y}}
samer@18 23 \newcommand\domS{\mathcal{S}}
samer@18 24 \newcommand\A{\mathcal{A}}
samer@25 25 \newcommand\Data{\mathcal{D}}
samer@18 26 \newcommand\rvm[1]{\mathrm{#1}}
samer@18 27 \newcommand\sps{\,.\,}
samer@18 28 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
samer@18 29 \newcommand\Ix{\mathcal{I}}
samer@18 30 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
samer@18 31 \newcommand\x{\vec{x}}
samer@18 32 \newcommand\Ham[1]{\mathcal{H}_{#1}}
samer@18 33 \newcommand\subsets[2]{[#1]^{(#2)}}
samer@18 34 \def\bet(#1,#2){#1..#2}
samer@18 35
samer@18 36
samer@18 37 \def\ev(#1=#2){#1\!\!=\!#2}
samer@18 38 \newcommand\rv[1]{\Omega \to #1}
samer@18 39 \newcommand\ceq{\!\!=\!}
samer@18 40 \newcommand\cmin{\!-\!}
samer@18 41 \newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}
samer@18 42
samer@18 43 \newcommand\sumitoN{\sum_{i=1}^N}
samer@18 44 \newcommand\sumktoK{\sum_{k=1}^K}
samer@18 45 \newcommand\sumjtoK{\sum_{j=1}^K}
samer@18 46 \newcommand\sumalpha{\sum_{\alpha\in\A}}
samer@18 47 \newcommand\prodktoK{\prod_{k=1}^K}
samer@18 48 \newcommand\prodjtoK{\prod_{j=1}^K}
samer@18 49
samer@18 50 \newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
samer@18 51 \newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
samer@18 52 \newcommand\parity[2]{P^{#1}_{2,#2}}
samer@4 53
samer@4 54 %\usepackage[parfill]{parskip}
samer@4 55
samer@4 56 \begin{document}
samer@41 57 \title{Cognitive Music Modelling: an\\Information Dynamics Approach}
samer@4 58
samer@4 59 \author{
hekeus@16 60 \IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
hekeus@16 61 \IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
samer@4 62 \IEEEauthorblockA{Centre for Digital Music\\
samer@4 63 Queen Mary University of London\\
samer@41 64 Mile End Road, London E1 4NS}}
samer@4 65
samer@4 66 \maketitle
samer@18 67 \begin{abstract}
samer@61 68 We describe an information-theoretic approach to the analysis
samer@61 69 of music and other sequential data, which emphasises the predictive aspects
samer@61 70 of perception, and the dynamic process
samer@61 71 of forming and modifying expectations about an unfolding stream of data,
samer@61 72 characterising these using the tools of information theory: entropies,
samer@61 73 mutual informations, and related quantities.
samer@61 74 After reviewing the theoretical foundations,
samer@61 75 % we present a new result on predictive information rates in high-order Markov chains, and
samer@61 76 we discuss a few emerging areas of application, including
samer@61 77 musicological analysis, real-time beat-tracking analysis, and the generation
samer@61 78 of musical materials as a cognitively-informed compositional aid.
hekeus@16 79 \end{abstract}
samer@4 80
samer@4 81
samer@25 82 \section{Introduction}
samer@9 83 \label{s:Intro}
samer@56 84 The relationship between
samer@56 85 Shannon's \cite{Shannon48} information theory and music and art in general has been the
samer@56 86 subject of some interest since the 1950s
samer@56 87 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
samer@56 88 The general thesis is that perceptible qualities and subjective states
samer@56 89 like uncertainty, surprise, complexity, tension, and interestingness
samer@56 90 are closely related to information-theoretic quantities like
samer@56 91 entropy, relative entropy, and mutual information.
samer@56 92
samer@56 93 Music is also an inherently dynamic process,
samer@61 94 where listeners build up expectations about what is to happen next,
samer@61 95 which may be fulfilled
samer@61 96 immediately, after some delay, or modified as the music unfolds.
samer@56 97 In this paper, we explore this ``Information Dynamics'' view of music,
samer@61 98 discussing the theory behind it and some emerging applications.
samer@9 99
samer@25 100 \subsection{Expectation and surprise in music}
samer@61 101 The thesis that the musical experience is strongly shaped by the generation
samer@61 102 and playing out of strong and weak expectations was put forward by, amongst others,
samer@61 103 music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
samer@18 104 recognised much earlier; for example,
samer@9 105 it was elegantly put by Hanslick \cite{Hanslick1854} in the
samer@9 106 nineteenth century:
samer@9 107 \begin{quote}
samer@9 108 `The most important factor in the mental process which accompanies the
samer@9 109 act of listening to music, and which converts it to a source of pleasure,
samer@18 110 is \ldots the intellectual satisfaction
samer@9 111 which the listener derives from continually following and anticipating
samer@9 112 the composer's intentions---now, to see his expectations fulfilled, and
samer@18 113 now, to find himself agreeably mistaken.'
samer@18 114 %It is a matter of course that
samer@18 115 %this intellectual flux and reflux, this perpetual giving and receiving
samer@18 116 %takes place unconsciously, and with the rapidity of lightning-flashes.'
samer@9 117 \end{quote}
samer@9 118 An essential aspect of this is that music is experienced as a phenomenon
samer@61 119 that unfolds in time, rather than being apprehended as a static object
samer@61 120 presented in its entirety. Meyer argued that the experience depends
samer@9 121 on how we change and revise our conceptions \emph{as events happen}, on
samer@9 122 how expectation and prediction interact with occurrence, and that, to a
samer@9 123 large degree, the way to understand the effect of music is to focus on
samer@9 124 this `kinetics' of expectation and surprise.
samer@9 125
samer@25 126 Prediction and expectation are essentially probabilistic concepts
samer@25 127 and can be treated mathematically using probability theory.
samer@25 128 We suppose that when we listen to music, expectations are created on the basis
samer@25 129 of our familiarity with various styles of music and our ability to
samer@25 130 detect and learn statistical regularities in the music as they emerge.
samer@25 131 There is experimental evidence that human listeners are able to internalise
samer@25 132 statistical knowledge about musical structure, \eg
samer@25 133 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
samer@25 134 that statistical models can form an effective basis for computational
samer@25 135 analysis of music, \eg
samer@25 136 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
samer@25 137
samer@56 138 % \subsection{Music and information theory}
samer@24 139 With a probabilistic framework for music modelling and prediction in hand,
samer@56 140 we are in a position to compute various
samer@25 141 \comment{
samer@25 142 which provides us with a number of measures, such as entropy
samer@25 143 and mutual information, which are suitable for quantifying states of
samer@25 144 uncertainty and surprise, and thus could potentially enable us to build
samer@25 145 quantitative models of the listening process described above. They are
samer@25 146 what Berlyne \cite{Berlyne71} called `collative variables' since they are
samer@25 147 to do with patterns of occurrence rather than medium-specific details.
samer@25 148 Berlyne sought to show that the collative variables are closely related to
samer@25 149 perceptual qualities like complexity, tension, interestingness,
samer@25 150 and even aesthetic value, not just in music, but in other temporal
samer@25 151 or visual media.
samer@25 152 The relevance of information theory to music and art has
samer@25 153 also been addressed by researchers from the 1950s onwards
samer@25 154 \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
samer@25 155 }
samer@9 156 information-theoretic quantities like entropy, relative entropy,
samer@9 157 and mutual information.
samer@9 158 % and are major determinants of the overall experience.
samer@9 159 Berlyne \cite{Berlyne71} called such quantities `collative variables', since
samer@9 160 they are to do with patterns of occurrence rather than medium-specific details,
samer@9 161 and developed the ideas of `information aesthetics' in an experimental setting.
samer@9 162 % Berlyne's `new experimental aesthetics', the `information-aestheticians'.
samer@9 163
samer@9 164 % Listeners then experience greater or lesser levels of surprise
samer@9 165 % in response to departures from these norms.
samer@9 166 % By careful manipulation
samer@9 167 % of the material, the composer can thus define, and induce within the
samer@9 168 % listener, a temporal programme of varying
samer@9 169 % levels of uncertainty, ambiguity and surprise.
samer@9 170
samer@9 171
samer@9 172 \subsection{Information dynamic approach}
samer@61 173 Our working hypothesis is that, as a
samer@24 174 listener (to which we will refer as `it') listens to a piece of music, it maintains
samer@25 175 a dynamically evolving probabilistic model that enables it to make predictions
samer@24 176 about how the piece will continue, relying on both its previous experience
samer@61 177 of music and the emerging themes of the piece. As events unfold, it revises
samer@25 178 its probabilistic belief state, which includes predictive
samer@25 179 distributions over possible future events. These
samer@25 180 % distributions and changes in distributions
samer@25 181 can be characterised in terms of a handful of information-theoretic
samer@25 182 measures such as entropy and relative entropy. By tracing the
samer@24 183 evolution of these measures, we obtain a representation which captures much
samer@25 184 of the significant structure of the music.
samer@25 185
samer@25 186 One of the consequences of this approach is that regardless of the details of
samer@25 187 the sensory input or even which sensory modality is being processed, the resulting
samer@25 188 analysis is in terms of the same units: quantities of information (bits) and
samer@61 189 rates of information flow (bits per second). The information-theoretic
samer@25 190 concepts in terms of which the analysis is framed are universal to all sorts
samer@25 191 of data.
samer@25 192 In addition, when adaptive probabilistic models are used, expectations are
samer@61 193 created mainly in response to \emph{patterns} of occurrence,
samer@25 194 rather than the details of which specific things occur.
samer@25 195 Together, these suggest that an information dynamic analysis captures a
samer@25 196 high level of \emph{abstraction}, and could be used to
samer@25 197 make structural comparisons between different temporal media,
samer@25 198 such as music, film, animation, and dance.
samer@25 199 % analyse and compare information
samer@25 200 % flow in different temporal media regardless of whether they are auditory,
samer@25 201 % visual or otherwise.
samer@9 202
samer@25 203 Another consequence is that the information dynamic approach gives us a principled way
samer@24 204 to address the notion of \emph{subjectivity}, since the analysis is dependent on the
samer@24 205 probability model the observer starts off with, which may depend on prior experience
samer@24 206 or other factors, and which may change over time. Thus, inter-subject variability and
samer@24 207 variation in subjects' responses over time are
samer@24 208 fundamental to the theory.
samer@9 209
samer@18 210 %modelling the creative process, which often alternates between generative
samer@18 211 %and selective or evaluative phases \cite{Boden1990}, and would have
samer@18 212 %applications in tools for computer aided composition.
samer@18 213
samer@18 214
samer@18 215 \section{Theoretical review}
samer@18 216
samer@34 217 \subsection{Entropy and information}
samer@41 218 \label{s:entro-info}
samer@41 219
samer@34 220 Let $X$ denote some variable whose value is initially unknown to our
samer@34 221 hypothetical observer. We will treat $X$ mathematically as a random variable,
samer@36 222 with a value to be drawn from some set $\X$ and a
samer@34 223 probability distribution representing the observer's beliefs about the
samer@34 224 true value of $X$.
samer@34 225 In this case, the observer's uncertainty about $X$ can be quantified
samer@34 226 as the entropy of the random variable $H(X)$. For a discrete variable
samer@36 227 with probability mass function $p:\X \to [0,1]$, this is
samer@34 228 \begin{equation}
samer@41 229 H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)},
samer@34 230 \end{equation}
samer@41 231 % where $\expect{}$ is the expectation operator.
samer@41 232 The negative-log-probability
samer@34 233 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
samer@34 234 the \emph{surprisingness} of the value $x$ should it be observed, and
samer@61 235 hence the entropy is the expectation of the surprisingness, $\expect \ell(X)$.
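As a small illustrative example (hypothetical values, with logarithms taken to base 2
so that quantities are measured in bits), suppose $\X=\{a,b,c\}$ with
$p(a)=1/2$ and $p(b)=p(c)=1/4$. The surprisingness values are then
$\ell(a)=1$ bit and $\ell(b)=\ell(c)=2$ bits, and the entropy is their average under $p$:
\begin{equation*}
	H(X) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 = 1.5 \text{ bits}.
\end{equation*}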
samer@34 236
samer@34 237 Now suppose that the observer receives some new data $\Data$ that
samer@34 238 causes a revision of its beliefs about $X$. The \emph{information}
samer@34 239 in this new data \emph{about} $X$ can be quantified as the
samer@61 240 relative entropy or
samer@34 241 Kullback-Leibler (KL) divergence between the prior and posterior
samer@34 242 distributions $p(x)$ and $p(x|\Data)$ respectively:
samer@34 243 \begin{equation}
samer@34 244 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
samer@36 245 = \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
samer@41 246 \label{eq:info}
samer@34 247 \end{equation}
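For example, again assuming base-2 logarithms and the usual convention $0 \log 0 = 0$,
if the prior $p(x)$ is uniform over four values and the data $\Data$ rule out two of them,
leaving the posterior $p(x|\Data)$ uniform over the remaining two, then
\begin{equation*}
	\mathcal{I}_{\Data\to X} = 2 \times \tfrac{1}{2} \log_2 \frac{1/2}{1/4} = 1 \text{ bit},
\end{equation*}
in agreement with the intuition that halving the set of possibilities yields one bit of information.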
samer@34 248 When there are multiple variables $X_1, X_2$
samer@34 249 \etc which the observer believes to be dependent, then the observation of
samer@34 250 one may change its beliefs and hence yield information about the
samer@34 251 others. The joint and conditional entropies as described in any
samer@34 252 textbook on information theory (\eg \cite{CoverThomas}) then quantify
samer@34 253 the observer's expected uncertainty about groups of variables given the
samer@34 254 values of others. In particular, the \emph{mutual information}
samer@34 255 $I(X_1;X_2)$ is both the expected information
samer@34 256 in an observation of $X_2$ about $X_1$ and the expected reduction
samer@34 257 in uncertainty about $X_1$ after observing $X_2$:
samer@34 258 \begin{equation}
samer@34 259 I(X_1;X_2) = H(X_1) - H(X_1|X_2),
samer@34 260 \end{equation}
samer@34 261 where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
samer@34 262 of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
samer@34 263 and so the mutual information is symmetric in its arguments. A conditional
samer@34 264 form of the mutual information can be formulated analogously:
samer@34 265 \begin{equation}
samer@34 266 I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
samer@34 267 \end{equation}
samer@34 268 These relationships between the various entropies and mutual
samer@61 269 informations are conveniently visualised in \emph{information diagrams}
samer@34 270 or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
samer@34 271
samer@18 272 \begin{fig}{venn-example}
samer@18 273 \newcommand\rad{2.2em}%
samer@18 274 \newcommand\circo{circle (3.4em)}%
samer@18 275 \newcommand\labrad{4.3em}
samer@18 276 \newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
samer@18 277 \newcommand\colsep{\ }
samer@18 278 \newcommand\clipin[1]{\clip (#1) \circo;}%
samer@18 279 \newcommand\clipout[1]{\clip \bound (#1) \circo;}%
samer@18 280 \newcommand\cliptwo[3]{%
samer@18 281 \begin{scope}
samer@18 282 \clipin{#1};
samer@18 283 \clipin{#2};
samer@18 284 \clipout{#3};
samer@18 285 \fill[black!30] \bound;
samer@18 286 \end{scope}
samer@18 287 }%
samer@18 288 \newcommand\clipone[3]{%
samer@18 289 \begin{scope}
samer@18 290 \clipin{#1};
samer@18 291 \clipout{#2};
samer@18 292 \clipout{#3};
samer@18 293 \fill[black!15] \bound;
samer@18 294 \end{scope}
samer@18 295 }%
samer@18 296 \begin{tabular}{c@{\colsep}c}
samer@18 297 \begin{tikzpicture}[baseline=0pt]
samer@18 298 \coordinate (p1) at (90:\rad);
samer@18 299 \coordinate (p2) at (210:\rad);
samer@18 300 \coordinate (p3) at (-30:\rad);
samer@18 301 \clipone{p1}{p2}{p3};
samer@18 302 \clipone{p2}{p3}{p1};
samer@18 303 \clipone{p3}{p1}{p2};
samer@18 304 \cliptwo{p1}{p2}{p3};
samer@18 305 \cliptwo{p2}{p3}{p1};
samer@18 306 \cliptwo{p3}{p1}{p2};
samer@18 307 \begin{scope}
samer@18 308 \clip (p1) \circo;
samer@18 309 \clip (p2) \circo;
samer@18 310 \clip (p3) \circo;
samer@18 311 \fill[black!45] \bound;
samer@18 312 \end{scope}
samer@18 313 \draw (p1) \circo;
samer@18 314 \draw (p2) \circo;
samer@18 315 \draw (p3) \circo;
samer@18 316 \path
samer@18 317 (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
samer@18 318 (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
samer@18 319 (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
samer@18 320 (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
samer@18 321 (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
samer@18 322 (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
samer@18 323 (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
samer@18 324 ;
samer@18 325 \path
samer@18 326 (p1) +(140:\labrad) node {$X_1$}
samer@18 327 (p2) +(-140:\labrad) node {$X_2$}
samer@18 328 (p3) +(-40:\labrad) node {$X_3$};
samer@18 329 \end{tikzpicture}
samer@18 330 &
samer@18 331 \parbox{0.5\linewidth}{
samer@18 332 \small
samer@18 333 \begin{align*}
samer@18 334 I_{1|23} &= H(X_1|X_2,X_3) \\
samer@18 335 I_{13|2} &= I(X_1;X_3|X_2) \\
samer@18 336 I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
samer@18 337 I_{12|3} + I_{123} &= I(X_1;X_2)
samer@18 338 \end{align*}
samer@18 339 }
samer@18 340 \end{tabular}
samer@18 341 \caption{
samer@61 342 I-diagram of entropies and mutual informations
samer@18 343 for three random variables $X_1$, $X_2$ and $X_3$. The areas of
samer@18 344 the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
samer@18 345 The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
samer@18 346 The central area $I_{123}$ is the co-information \cite{McGill1954}.
samer@18 347 Some other information measures are indicated in the legend.
samer@18 348 }
samer@18 349 \end{fig}
samer@30 350
samer@30 351
samer@36 352 \subsection{Surprise and information in sequences}
samer@36 353 \label{s:surprise-info-seq}
samer@30 354
samer@36 355 Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of
samer@30 356 random variables, infinite in both directions,
samer@36 357 and that $\mu$ is the associated probability measure over all
samer@61 358 realisations of the sequence. In the following, $\mu$ will simply serve
samer@30 359 as a label for the process. We can identify a number of information-theoretic
samer@30 360 measures meaningful in the context of a sequential observation of the sequence, during
samer@61 361 which, at any time $t$, the sequence can be divided into a `present' $X_t$, a `past'
samer@30 362 $\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
samer@30 363 $\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
samer@41 364 We will write the actually observed value of $X_t$ as $x_t$, and
samer@36 365 the sequence of observations up to but not including $x_t$ as
samer@36 366 $\past{x}_t$.
samer@36 367 % Since the sequence is assumed stationary, we can without loss of generality,
samer@36 368 % assume that $t=0$ in the following definitions.
samer@36 369
samer@41 370 The in-context surprisingness of the observation $X_t=x_t$ depends on
samer@41 371 both $x_t$ and the context $\past{x}_t$:
samer@36 372 \begin{equation}
samer@41 373 \ell_t = - \log p(x_t|\past{x}_t).
samer@36 374 \end{equation}
samer@61 375 However, before $X_t$ is observed, the observer can compute
samer@46 376 the \emph{expected} surprisingness as a measure of its uncertainty about
samer@61 377 $X_t$; this may be written as an entropy
samer@36 378 $H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
samer@61 379 conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the
samer@36 380 \emph{variables} $\past{X}_t$ as in the conventional conditional entropy.
samer@36 381
samer@41 382 The surprisingness $\ell_t$ and expected surprisingness
samer@36 383 $H(X_t|\ev(\past{X}_t=\past{x}_t))$
samer@41 384 can be understood as \emph{subjective} information dynamic measures, since they are
samer@41 385 based on the observer's probability model in the context of the actually observed sequence
samer@61 386 $\past{x}_t$. They characterise what it is like to be `in the observer's shoes'.
samer@36 387 If we view the observer as a purely passive or reactive agent, this would
samer@36 388 probably be sufficient, but for active agents such as humans or animals, it is
samer@36 389 often necessary to \emph{anticipate} future events in order, for example, to plan the
samer@36 390 most effective course of action. It makes sense for such observers to be
samer@36 391 concerned about the predictive probability distribution over future events,
samer@36 392 $p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
samer@41 393 the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
samer@41 394 is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
samer@41 395 \emph{given} the observed past $\past{X}_t=\past{x}_t$.
samer@41 396 Referring to the definition of information \eqrf{info}, this is the KL divergence
samer@41 397 between prior and posterior distributions over possible futures, which, written out in full, is
samer@41 398 \begin{equation}
samer@41 399 \mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
samer@41 400 p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
samer@41 401 \end{equation}
samer@41 402 where the sum is to be taken over the set of infinite sequences $\X^*$.
samer@46 403 Note that it is quite possible for an event to be surprising but not informative
samer@46 404 in a predictive sense.
samer@41 405 As with the surprisingness, the observer can compute its \emph{expected} IPI
samer@41 406 at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
samer@41 407 conditioned on the observed past. This could be used, for example, as an estimate
samer@41 408 of attentional resources which should be directed at this stream of data, which may
samer@41 409 be in competition with other sensory streams.
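To illustrate how an event can be surprising without being predictively informative,
consider, as a hypothetical example, a process in which the $X_t$ are independent and
identically distributed. Then $p(\fut{x}_t|x_t,\past{x}_t) = p(\fut{x}_t|\past{x}_t)$
for every $x_t$, so that
\begin{equation*}
	\mathcal{I}_t = 0 \quad\text{while}\quad \ell_t = -\log p(x_t),
\end{equation*}
and the latter can be arbitrarily large for an improbable $x_t$.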
samer@36 410
samer@36 411 \subsection{Information measures for stationary random processes}
samer@43 412 \label{s:process-info}
samer@30 413
samer@18 414
samer@18 415 \begin{fig}{predinfo-bg}
samer@18 416 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
samer@18 417 \newcommand\rad{1.8em}%
samer@18 418 \newcommand\ovoid[1]{%
samer@18 419 ++(-#1,\rad)
samer@18 420 -- ++(2 * #1,0em) arc (90:-90:\rad)
samer@18 421 -- ++(-2 * #1,0em) arc (270:90:\rad)
samer@18 422 }%
samer@18 423 \newcommand\axis{2.75em}%
samer@18 424 \newcommand\olap{0.85em}%
samer@18 425 \newcommand\offs{3.6em}
samer@18 426 \newcommand\colsep{\hspace{5em}}
samer@18 427 \newcommand\longblob{\ovoid{\axis}}
samer@18 428 \newcommand\shortblob{\ovoid{1.75em}}
samer@56 429 \begin{tabular}{c}
samer@43 430 \subfig{(a) multi-information and entropy rates}{%
samer@43 431 \begin{tikzpicture}%[baseline=-1em]
samer@43 432 \newcommand\rc{1.75em}
samer@43 433 \newcommand\throw{2.5em}
samer@43 434 \coordinate (p1) at (180:1.5em);
samer@43 435 \coordinate (p2) at (0:0.3em);
samer@43 436 \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
samer@43 437 \newcommand\present{(p2) circle (\rc)}
samer@43 438 \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
samer@43 439 \newcommand\fillclipped[2]{%
samer@43 440 \begin{scope}[even odd rule]
samer@43 441 \foreach \thing in {#2} {\clip \thing;}
samer@43 442 \fill[black!#1] \bound;
samer@43 443 \end{scope}%
samer@43 444 }%
samer@43 445 \fillclipped{30}{\present,\bound \thepast}
samer@43 446 \fillclipped{15}{\present,\bound \thepast}
samer@43 447 \fillclipped{45}{\present,\thepast}
samer@43 448 \draw \thepast;
samer@43 449 \draw \present;
samer@43 450 \node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$};
samer@43 451 \node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
samer@43 452 \path (p2) +(90:3em) node {$X_0$};
samer@43 453 \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
samer@43 454 \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
samer@43 455 \end{tikzpicture}}%
samer@43 456 \\[1.25em]
samer@43 457 \subfig{(b) excess entropy}{%
samer@18 458 \newcommand\blob{\longblob}
samer@18 459 \begin{tikzpicture}
samer@18 460 \coordinate (p1) at (-\offs,0em);
samer@18 461 \coordinate (p2) at (\offs,0em);
samer@18 462 \begin{scope}
samer@18 463 \clip (p1) \blob;
samer@18 464 \clip (p2) \blob;
samer@18 465 \fill[lightgray] (-1,-1) rectangle (1,1);
samer@18 466 \end{scope}
samer@18 467 \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
samer@18 468 \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
samer@18 469 \path (0,0) node (future) {$E$};
samer@18 470 \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
samer@18 471 \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
samer@18 472 \end{tikzpicture}%
samer@18 473 }%
samer@18 474 \\[1.25em]
samer@43 475 \subfig{(c) predictive information rate $b_\mu$}{%
samer@18 476 \begin{tikzpicture}%[baseline=-1em]
samer@18 477 \newcommand\rc{2.1em}
samer@18 478 \newcommand\throw{2.5em}
samer@18 479 \coordinate (p1) at (210:1.5em);
samer@18 480 \coordinate (p2) at (90:0.7em);
samer@18 481 \coordinate (p3) at (-30:1.5em);
samer@18 482 \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
samer@18 483 \newcommand\present{(p2) circle (\rc)}
samer@18 484 \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
samer@18 485 \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
samer@18 486 \newcommand\fillclipped[2]{%
samer@18 487 \begin{scope}[even odd rule]
samer@18 488 \foreach \thing in {#2} {\clip \thing;}
samer@18 489 \fill[black!#1] \bound;
samer@18 490 \end{scope}%
samer@18 491 }%
samer@43 492 \fillclipped{80}{\future,\thepast}
samer@18 493 \fillclipped{30}{\present,\future,\bound \thepast}
samer@18 494 \fillclipped{15}{\present,\bound \future,\bound \thepast}
samer@18 495 \draw \future;
samer@18 496 \fillclipped{45}{\present,\thepast}
samer@18 497 \draw \thepast;
samer@18 498 \draw \present;
samer@18 499 \node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
samer@18 500 \node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
samer@18 501 \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
samer@18 502 \path (p2) +(140:3em) node {$X_0$};
samer@18 503 % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
samer@18 504 \path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
samer@18 505 \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
samer@18 506 \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
samer@18 507 \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
samer@18 508 \end{tikzpicture}}%
samer@18 509 \\[0.5em]
samer@18 510 \end{tabular}
samer@18 511 \caption{
samer@30 512 I-diagrams for several information measures in
samer@18 513 stationary random processes. Each circle or oval represents a random
samer@18 514 variable or sequence of random variables relative to time $t=0$. Overlapped areas
samer@61 515 correspond to various mutual informations.
samer@61 516 In (a) and (c), the circle represents the `present'. Its total area is
samer@33 517 $H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
samer@18 518 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
samer@43 519 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$. The small dark
samer@43 520 region below $X_0$ in (c) is $\sigma_\mu = E-\rho_\mu$.
samer@18 521 }
samer@18 522 \end{fig}
samer@18 523
samer@41 524 If we step back, out of the observer's shoes as it were, and consider the
samer@41 525 random process $(\ldots,X_{-1},X_0,X_1,\dots)$ as a statistical ensemble of
samer@41 526 possible realisations, and furthermore assume that it is stationary,
samer@41 527 then it becomes possible to define a number of information-theoretic measures,
samer@41 528 closely related to those described above, but which characterise the
samer@41 529 process as a whole, rather than on a moment-by-moment basis. Some of these,
samer@41 530 such as the entropy rate, are well-known, but others are only recently being
samer@41 531 investigated. (In the following, the assumption of stationarity means that
samer@41 532 the measures defined below are independent of $t$.)
samer@41 533
samer@61 534 The \emph{entropy rate} of the process is the entropy of the `present'
samer@61 535 $X_t$ given the `past':
samer@41 536 \begin{equation}
samer@41 537 \label{eq:entro-rate}
samer@41 538 h_\mu = H(X_t|\past{X}_t).
samer@41 539 \end{equation}
samer@51 540 The entropy rate is a measure of the overall surprisingness
samer@51 541 or unpredictability of the process, and gives an indication of the average
samer@51 542 level of surprise and uncertainty that would be experienced by an observer
samer@61 543 computing the measures of \secrf{surprise-info-seq} on a sequence sampled
samer@61 544 from the process.
samer@41 545
samer@41 546 The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
samer@41 547 notation for what he called the `information rate') is the mutual
samer@41 548 information between the `past' and the `present':
samer@41 549 \begin{equation}
samer@41 550 \label{eq:multi-info}
samer@41 551 \rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
samer@41 552 \end{equation}
samer@61 553 It is a measure of how much the preceding context of an observation
samer@61 554 helps in predicting or reducing the surprisingness of the current observation.
samer@41 555
samer@41 556 The \emph{excess entropy} \cite{CrutchfieldPackard1983}
samer@41 557 is the mutual information between
samer@41 558 the entire `past' and the entire `future':
samer@41 559 \begin{equation}
samer@41 560 E = I(\past{X}_t; X_t,\fut{X}_t).
samer@41 561 \end{equation}
samer@43 562 Both the excess entropy and the multi-information rate can be thought
samer@43 563 of as measures of \emph{redundancy}, quantifying the extent to which
samer@43 564 the same information is to be found in all parts of the sequence.
samer@41 565
samer@41 566
samer@30 567 The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
samer@61 568 is the mutual information between the `present' and the `future' given the
samer@61 569 `past':
samer@18 570 \begin{equation}
samer@18 571 \label{eq:PIR}
samer@61 572 b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t),
samer@18 573 \end{equation}
samer@61 574 which can be read as the average reduction
samer@18 575 in uncertainty about the future on learning $X_t$, given the past.
samer@18 576 Due to the symmetry of the mutual information, it can also be written
samer@18 577 as
samer@18 578 \begin{equation}
samer@18 579 % \IXZ_t
samer@43 580 b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
samer@18 581 % \label{<++>}
samer@18 582 \end{equation}
samer@18 583 % If $X$ is stationary, then
samer@41 584 where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
samer@34 585 is the \emph{residual} \cite{AbdallahPlumbley2010},
samer@34 586 or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
samer@18 587 These relationships are illustrated in \Figrf{predinfo-bg}, along with
samer@18 588 several of the information measures we have discussed so far.
samer@51 589 The PIR gives an indication of the average IPI that would be experienced
samer@51 590 by an observer processing a sequence sampled from this process.
samer@18 591
samer@18 592
samer@46 593 James et al.\ \cite{JamesEllisonCrutchfield2011} review several of these
samer@46 594 information measures and introduce some new related ones.
samer@46 595 In particular they identify $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$,
samer@46 596 the mutual information between the past and the future given the present,
samer@46 597 as an interesting quantity that measures the predictive benefit of
samer@61 598 model-building, that is, maintaining an internal state summarising past
samer@61 599 observations in order to make better predictions. It is shown as the
samer@46 600 small dark region below the circle in \figrf{predinfo-bg}(c).
samer@46 601 By comparing with \figrf{predinfo-bg}(b), we can see that
samer@46 602 $\sigma_\mu = E - \rho_\mu$.
samer@43 603 % They also identify
samer@43 604 % $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
samer@43 605 % information} rate.
samer@34 606
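As a sanity check that does not rely on the figure, the relation between
$\sigma_\mu$, $E$ and $\rho_\mu$ can be sketched algebraically by applying the
chain rule for mutual information to the definition of the excess entropy:
\begin{align*}
E &= I(\past{X}_t; X_t,\fut{X}_t) \\
  &= I(\past{X}_t; X_t) + I(\past{X}_t;\fut{X}_t|X_t) \\
  &= \rho_\mu + \sigma_\mu.
\end{align*}
In particular, whenever the past and the future are conditionally independent
given the present, as in a first order Markov chain, $\sigma_\mu = 0$ and the
excess entropy coincides with the multi-information rate.
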
samer@4 607
samer@36 608 \subsection{First and higher order Markov chains}
samer@53 609 \label{s:markov}
samer@36 610 First order Markov chains are the simplest non-trivial models to which information
samer@36 611 dynamics methods can be applied. In \cite{AbdallahPlumbley2009} we derived
samer@41 612 expressions for all the information measures described in \secrf{surprise-info-seq} for
samer@61 613 ergodic Markov chains (\ie those that have a unique stationary
samer@61 614 distribution).
samer@61 615 % The derivation is greatly simplified by the dependency structure
samer@61 616 % of the Markov chain: for the purpose of the analysis, the `past' and `future'
samer@61 617 % segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
samer@61 618 % and next variables $X_{t-1}$ and $X_{t+1}$ respectively.
samer@61 619 We also showed that
samer@36 620 the predictive information rate can be expressed simply in terms of entropy rates:
samer@36 621 if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
samer@36 622 an alphabet of $\{1,\ldots,K\}$, such that
samer@61 623 $a_{ij} = \Pr(\ev(X_t=i|\ev(X_{t-1}=j)))$, and let $h:\reals^{K\times K}\to \reals$ be
samer@36 624 the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
samer@61 625 with transition matrix $a$, then the predictive information rate is
samer@36 626 \begin{equation}
samer@61 627 b_\mu = h(a^2) - h(a),
samer@36 628 \end{equation}
samer@36 629 where $a^2$, the transition matrix squared, is the transition matrix
samer@36 630 of the `skip one' Markov chain obtained by jumping two steps at a time
samer@36 631 along the original chain.
samer@36 632
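The following worked example is only an illustrative sketch: the symmetric
two-state chain and the symbols $p$, $q$, $\pi$ and $H_2$ are chosen here for
convenience and are not taken from the experiments. For a chain with stationary
distribution $\pi$ (so that $a\pi = \pi$), the entropy rate is
$h(a) = -\sum_j \pi_j \sum_i a_{ij}\log a_{ij}$. Now let
\begin{equation*}
a = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix},
\qquad
a^2 = \begin{pmatrix} 1-q & q \\ q & 1-q \end{pmatrix},
\qquad q = 2p(1-p).
\end{equation*}
The stationary distribution is uniform and every column of $a$ has entropy
$H_2(p) = -p\log p - (1-p)\log(1-p)$, so that $h(a) = H_2(p)$,
$h(a^2) = H_2(q)$ and
\begin{equation*}
b_\mu = H_2\bigl(2p(1-p)\bigr) - H_2(p),
\qquad
\rho_\mu = \log 2 - H_2(p).
\end{equation*}
The PIR vanishes at $p=0$ (a constant sequence), at $p=1$ (strict alternation)
and at $p=1/2$ (independent uniform symbols), and is positive in between because
$q$ always lies at least as close to $1/2$ as $p$ does. For example, $p=0.1$
gives $h_\mu \approx 0.469$ bits, $h(a^2) \approx 0.680$ bits and
$b_\mu \approx 0.211$ bits.
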
samer@36 633 Second and higher order Markov chains can be treated in a similar way by transforming
samer@36 634 to a first order representation of the high order Markov chain. If we are dealing
samer@36 635 with an $N$th order model, this is done by forming a new alphabet of size $K^N$
samer@41 636 consisting of all possible $N$-tuples of symbols from the base alphabet.
samer@41 637 An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
samer@36 638 $(x_{t+1},\ldots,x_{t+N})$ from the base model. The next
samer@41 639 observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
samer@36 640 block along by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
samer@41 641 transition matrix $\hat{a}$. Adopting the label $\mu$ for the order $N$ system,
samer@41 642 we obtain:
samer@36 643 \begin{equation}
samer@41 644 h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
samer@36 645 \end{equation}
samer@36 646 where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
samer@41 647 Other information measures can also be computed for the high-order Markov chain, including
samer@41 648 the multi-information rate $\rho_\mu$ and the excess entropy $E$. These are identical
samer@41 649 for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
samer@41 650 than $\rho_\mu$.
samer@43 651
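To make the first order representation concrete, here is a small sketch for a
binary alphabet ($K=2$) and order $N=2$; the notation $p(w|uv)$ for the
probability of symbol $w$ following the pair $(u,v)$ is introduced here for
illustration only. Ordering the $K^N = 4$ composite states as $11, 12, 21, 22$,
and indexing columns by the current block $(u,v)$ and rows by the next block
$(v,w)$, the transition matrix has the form
\begin{equation*}
\hat{a} = \begin{pmatrix}
p(1|11) & 0 & p(1|21) & 0 \\
p(2|11) & 0 & p(2|21) & 0 \\
0 & p(1|12) & 0 & p(1|22) \\
0 & p(2|12) & 0 & p(2|22)
\end{pmatrix}.
\end{equation*}
Each column has at most $K$ nonzero entries, because the next block must agree
with the current one on the $N-1$ symbols they share; this is the source of the
sparsity of $\hat{a}$. As a consistency check, setting $N=1$ in the expression
for $b_\mu$ recovers the first order result $h(a^2) - h(a)$.
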
samer@61 652 In our early experiments with visualising and sonifying sequences sampled from
samer@61 653 first order Markov chains \cite{AbdallahPlumbley2009}, we found that
samer@61 654 the measures $h_\mu$, $\rho_\mu$ and $b_\mu$ are related to perceptible
samer@61 655 characteristics, and that the kinds of transition matrices maximising or minimising
samer@61 656 each of these quantities are quite distinct. High entropy rates are associated
samer@61 657 with completely uncorrelated sequences with no recognisable temporal structure,
samer@61 658 along with low $\rho_\mu$ and $b_\mu$.
samer@61 659 High values of $\rho_\mu$ are associated with long periodic cycles, low $h_\mu$
samer@61 660 and low $b_\mu$. High values of $b_\mu$ are associated with intermediate values
samer@61 661 of $\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
samer@61 662 temporal structures. These relationships are visible in \figrf{mtriscat} in
samer@61 663 \secrf{composition}, where we pick up the thread with an application of
samer@61 664 information dynamics in a compositional aid.
samer@36 665
samer@36 666
hekeus@16 667 \section{Information Dynamics in Analysis}
samer@4 668
samer@24 669 \begin{fig}{twopages}
samer@33 670 \colfig[0.96]{matbase/fig9471} % update from mbc paper
samer@33 671 % \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
samer@24 672 \vspace*{1em}
samer@24 673 \colfig[0.97]{matbase/fig13377} % rule based analysis
samer@24 674 \caption{Analysis of \emph{Two Pages}.
samer@24 675 The thick vertical lines are the part boundaries as indicated in
samer@24 676 the score by the composer.
samer@24 677 The thin grey lines
samer@24 678 indicate changes in the melodic `figures' of which the piece is
samer@24 679 constructed. In the `model information rate' panel, the black asterisks
samer@24 680 mark the
samer@24 681 six most surprising moments selected by Keith Potter.
samer@24 682 The bottom panel shows a rule-based boundary strength analysis computed
samer@24 683 using Cambouropoulos' LBDM.
samer@24 684 All information measures are in nats and time is in notes.
samer@24 685 }
samer@24 686 \end{fig}
samer@24 687
samer@36 688 \subsection{Musicological Analysis}
samer@36 689 In \cite{AbdallahPlumbley2009}, methods based on the theory described above
samer@36 690 were used to analyse two pieces of music in the minimalist style
samer@36 691 by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
samer@36 692 The analysis was done using a first-order Markov chain model, with the
samer@36 693 enhancement that the transition matrix of the model was allowed to
samer@36 694 evolve dynamically as the notes were processed, and was tracked (in
samer@36 695 a Bayesian way) as a \emph{distribution} over possible transition matrices,
samer@61 696 rather than a point estimate. Some results are summarised in \figrf{twopages}:
samer@36 697 the upper four plots show the dynamically evolving subjective information
samer@36 698 measures as described in \secrf{surprise-info-seq} computed using a point
samer@61 699 estimate of the current transition matrix; the fifth plot (the `model information rate')
samer@36 700 measures the information in each observation about the transition matrix.
samer@36 701 In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
samer@61 702 is actually a component of the true IPI when the transition
samer@61 703 matrix is being learned online, and was neglected when we computed the IPI from
samer@61 704 the transition matrix as if the transition probabilities
samer@36 705 were constant.
samer@36 706
samer@36 707 The peaks of the surprisingness and both components of the predictive information
samer@36 708 show good correspondence with the structure of the piece both as marked in the score
samer@36 709 and as analysed by musicologist Keith Potter, who was asked to mark the six
samer@36 710 `most surprising moments' of the piece (shown as asterisks in the fifth plot).%
samer@36 711 \footnote{%
samer@36 712 Note that the boundary marked in the score at around note 5,400 is known to be
samer@61 713 anomalous; on the basis of a listening analysis, some musicologists have
samer@61 714 placed the boundary a few bars later, in agreement with our analysis
samer@61 715 \cite{PotterEtAl2007}.}
samer@36 716
samer@36 717 In contrast, the analyses shown in the lower two plots of \figrf{twopages},
samer@36 718 obtained using two rule-based music segmentation algorithms, while clearly
samer@37 719 \emph{reflecting} the structure of the piece, do not \emph{segment} the piece,
samer@37 720 with no tendency to peaking of the boundary strength function at
samer@36 721 the boundaries in the piece.
samer@36 722
samer@46 723 The complete analysis of \emph{Gradus} can be found in \cite{AbdallahPlumbley2009},
samer@46 724 but \figrf{metre} illustrates the result of a metrical analysis: the piece was divided
samer@46 725 into bars of 32, 64 and 128 notes. In each case, the average surprisingness and
samer@46 726 IPI for the first, second, third \etc notes in each bar were computed. The plots
samer@46 727 show that the first note of each bar is, on average, significantly more surprising
samer@46 728 and informative than the others, up to the 64-note level, whereas at the 128-note
samer@46 729 level, the dominant periodicity appears to remain at 64 notes.
samer@36 730
samer@24 731 \begin{fig}{metre}
samer@33 732 % \scalebox{1}[1]{%
samer@24 733 \begin{tabular}{cc}
samer@33 734 \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
samer@33 735 \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
samer@33 736 \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
samer@24 737 % \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
samer@24 738 % \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
samer@24 739 % \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
samer@24 740 % \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
samer@24 741 % \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
samer@24 742 % \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}
samer@24 743
samer@24 744 \end{tabular}%
samer@33 745 % }
samer@24 746 \caption{Metrical analysis by computing average surprisingness and
samer@24 747 informativeness of notes at different periodicities (\ie hypothetical
samer@24 748 bar lengths) and phases (\ie positions within a bar).
samer@24 749 }
samer@24 750 \end{fig}
samer@24 751
samer@64 752 \subsection{Real-valued signals and audio analysis}
samer@64 753 Using analogous definitions based on the differential entropy
samer@64 754 \cite{CoverThomas}, the methods outlined
samer@64 755 in \secrf{surprise-info-seq} and \secrf{process-info}
samer@64 756 are equally applicable to random variables taking values in a continuous domain.
samer@42 757 In the case of music, where expressive properties such as dynamics, tempo,
samer@42 758 timing and timbre are readily quantified on a continuous scale, the information
samer@64 759 dynamic framework may thus be applied.
peterf@39 760
samer@64 761 Dubnov \cite{Dubnov2006} considers the class of stationary Gaussian
samer@42 762 processes. For such processes, the entropy rate may be obtained analytically
samer@64 763 from the power spectral density of the signal. Dubnov found that the
samer@64 764 multi-information rate (which he refers to as `information rate') can be
samer@64 765 expressed as a function of the spectral flatness measure. For a given variance,
samer@64 766 Gaussian processes with maximal multi-information rate are those with maximally
samer@64 767 non-flat spectra. These essentially consist of a single
samer@64 768 sinusoidal component and hence are completely predictable and periodic once
samer@64 769 the parameters of the sinusoid have been inferred.
samer@61 770 % Local stationarity is assumed, which may be achieved by windowing or
samer@61 771 % change point detection \cite{Dubnov2008}.
samer@61 772 %TODO
peterf@39 773
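The link to spectral flatness can be sketched from two standard results for a
discrete-time stationary Gaussian process with power spectral density
$S(\omega)$; the notation below is an assumption of this sketch and need not
match that of \cite{Dubnov2006}. The marginal entropy is
$H(X_t) = \frac{1}{2}\log(2\pi e\,\sigma^2)$ with
$\sigma^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\,d\omega$, and the entropy
rate is
$h_\mu = \frac{1}{2}\log(2\pi e) + \frac{1}{4\pi}\int_{-\pi}^{\pi}\log S(\omega)\,d\omega$.
Subtracting as in \eqref{eq:multi-info} gives
\begin{align*}
\rho_\mu &= \frac{1}{2}\log
\frac{\frac{1}{2\pi}\int_{-\pi}^{\pi} S(\omega)\,d\omega}
     {\exp\bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S(\omega)\,d\omega\bigr)} \\
&= -\frac{1}{2}\log \mathrm{SFM},
\end{align*}
where $\mathrm{SFM}\in(0,1]$ is the ratio of the geometric to the arithmetic
mean of the spectrum. A flat spectrum has $\mathrm{SFM}=1$ and hence
$\rho_\mu = 0$. For a first order autoregressive process
$x_t = \phi x_{t-1} + \epsilon_t$ with $|\phi|<1$, the same formula yields
$\rho_\mu = -\frac{1}{2}\log(1-\phi^2)$, which grows without bound as
$|\phi|\to 1$.
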
samer@64 774 We are currently working towards methods for the computation of predictive information
samer@64 775 rate in some restricted classes of Gaussian processes including finite-order
samer@64 776 autoregressive models and processes with power-law spectra (fractional Brownian
samer@64 777 motions).
samer@64 778
samer@64 779 % mention non-gaussian processes extension Similarly, the predictive information
samer@64 780 % rate may be computed using a Gaussian linear formulation CITE. In this view,
samer@64 781 % the PIR is a function of the correlation between random innovations supplied
samer@64 782 % to the stochastic process. %Dubnov, MacAdams, Reynolds (2006) %Bailes and Dean (2009)
samer@64 783
peterf@26 784
samer@4 785
samer@4 786 \subsection{Beat Tracking}
samer@4 787
samer@43 788 A probabilistic method for drum tracking was presented by Robertson
samer@43 789 \cite{Robertson11c}. The algorithm is used to synchronise a music
samer@43 790 sequencer to a live drummer. The expected beat time of the sequencer is
samer@43 791 represented by a click track, and the algorithm takes as input event
samer@43 792 times for discrete kick and snare drum events relative to this click
samer@43 793 track. These are obtained using dedicated microphones for each drum and
samer@43 794 using a percussive onset detector (Puckette 1998). The drum tracker
samer@43 795 continually updates distributions for tempo and phase on receiving a new
samer@43 796 event time. We can thus quantify the information contributed by an event
samer@43 797 by measuring the difference between the system's prior distribution and
samer@43 798 the posterior distribution using the Kullback-Leibler divergence.
samer@43 799
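As a hedged sketch of this measure, write $\theta$ for the pair (tempo, phase),
$p(\theta)$ for the tracker's distribution before an event and $p(\theta|x)$
for its distribution after observing the event time $x$; these symbols, and the
choice to take the divergence from prior to posterior as is usual for
information gain, are assumptions made here rather than details given in
\cite{Robertson11c}. The information attributed to the event is then
\begin{equation*}
D\bigl(p(\cdot|x)\,\|\,p(\cdot)\bigr)
= \int p(\theta|x)\,\log\frac{p(\theta|x)}{p(\theta)}\,d\theta .
\end{equation*}
For intuition only, if both distributions over a single parameter were
Gaussian, with prior $\mathcal{N}(m_0,s_0^2)$ and posterior
$\mathcal{N}(m_1,s_1^2)$, this would reduce to
\begin{equation*}
\log\frac{s_0}{s_1} + \frac{s_1^2 + (m_1-m_0)^2}{2 s_0^2} - \frac{1}{2},
\end{equation*}
which increases both when the event shifts the estimate and when it sharpens a
previously broad distribution.
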
samer@43 800 Here, we have calculated the KL divergence and entropy for kick and
samer@43 801 snare events in sixteen files. The analysis of information rates can be
samer@43 802 considered \emph{subjective}, in that it measures how the drum tracker's
samer@43 803 probability distributions change, and these are contingent upon the
samer@43 804 model used as well as external properties in the signal. We expect,
samer@43 805 however, that following periods of increased uncertainty, such as fills
samer@43 806 or expressive timing, the information contained in an individual event
samer@43 807 increases. We also examine whether the information is dependent upon
samer@43 808 metrical position.
samer@43 809
samer@61 810 % !!! FIXME
samer@4 811
samer@24 812 \section{Information dynamics as compositional aid}
samer@43 813 \label{s:composition}
samer@43 814
samer@53 815 The use of stochastic processes in music composition has been widespread for
samer@53 816 decades---for instance Iannis Xenakis applied probabilistic mathematical models
samer@53 817 to the creation of musical materials~\cite{Xenakis:1992ul}. While such processes
samer@53 818 can drive the \emph{generative} phase of the creative process, information dynamics
samer@53 819 can serve as a novel framework for a \emph{selective} phase, by
samer@53 820 providing a set of criteria to be used in judging which of the
samer@53 821 generated materials
samer@53 822 are of value. This alternation of generative and selective phases has been
samer@53 823 noted by art theorist Margaret Boden \cite{Boden1990}.
samer@53 824
samer@53 825 Information-dynamic criteria can also be used as \emph{constraints} on the
samer@53 826 generative processes, for example, by specifying a certain temporal profile
samer@53 827 of surprisingness and uncertainty the composer wishes to induce in the listener
samer@53 828 as the piece unfolds.
samer@53 829 %stochastic and algorithmic processes: ; outputs can be filtered to match a set of
samer@53 830 %criteria defined in terms of information-dynamical characteristics, such as
samer@53 831 %predictability vs unpredictability
samer@53 832 %s model, this criteria thus becoming a means of interfacing with the generative processes.
samer@53 833
samer@62 834 %The tools of information dynamics provide a way to constrain and select musical
samer@62 835 %materials at the level of patterns of expectation, implication, uncertainty, and predictability.
samer@53 836 In particular, the behaviour of the predictive information rate (PIR) defined in
samer@53 837 \secrf{process-info} makes it interesting from a compositional point of view. The definition
samer@53 838 of the PIR is such that it is low both for extremely regular processes, such as constant
samer@53 839 or periodic sequences, \emph{and} for extremely random processes, where each symbol
samer@53 840 is chosen independently of the others, in a kind of `white noise'. In the former case,
samer@53 841 the pattern, once established, is completely predictable and therefore there is no
samer@53 842 \emph{new} information in subsequent observations. In the latter case, the randomness
samer@53 843 and independence of all elements of the sequence means that, though potentially surprising,
samer@53 844 each observation carries no information about the ones to come.
samer@53 845
samer@53 846 Processes with high PIR maintain a certain kind of balance between
samer@53 847 predictability and unpredictability in such a way that the observer must continually
samer@53 848 pay attention to each new observation as it occurs in order to make the best
samer@53 849 possible predictions about the evolution of the sequence. This balance between predictability
samer@53 850 and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}),
samer@53 851 which summarises the observations of Wundt that the greatest aesthetic value in art
samer@53 852 is to be found at intermediate levels of disorder, where there is a balance between
samer@53 853 `order' and `chaos'.
samer@53 854
samer@53 855 Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009}
samer@53 856 a similar shape when plotting entropy rate against PIR---this is visible in the
samer@53 857 upper envelope of the scatter plot in \figrf{mtriscat}, which is a 3-D scatter plot of
samer@53 858 three of the information measures discussed in \secrf{process-info} for several thousand
samer@53 859 first-order Markov chain transition matrices generated by a random sampling method.
samer@53 860 The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and
samer@62 861 predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond
samer@62 862 to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences
samer@53 863 with no temporal structure. Processes with high PIR are to be found at intermediate
samer@53 864 levels of entropy and redundancy.
samer@53 865 These observations led us to construct the `Melody Triangle' as a graphical interface
samer@53 866 for exploring the melodic patterns generated by each of the Markov chains represented
samer@53 867 as points in \figrf{mtriscat}.
samer@53 868
samer@43 869 \begin{fig}{wundt}
samer@43 870 \raisebox{-4em}{\colfig[0.43]{wundt}}
samer@43 871 % {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
samer@43 872 {\ {\large$\longrightarrow$}\ }
samer@43 873 \raisebox{-4em}{\colfig[0.43]{wundt2}}
samer@43 874 \caption{
samer@43 875 The Wundt curve relating randomness/complexity with
samer@43 876 perceived value. Repeated exposure sometimes results
samer@43 877 in a move to the left along the curve \cite{Berlyne71}.
samer@43 878 }
samer@43 879 \end{fig}
hekeus@45 880
hekeus@13 881
hekeus@45 882 %It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials.
hekeus@45 883
hekeus@45 884 %For instance a stochastic music generating process could be controlled by modifying
hekeus@45 885 %constraints on its output in terms of predictive information rate or entropy
hekeus@45 886 %rate.
hekeus@45 887
hekeus@45 888
hekeus@13 889
samer@23 890 \subsection{The Melody Triangle}
samer@23 891
samer@53 892 The Melody Triangle is an exploratory interface for the discovery of melodic
samer@53 893 content, where the input---a position within the triangle---maps directly to information
samer@62 894 theoretic properties of the output.
samer@62 895 %The measures---entropy rate, redundancy and
samer@62 896 %predictive information rate---form a criteria with which to filter the output
samer@62 897 %of the stochastic processes used to generate sequences of notes.
samer@62 898 These measures
samer@53 899 address notions of expectation and surprise in music, and as such the Melody
samer@53 900 Triangle is a means of interfacing with a generative process in terms of the
samer@53 901 predictability of its output.
samer@53 902
samer@53 903
samer@51 904 \begin{fig}{mtriscat}
samer@51 905 \colfig{mtriscat}
samer@34 906 \caption{The population of transition matrices distributed along three axes of
samer@34 907 redundancy, entropy rate and predictive information rate (all measured in bits).
samer@34 908 The concentrations of points along the redundancy axis correspond
samer@34 909 to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
samer@34 910 3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
samer@34 911 represents its PIR---note that the highest values are found at intermediate entropy
samer@34 912 and redundancy, and that the distribution as a whole makes a curved triangle. Although
samer@51 913 not visible in this plot, it is largely hollow in the middle.}
samer@51 914 \end{fig}
samer@23 915
samer@62 916 The triangle is populated with first order Markov chain transition
samer@62 917 matrices as illustrated in \figrf{mtriscat}.
samer@43 918 The distribution of transition matrices plotted in this space forms an arch shape
samer@62 919 that is fairly thin. Thus, it is a reasonable simplification to project out the
samer@62 920 third dimension (the PIR) and present an interface that is just two dimensional.
samer@64 921 The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with
samer@64 922 the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand
samer@64 923 side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}.
samer@62 924 This is our `Melody Triangle' and
samer@62 925 forms the interface by which the system is controlled.
samer@62 926 %Using this interface thus involves a mapping to information space;
samer@62 927 The user selects a position within the triangle, the point is mapped into the
samer@62 928 information space, and a corresponding transition matrix is returned. The third dimension,
samer@62 929 though not visible, is implicitly there, as transition matrices retrieved from
samer@62 930 along the centre line of the triangle will tend to have higher PIR.
samer@41 931
samer@42 932 The three corners correspond to three different extremes of predictability and
samer@42 933 unpredictability, which could be loosely characterised as `periodicity', `noise'
samer@62 934 and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$
samer@62 935 and $b_\mu$) have no discernible pattern;
samer@62 936 melodies along the `periodicity'
samer@42 937 to `repetition' edge are all deterministic loops that get shorter as we approach
samer@62 938 the `repetition' corner, until each is just one repeating note. The
samer@62 939 areas in between will tend to have higher PIR, and we hypothesise that, under
samer@62 940 the appropriate conditions, these will be perceived as more `interesting' or
samer@62 941 `melodic.'
samer@62 942 %These melodies have some level of unpredictability, but are not completely random.
samer@62 943 % Or, conversely, are predictable, but not entirely so.
samer@41 944
samer@51 945 \begin{fig}{TheTriangle}
samer@51 946 \colfig[0.9]{TheTriangle.pdf}
samer@51 947 \caption{The Melody Triangle}
samer@51 948 \end{fig}
samer@41 949
hekeus@45 950 %PERHAPS WE SHOULD FOREGO TALKING ABOUT THE
hekeus@45 951 %INSTALLATION VERSION OF THE TRIANGLE?
hekeus@45 952 %feels a bit like a tangent, and could do with the space..
samer@42 953 The Melody Triangle exists in two incarnations: a standard screen based interface
samer@42 954 where a user moves tokens in and around a triangle on screen, and a multi-user
samer@42 955 interactive installation where a Kinect camera tracks individuals in a space and
hekeus@45 956 maps their positions in physical space to the triangle. In the latter each visitor
hekeus@45 957 that enters the installation generates a melody and can collaborate with their
samer@62 958 co-visitors to generate musical textures. This makes the interaction physically engaging
samer@62 959 and (as our experience with visitors both young and old has demonstrated) more playful.
samer@62 960 %Additionally visitors can change the
samer@62 961 %tempo, register, instrumentation and periodicity of their melody with body gestures.
samer@41 962
hekeus@45 963 As a screen based interface the Melody Triangle can serve as a composition tool.
samer@62 964 %%A triangle is drawn on the screen, screen space thus mapped to the statistical
samer@62 965 %space of the Melody Triangle.
samer@62 966 A number of tokens, each representing a
hekeus@45 967 melody, can be dragged in and around the triangle. For each token, a sequence of symbols with
hekeus@45 968 statistical properties that correspond to the token's position is generated. These
samer@62 969 symbols are then mapped to notes of a scale or percussive sounds.
samer@62 970 However, they could easily be mapped to other musical processes, possibly over
samer@62 971 different time scales, such as chords, dynamics and timbres. It would also be possible
samer@62 972 to map the symbols to visual or kinetic outputs.
samer@62 973 %The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}.
samer@62 974 Additionally, keyboard commands give control over other musical parameters such
samer@62 975 as pitch register and note duration.
samer@23 976
samer@51 977 The Melody Triangle can generate intricate musical textures when multiple tokens
samer@51 978 are in the triangle. Unlike other computer aided composition tools or programming
samer@51 979 environments, here the composer engages with music on a high and abstract level;
samer@51 980 the interface relating to subjective expectation and predictability.
hekeus@45 981
hekeus@35 982
hekeus@35 983
hekeus@38 984
hekeus@38 985 \subsection{Information Dynamics as Evaluative Feedback Mechanism}
hekeus@38 986 %NOT SURE THIS SHOULD BE HERE AT ALL..?
hekeus@38 987
samer@46 988 \begin{fig}{mtri-results}
samer@46 989 \def\scat#1{\colfig[0.42]{mtri/#1}}
samer@46 990 \def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
samer@46 991 \begin{tabular}{cc}
samer@64 992 % \subj{a} \\
samer@46 993 \subj{b} \\
samer@64 994 \subj{c}
samer@64 995 % \subj{d}
samer@46 996 \end{tabular}
samer@46 997 \caption{Dwell times and mark positions from user trials with the
samer@64 998 on-screen Melody Triangle interface, for two subjects. The left-hand column shows
samer@46 999 the positions in a 2D information space (entropy rate vs multi-information rate
samer@64 1000 in bits) where each spent their time; the area of each circle is proportional
samer@46 1001 to the time spent there. The right-hand column shows the points which subjects
samer@64 1002 `liked'; the area of the circles here is proportional to the duration spent at
samer@64 1003 that point before the point was marked.}
samer@46 1004 \end{fig}
hekeus@38 1005
samer@42 1006 Information measures on a stream of symbols can form a feedback mechanism: a
hekeus@45 1007 rudimentary `critic' of sorts. For instance, symbol-by-symbol measures of predictive
samer@42 1008 information rate, entropy rate and redundancy could tell us if a stream of symbols
samer@42 1009 is currently `boring', either because it is too repetitive, or because it is too
hekeus@45 1010 chaotic. Such feedback would be oblivious to long term and large scale
hekeus@45 1011 structures and any cultural norms (such as style conventions), but
hekeus@45 1012 nonetheless could provide a composer with valuable insight on
samer@42 1013 the short term properties of a work. This could not only be used for the
samer@42 1014 evaluation of pre-composed streams of symbols, but could also provide real-time
samer@42 1015 feedback in an improvisatory setup.
hekeus@38 1016
hekeus@13 1017 \section{Musical Preference and Information Dynamics}
samer@42 1018 We are carrying out a study to investigate the relationship between musical
samer@42 1019 preference and the information dynamics models, using as the experimental interface a
samer@42 1020 simplified version of the screen-based Melody Triangle. Participants are asked
samer@42 1021 to use this music pattern generator under various experimental conditions in a
samer@42 1022 composition task. The data collected includes usage statistics of the system:
samer@42 1023 where in the triangle they place the tokens, how long they leave them there and
samer@42 1024 the state of the system when users, by pressing a key, indicate that they like
samer@42 1025 what they are hearing. As such the experiments will help us identify any
samer@42 1026 correlation between the information theoretic properties of a stream and its
samer@42 1027 perceived aesthetic worth.
hekeus@16 1028
samer@46 1029 Some initial results for two subjects are shown in \figrf{mtri-results}. Though
samer@46 1030 subjects seem to exhibit distinct kinds of exploratory behaviour, we have
samer@46 1031 not been able to show any systematic across-subjects preference for any particular
samer@46 1032 region of the triangle.
samer@46 1033
samer@46 1034 In their comments, several subjects (a,c,f) noticed the main organisation of the triangle:
samer@46 1035 repetitive notes at the top, cyclic patterns along the right edge, and unpredictable
samer@46 1036 notes towards the bottom left. Some explored the triangle systematically.
samer@46 1037 Two (a,f) felt that the right side was more `controllable' than the left---a direct consequence
samer@46 1038 of their ability to return to a particular periodic pattern and recognise it
samer@46 1039 as one heard previously. Some (a,e) felt the trial was too long and became
samer@46 1040 bored towards the end.
samer@46 1041 One subject (f) felt there wasn't enough time to hear out the patterns properly.
samer@46 1042 One subject (b) didn't enjoy the lower region whereas another (d) said the lower
samer@46 1043 regions were more `melodic' and `interesting'.
samer@4 1044
hekeus@38 1045 %\emph{comparable system} Gordon Pask's Musicolor (1953) applied a similar notion
hekeus@38 1046 %of boredom in its design. The Musicolour would react to audio input through a
hekeus@38 1047 %microphone by flashing coloured lights. Rather than a direct mapping of sound
hekeus@38 1048 %to light, Pask designed the device to be a partner to a performing musician. It
hekeus@38 1049 %would adapt its lighting pattern based on the rhythms and frequencies it would
hekeus@38 1050 %hear, quickly `learning' to flash in time with the music. However Pask endowed
hekeus@38 1051 %the device with the ability to `be bored'; if the rhythmic and frequency content
hekeus@38 1052 %of the input remained the same for too long it would listen for other rhythms
hekeus@38 1053 %and frequencies, only lighting when it heard these. As the Musicolour would
hekeus@38 1054 %`get bored', the musician would have to change and vary their playing, eliciting
hekeus@38 1055 %new and unexpected outputs in trying to keep the Musicolour interested.
samer@4 1056
hekeus@13 1057
samer@4 1058 \section{Conclusion}
samer@61 1059
samer@61 1060 % !!! FIXME
samer@51 1061 We outlined our information dynamics approach to the modelling of the perception
samer@51 1062 of music. This approach models the subjective assessments of an observer that
samer@51 1063 updates its probabilistic model of a process dynamically as events unfold. We
samer@51 1064 outlined `time-varying' information measures, including a novel `predictive
samer@51 1065 information rate' that characterises the surprisingness and predictability of
samer@51 1066 musical patterns.
samer@4 1067
hekeus@45 1068
samer@51 1069 We have outlined how information dynamics can serve in three different forms of
samer@51 1070 analysis: musicological analysis, sound categorisation and beat tracking.
hekeus@50 1071
samer@51 1072 We have described the `Melody Triangle', a novel system that enables a user/composer
samer@51 1073 to discover musical content in terms of the information theoretic properties of
samer@51 1074 the output, and considered how information dynamics could be used to provide
samer@51 1075 evaluative feedback on a composition or improvisation. Finally, we outlined a
samer@51 1076 pilot study that used the Melody Triangle as an experimental interface to help
samer@51 1077 determine if there are any correlations between aesthetic preference and information
samer@51 1078 dynamics measures.
hekeus@50 1079
hekeus@45 1080
samer@59 1081 \section*{Acknowledgments}
samer@51 1082 This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
hekeus@54 1083 GR/S82213/01 and EP/E045235/1 (SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship EP/G007144/1
samer@51 1084 (MDP) and EPSRC IDyOM2 EP/H013059/1.
hekeus@55 1085 This work is also partly funded by the CoSound project, which is supported by the Danish Agency for Science, Technology and Innovation.
samer@61 1086 Thanks also to Marcus Pearce for providing the two rule-based analyses of \emph{Two Pages}.
hekeus@55 1087
hekeus@44 1088
samer@59 1089 \bibliographystyle{IEEEtran}
samer@43 1090 {\bibliography{all,c4dm,nime,andrew}}
samer@4 1091 \end{document}