samer@18 1 \documentclass[conference,a4paper]{IEEEtran}
samer@4 2 \usepackage{cite}
samer@4 3 \usepackage[cmex10]{amsmath}
samer@4 4 \usepackage{graphicx}
samer@4 5 \usepackage{amssymb}
samer@4 6 \usepackage{epstopdf}
samer@4 7 \usepackage{url}
samer@4 8 \usepackage{listings}
samer@18 9 %\usepackage[expectangle]{tools}
samer@9 10 \usepackage{tools}
samer@18 11 \usepackage{tikz}
samer@18 12 \usetikzlibrary{calc}
samer@18 13 \usetikzlibrary{matrix}
samer@18 14 \usetikzlibrary{patterns}
samer@18 15 \usetikzlibrary{arrows}
samer@9 16
samer@9 17 \let\citep=\cite
samer@33 18 \newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
samer@18 19 \newcommand\preals{\reals_+}
samer@18 20 \newcommand\X{\mathcal{X}}
samer@18 21 \newcommand\Y{\mathcal{Y}}
samer@18 22 \newcommand\domS{\mathcal{S}}
samer@18 23 \newcommand\A{\mathcal{A}}
samer@25 24 \newcommand\Data{\mathcal{D}}
samer@18 25 \newcommand\rvm[1]{\mathrm{#1}}
samer@18 26 \newcommand\sps{\,.\,}
samer@18 27 \newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}}
samer@18 28 \newcommand\Ix{\mathcal{I}}
samer@18 29 \newcommand\IXZ{\overline{\underline{\mathcal{I}}}}
samer@18 30 \newcommand\x{\vec{x}}
samer@18 31 \newcommand\Ham[1]{\mathcal{H}_{#1}}
samer@18 32 \newcommand\subsets[2]{[#1]^{(k)}}
samer@18 33 \def\bet(#1,#2){#1..#2}
samer@18 34
samer@18 35
samer@18 36 \def\ev(#1=#2){#1\!\!=\!#2}
samer@18 37 \newcommand\rv[1]{\Omega \to #1}
samer@18 38 \newcommand\ceq{\!\!=\!}
samer@18 39 \newcommand\cmin{\!-\!}
samer@18 40 \newcommand\modulo[2]{#1\!\!\!\!\!\mod#2}
samer@18 41
samer@18 42 \newcommand\sumitoN{\sum_{i=1}^N}
samer@18 43 \newcommand\sumktoK{\sum_{k=1}^K}
samer@18 44 \newcommand\sumjtoK{\sum_{j=1}^K}
samer@18 45 \newcommand\sumalpha{\sum_{\alpha\in\A}}
samer@18 46 \newcommand\prodktoK{\prod_{k=1}^K}
samer@18 47 \newcommand\prodjtoK{\prod_{j=1}^K}
samer@18 48
samer@18 49 \newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}}
samer@18 50 \newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}}
samer@18 51 \newcommand\parity[2]{P^{#1}_{2,#2}}
samer@4 52
samer@4 53 %\usepackage[parfill]{parskip}
samer@4 54
samer@4 55 \begin{document}
samer@4 56 \title{Cognitive Music Modelling: an Information Dynamics Approach}
samer@4 57
samer@4 58 \author{
hekeus@16 59 \IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster}
hekeus@16 60 \IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley}
samer@4 61 \IEEEauthorblockA{Centre for Digital Music\\
samer@4 62 Queen Mary University of London\\
hekeus@16 63 Mile End Road, London E1 4NS\\
hekeus@16 64 Email:}}
samer@4 65
samer@4 66 \maketitle
samer@18 67 \begin{abstract}
People take in information when perceiving music, and with it they continually
build predictive models of what is going to happen next. The degree to which these
expectations are fulfilled or confounded can be quantified with information-theoretic
measures, making an information-theoretic approach to music cognition a fruitful
avenue of research. In this paper, we review the theoretical foundations of
information dynamics and discuss a few emerging areas of application.
hekeus@16 74 \end{abstract}
samer@4 75
samer@4 76
samer@25 77 \section{Introduction}
samer@9 78 \label{s:Intro}
samer@9 79
samer@25 80 \subsection{Expectation and surprise in music}
samer@18 81 One of the effects of listening to music is to create
samer@18 82 expectations of what is to come next, which may be fulfilled
samer@9 83 immediately, after some delay, or not at all as the case may be.
samer@9 84 This is the thesis put forward by, amongst others, music theorists
L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but it was
recognised much earlier; for example,
samer@9 87 it was elegantly put by Hanslick \cite{Hanslick1854} in the
samer@9 88 nineteenth century:
samer@9 89 \begin{quote}
samer@9 90 `The most important factor in the mental process which accompanies the
samer@9 91 act of listening to music, and which converts it to a source of pleasure,
samer@18 92 is \ldots the intellectual satisfaction
samer@9 93 which the listener derives from continually following and anticipating
samer@9 94 the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.'
samer@18 96 %It is a matter of course that
samer@18 97 %this intellectual flux and reflux, this perpetual giving and receiving
samer@18 98 %takes place unconsciously, and with the rapidity of lightning-flashes.'
samer@9 99 \end{quote}
samer@9 100 An essential aspect of this is that music is experienced as a phenomenon
samer@9 101 that `unfolds' in time, rather than being apprehended as a static object
samer@9 102 presented in its entirety. Meyer argued that musical experience depends
samer@9 103 on how we change and revise our conceptions \emph{as events happen}, on
samer@9 104 how expectation and prediction interact with occurrence, and that, to a
samer@9 105 large degree, the way to understand the effect of music is to focus on
samer@9 106 this `kinetics' of expectation and surprise.
samer@9 107
samer@25 108 Prediction and expectation are essentially probabilistic concepts
samer@25 109 and can be treated mathematically using probability theory.
samer@25 110 We suppose that when we listen to music, expectations are created on the basis
samer@25 111 of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
samer@25 114 statistical knowledge about musical structure, \eg
samer@25 115 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
samer@25 116 that statistical models can form an effective basis for computational
samer@25 117 analysis of music, \eg
samer@25 118 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
samer@25 119
samer@25 120
samer@25 121 \comment{
samer@9 122 The business of making predictions and assessing surprise is essentially
samer@9 123 one of reasoning under conditions of uncertainty and manipulating
samer@9 124 degrees of belief about the various proposition which may or may not
samer@9 125 hold, and, as has been argued elsewhere \cite{Cox1946,Jaynes27}, best
samer@9 126 quantified in terms of Bayesian probability theory.
samer@9 127 Thus, we suppose that
samer@9 128 when we listen to music, expectations are created on the basis of our
samer@24 129 familiarity with various stylistic norms that apply to music in general,
samer@24 130 the particular style (or styles) of music that seem best to fit the piece
samer@24 131 we are listening to, and
samer@9 132 the emerging structures peculiar to the current piece. There is
samer@9 133 experimental evidence that human listeners are able to internalise
samer@9 134 statistical knowledge about musical structure, \eg
samer@9 135 \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
samer@9 136 that statistical models can form an effective basis for computational
samer@9 137 analysis of music, \eg
samer@9 138 \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.
samer@25 139 }
samer@9 140
samer@9 141 \subsection{Music and information theory}
samer@24 142 With a probabilistic framework for music modelling and prediction in hand,
samer@25 143 we are in a position to apply Shannon's quantitative information theory
samer@25 144 \cite{Shannon48}.
samer@25 145 \comment{
samer@25 146 which provides us with a number of measures, such as entropy
samer@25 147 and mutual information, which are suitable for quantifying states of
samer@25 148 uncertainty and surprise, and thus could potentially enable us to build
samer@25 149 quantitative models of the listening process described above. They are
samer@25 150 what Berlyne \cite{Berlyne71} called `collative variables' since they are
samer@25 151 to do with patterns of occurrence rather than medium-specific details.
samer@25 152 Berlyne sought to show that the collative variables are closely related to
samer@25 153 perceptual qualities like complexity, tension, interestingness,
samer@25 154 and even aesthetic value, not just in music, but in other temporal
samer@25 155 or visual media.
samer@25 156 The relevance of information theory to music and art has
samer@25 157 also been addressed by researchers from the 1950s onwards
samer@25 158 \cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
samer@25 159 }
samer@9 160 The relationship between information theory and music and art in general has been the
samer@9 161 subject of some interest since the 1950s
samer@9 162 \cite{Youngblood58,CoonsKraehenbuehl1958,HillerBean66,Moles66,Meyer67,Cohen1962}.
samer@9 163 The general thesis is that perceptible qualities and subjective
samer@9 164 states like uncertainty, surprise, complexity, tension, and interestingness
samer@9 165 are closely related to
samer@9 166 information-theoretic quantities like entropy, relative entropy,
samer@9 167 and mutual information.
samer@9 168 % and are major determinants of the overall experience.
samer@9 169 Berlyne \cite{Berlyne71} called such quantities `collative variables', since
samer@9 170 they are to do with patterns of occurrence rather than medium-specific details,
samer@9 171 and developed the ideas of `information aesthetics' in an experimental setting.
samer@9 172 % Berlyne's `new experimental aesthetics', the `information-aestheticians'.
samer@9 173
samer@9 174 % Listeners then experience greater or lesser levels of surprise
samer@9 175 % in response to departures from these norms.
samer@9 176 % By careful manipulation
samer@9 177 % of the material, the composer can thus define, and induce within the
samer@9 178 % listener, a temporal programme of varying
samer@9 179 % levels of uncertainty, ambiguity and surprise.
samer@9 180
samer@9 181
samer@9 182 \subsection{Information dynamic approach}
samer@9 183
samer@24 184 Bringing the various strands together, our working hypothesis is that as a
listener (to which we will refer as `it') listens to a piece of music, it maintains
samer@25 186 a dynamically evolving probabilistic model that enables it to make predictions
samer@24 187 about how the piece will continue, relying on both its previous experience
samer@24 188 of music and the immediate context of the piece. As events unfold, it revises
samer@25 189 its probabilistic belief state, which includes predictive
samer@25 190 distributions over possible future events. These
samer@25 191 % distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy. By tracing the
evolution of these measures, we obtain a representation which captures much
samer@25 195 of the significant structure of the music.
samer@25 196
samer@25 197 One of the consequences of this approach is that regardless of the details of
samer@25 198 the sensory input or even which sensory modality is being processed, the resulting
samer@25 199 analysis is in terms of the same units: quantities of information (bits) and
samer@25 200 rates of information flow (bits per second). The probabilistic and information
rates of information flow (bits per second). The probabilistic and
information-theoretic concepts in terms of which the analysis is framed are universal to all sorts
of data.
In addition, when adaptive probabilistic models are used, expectations are
created mainly in response to \emph{patterns} of occurrence,
rather than the details of which specific things occur.
samer@25 207 high level of \emph{abstraction}, and could be used to
samer@25 208 make structural comparisons between different temporal media,
samer@25 209 such as music, film, animation, and dance.
samer@25 210 % analyse and compare information
samer@25 211 % flow in different temporal media regardless of whether they are auditory,
samer@25 212 % visual or otherwise.
samer@9 213
samer@25 214 Another consequence is that the information dynamic approach gives us a principled way
samer@24 215 to address the notion of \emph{subjectivity}, since the analysis is dependent on the
samer@24 216 probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
samer@24 218 variation in subjects' responses over time are
samer@24 219 fundamental to the theory.
samer@9 220
samer@18 221 %modelling the creative process, which often alternates between generative
samer@18 222 %and selective or evaluative phases \cite{Boden1990}, and would have
samer@18 223 %applications in tools for computer aided composition.
samer@18 224
samer@18 225
samer@18 226 \section{Theoretical review}
samer@18 227
samer@34 228 \subsection{Entropy and information}
samer@34 229 Let $X$ denote some variable whose value is initially unknown to our
samer@34 230 hypothetical observer. We will treat $X$ mathematically as a random variable,
samer@34 231 with a value to be drawn from some set $\A$ and a
samer@34 232 probability distribution representing the observer's beliefs about the
samer@34 233 true value of $X$.
In this case, the observer's uncertainty about $X$ can be quantified
as the entropy $H(X)$ of the random variable. For a discrete variable
with probability mass function $p:\A \to [0,1]$, this is
samer@34 237 \begin{equation}
samer@34 238 H(X) = \sum_{x\in\A} -p(x) \log p(x) = \expect{-\log p(X)},
samer@34 239 \end{equation}
samer@34 240 where $\expect{}$ is the expectation operator. The negative-log-probability
samer@34 241 $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as
samer@34 242 the \emph{surprisingness} of the value $x$ should it be observed, and
samer@34 243 hence the entropy is the expected surprisingness.
samer@34 244
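As a minimal numerical illustration of these definitions (not part of the analysis
itself), the following Python sketch computes the surprisingness of each outcome and
the entropy of a discrete distribution, using base-2 logarithms so that the results
are in bits:
\begin{lstlisting}[language=Python]
import numpy as np

def surprisal(p):
    """Surprisingness -log2 p(x) of each outcome, in bits."""
    return -np.log2(np.asarray(p, dtype=float))

def entropy(p):
    """Shannon entropy H(X) in bits: the expected surprisingness."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # convention: 0 log 0 = 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

p = [0.5, 0.25, 0.125, 0.125]       # a toy four-symbol distribution
print(surprisal(p))                 # [1. 2. 3. 3.]
print(entropy(p))                   # 1.75
\end{lstlisting}
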
samer@34 245 Now suppose that the observer receives some new data $\Data$ that
samer@34 246 causes a revision of its beliefs about $X$. The \emph{information}
samer@34 247 in this new data \emph{about} $X$ can be quantified as the
samer@34 248 Kullback-Leibler (KL) divergence between the prior and posterior
samer@34 249 distributions $p(x)$ and $p(x|\Data)$ respectively:
samer@34 250 \begin{equation}
samer@34 251 \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X})
samer@34 252 = \sum_{x\in\A} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}.
samer@34 253 \end{equation}
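For example, the information gained about a four-valued $X$ when a prior is
revised to a posterior can be computed as in the sketch below; the particular
prior and posterior are invented purely for illustration, and logarithms are
base 2:
\begin{lstlisting}[language=Python]
import numpy as np

def kl_divergence(post, prior):
    """D(post || prior) in bits, over a common finite alphabet."""
    post, prior = np.asarray(post, float), np.asarray(prior, float)
    nz = post > 0
    return float(np.sum(post[nz] * np.log2(post[nz] / prior[nz])))

prior = np.array([0.25, 0.25, 0.25, 0.25])   # beliefs about X before the data
post  = np.array([0.70, 0.20, 0.05, 0.05])   # beliefs about X after the data
print(kl_divergence(post, prior))            # information in the data, approx. 0.74 bits
\end{lstlisting}
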
samer@34 254 When there are multiple variables $X_1, X_2$
samer@34 255 \etc which the observer believes to be dependent, then the observation of
samer@34 256 one may change its beliefs and hence yield information about the
samer@34 257 others. The joint and conditional entropies as described in any
samer@34 258 textbook on information theory (\eg \cite{CoverThomas}) then quantify
samer@34 259 the observer's expected uncertainty about groups of variables given the
samer@34 260 values of others. In particular, the \emph{mutual information}
samer@34 261 $I(X_1;X_2)$ is both the expected information
samer@34 262 in an observation of $X_2$ about $X_1$ and the expected reduction
samer@34 263 in uncertainty about $X_1$ after observing $X_2$:
samer@34 264 \begin{equation}
samer@34 265 I(X_1;X_2) = H(X_1) - H(X_1|X_2),
samer@34 266 \end{equation}
where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy
of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$
samer@34 269 and so the mutual information is symmetric in its arguments. A conditional
samer@34 270 form of the mutual information can be formulated analogously:
samer@34 271 \begin{equation}
samer@34 272 I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3).
samer@34 273 \end{equation}
samer@34 274 These relationships between the various entropies and mutual
samer@34 275 informations are conveniently visualised in Venn diagram-like \emph{information diagrams}
samer@34 276 or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}.
samer@34 277
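The sketch below computes the mutual information directly from a joint probability
table, using the equivalent identity $I(X_1;X_2) = H(X_1) + H(X_2) - H(X_1,X_2)$;
the joint distribution is again invented purely for illustration:
\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    p = np.asarray(p, float).ravel()
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def mutual_information(pxy):
    """I(X1;X2) = H(X1) + H(X2) - H(X1,X2) for a joint probability table."""
    pxy = np.asarray(pxy, float)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy)

pxy = np.array([[0.4, 0.1],          # joint distribution of two binary variables
                [0.1, 0.4]])
print(mutual_information(pxy))       # approx. 0.28 bits
\end{lstlisting}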
samer@18 278 \begin{fig}{venn-example}
samer@18 279 \newcommand\rad{2.2em}%
samer@18 280 \newcommand\circo{circle (3.4em)}%
samer@18 281 \newcommand\labrad{4.3em}
samer@18 282 \newcommand\bound{(-6em,-5em) rectangle (6em,6em)}
samer@18 283 \newcommand\colsep{\ }
samer@18 284 \newcommand\clipin[1]{\clip (#1) \circo;}%
samer@18 285 \newcommand\clipout[1]{\clip \bound (#1) \circo;}%
samer@18 286 \newcommand\cliptwo[3]{%
samer@18 287 \begin{scope}
samer@18 288 \clipin{#1};
samer@18 289 \clipin{#2};
samer@18 290 \clipout{#3};
samer@18 291 \fill[black!30] \bound;
samer@18 292 \end{scope}
samer@18 293 }%
samer@18 294 \newcommand\clipone[3]{%
samer@18 295 \begin{scope}
samer@18 296 \clipin{#1};
samer@18 297 \clipout{#2};
samer@18 298 \clipout{#3};
samer@18 299 \fill[black!15] \bound;
samer@18 300 \end{scope}
samer@18 301 }%
samer@18 302 \begin{tabular}{c@{\colsep}c}
samer@18 303 \begin{tikzpicture}[baseline=0pt]
samer@18 304 \coordinate (p1) at (90:\rad);
samer@18 305 \coordinate (p2) at (210:\rad);
samer@18 306 \coordinate (p3) at (-30:\rad);
samer@18 307 \clipone{p1}{p2}{p3};
samer@18 308 \clipone{p2}{p3}{p1};
samer@18 309 \clipone{p3}{p1}{p2};
samer@18 310 \cliptwo{p1}{p2}{p3};
samer@18 311 \cliptwo{p2}{p3}{p1};
samer@18 312 \cliptwo{p3}{p1}{p2};
samer@18 313 \begin{scope}
samer@18 314 \clip (p1) \circo;
samer@18 315 \clip (p2) \circo;
samer@18 316 \clip (p3) \circo;
samer@18 317 \fill[black!45] \bound;
samer@18 318 \end{scope}
samer@18 319 \draw (p1) \circo;
samer@18 320 \draw (p2) \circo;
samer@18 321 \draw (p3) \circo;
samer@18 322 \path
samer@18 323 (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$}
samer@18 324 (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$}
samer@18 325 (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$}
samer@18 326 (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$}
samer@18 327 (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$}
samer@18 328 (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$}
samer@18 329 (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$}
samer@18 330 ;
samer@18 331 \path
samer@18 332 (p1) +(140:\labrad) node {$X_1$}
samer@18 333 (p2) +(-140:\labrad) node {$X_2$}
samer@18 334 (p3) +(-40:\labrad) node {$X_3$};
samer@18 335 \end{tikzpicture}
samer@18 336 &
samer@18 337 \parbox{0.5\linewidth}{
samer@18 338 \small
samer@18 339 \begin{align*}
samer@18 340 I_{1|23} &= H(X_1|X_2,X_3) \\
samer@18 341 I_{13|2} &= I(X_1;X_3|X_2) \\
samer@18 342 I_{1|23} + I_{13|2} &= H(X_1|X_2) \\
samer@18 343 I_{12|3} + I_{123} &= I(X_1;X_2)
samer@18 344 \end{align*}
samer@18 345 }
samer@18 346 \end{tabular}
samer@18 347 \caption{
samer@30 348 I-diagram visualisation of entropies and mutual informations
samer@18 349 for three random variables $X_1$, $X_2$ and $X_3$. The areas of
samer@18 350 the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively.
samer@18 351 The total shaded area is the joint entropy $H(X_1,X_2,X_3)$.
samer@18 352 The central area $I_{123}$ is the co-information \cite{McGill1954}.
samer@18 353 Some other information measures are indicated in the legend.
samer@18 354 }
samer@18 355 \end{fig}
samer@30 356
samer@30 357
samer@30 358 \subsection{Entropy and information in sequences}
samer@30 359
samer@30 360 Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a stationary sequence of
samer@30 361 random variables, infinite in both directions,
samer@30 362 and that $\mu$ is the associated shift-invariant probability measure over all
samer@30 363 configurations of the sequence---in the following, $\mu$ will simply serve
as a label for the process. We can identify a number of information-theoretic
measures meaningful in the context of a sequential observation of the sequence, during
which, at any time $t$, there is a `present' $X_t$, a `past'
$\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future'
$\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$.
Since the sequence is assumed stationary, we can, without loss of generality,
assume that $t=0$ in the following definitions.
samer@30 371
The \emph{entropy rate} of the process is the entropy of the present variable
$X_0$ given all the previous ones:
samer@30 374 \begin{equation}
samer@30 375 \label{eq:entro-rate}
samer@30 376 h_\mu = H(X_0|\past{X}_0).
samer@30 377 \end{equation}
samer@30 378 The entropy rate gives a measure of the overall randomness
samer@30 379 or unpredictability of the process.
samer@30 380
samer@30 381 The \emph{multi-information rate} $\rho_\mu$ (following Dubnov's \cite{Dubnov2006}
samer@30 382 notation for what he called the `information rate') is the mutual
samer@30 383 information between the `past' and the `present':
samer@30 384 \begin{equation}
samer@30 385 \label{eq:multi-info}
\rho_\mu = I(\past{X}_0;X_0) = H(X_0) - h_\mu.
samer@30 387 \end{equation}
samer@30 388 It is a measure of how much the context of an observation (that is,
samer@30 389 the observation of previous elements of the sequence) helps in predicting
or reducing the surprisingness of the current observation.
samer@30 391
samer@30 392 The \emph{excess entropy} \cite{CrutchfieldPackard1983}
is the mutual information between
the entire `past' and the `present' together with the entire `future':
samer@30 395 \begin{equation}
samer@30 396 E = I(\past{X}_0; X_0,\fut{X}_0).
samer@30 397 \end{equation}
samer@30 398
samer@30 399
samer@18 400
samer@18 401 \begin{fig}{predinfo-bg}
samer@18 402 \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}}
samer@18 403 \newcommand\rad{1.8em}%
samer@18 404 \newcommand\ovoid[1]{%
samer@18 405 ++(-#1,\rad)
samer@18 406 -- ++(2 * #1,0em) arc (90:-90:\rad)
samer@18 407 -- ++(-2 * #1,0em) arc (270:90:\rad)
samer@18 408 }%
samer@18 409 \newcommand\axis{2.75em}%
samer@18 410 \newcommand\olap{0.85em}%
samer@18 411 \newcommand\offs{3.6em}
samer@18 412 \newcommand\colsep{\hspace{5em}}
samer@18 413 \newcommand\longblob{\ovoid{\axis}}
samer@18 414 \newcommand\shortblob{\ovoid{1.75em}}
samer@18 415 \begin{tabular}{c@{\colsep}c}
samer@18 416 \subfig{(a) excess entropy}{%
samer@18 417 \newcommand\blob{\longblob}
samer@18 418 \begin{tikzpicture}
samer@18 419 \coordinate (p1) at (-\offs,0em);
samer@18 420 \coordinate (p2) at (\offs,0em);
samer@18 421 \begin{scope}
samer@18 422 \clip (p1) \blob;
samer@18 423 \clip (p2) \blob;
samer@18 424 \fill[lightgray] (-1,-1) rectangle (1,1);
samer@18 425 \end{scope}
samer@18 426 \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob;
samer@18 427 \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob;
samer@18 428 \path (0,0) node (future) {$E$};
samer@18 429 \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
samer@18 430 \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
samer@18 431 \end{tikzpicture}%
samer@18 432 }%
samer@18 433 \\[1.25em]
samer@18 434 \subfig{(b) predictive information rate $b_\mu$}{%
samer@18 435 \begin{tikzpicture}%[baseline=-1em]
samer@18 436 \newcommand\rc{2.1em}
samer@18 437 \newcommand\throw{2.5em}
samer@18 438 \coordinate (p1) at (210:1.5em);
samer@18 439 \coordinate (p2) at (90:0.7em);
samer@18 440 \coordinate (p3) at (-30:1.5em);
samer@18 441 \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)}
samer@18 442 \newcommand\present{(p2) circle (\rc)}
samer@18 443 \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}}
samer@18 444 \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}}
samer@18 445 \newcommand\fillclipped[2]{%
samer@18 446 \begin{scope}[even odd rule]
samer@18 447 \foreach \thing in {#2} {\clip \thing;}
samer@18 448 \fill[black!#1] \bound;
samer@18 449 \end{scope}%
samer@18 450 }%
samer@18 451 \fillclipped{30}{\present,\future,\bound \thepast}
samer@18 452 \fillclipped{15}{\present,\bound \future,\bound \thepast}
samer@18 453 \draw \future;
samer@18 454 \fillclipped{45}{\present,\thepast}
samer@18 455 \draw \thepast;
samer@18 456 \draw \present;
samer@18 457 \node at (barycentric cs:p2=1,p1=-0.17,p3=-0.17) {$r_\mu$};
samer@18 458 \node at (barycentric cs:p1=-0.4,p2=1.0,p3=1) {$b_\mu$};
samer@18 459 \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
samer@18 460 \path (p2) +(140:3em) node {$X_0$};
samer@18 461 % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$};
samer@18 462 \path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
samer@18 463 \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
samer@18 464 \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
samer@18 465 \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
samer@18 466 \end{tikzpicture}}%
samer@18 467 \\[0.5em]
samer@18 468 \end{tabular}
samer@18 469 \caption{
samer@30 470 I-diagrams for several information measures in
samer@18 471 stationary random processes. Each circle or oval represents a random
samer@18 472 variable or sequence of random variables relative to time $t=0$. Overlapped areas
samer@18 473 correspond to various mutual information as in \Figrf{venn-example}.
samer@33 474 In (b), the circle represents the `present'. Its total area is
samer@33 475 $H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
samer@18 476 rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
samer@18 477 information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
samer@18 478 }
samer@18 479 \end{fig}
samer@18 480
samer@30 481 The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
samer@30 482 is the average information in one observation about the infinite future given the infinite past,
samer@30 483 and is defined as a conditional mutual information:
samer@18 484 \begin{equation}
samer@18 485 \label{eq:PIR}
samer@30 486 b_\mu = I(X_0;\fut{X}_0|\past{X}_0) = H(\fut{X}_0|\past{X}_0) - H(\fut{X}_0|X_0,\past{X}_0).
samer@18 487 \end{equation}
Equation \eqrf{PIR} can be read as the average reduction
in uncertainty about the future on learning $X_0$, given the past.
samer@18 490 Due to the symmetry of the mutual information, it can also be written
samer@18 491 as
samer@18 492 \begin{equation}
samer@18 493 % \IXZ_t
samer@34 494 I(X_0;\fut{X}_0|\past{X}_0) = h_\mu - r_\mu,
samer@18 495 % \label{<++>}
samer@18 496 \end{equation}
samer@18 497 % If $X$ is stationary, then
where $r_\mu = H(X_0|\fut{X}_0,\past{X}_0)$
is the \emph{residual} \cite{AbdallahPlumbley2010},
or \emph{erasure} \cite{VerduWeissman2006}, entropy rate.
samer@18 501 These relationships are illustrated in \Figrf{predinfo-bg}, along with
samer@18 502 several of the information measures we have discussed so far.
samer@18 503
samer@18 504
samer@25 505 \subsection{Other sequential information measures}
samer@25 506
James et al.\ \cite{JamesEllisonCrutchfield2011} study the predictive information
rate and also examine some related measures. In particular, they identify
$\sigma_\mu$, the difference between the multi-information rate and the excess
samer@25 510 entropy, as an interesting quantity that measures the predictive benefit of
samer@25 511 model-building (that is, maintaining an internal state summarising past
samer@25 512 observations in order to make better predictions). They also identify
samer@25 513 $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
samer@30 514 information} rate.
samer@24 515
samer@18 516 \subsection{First order Markov chains}
samer@18 517 These are the simplest non-trivial models to which information dynamics methods
can be applied. In \cite{AbdallahPlumbley2009} we showed that the predictive information
rate can be expressed simply in terms of the entropy rate of the Markov chain.
If we let $a$ denote the transition matrix of the chain and $h_a$ its
entropy rate, then its predictive information rate $b_a$ is
samer@18 522 \begin{equation}
samer@18 523 b_a = h_{a^2} - h_a,
samer@18 524 \end{equation}
samer@18 525 where $a^2 = aa$, the transition matrix squared, is the transition matrix
samer@18 526 of the `skip one' Markov chain obtained by leaving out every other observation.
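A minimal numerical sketch of this result is given below: it computes the stationary
distribution and entropy rate of a row-stochastic transition matrix and from them
obtains $b_a = h_{a^2} - h_a$ together with the multi-information rate
$\rho_\mu = H(X_0) - h_a$, all in bits; the example matrix is arbitrary and chosen
only for illustration.
\begin{lstlisting}[language=Python]
import numpy as np

def stationary(a):
    """Stationary distribution of a row-stochastic transition matrix."""
    vals, vecs = np.linalg.eig(a.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()

def entropy(p):
    p = np.asarray(p, float).ravel()
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def entropy_rate(a):
    """h_a = sum_i pi_i H(a[i,:]) for a first-order Markov chain."""
    pi = stationary(a)
    return float(sum(pi[i] * entropy(a[i]) for i in range(len(pi))))

def markov_measures(a):
    h   = entropy_rate(a)               # entropy rate h_a
    b   = entropy_rate(a @ a) - h       # predictive information rate b_a
    rho = entropy(stationary(a)) - h    # multi-information rate rho_mu
    return h, rho, b

a = np.array([[0.1, 0.8, 0.1],          # an example transition matrix
              [0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1]])
print(markov_measures(a))
\end{lstlisting}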
samer@18 527
samer@18 528 \subsection{Higher order Markov chains}
Second and higher order Markov chains can be treated in a similar way by transforming
to a first-order representation of the higher order Markov chain. If we are dealing
with an $N$th order model, this is done by forming a new alphabet of possible observations
consisting of all possible $N$-tuples of symbols from the base alphabet. An observation
in this new model represents a block of $N$ observations from the base model. The next
observation represents the block of $N$ obtained by shifting the previous block along
by one step. The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$, where $K$ is the size of the base alphabet. Its predictive
information rate is then
samer@18 537 \begin{equation}
samer@18 538 b_{\hat{a}} = h_{\hat{a}^{N+1}} - N h_{\hat{a}},
samer@18 539 \end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the transition matrix.
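The sketch below, reusing \texttt{entropy\_rate} and \texttt{stationary} from the
previous listing, shows one way to construct the block-alphabet transition matrix
$\hat{a}$ for an order-$N$ model and then apply the formula above; the order-2
conditional distributions here are randomly generated and purely illustrative.
\begin{lstlisting}[language=Python]
import numpy as np
from itertools import product

def block_transition_matrix(p, K, N):
    """First-order transition matrix over N-tuples of a K-symbol alphabet.
    p[s + (y,)] = P(next = y | previous N symbols = s), shape (K,)*(N+1)."""
    states = list(product(range(K), repeat=N))
    index = {s: i for i, s in enumerate(states)}
    a_hat = np.zeros((K**N, K**N))
    for s in states:
        for y in range(K):
            a_hat[index[s], index[s[1:] + (y,)]] = p[s + (y,)]
    return a_hat

K, N = 3, 2
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(K), size=(K,) * N)   # p[i, j, y] = P(y | i, j)
a_hat = block_transition_matrix(p, K, N)

h = entropy_rate(a_hat)                               # entropy rate of the block chain
b = entropy_rate(np.linalg.matrix_power(a_hat, N + 1)) - N * h
print(h, b)
\end{lstlisting}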
samer@18 541
samer@9 542
samer@34 543 \begin{fig}{wundt}
samer@34 544 \raisebox{-4em}{\colfig[0.43]{wundt}}
samer@34 545 % {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
samer@34 546 {\ {\large$\longrightarrow$}\ }
samer@34 547 \raisebox{-4em}{\colfig[0.43]{wundt2}}
samer@34 548 \caption{
samer@34 549 The Wundt curve relating randomness/complexity with
samer@34 550 perceived value. Repeated exposure sometimes results
samer@34 551 in a move to the left along the curve \cite{Berlyne71}.
samer@34 552 }
samer@34 553 \end{fig}
samer@34 554
samer@4 555
hekeus@16 556 \section{Information Dynamics in Analysis}
samer@4 557
hekeus@16 558 \subsection{Musicological Analysis}
samer@34 559 In \cite{AbdallahPlumbley2009}, methods based on the theory described above
were used to analyse two pieces of music in the minimalist style
samer@34 561 by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
samer@34 562 The analysis was done using a first-order Markov chain model, with the
samer@34 563 enhancement that the transition matrix of the model was allowed to
samer@34 564 evolve dynamically as the notes were processed, and was estimated (in
samer@34 565 a Bayesian way) as a \emph{distribution} over possible transition matrices,
samer@34 566 rather than a point estimate. [Bayesian surprise, other component of IPI].
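A minimal sketch of an adaptive model in this spirit is shown below. It maintains
Dirichlet-smoothed transition counts, records the surprisingness of each note under
the current predictive distribution, and, as a crude stand-in for the Bayesian-surprise
component, records the KL divergence between successive predictive distributions.
The smoothing parameter and the exact surprise measure are illustrative assumptions,
not necessarily those used in \cite{AbdallahPlumbley2009}.
\begin{lstlisting}[language=Python]
import numpy as np

def adaptive_markov_analysis(notes, K, alpha=1.0):
    """Online surprisal trace under an adaptive first-order Markov model.
    notes: sequence of integer symbols in range(K); alpha: Dirichlet smoothing."""
    counts = np.zeros((K, K))
    surprisals, info_gains = [], []
    for prev, cur in zip(notes[:-1], notes[1:]):
        pred = (counts[prev] + alpha) / (counts[prev].sum() + K * alpha)
        surprisals.append(-np.log2(pred[cur]))   # surprisingness of this note
        counts[prev, cur] += 1                   # revise beliefs about transitions
        new = (counts[prev] + alpha) / (counts[prev].sum() + K * alpha)
        info_gains.append(float(np.sum(new * np.log2(new / pred))))
    return np.array(surprisals), np.array(info_gains)

# example: a toy note sequence over a 4-symbol alphabet
s, g = adaptive_markov_analysis([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 2], K=4)
\end{lstlisting}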
samer@4 567
samer@24 568 \begin{fig}{twopages}
samer@33 569 \colfig[0.96]{matbase/fig9471} % update from mbc paper
samer@33 570 % \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
samer@24 571 \vspace*{1em}
samer@24 572 \colfig[0.97]{matbase/fig13377} % rule based analysis
samer@24 573 \caption{Analysis of \emph{Two Pages}.
samer@24 574 The thick vertical lines are the part boundaries as indicated in
samer@24 575 the score by the composer.
samer@24 576 The thin grey lines
samer@24 577 indicate changes in the melodic `figures' of which the piece is
samer@24 578 constructed. In the `model information rate' panel, the black asterisks
samer@24 579 mark the
samer@24 580 six most surprising moments selected by Keith Potter.
samer@24 581 The bottom panel shows a rule-based boundary strength analysis computed
samer@24 582 using Cambouropoulos' LBDM.
samer@24 583 All information measures are in nats and time is in notes.
samer@24 584 }
samer@24 585 \end{fig}
samer@24 586
samer@24 587 \begin{fig}{metre}
samer@33 588 % \scalebox{1}[1]{%
samer@24 589 \begin{tabular}{cc}
samer@33 590 \colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
samer@33 591 \colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
samer@33 592 \colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
samer@24 593 % \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
samer@24 594 % \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
samer@24 595 % \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
samer@24 596 % \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
samer@24 597 % \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
samer@24 598 % \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}
samer@24 599
samer@24 600 \end{tabular}%
samer@33 601 % }
\caption{Metrical analysis by computing the average surprisingness and
informativeness of notes at different periodicities (\ie hypothetical
samer@24 604 bar lengths) and phases (\ie positions within a bar).
samer@24 605 }
samer@24 606 \end{fig}
samer@24 607
\subsection{Content analysis/Sound Categorisation}
Overview of information-theoretic approaches to music content analysis.
peterf@26 610 \begin{itemize}
samer@33 611 \item Continuous domain information
samer@33 612 \item Audio based music expectation modelling
peterf@26 613 \item Proposed model for Gaussian processes
peterf@26 614 \end{itemize}
peterf@26 615 \emph{Peter}
peterf@26 616
samer@4 617
samer@4 618 \subsection{Beat Tracking}
hekeus@16 619 \emph{Andrew}
samer@4 620
samer@4 621
samer@24 622 \section{Information dynamics as compositional aid}
hekeus@13 623
In addition to applying information dynamics to analysis, it is also possible to
use this approach in design, such as the composition of musical materials. By
providing a framework for linking information-theoretic measures to the control
of generative processes, it becomes possible to steer the output of these processes
to match criteria defined by these measures. For instance, the outputs of a
stochastic musical process could be filtered to match constraints defined by a
set of information-theoretic measures.
hekeus@13 631
samer@23 632 The use of stochastic processes for the generation of musical material has been
widespread for decades -- Iannis Xenakis applied probabilistic mathematical
models to the creation of musical materials, including the formulation of a
theory of Markovian Stochastic Music. However, we can use information dynamics
measures to explore and interface with such processes at the high and abstract
samer@23 637 level of expectation, randomness and predictability. The Melody Triangle is
samer@23 638 such a system.
hekeus@13 639
samer@23 640 \subsection{The Melody Triangle}
samer@23 641 The Melody Triangle is an exploratory interface for the discovery of melodic
samer@23 642 content, where the input -- positions within a triangle -- directly map to
content, where the input -- a position within a triangle -- maps directly to
information-theoretic measures associated with the output.
The measures are the entropy rate, redundancy and predictive information rate
of the random process used to generate the sequence of notes.
These are all related to the predictability of the sequence and as such
samer@23 648 music.\emph{self-plagiarised}
hekeus@13 649
Before the Melody Triangle can be used, it has to be `populated' with possible
parameter values for the melody generators. These are then plotted in a 3-D
statistical space of redundancy, entropy rate and predictive information rate.
In our case we generated thousands of transition matrices, representing first-order
Markov chains, by a random sampling method. In figure \ref{InfoDynEngine} we see
a representation of how these matrices are distributed in the 3-D statistical
space; each one of these points corresponds to a transition
matrix.\emph{self-plagiarised}
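One possible sampling procedure is sketched below, reusing \texttt{entropy},
\texttt{stationary} and \texttt{entropy\_rate} from the earlier listing; rows are
drawn from a Dirichlet distribution, and `redundancy' is taken here to be the
multi-information rate $H(X_0)-h_\mu$, consistent with the caption of figure
\ref{InfoDynEngine}. The alphabet size, sample count and concentration parameter
are illustrative choices, not necessarily those used to build the installation.
\begin{lstlisting}[language=Python]
import numpy as np

def sample_population(K=8, n=5000, concentration=0.5, seed=0):
    """Sample random transition matrices and locate them in the
    (redundancy, entropy rate, PIR) space used to build the triangle."""
    rng = np.random.default_rng(seed)
    points, matrices = [], []
    for _ in range(n):
        a = rng.dirichlet(concentration * np.ones(K), size=K)  # row-stochastic
        h   = entropy_rate(a)                     # entropy rate
        rho = entropy(stationary(a)) - h          # redundancy (multi-information rate)
        b   = entropy_rate(a @ a) - h             # predictive information rate
        points.append((rho, h, b))
        matrices.append(a)
    return np.array(points), matrices
\end{lstlisting}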
hekeus@17 658
samer@4 659
When we look at the distribution of transition matrices plotted in this space,
we see that it forms an arch shape that is fairly thin. It thus becomes a
reasonable approximation to pretend that it is just a sheet in two dimensions,
and so we stretch out this curved arc into a flat triangle. It is this triangular
samer@23 664 sheet that is our `Melody Triangle' and forms the interface by which the system
samer@23 665 is controlled. \emph{self-plagiarised}
samer@4 666
When the Melody Triangle is used, whether as a screen-based
system or as an interactive installation, it involves a mapping to this statistical
space. When the user, through the interface, selects a position within the
triangle, the corresponding transition matrix is returned. Figure \ref{TheTriangle}
shows how the triangle maps to different measures of redundancy, entropy rate
and predictive information rate.\emph{self-plagiarised}
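Internally, this mapping can be as simple as a nearest-neighbour lookup into the
flattened population of sampled matrices, as in the sketch below; how the curved
sheet is projected onto two dimensions is not specified here, so the 2-D coordinates
are assumed to have been precomputed.
\begin{lstlisting}[language=Python]
import numpy as np

def nearest_matrix(position, coords2d, matrices):
    """Return the sampled transition matrix whose (precomputed) 2-D triangle
    coordinates are closest to the selected position."""
    d2 = np.sum((np.asarray(coords2d) - np.asarray(position))**2, axis=1)
    return matrices[int(np.argmin(d2))]
\end{lstlisting}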
samer@34 673
The three corners correspond to different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity', `noise'
and `repetition'. Melodies from the `noise' corner have no discernible pattern;
they have high entropy rate, low predictive information rate and low redundancy.
These melodies are essentially totally random. Melodies along the `periodicity'
to `repetition' edge are all deterministic loops that get shorter as we approach
the `repetition' corner, until they become just one repeating note. It is the
areas in between the extremes that provide the more `interesting' melodies: that
is, those that have some level of unpredictability but are not completely random,
or, conversely, that are predictable but not entirely so. This triangular
space allows for an intuitive exploration of expectation and surprise in temporal
sequences based on a simple model of how one might guess the next event given
the previous one.\emph{self-plagiarised}
samer@23 687
samer@34 688 \begin{figure}
samer@34 689 \centering
samer@34 690 \includegraphics[width=\linewidth]{figs/mtriscat}
samer@34 691 \caption{The population of transition matrices distributed along three axes of
samer@34 692 redundancy, entropy rate and predictive information rate (all measured in bits).
samer@34 693 The concentrations of points along the redundancy axis correspond
samer@34 694 to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
samer@34 695 3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point
samer@34 696 represents its PIR---note that the highest values are found at intermediate entropy
samer@34 697 and redundancy, and that the distribution as a whole makes a curved triangle. Although
samer@34 698 not visible in this plot, it is largely hollow in the middle.
samer@34 699 \label{InfoDynEngine}}
samer@34 700 \end{figure}
samer@34 701
samer@4 702
samer@23 703
samer@23 704 Any number of interfaces could be developed for the Melody Triangle. We have
developed two: a standard screen-based interface where a user moves tokens with
a mouse in and around a triangle on screen, and a multi-user interactive
samer@23 707 installation where a Kinect camera tracks individuals in a space and maps their
samer@23 708 positions in the space to the triangle.
samer@23 709 Each visitor would generate a melody, and could collaborate with their co-visitors
samer@23 710 to generate musical textures -- a playful yet informative way to explore
samer@23 711 expectation and surprise in music.
samer@23 712
As a screen-based interface the Melody Triangle can serve as a composition tool.
A triangle is drawn on the screen, with screen space thus mapped to the statistical
space of the Melody Triangle.
A number of round tokens, each representing a melody, can be dragged in and
around the triangle. When a token is dragged into the triangle, the system
starts generating a sequence of notes with statistical properties that
correspond to its position in the triangle.\emph{self-plagiarised}
samer@23 720
samer@23 721 In this mode, the Melody Triangle can be used as a kind of composition assistant
for the generation of interesting musical textures and melodies. However, unlike
other computer-aided composition tools or programming environments, here the
samer@23 724 composer engages with music on the high and abstract level of expectation,
samer@23 725 randomness and predictability.\emph{self-plagiarised}
samer@23 726
hekeus@13 727
Additionally, the Melody Triangle serves as an effective tool for experimental investigations into musical preference and its relationship to information dynamics models.
samer@4 729
hekeus@13 730 %As the Melody Triangle essentially operates on a stream of symbols, it it is possible to apply the melody triangle to the design of non-sonic content.
hekeus@13 731
samer@34 732 \begin{figure}
samer@34 733 \centering
samer@34 734 \includegraphics[width=0.9\linewidth]{figs/TheTriangle.pdf}
samer@34 735 \caption{The Melody Triangle\label{TheTriangle}}
samer@34 736 \end{figure}
samer@34 737
hekeus@13 738 \section{Musical Preference and Information Dynamics}
samer@23 739 We carried out a preliminary study that sought to identify any correlation between
aesthetic preference and the information-theoretic measures of the Melody
Triangle. In this study, participants were asked to use the screen-based interface,
but it was simplified so that all they could do was move tokens around. To help
samer@23 743 discount visual biases, the axes of the triangle would be randomly rearranged
samer@23 744 for each participant.\emph{self-plagiarised}
hekeus@16 745
The study was divided into two parts: the first investigated musical preference
with respect to single melodies at different tempos. In the second part of the
study, a background melody was playing and the participants were asked to continue
playing with the system under the implicit assumption that they would try to find
a second melody that works well with the background melody. For each participant
samer@23 751 this was done four times, each with a different background melody from four
samer@23 752 different areas of the Melody Triangle. For all parts of the study the participants
samer@23 753 were asked to signal, by pressing the space bar, whenever they liked what they
samer@23 754 were hearing.\emph{self-plagiarised}
samer@4 755
hekeus@13 756 \emph{todo - results}
samer@4 757
hekeus@13 758 \section{Information Dynamics as Evaluative Feedback Mechanism}
hekeus@13 759
hekeus@13 760 \emph{todo - code the info dyn evaluator :) }
samer@4 761
It is possible to use information dynamics measures to develop a kind of `critic'
that would evaluate a stream of symbols. For instance, we could develop a system
to notify us if a stream of symbols is too boring, either because it is too
repetitive or too chaotic. This could be used to evaluate pre-composed
streams of symbols, or even to provide real-time feedback in an
improvisatory setup.
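A minimal sketch of such a critic is given below. It reuses the adaptive surprisal
trace from the earlier listing and simply thresholds a sliding-window average of
surprisal; the window length and thresholds are arbitrary illustrative choices, and
a more sophisticated critic could track the predictive information rate instead.
\begin{lstlisting}[language=Python]
import numpy as np

def boredom_critic(notes, K, window=32, low=0.5, high=2.5):
    """Flag passages that look too repetitive or too chaotic, based on a
    sliding-window average of surprisal (bits per symbol)."""
    surprisal, _ = adaptive_markov_analysis(notes, K)
    verdicts = []
    for t in range(window, len(surprisal) + 1):
        mean_s = surprisal[t - window:t].mean()
        if mean_s < low:
            verdicts.append((t, 'too repetitive'))
        elif mean_s > high:
            verdicts.append((t, 'too chaotic'))
    return verdicts
\end{lstlisting}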
hekeus@13 768
\emph{comparable system} Gordon Pask's Musicolour (1953) applied a similar notion
of boredom in its design. The Musicolour would react to audio input through a
microphone by flashing coloured lights. Rather than a direct mapping of sound
to light, Pask designed the device to be a partner to a performing musician. It
would adapt its lighting pattern based on the rhythms and frequencies it would
hear, quickly `learning' to flash in time with the music. However, Pask endowed
samer@23 775 the device with the ability to `be bored'; if the rhythmic and frequency content
samer@23 776 of the input remained the same for too long it would listen for other rhythms
samer@23 777 and frequencies, only lighting when it heard these. As the Musicolour would
samer@23 778 `get bored', the musician would have to change and vary their playing, eliciting
samer@23 779 new and unexpected outputs in trying to keep the Musicolour interested.
samer@4 780
In a similar vein, our \emph{Information Dynamics Critic} (name?) allows for an
evaluative measure of an input stream, but with a more sophisticated
notion of boredom that \dots
samer@23 784
hekeus@13 785
hekeus@13 786
hekeus@13 787
samer@4 788 \section{Conclusion}
samer@4 789
samer@9 790 \bibliographystyle{unsrt}
hekeus@16 791 {\bibliography{all,c4dm,nime}}
samer@4 792 \end{document}