Mercurial repository hg/cip2012

changeset 73:56508a08924a
Camera ready version.

author:   samer
date:     Mon, 16 Apr 2012 15:33:42 +0100
parents:  9135f6fb1a68
children: 90901fd611d1
files:    final.pdf final.tex
diffstat: 2 files changed, 1236 insertions(+), 0 deletions(-)
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/final.tex Mon Apr 16 15:33:42 2012 +0100 @@ -0,0 +1,1236 @@ +\documentclass[conference]{IEEEtran} +\usepackage{fixltx2e} +\usepackage{cite} +\usepackage[spacing]{microtype} +\usepackage[cmex10]{amsmath} +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{epstopdf} +\usepackage{url} +\usepackage{listings} +%\usepackage[expectangle]{tools} +\usepackage{tools} +\usepackage{tikz} +\usetikzlibrary{calc} +\usetikzlibrary{matrix} +\usetikzlibrary{patterns} +\usetikzlibrary{arrows} + +\let\citep=\cite +\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}% +\newcommand\preals{\reals_+} +\newcommand\X{\mathcal{X}} +\newcommand\Y{\mathcal{Y}} +\newcommand\domS{\mathcal{S}} +\newcommand\A{\mathcal{A}} +\newcommand\Data{\mathcal{D}} +\newcommand\rvm[1]{\mathrm{#1}} +\newcommand\sps{\,.\,} +\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} +\newcommand\Ix{\mathcal{I}} +\newcommand\IXZ{\overline{\underline{\mathcal{I}}}} +\newcommand\x{\vec{x}} +\newcommand\Ham[1]{\mathcal{H}_{#1}} +\newcommand\subsets[2]{[#1]^{(k)}} +\def\bet(#1,#2){#1..#2} + + +\def\ev(#1=#2){#1\!\!=\!#2} +\newcommand\rv[1]{\Omega \to #1} +\newcommand\ceq{\!\!=\!} +\newcommand\cmin{\!-\!} +\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2} + +\newcommand\sumitoN{\sum_{i=1}^N} +\newcommand\sumktoK{\sum_{k=1}^K} +\newcommand\sumjtoK{\sum_{j=1}^K} +\newcommand\sumalpha{\sum_{\alpha\in\A}} +\newcommand\prodktoK{\prod_{k=1}^K} +\newcommand\prodjtoK{\prod_{j=1}^K} + +\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}} +\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}} +\newcommand\parity[2]{P^{#1}_{2,#2}} +\newcommand\specint[1]{\frac{1}{2\pi}\int_{-\pi}^\pi #1{S(\omega)} \dd \omega} +%\newcommand\specint[1]{\int_{-1/2}^{1/2} #1{S(f)} \dd f} + + +%\usepackage[parfill]{parskip} + +\begin{document} +\title{Cognitive Music Modelling: an\\Information Dynamics Approach} + +\author{ + \IEEEauthorblockN{Samer A. Abdallah, Henrik Ekeus, Peter Foster} + \IEEEauthorblockN{Andrew Robertson and Mark D. Plumbley} + \IEEEauthorblockA{Centre for Digital Music\\ + Queen Mary University of London\\ + Mile End Road, London E1 4NS}} + +\maketitle +\begin{abstract} + We describe an information-theoretic approach to the analysis + of music and other sequential data, which emphasises the predictive aspects + of perception, and the dynamic process + of forming and modifying expectations about an unfolding stream of data, + characterising these using the tools of information theory: entropies, + mutual informations, and related quantities. + After reviewing the theoretical foundations, +% we present a new result on predictive information rates in high-order Markov chains, and + we discuss a few emerging areas of application, including + musicological analysis, real-time beat-tracking analysis, and the generation + of musical materials as a cognitively-informed compositional aid. +\end{abstract} + + +\section{Introduction} +\label{s:Intro} + The relationship between + Shannon's \cite{Shannon48} information theory and music and art in general has been the + subject of some interest since the 1950s + \cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}. + The general thesis is that perceptible qualities and subjective states + like uncertainty, surprise, complexity, tension, and interestingness + are closely related to information-theoretic quantities like + entropy, relative entropy, and mutual information. 
Music is also an inherently dynamic process,
where listeners build up expectations about what is to happen next,
which may be fulfilled immediately, after some delay,
or modified as the music unfolds.
In this paper, we explore this ``Information Dynamics'' view of music,
discussing the theory behind it and some emerging applications.

\subsection{Expectation and surprise in music}
The idea that the musical experience is strongly shaped by the generation
and playing out of strong and weak expectations was put forward by, amongst others,
music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
\begin{quote}
`The most important factor in the mental process which accompanies the
act of listening to music, and which converts it to a source of pleasure,
is \ldots the intellectual satisfaction
which the listener derives from continually following and anticipating
the composer's intentions---now, to see his expectations fulfilled, and
now, to find himself agreeably mistaken.'
%It is a matter of course that
%this intellectual flux and reflux, this perpetual giving and receiving
%takes place unconsciously, and with the rapidity of lightning-flashes.
\end{quote}
An essential aspect of this is that music is experienced as a phenomenon
that unfolds in time, rather than being apprehended as a static object
presented in its entirety. Meyer argued that the experience depends
on how we change and revise our conceptions \emph{as events happen}, on
how expectation and prediction interact with occurrence, and that, to a
large degree, the way to understand the effect of music is to focus on
this `kinetics' of expectation and surprise.

Prediction and expectation are essentially probabilistic concepts
and can be treated mathematically using probability theory.
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
% \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
\citep{SaffranJohnsonAslin1999}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.

% \subsection{Music and information theory}

% With a probabilistic framework for music modelling and prediction in hand,
% we can %are in a position to
% compute various
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
what Berlyne \cite{Berlyne71} called `collative variables' since they are
to do with patterns of occurrence rather than medium-specific details.
Berlyne sought to show that the collative variables are closely related to
perceptual qualities like complexity, tension, interestingness,
and even aesthetic value, not just in music, but in other temporal
or visual media.
The relevance of information theory to music and art has
also been addressed by researchers from the 1950s onwards
\cite{Youngblood58,CoonsKraehenbuehl1958,Cohen1962,HillerBean66,Moles66,Meyer67}.
}
% information-theoretic quantities like entropy, relative entropy,
% and mutual information.
% and are major determinants of the overall experience.
% Berlyne's `new experimental aesthetics', the `information-aestheticians'.

% Listeners then experience greater or lesser levels of surprise
% in response to departures from these norms.
% By careful manipulation
% of the material, the composer can thus define, and induce within the
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}
Our working hypothesis is that, as an intelligent, predictive
agent (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic belief state that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the emerging themes of the piece. As events unfold, it revises
this belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy, quantities that
Berlyne \cite{Berlyne71} called `collative variables', since
they are to do with \emph{patterns} of occurrence rather than the details
of which specific things occur; Berlyne developed these ideas of
`information aesthetics' in an experimental setting.
By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

% In addition, when adaptive probabilistic models are used, expectations are
% created mainly in response to \emph{patterns} of occurence,
% rather the details of which specific things occur.
One consequence of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The information-theoretic
concepts in terms of which the analysis is framed are universal to all sorts
of data.
Together, these suggest that an information dynamic analysis captures a
high level of \emph{abstraction}, and could be used to
make structural comparisons between different temporal media,
such as music, film, animation, and dance.
% analyse and compare information
% flow in different temporal media regardless of whether they are auditory,
% visual or otherwise.

Another consequence is that the information dynamic approach gives us a principled way
to address the notion of \emph{subjectivity}, since the analysis is dependent on the
probability model the observer starts off with, which may depend on prior experience
or other factors, and which may change over time. Thus, inter-subject variability and
variation in subjects' responses over time are
fundamental to the theory.

%modelling the creative process, which often alternates between generative
%and selective or evaluative phases \cite{Boden1990}, and would have
%applications in tools for computer aided composition.
+ + +\section{Theoretical review} + + \subsection{Entropy and information} + \label{s:entro-info} + + Let $X$ denote some variable whose value is initially unknown to our + hypothetical observer. We will treat $X$ mathematically as a random variable, + with a value to be drawn from some set $\X$ and a + probability distribution representing the observer's beliefs about the + true value of $X$. + In this case, the observer's uncertainty about $X$ can be quantified + as the entropy of the random variable $H(X)$. For a discrete variable + with probability mass function $p:\X \to [0,1]$, this is + \begin{equation} + H(X) = \sum_{x\in\X} -p(x) \log p(x), % = \expect{-\log p(X)}, + \end{equation} +% where $\expect{}$ is the expectation operator. + The negative-log-probability + $\ell(x) = -\log p(x)$ of a particular value $x$ can usefully be thought of as + the \emph{surprisingness} of the value $x$ should it be observed, and + hence the entropy is the expectation of the surprisingness, $\expect \ell(X)$. + + Now suppose that the observer receives some new data $\Data$ that + causes a revision of its beliefs about $X$. The \emph{information} + in this new data \emph{about} $X$ can be quantified as the + relative entropy or + Kullback-Leibler (KL) divergence between the prior and posterior + distributions $p(x)$ and $p(x|\Data)$ respectively: + \begin{equation} + \mathcal{I}_{\Data\to X} = D(p_{X|\Data} || p_{X}) + = \sum_{x\in\X} p(x|\Data) \log \frac{p(x|\Data)}{p(x)}. + \label{eq:info} + \end{equation} + When there are multiple variables $X_1, X_2$ + \etc which the observer believes to be dependent, then the observation of + one may change its beliefs and hence yield information about the + others. The joint and conditional entropies as described in any + textbook on information theory (\eg \cite{CoverThomas}) then quantify + the observer's expected uncertainty about groups of variables given the + values of others. In particular, the \emph{mutual information} + $I(X_1;X_2)$ is both the expected information + in an observation of $X_2$ about $X_1$ and the expected reduction + in uncertainty about $X_1$ after observing $X_2$: + \begin{equation} + I(X_1;X_2) = H(X_1) - H(X_1|X_2), + \end{equation} + where $H(X_1|X_2) = H(X_1,X_2) - H(X_2)$ is the conditional entropy + of $X_1$ given $X_2$. A little algebra shows that $I(X_1;X_2)=I(X_2;X_1)$ + and so the mutual information is symmetric in its arguments. A conditional + form of the mutual information can be formulated analogously: + \begin{equation} + I(X_1;X_2|X_3) = H(X_1|X_3) - H(X_1|X_2,X_3). + \end{equation} + These relationships between the various entropies and mutual + informations are conveniently visualised in \emph{information diagrams} + or I-diagrams \cite{Yeung1991} such as the one in \figrf{venn-example}. 
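As a concrete illustration of these definitions (a minimal sketch, independent
of the analyses reported later, assuming only the \texttt{numpy} library; the
prior, posterior and joint distributions are made up for the example), the
entropy, the information in some data $\Data$ about $X$, and a mutual
information can be computed directly from discrete distributions:

\begin{lstlisting}[language=Python]
import numpy as np

def entropy(p):
    # H(X) in bits for a discrete distribution p
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def info_gain(post, prior):
    # I_{D->X} = D(post || prior) in bits
    post = np.asarray(post, dtype=float)
    prior = np.asarray(prior, dtype=float)
    m = post > 0
    return float(np.sum(post[m] * np.log2(post[m] / prior[m])))

def mutual_information(pxy):
    # I(X1;X2) = H(X1) + H(X2) - H(X1,X2) from a joint p.m.f. matrix
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

prior = np.array([0.25, 0.25, 0.25, 0.25])  # hypothetical beliefs about X
post  = np.array([0.70, 0.10, 0.10, 0.10])  # beliefs after observing D
pxy   = np.array([[0.4, 0.1],               # hypothetical joint p.m.f.
                  [0.1, 0.4]])
print(entropy(prior))           # 2.0 bits of uncertainty
print(info_gain(post, prior))   # information in D about X
print(mutual_information(pxy))  # approx 0.278 bits
\end{lstlisting}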
+ + \begin{fig}{venn-example} + \newcommand\rad{2.2em}% + \newcommand\circo{circle (3.4em)}% + \newcommand\labrad{4.3em} + \newcommand\bound{(-6em,-5em) rectangle (6em,6em)} + \newcommand\colsep{\ } + \newcommand\clipin[1]{\clip (#1) \circo;}% + \newcommand\clipout[1]{\clip \bound (#1) \circo;}% + \newcommand\cliptwo[3]{% + \begin{scope} + \clipin{#1}; + \clipin{#2}; + \clipout{#3}; + \fill[black!30] \bound; + \end{scope} + }% + \newcommand\clipone[3]{% + \begin{scope} + \clipin{#1}; + \clipout{#2}; + \clipout{#3}; + \fill[black!15] \bound; + \end{scope} + }% + \begin{tabular}{c@{\colsep}c} + \scalebox{0.9}{% + \begin{tikzpicture}[baseline=0pt] + \coordinate (p1) at (90:\rad); + \coordinate (p2) at (210:\rad); + \coordinate (p3) at (-30:\rad); + \clipone{p1}{p2}{p3}; + \clipone{p2}{p3}{p1}; + \clipone{p3}{p1}{p2}; + \cliptwo{p1}{p2}{p3}; + \cliptwo{p2}{p3}{p1}; + \cliptwo{p3}{p1}{p2}; + \begin{scope} + \clip (p1) \circo; + \clip (p2) \circo; + \clip (p3) \circo; + \fill[black!45] \bound; + \end{scope} + \draw (p1) \circo; + \draw (p2) \circo; + \draw (p3) \circo; + \path + (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$} + (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$} + (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$} + (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$} + (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$} + (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$} + (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$} + ; + \path + (p1) +(140:\labrad) node {$X_1$} + (p2) +(-140:\labrad) node {$X_2$} + (p3) +(-40:\labrad) node {$X_3$}; + \end{tikzpicture}% + } + & + \parbox{0.5\linewidth}{ + \small + \begin{align*} + I_{1|23} &= H(X_1|X_2,X_3) \\ + I_{13|2} &= I(X_1;X_3|X_2) \\ + I_{1|23} + I_{13|2} &= H(X_1|X_2) \\ + I_{12|3} + I_{123} &= I(X_1;X_2) + \end{align*} + } + \end{tabular} + \caption{ + I-diagram of entropies and mutual informations + for three random variables $X_1$, $X_2$ and $X_3$. The areas of + the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively. + The total shaded area is the joint entropy $H(X_1,X_2,X_3)$. + The central area $I_{123}$ is the co-information \cite{McGill1954}. + Some other information measures are indicated in the legend. + } + \end{fig} + + + \subsection{Surprise and information in sequences} + \label{s:surprise-info-seq} + + Suppose that $(\ldots,X_{-1},X_0,X_1,\ldots)$ is a sequence of + random variables, infinite in both directions, + and that $\mu$ is the associated probability measure over all + realisations of the sequence. In the following, $\mu$ will simply serve + as a label for the process. We can indentify a number of information-theoretic + measures meaningful in the context of a sequential observation of the sequence, during + which, at any time $t$, the sequence can be divided into a `present' $X_t$, a `past' + $\past{X}_t \equiv (\ldots, X_{t-2}, X_{t-1})$, and a `future' + $\fut{X}_t \equiv (X_{t+1},X_{t+2},\ldots)$. + We will write the actually observed value of $X_t$ as $x_t$, and + the sequence of observations up to but not including $x_t$ as + $\past{x}_t$. +% Since the sequence is assumed stationary, we can without loss of generality, +% assume that $t=0$ in the following definitions. + + The in-context surprisingness of the observation $X_t=x_t$ depends on + both $x_t$ and the context $\past{x}_t$: + \begin{equation} + \ell_t = - \log p(x_t|\past{x}_t). 
\end{equation}
However, before $X_t$ is observed, the observer can compute
the \emph{expected} surprisingness as a measure of its uncertainty about
$X_t$; this may be written as an entropy
$H(X_t|\ev(\past{X}_t = \past{x}_t))$, but note that this is
conditional on the \emph{event} $\ev(\past{X}_t=\past{x}_t)$, not the
\emph{variables} $\past{X}_t$ as in the conventional conditional entropy.

The surprisingness $\ell_t$ and expected surprisingness
$H(X_t|\ev(\past{X}_t=\past{x}_t))$
can be understood as \emph{subjective} information dynamic measures, since they are
based on the observer's probability model in the context of the actually observed sequence
$\past{x}_t$. They characterise what it is like to be `in the observer's shoes'.
If we view the observer as a purely passive or reactive agent, this would
probably be sufficient, but for active agents such as humans or animals, it is
often necessary to \emph{anticipate} future events in order, for example, to plan the
most effective course of action. It makes sense for such observers to be
concerned about the predictive probability distribution over future events,
$p(\fut{x}_t|\past{x}_t)$. When an observation $\ev(X_t=x_t)$ is made in this context,
the \emph{instantaneous predictive information} (IPI) $\mathcal{I}_t$ at time $t$
is the information in the event $\ev(X_t=x_t)$ about the entire future of the sequence $\fut{X}_t$,
\emph{given} the observed past $\past{X}_t=\past{x}_t$.
Referring to the definition of information \eqrf{info}, this is the KL divergence
between prior and posterior distributions over possible futures, which, written out in full, is
\begin{equation}
\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
Note that it is quite possible for an event to be surprising but not informative
in a predictive sense.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of the attentional resources that should be directed at this stream of data, which may
be in competition with other sensory streams.
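To make the sequential picture concrete, the following minimal Python sketch
(an illustration only, assuming \texttt{numpy}; the transition matrix and the
observed sequence are hypothetical) traces the surprisingness $\ell_t$ and the
expected surprisingness $H(X_t|\ev(\past{X}_t=\past{x}_t))$ along a symbol
sequence under a first order Markov model (discussed further in \secrf{markov}),
for which the predictive distribution depends only on the previous symbol;
values are in nats:

\begin{lstlisting}[language=Python]
import numpy as np

def surprisingness_trace(seq, A):
    # A[i, j] = Pr(X_t = i | X_{t-1} = j)  (columns sum to 1)
    ell, H = [], []
    for prev, cur in zip(seq, seq[1:]):
        pred = A[:, prev]                # predictive distribution
        ell.append(-np.log(pred[cur]))   # surprise of the actual symbol
        H.append(-np.sum(pred * np.log(pred)))  # expected surprise
    return np.array(ell), np.array(H)

A = np.array([[0.80, 0.10, 0.30],   # hypothetical 3-symbol chain
              [0.10, 0.80, 0.30],
              [0.10, 0.10, 0.40]])
seq = [0, 0, 1, 1, 2, 0]            # hypothetical observations
ell, H = surprisingness_trace(seq, A)
print(ell)  # high at the unexpected transitions
print(H)    # uncertainty before each observation
\end{lstlisting}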
+ + \subsection{Information measures for stationary random processes} + \label{s:process-info} + + + \begin{fig}{predinfo-bg} + \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}} + \newcommand\rad{2em}% + \newcommand\ovoid[1]{% + ++(-#1,\rad) + -- ++(2 * #1,0em) arc (90:-90:\rad) + -- ++(-2 * #1,0em) arc (270:90:\rad) + }% + \newcommand\axis{2.75em}% + \newcommand\olap{0.85em}% + \newcommand\offs{3.6em} + \newcommand\colsep{\hspace{5em}} + \newcommand\longblob{\ovoid{\axis}} + \newcommand\shortblob{\ovoid{1.75em}} + \begin{tabular}{c} +\comment{ + \subfig{(a) multi-information and entropy rates}{% + \begin{tikzpicture}%[baseline=-1em] + \newcommand\rc{1.75em} + \newcommand\throw{2.5em} + \coordinate (p1) at (180:1.5em); + \coordinate (p2) at (0:0.3em); + \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} + \newcommand\present{(p2) circle (\rc)} + \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} + \newcommand\fillclipped[2]{% + \begin{scope}[even odd rule] + \foreach \thing in {#2} {\clip \thing;} + \fill[black!#1] \bound; + \end{scope}% + }% + \fillclipped{30}{\present,\bound \thepast} + \fillclipped{15}{\present,\bound \thepast} + \fillclipped{45}{\present,\thepast} + \draw \thepast; + \draw \present; + \node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$}; + \node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; + \path (p2) +(90:3em) node {$X_0$}; + \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; + \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; + \end{tikzpicture}}% + \\[1em] + \subfig{(a) excess entropy}{% + \newcommand\blob{\longblob} + \begin{tikzpicture} + \coordinate (p1) at (-\offs,0em); + \coordinate (p2) at (\offs,0em); + \begin{scope} + \clip (p1) \blob; + \clip (p2) \blob; + \fill[lightgray] (-1,-1) rectangle (1,1); + \end{scope} + \draw (p1) +(-0.5em,0em) node{\shortstack{infinite\\past}} \blob; + \draw (p2) +(0.5em,0em) node{\shortstack{infinite\\future}} \blob; + \path (0,0) node (future) {$E$}; + \path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; + \path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$}; + \end{tikzpicture}% + }% + \\[1em] +} +% \subfig{(b) predictive information rate $b_\mu$}{% + \begin{tikzpicture}%[baseline=-1em] + \newcommand\rc{2.2em} + \newcommand\throw{2.5em} + \coordinate (p1) at (210:1.5em); + \coordinate (p2) at (90:0.8em); + \coordinate (p3) at (-30:1.5em); + \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} + \newcommand\present{(p2) circle (\rc)} + \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} + \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}} + \newcommand\fillclipped[2]{% + \begin{scope}[even odd rule] + \foreach \thing in {#2} {\clip \thing;} + \fill[black!#1] \bound; + \end{scope}% + }% +% \fillclipped{80}{\future,\thepast} + \fillclipped{30}{\present,\future,\bound \thepast} + \fillclipped{15}{\present,\bound \future,\bound \thepast} + \draw \future; + \fillclipped{45}{\present,\thepast} + \draw \thepast; + \draw \present; + \node at (barycentric cs:p2=0.9,p1=-0.17,p3=-0.17) {$r_\mu$}; + \node at (barycentric cs:p1=-0.5,p2=1.0,p3=1) {$b_\mu$}; + \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; + \path (p2) +(140:3.2em) node {$X_0$}; + % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$}; + \path (p3) +(3em,0em) node {\shortstack{infinite\\future}}; + \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; + \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; + \path 
(p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
\end{tikzpicture}%}%
% \\[0.25em]
\end{tabular}
\caption{
I-diagram illustrating several information measures in
stationary random processes. Each circle or oval represents a random
variable or sequence of random variables relative to time $t=0$. Overlapped areas
correspond to various mutual informations.
The circle represents the `present'. Its total area is
$H(X_0)=\rho_\mu+r_\mu+b_\mu$, where $\rho_\mu$ is the multi-information
rate, $r_\mu$ is the residual entropy rate, and $b_\mu$ is the predictive
information rate. The entropy rate is $h_\mu = r_\mu+b_\mu$.
% The small dark
% region below $X_0$ is $\sigma_\mu$ and the excess entropy
% is $E = \rho_\mu + \sigma_\mu$.
}
\end{fig}

If we step back, out of the observer's shoes as it were, and consider the
random process $(\ldots,X_{-1},X_0,X_1,\ldots)$ as a statistical ensemble of
possible realisations, and furthermore assume that it is stationary,
then it becomes possible to define a number of information-theoretic measures,
closely related to those described above, but which characterise the
process as a whole, rather than on a moment-by-moment basis. Some of these,
such as the entropy rate, are well known, but others have only recently begun
to be investigated. (In the following, the assumption of stationarity means that
the measures defined below are independent of $t$.)

The \emph{entropy rate} of the process is the entropy of the `present'
$X_t$ given the `past':
\begin{equation}
\label{eq:entro-rate}
h_\mu = H(X_t|\past{X}_t).
\end{equation}
The entropy rate is a measure of the overall surprisingness
or unpredictability of the process, and gives an indication of the average
level of surprise and uncertainty that would be experienced by an observer
computing the measures of \secrf{surprise-info-seq} on a sequence sampled
from the process.

The \emph{multi-information rate} $\rho_\mu$ \cite{Dubnov2004}
is the mutual
information between the `past' and the `present':
\begin{equation}
\label{eq:multi-info}
\rho_\mu = I(\past{X}_t;X_t) = H(X_t) - h_\mu.
\end{equation}
It is a measure of how much the preceding context of an observation
helps in predicting or reducing the surprisingness of the current observation.

The \emph{excess entropy} \cite{CrutchfieldPackard1983}
is the mutual information between
the entire `past' and the entire `future' plus `present':
\begin{equation}
E = I(\past{X}_t; X_t,\fut{X}_t).
\end{equation}
Both the excess entropy and the multi-information rate can be thought
of as measures of \emph{redundancy}, quantifying the extent to which
the same information is to be found in all parts of the sequence.

The \emph{predictive information rate} (or PIR) \cite{AbdallahPlumbley2009}
is the mutual information between the `present' and the `future' given the
`past':
\begin{equation}
\label{eq:PIR}
b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t),
\end{equation}
which can be read as the average reduction
in uncertainty about the future on learning $X_t$, given the past.
Due to the symmetry of the mutual information, it can also be written
as
\begin{equation}
% \IXZ_t
b_\mu = H(X_t|\past{X}_t) - H(X_t|\past{X}_t,\fut{X}_t) = h_\mu - r_\mu,
% \label{<++>}
\end{equation}
% If $X$ is stationary, then
where $r_\mu = H(X_t|\fut{X}_t,\past{X}_t)$
is the \emph{residual} \cite{AbdallahPlumbley2010}
or \emph{erasure} \cite{VerduWeissman2006} entropy rate.
The PIR gives an indication of the average IPI that would be experienced
by an observer processing a sequence sampled from this process.
The relationships between these various measures are illustrated in \Figrf{predinfo-bg};
see James et al.\ \cite{JamesEllisonCrutchfield2011} for further discussion.
% in , along with several of the information measures we have discussed so far.

\comment{
James et al v\cite{JamesEllisonCrutchfield2011} review several of these
information measures and introduce some new related ones.
In particular they identify the $\sigma_\mu = I(\past{X}_t;\fut{X}_t|X_t)$,
the mutual information between the past and the future given the present,
as an interesting quantity that measures the predictive benefit of
model-building, that is, maintaining an internal state summarising past
observations in order to make better predictions. It is shown as the
small dark region below the circle in \figrf{predinfo-bg}(c).
By comparing with \figrf{predinfo-bg}(b), we can see that
$\sigma_\mu = E - \rho_\mu$.
}
% They also identify
% $w_\mu = \rho_\mu + b_{\mu}$, which they call the \emph{local exogenous
% information} rate.


\subsection{First and higher order Markov chains}
\label{s:markov}
% First order Markov chains are the simplest non-trivial models to which information
% dynamics methods can be applied.
In \cite{AbdallahPlumbley2009} we derived
expressions for all the information measures described in \secrf{surprise-info-seq} for
ergodic first order Markov chains (\ie chains that have a unique stationary
distribution).
% The derivation is greatly simplified by the dependency structure
% of the Markov chain: for the purpose of the analysis, the `past' and `future'
% segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
% and next variables $X_{t-1}$ and $X_{t+1}$ respectively.
We also showed that
the PIR can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
an alphabet $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(\ev(X_t=i)|\ev(X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
with transition matrix $a$, then the PIR is
\begin{equation}
b_\mu = h(a^2) - h(a),
\end{equation}
where $a^2$ is the transition matrix of the
% `skip one'
Markov chain obtained by jumping two steps at a time
along the original chain (a short numerical sketch of this computation
is given below).

Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the high order Markov chain. With
an $N$th order model, this is done by forming a new alphabet of size $K^N$
consisting of all possible $N$-tuples of symbols from the base alphabet.
An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
$(x_{t+1},\ldots,x_{t+N})$ from the base model.
% The next
% observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
% block along by one step.
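Both the first order formula above and its higher order generalisation below
come down to evaluating the entropy rate function $h$ on powers of a transition
matrix. As a rough numerical illustration (a sketch only, assuming
\texttt{numpy} and an ergodic chain; the 3-state matrix is hypothetical and
values are in nats), the first order case can be computed as follows:

\begin{lstlisting}[language=Python]
import numpy as np

def stationary(A):
    # stationary distribution pi with A @ pi = pi (A column-stochastic)
    w, v = np.linalg.eig(A)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

def entropy_rate(A):
    # h(A) = -sum_j pi_j sum_i A_ij log A_ij
    pi = stationary(A)
    with np.errstate(divide='ignore', invalid='ignore'):
        logA = np.where(A > 0, np.log(A), 0.0)
    return float(-np.sum(pi * np.sum(A * logA, axis=0)))

def pir(A):
    # b_mu = h(A^2) - h(A); A @ A is the two-step transition matrix
    return entropy_rate(A @ A) - entropy_rate(A)

A = np.array([[0.90, 0.05, 0.30],   # hypothetical 3-state chain
              [0.05, 0.90, 0.30],
              [0.05, 0.05, 0.40]])
print(entropy_rate(A), pir(A))
\end{lstlisting}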
The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$, in terms of which the PIR is
\begin{equation}
h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. (These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.)

In our experiments with visualising and sonifying sequences sampled from
first order Markov chains \cite{AbdallahPlumbley2009}, we found that
the measures $h_\mu$, $\rho_\mu$ and $b_\mu$ correspond to perceptible
characteristics, and that the transition matrices maximising or minimising
each of these quantities are quite distinct. High entropy rates are associated
with completely uncorrelated sequences with no recognisable temporal structure
(and low $\rho_\mu$ and $b_\mu$).
High values of $\rho_\mu$ are associated with long periodic cycles (and low $h_\mu$
and $b_\mu$). High values of $b_\mu$ are associated with intermediate values
of $\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
temporal structures. These relationships are visible in \figrf{mtriscat} in
\secrf{composition}, where we pick up this thread again, with an application of
information dynamics in a compositional aid.


\section{Information Dynamics in Analysis}

\subsection{Musicological Analysis}
\label{s:minimusic}

In \cite{AbdallahPlumbley2009}, we analysed two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was tracked (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. Some results are summarised in \figrf{twopages}:
the upper four plots show the dynamically evolving subjective information
measures as described in \secrf{surprise-info-seq}, computed using a point
estimate of the current transition matrix; the fifth plot (the `model information rate')
shows the information in each observation about the transition matrix.
In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
is actually a component of the true IPI when the transition
matrix is being learned online, and was neglected when we computed the IPI from
the transition matrix as if it were a constant.

The peaks of the surprisingness and both components of the IPI
show good correspondence with the structure of the piece, both as marked in the score
and as analysed by musicologist Keith Potter, who was asked to mark the six
`most surprising moments' of the piece (shown as asterisks in the fifth plot).
%%
% \footnote{%
% Note that the boundary marked in the score at around note 5,400 is known to be
% anomalous; on the basis of a listening analysis, some musicologists have
% placed the boundary a few bars later, in agreement with our analysis
% \cite{PotterEtAl2007}.}
%
In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} it:
the boundary strength functions show no tendency to peak at
the boundaries of the piece.

The complete analysis of \emph{Gradus} can be found in \cite{AbdallahPlumbley2009},
but \figrf{metre} illustrates the result of a metrical analysis: the piece was divided
into bars of 32, 64 and 128 notes. In each case, the average surprisingness and
IPI for the first, second, third \etc notes in each bar were computed. The plots
show that the first note of each bar is, on average, significantly more surprising
and informative than the others, up to the 64-note level, whereas at the 128-note
level, the dominant periodicity appears to remain at 64 notes.

\begin{fig}{twopages}
\colfig[0.96]{matbase/fig9471}\\ % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
\vspace*{0.5em}
\colfig[0.97]{matbase/fig13377} % rule based analysis
\caption{Analysis of \emph{Two Pages}.
The thick vertical lines are the part boundaries as indicated in
the score by the composer.
The thin grey lines
indicate changes in the melodic `figures' of which the piece is
constructed. In the `model information rate' panel, the black asterisks
mark the six most surprising moments selected by Keith Potter.
The bottom two panels show two rule-based boundary strength analyses.
All information measures are in nats.
Note that the boundary marked in the score at around note 5,400 is known to be
anomalous; on the basis of a listening analysis, some musicologists have
placed the boundary a few bars later, in agreement with our analysis
\cite{PotterEtAl2007}.
}
\end{fig}

\begin{fig}{metre}
% \scalebox{1}{%
\begin{tabular}{cc}
\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
% \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
% \colfig[0.46]{matbase/fig87574} & \colfig[0.48]{matbase/fig13651} \\
% \colfig[0.44]{matbase/fig19913} & \colfig[0.46]{matbase/fig66144} \\
% \colfig[0.48]{matbase/fig73098} & \colfig[0.48]{matbase/fig57141} \\
% \colfig[0.48]{matbase/fig25703} & \colfig[0.48]{matbase/fig72080} \\
% \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}

\end{tabular}%
% }
\caption{Metrical analysis by computing average surprisingness and
IPI of notes at different periodicities (\ie hypothetical
bar lengths) and phases (\ie positions within a bar).
}
\end{fig}

\begin{fig*}{drumfig}
% \includegraphics[width=0.9\linewidth]{drum_plots/file9-track.eps}% \\
\includegraphics[width=0.97\linewidth]{figs/file11-track.eps} \\
% \includegraphics[width=0.9\linewidth]{newplots/file8-track.eps}
\caption{Information dynamic analysis derived from audio recordings of
drumming, obtained by applying a Bayesian beat tracking system to the
sequence of detected kick and snare drum events.
The grey line shows the system's
varying level of uncertainty (entropy) about the tempo and phase of the
beat grid, while the stem plot shows the amount of information in each
drum event about the beat grid. The entropy drops instantaneously at each
event and rises gradually between events.
}
\end{fig*}

\subsection{Real-valued signals and audio analysis}
Using analogous definitions based on the differential entropy
\cite{CoverThomas}, the methods outlined
in \secrf{surprise-info-seq} and \secrf{process-info}
can be reformulated for random variables taking values in a continuous domain
and thus be applied to expressive parameters of music
such as dynamics, timing and timbre, which are readily quantified on a continuous scale.
%
% \subsection{Audio based content analysis}
% Using analogous definitions of differential entropy, the methods outlined
% in the previous section are equally applicable to continuous random variables.
% In the case of music, where expressive properties such as dynamics, tempo,
% timing and timbre are readily quantified on a continuous scale, the information
% dynamic framework may also be considered.
%
Dubnov \cite{Dubnov2004} considers the class of stationary Gaussian
processes, for which the entropy rate may be obtained analytically
from the power spectral density function $S(\omega)$ of the signal,
and finds that the
multi-information rate can be
expressed as
\begin{equation}
\rho_\mu = \frac{1}{2} \left( \log \specint{} - \specint{\log}\right).
\label{eq:mir-sfm}
\end{equation}
Dubnov also notes that $e^{-2\rho_\mu}$ is equivalent to the well-known
\emph{spectral flatness measure}, and hence
Gaussian processes with maximal multi-information rate are those with maximally
non-flat spectra, which are those dominated by a single frequency component.
% These essentially consist of a single
% sinusoidal component and hence are completely predictable once
% the parameters of the sinusoid have been inferred.
% Local stationarity is assumed, which may be achieved by windowing or
% change point detection \cite{Dubnov2008}.
%TODO

We have found (to appear in forthcoming work) that the predictive information rate for
autoregressive Gaussian processes can be expressed as
\begin{equation}
b_\mu = \frac{1}{2} \left( \log \specint{\frac{1}} - \specint{\log\frac{1}}\right),
\end{equation}
suggesting a sort of duality between $b_\mu$ and $\rho_\mu$ which is consistent with
the duality between multi-information and predictive information rates we discuss in
\cite{AbdallahPlumbley2012}. A consideration of the residual or erasure entropy rate
\cite{VerduWeissman2006}
suggests that this expression applies to Gaussian processes in general, but this is
yet to be confirmed rigorously.

Analysis shows that in stationary autoregressive processes of a given finite order,
$\rho_\mu$ is unbounded, while for moving average processes of a given order, $b_\mu$ is unbounded.
This is a result of the physically unattainable infinite-precision observations which the
theoretical analysis assumes; adding more realistic limitations on the amount of information
that can be extracted from one measurement is one of the aims of our ongoing work in this
area.
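As an illustration of \eqrf{mir-sfm} (a minimal numerical sketch, not part of
the analyses above, assuming \texttt{numpy}; the AR(1)-shaped spectrum is
hypothetical), the multi-information rate and the spectral flatness measure
can be estimated from samples of a power spectral density on a uniform
frequency grid:

\begin{lstlisting}[language=Python]
import numpy as np

def multi_information_rate(S):
    # rho_mu = 0.5*(log mean(S) - mean(log S)) in nats, for PSD
    # samples S on a uniform grid over [-pi, pi)
    S = np.asarray(S, dtype=float)
    return 0.5 * (np.log(S.mean()) - np.log(S).mean())

def spectral_flatness(S):
    # geometric mean / arithmetic mean; equals exp(-2*rho_mu)
    S = np.asarray(S, dtype=float)
    return np.exp(np.log(S).mean()) / S.mean()

w = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
a = 0.9                                 # hypothetical AR(1) coefficient
S = 1.0 / np.abs(1.0 - a * np.exp(-1j * w))**2
rho = multi_information_rate(S)
print(rho)                              # approx -0.5*log(1 - a**2)
print(np.exp(-2 * rho), spectral_flatness(S))  # these agree
\end{lstlisting}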
+% We are currently working towards methods for the computation of predictive information +% rate in autorregressive and moving average Gaussian processes +% and processes with power-law (or $1/f$) spectra, +% which have previously been investegated in relation to their aesthetic properties +% \cite{Voss75,TaylorSpeharVan-Donkelaar2011}. + +% (fractionally integrated Gaussian noise). +% %(fBm (continuous), fiGn discrete time) possible reference: +% @book{palma2007long, +% title={Long-memory time series: theory and methods}, +% author={Palma, W.}, +% volume={662}, +% year={2007}, +% publisher={Wiley-Blackwell} +% } + + + +% mention non-gaussian processes extension Similarly, the predictive information +% rate may be computed using a Gaussian linear formulation CITE. In this view, +% the PIR is a function of the correlation between random innovations supplied +% to the stochastic process. %Dubnov, MacAdams, Reynolds (2006) %Bailes and Dean (2009) + +% In \cite{Dubnov2006}, Dubnov considers the class of stationary Gaussian +% processes. For such processes, the entropy rate may be obtained analytically +% from the power spectral density of the signal, allowing the multi-information +% rate to be subsequently obtained. One aspect demanding further investigation +% involves the comparison of alternative measures of predictability. In the case of the PIR, a Gaussian linear formulation is applicable, indicating that the PIR is a function of the correlation between random innovations supplied to the stochastic process CITE. + % !!! FIXME + + +\subsection{Beat Tracking} + +A probabilistic method for drum tracking was presented by Robertson +\cite{Robertson11c}. The system infers a beat grid (a sequence +of approximately regular beat times) given audio inputs from a +live drummer, for the purpose of synchronising a music +sequencer with the drummer. +The times of kick and snare drum events are obtained +using dedicated microphones for each drum and a percussive onset detector +\cite{puckette98}. These event times are then sent +to the beat tracker, which maintains a belief state in +the form of distributions over the tempo and phase of the beat grid. +Every time an event is received, these distributions are updated +with respect to a probabilistic model which accounts both for tempo and phase +variations and the emission of drum events at musically plausible times +relative to the beat grid. +%continually updates distributions for tempo and phase on receiving a new +%event time + +The use of a probabilistic belief state means we can compute entropies +representing the system's uncertainty about the beat grid, and quantify +the amount of information in each event about the beat grid as the KL divergence +between prior and posterior distributions. Though this is not strictly the +instantaneous predictive information (IPI) as described in \secrf{surprise-info-seq} +(the information gained is not directly about future event times), we can treat +it as a proxy for the IPI, in the manner of the `model information rate' +described in \secrf{minimusic}, which has a similar status. + +We carried out the analysis on 16 recordings; an example +is shown in \figrf{drumfig}. There we can see variations in the +entropy in the upper graph and the information in each drum event in the lower +stem plot. 
At certain points in time, unusually large amounts of information
arrive; these may be related to fills and other rhythmic irregularities, which
are often followed by an emphatic return to a steady beat at the beginning
of the next bar---this is something we are currently investigating.
We also analysed the pattern of information flow
on a cyclic metre, much as in \figrf{metre}. All the recordings we
analysed are audibly in 4/4 metre, but we found no
evidence of a general tendency for greater amounts of information to arrive
at metrically strong beats, which suggests that the rhythmic accuracy of the
drummers does not vary systematically across each bar. It is possible that metrical information
existing in the pattern of kick and snare events might emerge in an
analysis using a model that attempts to predict the time and type of
the next drum event, rather than just inferring the beat grid as the current model does.
%The analysis of information rates can b
%considered \emph{subjective}, in that it measures how the drum tracker's
%probability distributions change, and these are contingent upon the
%model used as well as external properties in the signal.
%We expect,
%however, that following periods of increased uncertainty, such as fills
%or expressive timing, the information contained in an individual event
%increases. We also examine whether the information is dependent upon
%metrical position.


\section{Information dynamics as compositional aid}
\label{s:composition}

The use of stochastic processes in music composition has been widespread for
decades---for instance Iannis Xenakis applied probabilistic mathematical models
to the creation of musical materials~\cite{Xenakis:1992ul}. While such processes
can drive the \emph{generative} phase of the creative process, information dynamics
can serve as a novel framework for a \emph{selective} phase, by
providing a set of criteria to be used in judging which of the
generated materials
are of value. This alternation of generative and selective phases has been
noted before \cite{Boden1990}.
%
Information-dynamic criteria can also be used as \emph{constraints} on the
generative processes, for example, by specifying a certain temporal profile
of surprisingness and uncertainty the composer wishes to induce in the listener
as the piece unfolds.
%stochastic and algorithmic processes: ; outputs can be filtered to match a set of
%criteria defined in terms of information-dynamical characteristics, such as
%predictability vs unpredictability
%s model, this criteria thus becoming a means of interfacing with the generative processes.

%The tools of information dynamics provide a way to constrain and select musical
%materials at the level of patterns of expectation, implication, uncertainty, and predictability.
In particular, the behaviour of the predictive information rate (PIR) defined in
\secrf{process-info} makes it interesting from a compositional point of view. The definition
of the PIR is such that it is low both for extremely regular processes, such as constant
or periodic sequences, \emph{and} for extremely random processes, where each symbol
is chosen independently of the others, in a kind of `white noise'. In the former case,
the pattern, once established, is completely predictable and therefore there is no
\emph{new} information in subsequent observations.
In the latter case, the randomness
and independence of all elements of the sequence mean that, though potentially surprising,
each observation carries no information about the ones to come.

Processes with high PIR maintain a certain kind of balance between
predictability and unpredictability in such a way that the observer must continually
pay attention to each new observation as it occurs in order to make the best
possible predictions about the evolution of the sequence. This balance between predictability
and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}),
which summarises the observations of Wundt \cite{Wundt1897} that stimuli are most
pleasing at intermediate levels of novelty or disorder,
where there is a balance between `order' and `chaos'.

Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009}
a similar shape when plotting entropy rate against PIR---this is visible in the
upper envelope of the plot in \figrf{mtriscat}, which is a 3-D scatter plot of
three of the information measures discussed in \secrf{process-info} for several thousand
first-order Markov chain transition matrices generated by a random sampling method.
The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and
predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond
to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences
with no temporal structure. Processes with high PIR are to be found at intermediate
levels of entropy and redundancy.

%It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials.

%For instance a stochastic music generating process could be controlled by modifying
%constraints on its output in terms of predictive information rate or entropy
%rate.

\begin{fig}{wundt}
\raisebox{-4em}{\colfig[0.43]{wundt}}
% {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
{\ {\large$\longrightarrow$}\ }
\raisebox{-4em}{\colfig[0.43]{wundt2}}
\caption{
The Wundt curve relating randomness/complexity with
perceived value. Repeated exposure sometimes results
in a move to the left along the curve \cite{Berlyne71}.
}
\end{fig}


\subsection{The Melody Triangle}

These observations led us to construct the `Melody Triangle', a graphical interface for
%for %exploring the melodic patterns generated by each of the Markov chains represented
%as points in \figrf{mtriscat}.
%
%The Melody Triangle is an interface for
the discovery of melodic
materials, where the input---positions within a triangle---directly maps to information-theoretic
properties of the output. % as exemplified in \figrf{mtriscat}.
%The measures---entropy rate, redundancy and
%predictive information rate---form a criteria with which to filter the output
%of the stochastic processes used to generate sequences of notes.
%These measures
%address notions of expectation and surprise in music, and as such the Melody
%Triangle is a means of interfacing with a generative process in terms of the
%predictability of its output.
%
The triangle is populated with first order Markov chain transition
matrices as illustrated in \figrf{mtriscat}.
The distribution of transition matrices in this space forms a relatively thin
curved sheet.
Thus, it is a reasonable simplification to project out the
third dimension (the PIR) and present an interface that is just two dimensional.
The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with
the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand
side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}.
This is our `Melody Triangle' and
forms the interface by which the system is controlled.
%Using this interface thus involves a mapping to information space;
The user selects a point within the triangle; this is mapped into the
information space, and the nearest transition matrix is used to generate
a sequence of values which are then sonified either as pitched notes or percussive
sounds. By choosing the position within the triangle, the user can control the
output at the level of its `collative' properties, with access to the variety
of patterns as described above and in \secrf{markov}.
%and information-theoretic criteria related to predictability
%and information flow
Though the interface is 2D, the third dimension (PIR) is implicitly present, as
transition matrices retrieved from
near the centre line of the triangle will tend to have higher PIR.
We hypothesise that, under
the appropriate conditions, these will be perceived as more `interesting' or
`melodic.'

%The corners correspond to three different extremes of predictability and
%unpredictability, which could be loosely characterised as `periodicity', `noise'
%and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$
%and $b_\mu$) have no discernible pattern;
%those along the `periodicity'
%to `repetition' edge are all cyclic patterns that get shorter as we approach
%the `repetition' corner, until each is just one repeating note. Those along the
%opposite edge consist of independent random notes from non-uniform distributions.
%Areas between the left and right edges will tend to have higher PIR,
%and we hypothesise that, under
%the appropriate conditions, these will be perceived as more `interesting' or
%`melodic.'
%These melodies have some level of unpredictability, but are not completely random.
% Or, conversely, are predictable, but not entirely so.

\begin{fig}{mtriscat}
\colfig[0.9]{mtriscat}
\caption{The population of transition matrices in the 3D space of
entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and PIR ($b_\mu$),
all in bits.
The concentrations of points along the redundancy axis correspond
to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
3, 4, \etc all the way to period 7 (redundancy 2.8 bits). The colour of each point
represents its PIR---note that the highest values are found at intermediate entropy
and redundancy, and that the distribution as a whole makes a curved triangle. Although
not visible in this plot, it is largely hollow in the middle.}
\end{fig}


%PERHAPS WE SHOULD FOREGO TALKING ABOUT THE
%INSTALLATION VERSION OF THE TRIANGLE?
%feels a bit like a tangent, and could do with the space..
The Melody Triangle exists in two incarnations: a screen-based interface
where a user moves tokens in and around a triangle on screen, and a multi-user
interactive installation where a Kinect camera tracks individuals in a space and
maps their positions in physical space to the triangle.
In the latter, each visitor
who enters the installation generates a melody and can collaborate with their
co-visitors to generate musical textures. This makes the interaction physically engaging
and (as our experience with visitors both young and old has demonstrated) more playful.
%Additionally visitors can change the
%tempo, register, instrumentation and periodicity of their melody with body gestures.
%
The screen-based interface can serve as a compositional tool.
%%A triangle is drawn on the screen, screen space thus mapped to the statistical
%space of the Melody Triangle.
A number of tokens, each representing a
sonification stream or `voice', can be dragged in and around the triangle.
For each token, a sequence of symbols is sampled using the corresponding
transition matrix; the symbols
%statistical properties that correspond to the token's position is generated. These
%symbols
are then mapped to notes of a scale or percussive sounds%
\footnote{The sampled sequence could easily be mapped to other musical processes, possibly over
different time scales, such as chords, dynamics and timbres. It would also be possible
to map the symbols to visual or other outputs.}%
. Keyboard commands give control over other musical parameters such
as pitch register and inter-onset interval.
%The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}.
%
The system is capable of generating quite intricate musical textures when multiple tokens
are in the triangle, but unlike other computer-aided composition tools or programming
environments, the composer exercises control at the abstract level of information-dynamic
properties.
%the interface relating to subjective expectation and predictability.

\begin{fig}{TheTriangle}
\colfig[0.7]{TheTriangle.pdf}
\caption{The Melody Triangle}
\end{fig}

\comment{
\subsection{Information Dynamics as Evaluative Feedback Mechanism}
%NOT SURE THIS SHOULD BE HERE AT ALL..?
Information measures on a stream of symbols can form a feedback mechanism; a
rudimentary `critic' of sorts. For instance symbol by symbol measure of predictive
information rate, entropy rate and redundancy could tell us if a stream of symbols
is currently `boring', either because it is too repetitive, or because it is too
chaotic. Such feedback would be oblivious to long term and large scale
structures and any cultural norms (such as style conventions), but
nonetheless could provide a composer with valuable insight on
the short term properties of a work. This could not only be used for the
evaluation of pre-composed streams of symbols, but could also provide real-time
feedback in an improvisatory setup.
}

\subsection{User trials with the Melody Triangle}
We are currently in the process of using the screen-based
Melody Triangle user interface to investigate the relationship between the information-dynamic
characteristics of sonified Markov chains and subjective musical preference.
We carried out a pilot study with six participants, who were asked
to use a simplified form of the user interface (a single controllable token,
and no rhythmic, registral or timbral controls) under two conditions:
one where a single sequence was sonified under user control, and another
where an additional sequence was sonified in a different register, as if generated
by a fixed invisible token in one of four regions of the triangle. In addition, subjects
were asked to press a key if they `liked' what they were hearing.
We recorded subjects' behaviour as well as the points which they marked
with a key press.
Some results for three of the subjects are shown in \figrf{mtri-results}. Though
we have not been able to detect any systematic across-subjects preference for any particular
region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour.
Our initial hypothesis, that subjects would linger longer in regions of the triangle
that produced aesthetically preferable sequences, and that this would tend to be towards the
centre line of the triangle for all subjects, was not confirmed. However, it is possible
that the design of the experiment encouraged an initial exploration of the space (sometimes
very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and
how the system works, rather than finding musical patterns. It is also possible that the
system encourages users to create musically interesting output by \emph{moving the token},
rather than finding a particular spot in the triangle which produces a musically interesting
sequence by itself.

\begin{fig}{mtri-results}
\def\scat#1{\colfig[0.42]{mtri/#1}}
\def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
\begin{tabular}{cc}
% \subj{a} \\
\subj{b} \\
\subj{c} \\
\subj{d}
\end{tabular}
\caption{Dwell times and mark positions from user trials with the
on-screen Melody Triangle interface, for three subjects. The left-hand column shows
the positions in a 2D information space (entropy rate vs multi-information rate
in bits) where each spent their time; the area of each circle is proportional
to the time spent there. The right-hand column shows points which subjects
`liked'; the area of the circles here is proportional to the duration spent at
that point before the point was marked.}
\end{fig}

Comments collected from the subjects
%during and after the experiment
suggest that
the information-dynamic characteristics of the patterns were readily apparent
to most: several noticed the main organisation of the triangle,
with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
notes towards the opposite corner. Some described their systematic exploration of the space.
Two felt that the right side was `more controllable' than the left (a consequence
of their ability to return to a particular distinctive pattern and recognise it
as one heard previously). Two reported that they became bored towards the end,
but another felt there wasn't enough time to `hear out' the patterns properly.
One subject did not `enjoy' the patterns in the lower region, but another said the lower
central regions were more `melodic' and `interesting'.

We plan to continue the trials with a slightly less restricted user interface in order
to make the experience more enjoyable and thereby give subjects longer to use the interface;
this may allow them to get beyond the initial exploratory phase and give a clearer
picture of their aesthetic preferences. In addition, we plan to conduct a
study under more restrictive conditions, where subjects will have no control over the patterns
other than to signal (a) which of two alternatives they prefer in a forced
choice paradigm, and (b) when they are bored of listening to a given sequence.

%\emph{comparable system} Gordon Pask's Musicolor (1953) applied a similar notion
%of boredom in its design. The Musicolour would react to audio input through a
%microphone by flashing coloured lights.
Rather than a direct mapping of sound +%to light, Pask designed the device to be a partner to a performing musician. It +%would adapt its lighting pattern based on the rhythms and frequencies it would +%hear, quickly `learning' to flash in time with the music. However Pask endowed +%the device with the ability to `be bored'; if the rhythmic and frequency content +%of the input remained the same for too long it would listen for other rhythms +%and frequencies, only lighting when it heard these. As the Musicolour would +%`get bored', the musician would have to change and vary their playing, eliciting +%new and unexpected outputs in trying to keep the Musicolour interested. + + +\section{Conclusions} + + % !!! FIXME +%We reviewed our information dynamics approach to the modelling of the perception +We have looked at several emerging areas of application of the methods and +ideas of information dynamics to various problems in music analysis, perception +and cognition, including musicological analysis of symbolic music, audio analysis, +rhythm processing and compositional and creative tasks. The approach has proved +successful in musicological analysis, and though our initial data on +rhythm processing and aesthetic preference are inconclusive, there is still +plenty of work to be done in this area: where-ever there are probabilistic models, +information dynamics can shed light on their behaviour. + + + +\section*{acknowledgments} +This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE), +GR/S82213/01 and EP/E045235/1(SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship, EP/G007144/1 +(MDP) and EPSRC IDyOM2 EP/H013059/1. +This work is partly funded by the CoSound project, funded by the Danish Agency for Science, Technology and Innovation. +Thanks also Marcus Pearce for providing the two rule-based analyses of \emph{Two Pages}. + + +\bibliographystyle{IEEEtran} +{\bibliography{all,c4dm,nime,andrew}} +\end{document}