changeset 74:90901fd611d1
Final version of talk.
author | samer |
date | Fri, 01 Jun 2012 16:17:32 +0100 |
parents | 56508a08924a |
children | 8a146c651475 |
files | talk/abstract talk/figs talk/talk.pdf talk/talk.tex |
diffstat | 4 files changed, 1347 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/talk/abstract Fri Jun 01 16:17:32 2012 +0100 @@ -0,0 +1,27 @@ +** Information dynamics and temporal structure in music ** + + +It has often been observed that one of the more salient effects +of listening to music is to create expectations within the listener, +and that part of the art of making music is to create a dynamic interplay +of uncertainty, expectation, fulfilment and surprise. It was not until +the publication of Shannon's work on information theory, however, that +the tools became available to quantify some of these concepts. + +In this talk, we will examine how a small number of +\emph{time-varying} information measures, such as entropies and mutual +informations, computed in the context +of a dynamically evolving probabilistic model, can be used to characterise +the temporal structure of a stimulus sequence, considered as a random process +from the point of view of a Bayesian observer. + +One such measure is a novel predictive information rate, which we conjecture +may provide an explanation for the `inverted-U' relationship often found between +simple measures of randomness (\eg entropy rate) and +judgements of aesthetic value [Berlyne 1971]. We explore these ideas in the context +of Markov chains using both artificially generated sequences and +two pieces of minimalist music by Philip Glass, showing that even an overly simple +model (the Markov chain), when interpreted according to information dynamic +principles, produces a structural analysis which largely agrees with that of a +human expert listener and improves on those generated by rule-based methods. +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/talk/figs Fri Jun 01 16:17:32 2012 +0100 @@ -0,0 +1,1 @@ +../../figs \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/talk/talk.tex Fri Jun 01 16:17:32 2012 +0100 @@ -0,0 +1,1319 @@ +\documentclass{beamer} + +\usepackage[T1]{fontenc} +\usepackage{microtype} +\usepackage{multimedia} +\usepackage{tikz} +\usetikzlibrary{matrix} +\usetikzlibrary{patterns} +\usetikzlibrary{arrows} +\usetikzlibrary{calc} +\usepackage{tools} +%\usepackage{amsfonts,amssymb} + +\tikzset{every picture/.style=semithick} + +%%% font options: +% atypewri, frankgth, gillsans, centuryg, futura, eurostil +%\usepackage{fourier} % Maths in serif Utopia +\usepackage[sf]{frankgth} +%\usepackage[sf]{optima} + +%%% Monospace font +%\usepackage[scaled=0.88]{ulgothic} % 0.88 % suits narrow faces +\renewcommand{\ttdefault}{plg} % Adobe Letter Gothic - suits light medium width face +%\renewcommand{\ttdefault}{pcr} % Courier - suits wide faces +% remember to match up size and weight of monospace font to main font + +\newcommand{\mytt}[1]{{\texttt{\footnotesize\fontseries{bx}\selectfont #1}}} + +\DeclareMathAlphabet{\mathcal}{OMS}{cmsy}{m}{n} + + +%%% Black on white +\definecolor{base}{rgb}{0,0,0} +\definecolor{comp}{named}{green} +\definecolor{paper}{named}{white} + +\logo{% + \includegraphics[height=16pt]{qmul-black}\hspace*{45pt}% + \raisebox{1pt}{\includegraphics[height=12pt]{c4dm-black-white}}% +} + +%%% Red on black +\comment{ +\definecolor{base}{rgb}{1,0,0} +\definecolor{comp}{rgb}{0,0.8,0.2} +\definecolor{paper}{named}{black} + +\logo{% + \includegraphics[height=16pt]{qmul-red}\hspace*{45pt}% + \raisebox{1pt}{\includegraphics[height=12pt]{c4dm-red-black}}% +} +} + + +\useinnertheme{default}%circles +\useoutertheme{default} +\usefonttheme[onlymath]{serif} + +\setbeamercolor{normal text}{bg=paper,fg=base!90!-paper} +\setbeamercolor{background}{bg=comp!50!paper,fg=comp} +%\setbeamercolor{structure}{fg=base!75!-paper} +\setbeamercolor{structure}{fg=red!50!base} +\setbeamercolor{palette primary}{bg=yellow!50!paper,fg=yellow} +\setbeamercolor{palette secondary}{bg=orange!50!paper,fg=orange} +\setbeamercolor{palette tertiary}{bg=blue!50!paper,fg=blue} +\setbeamercolor{palette quaternary}{bg=green!50!paper,fg=green} +\setbeamercolor{block body}{bg=base!20!paper} +\setbeamercolor{block title}{bg=base!60!paper,fg=paper} +\setbeamercolor{navigation symbols}{fg=base!90!paper} +\setbeamercolor{separation line}{bg=blue,fg=yellow} +\setbeamercolor{fine separation line}{bg=blue,fg=orange} + +% Title page +% \setbeamercolor{title}{bg=base!20!paper} +% \setbeamercolor{subtitle}{bg=base!20!paper} +% \setbeamercolor{title page}{bg=base!40!paper} + +% \setbeamercolor{headline}{bg=blue} +% \setbeamercolor{footline}{bg=blue} +% \setbeamercolor{frametitle}{bg=base!30!paper} +% \setbeamercolor{framesubtitle}{bg=base!40!paper} + +% \setbeamercolor{section in toc}{bg=base!25!paper,fg=orange} +% \setbeamercolor{section in toc shaded}{bg=base!25!paper,fg=orange!80!paper} +% \setbeamercolor{subsection in toc}{bg=base!25!paper,fg=orange} +% \setbeamercolor{subsection in toc shaded}{bg=yellow!25!paper,fg=orange!80!paper} +% page number in head/foot +% section in head/foot +% section in head/foot shaded + + +\setbeamerfont{structure}{series=\bfseries} +\setbeamerfont{title}{series=\mdseries,size=\Large} +%\setbeamerfont{title}{series=\ltseries,size=\huge} +\setbeamerfont{date}{size=\footnotesize}%,series=\mdcseries} +\setbeamerfont{institute}{size=\footnotesize}%,series=\mdcseries} +\setbeamerfont{author}{size=\footnotesize,series=\bfseries} +\setbeamercolor{bibliography item}{parent={normal text}} 
+\setbeamercolor{bibliography entry author}{fg=base} +\setbeamercolor{bibliography entry location}{fg=base!70!paper} + +%%% Templates + +\setbeamertemplate{bibliography item}[text] +\setbeamertemplate{bibliography entry title}{ } +\setbeamertemplate{bibliography entry location}{ } +\setbeamertemplate{blocks}[rounded][shadow=false] +\setbeamertemplate{items}[circle] +%\setbeamertemplate{bibliography item}[triangle] +% \setbeamertemplate{title page}[default][rounded=true,shadow=false] +% \setbeamertemplate{frametitle}[default][rounded=true,shadow=false] +\setbeamertemplate{sidebar right}{} +\setbeamertemplate{footline}{ + \hspace*{0.2cm} + \insertlogo + \hfill + \usebeamertemplate***{navigation symbols}% + \hfill + \makebox[6ex]{\hfill\insertframenumber/\inserttotalframenumber}% + \hspace*{0.2cm} + + \vskip 4pt +} + +\setbeamertemplate{navigation symbols} +{% + \hbox{% + \hbox{\insertslidenavigationsymbol} + \hbox{\insertframenavigationsymbol} +% \hbox{\insertsubsectionnavigationsymbol} + \hbox{\insertsectionnavigationsymbol} + \hbox{\insertdocnavigationsymbol} +% \hbox{\insertbackfindforwardnavigationsymbol}% + }% +} + + +\AtBeginSection[]{ + \begin{iframe}[Outline] + \tableofcontents[currentsection] + \end{iframe} +} +%\linespread{1.1} + +\setlength{\parskip}{0.5em} + +\newenvironment{bframe}[1][untitled]{\begin{frame}[allowframebreaks]\frametitle{#1}}{\end{frame}} +\newenvironment{iframe}[1][untitled]{\begin{frame}\frametitle{#1}}{\end{frame}} +\newenvironment{isframe}[1][untitled]{\begin{frame}[fragile=singleslide,environment=isframe]\frametitle{#1}}{\end{frame}} + +\renewenvironment{fig}[1] + {% + \begin{figure} + \def\fglbl{f:#1} + \let\ocap=\caption + \renewcommand{\caption}[2][]{\ocap[##1]{\small ##2}} + \centering\small + }{% + \label{\fglbl} + \end{figure} + } + +\newcommand{\paragraph}[1]{\textbf{#1}\qquad} +\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}% +\let\citep=\cite +%\newcommand{\dotmath}[2]{\psfrag{#1}[Bc][Bc]{\small $#2$}} + +\title{Cognitive Music Modelling:\\An Information Dynamics Approach} +\author{Samer Abdallah, Henrik Ekeus, Peter Foster,\\Andrew Robertson and Mark Plumbley} +\institute{Centre for Digital Music\\Queen Mary, University of London} + +\date{\today} + +\def\X{\mathcal{X}} +\def\Y{\mathcal{Y}} +\def\Past{\mathrm{Past}} +\def\Future{\mathrm{Future}} +\def\Present{\mathrm{Present}} +\def\param{\theta} +\def\trans{a} +\def\init{\pi^{\trans}} +%\def\entrorate(#1){\mathcal{H}(#1)} +%\def\entrorate(#1){\dot{\mathcal{H}}(#1)} +\def\entrorate{h} +\def\emcmarg(#1){b_#1} +\def\mcmarg{\vec{b}} +\def\domS{\mathcal{S}} +\def\domA{\mathcal{A}} + +\def\Lxz(#1,#2){\mathcal{L}(#1|#2)} +\def\LXz(#1){\overline{\mathcal{L}}(#1)} +\def\LxZ(#1){\underline{\mathcal{L}}(#1)} +\def\LXZ{\overline{\underline{\mathcal{L}}}} +\def\Ixz(#1,#2){\mathcal{I}(#1|#2)} +\def\IXz(#1){\overline{\mathcal{I}}(#1)} +\def\IxZ(#1){\underline{\mathcal{I}}(#1)} +\def\IXZ{\overline{\underline{\mathcal{I}}}} + +\def\ev(#1=#2){#1\!\!=\!#2} +\def\sev(#1=#2){#1\!=#2} + +\def\FE{\mathcal{F}} + +\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}} +\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}} + +\def\cn(#1,#2) {\node[circle,draw,inner sep=0.2em] (#1#2) {${#1}_{#2}$};} +\def\dn(#1) {\node[circle,inner sep=0.2em] (#1) {$\cdots$};} +\def\rl(#1,#2) {\draw (#1) -- (#2);} + +\definecolor{un0}{rgb}{0.5,0.0,0.0} +\definecolor{un1}{rgb}{0.6,0.15,0.15} +\definecolor{un2}{rgb}{0.7,0.3,0.3} +\definecolor{un3}{rgb}{0.8,0.45,0.45} 
+\definecolor{un4}{rgb}{0.9,0.6,0.6} +\definecolor{un5}{rgb}{1.0,0.75,0.75} + +%\def\blob(#1){\node[circle,draw,fill=#1,inner sep=0.25em]{};} +\def\bl(#1){\draw[circle,fill=#1] (0,0) circle (0.4em);} +\def\noderow(#1,#2,#3,#4,#5,#6){% + \tikz{\matrix[draw,rounded corners,inner sep=0.4em,column sep=2.1em,ampersand replacement=\&]{% + \bl(#1)\&\bl(#2)\&\bl(#3)\&\bl(#4)\&\bl(#5)\&\bl(#6)\\};}} + +\begin{document} + \frame{\titlepage} + \section[Outline]{} + \frame{ + \frametitle{Outline} + \tableofcontents + } + + + +\section{Expectation and surprise in music} +\label{s:Intro} + +\begin{iframe}[`Unfoldingness'] + Music is experienced as a + \uncover<2->{phenomenon} + \uncover<3->{that} + \uncover<4->{`unfolds'} \uncover<5->{in}\\ + \only<6>{blancmange}% + \only<7>{(just kidding)}% + \uncover<8->{time,} + \uncover<9->{rather than being apprehended as a static object presented in its + entirety.} + + \uncover<10->{[This is recognised in computational linguistics where the phenomenon is known as \emph{incrementality}, \eg in incremental parsing.]} + + \uncover<11->{% + Meyer \cite{Meyer67} argued that musical experience depends on + how we change and revise our conceptions \emph{as events happen}, + on how expectation and prediction interact with occurrence, and that, to a large + degree, the way to understand the effect of music is to focus on + this `kinetics' of expectation and surprise.% + } +\end{iframe} + +\begin{iframe}[Expectation and surprise in music] + + Music creates + \emph{expectations} of what is to come next, which may be fulfilled + immediately, after some delay, or not at all. + Suggested by music theorists, \eg + L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77} but also + noted much earlier by Hanslick \cite{Hanslick1854} in the + 1850s: + \begin{quote} + \small + `The most important factor in the mental process which accompanies the + act of listening to music, and which converts it to a source of pleasure, is + \ldots +% frequently overlooked. We here refer to + the intellectual satisfaction which the + listener derives from continually following and anticipating the composer's + intentions---now, to see his expectations fulfilled, and now, to find himself + agreeably mistaken. It is a matter of course that this intellectual flux and + reflux, this perpetual giving and receiving takes place unconsciously, and with + the rapidity of lightning-flashes.' + \end{quote} +\end{iframe} + +\begin{iframe}[Probabilistic reasoning] + \uncover<1->{% + Making predictions and assessing surprise is + essentially reasoning with degrees of belief, and (arguably) + the best way to do this is using Bayesian probability theory \cite{Cox1946,Jaynes27}.% + + [NB. this is \textbf{subjective} probability as advocated by \eg De Finetti and Jaynes.] + } + +% Thus, we assume that musical schemata are encoded as probabilistic % \citep{Meyer56} models, and + \uncover<2->{% + We suppose that familiarity with different styles of music takes the form + of various probabilistic models, and that these models are adapted through listening.% + } +% various stylistic norms is encoded as +% using models that encode the statistics of music in general, the particular styles +% of music that seem best to fit the piece we happen to be listening to, and the emerging +% structures peculiar to the current piece.
+ + \uncover<3->{% + Experimental evidence that humans are able to internalise statistical + knowledge about music: \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}; and also + that statistical models are effective for computational analysis of music, \eg \cite{ConklinWitten95,Pearce2005}.% + } + + % analysis of music, \eg \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. +% \cite{Ferrand2002}. Dubnov and Assayag PSTs? +\end{iframe} + +\begin{iframe}[Music and information theory] + \uncover<1->{ + With probabilistic models in hand we can apply quantitative information theory: we can compute entropies, + relative entropies, mutual information, and all that. + } + + \uncover<2->{ + Lots of interest in application of information theory to perception, music and aesthetics since the 50s, + \eg Moles \cite{Moles66}, Meyer \cite{Meyer67}, Cohen \cite{Cohen1962}, Berlyne \cite{Berlyne71}. + (See also Bense, Hiller) + } + + \uncover<3->{ + Idea is that subjective qualities and + states like uncertainty, surprise, complexity, tension, and interestingness + are determined by information-theoretic quantities. + } + + \uncover<4->{ + Berlyne \cite{Berlyne71} called such quantities `collative variables', since they are + to do with patterns of occurrence rather than medium-specific details. + \emph{Information aesthetics}. + } +% Listeners then experience greater or lesser levels of surprise +% in response to departures from these norms. +% By careful manipulation +% of the material, the composer can thus define, and induce within the +% listener, a temporal programme of varying +% levels of uncertainty, ambiguity and surprise. +\end{iframe} + +\begin{iframe}[Probabilistic model-based observer hypothesis] + \begin{itemize} + \item<1-> + As we listen, we maintain a probabilistic model that enables + us to make predictions. As events unfold, we revise our probabilistic `belief state', + including predictions about the future. + \item<2-> + Probability distributions and changes in distributions are characterised in terms + of information-theoretic measures such as entropy and relative entropy (KL divergence). + \item<3-> + The dynamic evolution of these information measures captures significant structure, + \eg events that are surprising, informative, explanatory \etc + \end{itemize} + +\end{iframe} + +\begin{iframe}[Features of information dynamics] + \uncover<1->{ + \textbf{Abstraction}: sensitive mainly to \emph{patterns} of occurrence, + rather than details of which specific things occur or the sensory medium. +% it operates at a level of abstraction removed from the details of the sensory experience and +% the medium through which it was received, suggesting that the same +% approach could, in principle, be used to analyse and compare information +% flow in different temporal media regardless of whether they are auditory, visual or otherwise. + } + + \uncover<2->{ + \textbf{Generality}: applicable in principle to any probabilistic model, in particular, + models with time-dependent latent variables such as HMMs. + Many important musical concepts like key, harmony, and beat are essentially `hidden variables'.
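To make the model-based observer hypothesis above concrete, here is a minimal illustrative sketch (my own, not the implementation behind the talk): a zeroth-order Dirichlet-categorical observer that, for each incoming symbol, reports its surprisal -log p(x_t|past), the entropy of the predictive distribution before the observation, and the KL divergence between the new and old predictive distributions as a simple measure of belief change. The alphabet size and the toy sequence are arbitrary.

```python
import numpy as np

def observe(counts, x):
    """One step of a zeroth-order Dirichlet-categorical observer (sketch).

    counts : Dirichlet pseudo-counts over the K symbols (the belief state).
    x      : index of the observed symbol.
    Returns updated counts, surprisal of x, predictive entropy before x,
    and the KL divergence between the new and old predictive distributions.
    """
    prior_pred = counts / counts.sum()                     # predictive dist before seeing x
    surprisal = -np.log2(prior_pred[x])                    # -log p(x_t | past), in bits
    entropy = -np.sum(prior_pred * np.log2(prior_pred))    # expected surprisingness
    new_counts = counts.copy()
    new_counts[x] += 1                                     # conjugate Bayesian update
    post_pred = new_counts / new_counts.sum()
    kl = np.sum(post_pred * np.log2(post_pred / prior_pred))  # belief change (relative entropy)
    return new_counts, surprisal, entropy, kl

counts = np.ones(4)                                        # flat Dirichlet(1,...,1) prior, K = 4
for t, x in enumerate([0, 1, 0, 1, 0, 0, 3]):              # toy symbol sequence
    counts, s, h, kl = observe(counts, x)
    print(f"t={t}  x={x}  surprisal={s:.2f}  entropy={h:.2f}  belief change={kl:.3f} bits")
```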
+ } + + \uncover<3->{ + \textbf{Richness}: when applied to models with latent variables, can result in many-layered + analysis, capturing information flow about harmony, tempo, \etc + } + + \uncover<4->{ + \textbf{Subjectivity}: all probabilities are \emph{subjective} probabilities relative to \emph{observer's} + model, which can depend on observer's capabilities and prior experience. + } +\end{iframe} + +\section{Surprise, entropy and information in random sequences} +\label{s:InfoInRandomProcs} + +\begin{iframe}[Information theory primer\nicedot Entropy] + Let $X$ be a discrete-valued random (in the sense of \emph{subjective} probability) variable. + Entropy is a measure of \emph{uncertainty}. If observer expects to see $x$ with probability $p(x)$, + then + \begin{align*} + H(X) &= \sum_{x\in\X} - p(x) \log p(x) \\ + &= \expect{[-\log p(X)]}. + \end{align*} + Consider $-\log p(x)$ as the `surprisingness' of $x$, then the entropy is the `expected surprisingness'. + High for spread-out distributions and low for concentrated ones. +\end{iframe} + +\begin{iframe}[Information theory primer\nicedot Relative entropy] + Relative entropy or Kullback-Leibler (KL) divergence quantifies difference between + probability distributions. + If observer receives data $\mathcal{D}$, divergence between (subjective) prior and + posterior distributions is the + amount of information in $\mathcal{D}$ \emph{about} $X$ for this observer: + \[ + I(\mathcal{D}\to X) = + D(p_{X|\mathcal{D}} || p_X) + = \sum_{x\in\X} p(x|\mathcal{D}) \log \frac{p(x|\mathcal{D})}{p(x)}. + \] + If observing $\mathcal{D}$ causes a large change in belief about $X$, then $\mathcal{D}$ + contained a lot of information about $X$. + + Like Lindley's (1956) information (thanks Lars!). +\end{iframe} + +\begin{iframe}[Information theory primer\nicedot Mutual information] + Mutual information (MI) between $X_1$ and $X_2$ is the expected amount of information about + $X_2$ in an observation of $X_1$. Can be written in several ways: + \begin{align*} + I(X_1;X_2) &= \sum_{x_1,x_2} p(x_1,x_2) \log \frac{p(x_1,x_2)}{p(x_1)p(x_2)} \\ + &= H(X_1) + H(X_2) - H(X_1,X_2) \\ + &= H(X_2) - H(X_2|X_1). + \end{align*} + (1) Expected information about $X_2$ in an observation of $X_1$;\\ + (2) Expected reduction in uncertainty about $X_2$ after observing $X_1$;\\ + (3) Symmetric: $I(X_1;X_2) = I(X_2;X_1)$. +\end{iframe} + +\begin{iframe}[Information theory primer\nicedot Conditional MI] + Information in one variable about another given observations of some third variable. + Formulated analogously by adding conditioning variables to entropies: + \begin{align*} + I(X_1;X_2|X_3) &= H(X_1|X_3) - H(X_1|X_2,X_3). + \end{align*} + Makes explicit the dependence of information assessment on background knowledge, + represented by conditioning variables. +\end{iframe} + + +\begin{isframe}[Information theory primer\nicedot I-Diagrams] + \newcommand\rad{2.2em}% + \newcommand\circo{circle (3.4em)}% + \newcommand\labrad{4.3em} + \newcommand\bound{(-6em,-5em) rectangle (6em,6em)} + \newcommand\clipin[1]{\clip (#1) \circo;}% + \newcommand\clipout[1]{\clip \bound (#1) \circo;}% + \newcommand\cliptwo[3]{% + \begin{scope} + \clipin{#1}; + \clipin{#2}; + \clipout{#3}; + \fill[black!30] \bound; + \end{scope} + }% + \newcommand\clipone[3]{% + \begin{scope} + \clipin{#1}; + \clipout{#2}; + \clipout{#3}; + \fill[black!15] \bound; + \end{scope} + }% + Information diagrams are a Venn diagram-like representation of entropies and mutual + informations for a set of random variables.
+ \begin{center} + \begin{tabular}{c@{\ }c} + \scalebox{0.8}{% + \begin{tikzpicture}[baseline=0pt] + \coordinate (p1) at (90:\rad); + \coordinate (p2) at (210:\rad); + \coordinate (p3) at (-30:\rad); + \clipone{p1}{p2}{p3}; + \clipone{p2}{p3}{p1}; + \clipone{p3}{p1}{p2}; + \cliptwo{p1}{p2}{p3}; + \cliptwo{p2}{p3}{p1}; + \cliptwo{p3}{p1}{p2}; + \begin{scope} + \clip (p1) \circo; + \clip (p2) \circo; + \clip (p3) \circo; + \fill[black!45] \bound; + \end{scope} + \draw (p1) \circo; + \draw (p2) \circo; + \draw (p3) \circo; + \path + (barycentric cs:p3=1,p1=-0.2,p2=-0.1) +(0ex,0) node {$I_{3|12}$} + (barycentric cs:p1=1,p2=-0.2,p3=-0.1) +(0ex,0) node {$I_{1|23}$} + (barycentric cs:p2=1,p3=-0.2,p1=-0.1) +(0ex,0) node {$I_{2|13}$} + (barycentric cs:p3=1,p2=1,p1=-0.55) +(0ex,0) node {$I_{23|1}$} + (barycentric cs:p1=1,p3=1,p2=-0.55) +(0ex,0) node {$I_{13|2}$} + (barycentric cs:p2=1,p1=1,p3=-0.55) +(0ex,0) node {$I_{12|3}$} + (barycentric cs:p3=1,p2=1,p1=1) node {$I_{123}$} + ; + \path + (p1) +(140:\labrad) node {$X_1$} + (p2) +(-140:\labrad) node {$X_2$} + (p3) +(-40:\labrad) node {$X_3$}; + \end{tikzpicture}% + } + & + \parbox{0.5\linewidth}{ + \small + \begin{align*} + I_{1|23} &= H(X_1|X_2,X_3) \\ + I_{13|2} &= I(X_1;X_3|X_2) \\ + I_{1|23} + I_{13|2} &= H(X_1|X_2) \\ + I_{12|3} + I_{123} &= I(X_1;X_2) + \end{align*} + } + \end{tabular} + \end{center} +The areas of + the three circles represent $H(X_1)$, $H(X_2)$ and $H(X_3)$ respectively. + The total shaded area is the joint entropy $H(X_1,X_2,X_3)$. + Each undivided region is an \emph{atom} of the I-diagram. +\end{isframe} + + + + +\begin{isframe}[Information theory in sequences] + \def\bx{1.6em}% + \def\cn(#1,#2) {\node[circle,draw,fill=white,inner sep=0.2em] at(#1) {$#2$};}% + \def\dn(#1){\node[circle,inner sep=0.2em] at(#1) {$\cdots$};}% + \def\en(#1){coordinate(#1)}% + \def\tb{++(3.8em,0)}% + \def\lb(#1)#2{\path (#1)+(0,\bx) node[anchor=south] {#2};} + \def\nr(#1,#2,#3){\draw[rounded corners,fill=#3] (#1) rectangle (#2);}% + + Consider an observer receiving elements of a random sequence + $(\ldots, X_{-1}, X_0, X_1, X_2, \ldots)$, so that at any time $t$ there is + a `present' $X_t$, an observed past $\past{X}_t$, and an unobserved future + $\fut{X}_t$. For example, at time $t=3$: + + \begin{figure} + \begin{tikzpicture}%[baseline=-1em] + \path (0,0) \en(X0) \tb \en(X1) \tb \en(X2) \tb \en(X3) \tb \en(X4) \tb \en(X5) \tb \en(X6); + \path (X0)+(-\bx,-\bx) \en(p1) (X2)+(\bx,\bx) \en(p2) + (X3)+(-\bx,-\bx) \en(p3) (X3)+(\bx,\bx) \en(p4) + (X4)+(-\bx,-\bx) \en(p5) (X6)+(\bx,\bx) \en(p6); + \nr(p1,p2,un3) \nr(p3,p4,un4) \nr(p5,p6,un5) + \dn(X0) \cn(X1,X_1) \cn(X2,X_2) \cn(X3,X_3) \cn(X4,X_4) \cn(X5,X_5) \dn(X6) + \lb(X1){Past: $\past{X}_3$} + \lb(X5){Future: $\fut{X}_3$} + \lb(X3){Present} + \end{tikzpicture}%}% + \end{figure} + Consider how the observer's belief state evolves when, having observed up to + $X_2$, it learns the value of $X_3$. +\end{isframe} + +\begin{iframe}[`Surprise' based quantities] + To obtain first set of measures, we ignore the future $\fut{X}_t$ + and consider the probability distribution for $X_t$ given the + observed past $\past{X}_t=\past{x}_t$. + + \begin{enumerate} + \item<1-> + \textbf{Surprisingness}: negative log-probability + $\ell_t = -\log p(x_t|\past{x}_t)$. + + \item<2-> + Expected surprisingness given context $\past{X}_t=\past{x}_t$ is the entropy of the predictive distribution, + $H(X_t|\ev(\past{X}_t=\past{x}_t))$: uncertainty about $X_t$ before the observation is made.
+ + \item<3-> + Expectation over all possible realisations of process is the conditional entropy + $H(X_t|\past{X}_t)$ according to the observer's model. For stationary process, is + \emph{entropy rate} $h_\mu$. + \end{enumerate} +\end{iframe} + +\begin{iframe}[Predictive information] + Second set of measures based on amount of information the observation $\ev(X_t=x_t)$ + carries \emph{about} the unobserved future $\fut{X}_t$, \emph{given} that we already + know the past $\ev(\past{X}_t=\past{x}_t)$: + + \begin{equation*} + \mathcal{I}_t = I(\ev(X_t=x_t)\to\fut{X}_t|\ev(\past{X}_t=\past{x}_t)). + \end{equation*} + Is KL divergence between beliefs about future $\fut{X}_t$ prior and posterior + to observation $\ev(X_t=x_t)$. + Hence, for continuous valued variables, invariant to invertible + transformations of the observation spaces. +\end{iframe} + +\begin{iframe}[Predictive information based quantities] + \begin{enumerate} + \item<1-> + \emph{Instantaneous predictive information} (IPI) is just $\mathcal{I}_t$. + +% Expectations over $X|\ev(Z=z)$, $Z|\ev(X=x)$, and $(X,Z)$ give 3 more information measures: + \item<2-> + Expectation of $\mathcal{I}_t$ before observation at time $t$ is + $I(X_t;\fut{X}_t | \ev(\past{X}_t=\past{x}_t))$: mutual information conditioned on + observed past. Is the amount of new information about the future expected from the next observation. + Useful for directing attention towards the next event even before it happens? + +% This is different from Itti and Baldi's proposal that Bayesian +% \emph{surprise} attracts attention \cite{IttiBaldi2005}, as it is a mechanism which can +% operate \emph{before} the surprise occurs. + + + \item<3-> + Expectation over all possible realisations is the conditional mutual information + $I(X_t;\fut{X}_t|\past{X}_t)$. For stationary process, this is the global + \emph{predictive information rate} (PIR), the average rate at which new information arrives about + the future. In terms of conditional entropies, has two forms: + $H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t)$ or + $H(X_t|\past{X}_t) - H(X_t|\fut{X}_t,\past{X}_t)$. + \end{enumerate} + +\end{iframe} + +\begin{iframe}[Global measures for stationary processes] + For a stationary random process model, the average levels of surprise and information + are captured by the time-shift invariant process information measures: + \begin{align*} + \text{entropy rate} &: & h_\mu &= H(X_t | \past{X}_t) \\ + \text{multi-information rate} &: & \rho_\mu &= I(\past{X}_t;X_t) = H(X_t) - h_\mu \\ + \text{residual entropy rate} &: & r_\mu &= H(X_t | \past{X}_t, \fut{X}_t) \\ + \text{predictive information rate} &: & b_\mu &= I(X_t;\fut{X}_t|\past{X}_t) = h_\mu - r_\mu + \end{align*} + Residual entropy also known as \emph{erasure entropy} \cite{VerduWeissman2006}.
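As a numerical sanity check of these identities, the sketch below (mine, not taken from the talk materials) evaluates the four process measures for a small first-order Markov chain, using the closed forms that apply in that case: h_mu = h(a), rho_mu = H(X_0) - h_mu, b_mu = h(a^2) - h(a) (the formula given on the Markov-chain slides later in the talk), and r_mu = h_mu - b_mu.

```python
import numpy as np

def stationary(A):
    """Stationary distribution of a column-stochastic transition matrix
    (A[i, j] = P(X_{t+1}=i | X_t=j), the convention used on the slides)."""
    w, v = np.linalg.eig(A)
    p = np.real(v[:, np.argmax(np.real(w))])
    return p / p.sum()

def entropy_rate(A):
    """h(a) = sum_j pi_j sum_i -a_ij log a_ij, in nats."""
    pi = stationary(A)
    with np.errstate(divide='ignore', invalid='ignore'):
        col_ent = -np.nansum(A * np.log(A), axis=0)   # entropy of each column
    return float(pi @ col_ent)

A = np.array([[0.8, 0.1, 0.3],                        # a small 3-state chain,
              [0.1, 0.8, 0.3],                        # columns sum to 1
              [0.1, 0.1, 0.4]])

pi  = stationary(A)
H0  = -np.sum(pi * np.log(pi))     # marginal entropy H(X_0)
h   = entropy_rate(A)              # entropy rate h_mu
rho = H0 - h                       # multi-information rate rho_mu
b   = entropy_rate(A @ A) - h      # PIR, b_mu = h(a^2) - h(a)
r   = h - b                        # residual (erasure) entropy rate r_mu
print(f"h={h:.3f}  rho={rho:.3f}  b={b:.3f}  r={r:.3f}  (nats)")
```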
+\end{iframe} + +\begin{isframe}[Process I-diagrams] +% \newcommand\subfig[2]{\shortstack{#2\\[0.75em]#1}} + \newcommand\subfig[2]{#2} + \newcommand\rad{1.75em}% + \newcommand\ovoid[1]{% + ++(-#1,\rad) + -- ++(2 * #1,0em) arc (90:-90:\rad) + -- ++(-2 * #1,0em) arc (270:90:\rad) + }% + \newcommand\axis{2.75em}% + \newcommand\olap{0.85em}% + \newcommand\offs{3.6em} + \newcommand\longblob{\ovoid{\axis}} + \newcommand\shortblob{\ovoid{1.75em}} + \begin{figure} + \begin{tikzpicture}%[baseline=-1em] + \newcommand\rc{\rad} + \newcommand\throw{2.5em} + \coordinate (p1) at (180:1.5em); + \coordinate (p2) at (0:0.3em); + \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} + \newcommand\present{(p2) circle (\rc)} + \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} + \newcommand\fillclipped[2]{% + \begin{scope}[even odd rule] + \foreach \thing in {#2} {\clip \thing;} + \fill[black!#1] \bound; + \end{scope}% + }% + \fillclipped{30}{\present,\bound \thepast} + \fillclipped{15}{\present,\bound \thepast} + \fillclipped{45}{\present,\thepast} + \draw \thepast; + \draw \present; + \node at (barycentric cs:p2=1,p1=-0.3) {$h_\mu$}; + \node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; + \path (p2) +(90:3em) node {$X_0$}; + \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; + \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; + \end{tikzpicture}% + \\[0.25em] + \begin{tikzpicture}%[baseline=-1em] + \newcommand\rc{2.2em} + \newcommand\throw{2.5em} + \coordinate (p1) at (210:1.5em); + \coordinate (p2) at (90:0.8em); + \coordinate (p3) at (-30:1.5em); + \newcommand\bound{(-7em,-2.6em) rectangle (7em,3.0em)} + \newcommand\present{(p2) circle (\rc)} + \newcommand\thepast{(p1) ++(-\throw,0) \ovoid{\throw}} + \newcommand\future{(p3) ++(\throw,0) \ovoid{\throw}} + \newcommand\fillclipped[2]{% + \begin{scope}[even odd rule] + \foreach \thing in {#2} {\clip \thing;} + \fill[black!#1] \bound; + \end{scope}% + }% +% \fillclipped{80}{\future,\thepast} + \fillclipped{30}{\present,\future,\bound \thepast} + \fillclipped{15}{\present,\bound \future,\bound \thepast} + \draw \future; + \fillclipped{45}{\present,\thepast} + \draw \thepast; + \draw \present; + \node at (barycentric cs:p2=0.9,p1=-0.17,p3=-0.17) {$r_\mu$}; + \node at (barycentric cs:p1=-0.5,p2=1.0,p3=1) {$b_\mu$}; + \node at (barycentric cs:p3=0,p2=1,p1=1.2) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$}; + \path (p2) +(140:3.2em) node {$X_0$}; + % \node at (barycentric cs:p3=0,p2=1,p1=1) {$\rho_\mu$}; + \path (p3) +(3em,0em) node {\shortstack{infinite\\future}}; + \path (p1) +(-3em,0em) node {\shortstack{infinite\\past}}; + \path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$}; + \path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$}; + \end{tikzpicture}% +% \\[0.25em] +% The small dark +% region below $X_0$ is $\sigma_\mu$ and the excess entropy +% is $E = \rho_\mu + \sigma_\mu$. + \end{figure} + Marginal entropy of `present' $X_0$ is $H(X_0)=\rho_\mu+r_\mu+b_\mu$.\\ + Entropy rate is $h_\mu = r_\mu+b_\mu$. +\end{isframe} + +\section{Markov chains} +\label{s:InfoInMC} + + +\begin{iframe}[Markov chains\nicedot Definitions] + +% Now we'll look at information dynamics in one of the simplest possible models, a Markov chain. +% To illustrate the how the measures defined in \secrf{InfoInRandomProcs} can be computed +% in practice, we will consider one of the simplest random processes, a +% first order Markov chain. 
+% In this case, the dynamic information measures can be computed in closed-form. +% + + Let $X$ be a Markov chain with state space + $\{1, \ldots, K\}$, \ie the $X_t$ take values from $1$ to $K$. + \begin{center} + \begin{tikzpicture}[->] + \matrix[column sep=2em,ampersand replacement=\&]{ + \cn(X,1) \& \cn(X,2) \& \cn(X,3) \& \cn(X,4) \& \dn(XT) \\}; + \rl(X1,X2) \rl(X2,X3) \rl(X3,X4) \rl(X4,XT) + \end{tikzpicture} + \end{center} +% For the sake of brevity let us assume that $\domA$ is the set of integers from 1 to $K$. + Parameterised by transition matrix $\trans \in \reals^{K\times K}$, +% encoding the distribution of any element of the sequence given previous one, + \ie $p(\ev(X_{t+1}=i)|\ev(X_t=j))=\trans_{ij}$. + Assume irreducibility, ergodicity \etc to ensure uniqueness of + stationary distribution $\pi$ such that + $p(\ev(X_t=i))=\init_i$ independent of $t$. Entropy rate as a function of + $a$ is +% $\entrorate:\reals^{K\times K} \to \reals$: + \[ + \entrorate(\trans) = \sum_{j=1}^K \init_j \sum_{i=1}^K -\trans_{ij} \log \trans_{ij}. + \] +\end{iframe} + +\begin{iframe}[Markov chains\nicedot PIR] + Predictive information rate for first order chains comes out in terms of entropy rate + function as + \[ + b_\mu = h(a^2) - h(a), + \] + where $a^2$ is \emph{two-step} transition matrix. + + \uncover<2->{ + Can be generalised to higher-order transition matrices + \[ + b_\mu = h(\hat{a}^{N+1}) - Nh(\hat{a}), + \] + where $N$ is the order of the chain and $\hat{a}$ is a sparse + $K^N\times K^N$ transition matrix over product state space of $N$ + consecutive observations (step size 1). + } +\end{iframe} + +\begin{iframe}[Entropy rate and PIR in Markov chains] + + \begin{fig}{artseq} + \hangbox{\colfig[0.40]{matbase/fig8515}}% + \quad + \hangbox{% + \begin{tabular}{cc}% + \colfig[0.18]{matbase/fig1356} & + \colfig[0.18]{matbase/fig45647} \\ + \colfig[0.18]{matbase/fig49938} & + \colfig[0.18]{matbase/fig23355}% + \end{tabular}% + }% +% \end{hanging}\\ + \end{fig} + For given $K$, entropy rate varies between 0 (deterministic sequence) + and $\log K$ when $\trans_{ij}=1/K$ for all $i,j$. + Space of transition matrices explored by generating + them at random and plotting entropy rate vs PIR. (Note inverted + `U' relationship). %Transmat (d) is almost uniform. +\end{iframe} + +\begin{iframe}[Samples from processes with different PIR] + \begin{figure} + \colfig[0.75]{matbase/fig847}\\ + \colfig[0.75]{matbase/fig61989}\\ + \colfig[0.75]{matbase/fig43415}\\ + \colfig[0.75]{matbase/fig50385} + \end{figure} + Sequence (a) is repetition + of state 4 (see transmat (a) on previous slide). + System (b) has the highest PIR. +\end{iframe} + +% \begin{tabular}{rl} +% (a) & \raisebox{-1em}{\colfig[0.58]{matbase/fig9048}}\\[1em] +% (b) & \raisebox{-1em}{\colfig[0.58]{matbase/fig58845}}\\[1em] +% (c) & \raisebox{-1em}{\colfig[0.58]{matbase/fig45019}}\\[1em] +% (d) & \raisebox{-1em}{\colfig[0.58]{matbase/fig1511}} +% \end{tabular} + +\section{Application: The Melody Triangle} +\begin{iframe}[Complexity and interestingness: the Wundt Curve] + \label{s:Wundt} + Studies looking into the relationship between stochastic complexity + (usually measured as entropy or entropy rate) and aesthetic value, reveal + an inverted `U' shaped curve \citep{Berlyne71}. (Also, Wundt curve \cite{Wundt1897}). + Repeated exposure tends to move stimuli leftwards. 
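The random exploration behind the entropy-rate-versus-PIR figure can be sketched roughly as follows (my guess at a sampling scheme, reusing the `stationary`/`entropy_rate` helpers from the previous sketch; the talk does not specify how its matrices were generated): draw column-stochastic matrices with Dirichlet-distributed columns over a range of concentration parameters and record (entropy rate, PIR) pairs. PIR comes out low at both extremes of entropy rate and peaks in between, which is the inverted-'U' shape noted on the slide.

```python
import numpy as np
rng = np.random.default_rng(0)

def random_transition_matrix(K, conc):
    """Column-stochastic K x K matrix with Dirichlet(conc, ..., conc) columns.
    Small conc -> near-deterministic columns; large conc -> near-uniform columns."""
    return rng.dirichlet(np.full(K, conc), size=K).T

K = 4
points = []
for conc in np.geomspace(0.05, 20.0, 40):        # sweep from ordered to random
    for _ in range(20):
        A = random_transition_matrix(K, conc)
        h = entropy_rate(A)                      # helper from the previous sketch
        b = entropy_rate(A @ A) - h              # PIR, b_mu = h(a^2) - h(a)
        points.append((h, b))

h_vals, b_vals = np.array(points).T
print("max PIR %.3f nats at entropy rate %.3f (log K = %.3f)"
      % (b_vals.max(), h_vals[b_vals.argmax()], np.log(K)))
```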
+ + \hangbox{% + \only<1>{\colfig[0.5]{wundt}}% + \only<2>{\colfig[0.5]{wundt2}}% + }\hfill + \hangbox{\parbox{0.43\linewidth}{\raggedright + %Too deterministic $\rightarrow$ predictable, boring like a monotone;\\ + %Too random $\rightarrow$ are boring like white noise: unstructured, + %featureless, uniform. + Explanations for this usually appeal to a need for a `balance' + between order and chaos, unity and diversity, and so on, in a generally + imprecise way.}} + + +% Hence, a sequence can be uninteresting in two opposite ways: by +% being utterly predictable \emph{or} by being utterly +% unpredictable. +% Meyer \cite{Meyer2004} suggests something similar: +% hints at the same thing while discussing +% the relation between the rate of information flow and aesthetic experience, +% suggesting that +%% `unless there is some degree of order, \ldots +%% there is nothing to be uncertain \emph{about} \ldots +% `If the amount of information [by which he means entropy and surprisingness] +% is inordinately increased, the result is a kind of cognitive white noise.' + +\end{iframe} + +\begin{iframe}[PIR as a measure of cognitive activity] + + The predictive information rate incorporates a similar balance automatically: + is maximal for sequences which are neither deterministic nor + totally uncorrelated across time. + + \vspace{1em} + \begin{tabular}{rr}% + \raisebox{0.5em}{too predictable:} & + \only<1>{\noderow(black,un0,un0,un0,un1,un1)}% + \only<2>{\noderow(black,black,un0,un0,un0,un1)}% + \only<3>{\noderow(black,black,black,un0,un0,un0)}% + \only<4>{\noderow(black,black,black,black,un0,un0)}% + \\[1.2em] + \raisebox{0.5em}{intermediate:} & + \only<1>{\noderow(black,un1,un2,un3,un4,un5)}% + \only<2>{\noderow(black,black,un1,un2,un3,un4)}% + \only<3>{\noderow(black,black,black,un1,un2,un3)}% + \only<4>{\noderow(black,black,black,black,un1,un2)}% + \\[1.2em] + \raisebox{0.5em}{too random:} & + \only<1>{\noderow(black,un5,un5,un5,un5,un5)}% + \only<2>{\noderow(black,black,un5,un5,un5,un5)}% + \only<3>{\noderow(black,black,black,un5,un5,un5)}% + \only<4>{\noderow(black,black,black,black,un5,un5)}% + \end{tabular} + \vspace{1em} + + (Black: \emph{observed}; red: \emph{unobserved}; paler: \emph{greater uncertainty}.) + Our interpretation: +% when each event appears to carry no new information about the unknown future, +% it is `meaningless' and not worth attending to. + Things are `interesting' or at least `salient' when each new part supplies new information about parts to come. + +% Quantitative information dynamics will enable us to test this experimentally with human +% subjects. +\end{iframe} + +\begin{iframe}[The Melody Triangle\nicedot Information space] + \begin{figure} + \colfig[0.75]{mtriscat} + \end{figure} + Population of transition matrices in 3D space of $h_\mu$, $\rho_\mu$ and $b_\mu$. +% Concentrations of points along redundancy axis correspond to roughly periodic patterns. + Colour of each point + represents PIR. + %---highest values found at intermediate entropy and redundancy. + Shape is mostly (not completely) hollow inside: forming roughly + a curved triangular sheet. +\end{iframe} + +\begin{iframe}[The Melody Triangle\nicedot User interface] + \begin{figure} + \colfig[0.55]{TheTriangle.pdf} + \end{figure} + Allows user to place tokens in the triangle + to cause sonification of a Markov chain with corresponding information + `coordinate'. 
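A toy version of the sonification step might look like the following (purely illustrative: the state-to-pitch mapping and every detail of the real Melody Triangle implementation are assumptions, not taken from the talk). A token's position selects a transition matrix; the system then samples a symbol sequence from it and plays one note per beat.

```python
import numpy as np
rng = np.random.default_rng(1)

def sample_chain(A, n, x0=0):
    """Sample n states from a column-stochastic Markov chain (A[i, j] = P(i | j))."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(int(rng.choice(len(A), p=A[:, xs[-1]])))
    return xs

A = np.array([[0.7, 0.1, 0.1, 0.1],      # the matrix a token's position would select
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
PITCHES = [60, 62, 65, 67]               # hypothetical state-to-MIDI-pitch mapping
notes = [PITCHES[s] for s in sample_chain(A, 16)]
print(notes)                             # one note per beat
```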
+\end{iframe} + +\begin{iframe}[Subjective information] + So far we've assumed that sequence is actually sampled + from a stationary Markov chain with a transition matrix known + to the observer. + This means time averages of IPI and surprise should equal + expectations. + + \uncover<2->{ + What if sequence is sampled from some other Markov chain, + or is produced by some unknown process? + } + + \begin{itemize} + \item<3-> + In general, it may be impossible to identify any `true' model. There + are no `objective' probabilities; only subjective ones, as + argued by de Finetti \cite{deFinetti}. + + + \item<4-> + If sequence \emph{is} sampled from some Markov chain, we can + compute (time) averages of the observer's subjective surprise + and PI and also track what happens if observer gradually learns + the transition matrix from the data. + \end{itemize} +\end{iframe} + + +\begin{iframe}[Effect of learning on information dynamics] + \begin{figure} +% \colfig{matbase/fig42687} % too small text +% \colfig{matbase/fig60379} % 9*19 too tall +% \colfig{matbase/fig52515} % 9*20 ok, perhaps text still too small + \colfig[0.9]{matbase/fig30461} % 8*19 ok +% \colfig{matbase/fig66022} % 8.5*19 ok + \end{figure} +% Upper row shows actual stochastic learning, +% lower shows the idealised deterministic learning. + \textbf{(a/b/e/f)}: multiple runs starting from same + initial condition but using different generative transition matrices. + \textbf{(c/d/g/h)}: multiple runs starting from different + initial conditions and converging on transition matrices + with (c/g) high and (d/h) low PIR. +\end{iframe} + + +\section{More process models} +\begin{iframe}[Exchangeable sequences and parametric models] + De Finetti's theorem says that an exchangeable random process can be represented + as a sequence of variables which are iid \emph{given} some hidden probability + distribution, which we can think of as a parameterised model: + \begin{tabular}{lp{0.45\linewidth}} + \hangbox{\begin{tikzpicture} + [>=stealth',var/.style={circle,draw,inner sep=1pt,text height=10pt,text depth=4pt}] + \matrix[ampersand replacement=\&,matrix of math nodes,row sep=2em,column sep=1.8em,minimum size=17pt] { + \& |(theta) [var]| \Theta \\ + |(x1) [var]| X_1 \& |(x2) [var]| X_2 \& |(x3) [var]| X_3 \& + |(etc) [outer sep=2pt]| \dots \\ + }; + \foreach \n in {x1,x2,x3,etc} \draw[->] (theta)--(\n); + \end{tikzpicture}} + & + \raggedright + \uncover<2->{Observer's belief state at time $t$ includes probability distribution + over the parameters $p(\ev(\Theta=\theta)|\ev(\past{X}_t=\past{x}_t))$.} + \end{tabular}\\[1em] + \uncover<3->{ + Each observation causes revision of belief state + and hence supplies information + $ + I(\ev(X_t=x_t)\to\Theta|\ev(\past{X}_t=\past{x}_t)) + % = D( p_{\Theta|\ev(X_t=x_t),\ev(\past{X}_t=\past{x}_t)} || p_{\Theta|\ev(\past{X}_t=\past{x}_t)} ). + $ about $\Theta$. + In previous work we called this the `model information rate'. + } + \uncover<4->{(Same as Haussler and Opper's \cite{HausslerOpper1995} IIG or + Itti and Baldi's \cite{IttiBaldi2005} Bayesian surprise.)} +\end{iframe} + + \def\circ{circle (9)}% + \def\bs(#1,#2,#3){(barycentric cs:p1=#1,p2=#2,p3=#3)}% +\begin{iframe}[IIG equals IPI in (some) XRPs] + \begin{tabular}{@{}lc} + \parbox[c]{0.5\linewidth}{\raggedright + Mild assumptions yield a relationship between IIG (instantaneous information gain) and IPI.
+ (Everything here implicitly conditioned on $\past{X}_t$).} + & + \pgfsetxvec{\pgfpoint{1mm}{0mm}}% + \pgfsetyvec{\pgfpoint{0mm}{1mm}}% + \begin{tikzpicture}[baseline=0pt] + \coordinate (p1) at (90:6); + \coordinate (p2) at (210:6); + \coordinate (p3) at (330:6); + \only<4->{% + \begin{scope} + \foreach \p in {p1,p2,p3} \clip (\p) \circ; + \fill[lightgray] (-10,-10) rectangle (10,10); + \end{scope} + \path (0,0) node {$\mathcal{I}_t$};} + \foreach \p in {p1,p2,p3} \draw (\p) \circ; + \path (p2) +(210:13) node {$X_t$} + (p3) +(330:13) node {$\fut{X}_t$} + (p1) +(140:12) node {$\Theta$}; + \only<2->{\path \bs(-0.25,0.5,0.5) node {$0$};} + \only<3->{\path \bs(0.5,0.5,-0.25) node {$0$};} + \end{tikzpicture} + \end{tabular}\\ + \begin{enumerate} + \uncover<2->{\item $X_t \perp \fut{X}_t | \Theta$: observations iid given $\Theta$ for XRPs;} + \uncover<3->{\item $\Theta \perp X_t | \fut{X}_t$: +% $I(X_t;\fut{X}_t|\Theta_t)=0$ due to the conditional independence of +% observables given the parameters $\Theta_t$, and +% $I(\Theta_t;X_t|\fut{X}_t)=0$ + assumption that $X_t$ adds no new information about $\Theta$ + given infinitely long sequence $\fut{X}_t =X_{t+1:\infty}$.} +\end{enumerate} +\uncover<4->{Hence, $I(X_t;\Theta_t|\past{X}_t)=I(X_t;\fut{X}_t|\past{X}_t) = \mathcal{I}_t$.\\} +\uncover<5->{Can drop assumption 1 and still get $I(X_t;\Theta_t|\past{X}_t)$ as an additive component (lower bound) of $\mathcal{I}_t$.} +\end{iframe} + +\def\fid#1{#1} +\def\specint#1{\frac{1}{2\pi}\int_{-\pi}^\pi #1{S(\omega)} \dd \omega} +\begin{iframe}[Discrete-time Gaussian processes] + Information-theoretic quantities used earlier have analogues for continuous-valued + random variables. For stationary Gaussian processes, we can obtain results in + terms of the power spectral density $S(\omega)$ (which for discrete time is periodic + in $\omega$ with period $2\pi$). Standard methods give + \begin{align*} + H(X_t) &= \frac{1}{2}\left( \log 2\pi e + \log \specint{}\right), \\ + h_\mu &= \frac{1}{2} \left( \log 2\pi e + \specint{\log} \right), \\ + \rho_\mu &= \frac{1}{2} \left( \log \specint{\fid} - \specint{\log}\right). + \end{align*} + Entropy rate is also known as Kolmogorov-Sinai entropy. +% $H(X_t)$ is a function of marginal variance which is just the total power in the spectrum. +\end{iframe} + +\begin{iframe}[PIR/Multi-information duality] + Analysis yields the PIR: + \[ + b_\mu = \frac{1}{2} \left( \log \specint{\frac{1}} - \specint{\log\frac{1}} \right). + \] + Yields simple expression for finite-order autoregressive processes, but beware: can diverge + for moving average processes! + + \uncover<2->{ + Compare with multi-information rate: + \[ + \rho_\mu = \frac{1}{2} \left( \log \specint{\fid} - \specint{\log}\right). + \] + Yields simple expression for finite-order moving-average processes, but can diverge + for marginally stable autoregressive processes. + } + + \uncover<3->{ + Infinities are troublesome and point to a problem with the notion of infinitely + precise observation of continuous-valued variables. + } +\end{iframe} + +% Information gained about model parameters (measured as the KL divergence +% between prior and posterior distributions) is equivalent +% to \textbf{Itti and Baldi's `Bayesian surprise'} \cite{IttiBaldi2005}.
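These spectral formulas are easy to check numerically. The sketch below (my own) evaluates them for an AR(1) process x_t = a x_{t-1} + e_t, whose power spectral density is S(omega) = sigma^2 / |1 - a e^{-i omega}|^2; the analytic values printed for comparison follow from standard AR(1) results and are not quoted from the talk.

```python
import numpy as np

def spectral_info_rates(S):
    """Evaluate the slide's spectral formulas (in nats) for a PSD sampled on a
    uniform grid over (-pi, pi); the grid mean approximates (1/2pi) * integral."""
    mean = lambda f: f.mean()
    h   = 0.5 * (np.log(2 * np.pi * np.e) + mean(np.log(S)))    # entropy rate
    rho = 0.5 * (np.log(mean(S)) - mean(np.log(S)))             # multi-information rate
    b   = 0.5 * (np.log(mean(1 / S)) - mean(np.log(1 / S)))     # predictive information rate
    return h, rho, b

a, s2 = 0.8, 1.0                                    # AR(1) coefficient and innovation variance
omega = np.linspace(-np.pi, np.pi, 20000, endpoint=False)
S = s2 / np.abs(1 - a * np.exp(-1j * omega)) ** 2   # AR(1) power spectral density

h, rho, b = spectral_info_rates(S)
print(f"h   = {h:.4f}  (analytic {0.5 * np.log(2 * np.pi * np.e * s2):.4f})")
print(f"rho = {rho:.4f}  (analytic {-0.5 * np.log(1 - a**2):.4f})")
print(f"b   = {b:.4f}  (analytic {0.5 * np.log(1 + a**2):.4f})")
```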
+ + + \section{Application: Analysis of minimalist music} + \label{s:Experiments} + +\begin{iframe}[Material and Methods] + +% Returning to our original goal of modelling the perception of temporal structure +% in music, we computed dynamic information measures for + We took two pieces of minimalist + music by Philip Glass, \emph{Two Pages} (1969) and \emph{Gradus} (1968). + Both monophonic and isochronous, so representable very simply as + a sequence of symbols (notes), one symbol per beat, + yet remain ecologically valid examples of `real' music. + + We use an elaboration of the Markov chain model---not necessarily + a good model \latin{per se}, but that wasn't the point of the experiment. + The Markov chain model was chosen as it is tractable from an information + dynamics point of view while not being completely trivial. +\end{iframe} + +\begin{iframe}[Time-varying transition matrix model] + We allow transition matrix to vary slowly with time to track + changes in the sequence structure. + Hence, observer's belief state includes a probability + distribution over transition matrices; we choose a product of + Dirichlet distributions: + \[ + \textstyle + p(\trans|\param) = \prod_{j=1}^K p_\mathrm{Dir}(\trans_{:j}|\param_{:j}), + \] + where $\trans_{:j}$ is the \nth{j} column of $\trans$ and $\param$ is a + $K \times K$ parameter matrix. +% (Dirichlet, being conjugate to discrete/multinomial distribution, +% makes processing of observations particularly simple.) +% such that $\param_{:j}$ is the +% parameter tuple for the $K$-component Dirichlet distribution $p_\mathrm{Dir}$. +% \begin{equation} +% \textstyle +% p(\trans|\param) = \prod_{j=1}^K p_\mathrm{Dir}(\trans_{:j}|\param_{:j}) +% = \prod_{j=1}^K (\prod_{i=1}^K \trans_{ij}^{\param_{ij}-1}) / B(\param_{:j}), +% \end{equation} +% where $\trans_{:j}$ is the \nth{j} column of $\trans$ and $\param$ is an +% $K \times K$ matrix of parameters. + + At each time step, distribution first \emph{spreads} under mapping + \[ + \param_{ij} \mapsto \frac{\beta\param_{ij}}{(\beta + \param_{ij})} + \] + to model possibility that transition matrix + has changed ($\beta=2500$ in our experiments). Then it \emph{contracts} + due to new observation providing fresh evidence about transition matrix. +% +% Each observed symbol % provides fresh evidence about current transition matrix, +% enables observer to update its belief state. +\end{iframe} + + +\begin{iframe}[Two Pages\nicedot Results] + +% \begin{fig}{twopages} + \begin{tabular}{c@{\hspace{1.5ex}}l}% +% \hspace*{-1.5em} +% \hangbox{\colfig[0.5]{matbase/fig20304}} % 3 plots +% \hangbox{\colfig[0.52]{matbase/fig39528}} % 4 plots with means +% \hangbox{\colfig[0.52]{matbase/fig63538}} % two pages, 5 plots +% \hangbox{\colfig[0.52]{matbase/fig53706}} % two pages, 5 plots + \hangbox{\colfig[0.72]{matbase/fig33309}} % two pages, 5 plots + & + \hangbox{% + \parbox{0.28\linewidth}{ + \raggedright + \textbf{Thick lines:} part boundaries as indicated + by Glass; \textbf{grey lines (top four panels):} changes in the melodic `figures'; + % of which the piece is constructed. + \textbf{grey lines (bottom panel):} + six most surprising moments chosen by expert listener.
+ } + } + \end{tabular} +% \end{fig} +\end{iframe} + +\begin{iframe}[Two Pages\nicedot Rule based analysis] + \begin{figure} + \colfig[0.98]{matbase/fig13377} +% \hangbox{\colfig[0.98]{matbase/fig13377}} + \end{figure} + Analysis of \emph{Two Pages} using (top) Cambouropoulos' + Local Boundary Detection Model (LBDM) and + (bottom) Lerdahl and Jackendoff's + grouping preference rule 3a (GPR3a), which is a function of pitch proximity. + Both analyses indicate `boundary strength'. +\end{iframe} + +\begin{iframe}[Two Pages\nicedot Discussion] + Correspondence between the information + measures and the structure of the piece is quite close. + Good agreement between the six `most surprising + moments' chosen by expert listener and model information signal. + + What appears to be an error in the detection of + the major part boundary (between events 5000 and 6000) actually + points to a known anomaly in the score, where Glass places the boundary several events + before there is any change in the pattern of notes. Alternative analyses of \emph{Two Pages} + place the boundary in agreement with the peak in our surprisingness signal. +\end{iframe} + +\comment{ +\begin{iframe}[Gradus\nicedot Results] + +% \begin{fig}{gradus} + \begin{tabular}{c@{\hspace{1.5ex}}l} +% & +% \hangbox{\colfig[0.4]{matbase/fig81812}} +% \hangbox{\colfig[0.52]{matbase/fig23177}} % two pages, 5 plots +% \hangbox{\colfig[0.495]{matbase/fig50709}} % Fudged segmentation +% \hangbox{\colfig[0.495]{matbase/fig3124}} % Geraint's segmentation + \hangbox{\colfig[0.715]{matbase/fig11808}} % Geraint's segmentation, corrected + & +% \hangbox{\colfig[0.5]{matbase/fig39914}} + \hangbox{% + \parbox{0.28\linewidth}{ + \raggedright + \textbf{Thick lines:} part boundaries as indicated + by the composer. + \textbf{Grey lines:} segmentation by expert listener. + + Note: traces smoothed with Gaussian + window about 16 events wide. + } + } + \end{tabular} +% \end{fig} +\end{iframe} + +\begin{iframe}[Gradus\nicedot Rule based analysis] + \begin{figure} + \colfig[0.98]{matbase/fig58691} + \end{figure} + Boundary strength analysis of \emph{Gradus} using (top) Cambouropoulos' + \cite{CambouropoulosPhD} Local Boundary Detection Model and + (bottom) Lerdahl and Jackendoff's \cite{LerdahlJackendoff83} + grouping preference rule 3a. +\end{iframe} +} +\begin{iframe}[Gradus\nicedot Metrical analysis] + \begin{figure} + \begin{tabular}{cc} + \colfig[0.40]{matbase/fig56807} & \colfig[0.41]{matbase/fig27144} \\ + \colfig[0.40]{matbase/fig87574} & \colfig[0.41]{matbase/fig13651} \\ + \hspace*{1ex}\colfig[0.39]{matbase/fig19913} & \hspace*{1ex}\colfig[0.40]{matbase/fig66144} + \end{tabular} + \end{figure} +\end{iframe} + +\comment{ +\begin{iframe}[Gradus\nicedot Discussion] + + \emph{Gradus} is much less systematically structured than \emph{Two Pages}, and + relies more on the conventions of tonal music, which are not represented in the model. + + For example, the initial transition matrix is uniform, which does not correctly represent + prior knowledge about tonal music. + + Information dynamic analysis does not give such a + clear picture of the structure; but some of the fine structure can be related + to specific events in the music (see Pearce and Wiggins 2006). +% nonetheless, there are some points of correspondence between the analysis and +% segmentation given by Keith Potter.
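The time-varying transition-matrix observer described on the model slide can be sketched as follows (an illustrative reimplementation of what the slide states, not the authors' code): a product-of-Dirichlets belief over the columns of the transition matrix, with the spreading map param_ij -> beta*param_ij / (beta + param_ij) applied before each observation and a unit count added to the observed transition afterwards; each symbol's surprisal is read off the Dirichlet predictive distribution for its context.

```python
import numpy as np

def analyse(sequence, K, beta=2500.0):
    """Surprisal trace of a time-varying Markov chain observer (sketch).
    alpha[:, j] holds the Dirichlet pseudo-counts for transitions out of state j."""
    alpha = np.ones((K, K))
    prev, surprisal = None, []
    for x in sequence:
        alpha = beta * alpha / (beta + alpha)              # spreading step (slow forgetting)
        if prev is not None:
            pred = alpha[:, prev] / alpha[:, prev].sum()   # predictive distribution p(x_t | prev)
            surprisal.append(-np.log2(pred[x]))            # -log p(x_t | past), in bits
            alpha[x, prev] += 1.0                          # contraction: fresh evidence
        prev = x
    return surprisal

seq = [0, 1, 2, 3] * 20 + [0, 2, 1, 3] * 20      # toy sequence with a change point half-way
trace = analyse(seq, K=4)
print(["%.1f" % s for s in trace[70:90]])        # surprisal spikes where the pattern changes
```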
+ +\end{iframe} +} + + \section{Application: Beat tracking and rhythm} + + \begin{iframe}[Bayesian beat tracker] + \uncover<1->{ + Works by maintaining probabilistic belief state about time of next + beat and current tempo. + + \begin{figure} + \colfig{beat_prior} + \end{figure} + } + + \uncover<2->{ + Receives categorised drum events (kick or snare) from audio analysis front-end. + } + + \end{iframe} + + \begin{iframe}[Information gain in the beat tracker] + \begin{tabular}{ll} + \parbox[t]{0.43\linewidth}{\raggedright + \uncover<1->{ + Each event triggers a change in belief state, so we can compute + information gain about beat parameters.}\\[1em] + + \uncover<2->{ + Relationship between IIG and IPI + means we treat it as a proxy for IPI.} + } + & + \hangbox{\colfig[0.55]{beat_info}} + \end{tabular} + \end{iframe} + + \begin{iframe}[Analysis of drum patterns] + We analysed 17 recordings of drummers, playing either solo or with a band. + All patterns were in 4/4. + \begin{itemize} + \item + \uncover<1->{ + Information tends to arrive at beat times: consequence of structure of model. + } + \item + \uncover<2->{ + Lots of information seems to arrive after drum fills and breaks + as the drummer reestablishes the beat. + } + \item + \uncover<3->{ + No consistent pattern of information arrival in relation to metrical + structure, so no obvious metrical structure in micro-timing of events. + However, still possible that metrical structure might emerge from predictive + analysis of drum pattern. + } + \end{itemize} + \end{iframe} + + \section{Summary and conclusions} + \label{s:Conclusions} + + \begin{iframe}[Summary] + + \begin{itemize} + \item Dynamic, observer-centric information theory. + \item Applicable to any dynamic probabilistic model. + \item PIR potentially a measure of complexity. + \item Simple analysis for Markov chains and Gaussian processes. + \item Applications in music analysis and composition. + \item Search for neural correlates is ongoing (that's another talk\ldots). + \end{itemize} + Thanks! + \end{iframe} + + \begin{bframe}[Bibliography] + \bibliographystyle{alpha} + {\small \bibliography{all,c4dm,compsci}} + \end{bframe} +\end{document}