comparison draft.tex @ 70:2cb06db0d271
FINISHED!
author: samer
date: Sat, 17 Mar 2012 18:06:03 +0000
parents: 3fa185431bbc
children: 9135f6fb1a68
\section{Introduction}
\label{s:Intro}
The relationship between
Shannon's \cite{Shannon48} information theory and music and art in general has been the
subject of some interest since the 1950s
\cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}.
The general thesis is that perceptible qualities and subjective states
like uncertainty, surprise, complexity, tension, and interestingness
are closely related to information-theoretic quantities like
entropy, relative entropy, and mutual information.

immediately, after some delay, or modified as the music unfolds.
In this paper, we explore this ``Information Dynamics'' view of music,
discussing the theory behind it and some emerging applications.

\subsection{Expectation and surprise in music}
The idea that the musical experience is strongly shaped by the generation
and playing out of strong and weak expectations was put forward by, amongst others,
music theorists L. B. Meyer \cite{Meyer67} and Narmour \citep{Narmour77}, but was
recognised much earlier; for example,
it was elegantly put by Hanslick \cite{Hanslick1854} in the
nineteenth century:
We suppose that when we listen to music, expectations are created on the basis
of our familiarity with various styles of music and our ability to
detect and learn statistical regularities in the music as they emerge.
There is experimental evidence that human listeners are able to internalise
statistical knowledge about musical structure, \eg
% \citep{SaffranJohnsonAslin1999,EerolaToiviainenKrumhansl2002}, and also
\citep{SaffranJohnsonAslin1999}, and also
that statistical models can form an effective basis for computational
analysis of music, \eg
\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.

% \subsection{Music and information theory}
With a probabilistic framework for music modelling and prediction in hand,
we can %are in a position to
compute various
\comment{
which provides us with a number of measures, such as entropy
and mutual information, which are suitable for quantifying states of
uncertainty and surprise, and thus could potentially enable us to build
quantitative models of the listening process described above. They are
% listener, a temporal programme of varying
% levels of uncertainty, ambiguity and surprise.


\subsection{Information dynamic approach}
Our working hypothesis is that, as an intelligent, predictive
agent (to which we will refer as `it') listens to a piece of music, it maintains
a dynamically evolving probabilistic belief state that enables it to make predictions
about how the piece will continue, relying on both its previous experience
of music and the emerging themes of the piece. As events unfold, it revises
this belief state, which includes predictive
distributions over possible future events. These
% distributions and changes in distributions
can be characterised in terms of a handful of information-theoretic
measures such as entropy and relative entropy. By tracing the
evolution of these measures, we obtain a representation which captures much
of the significant structure of the music.

One consequence of this approach is that regardless of the details of
the sensory input or even which sensory modality is being processed, the resulting
analysis is in terms of the same units: quantities of information (bits) and
rates of information flow (bits per second). The information-theoretic
concepts in terms of which the analysis is framed are universal to all sorts
of data.
\mathcal{I}_t = \sum_{\fut{x}_t \in \X^*}
p(\fut{x}_t|x_t,\past{x}_t) \log \frac{ p(\fut{x}_t|x_t,\past{x}_t) }{ p(\fut{x}_t|\past{x}_t) },
\end{equation}
where the sum is to be taken over the set of infinite sequences $\X^*$.
Note that it is quite possible for an event to be surprising but not informative
in a predictive sense.
As with the surprisingness, the observer can compute its \emph{expected} IPI
at time $t$, which reduces to a mutual information $I(X_t;\fut{X}_t|\ev(\past{X}_t=\past{x}_t))$
conditioned on the observed past. This could be used, for example, as an estimate
of the attentional resources which should be directed at this stream of data, which may
be in competition with other sensory streams.
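
For a first-order Markov chain, the conditioning on the infinite past collapses to the previous symbol, so the expected IPI given an observed past can be evaluated directly from the transition matrix. The following sketch is our illustration (not code from the paper), assuming a column-stochastic matrix with $a_{ij} = \Pr(X_t=i|X_{t-1}=j)$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_ipi(a, j):
    """Expected IPI I(X_t; X_{t+1} | X_{t-1} = j) for a first-order
    Markov chain with column-stochastic a[i, j] = P(X_t = i | X_{t-1} = j)."""
    a2 = a @ a                                 # two-step transition matrix
    h_future = entropy(a2[:, j])               # H(X_{t+1} | past = j)
    h_future_given_now = sum(a[i, j] * entropy(a[:, i]) for i in range(a.shape[0]))
    return h_future - h_future_given_now

# deterministic alternation: the past already fixes the future,
# so observing X_t carries no news about X_{t+1}
a_det = np.array([[0.0, 1.0], [1.0, 0.0]])
print(abs(expected_ipi(a_det, 0)))             # 0.0

# a noisy 'sticky' chain: each observation carries some predictive news
a_noisy = np.array([[0.9, 0.1], [0.1, 0.9]])
print(round(expected_ipi(a_noisy, 0), 4))      # 0.1463
```

The deterministic case illustrates the point made above: an event can be surprising (under a different matrix) yet carry no predictive information when the future is already determined.
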
\node at (barycentric cs:p2=1,p1=1) [shape=rectangle,fill=black!45,inner sep=1pt]{$\rho_\mu$};
\path (p2) +(90:3em) node {$X_0$};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\end{tikzpicture}}%
\\[1em]
\subfig{(b) excess entropy}{%
\newcommand\blob{\longblob}
\begin{tikzpicture}
\coordinate (p1) at (-\offs,0em);
\coordinate (p2) at (\offs,0em);
\path (0,0) node (future) {$E$};
\path (p1) +(-2em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p2) +(2em,\rad) node [anchor=south] {$X_0,\ldots$};
\end{tikzpicture}%
}%
\\[1em]
\subfig{(c) predictive information rate $b_\mu$}{%
\begin{tikzpicture}%[baseline=-1em]
\newcommand\rc{2.1em}
\newcommand\throw{2.5em}
\coordinate (p1) at (210:1.5em);
\path (p3) +(3em,0em) node {\shortstack{infinite\\future}};
\path (p1) +(-3em,0em) node {\shortstack{infinite\\past}};
\path (p1) +(-4em,\rad) node [anchor=south] {$\ldots,X_{-1}$};
\path (p3) +(4em,\rad) node [anchor=south] {$X_1,\ldots$};
\end{tikzpicture}}%
\\[0.25em]
\end{tabular}
\caption{
I-diagrams for several information measures in
stationary random processes. Each circle or oval represents a random
variable or sequence of random variables relative to time $t=0$. Overlapped areas
% information} rate.


\subsection{First and higher order Markov chains}
\label{s:markov}
% First order Markov chains are the simplest non-trivial models to which information
% dynamics methods can be applied.
In \cite{AbdallahPlumbley2009} we derived
expressions for all the information measures described in \secrf{surprise-info-seq} for
ergodic first order Markov chains (\ie those that have a unique stationary
distribution).
% The derivation is greatly simplified by the dependency structure
% of the Markov chain: for the purpose of the analysis, the `past' and `future'
% segments $\past{X}_t$ and $\fut{X}_t$ can be collapsed to just the previous
% and next variables $X_{t-1}$ and $X_{t+1}$ respectively.
We also showed that
the PIR can be expressed simply in terms of entropy rates:
if we let $a$ denote the $K\times K$ transition matrix of a Markov chain over
the alphabet $\{1,\ldots,K\}$, such that
$a_{ij} = \Pr(\ev(X_t=i)|\ev(X_{t-1}=j))$, and let $h:\reals^{K\times K}\to \reals$ be
the entropy rate function such that $h(a)$ is the entropy rate of a Markov chain
with transition matrix $a$, then the PIR is
\begin{equation}
b_\mu = h(a^2) - h(a),
\end{equation}
where $a^2$, the transition matrix squared, is the transition matrix
of the `skip one' Markov chain obtained by jumping two steps at a time
along the original chain.
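
This identity is easy to check numerically. The sketch below is our illustration (not code from \cite{AbdallahPlumbley2009}): it computes $h(a)$ as the stationary-weighted entropy of the columns of $a$, then evaluates $b_\mu = h(a^2) - h(a)$:

```python
import numpy as np

def stationary(a):
    """Stationary distribution: the eigenvector of a for eigenvalue 1."""
    w, v = np.linalg.eig(a)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

def entropy_rate(a):
    """h(a) in nats, with a[i, j] = P(X_t = i | X_{t-1} = j)."""
    logs = np.where(a > 0, np.log(np.where(a > 0, a, 1.0)), 0.0)
    return -np.sum(stationary(a) * np.sum(a * logs, axis=0))

def pir(a):
    """Predictive information rate b_mu = h(a^2) - h(a)."""
    return entropy_rate(a @ a) - entropy_rate(a)

a = np.array([[0.9, 0.1], [0.1, 0.9]])    # 'sticky' two-state chain
print(round(pir(a), 4))                   # 0.1463

uniform = np.full((3, 3), 1 / 3)          # iid sequence: the past never helps
print(pir(uniform))                       # ~0: no predictive information
```

The two extremes bracket the behaviour discussed below: an iid process has zero PIR, while partially predictable chains sit in between.
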

Second and higher order Markov chains can be treated in a similar way by transforming
to a first order representation of the high order Markov chain. With
an $N$th order model, this is done by forming a new alphabet of size $K^N$
consisting of all possible $N$-tuples of symbols from the base alphabet.
An observation $\hat{x}_t$ in this new model encodes a block of $N$ observations
$(x_{t+1},\ldots,x_{t+N})$ from the base model.
% The next
% observation $\hat{x}_{t+1}$ encodes the block of $N$ obtained by shifting the previous
% block along by one step.
The new Markov chain is parameterised by a sparse $K^N\times K^N$
transition matrix $\hat{a}$, in terms of which the PIR is
\begin{equation}
h_\mu = h(\hat{a}), \qquad b_\mu = h({\hat{a}^{N+1}}) - N h({\hat{a}}),
\end{equation}
where $\hat{a}^{N+1}$ is the $(N+1)$th power of the first order transition matrix.
Other information measures can also be computed for the high-order Markov chain, including
the multi-information rate $\rho_\mu$ and the excess entropy $E$. (These are identical
for first order Markov chains, but for order $N$ chains, $E$ can be up to $N$ times larger
than $\rho_\mu$.)
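
The embedding itself is mechanical: index all $N$-tuples and shift the context one symbol at a time. The sketch below is our illustration with a hypothetical order-2 binary chain (the next symbol copies the symbol two steps back with probability 0.9); the entropy-rate helper assumes a column-stochastic matrix:

```python
import numpy as np
from itertools import product

def entropy_rate(a):
    """h(a) in nats for a column-stochastic transition matrix a."""
    w, v = np.linalg.eig(a)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    logs = np.where(a > 0, np.log(np.where(a > 0, a, 1.0)), 0.0)
    return -np.sum(pi * np.sum(a * logs, axis=0))

def embed(p_next, K, N):
    """First-order K^N x K^N representation of an order-N chain over K symbols.
    p_next(y, ctx) = P(next symbol = y | previous N symbols = ctx)."""
    states = list(product(range(K), repeat=N))
    idx = {s: i for i, s in enumerate(states)}
    a_hat = np.zeros((K ** N, K ** N))
    for s in states:
        for y in range(K):
            a_hat[idx[s[1:] + (y,)], idx[s]] = p_next(y, s)  # shift context by one
    return a_hat

# hypothetical order-2 binary chain: X_t repeats X_{t-2} with probability 0.9
a_hat = embed(lambda y, ctx: 0.9 if y == ctx[0] else 0.1, K=2, N=2)
h_mu = entropy_rate(a_hat)                                         # ~0.325 nats
b_mu = entropy_rate(np.linalg.matrix_power(a_hat, 3)) - 2 * h_mu   # N + 1 = 3
```

This toy process is two interleaved sticky chains, so its PIR matches that of the corresponding first order chain, as expected.
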

In our experiments with visualising and sonifying sequences sampled from
first order Markov chains \cite{AbdallahPlumbley2009}, we found that
the measures $h_\mu$, $\rho_\mu$ and $b_\mu$ correspond to perceptible
characteristics, and that the transition matrices maximising or minimising
each of these quantities are quite distinct. High entropy rates are associated
with completely uncorrelated sequences with no recognisable temporal structure
(and low $\rho_\mu$ and $b_\mu$).
High values of $\rho_\mu$ are associated with long periodic cycles (and low $h_\mu$
and $b_\mu$). High values of $b_\mu$ are associated with intermediate values
of $\rho_\mu$ and $h_\mu$, and recognisable, but not completely predictable,
temporal structures. These relationships are visible in \figrf{mtriscat} in
\secrf{composition}, where we pick up this thread again, with an application of
information dynamics in a compositional aid.


\section{Information Dynamics in Analysis}

\subsection{Musicological Analysis}
\label{s:minimusic}

\begin{fig}{twopages}
\colfig[0.96]{matbase/fig9471}\\ % update from mbc paper
% \colfig[0.97]{matbase/fig72663}\\ % later update from mbc paper (Keith's new picks)
\vspace*{0.5em}
\colfig[0.97]{matbase/fig13377} % rule based analysis
\caption{Analysis of \emph{Two Pages}.
The thick vertical lines are the part boundaries as indicated in
the score by the composer.
The thin grey lines
indicate changes in the melodic `figures' of which the piece is
constructed. In the `model information rate' panel, the black asterisks
mark the six most surprising moments selected by Keith Potter.
The bottom two panels show two rule-based boundary strength analyses.
All information measures are in nats.
Note that the boundary marked in the score at around note 5,400 is known to be
anomalous; on the basis of a listening analysis, some musicologists have
placed the boundary a few bars later, in agreement with our analysis
\cite{PotterEtAl2007}.
}
\end{fig}

In \cite{AbdallahPlumbley2009}, we analysed two pieces of music in the minimalist style
by Philip Glass: \emph{Two Pages} (1969) and \emph{Gradus} (1968).
The analysis was done using a first-order Markov chain model, with the
enhancement that the transition matrix of the model was allowed to
evolve dynamically as the notes were processed, and was tracked (in
a Bayesian way) as a \emph{distribution} over possible transition matrices,
rather than a point estimate. Some results are summarised in \figrf{twopages}:
the upper four plots show the dynamically evolving subjective information
measures as described in \secrf{surprise-info-seq}, computed using a point
estimate of the current transition matrix; the fifth plot (the `model information rate')
shows the information in each observation about the transition matrix.
In \cite{AbdallahPlumbley2010b}, we showed that this `model information rate'
is actually a component of the true IPI when the transition
matrix is being learned online, and was neglected when we computed the IPI from
the transition matrix as if it were a constant.

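The evolving-model idea can be caricatured in a few lines. The sketch below is a simplified illustration, not the model of \cite{AbdallahPlumbley2009}: it keeps independent Dirichlet counts per context and scores each note by its predictive surprisal under the marginal (P\'olya) predictive distribution:

```python
import numpy as np

class BayesMarkov:
    """Dirichlet belief over each column of a K x K transition matrix.
    counts[i, j] tracks observed transitions j -> i (add-one prior)."""
    def __init__(self, K):
        self.counts = np.ones((K, K))   # Dirichlet(1, ..., 1) prior per column
        self.prev = None

    def observe(self, x):
        """Return surprisal -log p(x | past) in nats, then update the belief."""
        if self.prev is None:
            self.prev = x
            return 0.0
        col = self.counts[:, self.prev]
        surprisal = -np.log(col[x] / col.sum())   # Polya predictive probability
        self.counts[x, self.prev] += 1.0          # Bayesian count update
        self.prev = x
        return surprisal

model = BayesMarkov(K=2)
seq = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
surprisals = [model.observe(x) for x in seq]
# strict alternation becomes steadily less surprising as the belief sharpens
```

A full treatment would also measure how much each note changes the belief over matrices (the model information rate); this sketch only tracks the predictive surprisal.
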
The peaks of the surprisingness and both components of the IPI
show good correspondence with the structure of the piece, both as marked in the score
and as analysed by musicologist Keith Potter, who was asked to mark the six
`most surprising moments' of the piece (shown as asterisks in the fifth plot). %%
% \footnote{%
% Note that the boundary marked in the score at around note 5,400 is known to be
% anomalous; on the basis of a listening analysis, some musicologists have
% placed the boundary a few bars later, in agreement with our analysis
% \cite{PotterEtAl2007}.}
%
In contrast, the analyses shown in the lower two plots of \figrf{twopages},
obtained using two rule-based music segmentation algorithms, while clearly
\emph{reflecting} the structure of the piece, do not \emph{segment} the piece,
showing no tendency for the boundary strength function to peak at
the boundaries in the piece.
show that the first note of each bar is, on average, significantly more surprising
and informative than the others, up to the 64-note level, whereas at the 128-note
level, the dominant periodicity appears to remain at 64 notes.

\begin{fig}{metre}
% \scalebox{1}{%
\begin{tabular}{cc}
\colfig[0.45]{matbase/fig36859} & \colfig[0.48]{matbase/fig88658} \\
\colfig[0.45]{matbase/fig48061} & \colfig[0.48]{matbase/fig46367} \\
\colfig[0.45]{matbase/fig99042} & \colfig[0.47]{matbase/fig87490}
% \colfig[0.46]{matbase/fig56807} & \colfig[0.48]{matbase/fig27144} \\
% \colfig[0.48]{matbase/fig9142} & \colfig[0.48]{matbase/fig27751}

\end{tabular}%
% }
\caption{Metrical analysis by computing average surprisingness and
IPI of notes at different periodicities (\ie hypothetical
bar lengths) and phases (\ie positions within a bar).
}
\end{fig}

\subsection{Real-valued signals and audio analysis}
Using analogous definitions based on the differential entropy
\cite{CoverThomas}, the methods outlined
in \secrf{surprise-info-seq} and \secrf{process-info}
can be reformulated for random variables taking values in a continuous domain.
Information-dynamic methods may thus be applied to expressive parameters of music
such as dynamics, timing and timbre, which are readily quantified on a continuous scale.

% \subsection{Audio based content analysis}
% Using analogous definitions of differential entropy, the methods outlined
% in the previous section are equally applicable to continuous random variables.
% In the case of music, where expressive properties such as dynamics, tempo,
% timing and timbre are readily quantified on a continuous scale, the information
% dynamic framework may also be considered.

Dubnov \cite{Dubnov2006} considers the class of stationary Gaussian
processes, for which the entropy rate may be obtained analytically
from the power spectral density of the signal. Dubnov found that the
multi-information rate (which he refers to as `information rate') can be
expressed as a function of the \emph{spectral flatness measure}. Thus, for a given variance,
Gaussian processes with maximal multi-information rate are those with maximally
non-flat spectra. These essentially consist of a single
sinusoidal component and hence are completely predictable once
the parameters of the sinusoid have been inferred.
% Local stationarity is assumed, which may be achieved by windowing or
% change point detection \cite{Dubnov2008}.
%TODO
778 | 785 |
779 We are currently working towards methods for the computation of predictive information | 786 We are currently working towards methods for the computation of predictive information |
780 rate in some restricted classes of Gaussian processes including finite-order | 787 rate in some restricted classes of Gaussian processes including finite-order |
781 autoregressive models and processes with power-law spectra (fractionally integrated Gaussian noise). | 788 autoregressive models and processes with power-law (or $1/f$) spectra, |
789 which have previously been investigated in relation to their aesthetic properties | |
790 \cite{Voss75,TaylorSpeharVan-Donkelaar2011}. | |
782 | 791 |
792 % (fractionally integrated Gaussian noise). | |
783 % %(fBm (continuous), fiGn discrete time) possible reference: | 793 % %(fBm (continuous), fiGn discrete time) possible reference: |
784 % @book{palma2007long, | 794 % @book{palma2007long, |
785 % title={Long-memory time series: theory and methods}, | 795 % title={Long-memory time series: theory and methods}, |
786 % author={Palma, W.}, | 796 % author={Palma, W.}, |
787 % volume={662}, | 797 % volume={662}, |
805 | 815 |
806 | 816 |
807 \subsection{Beat Tracking} | 817 \subsection{Beat Tracking} |
808 | 818 |
809 A probabilistic method for drum tracking was presented by Robertson | 819 A probabilistic method for drum tracking was presented by Robertson |
810 \cite{Robertson11c}. The algorithm is used to synchronise a music | 820 \cite{Robertson11c}. The system infers a beat grid (a sequence |
811 sequencer to a live drummer. The expected beat time of the sequencer is | 821 of approximately regular beat times) given audio inputs from a |
812 represented by a click track, and the algorithm takes as input event | 822 live drummer, for the purpose of synchronising a music |
813 times for discrete kick and snare drum events relative to this click | 823 sequencer with the drummer. |
814 track. These are obtained using dedicated microphones for each drum and | 824 The times of kick and snare drum events are obtained |
815 using a percussive onset detector \cite{puckette98}. The drum tracker | 825 using dedicated microphones for each drum and a percussive onset detector |
816 continually updates distributions for tempo and phase on receiving a new | 826 \cite{puckette98}. These event times are then sent |
817 event time. We can thus quantify the information contributed of an event | 827 to the beat tracker, which maintains a probabilistic belief state in |
818 by measuring the difference between the system's prior distribution and | 828 the form of distributions over the tempo and phase of the beat grid. |
819 the posterior distribution using the Kullback-Leiber divergence. | 829 Every time an event is received, these distributions are updated |
820 | 830 with respect to a probabilistic model which accounts both for tempo and phase |
821 Here, we have calculated the KL divergence and entropy for kick and | 831 variations and the emission of drum events at musically plausible times |
822 snare events in sixteen files. The analysis of information rates can be | 832 relative to the beat grid. |
823 considered \emph{subjective}, in that it measures how the drum tracker's | 833 %continually updates distributions for tempo and phase on receiving a new |
824 probability distributions change, and these are contingent upon the | 834 %event time |
825 model used as well as external properties in the signal. We expect, | 835 |
826 however, that following periods of increased uncertainty, such as fills | 836 The use of a probabilistic belief state means we can compute entropies |
827 or expressive timing, the information contained in an individual event | 837 representing the system's uncertainty about the beat grid, and quantify |
828 increases. We also examine whether the information is dependent upon | 838 the amount of information in each event about the beat grid as the KL divergence |
829 metrical position. | 839 between prior and posterior distributions. Though this is not strictly the |
830 | 840 instantaneous predictive information (IPI) as described in \secrf{surprise-info-seq} |
831 % !!! FIXME | 841 (the information gained is not directly about future event times), we can treat |
842 it as a proxy for the IPI, in the manner of the `model information rate' | |
843 described in \secrf{minimusic}, which has a similar status. | |
844 | |
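The event-wise information measure is a discrete KL divergence between the belief state just before and just after an update. A minimal sketch over a hypothetical discretised tempo--phase grid (the distributions below are illustrative, not taken from the system):

```python
import numpy as np

def information_in_event(prior, posterior):
    # KL(posterior || prior) in bits, over a discretised tempo/phase grid;
    # both distributions are assumed strictly positive and normalised
    prior = np.asarray(prior, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    return float(np.sum(posterior * np.log2(posterior / prior)))

# hypothetical example: a broad prior belief sharpened by one drum event
prior = np.full(8, 1 / 8)
posterior = np.array([0.02, 0.02, 0.05, 0.6, 0.25, 0.03, 0.02, 0.01])
```

An uninformative event (posterior equal to prior) carries zero bits; the sharper the update, the larger the divergence.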
845 \begin{fig*}{drumfig} | |
846 % \includegraphics[width=0.9\linewidth]{drum_plots/file9-track.eps}% \\ | |
847 \includegraphics[width=0.97\linewidth]{drum_plots/file11-track.eps} \\ | |
848 % \includegraphics[width=0.9\linewidth]{newplots/file8-track.eps} | |
849 \caption{Information dynamic analysis derived from audio recordings of | |
850 drumming, obtained by applying a Bayesian beat tracking system to the | |
851 sequence of detected kick and snare drum events. The grey line show the system's | |
852 varying level of uncertainty (entropy) about the tempo and phase of the | |
853 beat grid, while the stem plot shows the amount of information in each | |
854 drum event about the beat grid. The entropy drops instantaneously at each | |
855 event and rises gradually between events. | |
856 } | |
857 \end{fig*} | |
858 | |
859 We carried out the analysis on 16 recordings; an example | |
860 is shown in \figrf{drumfig}. There we can see variations in the | |
861 entropy in the upper graph and the information in each drum event in the lower | |
862 stem plot. At certain points in time, unusually large amounts of information | |
863 arrive; these may be related to fills and other rhythmic irregularities, which | |
864 are often followed by an emphatic return to a steady beat at the beginning | |
865 of the next bar---this is something we are currently investigating. | |
866 We also analysed the pattern of information flow | |
867 on a cyclic metre, much as in \figrf{metre}. All the recordings we | |
868 analysed are audibly in 4/4 metre, but we found no | |
869 evidence of a general tendency for greater amounts of information to arrive | |
870 at metrically strong beats, which suggests that the rhythmic accuracy of the | |
871 drummers does not vary systematically across each bar. It is possible that metrical information | |
872 existing in the pattern of kick and snare events might emerge in an information | |
873 dynamic analysis using a model that attempts to predict the time and type of | |
874 the next drum event, rather than just inferring the beat grid as the current model does. | |
875 %The analysis of information rates can b | |
876 %considered \emph{subjective}, in that it measures how the drum tracker's | |
877 %probability distributions change, and these are contingent upon the | |
878 %model used as well as external properties in the signal. | |
879 %We expect, | |
880 %however, that following periods of increased uncertainty, such as fills | |
881 %or expressive timing, the information contained in an individual event | |
882 %increases. We also examine whether the information is dependent upon | |
883 %metrical position. | |
884 | |
832 | 885 |
833 \section{Information dynamics as compositional aid} | 886 \section{Information dynamics as compositional aid} |
834 \label{s:composition} | 887 \label{s:composition} |
835 | 888 |
836 The use of stochastic processes in music composition has been widespread for | 889 The use of stochastic processes in music composition has been widespread for |
839 can drive the \emph{generative} phase of the creative process, information dynamics | 892 can drive the \emph{generative} phase of the creative process, information dynamics |
840 can serve as a novel framework for a \emph{selective} phase, by | 893 can serve as a novel framework for a \emph{selective} phase, by |
841 providing a set of criteria to be used in judging which of the | 894 providing a set of criteria to be used in judging which of the |
842 generated materials | 895 generated materials |
843 are of value. This alternation of generative and selective phases has been | 896 are of value. This alternation of generative and selective phases has been |
844 noted by art theorist Margaret Boden \cite{Boden1990}. | 897 noted before \cite{Boden1990}. |
845 | 898 % |
846 Information-dynamic criteria can also be used as \emph{constraints} on the | 899 Information-dynamic criteria can also be used as \emph{constraints} on the |
847 generative processes, for example, by specifying a certain temporal profile | 900 generative processes, for example, by specifying a certain temporal profile |
848 of surprisingness and uncertainty the composer wishes to induce in the listener | 901 of surprisingness and uncertainty the composer wishes to induce in the listener |
849 as the piece unfolds. | 902 as the piece unfolds. |
850 %stochastic and algorithmic processes: ; outputs can be filtered to match a set of | 903 %stochastic and algorithmic processes: ; outputs can be filtered to match a set of |
867 Processes with high PIR maintain a certain kind of balance between | 920 Processes with high PIR maintain a certain kind of balance between |
868 predictability and unpredictability in such a way that the observer must continually | 921 predictability and unpredictability in such a way that the observer must continually |
869 pay attention to each new observation as it occurs in order to make the best | 922 pay attention to each new observation as it occurs in order to make the best |
870 possible predictions about the evolution of the sequence. This balance between predictability | 923 possible predictions about the evolution of the sequence. This balance between predictability |
871 and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}), | 924 and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}), |
872 which summarises the observations of Wundt that the greatest aesthetic value in art | 925 which summarises the observations of Wundt \cite{Wundt1897} that stimuli are most |
873 is to be found at intermediate levels of disorder, where there is a balance between | 926 pleasing at intermediate levels of novelty or disorder, where there is a balance between |
874 `order' and `chaos'. | 927 `order' and `chaos'. |
875 | 928 |
876 Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009} | 929 Using the methods of \secrf{markov}, we found \cite{AbdallahPlumbley2009} |
877 a similar shape when plotting entropy rate against PIR---this is visible in the | 930 a similar shape when plotting entropy rate against PIR---this is visible in the |
878 upper envelope of the scatter plot in \figrf{mtriscat}, which is a 3-D scatter plot of | 931 upper envelope of the scatter plot in \figrf{mtriscat}, which is a 3-D scatter plot of |
881 The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and | 934 The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and |
882 predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond | 935 predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond |
883 to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences | 936 to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences |
884 with no temporal structure. Processes with high PIR are to be found at intermediate | 937 with no temporal structure. Processes with high PIR are to be found at intermediate |
885 levels of entropy and redundancy. | 938 levels of entropy and redundancy. |
886 These observations led us to construct the `Melody Triangle' as a graphical interface | 939 These observations led us to construct the `Melody Triangle', a graphical interface |
887 for exploring the melodic patterns generated by each of the Markov chains represented | 940 for exploring the melodic patterns generated by each of the Markov chains represented |
888 as points in \figrf{mtriscat}. | 941 as points in \figrf{mtriscat}. |
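The three coordinates can be computed directly from a transition matrix. A sketch in bits, assuming an irreducible first-order chain; for such a chain the PIR reduces to the difference between the next-symbol entropy conditioned on the state two steps back (the entropy rate of $T^2$ under the stationary distribution of $T$) and the ordinary entropy rate:

```python
import numpy as np

def stationary(T):
    # stationary distribution pi with pi = pi @ T (T row-stochastic)
    w, v = np.linalg.eig(T.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def _H(p):
    # Shannon entropy in bits, with 0 log 0 taken as 0
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_rate(T, pi=None):
    # h_mu: expected entropy of the next-symbol distribution
    pi = stationary(T) if pi is None else pi
    return float(sum(pi[i] * _H(T[i]) for i in range(len(pi))))

def redundancy(T):
    # rho_mu: marginal entropy minus entropy rate
    pi = stationary(T)
    return _H(pi) - entropy_rate(T, pi)

def pir(T):
    # b_mu: H(next | two steps back) - H(next | one step back)
    pi = stationary(T)
    return entropy_rate(T @ T, pi) - entropy_rate(T, pi)
```

A period-2 chain such as `[[0, 1], [1, 0]]` gives $h_\mu = 0$, $\rho_\mu = 1$ bit and $b_\mu = 0$; a uniform chain gives $h_\mu = 1$ bit with zero redundancy and PIR; only intermediate matrices score high on PIR, as in the scatter plot.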
942 | |
943 | |
944 %It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials. | |
945 | |
946 %For instance a stochastic music generating process could be controlled by modifying | |
947 %constraints on its output in terms of predictive information rate or entropy | |
948 %rate. | |
889 | 949 |
890 \begin{fig}{wundt} | 950 \begin{fig}{wundt} |
891 \raisebox{-4em}{\colfig[0.43]{wundt}} | 951 \raisebox{-4em}{\colfig[0.43]{wundt}} |
892 % {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ } | 952 % {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ } |
893 {\ {\large$\longrightarrow$}\ } | 953 {\ {\large$\longrightarrow$}\ } |
898 in a move to the left along the curve \cite{Berlyne71}. | 958 in a move to the left along the curve \cite{Berlyne71}. |
899 } | 959 } |
900 \end{fig} | 960 \end{fig} |
901 | 961 |
902 | 962 |
903 %It is possible to apply information dynamics to the generation of content, such as to the composition of musical materials. | |
904 | |
905 %For instance a stochastic music generating process could be controlled by modifying | |
906 %constraints on its output in terms of predictive information rate or entropy | |
907 %rate. | |
908 | |
909 | |
910 | 963 |
911 \subsection{The Melody Triangle} | 964 \subsection{The Melody Triangle} |
912 | 965 |
913 The Melody Triangle is an exploratory interface for the discovery of melodic | 966 The Melody Triangle is an interface for the discovery of melodic |
914 content, where the input---positions within a triangle---directly map to information | 967 materials, where the input---positions within a triangle---directly map to information |
915 theoretic properties of the output. | 968 theoretic properties of the output. |
916 %The measures---entropy rate, redundancy and | 969 %The measures---entropy rate, redundancy and |
917 %predictive information rate---form a criteria with which to filter the output | 970 %predictive information rate---form a criteria with which to filter the output |
918 %of the stochastic processes used to generate sequences of notes. | 971 %of the stochastic processes used to generate sequences of notes. |
919 These measures | 972 %These measures |
920 address notions of expectation and surprise in music, and as such the Melody | 973 %address notions of expectation and surprise in music, and as such the Melody |
921 Triangle is a means of interfacing with a generative process in terms of the | 974 %Triangle is a means of interfacing with a generative process in terms of the |
922 predictability of its output. | 975 %predictability of its output. |
923 | |
924 | |
925 \begin{fig}{mtriscat} | |
926 \colfig[0.9]{mtriscat} | |
927 \caption{The population of transition matrices distributed along three axes of | |
928 redundancy, entropy rate and predictive information rate (all measured in bits). | |
929 The concentrations of points along the redundancy axis correspond | |
930 to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit), | |
931 3, 4, \etc all the way to period 8 (redundancy 3 bits). The colour of each point | |
932 represents its PIR---note that the highest values are found at intermediate entropy | |
933 and redundancy, and that the distribution as a whole makes a curved triangle. Although | |
934 not visible in this plot, it is largely hollow in the middle.} | |
935 \end{fig} | |
936 | 976 |
937 The triangle is populated with first order Markov chain transition | 977 The triangle is populated with first order Markov chain transition |
938 matrices as illustrated in \figrf{mtriscat}. | 978 matrices as illustrated in \figrf{mtriscat}. |
939 The distribution of transition matrices plotted in this space forms an arch shape | 979 The distribution of transition matrices in this space forms a relatively thin |
940 that is fairly thin. Thus, it is a reasonable simplification to project out the | 980 curved sheet. Thus, it is a reasonable simplification to project out the |
941 third dimension (the PIR) and present an interface that is just two dimensional. | 981 third dimension (the PIR) and present an interface that is just two dimensional. |
942 The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with | 982 The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with |
943 the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand | 983 the $h_\mu=0, \rho_\mu=0$ vertex at the top, the `redundancy' axis down the left-hand |
944 side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}. | 984 side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}. |
945 This is our `Melody Triangle' and | 985 This is our `Melody Triangle' and |
946 forms the interface by which the system is controlled. | 986 forms the interface by which the system is controlled. |
947 %Using this interface thus involves a mapping to information space; | 987 %Using this interface thus involves a mapping to information space; |
948 The user selects a position within the triangle, the point is mapped into the | 988 The user selects a point within the triangle; this is mapped into the |
949 information space, and a corresponding transition matrix is returned. The third dimension, | 989 information space and the nearest transition matrix is used to generate |
950 though not visible, is implicitly there, as transition matrices retrieved from | 990 a sequence of values which are then sonified either as pitched notes or percussive |
991 sounds. By choosing the position within the triangle, the user can control the | |
992 output at the level of its `collative' properties, with access to the variety | |
993 of patterns as described above and in \secrf{markov}. | |
994 %and information-theoretic criteria related to predictability | |
995 %and information flow | |
996 Though the interface is 2D, the third dimension (PIR) is implicitly present, as | |
997 transition matrices retrieved from | |
951 along the centre line of the triangle will tend to have higher PIR. | 998 along the centre line of the triangle will tend to have higher PIR. |
952 | 999 We hypothesise that, under |
953 Each corner corresponds to three different extremes of predictability and | |
954 unpredictability, which could be loosely characterised as `periodicity', `noise' | |
955 and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$ | |
956 and $b_\mu$) have no discernible pattern; | |
957 Melodies along the `periodicity' | |
958 to `repetition' edge are all deterministic loops that get shorter as we approach | |
959 the `repetition' corner, until each is just one repeating note. The | |
960 areas in between will tend to have higher PIR, and we hypothesise that, under | |
961 the appropriate conditions, these will be perceived as more `interesting' or | 1000 the appropriate conditions, these will be perceived as more `interesting' or |
962 `melodic.' | 1001 `melodic.' |
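The point-to-matrix mapping can be sketched as a nearest-neighbour lookup over the precomputed population of matrices (the names and the use of a squared Euclidean metric in the 2-D information space are illustrative assumptions):

```python
import numpy as np

def nearest_matrix(point, coords, matrices):
    # point: the user's position mapped to (entropy rate, redundancy);
    # coords: N x 2 array of the same measures for each stored matrix;
    # returns the transition matrix whose coordinates are closest
    d = np.sum((np.asarray(coords) - np.asarray(point)) ** 2, axis=1)
    return matrices[int(np.argmin(d))]
```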
1002 | |
1003 %The corners correspond to three different extremes of predictability and | |
1004 %unpredictability, which could be loosely characterised as `periodicity', `noise' | |
1005 %and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$ | |
1006 %and $b_\mu$) have no discernible pattern; | |
1007 %those along the `periodicity' | |
1008 %to `repetition' edge are all cyclic patterns that get shorter as we approach | |
1009 %the `repetition' corner, until each is just one repeating note. Those along the | |
1010 %opposite edge consist of independent random notes from non-uniform distributions. | |
1011 %Areas between the left and right edges will tend to have higher PIR, | |
1012 %and we hypothesise that, under | |
1013 %the appropriate conditions, these will be perceived as more `interesting' or | |
1014 %`melodic.' | |
963 %These melodies have some level of unpredictability, but are not completely random. | 1015 %These melodies have some level of unpredictability, but are not completely random. |
964 % Or, conversely, are predictable, but not entirely so. | 1016 % Or, conversely, are predictable, but not entirely so. |
965 | |
966 \begin{fig}{TheTriangle} | |
967 \colfig[0.8]{TheTriangle.pdf} | |
968 \caption{The Melody Triangle} | |
969 \end{fig} | |
970 | 1017 |
971 %PERHAPS WE SHOULD FOREGO TALKING ABOUT THE | 1018 %PERHAPS WE SHOULD FOREGO TALKING ABOUT THE |
972 %INSTALLATION VERSION OF THE TRIANGLE? | 1019 %INSTALLATION VERSION OF THE TRIANGLE? |
973 %feels a bit like a tangent, and could do with the space.. | 1020 %feels a bit like a tangent, and could do with the space.. |
974 The Melody Triangle exists in two incarnations; a standard screen based interface | 1021 The Melody Triangle exists in two incarnations: a screen-based interface |
975 where a user moves tokens in and around a triangle on screen, and a multi-user | 1022 where a user moves tokens in and around a triangle on screen, and a multi-user |
976 interactive installation where a Kinect camera tracks individuals in a space and | 1023 interactive installation where a Kinect camera tracks individuals in a space and |
977 maps their positions in physical space to the triangle. In the latter each visitor | 1024 maps their positions in physical space to the triangle. In the latter each visitor |
978 that enters the installation generates a melody and can collaborate with their | 1025 that enters the installation generates a melody and can collaborate with their |
979 co-visitors to generate musical textures. This makes the interaction physically engaging | 1026 co-visitors to generate musical textures. This makes the interaction physically engaging |
980 and (as our experience with visitors both young and old has demonstrated) more playful. | 1027 and (as our experience with visitors both young and old has demonstrated) more playful. |
981 %Additionally visitors can change the | 1028 %Additionally visitors can change the |
982 %tempo, register, instrumentation and periodicity of their melody with body gestures. | 1029 %tempo, register, instrumentation and periodicity of their melody with body gestures. |
983 | 1030 |
984 As a screen based interface the Melody Triangle can serve as a composition tool. | 1031 |
1032 \begin{fig}{mtriscat} | |
1033 \colfig[0.9]{mtriscat} | |
1034 \caption{The population of transition matrices in the 3D space of | |
1035 entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and PIR ($b_\mu$), | |
1036 all in bits. | |
1037 The concentrations of points along the redundancy axis correspond | |
1038 to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit), | |
1039 3, 4, \etc all the way to period 7 (redundancy 2.8 bits). The colour of each point | |
1040 represents its PIR---note that the highest values are found at intermediate entropy | |
1041 and redundancy, and that the distribution as a whole makes a curved triangle. Although | |
1042 not visible in this plot, it is largely hollow in the middle.} | |
1043 \end{fig} | |
1044 | |
1045 | |
1046 The screen-based interface can serve as a compositional tool. | |
985 %%A triangle is drawn on the screen, screen space thus mapped to the statistical | 1047 %%A triangle is drawn on the screen, screen space thus mapped to the statistical |
986 %space of the Melody Triangle. | 1048 %space of the Melody Triangle. |
987 A number of tokens, each representing a | 1049 A number of tokens, each representing a |
988 melody, can be dragged in and around the triangle. For each token, a sequence of symbols with | 1050 sonification stream or `voice', can be dragged in and around the triangle. |
989 statistical properties that correspond to the token's position is generated. These | 1051 For each token, a sequence of symbols is sampled using the corresponding |
990 symbols are then mapped to notes of a scale or percussive sounds. | 1052 transition matrix; the symbols |
991 However they could easily be mapped to other musical processes, possibly over | 1053 %statistical properties that correspond to the token's position is generated. These |
1054 %symbols | |
1055 are then mapped to notes of a scale or percussive sounds% | |
1056 \footnote{The sampled sequence could easily be mapped to other musical processes, possibly over | |
992 different time scales, such as chords, dynamics and timbres. It would also be possible | 1057 different time scales, such as chords, dynamics and timbres. It would also be possible |
993 to map the symbols to visual or kinetic outputs. | 1058 to map the symbols to visual or other outputs.}% |
1059 . Keyboard commands give control over other musical parameters such | |
1060 as pitch register and inter-onset interval. | |
994 %The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}. | 1061 %The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}. |
995 Additionally keyboard commands give control over other musical parameters such | 1062 % |
996 as pitch register and note duration. | 1063 The system is capable of generating quite intricate musical textures when multiple tokens |
997 | 1064 are in the triangle, but unlike other computer aided composition tools or programming |
998 The Melody Triangle can generate intricate musical textures when multiple tokens | 1065 environments, the composer excercises control at the abstract level of information-dynamic |
999 are in the triangle. Unlike other computer aided composition tools or programming | 1066 environments, the composer exercises control at the abstract level of information-dynamic |
1000 environments, here the composer engages with music on a high and abstract level; | 1067 %the interface relating to subjective expectation and predictability. |
1001 the interface relating to subjective expectation and predictability. | 1068 |
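The per-token generation step can be sketched as sampling a Markov chain and mapping symbols to pitches (the scale mapping below is an illustrative assumption, not the system's actual sonification):

```python
import numpy as np

def sample_sequence(T, n, seed=0):
    # sample n symbols from a first-order Markov chain with
    # row-stochastic transition matrix T
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    s = [int(rng.integers(k))]
    for _ in range(n - 1):
        s.append(int(rng.choice(k, p=T[s[-1]])))
    return s

# hypothetical mapping of symbols onto a C major scale (MIDI note numbers)
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]

def sonify(symbols):
    return [SCALE[s % len(SCALE)] for s in symbols]
```

Each token in the triangle would run one such sampler from its own transition matrix; register and inter-onset interval are then independent per-token parameters.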
1002 | 1069 \begin{fig}{TheTriangle} |
1003 | 1070 \colfig[0.7]{TheTriangle.pdf} |
1004 | 1071 \caption{The Melody Triangle} |
1005 | 1072 \end{fig} |
1006 | |
1007 \begin{fig}{mtri-results} | |
1008 \def\scat#1{\colfig[0.42]{mtri/#1}} | |
1009 \def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}} | |
1010 \begin{tabular}{cc} | |
1011 % \subj{a} \\ | |
1012 \subj{b} \\ | |
1013 \subj{c} \\ | |
1014 \subj{d} | |
1015 \end{tabular} | |
1016 \caption{Dwell times and mark positions from user trials with the | |
1017 on-screen Melody Triangle interface, for two subjects. The left-hand column shows | |
1018 the positions in a 2D information space (entropy rate vs multi-information rate | |
1019 in bits) where each spent their time; the area of each circle is proportional | |
1020 to the time spent there. The right-hand column shows point which subjects | |
1021 `liked'; the area of the circles here is proportional to the duration spent at | |
1022 that point before the point was marked.} | |
1023 \end{fig} | |
1024 | 1073 |
1025 \comment{ | 1074 \comment{ |
1026 \subsection{Information Dynamics as Evaluative Feedback Mechanism} | 1075 \subsection{Information Dynamics as Evaluative Feedback Mechanism} |
1027 %NOT SURE THIS SHOULD BE HERE AT ALL..? | 1076 %NOT SURE THIS SHOULD BE HERE AT ALL..? |
1028 Information measures on a stream of symbols can form a feedback mechanism; a | 1077 Information measures on a stream of symbols can form a feedback mechanism; a |
1043 characteristics of sonified Markov chains and subjective musical preference. | 1092 characteristics of sonified Markov chains and subjective musical preference. |
1044 We carried out a pilot study with six participants, who were asked | 1093 We carried out a pilot study with six participants, who were asked |
1045 to use a simplified form of the user interface (a single controllable token, | 1094 to use a simplified form of the user interface (a single controllable token, |
1046 and no rhythmic, registral or timbral controls) under two conditions: | 1095 and no rhythmic, registral or timbral controls) under two conditions: |
1047 one where a single sequence was sonified under user control, and another | 1096 one where a single sequence was sonified under user control, and another |
1048 where an addition sequence was sonified in a different register, as if generated | 1097 where an additional sequence was sonified in a different register, as if generated |
1049 by a fixed invisible in one of four regions of the triangle. In addition, subjects | 1098 by a fixed invisible token in one of four regions of the triangle. In addition, subjects |
1050 were asked to press a key if they `liked' what they were hearing. | 1099 were asked to press a key if they `liked' what they were hearing. |
1051 | 1100 |
1052 We recorded subjects' behaviour as well as points which they marked | 1101 We recorded subjects' behaviour as well as points which they marked |
1053 with a key press. | 1102 with a key press. |
1054 Some results for three of the subjects are shown in \figrf{mtri-results}. Though | 1103 Some results for two of the subjects are shown in \figrf{mtri-results}. Though |
1055 we have not been able to detect any systematic across-subjects preference for any particular | 1104 we have not been able to detect any systematic across-subjects preference for any particular |
1056 region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour. | 1105 region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour. |
1057 Our initial hypothesis, that subjects would linger longer in regions of the triangle | 1106 Our initial hypothesis, that subjects would linger longer in regions of the triangle |
1058 that produced aesthetically preferable sequences, and that this tend to be towards the | 1107 that produced aesthetically preferable sequences, and that this would tend to be towards the |
1059 centre line of the triangle for all subjects, was not confirmed. However, it is possible | 1108 centre line of the triangle for all subjects, was not confirmed. However, it is possible |
1060 that the design of the experiment encouraged an initial exploration of the space (sometimes | 1109 that the design of the experiment encouraged an initial exploration of the space (sometimes |
1061 very systematic, as for subject c) aimed at \emph{understanding} the parameter space and | 1110 very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and |
1062 how the system works, rather than finding musical sequences. It is also possible that the | 1111 how the system works, rather than finding musical patterns. It is also possible that the |
1063 system encourages users to create musically interesting output by \emph{moving the token}, | 1112 system encourages users to create musically interesting output by \emph{moving the token}, |
1064 rather than finding a particular spot in the triangle which produces a musically interesting | 1113 rather than finding a particular spot in the triangle which produces a musically interesting |
1065 pattern by itself. | 1114 sequence by itself. |

\begin{fig}{mtri-results}
	\def\scat#1{\colfig[0.42]{mtri/#1}}
	\def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
	\begin{tabular}{cc}
%		\subj{a} \\
%		\subj{b} \\
		\subj{c} \\
		\subj{d}
	\end{tabular}
	\caption{Dwell times and mark positions from user trials with the
		on-screen Melody Triangle interface, for two subjects. The left-hand column shows
		the positions in a 2D information space (entropy rate vs.\ multi-information rate,
		in bits) where each subject spent their time; the area of each circle is proportional
		to the time spent there. The right-hand column shows the points that subjects
		`liked'; the area of each circle here is proportional to the time spent at
		that point before it was marked.}
\end{fig}

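For reference, the two coordinates of the information space used in the figure above can be written explicitly for a stationary first-order Markov chain; this is a restatement of standard definitions, and the notation (transition matrix $a_{ij} = \Pr(X_{t+1}{=}i \mid X_t{=}j)$ with stationary distribution $\pi$) may differ in detail from that used elsewhere in this paper:
\begin{equation}
	h = -\sum_j \pi_j \sum_i a_{ij} \log_2 a_{ij},
	\qquad
	\rho = -\sum_i \pi_i \log_2 \pi_i - h,
\end{equation}
where $h$ is the entropy rate and $\rho$ the multi-information rate, which for a first-order chain reduces to the marginal entropy minus the entropy rate.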
Comments collected from the subjects
%during and after the experiment
suggest that
the information-dynamic characteristics of the patterns were readily apparent
to most: several noticed the main organisation of the triangle,
with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
notes towards the opposite corner. Some described their systematic exploration of the space.
Two felt that the right side was `more controllable' than the left (a consequence
of their ability to return to a particular distinctive pattern and recognise it
as one heard previously). Two reported that they became bored towards the end,
but another felt there wasn't enough time to `hear out' the patterns properly.
One subject did not `enjoy' the patterns in the lower region, but another said the lower
central regions were more `melodic' and `interesting'.

We plan to continue the trials with a slightly less restricted user interface in order
to make the experience more enjoyable and thereby give subjects longer to use the interface;
%and frequencies, only lighting when it heard these. As the Musicolour would
%`get bored', the musician would have to change and vary their playing, eliciting
%new and unexpected outputs in trying to keep the Musicolour interested.


\section{Conclusions}

We have looked at several emerging areas of application of the methods and
ideas of information dynamics to various problems in music analysis, perception
and cognition, including musicological analysis of symbolic music, audio analysis,
rhythm processing, and compositional and creative tasks. The approach has proved
successful in musicological analysis, and though our initial data on
rhythm processing and aesthetic preference are inconclusive, there is still
plenty of work to be done in this area: wherever there are probabilistic models,
information dynamics can shed light on their behaviour.


\section*{Acknowledgments}
This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE),
GR/S82213/01 and EP/E045235/1 (SA), an EPSRC DTA Studentship (PF), an RAEng/EPSRC Research Fellowship 10216/88 (AR), an EPSRC Leadership Fellowship, EP/G007144/1