% Source: mume2012/mume2012.tex @ 52:bcedee4183e7
%
% First Draft of MUME 2012 workshop paper
% Author: Henrik Ekeus <hekeus@eecs.qmul.ac.uk>
% Date: Sat, 07 Jul 2012 19:29:51 +0100
%File: formatting-instruction.tex
\documentclass[letterpaper]{article}
\usepackage{aaai}
\usepackage{times}
\usepackage{helvet}
\usepackage{courier}
\usepackage{tools}
\usepackage{url}



\let\citep=\cite
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
\frenchspacing
\pdfinfo{
/Title (The Melody Triangle - Exploring Pattern and Predictability in Music )
/Subject (Musical Metacreation, Interfaces)
/Author (Henrik Ekeus, Samer A. Abdallah, Mark D. Plumbley, Peter W. McOwan)}
\setcounter{secnumdepth}{0}  

% The file aaai.sty is the style file for AAAI Press 
% proceedings, working notes, and technical reports.
%
\title{The Melody Triangle \\ Exploring Pattern and Predictability in Music}
\author{Henrik Ekeus, Samer A. Abdallah, Mark D. Plumbley, Peter W. McOwan\\
Centre for Digital Music, Queen Mary University of London,\\
London E1 4NS, UK\\
}
\begin{document}
\maketitle
\begin{abstract}
\begin{quote}

The Melody Triangle is an interface for the discovery of melodic materials, where the input -- positions within a triangle -- maps directly to information-theoretic properties of the output. A model of human expectation and surprise in the perception of music, \emph{information dynamics}, is used to `map out' the parameter space of a musical generative system.  This enables a user to explore the possibilities afforded by a generative algorithm, in this case Markov chains, not by directly selecting parameters, but by specifying the subjective \emph{predictability} of the output sequence.  We describe some of the relevant ideas from information dynamics and how the Melody Triangle is defined in terms of these. We describe its incarnation as a screen-based performance tool and compositional aid for the generation of musical textures, in which the user exercises control at the abstract level of randomness and predictability, and we report on some pilot studies being carried out with it. We also briefly outline a multi-user installation, where collaboration in a performative setting provides a playful yet informative way to explore expectation and surprise in music, and a forthcoming mobile phone version of the Melody Triangle.
 
\end{quote}
\end{abstract}

\noindent 



\section{Information Dynamics}
\label{s:Intro}
	The relationship between
	Shannon's \cite{Shannon48} information theory and music and art in general has been the
	subject of some interest since the 1950s 
	\cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}. 
	The general thesis is that perceptible qualities and subjective states
	like uncertainty, surprise, complexity, tension, and interestingness
	are closely related to information-theoretic quantities like
	entropy, relative entropy, and mutual information.

Music is an inherently dynamic process.   The idea that the musical experience is strongly shaped by the generation
	and playing out of strong and weak expectations was put forward by, amongst others, 
	music theorists L. B. Meyer \cite{Meyer:1967} and Narmour \cite{Narmour:1977}.

	

	An essential aspect of this is that music is experienced as a phenomenon
	that unfolds in time, rather than being apprehended as a static object
	presented in its entirety. Meyer argued that the experience depends
	on how we change and revise our conceptions \emph{as events happen}, on
	how expectation and prediction interact with occurrence, and that, to a
	large degree, the way to understand the effect of music is to focus on
	this `kinetics' of expectation and surprise.


 Prediction and expectation are essentially probabilistic concepts
  and can be treated mathematically using probability theory.
  We suppose that when we listen to music, expectations are created on the basis 
	of our familiarity with various styles of music and our ability to
	detect and learn statistical regularities in the music as they emerge.
	There is experimental evidence that human listeners are able to internalise
	statistical knowledge about musical structure
	\cite{SaffranJohnsonAslin1999}, and also
	that statistical models can form an effective basis for computational
	analysis of music
	\cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}.


Information dynamics~\cite{Abdallah:2009p4089} considers several different kinds of predictability in musical patterns, how these might be quantified using the tools of information theory, 
%human listeners might perceive these, 
and how they shape or affect the listening experience.  Our working hypothesis is that listeners maintain a dynamically evolving probabilistic belief state that enables them to make predictions about how a piece of music will continue.  


They do this using both the immediate context of the piece as well as using previous musical experience, such as a familiarity with musical styles and conventions.  As the music unfolds, listeners continually revise this belief state, which includes predictive
	distributions over possible future events.    These changes in probabilistic beliefs can be associated with
quantities of information; these are the focus of information dynamics.


\section{The Melody Triangle}
%%%How we created the transition matrixes and created the triangle.
The use of stochastic processes in music composition has been widespread for
decades; for instance, Iannis Xenakis applied probabilistic mathematical models
to the creation of musical materials~\cite{Xenakis:1992ul}. While such processes
can drive the \emph{generative} phase of the creative process, information dynamics
can serve as a novel framework for a \emph{selective} phase, by 
providing a set of criteria to be used in judging which of the 
generated materials
are of value. This alternation of generative and selective phases has been
noted before \cite{Boden1990}.
%
Information-dynamic criteria can also be used as \emph{constraints} on the
generative processes, for example, by specifying a certain temporal profile
of surprisingness and uncertainty the composer wishes to induce in the listener
as the piece unfolds.


In the Melody Triangle, sequential information measures are used to `map out' the parameter space of a stochastic generative musical process.  Positions within the triangle map directly to information-theoretic properties of the output, allowing a user to control the output in terms of these properties.  In the next section we describe the information-theoretic measures that we use.  


 




\subsection{Sequential Information Measures}\label{sec:Sequential_Information_Measures}
The \emph{entropy rate} of a random process is a basic measure of its randomness or
unpredictability. Consider the viewpoint of an observer at a certain time, and split the
sequence into an infinite \emph{past}, a single symbol in the \emph{present}, and the 
infinite \emph{future}. The entropy rate is a conditional entropy; informally:
\begin{equation}
	\mathrm{EntropyRate} = H( \mathrm{Present} | \mathrm{Past}),
\end{equation}
that is, it represents our average uncertainty about the present symbol \emph{given}
that we have observed everything before it. Processes with zero entropy rate can
be predicted perfectly given enough of the preceding context.
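
For a concrete case, consider the first-order Markov chains used later in this paper. As a sketch, with notation introduced here purely for illustration, let $a$ be a column-stochastic transition matrix with $a_{ij} = \Pr(X_t = i \mid X_{t-1} = j)$, and let $\pi$ be its stationary distribution, satisfying $a\pi = \pi$. Assuming the chain is stationary, and taking $0 \log 0$ to be $0$, the entropy rate reduces to a weighted average of the entropies of the columns of $a$:
\begin{equation}
	h(a) = -\sum_{j} \pi_j \sum_{i} a_{ij} \log_2 a_{ij},
\end{equation}
measured in bits. A permutation matrix, whose columns each contain a single 1, gives $h(a) = 0$.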

The \emph{redundancy} of a process, in the sense we are using the term here, is
a measure of how much the predictability of the process depends on knowing the
preceding context. It is the difference between the entropy of a single element of the
sequence in isolation (imagine choosing a note from a musical score at random with your 
eyes closed and then trying to guess the note) and its entropy after taking into account
the preceding context:
\begin{equation}
	\mathrm{Redundancy} = H( \mathrm{Present} ) - H(\mathrm{Present} | \mathrm{Past}).
\end{equation}
If the previous symbols reduce our uncertainty about the present symbol a great deal, then 
the redundancy is high. For example, if we know that a sequence consists of a repeating
cycle such as $ \ldots b, c, d, a, b, c, d, a \ldots$, but we don't know which was the first
symbol, then the redundancy is high: $H(\mathrm{Present})$ is high, because we
have no idea about the present symbol in isolation, but $H(\mathrm{Present}|\mathrm{Past})$
is zero, because knowing the previous symbol immediately tells us what the present symbol is.
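
To put numbers on this example (a worked illustration, assuming the four symbols occur equally often in the long run), each symbol has marginal probability $1/4$, so that
\begin{equation}
	H(\mathrm{Present}) = -\sum_{k=1}^{4} \frac{1}{4} \log_2 \frac{1}{4} = 2 \mbox{ bits},
\end{equation}
while $H(\mathrm{Present} | \mathrm{Past}) = 0$, giving a redundancy of $2 - 0 = 2$ bits. More generally, a strictly periodic sequence that cycles through $N$ distinct symbols has a redundancy of $\log_2 N$ bits.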

The \emph{predictive information rate} (PIR) brings in our uncertainty about the future. It is a
measure of how much each symbol reduces our uncertainty about the future as it is
observed, \emph{given} that we have observed the past:
\begin{equation}
	\mathrm{PIR} = H(\mathrm{Future} | \mathrm{Past}) - H(\mathrm{Future} | \mathrm{Present}, \mathrm{Past}).
\end{equation}
It is a measure of the \emph{new} information in each symbol.
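
Equivalently, the PIR is the conditional mutual information $I(\mathrm{Present}; \mathrm{Future} \mid \mathrm{Past})$. As a sketch for the special case of a stationary first-order Markov chain $\ldots, X_{t-1}, X_t, X_{t+1}, \ldots$ (an illustrative specialisation, with notation introduced here), the Markov property means that only the immediate neighbours of the present symbol matter:
\begin{equation}
	\mathrm{PIR} = I(X_t; X_{t+1} \mid X_{t-1}) = H(X_{t+1} \mid X_{t-1}) - H(X_{t+1} \mid X_t).
\end{equation}
The second term is the entropy rate of the chain, and the first is the corresponding conditional entropy when the chain is observed only at every second step. In this case the PIR is therefore the uncertainty about the next symbol that is removed by observing the present symbol in addition to the one before it.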
%Notice that if the past completely determines both the present and the future (as in the cyclic
%pattern above) the PIR is zero, since the present symbol brings no new information. However,
%if the symbols in a sequence are generated completely independently, e.g. by rolling a die for each
%one, then again, the present symbol provides no information about the future and the PIR
%is zero. 

The behaviour of the predictive information rate makes it interesting from a compositional point of view. The definition 
of the PIR is such that it is low both for extremely regular processes, such as constant
or periodic sequences, \emph{and} low for extremely random processes, where each symbol
is chosen independently of the others, in a kind of `white noise'. In the former case,
the pattern, once established, is completely predictable and therefore there is no
\emph{new} information in subsequent observations. In the latter case, the randomness
and independence of all elements of the sequence means that, though potentially surprising,
each observation carries no information about the ones to come.
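
As a worked illustration with a 12-symbol alphabet (the two processes are chosen here as examples, and values are in bits), the three measures at these two extremes are:
\begin{center}
\begin{tabular}{lccc}
Process & Entropy rate & Redundancy & PIR \\
\hline
12-symbol cycle & $0$ & $\log_2 12 \approx 3.58$ & $0$ \\
Uniform, independent & $\log_2 12 \approx 3.58$ & $0$ & $0$ \\
\end{tabular}
\end{center}
In the first row, the past already determines both the present and the future, so observing the present removes no further uncertainty. In the second row, the future is independent of both the past and the present, so observing the present again tells us nothing about what is to come.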

Processes with high PIR maintain a certain kind of balance between
predictability and unpredictability in such a way that the observer must continually
pay attention to each new observation as it occurs in order to make the best
possible predictions about the evolution of the sequence. This balance between predictability
and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}), 
which summarises the observations of Wundt \cite{Wundt1897} that stimuli are most
pleasing at intermediate levels of novelty or disorder, where there is a balance between
`order' and `chaos'. 

  \begin{fig}{wundt}
    \raisebox{-4em}{\colfig[0.43]{wundt}}
 %  {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ }
    {\ {\large$\longrightarrow$}\ }
    \raisebox{-4em}{\colfig[0.43]{wundt2}}
    \caption{
      The Wundt curve relating randomness/complexity with
      perceived value. Repeated exposure sometimes results
      in a move to the left along the curve \cite{Berlyne71}.
    }
  \end{fig}




\begin{figure}
\centering
\includegraphics[width=0.23\textwidth]{figs/PeriodicMatrix.png}
\includegraphics[width=0.23\textwidth]{figs/NonDeterministicMatrix_bw.png}
\caption{Two transition matrices.  The shade of grey represents the probability of transition from one symbol to the next (black=0, white=1). The current symbol is along the bottom, and in this case there are twelve possibilities (mapped to a chromatic scale).  The left-hand matrix has no uncertainty; it represents a periodic pattern. The right-hand matrix contains unpredictability, and so has a higher entropy rate, but is nonetheless not completely without perceivable structure. \label{TransitionMatrixes}}
\end{figure}



 \begin{fig}{mtriscat}
	\colfig[0.9]{mtriscat}
	\caption{The population of transition matrices in the 3D space of 
	entropy rate, redundancy and PIR, 
	all in bits.
	The concentrations of points along the redundancy axis correspond
	to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit),
	3, 4, \etc all the way to period 7 (redundancy 2.8 bits). Note that the highest PIR values are found at intermediate entropy
	and redundancy, and that the distribution as a whole makes a curved triangle. Although
	not visible in this plot, it is largely hollow in the middle.  \label{InfoDynEngine}}
\end{fig}



%\begin{figure}
%\centering
%\includegraphics[width=0.5\textwidth]{MatrixDistribution.png}
%\caption{The population of transition matrixes distributed along three axes of redundancy, entropy rate and predictive information rate.  Note how the distribution makes a curved triangle-like plane floating in 3d space.  \label{InfoDynEngine}}
%\end{figure}
% \begin{figure}[h]
%\centering
%\includegraphics[width=0.8\textwidth]{figs/TheTriangle.pdf}
%\caption{The Melody Triangle\label{TheTriangle}}
%\end{figure}


\subsection{Populating the triangle}\label{makingthetriangle}



Before the Melody Triangle can be used, it has to be populated with possible parameter values for the melody generators.    These are then plotted in a 3D statistical space of redundancy, entropy rate and predictive information rate.  In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method.   In Figure~\ref{InfoDynEngine} we see a representation of how these matrices are distributed in the 3D statistical space; each one of these points corresponds to a transition matrix.  




The distribution of transition matrices in this space forms a relatively thin
curved sheet. Thus, it is a reasonable simplification to project out the 
third dimension (the PIR) and present an interface that is just two dimensional. 
The right-angled triangle is rotated, reflected and stretched to form an equilateral triangle with
the `redundancy'/`entropy rate' vertex at the top, the `redundancy' axis down the left-hand
side, and the `entropy rate' axis down the right, as shown in \figrf{TheTriangle}.
This is our `Melody Triangle' and
forms the interface by which the system is controlled. 
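
One way to make this mapping concrete is the following sketch; the coordinate conventions are assumptions made here for illustration and are not taken from the implementation. For an alphabet of $N$ symbols, the entropy rate $h$ and redundancy $\rho$ satisfy $h \ge 0$, $\rho \ge 0$ and $h + \rho = H(\mathrm{Present}) \le \log_2 N$, so every process lies within a right-angled triangle. Writing $u = \rho / \log_2 N$ and $v = h / \log_2 N$, and placing the equilateral triangle with its top vertex at $(1/2, \sqrt{3}/2)$ and its base running from $(0,0)$ to $(1,0)$, an affine map with the orientation described above is
\begin{equation}
	x = \frac{1 - u + v}{2}, \qquad y = \frac{\sqrt{3}}{2} (1 - u - v),
\end{equation}
with inverse
\begin{equation}
	u = 1 - x - \frac{y}{\sqrt{3}}, \qquad v = x - \frac{y}{\sqrt{3}}.
\end{equation}
The origin $u = v = 0$ maps to the top vertex, maximal redundancy ($u = 1$) to the bottom-left vertex, and maximal entropy rate ($v = 1$) to the bottom-right vertex.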

\begin{fig}{TheTriangle}
	\colfig[1]{TheTriangle.pdf}
	\caption{The Melody Triangle}
\end{fig}	

%Using this interface thus involves a mapping to information space; 
The user selects a point within the triangle; this is mapped into the 
information space and the nearest transition matrix is used to generate
a sequence of values, which are then sonified either as pitched notes or percussive
sounds. 

Though the interface is 2D, the third dimension (PIR) is implicitly present, as 
transition matrices retrieved from
along the centre line of the triangle will tend to have higher PIR.

The corners correspond to three different extremes of predictability and
unpredictability, which could be loosely characterised as `periodicity', `noise'
and `repetition'.  Melodies from the `noise' corner (high entropy rate, low redundancy
and low predictive information rate) have no discernible pattern;
those along the `periodicity'
to `repetition' edge are all cyclic patterns that get shorter as we approach
the `repetition' corner, until each is just one repeating note. Those along the 
opposite edge consist of independent random notes from non-uniform distributions. 
Areas between the left and right edges will tend to have higher PIR, 
and we hypothesise that, under
the appropriate conditions, these will be perceived as more `interesting' or 
`melodic.'
These melodies have some level of unpredictability, but are not completely random;
or, conversely, they are predictable, but not entirely so.

 Given coordinates corresponding to a point in the triangle, we select from a pre-built
library of random processes, choosing one whose entropy rate and redundancy match the desired
values.  The implementations discussed in this paper use first-order Markov chains as the content generator,
since it is easy to compute the theoretically exact values of entropy rate, redundancy and predictive
information rate given the transition matrix of the Markov chain. However, in principle, any generative system could be used to create the library of sequences, given an appropriate probabilistic listener model supporting
the estimation of entropy rate and redundancy.
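
The lookup itself can be sketched as a nearest-neighbour search. Suppose, as an illustrative assumption (the distance actually used by the implementations is not specified here), that the library holds processes indexed by $k$ with precomputed entropy rates $h_k$ and redundancies $\rho_k$, and that the selected point corresponds to target values $(h^\ast, \rho^\ast)$. With a Euclidean distance in the information plane, the chosen process would be
\begin{equation}
	k^\ast = \arg\min_{k} \left[ (h_k - h^\ast)^2 + (\rho_k - \rho^\ast)^2 \right].
\end{equation}
Other distance measures, for example one computed in screen coordinates, could equally be used.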


\section{Implementations}


The Markov-chain-based implementation generates streams of symbols in the abstract; the alphabet of symbols is then mapped to a set of distinct sounds, such as pitched notes in a scale or a set of percussive
sounds.  Further, by layering these streams, intricate musical textures can be created. The selection of
notes or sounds is arbitrary, as long as they are all distinguishable.
%)le is not a part of the Melody Triangle's core functionality, i
Indeed, the symbols could even be mapped to non-sonic outputs such as visible shapes, colours, or movements.

The physical interface to the Triangle has so far been realised in two forms: as an interactive installation and as a screen-based interface.  A mobile phone version is currently under development.

\subsection{The Screen Interface}
The screen-based interface can serve as a compositional tool or performance interface.

 \begin{fig}{melTriScreenShot}
	\colfig[1]{melTriScreenShot}
	\caption{Screenshot of the Melody Triangle screen UI.  The transition matrices currently being played are displayed on the right.  Each token flashes whenever a note from its melody is rendered.  }
\end{fig}

%%A triangle is drawn on the screen, screen space thus mapped to the statistical
%space of the Melody Triangle.  
A number of tokens, each representing a
sonification stream or `voice', can be dragged in and around the triangle.
For each token, a sequence of symbols is sampled using the corresponding
transition matrix; these symbols are then mapped to notes of a scale or percussive sounds%
\footnote{The sampled sequence could easily be mapped to other musical processes, possibly over
different time scales, such as chords, dynamics and timbres. It would also be possible
to map the symbols to visual or other outputs.}%
.  Keyboard commands give control over other musical parameters such
as the pitch register, volume, scale, inter-onset interval and instrument for each voice.  
%The possibilities afforded by the Melody Triangle in these other domains remains to be investigated.}.
%
The system is capable of generating quite intricate musical textures when multiple tokens
are in the triangle.  The overlapping and interweaving of melodies of varying periodicity and predictability is well suited to making content that could stylistically be characterised as `minimalism'.   

This interface is quite unlike other computer aided composition tools or programming
environments, as here the composer exercises control at the abstract level of information-dynamic
properties.
A video of the interface in use can be viewed at \url{http://webprojects.eecs.qmul.ac.uk/hekeus/MelodyTriangle/MelodyTriangle.mov}.
%the interface relating to subjective expectation and predictability.






\subsection{The Interactive Installation}
The Melody Triangle was first implemented as a multi-user
interactive installation.  A Kinect camera tracks individuals in a space and
maps their positions in physical space to the triangle.  Each visitor
that enters the installation generates a melody and can collaborate with their
co-visitors to generate musical textures.  Additionally, the visitors can change the periodicity, register and instrumentation of their melody with body gestures.  When multiple people are in the space they can cooperate to create interweaving melodies, forming intricate polyphonic textures.
 This makes the interaction physically engaging
and (as our experience with visitors both young and old has demonstrated) more playful.
%Additionally visitors can change the 
%tempo, register, instrumentation and periodicity of their melody with body gestures.

As an artefact this installation is an exploratory prototype and occupies an ambiguous role in terms of purpose; it is in a nebulous middle ground between instrument, art installation and technical demonstration. It is clear, however, that as a vehicle for communicating ideas related to expectation, pattern and predictability in music to the public, it is very effective.


\section{User trials with the Melody Triangle}
We are currently in the process of using the screen-based
Melody Triangle user interface to investigate the relationship between the information-dynamic
characteristics of sonified Markov chains and subjective musical preference.
We carried out a pilot study with six participants, who were asked
to use a simplified form of the user interface (a single controllable token,
and no rhythmic, registral or timbral controls) under two conditions:
one where a single sequence was sonified under user control, and another
where an additional sequence was sonified in a different register, as if generated
by a fixed invisible token in one of four regions of the triangle. In addition, subjects
were asked to press a key if they `liked' what they were hearing.

We recorded subjects' behaviour as well as points which they marked
with a key press.
Some results for two of the subjects are shown in \figrf{mtri-results}. Though
we have not been able to detect any systematic across-subjects preference for any particular
region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour.
Our initial hypothesis, that subjects would linger longer in regions of the triangle
that produced aesthetically preferable sequences, and that this would tend to be towards the
centre line of the triangle for all subjects, was not confirmed. However, it is possible
that the design of the experiment encouraged an initial exploration of the space (sometimes
very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and
how the system works, rather than finding musical patterns. It is also possible that the
system encourages users to create musically interesting output by \emph{moving the token},
rather than finding a particular spot in the triangle which produces a musically interesting
sequence by itself.

\begin{fig}{mtri-results}
	\def\scat#1{\colfig[0.42]{mtri/#1}}
	\def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
	\begin{tabular}{cc}
%		\subj{a} \\
		\subj{b} \\
		\subj{c} \\
		\subj{d}
	\end{tabular}
	\caption{Dwell times and mark positions from user trials with the
	on-screen Melody Triangle interface, for three subjects. The left-hand column shows
	the positions in a 2D information space (entropy rate vs redundancy
	in bits) where each spent their time; the area of each circle is proportional
	to the time spent there. The right-hand column shows points which subjects
	`liked'; the area of the circles here is proportional to the duration spent at
	that point before the point was marked.}
\end{fig}

Comments collected from the subjects
%during and after the experiment 
suggest that
the information-dynamic characteristics of the patterns were readily apparent
to most: several noticed the main organisation of the triangle,
with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
notes towards the opposite corner. Some described their systematic exploration of the space.
Two felt that the right side was `more controllable' than the left (a consequence
of their ability to return to a particular distinctive pattern and recognise it
as one heard previously). Two reported that they became bored towards the end,
but another felt there wasn't enough time to `hear out' the patterns properly.
%One subject did not `enjoy' the patterns in the lower region, but another said the lower
%central regions were more `melodic' and `interesting'.

We plan to continue the trials with a slightly less restricted user interface in order
to make the experience more enjoyable and thereby give subjects longer to use the interface;
this may allow them to get beyond the initial exploratory phase and give a clearer
picture of their aesthetic preferences. In addition, we plan to conduct a
study under more restrictive conditions, where subjects will have no control over the patterns
other than to signal (a) which of two alternatives they prefer in a forced
choice paradigm, and (b) when they are bored of listening to a given sequence.

\subsection{Qualitative Feedback}

We have begun collecting qualitative feedback from users.  Unlike the pilot study, where participants knew nothing about the interface beforehand and were asked to `explore' with as little instruction as possible, here potential users, who are music practitioners, are first taught how to use the system.  They are then given some time to play and experiment, and feedback and criticism of the system are sought in informal discussion.  This is with a view to establishing what features would be desired for any eventual further development of the interface, for instance as a VST instrument for inclusion in a standard audio production environment.  

Feedback thus far has been positive. Some of the points collected so far are briefly listed here:

\begin{itemize}
\item The subjects were very quick to get to grips with the properties of the different areas of the triangle, and found it quite intuitive.
\item The more periodic/predictable half of the triangle was used considerably more.    
\item They expressed interest in its potential as a live performance interface for electronic music.
\item They would like to be able to map the output of the triangle to parameters other than just notes, such as the control of filters and effect parameters.  
\end{itemize}


\subsection{The Mobile App}
The Melody Triangle is being implemented as an Android mobile phone application.  The audio engine is developed in libpd\footnote{\url{http://libpd.cc/}}, a port of the open source Pure Data programming environment. The app will allow users to use a phone's touch screen to drag tokens around the screen and generate intricate musical textures.  It is essentially a user-friendly, generative, perpetual minimalist music-making app.  Additionally, it will collect usage statistics, keep a record of the users' favourite positions in the triangle, and periodically upload these to our servers.  It is hoped that the app will be downloaded and used by a large number of people, and that the usage statistics collected will allow us to draw parallels between musical preferences and the information dynamics measures.

 \begin{fig}{mobile}
	\colfig[1]{mobile}
	\caption{The Melody Triangle mobile phone app  }
\end{fig}



\bibliography{all,c4dm,all2}\bibliographystyle{aaai} 
\end{document}