\documentclass{NIME-alternate} % [dvips] ??
\newcommand{\comment}[1] {}
\usepackage{multirow,url,tools}
\usepackage[ps,dvips,all]{xy}
%\usepackage[psamsfonts]{amsfonts}
% \DeclareMathAlphabet\CMcal{OMS}{cmsy}{m}{n}
%% \SetMathAlphabet\CMcal{bold}{OT1}{cmsy}{m}{n}
% \renewcommand{\mathcal}{\CMcal}
\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}%
%\let\expect=\avg

\CopyrightYear{2012}   %will cause 2012 to appear in the copyright line.
\crdata{Copyright remains with the author(s).}
\conferenceinfo{NIME'12,}{May 21 -- 23, 2012, University of Michigan,
  Ann Arbor.}

%TODO
%
%formal descriptions of redundancy, entropy rate, predictive information rate
%discussion on its use as a composition assistant..
%better triangle diagram (fix shading)
%
%experiment section
%
\title{The Melody Triangle - Pattern and Predictability in Music}
\numberofauthors{4}
\author{
\alignauthor
Henrik Ekeus\\
        \affaddr{Queen Mary University of London}\\
        \affaddr{Media and Arts Technology}\\
        \affaddr{School of Electronic Engineering and Computer Science}\\
       \email{hekeus@eecs.qmul.ac.uk}
\alignauthor
Samer Abdallah\\
       \affaddr{Queen Mary University of London}\\
       \affaddr{Centre for Digital Music}\\
       \affaddr{School of Electronic Engineering and Computer Science}\\
       \email{samer.abdallah@\\eecs.qmul.ac.uk}
\and
\alignauthor
Mark D. Plumbley\\
       \affaddr{Queen Mary University of London}\\
       \affaddr{Centre for Digital Music}\\
       \affaddr{School of Electronic Engineering and Computer Science}\\
       \email{mark.plumbley@\\eecs.qmul.ac.uk}
% 4th. author
\alignauthor
Peter W. McOwan\\
       \affaddr{Queen Mary University of London}\\
       \affaddr{Computer Vision Group}\\
       \affaddr{School of Electronic Engineering and Computer Science}\\
       \email{Peter.McOwan@\\eecs.qmul.ac.uk}
}
\date{7 February 2012}
\begin{document}
\maketitle
\begin{abstract}
%The Melody Triangle is a Markov-chain based melody generator where the input - positions within a triangle - directly map to information theoretic measures of its output.

The Melody Triangle is an exploratory interface for the discovery of melodic content, where the input, a position within a triangle, maps directly to information-theoretic measures of the output.  The measures are the entropy rate, redundancy and \emph{predictive information rate}\cite{Abdallah:2009p4089} of the melody. Predictive information rate is an information measure developed as part of the Information Dynamics of Music project\footnote{(IDyOM) http://www.idyom.org/}.  It characterises temporal structure and is a way of modelling expectation and surprise in the perception of music.

We describe the information dynamics approach, how it forms the basis of the Melody Triangle, and outline two of its incarnations. The first is a multi-user installation where collaboration in a performative setting provides a playful yet informative way to explore expectation and surprise in music.  The second is a screen-based interface where the Melody Triangle becomes a compositional tool for the generation of intricate musical textures using an abstract, high-level description of predictability. Finally, we outline a pilot study where the screen-based interface was used under experimental conditions to determine how the three measures of predictive information rate, entropy rate and redundancy might relate to musical preference.

\end{abstract}
\keywords{Information dynamics, Markov chains, Collaborative performance, Aleatoric composition, Information theory}

\section{Information dynamics}

Music involves patterns in time.  When listening to music we continually build and re-evaluate expectations of what is to come next.  Composers commonly, consciously or not, play with this by setting up expectations which may or may not be fulfilled.  This manipulation of expectation and surprise in the listener has been articulated by the music theorists Meyer\cite{Meyer:1967} and Narmour\cite{Narmour:1977}.  Core to this is the idea that music is not a static object presented as a whole, as a Lerdahl and Jackendoff analysis\cite{Lerdahl:1983} would imply, but a phenomenon that `unfolds' and is experienced \emph{in time}.

The information dynamics approach\cite{Abdallah:2009p4089} considers several different kinds of predictability in musical patterns, how human listeners might perceive these, and how they shape or affect the listening experience.  Central to this is the idea that listeners maintain a dynamically evolving statistical model that enables them to make predictions about how a piece of music will continue.  They do this using both the immediate context of the piece and their previous musical experience.  As the music unfolds, listeners continually revise their model; in other words, they revise their own subjective probabilistic belief state.

\section{The Melody Triangle}
%%%How we created the transition matrixes and created the triangle.
The Melody Triangle enables the discovery of melodic content matching a set of information-theoretic criteria.  These criteria are the user input and map to positions within a triangle.  How exactly the triangle is formed relative to the information-theoretic measures is outlined in Section~\ref{makingthetriangle}.  The interface to the triangle may come in different forms; so far it has been realised as an interactive installation and as a traditional screen-based interface.

The Melody Triangle does not generate the melodic content itself, but rather selects appropriate parameters for another system to generate the content.  The implementations discussed in this paper use first-order Markov chains as the content generator.  However, any generative system can be used, so long as it is possible to define a listener model to calculate the appropriate information measures.

The Triangle operates on streams of symbols, and it is by mapping the symbols to individual notes that melodies are generated.  Further, by layering these streams, intricate musical textures can be created. The choice of notes or scale is not part of the Melody Triangle's core functionality; in fact the symbols could be mapped to anything, even non-sonic outputs.

Any sequence of symbols can be analysed and information theoretic measures taken from it.  The novelty of the Melody Triangle lies in that we go `backwards' - given desired values for these measures, as determined from the user interface, we return a stream of symbols that match those measures.  The information measures used are redundancy, entropy rate and predictive information rate.

\subsection{Information measures}
\subsubsection{Redundancy}
[todo (Samer) - a more formal description]
Redundancy is the difference between our uncertainty about the next symbol before we look at the context (given by the fixed point distribution) and our uncertainty after we look at the context.  For instance, a transition matrix with high redundancy, such as one that represents a long periodic sequence, has high uncertainty before we look at the context, but as soon as we look at the previous symbol the uncertainty drops to zero, because we now know what is coming next.
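
As a sketch of how this might be formalised, assume the generator is a stationary first-order Markov chain over $N$ symbols with transition matrix $a$, where $a_{ij}$ is the probability of moving to symbol $i$ from symbol $j$ (so each column sums to one), and fixed point distribution $\pi$ satisfying $\pi_i = \sum_j a_{ij}\pi_j$.  This notation is introduced here purely for illustration.  The redundancy $\rho$ can then be written as the entropy of $\pi$ minus the average entropy of the columns of $a$:
\[
\rho \;=\; -\sum_{i=1}^{N} \pi_i \log \pi_i \;+\; \sum_{j=1}^{N} \pi_j \sum_{i=1}^{N} a_{ij} \log a_{ij},
\]
with the convention $0 \log 0 = 0$.  The first term is the uncertainty before the context is seen; the second term subtracts the uncertainty that remains once the previous symbol is known.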
\subsubsection{Entropy rate}
[todo (Samer) - a more formal description]
Entropy rate is the average uncertainty about the next symbol as we go through the sequence.  A looping sequence has zero entropy rate, whereas a sequence that is difficult to predict has a high entropy rate.  Entropy rate is an average of `surprisingness' over time.
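
One way to make this precise, assuming a stationary first-order Markov chain over $N$ symbols whose transition matrix $a$ has entries $a_{ij}$ giving the probability of moving to symbol $i$ from symbol $j$, and whose fixed point distribution is $\pi$ (notation introduced here for illustration), is to write the entropy rate $h$ as
\[
h \;=\; -\sum_{j=1}^{N} \pi_j \sum_{i=1}^{N} a_{ij} \log a_{ij},
\]
with the convention $0 \log 0 = 0$.  Each column of a deterministic loop contains a single one, so every term vanishes and $h = 0$; when every column is uniform, $h$ reaches its maximum of $\log N$.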

\subsubsection{Predictive Information Rate}
[todo (Samer) - a more formal description]
Predictive information rate tells us the average reduction in uncertainty upon perceiving a symbol; in a system with high predictive information rate, each symbol tells us more about the next one.

If we imagine a purely periodic sequence, each symbol tells us nothing about the next one that we did not already know, since we already know how the pattern goes.  Similarly, with a seemingly uncorrelated sequence, seeing the next symbol does not tell us anything more, because the symbols are completely independent anyway; there is no pattern.  There is a subset of transition matrixes that have high predictive information rate, and it is neither the periodic ones nor the completely uncorrelated ones.  Rather, they tend to yield output with certain characteristic patterns, although a listener cannot necessarily know when these will occur.  A certain sequence of symbols might, however, tell us which one of the characteristic patterns will show up next.  Each symbol tells us a little bit about the future but nothing about the infinite future; we only learn about that as time goes on, so there is a continual building of prediction.
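
For a stationary first-order Markov chain this can be sketched as follows, with notation assumed here for illustration: $a_{ij}$ is the probability of moving to symbol $i$ from symbol $j$, $\pi$ is the fixed point distribution, and $a^{(2)}_{ij} = \sum_k a_{ik} a_{kj}$ are the two-step transition probabilities.  The predictive information rate $b$ is then the uncertainty about the next symbol given only the symbol before the current one, minus the uncertainty given the current one:
\[
b \;=\; -\sum_{j} \pi_j \sum_{i} a^{(2)}_{ij} \log a^{(2)}_{ij} \;+\; \sum_{j} \pi_j \sum_{i} a_{ij} \log a_{ij},
\]
with the convention $0 \log 0 = 0$.  For a deterministic loop both terms are zero, and for independent symbols every column of $a$ is the same, so that $a^{(2)} = a$ and the two terms cancel; this is consistent with the two cases described above.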


\begin{figure}
\centering
\includegraphics[width=0.2\textwidth]{PeriodicMatrix.png}
\includegraphics[width=0.2\textwidth]{NonDeterministicMatrix_bw.png}
\caption{Two transition matrixes.  The shading represents the probability of transition from one symbol to the next (black=0, white=1). The current symbol is along the bottom, and in this case there are twelve possibilities (mapped to a chromatic scale).  The left-hand matrix has no uncertainty; it represents a periodic pattern. The right-hand matrix contains unpredictability but is nonetheless not completely without perceivable structure; it has a higher entropy rate. \label{TransitionMatrixes}}
\end{figure}


   \begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{MatrixDistribution.png}
\caption{The population of transition matrixes distributed along three axes of redundancy, entropy rate and predictive information rate.  Note how the distribution makes a curved triangle-like plane floating in 3d space.  \label{InfoDynEngine}}
\end{figure}
  \begin{figure}[h]
\centering
\includegraphics[width=0.5\textwidth]{TheTriangle.pdf}
\caption{The Melody Triangle\label{TheTriangle}}
\end{figure}

\subsection{Populating the triangle}\label{makingthetriangle}

Before the Melody Triangle can be used, it has to be `populated' with possible parameter values for the melody generators.  These are then plotted in a 3d statistical space of redundancy, entropy rate and predictive information rate.  In our case we generated thousands of transition matrixes, representing first-order Markov chains, by a random sampling method.  In Figure~\ref{InfoDynEngine} we see a representation of how these matrixes are distributed in the 3d statistical space; each one of these points corresponds to a transition matrix.
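
The sampling method is not specified further here.  Purely as an illustrative assumption, and not necessarily the method we used, one simple scheme draws each column $a_{:j}$ of an $N \times N$ transition matrix independently from a symmetric Dirichlet distribution with concentration parameter $\alpha > 0$:
\[
a_{:j} \sim \mathrm{Dirichlet}(\alpha, \ldots, \alpha), \qquad j = 1, \ldots, N .
\]
Small values of $\alpha$ give columns concentrated on a few symbols (nearly deterministic chains), while large values give nearly uniform columns (nearly random chains), so varying $\alpha$ from sample to sample spreads the population across a range of entropy rates.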


When we look at the distribution of transition matrixes plotted in this space, we see that it forms an arch shape that is fairly thin.  It thus becomes a reasonable approximation to pretend that it is just a sheet in two dimensions; and so we stretch out this curved arc into a flat triangle.  It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled.


   When the Melody Triangle is used, regardless of whether it is as a screen based system, or as an interactive installation, it involves a mapping to this statistical space.  When the user, through the interface, selects a position within the triangle, the corresponding transition matrix is returned. Figure \ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate.
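
How a position is matched to a stored matrix is not detailed here; a minimal sketch, assuming a nearest-neighbour lookup, is as follows.  Suppose the $k$-th sampled transition matrix has been placed at position $q_k$ in the flattened triangle, and the user selects position $p$ (both symbols are hypothetical).  The matrix returned would then be the one with index
\[
k^{*} \;=\; \arg\min_{k} \, \| p - q_k \| ,
\]
that is, the stored matrix whose position in the triangle lies closest to the selected point.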

%%%paragraph explaining what the different parts of the triangle are like.
The three corners correspond to three different extremes of predictability and unpredictability, which could be loosely characterised as `periodicity', `noise' and `repetition'. Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. These melodies are essentially totally random. Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until they become just one repeating note.  It is the areas in between the extremes that provide the more `interesting' melodies, that is, those that have some level of unpredictability but are not completely random or, conversely, that are predictable but not entirely so.  This triangular space allows for an intuitive exploration of expectation and surprise in temporal sequences based on a simple model of how one might guess the next event given the previous one.
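
As a worked illustration, assume the usual definitions for a stationary first-order Markov chain over $N$ symbols: entropy rate $h = H(X_{t+1} \mid X_t)$, redundancy $\rho = H(X_t) - h$, and predictive information rate $b = H(X_{t+1} \mid X_{t-1}) - h$ (the symbols $h$, $\rho$ and $b$ are introduced here for illustration).  A deterministic loop of length $L$ has a uniform fixed point distribution over its $L$ symbols and no uncertainty once the previous symbol is known, so
\[
h = 0, \qquad \rho = \log L, \qquad b = 0 .
\]
Taking $L = N$ gives the `periodicity' corner, and $L = 1$ gives the `repetition' corner, where $\rho$ also falls to zero.  At the `noise' corner every symbol is drawn uniformly and independently, so
\[
h = \log N, \qquad \rho = 0, \qquad b = 0 .
\]
All three extremes therefore have zero predictive information rate, which is why the matrixes with higher values lie away from the corners.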


\section{User Interfaces}
The Melody Triangle engine\footnote{developed in Prolog and MATLAB} is controlled via OSC messages, and thus any number of interfaces could be developed for it. Currently two different interfaces exist: a standard screen-based interface where a user moves tokens with a mouse in and around a triangle on screen, and a multi-user interactive installation where a Kinect\footnote{http://www.xbox.com/en-GB/Kinect} camera tracks individuals in a space and maps their positions in the space to the triangle.

\subsection{The Multi-User Installation}

\begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{kinnect.pdf}
\caption{The depth map as seen by the Kinect, with bounding boxes outlining the blobs detected by OpenNI.\label{Kinect}}
\end{figure}

As a Kinect camera overlooks a space, its range naturally forms a triangle.  As visitors come into the range of the camera, they start generating a melody, the statistical properties of which are determined by the mapping of physical space to statistical space as discussed above.  Thus, by exploring the physical space, the participant explores the predictability of the generated melodic content.  When multiple people are in the space, they can cooperate to create polyphonic musical textures.

The streams of symbols are mapped to MIDI and then played with software instruments in Logic.  The tracking system was capable of detecting gestures, and these were mapped to different musical effects such as tempo changes, periodicity changes (going to the off-beat), instrument/register changes and volume (see Table~\ref{gestures}).

\subsubsection{Tracking and Control}

Tracking and control were done using the OpenNI libraries' API\footnote{http://OpenNi.org/} and high-level middleware for tracking with Kinect.  This provided reliable blob tracking of humanoid forms in 2d space.  By combining this with the Kinect's depth map it became possible to get reliable coordinates of visitors' positions in the space.

This system was extended to detect gestures.  By detecting the bounding boxes of the 2d blobs of individuals in the space, and then normalising these by the distance given by the depth map, it became possible to work out whether an individual had an arm stretched out or was crouching.
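
The normalisation can be sketched under a simple pinhole camera assumption; the symbols below are hypothetical and are not taken from the implementation.  If a blob's bounding box is $w$ pixels wide, the depth map gives its distance as $d$, and the camera's focal length is $f$ pixels, then the physical width of the blob is approximately
\[
W \;\approx\; \frac{w \, d}{f} .
\]
Because $W$ does not depend on how far the person stands from the camera, an outstretched arm could be flagged when $W$ exceeds that person's resting width by some margin, and a crouch when the corresponding normalised height drops.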

With this it was possible to define a series of gestures for controlling the system without the use of any controllers (see Table~\ref{gestures}).  Thus, for instance, sticking out one's left arm quickly doubles the tempo of the melody.  By pulling one's left arm in at the same time as sticking the right arm out, the melody would shift onto the off-beat.  Sticking out both arms would change instrument.

\begin{table}
\centering
%\includegraphics[width=0.5\textwidth]{InstructionsText.pdf}
\caption{Gestures and their resulting effect\label{gestures}}
\begin{tabular}{ l c l }
Left arm & Right arm & Effect\\
\hline
  out & static & double tempo \\
  in & static & halve tempo \\
  static & out & triple tempo \\
  static & in & one-third tempo\\
  out & in & shift to off-beat \\
  out & out & change instrument\\
  in & in & reset tempo\\
\end{tabular}
\end{table}

\subsubsection{Observations}
Although visitors needed an initial bit of training, they could then quickly begin to design musical textures collaboratively.  For example, one person could lay down a predictable repeating bass line by keeping to the periodicity/repetition side of the room, while a companion could generate a freer melodic line by being nearer the `noise' part of the space.


The collaborative nature of this installation is one area that merits attention.  Because no single user could control the whole narrative, the participants would communicate verbally and direct each other, first towards learning to use the system and eventually towards finding interesting musical textures.  The collaborative nature added an element of playfulness and enjoyment that was readily apparent.

As an artefact this installation is an exploratory prototype, and occupies an ambiguous role in terms of purpose; it is in a nebulous middle ground between instrument and art installation.  What is clear is that, as a vehicle for communicating ideas related to expectation, pattern and predictability in music, it is very effective.

\subsection{The Screen Based Interface}

\begin{figure}
\centering
\includegraphics[width=0.3\textwidth]{UIscreenshot.png}
\caption{Screen shot of the screen based interface for the Melody Triangle\label{UIScreenShot}}
\end{figure}

The Melody Triangle can also be explored with a standard keyboard and mouse interface.  A triangle is drawn on the screen, with screen space thus mapped to the statistical space of the Melody Triangle.  A number of round tokens, each representing a melody, can be dragged in and around the triangle.  When a token is dragged into the triangle, the system starts generating a sequence of notes with statistical properties that correspond to the token's position in the triangle.

Additionally there are a number of keyboard controls.  These include controls for changing the overall tempo, for enabling and disabling individual voices, changing registers, going to off-beats and changing the speed of individual voices.  The system gives visual feedback to indicate when a token has locked on to a new melody, and contains a buffer zone for allowing tokens to be pushed right to the edges of the triangle without falling out.

In this mode, the Melody Triangle can be used as a kind of composition assistant for the generation of interesting musical textures and melodies. However, unlike other computer-aided composition tools or programming environments, here the composer engages with the musical process on a very high and abstract level; notions of predictability, expectation and surprise are the control parameters.


\section{Musical Preference and Information Dynamics Study}
We carried out a preliminary study that sought to identify any correlation between aesthetic preference and the information-theoretic measures of the Melody Triangle.  In this study participants were asked to use the screen-based interface, but it was simplified so that all they could do was move tokens around.  To help discount visual biases, the axes of the triangle were randomly rearranged for each participant.

The study was divided into two parts. The first investigated musical preference with respect to single melodies at different tempos. In the second part, a background melody was played and the participants were asked to find a second melody that `works well' with the background melody. For each participant this was done four times, each with a different background melody from four different areas of the Melody Triangle.

After the study the participants were surveyed with the Goldsmiths Musical Sophistication Index\cite{Mullensiefen:2011ts} to elicit their prior musical experience.

\subsection{Results}
[todo]


\subsection{Observation/Discussion}
[todo]

\section{Further Work}
Because first-order Markov chains are used, the generated patterns do not have any long-term structure or form, and as such the melodies do not seem to `go anywhere' in the long term.  The Melody Triangle is therefore better suited to creating textures and patterns as opposed to composing over-arching musical structures.

We are currently investigating how higher-order Markov models can be mapped to information-theoretic measures and whether the Melody Triangle could be adapted to those models.  This would generate higher-level patterns and provide more long-term structure.

As it stands, the streams of symbols generated are only mapped to note values.  However, they could just as well be applied to any other musical property, such as intervals, chords, dynamics, timbres, structures and key changes.  The possibilities for the Melody Triangle to be a compositional guide in these other domains remain to be investigated.

The Melody Triangle in its current form is, however, an ideal tool for investigations into musical preference and its relationship to the information dynamics models, and as such more detailed studies under wider experimental conditions and with more participants will be carried out.

\section{Acknowledgments}
This work is supported by EPSRC Doctoral Training Centre EP/G03723X/1 (HE), GR/S82213/01 and EP/E045235/1 (SA), an EPSRC Leadership Fellowship, EP/G007144/1 (MDP) and EPSRC IDyOM2 EP/H013059/1.  Thanks to Louie McCallum and Davie Smith from QMUL EECS for Kinect programming support.
\bibliographystyle{abbrv}
\bibliography{nime}
\end{document}
author	Henrik Ekeus <hekeus@eecs.qmul.ac.uk>
date	Tue, 07 Feb 2012 23:40:37 +0000