changeset 54:70bfa77c1476
camera ready version
author | Henrik Ekeus <hekeus@eecs.qmul.ac.uk> |
---|---|
date | Fri, 24 Aug 2012 14:40:59 +0100 |
parents | 508760245ab1 |
children | 2cad1d57f7e2 |
files | mume2012/MelodyTriangleMUME2012.pdf mume2012/MelodyTriangleMUME2012.tex mume2012/figs/InstructionsImage3.pdf mume2012/figs/NonPeriodicMatrix.pdf mume2012/figs/PeriodicMatrix.pdf mume2012/figs/kinnect.pdf mume2012/figs/melTriScreenShot.pdf mume2012/figs/mobile.pdf mume2012/figs/mtri/scat_dwells_subj_a-eps-converted-to.pdf mume2012/figs/mtri/scat_marks_subj_a-eps-converted-to.pdf mume2012/figs/mtriscat.pdf mume2012/figs/wundt.pdf mume2012/figs/wundt2.pdf mume2012/figures_old/InstructionsImage3.pdf mume2012/mume2012.pdf mume2012/mume2012_review.pdf mume2012/mume2012_review.tex |
diffstat | 17 files changed, 1054 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/mume2012/MelodyTriangleMUME2012.tex Fri Aug 24 14:40:59 2012 +0100 @@ -0,0 +1,586 @@ +%File: MelodyTriangleMUME2012.tex +\documentclass{article} +\usepackage{aaai} +\usepackage{times} +\usepackage{helvet} +\usepackage{courier} +\frenchspacing +%TODO +\pdfinfo{ +/Title (Melody Triangle Mume (todo)) +/Subject(todo) +/Author(todo)} +\usepackage{cite} + +\usepackage{graphicx} +\usepackage{amssymb} +\usepackage{epstopdf} +\usepackage{url} +\usepackage{listings} +%\usepackage[expectangle]{tools} +\usepackage{tools} +\usepackage{fixfloats} +\usepackage{tikz} +\usetikzlibrary{calc} +\usetikzlibrary{matrix} +\usetikzlibrary{patterns} +\usetikzlibrary{arrows} + +\let\citep=\cite +\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figures/#2}}% +\newcommand\preals{\reals_+} +\newcommand\X{\mathcal{X}} +\newcommand\Y{\mathcal{Y}} +\newcommand\domS{\mathcal{S}} +\newcommand\A{\mathcal{A}} +\newcommand\Data{\mathcal{D}} +\newcommand\rvm[1]{\mathrm{#1}} +\newcommand\sps{\,.\,} +\newcommand\Ipred{\mathcal{I}_{\mathrm{pred}}} +\newcommand\Ix{\mathcal{I}} +\newcommand\IXZ{\overline{\underline{\mathcal{I}}}} +\newcommand\x{\vec{x}} +\newcommand\Ham[1]{\mathcal{H}_{#1}} +\newcommand\subsets[2]{[#1]^{(k)}} +\def\bet(#1,#2){#1..#2} + + +\def\ev(#1=#2){#1\!\!=\!#2} +\newcommand\rv[1]{\Omega \to #1} +\newcommand\ceq{\!\!=\!} +\newcommand\cmin{\!-\!} +\newcommand\modulo[2]{#1\!\!\!\!\!\mod#2} + +\newcommand\sumitoN{\sum_{i=1}^N} +\newcommand\sumktoK{\sum_{k=1}^K} +\newcommand\sumjtoK{\sum_{j=1}^K} +\newcommand\sumalpha{\sum_{\alpha\in\A}} +\newcommand\prodktoK{\prod_{k=1}^K} +\newcommand\prodjtoK{\prod_{j=1}^K} + +\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}} +\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}} +\newcommand\parity[2]{P^{#1}_{2,#2}} + + +%%%%%%%%%%%%%%%%%%%%%%%% Some useful packages %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%% See related documentation 
%%%%%%%%%%%%%%%%%%%%%%%%%% +%\usepackage{amsmath} % popular packages from Am. Math. Soc. Please use the +%\usepackage{amssymb} % related math environments (split, subequation, cases, +%\usepackage{amsfonts}% multline, etc.) +%\usepackage{bm} % Bold Math package, defines the command \bf{} +%\usepackage{paralist}% extended list environments +%%subfig.sty is the modern replacement for subfigure.sty. However, subfig.sty +%%requires and automatically loads caption.sty which overrides class handling +%%of captions. To prevent this problem, preload caption.sty with caption=false +%\usepackage[caption=false]{caption} +%\usepackage[font=footnotesize]{subfig} + + +%user defined variables +\def\papertitle{The Melody Triangle - Pattern and Predictability in Music} +\def\firstauthor{Henrik Ekeus} +\def\secondauthor{Samer A. Abdallah} +\def\thirdauthor{Mark D. Plumbley} +\def\fourthauthor{Peter W. McOwan} + +% adds the automatic +% Saves a lot of output space in PDF... after conversion with the distiller +% Delete if you cannot get PS fonts working on your system. 
+ +% pdf-tex settings: detect automatically if run by latex or pdflatex +\newif\ifpdf +\ifx\pdfoutput\relax +\else + \ifcase\pdfoutput + \pdffalse + \else + \pdftrue +\fi +\fi + +\ifpdf % compiling with pdflatex + \usepackage[pdftex, + pdftitle={\papertitle}, + pdfauthor={\firstauthor, \secondauthor, \thirdauthor, \fourthauthor}, + bookmarksnumbered, % use section numbers with bookmarks + pdfstartview=XYZ % start with zoom=100% instead of full screen; + % especially useful if working with a big screen :-) + ]{hyperref} + %\pdfcompresslevel=9 + + %\usepackage[pdftex]{graphicx} + % declare the path(s) where your graphic files are and their extensions so + %you won't have to specify these with every instance of \includegraphics + %\graphicspath{{./figures/}} + %\DeclareGraphicsExtensions{.pdf,.jpeg,.png} + + \usepackage[figure,table]{hypcap} + +\else % compiling with latex + \usepackage[dvips, + bookmarksnumbered, % use section numbers with bookmarks + pdfstartview=XYZ % start with zoom=100% instead of full screen + ]{hyperref} % hyperrefs are active in the pdf file after conversion + + \usepackage[dvips]{epsfig,graphicx} + % declare the path(s) where your graphic files are and their extensions so + %you won't have to specify these with every instance of \includegraphics + \graphicspath{{./figures/}} + \DeclareGraphicsExtensions{.eps} + + \usepackage[figure,table]{hypcap} +\fi + +%setup the hyperref package - make the links black without a surrounding frame +\hypersetup{ + colorlinks,% + citecolor=black,% + filecolor=black,% + linkcolor=black,% + urlcolor=black +} + + +% Title. +% ------ +\title{\papertitle} + +% Authors +% Please note that submissions are NOT anonymous, therefore +% authors' names have to be VISIBLE in your manuscript. 
+% +% Single address +% To use with only one author or several with the same address +% --------------- +\oneauthor + {\firstauthor, \secondauthor, \thirdauthor, \fourthauthor} {Queen Mary University of London \\ Centre for Digital Music \\ School of Electronic Engineering and Computer Science\\% + {\tt \href{mailto:hekeus@eecs.qmul.ac.uk}{\{hekeus,samer.abdallah\}@eecs.qmul.ac.uk}}} + +%Two addresses +%-------------- +% \twoauthors +% {\firstauthor} {Affiliation1 \\ % +% {\tt \href{mailto:author1@smcnetwork.org}{author1@smcnetwork.org}}} +% {\secondauthor} {Affiliation2 \\ % +% {\tt \href{mailto:author2@smcnetwork.org}{author2@smcnetwork.org}}} + +% Three addresses +% -------------- +% \threeauthors +% {\firstauthor} {Affiliation1 \\ % +% {\tt \href{mailto:author1@smcnetwork.org}{author1@smcnetwork.org}}} +% {\secondauthor} {Affiliation2 \\ % +% {\tt \href{mailto:author2@smcnetwork.org}{author2@smcnetwork.org}}} +% {\thirdauthor} { Affiliation3 \\ % +% {\tt \href{mailto:author3@smcnetwork.org}{author3@smcnetwork.org}}} +% + +% ***************************************** the document starts here *************** +\begin{document} +% +\capstartfalse +\maketitle +\capstarttrue +% +\begin{abstract} +The Melody Triangle is an interface for the discovery of melodic materials, in which the input (a position within a triangle) maps directly to information-theoretic properties of the output. The measures are the entropy rate, redundancy and \emph{predictive information rate}~\cite{Abdallah:2009p4089} of the random process used to generate the sequence of notes. These are all related to the \emph{predictability} of the sequence and as such address the notions of expectation and surprise in the perception of music. We describe some of the relevant ideas from information dynamics, explain how the Melody Triangle is defined in terms of them, and present two physical incarnations of the Melody Triangle. 
The first is a multi-user installation where collaboration in a performative setting provides a playful yet informative way to explore expectation and surprise in music. The second is a screen-based interface where the Melody Triangle becomes a cognitively-informed compositional aid for the generation of musical textures, with the user exercising control at the abstract level of randomness and predictability. Finally, we outline a pilot study in which the screen-based interface was used under experimental conditions to investigate how the three measures of predictive information rate, entropy rate and redundancy might relate to musical preference. +\end{abstract} +%the generation of musical materials as a cognitively-informed compositional aid + +\section{Information Dynamics}\label{sec:Information_dynamics} + +The relationship between + Shannon's~\cite{Shannon48} information theory and music, and art in general, has been the + subject of some interest since the 1950s + \cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}. + The general thesis is that perceptible qualities and subjective states + like uncertainty, surprise, complexity, tension, and interestingness + are closely related to information-theoretic quantities like + entropy, relative entropy, and mutual information. + + +Music is an inherently dynamic process. The idea that the musical experience is strongly shaped by the generation + and playing out of strong and weak expectations was put forward by, amongst others, + music theorists L. B. Meyer~\cite{Meyer:1967} and E. Narmour~\citep{Narmour:1977}. +%Composers commonly, consciously or not, play with this process by setting up expectations which may, or may not be fulfilled, manipulating the expectations of the listener and inducing surprise or not as the music progresses +%and surprise in the listener has been articulated by music theorist Meyer +%\cite{Meyer:1967,Narmour:1977}. 
+Central to this is the idea that music is not a static object presented as a whole, +%as the grammatical analysis of Lerdahl and Jackendoff \cite{Lerdahl:1983} might imply, +but a phenomenon that `unfolds' and is experienced \emph{in time}; as listeners we continually build and re-evaluate expectations of what is to come next. + + + + + + +Information dynamics~\cite{Abdallah:2009p4089} considers several different kinds of predictability in musical patterns, how these might be quantified using the tools of information theory, +%human listeners might perceive these, +and how they shape or affect the listening experience. Our working hypothesis is that listeners maintain a dynamically evolving statistical model that enables them to make predictions about how a piece of music will continue. They do this using both the immediate context of the piece and their previous musical experience, such as familiarity with musical styles and conventions. As the music unfolds, listeners continually revise their model; in other words, they revise their own subjective probabilistic belief state. These changes in probabilistic beliefs can be associated with +quantities of information, which are the focus of information dynamics. + + + +\section{The Melody Triangle}\label{sec:The_Melody_triangle} +%%%How we created the transition matrixes and created the triangle. +The use of stochastic processes in music composition has been widespread for +decades; for instance, Iannis Xenakis applied probabilistic mathematical models +to the creation of musical materials~\cite{Xenakis:1992ul}. While such processes +can drive the \emph{generative} phase of the creative process, information dynamics +can serve as a novel framework for a \emph{selective} phase, by +providing a set of criteria to be used in judging which of the +generated materials +are of value. This alternation of generative and selective phases has been +noted before~\cite{Boden1990}. 
+% +Information-dynamic criteria can also be used as \emph{constraints} on the +generative processes, for example, by specifying a certain temporal profile +of surprisingness and uncertainty the composer wishes to induce in the listener +as the piece unfolds. + +The Melody Triangle enables the discovery of melodic content matching a set of information-theoretic criteria. Positions within the triangle correspond to pairs of values of entropy rate and redundancy. %The relationship with the predictive information rate is not explicitly controlled as this would require a three-dimensional interface, but an implicit relationship emerges, which is described in section \ref{makingthetriangle}. +The physical interface to the Triangle has so far been realised in two forms: as an interactive installation and as a screen-based interface. + +Given coordinates corresponding to a point in the triangle, we select from a pre-built +library of random processes, choosing one whose entropy rate and redundancy most closely match the desired +values. The implementations discussed in this paper use first-order Markov chains as the content generator, +since it is easy to compute the theoretically exact values of entropy rate, redundancy and predictive +information rate given the transition matrix of the Markov chain. However, in principle, any generative system could be used to create the library of sequences, given an appropriate probabilistic listener model supporting +the estimation of entropy rate and redundancy. + +The Markov-chain-based implementation generates streams of symbols in the abstract; the alphabet of symbols is then mapped to a set of distinct sounds, such as pitched notes in a scale or a set of percussive +sounds. Furthermore, by layering these streams, intricate musical textures can be created. The selection of +notes or sounds is arbitrary, as long as they are all distinguishable. 
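The generate-then-map step just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the row-stochastic convention (\texttt{P[i][j]} is the probability that symbol $j$ follows symbol $i$), the hypothetical helpers \texttt{sample\_sequence} and \texttt{to\_midi}, and the chromatic mapping onto MIDI note numbers starting at 60 are all choices made for the example.

```python
import random


def sample_sequence(P, length, start=0, rng=None):
    """Sample `length` symbols from a first-order Markov chain.

    Assumed convention: P is row-stochastic, P[i][j] being the probability
    that symbol j follows symbol i. Symbols are the integers 0 .. len(P) - 1.
    """
    if length <= 0:
        return []
    rng = rng or random.Random()
    seq = [start]
    for _ in range(length - 1):
        row = P[seq[-1]]
        # Draw the next symbol with probability proportional to the row entries.
        seq.append(rng.choices(range(len(row)), weights=row)[0])
    return seq


def to_midi(seq, base=60):
    """Map symbol k to MIDI note base + k, i.e. onto a chromatic scale.

    base=60 (middle C) is an arbitrary choice; any set of distinguishable
    sounds would serve equally well.
    """
    return [base + s for s in seq]


# A deterministic period-4 cycle 0 -> 1 -> 2 -> 3 -> 0.
cycle = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
print(to_midi(sample_sequence(cycle, 8)))  # [60, 61, 62, 63, 60, 61, 62, 63]
```

Because the mapping is a separate step, replacing \texttt{to\_midi} with a lookup into a list of percussion sounds changes the timbre of a stream without affecting its information-dynamic properties.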
+%)le is not a part of the Melody Triangle's core functionality, i +Indeed, the symbols could even be mapped to non-sonic outputs such as visible shapes, colours, or movements. + +Any sequence of symbols can be analysed and information-theoretic measures estimated from it. +The novelty of the Melody Triangle lies in reversing this mapping: given desired values for these measures, as determined from the user interface, we return a stream of symbols with the desired properties. +In the next section we describe the three information-theoretic measures that we use. + + +\section{Sequential Information Measures}\label{sec:Sequential_Information_Measures} +The \emph{entropy rate} of a random process is a basic measure of its randomness or +unpredictability. Consider the viewpoint of an observer at a certain time, and split the +sequence into an infinite \emph{past}, a single symbol in the \emph{present}, and the +infinite \emph{future}. The entropy rate is a conditional entropy; informally: +\begin{equation} + \mathrm{EntropyRate} = H( \mathrm{Present} | \mathrm{Past}), +\end{equation} +that is, it represents our average uncertainty about the present symbol \emph{given} +that we have observed everything before it. Processes with zero entropy rate can +be predicted perfectly given enough of the preceding context. + +The \emph{redundancy} of a process, in the sense we are using the term here, is +a measure of how much the predictability of the process depends on knowing the +preceding context. It is the difference between the entropy of a single element of the +sequence in isolation (imagine choosing a note from a musical score at random with your +eyes closed and then trying to guess the note) and its entropy after taking into account +the preceding context: +\begin{equation} + \mathrm{Redundancy} = H( \mathrm{Present} ) - H(\mathrm{Present} | \mathrm{Past}). 
+\end{equation} +If the previous symbols reduce our uncertainty about the present symbol a great deal, then +the redundancy is high. For example, if we know that a sequence consists of a repeating +cycle such as $ \ldots b, c, d, a, b, c, d, a \ldots$, but we don't know which was the first +symbol, then the redundancy is high: $H(\mathrm{Present})$ is high (two bits for this four-symbol cycle, because +we have no idea about the present symbol in isolation), but $H(\mathrm{Present}|\mathrm{Past})$ +is zero, because knowing the previous symbol immediately tells us what the present symbol is. + +The \emph{predictive information rate} (PIR) brings in our uncertainty about the future. It is a +measure of how much each symbol reduces our uncertainty about the future as it is +observed, \emph{given} that we have observed the past: +\begin{equation} + \mathrm{PIR} = H(\mathrm{Future} | \mathrm{Past}) - H(\mathrm{Future} | \mathrm{Present}, \mathrm{Past}). +\end{equation} +It is a measure of the \emph{new} information in each symbol. +Notice that if the past completely determines both the present and the future (as in the cyclic +pattern above), the PIR is zero, since the present symbol brings no new information. However, +if the symbols in a sequence are generated completely independently, e.g.\ by rolling a die for each +one, then again the present symbol provides no information about the future and the PIR +is zero. + +%However, there do exist processes that have high predictive information rates as compared +%with their entropy rates: within the class of Markov chains, these are neither the periodic nor the sequentially uncorrelated ones. Rather they tend to yield sequences that have certain recognisable patterns or motifs, +%but which occur at irregular times. A certain symbol might tell us about which one of the characteristic patterns will appear next.
Each symbol tells us a little bit about the future; in order to make good predictions, +%the listener must continually pay attention, building up expectations on the basis of each new observation. +%% but only a limited amount about the infinite future, we only learn about that as time goes on; there is continual building of prediction. +Processes with high PIR maintain a certain kind of balance between +predictability and unpredictability in such a way that the observer must continually +pay attention to each new observation as it occurs in order to make the best +possible predictions about the evolution of the sequence. This balance between predictability +and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \figrf{wundt}), +which summarises the observations of Wundt~\cite{Wundt1897} that stimuli are most +pleasing at intermediate levels of novelty or disorder, where there is a balance between +`order' and `chaos'. + + \begin{fig}{wundt} + \raisebox{-4em}{\colfig[0.43]{wundt}} + % {\ \shortstack{{\Large$\longrightarrow$}\\ {\scriptsize\emph{exposure}}}\ } + {\ {\large$\longrightarrow$}\ } + \raisebox{-4em}{\colfig[0.43]{wundt2}} + \caption{ + The Wundt curve relating randomness/complexity to + perceived value. Repeated exposure sometimes results + in a move to the left along the curve~\cite{Berlyne71}. + } + \end{fig} + + +\begin{figure} +\centering +\includegraphics[width=0.2\textwidth]{figures/PeriodicMatrix.png} +\includegraphics[width=0.2\textwidth]{figures/NonDeterministicMatrix_bw.png} +\caption{Two transition matrices. The grey level represents the probability of a transition from one symbol to the next (black=0, white=1). The current symbol is along the bottom, and in this case there are twelve possibilities (mapped to a chromatic scale). The left-hand matrix has no uncertainty; it represents a periodic pattern.
The right-hand matrix contains some unpredictability but is nonetheless not completely without perceivable structure; it has a higher entropy rate. \label{TransitionMatrixes}} +\end{figure} + + + + \begin{fig}{mtriscat} + \colfig[0.9]{mtriscat} + \caption{The population of transition matrices in the 3D space of + entropy rate, redundancy and PIR, + all in bits. + The concentrations of points along the redundancy axis correspond + to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit), + 3, 4, \etc all the way to period 7 (redundancy 2.8 bits). The colour of each point + represents its PIR; note that the highest values are found at intermediate entropy + and redundancy, and that the distribution as a whole makes a curved triangle. Although + not visible in this plot, it is largely hollow in the middle. \label{InfoDynEngine}} +\end{fig} + + + +%\begin{figure} +%\centering +%\includegraphics[width=0.5\textwidth]{MatrixDistribution.png} +%\caption{The population of transition matrixes distributed along three axes of redundancy, entropy rate and predictive information rate. Note how the distribution makes a curved triangle-like plane floating in 3d space. \label{InfoDynEngine}} +%\end{figure} + \begin{figure}[h] +\centering +\includegraphics[width=0.5\textwidth]{figures/TheTriangle.pdf} +\caption{The Melody Triangle\label{TheTriangle}} +\end{figure} + +\subsection{Populating the triangle}\label{makingthetriangle} + + + +Before the Melody Triangle can be used, it has to be `populated' with possible parameter values for the melody generators. These are then plotted in a 3D statistical space of redundancy, entropy rate and predictive information rate. In our case we generated thousands of transition matrices, representing first-order Markov chains, by a random sampling method. 
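The sampling method is not specified further here, so the following Python sketch should be read as one plausible way of populating such a library, not the method actually used. It assumes rows drawn from a symmetric Dirichlet distribution, with the concentration varied so that both sparse and near-uniform matrices occur. For each matrix it computes the three measures of Section~\ref{sec:Sequential_Information_Measures} exactly: with stationary distribution $\pi$, the entropy rate is $\sum_i \pi_i H(P_{i\cdot})$, the redundancy is $H(\pi)$ minus the entropy rate, and, by the Markov property, the PIR reduces to $H(X_{t+1}|X_{t-1}) - H(X_{t+1}|X_t)$, the entropy rate of the two-step matrix $P^2$ minus that of $P$. All function names are hypothetical.

```python
import math
import random


def stationary(P, tol=1e-12, max_iter=20000):
    """Stationary distribution pi (pi P = pi) of a row-stochastic matrix P.

    Iterates the 'lazy' chain (I + P) / 2, which has the same stationary
    distribution as P but also converges for periodic chains, where plain
    power iteration would oscillate. Assumes P is irreducible.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        new = [(a + b) / 2 for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def entropy(p):
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def entropy_rate(P, pi):
    """H(Present | Past) for a first-order chain: sum_i pi_i H(P[i])."""
    return sum(w * entropy(row) for w, row in zip(pi, P))


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def measures(P):
    """(entropy rate, redundancy, PIR) in bits for a first-order chain."""
    pi = stationary(P)
    h = entropy_rate(P, pi)
    redundancy = entropy(pi) - h
    # Markov property: PIR = H(X_{t+1} | X_{t-1}) - H(X_{t+1} | X_t).
    pir = entropy_rate(matmul(P, P), pi) - h
    return h, redundancy, pir


def random_matrix(n, alpha, rng):
    """Assumed sampler: each row ~ symmetric Dirichlet(alpha), built by
    normalising independent Gamma(alpha, 1) draws."""
    rows = []
    for _ in range(n):
        g = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
        total = sum(g)
        rows.append([x / total for x in g])
    return rows


def populate(count, n=12, seed=0):
    """Library of ((entropy rate, redundancy, PIR), matrix) pairs.

    Small alpha gives sparse, nearly deterministic rows and large alpha
    nearly uniform ones; the range 0.1..10 is an arbitrary choice.
    """
    rng = random.Random(seed)
    library = []
    for _ in range(count):
        alpha = 10 ** rng.uniform(-1, 1)
        P = random_matrix(n, alpha, rng)
        library.append((measures(P), P))
    return library
```

For a deterministic four-symbol cycle, \texttt{measures} gives an entropy rate and PIR of zero and a redundancy of 2 bits, in line with the cyclic example above; for independent, uniformly distributed symbols it gives an entropy rate of 2 bits with zero redundancy and PIR, the die-rolling case. The pure-Python loops keep the sketch dependency-free; for thousands of $12\times 12$ matrices a vectorised version would be faster.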
Figure~\ref{InfoDynEngine} shows how these matrices are distributed in the 3D statistical space; each point corresponds to a transition matrix. + + + +The distribution of transition matrices in this space forms a fairly thin arch. It is therefore a reasonable approximation to treat it as a two-dimensional sheet, and so we stretch this curved arch out into a flat triangle. It is this triangular sheet that is our `Melody Triangle' and forms the interface by which the system is controlled. + + Though the interface is 2D, the third dimension (PIR) is implicitly present, as +transition matrices retrieved from points +along the centre line of the triangle will tend to have higher PIR. +We hypothesise that, under +the appropriate conditions, these will be perceived as more `interesting' or +`melodic'. + + Whether the Melody Triangle is used as a screen-based system or as an interactive installation, it involves a mapping to this statistical space. When the user selects a position within the triangle through the interface, the corresponding transition matrix is returned. Figure~\ref{TheTriangle} shows how the triangle maps to different measures of redundancy, entropy rate and predictive information rate. + +%%%paragraph explaining what the different parts of the triangle are like. +Each of the three corners corresponds to a different extreme of predictability and unpredictability, loosely characterised as `periodicity', `noise' and `repetition'. %Melodies from the `noise' corner have no discernible pattern; they have high entropy rate, low predictive information rate and low redundancy. These melodies are essentially totally random. Melodies along the `periodicity' to `repetition' edge are all deterministic loops that get shorter as we approach the `repetition' corner, until it becomes just one repeating note. 
It is the areas in between the extremes that provide the more `interesting' melodies. That is, those that have some level of unpredictability, but are not completely random. Or, conversely, that are predictable, but not entirely so. This triangular space allows for an intuitive exploration of expectation and surprise in temporal sequences based on a simple model of how one might guess the next event given the previous one. +In our experiments with visualising and sonifying sequences sampled from + first-order Markov chains~\cite{Abdallah:2009p4089}, we found that + the measures of redundancy, entropy rate and predictive information rate correspond to perceptible + characteristics, and that the transition matrices maximising or minimising + each of these quantities are quite distinct. High entropy rates are associated + with completely uncorrelated sequences with no recognisable temporal structure. + High values of redundancy are associated with long periodic cycles (and low PIR + and entropy rate). High values of predictive information rate are associated with intermediate values + of redundancy and entropy rate, and recognisable, but not completely predictable, + temporal structures. + + +\section{User Interfaces} +Any number of interfaces could be developed for the Melody Triangle\footnote{The Melody Triangle was developed in Prolog and MATLAB. It can be controlled with Open Sound Control (OSC) messages, and is thus independent of any specific interface implementation.}. We have developed two: a standard screen-based interface, in which a user moves tokens with a mouse in and around a triangle on screen, and a multi-user interactive installation, in which a Kinect\footnote{\url{http://www.xbox.com/en-GB/Kinect}} camera tracks individuals in a space and maps their positions in the space to the triangle. 
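Whichever interface supplies the position, the selection step described in Section~\ref{sec:The_Melody_triangle} amounts to a lookup in the pre-built library. The Python sketch below assumes a nearest-neighbour match by Euclidean distance in the plane of entropy rate and redundancy, and a library stored as \texttt{((entropy\_rate, redundancy, pir), matrix)} pairs. The hypothetical function \texttt{nearest} does not reproduce the flattening of the curved sheet into the displayed triangle, so it takes its target directly in information space.

```python
import math


def nearest(library, target_h, target_r):
    """Return the entry whose (entropy rate, redundancy) lies closest to the
    target point, using Euclidean distance in bits.

    `library` holds ((entropy_rate, redundancy, pir), matrix) pairs; the PIR
    is carried along but, as in the 2D interface, not used for selection.
    """
    if not library:
        raise ValueError("empty library")
    return min(
        library,
        key=lambda entry: math.hypot(entry[0][0] - target_h,
                                     entry[0][1] - target_r),
    )


# Toy library with string stand-ins for the matrices: a period-4 cycle and
# independent uniform noise over four symbols (values in bits).
toy = [((0.0, 2.0, 0.0), "cycle"), ((2.0, 0.0, 0.0), "noise")]
print(nearest(toy, 1.8, 0.3)[1])  # noise
```

A token or visitor position would first have to be converted to such a target point; that conversion depends on how the sheet is flattened and is left out here.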
+ + +\subsection{The Multi-User Installation} + +\begin{figure} +\centering +\includegraphics[width=0.5\textwidth]{figures/kinnect.pdf} +\caption{The depth map as seen by the Kinect; the bounding boxes outline the blobs detected by OpenNI.\label{Kinect}} +\end{figure} + +The field of view of a Kinect camera overlooking a space naturally forms a triangle. As visitors come into the range of the camera, they start generating a melody, whose statistical properties are determined by the mapping of physical space to statistical space discussed above. Thus, by exploring the physical space, participants change the predictability of the generated melodic content. When multiple people are in the space, they can cooperate to create interweaving melodies, forming intricate polyphonic textures. + +The streams of symbols are mapped to MIDI and then played with software instruments in Logic. The tracking system was capable of detecting gestures, and these were mapped to different musical effects such as tempo changes, periodicity changes (going to the off-beat), instrument/register changes and volume (see Table~\ref{gestures} and Figure~\ref{gestures2}). + +\subsubsection{Tracking and Control} + +Tracking and control were implemented using the OpenNI API\footnote{\url{http://OpenNi.org/}} and high-level middleware for tracking with Kinect. This provided reliable blob tracking of humanoid forms in 2D space. By combining this with the Kinect's depth map, it became possible to obtain reliable coordinates of visitors' positions in the space. + +By detecting the bounding box of the 2D blob of each individual in the space, and then normalising it according to the distance given by the depth map, it became possible to work out whether an individual had an arm stretched out or was crouching. With this it was possible to define a series of gestures for controlling the system without the use of any controllers (see Table~\ref{gestures}). 
Thus, for instance, sticking out one's left arm quickly doubles the tempo of the melody. Sticking one arm out while pulling the other in shifts the melody onto the off-beat, and sending out both arms changes the instrument being `played'. + +\begin{table} +\centering +%\includegraphics[width=0.5\textwidth]{InstructionsText.pdf} +\caption{Gestures and their resulting effect\label{gestures}} +\begin{tabular}{ l c l } +left arm & right arm & effect\\ +\hline + out & static & double tempo \\ + in & static & halve tempo \\ + static & out & triple tempo \\ + static & in & one-third tempo\\ + out & in & shift to off-beat \\ + out & out & change instrument\\ + in & in & reset tempo\\ +\end{tabular} +\end{table} + +\begin{figure} +\centering +\includegraphics[width=0.5\textwidth]{figures/InstructionsImage2.pdf} +\caption{Gestures and their resulting effect \label{gestures2}} +\end{figure} + + +\subsubsection{Observations} +Although visitors needed a little initial training, they were then quickly able to design musical textures collaboratively. For example, one person could lay down a predictable repeating bass line by keeping to the periodicity/repetition side of the room, while a companion could generate a freer melodic line by staying nearer the `noise' part of the space. + + +The collaborative nature of this installation is an area that merits attention. Because no single user could control the whole narrative, participants communicated verbally and directed each other towards the goals of learning to use the system and finding interesting musical textures. This collaboration added an element of playfulness and enjoyment that was clearly apparent. + +As an artefact, this installation is an exploratory prototype and occupies an ambiguous role in terms of purpose; it sits in a nebulous middle ground between instrument, art installation and technical demonstration. 
In our observation, however, it is very effective as a vehicle for communicating ideas about expectation, pattern and predictability in music to the public. + +\subsection{The Screen-Based Interface} + +\begin{figure} +\centering +\includegraphics[width=0.3\textwidth]{figures/UIscreenshot.png} +\caption{Screenshot of the screen-based interface for the Melody Triangle\label{UIScreenShot}} +\end{figure} + +%The Melody Triangle can also be explored with a standard screen, keyboard and mouse interface. A triangle is drawn on the screen, screen space thus mapped to the statistical space of the Melody Triangle. A number of round tokens, each representing a melody can be dragged in and around the triangle. When a token is dragged into the triangle, the system will start generating the sequence of notes with statistical properties that correspond to its position in the triangle. +% +%Additionally there are a number of keyboard controls. These include controls for changing the overall tempo, for enabling and disabling individual voices, changing registers, going to off-beats and changing the speed of individual voices. The system gives visual feedback to indicate when a token has locked on to a new melody, and contains a buffer zone for allowing tokens to be pushed right to the edges of the triangle without falling out. +% +%In this mode, the Melody Triangle can be used as a kind of composition assistant for the generation of interesting musical textures and melodies. However unlike other computer aided composition tools or programming environments, here the composer engages with music on the high and abstract level of expectation, randomness and predictability. + +The screen-based interface can serve as a compositional tool. +%%A triangle is drawn on the screen, screen space thus mapped to the statistical +%space of the Melody Triangle. +A number of tokens, each representing a +sonification stream or `voice', can be dragged in and around the triangle. 
+For each token, a sequence of symbols is sampled using the corresponding +transition matrix, and the resulting symbols +%statistical properties that correspond to the token's position is generated. These +%symbols +are then mapped to notes of a scale or percussive sounds% +\footnote{The sampled sequence could easily be mapped to other musical processes, possibly over +different time scales, such as chords, dynamics and timbres. It would also be possible +to map the symbols to visual or other outputs.}% +. Keyboard commands give control over other musical parameters such +as pitch register, inter-onset interval, tempo and dynamics. The system is capable of generating intricate musical textures when multiple tokens are in the triangle. + +In this mode, the Melody Triangle is a cognitively-informed compositional aid; unlike other computer-aided composition tools or programming environments, here the composer exercises control at the abstract level of information-dynamic +properties. The use of Markov chains for the generation of musical content is not new; rather, the novelty lies in the ability to define criteria for the selection of generated materials that relate to how a listener might perceive the output. + + + + + + +\section{Information Dynamics and Musical Preference Study} + +We are currently in the process of using the screen-based +Melody Triangle user interface to investigate the relationship between the information-dynamic +characteristics of sonified Markov chains and subjective musical preference. +We carried out a pilot study with six participants, who were asked +to use a simplified form of the user interface (a single controllable token, +and no rhythmic, registral or timbral controls) under two conditions: +one where a single sequence was sonified under user control, and another +where an additional sequence was sonified in a different register, as if generated +by a fixed invisible token in one of four regions of the triangle. 
In addition, subjects +were asked to press a key if they `liked' what they were hearing. + +After the study the participants were surveyed with the Goldsmiths Musical Sophistication Index\cite{Mullensiefen:2011ts} to elicit their prior musical experience. + +We recorded subjects' behaviour as well as points which they marked +with a key press. +Some results for three of the subjects are shown in \figrf{mtri-results}. Though +we have not been able to detect any systematic across-subjects preference for any particular +region of the triangle, subjects do seem to exhibit distinct kinds of exploratory behaviour. +Our initial hypothesis, that subjects would linger longer in regions of the triangle +that produced aesthetically preferable sequences, and that this would tend to be towards the +centre line of the triangle for all subjects, was not confirmed. However, it is possible +that the design of the experiment encouraged an initial exploration of the space (sometimes +very systematic, as for subject c) aimed at \emph{understanding} %the parameter space and +how the system works, rather than finding musical patterns. It is also possible that the +system encourages users to create musically interesting output by \emph{moving the token}, +rather than finding a particular spot in the triangle which produces a musically interesting +sequence by itself. + +\begin{fig}{mtri-results} + \def\scat#1{\colfig[0.42]{mtri/#1}} + \def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}} + \begin{tabular}{cc} +% \subj{a} \\ + \subj{b} \\ + \subj{c} \\ + \subj{d} + \end{tabular} + \caption{Dwell times and mark positions from user trials with the + on-screen Melody Triangle interface, for three subjects. The left-hand column shows + the positions in a 2D information space (entropy rate vs multi-information rate + in bits) where each spent their time; the area of each circle is proportional + to the time spent there. 
The right-hand column shows the points which subjects +`liked'; the area of the circles here is proportional to the duration spent at + that point before the point was marked.} +\end{fig} + +Comments collected from the subjects +%during and after the experiment +suggest that +the information-dynamic characteristics of the patterns were readily apparent +to most: several noticed the main organisation of the triangle, +with repetitive notes at the top, cyclic patterns along one edge, and unpredictable +notes towards the opposite corner. Some described their systematic exploration of the space. +Two felt that the right side was `more controllable' than the left (a consequence +of their ability to return to a particular distinctive pattern and recognise it +as one heard previously). Two reported that they became bored towards the end, +but another felt there was not enough time to `hear out' the patterns properly. +One subject did not `enjoy' the patterns in the lower region, but another said the lower +central regions were more `melodic' and `interesting'. + +We plan to continue the trials with a slightly less restricted user interface in order to +make the experience more enjoyable and thereby give subjects longer to use the interface; +this may allow them to get beyond the initial exploratory phase and give a clearer +picture of their aesthetic preferences. In addition, we plan to conduct a +study under more restrictive conditions, where subjects will have no control over the patterns +other than to signal (a) which of two alternatives they prefer in a forced +choice paradigm, and (b) when they are bored of listening to a given sequence. + + + + + +\section{Further Work} +%The Melody Triangle has so far only been used with first-order Markov chains for generating content. This means that the melodies generated don't have any long term structure or form and hence don't seem to `go anywhere'.
As such the system in its current form is better suited to creating textures and short phrases as opposed to composing over-arching musical structures. + +We are currently investigating how higher-order Markov models can be mapped to information-theoretic measures, and adapting the Melody Triangle to those models. This would generate higher-level patterns and provide more long-term structure. Furthermore, more sophisticated listener models\cite{Pearce:2005wr}\cite{Potter:2007tt} could be used to compute information measures for more conventional or ecologically valid music. + + As it stands, the streams of symbols generated are only mapped to note values. However, they could just as well be applied to any other musical property, such as intervals, chords, dynamics, timbres, structures and key changes. The possibilities for the Melody Triangle to serve as a compositional guide in these other domains remain to be investigated. + + We are investigating the possibility of turning the Melody Triangle into a mobile-phone-based music making application. We hope that collecting usage statistics will provide a rich source of data that can help determine any relationship between the information dynamics measures and aesthetic preference. +%The Melody Triangle in its current form however forms an ideal tool for investigations into musical preference and their relationship to the information dynamics models, and as such more detailed studies under wider experimental conditions and with more participants will be carried out. +Although our initial data on aesthetic preference are inconclusive, there is still +plenty of work to be done in this area: wherever there are probabilistic models, +information dynamics can shed light on their behaviour. + +\section{Acknowledgments} +This work is supported by an EPSRC Doctoral Training Centre EP/G03723X/1 (HE), GR/S82213/01 and \\EP/E045235/1 (SA), an EPSRC Leadership Fellowship, \\EP/G007144/1 (MDP) and EPSRC IDyOM2 EP/H013059/1.
Thanks to Louie McCallum and Davie Smith from QMUL EECS for Kinect programming support. + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%bibliography here +\bibliography{smc2012template,nime,all,c4dm} + + + + +\end{document}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/mume2012/mume2012_review.tex Fri Aug 24 14:40:59 2012 +0100 @@ -0,0 +1,468 @@ +\documentclass[letterpaper]{article} +\usepackage{aaai} +\usepackage{times} +\usepackage{helvet} +\usepackage{courier} +\usepackage{tools} %custom + + + + +\let\citep=\cite +\newcommand{\colfig}[2][1]{\includegraphics[width=#1\linewidth]{figs/#2}}% +\newcommand\past[1]{\overset{\rule{0pt}{0.2em}\smash{\leftarrow}}{#1}} +\newcommand\fut[1]{\overset{\rule{0pt}{0.1em}\smash{\rightarrow}}{#1}} +\frenchspacing +\pdfinfo{ +/Title (The Melody Triangle: Exploring Pattern and Predictability in Music ) +/Subject (Musical Metacreation, Interfaces) +/Author (Henrik Ekeus, Samer A. Abdallah, Mark D. Plumbley, Peter W. McOwan)} +\setcounter{secnumdepth}{0} + +% The file aaai.sty is the style file for AAAI Press +% proceedings, working notes, and technical reports. +% +\title{The Melody Triangle:\\ Exploring Pattern and Predictability in Music} +\author{Henrik Ekeus, Samer A. Abdallah, Mark D. Plumbley, Peter W. McOwan\\ +Centre for Digital Music, Queen Mary University of London,\\ +London E1 4NS, UK\\ +} +\begin{document} +\maketitle +\begin{abstract} +\begin{quote} + +The Melody Triangle is an interface for the discovery of melodic materials, where the input (a position within a triangle) maps directly to information-theoretic properties of the output. A model of human expectation and surprise in the perception of music, \emph{information dynamics}, is used to `map out' a musical generative system's parameter space. This enables a user to explore the possibilities afforded by a generative algorithm, in this case Markov chains, not by directly selecting parameters, but by specifying the subjective \emph{predictability} of the output sequence. We describe some of the relevant ideas from information dynamics and how the Melody Triangle is defined in terms of these.
We describe its incarnation as a screen-based performance tool and compositional aid for the generation of musical textures, in which the user exercises control at the abstract level of randomness and predictability, and we report some pilot studies carried out with it. We also briefly outline a multi-user installation, where collaboration in a performative setting provides a playful yet informative way to explore expectation and surprise in music, and a forthcoming mobile phone version of the Melody Triangle. + +\end{quote} +\end{abstract} + +\noindent + +\section{Introduction} + +The use of generative stochastic processes in music composition has been widespread for +decades; for instance, Iannis Xenakis applied probabilistic mathematical models +to the creation of musical materials\cite{Xenakis:1992ul}. However, it can sometimes be difficult for a composer to find desirable parameters and navigate the possibilities of a generative algorithm intuitively. + +The Melody Triangle is an interface for the discovery of melodic content where the parameter space of a stochastic generative musical process, the Markov chain, is `mapped out' according to the \emph{predictability} of the output. The Melody Triangle was developed in the context of \emph{information dynamics}\cite{CIP}, an information-theoretic approach to modelling human expectation and surprise in the perception of music. +Users of the Melody Triangle do not select the parameters of the generative process directly; rather, they provide input in the form of a position within a triangle, and this maps to the information-theoretic properties of an output melody. + For instance, one corner of the triangle returns completely random melodies, while another area yields entirely predictable and periodic patterns, and the triangle as a whole covers a spectrum of predictability of the output melodies.
+ +In this paper we outline the concepts and ideas behind information dynamics, and describe the information measures that led to the development of the Melody Triangle. We describe its realisations: a multi-user interactive installation where visitors use their bodies and gestures to generate musical materials, and a screen-based interface. We outline some pilot studies carried out with the screen interface, as well as some qualitative feedback from music practitioners exploring its potential as a performance or composition tool. Finally, we outline a forthcoming mobile phone version of the Melody Triangle. + +\section{Information Dynamics} +\label{s:Intro} + The relationship between + Shannon's \shortcite{Shannon48} information theory and music and art in general has been the + subject of some interest since the 1950s + \cite{Youngblood58,CoonsKraehenbuehl1958,Moles66,Meyer67,Cohen1962}. + The general thesis is that perceptible qualities and subjective states + like uncertainty, surprise, complexity, tension, and interestingness + are closely related to information-theoretic quantities like + entropy, relative entropy, and mutual information. + +Music is an inherently dynamic process. The idea that the musical experience is strongly shaped by the generation + and playing out of strong and weak expectations was put forward by, amongst others, + music theorists L. B. Meyer \shortcite{Meyer:1967} and Narmour \shortcite{Narmour:1977}. + + An essential aspect of this is that music is experienced as a phenomenon + that unfolds in time, rather than being apprehended as a static object + presented in its entirety. Meyer argued that the experience depends + on how we change and revise our conceptions \emph{as events happen}, on + how expectation and prediction interact with occurrence, and that, to a + large degree, the way to understand the effect of music is to focus on + this `kinetics' of expectation and surprise.
+ + Prediction and expectation are essentially probabilistic concepts + and can be treated mathematically using probability theory. + We suppose that when we listen to music, expectations are created on the basis + of our familiarity with various styles of music and our ability to + detect and learn statistical regularities in the music as they emerge. + There is experimental evidence that human listeners are able to internalise + statistical knowledge about musical structure + \cite{SaffranJohnsonAslin1999}, and also + that statistical models can form an effective basis for computational + analysis of music + \cite{ConklinWitten95,PonsfordWigginsMellish1999,Pearce2005}. + +Information dynamics considers several different kinds of predictability in musical patterns, how these might be quantified using the tools of information theory, +and how they shape or affect the listening experience. Our working hypothesis is that listeners maintain a dynamically evolving probabilistic belief state that enables them to make predictions about how a piece of music will continue. + +They do this using both the immediate context of the piece and previous musical experience, such as familiarity with musical styles and conventions. As the music unfolds, listeners continually revise this belief state, which includes predictive + distributions over possible future events. These changes in probabilistic beliefs can be associated with +quantities of information; these are the focus of information dynamics. + +In the next section we briefly describe the information measures that we use to define the Melody Triangle; a more complete overview of information dynamics and some of its applications can be found in \cite{Abdallah:2009p4089} and \cite{CIP}.
+ +\subsection{Sequential Information Measures}\label{sec:Sequential_Information_Measures} + +Consider a sequence of symbols from the viewpoint of an observer at a certain time, and split the +sequence into a single symbol in the \emph{present} ($X_t$), an infinite \emph{past} ($\past{X}_t$) and the +infinite \emph{future} ($\fut{X}_t$). We assume that the symbols arrive at a constant, uniform rate. + +The \emph{entropy rate} of a random process is a well-known, basic measure of its randomness or +unpredictability. It is the conditional entropy, \emph{H}, of the \emph{present} given the \emph{past}: +\begin{equation} + \label{eq:entro-rate} + h_\mu = H(X_t|\past{X}_t). +\end{equation} +that is, it represents our average uncertainty about the present symbol \emph{given} +that we have observed everything before it. Processes with zero entropy rate can +be predicted perfectly given enough of the preceding context. + +The \emph{multi-information rate} $\rho_\mu$ \cite{Dubnov2004} + is the mutual + information, \emph{I}, between the `past' and the `present': +\begin{equation} + \label{eq:multi-info} + \rho_\mu = I(\past{X}_t;X_t) = H(X_t) - H(X_t|\past{X}_t). +\end{equation} + +The multi-information rate can be thought of as a measure of \emph{redundancy}, quantifying the extent to which the same information is to be found in all parts of the sequence. +It is a measure of how much the predictability of the process depends on knowing the +preceding context. It is the difference between the entropy of a single element of the +sequence in isolation (imagine choosing a note from a musical score at random with your +eyes closed and then trying to guess the note) and its entropy after taking into account +the preceding context. +If the previous symbols reduce our uncertainty about the present symbol a great deal, then +the redundancy is high.
For example, if we know that a sequence consists of a repeating +cycle such as $ \ldots b, c, d, a, b, c, d, a \ldots$, but we don't know which was the first +symbol, then the redundancy is high, as $H(X_t)$ is high (because we +have no idea about the present symbol in isolation), but $H(X_t|\past{X}_t)$ +is zero, because knowing the previous symbol immediately tells us what the present symbol is. + +The \emph{predictive information rate} (PIR) \cite{Abdallah:2009p4089} brings in our uncertainty about the future. It is a +measure of how much each symbol reduces our uncertainty about the future as it is +observed, \emph{given} that we have observed the past: +\begin{equation} +\label{eq:PIR} + b_\mu = I(X_t;\fut{X}_t|\past{X}_t) = H(\fut{X}_t|\past{X}_t) - H(\fut{X}_t|X_t,\past{X}_t). +\end{equation} +It is the mutual information between the `present' and the `future' given the `past'. In other words, it is a measure of the \emph{new} information in each symbol. + +The behaviour of the predictive information rate makes it interesting from a compositional point of view. The definition +of the PIR is such that it is low both for extremely regular processes, such as constant +or periodic sequences, \emph{and} for extremely random processes, where each symbol +is chosen independently of the others, in a kind of `white noise'. In the former case, +the pattern, once established, is completely predictable and therefore there is no +\emph{new} information in subsequent observations. In the latter case, the randomness +and independence of all elements of the sequence means that, though potentially surprising, +each observation carries no information about the ones to come.
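For a stationary first-order Markov chain, all three measures can be computed in closed form from the transition matrix. The Python sketch below is ours, not the authors' implementation. It assumes a row-stochastic matrix `P`, with `P[i, j]` the probability that symbol `i` is followed by symbol `j` (pass the transpose if matrices are stored with the next symbol indexing the rows), and an irreducible chain, so that the stationary distribution $\pi$ is unique. It uses $h_\mu = \sum_i \pi_i H(P_{i\cdot})$ and $\rho_\mu = H(\pi) - h_\mu$, together with the fact that, by the Markov property, the PIR reduces to $b_\mu = H(X_{t+1}|X_{t-1}) - H(X_{t+1}|X_t)$, i.e. the entropy rate of the two-step chain $P^2$ minus that of $P$.

```python
import numpy as np


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Stationary distribution pi (pi @ P = pi) of an irreducible chain."""
    # pi is the left eigenvector of P for eigenvalue 1, i.e. an eigenvector of P.T.
    evals, evecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, k])
    return pi / pi.sum()


def entropy_bits(p: np.ndarray) -> float:
    """Shannon entropy in bits, using the convention 0 log 0 = 0."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def conditional_entropy(P: np.ndarray, pi: np.ndarray) -> float:
    """H(next | current) = sum_i pi_i * H(P[i, :]) for a chain started in pi."""
    return float(sum(pi[i] * entropy_bits(P[i]) for i in range(len(pi))))


def information_measures(P: np.ndarray):
    """Return (h_mu, rho_mu, b_mu) in bits for a first-order Markov chain."""
    pi = stationary_distribution(P)
    h_mu = conditional_entropy(P, pi)               # entropy rate
    rho_mu = entropy_bits(pi) - h_mu                # multi-information rate
    # PIR: H(X_{t+1} | X_{t-1}) - H(X_{t+1} | X_t); P @ P shares the same pi.
    b_mu = conditional_entropy(P @ P, pi) - h_mu
    return h_mu, rho_mu, b_mu


if __name__ == "__main__":
    periodic = np.roll(np.eye(4), 1, axis=1)        # a -> b -> c -> d -> a
    noise = np.full((4, 4), 0.25)                   # independent uniform symbols
    print(information_measures(periodic))           # approx. h=0, rho=2, b=0 bits
    print(information_measures(noise))              # approx. h=2, rho=0, b=0 bits
```

The two test chains reproduce the extremes described above: the four-symbol cycle has maximal redundancy and no new information per symbol, and uniform white noise has maximal entropy rate and likewise zero PIR.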
+ +Processes with high PIR maintain a certain kind of balance between +predictability and unpredictability in such a way that the observer must continually +pay attention to each new observation as it occurs in order to make the best +possible predictions about the evolution of the sequence. This balance between predictability +and unpredictability is reminiscent of the inverted `U' shape of the Wundt curve (see \Figrf{wundt}), +which summarises the observations of Wundt \shortcite{Wundt1897} that stimuli are most +pleasing at intermediate levels of novelty or disorder, where there is a balance between +`order' and `chaos'. + + \begin{fig}{wundt} + \raisebox{-4em}{\colfig[0.43]{wundt}} + {\ {\large$\longrightarrow$}\ } + \raisebox{-4em}{\colfig[0.43]{wundt2}} + \caption{ + The Wundt curve relating randomness/complexity to + perceived value. Repeated exposure sometimes results + in a move to the left along the curve \cite{Berlyne71}. + } + \end{fig} + +A similar shape is visible in the upper envelope of the plot in \Figrf{mtriscat}, which is a 3-D scatter plot of + the information measures for several thousand +first-order Markov chain transition matrices generated by a random sampling method. +The coordinates of the `information space' are entropy rate ($h_\mu$), redundancy ($\rho_\mu$), and +predictive information rate ($b_\mu$). The points along the `redundancy' axis correspond +to periodic Markov chains. Those along the `entropy' axis produce uncorrelated sequences +with no temporal structure. Processes with high PIR are to be found at intermediate +levels of entropy and redundancy. + +These observations led us to construct the `Melody Triangle'. + +\begin{figure} +\centering +\includegraphics[width=0.49\linewidth]{figs/PeriodicMatrix.pdf} +\includegraphics[width=0.49\linewidth]{figs/NonPeriodicMatrix.pdf} +\caption{Two transition matrices representing Markov chains.
The shade of gray represents the probabilities of transition from one symbol to the next (white=0, black=1). The current symbol is along the bottom, and the next symbol is along the left. The left-hand matrix has no uncertainty; it represents a periodic pattern (a,d,c,b,a,d,c,b,a,d,c,b,a\dots). The right-hand matrix contains unpredictability but nonetheless is not completely without perceivable structure (we know, for instance, that any `b' will always be followed by an `a' and preceded by a `c'), and it has a higher entropy rate. \label{TransitionMatrixes}} +\end{figure} + + \begin{fig}{mtriscat} + \colfig[1]{mtriscat} + \caption{The population of transition matrices in the 3D space of + entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and predictive information rate ($b_\mu$), + all in bits. Note that the distribution as a whole makes a curved triangle. Although + not visible in this plot, it is largely hollow in the middle. + The concentrations of points along the redundancy axis correspond + to Markov chains which are roughly periodic with periods of 2 (redundancy 1 bit), + 3, 4, \etc all the way to period 7 (redundancy 2.8 bits). Note that the highest PIR values are found at intermediate entropy + and redundancy. \label{InfoDynEngine}} +\end{fig} + +\section{The Melody Triangle}\label{makingthetriangle} + +The Melody Triangle is an interface that is designed around this natural distribution of Markov chain transition +matrices in the information space of entropy rate ($h_\mu$), redundancy ($\rho_\mu$) and predictive information rate ($b_\mu$), as illustrated in \Figrf{mtriscat}. + +The distribution of transition matrices in this space forms a relatively thin +curved sheet. Thus, it is a reasonable simplification to project out the +third dimension (the PIR) and present an interface that is just two-dimensional.
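A short derivation, using only the definitions of $h_\mu$ and $\rho_\mu$ given earlier, shows why the two-dimensional projection is bounded by a right-angled triangle. Writing $K$ for the alphabet size (a symbol we introduce here; the text does not fix it), adding the two measures cancels the conditional entropy, and the entropy of a single symbol is at most $\log_2 K$ bits:

```latex
\begin{equation}
  h_\mu + \rho_\mu
  = H(X_t|\past{X}_t) + \bigl( H(X_t) - H(X_t|\past{X}_t) \bigr)
  = H(X_t) \le \log_2 K .
\end{equation}
```

Since $h_\mu \ge 0$ (an entropy) and $\rho_\mu \ge 0$ (a mutual information), every process lies inside the right-angled triangle with vertices $(0,0)$, $(\log_2 K, 0)$ and $(0, \log_2 K)$ in the $(h_\mu, \rho_\mu)$ plane. The corner $(0,0)$ requires $H(X_t)=0$, i.e. a single repeated symbol; the corner $(0,\log_2 K)$ is attained, for example, by a cycle through all $K$ symbols; and $h_\mu = \log_2 K$ forces $\rho_\mu = 0$ with a uniform marginal, i.e. uniform independent noise.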
+ +The resulting right-angled triangle, in the plane of entropy rate against redundancy, is rotated, reflected and stretched to form an equilateral triangle with +the `redundancy'/`entropy rate' vertex at the top, the `redundancy' axis down the left-hand +side, and the `entropy rate' axis down the right, as shown in \Figrf{TheTriangle}. +This is our `Melody Triangle' and +forms the interface by which the system is controlled. + +\begin{fig}{TheTriangle} + \colfig[1]{TheTriangle.pdf} + \caption{The Melody Triangle} +\end{fig} + +\subsection{Usage} + +The user selects a point within the triangle; this is mapped into the +information space and the nearest transition matrix is used to generate +a sequence of values which are then sonified either as pitched notes or percussive +sounds. + +Though the interface is 2D, the third dimension (predictive information rate) is implicitly present, as +transition matrices retrieved from +the region along the centre line of the triangle will tend to have higher PIR. + +As shown in \Figrf{TheTriangle}, the corners correspond to three different extremes of predictability and +unpredictability, which could be loosely characterised as `periodicity', `noise' +and `repetition'. Melodies from the `noise' corner (high $h_\mu$, low $\rho_\mu$ +and low $b_\mu$) have no discernible pattern; +those along the `periodicity' +to `repetition' edge are all cyclic patterns that get shorter as we approach +the `repetition' corner, until each is just one repeating note. Those along the +opposite edge consist of independent random notes from non-uniform distributions. +Areas between the left and right edges will tend to have higher predictive information rate ($b_\mu$), +and we hypothesise that, under +the appropriate conditions, these will be perceived as more `interesting' or `melodic'. +These melodies have some level of unpredictability but are not completely random; +conversely, they are predictable, but not entirely so.
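The text does not give the exact transformation from the right-angled triangle to the equilateral one. One plausible sketch, which is our assumption rather than the authors' code, is an affine map that sends the `repetition' corner $(h_\mu, \rho_\mu) = (0,0)$ to the top vertex, the `periodicity' corner $(0, H_{\max})$ to the bottom left and the `noise' corner $(H_{\max}, 0)$ to the bottom right, where $H_{\max} = \log_2 K$ for an alphabet of $K$ symbols. Inverting it turns a point selected on screen into target values of entropy rate and redundancy. The screen layout (a unit-side triangle with $y$ pointing up) and the names `TOP`, `LEFT`, `RIGHT`, `info_to_screen` and `screen_to_info` are hypothetical.

```python
import numpy as np

# Hypothetical screen layout: unit-side equilateral triangle, y axis pointing up.
TOP = np.array([0.5, np.sqrt(3) / 2])   # `repetition' corner:  (h, rho) = (0, 0)
LEFT = np.array([0.0, 0.0])             # `periodicity' corner: (0, h_max)
RIGHT = np.array([1.0, 0.0])            # `noise' corner:       (h_max, 0)


def info_to_screen(h, rho, h_max):
    """Affine map from (entropy rate, redundancy) in bits to screen coordinates."""
    u, v = h / h_max, rho / h_max
    # Entropy rate runs down the right-hand edge, redundancy down the left-hand edge.
    return TOP + u * (RIGHT - TOP) + v * (LEFT - TOP)


def screen_to_info(point, h_max):
    """Inverse map: a selected screen point -> target (h_mu, rho_mu) in bits."""
    basis = np.column_stack([RIGHT - TOP, LEFT - TOP])
    u, v = np.linalg.solve(basis, np.asarray(point, dtype=float) - TOP)
    return u * h_max, v * h_max


if __name__ == "__main__":
    h_max = np.log2(8)                          # assuming an 8-symbol alphabet
    centroid = (TOP + LEFT + RIGHT) / 3
    print(screen_to_info(centroid, h_max))      # approx. (1.0, 1.0) bits
```

Because the map is affine, straight edges and the centre line of the screen triangle correspond to straight lines in the $(h_\mu, \rho_\mu)$ plane, which keeps the corner semantics described above intact.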
+ + Given coordinates corresponding to a point in the triangle, we select from a pre-built +library of random processes, choosing one whose entropy rate and redundancy most closely match the desired +values. The implementations discussed in this paper use first-order Markov chains as the content generator, +since it is easy to compute the theoretically exact values of entropy rate, redundancy and predictive +information rate given the transition matrix of the Markov chain. However, in principle, any generative system could be used to create the library of sequences, given an appropriate probabilistic listener model supporting +the estimation of entropy rate and redundancy. + +The Markov-chain-based implementation generates streams of symbols in the abstract; the alphabet of symbols is then mapped to a set of distinct sounds, such as pitched notes in a scale. Further, by layering these streams, intricate musical textures can be created. The Melody Triangle does not take into account the statistical knowledge that listeners acquire through exposure to tonal music. Even if a particular stream of symbols is periodic and predictable, in mapping to the chromatic scale there is a chance that the melody may conflict with culturally defined expectations. A mapping to the diatonic scale, however, is less likely to lead to such conflicts, and mappings to the pentatonic scale even less so. Indeed, the symbols can be mapped to a set of percussive sounds, and even non-sonic outputs such as visible shapes, colours, or movements. + +The information measures that define the Melody Triangle assume a constant rate of symbols, and thus the notes of each output melody proceed at a uniform rate. Although the placing of events in time has a strong effect on expectations, surprise and satisfaction in music, the system does not, as yet, address this temporal dimension.
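A minimal sketch of this pipeline (library, nearest-match lookup, sampling and mapping to a scale) follows, under several assumptions of our own. The library is filled with matrices whose rows are drawn from a symmetric Dirichlet distribution with a randomly chosen concentration, since the text says only that a random sampling method was used; matching uses plain Euclidean distance in the $(h_\mu, \rho_\mu)$ plane; and symbols are mapped to a C major pentatonic scale written as MIDI note numbers. The names `build_library`, `nearest`, `sample_melody`, `h_rho` and `PENTATONIC` are hypothetical.

```python
import numpy as np

# Hypothetical mapping: C major pentatonic as MIDI note numbers, one per symbol.
PENTATONIC = [60, 62, 64, 67, 69, 72, 74, 76]


def h_rho(P):
    """Entropy rate and redundancy (bits) of a row-stochastic Markov chain P."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()                               # stationary distribution
    logs = np.log2(np.where(P > 0, P, 1.0))          # treats 0 log 0 as 0
    h = float(-pi @ np.sum(P * logs, axis=1))
    marg = pi[pi > 0]
    return h, float(-np.sum(marg * np.log2(marg))) - h


def build_library(n, k, rng):
    """n random k-symbol chains with Dirichlet rows (an assumed sampling scheme)."""
    library = []
    for _ in range(n):
        alpha = 10 ** rng.uniform(-1, 1)             # smaller alpha: more deterministic rows
        P = rng.dirichlet(np.full(k, alpha), size=k)
        library.append((*h_rho(P), P))
    return library


def nearest(library, h_target, rho_target):
    """Transition matrix whose (h, rho) is closest to the requested point."""
    entry = min(library, key=lambda e: (e[0] - h_target) ** 2 + (e[1] - rho_target) ** 2)
    return entry[2]


def sample_melody(P, length, rng, scale=PENTATONIC):
    """Sample a symbol sequence from P and map each symbol to a scale degree."""
    k = P.shape[0]
    if k > len(scale):
        raise ValueError("alphabet is larger than the scale")
    state = rng.integers(k)
    notes = []
    for _ in range(length):
        state = rng.choice(k, p=P[state])
        notes.append(scale[state])
    return notes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    library = build_library(2000, 8, rng)            # 8 symbols -> 8 scale degrees
    P = nearest(library, h_target=1.0, rho_target=1.0)
    print(sample_melody(P, 16, rng))                 # 16 MIDI note numbers
```

Plain Dirichlet sampling is unlikely to produce the near-deterministic, near-periodic chains along the left-hand edge, so a library intended to cover the whole triangle would probably need a more targeted sampling scheme; this sketch only illustrates the lookup and sonification steps.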
+ +\section{Interfaces} + +\subsection{Interface 1: The Interactive Installation} +\begin{figure} +\centering +\includegraphics[width=1\linewidth]{figs/kinnect.pdf} +\caption{The depth map as seen by the Kinect camera in the interactive installation version of the Melody Triangle. The bounding box outlines the blobs detected by OpenNI.\label{Kinect}} +\end{figure} +The Melody Triangle was first implemented as a multi-user +interactive installation. It has been exhibited at the Brighton Science Festival 2012 and Digital Shoreditch, as well as at the British Science Festival 2011. A Kinect\footnote{http://www.xbox.com/en-GB/Kinect} camera tracks individuals in a space, the range of its depth sensors naturally forming a triangle. + + As visitors come into the range of the camera, each starts generating a melody whose statistical properties are determined by the mapping of physical space to the statistical space of the Melody Triangle. Thus, by exploring the physical space, the participant changes the predictability of the generated melodic content. When multiple people are in the space, they can cooperate to create interweaving melodies, forming intricate polyphonic textures. + +This makes the interaction physically engaging and (as our experience with visitors both young and old has demonstrated) more playful. + +\subsubsection{Tracking and Control} + +Tracking and control were implemented using the OpenNI libraries' API\footnote{http://OpenNi.org/} and high-level middleware for tracking with Kinect. This provided reliable blob tracking of humanoid forms in 2D space. By combining this with the Kinect's depth map, it became possible to obtain reliable coordinates of visitors' positions in the space. + +By detecting the bounding box of the 2D blobs of individuals in the space, and then normalising these by the distance given by the depth map, it became possible to work out whether an individual had an arm stretched out or was crouching.
With this, it was possible to define a series of gestures for controlling the system without the use of any controllers (see Table \ref{gestures}). For instance, by sticking out one's left arm quickly, the melody doubles in tempo. By pulling one's left arm in at the same time as sticking the right arm out, the melody would shift onto the offbeat. Sending out both arms would change the instrument being `played', and crouching would decrease the volume of the melody. + +\begin{table} +\centering +\caption{Gestures and their resulting effect\label{gestures}} +\begin{tabular}{ l c l } +left arm & right arm & effect\\ +\hline + out & static & double tempo \\ + in & static & halve tempo \\ + static & out & triple tempo \\ + static & in & one-third tempo\\ + out & in & shift to off-beat \\ + out & out & change instrument\\ + in & in & reset tempo\\ +\end{tabular} +\end{table} + +\begin{figure} +\centering +\includegraphics[width=1\linewidth]{figs/InstructionsImage3.pdf} +\caption{Gestures and their resulting effect \label{gestures2}} +\end{figure} + + + +\subsubsection{Observations} +Although visitors needed some initial instructions, they were then quickly able to collaboratively design musical textures. For example, one person would lay down a predictable repeating bass line by keeping to the periodicity/repetition side of the room, while a companion could generate a freer melodic line by being nearer the `noise' part of the space. + +The collaborative nature of this installation is an area that merits attention. Because no single user could control the whole narrative, participants communicated verbally and directed each other towards the goals of learning to use the system and finding interesting musical textures. This collaboration added an element of playfulness and enjoyment that was clearly apparent.
+ +As an artefact, this installation occupies an ambiguous role in terms of purpose; it is in a nebulous middle ground between instrument, art installation and technical demonstration. It is clear, however, that as a vehicle for communicating ideas about expectation, pattern and predictability in music to the general public, it has proved very effective. + +However, we were also interested in carrying out some studies under more controlled circumstances. Additionally, we are interested in the Melody Triangle's potential as a compositional aid or music performance interface. To this end, we developed a screen-based user interface to the Melody Triangle. + + +\subsection{Interface 2: The Screen Interface} + + \begin{fig}{melTriScreenShot} + \colfig[1]{melTriScreenShot} + \caption{Screen shot of the Melody Triangle screen UI. On the right, the transition matrices currently being played are displayed. Each token flashes whenever a note from its melody is rendered. } +\end{fig} + +In the screen-based interface, a number of tokens, each representing a +sonification stream or `voice', can be dragged in and around the triangle. +For each token, a sequence of symbols is sampled using the corresponding +transition matrix, and the symbols are then mapped to notes of a scale or percussive sounds% +\footnote{The sampled sequence could easily be mapped to other musical processes, possibly over +different time scales, such as chords, dynamics and timbres. It would also be possible +to map the symbols to visual or other outputs.}% +. Keyboard commands give control over other musical parameters such +as the pitch register, volume, scale, inter-onset interval and instrument for each voice. +%The possibilities afforded by the Melody Triangle in these other domains remains to be investigated. +The system is capable of generating quite intricate musical textures when multiple tokens +are in the triangle.
The overlapping and interweaving of melodies of varying periodicities and predictability is well suited to making content that could stylistically be characterised as `minimalism'. + +This interface is quite unlike other computer-aided composition tools or programming +environments, as here the composer exercises control at the abstract level of information-dynamic +properties. +A video of the interface in use can be viewed at \emph{http://bit.ly/My49lT}. + + + + + + +\section{User Trials with the Melody Triangle} +We carried out a pilot study with six participants who were asked +to use a simplified form of the user interface (a single controllable token, +and no rhythmic, registral or timbral controls) under two conditions: +one where a single sequence was sonified under user control, and another +where an additional sequence was sonified in a different register, as if generated +by a fixed invisible token in one of four regions of the triangle. In addition, subjects +were asked to press a key if they `liked' what they were hearing. + + +Our hypothesis was that users would linger longer in areas of the triangle that produce more aesthetically desirable sequences, and that these would tend to be in the areas of the triangle with high predictive information rate, that is, areas along the middle and lower edge of the triangle. + + +We recorded subjects' behaviour as well as points which they marked +with a key press. After the study, the participants were surveyed with the Goldsmiths Musical Sophistication Index\cite{Mullensiefen:2011ts} to elicit their prior musical experience, which varied broadly. The sample size, however, was too small to establish any statistically significant correlations between the collected data and the index.

\begin{fig}{mtri-results}
 \def\scat#1{\colfig[0.42]{mtri/#1}}
 \def\subj#1{\scat{scat_dwells_subj_#1} & \scat{scat_marks_subj_#1}}
 \begin{tabular}{cc}
 \subj{a} \\
 \subj{b} \\
 \subj{c} \\
 \subj{d}
 \end{tabular}
 \caption{Dwell times and mark positions from user trials with the
 on-screen Melody Triangle interface, for four subjects. The left-hand column shows
 the positions in a 2D information space (entropy rate vs.\ redundancy,
 in bits) where each subject spent their time; the area of each circle is proportional
 to the time spent there. The right-hand column shows the points which subjects
 `liked'; here the area of each circle is proportional to the time spent at
 that point before it was marked.}
\end{fig}


Comments collected from the subjects
suggest that
the information-dynamic characteristics of the patterns were readily apparent
to most: several noticed the main organisation of the triangle,
with repetitive notes at the top, cyclic patterns along one edge, and unpredictable
notes towards the opposite corner. Some described their systematic exploration of the space.
Two felt that the right side was `more controllable' than the left (a consequence
of their ability to return to a particular distinctive pattern and recognise it
as one heard previously). Two reported that they became bored towards the end,
but another felt there was not enough time to `hear out' the patterns properly.
One subject did not `enjoy' the patterns in the lower region, but another said the lower
central regions were more `melodic' and `interesting'.

\subsection{Discussion}
Our initial hypothesis, that subjects would linger longer in regions of the triangle
that produced aesthetically preferable sequences, and that this would tend to be towards the
centre line of the triangle for all subjects, was not confirmed.
However, the subjects did seem to exhibit distinct kinds of exploratory behaviour.
It is possible
that the design of the experiment encouraged an initial exploration of the space (sometimes
very systematic, as for subject (c)) aimed at \emph{understanding}
how the system works, rather than at finding musical patterns. It is also possible that the
system encourages users to create musically interesting output by \emph{moving the token},
rather than by finding a particular spot in the triangle which produces a musically interesting
sequence by itself.

We plan to continue the trials with a slightly less restricted user interface in order to
make the experience more enjoyable and thereby give subjects longer to use the interface;
this may allow them to get beyond the initial exploratory phase and give a clearer
picture of their aesthetic preferences. In addition, we plan to conduct a
study under more restrictive conditions, where subjects will have no control over the patterns
other than to signal (a) which of two alternatives they prefer in a forced-choice
paradigm, and (b) when they are bored with listening to a given sequence.

\section{Qualitative Feedback}

In parallel with the pilot study, we collected qualitative feedback from potential users of the screen interface. Four participants were interviewed, all practising musicians who use computers in music production or performance. The aim was to establish which features would be desirable in any further development of the interface, for instance as a VST instrument for use in a standard audio production environment.

Unlike in the pilot study, where participants knew nothing about the interface beforehand and were asked to `explore' it with as little instruction as possible, here the potential users were first taught how to use the system. They were then given some time to play and experiment, after which feedback and criticism of the system were sought in informal discussion.
As part of a broader conversation, they were asked whether they could identify the different areas of the triangle, which features of the system they liked and disliked, and whether they could see themselves using the system as part of their musical practice, and if so, how.

Points raised include the following:
\begin{itemize}
\item The subjects were very quick to get to grips with the properties of the different areas of the triangle, and found it quite intuitive.
\item All participants used the more periodic/predictable half of the triangle considerably more.
\item Some expressed interest in its potential as a live performance interface for electronic music.
\item All users wanted more control over the mapping of symbols to notes, and some wanted to be able to map the output of the triangle to other parameters, such as filter and effect settings.
\end{itemize}

Two of the users indicated that the Melody Triangle could integrate well into their musical practice; one was unsure, and the other said it would not, expressing frustration at having little control over the musical style of the output.
Some of their comments follow:
\begin{quote}``If it was a kind of VST instrument, I would use it really a lot, definitely! Because there are not that many around that make this kind of stuff. I always love if something is generative or stochastic to generate things I would not come up with, but to generate a lot of them in a short amount of time and I'm the creative catalyst that just picks them\ldots\ and then have this kind of choices to edit probabilities, I really like that.''\end{quote}

\begin{quote}
``Here what is cool is that\ldots\ I can make multiple loops and they all have different characteristics and I don't have to adjust like five numbers in different places, it's in one thing, and that's what I like most, it's kind of like a macro [interface].''
\end{quote}

\begin{quote}``I would use it as an idea generator\ldots\ what I probably would do is I would run this, maybe I would select some random sounds and maybe I would try around and develop some motifs, and see `oh I like that!' and would record that as MIDI and move on.''\end{quote}

Stochastic processes have often been used to generate musical material. While such processes can drive the \emph{generative} phase of the creative process, these comments suggest that information dynamics and the Melody Triangle could serve as a novel framework for a \emph{selective} phase, helping composers to find generated material of value. This alternation of generative and selective phases has been noted before~\cite{Boden1990}.

\section{The Mobile App}

To extend our study of musical preferences to a wider audience, the Melody Triangle is being implemented as an Android mobile phone application. The research motivation is to use the app to collect large quantities of crowd-sourced data, providing a larger data set than could realistically be gathered through individual studies.

The audio engine is developed in libpd\footnote{\url{http://libpd.cc/}}, an embeddable version of the open-source Pure Data programming environment. The app will allow users to drag tokens around the triangle on the phone's touch screen to generate musical textures. Usage statistics will be collected on the phone and periodically uploaded to our servers for analysis.
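The dwell-time summaries plotted in \Figrf{mtri-results} are one plausible form for such usage statistics. The sketch below shows how dwell times per grid cell could be accumulated on the phone before upload. It assumes, hypothetically, that the app logs time-ordered samples $(t, x, y)$ of a token's position, with $t$ in seconds and $x, y$ in normalised triangle coordinates. The log format, the grid size and the function name \texttt{dwell\_times} are illustrative assumptions, not the app's actual design.

\begin{lstlisting}[language=Python]
from collections import defaultdict


def dwell_times(samples, cell=0.05):
    """Total time (seconds) the token spent in each grid cell.

    samples: time-ordered (t, x, y) tuples (hypothetical log format).
    The token is assumed to stay at each logged position until the next
    sample, so the final sample contributes no time.
    """
    dwell = defaultdict(float)
    for (t0, x, y), (t1, _, _) in zip(samples, samples[1:]):
        # Quantise the position to the nearest grid-cell index.
        key = (round(x / cell), round(y / cell))
        dwell[key] += t1 - t0
    return dict(dwell)


if __name__ == "__main__":
    log = [(0.0, 0.10, 0.10), (2.0, 0.10, 0.10),
           (5.0, 0.50, 0.20), (6.0, 0.50, 0.20)]
    print(dwell_times(log))  # {(2, 2): 5.0, (10, 4): 1.0}
\end{lstlisting}

Summarising on the device in this way would reduce the upload to one total per visited cell, which matches the circle areas used in the pilot-study plots.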


 \begin{fig}{mobile}
 \colfig[1]{mobile}
 \caption{The Melody Triangle mobile phone app.}
\end{fig}


\section{Conclusion}
We presented the Melody Triangle, an interface for the discovery of melodic content in which the input (a position within a triangle) corresponds to the predictability of the output melodies. The Melody Triangle is situated within \emph{information dynamics}, an information-theoretic approach to modelling human expectation and surprise.

We outlined the relevant ideas behind information dynamics and described three key information-theoretic measures: entropy rate, redundancy and \emph{predictive information rate}. The last of these measures the information that the current observation provides about the future beyond what is already known from past observations. We described how the natural distribution of randomly generated Markov chains in terms of these measures led us to design the Melody Triangle, and outlined its two incarnations.

The first is a multi-user installation where collaboration in a performative setting provides a playful yet informative way to explore expectation and surprise in music.

The second is a screen-based interface where the Melody Triangle can be used as a musical performance interface or compositional aid for generating musical textures, with the user exercising control at the abstract level of randomness and predictability. We summarised qualitative feedback gathered from potential users of the system, which indicates that the Melody Triangle could be useful as a performance tool or compositional aid. We described a pilot study where the screen-based interface was used under experimental conditions to investigate how the information-dynamic measures might relate to musical preference. Although the results were inconclusive, we plan to continue this work with different experimental designs.
Finally, we outlined a forthcoming mobile phone version of the Melody Triangle that, once released, will collect data from its users to help us identify any relationship between human musical preferences and the information-dynamic model of expectation and surprise.

\section{Acknowledgments}
This work is supported by an EPSRC Doctoral Training Centre EP/G03723X/1 (HE), GR/S82213/01 and \\EP/E045235/1 (SA), an EPSRC Leadership Fellowship, \\EP/G007144/1 (MDP) and EPSRC IDyOM2 EP/H013059/1.

\bibliography{all,c4dm,all2}\bibliographystyle{aaai}
\end{document}