% SMC2015latex/section/background.tex
% author: christopherh <christopher.harte@eecs.qmul.ac.uk>
% date: Mon, 27 Apr 2015 18:15:53 +0100
\section{Background} \label{sec:background}
In this section, to introduce the theory behind the toolkit, we briefly present key aspects of the mathematical framework described in \cite{Song15thesis} and then give an overview of each syncopation model.
%Please refer to for a more detailed treatment of all the related concepts and their mathematical notation.
\begin{figure}[t] \centering \includegraphics[width=\columnwidth]{images/general3.pdf} \caption{An example note sequence. Two note events $\note_0$ and $\note_1$ occur in the time-span between time origin $\timeorigin$ and end time $\timeend$. The time-span duration $\timespan$ is three quarter-note periods. The rests at the start and end of the bar are not explicitly represented as objects in their own right here, but as periods where no notes sound.} \label{fig:general} \end{figure}
% \subsection{Rhythm representation}
% \label{sec:background:rhythm}
\subsection{Time-span} \label{sec:background:rhythm:timespan}
The term \emph{time-span} has been defined as the period between two points in time, including all time points in between \cite{Lerdahl_Jackendoff83GTTM}. To represent a given rhythm, we must specify the time-span within which it occurs by defining a reference time origin $\timeorigin$ and end time $\timeend$; the total duration is then $\timespan = \timeend-\timeorigin$ (Figure~\ref{fig:general}). The basic time unit is the \emph{tick} rather than the second, so we use the parameter Ticks Per Quarter-note (TPQ) to describe the resolution of a rhythm's time-span. The minimum TPQ is determined by the rhythm-pattern: it is the lowest resolution at which all the events can still be represented. As demonstrated in Figure~\ref{fig:clave}, the \emph{Son} clave rhythm pattern can be represented at both 8 and 4 ticks per quarter-note, but its minimum representable resolution is 4.
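The minimum TPQ can be computed directly from the onset times expressed as exact fractions of a quarter-note. The following is a minimal Python sketch of this idea; the function name is illustrative and not part of the toolkit's actual API.

```python
from fractions import Fraction
from math import lcm

def min_tpq(onsets_in_quarters):
    """Smallest Ticks Per Quarter-note at which every onset lands on a tick.

    Onsets are exact Fractions of a quarter-note, so the answer is simply
    the least common multiple of their denominators.
    """
    return lcm(*(Fraction(o).denominator for o in onsets_in_quarters))
```

For the \emph{Son} clave of Figure~\ref{fig:clave}, the onsets fall at $0$, $3/4$, $3/2$, $5/2$ and $3$ quarter-notes, so the denominators are $1, 4, 2, 2, 1$ and the minimum TPQ is 4.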
\begin{figure}[t] \centering \includegraphics[width=0.85\columnwidth]{images/clave_tpq.pdf} \caption{The representation of the \emph{Son} clave rhythm at different settings of Ticks Per Quarter-note (TPQ). Each quarter-note is represented by 8 ticks in (a) and 4 ticks in (b), so all the sounded notes are captured (highlighted by the blue circles); however, in (c), where TPQ is 2, the second note cannot be represented at this resolution.} \label{fig:clave} \end{figure}
\subsection{Notes and sequences} \label{sec:background:rhythm:note}
A single \emph{note} event $\note$ occurring in this time-span may be described by the tuple $(\starttime, \durationtime, \velocity)$ as shown in Figure~\ref{fig:general}, where $\starttime$ represents the start or \emph{onset} time relative to $\timeorigin$, $\durationtime$ represents the note duration in the same units, and $\velocity$ represents the note \emph{velocity} (i.e. the dynamic: how loud or accented the event is relative to others), where $\velocity > 0$. This allows us to represent an arbitrary rhythm as a note sequence $\sequence$, ordered in time
\begin{equation} \label{eq:def_sequence} \sequence = \langle\note_0, \note_1, \cdots, \note_{\sequencelength-1}\rangle \end{equation}
With TPQ set to 4, an example note sequence for the clave rhythm in Figure~\ref{fig:clave} is
\begin{equation} \label{eq:note_sequence} \sequence = \langle (0,3,2),(3,1,1),(6,2,2),(10,2,1),(12,4,1) \rangle \end{equation}
The higher $\velocity$ values of the first and third notes reflect that these notes are accented. An alternative representation of a rhythm is the \emph{velocity sequence}. This is a sequence of values representing equally spaced points in time; each value is the normalised velocity of a note onset if one is present at that time, or zero otherwise.
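The note-sequence-to-velocity-sequence conversion can be sketched in a few lines of Python (an illustrative sketch, not the toolkit's actual API): velocities are normalised by the maximum velocity in the sequence and placed at their onset ticks.

```python
def to_velocity_sequence(notes, length):
    """Convert a note sequence of (onset, duration, velocity) tuples into
    a velocity sequence of the given length in ticks.

    Velocities are normalised so the loudest note maps to 1; every
    position without an onset holds 0.
    """
    v_max = max(v for _, _, v in notes)
    seq = [0.0] * length
    for onset, _, v in notes:
        seq[onset] = v / v_max
    return seq

# Son clave note sequence from the text (TPQ = 4, 16-tick time-span)
clave = [(0, 3, 2), (3, 1, 1), (6, 2, 2), (10, 2, 1), (12, 4, 1)]
```

Applied to the clave note sequence, this reproduces the velocity sequence given in Equation~\ref{eq:velocity_sequence}.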
The velocity sequence for the above clave rhythm can be derived as
\begin{equation} \label{eq:velocity_sequence} \spanvector = \langle 1,0,0,0.5,0,0,1,0,0,0,0.5,0,0.5,0,0,0 \rangle \end{equation}
It should be noted that the conversion between note sequence and velocity sequence is not invertible: the note duration information is lost when converting from note sequence to velocity sequence, and only inter-onset intervals can be recovered. For example, converting Equation~\ref{eq:velocity_sequence} back to a note sequence yields
\begin{equation} \label{eq:note_sequence_derived} \sequence' = \langle (0,3,2),(3,3,1),(6,4,2),(10,2,1),(12,4,1) \rangle \end{equation}
which differs from the original note sequence in Equation~\ref{eq:note_sequence}.
\subsection{Metrical structure and time-signature} \label{sec:background:rhythm:meter}
\begin{figure}[t] \centering \includegraphics[width=\columnwidth]{images/meter_hierarchy7.pdf} \caption{Metrical hierarchies for different time-signatures. (a) A simple-duple hierarchy dividing the bar into two groups of two (as with a 4/4 time-signature). (b) A compound-duple hierarchy dividing a bar into two beats, each of which is subdivided by three (e.g. a 6/8 time-signature). Reading the weights from left to right in any level $\metriclevel$ gives the elements in sequence $\metricvector_\metriclevel$.} \label{fig:meter-hierarchy} \end{figure}
Isochronous meter is formed by a multi-level hierarchical metrical structure~\cite{Lerdahl_Jackendoff83GTTM, London04Meter}. As shown in Figure~\ref{fig:meter-hierarchy}, under a given metrical hierarchy, a bar is divided by a subdivision factor $\subdivision$ at each metrical level with index $\metriclevel$, where $\metriclevel \in [0, \levelmax]$. The list of subdivision factors is referred to as a \emph{subdivision sequence}. Events at different metrical positions vary in perceptual salience or \emph{metrical weight}~\cite{Palmer_Krumhansl90}.
These weights may be represented as a \emph{weight sequence} $\metricweightset = \langle \metricweight_0, \metricweight_1, \ldots, \metricweight_{\levelmax}\rangle$. The prevailing hypothesis for the assignment of weights in the metrical hierarchy is that a time point that exists in both the current metrical level and the level above has a \emph{strong} weight compared to time points that are not also present in the level above~\cite{Lerdahl_Jackendoff83GTTM}. The choice of values for the weights in $\metricweightset$ can vary between different models, but the assignment of weights to nodes is common to all, as in~\cite{Lerdahl_Jackendoff83GTTM}.
\subsection{Syncopation models} \label{sec:background:models}
In this section we give a brief review of each implemented syncopation model, including its general hypothesis and mechanism. To compare the capabilities of the models, we give an overview of the musical features each captures in Table~\ref{ta:capabilities}. For a detailed review of these models see \cite{Song15thesis}.
\subsubsection{Longuet-Higgins and Lee 1984 (\lhl)} \label{sec:background:models:lhl}
Longuet-Higgins and Lee's model \cite{LHL84} decomposes rhythm patterns into a tree structure as described in Section~\ref{sec:background:rhythm:meter}, with metrical weights $\metricweight_\metriclevel = -\metriclevel$ for all $\metricweight_\metriclevel \in \metricweightset$, i.e. $\metricweightset = \langle 0,-1,-2, \ldots \rangle$. The hypothesis of this model is that a syncopation occurs when a rest ($\RestNode$) in one metrical position follows a note ($\NoteNode$) in a weaker position. Where such a note-rest pair occurs, the difference in their metrical weights is taken as a local syncopation score. Summing the local scores produces the syncopation prediction for the whole rhythm sequence.
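The note-rest pairing idea can be made concrete with a simplified Python sketch. This is a non-cyclic sketch for duple meters only, with illustrative names; it omits details of the published algorithm and is not the toolkit's implementation.

```python
def duple_weights(n):
    """Metrical weights w_l = -l for a duple hierarchy over n positions
    (n must be a power of two): position 0 gets weight 0, and weaker
    positions get increasingly negative weights."""
    levels = n.bit_length() - 1
    weights = []
    for i in range(n):
        if i == 0:
            weights.append(0)
        else:
            tz = (i & -i).bit_length() - 1   # trailing zeros of i
            weights.append(tz - levels)
    return weights

def lhl_score(pattern):
    """Sum (rest weight - note weight) over note->rest pairs in which the
    strongest rest following a note is metrically stronger than the note."""
    w = duple_weights(len(pattern))
    onsets = [i for i, x in enumerate(pattern) if x]
    score = 0
    for k, n in enumerate(onsets):
        end = onsets[k + 1] if k + 1 < len(onsets) else len(pattern)
        rests = range(n + 1, end)
        if rests:
            r = max(rests, key=lambda i: w[i])  # strongest rest after the note
            if w[r] > w[n]:
                score += w[r] - w[n]
    return score
```

For the tresillo pattern $\langle 1,0,0,1,0,0,1,0 \rangle$, the only qualifying pair under this simplified scheme is the note at position 3 (weight $-3$) followed by the stronger rest at position 4 (weight $-1$), giving a score of 2; a pattern of notes on every strong position scores 0.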
\subsubsection{Pressing 1997 (\pressing)} \label{sec:background:models:prs}
Pressing's cognitive complexity model~\cite{Pressing97,Pressing93} specifies six prototype binary sequences and ranks them in terms of \emph{cognitive cost}. For example, the lowest cost belongs to the \emph{null} prototype, which contains either a rest or a single note; the \emph{filled} prototype, which has a note in every position of the sequence, e.g. $\langle 1,1,1,1 \rangle$, costs more; and the \emph{syncopated} prototype, which has a 0 in the first (i.e.\ strongest) metrical position, e.g. $\langle 0,1,1,1 \rangle$, costs more still. The model analyses the cost of the whole rhythm-pattern and of its sub-sequences at each metrical level determined by $\subdivision_\metriclevel$. The final output is a sum of the costs, weighted by the number of sub-sequences in each level.
\subsubsection{Toussaint 2002 `Metric Complexity' (\metrical)} \label{sec:background:models:tmc}
Toussaint's \emph{metric complexity} measure \cite{Toussaint02Metrical} defines the metrical weights as $\metricweight_\metriclevel = \levelmax - \metriclevel + 1$, so a stronger metrical position is associated with a higher weight and the weakest positions have weight $\metricweight_{\levelmax}=1$. The hypothesis of the model is that the level of syncopation is the difference between the maximum possible metrical simplicity (i.e. the sum of metrical weights for a rhythm containing the same number of notes but placed at the strongest possible metrical positions) and the metrical simplicity of the rhythm itself (i.e. the sum of the metrical weights for each of its notes).
\subsubsection{Sioros and Guedes 2011 (\sioros)} \label{sec:background:models:sg}
The model of Sioros and Guedes~\cite{Sioros11,Sioros12} has three main hypotheses: first, the accenting of notes affects perceived syncopation and should be included in the model (the only model in this study to do so).
Second, humans try to minimise the syncopation of a particular note relative to its neighbours in each level of the metrical hierarchy. Third, syncopations at the beat level are more salient than those that occur at higher or lower metrical levels, so the outcome should be scaled to reflect this~\cite{Sioros13}.
\subsubsection{Keith 1991 (\keith)} \label{sec:background:models:kth}
Keith's measure classifies each note event by whether its onset and its end fall off the beat: a note starting on the beat but ending off it is a \emph{hesitation} (cost 1), one starting off the beat and ending on it is an \emph{anticipation} (cost 2), and one both starting and ending off the beat is a \emph{syncopation} (cost 3). The overall measure sums these costs over all notes in the pattern.
\subsubsection{Toussaint 2005 `Off-Beatness' (\offbeat)} \label{sec:background:models:tob}
Toussaint's \emph{off-beatness} measure counts the onsets that fall at \emph{off-beat} positions of the cycle, i.e. positions that cannot be reached by any regular subdivision of the cycle length; the more onsets at such positions, the higher the measured syncopation.
\subsubsection{G\'omez 2005 `Weighted Note-to-Beat Distance' (WNBD)} \label{sec:background:models:wnbd}
The WNBD measure weights each note according to the distance from its onset to the nearest beat: notes on a beat contribute nothing, while notes further from the beat, and notes sustained across the following beat, contribute more; the final measure averages these weights over all notes in the pattern.
\begin{table} \renewcommand{\arraystretch}{1.2} \centering {\footnotesize \begin{tabular}{lccccccc} %\hline
Property & \lhl & \pressing & \metrical & \sioros & \keith & \offbeat & \wnbd \\ \hline Onset & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ Duration & & & & & \checkmark & & \checkmark \\ Dynamics & & & & \checkmark & & & \\ Mono & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ Poly & & & & & \checkmark & \checkmark & \checkmark \\ Duple & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark & \checkmark \\ Triple & \checkmark & \checkmark & \checkmark & \checkmark & & \checkmark & \checkmark \\ \hline \end{tabular} } \caption{Musical properties captured by the different syncopation models. All models use note onsets, but only two use note duration rather than inter-onset intervals. Only SG takes dynamics (i.e. variation in note velocity) into account. All models handle monorhythms, but the four models based on hierarchical decomposition of rhythm patterns are unable to handle polyrhythmic patterns. All models can process both duple and triple meters with the exception of KTH, which can only process duple meter.} \label{ta:capabilities} \end{table}
%All the models use temporal features (i.e. onset time point and/or note duration) in the modelling.
The SG model also processes the dynamics of musical events (i.e. note velocity). We use the term \emph{monorhythm} to refer to any rhythm-pattern that is not polyrhythmic. All the models can measure the syncopation of monorhythms, but only the KTH, TOB and WNBD models can deal with polyrhythms. Finally, all the models can deal with rhythms notated in duple meter, and all except KTH can also cope with rhythms in triple meter.
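As a final illustration of the hierarchical-weight family of models, Toussaint's metric complexity measure can be sketched as follows. This is an illustrative, duple-meter-only Python sketch under the weight scheme $\metricweight_\metriclevel = \levelmax - \metriclevel + 1$ described above; the names are not the toolkit's actual API.

```python
def tmc_weights(n):
    """Toussaint metric weights for a duple hierarchy over n positions
    (n a power of two): the strongest position gets the largest weight
    and the weakest positions get weight 1."""
    levels = n.bit_length() - 1   # index of the weakest level
    out = []
    for i in range(n):
        if i == 0:
            out.append(levels + 1)
        else:
            tz = (i & -i).bit_length() - 1   # trailing zeros of i
            out.append(tz + 1)
    return out

def metric_complexity(onsets, n):
    """Syncopation as the maximum possible metrical simplicity minus the
    rhythm's simplicity (the sum of weights at its onset positions)."""
    w = tmc_weights(n)
    simplicity = sum(w[i] for i in onsets)
    max_simplicity = sum(sorted(w, reverse=True)[:len(onsets)])
    return max_simplicity - simplicity
```

For the Son clave at TPQ 4 (onsets at ticks 0, 3, 6, 10 and 12 of a 16-tick bar) the rhythm's simplicity is $5+1+2+2+3=13$ against a maximum of $5+4+3+3+2=17$, giving a syncopation of 4, whereas four quarter-notes on the beat score 0.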