diff musicweb.tex @ 38:35d37b14685d
added 4.3 and merged
author | gyorgyf
date | Sun, 01 May 2016 02:36:03 +0100
parents | af1ad3bca2c6
children | 89ad7f8945db
--- a/musicweb.tex	Sun May 01 02:31:14 2016 +0100
+++ b/musicweb.tex	Sun May 01 02:36:03 2016 +0100
@@ -299,12 +299,25 @@
 \subsection{Content-based linking}\label{sec:mir}
-Content-based Music Information Retrieval (MIR) [Casey et.al. 2008] facilitates applications that rely on perceptual, statistical, semantic or musical features derived from audio using digital signal processing and machine learning methods. These features may include statistical aggregates computed from time-frequency representations extracted over short time windows. For instance, spectral centroid is said to correlate with the perceived brightness of a sound [Schubert et.al., 2006], therefore it may be used in the characterisation in timbral similarity between music pieces. More complex representations include features that are extracted using a perceptually motivated algorithm. Mel-Frequency Cepstral Coefficients for instance are often used in speech recognition as well as in estimating music similarity. Higher-level musical features include keys, chords, tempo, rhythm, as well as semantic features like genre or mood, with specific algorithms to extract this information from audio.
+Content-based Music Information Retrieval (MIR) [Casey et al. 2008] facilitates applications that rely on perceptual, statistical, semantic or musical features derived from audio using digital signal processing and machine learning methods. These features may include statistical aggregates computed from time-frequency representations extracted over short time windows. For instance, the spectral centroid is said to correlate with the perceived brightness of a sound [Schubert et al. 2006] and may therefore be used in the characterisation of timbral similarity between music pieces. More complex representations include features extracted using perceptually motivated algorithms: Mel-Frequency Cepstral Coefficients (MFCCs), for instance, are often used in speech recognition as well as in estimating music similarity. Higher-level musical features include keys, chords, tempo and rhythm, as well as semantic features such as genre or mood, with specific algorithms to extract this information from audio.
 %
-Content-based features are increasingly used in music recommendation systems to overcome issues such as infrequent access of lesser known pieces in large music catalogues (the ``long-tail'' problem) or the difficulty of recommending new pieces without user ratings in systems that employ collaborative filtering (``cold-start'' problem) [Celma, 2008].
+Content-based features are increasingly used in music recommendation systems to overcome issues such as infrequent access to lesser-known pieces in large music catalogues (the ``long tail'' problem) or the difficulty of recommending new pieces without user ratings in systems that employ collaborative filtering (the ``cold start'' problem) \cite{Celma2010}. In this work, we are interested in supporting music discovery by enabling users to engage in interesting journeys through the ``space of music artists''. Although similar to recommendation, this contrasts with most recommender systems, which operate on the level of individual music items. We aim to create links between artists based on stylistic elements of their music derived from a collection of recordings, complementing the social and cultural links discussed in the previous sections.
+High-level stylistic descriptors are not easily estimated from audio, but they can correlate with lower-level features such as the average tempo of a track, the frequency of note onsets, the most commonly occurring keys or chords, or the overall spectral envelope that characterises dominant voices or instrumentation. To exploit different types of similarity, we model each artist using three main categories of audio descriptors: rhythmic, harmonic and timbral. We compute the joint distribution of several low-level features in each category over a large collection of tracks from each artist. We then link artists exhibiting similar distributions of these features.
+
+% for XXXX artists with a mean track count of YYY
+We obtain audio features from the AcousticBrainz\footnote{https://acousticbrainz.org/} Web service, which provides audio descriptors in each category of interest. Tracks are indexed by MusicBrainz identifiers, enabling unambiguous linking to artists and other relevant metadata. For each artist in our database, we retrieve features for a large collection of their tracks in the above categories, including beats-per-minute and onset rate (rhythmic), chord histograms (harmonic) and MFCC (timbral) features.
+
+For each artist, we fit a Gaussian Mixture Model (GMM) with full covariance to each set of aggregated features in each category across several tracks and compute the distances $D_{cat}$ in each feature category using Eq.~\ref{eq:dist},
+%
+\begin{equation}\label{eq:dist}
+D_{cat} = d_{skl}(artist\_model_{cat}(i), artist\_model_{cat}(j)),
+\end{equation}
+%
+where $d_{skl}$ is the symmetrised Kullback-Leibler divergence obtained using a variational approximation for mixture models described in [Hershey, Olsen 2007] and $artist\_model_{cat}(i)$ are the model parameters (GMM weights, means and covariance matrices) for artist $i$ in the selected category. The divergences for each artist are then ranked and the top $N$ closest artists are stored.
+
 \subsection{Similarity by mood}
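The added text above refers to per-track descriptors such as the spectral centroid (a brightness correlate), MFCCs, tempo and onset rate, aggregated over short analysis windows. The following is a minimal sketch of how such per-track aggregates could be computed with librosa as a stand-in extractor; the pipeline described in the hunk takes its descriptors from AcousticBrainz instead, and the sample rate, 13 MFCC coefficients and mean aggregation are illustrative choices rather than details from the paper.

```python
import librosa
import numpy as np

def low_level_summary(audio_path):
    """Per-track summaries of the kinds of descriptors discussed above:
    spectral centroid (brightness), MFCCs (timbre), tempo and onset rate."""
    y, sr = librosa.load(audio_path, sr=44100, mono=True)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # shape (1, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)         # shape (13, frames)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    duration = len(y) / sr
    return {
        "centroid_mean": float(np.mean(centroid)),
        "mfcc_mean": np.mean(mfcc, axis=1),          # statistical aggregate over frames
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),
        "onset_rate": len(onsets) / duration,        # onsets per second
    }
```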
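The features themselves are retrieved from the AcousticBrainz Web service, keyed by MusicBrainz recording identifiers. Below is a sketch of such a lookup against the public v1 low-level endpoint; the JSON field names (rhythm.bpm, rhythm.onset_rate, tonal.chords_histogram, lowlevel.mfcc.mean) follow the AcousticBrainz low-level schema but should be treated as assumptions here, as should the choice of exactly these fields for each category.

```python
import requests

ACOUSTICBRAINZ_API = "https://acousticbrainz.org/api/v1"

def fetch_low_level(mbid):
    """Fetch the AcousticBrainz low-level document for one MusicBrainz recording ID."""
    r = requests.get(f"{ACOUSTICBRAINZ_API}/{mbid}/low-level", timeout=10)
    r.raise_for_status()
    return r.json()

def track_descriptors(doc):
    """Pick out per-category descriptors of the kind named in the text.
    Field names are assumptions based on the AcousticBrainz low-level schema."""
    return {
        "rhythmic": [doc["rhythm"]["bpm"], doc["rhythm"]["onset_rate"]],
        "harmonic": doc["tonal"]["chords_histogram"],
        "timbral":  doc["lowlevel"]["mfcc"]["mean"],
    }
```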
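The per-artist models are full-covariance GMMs compared with a symmetrised KL divergence, where each directed term is approximated with the variational bound of Hershey and Olsen (2007), $D_{var}(f\|g) = \sum_a \pi_a \log \frac{\sum_{a'} \pi_{a'} e^{-KL(f_a\|f_{a'})}}{\sum_b \omega_b e^{-KL(f_a\|g_b)}}$, built from closed-form KL divergences between Gaussian components. The sketch below implements this with scikit-learn and NumPy; the number of mixture components, the additive symmetrisation and the helper names are assumptions rather than details taken from the paper.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from sklearn.mixture import GaussianMixture

def fit_artist_model(track_features, n_components=3, seed=0):
    """Fit a full-covariance GMM to stacked per-track feature vectors
    of one artist in one category (rhythmic, harmonic or timbral)."""
    X = np.vstack(track_features)
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=seed).fit(X)

def _gauss_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0,cov0) || N(mu1,cov1) ) between two Gaussians."""
    d = mu0.shape[0]
    c1 = cho_factor(cov1)
    diff = mu1 - mu0
    trace_term = np.trace(cho_solve(c1, cov0))
    mahal_term = diff @ cho_solve(c1, diff)
    logdet_term = np.linalg.slogdet(cov1)[1] - np.linalg.slogdet(cov0)[1]
    return 0.5 * (trace_term + mahal_term - d + logdet_term)

def _kl_variational(f, g):
    """Hershey & Olsen (2007) variational approximation of KL(f || g) for GMMs."""
    wf, wg = f.weights_, g.weights_
    kl_ff = np.array([[_gauss_kl(f.means_[a], f.covariances_[a],
                                 f.means_[b], f.covariances_[b])
                       for b in range(len(wf))] for a in range(len(wf))])
    kl_fg = np.array([[_gauss_kl(f.means_[a], f.covariances_[a],
                                 g.means_[b], g.covariances_[b])
                       for b in range(len(wg))] for a in range(len(wf))])
    num = np.log(np.exp(-kl_ff) @ wf)   # log sum_a' pi_a' exp(-KL(f_a || f_a'))
    den = np.log(np.exp(-kl_fg) @ wg)   # log sum_b  w_b   exp(-KL(f_a || g_b))
    return float(wf @ (num - den))

def symmetrised_kl(f, g):
    """d_skl as used in Eq. (dist): symmetrised variational approximation."""
    return _kl_variational(f, g) + _kl_variational(g, f)

def top_n_similar(target, models, n=10):
    """Rank the other artists by divergence to `target` and keep the n closest."""
    dists = {aid: symmetrised_kl(models[target], m)
             for aid, m in models.items() if aid != target}
    return sorted(dists, key=dists.get)[:n]
```

As a usage sketch, `models` would map artist identifiers to the GMM fitted on one feature category, and `top_n_similar` would be run once per category to produce the ranked artist links that are stored.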