Mercurial > hg > hybrid-music-recommender-using-content-based-and-social-information
changeset 29:b1c54790ed97
Report update
author | Paulo Chiliguano <p.e.chiilguano@se14.qmul.ac.uk> |
---|---|
date | Tue, 01 Sep 2015 11:29:38 +0100 |
parents | a95e656907c3 |
children | eba57dbe56f3 |
files | Report/chapter2/background.tex Report/chapter2/eda.png Report/chapter3/CDNN.png Report/chapter3/General_model_hybrid_recommender.png Report/chapter3/ch3.tex Report/chapter3/fetch_audio.png Report/chapter3/taste_profile.png Report/chapter3/time_frequency.png Report/chapter6/conclusions.tex Report/chiliguano_msc_finalproject.blg Report/chiliguano_msc_finalproject.lof Report/chiliguano_msc_finalproject.pdf Report/chiliguano_msc_finalproject.synctex.gz Report/chiliguano_msc_finalproject.toc Report/references.bib slides/chiliguano_msc_project_slides.tex |
diffstat | 16 files changed, 205 insertions(+), 117 deletions(-) [+] |
--- a/Report/chapter2/background.tex Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/chapter2/background.tex Tue Sep 01 11:29:38 2015 +0100 @@ -13,7 +13,7 @@ %\subsection{Last.fm} \section{Music services platforms} -%\subsection{Echonest} +\label{sec:musicservices} The Echo Nest\footnote{http://developer.echonest.com/} was a music intelligence company that offered solutions for music discovery and personalisation, dynamic curated sources, audio fingerprinting and interactive music applications. In 2014, The Echo Nest was acquired by Spotify\footnote{https://www.spotify.com/}, a commercial music streaming service where a user can browse and listen to music tracks sorted by artist, album, genre or playlist. However, The Echo Nest API is still active for the developer community and offers access to artist, song, taste profile and playlist data. In particular, The Echo Nest API can retrieve information restricted to a specific music track catalogue such as 7digital\footnote{http://developer.7digital.com/}. @@ -115,6 +115,7 @@ In our project, we model each user profile through EDAs by minimising a fitness function. The parameters of the fitness function are the rating and similarity values of each song that a user has listened to. The user profile is also represented as an n-dimensional vector of probabilities of music genres. This process is illustrated in Section~\ref{subsec:profile}. \subsection{Hybrid recommender approaches} +\label{subsec:hybridrecommender} A hybrid recommender system is developed through the combination of the recommendation techniques mentioned in the previous sections. Usually, hybrid approaches boost the advantages of CF by considering the user's feedback and the advantages of CB by taking into account the item attributes. According to \textcite{Burke2002331}, the following combination methods can be used to accomplish hybridisation: @@ -128,13 +129,7 @@ \item \textbf{Meta-level} method, where a model of the user's interests generated by one recommendation technique is used as the input of another recommender system. The advantage of this method is that the second recommender operates on the compressed representation instead of sparse raw data. \end{itemize} -The hybrid music recommender approach in this project can be considered as implementation of feature augmentation method and a meta-level method. First, user profiles are generated using the rating matrix and the song vector representation. Next, the model generated is the input of a CB recommender to produce \emph{top-N} recommendations. The general model of our hybrid recommender is shown in Figure~\ref{fig:generalhybrid} -\begin{figure}[ht!] - \centering - \includegraphics[width=\textwidth]{chapter2/General_model_hybrid_recommender.png} - \caption{Content-based filtering process} - \label{fig:generalhybrid} -\end{figure} +The hybrid music recommender approach in this project can be considered an implementation of the feature augmentation and meta-level methods. The general model of our hybrid recommender is explained in detail in Section~\ref{sec:algorithms}. %is based on a three-way aspect model \citep{Yoshii2008435}. Real item ratings are obtained through Last.fm API and spectral information are represented by convolutional deep belief networks (CDBN) features computed from items' spectrogram \citep{Lee20091096}.
@@ -142,7 +137,7 @@ Music Information Retrieval (MIR) \parencite{Casey2008668} is a field of research for better human understanding of music data in an effort to reduce the \textit{semantic gap} \parencite{Celma2006} between high-level musical information and low-level audio data. Applications of MIR include artist identification, genre classification and music recommender systems~\parencite{weston2012latent,Yoshii2008435}. \subsection{Genre classification} -Music classification is one of the main tasks in MIR for clustering audio tracks based on similarities between features of pieces of music. Automatic musical genre classification approach proposed by \textcite{Tzanetakis2002293}, which uses GTZAN genre dataset\footnote{http://marsyas.info/downloads/datasets.html}, has been widely used in the past decade. The GTZAN dataset consists of a total of 1,000 clips, corresponding to 100 examples for each of the 10 genres. The duration of each clip is 30 seconds. +Music classification is one of the main tasks in MIR, clustering audio tracks based on similarities between features of pieces of music. The automatic musical genre classification approach proposed by \textcite{Tzanetakis2002293}, which uses the GTZAN genre dataset\footnote{http://marsyas.info/downloads/datasets.html}, has been widely used in the past decade. The GTZAN dataset consists of a total of 1,000 clips, corresponding to 100 examples for each of the 10 music genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock. Each clip is 30 seconds long. Although the GTZAN dataset has known inaccuracies~\parencite{Sturm20127}, it still provides a useful baseline for comparing genre classifiers. @@ -256,20 +251,22 @@ In our project, we investigate permutation EDAs and continuous EDAs for user profile modelling. \subsection{A Hybrid Recommendation Model Based on EDA} -\textcite{Liang2014781} exploited a permutation EDA to model user profiles in an hybrid model for movie recommendation using the MovieLens 1M dataset\footnote{http://grouplens.org/datasets/movielens/}. A movie, \emph{i}, is described using keywords and weights vector, $t_i$, calculated by term frequency-inverse document frequency (TF-IDF) technique. A user is initially represented by a set, $S_u$, of \textit{(movie, rating)} tuples. The keywords of every $S_u$ set are embedded in a new set, $D_u$. +\textcite{Liang2014781} exploited a permutation EDA to model user profiles in a hybrid model for movie recommendation using the MovieLens 1M dataset\footnote{http://grouplens.org/datasets/movielens/}. + +A movie, \emph{i}, is described using a vector, $t_i=\{(k_1,w_1),\ldots ,(k_n,w_n)\}$, where the keywords $k_n$ and weights $w_n$ are calculated with the term frequency-inverse document frequency (TF-IDF) technique. A user is initially represented by a set, $S_u$, of $(i, r_{u,i})$ tuples, where $r_{u,i}$ is the rating of the movie \emph{i} given by user \emph{u}. The keywords in every $S_u$ set are embedded in a new set, $D_u$.
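For illustration only, a minimal Python sketch of these two representations (the movie IDs, keywords and weights are made-up example values, not data from MovieLens):

    # A movie i is a TF-IDF keyword/weight vector t_i; a user u is a set S_u of
    # (movie, rating) tuples; D_u collects the keywords of the movies u has rated.
    t = {
        'm1': {'space': 0.62, 'robot': 0.31, 'war': 0.07},   # t_1
        'm2': {'romance': 0.55, 'war': 0.45},                # t_2
    }
    S_u = [('m1', 4), ('m2', 2)]                             # (i, r_ui) tuples
    D_u = set()
    for movie_id, rating in S_u:
        D_u.update(t[movie_id].keys())                       # keywords seen by user u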
The goal is to learn the user profile, $profile_u$, by minimisation of the fitness function, defined by Equation~\eqref{eq:fitness} \begin{equation} fitness(profile_u) =\sum_{i\in S_u}\log(r_{u,i}\times sim(profile_u,t_i)) \label{eq:fitness} \end{equation} -where, $r_{u,i}$ is the rating of the movie \emph{i} given by user \emph{u}, and $sim(profile_u,t_i)$ is computed by the cosine similarity coefficient, defined by Equation~\eqref{eq:cossim} +where $sim(profile_u,t_i)$ is computed by the cosine similarity coefficient, defined by Equation~\eqref{eq:cossim} \begin{equation} sim(profile_u, t_i)=cos(profile_u, t_i) =\frac{profile_u\cdot t_i}{\Vert profile_u\Vert\times\Vert t_i\Vert} \label{eq:cossim} \end{equation} -The pseudocode of EDA implemented by \textcite{Liang2014781} is delineated by Algorithm~\ref{algo:hybrideda}, where MAXGEN is the maximum number of generations. +The pseudocode of the EDA implemented by \textcite{Liang2014781} is given in Algorithm~\ref{alg:hybrideda}, where MAXGEN is the maximum number of generations. \begin{algorithm}[ht!] \caption{Calculate $profile_u$} \begin{algorithmic} @@ -284,17 +281,26 @@ \STATE Compute each $fitness(profile_u)$ \STATE Rank individuals by their fitness value \STATE Select top $M < N$ individuals - \STATE Update $c_{n,i}$ by counting the occurrences of $(k_n,w_{n,i})$ on the $M$ individuals + \STATE Update $c_{n,i}$ by counting the occurrences of $(k_n,w_{n,i})$ in the profiles of the $M$ selected individuals \STATE Generate $profile_u$ by random sampling according to updated $c_{n,i}$ \ENDWHILE \RETURN $profile_u$ \end{algorithmic} - \label{algo:hybrideda} + \label{alg:hybrideda} \end{algorithm} +To recommend a new movie, \emph{j}, to a user, the similarity between the user profile, $u_i$, and the movie vector, $t_j$, is calculated using the Pearson correlation coefficient, defined by Equation~\eqref{eq:wpearson}: +\begin{equation} +sim(u_i,t_j) =\frac{\sum _{c\in I_i \cap I_j} (w_{i,c} - \bar{w}_i)(w_{j,c} - \bar{w}_j)}{\sqrt{\sum _{c\in I_i \cap I_j}(w_{i,c} - \bar{w}_i)^2} \sqrt{\sum _{c\in I_i \cap I_j}(w_{j,c} - \bar{w}_j)^2}} +\label{eq:wpearson} +\end{equation} +where $c\in I_i \cap I_j$ ranges over the keywords shared by the user profile and the new movie vector, $w_{i,c}$ and $w_{j,c}$ are the weights of keyword \emph{c} in the user profile and the movie vector, and $\bar{w}_i$ and $\bar{w}_j$ are the mean weights of the user profile and the movie vector, respectively. + +In our approach, we use the algorithm proposed by~\textcite{Liang2014781} to model user profiles, but we consider probabilities of music genres instead of keyword weights. The adapted algorithm is explained in Subsection~\ref{subsec:profile}. + \subsection{Continuous Univariate Marginal Distribution Algorithm} -\textcite{gallagher2007bayesian} presented the continuous univariate marginal distribution algorithm ($UMDA_c^G$) as an extension of a discrete variable EDA. The general pseudocode of the ($UMDA_c^G$) is delineated in Algorithm~\ref{algo:umda}, where $x_i\in \textbf{x}$ represent the \emph{i} parameter of \textbf{\emph{x}} individual solution. +\textcite{gallagher2007bayesian} presented the continuous univariate marginal distribution algorithm ($UMDA_c^G$) as an extension of a discrete-variable EDA. The general pseudocode of the $UMDA_c^G$ is given in Algorithm~\ref{alg:umda}, where $x_i\in \textbf{x}$ represents the $i$-th parameter of the individual solution \textbf{\emph{x}}. \begin{algorithm}[ht!]
\caption{Framework for $UMDA_c^G$} @@ -313,13 +319,11 @@ \STATE Sample $M$ individuals from $p_t({x_{i}}\vert \mu_{i,t},\sigma_{i,t}^2)$ \STATE $t\leftarrow t+1$ \ENDWHILE - \RETURN $profile_u$ \end{algorithmic} - \label{algo:umda} + \label{alg:umda} \end{algorithm} - -To our knowledge, this is the first work to use a continuous EDA for user profile modelling in a recommender system. +To our knowledge, our hybrid recommender is the first to use a continuous EDA for user profile modelling in a recommender system. The implementation of the continuous EDA is explained in Subsection~\ref{subsec:profile}. \section{Summary} In this chapter, previous work on recommender systems has been reviewed and novel techniques for representing acoustic features and modelling user profiles have been presented. The next steps are to collect the dataset by crawling online social information, to extract the acoustic features of a collection of songs and represent them as n-dimensional vectors, to model the user profiles using EDAs, and finally to return a list of song recommendations. \ No newline at end of file
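As an illustration of how such a continuous EDA could be applied to user-profile vectors of genre probabilities, a hedged NumPy sketch follows; the population size, selection size and number of generations are assumptions, the fitness follows the log-of-rating-times-cosine definition above, and the Gaussian update follows the UMDA framework, not the project's exact implementation:

    # Sketch of a UMDA_c^G-style loop for a 10-dimensional user profile (assumed sizes).
    import numpy as np

    def fitness(profile, rated_items):
        # rated_items: list of (rating, genre_vector) pairs for one user
        total = 0.0
        for r, t in rated_items:
            cos = np.dot(profile, t) / (np.linalg.norm(profile) * np.linalg.norm(t) + 1e-12)
            total += np.log(max(r * cos, 1e-12))      # guard against log of non-positive values
        return total

    def umda_c(rated_items, dim=10, n=100, m=30, max_gen=50):
        pop = np.random.rand(n, dim)                   # initial random profiles
        for _ in range(max_gen):
            scores = np.array([fitness(p, rated_items) for p in pop])
            ranked = pop[np.argsort(scores)]           # rank individuals by fitness value
            best = ranked[:m]                          # keep M best (minimisation; use [-m:] to maximise)
            mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-6
            pop = np.random.normal(mu, sigma, size=(n, dim))  # sample the next generation
        return pop[np.argsort([fitness(p, rated_items) for p in pop])[0]]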
--- a/Report/chapter3/ch3.tex Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/chapter3/ch3.tex Tue Sep 01 11:29:38 2015 +0100 @@ -1,6 +1,6 @@ \chapter{Methodology} \label{ch:methodology} -The methodology used to develop our hybrid music recommender consists of three main stages. First, the collection of real world user-item data corresponding to the play counts of specific songs and the fetching of audio clips of the unique identified songs in the dataset. Secondly, the implementation of the deep learning algorithm to represent the audio clips in terms of music genre probabilities as n-dimensional vectors. Finally, we investigate estimation of distribution algorithms to model user profiles based on the rated songs above a threshold. Every stage of our hybrid recommender is entirely done in Python 2.7\footnote{https://www.python.org/download/releases/2.7/}. +The methodology used to develop our hybrid music recommender consists of three main stages. First, the collection of real-world user-item data, corresponding to play counts of specific songs, and the fetching of audio clips for the unique songs identified in the dataset. Secondly, the implementation of the CDNN to represent the audio clips as n-dimensional vectors of music genre probabilities. Finally, we investigate a permutation EDA and a continuous EDA to model user profiles based on the songs rated above a threshold. Every stage of our hybrid recommender is developed in Python 2.7\footnote{https://www.python.org/download/releases/2.7/}, although the stages run on different platforms: OS X (v10.10.4) for most of the implementation, Ubuntu (14.04 LTS installed on VirtualBox 5.0.0) for the intermediate time-frequency representation, and CentOS (Linux release 7.1.1503) for the data preprocessing and the CDNN implementation. \section{Data collection} The Million Song Dataset \parencite{Bertin-Mahieux2011} is a collection of audio features and metadata for a million contemporary popular music tracks, which provides ground truth for evaluation research in MIR. This collection is also complemented by the Taste Profile subset, which provides 48,373,586 triplets, each consisting of an anonymised user ID, an Echo Nest song ID and a play count. We choose this dataset because it is publicly available and contains enough data for user modelling and recommender evaluation. @@ -9,59 +9,114 @@ Due to potential mismatches\footnote{http://labrosa.ee.columbia.edu/millionsong/blog/12-2-12-fixing-matching-errors} between song ID and track ID in the Echo Nest database, it is necessary to filter out the wrong matches in the Taste Profile subset. The cleaning process is illustrated in Figure~\ref{fig:taste_profile}. \begin{figure}[ht!] \centering - \includegraphics[width=0.8\textwidth]{chapter3/taste_profile.png} - \caption{Cleaning of the taste profile subset} + \includegraphics[width=0.9\textwidth]{chapter3/taste_profile.png} + \caption{Diagram of the cleaning process of the Taste Profile subset} \label{fig:taste_profile} \end{figure} %Please see figure ~\ref{fig:JobInformationDialog} A script is implemented to discard the triplets that contain song identifiers listed in the mismatches text file. First, we read the file line by line to obtain each song identifier. The identifiers are stored in a set object to build a collection of unique elements.
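A minimal Python sketch of this cleaning step (file names and column names are assumptions; the chunked filtering described in the next paragraph is included for completeness):

    # Build the set of mismatched song IDs, then drop the matching triplets in chunks.
    import pandas as pd

    mismatched = set()
    with open('sid_mismatches.txt') as f:                # assumed file name
        for line in f:
            mismatched.add(line.split()[0])              # assumed: song ID is the first token

    reader = pd.read_csv('train_triplets.txt', sep='\t', chunksize=20000,
                         names=['user', 'song', 'play_count'])
    clean = pd.concat(chunk[~chunk['song'].isin(mismatched)] for chunk in reader)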
Next, due to the size of the Taste Profile subset (about 3 GB, uncompressed), we load the dataset in chunks of 20,000 triplets into a \textit{pandas}\footnote{http://pandas.pydata.org/} dataframe and clean each chunk by discarding the triplets whose song identifiers appear in the set object of the previous step. The cleaning process takes around 2.47 minutes and we obtain 45,795,100 triplets. -In addition to the cleaning process, we reduce significantly the size of the dataset for experimental purposes. We only consider users with more than 1,000 played songs and select the identifiers of 1,500 most played songs. This additional process takes around 3.23 minutes and we obtain 65,327 triplets. +In addition to the cleaning process, we significantly reduce the size of the dataset for experimental purposes. We only consider users with more than 1,000 played songs and select the identifiers of the 1,500 most played songs. This additional process takes around 3.23 minutes and we obtain 65,327 triplets. The triplets are stored in a cPickle\footnote{https://docs.python.org/2/library/pickle.html\#module-cPickle} data stream (2.8 MB). %count resulting number of triplets %At this stage, similarities between users is calculated to form a neighbourhood and predict user rating based on combination of the ratings of selected users in the neighbourhood. \subsection{Fetching audio data} -First, for each element of the list of 1,500 songs identifiers obtained in the previous step is used to retrieve the correspondent Echo Nest track ID through a script using the \emph{get\_tracks} method from the \textit{Pyechonest}\footnote{http://echonest.github.io/pyechonest/} package which allow us to acquire track ID and preview URL for each song ID through Echo Nest API. The reason behind this is 7digital API uses Echo Nest track ID instead of song ID to retrieve any data from its catalogue. If the track information of a song is not available, we skip to retrieve the Echo Nest information of the next song ID. +First, each element of the list of 1,500 song identifiers obtained in the previous step is used to retrieve the associated Echo Nest track ID through a script using the \emph{get\_tracks} method from the \textit{Pyechonest}\footnote{http://echonest.github.io/pyechonest/} package, which allows us to acquire the track ID and preview URL for each song ID through the Echo Nest API. The reason is that the 7digital API uses the Echo Nest track ID instead of the song ID to retrieve data from its catalogue. If the track information of a song is not available, the script skips to the next song ID. At this point, it is useful to check that the provided 7digital API keys, a preview URL, and the country parameter, e.g., 'GB' to access the UK catalogue, work with the \textit{OAuth 1.0 Signature Reference Implementation}\footnote{http://7digital.github.io/oauth-reference-page/}. -Moreover, for each preview URL obtained in the previous step, we can fetch an audio clip of 30 to 60 seconds of duration through a OAuth request to 7digital API. For this particular API requests, we use the GET method of the request class from the \textit{python-oauth2}\footnote{https://github.com/jasonrubenstein/python\_oauth2} package, because every request require a nonce, timestamp and a signature method, and also, the country parameter, e.g., 'GB' to access to UK catalogue.
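A hedged Python sketch of this track-ID and preview-URL lookup (the catalogue name, field names, example ID and placeholder key are assumptions, and error handling is simplified):

    # Resolve Echo Nest song IDs to 7digital track IDs and preview URLs via Pyechonest.
    from pyechonest import config, song

    config.ECHO_NEST_API_KEY = 'YOUR_ECHO_NEST_KEY'      # placeholder API key
    song_ids = ['SOBONKR12A58A7A7E0']                    # example Echo Nest song IDs

    previews = {}
    for song_id in song_ids:
        try:
            tracks = song.Song(song_id).get_tracks('7digital-UK')
        except Exception:
            continue                                     # skip songs without track information
        if tracks:
            previews[song_id] = (tracks[0].get('id'), tracks[0].get('preview_url'))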
Before running the script, it is useful to check if the provided 7digital API keys and the country parameter are enabled in the \textit{OAuth 1.0 Signature Reference Implementation}\footnote{http://7digital.github.io/oauth-reference-page/} for 7digital. +Next, for each preview URL, we create a GET request using the \textit{python-oauth2}\footnote{https://github.com/jasonrubenstein/python\_oauth2} package, because it allows us to set the nonce, timestamp, signature method and country parameters. The request is converted to a URL and opened with the \textit{urlopen} function from the \textit{urllib2}\footnote{https://docs.python.org/2/library/urllib2.html} module to download an MP3 file (44.1 kHz, 128 kbps, stereo) of 30 to 60 seconds of duration into a song repository. -Additionally, the script accumulates the Echo Nest song identifier, track ID, artist name, song title and the 7digital preview audio URL for each downloaded track in a text file. If a preview audio clip is not available, the script skip to the next song ID. The generated text file is used to reduce more the triplets dataset from previous section. +Considering the limited number of requests allowed by the Echo Nest and 7digital APIs (see Section~\ref{sec:musicservices}), fetching data for the 1,500 song IDs takes at least 8 hours and results in a total of 640 MP3 files. + +Additionally, for each downloaded track the script appends the Echo Nest song identifier, track ID, artist name, song title and 7digital preview audio URL to a text file, but only if the audio clip is available for download. The generated text file is used for the preprocessing of the cleaned Taste Profile dataset in Subsection~\ref{rating}. The flowchart of the script is shown in Figure~\ref{fig:fetchaudio}. +\begin{figure}[ht!] + \centering + \includegraphics[width=0.9\textwidth]{chapter3/fetch_audio.png} + \caption{Flowchart of the fetching audio process} + \label{fig:fetchaudio} +\end{figure} %include number of tracks available %Classifier creates a model for each user based on the acoustic features of the tracks that user has liked. \subsection{Intermediate time-frequency representation for audio signals} -For representing audio waveforms of the song collection obtained through 7digital API, a similar procedure suggested by \textcite{NIPS2013_5004} is followed: -\begin{itemize} - \item Read 3 seconds of each song at a sampling rate of 22050 Hz and mono channel. - \item Compute log-mel spectrograms with 128 components from windows of 1024 frames and a hop size of 512 samples. -\end{itemize} +\label{subsec:spectrogram} +According to \textcite{NIPS2013_5004}, an intermediate audio representation, rather than the raw waveform (time-domain) representation, is required to feed a CDNN. The flowchart for obtaining the time-frequency representation from the raw audio content of the song repository assembled in the previous section is shown in Figure~\ref{fig:timefrequency}. +\begin{figure}[ht!] + \centering + \includegraphics[width=0.7\textwidth]{chapter3/time_frequency.png} + \caption{Flowchart for time-frequency representation process} + \label{fig:timefrequency} +\end{figure} -The Python script for feature extraction implemented by \textcite{Sigtia20146959} is modified to return the log-mel spectrograms by using the LibROSA\footnote{https://bmcfee.github.io/librosa/index.html} package. +First, a list of absolute paths corresponding to the songs in the repository is generated. The order of the paths in the list is then randomised by shuffling.
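A small Python sketch of this path-listing step (the repository location and output file name are assumptions):

    # Collect absolute MP3 paths, shuffle their order, and save the sequence to a text file.
    import os
    import random

    repository = '/path/to/song_repository'               # assumed location of the MP3 files
    paths = [os.path.join(root, name)
             for root, _, files in os.walk(repository)
             for name in files if name.endswith('.mp3')]
    random.shuffle(paths)                                  # randomise the processing order
    with open('shuffled_paths.txt', 'w') as f:             # assumed output file name
        f.write('\n'.join(paths))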
This new sequence of absolute paths is saved in a text file. -``Representations of music directly from the temporal or spectral domain can be very sensitive to small time and frequency deformations''. \parencite{zhang2014deep} +Second, for every path in the text file of randomised absolute paths, a fragment equivalent to 3 seconds of the associated audio clip is loaded at a sampling rate of 22,050 Hz and converted to a mono channel. For every fragment, a mel-scaled power spectrogram with 128 bands is computed from windows of 1,024 samples with a hop size of 512 samples. Then, the spectrogram is converted to a logarithmic scale in dB using the peak power as reference. The \textit{load}, \textit{feature.melspectrogram} and \textit{logamplitude} functions from the LibROSA\footnote{https://bmcfee.github.io/librosa/index.html} package are used for loading the audio clip, computing the spectrogram and converting it to the logarithmic scale, respectively. + +To handle audio with LibROSA functions, it is recommended to install the Samplerate\footnote{https://pypi.python.org/pypi/scikits.samplerate/} package for efficient resampling. In our project, we tried the SoX\footnote{http://sox.sourceforge.net/} cross-platform tool without success due to operating system restrictions; instead, we use the FFmpeg\footnote{https://www.ffmpeg.org/} cross-platform tool and the \textit{libmp3lame0}\footnote{http://packages.ubuntu.com/precise/libmp3lame0} package for efficient resampling. + +Finally, we store the absolute path and the log-mel spectrogram values of the 640 songs in an HDF5\footnote{https://www.hdfgroup.org/HDF5/} data file. + +In the particular case of the GTZAN dataset, for the time-frequency representation of each audio clip we also generate a list of the genre associated with each audio fragment, which represents the target values (ground truth). This procedure is repeated 9 times for the GTZAN dataset, considering the remaining 3-second fragments of each audio clip for training, validation and testing of the CDNN (see Subsection~\ref{subsec:genre}). + +The time needed to obtain the time-frequency representation of the clips in the GTZAN dataset with the procedure described above is about 55 seconds, generating an HDF5 file (66.9 MB). Because the song repository contains fewer MP3 files than the GTZAN dataset, the process is faster and the resulting HDF5 file is smaller (42.8 MB). + \section{Data preprocessing} -\begin{itemize} - \item Rating from complementary cumulative distribution. - \item Flatenning spectrogram. -\end{itemize} +In order to obtain suitable representations of the users' interests in the Taste Profile dataset and of the songs' spectrograms, additional processing of the data is necessary. +\subsection{Rating from implicit user feedback} +\label{rating} +First, the text file of downloaded MP3 metadata (see Subsection~\ref{subsec:spectrogram}) is used to retain the triplets from the cleaned Taste Profile subset that contain the song IDs of the available audio clips. A reduced Taste Profile dataset with 4,685 triplets is obtained. + +The reduced Taste Profile dataset represents the user listening habits as implicit feedback, i.e., play counts of songs; it is therefore necessary to normalise the listening habits into explicit feedback, i.e., a range of values $[1\ldots5]$ that indicates how much a user likes a song. Normalisation of play counts is computed with the complementary cumulative distribution of the play counts of a user, following the procedure given by \textcite{1242}.
Songs in the top 80–100\% of the distribution get a rating of 5, songs in the 60–80\% range get a 4, songs in the 40–60\% range get a 3, songs in the 20–40\% range get a 2 and songs in the 0–20\% range get a rating of 1. An exception to this allocation of ratings occurs when the coefficient of variation, given by Equation~\eqref{eq:cv}: +\begin{equation} +CV=\frac{\sigma}{\mu} +\label{eq:cv} +\end{equation} +where $\sigma$ is the standard deviation and $\mu$ is the mean of the play counts of a user, is less than or equal to $0.5$. In that case, every song gets a rating of 3. + +\subsection{Standardise time-frequency representation} +The logarithmic mel-scaled power spectrograms obtained in Subsection~\ref{subsec:spectrogram} are standardised to zero mean and unit variance in each frequency band, using the \textit{fit} and \textit{transform} methods of the \textit{StandardScaler} class from the Scikit-learn~\parencite{scikit-learn} package; this is a common requirement of many machine learning classifiers. + \section{Algorithms} -\subsection{Music genre classifier} +\label{sec:algorithms} +The hybrid music recommender approach in this project can be considered an implementation of the feature augmentation and meta-level methods presented in Subsection~\ref{subsec:hybridrecommender}. First, user profiles are generated using the rating matrix and the song vector representation. Next, the generated model is the input of a CB recommender that produces \emph{top-N} song recommendations. The general model of our hybrid recommender is shown in Figure~\ref{fig:generalhybrid}. +\begin{figure}[ht!] + \centering + \includegraphics[width=\textwidth]{chapter3/General_model_hybrid_recommender.png} + \caption{Diagram of hybrid music recommender} + \label{fig:generalhybrid} +\end{figure} + +\subsection{Probability of music genre representation} \label{subsec:genre} -The input of the CNN consist of the 128-component spectrograms obtained in feature extraction. The batch size considered is 20 frames. -Each convolutional layer consists of 10 kernels and ReLUs activation units. In the first convolutional layer the pooling size is 4 and in the second layer the pooling size is 2. The filters analyses the frames along the frequency axis to consider every Mel components with a hop size of 4 frames in the time axis. Additionally, there is a hidden multi perceptron layer with 513 units. +To represent an audio file as a 10-dimensional vector, whose dimensions correspond to the 10 music genres specified in the GTZAN dataset, a CDNN is implemented using the Theano library. For computationally intensive processes, such as convolution, implementation on hardware with Graphics Processing Unit (GPU) acceleration is recommended. In this project, a CentOS (Linux release 7.1.1503) server with a Tesla K40c\footnote{http://www.nvidia.com/object/tesla-servers.html} GPU is used. + +The code for the logistic regression, multilayer perceptron and deep convolutional network classifiers designed for character recognition on the MNIST\footnote{http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz} dataset, available from~\textcite{1_deeplearning.net_2015}, is adapted to our purpose of music genre classification. + %Deep belief network is a probabilistic model that has one observed layer and several hidden layers. -\subsubsection{CNN network architecture} +\subsubsection{CDNN architecture} +\begin{figure}[ht!]
+ \centering + \includegraphics[width=\textwidth]{chapter3/CDNN.png} + \caption{Diagram of the CDNN architecture} + \label{fig:cdnn} +\end{figure} +The input of the CDNN consists of the 128-band spectrograms obtained in Subsection~\ref{subsec:spectrogram}. The batch size considered is 20 frames. Each convolutional layer consists of 10 kernels with ReLU activation units. In the first convolutional layer the pooling size is 4 and in the second layer it is 2. The filters analyse the frames along the frequency axis, covering every mel component, with a hop size of 4 frames along the time axis. Additionally, there is a hidden MLP layer with 500 units. + + The genre classification for each frame is returned by a logistic regression layer trained with stochastic gradient descent (SGD) by minimising the negative log-likelihood. -In our testing, we obtained a 38.8 \% of classification error after 9 trials using the GTZAN dataset. More details of classification results are shown on Table \ref{table:genre}. +\subsubsection{Learning parameters} +In our testing, we obtained a classification error of 38.8\% after 9 trials using the GTZAN dataset. More detailed classification results are shown in Table~\ref{table:genre}. + +%\subsubsection{Probability of genre representation} \subsection{User profile modelling} \label{subsec:profile} \subsubsection{Permutation EDA} \subsubsection{Continuous Univariate Marginal Distribution Algorithm} -\subsection{Song recommendation} \ No newline at end of file +\subsection{Top-N songs recommendation} \ No newline at end of file
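Looking back at the rating normalisation in the data preprocessing above, a hedged NumPy sketch of the play-count-to-rating mapping (the function name, the rank-based percentile computation and the tie handling are assumptions, not the project's exact code):

    # Map one user's play counts to 1..5 ratings via the complementary cumulative
    # distribution, with the coefficient-of-variation exception (CV = sigma / mu).
    import numpy as np

    def ratings_from_play_counts(counts):
        counts = np.asarray(counts, dtype=float)
        if counts.std() / counts.mean() <= 0.5:          # low variation: every song gets a 3
            return np.full(counts.shape, 3, dtype=int)
        ranks = counts.argsort().argsort()               # 0 = least played (ties broken arbitrarily)
        position = ranks / float(len(counts))            # position of each song in the distribution
        return np.digitize(position, [0.2, 0.4, 0.6, 0.8]) + 1   # 0-20% -> 1, ..., 80-100% -> 5

    print(ratings_from_play_counts([1, 2, 3, 10, 50]))   # -> [1 2 3 4 5]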
--- a/Report/chapter6/conclusions.tex Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/chapter6/conclusions.tex Tue Sep 01 11:29:38 2015 +0100 @@ -1,6 +1,7 @@ \chapter{Conclusion} \label{ch:conclusion} -Data is not strong enough +%``Representations of music directly from the temporal or spectral domain can be very sensitive to small time and frequency deformations''. \parencite{zhang2014deep} + \section{Future work} \begin{itemize}
--- a/Report/chiliguano_msc_finalproject.blg Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/chiliguano_msc_finalproject.blg Tue Sep 01 11:29:38 2015 +0100 @@ -25,11 +25,11 @@ Database file #1: chiliguano_msc_finalproject-blx.bib Database file #2: references.bib Warning--I'm ignoring Putzke2014519's extra "keywords" field ---line 251 of file references.bib +--line 262 of file references.bib Warning--I'm ignoring Putzke2014519's extra "keywords" field ---line 252 of file references.bib +--line 263 of file references.bib Warning--I'm ignoring Putzke2014519's extra "keywords" field ---line 253 of file references.bib +--line 264 of file references.bib Biblatex version: 3.0 Name 1 in "Hypebot.com," has a comma at the end for entry 1_hypebot.com_2015 while executing---line 2513 of file biblatex.bst @@ -419,43 +419,43 @@ while executing---line 2659 of file biblatex.bst You've used 48 entries, 6047 wiz_defined-function locations, - 1474 strings with 26412 characters, -and the built_in function-call counts, 169804 in all, are: -= -- 5715 -> -- 7284 -< -- 1512 -+ -- 3554 -- -- 3970 -* -- 14112 -:= -- 11579 + 1473 strings with 26491 characters, +and the built_in function-call counts, 171582 in all, are: += -- 5752 +> -- 7367 +< -- 1506 ++ -- 3605 +- -- 4025 +* -- 14453 +:= -- 11734 add.period$ -- 0 call.type$ -- 48 -change.case$ -- 682 +change.case$ -- 678 chr.to.int$ -- 212 cite$ -- 94 -duplicate$ -- 18879 -empty$ -- 17377 -format.name$ -- 3662 -if$ -- 36209 +duplicate$ -- 19117 +empty$ -- 17502 +format.name$ -- 3793 +if$ -- 36491 int.to.chr$ -- 0 -int.to.str$ -- 109 +int.to.str$ -- 108 missing$ -- 0 -newline$ -- 1665 -num.names$ -- 2080 -pop$ -- 15408 +newline$ -- 1716 +num.names$ -- 2073 +pop$ -- 15454 preamble$ -- 1 -purify$ -- 849 +purify$ -- 856 quote$ -- 0 -skip$ -- 8915 +skip$ -- 8980 stack$ -- 0 -substring$ -- 3256 -swap$ -- 6714 -text.length$ -- 1411 +substring$ -- 3245 +swap$ -- 6795 +text.length$ -- 1420 text.prefix$ -- 47 top$ -- 1 type$ -- 1803 warning$ -- 0 -while$ -- 1040 +while$ -- 1039 width$ -- 0 -write$ -- 1616 +write$ -- 1667 (There were 192 error messages)
--- a/Report/chiliguano_msc_finalproject.lof Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/chiliguano_msc_finalproject.lof Tue Sep 01 11:29:38 2015 +0100 @@ -12,19 +12,25 @@ \defcounter {refsection}{0}\relax \contentsline {figure}{\numberline {2.4}{\ignorespaces Content-based filtering process \parencite {1_blogseagatesoftcom_2015}\relax }}{11}{figure.caption.10} \defcounter {refsection}{0}\relax -\contentsline {figure}{\numberline {2.5}{\ignorespaces Content-based filtering process\relax }}{15}{figure.caption.12} +\contentsline {figure}{\numberline {2.5}{\ignorespaces Three-way aspect model\nobreakspace {}\parencite {Yoshii2008435}\relax }}{17}{figure.caption.14} \defcounter {refsection}{0}\relax -\contentsline {figure}{\numberline {2.6}{\ignorespaces Three-way aspect model\nobreakspace {}\parencite {Yoshii2008435}\relax }}{17}{figure.caption.15} +\contentsline {figure}{\numberline {2.6}{\ignorespaces Schematic representation of a deep neural network\nobreakspace {}\parencite {1_brown_2014}\relax }}{19}{figure.caption.15} \defcounter {refsection}{0}\relax -\contentsline {figure}{\numberline {2.7}{\ignorespaces Schematic representation of a deep neural network\nobreakspace {}\parencite {1_brown_2014}\relax }}{19}{figure.caption.16} +\contentsline {figure}{\numberline {2.7}{\ignorespaces Convolutional deep neural network LeNet model \parencite {1_deeplearning.net_2015}\relax }}{21}{figure.caption.17} \defcounter {refsection}{0}\relax -\contentsline {figure}{\numberline {2.8}{\ignorespaces Convolutional deep neural network LeNet model \parencite {1_deeplearning.net_2015}\relax }}{22}{figure.caption.18} -\defcounter {refsection}{0}\relax -\contentsline {figure}{\numberline {2.9}{\ignorespaces Flowchart of estimation of distribution algorithm \parencite {Ding2015451}\relax }}{23}{figure.caption.20} +\contentsline {figure}{\numberline {2.8}{\ignorespaces Flowchart of estimation of distribution algorithm \parencite {Ding2015451}\relax }}{23}{figure.caption.19} \defcounter {refsection}{0}\relax \addvspace {10\p@ } \defcounter {refsection}{0}\relax -\contentsline {figure}{\numberline {3.1}{\ignorespaces Cleaning of the taste profile subset\relax }}{29}{figure.caption.21} +\contentsline {figure}{\numberline {3.1}{\ignorespaces Diagram of the cleaning process of the Taste Profile subset\relax }}{29}{figure.caption.20} +\defcounter {refsection}{0}\relax +\contentsline {figure}{\numberline {3.2}{\ignorespaces Flowchart of the fetching audio process\relax }}{32}{figure.caption.21} +\defcounter {refsection}{0}\relax +\contentsline {figure}{\numberline {3.3}{\ignorespaces Flowchart for time-frequency representation process\relax }}{32}{figure.caption.22} +\defcounter {refsection}{0}\relax +\contentsline {figure}{\numberline {3.4}{\ignorespaces Diagram of hybrid music recommender\relax }}{36}{figure.caption.23} +\defcounter {refsection}{0}\relax +\contentsline {figure}{\numberline {3.5}{\ignorespaces Diagram of hybrid music recommender\relax }}{37}{figure.caption.25} \defcounter {refsection}{0}\relax \addvspace {10\p@ } \defcounter {refsection}{0}\relax
--- a/Report/chiliguano_msc_finalproject.toc Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/chiliguano_msc_finalproject.toc Tue Sep 01 11:29:38 2015 +0100 @@ -38,31 +38,31 @@ \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {2.4.2}Music recommender systems}{16}{subsection.2.4.2} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Collaborative retrieval music recommender}{16}{section*.13} +\contentsline {subsubsection}{Collaborative retrieval music recommender}{16}{section*.12} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Hybrid music recommender}{16}{section*.14} +\contentsline {subsubsection}{Hybrid music recommender}{16}{section*.13} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {2.5}Deep Learning}{18}{section.2.5} +\contentsline {section}{\numberline {2.5}Deep Learning}{17}{section.2.5} \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {2.5.1}Deep Neural Networks}{18}{subsection.2.5.1} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Music Feature Learning}{20}{section*.17} +\contentsline {subsubsection}{Music Feature Learning}{19}{section*.16} \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {2.5.2}Convolutional Deep Neural Networks}{20}{subsection.2.5.2} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Deep content-based music recommendation}{22}{section*.19} +\contentsline {subsubsection}{Deep content-based music recommendation}{21}{section*.18} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {2.6}Estimation of Distribution Algorithms}{23}{section.2.6} +\contentsline {section}{\numberline {2.6}Estimation of Distribution Algorithms}{22}{section.2.6} \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {2.6.1}A Hybrid Recommendation Model Based on EDA}{24}{subsection.2.6.1} \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {2.6.2}Continuous Univariate Marginal Distribution Algorithm}{26}{subsection.2.6.2} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {2.7}Summary}{26}{section.2.7} +\contentsline {section}{\numberline {2.7}Summary}{27}{section.2.7} \defcounter {refsection}{0}\relax \contentsline {chapter}{\numberline {3}Methodology}{28}{chapter.3} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {3.1}Data collection}{28}{section.3.1} +\contentsline {section}{\numberline {3.1}Data collection}{29}{section.3.1} \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {3.1.1}Taste Profile subset cleaning}{29}{subsection.3.1.1} \defcounter {refsection}{0}\relax @@ -70,52 +70,58 @@ \defcounter {refsection}{0}\relax \contentsline {subsection}{\numberline {3.1.3}Intermediate time-frequency representation for audio signals}{31}{subsection.3.1.3} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {3.2}Data preprocessing}{32}{section.3.2} +\contentsline {section}{\numberline {3.2}Data preprocessing}{34}{section.3.2} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {3.3}Algorithms}{32}{section.3.3} +\contentsline {subsection}{\numberline {3.2.1}Rating from implicit user feedback}{34}{subsection.3.2.1} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {3.3.1}Music genre classifier}{32}{subsection.3.3.1} +\contentsline {subsection}{\numberline {3.2.2}Standardise time-frequency representation}{35}{subsection.3.2.2} \defcounter {refsection}{0}\relax 
-\contentsline {subsubsection}{CNN network architecture}{32}{section*.22} +\contentsline {section}{\numberline {3.3}Algorithms}{35}{section.3.3} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {3.3.2}User profile modelling}{33}{subsection.3.3.2} +\contentsline {subsection}{\numberline {3.3.1}Probability of music genre representation}{36}{subsection.3.3.1} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Permutation EDA}{33}{section*.23} +\contentsline {subsubsection}{CDNN architecture}{36}{section*.24} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Continuous Univariate Marginal Distribution Algorithm}{33}{section*.24} +\contentsline {subsubsection}{Learning parameters}{37}{section*.26} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {3.3.3}Song recommendation}{33}{subsection.3.3.3} +\contentsline {subsection}{\numberline {3.3.2}User profile modelling}{38}{subsection.3.3.2} \defcounter {refsection}{0}\relax -\contentsline {chapter}{\numberline {4}Experiments}{34}{chapter.4} +\contentsline {subsubsection}{Permutation EDA}{38}{section*.27} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {4.1}Evaluation for recommender systems}{34}{section.4.1} +\contentsline {subsubsection}{Continuous Univariate Marginal Distribution Algorithm}{38}{section*.28} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {4.1.1}Types of experiments}{34}{subsection.4.1.1} +\contentsline {subsection}{\numberline {3.3.3}Top-N songs recommendation}{38}{subsection.3.3.3} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {4.1.2}Evaluation strategies}{35}{subsection.4.1.2} +\contentsline {chapter}{\numberline {4}Experiments}{39}{chapter.4} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {4.1.3}Decision based metrics}{36}{subsection.4.1.3} +\contentsline {section}{\numberline {4.1}Evaluation for recommender systems}{39}{section.4.1} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Precision}{36}{section*.25} +\contentsline {subsection}{\numberline {4.1.1}Types of experiments}{39}{subsection.4.1.1} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Recall}{36}{section*.26} +\contentsline {subsection}{\numberline {4.1.2}Evaluation strategies}{40}{subsection.4.1.2} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{F1}{36}{section*.27} +\contentsline {subsection}{\numberline {4.1.3}Decision based metrics}{41}{subsection.4.1.3} \defcounter {refsection}{0}\relax -\contentsline {subsubsection}{Accuracy}{36}{section*.28} +\contentsline {subsubsection}{Precision}{41}{section*.29} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {4.2}Evaluation method}{37}{section.4.2} +\contentsline {subsubsection}{Recall}{41}{section*.30} \defcounter {refsection}{0}\relax -\contentsline {subsection}{\numberline {4.2.1}Training set and test set}{37}{subsection.4.2.1} +\contentsline {subsubsection}{F1}{41}{section*.31} \defcounter {refsection}{0}\relax -\contentsline {chapter}{\numberline {5}Results}{38}{chapter.5} +\contentsline {subsubsection}{Accuracy}{41}{section*.32} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {5.1}Genre classification results}{38}{section.5.1} +\contentsline {section}{\numberline {4.2}Evaluation method}{42}{section.4.2} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {5.2}Recommender evaluation results}{39}{section.5.2} +\contentsline 
{subsection}{\numberline {4.2.1}Training set and test set}{42}{subsection.4.2.1} \defcounter {refsection}{0}\relax -\contentsline {chapter}{\numberline {6}Conclusion}{40}{chapter.6} +\contentsline {chapter}{\numberline {5}Results}{43}{chapter.5} \defcounter {refsection}{0}\relax -\contentsline {section}{\numberline {6.1}Future work}{40}{section.6.1} +\contentsline {section}{\numberline {5.1}Genre classification results}{43}{section.5.1} \defcounter {refsection}{0}\relax -\contentsline {chapter}{References}{41}{section.6.1} +\contentsline {section}{\numberline {5.2}Recommender evaluation results}{44}{section.5.2} +\defcounter {refsection}{0}\relax +\contentsline {chapter}{\numberline {6}Conclusion}{45}{chapter.6} +\defcounter {refsection}{0}\relax +\contentsline {section}{\numberline {6.1}Future work}{45}{section.6.1} +\defcounter {refsection}{0}\relax +\contentsline {chapter}{References}{46}{section.6.1}
--- a/Report/references.bib Mon Aug 31 02:43:54 2015 +0100 +++ b/Report/references.bib Tue Sep 01 11:29:38 2015 +0100 @@ -1,3 +1,14 @@ +@article{scikit-learn, + title={Scikit-learn: Machine Learning in {P}ython}, + author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V. + and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P. + and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and + Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.}, + journal={Journal of Machine Learning Research}, + volume={12}, + pages={2825--2830}, + year={2011} +} @inproceedings{gallagher2007bayesian, title={Bayesian inference in estimation of distribution algorithms}, author={Gallagher, Marcus and Wood, Ian and Keith, Jonathan and Sofronov, George}, @@ -15,7 +26,7 @@ number={3}, pages={451-468}, doi={10.3923/jse.2015.451.468}, - note={cited By 0}, + note={}, url={http://www.scopus.com/inward/record.url?eid=2-s2.0-84924609049&partnerID=40&md5=e6419e97e218f8ef1600e3d21e6a9e36}, document_type={Article}, source={Scopus}, @@ -29,7 +40,7 @@ volume={15}, pages={113-120}, doi={10.1016/j.asoc.2013.10.016}, - note={cited By 0}, + note={}, url={http://www.scopus.com/inward/record.url?eid=2-s2.0-84889065631&partnerID=40&md5=6ba595eee679fa8355329646504b3ae3}, document_type={Article}, source={Scopus},
--- a/slides/chiliguano_msc_project_slides.tex Mon Aug 31 02:43:54 2015 +0100 +++ b/slides/chiliguano_msc_project_slides.tex Tue Sep 01 11:29:38 2015 +0100 @@ -30,9 +30,9 @@ \end{frame} % Uncomment these lines for an automatically generated outline. -%\begin{frame}{Outline} -% \tableofcontents -%\end{frame} +\begin{frame}{Outline} + \tableofcontents +\end{frame} \section{Introduction} @@ -92,4 +92,9 @@ \end{frame} +\section{Conclusions} +\begin{frame} + a +\end{frame} + \end{document}