\chapter{Experiments}

In order to evaluate the performance of a recommender system, several experimental scenarios must be considered, depending on the structure of the dataset and the required prediction accuracy. It is therefore necessary to determine a suitable experiment for evaluating the proposed hybrid music recommendation system, which employs a user-item matrix and vector representations of songs as inputs to predict the ratings of items that a user has not previously listened to. In addition, the performance of the hybrid approach is compared with that of a pure content-based recommender algorithm.

%\section{Experiment aims}
%deviation between the actual and predicted ratings is measured 
%the prediction ratings are compared with a model-based collaborative filtering.

\section{Evaluation of recommender systems}

\subsection{Types of experiments}
Designing an experimental scenario requires defining a hypothesis, controlling the variables involved, and assessing how well the results generalise. Three types of experiments \citep{export:115396} can be used to compare and evaluate recommender algorithms:
\begin{itemize}
\item \textbf{Offline experiments:} recorded historical data of users' ratings is used to simulate the behaviour of online users. The aim of this type of experiment is to refine approaches before testing them with real users. On the other hand, the results may be biased by the distribution of users in the recorded data.
\item \textbf{User studies:} test subjects interact with the recommendation system and their behaviour is recorded, yielding a large set of quantitative measurements. One disadvantage of this type of experiment is the difficulty of recruiting subjects that represent the population of users of the real recommendation system.
\item \textbf{Online evaluation:} the designer of the recommender application expects to influence the behaviour of real users. Usually, this type of evaluation is run after extensive offline studies.
\end{itemize}

In addition, the evaluation of recommender systems can be classified \citep{1242} into:
\begin{itemize}
\item \textbf{System-centric:} This type of evaluation has been extensively used in collaborative filtering (CF) systems. The accuracy of recommendations is computed exclusively from the users' dataset.
\item \textbf{Network-centric:} This type of evaluation examines other properties of the recommendations, such as their diversity, which are measured as a complement to the system-centric metrics.
\item \textbf{User-centric:} The perceived quality and usefulness of the recommendations are measured via feedback provided by the users.
\end{itemize}

\section{Evaluation method}
The hybrid music recommender system proposed in this project is evaluated through an offline experiment and the results are presented with system-centric metrics.

\subsection{Dataset description}
For the evaluation of the hybrid recommender system, a sample from the Taste Profile subset is used, because its data format includes user-item listening histories and it is publicly available. A 10-fold cross-validation is performed, in which each fold splits the dataset into 90\% for training and 10\% for testing.
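
As an illustration, the folds can be generated with scikit-learn; the following is a minimal sketch, in which the file name, the triplet format and the loading code are assumptions rather than the exact scripts used in this project:
\begin{verbatim}
# Minimal sketch of the 10-fold split, assuming the sample is stored
# as one "user<TAB>song<TAB>play_count" triplet per line.
import numpy as np
from sklearn.model_selection import KFold

triplets = np.loadtxt('taste_profile_sample.tsv',
                      dtype=str, delimiter='\t')

kfold = KFold(n_splits=10, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kfold.split(triplets)):
    train = triplets[train_idx]   # 90% of the triplets
    test = triplets[test_idx]     # 10% held out for testing
    # ...fit the recommender on `train`, predict ratings for `test`...
\end{verbatim}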

\subsection{Evaluation measures}
Because the dataset does not include explicit ratings, the play counts of the tracks are taken as implicit feedback on the users' behaviour, and decision-based metrics are therefore used to evaluate the recommendations.
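
Common decision-based metrics are precision and recall over the top-$N$ recommended items. Assuming, for illustration, that a held-out track is marked as relevant to a user when its play count exceeds a chosen threshold, these metrics are defined as:
\begin{equation}
\text{Precision} = \frac{\left|\text{relevant items} \cap \text{recommended items}\right|}{\left|\text{recommended items}\right|}
\end{equation}
\begin{equation}
\text{Recall} = \frac{\left|\text{relevant items} \cap \text{recommended items}\right|}{\left|\text{relevant items}\right|}
\end{equation}
A high precision indicates that most of the recommended items are actually relevant to the user, whereas a high recall indicates that most of the relevant items appear in the recommendation list.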