% Report/ch4/ch4.tex @ 22:ce1e76c21c12 -- "Removed library for matrix factorization"
% Author: Paulo Chiliguano <p.e.chiilguano@se14.qmul.ac.uk>
% Date:   Tue, 11 Aug 2015 10:56:51 +0100
\chapter{Experiments}

\section{Evaluation for recommender systems}

\subsection{Types of experiments}
An experimental scenario requires defining a hypothesis, controlling the variables and generalising the results. Three types of experiments \cite{export:115396} can be used to compare and evaluate recommender algorithms:
\begin{itemize}
\item \textbf{Offline experiments:} recorded historical data of users' ratings are used to simulate online user behaviour. The aim of this type of experiment is to refine approaches before testing them with real users. On the other hand, the results may be biased by the distribution of users in the recorded data.
\item \textbf{User studies:} test subjects interact with the recommendation system and their behaviour is recorded, yielding a large set of quantitative measurements. One disadvantage of this type of experiment is the difficulty of recruiting subjects that represent the population of users of the real recommendation system.
\item \textbf{Online evaluation:} the designer of the recommender application expects to influence the users' behaviour. Usually, this type of evaluation is run after extensive offline studies.
\end{itemize}
Evaluations of recommender systems can also be classified \cite{1242} into:
\begin{itemize}
\item \textbf{System-centric evaluation:} accuracy is measured using only the users' dataset.
\item \textbf{Network-centric evaluation:} other properties of the recommendation system, such as the diversity of recommendations, are measured to complement the system-centric metrics.
\item \textbf{User-centric evaluation:} the perceived quality and usefulness of the recommendations are measured through the feedback that users provide.
\end{itemize}

\section{Evaluation settings}
The hybrid recommender system of this project is evaluated with an offline experiment and system-centric metrics.
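The offline setting above can be illustrated with a minimal sketch: recorded listening logs are split per user into a training history and a held-out test portion that simulates future behaviour. The function name, split ratio and sample data below are illustrative assumptions, not part of the project's implementation.

```python
import random

def offline_split(interactions, test_fraction=0.1, seed=42):
    """Split recorded (user, track, plays) tuples into train/test sets,
    holding out a fraction of each user's history to simulate
    unseen future behaviour (at least one interaction per user)."""
    rng = random.Random(seed)
    by_user = {}
    for row in interactions:
        by_user.setdefault(row[0], []).append(row)
    train, test = [], []
    for user, rows in by_user.items():
        rng.shuffle(rows)
        n_test = max(1, int(len(rows) * test_fraction))
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test

# Hypothetical listening log: (user, track, play count)
logs = [("u1", "trackA", 12), ("u1", "trackB", 3), ("u1", "trackC", 7),
        ("u2", "trackA", 1), ("u2", "trackD", 9)]
train, test = offline_split(logs)
```

Holding out data per user, rather than globally, ensures every simulated user has both a training history and test interactions to predict.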
\subsection{Dataset}
The hybrid recommender system is evaluated on the Last.fm Dataset -- 1K users, because its data format includes timestamps and it is publicly available. A 10-fold cross-validation is performed, splitting the dataset into 90\% for training and 10\% for testing in each fold.

\subsection{Evaluation measures}
Because the dataset does not include explicit ratings, the number of plays of each track is used as a proxy for the user's behaviour, and decision-based metrics are therefore considered.

\subsection{Experimentation aims}
In order to evaluate the performance of the hybrid recommender, its predicted ratings are compared with those of a model-based collaborative filtering approach.
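Decision-based metrics treat each recommendation as a hit or a miss rather than predicting a rating value, which suits implicit play-count data. The sketch below shows precision and recall at rank $k$ under the assumption that a track counts as relevant if the user played it; the function names and example data are illustrative, not taken from the project.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended tracks the user actually
    listened to (decision-based: hit or miss, no explicit rating)."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)

def recall_at_k(recommended, relevant, k=10):
    """Fraction of the user's listened tracks recovered in the top-k."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)

# Play counts stand in for ratings: a track is 'relevant' if played.
listened = {"trackA", "trackC", "trackF"}
ranked = ["trackA", "trackB", "trackC", "trackD", "trackE"]
p = precision_at_k(ranked, listened, k=5)  # 2 hits out of 5
r = recall_at_k(ranked, listened, k=5)     # 2 of 3 relevant tracks
```

In a 10-fold setting these metrics would be averaged over the folds, giving a single score per system for the comparison against the collaborative-filtering baseline.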