Mercurial > hg > musicweb-iswc2016
changeset 18:596164f18966
NLP
author | mariano |
---|---|
date | Sat, 30 Apr 2016 14:17:23 +0100 |
parents | e0906a511be2 |
children | 3100eb38e180 |
files | musicweb.tex |
diffstat | 1 files changed, 17 insertions(+), 1 deletions(-) |
--- a/musicweb.tex Sat Apr 30 13:00:44 2016 +0100
+++ b/musicweb.tex Sat Apr 30 14:17:23 2016 +0100
@@ -272,7 +272,12 @@
 NOTE: not sure about this. Do we consider the dbpedia queries to be socio-cultural? or the collaborates-with in musicbrainz?
 
 \subsection{Similarity in the literature}
-
+Artists tend to be regarded as similar when authors write about them in connection with the same topic. For example, a psychologist interested in self-image during adolescence might want to research the impact of artists like Miley Cyrus or Rihanna on young teenagers\cite{Lamb2013}. Similarly, a historian researching class politics might write about The Sex Pistols and John Lennon\cite{Moliterno2012}. MusicWeb searches and collects texts from several sources and carries out semantic analysis to identify such connections between artists and higher-level topics. There are two main sources of texts:
+\begin{enumerate}
+\item Research articles. Several web services allow their research literature databases to be queried. MusicWeb uses Mendeley\footnote{http://dev.mendeley.com/} and Elsevier\footnote{http://dev.elsevier.com/}. Both offer managed and largely curated data, and searches can be made by keyword, author and discipline. The data returned varies in completeness, but it most often includes an array of keywords, an abstract, readership categorised by discipline and sometimes the article itself.
+ \item Online publications, such as newspapers, music magazines and blogs focused on music. This is non-managed, non-curated data that must be extracted from the body of the text. Websites are crawled for keywords or tags in the title, and matching pages are then scraped. External links contained in a page are also followed.
+\end{enumerate}
+The text (or the abstract, in the case of research articles) is subjected to semantic analysis. It is first tokenised and a bag of words is extracted from it. This bag of words is used to query the AlchemyAPI\footnote{http://www.alchemyapi.com/} language analysis service for entity recognition, keyword extraction and topic modelling. The entity recogniser returns a list of names mentioned in the text, identified by a model trained on a large dataset of names. These can include toponyms, institutions, publications and persons. MusicWeb is interested in identifying artists, so every person mentioned is checked against three resources: DBpedia, MusicBrainz and Freebase.
 \begin{itemize}
  \item Semantic analysis\cite{Landauer1998}
  \item Topic modeling\cite{Blei2012}
@@ -375,6 +380,17 @@
  E.~Oren, R.~Delbru, and S.~Decker
  \newblock Extending faceted navigation for rdf data.
  \newblock In {\em ISWC, 559–572}, 2006
+
+ \bibitem{Lamb2013}
+ S.~Lamb, K.~Graling and E.~E.~Wheeler
+ \newblock ‘Pole-arized’ discourse: An analysis of responses to Miley Cyrus’s Teen Choice Awards pole dance.
+ \newblock In {\em Feminism \& Psychology vol. 23}, May 2013
+
+ \bibitem{Moliterno2012}
+ A.~G.~Moliterno
+ \newblock What Riot? Punk Rock Politics, Fascism, and Rock Against Racism.
+ \newblock Published online: \url{http://www.studentpulse.com/articles/612/what-riot-punk-rock-politics-fascism-and-rock-against-racism}, 2012
+
  \bibitem{Landauer1998}
  T.~Landauer, P.~Foltz, and D.~Laham.
  \newblock An introduction to latent semantic analysis
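
The text-processing steps described in the paragraph added by this changeset (tokenise the text, extract a bag of words, then check each recognised person against known artist resources) could be sketched roughly as below. This is an illustration only, not MusicWeb's code: the function names, the stop-word list and the in-memory `KNOWN_ARTISTS` set are hypothetical stand-ins, and the calls to the language analysis service and to dbpedia, musicbrainz and freebase are not shown in the changeset and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "or", "about"}

# Hypothetical stand-in for lookups against dbpedia, musicbrainz and freebase.
KNOWN_ARTISTS = {"miley cyrus", "rihanna", "john lennon", "the sex pistols"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count the non-stop-word tokens; word order is discarded."""
    return Counter(t for t in tokenise(text) if t not in STOP_WORDS)


def filter_artists(person_names):
    """Keep only the recognised person names that match a known artist."""
    return [name for name in person_names if name.lower() in KNOWN_ARTISTS]


if __name__ == "__main__":
    abstract = "A historian might write about The Sex Pistols and John Lennon."
    print(bag_of_words(abstract).most_common(5))
    # Person names as an entity recogniser might return them (made-up input).
    print(filter_artists(["John Lennon", "A. G. Moliterno"]))
```

In this sketch the bag of words is a plain token count, which is the simplest reading of the term; the paper does not say whether stop words are removed or how names are matched, so both choices here are assumptions.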