
--- a/musicweb.tex	Sat Apr 30 18:15:14 2016 +0100
+++ b/musicweb.tex	Sat Apr 30 18:48:51 2016 +0100
@@ -264,12 +264,12 @@
 %% 4.1 Socio-cultural linkage (using linked data)
 %% 4.2 Artist similarity by NLP [needs a better subtitle]
 %% 4.3 Artist similarity by features [i can write this part]
-Music does not lend itself easily to categorisation. There are many ways in which artist can be, and in fact are, considered to be related. Similarity may refer to whether artists' songs sound similar, or are considered to be in the same style or genre. But it may also mean that they are followed by people from similar social backgrounds or political inclinations, or similar ages; or perhaps they are similar because they have played together, or participated in the same event, or their songs touch on similar themes. Linked data facilitates faceted searching and displaying of information\cite{Oren2006}: an artist may be similar to many other artists in one of the ways just mentioned, and to a completely different plethora of artists in other senses, all of which might contribute to music discovery. Semantic web technologies can help us gather different facets of data and shape them into representations of knowledge. MusicWeb does this by searching similarities in three different domains: socio-cultural, research and journalistic literature and content-based linkage.
+Music does not lend itself easily to categorisation. There are many ways in which artists can be, and in fact are, considered to be related. Similarity may refer to whether artists' songs sound similar, or are perceived to be in the same style or genre. But it may also mean that they are followed by people of similar social backgrounds, political inclinations or ages; or perhaps they are similar because they have played together, participated in the same event, or written songs that touch on similar themes. Linked data facilitates faceted search and display of information\cite{Oren2006}: an artist may be similar to many other artists in one of the ways just mentioned, and to a completely different set of artists in other senses, all of which might contribute to music discovery. Semantic web technologies can help us gather different facets of data and shape them into representations of knowledge. MusicWeb does this by searching for similarities in three domains: socio-cultural linkage, research and journalistic literature, and content-based linkage.
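As a minimal sketch of this faceted view of similarity, the Python code below assumes that each domain is stored as its own undirected graph of artist links. The class \texttt{FacetedSimilarity}, its methods, the facet labels and the link reasons are hypothetical and are not taken from MusicWeb. The two literature links mirror the examples discussed in the literature subsection, while \emph{Artist A} and \emph{Artist B} are placeholders.

```python
from collections import defaultdict


class FacetedSimilarity:
    """Hypothetical sketch (not MusicWeb code): artist links kept per facet.

    Each facet, e.g. "socio-cultural", "literature" or "content-based", is an
    independent undirected graph, so an artist can have a different set of
    similar artists in each facet.
    """

    def __init__(self):
        # facet -> artist -> {neighbour: reason for the link}
        self._graphs = defaultdict(lambda: defaultdict(dict))

    def link(self, facet, a, b, reason):
        """Record an undirected link between artists a and b within one facet."""
        self._graphs[facet][a][b] = reason
        self._graphs[facet][b][a] = reason

    def similar(self, artist, facet):
        """Artists linked to `artist` in `facet`, sorted alphabetically."""
        return sorted(self._graphs.get(facet, {}).get(artist, {}))

    def facets_linking(self, a, b):
        """Facets in which artists a and b are directly linked."""
        return sorted(f for f, g in self._graphs.items() if b in g.get(a, {}))

    def neighbours_by_facet(self, artist):
        """All of an artist's links grouped by facet: {facet: [neighbours]}."""
        return {f: sorted(g[artist]) for f, g in self._graphs.items() if artist in g}


# Illustrative data: the literature links echo the examples in the text;
# "Artist A" and "Artist B" are hypothetical placeholders.
index = FacetedSimilarity()
index.link("literature", "The Sex Pistols", "John Lennon", "class politics in the UK")
index.link("literature", "Miley Cyrus", "Rihanna", "self-image during adolescence")
index.link("socio-cultural", "Artist A", "John Lennon", "played together")
index.link("content-based", "Artist B", "John Lennon", "similar-sounding songs")

print(index.similar("John Lennon", "literature"))              # ['The Sex Pistols']
print(index.facets_linking("John Lennon", "The Sex Pistols"))  # ['literature']
print(index.neighbours_by_facet("John Lennon"))
```

The sketch keeps one graph per facet rather than a single merged similarity score. This preserves the distinction drawn above: two artists may be close in one facet and unrelated in the others, and an interface can present each facet separately.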
 \subsection{Socio-cultural linkage}
-NOTE: not sure about this. Do we consider the dbpedia queries to be socio-cultural? or the collaborates-with in musicbrainz?
+NOTE: not sure about this. Do we consider the DBpedia queries to be socio-cultural? Or the collaborates-with relation in MusicBrainz? Or do you (George) mean something like the introduction just above?

 \subsection{Similarity in the literature}
-Artists tend to be regarded as similar when writing about certain topics. For example: a psychologist interested in self-image during adolescence might want to research the impact of artists like Miley Cyrus or Rihanna on young teenagers\cite{Lamb2013}. Or a historian researching class politics might write about The Sex Pistols and John Lennon\cite{Moliterno2012}. The starting point is a large database of 100,000 artists. MusicWeb searches and collects texts which mention each artist from several sources and carries out semantic analysis to identify such connections between artists and higher-level topics. There are two main sources of texts:
+Artists can be regarded as similar when they appear in texts about certain topics. For example, a psychologist interested in self-image during adolescence might research the impact of artists like Miley Cyrus or Rihanna on young teenagers\cite{Lamb2013}, and a historian researching class politics in the UK might write about The Sex Pistols and John Lennon\cite{Moliterno2012}. Artists discussed in this way can be linked through their shared connection to a topic. Extracting such relations requires mining the texts with natural language processing. Our starting point is a large database of 100,000 artists. MusicWeb searches several sources and collects texts that mention each artist. It then carries out semantic analysis to identify connections between artists and higher-level topics. There are two main sources of texts:
 \begin{enumerate}
 \item Research articles. Various web resources allow querying of their research literature databases. MusicWeb uses Mendeley\footnote{http://dev.mendeley.com/} and Elsevier\footnote{http://dev.elsevier.com/}. Both resources offer managed, largely curated data, and support searching by keywords, authors and disciplines. The comprehensiveness of the data varies, but a record most often includes a set of keywords, an abstract, readership figures categorised by discipline and sometimes the full article itself.
   \item Online publications, such as newspapers, music magazines and blogs focused on music. This data is neither managed nor curated, so the relevant content must be extracted from the body of the text. The data is obtained by crawling websites for keywords or tags in page titles and then scraping the matching pages. External links contained in these pages are also followed.