\documentclass[11pt]{article}
\usepackage{times}
\usepackage{pl}
\usepackage{html}
\makeindex

\onefile
\htmloutput{html}				% Output directory
\htmlmainfile{index}				% Main document file
\bodycolor{white}				% Page colour
\sloppy

\renewcommand{\runningtitle}{SWI-Prolog Semantic Web Server}

\begin{document}

\title{SWI-Prolog Semantic Web Server}
\author{Jan Wielemaker \\
	Human Computer Studies (HCS), \\
	University of Amsterdam \\
	The Netherlands \\
	E-mail: \email{wielemak@science.uva.nl}}

\maketitle

\begin{abstract}
SWI-Prolog offers an extensive library for loading, saving and querying
Semantic Web documents. Internally, the query language is `Prolog',
building on top of an efficient implementation of a predicate rdf/3
expressing the content of the triple store.

Emerging dedicated Semantic Web query languages change this view.
Supporting such languages provides a comfortable infrastructure for
distributed Semantic Web processing systems.  This document describes
the SWI-Prolog Semantic Web Server.  The server provides access to the
Prolog triple store using either SeRQL or SPARQL.  At the same time it
is an extensible platform for realising Semantic Web based applications.
\end{abstract}

\vfill

\pagebreak
\tableofcontents

\vfill
\vfill

\newpage


\section{Introduction}

The SWI-Prolog Semantic Web Server unifies the SWI-Prolog general Web
support and Semantic Web support, providing both a starting point for
dedicated applications and a platform for exchange of RDF-based data
using a standardised language and protocol. An overview of the
SWI-Prolog Web support libraries can be found in \url[SWI-Prolog and the
Web]{http://hcs.science.uva.nl/projects/SWI-Prolog/articles/TPLP-plweb.pdf}.%
	\footnote{Submitted to Theory and Practice of Logic Programming.}


\section{Query Languages}

The current server supports two query languages:
\url[SeRQL]{http://www.openrdf.org} and
\url[SPARQL]{http://www.w3.org/TR/rdf-sparql-query/}.
For both languages we provide:

\begin{shortlist}
    \item An interactive service that presents the results as a
          human-readable HTML table.
    \item A service that presents its results as RDF/XML or XML, following
          the HTTP protocol definition for the query language.
    \item The possibility to query the local database from Prolog using
          the query language.
    \item A Prolog client for querying remote services that support the
          query language and its HTTP protocol.
\end{shortlist}

For both query languages, queries are translated to a complex Prolog
goal that calls rdf/3 to resolve edges in the graph and calls predicates
from \file{rdfql_runtime.pl} to realise the constraints imposed by the
SeRQL \const{WHERE} clause and SPARQL \const{FILTER} clauses.
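
As an illustration of this translation, a SPARQL graph pattern with two
triple patterns corresponds roughly to a conjunction of two rdf/3 calls
that share a variable. The sketch below is written by hand and is not
actual compiler output. The FOAF data it queries is an assumption.

\begin{code}
% SPARQL (illustrative):
%   SELECT ?name
%   WHERE { ?x <http://xmlns.com/foaf/0.1/name> ?name .
%           ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
%              <http://xmlns.com/foaf/0.1/Person> }
%
% Roughly equivalent hand-written goal; X is shared by both calls.

?- rdf(X, 'http://xmlns.com/foaf/0.1/name', Name),
   rdf(X, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
       'http://xmlns.com/foaf/0.1/Person').
\end{code}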


\subsection{SPARQL Support}

SPARQL support is based on the SPARQL specification, versioned April 6,
2006.  Status:

\begin{shortlist}
    \item No query optimization
    \item Limited value-testing, notably on xsd:dateTime
    \item Incomplete ORDER BY support.  Only ascending order is supported
          and all values are compared lexically.
    \item No support for named graphs
    \item Passes the current test suite, except for tests affected by the
          above or acknowledged as erroneous.
\end{shortlist}


\subsection{SeRQL Support}

SeRQL support and compatibility is based on development version
20040820, with additional support for the new 1.2 syntax and some of the
built-in functions. Both SeRQL and the HTTP API are fully defined in the
Sesame documentation.


\section{Installation and Administration}

\subsection{Getting started}

The file \file{parms.pl} contains a number of settings relevant to the
server, notably the port to connect to and where to store user
information. Persistent data kept by the server is a list of users and
their access rights (default \file{users.db}) and a file-based backup of
the in-memory store (default in the directory \file{SeRQL-store}). Please
check the content of \file{parms.pl} and follow the directions in the
comments. On Unix-like systems, edit \file{run.pl} to adjust the
location of SWI-Prolog on the \verb$#!$ line. Next, start \file{run.pl}
and launch the server using the command below.

\begin{code}
?- serql_server.
\end{code}

Now direct your browser to the server.  With the default setup this
is \url{http://localhost:3020}.  If no users are defined, the browser
prompts you to enter the administrative password.  After that the
admin and anonymous users are created.  Accounts can be created and
modified by users with administrative rights through the
\emph{List users ...} link on the sidebar.

To restart from scratch, stop the server, delete the users database file
and/or the triple backup file and restart the server as described above.


\subsection{Persistent store}		\label{sec:backup}

The \file{parms.pl} setting \term{persistent_store}{Directory, Options}
can be used to specify file-based persistent backup for the in-memory
triple store. The store is a combination of quick-load triple databases
and journal files that hold the modifications made to the triple store.
Details of the persistent store are documented with the SWI-Prolog
\url[Semantic Web package]{http://www.swi-prolog.org/packages/semweb.html}.


\section{Roadmap}

\subsection{Query processing and entailment}
\label{sec:entailment}

The kernel of the system is formed by \file{serql.pl} and \file{sparql},
which implement the DCG parsers for the respective query languages as
well as a compiler that translates the parsed query into a Prolog goal
executing the query on top of the SWI-Prolog SemWeb package. The file
\file{rdfql_runtime.pl} contains predicates that implement the
constraints (SeRQL WHERE or SPARQL FILTER) and other constructs
generated by the query compiler.

Entailment reasoning is defined by \file{rdf_entailment.pl}.  Specific
entailments are in separate files:

\begin{description}
    \item [\file{no_entailment.pl}]
Defines entailment \const{none}.  Queries explicitly stored triples only.

    \item [\file{rdf_entailment.pl}]
Defines entailment \const{rdf}. Any resource appearing in a
predicate position is of type \const{rdf:Property}. Any subject is an
instance of \const{rdf:Resource}.

    \item [\file{rdfs_entailment.pl}]
Defines entailment \const{rdfs}. Adds class- and property-hierarchy
reasoning to RDF reasoning, as well as reasoning on the basis of
property domain and range.

    \item [\file{rdfslite_entailment.pl}]
Defines entailment \const{rdfslite}. Only considers the class- and
property-hierarchy. Using a backward chaining solver this is much
faster, while normally keeping the intended meaning.
\end{description}

The query compiler and execution system can be called directly from
Prolog.

\begin{description}
    \predicate{serql_compile}{3}{+Query, -Compiled, +Options}
Compile \arg{Query}, which is either an atom or a list of character
codes and unify \arg{Compiled} with an opaque term representing the
query and suitable for passing to serql_run/2.  Defined
\arg{Options} are:

\begin{description}
    \termitem{entailment}{Entailment}
Entailment to use.  Default is \const{rdfs}.  See \secref{entailment}.

    \termitem{type}{-Type}
Extract the type of query compiled and generally useful information on
it. SeRQL defines the types \const{construct} and
\term{select}{VarNames}, where \arg{VarNames} is a list of variables
appearing in the projection.

    \termitem{optimise}{Bool}
Whether or not to optimise the query.  Default is defined by the setting
\const{optimise_query}.
\end{description}

    \predicate{sparql_compile}{3}{+Query, -Compiled, +Options}
Similar to serql_compile/3.  Defined types are extended with
\const{describe} and \const{ask}.  Additional options are:

\begin{description}
    \termitem{base_uri}{-URI}
Base URI used to compile the query if not specified as part of the
query.

    \termitem{ordered}{-Bool}
Unify \arg{Bool} with true if query contains an \const{ORDER BY} clause.

    \termitem{distinct}{-Bool}
Unify \arg{Bool} with true if query contains a \const{DISTINCT}
modifier.
\end{description}

    \predicate{serql_run}{2}{+Compiled, -Answer}
Run a query compiled by serql_compile/3, returning terms \term{row}{Arg
...} for select queries and terms \term{rdf}{Subject, Predicate, Object}
for construct queries.	Subsequent results are returned on backtracking.

    \predicate{sparql_run}{2}{+Compiled, -Answer}
Similar to serql_run/2. Queries of type \const{describe} return
rdf-terms like \const{construct}.  Queries of type \const{ask} return
either \const{true} or \const{false}.

    \predicate{serql_query}{3}{+Query, -Answer, +Options}
Utility combining serql_compile/3 and serql_run/2. Note that this gives
no access to the column names.

    \predicate{sparql_query}{3}{+Query, -Answer, +Options}
Similar to serql_query/3.
\end{description}
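
The sketch below shows how these predicates might be combined. It
assumes the server sources are loaded and the store contains data. The
predicate name \const{list_types} is hypothetical and the query text is
an assumption modelled on the example query in the Sesame client
section.

\begin{code}
% Compile the query once, then enumerate all answers on backtracking.
list_types :-
    serql_compile('select s, o from {s} <rdf:type> {o}',
                  Compiled,
                  [ entailment(rdfs),
                    type(Type)          % unified with the query type
                  ]),
    format('Query type: ~p~n', [Type]),
    forall(serql_run(Compiled, Row),
           format('~p~n', [Row])).
\end{code}

Compiling separately from running gives access to the query type, which
serql_query/3 does not expose.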


\subsection{Query optimisation}

By default, but under control of the setting/1 option
\term{optimise_query}{Bool} and the option \term{optimise}{Bool}, the
query compiler optimises the initial goal obtained from naive translation
of the query text. The optimiser is defined in \file{rdf_optimise.pl} and
described in detail in \url[An optimised Semantic Web query
language implementation in
Prolog]{http://hcs.science.uva.nl/projects/SWI-Prolog/articles/ICLP05-SeRQL.pdf}.
The optimiser reorders goals in the generated conjunction and prepares
for independent execution of independent parts of the generated goal.
With the optimiser enabled (default), the order of path expressions in
the query text is completely ignored and constraints are inserted at the
earliest possible point.

The SeRQL \const{LIKE} operator applies to both resources and literals,
while the SWI-Prolog RDF-DB module can only handle \const{LIKE}
efficiently on literals. The optimiser can be made aware of this using
\exam{WHERE label(X) LIKE "joe*"}. Taking the label informs the
optimiser that it only needs to consider literals. Likewise, equivalence
tests where one of the arguments is used as subject or predicate, or has
the isResource(X) constraint, tell the system it can do a straight
identifier comparison rather than the much more expensive general
comparison.
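
For instance, a query along the following lines restricts the
\const{LIKE} test to literals by taking the label. This is only a
sketch: the query text and the use of the rdfs:label property are
assumptions chosen for illustration.

\begin{code}
?- serql_query('select x from {x} <rdfs:label> {l} where label(l) like "joe*"',
               Row,
               [ optimise(true) ]).
\end{code}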

Query optimisation is not yet supported for SPARQL.

\subsection{Webserver}

The webserver is realised by \file{server.pl}, which merely loads two
components: \file{http_data.pl}, providing the Sesame HTTP API using
the same paths and parameters, and \file{http_user.pl}, providing a
browser-friendly frontend.  Error messages are still very crude:
almost all errors return a 500 server error page with a
transcription of the Prolog exception.

The Sesame HTTP API deals with a large number of data formats, only part
of which are realised by the current system. This realisation is
achieved through \file{rdf_result}, providing an extensible API for
reading and writing in different formats.  \file{rdf_html},
\file{rdf_write} and \file{xml_result} provide some implementations
thereof.


\section{The Sesame client}

The file \file{sesame_client.pl}, created by \url[Maarten
Menken]{mailto:mrmenken@cs.vu.nl}, provides an API to remote
Sesame servers.  Below is brief documentation of the available
primitives.  All predicates take an option list.  To simplify
applications that communicate with a single server, defaults for
the server and repository locations can be specified using
set_sesame_default/1.

\begin{description}
    \predicate{set_sesame_default}{1}{+DefaultOrList}
This predicate can be used to specify defaults for the options
available to the other Sesame interface predicates.  A default
is a term \term{Option}{Value}.  If a list of such options is
provided all options are set in the order of appearance in the
list.  This implies options later in the list may overrule
already set options.  Defined options are:

\begin{description}
    \termitem{host}{Host}
Hostname running the Sesame server.
    \termitem{port}{Port}
Port the Sesame server listens on.
    \termitem{path}{Path}
Path from the root to the Sesame server.  For the SWI-Prolog
server, this is normally the empty atom (\verb$''$).
For the Java-based Sesame this is normally \verb$'/sesame'$.
    \termitem{repository}{Repository}
Name of the repository to connect to. See also
sesame_current_repository/3.
\end{description}

Below is a typical call to connect to a Sesame server:

\begin{code}
...,
set_sesame_default([ host(localhost),
		     port(8080),
		     path('/sesame'),
		     repository('mem-rdfs-db')
		   ]).
\end{code}

    \predicate{sesame_current_repository}{3}{-Id, -Properties, +Options}
Enumerate the currently available Sesame repositories.  \arg{Id} is
unified to the name of the repository.  \arg{Properties} is a
list of \term{Name}{Value} terms providing title and access details.
\arg{Options} specifies the host, port and path of the server.

    \predicate{sesame_clear_repository}{1}{+Options}
Remove all content from the repository. \arg{Options} specifies the
host, port and path of the server as well as the target repository.

    \predicate{sesame_login}{3}{+User, +Password, +Options}
Login to a Sesame server. On success the returned cookie is stored and
transmitted with each query on the same server. \arg{Options} specifies
the host, port and path of the server.

    \predicate{sesame_logout}{1}{+Options}
Log out from a Sesame server.  \arg{Options} specifies the host, port
and path of the server.

    \predicate{sesame_graph_query}{3}{+Query, -Triple, +Options}
Execute \arg{Query} on the given server and return the resulting
triples on backtracking.  \arg{Options} specifies the
host, port and path of the server as well as the target repository.
The example below extracts all type relations from the default server.

\begin{code}
...,
sesame_graph_query('construct * from {s} <rdf:type> {o}',
		   rdf(S,P,O),
		   []),
\end{code}

    \predicate{sesame_table_query}{3}{+Query, -Row, +Options}
Execute \arg{Query} on the given server and return the resulting
rows on backtracking.  Each \arg{Row} is a term of the format
\term{row}{Col1, Col2, ... ColN}. \arg{Options} specifies the
host, port and path of the server as well as the target repository.

    \predicate{sesame_extract_rdf}{2}{-Triple, +Options}
Extract all content from an RDF repository.  In addition to the
server and repository options the following options are defined:
    \begin{description}
	\termitem{schema}{OnOff}
Extract the schema information.
	\termitem{data}{OnOff}
Extract the plain data.
	\termitem{explicit_only}{OnOff}
Determine whether or not entailed triples are returned.  Default
is \const{off}, returning both explicit and inferred triples.
    \end{description}

    \predicate{sesame_upload_file}{2}{+File, +Options}
Add the content of \arg{File} to the repository.  In addition to the
server and repository options the following options are defined:

    \begin{description}
        \termitem{data_format}{+Format}
Format of the input file.  Default is \const{rdfxml}.
	\termitem{base_uri}{+BaseURI}
URI for resolving local names.  Default is \const{foo:bar}.
	\termitem{verify_data}{OnOff}
Do/do not verify the input.  Default is \const{off}.
    \end{description}

    \predicate{sesame_assert}{2}{+TripleOrList, +Options}
Assert a single \term{rdf}{Subject, Predicate, Object} or a list
of such terms.   In addition to the
server and repository options the following options are defined:

    \begin{description}
	\termitem{base_uri}{+BaseURI}
URI for resolving local names.  Default is \const{foo:bar}.
    \end{description}

    \predicate{sesame_retract}{2}{+Triple, +Options}
Remove a triple from the repository.  Variables in \arg{Triple} match
all values for that field.
\end{description}
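
The sketch below combines several of these predicates in one session.
The predicate name \const{sesame_demo} and the \const{example.org}
resources are hypothetical, the server settings are copied from the
example above, and the representation of resources as plain atoms is an
assumption.

\begin{code}
% Hypothetical session: add a triple, list all types, remove it again.
sesame_demo :-
    set_sesame_default([ host(localhost),
                         port(8080),
                         path('/sesame'),
                         repository('mem-rdfs-db')
                       ]),
    sesame_assert(rdf('http://example.org/joe',
                      'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
                      'http://example.org/Person'),
                  []),
    forall(sesame_table_query('select s, o from {s} <rdf:type> {o}',
                              row(S, O),
                              []),
           format('~p has type ~p~n', [S, O])),
    % Unbound fields match any value, so this removes all triples on joe.
    sesame_retract(rdf('http://example.org/joe', _, _), []).
\end{code}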


\section{Sesame interoperability}

The SWI-Prolog SeRQL engine provides a (still incomplete) drop-in
replacement for the Sesame HTTP access protocol.  Sesame's remote
server class can be used to access the SWI-Prolog SeRQL engine
through the Sesame Java API. Likewise the Prolog client realised by
\file{sesame_client.pl} provides a Prolog API that can be used to access
both Sesame and the SWI-Prolog SeRQL engine.


\section{The SPARQL client}

The file \file{sparql_client.pl} provides a client to the SPARQL HTTP
protocol. The protocol defines how a SPARQL query is submitted over HTTP
and how the results are presented.  It is possible to use the SeRQL
protocol on the same server to perform tasks such as modifying the
triple store.

The structure of the SPARQL client API is closely based on the SeRQL
client.

\begin{description}
    \predicate{sparql_query}{3}{+Query, -Row, +Options}
Run a SPARQL query on a remote server, retrieving the results one-by-one
on backtracking.  \arg{Options} provide the host, port and path of the
server.  sparql_set_server/1 can be used to define default locations.

    \predicate{sparql_set_server}{1}{+Options}
List of options that act as defaults for sparql_query/3.  Commonly set
to specify the server location.  For example:

\begin{code}
?- sparql_set_server([ host(localhost),
		       port(3020),
		       path('/sparql/')
		     ]).
\end{code}
\end{description}
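
A query against the server configured above could then look as follows.
This is a minimal sketch. The query text is an assumption and the
answers are bound to \arg{Row} one at a time on backtracking.

\begin{code}
?- sparql_set_server([ host(localhost),
                       port(3020),
                       path('/sparql/')
                     ]),
   sparql_query('SELECT ?s ?o WHERE { ?s a ?o } LIMIT 10', Row, []).
\end{code}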


\section{Security issues}

HTTP communication with the server, including usernames and passwords,
is in cleartext and therefore vulnerable to sniffing. The overall
security of the server is unknown.   It is advised to run the server
as a user with minimal access rights, only providing write access to
the user database file.


\section{Downloading}

The SWI-Prolog SeRQL engine is available from CVS using the following
commands:

\begin{code}
% cvs -d :pserver:pl@gollem.science.uva.nl:/usr/local/cvspl login
Password: prolog
% cvs -d :pserver:pl@gollem.science.uva.nl:/usr/local/cvspl co SeRQL
\end{code}

Infrequent announcements and snapshots are provided through the
\url[Prolog
Wiki]{http://gollem.science.uva.nl/twiki/pl/bin/view/Library/SeRQL}.


\subsection*{Acknowledgements}

The SeRQL server has been realised as part of the \url[HOPS
project]{http://www.hops-fp6.org} and could not have been done without
Sesame and feedback from Jeen Broekstra and Maarten Menken from the Free
University of Amsterdam (VU). SPARQL support has been added as part of
the E-culture sub-project of the Dutch MultiMedia project.

\printindex

\end{document}