Chris@0
|
1 /* $Id$
|
Chris@0
|
2
|
Chris@0
|
3 Part of SWI-Prolog
|
Chris@0
|
4
|
Chris@0
|
5 Author: Jan Wielemaker
|
Chris@0
|
6 E-mail: jan@swi.psy.uva.nl
|
Chris@0
|
7 WWW: http://www.swi-prolog.org
|
Chris@0
|
8 Copyright (C): 1985-2004, University of Amsterdam
|
Chris@0
|
9
|
Chris@0
|
10 This program is free software; you can redistribute it and/or
|
Chris@0
|
11 modify it under the terms of the GNU General Public License
|
Chris@0
|
12 as published by the Free Software Foundation; either version 2
|
Chris@0
|
13 of the License, or (at your option) any later version.
|
Chris@0
|
14
|
Chris@0
|
15 This program is distributed in the hope that it will be useful,
|
Chris@0
|
16 but WITHOUT ANY WARRANTY; without even the implied warranty of
|
Chris@0
|
17 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
Chris@0
|
18 GNU General Public License for more details.
|
Chris@0
|
19
|
Chris@0
|
20 You should have received a copy of the GNU Lesser General Public
|
Chris@0
|
21 License along with this library; if not, write to the Free Software
|
Chris@0
|
22 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
Chris@0
|
23
|
Chris@0
|
24 As a special exception, if you link this library with other files,
|
Chris@0
|
25 compiled with a Free Software compiler, to produce an executable, this
|
Chris@0
|
26 library does not by itself cause the resulting executable to be covered
|
Chris@0
|
27 by the GNU General Public License. This exception does not however
|
Chris@0
|
28 invalidate any other reasons why the executable file might be covered by
|
Chris@0
|
29 the GNU General Public License.
|
Chris@0
|
30 */
|
Chris@0
|
31
|
Chris@0
|
32 :- module(rdf_optimise,
|
Chris@0
|
33 [ rdf_optimise/2, % +Query, -Optimised
|
Chris@0
|
34 rdf_optimise/4, % +Query, -Optimised, -Space, -Time
|
Chris@0
|
35 rdf_complexity/3, % :Goal, -SpaceEstimate, -TimeEstimate
|
Chris@0
|
36 serql_select_bind_null/2 % +Goal, -WithBind
|
Chris@0
|
37 ]).
|
Chris@0
|
38 :- use_module(library('semweb/rdf_db')).
|
Chris@0
|
39 :- use_module(library(debug)).
|
Chris@0
|
40 :- use_module(library(lists)).
|
Chris@0
|
41 :- use_module(library(assoc)).
|
Chris@0
|
42 :- use_module(library(ordsets)).
|
Chris@0
|
43
|
Chris@0
|
44 /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
Chris@0
|
45 Queries as returned by serql_compile_path/2 consists of a path
|
Chris@0
|
46 expression which is compiled to a conjunction of calls to rdf/3 and the
|
Chris@0
|
47 translation of the WHERE clause acting as an additional filter.
|
Chris@0
|
48
|
Chris@0
|
49 Optimisation of a query basically means moving conditions into the rdf/3
|
Chris@0
|
50 calls where possible, moving other conditions as early as possibly in
|
Chris@0
|
51 the graph-matching and reordering graph matching calls (rdf/3) to reduce
|
Chris@0
|
52 non-determinism.
|
Chris@0
|
53
|
Chris@0
|
54 Reordering
|
Chris@0
|
55 ----------
|
Chris@0
|
56
|
Chris@0
|
57 Reordering of graph expressions is required to reduce backtracking.
|
Chris@0
|
58 Roughly I see three approaches:
|
Chris@0
|
59
|
Chris@0
|
60 * Learning
|
Chris@0
|
61 Create permutations of the query and make them run under time
|
Chris@0
|
62 constraints. Try to learn patterns that work (fast).
|
Chris@0
|
63
|
Chris@0
|
64 * Use statistics
|
Chris@0
|
65 Given the number of solutions on a certain partially instantiated
|
Chris@0
|
66 rdf/3 call (and the required execution time), reorganise them to
|
Chris@0
|
67 minimalise the cost.
|
Chris@0
|
68
|
Chris@0
|
69 * Use constraint solving
|
Chris@0
|
70 Instead of trying to solve an rdf/3 call, create a constraint from
|
Chris@0
|
71 it and only try to solve it if there is more information. This
|
Chris@0
|
72 is especially attractive if some form of high-level reasoning
|
Chris@0
|
73 from the language entailment rules can be applied or set-theory
|
Chris@0
|
74 is a possibility.
|
Chris@0
|
75
|
Chris@0
|
76 After experiments with using constraint solving, this module now uses
|
Chris@0
|
77 planning based on statistical information provided by the rdf_db module.
|
Chris@0
|
78 This algorithm reaches optimal performance under quite reasonable
|
Chris@0
|
79 assumptions while the planning overhead is very reasonable. The
|
Chris@0
|
80 algorithm is described in "An optimised Semantic Web query language
|
Chris@0
|
81 implementation in Prolog", available from
|
Chris@0
|
82 http://hcs.science.uva.nl/projects/SWI-Prolog/articles/ICLP05-SeRQL.pdf
|
Chris@0
|
83
|
Chris@0
|
84 NOTES:
|
Chris@0
|
85
|
Chris@0
|
86 * LIKE works on resources *and* literals. Do we want this?
|
Chris@0
|
87 See http://www.openrdf.org/forum/mvnforum/viewthread?thread=275
|
Chris@0
|
88 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
|
Chris@0
|
89
|
Chris@0
|
90 :- multifile
|
Chris@0
|
91 user:goal_expansion/2.
|
Chris@0
|
92
|
Chris@0
|
93 user:goal_expansion(rdf_complexity(G0, C), rdf_complexity(G, C)) :-
|
Chris@0
|
94 expand_goal(G0, G).
|
Chris@0
|
95 user:goal_expansion(rdf_optimise(G0, C, E), rdf_optimise(G, C, E)) :-
|
Chris@0
|
96 expand_goal(G0, G).
|
Chris@0
|
97
|
Chris@0
|
98
|
Chris@0
|
99 /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
Chris@0
|
100 Plan (conjunctions)
|
Chris@0
|
101
|
Chris@0
|
102 * Generate permutations and costs. Select cheapest
|
Chris@0
|
103 * The above moves tests right after the RDF call filling
|
Chris@0
|
104 its input argument. As a last optimisation some of these
|
Chris@0
|
105 searches may be integrated in the rdf/3 call to help using
|
Chris@0
|
106 indexing of the RDF database.
|
Chris@0
|
107
|
Chris@0
|
108 complexity/2 needs to update the order of clauses inside meta calls
|
Chris@0
|
109 (notably optional path expressions).
|
Chris@0
|
110
|
Chris@0
|
111 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
|
Chris@0
|
112
|
Chris@0
|
113 %% rdf_optimise(+Goal, -Optimized) is det.
|
Chris@0
|
114 %
|
Chris@0
|
115 % Goal is a Prolog control-structure with calls to the rdf_db.pl
|
Chris@0
|
116 % and SeRQL runtime predicates. Optimized is a semantically
|
Chris@0
|
117 % equivalent goal, obtained by re-ordering conjunctions in Goal
|
Chris@0
|
118 % and processing sub-queries that do not share variables
|
Chris@0
|
119 % independently.
|
Chris@0
|
120
|
Chris@0
|
121 rdf_optimise(Conj, Optimised) :-
|
Chris@0
|
122 rdf_optimise(Conj, Optimised, _, _).
|
Chris@0
|
123
|
Chris@0
|
124 rdf_optimise(Conj, Optimised, SpaceEstimate, TimeEstimate) :-
|
Chris@0
|
125 optimise_order(Conj, Ordered, SpaceEstimate, TimeEstimate),
|
Chris@0
|
126 carthesian(Ordered, Optimised).
|
Chris@0
|
127
|
Chris@0
|
128 optimise_order(Conj, Optimised, Space, Estimate) :-
|
Chris@0
|
129 debug(rdf_optimise, '*** OPTIMIZING ***~n', []),
|
Chris@0
|
130 dbg_portray_body(Conj),
|
Chris@0
|
131 term_variables(Conj, Vars),
|
Chris@0
|
132 rdf_complexity(Conj, Conj, S0, E0),
|
Chris@0
|
133 State = state(Vars-Conj, S0, E0, 1),
|
Chris@0
|
134 debug(rdf_optimise, 'C0 = ~w~n', [E0]),
|
Chris@0
|
135 ( reorder(Conj, Perm),
|
Chris@0
|
136 rdf_complexity(Perm, Perm1, S, C),
|
Chris@0
|
137
|
Chris@0
|
138 arg(4, State, N),
|
Chris@0
|
139 N2 is N + 1,
|
Chris@0
|
140 nb_setarg(4, State, N2),
|
Chris@0
|
141
|
Chris@0
|
142 ( arg(3, State, C0),
|
Chris@0
|
143 C < C0
|
Chris@0
|
144 -> debug(rdf_optimise,
|
Chris@0
|
145 'BETTER ONE [~D]: --> space=~w, time=~w~n', [N, S, C]),
|
Chris@0
|
146 dbg_portray_body(Perm1),
|
Chris@0
|
147 nb_setarg(3, State, C),
|
Chris@0
|
148 nb_setarg(2, State, S),
|
Chris@0
|
149 nb_setarg(1, State, Vars-Perm1)
|
Chris@0
|
150 ; true
|
Chris@0
|
151 ),
|
Chris@0
|
152 fail
|
Chris@0
|
153 ; arg(1, State, Vars-Optimised),
|
Chris@0
|
154 arg(2, State, Space),
|
Chris@0
|
155 arg(3, State, Estimate),
|
Chris@0
|
156 debug(rdf_optimise, ' --> optimised: s/t = ~w/~w --> ~w/~w~n',
|
Chris@0
|
157 [S0, E0, Space, Estimate]),
|
Chris@0
|
158 dbg_portray_body(Optimised)
|
Chris@0
|
159 ), !.
|
Chris@0
|
160 optimise_order(Conj, Conj, -1, -1) :-
|
Chris@0
|
161 debug(rdf_optimise, 'Failed to optimise:~n', []),
|
Chris@0
|
162 dbg_portray_body(Conj).
|
Chris@0
|
163
|
Chris@0
|
164
|
Chris@0
|
165 /*******************************
|
Chris@0
|
166 * REORDERING *
|
Chris@0
|
167 *******************************/
|
Chris@0
|
168
|
Chris@0
|
169 %% reorder(Goal, Reordered)
|
Chris@0
|
170 %
|
Chris@0
|
171 % Assuming Goal is a conjunction, create permutations of its
|
Chris@0
|
172 % order. Instead of blindly generating permutations however, we
|
Chris@0
|
173 % get an arbitrary element of the conjunction to the front and
|
Chris@0
|
174 % compute subgraphs of goals connected through variables _after_
|
Chris@0
|
175 % executing the first goal.
|
Chris@0
|
176
|
Chris@0
|
177 reorder(Goal, Reordered) :-
|
Chris@0
|
178 State = bindings([]),
|
Chris@0
|
179 conj_to_list(Goal, Conj0),
|
Chris@0
|
180 reorder_conj(Conj0, State, Conj1),
|
Chris@0
|
181 %% permutation(Conj0, Conj1), % Alternatively :-)
|
Chris@0
|
182 list_to_conj(Conj1, Reordered),
|
Chris@0
|
183 arg(1, State, Bindings),
|
Chris@0
|
184 unbind(Bindings).
|
Chris@0
|
185
|
Chris@0
|
186 reorder_conj([One], _, [One]) :- !.
|
Chris@0
|
187 reorder_conj(List, State, Perm) :-
|
Chris@0
|
188 group_by_cut(List, Before, Cut, After), !,
|
Chris@0
|
189 reorder_conj(Before, State, PermBefore),
|
Chris@0
|
190 bind_args(Before, State), % this part is done
|
Chris@0
|
191 reorder_conj(After, State, PermAfter),
|
Chris@0
|
192 append(PermBefore, [Cut|PermAfter], Perm).
|
Chris@0
|
193 reorder_conj(List, State, Perm) :-
|
Chris@0
|
194 group_by_optional(List, Normal, Optional), !,
|
Chris@0
|
195 reorder_conj(Normal, State, PermNormal),
|
Chris@0
|
196 bind_args(Normal, State), % this part is done
|
Chris@0
|
197 reorder_conj(Optional, State, PermOptional),
|
Chris@0
|
198 append(PermNormal, PermOptional, Perm).
|
Chris@0
|
199 reorder_conj(List, State, Result) :-
|
Chris@0
|
200 make_subgraphs(List, SubGraphs),
|
Chris@0
|
201 SubGraphs \= [_], !,
|
Chris@0
|
202 reorder_subgraph_conjs(SubGraphs, State, RestPerm),
|
Chris@0
|
203 flatten(RestPerm, Result).
|
Chris@0
|
204 reorder_conj(List, State, [Prefix|Perm]) :-
|
Chris@0
|
205 select(Prefix, List, Rest),
|
Chris@0
|
206 bind_args(Prefix, State),
|
Chris@0
|
207 make_subgraphs(Rest, SubGraphs),
|
Chris@0
|
208 reorder_subgraph_conjs(SubGraphs, State, RestPerm),
|
Chris@0
|
209 flatten(RestPerm, Perm).
|
Chris@0
|
210
|
Chris@0
|
211
|
Chris@0
|
212 %% reorder_subgraph_conjs(SubGraphs, -ReorderedSubGraphs)
|
Chris@0
|
213 %
|
Chris@0
|
214 % Reorder the individual subgraphs. As we know these are
|
Chris@0
|
215 % independent there is no need to order the subgraphs themselves.
|
Chris@0
|
216 % we also know they are fully connected, and no longer contain
|
Chris@0
|
217 % cuts, so we use the simplified version of reorder_conj/2 called
|
Chris@0
|
218 % reorder_conj2/2.
|
Chris@0
|
219
|
Chris@0
|
220 reorder_subgraph_conjs([], _, []).
|
Chris@0
|
221 reorder_subgraph_conjs([H0|T0], State, [H|T]) :-
|
Chris@0
|
222 reorder_conj2(H0, State, H),
|
Chris@0
|
223 reorder_subgraph_conjs(T0, State, T).
|
Chris@0
|
224
|
Chris@0
|
225 reorder_conj2([One], _, [One]) :- !.
|
Chris@0
|
226 reorder_conj2(List, State, [Prefix|Perm]) :-
|
Chris@0
|
227 select(Prefix, List, Rest),
|
Chris@0
|
228 bind_args(Prefix, State),
|
Chris@0
|
229 make_subgraphs(Rest, SubGraphs),
|
Chris@0
|
230 reorder_subgraph_conjs(SubGraphs, State, RestPerm),
|
Chris@0
|
231 flatten(RestPerm, Perm).
|
Chris@0
|
232
|
Chris@0
|
233
|
Chris@0
|
234 %% group_by_cut(+List, -BeforeCut, -Cut, -AfterCut)
|
Chris@0
|
235 %
|
Chris@0
|
236 % Split the conjunction over a cut (!) as we cannot guarantee
|
Chris@0
|
237 % proper behaviour when moving goals to the other side of a cut.
|
Chris@0
|
238
|
Chris@0
|
239 group_by_cut(Goals, Before, Cut, After) :-
|
Chris@0
|
240 Cut = goal(_, !, _),
|
Chris@0
|
241 append(Before, [Cut|After], Goals), !.
|
Chris@0
|
242
|
Chris@0
|
243
|
Chris@0
|
244 %% group_by_optional(+List, -NotOptional, -Optional)
|
Chris@0
|
245 %
|
Chris@0
|
246 % Split the conjunction in optional and non-optional part as we
|
Chris@0
|
247 % always want the optional part to happen last.
|
Chris@0
|
248
|
Chris@0
|
249 group_by_optional(List, Normal, Optional) :-
|
Chris@0
|
250 split_optional(List, Normal, Optional),
|
Chris@0
|
251 Normal \== [],
|
Chris@0
|
252 Optional \== [].
|
Chris@0
|
253
|
Chris@0
|
254 split_optional([], [], []).
|
Chris@0
|
255 split_optional([H|T0], Normal, Optional) :-
|
Chris@0
|
256 ( optional(H)
|
Chris@0
|
257 -> Optional = [H|T],
|
Chris@0
|
258 split_optional(T0, Normal, T)
|
Chris@0
|
259 ; Normal = [H|T],
|
Chris@0
|
260 split_optional(T0, T, Optional)
|
Chris@0
|
261 ).
|
Chris@0
|
262
|
Chris@0
|
263 optional(G) :-
|
Chris@0
|
264 goal(G, (_*->_;_)).
|
Chris@0
|
265
|
Chris@0
|
266 %% bind_args(Goal, !State)
|
Chris@0
|
267 %
|
Chris@0
|
268 % Bind all arguments in Goal or list of goals. Assumes that
|
Chris@0
|
269 % executing a goal grounds all its arguments. Only the goal A =
|
Chris@0
|
270 %% literal(B), generated by optimising where constraints is handled
|
Chris@0
|
271 % special.
|
Chris@0
|
272 %
|
Chris@0
|
273 % State is a term bindings(List) that is destructively maintained
|
Chris@0
|
274 % by instantiate/4.
|
Chris@0
|
275
|
Chris@0
|
276 bind_args([], _) :- !.
|
Chris@0
|
277 bind_args([H|T], State) :- !,
|
Chris@0
|
278 bind_args(H, State),
|
Chris@0
|
279 bind_args(T, State).
|
Chris@0
|
280 bind_args(H, State) :-
|
Chris@0
|
281 goal(H, A=literal(B)),
|
Chris@0
|
282 var(A), var(B), !,
|
Chris@0
|
283 ( instantiated(A, I),
|
Chris@0
|
284 I \== (-)
|
Chris@0
|
285 -> instantiate(B, _, I, State)
|
Chris@0
|
286 ; instantiated(B, I),
|
Chris@0
|
287 I \== (-)
|
Chris@0
|
288 -> instantiate(A, _, I, State)
|
Chris@0
|
289 ; true
|
Chris@0
|
290 ).
|
Chris@0
|
291 bind_args(Goal, State) :-
|
Chris@0
|
292 vars(Goal, Vars),
|
Chris@0
|
293 ground_vars(Vars, State).
|
Chris@0
|
294
|
Chris@0
|
295 ground_vars([], _).
|
Chris@0
|
296 ground_vars([H|T], State) :-
|
Chris@0
|
297 instantiate(H, _, r, State),
|
Chris@0
|
298 ground_vars(T, State).
|
Chris@0
|
299
|
Chris@0
|
300
|
Chris@0
|
301 %% make_subgraphs(+Goals, -SubGraphs)
|
Chris@0
|
302 %
|
Chris@0
|
303 % Create a list of connected subgraphs from Goals, assuming the
|
Chris@0
|
304 % variables in the assoc Grounded have been bound.
|
Chris@0
|
305
|
Chris@0
|
306 make_subgraphs([], []).
|
Chris@0
|
307 make_subgraphs([G0|GT], [S0|ST]) :-
|
Chris@0
|
308 empty_assoc(Visited0),
|
Chris@0
|
309 put_assoc(G0, Visited0, t, Visited1),
|
Chris@0
|
310 unbound_vars(G0, Vars),
|
Chris@0
|
311 empty_assoc(VV0),
|
Chris@0
|
312 vars_visited(Vars, VV0, VV, _, []),
|
Chris@0
|
313 select_subgraph(Vars, VV, GT, GR, Visited1, Visited),
|
Chris@0
|
314 assoc_keys(Visited, S0),
|
Chris@0
|
315 make_subgraphs(GR, ST).
|
Chris@0
|
316
|
Chris@0
|
317 select_subgraph([], _, Rest, Rest, Visited, Visited).
|
Chris@0
|
318 select_subgraph([V0|VT], VV0, Goals, Rest, Visited0, Visited) :-
|
Chris@0
|
319 select_related(Goals, V0, NewA, RG, VV0, VV1, Visited0, Visited1),
|
Chris@0
|
320 append(NewA, VT, Agenda),
|
Chris@0
|
321 select_subgraph(Agenda, VV1, RG, Rest, Visited1, Visited).
|
Chris@0
|
322
|
Chris@0
|
323
|
Chris@0
|
324 % select_related(+Goals, +Var, -NewVars, -RestGoals,
|
Chris@0
|
325 % +VisVar0, -VisVar, +Vis0, -Vis)
|
Chris@0
|
326
|
Chris@0
|
327 select_related([], _, [], [], VV, VV, V, V).
|
Chris@0
|
328 select_related([G0|GT], Var, NewA, Rest, VV0, VV, V0, V) :-
|
Chris@0
|
329 get_assoc(G0, V0, _), !,
|
Chris@0
|
330 select_related(GT, Var, NewA, Rest, VV0, VV, V0, V).
|
Chris@0
|
331 select_related([G0|GT], Var, Agenda, Rest, VV0, VV, V0, V) :-
|
Chris@0
|
332 unbound_vars(G0, VG0),
|
Chris@0
|
333 member(VG1, VG0), VG1 == Var, !,
|
Chris@0
|
334 vars_visited(VG0, VV0, VV1, Agenda, AT),
|
Chris@0
|
335 put_assoc(G0, V0, t, V1),
|
Chris@0
|
336 select_related(GT, Var, AT, Rest, VV1, VV, V1, V).
|
Chris@0
|
337 select_related([G0|GT], Var, NewA, [G0|Rest], VV0, VV, V0, V) :-
|
Chris@0
|
338 select_related(GT, Var, NewA, Rest, VV0, VV, V0, V).
|
Chris@0
|
339
|
Chris@0
|
340
|
Chris@0
|
341 unbound_vars(Goal, Vars) :-
|
Chris@0
|
342 vars(Goal, AllVars),
|
Chris@0
|
343 unbound(AllVars, Vars).
|
Chris@0
|
344
|
Chris@0
|
345 unbound([], []).
|
Chris@0
|
346 unbound([H|T0], [H|T]) :-
|
Chris@0
|
347 instantiated(H, -), !,
|
Chris@0
|
348 unbound(T0, T).
|
Chris@0
|
349 unbound([_|T0], T) :-
|
Chris@0
|
350 unbound(T0, T).
|
Chris@0
|
351
|
Chris@0
|
352 vars_visited([], VV, VV, A, A).
|
Chris@0
|
353 vars_visited([H|T], VV0, VV, [H|L0], L) :-
|
Chris@0
|
354 put_assoc(H, VV0, t, VV1),
|
Chris@0
|
355 vars_visited(T, VV1, VV, L0, L).
|
Chris@0
|
356
|
Chris@0
|
357
|
Chris@0
|
358 %% assoc_keys(+Assoc, -Keys)
|
Chris@0
|
359 %
|
Chris@0
|
360 % Return the keys of an assoc as a list. Can be optimised further.
|
Chris@0
|
361
|
Chris@0
|
362 assoc_keys(Assoc, Keys) :-
|
Chris@0
|
363 assoc_to_list(Assoc, List),
|
Chris@0
|
364 keys(List, Keys).
|
Chris@0
|
365
|
Chris@0
|
366 keys([], []).
|
Chris@0
|
367 keys([K-_|T0], [K|T]) :-
|
Chris@0
|
368 keys(T0, T).
|
Chris@0
|
369
|
Chris@0
|
370
|
Chris@0
|
371 %% conj_to_list(+Conj, -List)
|
Chris@0
|
372 %
|
Chris@0
|
373 % Translate a conjunction into a list of elements of the format
|
Chris@0
|
374 %
|
Chris@0
|
375 %% goal(Id, Goal, Vars)
|
Chris@0
|
376 %
|
Chris@0
|
377 % Where Id is a goal identifier, Goal is the goal itself and Vars
|
Chris@0
|
378 % is a list of variables inside the Goal. Variables are sorted to
|
Chris@0
|
379 % the standard order of terms.
|
Chris@0
|
380
|
Chris@0
|
381 conj_to_list(Conj, List) :-
|
Chris@0
|
382 phrase(conj_to_list(Conj, 1, _), List).
|
Chris@0
|
383
|
Chris@0
|
384 conj_to_list((A,B), N0, N) --> !,
|
Chris@0
|
385 conj_to_list(A, N0, N1),
|
Chris@0
|
386 conj_to_list(B, N1, N).
|
Chris@0
|
387 conj_to_list(true, N, N) --> !,
|
Chris@0
|
388 [].
|
Chris@0
|
389 conj_to_list(G, N0, N) -->
|
Chris@0
|
390 { term_variables(G, Vars0),
|
Chris@0
|
391 sort(Vars0, Vars),
|
Chris@0
|
392 N is N0 + 1
|
Chris@0
|
393 },
|
Chris@0
|
394 [ goal(N0, G, Vars)
|
Chris@0
|
395 ].
|
Chris@0
|
396
|
Chris@0
|
397
|
Chris@0
|
398 %% list_to_conj(+List, -Conj)
|
Chris@0
|
399
|
Chris@0
|
400
|
Chris@0
|
401 list_to_conj([], true).
|
Chris@0
|
402 list_to_conj([goal(_,G,_)], G) :- !.
|
Chris@0
|
403 list_to_conj([goal(_,G,_)|T0], (G,T)) :-
|
Chris@0
|
404 list_to_conj(T0, T).
|
Chris@0
|
405
|
Chris@0
|
406
|
Chris@0
|
407 %% id(G, Id) is det.
|
Chris@0
|
408 %% goal(G, Goal) is det.
|
Chris@0
|
409 %% vars(G, Vars) is det.
|
Chris@0
|
410 %
|
Chris@0
|
411 % Extract fields from the goal structure.
|
Chris@0
|
412
|
Chris@0
|
413 %id(goal(Id, _, _), Id).
|
Chris@0
|
414 goal(goal(_, Goal, _), Goal).
|
Chris@0
|
415 vars(goal(_, _, Vars), Vars).
|
Chris@0
|
416
|
Chris@0
|
417
|
Chris@0
|
418 /*******************************
|
Chris@0
|
419 * CARTHESIAN PRODUCT *
|
Chris@0
|
420 *******************************/
|
Chris@0
|
421
|
Chris@0
|
422 /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
Chris@0
|
423 If the cost is high, it can be worthwhile to see whether we can split
|
Chris@0
|
424 the conjunction into independent parts, execute these seperately and
|
Chris@0
|
425 determine the carthesian product.
|
Chris@0
|
426
|
Chris@0
|
427 To indicate carthesian execution is profitable, a conjunction is
|
Chris@0
|
428 transformed into a call to
|
Chris@0
|
429
|
Chris@0
|
430 rdfql_carthesian(ListOfSubGoals)
|
Chris@0
|
431
|
Chris@0
|
432 where each SubGoal is of the form
|
Chris@0
|
433
|
Chris@0
|
434 bag(Vars, Goal)
|
Chris@0
|
435
|
Chris@0
|
436 where Vars are the variables in Goal and Goal is a subgoal that is fully
|
Chris@0
|
437 independent from the others.
|
Chris@0
|
438 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
|
Chris@0
|
439
|
Chris@0
|
440 carthesian(Goal, Carthesian) :-
|
Chris@0
|
441 State = bindings([]),
|
Chris@0
|
442 conj_to_list(Goal, Conj0),
|
Chris@0
|
443 carthesian_conj(Conj0, State, Carthesian), !,
|
Chris@0
|
444 arg(1, State, Bindings),
|
Chris@0
|
445 unbind(Bindings).
|
Chris@0
|
446 carthesian(Goal, Goal).
|
Chris@0
|
447
|
Chris@0
|
448 carthesian_conj(List, State, Carthesian) :-
|
Chris@0
|
449 group_by_cut(List, Before, _Cut, After), !,
|
Chris@0
|
450 carthesian_conj(Before, State, B),
|
Chris@0
|
451 carthesian_conj(After, State, A),
|
Chris@0
|
452 Carthesian = (B, !, A).
|
Chris@0
|
453 carthesian_conj(List, State, Carthesian) :-
|
Chris@0
|
454 append(Before, After, List),
|
Chris@0
|
455 bind_args(Before, State),
|
Chris@0
|
456 make_subgraphs(After, SubGraphs),
|
Chris@0
|
457 SubGraphs = [_,_|_], !,
|
Chris@0
|
458 list_to_conj(Before, B),
|
Chris@0
|
459 mk_carthesian(SubGraphs, Bags),
|
Chris@0
|
460 carthesian_final(Bags, CarthGoal),
|
Chris@0
|
461 Carthesian = (B, CarthGoal).
|
Chris@0
|
462
|
Chris@0
|
463 mk_carthesian([], []).
|
Chris@0
|
464 mk_carthesian([G0|T0], [bag(Vars, Goal)|T]) :-
|
Chris@0
|
465 list_to_conj(G0, Goal),
|
Chris@0
|
466 term_variables(Goal, Vars0),
|
Chris@0
|
467 delete_instantiated(Vars0, Vars),
|
Chris@0
|
468 mk_carthesian(T0, T).
|
Chris@0
|
469
|
Chris@0
|
470 %% carthesian_final(+Bags, -Final)
|
Chris@0
|
471 %
|
Chris@0
|
472 % Remove some common results that are not useful. Notably bags
|
Chris@0
|
473 % with empty variable-set are interesting. They are basically sets
|
Chris@0
|
474 % of goals called with ground variables and therefore can be
|
Chris@0
|
475 % merged with the bag in front of it.
|
Chris@0
|
476 %
|
Chris@0
|
477 % This needs some more though!
|
Chris@0
|
478
|
Chris@0
|
479 carthesian_final([bag(_, G0), bag([],G1)], (G0, G1)) :- !.
|
Chris@0
|
480 carthesian_final(Bags, rdfql_carthesian(Bags)).
|
Chris@0
|
481
|
Chris@0
|
482
|
Chris@0
|
483 /*******************************
|
Chris@0
|
484 * BINDING *
|
Chris@0
|
485 *******************************/
|
Chris@0
|
486
|
Chris@0
|
487 /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
Chris@0
|
488 Keep track of binding status of a variable. There are basically three
|
Chris@0
|
489 ways. One is to use a seperate (assoc) table. Alternatively we can use
|
Chris@0
|
490 attributed variables and delete the attributes at the end, and finally
|
Chris@0
|
491 we can use normal variables and recreate the original goal by unbinding
|
Chris@0
|
492 them again.
|
Chris@0
|
493
|
Chris@0
|
494 Using assocs requires us to pass these things around and involves a
|
Chris@0
|
495 log(N) complexity lookup. Using real terms has the disadvantage that to
|
Chris@0
|
496 unbind we have to copy the term, thus loosing bindings it may have with
|
Chris@0
|
497 the environment. Using attributes suffers neither of these problems and
|
Chris@0
|
498 its only drawback is relying on non-standard Prolog features.
|
Chris@0
|
499
|
Chris@0
|
500 Note that the remainder of the algorithm uses sets organised to the
|
Chris@0
|
501 standard order of terms. As putting attributes does not change the
|
Chris@0
|
502 identity of global stack variables and goals are global stack terms this
|
Chris@0
|
503 is guaranteed.
|
Chris@0
|
504 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
|
Chris@0
|
505
|
Chris@0
|
506 %% instantiate_obj(+Arg, -Old, +New, !Bindings) is det.
|
Chris@0
|
507 %
|
Chris@0
|
508 % Called to change the state of an RDF object after running some
|
Chris@0
|
509 % goal. Old is the current instantiation. After executing the RDF
|
Chris@0
|
510 % Goal the new instantiation is New (typically =b=). Bindings is a
|
Chris@0
|
511 % term bindings(List), which is updated using destructive
|
Chris@0
|
512 % assignment. List is the list of all variables to which we added
|
Chris@0
|
513 % attributes.
|
Chris@0
|
514
|
Chris@0
|
515 instantiate_obj(Arg, Old, New, Bindings) :-
|
Chris@0
|
516 ( var(Arg)
|
Chris@0
|
517 ; ground(Arg)
|
Chris@0
|
518 ), !,
|
Chris@0
|
519 instantiate(Arg, Old, New, Bindings).
|
Chris@0
|
520 instantiate_obj(literal(Pattern, Var), +(Pattern), New, Bindings) :-
|
Chris@0
|
521 instantiate(Var, _, New, Bindings).
|
Chris@0
|
522
|
Chris@0
|
523 instantiate(Var, Old, New, Bindings) :-
|
Chris@0
|
524 instantiated(Var, Old),
|
Chris@0
|
525 ( Old == (-)
|
Chris@0
|
526 -> put_attr(Var, instantiated, New),
|
Chris@0
|
527 arg(1, Bindings, B0),
|
Chris@0
|
528 setarg(1, Bindings, [Var|B0])
|
Chris@0
|
529 ; true
|
Chris@0
|
530 ).
|
Chris@0
|
531
|
Chris@0
|
532 instantiated(Term, How) :-
|
Chris@0
|
533 ( nonvar(Term)
|
Chris@0
|
534 -> How = +(+)
|
Chris@0
|
535 ; get_attr(Term, instantiated, H)
|
Chris@0
|
536 -> How = +(H)
|
Chris@0
|
537 ; How = -
|
Chris@0
|
538 ).
|
Chris@0
|
539
|
Chris@0
|
540 uninstantiate(Term, How) :-
|
Chris@0
|
541 ( get_attr(Term, instantiated, How)
|
Chris@0
|
542 -> del_attr(Term, instantiated)
|
Chris@0
|
543 ; true
|
Chris@0
|
544 ).
|
Chris@0
|
545
|
Chris@0
|
546
|
Chris@0
|
547 %% attr_unify_hook(+Attribute, +Value)
|
Chris@0
|
548 %
|
Chris@0
|
549 % For now, the attribute unify hook only allows unifying with a
|
Chris@0
|
550 % variable with the same attribute. This deals with the
|
Chris@0
|
551 % unification that takes place in rdf_optimise/3 for the variables
|
Chris@0
|
552 % of the saved copy.
|
Chris@0
|
553
|
Chris@0
|
554 instantiated:attr_unify_hook(Attr, Value) :-
|
Chris@0
|
555 get_attr(Value, instantiated, Attr).
|
Chris@0
|
556
|
Chris@0
|
557 instantiated:attr_portray_hook(Value, _Var) :-
|
Chris@0
|
558 write(+(Value)).
|
Chris@0
|
559
|
Chris@0
|
560 %% instantiate_term(+Term)
|
Chris@0
|
561 %
|
Chris@0
|
562 % Instantiate all unbound variables
|
Chris@0
|
563
|
Chris@0
|
564 instantiate_term(Term, How) :-
|
Chris@0
|
565 compound(Term), !,
|
Chris@0
|
566 functor(Term, _, Arity),
|
Chris@0
|
567 instantiate_args(Arity, Term, How).
|
Chris@0
|
568 instantiate_term(Term, How) :-
|
Chris@0
|
569 ( var(Term),
|
Chris@0
|
570 \+ get_attr(Term, instantiated, _)
|
Chris@0
|
571 -> put_attr(Term, instantiated, How)
|
Chris@0
|
572 ; true
|
Chris@0
|
573 ).
|
Chris@0
|
574
|
Chris@0
|
575 instantiate_args(0, _, _) :- !.
|
Chris@0
|
576 instantiate_args(N, Term, How) :-
|
Chris@0
|
577 arg(N, Term, A),
|
Chris@0
|
578 instantiate_term(A, How),
|
Chris@0
|
579 N2 is N - 1,
|
Chris@0
|
580 instantiate_args(N2, Term, How).
|
Chris@0
|
581
|
Chris@0
|
582
|
Chris@0
|
583 %% uninstantiate_term(+Term, +How)
|
Chris@0
|
584 %
|
Chris@0
|
585 % Remove all skolem instantiations from Term.
|
Chris@0
|
586
|
Chris@0
|
587 uninstantiate_term(Term, How) :-
|
Chris@0
|
588 compound(Term), !,
|
Chris@0
|
589 functor(Term, _, Arity),
|
Chris@0
|
590 uninstantiate_args(Arity, Term, How).
|
Chris@0
|
591 uninstantiate_term(Term, How) :-
|
Chris@0
|
592 uninstantiate(Term, How).
|
Chris@0
|
593
|
Chris@0
|
594 uninstantiate_args(0, _, _) :- !.
|
Chris@0
|
595 uninstantiate_args(N, Term, How) :-
|
Chris@0
|
596 arg(N, Term, A),
|
Chris@0
|
597 uninstantiate_term(A, How),
|
Chris@0
|
598 N2 is N - 1,
|
Chris@0
|
599 uninstantiate_args(N2, Term, How).
|
Chris@0
|
600
|
Chris@0
|
601
|
Chris@0
|
602 %% delete_instantiated(+List, -Unbound)
|
Chris@0
|
603 %
|
Chris@0
|
604 % Delete all elements of List that are not instantiated (i.e. var
|
Chris@0
|
605 % and without an instantiated attribute).
|
Chris@0
|
606
|
Chris@0
|
607 delete_instantiated([], []).
|
Chris@0
|
608 delete_instantiated([H|T0], L) :-
|
Chris@0
|
609 ( instantiated(H, -)
|
Chris@0
|
610 -> L = [H|T],
|
Chris@0
|
611 delete_instantiated(T0, T)
|
Chris@0
|
612 ; delete_instantiated(T0, L)
|
Chris@0
|
613 ).
|
Chris@0
|
614
|
Chris@0
|
615
|
Chris@0
|
616 /*******************************
|
Chris@0
|
617 * COMPLEXITY *
|
Chris@0
|
618 *******************************/
|
Chris@0
|
619
|
Chris@0
|
620 %% rdf_complexity(+GoalIn, -GoalOut, -SpaceEstimate, -TimeEstimate)
|
Chris@0
|
621 %
|
Chris@0
|
622 % Provide an estimate for the size of the search space for
|
Chris@0
|
623 % executing Goal. For this we estimate the branching factor of
|
Chris@0
|
624 % each subgoal in the conjunction. If the branching factors are
|
Chris@0
|
625 % B0, B1, ... then the total complexity estimate is
|
Chris@0
|
626 %
|
Chris@0
|
627 % E = 1 + B0 + B0*B1 + B0*B1*B2, ...
|
Chris@0
|
628 %
|
Chris@0
|
629 % Non-RDF calls are supposed to be boolean tests that cam be
|
Chris@0
|
630 % executed at the first opportunity all arguments are bound by RDF
|
Chris@0
|
631 % calls. They have a probability of failure, reducing the search
|
Chris@0
|
632 % space. Using the above formula, any number lower than 1 moves
|
Chris@0
|
633 % the test as far as possible to the head of the conjunction.
|
Chris@0
|
634 %
|
Chris@0
|
635 % If GoalIn and GoalOut are the same the system will not try to
|
Chris@0
|
636 % optimize local conjunctions.
|
Chris@0
|
637 %
|
Chris@0
|
638 % ISSUES: control structures ;, if-then-else, etc.
|
Chris@0
|
639
|
Chris@0
|
640 rdf_complexity(Goal, SpaceEstimate, TimeEstimate) :-
|
Chris@0
|
641 rdf_complexity(Goal, Goal, SpaceEstimate, TimeEstimate).
|
Chris@0
|
642
|
Chris@0
|
643 rdf_complexity(Goal0, Goal, Space, Time) :-
|
Chris@0
|
644 State = bindings([]),
|
Chris@0
|
645 complexity(Goal0, Goal, State, 1, Space, 0, Time),
|
Chris@0
|
646 arg(1, State, Bindings),
|
Chris@0
|
647 unbind(Bindings).
|
Chris@0
|
648
|
Chris@0
|
649 unbind([]).
|
Chris@0
|
650 unbind([H|T]) :-
|
Chris@0
|
651 del_attr(H, instantiated),
|
Chris@0
|
652 unbind(T).
|
Chris@0
|
653
|
Chris@0
|
654 % complexity(:GoalIn, -GoalOut,
|
Chris@0
|
655 % +State,
|
Chris@0
|
656 % +SpaceIn, -SpaceOut,
|
Chris@0
|
657 % +CountIn, -CountOut)
|
Chris@0
|
658 %
|
Chris@0
|
659 % Compute the complexity of Goal. Vars is an assoc holding the
|
Chris@0
|
660 % variables bound earlier in the conjunction. Space keeps the size
|
Chris@0
|
661 % of the search space and Count is the cummulative count of the
|
Chris@0
|
662 % costs for exploring the search space espressed in the number of
|
Chris@0
|
663 % nodes that will be visited.
|
Chris@0
|
664 %
|
Chris@0
|
665 % The (G*->_=true;_=false) clause deals with the code generated
|
Chris@0
|
666 % from optional graph specs as provided by SeRQL.
|
Chris@0
|
667
|
Chris@0
|
668 complexity((A0,B0), (A,B), State, Sz0, Sz, C0, C) :- !,
|
Chris@0
|
669 complexity(A0, A, State, Sz0, Sz1, C0, C1),
|
Chris@0
|
670 complexity(B0, B, State, Sz1, Sz, C1, C).
|
Chris@0
|
671 complexity((G0*->True;False),
|
Chris@0
|
672 ( G*->True;False), State, Sz0, Sz, C0, C) :- !,
|
Chris@0
|
673 ( var(G)
|
Chris@0
|
674 -> optimise_order(G0, G, Sz1, C1),
|
Chris@0
|
675 Sz is Sz0 * Sz1,
|
Chris@0
|
676 C is C0+Sz0*C1
|
Chris@0
|
677 ; complexity(G, G, State, Sz0, Sz, C0, C)
|
Chris@0
|
678 ).
|
Chris@0
|
679 complexity((If0->Then0;Else0), % dubious
|
Chris@0
|
680 ( If->Then; Else), State, Sz0, Sz, C0, C) :- !,
|
Chris@0
|
681 ( var(If)
|
Chris@0
|
682 -> optimise_order(If0, If, Sz1, C1),
|
Chris@0
|
683 optimise_order(Then0, Then, Sz2, C2),
|
Chris@0
|
684 optimise_order(Else0, Else, Sz3, C3),
|
Chris@0
|
685 Sz is max(Sz0 * Sz1 * Sz2, Sz0 * Sz3),
|
Chris@0
|
686 C is C0 + max(Sz0*C1+Sz0*Sz1*C2, Sz0*C1+Sz0*Sz1*C3)
|
Chris@0
|
687 ; complexity(If, If, State, Sz0, Sz1, C0, C1),
|
Chris@0
|
688 complexity(Then, Then, State, Sz1, Sz2, C1, C2),
|
Chris@0
|
689 complexity(Else, Else, State, Sz0, Sz3, C0, C3),
|
Chris@0
|
690 Sz is max(Sz2, Sz3),
|
Chris@0
|
691 C is max(C2, C3)
|
Chris@0
|
692 ).
|
Chris@0
|
693 complexity((A0;B0), (A;B), State, Sz0, Sz, C0, C) :- !,
|
Chris@0
|
694 ( var(A)
|
Chris@0
|
695 -> optimise_order(A0, A, _, _),
|
Chris@0
|
696 optimise_order(B0, B, _, _)
|
Chris@0
|
697 ; A = A0,
|
Chris@0
|
698 B = B0
|
Chris@0
|
699 ),
|
Chris@0
|
700 complexity(A, A, State, Sz0, SzA, C0, CA),
|
Chris@0
|
701 complexity(B, B, State, Sz0, SzB, C0, CB),
|
Chris@0
|
702 Sz is SzA + SzB,
|
Chris@0
|
703 C is CA + CB.
|
Chris@0
|
704 complexity(rdfql_carthesian(Bags),
|
Chris@0
|
705 rdfql_carthesian(Bags), State, Sz0, Sz, C0, C) :- !,
|
Chris@0
|
706 carth_complexity(Bags, State, Sz0, Sz, C0, 0, C).
|
Chris@0
|
707 complexity(Goal, Goal, State, Sz0, Sz, C0, C) :-
|
Chris@0
|
708 Goal = member(Var, List), !, % List is list of resources
|
Chris@0
|
709 instantiate(Var, _, b, State),
|
Chris@0
|
710 length(List, Branch),
|
Chris@0
|
711 Sz is Sz0 * Branch,
|
Chris@0
|
712 C is C0 + Sz0*0.2 + Sz*0.2.
|
Chris@0
|
713 complexity(Goal, Goal, State, Sz, Sz, C0, C) :-
|
Chris@0
|
714 Goal = (Var=literal(V)), !,
|
Chris@0
|
715 instantiated(V, +(_)),
|
Chris@0
|
716 instantiate(Var, _, b, State),
|
Chris@0
|
717 C is C0 + 0.2.
|
Chris@0
|
718 complexity(Goal, Goal, State, Sz0, Sz, C0, C) :-
|
Chris@0
|
719 rdf_db_goal(Goal, S, P, O), !,
|
Chris@0
|
720 instantiate(S, SI, b, State),
|
Chris@0
|
721 instantiate(P, PI, b, State),
|
Chris@0
|
722 instantiate_obj(O, OI, b, State),
|
Chris@0
|
723 complexity0(SI, PI, OI, P, Goal, SetUp, PerAlt, Branch),
|
Chris@0
|
724 Sz is Sz0 * Branch,
|
Chris@0
|
725 C is C0 + Sz0*SetUp + Sz*PerAlt,
|
Chris@0
|
726 debug(rdf_optimise(complexity), 'Complexity ~p: (~w) ~w --> ~w',
|
Chris@0
|
727 [i(SI,PI,OI), Goal, Branch, C]).
|
Chris@0
|
728 complexity(G, G, _, Sz0, Sz, C0, C) :- % non-rdf tests
|
Chris@0
|
729 term_variables(G, Vars),
|
Chris@0
|
730 all_bound(Vars),
|
Chris@0
|
731 Sz is Sz0 * 0.5, % probability of failure
|
Chris@0
|
732 C is C0 + Sz. % Sz * CostOfTest
|
Chris@0
|
733
|
Chris@0
|
734 all_bound([]).
|
Chris@0
|
735 all_bound([H|T]) :-
|
Chris@0
|
736 instantiated(H, +(_)),
|
Chris@0
|
737 all_bound(T).
|
Chris@0
|
738
|
Chris@0
|
739 % carth_complexity(+Bags, +State,
|
Chris@0
|
740 % +Size0, -Size,
|
Chris@0
|
741 % +Time0, +TimeSum0, -TimeSum)
|
Chris@0
|
742 %
|
Chris@0
|
743 % Compute the time and space efficiency of the carthesian product.
|
Chris@0
|
744 % the total cost in time is the sum of the costs of all branches,
|
Chris@0
|
745 % The search space at the end is still the same product.
|
Chris@0
|
746
|
Chris@0
|
747 carth_complexity([], _, Sz, Sz, _, C, C).
|
Chris@0
|
748 carth_complexity([bag(_,G)|T], State,
|
Chris@0
|
749 Sz0, Sz,
|
Chris@0
|
750 C0, Csum0, Csumz) :-
|
Chris@0
|
751 complexity(G, G, State, Sz0, Sz1, C0, C1),
|
Chris@0
|
752 Csum1 is Csum0 + C1,
|
Chris@0
|
753 carth_complexity(T, State, Sz1, Sz, C0, Csum1, Csumz).
|
Chris@0
|
754
|
Chris@0
|
755
|
Chris@0
|
756 %% complexity0(+SI, +PI, +OI, +P, +G, -Setup, -PerAlt, -Branch).
|
Chris@0
|
757 %
|
Chris@0
|
758 % SI,PI,OI describe the instantiation pattern. P is the predicate
|
Chris@0
|
759 % and G is the actual goal. Complexity is unified to an estimate
|
Chris@0
|
760 % of the size of the search space and therefore an estimate of the
|
Chris@0
|
761 % execution time required to prove the goal.
|
Chris@0
|
762 %
|
Chris@0
|
763 % Literal `like' matches come out as +(like(Pattern)). We must
|
Chris@0
|
764 % estimate the percentage of the literals that match this pattern.
|
Chris@0
|
765 % Suppose the factor is 1,000. This means the branching is reduced
|
Chris@0
|
766 % by 1,000, but finding each solution is slow as it requires a
|
Chris@0
|
767 % linear scan. It is faster than going all the way back to Prolog
|
Chris@0
|
768 % backtracking however, so we estimate a factor 10 (TBD: verify
|
Chris@0
|
769 % that number).
|
Chris@0
|
770 %
|
Chris@0
|
771 % ISSUES: rdf_has/3 vs rdf_reachable/3.
|
Chris@0
|
772
|
Chris@0
|
773 complexity0(+(_),+(_),+(_), _, _, 1, 0, 1) :- !.
|
Chris@0
|
774 complexity0(+(b),+(+),-, P, G, 1, 1, B) :- !,
|
Chris@0
|
775 subj_branch_factor(G, B, Prop),
|
Chris@0
|
776 rdf_predicate_property(P, Prop).
|
Chris@0
|
777 complexity0(-,+(+),+(b), P, G, 1, 1, B) :- !,
|
Chris@0
|
778 obj_branch_factor(G, B, Prop),
|
Chris@0
|
779 rdf_predicate_property(P, Prop).
|
Chris@0
|
780 complexity0(+(b), -, -, _, _, 1, 1, B) :- !,
|
Chris@0
|
781 rdf_statistics(triples(Total)),
|
Chris@0
|
782 rdf_statistics(subjects(Subjs)),
|
Chris@0
|
783 B is Total/Subjs.
|
Chris@0
|
784 complexity0(_,_,+(like(Pat)),_, G, Factor, Factor, B) :- !,
|
Chris@0
|
785 rdf_estimate_complexity(G, B0),
|
Chris@0
|
786 pattern_filter(Pat, Factor0),
|
Chris@0
|
787 Factor is max(1, min(B0, Factor0)/10),
|
Chris@0
|
788 B is B0/Factor.
|
Chris@0
|
789 complexity0(_,_,_, _, G, 1, 1, B) :-
|
Chris@0
|
790 rdf_estimate_complexity(G, B).
|
Chris@0
|
791
|
Chris@0
|
792 :- multifile
|
Chris@0
|
793 subj_branch_factor/3,
|
Chris@0
|
794 obj_branch_factor/3.
|
Chris@0
|
795
|
Chris@0
|
796 subj_branch_factor(rdf(_,_,_), X, rdf_subject_branch_factor(X)).
|
Chris@0
|
797 subj_branch_factor(rdf_has(_,_,_), X, rdfs_subject_branch_factor(X)).
|
Chris@0
|
798 subj_branch_factor(rdf_reachable(_,_,_), X, rdfs_subject_branch_factor(X)).
|
Chris@0
|
799
|
Chris@0
|
800 obj_branch_factor(rdf(_,_,_), X, rdf_object_branch_factor(X)).
|
Chris@0
|
801 obj_branch_factor(rdf_has(_,_,_), X, rdfs_object_branch_factor(X)).
|
Chris@0
|
802 obj_branch_factor(rdf_reachable(_,_,_), X, rdfs_object_branch_factor(X)).
|
Chris@0
|
803
|
Chris@0
|
804
|
Chris@0
|
805 %% rdf_db_goal(+Goal, -Subject, -Predicate, -Object)
|
Chris@0
|
806 %
|
Chris@0
|
807 % True if Goal is a pure (logical) predicate on the RDF database
|
Chris@0
|
808 % involving the given Subject, Predicate and Object. Defined
|
Chris@0
|
809 % multifile, allowing the optimiser to understand user-defined
|
Chris@0
|
810 % rdf/3 like predicates.
|
Chris@0
|
811 %
|
Chris@0
|
812 % @tbd Allow specifying different costs and branching factors
|
Chris@0
|
813
|
Chris@0
|
814 :- multifile
|
Chris@0
|
815 rdf_db_goal/4.
|
Chris@0
|
816
|
Chris@0
|
817 rdf_db_goal(rdf(S,P,O), S,P,O).
|
Chris@0
|
818 rdf_db_goal(rdf_has(S,P,O), S,P,O).
|
Chris@0
|
819 rdf_db_goal(rdf_reachable(S,P,O), S,P,O).
|
Chris@0
|
820 rdf_db_goal(rdf(S,P,O, _DB), S,P,O). % TBD: less hits
|
Chris@0
|
821
|
Chris@0
|
822 %% pattern_filter(+Like, -Factor)
|
Chris@0
|
823 %
|
Chris@0
|
824 % Estimate the efficiency of a pattern. This is a bit hard, as
|
Chris@0
|
825 % characters are not independent.
|
Chris@0
|
826
|
Chris@0
|
827 pattern_filter(Like, Factor) :-
|
Chris@0
|
828 atom_codes(Like, Codes),
|
Chris@0
|
829 pattern_factor(Codes, 1, Factor).
|
Chris@0
|
830
|
Chris@0
|
831 pattern_factor([], F, F).
|
Chris@0
|
832 pattern_factor([0'*|T], F0, F) :- !,
|
Chris@0
|
833 pattern_factor(T, F0, F).
|
Chris@0
|
834 pattern_factor([_|T], F0, F) :-
|
Chris@0
|
835 F1 is F0*10,
|
Chris@0
|
836 pattern_factor(T, F1, F).
|
Chris@0
|
837
|
Chris@0
|
838
|
Chris@0
|
839 %% rdf_estimate_complexity(+Goal, -Complexity)
|
Chris@0
|
840 %
|
Chris@0
|
841 % Estimate the branching factor introduced by Goal. This uses the
|
Chris@0
|
842 % rdf_db statistics of the hash chains which are based on
|
Chris@0
|
843 % exploiting the RDFS subPropertyOf reasoning.
|
Chris@0
|
844 %
|
Chris@0
|
845 % In addition, rdf_reachable/3 introduces its own complexity which
|
Chris@0
|
846 % must be estimate using the branching factor of the relation.
|
Chris@0
|
847
|
Chris@0
|
848 rdf_estimate_complexity(G, C) :-
|
Chris@0
|
849 rdf_db_goal(G, S, P0, O),
|
Chris@0
|
850 map_predicate(P0, P),
|
Chris@0
|
851 rdf_estimate_complexity(S, P, O, C).
|
Chris@0
|
852
|
Chris@0
|
853 map(map_predicate(_,_)).
|
Chris@0
|
854 map(map_predicate(_,_):-_).
|
Chris@0
|
855
|
Chris@0
|
856 term_expansion(In, Out) :-
|
Chris@0
|
857 map(In), !,
|
Chris@0
|
858 rdf_global_term(In, Out).
|
Chris@0
|
859
|
Chris@0
|
860 map_predicate(X, X) :-
|
Chris@0
|
861 var(X), !.
|
Chris@0
|
862 map_predicate(serql:directSubClassOf, rdfs:subClassOf) :- !.
|
Chris@0
|
863 map_predicate(serql:directType, rdf:type) :- !.
|
Chris@0
|
864 map_predicate(serql:directSubPropertyOf, rdfs:subPropertyOf) :- !.
|
Chris@0
|
865 map_predicate(X, X).
|
Chris@0
|
866
|
Chris@0
|
867
|
Chris@0
|
868 /*******************************
|
Chris@0
|
869 * INSTANTIATE OPTIONAL *
|
Chris@0
|
870 *******************************/
|
Chris@0
|
871
|
Chris@0
|
872 /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
|
Chris@0
|
873 In SELECT queries, optional parts of the path expression leave
|
Chris@0
|
874 uninstantiated variables. These must be bound to '$null$' to be able to
|
Chris@0
|
875 do correct merging for DISTINCT. The naive way to do this is to
|
Chris@0
|
876 instantiate all variables at the end of the query. On large selects
|
Chris@0
|
877 (i.e. involving many variables) this appears to be quite costly. Doing
|
Chris@0
|
878 the job early, as in
|
Chris@0
|
879
|
Chris@0
|
880 ( Goal
|
Chris@0
|
881 *-> true
|
Chris@0
|
882 ; bind_null(VarsInGoal)
|
Chris@0
|
883 )
|
Chris@0
|
884
|
Chris@0
|
885 is not correct as well, as VarsInGoal may be involved in other parts of
|
Chris@0
|
886 the code either before or after the optional path expression. So we need
|
Chris@0
|
887 to:
|
Chris@0
|
888
|
Chris@0
|
889 * Do abtract execution and find the bindings done before arriving
|
Chris@0
|
890 at Goal.
|
Chris@0
|
891 * continue the execution, and watch for new bindings to these
|
Chris@0
|
892 variables. If we find a binding for the second time, remove
|
Chris@0
|
893 it from the first and make a conditional binding for it.
|
Chris@0
|
894
|
Chris@0
|
895 If we bind an argument unconditionally, we place an attribute 'b'. If it
|
Chris@0
|
896 is conditionally bound, we place an attribute c(Set), where Set is the
|
Chris@0
|
897 set in which it was bound or plain 'c' if it was conditionally bound in
|
Chris@0
|
898 multiple places.
|
Chris@0
|
899
|
Chris@0
|
900 TBD: disjunctions and other control structures.
|
Chris@0
|
901 queries that do not bind (such as SPARQL bound(X))
|
Chris@0
|
902 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
|
Chris@0
|
903
|
Chris@0
|
904 serql_select_bind_null(Goal0, Goal) :-
|
Chris@0
|
905 State = bindings([]),
|
Chris@0
|
906 select_bind_null(Goal0, Goal1, State),
|
Chris@0
|
907 arg(1, State, Bindings),
|
Chris@0
|
908 c_unbind(Bindings, Left),
|
Chris@0
|
909 ( Left == []
|
Chris@0
|
910 -> Goal = Goal1
|
Chris@0
|
911 ; Goal = (Goal1, rdfql_cond_bind_null(Left))
|
Chris@0
|
912 ).
|
Chris@0
|
913
|
Chris@0
|
914 c_unbind([], []).
|
Chris@0
|
915 c_unbind([H|T0], L) :-
|
Chris@0
|
916 ( get_attr(H, instantiated, c)
|
Chris@0
|
917 -> L = [H|T]
|
Chris@0
|
918 ; L = T
|
Chris@0
|
919 ),
|
Chris@0
|
920 del_attr(H, instantiated),
|
Chris@0
|
921 c_unbind(T0, T).
|
Chris@0
|
922
|
Chris@0
|
923
|
Chris@0
|
924 select_bind_null((A0,B0), (A,B), State) :- !,
|
Chris@0
|
925 select_bind_null(A0, A, State),
|
Chris@0
|
926 select_bind_null(B0, B, State).
|
Chris@0
|
927 select_bind_null((G0 *-> true; true),
|
Chris@0
|
928 ( G *-> true; Bind),
|
Chris@0
|
929 State) :- !,
|
Chris@0
|
930 arg(1, State, B0),
|
Chris@0
|
931 select_bind_null(G0, G, State),
|
Chris@0
|
932 arg(1, State, B),
|
Chris@0
|
933 Bind = rdfql_bind_null(Vars),
|
Chris@0
|
934 c_bindings(B, B0, c(Bind), Vars).
|
Chris@0
|
935 select_bind_null(rdfql_carthesian(List0),
|
Chris@0
|
936 rdfql_carthesian(List), State) :- !,
|
Chris@0
|
937 select_carth_bind_null(List0, List, State).
|
Chris@0
|
938 select_bind_null(Goal, Goal, State) :-
|
Chris@0
|
939 term_variables(Goal, Vars),
|
Chris@0
|
940 c_bind(Vars, State).
|
Chris@0
|
941
|
Chris@0
|
942 %% c_bindings(+AtEnd, +AtStart, +CVars, -Vars)
|
Chris@0
|
943 %
|
Chris@0
|
944 % The variables of the difference-list AtEnd..AtStart are
|
Chris@0
|
945 % conditionally bound. Tag each such variable with CVars.
|
Chris@0
|
946 %
|
Chris@0
|
947 % @param CVars Term c(Vars), where Vars are the other variables
|
Chris@0
|
948 % that have the same conditional binding.
|
Chris@0
|
949
|
Chris@0
|
950 c_bindings(B, B0, _, Vars) :-
|
Chris@0
|
951 B == B0, !,
|
Chris@0
|
952 Vars = [].
|
Chris@0
|
953 c_bindings([H|T0], B0, Attr, [H|Vars]) :-
|
Chris@0
|
954 get_attr(H, instantiated, I),
|
Chris@0
|
955 is_instantiated(I), !,
|
Chris@0
|
956 put_attr(H, instantiated, Attr),
|
Chris@0
|
957 c_bindings(T0, B0, Attr, Vars).
|
Chris@0
|
958 c_bindings([_|T0], B0, Attr, Vars) :-
|
Chris@0
|
959 c_bindings(T0, B0, Attr, Vars).
|
Chris@0
|
960
|
Chris@0
|
961
|
Chris@0
|
962 is_instantiated(b). % unconditionally bound
|
Chris@0
|
963 is_instantiated(c(_)). % bound either by call or rdfql_bind_null/1
|
Chris@0
|
964
|
Chris@0
|
965
|
Chris@0
|
966 %% c_bind(+Vars, +State)
|
Chris@0
|
967 %
|
Chris@0
|
968 % Do unconditional binding of Vars. Var may be in a
|
Chris@0
|
969 % rdfql_bind_null/1 set. In that case, delete it from the set, and
|
Chris@0
|
970 % set the class to 'c' to make a conditional binding at the end.
|
Chris@0
|
971
|
Chris@0
|
972 c_bind([], _).
|
Chris@0
|
973 c_bind([H|T], State) :-
|
Chris@0
|
974 ( get_attr(H, instantiated, I)
|
Chris@0
|
975 -> ( I == b % already unconditionally bound
|
Chris@0
|
976 -> true
|
Chris@0
|
977 ; I = c(Set)
|
Chris@0
|
978 -> arg(1, Set, Vars0),
|
Chris@0
|
979 del_var(H, Vars0, Vars),
|
Chris@0
|
980 setarg(1, Set, Vars),
|
Chris@0
|
981 put_attr(H, instantiated, c)
|
Chris@0
|
982 ; I == c
|
Chris@0
|
983 -> true
|
Chris@0
|
984 )
|
Chris@0
|
985 ; put_attr(H, instantiated, b),
|
Chris@0
|
986 arg(1, State, B0),
|
Chris@0
|
987 setarg(1, State, [H|B0])
|
Chris@0
|
988 ),
|
Chris@0
|
989 c_bind(T, State).
|
Chris@0
|
990
|
Chris@0
|
991 del_var(H, [X|T0], T) :-
|
Chris@0
|
992 ( H == X
|
Chris@0
|
993 -> T = T0
|
Chris@0
|
994 ; T = [X|T1],
|
Chris@0
|
995 del_var(H, T0, T1)
|
Chris@0
|
996 ).
|
Chris@0
|
997
|
Chris@0
|
998 select_carth_bind_null([], [], _).
|
Chris@0
|
999 select_carth_bind_null([bag(Vars, G0)|T0], [bag(Vars, G)|T], State) :-
|
Chris@0
|
1000 select_bind_null(G0, G, State),
|
Chris@0
|
1001 select_carth_bind_null(T0, T, State).
|
Chris@0
|
1002
|
Chris@0
|
1003
|
Chris@0
|
1004 /*******************************
|
Chris@0
|
1005 * DEBUG SUPPORT *
|
Chris@0
|
1006 *******************************/
|
Chris@0
|
1007
|
Chris@0
|
1008 dbg_portray_body(G) :-
|
Chris@0
|
1009 debugging(rdf_optimise), !,
|
Chris@0
|
1010 portray_body(G).
|
Chris@0
|
1011 dbg_portray_body(_).
|
Chris@0
|
1012
|
Chris@0
|
1013 portray_body(G) :-
|
Chris@0
|
1014 ( pp_instantiate_term(G),
|
Chris@0
|
1015 debug(_, '~@', [portray_clause((<> :- G))]),
|
Chris@0
|
1016 fail
|
Chris@0
|
1017 ; true
|
Chris@0
|
1018 ).
|
Chris@0
|
1019
|
Chris@0
|
1020 pp_instantiate_term(Term) :-
|
Chris@0
|
1021 compound(Term), !,
|
Chris@0
|
1022 functor(Term, _, Arity),
|
Chris@0
|
1023 pp_instantiate_args(Arity, Term).
|
Chris@0
|
1024 pp_instantiate_term(Term) :-
|
Chris@0
|
1025 var(Term),
|
Chris@0
|
1026 get_attr(Term, instantiated, H), !,
|
Chris@0
|
1027 del_attr(Term, instantiated),
|
Chris@0
|
1028 Term = +(H).
|
Chris@0
|
1029 pp_instantiate_term(_).
|
Chris@0
|
1030
|
Chris@0
|
1031 pp_instantiate_args(0, _) :- !.
|
Chris@0
|
1032 pp_instantiate_args(N, Term) :-
|
Chris@0
|
1033 arg(N, Term, A),
|
Chris@0
|
1034 pp_instantiate_term(A),
|
Chris@0
|
1035 N2 is N - 1,
|
Chris@0
|
1036 pp_instantiate_args(N2, Term).
|
Chris@0
|
1037
|
Chris@0
|
1038
|