<div class="notebook">

<div class="nb-cell markdown">
## Analysis of collection level computation performance

The following block of code defines some predicates for browsing the database of collection level computations and tagging each one with the kind of computation, the implementation language, the date and time of the computation, the size of the collection and the duration of the computation.

Scroll down to the query boxes below and press the green arrow to run each query.
</div>

<div class="nb-cell program">
:- use_module(library(computations)).
:- use_module(library(mlserver)).
:- use_module(library(real)).
:- use_module(library(dml_c3)).

:- use_rendering(rdf,[resource_format(nslabel)]).
:- use_rendering(matlab,[format(svg),size(15,10)]).
:- use_rendering(c3).

% Enumerates collection level analysis events, providing
% Lab=Label/Lang, the collection size, and the computation duration.
cla_op(Lab,Size,Dur) :- cla_op(op&gt;&gt;relabel-size-dur,Lab-Size-Dur).

% Main CLA browser predicate. Uses Filter to determine which computations
% are returned and what information is provided.
cla_op(Filter,Out) :-
    browse(perspectives:cla_memo(Op,CID,_),comp(_,Time,Dur)-ok),
    dataset_size(CID,Size),
    x(Filter,op(Op,Time,Size,Dur),Out).

% Field extractor predicate (a very small arrow interpreter).
x(id,X,X).
x(F&gt;&gt;G,X,Y) 		:- x(F,X,Z), x(G,Z,Y).
x(fst(F),X1-Y,X2-Y) :- x(F,X1,X2). 
x(snd(F),X-Y1,X-Y2) :- x(F,Y1,Y2). 
x(F1-F2,X,Y1-Y2) 	:- x(F1,X,Y1), x(F2,X,Y2).
x(H,X,Y) 			:- defx(H,F), x(F,X,Y).
x(add(N),X,Y)       :- Y is X+N.
x(arg(N),X,Y)       :- arg(N,X,Y).
x(log,X,Y)          :- Y is log10(X).
x(jitter(L,U),X,Y)  :- random(D), Y is L+(U-L)*D+X.  % add uniform noise in [L,U)
x(divide,X-Y,Z)     :- Z is X/Y.
x(diff,X-Y,Z)       :- Z is Y-X.
x(quant(Q),X,Y)     :- Y is Q*floor(X/Q).            % quantise down to a multiple of Q
x(event,T1-_,T1-start).                              % a Time-Duration pair yields two point
x(event,T1-DT,T2-stop) :- T2 is T1+DT.               % events: Time-start and (Time+Duration)-stop

% gets month number and name from timestamp
x(month_num_name,Time,Num-Name) :-
    format_time(atom(Name),'%B',Time),
    stamp_date_time(Time,Date,local),
    date_time_value(month,Date,Num).

% Transforms collection level operation spec to Label/Lang, 
% where Lang is in { ml, pl, r, py }.
x(relabel,Op,Label/Lang) :- cla_label_lang(Op,Label,Lang).
x(lang,_/Lang,Lang).

defx(op,   arg(1)).
defx(time, arg(2)).
defx(size, arg(3)).
defx(dur,  arg(4)).
defx(month, time&gt;&gt;month_num_name).
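
% Illustrative examples of the filter language (not part of the analysis):
% composed filters are applied left to right, so
%   x(size&gt;&gt;log, op(foo,T,100,D), Y)  gives Y = 2.0 (log10 of the size, arg 3), and
%   x(size-dur,  op(foo,T,100,D), Y)  gives the pair Y = 100-D.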

normalise_hist(Name-Pairs,Name-Pairs2) :-
    unzip(Pairs,Vals,Counts),
    stoch(Counts,Probs),
    unzip(Pairs2,Vals,Probs).

stoch(Xs,Ys) :- sumlist(Xs,Total), maplist(divby(Total),Xs,Ys).
divby(Z,X,Y) :- Y is X/Z.
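
% For example (made-up counts): stoch([2,3,5],Ps) gives Ps = [0.2,0.3,0.5], so
% normalise_hist(lang-[ml-2,pl-3,r-5],H) gives H = lang-[ml-0.2,pl-0.3,r-0.5].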

concurrency([T0-start|StartStopEvents],ConcurrencyEvents) :-
    foldl(conc,StartStopEvents,ConcurrencyEvents,T0-1,_).

conc(T2-Event,(T1-T2)-N1,T1-N1,T2-N2) :-
    (   Event=start -&gt;  succ(N1,N2)
   	;   Event=stop  -&gt;  succ(N2,N1)
    ).
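% One fold step, for illustration: conc(3-stop, Interval, 2-2, S) gives
% Interval = (2-3)-2 (two computations ran from time 2 to 3) and S = 3-1.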
    
% ------- tools for building C3 charts -----------

%% add_points(+Data:pair(term,list(pair(number,number))))// is det.
%  Adds a set of points to a scatter plot
add_points(Name-Pairs) --&gt;
	{unzip(Pairs,Xs,Ys)},
    add_points(Name,Xs,Ys).

%% add_points(+Name:term,+X:list(number),+Y:list(number),+C1:c3,-C2:c3) is det.
%  adds a named set of points to a C3 scatter plot.
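%  C3 represents each series as column arrays, [NameX,x1,x2,...] and [Name,y1,y2,...],
%  with data.xs mapping the y column to its x column.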
add_points(Name1,Xs,Ys,Ch1,Ch2) :-
    term_to_atom(Name1,Name),
    atom_concat(Name,'_x',NameX),
    Ch2=Ch1.put(data/columns,[[NameX|Xs],[Name|Ys]|Ch1.data.columns])
    	   .put(data/xs/Name,NameX).
</div>
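
<div class="nb-cell markdown">
As a minimal illustration of the filter language (this cell is not part of the original analysis), the following query uses histof/3, the same histogram predicate as the queries further down, to count the recorded computations per implementation language.
</div>

<div class="nb-cell query">
histof(Lang, cla_op(op&gt;&gt;relabel&gt;&gt;lang,Lang), Counts).
</div>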

<div class="nb-cell markdown">
This query shows the relationship between collection size and computation time for all collection level analyses, grouped by month of computation. Note that the axis scales
are _logarithmic_.
</div>

<div class="nb-cell query">
setof(Size-Dur, 
      cla_op(month-size&gt;&gt;log&gt;&gt;jitter(0,0.05)-dur&gt;&gt;log,(_-M)-Size-Dur), 
      _Points),
call_dcg((	c3:scat('log size','log dur'),
			add_points(all-_Points),
            c3:legend(false),
            c3:zoom(true)
         ),
         c3{},Ch0).
</div>

<div class="nb-cell markdown">
Histogram of the logarithm of computation time per item, quantised into bins of width 0.1 in the log domain (a ratio of about 1.26 between bin edges).
</div>

<div class="nb-cell query">
Q=0.1, % size of quantisation bin (in log domain)
histof(T, cla_op((dur-size)&gt;&gt;divide&gt;&gt;log&gt;&gt;quant(Q),T), _Hist),
call_dcg((	c3:bar('log dur','count'),
             add_points(all-_Hist),
             c3:put(bar/width/ratio,Q)
         ), c3{},Chart).
</div>

<div class="nb-cell markdown">
Normalised histogram of the logarithm of computation time per item, quantised into bins of width 0.25, grouped by computation label and language.
</div>

<div class="nb-cell query">
findall(L-Hist,
        histof(T, 
               cla_op(op&gt;&gt;relabel-(dur-size)&gt;&gt;divide&gt;&gt;log&gt;&gt;quant(0.25),L-T),
               Hist), 
        _Hists),
maplist(normalise_hist,_Hists,_Dists),
call_dcg((	c3:bar('log dur','fraction'),
            foldl(add_points,_Dists),
             c3:put(bar/width/ratio,0.5)
         ), c3{}, Ch0).
</div>

<div class="nb-cell markdown">
Normalised histogram of the logarithm of computation time per item, quantised into bins of width 0.25, grouped by language.
</div>

<div class="nb-cell query">
findall(L-Hist,
        histof(T, 
               cla_op(op&gt;&gt;relabel&gt;&gt;lang - (dur-size)&gt;&gt;divide&gt;&gt;log&gt;&gt;quant(0.25),L-T),
               Hist), 
        Hists),
maplist(normalise_hist,Hists,_Dists),
call_dcg((	c3:bar('log dur','fraction'),
            foldl(add_points,_Dists),
            c3:put(bar/width/ratio,0.45)
         ), c3{}, Ch0).
</div>

<div class="nb-cell markdown">
This next query shows how the relationship between collection size and computation duration varies with the kind of analysis being done and the implementation language. Note that computations in Prolog have the lowest overheads, and that computations in Matlab seem to have the most variable range of durations for a given collection size.
</div>

<div class="nb-cell query">
findall(Op-_Ps, 
        setof(Size-Dur, 
              cla_op(op&gt;&gt;relabel-(size&gt;&gt;log)-dur&gt;&gt;log,Op-Size-Dur), 
              _Ps),
        _Rs),
foldl(add_points,_Rs,c3{}.scat('log size','log dur'),Ch1).
</div>

<div class="nb-cell markdown">
This query is like the previous one, but grouped by language only.
</div>

<div class="nb-cell query">
findall(Op-_Ps, 
        setof(Size-Dur, 
              cla_op(op&gt;&gt;relabel&gt;&gt;lang-(size&gt;&gt;log)-dur&gt;&gt;log,Op-Size-Dur), 
              _Ps),
        _Rs),
foldl(add_points,_Rs,c3{}.scat('log size','log dur'),Ch1).
</div>

<div class="nb-cell markdown">
This query breaks down the performance of each analysis method by month. There does not seem to be any significant pattern here other than the overall volume of computation done in each month.
</div>

<div class="nb-cell query">
setof(Month-_Ps,
      setof(Size-Dur,
            cla_op( month - op&gt;&gt;relabel - size&gt;&gt;log - dur&gt;&gt;log,
            		Month - Label - Size - Dur),
            _Ps),
      _Rs),
call(foldl(add_points,_Rs), c3{}.scat(log_size,log_dur), Ch).
</div>

<div class="nb-cell markdown">
This query analyses the degree of concurrency of collection level computations. It works by collecting a complete set of point events describing the start and end times of computations. The predicate concurrency/2 (defined in the initial code block) then folds over these events to produce a list of time interval events of the form (StartTime-EndTime)-Concurrency, where Concurrency is the number of concurrent computations occurring over that interval. Finally, the time intervals are mapped to durations and a histogram of concurrency weighted by duration is produced.
</div>
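
<div class="nb-cell markdown">
To illustrate (using a small made-up event list rather than data from the database), the next query applies concurrency/2 to four start/stop events and then uses x(fst(diff)) to turn each resulting interval into a Duration-Concurrency pair. The query after it performs the same steps on the actual computation records.
</div>

<div class="nb-cell query">
concurrency([0-start,2-start,3-stop,5-stop],Intervals),
maplist(x(fst(diff)),Intervals,DurConcPairs).
</div>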

<div class="nb-cell query">
setof(Ev,cla_op((time-dur)&gt;&gt;event,Ev),_Evs),
concurrency(_Evs,_CEvs),
maplist(x(fst(diff)),_CEvs,_CEvs2),
weighted_histof(Dur,Conc,member(Dur-Conc,_CEvs2),Hist),
select(0-_,Hist,_Hist1),
unzip(_Hist1,_Values,_Durs),
c3_bar(concurrency-_Values,duration-_Durs,Chart).
</div>

</div>