## Analysis of collection level computation performance

The following block of code defines some predicates for browsing the database of collection level computations and tagging them by the kind of computation, the implementation language, the date and time of the computation, the size of the collection, and the duration of the computation.

Scroll down to the query boxes below and press the green arrow to run each query.
```prolog
:- use_module(library(computations)).
:- use_module(library(mlserver)).
:- use_module(library(real)).
:- use_module(library(dml_c3)).

:- use_rendering(rdf,[resource_format(nslabel)]).
:- use_rendering(matlab,[format(svg),size(15,10)]).
:- use_rendering(c3).

% Enumerates collection level analysis events, providing
% Lab=Label/Lang, collection size, and computation duration.
cla_op(Lab,Size,Dur) :- cla_op(op>>relabel-size-dur,Lab-Size-Dur).

% Main CLA browser predicate. Uses Filter to determine which computations
% are returned and what information is provided.
cla_op(Filter,Out) :-
    browse(perspectives:cla_memo(Op,CID,_),comp(_,Time,Dur)-ok),
    dataset_size(CID,Size),
    x(Filter,op(Op,Time,Size,Dur),Out).

% field extractor predicate (a v. small arrow interpreter!)
x(id,X,X).
x(F>>G,X,Y) :- x(F,X,Z), x(G,Z,Y).
x(fst(F),X1-Y,X2-Y) :- x(F,X1,X2).
x(snd(F),X-Y1,X-Y2) :- x(F,Y1,Y2).
x(F1-F2,X,Y1-Y2) :- x(F1,X,Y1), x(F2,X,Y2).
x(H,X,Y) :- defx(H,F), x(F,X,Y).
x(add(N),X,Y) :- Y is X+N.
x(arg(N),X,Y) :- arg(N,X,Y).
x(log,X,Y) :- Y is log10(X).
x(jitter(L,U),X,Y) :- random(D), Y is L+(U-L)*D+X.
x(divide,X-Y,Z) :- Z is X/Y.
x(diff,X-Y,Z) :- Z is Y-X.
x(quant(Q),X,Y) :- Y is Q*floor(X/Q).
x(event,T1-_,T1-start).
x(event,T1-DT,T2-stop) :- T2 is T1+DT.

% gets month number and name from a timestamp
x(month_num_name,Time,Num-Name) :-
    format_time(atom(Name),'%B',Time),
    stamp_date_time(Time,Date,local),
    date_time_value(month,Date,Num).

% Transforms a collection level operation spec to Label/Lang,
% where Lang is in { ml, pl, r, py }.
x(relabel,Op,Label/Lang) :- cla_label_lang(Op,Label,Lang).
x(lang,_/Lang,Lang).

defx(op, arg(1)).
defx(time, arg(2)).
defx(size, arg(3)).
defx(dur, arg(4)).
defx(month, time>>month_num_name).

normalise_hist(Name-Pairs,Name-Pairs2) :-
    unzip(Pairs,Vals,Counts),
    stoch(Counts,Probs),
    unzip(Pairs2,Vals,Probs).

stoch(Xs,Ys) :- sumlist(Xs,Total), maplist(divby(Total),Xs,Ys).
divby(Z,X,Y) :- Y is X/Z.

concurrency([T0-start|StartStopEvents],ConcurrencyEvents) :-
    foldl(conc,StartStopEvents,ConcurrencyEvents,T0-1,_).

conc(T2-Event,(T1-T2)-N1,T1-N1,T2-N2) :-
    (  Event=start -> succ(N1,N2)
    ;  Event=stop  -> succ(N2,N1)
    ).

% ------- tools for building C3 charts -----------

%% add_points(+Data:pair(term,list(pair(number,number))))// is det.
%  Adds a set of points to a scatter plot.
add_points(Name-Pairs) -->
    {unzip(Pairs,Xs,Ys)},
    add_points(Name,Xs,Ys).

%% add_points(+Name:term,+X:list(number),+Y:list(number),+C1:c3,-C2:c3) is det.
%  Adds a named set of points to a C3 scatter plot.
add_points(Name1,Xs,Ys,Ch1,Ch2) :-
    term_to_atom(Name1,Name),
    atom_concat(Name,'_x',NameX),
    Ch2=Ch1.put(data/columns,[[NameX|Xs],[Name|Ys]|Ch1.data.columns])
           .put(data/xs/Name,NameX).
```
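To make the extractor DSL concrete, here are a few example queries against x/3, applied to an invented op(Op,Time,Size,Dur) term (the values `f`, `1000.0`, `100` and `25.0` are made up purely for illustration). `>>` composes extractors left to right, while `-` runs two extractors on the same input and pairs the results:

```prolog
?- x(size, op(f,1000.0,100,25.0), Y).               % Y = 100
?- x(size>>log, op(f,1000.0,100,25.0), Y).          % Y = 2.0
?- x(op-dur, op(f,1000.0,100,25.0), Y).             % Y = f-25.0
?- x((dur-size)>>divide, op(f,1000.0,100,25.0), Y). % Y = 0.25 (duration per item)
?- x(quant(0.25), 1.3, Y).                          % Y = 1.25 (snapped down to a bin edge)
```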
This query shows the relationship between collection size and computation time for all collection level analyses, grouped by month of computation. Note that the axis scales are _logarithmic_.
```prolog
setof(Size-Dur,
      cla_op(month-size>>log>>jitter(0,0.05)-dur>>log,(_-M)-Size-Dur),
      _Points),
call_dcg(( c3:scat('log size','log dur'),
           add_points(all-_Points),
           c3:legend(false),
           c3:zoom(true)
         ),
         c3{},Ch0).
```
Histogram of the logarithm of computation time per item, quantised to bins of width 0.1 (a ratio of about 1.26 between bin edges).
```prolog
Q=0.1,   % size of quantisation bin (in log domain)
histof(T, cla_op((dur-size)>>divide>>log>>quant(Q),T), _Hist),
call_dcg(( c3:bar('log dur','count'),
           add_points(all-_Hist),
           c3:put(bar/width/ratio,Q)
         ), c3{},Chart).
```
Histogram of the logarithm of computation time per item, quantised to bins of width 0.25, grouped by computation label and language.
```prolog
findall(L-Hist,
        histof(T,
               cla_op(op>>relabel-(dur-size)>>divide>>log>>quant(0.25),L-T),
               Hist),
        _Hists),
maplist(normalise_hist,_Hists,_Dists),
call_dcg(( c3:bar('log dur','fraction'),
           foldl(add_points,_Dists),
           c3:put(bar/width/ratio,0.5)
         ), c3{}, Ch0).
```
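As a small sanity check on normalise_hist/2, here is an invented two-bin histogram for the label `pl` (bin 1 with count 3, bin 2 with count 1), assuming unzip/3, provided by the loaded libraries, converts between a list of pairs and two parallel lists:

```prolog
?- normalise_hist(pl-[1-3, 2-1], Dist).
% Dist = pl-[1-0.75, 2-0.25]   (counts rescaled to sum to 1)
```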
Histogram of the logarithm of computation time per item, quantised to bins of width 0.25, grouped by language.
```prolog
findall(L-Hist,
        histof(T,
               cla_op(op>>relabel>>lang - (dur-size)>>divide>>log>>quant(0.25),L-T),
               Hist),
        Hists),
maplist(normalise_hist,Hists,_Dists),
call_dcg(( c3:bar('log dur','fraction'),
           foldl(add_points,_Dists),
           c3:put(bar/width/ratio,0.45)
         ), c3{}, Ch0).
```
This next query shows how the relationship between collection size and computation duration varies with the kind of analysis being done and the implementation language. Note that computations in Prolog have the lowest overheads, and that computations in Matlab seem to have the most variable range of durations for a given collection size.
```prolog
findall(Op-_Ps,
        setof(Size-Dur,
              cla_op(op>>relabel-(size>>log)-dur>>log,Op-Size-Dur),
              _Ps),
        _Rs),
foldl(add_points,_Rs,c3{}.scat('log size','log dur'),Ch1).
```
This query is like the previous one, but grouped by language only.
```prolog
findall(Op-_Ps,
        setof(Size-Dur,
              cla_op(op>>relabel>>lang-(size>>log)-dur>>log,Op-Size-Dur),
              _Ps),
        _Rs),
foldl(add_points,_Rs,c3{}.scat('log size','log dur'),Ch1).
```
This query breaks down the performance of each analysis method by month. There does not seem to be any significant pattern here beyond the overall volume of computation done in each month.
```prolog
setof(Month-_Ps,
      setof(Size-Dur,
            cla_op( month - op>>relabel - size>>log - dur>>log,
                    Month - Label - Size - Dur),
            _Ps),
      _Rs),
call(foldl(add_points,_Rs), c3{}.scat(log_size,log_dur), Ch).
```
This query analyses the degree of concurrency of collection level computations. It works by getting a complete set of point events describing the beginning and ending times of computations. The predicate concurrency/2 (defined in the initial code block) then folds over these events and produces a list of time interval events of the form (StartTime-EndTime)-Concurrency, where Concurrency is the number of concurrent computations occurring over that interval. Finally, the time intervals are mapped to durations and a histogram of concurrency weighted by duration is produced.
```prolog
setof(Ev,cla_op((time-dur)>>event,Ev),_Evs),
concurrency(_Evs,_CEvs),
maplist(x(fst(diff)),_CEvs,_CEvs2),
weighted_histof(Dur,Conc,member(Dur-Conc,_CEvs2),Hist),
select(0-_,Hist,_Hist1),
unzip(_Hist1,_Values,_Durs),
c3_bar(concurrency-_Values,duration-_Durs,Chart).
```
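As a small illustration of concurrency/2, consider two overlapping computations with invented times: one running over the interval [0,5] and another over [2,9]. The sorted start/stop events yield one interval event per gap between consecutive events:

```prolog
?- concurrency([0-start, 2-start, 5-stop, 9-stop], Es).
% Es = [(0-2)-1, (2-5)-2, (5-9)-1]
% i.e. one computation runs alone during [0,2], two overlap during [2,5],
% and one runs alone during [5,9].
```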