Ppm-star » History » Version 3

Jeremy Gow, 2013-04-09 11:51 AM

1 3 Jeremy Gow
h1. The ppm-star package
2 1 Marcus Pearce
 
3 3 Jeremy Gow
This package implements the Prediction by Partial Match* statistical model (originally developed for lossless data compression).
4 1 Marcus Pearce
5 3 Jeremy Gow
Here is a simple use of the package to generate probabilities for each element of each sequence in a list of sequences composed of some alphabet of symbols:
6 1 Marcus Pearce
7 1 Marcus Pearce
<pre>
8 1 Marcus Pearce
CL-USER> 
9 1 Marcus Pearce
(defun simple-ppm-test (sequences alphabet) 
10 1 Marcus Pearce
  (let ((model (ppm:make-ppm alphabet :escape :c :mixtures t 
11 1 Marcus Pearce
                             :update-exclusion nil :order-bound nil)))
12 1 Marcus Pearce
    (ppm:model-dataset model sequences :construct? t :predict? t)))
13 1 Marcus Pearce
SIMPLE-PPM-TEST
14 1 Marcus Pearce
CL-USER> (simple-ppm-test '((a b a b a b a b)) '(a b))
15 1 Marcus Pearce
((0 (A ((A 0.5) (B 0.5))) (B ((A 0.75) (B 0.25))) (A ((A 0.5) (B 0.5)))
16 1 Marcus Pearce
  (B ((A 0.36363637) (B 0.6363636))) (A ((A 0.6666667) (B 0.33333334)))
17 1 Marcus Pearce
  (B ((A 0.35714284) (B 0.64285713))) (A ((A 0.6666667) (B 0.33333334)))
18 1 Marcus Pearce
  (B ((A 0.3529412) (B 0.6470588)))))
19 1 Marcus Pearce
CL-USER> 
20 1 Marcus Pearce
</pre>
21 1 Marcus Pearce
22 1 Marcus Pearce
The output is a list of lists, one for each sequence in the list supplied to the function. Each of these lists is itself a list, composed of the symbol that appeared at that position in the sequence and a probability distribution reflecting the models predictions for that position. The probability of the symbol appearing at that location can be obtained by looking up the symbol in the probability distribution or one can use the distribution to compute the entropy (uncertainty) of the model's prediction at that location.
23 1 Marcus Pearce
24 1 Marcus Pearce
25 1 Marcus Pearce
The code in <code>ppm-ui.lisp</code> shows another example application: the following function returns the n-gram counts of a given order (n) in a list of sequences composed from symbols in the supplied alphabet: 
26 1 Marcus Pearce
27 1 Marcus Pearce
<pre>
28 1 Marcus Pearce
(defun test-model (sequences alphabet order)
29 1 Marcus Pearce
  (ngram-frequencies (build-model sequences alphabet) order))
30 1 Marcus Pearce
</pre> 
31 1 Marcus Pearce
32 1 Marcus Pearce
Here are some examples: 
33 1 Marcus Pearce
34 1 Marcus Pearce
<pre>
35 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((a b r a c a d a b r a)) '(a b c d r) 3)           
36 1 Marcus Pearce
(((D A B) 1) ((C A D) 1) ((R A C) 1) ((B R A) 2) ((A D A) 1) ((A C A) 1)
37 1 Marcus Pearce
 ((A B R) 2))
38 1 Marcus Pearce
</pre> 
39 1 Marcus Pearce
<pre> 
40 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((l e t l e t t e r t e l e)) '(e l t r) 1)                     
41 1 Marcus Pearce
(((R) 1) ((T) 4) ((E) 5) ((L) 3))
42 1 Marcus Pearce
</pre>
43 1 Marcus Pearce
<pre> 
44 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((a g c g a c g a g)) '(a c g) 2)
45 1 Marcus Pearce
(((C G) 2) ((G A) 2) ((G C) 1) ((A C) 1) ((A G) 2))
46 1 Marcus Pearce
</pre> 
47 1 Marcus Pearce
<pre> 
48 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((m i s s i s s i p p i)) '(i m p s) 4)
49 1 Marcus Pearce
(((S I P P) 1) ((S I S S) 1) ((S S I P) 1) ((S S I S) 1) ((I P P I) 1)
50 1 Marcus Pearce
 ((I S S I) 3) ((M I S S) 1))
51 1 Marcus Pearce
</pre> 
52 1 Marcus Pearce
<pre> 
53 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((a s s a n i s s i m a s s a)) '(a i m n s) 2)
54 1 Marcus Pearce
(((M A) 1) ((I M) 1) ((I S) 1) ((N I) 1) ((S I) 1) ((S A) 2) ((S S) 3)
55 1 Marcus Pearce
 ((A N) 1) ((A S) 2))
56 1 Marcus Pearce
</pre> 
57 1 Marcus Pearce
<pre> 
58 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((a s s a n i s s i m a s s a)
59 1 Marcus Pearce
                                 (m i s s i s s i p p i))
60 1 Marcus Pearce
                               '(a s n i m p)
61 1 Marcus Pearce
                               2)
62 1 Marcus Pearce
(((P I) 1) ((P P) 1) ((M I) 1) ((M A) 1) ((I P) 1) ((I M) 1) ((I S) 3)
63 1 Marcus Pearce
 ((N I) 1) ((S I) 3) ((S A) 2) ((S S) 5) ((A N) 1) ((A S) 2))
64 1 Marcus Pearce
</pre> 
65 1 Marcus Pearce
<pre>
66 1 Marcus Pearce
CL-USER> (ppm-star::test-model '((a b r a c a d a b r a)
67 1 Marcus Pearce
                                 (l e t l e t t e r t e l e)
68 1 Marcus Pearce
                                 (a s s a n i s s i m a s s a)
69 1 Marcus Pearce
                                 (m i s s i s s i p p i)
70 1 Marcus Pearce
                                 (w o o l o o b o o l o o))
71 1 Marcus Pearce
                               '(a b c d e i l m n o p r s t w)
72 1 Marcus Pearce
                               3)
73 1 Marcus Pearce
(((O B O) 1) ((O L O) 2) ((O O B) 1) ((O O L) 2) ((W O O) 1) ((P P I) 1)
74 1 Marcus Pearce
 ((M I S) 1) ((M A S) 1) ((I P P) 1) ((I M A) 1) ((I S S) 3) ((N I S) 1)
75 1 Marcus Pearce
 ((S I P) 1) ((S I S) 1) ((S I M) 1) ((S A N) 1) ((S S I) 3) ((S S A) 2)
76 1 Marcus Pearce
 ((T E L) 1) ((T E R) 1) ((T T E) 1) ((T L E) 1) ((E L E) 1) ((E R T) 1)
77 1 Marcus Pearce
 ((E T T) 1) ((E T L) 1) ((L O O) 2) ((L E T) 2) ((D A B) 1) ((C A D) 1)
78 1 Marcus Pearce
 ((R T E) 1) ((R A C) 1) ((B O O) 1) ((B R A) 2) ((A N I) 1) ((A S S) 2)
79 1 Marcus Pearce
 ((A D A) 1) ((A C A) 1) ((A B R) 2))
80 1 Marcus Pearce
CL-USER> 
81 1 Marcus Pearce
</pre>
82 1 Marcus Pearce
83 1 Marcus Pearce
There is also a function to write the model to a postscript representation of a suffix tree:
84 1 Marcus Pearce
85 1 Marcus Pearce
<pre>
86 1 Marcus Pearce
CL-USER> (ppm-star:write-model-to-postscript 
87 1 Marcus Pearce
          (ppm-star::build-model '((a b r a c a d a b r a)) '(a b c d r))
88 1 Marcus Pearce
          "/tmp/ppm.ps")
89 1 Marcus Pearce
NIL
90 1 Marcus Pearce
</pre>