Marcus Pearce, 2012-02-02 12:06 PM

h1. ppm-star

h2. Loading the system

<pre>
CL-USER> (asdf:oos 'asdf:load-op 'ppm-star)
...
</pre>

h2. Usage

The following example uses the package to compute, for each element of each sequence in a list of sequences, a probability distribution over the symbols of a given alphabet:

<pre>
CL-USER> 
(defun simple-ppm-test (sequences alphabet)
  (let ((model (ppm:make-ppm alphabet :escape :c :mixtures t
                             :update-exclusion nil :order-bound nil)))
    (ppm:model-dataset model sequences :construct? t :predict? t)))
SIMPLE-PPM-TEST
CL-USER> (simple-ppm-test '((a b a b a b a b)) '(a b))
((0 (A ((A 0.5) (B 0.5))) (B ((A 0.75) (B 0.25))) (A ((A 0.5) (B 0.5)))
  (B ((A 0.36363637) (B 0.6363636))) (A ((A 0.6666667) (B 0.33333334)))
  (B ((A 0.35714284) (B 0.64285713))) (A ((A 0.6666667) (B 0.33333334)))
  (B ((A 0.3529412) (B 0.6470588)))))
CL-USER> 
</pre>

The output is a list with one entry for each sequence supplied to the function. Each entry starts with a number (0 here, apparently the index of the sequence), followed by one list per position in the sequence. Each of these lists is composed of the symbol that appeared at that position and a probability distribution reflecting the model's predictions for that position. The probability of the symbol appearing at that location can be obtained by looking up the symbol in the probability distribution, or the distribution can be used to compute the entropy (uncertainty) of the model's prediction at that location.
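
The two uses of the distribution described above can be sketched in plain Common Lisp. The helper names <code>symbol-probability</code> and <code>distribution-entropy</code> are hypothetical and are not part of ppm-star; the sketch only assumes that each position has the form <code>(symbol ((symbol probability) ...))</code>, as in the transcript above.

```lisp
;; Hypothetical helpers, not part of ppm-star. An EVENT is one position of
;; the per-sequence output, e.g. (B ((A 0.75) (B 0.25))).
(defun symbol-probability (event)
  "Probability the model assigned to the symbol that actually occurred."
  (destructuring-bind (observed distribution) event
    (second (assoc observed distribution))))

(defun distribution-entropy (distribution)
  "Shannon entropy, in bits, of DISTRIBUTION, a list of (symbol probability)."
  (- (reduce #'+ distribution
             :key (lambda (entry)
                    (let ((p (second entry)))
                      ;; Treat 0 * log 0 as 0.
                      (if (zerop p) 0 (* p (log p 2)))))
             :initial-value 0)))

;; Probability of the observed symbol B at the second position above.
(symbol-probability '(b ((a 0.75) (b 0.25)))) ; => 0.25
```

For a uniform distribution over two symbols, such as <code>((a 0.5) (b 0.5))</code>, <code>distribution-entropy</code> returns a value close to 1 bit. To process a whole sequence, skip the leading number of its entry with <code>rest</code> and map either helper over the remaining positions.
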

The code in <code>ppm-ui.lisp</code> shows another example application. The following function returns the counts of all n-grams of a given order (n) in a list of sequences composed of symbols from the supplied alphabet:

<pre>
(defun test-model (sequences alphabet order)
  (ngram-frequencies (build-model sequences alphabet) order))
</pre>

Here are some examples:

<pre>
CL-USER> (ppm-star::test-model '((a b r a c a d a b r a)) '(a b c d r) 3)
(((D A B) 1) ((C A D) 1) ((R A C) 1) ((B R A) 2) ((A D A) 1) ((A C A) 1)
 ((A B R) 2))
</pre>
<pre>
CL-USER> (ppm-star::test-model '((l e t l e t t e r t e l e)) '(e l t r) 1)
(((R) 1) ((T) 4) ((E) 5) ((L) 3))
</pre>
<pre>
CL-USER> (ppm-star::test-model '((a g c g a c g a g)) '(a c g) 2)
(((C G) 2) ((G A) 2) ((G C) 1) ((A C) 1) ((A G) 2))
</pre>
<pre>
CL-USER> (ppm-star::test-model '((m i s s i s s i p p i)) '(i m p s) 4)
(((S I P P) 1) ((S I S S) 1) ((S S I P) 1) ((S S I S) 1) ((I P P I) 1)
 ((I S S I) 3) ((M I S S) 1))
</pre>
<pre>
CL-USER> (ppm-star::test-model '((a s s a n i s s i m a s s a)) '(a i m n s) 2)
(((M A) 1) ((I M) 1) ((I S) 1) ((N I) 1) ((S I) 1) ((S A) 2) ((S S) 3)
 ((A N) 1) ((A S) 2))
</pre>
<pre>
CL-USER> (ppm-star::test-model '((a s s a n i s s i m a s s a)
                                 (m i s s i s s i p p i))
                               '(a s n i m p)
                               2)
(((P I) 1) ((P P) 1) ((M I) 1) ((M A) 1) ((I P) 1) ((I M) 1) ((I S) 3)
 ((N I) 1) ((S I) 3) ((S A) 2) ((S S) 5) ((A N) 1) ((A S) 2))
</pre>
<pre>
CL-USER> (ppm-star::test-model '((a b r a c a d a b r a)
                                 (l e t l e t t e r t e l e)
                                 (a s s a n i s s i m a s s a)
                                 (m i s s i s s i p p i)
                                 (w o o l o o b o o l o o))
                               '(a b c d e i l m n o p r s t w)
                               3)
(((O B O) 1) ((O L O) 2) ((O O B) 1) ((O O L) 2) ((W O O) 1) ((P P I) 1)
 ((M I S) 1) ((M A S) 1) ((I P P) 1) ((I M A) 1) ((I S S) 3) ((N I S) 1)
 ((S I P) 1) ((S I S) 1) ((S I M) 1) ((S A N) 1) ((S S I) 3) ((S S A) 2)
 ((T E L) 1) ((T E R) 1) ((T T E) 1) ((T L E) 1) ((E L E) 1) ((E R T) 1)
 ((E T T) 1) ((E T L) 1) ((L O O) 2) ((L E T) 2) ((D A B) 1) ((C A D) 1)
 ((R T E) 1) ((R A C) 1) ((B O O) 1) ((B R A) 2) ((A N I) 1) ((A S S) 2)
 ((A D A) 1) ((A C A) 1) ((A B R) 2))
CL-USER> 
</pre>
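
To cross-check counts like those above without building a model, a naive counter can enumerate the n-grams directly. This is only a sketch: <code>naive-ngram-counts</code> and <code>ngram-count</code> are hypothetical helpers, not part of ppm-star. The counter does not use a suffix tree and returns its entries in an unspecified order.

```lisp
;; Hypothetical helpers, not part of ppm-star: count n-grams by direct
;; enumeration instead of consulting the model's suffix tree.
(defun naive-ngram-counts (sequences n)
  "Return a list of (ngram count) entries for every contiguous run of N
symbols in SEQUENCES. The order of the entries is unspecified."
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (seq sequences)
      ;; Slide a window of length N along the sequence; sequences shorter
      ;; than N contribute nothing.
      (loop for start from 0 to (- (length seq) n)
            do (incf (gethash (subseq seq start (+ start n)) counts 0))))
    (loop for ngram being the hash-keys of counts using (hash-value total)
          collect (list ngram total))))

(defun ngram-count (ngram counts)
  "Look up the count of NGRAM in COUNTS, or 0 if it is absent."
  (or (second (assoc ngram counts :test #'equal)) 0))

;; (A B R) occurs twice in a b r a c a d a b r a.
(ngram-count '(a b r) (naive-ngram-counts '((a b r a c a d a b r a)) 3)) ; => 2
```

Direct enumeration may not agree with every figure in the transcripts above. For example, it finds <code>(I S S I)</code> twice in <code>m i s s i s s i p p i</code>, where the recorded output lists 3.
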

There is also a function that writes the model, drawn as a suffix tree, to a PostScript file:

<pre>
CL-USER> (ppm-star:write-model-to-postscript
          (ppm-star::build-model '((a b r a c a d a b r a)) '(a b c d r))
          "/tmp/ppm.ps")
NIL
</pre>