Idyom » History » Version 13

Jeremy Gow, 2013-02-22 01:35 PM

h1. Running IDyOM

{{>toc}}

h2. <code>idyom:idyom</code>

The main workhorse function is <code>idyom:idyom</code>, which has three required arguments and a number of optional keyword arguments.

h3. Required parameters

* dataset-id: a dataset id, e.g. 1.
* target-viewpoints: a list of basic viewpoints to predict, e.g. '(:cpitch :bioi)
* source-viewpoints: a list of viewpoints to use in prediction, e.g. '((:cpintfref :cpint) :bioi)
** Passing <code>:select</code> will trigger viewpoint selection (see further options below)

See the [[List of viewpoints]] for a description of the various viewpoints available in IDyOM. So a simple call to IDyOM would be:
<pre>
CL-USER> (idyom:idyom 1 '(:cpitch) '(:cpitch))
</pre>

h3. Statistical modelling parameters

* pretraining-ids: a list of dataset-ids to pretrain the long-term models
** e.g., '(0 1 7)
* k: an integer designating the number of cross-validation folds to use
** 1 = no cross-validation, but also no training set unless the models are pretrained;
** :full = as many folds as there are compositions in the dataset
** default = 10
* resampling-indices: you can limit the modelling to a particular set of resampling folds
* models: whether to use the short-term or long-term models or both
** :stm - short-term model only
** :ltm - long-term model only
** :ltm+ - the long-term model trained incrementally on the test set
** :both - :stm + :ltm
** :both+ - :stm + :ltm+ (this is the default)
* ltm-order-bound: the order bound for the long-term model (the default <code>nil</code> means no order bound, otherwise an integer indicates the bound in number of events)
* ltm-mixtures: whether to use mixtures for the LTM (default <code>t</code>)
* ltm-update-exclusion: whether to use update exclusion for the LTM (default <code>nil</code>)
* ltm-escape: the escape method to use for the LTM (<code>:a :b :c :d :x</code> - default <code>:c</code>)
* stm-order-bound: the order bound to use for the short-term model (default <code>nil</code>)
* stm-mixtures: whether to use mixtures for the STM (default <code>t</code>)
* stm-update-exclusion: whether to use update exclusion for the STM (default <code>t</code>)
* stm-escape: the escape method for the STM (default <code>:x</code>)

See "Pearce [2005, chapter 6]":http://webprojects.eecs.qmul.ac.uk/marcusp/papers/Pearce2005.pdf for a description and explanation of these parameters.
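
To make the meaning of <code>k</code> concrete, here is a minimal, self-contained sketch of how a dataset of compositions might be split into cross-validation folds. The functions <code>make-folds</code> and <code>training-set</code> are hypothetical names for this illustration and are not part of IDyOM; the round-robin assignment is an assumption, and IDyOM's own resampling code may assign compositions to folds differently.

```lisp
;; Illustration only: MAKE-FOLDS and TRAINING-SET are hypothetical helpers,
;; not IDyOM functions. Fold assignment is round-robin by assumption.

(defun make-folds (n-compositions k)
  "Partition composition indices 0..N-1 into K test folds.
K is an integer, or :full for one fold per composition."
  (let* ((k (if (eq k :full) n-compositions k))
         (folds (make-array k :initial-element nil)))
    (dotimes (i n-compositions)
      (push i (aref folds (mod i k))))
    (map 'list #'reverse folds)))

(defun training-set (folds test-fold-index)
  "Return every index that is not in the fold at TEST-FOLD-INDEX."
  (loop for fold in folds
        for i from 0
        unless (= i test-fold-index)
          append fold))
```

With <code>k</code> set to 1 the single fold holds every composition, so <code>training-set</code> returns an empty list; this mirrors the note above that there is no training set unless the models are pretrained.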

h3. Viewpoint selection parameters

* dp: the number of decimal places to use when comparing information contents in viewpoint selection
** full floating point precision is used if this is <code>nil</code> (the default)
* max-links: the maximum number of links to use when creating linked viewpoints in viewpoint selection
** the default is 2
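
The effect of <code>dp</code> can be sketched as follows: two information contents that differ only beyond the chosen number of decimal places are treated as equal, so the second is not counted as an improvement. The helpers <code>round-to-dp</code> and <code>better-p</code> are hypothetical names used for this illustration only; they are not IDyOM functions, and the exact rounding IDyOM applies may differ.

```lisp
;; Illustration only: ROUND-TO-DP and BETTER-P are hypothetical helpers.

(defun round-to-dp (x dp)
  "Round X to DP decimal places; return X unchanged when DP is NIL."
  (if (null dp)
      x
      (let ((scale (expt 10 dp)))
        (/ (round (* x scale)) (float scale)))))

(defun better-p (new old dp)
  "True when NEW is strictly lower than OLD after rounding to DP places."
  (< (round-to-dp new dp) (round-to-dp old dp)))
```

For example, 2.381 counts as better than 2.384 at full precision, but not when both are compared at two decimal places.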

h3. Output parameters

* output-path: a string indicating a directory in which to write the output
** output is only written to the console if this is <code>nil</code>
* detail: an integer which determines how the information content is averaged in the output:
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition


h2. <code>resampling:idyom-resample</code>

The top-level function in turn passes its arguments on to a number of sub-functions, which can be used independently.
<code>RESAMPLING:DATASET-PREDICTION</code> accepts the following arguments (all but the first three are optional keyword arguments):

* dataset-id: a dataset id
** e.g., 1
* basic-attributes: a list of basic attributes to predict
** e.g., '(cpitch bioi)
* attributes: a list of attributes to use in prediction
** e.g., '((cpintfref cpint) bioi)
* pretraining-ids: a list of dataset-ids to pretrain the long-term models
** e.g., '(0 1 7)
* k: an integer designating the number of cross-validation folds to use
** 1 = no cross-validation, but also no training set unless the models are pretrained;
** :full = as many folds as there are compositions in the dataset
** default = 10
* resampling-indices: you can limit the modelling to a particular set of resampling folds
* models: whether to use the short-term or long-term models or both
** :stm - short-term model only
** :ltm - long-term model only
** :ltm+ - the long-term model trained incrementally on the test set
** :both - :stm + :ltm
** :both+ - :stm + :ltm+ (this is the default)
* ltm-order-bound: the order bound for the long-term model (the default <code>nil</code> means no order bound, otherwise an integer indicates the bound in number of events)
* ltm-mixtures: whether to use mixtures for the LTM (default <code>t</code>)
* ltm-update-exclusion: whether to use update exclusion for the LTM (default <code>nil</code>)
* ltm-escape: the escape method to use for the LTM (<code>:a :b :c :d :x</code> - default <code>:c</code>)
* stm-order-bound: the order bound to use for the short-term model (default <code>nil</code>)
* stm-mixtures: whether to use mixtures for the STM (default <code>t</code>)
* stm-update-exclusion: whether to use update exclusion for the STM (default <code>t</code>)
* stm-escape: the escape method for the STM (default <code>:x</code>)

<code>RESAMPLING:OUTPUT-INFORMATION-CONTENT</code> takes the output of <code>RESAMPLING:DATASET-PREDICTION</code> and returns the average information content. It takes the following arguments:

* predictions: the output of <code>RESAMPLING:DATASET-PREDICTION</code>
* detail: an integer which determines how the information content is averaged (these are returned as multiple values):
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition
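
The three <code>detail</code> levels and the use of multiple values can be illustrated with a small stand-alone sketch. The function <code>summarise-ic</code> is a hypothetical name, not an IDyOM function, and it works on a plain list of per-event information contents rather than on real prediction output. It also assumes that the dataset average is the mean of the per-composition means; IDyOM may average differently.

```lisp
;; Illustration only: MEAN and SUMMARISE-IC are hypothetical helpers.
;; Assumption: the dataset mean is the mean of the composition means.

(defun mean (numbers)
  "Arithmetic mean of a non-empty list of numbers."
  (/ (reduce #'+ numbers) (length numbers)))

(defun summarise-ic (event-ics detail)
  "EVENT-ICS is a list of compositions, each a list of per-event ICs.
Returns one, two or three values depending on DETAIL."
  (let* ((composition-means (mapcar #'mean event-ics))
         (dataset-mean (mean composition-means)))
    (ecase detail
      (1 dataset-mean)
      (2 (values dataset-mean composition-means))
      (3 (values dataset-mean composition-means event-ics)))))
```

A caller can collect the extra values with <code>multiple-value-bind</code> or <code>multiple-value-list</code>; a caller that ignores them simply receives the dataset mean.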

h2. <code>resampling:format-information-content</code>

<code>RESAMPLING:FORMAT-INFORMATION-CONTENT</code> takes the output of <code>RESAMPLING:DATASET-PREDICTION</code> and writes it to file. It takes the following arguments:

* predictions: the output of <code>RESAMPLING:DATASET-PREDICTION</code>
* file: a string denoting a file
* dataset-id: an integer reflecting the dataset-id
* detail: an integer which determines how the information content is averaged (these are returned as multiple values):
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition

h2. Examples

h3. Mean melody IC

To get the mean information content for each melody of dataset 0 as a list:

<pre>
CL-USER> (resampling:output-information-content 
          (resampling:dataset-prediction 0 '(cpitch) '(cpintfref cpint))
          2)
2.493305
(2.1368716 2.8534691 2.6938546 2.6491673 2.4993074 2.6098127 2.7728052 2.772861
 2.5921957 2.905856 2.3591626 2.957503 2.4042292 2.7562473 2.3996017 2.8073587
 2.114944 1.7434102 2.2310295 2.6374347 2.361792 1.9476132 2.501488 2.5472867
 2.1056154 2.8225484 2.134257 2.9162033 3.0715692 2.9012227 2.7291088 2.866882
 2.8795822 2.4571223 2.9277062 2.7861307 2.6623116 2.3304622 2.4217033
 2.0556943 2.4048684 2.914848 2.7182267 3.0894585 2.873869 1.8821808 2.640174
 2.8165438 2.5423129 2.3011856 3.1477294 2.655349 2.5216308 2.0667994 3.2579045
 2.573013 2.6035044 2.202191 2.622113 2.2621205 2.3617425 2.7526956 2.3281655
 2.9357266 2.3372407 3.1848125 2.67367 2.1906006 2.7835917 2.6332111 3.206142
 2.1426969 2.194259 2.415167 1.9769101 2.0870917 2.7844474 2.2373738 2.772138
 2.9702199 1.724408 2.473073 2.2464263 2.2452457 2.688889 2.6299863 2.2223835
 2.8082614 2.673671 2.7693706 2.3369458 2.5016947 2.3837066 2.3682225 2.795649
 2.9063463 2.5880773 2.0457468 1.8635312 2.4522712 1.5877498 2.8802161
 2.7988417 2.3125513 1.7245895 2.2404804 2.1694546 2.365556 1.5905867 1.3827317
 2.2706041 3.023884 2.2864542 2.1259797 2.713626 2.1967313 2.5721254 2.5812547
 2.8233812 2.3134546 2.6203637 2.945946 2.601433 2.1920888 2.3732007 2.440137
 2.4291563 2.3676903 2.734724 3.0283954 2.8076048 2.7796154 2.326931 2.1779459
 2.2570527 2.2688026 1.3976555 2.030298 2.640235 2.568248 2.6338177 2.157162
 2.3915367 2.7873137 2.3088667 2.2176988 2.4402564 2.8062992 2.784044 2.4296925
 2.3520193 2.6146257)
</pre>

h3. Write note IC to file

To write the information content of each note of each melody in dataset 0 to a file:

<pre>
CL-USER> (resampling:format-information-content 
          (resampling:dataset-prediction 0 '(cpitch) '(cpintfref cpint))
          "/tmp/foo.dat"
          0
          3)
</pre>

h3. Conklin & Witten (1995)

To simulate the experiments of Conklin & Witten (1995):

<pre>
CL-USER> (resampling:conkwit95)
Simulation of the experiments of Conklin & Witten (1995, Table 4).
System 1; Mean Information Content: 2.33 
System 2; Mean Information Content: 2.36 
System 3; Mean Information Content: 2.09 
System 4; Mean Information Content: 2.01 
System 5; Mean Information Content: 2.08 
System 6; Mean Information Content: 1.90 
System 7; Mean Information Content: 1.88 
System 8; Mean Information Content: 1.86 
NIL
</pre>

Compare with "Conklin & Witten [1995, JNMR, table 4]":http://www.sc.ehu.es/ccwbayes/members/conklin/papers/jnmr95.pdf

h2. Viewpoint Selection

Two functions are supplied for searching a space of viewpoints: <code>run-hill-climber</code> and <code>run-best-first</code>. Both take four arguments:

* a list of viewpoints: the algorithm searches through the space of combinations of these viewpoints
* a start state (usually nil, the empty viewpoint system)
* an evaluation function returning a numeric performance metric: e.g., the mean information content of the dataset returned by <code>dataset-prediction</code>
* a symbol describing which way to optimise the metric: <code>:desc</code> means lower values are better; <code>:asc</code> means greater values are better

Here is an example:

<pre>
CL-USER> (viewpoint-selection:run-hill-climber 
          '(:cpitch :cpintfref :cpint :contour)
          nil
          #'(lambda (viewpoints)
              (utils:round-to-nearest-decimal-place 
               (resampling:output-information-content 
                (resampling:dataset-prediction 0 '(cpitch) viewpoints :k 10 :models :both+) 
                1)
               2))
          :desc)

 =============================================================================
   System                                                Score
 -----------------------------------------------------------------------------
   NIL                                                   NIL
   (CPITCH)                                              2.52
   (CPINT CPITCH)                                        2.43
   (CPINTFREF CPINT CPITCH)                              2.38
 =============================================================================
#S(VIEWPOINT-SELECTION::RECORD :STATE (:CPINTFREF :CPINT :CPITCH) :WEIGHT 2.38)
</pre>
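
The general idea of a hill-climbing search over viewpoint sets can be sketched without IDyOM. The code below is a simplified greedy forward search, not the implementation of <code>run-hill-climber</code>: the names <code>hill-climb</code>, <code>toy-score</code> and <code>*toy-gains*</code> are hypothetical, the scores are invented, and the real search may also try removing viewpoints or building linked viewpoints.

```lisp
;; Illustration only: a greedy forward search with made-up scores.
;; HILL-CLIMB, TOY-SCORE and *TOY-GAINS* are hypothetical names.

(defun hill-climb (viewpoints start evaluate direction)
  "Repeatedly add the single viewpoint that most improves the score and
stop when no addition helps. Returns the final state and its score."
  (let ((better (ecase direction (:desc #'<) (:asc #'>)))
        (state start)
        (score (if start (funcall evaluate start) nil)))
    (loop
      (let ((best-state nil)
            (best-score nil))
        ;; Score every one-viewpoint extension of the current state.
        (dolist (v (set-difference viewpoints state))
          (let* ((candidate (cons v state))
                 (s (funcall evaluate candidate)))
            (when (or (null best-score) (funcall better s best-score))
              (setf best-state candidate
                    best-score s))))
        ;; Move only if the best extension beats the current score.
        (if (and best-state
                 (or (null score) (funcall better best-score score)))
            (setf state best-state
                  score best-score)
            (return (values state score)))))))

(defparameter *toy-gains*
  '((:cpitch . 1/2) (:cpint . 1/10) (:cpintfref . 1/20) (:contour . -1/50))
  "Invented reductions in score for each viewpoint (negative = harmful).")

(defun toy-score (state)
  "Invented metric where lower is better: 3 minus the summed gains."
  (- 3 (reduce #'+ state
               :key (lambda (v) (cdr (assoc v *toy-gains*))))))
```

Calling <code>(hill-climb '(:cpitch :cpintfref :cpint :contour) nil #'toy-score :desc)</code> adds <code>:cpitch</code>, then <code>:cpint</code>, then <code>:cpintfref</code>, and stops because adding <code>:contour</code> would raise the toy score.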

Since this can be quite a time-consuming process, there are also functions for caching the results.

<pre>
(initialise-vs-cache)
(load-vs-cache filename package)
(store-vs-cache filename package)
</pre>
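
Why caching helps can be shown with a generic memoisation sketch. This is not how the cache functions above are implemented, which this page does not describe. The names <code>*toy-vs-cache*</code>, <code>canonical-key</code> and <code>cached-evaluate</code> are hypothetical, and the sketch assumes a flat list of viewpoint symbols with no linked viewpoints.

```lisp
;; Illustration only: a generic memoisation table, not IDyOM's cache.
;; Assumption: VIEWPOINTS is a flat list of symbols.

(defvar *toy-vs-cache* (make-hash-table :test #'equal)
  "Maps a canonical viewpoint-set key to its evaluated score.")

(defun canonical-key (viewpoints)
  "Order-independent key: the viewpoint names sorted alphabetically."
  (sort (mapcar #'symbol-name viewpoints) #'string<))

(defun cached-evaluate (viewpoints evaluate)
  "Return the stored score for VIEWPOINTS, calling EVALUATE only on a miss."
  (let ((key (canonical-key viewpoints)))
    (multiple-value-bind (value found) (gethash key *toy-vs-cache*)
      (if found
          value
          (setf (gethash key *toy-vs-cache*)
                (funcall evaluate viewpoints))))))
```

Because the key ignores order, <code>'(:cpint :cpitch)</code> and <code>'(:cpitch :cpint)</code> share one entry, so an expensive evaluation function runs once for that set.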