Jeremy Gow, 2013-02-22 04:39 PM

h1. Running IDyOM

{{>toc}}

h2. <code>idyom:idyom</code>

The main workhorse function is <code>idyom:idyom</code>, which takes three required arguments and a number of optional keyword arguments.

h3. Required parameters

* dataset-id: a dataset id, e.g. 1.
* target-viewpoints: a list of basic viewpoints to predict, e.g. '(:cpitch :bioi)
* source-viewpoints: a list of viewpoints to use in prediction, e.g. '((:cpintfref :cpint) :bioi)
** Passing <code>:select</code> will trigger viewpoint selection (see the viewpoint selection parameters below)

See the [[List of viewpoints]] for a description of the various viewpoints available in IDyOM. A simple call to IDyOM would be:
<pre>
CL-USER> (idyom:idyom 1 '(cpitch) '(cpitch cpint))
2.2490792
(1.9049941 2.427845 2.0234334 1.7971386 1.8213106 1.9313766 2.3758402 1.8310248
...
</pre>
This predicts the pitch values in dataset 1 from the preceding pitches (cpitch) and pitch intervals (cpint). IDyOM computes the information content (IC) of each note and by default returns two values: first the dataset mean IC (the mean of the composition mean ICs), and second a list of composition mean ICs (the mean note IC for each composition).
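
Because the results come back as two separate values, they need to be captured with something like <code>multiple-value-bind</code>. The sketch below uses a hypothetical stand-in, <code>fake-idyom</code>, which returns invented numbers in the same shape, so that it can be evaluated without IDyOM or a database. In a real session the call to <code>idyom:idyom</code> would take its place.

<pre>
;; Hypothetical stand-in for idyom:idyom. It returns the dataset mean IC as
;; the first value and a list of composition mean ICs as the second value.
;; The numbers are invented for illustration only.
(defun fake-idyom ()
  (let ((composition-means (list 2.0 3.0 4.0)))
    (values (/ (reduce #'+ composition-means) (length composition-means))
            composition-means)))

(defun summarise-ics ()
  "Return a plist describing both values returned by FAKE-IDYOM."
  (multiple-value-bind (dataset-mean composition-means) (fake-idyom)
    (list :dataset-mean dataset-mean
          :n-compositions (length composition-means)
          :highest-composition-mean (reduce #'max composition-means))))
</pre>

If the call is used as an ordinary argument or bound with <code>let</code>, only the first value (the dataset mean) is kept; <code>multiple-value-list</code> is another way to keep both.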

h3. Statistical modelling parameters

* pretraining-ids: a list of dataset-ids to pretrain the long-term models
** e.g., '(0 1 7)
* k: an integer designating the number of cross-validation folds to use
** 1 = no cross-validation, but also no training set unless the models are pretrained
** :full = as many folds as there are compositions in the dataset
** default = 10
* resampling-indices: you can limit the modelling to a particular set of resampling folds
* models: whether to use the short-term or long-term models or both
** :stm - short-term model only
** :ltm - long-term model only
** :ltm+ - the long-term model trained incrementally on the test set
** :both - :stm + :ltm
** :both+ - :stm + :ltm+ (this is the default)
* ltm-order-bound: the order bound for the long-term model (the default <code>nil</code> means no order bound, otherwise an integer indicates the bound in number of events)
* ltm-mixtures: whether to use mixtures for the LTM (default <code>t</code>)
* ltm-update-exclusion: whether to use update exclusion for the LTM (default <code>nil</code>)
* ltm-escape: the escape method to use for the LTM (<code>:a :b :c :d :x</code> - default <code>:c</code>)
* stm-order-bound: the order bound to use for the short-term model (default <code>nil</code>)
* stm-mixtures: whether to use mixtures for the STM (default <code>t</code>)
* stm-update-exclusion: whether to use update exclusion for the STM (default <code>t</code>)
* stm-escape: the escape method for the STM (default <code>:x</code>)

See "Pearce [2005, chapter 6]":http://webprojects.eecs.qmul.ac.uk/marcusp/papers/Pearce2005.pdf for a description and explanation of these parameters.
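
As a rough illustration of how these keyword options might be combined, the sketch below collects a few of them in a list and applies <code>idyom:idyom</code> to it. The keyword names are taken from the list above; the dataset ids, the order bound of 5 and the helper name <code>call-idyom</code> are assumptions made for the example. The helper looks the function up at run time, so the code can be loaded even where IDyOM is not installed, in which case it simply returns a description of the call it would have made.

<pre>
;; Example option list: no cross-validation, with the long-term model
;; pretrained on two other (hypothetical) datasets.
(defparameter *example-options*
  (list :k 1
        :pretraining-ids '(0 7)
        :models :both+
        :ltm-order-bound 5
        :stm-order-bound nil))

(defun call-idyom (dataset-id targets sources options)
  "Apply IDYOM:IDYOM if it is loaded; otherwise return the call as a list."
  (let* ((package (find-package "IDYOM"))
         (function-name (and package (find-symbol "IDYOM" package))))
    (if (and function-name (fboundp function-name))
        (apply function-name dataset-id targets sources options)
        (list* "idyom:idyom" dataset-id targets sources options))))
</pre>

With IDyOM loaded, <code>(call-idyom 1 '(:cpitch) '((:cpintfref :cpint)) *example-options*)</code> would be equivalent to writing the keyword arguments out directly after the three required arguments.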

h3. Viewpoint selection parameters

* dp: the number of decimal places to use when comparing information contents in viewpoint selection
** full floating point precision is used if this is <code>nil</code> (the default)
* max-links: the maximum number of links to use when creating linked viewpoints in viewpoint selection
** the default is 2
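
To see why the number of decimal places matters, consider the following sketch. It is not IDyOM's own code: <code>round-to-dp</code> and <code>improves-p</code> are hypothetical helpers that only illustrate the idea of comparing two information contents after rounding.

<pre>
(defun round-to-dp (x dp)
  "Round X to DP decimal places; a DP of NIL leaves X unchanged."
  (if dp
      (let ((scale (expt 10 dp)))
        (/ (round (* x scale)) (float scale)))
      x))

(defun improves-p (new-ic old-ic dp)
  "True if NEW-IC is lower than OLD-IC after rounding both to DP places."
  (< (round-to-dp new-ic dp) (round-to-dp old-ic dp)))
</pre>

Here <code>(improves-p 2.379 2.384 nil)</code> is true, whereas <code>(improves-p 2.379 2.384 2)</code> is false because both values round to 2.38. A small <code>dp</code> therefore stops the search from adding viewpoints that yield only negligible gains.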

h3. Output parameters

* output-path: a string indicating a directory in which to write the output
** output is only written to the console if this is <code>nil</code>
* detail: an integer which determines how the information content is averaged in the output:
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition


h2. <code>resampling:idyom-resample</code>

The top-level function in turn passes its arguments on to a number of sub-functions which can be used independently.
<code>RESAMPLING:DATASET-PREDICTION</code> accepts the following arguments (all but the first three are optional keyword arguments):

* dataset-id: a dataset id
** e.g., 1
* basic-attributes: a list of basic attributes to predict
** e.g., '(cpitch bioi)
* attributes: a list of attributes to use in prediction
** e.g., '((cpintfref cpint) bioi)
* pretraining-ids: a list of dataset-ids to pretrain the long-term models
** e.g., '(0 1 7)
* k: an integer designating the number of cross-validation folds to use
** 1 = no cross-validation, but also no training set unless the models are pretrained
** :full = as many folds as there are compositions in the dataset
** default = 10
* resampling-indices: you can limit the modelling to a particular set of resampling folds
* models: whether to use the short-term or long-term models or both
** :stm - short-term model only
** :ltm - long-term model only
** :ltm+ - the long-term model trained incrementally on the test set
** :both - :stm + :ltm
** :both+ - :stm + :ltm+ (this is the default)
* ltm-order-bound: the order bound for the long-term model (the default <code>nil</code> means no order bound, otherwise an integer indicates the bound in number of events)
* ltm-mixtures: whether to use mixtures for the LTM (default <code>t</code>)
* ltm-update-exclusion: whether to use update exclusion for the LTM (default <code>nil</code>)
* ltm-escape: the escape method to use for the LTM (<code>:a :b :c :d :x</code> - default <code>:c</code>)
* stm-order-bound: the order bound to use for the short-term model (default <code>nil</code>)
* stm-mixtures: whether to use mixtures for the STM (default <code>t</code>)
* stm-update-exclusion: whether to use update exclusion for the STM (default <code>t</code>)
* stm-escape: the escape method for the STM (default <code>:x</code>)

<code>RESAMPLING:OUTPUT-INFORMATION-CONTENT</code> takes the output of <code>RESAMPLING:DATASET-PREDICTION</code> and returns the average information content. It takes the following arguments:

* predictions: the output of <code>RESAMPLING:DATASET-PREDICTION</code>
* detail: an integer which determines how the information content is averaged (these are returned as multiple values):
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition
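
The relationship between the three levels of detail can be sketched in plain Common Lisp. This is only an illustration, not the library's implementation: <code>mean</code> and <code>average-ics</code> are hypothetical names, the input is an invented nested list of per-event information contents, and the dataset mean is assumed to be the mean of the composition means, as described for <code>idyom:idyom</code> above.

<pre>
(defun mean (numbers)
  "Arithmetic mean of a non-empty list of numbers."
  (/ (reduce #'+ numbers) (length numbers)))

(defun average-ics (event-ics detail)
  "EVENT-ICS is a list of compositions, each a list of per-event ICs.
Returns the dataset mean, then (detail 2 and 3) the composition means,
then (detail 3 only) the per-event ICs, as multiple values."
  (let* ((composition-means (mapcar #'mean event-ics))
         (dataset-mean (mean composition-means)))
    (ecase detail
      (1 dataset-mean)
      (2 (values dataset-mean composition-means))
      (3 (values dataset-mean composition-means event-ics)))))
</pre>

For the invented input <code>'((1.0 3.0) (2.0 4.0 6.0))</code> the composition means are 2.0 and 4.0 and the dataset mean is 3.0. Note that this differs from the mean over all five events (3.2), because each composition carries equal weight regardless of its length.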

h2. <code>resampling:format-information-content</code>

<code>RESAMPLING:FORMAT-INFORMATION-CONTENT</code> takes the output of <code>RESAMPLING:DATASET-PREDICTION</code> and writes it to file. It takes the following arguments:

* predictions: the output of <code>RESAMPLING:DATASET-PREDICTION</code>
* file: a string denoting a file
* dataset-id: an integer reflecting the dataset-id
* detail: an integer which determines how the information content is averaged (these are returned as multiple values):
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition

h2. Examples

h3. Mean melody IC

To get the mean information content of each melody in dataset 0 as a list:

<pre>
CL-USER> (resampling:output-information-content
          (resampling:dataset-prediction 0 '(cpitch) '(cpintfref cpint))
          2)
2.493305
(2.1368716 2.8534691 2.6938546 2.6491673 2.4993074 2.6098127 2.7728052 2.772861
 2.5921957 2.905856 2.3591626 2.957503 2.4042292 2.7562473 2.3996017 2.8073587
 2.114944 1.7434102 2.2310295 2.6374347 2.361792 1.9476132 2.501488 2.5472867
 2.1056154 2.8225484 2.134257 2.9162033 3.0715692 2.9012227 2.7291088 2.866882
 2.8795822 2.4571223 2.9277062 2.7861307 2.6623116 2.3304622 2.4217033
 2.0556943 2.4048684 2.914848 2.7182267 3.0894585 2.873869 1.8821808 2.640174
 2.8165438 2.5423129 2.3011856 3.1477294 2.655349 2.5216308 2.0667994 3.2579045
 2.573013 2.6035044 2.202191 2.622113 2.2621205 2.3617425 2.7526956 2.3281655
 2.9357266 2.3372407 3.1848125 2.67367 2.1906006 2.7835917 2.6332111 3.206142
 2.1426969 2.194259 2.415167 1.9769101 2.0870917 2.7844474 2.2373738 2.772138
 2.9702199 1.724408 2.473073 2.2464263 2.2452457 2.688889 2.6299863 2.2223835
 2.8082614 2.673671 2.7693706 2.3369458 2.5016947 2.3837066 2.3682225 2.795649
 2.9063463 2.5880773 2.0457468 1.8635312 2.4522712 1.5877498 2.8802161
 2.7988417 2.3125513 1.7245895 2.2404804 2.1694546 2.365556 1.5905867 1.3827317
 2.2706041 3.023884 2.2864542 2.1259797 2.713626 2.1967313 2.5721254 2.5812547
 2.8233812 2.3134546 2.6203637 2.945946 2.601433 2.1920888 2.3732007 2.440137
 2.4291563 2.3676903 2.734724 3.0283954 2.8076048 2.7796154 2.326931 2.1779459
 2.2570527 2.2688026 1.3976555 2.030298 2.640235 2.568248 2.6338177 2.157162
 2.3915367 2.7873137 2.3088667 2.2176988 2.4402564 2.8062992 2.784044 2.4296925
 2.3520193 2.6146257)
</pre>

h3. Write note IC to file

To write the information content of each note of each melody in dataset 0 to a file:

<pre>
CL-USER> (resampling:format-information-content
          (resampling:dataset-prediction 0 '(cpitch) '(cpintfref cpint))
          "/tmp/foo.dat"
          0
          3)
</pre>

h3. Conklin & Witten (1995)

To simulate the experiments of Conklin & Witten (1995):

<pre>
CL-USER> (resampling:conkwit95)
Simulation of the experiments of Conklin & Witten (1995, Table 4).
System 1; Mean Information Content: 2.33
System 2; Mean Information Content: 2.36
System 3; Mean Information Content: 2.09
System 4; Mean Information Content: 2.01
System 5; Mean Information Content: 2.08
System 6; Mean Information Content: 1.90
System 7; Mean Information Content: 1.88
System 8; Mean Information Content: 1.86
NIL
</pre>

Compare with "Conklin & Witten [1995, JNMR, table 4]":http://www.sc.ehu.es/ccwbayes/members/conklin/papers/jnmr95.pdf

h2. Viewpoint Selection

Two functions are supplied for searching a space of viewpoints, <code>run-hill-climber</code> and <code>run-best-first</code>. Each takes four arguments:

* a list of viewpoints: the algorithm searches through the space of combinations of these viewpoints
* a start state (usually nil, the empty viewpoint system)
* an evaluation function returning a numeric performance metric: e.g., the mean information content of the dataset returned by <code>dataset-prediction</code>
* a symbol describing which way to optimise the metric: <code>:desc</code> means lower values are better, <code>:asc</code> means greater values are better

Here is an example:

<pre>
CL-USER> (viewpoint-selection:run-hill-climber
          '(:cpitch :cpintfref :cpint :contour)
          nil
          #'(lambda (viewpoints)
              (utils:round-to-nearest-decimal-place
               (resampling:output-information-content
                (resampling:dataset-prediction 0 '(cpitch) viewpoints :k 10 :models :both+)
                1)
               2))
          :desc)

 =============================================================================
   System                                                Score
 -----------------------------------------------------------------------------
   NIL                                                   NIL
   (CPITCH)                                              2.52
   (CPINT CPITCH)                                        2.43
   (CPINTFREF CPINT CPITCH)                              2.38
 =============================================================================
#S(VIEWPOINT-SELECTION::RECORD :STATE (:CPINTFREF :CPINT :CPITCH) :WEIGHT 2.38)
</pre>
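
For readers unfamiliar with this kind of search, the following toy sketch shows one common form of greedy hill climbing over sets of viewpoints. It is a generic illustration under stated assumptions, not the code behind <code>run-hill-climber</code>: the function names, the placeholder viewpoints <code>:a</code>, <code>:b</code> and <code>:c</code>, and the scoring function are all invented. It accepts the same four kinds of argument described above.

<pre>
(defun better-p (a b direction)
  "True if score A beats score B. A B of NIL (no score yet) always loses."
  (or (null b)
      (ecase direction
        (:desc (< a b))
        (:asc (> a b)))))

(defun toy-score (state)
  "Invented score standing in for a mean information content."
  (let ((score 3.0))
    (dolist (v state score)
      (decf score (case v (:a 0.5) (:b 0.25) (t -0.125))))))

(defun toy-hill-climb (viewpoints start evaluate direction)
  "Greedily add the single best viewpoint until no addition improves the score."
  (let ((state start)
        (score (and start (funcall evaluate start))))
    (loop
      (let ((best-state nil)
            (best-score nil))
        ;; Try extending the current state with each unused viewpoint.
        (dolist (v viewpoints)
          (unless (member v state)
            (let* ((candidate (cons v state))
                   (candidate-score (funcall evaluate candidate)))
              (when (better-p candidate-score best-score direction)
                (setf best-state candidate
                      best-score candidate-score)))))
        ;; Accept the best extension only if it improves on the current score.
        (if (and best-state (better-p best-score score direction))
            (setf state best-state
                  score best-score)
            (return (values state score)))))))
</pre>

Evaluating <code>(toy-hill-climb '(:a :b :c) nil #'toy-score :desc)</code> returns the state <code>(:b :a)</code> with score 2.25: <code>:a</code> is added first, then <code>:b</code>, and the search stops because adding <code>:c</code> would raise the score.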

Since viewpoint selection can be quite time-consuming, there are also functions for caching the results:

<pre>
(initialise-vs-cache)
(load-vs-cache filename package)
(store-vs-cache filename package)
</pre>
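
The general idea behind such a cache can be sketched as a table from viewpoint systems to scores, so that a system evaluated once is never evaluated again. The code below is an assumption-laden illustration only: it does not show how IDyOM's cache functions are implemented, and <code>*toy-cache*</code>, <code>cache-key</code> and <code>cached-score</code> are hypothetical names.

<pre>
(defvar *toy-cache* (make-hash-table :test #'equal)
  "Maps a canonical key for a viewpoint system to its score.")

(defun cache-key (viewpoints)
  "Order-independent key: the printed viewpoints, sorted alphabetically."
  (sort (mapcar #'prin1-to-string viewpoints) #'string<))

(defun cached-score (viewpoints evaluate)
  "Return the cached score for VIEWPOINTS, computing and storing it if absent."
  (let ((key (cache-key viewpoints)))
    (multiple-value-bind (score found) (gethash key *toy-cache*)
      (if found
          score
          (setf (gethash key *toy-cache*) (funcall evaluate viewpoints))))))
</pre>

Sorting the key means that <code>(:cpint :cpitch)</code> and <code>(:cpitch :cpint)</code> share one cache entry, which matters when the search reaches the same system by different routes.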