Version 4 - History - Idyom - IDyOM Project


Marcus Pearce, 2012-02-02 02:00 PM

h1. idyom

h2. Loading the system

<pre>
CL-USER> (asdf:oos 'asdf:load-op 'idyom)
...
</pre>

You will need to set the value of the variable <code>*root-dir*</code> in the file "apps/apps.lisp" to point to the location of the idyom repository, so that files can be saved in the <code>data</code> directory within the repository.
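
One detail worth illustrating when a directory is given as a string in Common Lisp: without a trailing slash, the last component is parsed as a file name rather than a directory, so paths merged against it land one level too high. The sketch below is an assumption-laden illustration, not idyom's code: <code>*example-root-dir*</code> and <code>data-subdir</code> are hypothetical names, the path is a placeholder, and how idyom itself consumes <code>*root-dir*</code> is not specified here.

```lisp
;; Hypothetical stand-in for a root-directory setting; the path is a placeholder.
(defparameter *example-root-dir* "/home/user/idyom/")

(defun data-subdir (name &optional (root *example-root-dir*))
  "Return the namestring of the NAME subdirectory of data/ under ROOT.
Illustrative helper only; not part of idyom."
  (namestring
   (merge-pathnames (concatenate 'string "data/" name "/")
                    (pathname root))))

;; Build the locations of two of the data subdirectories.
(data-subdir "cache")
(data-subdir "models")
```

If the root were written without the final slash, <code>idyom</code> would be treated as a file name and the merged directory would no longer lie inside the repository.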

h2. Directory Structure

The layout of the code is as follows:

* idyom.asd: the system definition
* utils/: general-purpose utilities
* amuse/: an interface to amuse (a music data interface)
** amuse-interface.lisp: look here to make this less dependent on amuse-mtp
** viewpoint-extensions.lisp: extensions to the viewpoint framework useful for modelling
* ppm/: the multiple-viewpoint system modelling framework
** params.lisp: user-level parameters to the statistical modelling
** multiple-viewpoint-system.lisp: modelling with multiple viewpoints
** prediction-sets.lisp: representing and combining distributions, computing entropy and information content
* apps/: top-level applications for users
** resampling.lisp: the main user interface - estimating information content by cross-validation
** viewpoint-selection.lisp: does what it says on the tin
* data/: directory used internally for storing data
** cache/: for storing cache files
** models/: for storing model files
** resampling/: for storing resampling sets

h2. Usage

h3. Top-level function <code>MAIN:IDYOM</code>

The main workhorse is the function <code>MAIN:IDYOM</code>, which accepts the following arguments (the first three are required, the remainder are optional keyword arguments):

_Required parameters_

* dataset-id: a dataset id
** e.g., 1
* basic-attributes: a list of basic attributes to predict
** e.g., '(cpitch bioi)
* attributes: a list of attributes to use in prediction
** e.g., '((cpintfref cpint) bioi)

_Parameters controlling the statistical modelling_

* pretraining-ids: a list of dataset-ids to pretrain the long-term models
** e.g., '(0 1 7)
* k: an integer designating the number of cross-validation folds to use
** 1 = no cross-validation, but also no training set unless the models are pretrained
** :full = as many folds as there are compositions in the dataset
** default = 10
* resampling-indices: limits the modelling to a particular set of resampling folds
* models: whether to use the short-term or long-term models or both
** :stm - short-term model only
** :ltm - long-term model only
** :ltm+ - the long-term model trained incrementally on the test set
** :both - :stm + :ltm
** :both+ - :stm + :ltm+ (this is the default)
* ltm-order-bound: the order bound for the long-term model (the default <code>nil</code> means no order bound; otherwise an integer gives the bound in number of events)
* ltm-mixtures: whether to use mixtures for the LTM (default <code>t</code>)
* ltm-update-exclusion: whether to use update exclusion for the LTM (default <code>nil</code>)
* ltm-escape: the escape method to use for the LTM (<code>:a :b :c :d :x</code> - default <code>:c</code>)
* stm-order-bound: the order bound to use for the short-term model (default <code>nil</code>)
* stm-mixtures: whether to use mixtures for the STM (default <code>t</code>)
* stm-update-exclusion: whether to use update exclusion for the STM (default <code>t</code>)
* stm-escape: the escape method for the STM (default <code>:x</code>)

See "Pearce [2005, chapter 6]":http://webprojects.eecs.qmul.ac.uk/marcusp/papers/Pearce2005.pdf for a description and explanation of these parameters.
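
The effect of <code>k</code> can be pictured with a small self-contained sketch. It is not idyom's code: the helper names <code>resolve-k</code>, <code>make-folds</code> and <code>training-set</code> are hypothetical, and the round-robin assignment of compositions to folds is an assumption made purely for illustration (the real system may assign them differently).

```lisp
;; Illustrative sketch only; none of these helpers are part of idyom.

(defun resolve-k (k n-compositions)
  "Turn the K argument into a number of folds: :FULL means one fold per composition."
  (if (eq k :full) n-compositions k))

(defun make-folds (n-compositions k)
  "Split the composition indices 0..N-1 into test sets, one per fold.
Assumes round-robin assignment, which is an illustrative choice."
  (let* ((n-folds (resolve-k k n-compositions))
         (folds (make-list n-folds :initial-element '())))
    (dotimes (i n-compositions)
      (push i (nth (mod i n-folds) folds)))
    (mapcar #'reverse folds)))

(defun training-set (test-fold n-compositions)
  "Indices outside TEST-FOLD, i.e. the training set for that fold."
  (loop for i below n-compositions
        unless (member i test-fold) collect i))

;; Two folds over five compositions, then one fold per composition.
(make-folds 5 2)
(make-folds 3 :full)
;; With a single fold the training set is empty.
(training-set (first (make-folds 4 1)) 4)
```

With <code>k</code> equal to 1 the single test fold contains every composition, so the training set is empty. This is why the long-term model has nothing to learn from in that case unless it is pretrained.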

_Parameters controlling viewpoint selection_

* dp: the number of decimal places to use when comparing information contents in viewpoint selection
** full floating point precision is used if this is <code>nil</code> (the default)
* max-links: the maximum number of links to use when creating linked viewpoints in viewpoint selection
** the default is 2

_Parameters controlling the output_

* output-path: a string indicating a directory in which to write the output
** output is only written to the console if this is <code>nil</code>
* detail: an integer which determines how the information content is averaged in the output:
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition

h3. Subsidiary functions

The function <code>MAIN:IDYOM</code> in turn passes its arguments on to a number of sub-functions, which can be used independently.

<code>RESAMPLING:DATASET-PREDICTION</code> accepts the following arguments (all but the first three are optional keyword arguments):

* dataset-id: a dataset id
** e.g., 1
* basic-attributes: a list of basic attributes to predict
** e.g., '(cpitch bioi)
* attributes: a list of attributes to use in prediction
** e.g., '((cpintfref cpint) bioi)
* pretraining-ids: a list of dataset-ids to pretrain the long-term models
** e.g., '(0 1 7)
* k: an integer designating the number of cross-validation folds to use
** 1 = no cross-validation, but also no training set unless the models are pretrained
** :full = as many folds as there are compositions in the dataset
** default = 10
* resampling-indices: limits the modelling to a particular set of resampling folds
* models: whether to use the short-term or long-term models or both
** :stm - short-term model only
** :ltm - long-term model only
** :ltm+ - the long-term model trained incrementally on the test set
** :both - :stm + :ltm
** :both+ - :stm + :ltm+ (this is the default)
* ltm-order-bound: the order bound for the long-term model (the default <code>nil</code> means no order bound; otherwise an integer gives the bound in number of events)
* ltm-mixtures: whether to use mixtures for the LTM (default <code>t</code>)
* ltm-update-exclusion: whether to use update exclusion for the LTM (default <code>nil</code>)
* ltm-escape: the escape method to use for the LTM (<code>:a :b :c :d :x</code> - default <code>:c</code>)
* stm-order-bound: the order bound to use for the short-term model (default <code>nil</code>)
* stm-mixtures: whether to use mixtures for the STM (default <code>t</code>)
* stm-update-exclusion: whether to use update exclusion for the STM (default <code>t</code>)
* stm-escape: the escape method for the STM (default <code>:x</code>)

<code>RESAMPLING:OUTPUT-INFORMATION-CONTENT</code> takes the output of <code>RESAMPLING:DATASET-PREDICTION</code> and returns the average information content. It takes the following arguments:

* predictions: the output of <code>RESAMPLING:DATASET-PREDICTION</code>
* detail: an integer which determines how the information content is averaged (these are returned as multiple values):
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition
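
The three levels of <code>detail</code> and the use of multiple values can be sketched in a few lines. This is a self-contained illustration under stated assumptions, not the implementation in idyom: <code>information-content</code>, <code>mean</code> and <code>summarise-ic</code> are hypothetical names, the input is simply a list holding one list of per-event values per composition, and the dataset average is assumed to be the mean of the per-composition means.

```lisp
;; Illustrative sketch only; not the implementation used by idyom.

(defun information-content (p)
  "Information content, in bits, of an event predicted with probability P."
  (- (log p 2)))

(defun mean (numbers)
  "Arithmetic mean of a non-empty list of numbers."
  (/ (reduce #'+ numbers) (length numbers)))

(defun summarise-ic (event-ics &optional (detail 3))
  "EVENT-ICS holds one list of per-event information contents per composition.
Returns the dataset mean and, depending on DETAIL, the per-composition
means and the per-event values, as multiple values."
  (let* ((composition-means (mapcar #'mean event-ics))
         ;; Assumption: the dataset mean is the mean of the composition means.
         (dataset-mean (mean composition-means)))
    (ecase detail
      (1 dataset-mean)
      (2 (values dataset-mean composition-means))
      (3 (values dataset-mean composition-means event-ics)))))

;; Capture both values returned at detail level 2.
(multiple-value-bind (overall per-composition)
    (summarise-ic '((1 3) (4 4 4)) 2)
  (list overall per-composition))
```

Because the extra results come back as multiple values, a caller that only wants the overall mean can ignore the rest, while <code>multiple-value-bind</code> or <code>multiple-value-list</code> captures all of them.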

<code>RESAMPLING:FORMAT-INFORMATION-CONTENT</code> takes the output of <code>RESAMPLING:DATASET-PREDICTION</code> and writes it to a file. It takes the following arguments:

* predictions: the output of <code>RESAMPLING:DATASET-PREDICTION</code>
* file: a string denoting a file
* dataset-id: an integer reflecting the dataset-id
* detail: an integer which determines how the information content is averaged (these are returned as multiple values):
** 1: averaged over the entire dataset
** 2: and also averaged over each composition
** 3: and also for each event in each composition

h2. Examples

h3. To get mean information contents for each melody of dataset 0 in a list

<pre>
CL-USER> (resampling:output-information-content
          (resampling:dataset-prediction 0 '(cpitch) '(cpintfref cpint))
)
.493305
(2.1368716 2.8534691 2.6938546 2.6491673 2.4993074 2.6098127 2.7728052 2.772861
.5921957 2.905856 2.3591626 2.957503 2.4042292 2.7562473 2.3996017 2.8073587
.114944 1.7434102 2.2310295 2.6374347 2.361792 1.9476132 2.501488 2.5472867
.1056154 2.8225484 2.134257 2.9162033 3.0715692 2.9012227 2.7291088 2.866882
.8795822 2.4571223 2.9277062 2.7861307 2.6623116 2.3304622 2.4217033
.0556943 2.4048684 2.914848 2.7182267 3.0894585 2.873869 1.8821808 2.640174
.8165438 2.5423129 2.3011856 3.1477294 2.655349 2.5216308 2.0667994 3.2579045
.573013 2.6035044 2.202191 2.622113 2.2621205 2.3617425 2.7526956 2.3281655
.9357266 2.3372407 3.1848125 2.67367 2.1906006 2.7835917 2.6332111 3.206142
.1426969 2.194259 2.415167 1.9769101 2.0870917 2.7844474 2.2373738 2.772138
.9702199 1.724408 2.473073 2.2464263 2.2452457 2.688889 2.6299863 2.2223835
.8082614 2.673671 2.7693706 2.3369458 2.5016947 2.3837066 2.3682225 2.795649
.9063463 2.5880773 2.0457468 1.8635312 2.4522712 1.5877498 2.8802161
.7988417 2.3125513 1.7245895 2.2404804 2.1694546 2.365556 1.5905867 1.3827317
.2706041 3.023884 2.2864542 2.1259797 2.713626 2.1967313 2.5721254 2.5812547
.8233812 2.3134546 2.6203637 2.945946 2.601433 2.1920888 2.3732007 2.440137
.4291563 2.3676903 2.734724 3.0283954 2.8076048 2.7796154 2.326931 2.1779459
.2570527 2.2688026 1.3976555 2.030298 2.640235 2.568248 2.6338177 2.157162
.3915367 2.7873137 2.3088667 2.2176988 2.4402564 2.8062992 2.784044 2.4296925
.3520193 2.6146257)
</pre>

h3. To write the information contents for each note of each melody in dataset 0 to a file

<pre>
CL-USER> (resampling:format-information-content
          (resampling:dataset-prediction 0 '(cpitch) '(cpintfref cpint))
          "/tmp/foo.dat"
)
</pre>

h3. To simulate the experiments of Conklin & Witten (1995)

<pre>
CL-USER> (resampling:conkwit95)
Simulation of the experiments of Conklin & Witten (1995, Table 4).
System 1; Mean Information Content: 2.33
System 2; Mean Information Content: 2.36
System 3; Mean Information Content: 2.09
System 4; Mean Information Content: 2.01
System 5; Mean Information Content: 2.08
System 6; Mean Information Content: 1.90
System 7; Mean Information Content: 1.88
System 8; Mean Information Content: 1.86
NIL
</pre>

Compare with "Conklin & Witten [1995, JNMR, table 4]":http://www.sc.ehu.es/ccwbayes/members/conklin/papers/jnmr95.pdf

h3. Viewpoint Selection

Two functions are supplied for searching a space of viewpoints: <code>run-hill-climber</code> and <code>run-best-first</code>, which take four arguments:

* a list of viewpoints: the algorithm searches through the space of combinations of these viewpoints
* a start state (usually nil, the empty viewpoint system)
* an evaluation function returning a numeric performance metric: e.g., the mean information content of the dataset returned by <code>dataset-prediction</code>
* a symbol describing which way to optimise the metric: <code>:desc</code> means lower values are better, <code>:asc</code> means greater values are better

Here is an example:

<pre>
CL-USER> (viewpoint-selection:run-hill-climber
          '(:cpitch :cpintfref :cpint :contour)
          nil
          #'(lambda (viewpoints)
              (utils:round-to-nearest-decimal-place
               (resampling:output-information-content
                (resampling:dataset-prediction 0 '(cpitch) viewpoints :k 10 :models :both+)
)
))
          :desc)

 =============================================================================
   System                                                Score
 -----------------------------------------------------------------------------
   NIL                                                   NIL
   (CPITCH)                                              2.52
   (CPINT CPITCH)                                        2.43
   (CPINTFREF CPINT CPITCH)                              2.38
 =============================================================================
#S(VIEWPOINT-SELECTION::RECORD :STATE (:CPINTFREF :CPINT :CPITCH) :WEIGHT 2.38)
</pre>
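
The kind of search shown in the table, where one viewpoint is added at each step while the score keeps improving, can be sketched as a simple greedy forward selection. This is a simplified illustration and not the code behind <code>run-hill-climber</code>: the function <code>hill-climb</code> is hypothetical, it only ever adds viewpoints, and <code>toy-score</code> is a made-up evaluation function standing in for a real call to <code>dataset-prediction</code>.

```lisp
;; Illustrative sketch only; not the search implemented in idyom.

(defun hill-climb (viewpoints start evaluate direction)
  "Greedy forward selection: repeatedly add the single viewpoint that most
improves the score, stopping when no addition improves it.
DIRECTION is :DESC (lower is better) or :ASC (higher is better).
Returns the final state and its score as two values."
  (let ((better-p (ecase direction (:desc #'<) (:asc #'>)))
        (state start)
        (score (and start (funcall evaluate start))))
    (loop
      (let ((best-state nil) (best-score nil))
        ;; Evaluate every state reachable by adding one unused viewpoint.
        (dolist (v viewpoints)
          (unless (member v state)
            (let* ((candidate (cons v state))
                   (s (funcall evaluate candidate)))
              (when (or (null best-score) (funcall better-p s best-score))
                (setf best-state candidate
                      best-score s)))))
        ;; Stop when nothing can be added or the best addition is no improvement.
        (when (or (null best-state)
                  (and score (not (funcall better-p best-score score))))
          (return (values state score)))
        (setf state best-state
              score best-score)))))

(defun toy-score (state)
  "Made-up score for illustration only (lower is better)."
  (- 3.0 (loop for v in state
               sum (ecase v (:a 0.5) (:b 0.25) (:c -0.25)))))

(hill-climb '(:a :b :c) nil #'toy-score :desc)
```

In this toy run the search adds <code>:a</code>, then <code>:b</code>, and stops because adding <code>:c</code> would make the score worse. As in the table above, the empty start state has no score of its own.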

Since this can be quite a time-consuming process, there are also functions for caching the results.

<pre>
(initialise-vs-cache)
(load-vs-cache filename package)
(store-vs-cache filename package)
</pre>
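
The idea behind caching evaluation results can be illustrated with a small memoising wrapper. This is only a sketch of the general technique under stated assumptions, and says nothing about how the three functions above are implemented: <code>*example-cache*</code>, <code>canonical-key</code> and <code>cached-evaluate</code> are hypothetical names, and the key construction assumes that viewpoints are plain symbols rather than linked viewpoints.

```lisp
;; Illustrative sketch only; not idyom's viewpoint-selection cache.

(defvar *example-cache* (make-hash-table :test #'equal)
  "Maps a canonical list of viewpoint names to a previously computed score.")

(defun canonical-key (viewpoints)
  "Order-independent key: the viewpoint names sorted alphabetically.
Assumes every viewpoint is a symbol."
  (sort (mapcar #'string viewpoints) #'string<))

(defun cached-evaluate (evaluate viewpoints)
  "Call EVALUATE on VIEWPOINTS unless a score for the same set is already cached."
  (let ((key (canonical-key viewpoints)))
    (multiple-value-bind (score found-p) (gethash key *example-cache*)
      (if found-p
          score
          (setf (gethash key *example-cache*)
                (funcall evaluate viewpoints))))))

;; The second call finds the same set in a different order and reuses the score.
(cached-evaluate #'length '(:cpint :cpitch))
(cached-evaluate #'length '(:cpitch :cpint))
```

Sorting the names makes the key independent of the order in which a search happens to build up a viewpoint system, so each distinct set is evaluated only once.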