annotate toolboxes/MIRtoolbox1.3.2/somtoolbox/som_demo1.m @ 0:cc4b1211e677 tip

initial commit to HG from Changeset: 646 (e263d8a21543) added further path and more save "camirversion.m"
author Daniel Wolff
date Fri, 19 Aug 2016 13:07:06 +0200

%SOM_DEMO1 Basic properties and behaviour of the Self-Organizing Map.

% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/

% Version 1.0beta juuso 071197
% Version 2.0beta juuso 030200

clf reset;
figure(gcf)
echo on



clc
%    ==========================================================
%    SOM_DEMO1 - BEHAVIOUR AND PROPERTIES OF SOM
%    ==========================================================

%    som_make       - Create, initialize and train a SOM.
%    som_randinit   - Create and initialize a SOM.
%    som_lininit    - Create and initialize a SOM.
%    som_seqtrain   - Train a SOM.
%    som_batchtrain - Train a SOM.
%    som_bmus       - Find best-matching units (BMUs).
%    som_quality    - Measure quality of SOM.

%    SELF-ORGANIZING MAP (SOM):

%    A self-organizing map (SOM) is a "map" of the training data,
%    dense where there is a lot of data and thin where the data
%    density is low.

%    The map consists of neurons located on a regular map grid.
%    The lattice of the grid can be either hexagonal or rectangular.

subplot(1,2,1)
som_cplane('hexa',[10 15],'none')
title('Hexagonal SOM grid')

subplot(1,2,2)
som_cplane('rect',[10 15],'none')
title('Rectangular SOM grid')

%    Each neuron (hexagon on the left, rectangle on the right) has an
%    associated prototype vector. After training, neighboring neurons
%    have similar prototype vectors.

%    The SOM can be used for data visualization, clustering (or
%    classification), estimation and a variety of other purposes.

pause % Strike any key to continue...

clf
clc
%    INITIALIZE AND TRAIN THE SELF-ORGANIZING MAP
%    ============================================

%    Here are 300 data points sampled from the unit square:

D = rand(300,2);

%    The map will be a 2-dimensional grid of size 10 x 10.

msize = [10 10];

%    SOM_RANDINIT and SOM_LININIT can be used to initialize the
%    prototype vectors in the map. The map size is actually an
%    optional argument. If omitted, it is determined automatically
%    based on the number of data vectors and the principal
%    eigenvectors of the data set. Below, the random initialization
%    algorithm is used.

sMap = som_randinit(D, 'msize', msize);

%    Actually, each map unit can be thought of as having two sets
%    of coordinates:
%      (1) in the input space: the prototype vectors
%      (2) in the output space: the position on the map
%    In the two spaces, the map looks like this:

subplot(1,3,1)
som_grid(sMap)
axis([0 11 0 11]), view(0,-90), title('Map in output space')

subplot(1,3,2)
plot(D(:,1),D(:,2),'+r'), hold on
som_grid(sMap,'Coord',sMap.codebook)
title('Map in input space')

%    The black dots show the positions of the map units, and the gray
%    lines show connections between neighboring map units. Since the
%    map was initialized randomly, the positions in the input space are
%    completely disorganized. The red crosses are training data.

pause % Strike any key to train the SOM...

%    During training, the map organizes itself and folds to the
%    training data. Here, the sequential training algorithm is used:

sMap = som_seqtrain(sMap,D,'radius',[5 1],'trainlen',10);

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r')
title('Trained map')

pause % Strike any key to view the training process more closely...


clf

clc
%    TRAINING THE SELF-ORGANIZING MAP
%    ================================

%    To get a better idea of what happens during training, let's look
%    at how the map gradually unfolds and organizes itself. To make it
%    even clearer, the map is now initialized so that it is far away
%    from the data.

sMap = som_randinit(D,'msize',msize);
sMap.codebook = sMap.codebook + 1;

subplot(1,2,1)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r'), hold off
title('Data and original map')

%    The training is based on two principles:
%
%     Competitive learning: the prototype vector most similar to a
%     data vector is modified so that it is even more similar to
%     it. This way the map learns the position of the data cloud.
%
%     Cooperative learning: not only the most similar prototype
%     vector, but also its neighbors on the map are moved towards the
%     data vector. This way the map self-organizes.

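%    One such update step can be sketched by hand. This is only an
%    illustrative re-implementation of the kind of rule SOM_SEQTRAIN
%    applies, using a Gaussian neighborhood with hand-picked learning
%    rate (0.1) and radius (2); it works on a copy so the demo itself
%    is unaffected.

sTmp = sMap;                                        % work on a copy
x = D(ceil(rand*size(D,1)),:);                      % pick a random data vector
n = size(sTmp.codebook,1);                          % number of map units
dx = sum((sTmp.codebook - repmat(x,n,1)).^2, 2);    % distance to each prototype
[dummy,b] = min(dx);                                % competitive: find the BMU
Co = som_unit_coords(sTmp);                         % unit positions on the map grid
gd2 = sum((Co - repmat(Co(b,:),n,1)).^2, 2);        % squared grid distance to BMU
h = 0.1*exp(-gd2/(2*2^2));                          % Gaussian neighborhood weights
sTmp.codebook = sTmp.codebook + ...                 % cooperative: move the BMU and
    repmat(h,1,size(sTmp.codebook,2)) .* ...        % its neighbors toward the
    (repmat(x,n,1) - sTmp.codebook);                % data vector
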
pause % Strike any key to train the map...

echo off
subplot(1,2,2)
o = ones(5,1);
r = (1-[1:60]/60);
for i=1:60,
  sMap = som_seqtrain(sMap,D,'tracking',0,...
                      'trainlen',5,'samples',...
                      'alpha',0.1*o,'radius',(4*r(i)+1)*o);
  som_grid(sMap,'Coord',sMap.codebook)
  hold on, plot(D(:,1),D(:,2),'+r'), hold off
  title(sprintf('%d/300 training steps',5*i))
  drawnow
end
title('Sequential training after 300 steps')
echo on

pause % Strike any key to continue with 3D data...

clf

clc
%    TRAINING DATA: THE UNIT CUBE
%    ============================

%    Above, the map dimension was equal to the input space dimension:
%    both were 2-dimensional. Typically, the input space dimension is
%    much higher than the 2-dimensional map. In this case the map can
%    no longer follow the data set perfectly, but must find a balance
%    between two goals:

%     - data representation accuracy
%     - data set topology representation accuracy

%    Here are 500 data points sampled from the unit cube:

D = rand(500,3);

subplot(1,3,1), plot3(D(:,1),D(:,2),D(:,3),'+r')
view(3), axis on, rotate3d on
title('Data')

%    The ROTATE3D command enables you to rotate the picture by
%    dragging the pointer over the picture with the left mouse
%    button pressed down.

pause % Strike any key to train the SOM...




clc
%    DEFAULT TRAINING PROCEDURE
%    ==========================

%    Above, the initialization was done randomly and the training was
%    done with the sequential training function (SOM_SEQTRAIN). By
%    default, the initialization is linear and the batch training
%    algorithm is used. In addition, the training is done in two
%    phases: first with a large neighborhood radius, and then
%    fine-tuning with a small radius.

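%    Roughly, this default procedure corresponds to the following
%    sketch. The exact default radii and training lengths are chosen
%    by SOM_MAKE based on the map and data sizes, so the values below
%    are illustrative only:
%
%      sM = som_lininit(D);                        % linear initialization
%      sM = som_batchtrain(sM,D,'radius',[5 1]);   % rough training phase
%      sM = som_batchtrain(sM,D,'radius',[1 1]);   % fine-tuning phase
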
%    The function SOM_MAKE can be used to both initialize and train
%    the map using default parameters:

pause % Strike any key to use SOM_MAKE...

sMap = som_make(D);

%    Here, the linear initialization is done again, so that
%    the results can be compared.

sMap0 = som_lininit(D);

subplot(1,3,2)
som_grid(sMap0,'Coord',sMap0.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap0.codebook(:,3))
axis([0 1 0 1 0 1]), view(-120,-25), title('After initialization')

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap.codebook(:,3))
axis([0 1 0 1 0 1]), view(3), title('After training'), hold on

%    Here you can see that the 2-dimensional map has folded into the
%    3-dimensional space in order to capture the whole data space.

pause % Strike any key to evaluate the quality of the maps...



clc
%    BEST-MATCHING UNITS (BMU)
%    =========================

%    Before going to the quality, an important concept needs to be
%    introduced: the Best-Matching Unit (BMU). The BMU of a data
%    vector is the unit on the map whose model vector best resembles
%    the data vector. In practice the similarity is measured as the
%    minimum distance between the data vector and each model vector on
%    the map. The BMUs can be calculated using the function SOM_BMUS.
%    This function returns the index of the unit.

%    Here the BMU is searched for the origin point (from the
%    trained map):

bmu = som_bmus(sMap,[0 0 0]);
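
%    As a sketch, the same search can be written out by hand: the BMU
%    is simply the row of the codebook with the smallest (squared)
%    Euclidean distance to the data vector. This is illustrative only;
%    SOM_BMUS additionally handles multiple data vectors and missing
%    values.

x = [0 0 0];
dx = sum((sMap.codebook - repmat(x,size(sMap.codebook,1),1)).^2, 2);
[dummy,bmu_manual] = min(dx);    % bmu_manual equals bmu above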

%    Here the corresponding unit is shown in the figure. You can
%    rotate the figure to see better where the BMU is.

co = sMap.codebook(bmu,:);
text(co(1),co(2),co(3),'BMU','Fontsize',20)
plot3([0 co(1)],[0 co(2)],[0 co(3)],'ro-')

pause % Strike any key to analyze map quality...




clc
%    SELF-ORGANIZING MAP QUALITY
%    ===========================

%    The maps have two primary quality properties:
%     - data representation accuracy
%     - data set topology representation accuracy

%    The former is usually measured using the average quantization
%    error between data vectors and their BMUs on the map. For the
%    latter several measures have been proposed, e.g. the topographic
%    error measure: the percentage of data vectors for which the
%    first and second BMUs are not adjacent units.

%    Both measures have been implemented in the SOM_QUALITY function.
%    Here are the quality measures for the trained map:

[q,t] = som_quality(sMap,D)

%    And here for the initial map:

[q0,t0] = som_quality(sMap0,D)
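
%    As a sketch of where the quantization error comes from: assuming
%    SOM Toolbox 2.0 behaviour, the second output of SOM_BMUS gives
%    the distance from each data vector to its BMU, and the
%    quantization error is the mean of these distances.

[dummy,qerrs] = som_bmus(sMap,D);
mean(qerrs)    % the average quantization error of the trained map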

%    As can be seen, by folding the SOM has reduced the average
%    quantization error, but on the other hand the topology
%    representation capability has suffered. By using a larger final
%    neighborhood radius in the training, the map becomes stiffer and
%    preserves the topology of the data set better.


echo off
