%SOM_DEMO1 Basic properties and behaviour of the Self-Organizing Map.

% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/

% Version 1.0beta juuso 071197
% Version 2.0beta juuso 030200

clf reset;
figure(gcf)
echo on



clc
% ==========================================================
% SOM_DEMO1 - BEHAVIOUR AND PROPERTIES OF SOM
% ==========================================================

% som_make       - Create, initialize and train a SOM.
% som_randinit   - Create and initialize a SOM.
% som_lininit    - Create and initialize a SOM.
% som_seqtrain   - Train a SOM.
% som_batchtrain - Train a SOM.
% som_bmus       - Find best-matching units (BMUs).
% som_quality    - Measure quality of SOM.

% SELF-ORGANIZING MAP (SOM):

% A self-organizing map (SOM) is a "map" of the training data,
% dense where there is a lot of data and thin where the data
% density is low.

% The map consists of neurons located on a regular map grid.
% The lattice of the grid can be either hexagonal or rectangular.

subplot(1,2,1)
som_cplane('hexa',[10 15],'none')
title('Hexagonal SOM grid')

subplot(1,2,2)
som_cplane('rect',[10 15],'none')
title('Rectangular SOM grid')

% Each neuron (hexagon on the left, rectangle on the right) has an
% associated prototype vector. After training, neighboring neurons
% have similar prototype vectors.

% The SOM can be used for data visualization, clustering (or
% classification), estimation and a variety of other purposes.

pause % Strike any key to continue...

clf
clc
% INITIALIZE AND TRAIN THE SELF-ORGANIZING MAP
% ============================================

% Here are 300 data points sampled from the unit square:

D = rand(300,2);

% The map will be a 2-dimensional grid of size 10 x 10.

msize = [10 10];

% SOM_RANDINIT and SOM_LININIT can be used to initialize the
% prototype vectors in the map. The map size is actually an
% optional argument. If omitted, it is determined automatically
% based on the number of data vectors and the principal
% eigenvectors of the data set. Below, the random initialization
% algorithm is used.

sMap = som_randinit(D, 'msize', msize);

% Actually, each map unit can be thought of as having two sets
% of coordinates:
%  (1) in the input space:  the prototype vectors
%  (2) in the output space: the position on the map
% In the two spaces, the map looks like this:

subplot(1,3,1)
som_grid(sMap)
axis([0 11 0 11]), view(0,-90), title('Map in output space')

subplot(1,3,2)
plot(D(:,1),D(:,2),'+r'), hold on
som_grid(sMap,'Coord',sMap.codebook)
title('Map in input space')

% The black dots show positions of map units, and the gray lines
% show connections between neighboring map units. Since the map
% was initialized randomly, the positions in the input space are
% completely disorganized. The red crosses are training data.

pause % Strike any key to train the SOM...

% During training, the map organizes and folds to the training
% data. Here, the sequential training algorithm is used:

sMap = som_seqtrain(sMap,D,'radius',[5 1],'trainlen',10);

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r')
title('Trained map')

pause % Strike any key to view more closely the training process...


clf

clc
% TRAINING THE SELF-ORGANIZING MAP
% ================================

% To get a better idea of what happens during training, let's look
% at how the map gradually unfolds and organizes itself. To make it
% even clearer, the map is now initialized so that it is away
% from the data.

sMap = som_randinit(D,'msize',msize);
sMap.codebook = sMap.codebook + 1;

subplot(1,2,1)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r'), hold off
title('Data and original map')

% The training is based on two principles:
%
%  Competitive learning: the prototype vector most similar to a
%  data vector is modified so that it is even more similar to
%  it. This way the map learns the position of the data cloud.
%
%  Cooperative learning: not only the most similar prototype
%  vector, but also its neighbors on the map are moved towards the
%  data vector. This way the map self-organizes.
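%
% In formula form, a single sequential update step moves each
% prototype vector m_i towards the data vector x:
%
%    m_i(t+1) = m_i(t) + alpha(t) * h_ci(t) * (x - m_i(t))
%
% where alpha(t) is the learning rate and h_ci(t) is the value of
% the neighborhood function between the BMU c and unit i.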

pause % Strike any key to train the map...

echo off
subplot(1,2,2)
o = ones(5,1);
r = (1-[1:60]/60);
for i=1:60,
  sMap = som_seqtrain(sMap,D,'tracking',0,...
                      'trainlen',5,'samples',...
                      'alpha',0.1*o,'radius',(4*r(i)+1)*o);
  som_grid(sMap,'Coord',sMap.codebook)
  hold on, plot(D(:,1),D(:,2),'+r'), hold off
  title(sprintf('%d/300 training steps',5*i))
  drawnow
end
title('Sequential training after 300 steps')
echo on

pause % Strike any key to continue with 3D data...

clf

clc
% TRAINING DATA: THE UNIT CUBE
% ============================

% Above, the map dimension was equal to the input space dimension:
% both were 2-dimensional. Typically, the input space dimension is
% much higher than the 2-dimensional map. In this case the map can
% no longer follow the data set perfectly, but must find a balance
% between two goals:

%  - data representation accuracy
%  - data set topology representation accuracy

% Here are 500 data points sampled from the unit cube:

D = rand(500,3);

subplot(1,3,1), plot3(D(:,1),D(:,2),D(:,3),'+r')
view(3), axis on, rotate3d on
title('Data')

% The ROTATE3D command enables you to rotate the picture by
% dragging the pointer over the picture with the left mouse
% button pressed down.

pause % Strike any key to train the SOM...



clc
% DEFAULT TRAINING PROCEDURE
% ==========================

% Above, the initialization was done randomly and the training was
% done with the sequential training function (SOM_SEQTRAIN). By
% default, the initialization is linear and the batch training
% algorithm is used. In addition, the training is done in two
% phases: first with a large neighborhood radius, and then
% fine-tuning with a small radius.
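
% Roughly, this default procedure corresponds to something like
% the following (the radii shown here are illustrative assumptions;
% SOM_MAKE chooses the actual radii and training lengths
% automatically based on the map and the data):
%
%   sM = som_lininit(D);
%   sM = som_batchtrain(sM,D,'radius',[3 1]);    % rough phase
%   sM = som_batchtrain(sM,D,'radius',[1 0.1]);  % fine-tuning phase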

% The function SOM_MAKE can be used to both initialize and train
% the map using default parameters:

pause % Strike any key to use SOM_MAKE...

sMap = som_make(D);

% Here, the linear initialization is done again, so that
% the results can be compared.

sMap0 = som_lininit(D);

subplot(1,3,2)
som_grid(sMap0,'Coord',sMap0.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap0.codebook(:,3))
axis([0 1 0 1 0 1]), view(-120,-25), title('After initialization')

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap.codebook(:,3))
axis([0 1 0 1 0 1]), view(3), title('After training'), hold on

% Here you can see that the 2-dimensional map has folded into the
% 3-dimensional space in order to be able to capture the whole data
% space.

pause % Strike any key to evaluate the quality of maps...



clc
% BEST-MATCHING UNITS (BMU)
% =========================

% Before going to the quality, an important concept needs to be
% introduced: the Best-Matching Unit (BMU). The BMU of a data
% vector is the unit on the map whose model vector best resembles
% the data vector. In practice the similarity is measured as the
% minimum distance between the data vector and each model vector on
% the map. The BMUs can be calculated using the function SOM_BMUS,
% which returns the index of the unit.

% Here the BMU is searched for the origin point (from the
% trained map):

bmu = som_bmus(sMap,[0 0 0]);
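
% As a rough sketch of what SOM_BMUS computes internally (assuming
% Euclidean distance), the same index can be found manually by
% taking the minimum of the distances to all prototype vectors:

dist2 = sum((sMap.codebook - repmat([0 0 0],size(sMap.codebook,1),1)).^2,2);
[dummy,bmu_manual] = min(dist2);   % bmu_manual should equal bmu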

% Here the corresponding unit is shown in the figure. You can
% rotate the figure to see better where the BMU is.

co = sMap.codebook(bmu,:);
text(co(1),co(2),co(3),'BMU','Fontsize',20)
plot3([0 co(1)],[0 co(2)],[0 co(3)],'ro-')

pause % Strike any key to analyze map quality...



clc
% SELF-ORGANIZING MAP QUALITY
% ===========================

% The maps have two primary quality properties:
%  - data representation accuracy
%  - data set topology representation accuracy

% The former is usually measured using the average quantization
% error between data vectors and their BMUs on the map. For the
% latter several measures have been proposed, e.g. the topographic
% error measure: the percentage of data vectors for which the
% first and second BMUs are not adjacent units.
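%
% As a sketch (assuming Euclidean distance), the average
% quantization error alone could be computed manually as the mean
% distance from each data vector to its BMU prototype:
%
%   bmus = som_bmus(sMap,D);
%   qe   = mean(sqrt(sum((D - sMap.codebook(bmus,:)).^2,2)));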

% Both measures have been implemented in the SOM_QUALITY function.
% Here are the quality measures for the trained map:

[q,t] = som_quality(sMap,D)

% And here for the initial map:

[q0,t0] = som_quality(sMap0,D)

% As can be seen, by folding the SOM has reduced the average
% quantization error, but on the other hand the topology
% representation capability has suffered. By using a larger final
% neighborhood radius in the training, the map becomes stiffer and
% preserves the topology of the data set better.


echo off
