toolboxes/MIRtoolbox1.3.2/somtoolbox/som_demo1.m @ 0:e9a9cd732c1e (tip)
first hg version after svn

author: wolffd
date:   Tue, 10 Feb 2015 15:05:51 +0000
%SOM_DEMO1 Basic properties and behaviour of the Self-Organizing Map.

% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/

% Version 1.0beta juuso 071197
% Version 2.0beta juuso 030200

clf reset;
figure(gcf)
echo on


clc
% ==========================================================
% SOM_DEMO1 - BEHAVIOUR AND PROPERTIES OF SOM
% ==========================================================

% som_make       - Create, initialize and train a SOM.
% som_randinit   - Create and initialize a SOM.
% som_lininit    - Create and initialize a SOM.
% som_seqtrain   - Train a SOM.
% som_batchtrain - Train a SOM.
% som_bmus       - Find best-matching units (BMUs).
% som_quality    - Measure quality of SOM.

% SELF-ORGANIZING MAP (SOM):

% A self-organizing map (SOM) is a "map" of the training data,
% dense where there is a lot of data and thin where the data
% density is low.

% The map consists of neurons located on a regular map grid.
% The lattice of the grid can be either hexagonal or rectangular.

subplot(1,2,1)
som_cplane('hexa',[10 15],'none')
title('Hexagonal SOM grid')

subplot(1,2,2)
som_cplane('rect',[10 15],'none')
title('Rectangular SOM grid')

% Each neuron (hexagon on the left, rectangle on the right) has an
% associated prototype vector. After training, neighboring neurons
% have similar prototype vectors.

% The SOM can be used for data visualization, clustering (or
% classification), estimation and a variety of other purposes.

pause % Strike any key to continue...

clf
clc
% INITIALIZE AND TRAIN THE SELF-ORGANIZING MAP
% ============================================

% Here are 300 data points sampled from the unit square:

D = rand(300,2);

% The map will be a 2-dimensional grid of size 10 x 10.

msize = [10 10];

% SOM_RANDINIT and SOM_LININIT can be used to initialize the
% prototype vectors in the map. The map size is actually an
% optional argument. If omitted, it is determined automatically
% based on the number of data vectors and the principal
% eigenvectors of the data set. Below, the random initialization
% algorithm is used.

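% As a rough illustration of what random initialization does (this is
% a sketch of the idea, not the actual SOM_RANDINIT code): each
% prototype vector can be drawn uniformly from the range spanned by
% the data in each dimension.

```matlab
% Illustrative sketch of random initialization (not the toolbox code).
Dex = rand(300,2);                  % example data, as above
munits = 10*10;                     % number of units in a 10 x 10 map
mi = min(Dex,[],1);                 % per-dimension data minimum
ma = max(Dex,[],1);                 % per-dimension data maximum
codebook0 = repmat(mi,munits,1) + rand(munits,2) .* repmat(ma-mi,munits,1);
```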
sMap = som_randinit(D, 'msize', msize);

% Actually, each map unit can be thought of as having two sets
% of coordinates:
%  (1) in the input space:  the prototype vectors
%  (2) in the output space: the position on the map
% In the two spaces, the map looks like this:

subplot(1,3,1)
som_grid(sMap)
axis([0 11 0 11]), view(0,-90), title('Map in output space')

subplot(1,3,2)
plot(D(:,1),D(:,2),'+r'), hold on
som_grid(sMap,'Coord',sMap.codebook)
title('Map in input space')

% The black dots show the positions of map units, and the gray lines
% show connections between neighboring map units. Since the map
% was initialized randomly, the positions in the input space are
% completely disorganized. The red crosses are training data.

pause % Strike any key to train the SOM...

% During training, the map organizes and folds to the training
% data. Here, the sequential training algorithm is used:

sMap = som_seqtrain(sMap,D,'radius',[5 1],'trainlen',10);

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r')
title('Trained map')

pause % Strike any key to view the training process more closely...


clf

clc
% TRAINING THE SELF-ORGANIZING MAP
% ================================

% To get a better idea of what happens during training, let's look
% at how the map gradually unfolds and organizes itself. To make
% this even clearer, the map is now initialized so that it starts
% away from the data.

sMap = som_randinit(D,'msize',msize);
sMap.codebook = sMap.codebook + 1;

subplot(1,2,1)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r'), hold off
title('Data and original map')

% The training is based on two principles:
%
%  Competitive learning: the prototype vector most similar to a
%  data vector is modified so that it is even more similar to
%  it. This way the map learns the position of the data cloud.
%
%  Cooperative learning: not only the most similar prototype
%  vector, but also its neighbors on the map are moved towards the
%  data vector. This way the map self-organizes.

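% The two principles above can be sketched as a single sequential
% update step (an illustration of the idea, not the SOM_SEQTRAIN
% internals): the BMU and, to a lesser degree, its map neighbors are
% pulled towards the data vector x by m = m + alpha*h*(x - m), where
% h is a neighborhood function that decreases with map distance from
% the BMU.

```matlab
% One illustrative sequential update step on a toy 3 x 3 map
% (a sketch of the principle, not the toolbox implementation).
x = [0.2 0.7];                                 % one data vector
M = rand(9,2);                                 % prototype vectors, one per row
[gx,gy] = meshgrid(1:3,1:3);
G = [gx(:) gy(:)];                             % map-grid coordinates of units
[~,b] = min(sum((M - repmat(x,9,1)).^2, 2));   % competitive step: find the BMU
alpha = 0.1;                                   % learning rate
sigma = 1;                                     % neighborhood radius
d2 = sum((G - repmat(G(b,:),9,1)).^2, 2);      % squared map distance to the BMU
h = exp(-d2/(2*sigma^2));                      % Gaussian neighborhood function
Mnew = M + alpha*repmat(h,1,2).*(repmat(x,9,1) - M);   % cooperative update
```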
pause % Strike any key to train the map...

echo off
subplot(1,2,2)
o = ones(5,1);
r = (1-[1:60]/60);
for i=1:60,
  sMap = som_seqtrain(sMap,D,'tracking',0,...
                      'trainlen',5,'samples',...
                      'alpha',0.1*o,'radius',(4*r(i)+1)*o);
  som_grid(sMap,'Coord',sMap.codebook)
  hold on, plot(D(:,1),D(:,2),'+r'), hold off
  title(sprintf('%d/300 training steps',5*i))
  drawnow
end
title('Sequential training after 300 steps')
echo on

pause % Strike any key to continue with 3D data...

clf

clc
% TRAINING DATA: THE UNIT CUBE
% ============================

% Above, the map dimension was equal to the input space dimension:
% both were 2-dimensional. Typically, the input space dimension is
% much higher than the 2-dimensional map. In this case the map can
% no longer follow the data set perfectly, but must find a balance
% between two goals:

%  - data representation accuracy
%  - data set topology representation accuracy

% Here are 500 data points sampled from the unit cube:

D = rand(500,3);

subplot(1,3,1), plot3(D(:,1),D(:,2),D(:,3),'+r')
view(3), axis on, rotate3d on
title('Data')

% The ROTATE3D command enables you to rotate the picture by
% dragging over it with the left mouse button pressed down.

pause % Strike any key to train the SOM...


clc
% DEFAULT TRAINING PROCEDURE
% ==========================

% Above, the initialization was done randomly and training was done
% with the sequential training function (SOM_SEQTRAIN). By default,
% the initialization is linear and the batch training algorithm is
% used. In addition, the training is done in two phases: first with
% a large neighborhood radius, and then finetuning with a small
% radius.

% The function SOM_MAKE can be used to both initialize and train
% the map using default parameters:

pause % Strike any key to use SOM_MAKE...

sMap = som_make(D);

% Here, the linear initialization is done again, so that
% the results can be compared.

sMap0 = som_lininit(D);

subplot(1,3,2)
som_grid(sMap0,'Coord',sMap0.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap0.codebook(:,3))
axis([0 1 0 1 0 1]), view(-120,-25), title('After initialization')

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap.codebook(:,3))
axis([0 1 0 1 0 1]), view(3), title('After training'), hold on

% Here you can see that the 2-dimensional map has folded into the
% 3-dimensional space in order to capture the whole data space.

pause % Strike any key to evaluate the quality of maps...


clc
% BEST-MATCHING UNITS (BMU)
% =========================

% Before turning to quality, an important concept needs to be
% introduced: the Best-Matching Unit (BMU). The BMU of a data
% vector is the unit on the map whose model vector best resembles
% the data vector. In practice, similarity is measured as the
% distance between the data vector and each model vector on the
% map: the BMU is the unit with the minimum distance. BMUs can be
% calculated with the function SOM_BMUS, which returns the index
% of the unit.

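% What SOM_BMUS computes can be sketched by hand (an illustration
% of the concept, assuming Euclidean distance):

```matlab
% Illustrative BMU search by hand (what SOM_BMUS does conceptually).
codebook = [0 0 0; 0.5 0.5 0.5; 1 1 1];        % toy codebook, one unit per row
x = [0.1 0.2 0.1];                              % query vector
dist2 = sum((codebook - repmat(x,3,1)).^2, 2);  % squared distance to each unit
[~,bmu_by_hand] = min(dist2);                   % index of the best-matching unit
```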
% Here the BMU is searched for the origin point (from the
% trained map):

bmu = som_bmus(sMap,[0 0 0]);

% Here the corresponding unit is shown in the figure. You can
% rotate the figure to see better where the BMU is.

co = sMap.codebook(bmu,:);
text(co(1),co(2),co(3),'BMU','Fontsize',20)
plot3([0 co(1)],[0 co(2)],[0 co(3)],'ro-')

pause % Strike any key to analyze map quality...


clc
% SELF-ORGANIZING MAP QUALITY
% ===========================

% The maps have two primary quality properties:
%  - data representation accuracy
%  - data set topology representation accuracy

% The former is usually measured using the average quantization
% error between data vectors and their BMUs on the map. For the
% latter, several measures have been proposed, e.g. the topographic
% error measure: the percentage of data vectors for which the
% first- and second-BMUs are not adjacent units.

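% The average quantization error described above can be computed by
% hand as a sketch (an illustration of the measure, not the
% SOM_QUALITY implementation): for each data vector, take the
% distance to its BMU, then average over the data set.

```matlab
% Illustrative quantization error on toy data (what the first output
% of SOM_QUALITY measures conceptually).
codebook = [0 0; 1 1];            % toy 2-unit codebook
Dq = [0 0; 0.1 0; 1 1];           % toy data set, one vector per row
qe = 0;
for j = 1:size(Dq,1)
  d = sqrt(sum((codebook - repmat(Dq(j,:),2,1)).^2, 2));  % distance to each unit
  qe = qe + min(d);               % distance to the BMU of this vector
end
qe = qe / size(Dq,1);             % average quantization error
```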
% Both measures have been implemented in the SOM_QUALITY function.
% Here are the quality measures for the trained map:

[q,t] = som_quality(sMap,D)

% And here for the initial map:

[q0,t0] = som_quality(sMap0,D)

% As can be seen, by folding, the SOM has reduced the average
% quantization error, but on the other hand the topology
% representation capability has suffered. By using a larger final
% neighborhood radius in the training, the map becomes stiffer and
% preserves the topology of the data set better.


echo off