camir-aes2014: toolboxes/MIRtoolbox1.3.2/somtoolbox/som_demo2.m @ 0:e9a9cd732c1e tip
first hg version after svn
author | wolffd |
---|---|
date | Tue, 10 Feb 2015 15:05:51 +0000 |
%SOM_DEMO2 Basic usage of the SOM Toolbox.

% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/

% Version 1.0beta juuso 071197
% Version 2.0beta juuso 070200

clf reset;
figure(gcf)
echo on



clc
% ==========================================================
% SOM_DEMO2 - BASIC USAGE OF SOM TOOLBOX
% ==========================================================

% som_data_struct - Create a data struct.
% som_read_data - Read data from file.
%
% som_normalize - Normalize data.
% som_denormalize - Denormalize data.
%
% som_make - Initialize and train the map.
%
% som_show - Visualize map.
% som_show_add - Add markers on som_show visualization.
% som_grid - Visualization with free coordinates.
%
% som_autolabel - Give labels to map.
% som_hits - Calculate hit histogram for the map.

% BASIC USAGE OF THE SOM TOOLBOX

% The basic usage of the SOM Toolbox proceeds like this:
%   1. construct data set
%   2. normalize it
%   3. train the map
%   4. visualize map
%   5. analyse results

% If default options are used, each of the first four steps is a
% very simple operation that can be executed with a single
% command. For the last step, the Toolbox provides several
% different kinds of functions, but since analysis needs vary,
% there is no general default function or procedure.

pause % Strike any key to construct data...



clc
% STEP 1: CONSTRUCT DATA
% ======================

% The SOM Toolbox has a special struct, called the data struct,
% which groups information about the data set in one place.

% Here, a data struct is created using function SOM_DATA_STRUCT.
% The first argument is the data matrix itself, followed by the
% name of the data set and the names of the components
% (variables) in the data matrix.

D = rand(1000,3); % 1000 samples from unit cube
sData = som_data_struct(D,'name','unit cube','comp_names',{'x','y','z'});

% Another option is to read the data directly from an ASCII file.
% Here, the IRIS data set is loaded from a file (make sure the file
% 'iris.data' can be found in the current directory or on the
% MATLAB path). If it cannot be found, simulated data is used:

try,
  sDiris = som_read_data('iris.data');
catch
  echo off

  warning('File ''iris.data'' not found. Using simulated data instead.')

  D = randn(50,4);
  D(:,1) = D(:,1)+5;     D(:,2) = D(:,2)+3.5;
  D(:,3) = D(:,3)/2+1.5; D(:,4) = D(:,4)/2+0.3;
  D(D<=0) = 0.01;        % keep all measurements positive

  D2 = randn(100,4); D2(:,2) = sort(D2(:,2));
  D2(:,1) = D2(:,1)+6.5; D2(:,2) = D2(:,2)+2.8;
  D2(:,3) = D2(:,3)+5;   D2(:,4) = D2(:,4)/2+1.5;
  D2(D2<=0) = 0.01;

  sDiris = som_data_struct([D; D2],'name','iris (simulated)',...
                           'comp_names',{'SepalL','SepalW','PetalL','PetalW'});
  sDiris = som_label(sDiris,'add',(1:50)','Setosa');
  sDiris = som_label(sDiris,'add',(51:100)','Versicolor');
  sDiris = som_label(sDiris,'add',(101:150)','Virginica');

  echo on
end

% Here are the histograms (on the diagonal) and the pairwise
% scatter plots (above the diagonal) of the four variables.

echo off
k=1;
for i=1:4,
  for j=1:4,
    if i==j,
      subplot(4,4,k);
      hist(sDiris.data(:,i)); title(sDiris.comp_names{i})
    elseif i<j,
      subplot(4,4,k);
      plot(sDiris.data(:,i),sDiris.data(:,j),'k.')
      xlabel(sDiris.comp_names{i})
      ylabel(sDiris.comp_names{j})
    end
    k=k+1;
  end
end
echo on

% Actually, as you saw in SOM_DEMO1, most SOM Toolbox functions
% can also handle plain data matrices, but then one loses the
% convenience offered by component names, labels and
% denormalization operations.


pause % Strike any key to normalize the data...



clc
% STEP 2: DATA NORMALIZATION
% ==========================

% Since the SOM algorithm is based on Euclidean distances, the
% scale of the variables is very important in determining what the
% map will be like. If the range of values of some variable is much
% bigger than those of the other variables, that variable will
% probably dominate the map organization completely.

% For this reason, the components of the data set are usually
% normalized, for example so that each component has unit
% variance. This can be done with function SOM_NORMALIZE:

sDiris = som_normalize(sDiris,'var');
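
% A minimal sketch, in base MATLAB only, of what 'var'
% normalization is assumed to do: subtract each component's mean
% and divide by its standard deviation. The toy matrix Xtoy and the
% names muToy, sdToy, XtoyN, dRaw and dNorm are hypothetical and are
% not part of the Toolbox. The second column has a range about 1000
% times that of the first, so it dominates the raw distance.

Xtoy  = [1 1000; 2 3000; 3 2000; 4 4000];
dRaw  = sqrt(sum((Xtoy(1,:)-Xtoy(2,:)).^2))   % about 2000, almost all from column 2
muToy = mean(Xtoy);                            % per-component mean
sdToy = std(Xtoy);                             % per-component standard deviation
XtoyN = bsxfun(@rdivide, bsxfun(@minus, Xtoy, muToy), sdToy);
dNorm = sqrt(sum((XtoyN(1,:)-XtoyN(2,:)).^2))  % sqrt(3): both columns now contribute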

% SOM_NORMALIZE also provides other normalization methods.

% However, normalized values can be harder to interpret.
% Therefore, the normalization operations can be reversed with
% function SOM_DENORMALIZE:

x = sDiris.data(1,:)

orig_x = som_denormalize(x,sDiris)
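
% A self-contained sketch of why this works, assuming 'var'
% normalization is the affine map (x - mean)./std. As long as the
% mean and standard deviation are kept, multiplying by the standard
% deviation and adding back the mean recovers the original values.
% SOM_DENORMALIZE is given the data struct above, presumably so that
% it can read which normalization was applied. Ytoy, muY, sdY, YtoyN
% and Yback are hypothetical names.

Ytoy  = [5.1 3.5; 4.9 3.0; 6.2 2.9; 5.9 3.0];
muY   = mean(Ytoy);
sdY   = std(Ytoy);
YtoyN = bsxfun(@rdivide, bsxfun(@minus, Ytoy, muY), sdY);  % normalize
Yback = bsxfun(@plus, bsxfun(@times, YtoyN, sdY), muY)     % denormalize: equals Ytoy up to rounding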

pause % Strike any key to train the map...



clc
% STEP 3: MAP TRAINING
% ====================

% The function SOM_MAKE is used to train the SOM. By default, it
% first determines the map size, then initializes the map using
% linear initialization, and finally trains the map with the batch
% algorithm. SOM_DEMO1 has a more detailed description of the
% training process.

sMap = som_make(sDiris);
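
% A minimal batch-SOM sketch in base MATLAB, illustrating the kind
% of update a batch algorithm performs: every prototype becomes a
% neighborhood-weighted mean of the data,
%   m_i = sum_j h(i,c_j) x_j / sum_j h(i,c_j),
% where c_j is the BMU of sample x_j. These are assumptions, not
% taken from SOM_MAKE: a rectangular 4-by-3 grid (the Toolbox default
% lattice is hexagonal), initialization from the first data samples
% instead of linear initialization, a Gaussian neighborhood and a
% fixed shrinking radius. All names (Xb, Mb, gridB, ...) are
% hypothetical.

Xb = rand(200,2);                        % toy data in the unit square
[gr, gc] = ndgrid(1:4, 1:3);             % grid positions, units numbered column-wise
gridB = [gr(:) gc(:)];                   % 12 units x 2 grid coordinates
nU = size(gridB,1);
Mb = Xb(1:nU,:);                         % initial prototypes: first 12 samples
G2 = zeros(nU);                          % squared grid distances between units
for u = 1:nU
  G2(:,u) = sum(bsxfun(@minus, gridB, gridB(u,:)).^2, 2);
end
for sigma = linspace(2, 0.5, 10)         % shrinking neighborhood radius
  % squared Euclidean distance from every sample to every prototype
  Dx = bsxfun(@plus, sum(Xb.^2,2), sum(Mb.^2,2)') - 2*Xb*Mb';
  [~, bmu] = min(Dx, [], 2);             % BMU index of each sample
  H = exp(-G2 / (2*sigma^2));            % Gaussian neighborhood, nU x nU
  W = H(:, bmu);                         % W(i,j) = h(i, BMU of sample j)
  Mb = (W * Xb) ./ repmat(sum(W,2), 1, size(Xb,2));  % weighted mean of data
end
Mb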


pause % Strike any key to continue...

% The IRIS data set also has labels associated with the data
% samples. The data set consists of 50 samples from each of three
% species of Iris flowers (150 samples in total), and the
% measurements are the length and width of the sepals and petals.
% The label associated with each sample is the species:
% 'Setosa', 'Versicolor' or 'Virginica'.

% Now, the map can be labelled with these labels. The best-matching
% unit (BMU) of each sample is found from the map, and the species
% label is given to that map unit. Function SOM_AUTOLABEL can be
% used to do this:

sMap = som_autolabel(sMap,sDiris,'vote');
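
% A sketch of majority-vote labelling in base MATLAB. It assumes that
% 'vote' means each unit keeps the most frequent label among the
% samples for which it is the BMU. bmuV and labV are hypothetical toy
% inputs: the BMU index and species label of six samples on a
% 3-unit map.

bmuV = [1 1 2 2 2 3]';
labV = {'Setosa';'Setosa';'Versicolor';'Virginica';'Versicolor';'Virginica'};
[clsV, ~, labIdx] = unique(labV);                 % class names and integer codes
votes = accumarray([bmuV labIdx(:)], 1, [3 numel(clsV)]);  % units x classes counts
[nTop, winner] = max(votes, [], 2);               % most frequent class per unit
unitLabel = clsV(winner);
unitLabel(nTop == 0) = {''}                       % units without hits get no label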

pause % Strike any key to visualize the map...



clc
% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW
% =====================================================

% The basic visualization of the SOM is done with function SOM_SHOW.

colormap(1-gray)
som_show(sMap,'norm','d')

% Notice that the names of the components are used as the titles
% of the subplots. Notice also that the variable values have been
% denormalized to the original range and scale.

% The component planes ('PetalL', 'PetalW', 'SepalL' and 'SepalW')
% show what kind of values the prototype vectors of the map units
% have. The value is indicated with color, and the colorbar on the
% right shows what the colors mean.

% The 'U-matrix' shows distances between neighboring units and thus
% visualizes the cluster structure of the map. Note that the
% U-matrix visualization has many more hexagons than the component
% planes. This is because distances *between* map units are shown,
% and not only the distance values *at* the map units.

% High values on the U-matrix mean large distances between
% neighboring map units, and thus indicate cluster borders.
% Clusters are typically uniform areas of low values. Refer to the
% colorbar to see which colors mean high values. In the IRIS map,
% there appear to be two clusters.
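
% A simplified sketch of the distances behind such a visualization:
% for each unit, the average Euclidean distance from its prototype
% to the prototypes of its grid neighbors (a distance matrix, not
% the full U-matrix with its extra between-unit cells). It assumes a
% rectangular 2-by-3 grid with 4-neighbors, whereas the Toolbox
% default lattice is hexagonal. Pm is a hypothetical codebook: the
% first two columns of units form one group, the third lies far away.

nr = 2; nc = 3;
Pm = [0 0; 0 1; 1 0; 1 1; 10 0; 10 1];   % prototypes, units numbered column-wise
avgD = zeros(nr, nc);
for c = 1:nc
  for r = 1:nr
    me = sub2ind([nr nc], r, c);
    dSum = 0; nNb = 0;
    for dOff = [-1 0; 1 0; 0 -1; 0 1]'    % up, down, left, right
      rr = r + dOff(1); cc = c + dOff(2);
      if rr >= 1 && rr <= nr && cc >= 1 && cc <= nc
        nb = sub2ind([nr nc], rr, cc);
        dSum = dSum + norm(Pm(me,:) - Pm(nb,:));
        nNb = nNb + 1;
      end
    end
    avgD(r, c) = dSum / nNb;              % average distance to neighbors
  end
end
avgD                                      % low inside a group, high at the border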

pause % Strike any key to continue...

% The subplots are linked by position: a particular map unit is
% always in the same place in each axis. For example:

h=zeros(sMap.topol.msize); h(1,2) = 1;
som_show_add('hit',h(:),'markercolor','r','markersize',0.5,'subplot','all')

% The red marker is on top of the same unit in each axis.

pause % Strike any key to continue...



clf

clc

% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW_ADD
% =========================================================

% The SOM_SHOW_ADD function can be used to add markers, labels and
% trajectories on top of figures created with SOM_SHOW. The function
% SOM_SHOW_CLEAR can be used to clear them away.

% Here, the U-matrix is shown on the left, and an empty grid
% named 'Labels' is shown on the right.

som_show(sMap,'umat','all','empty','Labels')

pause % Strike any key to add labels...

% Here, the labels given to the map with function SOM_AUTOLABEL
% are shown on the empty grid.

som_show_add('label',sMap,'Textsize',8,'TextColor','r','Subplot',2)

pause % Strike any key to add hits...

% An important tool in data analysis using the SOM is the so-called
% hit histogram. It is formed by taking a data set, finding the BMU
% of each data sample from the map, and incrementing a counter in a
% map unit each time it is the BMU. The hit histogram shows the
% distribution of the data set on the map.

% Here, the hit histogram for the whole data set is calculated
% and visualized on the U-matrix.

h = som_hits(sMap,sDiris);
som_show_add('hit',h,'MarkerColor','w','Subplot',1)
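
% A sketch of the hit histogram as described above, in base MATLAB:
% find each sample's BMU by Euclidean distance, then count how often
% each unit is the BMU. Pk (prototypes) and Xk (samples) are
% hypothetical toy values, not the IRIS map.

Pk = [0 0; 1 0; 0 1];                     % 3 prototype vectors
Xk = [0.1 0; 0.9 0.2; 1.2 -0.1; 0 0.8; -0.2 0.1];
Dk = zeros(size(Xk,1), size(Pk,1));
for u = 1:size(Pk,1)
  Dk(:,u) = sum(bsxfun(@minus, Xk, Pk(u,:)).^2, 2);   % squared distances
end
[~, bmuK] = min(Dk, [], 2);               % BMU index of each sample
hitsK = accumarray(bmuK, 1, [size(Pk,1) 1])            % hits per unit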

pause % Strike any key to continue...

% Multiple hit histograms can be shown simultaneously. Here, three
% hit histograms corresponding to the three species of Iris
% flowers are calculated and shown.

% First, the old hit histogram is removed.

som_show_clear('hit',1)

% Then, the histograms are calculated. The first 50 samples in
% the data set are of the 'Setosa' species, the next 50 samples
% of the 'Versicolor' species and the last 50 samples of the
% 'Virginica' species.

h1 = som_hits(sMap,sDiris.data(1:50,:));
h2 = som_hits(sMap,sDiris.data(51:100,:));
h3 = som_hits(sMap,sDiris.data(101:150,:));

som_show_add('hit',[h1, h2, h3],'MarkerColor',[1 0 0; 0 1 0; 0 0 1],'Subplot',1)

% Red is used for 'Setosa', green for 'Versicolor' and blue for
% 'Virginica'. The three species are fairly well separated,
% although 'Versicolor' and 'Virginica' are slightly mixed.
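
% The per-species histograms can also be sketched as one
% units-by-classes count matrix, assuming each sample's BMU index and
% an integer class code are already known. bmuC and clsC are
% hypothetical toy values for seven samples on a 4-unit map; each
% column of hitsC plays the role of one of h1, h2 and h3 above.

bmuC  = [1 1 2 3 3 4 4]';                 % BMU of each sample
clsC  = [1 1 1 2 3 3 3]';                 % class code of each sample
hitsC = accumarray([bmuC clsC], 1, [4 3]) % row = unit, column = class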

pause % Strike any key to continue...



clf
clc

% STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_GRID
% =====================================================

% There is also another visualization function: SOM_GRID. It
% allows visualization of the SOM in freely specified coordinates,
% for example in the input space (of course, only up to 3D). This
% function has quite a lot of options and is quite flexible.

% Basically, SOM_GRID visualizes the SOM network: each unit is
% shown with a marker and connected to its neighbors with lines.
% The user has control over:
%  - the coordinates of each unit (2D or 3D)
%  - the marker type, color and size of each unit
%  - the line type, color and width of the connecting lines
% There are also some other options.

pause % Strike any key to see some visualizations...

% Here are four visualizations made with SOM_GRID:
%  - The map grid in the output space.

subplot(2,2,1)
som_grid(sMap,'Linecolor','k')
view(0,-90), title('Map grid')

%  - A surface plot of the distance matrix: both color and
%    z-coordinate indicate the average distance to neighboring
%    map units. This is closely related to the U-matrix.

subplot(2,2,2)
Co=som_unit_coords(sMap); U=som_umat(sMap); U=U(1:2:size(U,1),1:2:size(U,2));
som_grid(sMap,'Coord',[Co, U(:)],'Surf',U(:),'Marker','none');
view(-80,45), axis tight, title('Distance matrix')

%  - The map grid in the input space. The first three components
%    determine the 3D coordinates of the map units, and the size
%    of the marker is determined by the fourth component.
%    Note that the values have been denormalized.

subplot(2,2,3)
M = som_denormalize(sMap.codebook,sMap);
som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
view(-80,45), axis tight, title('Prototypes')

%  - Map grid as above, but the original data has also been
%    plotted: coordinates show the values of the first three
%    components and color indicates the species of each sample.
%    The fourth component is not shown.

subplot(2,2,4)
som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
hold on
D = som_denormalize(sDiris.data,sDiris);
plot3(D(1:50,1),D(1:50,2),D(1:50,3),'r.',...
      D(51:100,1),D(51:100,2),D(51:100,3),'g.',...
      D(101:150,1),D(101:150,2),D(101:150,3),'b.')
view(-72,64), axis tight, title('Prototypes and data')

pause % Strike any key to continue...

% STEP 5: ANALYSIS OF RESULTS
% ===========================

% What this step involves depends heavily on the purpose of the
% whole data analysis: is it segmentation, modeling, novelty
% detection, classification, or something else? For this reason,
% there is no single general-purpose analysis function, but rather
% a number of individual functions which may or may not prove
% useful in any specific case.

% Visualization is of course part of the analysis of results.
% Examination of labels and hit histograms is another part. Yet
% another is validation of the quality of the SOM (see the use of
% SOM_QUALITY in SOM_DEMO1). Here, qe is the average quantization
% error and te is the topographic error.

[qe,te] = som_quality(sMap,sDiris)
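
% A sketch of the two measures, assuming the usual definitions:
% qe is the mean distance from each sample to its BMU prototype, and
% te is the fraction of samples whose first and second BMUs are not
% adjacent on the map grid. For simplicity this assumes a rectangular
% grid with 4-neighbor adjacency (the Toolbox default lattice is
% hexagonal). Pq, gridQ and Xq are hypothetical; units 3 and 4 are
% deliberately "twisted" so that one sample causes a topographic
% error.

Pq    = [0 0; 1 0; 3 0; 2 0];             % prototypes of a 1-by-4 map
gridQ = [1 1; 1 2; 1 3; 1 4];             % (row, column) of each unit
Xq    = [0.1 0; 1.6 0; 2.6 0; 0.6 0];
Dq    = zeros(size(Xq,1), size(Pq,1));
for u = 1:size(Pq,1)
  Dq(:,u) = sqrt(sum(bsxfun(@minus, Xq, Pq(u,:)).^2, 2));  % Euclidean distances
end
[Ds, rankQ] = sort(Dq, 2);                % ascending distances per sample
qeSketch = mean(Ds(:,1))                  % average quantization error
gd = sum(abs(gridQ(rankQ(:,1),:) - gridQ(rankQ(:,2),:)), 2);  % grid distance, 1st vs 2nd BMU
teSketch = mean(gd > 1)                   % topographic error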

% People have contributed a number of functions to the Toolbox
% which can be used for the analysis. These include functions for
% vector projection, clustering, pdf estimation, modeling,
% classification, etc. However, how these tools are used is
% ultimately up to you.

% More about visualization is presented in SOM_DEMO3.
% More about data analysis is presented in SOM_DEMO4.

echo off
warning on