annotate toolboxes/MIRtoolbox1.3.2/somtoolbox/som_demo2.m @ 0:e9a9cd732c1e tip

first hg version after svn
author wolffd
date Tue, 10 Feb 2015 15:05:51 +0000
rev   line source
wolffd@0 1
wolffd@0 2 %SOM_DEMO2 Basic usage of the SOM Toolbox.
wolffd@0 3
wolffd@0 4 % Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
wolffd@0 5 % http://www.cis.hut.fi/projects/somtoolbox/
wolffd@0 6
wolffd@0 7 % Version 1.0beta juuso 071197
wolffd@0 8 % Version 2.0beta juuso 070200
wolffd@0 9
wolffd@0 10 clf reset;
wolffd@0 11 figure(gcf)
wolffd@0 12 echo on
wolffd@0 13
wolffd@0 14
wolffd@0 15
wolffd@0 16 clc
wolffd@0 17 % ==========================================================
wolffd@0 18 % SOM_DEMO2 - BASIC USAGE OF SOM TOOLBOX
wolffd@0 19 % ==========================================================
wolffd@0 20
wolffd@0 21 % som_data_struct - Create a data struct.
wolffd@0 22 % som_read_data - Read data from file.
wolffd@0 23 %
wolffd@0 24 % som_normalize - Normalize data.
wolffd@0 25 % som_denormalize - Denormalize data.
wolffd@0 26 %
wolffd@0 27 % som_make - Initialize and train the map.
wolffd@0 28 %
wolffd@0 29 % som_show - Visualize map.
wolffd@0 30 % som_show_add - Add markers on som_show visualization.
wolffd@0 31 % som_grid - Visualization with free coordinates.
wolffd@0 32 %
wolffd@0 33 % som_autolabel - Give labels to map.
wolffd@0 34 % som_hits - Calculate hit histogram for the map.
wolffd@0 35
wolffd@0 36 % BASIC USAGE OF THE SOM TOOLBOX
wolffd@0 37
wolffd@0 38 % The basic usage of the SOM Toolbox proceeds like this:
wolffd@0 39 % 1. construct data set
wolffd@0 40 % 2. normalize it
wolffd@0 41 % 3. train the map
wolffd@0 42 % 4. visualize map
wolffd@0 43 % 5. analyse results
wolffd@0 44
wolffd@0 45 % The first four items are - if default options are used - very
wolffd@0 46 % simple operations, each executable with a single command. For
wolffd@0 47 % the last, several different kinds of functions are provided in
wolffd@0 48 % the Toolbox, but as the needs of analysis vary, a general default
wolffd@0 49 % function or procedure does not exist.
wolffd@0 50
wolffd@0 51 pause % Strike any key to construct data...
wolffd@0 52
wolffd@0 53
wolffd@0 54
wolffd@0 55 clc
wolffd@0 56 % STEP 1: CONSTRUCT DATA
wolffd@0 57 % ======================
wolffd@0 58
wolffd@0 59 % The SOM Toolbox has a special struct, called data struct, which
wolffd@0 60 % is used to group information regarding the data set in one
wolffd@0 61 % place.
wolffd@0 62
wolffd@0 63 % Here, a data struct is created using function SOM_DATA_STRUCT.
wolffd@0 64 % The first argument is the data matrix itself, followed by the
wolffd@0 65 % name given to the data set and the names of the components
wolffd@0 66 % (variables) in the data matrix.
wolffd@0 67
wolffd@0 68 D = rand(1000,3); % 1000 samples from unit cube
wolffd@0 69 sData = som_data_struct(D,'name','unit cube','comp_names',{'x','y','z'});
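
% As a rough illustration of what grouping the information "in one
% place" means, the sketch below builds a plain MATLAB struct by hand.
% It is a simplified stand-in, not the Toolbox's data struct: the field
% names used here are assumptions chosen for this example, and the
% struct returned by SOM_DATA_STRUCT may be organized differently.

```matlab
D = rand(1000,3);                  % 1000 samples from unit cube

s = struct();                      % start from an empty struct
s.name = 'unit cube';              % name of the data set
s.data = D;                        % one sample per row
s.comp_names = {'x','y','z'};      % one name per column of s.data
s.labels = cell(size(D,1),1);      % one (initially empty) label per sample

% Fields are assigned one at a time on purpose: passing a cell array
% directly to struct() would create a struct array, not a single struct.
nSamples = size(s.data,1);
nComps = size(s.data,2);
```

% Keeping the names next to the matrix is what lets later steps title
% plots by component name without extra bookkeeping.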
wolffd@0 70
wolffd@0 71 % Another option is to read the data directly from an ASCII file.
wolffd@0 72 % Here, the IRIS data set is loaded from a file (please make sure
wolffd@0 73 % the file can be found from the current path):
wolffd@0 74
wolffd@0 75 try,
wolffd@0 76 sDiris = som_read_data('iris.data');
wolffd@0 77 catch
wolffd@0 78 echo off
wolffd@0 79
wolffd@0 80 warning('File ''iris.data'' not found. Using simulated data instead.')
wolffd@0 81
wolffd@0 82 D = randn(50,4);
wolffd@0 83 D(:,1) = D(:,1)+5; D(:,2) = D(:,2)+3.5;
wolffd@0 84 D(:,3) = D(:,3)/2+1.5; D(:,4) = D(:,4)/2+0.3;
wolffd@0 85 D(D<=0) = 0.01;
wolffd@0 86
wolffd@0 87 D2 = randn(100,4); D2(:,2) = sort(D2(:,2));
wolffd@0 88 D2(:,1) = D2(:,1)+6.5; D2(:,2) = D2(:,2)+2.8;
wolffd@0 89 D2(:,3) = D2(:,3)+5; D2(:,4) = D2(:,4)/2+1.5;
wolffd@0 90 D2(D2<=0) = 0.01;
wolffd@0 91
wolffd@0 92 sDiris = som_data_struct([D; D2],'name','iris (simulated)',...
wolffd@0 93 'comp_names',{'SepalL','SepalW','PetalL','PetalW'});
wolffd@0 94 sDiris = som_label(sDiris,'add',[1:50]','Setosa');
wolffd@0 95 sDiris = som_label(sDiris,'add',[51:100]','Versicolor');
wolffd@0 96 sDiris = som_label(sDiris,'add',[101:150]','Virginica');
wolffd@0 97
wolffd@0 98 echo on
wolffd@0 99 end
wolffd@0 100
wolffd@0 101 % Here are the histograms and scatter plots of the four variables.
wolffd@0 102
wolffd@0 103 echo off
wolffd@0 104 k=1;
wolffd@0 105 for i=1:4,
wolffd@0 106 for j=1:4,
wolffd@0 107 if i==j,
wolffd@0 108 subplot(4,4,k);
wolffd@0 109 hist(sDiris.data(:,i)); title(sDiris.comp_names{i})
wolffd@0 110 elseif i<j,
wolffd@0 111 subplot(4,4,k);
wolffd@0 112 plot(sDiris.data(:,i),sDiris.data(:,j),'k.')
wolffd@0 113 xlabel(sDiris.comp_names{i})
wolffd@0 114 ylabel(sDiris.comp_names{j})
wolffd@0 115 end
wolffd@0 116 k=k+1;
wolffd@0 117 end
wolffd@0 118 end
wolffd@0 119 echo on
wolffd@0 120
wolffd@0 121 % Actually, as you saw in SOM_DEMO1, most SOM Toolbox functions
wolffd@0 122 % can also handle plain data matrices, but then one is without the
wolffd@0 123 % convenience offered by component names, labels and
wolffd@0 124 % denormalization operations.
wolffd@0 125
wolffd@0 126
wolffd@0 127 pause % Strike any key to normalize the data...
wolffd@0 128
wolffd@0 129
wolffd@0 130
wolffd@0 131
wolffd@0 132
wolffd@0 133 clc
wolffd@0 134 % STEP 2: DATA NORMALIZATION
wolffd@0 135 % ==========================
wolffd@0 136
wolffd@0 137 % Since the SOM algorithm is based on Euclidean distances, the scale
wolffd@0 138 % of the variables is very important in determining what the map will
wolffd@0 139 % be like. If the range of values of some variable is much bigger
wolffd@0 140 % than that of the other variables, that variable will probably
wolffd@0 141 % dominate the map organization completely.
wolffd@0 142
wolffd@0 143 % For this reason, the components of the data set are usually
wolffd@0 144 % normalized, for example so that each component has unit
wolffd@0 145 % variance. This can be done with function SOM_NORMALIZE:
wolffd@0 146
wolffd@0 147 sDiris = som_normalize(sDiris,'var');
wolffd@0 148
wolffd@0 149 % The function also offers other normalization methods.
wolffd@0 150
wolffd@0 151 % However, interpreting the values may be harder when they have
wolffd@0 152 % been normalized. Therefore, the normalization operations can be
wolffd@0 153 % reversed with function SOM_DENORMALIZE:
wolffd@0 154
wolffd@0 155 x = sDiris.data(1,:)
wolffd@0 156
wolffd@0 157 orig_x = som_denormalize(x,sDiris)
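
% The normalize/denormalize round trip can be sketched in plain MATLAB.
% This is a minimal illustration, assuming that unit-variance
% normalization subtracts the mean of each component and divides by its
% standard deviation; it is not the Toolbox code, and the data matrix
% below is made up for the example.

```matlab
D = [5.1 3.5; 4.9 3.0; 6.2 2.9; 5.9 3.2];   % made-up 4-by-2 data matrix
n = size(D,1);

mu = mean(D,1);                  % per-component mean
sd = std(D,0,1);                 % per-component standard deviation

% Normalize: zero mean and unit variance in every column.
Dn = (D - repmat(mu,n,1)) ./ repmat(sd,n,1);

% Denormalize: undo the scaling first, then the shift.
Dback = Dn .* repmat(sd,n,1) + repmat(mu,n,1);

maxErr = max(abs(Dback(:) - D(:)));   % round-trip error
```

% The operation is reversible only because mu and sd are kept; storing
% them alongside the data is what makes denormalization possible later.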
wolffd@0 158
wolffd@0 159 pause % Strike any key to train the map...
wolffd@0 160
wolffd@0 161
wolffd@0 162
wolffd@0 163
wolffd@0 164
wolffd@0 165 clc
wolffd@0 166 % STEP 3: MAP TRAINING
wolffd@0 167 % ====================
wolffd@0 168
wolffd@0 169 % The function SOM_MAKE is used to train the SOM. By default, it
wolffd@0 170 % first determines the map size, then initializes the map using
wolffd@0 171 % linear initialization, and finally uses the batch algorithm to train
wolffd@0 172 % the map. Function SOM_DEMO1 has a more detailed description of
wolffd@0 173 % the training process.
wolffd@0 174
wolffd@0 175 sMap = som_make(sDiris);
wolffd@0 176
wolffd@0 177
wolffd@0 178 pause % Strike any key to continue...
wolffd@0 179
wolffd@0 180 % The IRIS data set also has labels associated with the data
wolffd@0 181 % samples. The data set consists of 50 samples from each of three
wolffd@0 182 % species of Iris flowers (a total of 150 samples), and the
wolffd@0 183 % measurements are the length and width of the sepals and petals. The
wolffd@0 184 % label associated with each sample is the species information:
wolffd@0 185 % 'Setosa', 'Versicolor' or 'Virginica'.
wolffd@0 186
wolffd@0 187 % Now, the map can be labelled with these labels. The best
wolffd@0 188 % matching unit of each sample is found from the map, and the
wolffd@0 189 % species label is given to the map unit. Function SOM_AUTOLABEL
wolffd@0 190 % can be used to do this:
wolffd@0 191
wolffd@0 192 sMap = som_autolabel(sMap,sDiris,'vote');
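
% The labelling idea described above (find the best matching unit of
% each sample, then let the samples vote) can be sketched without the
% Toolbox. This is an illustration under assumptions, not the
% SOM_AUTOLABEL implementation: the tiny codebook M, the samples X and
% the labels are all hypothetical, and ties are broken in favor of the
% alphabetically first label.

```matlab
M = [0 0; 1 1];                               % hypothetical codebook: 2 units
X = [0.1 0; 0 0.2; 0.9 1; 1 1.1; 0.2 0.1];    % 5 data samples
labels = {'a','a','b','b','b'};               % one label per sample

nM = size(M,1); nX = size(X,1);

% Find the best matching unit of every sample (squared Euclidean distance).
bmu = zeros(nX,1);
for i = 1:nX
  d = sum((M - repmat(X(i,:),nM,1)).^2, 2);
  [dmin, bmu(i)] = min(d);
end

% Give each unit the most frequent label among the samples it won.
unitLabel = cell(nM,1);
for u = 1:nM
  won = labels(bmu == u);
  if isempty(won)
    unitLabel{u} = '';                        % no hits, so no label
  else
    [names, first, idx] = unique(won);        % idx maps labels to names
    counts = accumarray(idx(:), 1);           % votes per distinct label
    [cmax, best] = max(counts);
    unitLabel{u} = names{best};
  end
end
```

% Here the last sample carries label 'b' but lands on the first unit,
% where it is outvoted by the two 'a' samples.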
wolffd@0 193
wolffd@0 194 pause % Strike any key to visualize the map...
wolffd@0 195
wolffd@0 196
wolffd@0 197
wolffd@0 198
wolffd@0 199
wolffd@0 200 clc
wolffd@0 201 % STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW
wolffd@0 202 % =====================================================
wolffd@0 203
wolffd@0 204 % The basic visualization of the SOM is done with function SOM_SHOW.
wolffd@0 205
wolffd@0 206 colormap(1-gray)
wolffd@0 207 som_show(sMap,'norm','d')
wolffd@0 208
wolffd@0 209 % Notice that the names of the components are included as the
wolffd@0 210 % titles of the subplots. Notice also that the variable values
wolffd@0 211 % have been denormalized to the original range and scale.
wolffd@0 212
wolffd@0 213 % The component planes ('PetalL', 'PetalW', 'SepalL' and 'SepalW')
wolffd@0 214 % show what kind of values the prototype vectors of the map units
wolffd@0 215 % have. The value is indicated with color, and the colorbar on the
wolffd@0 216 % right shows what the colors mean.
wolffd@0 217
wolffd@0 218 % The 'U-matrix' shows distances between neighboring units and thus
wolffd@0 219 % visualizes the cluster structure of the map. Note that the
wolffd@0 220 % U-matrix visualization has many more hexagons than the
wolffd@0 221 % component planes. This is because distances *between* map units
wolffd@0 222 % are shown, and not only the distance values *at* the map units.
wolffd@0 223
wolffd@0 224 % High values on the U-matrix mean large distance between
wolffd@0 225 % neighboring map units, and thus indicate cluster
wolffd@0 226 % borders. Clusters are typically uniform areas of low
wolffd@0 227 % values. Refer to colorbar to see which colors mean high
wolffd@0 228 % values. In the IRIS map, there appear to be two clusters.
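
% Why the U-matrix has more cells than the map has units is easiest to
% see on a one-dimensional chain of units. The sketch below is a
% simplified illustration, not the Toolbox's U-matrix computation: the
% prototype values are made up, and using the mean of the neighboring
% distances as the value *at* a unit is an assumption of this example.

```matlab
M = [0; 0.1; 0.2; 2.0; 2.1];       % made-up prototypes of a 5-unit chain
n = size(M,1);

U = zeros(2*n-1,1);                % n cells at units + (n-1) cells between

% Even positions: distance *between* neighboring units.
U(2:2:end) = sqrt(sum(diff(M,1,1).^2, 2));

% Odd positions: value *at* each unit, here the mean of the distances
% to its one or two neighbors.
for i = 1:n
  nb = [];
  if i > 1, nb(end+1) = U(2*i-2); end
  if i < n, nb(end+1) = U(2*i);   end
  U(2*i-1) = mean(nb);
end

[umax, pos] = max(U);              % highest value marks the cluster border
```

% The largest value sits between the third and fourth units, which is
% where the two groups of prototypes ([0 0.1 0.2] and [2.0 2.1]) meet.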
wolffd@0 229
wolffd@0 230 pause % Strike any key to continue...
wolffd@0 231
wolffd@0 232 % The subplots are linked together through similar position. In
wolffd@0 233 % each axis, a particular map unit is always in the same place. For
wolffd@0 234 % example:
wolffd@0 235
wolffd@0 236 h=zeros(sMap.topol.msize); h(1,2) = 1;
wolffd@0 237 som_show_add('hit',h(:),'markercolor','r','markersize',0.5,'subplot','all')
wolffd@0 238
wolffd@0 239 % the red marker is on top of the same unit on each axis.
wolffd@0 240
wolffd@0 241 pause % Strike any key to continue...
wolffd@0 242
wolffd@0 243
wolffd@0 244
wolffd@0 245 clf
wolffd@0 246
wolffd@0 247 clc
wolffd@0 248
wolffd@0 249 % STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW_ADD
wolffd@0 250 % =========================================================
wolffd@0 251
wolffd@0 252 % The SOM_SHOW_ADD function can be used to add markers, labels and
wolffd@0 253 % trajectories on top of SOM_SHOW created figures. The function
wolffd@0 254 % SOM_SHOW_CLEAR can be used to clear them away.
wolffd@0 255
wolffd@0 256 % Here, the U-matrix is shown on the left, and an empty grid
wolffd@0 257 % named 'Labels' is shown on the right.
wolffd@0 258
wolffd@0 259 som_show(sMap,'umat','all','empty','Labels')
wolffd@0 260
wolffd@0 261 pause % Strike any key to add labels...
wolffd@0 262
wolffd@0 263 % Here, the labels added to the map with SOM_AUTOLABEL function
wolffd@0 264 % are shown on the empty grid.
wolffd@0 265
wolffd@0 266 som_show_add('label',sMap,'Textsize',8,'TextColor','r','Subplot',2)
wolffd@0 267
wolffd@0 268 pause % Strike any key to add hits...
wolffd@0 269
wolffd@0 270 % An important tool in data analysis using the SOM is the so-called
wolffd@0 271 % hit histogram. It is formed by taking a data set, finding the
wolffd@0 272 % best matching unit (BMU) of each data sample from the map, and
wolffd@0 273 % incrementing a counter in a map unit each time it is the BMU. The
wolffd@0 274 % hit histogram shows the distribution of the data set on the map.
wolffd@0 275
wolffd@0 276 % Here, the hit histogram for the whole data set is calculated
wolffd@0 277 % and visualized on the U-matrix.
wolffd@0 278
wolffd@0 279 h = som_hits(sMap,sDiris);
wolffd@0 280 som_show_add('hit',h,'MarkerColor','w','Subplot',1)
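
% The counting behind a hit histogram can be sketched in a few lines of
% plain MATLAB. This is a minimal illustration with a made-up codebook
% and made-up samples, not the SOM_HITS implementation.

```matlab
M = [0 0; 1 0; 0 1];                              % hypothetical 3-unit codebook
X = [0.1 0.1; 0.9 0.1; 1.1 -0.1; 0 0.8; 0.2 0];   % 5 data samples

nM = size(M,1); nX = size(X,1);

% Squared distance from every sample (rows) to every unit (columns).
d2 = zeros(nX,nM);
for u = 1:nM
  d2(:,u) = sum((X - repmat(M(u,:),nX,1)).^2, 2);
end

[dmin, bmu] = min(d2, [], 2);          % BMU index of each sample
hits = accumarray(bmu, 1, [nM 1]);     % one counter per map unit
```

% The counters always sum to the number of samples, so a histogram
% computed for a subset of the data (one species, say) can be compared
% directly against the histogram of another subset.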
wolffd@0 281
wolffd@0 282 pause % Strike any key to continue...
wolffd@0 283
wolffd@0 284 % Multiple hit histograms can be shown simultaneously. Here, three
wolffd@0 285 % hit histograms corresponding to the three species of Iris
wolffd@0 286 % flowers are calculated and shown.
wolffd@0 287
wolffd@0 288 % First, the old hit histogram is removed.
wolffd@0 289
wolffd@0 290 som_show_clear('hit',1)
wolffd@0 291
wolffd@0 292 % Then, the histograms are calculated. The first 50 samples in
wolffd@0 293 % the data set are of the 'Setosa' species, the next 50 samples
wolffd@0 294 % of the 'Versicolor' species and the last 50 samples of the
wolffd@0 295 % 'Virginica' species.
wolffd@0 296
wolffd@0 297 h1 = som_hits(sMap,sDiris.data(1:50,:));
wolffd@0 298 h2 = som_hits(sMap,sDiris.data(51:100,:));
wolffd@0 299 h3 = som_hits(sMap,sDiris.data(101:150,:));
wolffd@0 300
wolffd@0 301 som_show_add('hit',[h1, h2, h3],'MarkerColor',[1 0 0; 0 1 0; 0 0 1],'Subplot',1)
wolffd@0 302
wolffd@0 303 % Red color is for 'Setosa', green for 'Versicolor' and blue for
wolffd@0 304 % 'Virginica'. One can see that the three species are pretty well
wolffd@0 305 % separated, although 'Versicolor' and 'Virginica' are slightly
wolffd@0 306 % mixed up.
wolffd@0 307
wolffd@0 308 pause % Strike any key to continue...
wolffd@0 309
wolffd@0 310
wolffd@0 311
wolffd@0 312 clf
wolffd@0 313 clc
wolffd@0 314
wolffd@0 315 % STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_GRID
wolffd@0 316 % =====================================================
wolffd@0 317
wolffd@0 318 % There's also another visualization function: SOM_GRID. This
wolffd@0 319 % allows visualization of the SOM in freely specified coordinates,
wolffd@0 320 % for example the input space (of course, only up to 3D space). This
wolffd@0 321 % function has quite a lot of options, and is pretty flexible.
wolffd@0 322
wolffd@0 323 % Basically, SOM_GRID visualizes the SOM network: each unit is
wolffd@0 324 % shown with a marker and connected to its neighbors with lines.
wolffd@0 325 % The user has control over:
wolffd@0 326 % - the coordinate of each unit (2D or 3D)
wolffd@0 327 % - the marker type, color and size of each unit
wolffd@0 328 % - the linetype, color and width of the connecting lines
wolffd@0 329 % There are also some other options.
wolffd@0 330
wolffd@0 331 pause % Strike any key to see some visualizations...
wolffd@0 332
wolffd@0 333 % Here are four visualizations made with SOM_GRID:
wolffd@0 334 % - The map grid in the output space.
wolffd@0 335
wolffd@0 336 subplot(2,2,1)
wolffd@0 337 som_grid(sMap,'Linecolor','k')
wolffd@0 338 view(0,-90), title('Map grid')
wolffd@0 339
wolffd@0 340 % - A surface plot of distance matrix: both color and
wolffd@0 341 % z-coordinate indicate average distance to neighboring
wolffd@0 342 % map units. This is closely related to the U-matrix.
wolffd@0 343
wolffd@0 344 subplot(2,2,2)
wolffd@0 345 Co=som_unit_coords(sMap); U=som_umat(sMap); U=U(1:2:size(U,1),1:2:size(U,2));
wolffd@0 346 som_grid(sMap,'Coord',[Co, U(:)],'Surf',U(:),'Marker','none');
wolffd@0 347 view(-80,45), axis tight, title('Distance matrix')
wolffd@0 348
wolffd@0 349 % - The map grid in the input space. The first three components
wolffd@0 350 % determine the 3D-coordinates of the map unit, and the size
wolffd@0 351 % of the marker is determined by the fourth component.
wolffd@0 352 % Note that the values have been denormalized.
wolffd@0 353
wolffd@0 354 subplot(2,2,3)
wolffd@0 355 M = som_denormalize(sMap.codebook,sMap);
wolffd@0 356 som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
wolffd@0 357 view(-80,45), axis tight, title('Prototypes')
wolffd@0 358
wolffd@0 359 % - Map grid as above, but the original data is also plotted:
wolffd@0 360 % coordinates show the values of the first three components
wolffd@0 361 % and color indicates the species of each sample. The fourth
wolffd@0 362 % component is not shown.
wolffd@0 363
wolffd@0 364 subplot(2,2,4)
wolffd@0 365 som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
wolffd@0 366 hold on
wolffd@0 367 D = som_denormalize(sDiris.data,sDiris);
wolffd@0 368 plot3(D(1:50,1),D(1:50,2),D(1:50,3),'r.',...
wolffd@0 369 D(51:100,1),D(51:100,2),D(51:100,3),'g.',...
wolffd@0 370 D(101:150,1),D(101:150,2),D(101:150,3),'b.')
wolffd@0 371 view(-72,64), axis tight, title('Prototypes and data')
wolffd@0 372
wolffd@0 373 pause % Strike any key to continue...
wolffd@0 374
wolffd@0 375 % STEP 5: ANALYSIS OF RESULTS
wolffd@0 376 % ===========================
wolffd@0 377
wolffd@0 378 % The purpose of this step depends heavily on the purpose of the
wolffd@0 379 % whole data analysis: is it segmentation, modeling, novelty
wolffd@0 380 % detection, classification, or something else? For this reason,
wolffd@0 381 % there is not a single general-purpose analysis function, but
wolffd@0 382 % a number of individual functions which may, or may not, prove
wolffd@0 383 % useful in any specific case.
wolffd@0 384
wolffd@0 385 % Visualization is of course part of the analysis of
wolffd@0 386 % results. Examination of labels and hit histograms is another
wolffd@0 387 % part. Yet another is validation of the quality of the SOM (see
wolffd@0 388 % the use of SOM_QUALITY in SOM_DEMO1).
wolffd@0 389
wolffd@0 390 [qe,te] = som_quality(sMap,sDiris)
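
% The two quality measures can be sketched by hand on a tiny example.
% The sketch assumes the usual definitions: quantization error as the
% average distance from each sample to its BMU, and topographic error
% as the share of samples whose first and second BMUs are not adjacent
% on the map. It uses a made-up one-dimensional chain of units (units i
% and i+1 are neighbors) and is not the SOM_QUALITY implementation.

```matlab
M = [0; 2; 1; 3];                  % made-up chain; units 2 and 3 are "twisted"
X = [0.1; 0.9; 1.4; 2.6; 3.0];     % made-up data samples

nM = size(M,1); nX = size(X,1);
qe = 0; te = 0;
for i = 1:nX
  d = sqrt(sum((M - repmat(X(i,:),nM,1)).^2, 2));
  [ds, order] = sort(d);                    % order(1) = BMU, order(2) = 2nd BMU
  qe = qe + ds(1);                          % distance to the BMU
  te = te + (abs(order(1)-order(2)) > 1);   % not neighbors on the chain
end
qe = qe / nX;                      % mean quantization error
te = te / nX;                      % topographic error
```

% The prototypes fit the data closely, so qe is small, yet te is high
% because the chain is folded: the two measures capture different
% aspects of map quality.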
wolffd@0 391
wolffd@0 392 % People have contributed a number of functions to the Toolbox
wolffd@0 393 % which can be used for the analysis. These include functions for
wolffd@0 394 % vector projection, clustering, pdf-estimation, modeling,
wolffd@0 395 % classification, etc. However, ultimately the use of these
wolffd@0 396 % tools is up to you.
wolffd@0 397
wolffd@0 398 % More about visualization is presented in SOM_DEMO3.
wolffd@0 399 % More about data analysis is presented in SOM_DEMO4.
wolffd@0 400
wolffd@0 401 echo off
wolffd@0 402 warning on
wolffd@0 403
wolffd@0 404
wolffd@0 405
wolffd@0 406