camir-ismir2012: toolboxes/MIRtoolbox1.3.2/somtoolbox/som

annotate toolboxes/MIRtoolbox1.3.2/somtoolbox/som_demo2.m @ 0:cc4b1211e677 tip

initial commit to HG from Changeset: 646 (e263d8a21543) added further path and more save "camirversion.m"

author	Daniel Wolff
date	Fri, 19 Aug 2016 13:07:06 +0200
parents
children

rev	line source
Daniel@0	1
Daniel@0	2 %SOM_DEMO2 Basic usage of the SOM Toolbox.
Daniel@0	3
Daniel@0	4 % Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
Daniel@0	5 % http://www.cis.hut.fi/projects/somtoolbox/
Daniel@0	6
Daniel@0	7 % Version 1.0beta juuso 071197
Daniel@0	8 % Version 2.0beta juuso 070200
Daniel@0	9
Daniel@0	10 clf reset;
Daniel@0	11 figure(gcf)
Daniel@0	12 echo on
Daniel@0	13
Daniel@0	14
Daniel@0	15
Daniel@0	16 clc
Daniel@0	17 % ==========================================================
Daniel@0	18 % SOM_DEMO2 - BASIC USAGE OF SOM TOOLBOX
Daniel@0	19 % ==========================================================
Daniel@0	20
Daniel@0	21 % som_data_struct - Create a data struct.
Daniel@0	22 % som_read_data - Read data from file.
Daniel@0	23 %
Daniel@0	24 % som_normalize - Normalize data.
Daniel@0	25 % som_denormalize - Denormalize data.
Daniel@0	26 %
Daniel@0	27 % som_make - Initialize and train the map.
Daniel@0	28 %
Daniel@0	29 % som_show - Visualize map.
Daniel@0	30 % som_show_add - Add markers on som_show visualization.
Daniel@0	31 % som_grid - Visualization with free coordinates.
Daniel@0	32 %
Daniel@0	33 % som_autolabel - Give labels to map.
Daniel@0	34 % som_hits - Calculate hit histogram for the map.
Daniel@0	35
Daniel@0	36 % BASIC USAGE OF THE SOM TOOLBOX
Daniel@0	37
Daniel@0	38 % The basic usage of the SOM Toolbox proceeds like this:
Daniel@0	39 % 1. construct data set
Daniel@0	40 % 2. normalize it
Daniel@0	41 % 3. train the map
Daniel@0	42 % 4. visualize map
Daniel@0	43 % 5. analyse results
Daniel@0	44
Daniel@0	45 % The first four items are - if default options are used - very
Daniel@0	46 % simple operations, each executable with a single command. For
Daniel@0	47 % the last, several different kinds of functions are provided in
Daniel@0	48 % the Toolbox, but as the needs of analysis vary, a general default
Daniel@0	49 % function or procedure does not exist.
Daniel@0	50
Daniel@0	51 pause % Strike any key to construct data...
Daniel@0	52
Daniel@0	53
Daniel@0	54
Daniel@0	55 clc
Daniel@0	56 % STEP 1: CONSTRUCT DATA
Daniel@0	57 % ======================
Daniel@0	58
Daniel@0	59 % The SOM Toolbox has a special struct, called data struct, which
Daniel@0	60 % is used to group information regarding the data set in one
Daniel@0	61 % place.
Daniel@0	62
Daniel@0	63 % Here, a data struct is created using function SOM_DATA_STRUCT.
Daniel@0	64 % The first argument is the data matrix itself, followed by the
Daniel@0	65 % name given to the data set and the names of the components
Daniel@0	66 % (variables) in the data matrix.
Daniel@0	67
Daniel@0	68 D = rand(1000,3); % 1000 samples from unit cube
Daniel@0	69 sData = som_data_struct(D,'name','unit cube','comp_names',{'x','y','z'});
Daniel@0	70
Daniel@0	71 % Another option is to read the data directly from an ASCII file.
Daniel@0	72 % Here, the IRIS data set is loaded from a file (please make sure
Daniel@0	73 % the file can be found from the current path):
Daniel@0	74
Daniel@0	75 try,
Daniel@0	76   sDiris = som_read_data('iris.data');
Daniel@0	77 catch
Daniel@0	78   echo off
Daniel@0	79
Daniel@0	80   warning('File ''iris.data'' not found. Using simulated data instead.')
Daniel@0	81
Daniel@0	82   D = randn(50,4);
Daniel@0	83   D(:,1) = D(:,1)+5; D(:,2) = D(:,2)+3.5;
Daniel@0	84   D(:,3) = D(:,3)/2+1.5; D(:,4) = D(:,4)/2+0.3;
Daniel@0	85   D(D<=0) = 0.01;
Daniel@0	86
Daniel@0	87   D2 = randn(100,4); D2(:,2) = sort(D2(:,2));
Daniel@0	88   D2(:,1) = D2(:,1)+6.5; D2(:,2) = D2(:,2)+2.8;
Daniel@0	89   D2(:,3) = D2(:,3)+5; D2(:,4) = D2(:,4)/2+1.5;
Daniel@0	90   D2(D2<=0) = 0.01;
Daniel@0	91
Daniel@0	92   sDiris = som_data_struct([D; D2],'name','iris (simulated)',...
Daniel@0	93                            'comp_names',{'SepalL','SepalW','PetalL','PetalW'});
Daniel@0	94   sDiris = som_label(sDiris,'add',[1:50]','Setosa');
Daniel@0	95   sDiris = som_label(sDiris,'add',[51:100]','Versicolor');
Daniel@0	96   sDiris = som_label(sDiris,'add',[101:150]','Virginica');
Daniel@0	97
Daniel@0	98   echo on
Daniel@0	99 end
Daniel@0	100
Daniel@0	101 % Here are the histograms and scatter plots of the four variables.
Daniel@0	102
Daniel@0	103 echo off
Daniel@0	104 k=1;
Daniel@0	105 for i=1:4,
Daniel@0	106   for j=1:4,
Daniel@0	107     if i==j,
Daniel@0	108       subplot(4,4,k);
Daniel@0	109       hist(sDiris.data(:,i)); title(sDiris.comp_names{i})
Daniel@0	110     elseif i<j,
Daniel@0	111       subplot(4,4,k);
Daniel@0	112       plot(sDiris.data(:,i),sDiris.data(:,j),'k.')
Daniel@0	113       xlabel(sDiris.comp_names{i})
Daniel@0	114       ylabel(sDiris.comp_names{j})
Daniel@0	115     end
Daniel@0	116     k=k+1;
Daniel@0	117   end
Daniel@0	118 end
Daniel@0	119 echo on
Daniel@0	120
Daniel@0	121 % Actually, as you saw in SOM_DEMO1, most SOM Toolbox functions
Daniel@0	122 % can also handle plain data matrices, but then one is without the
Daniel@0	123 % convenience offered by component names, labels and
Daniel@0	124 % denormalization operations.
Daniel@0	125
Daniel@0	126
Daniel@0	127 pause % Strike any key to normalize the data...
Daniel@0	128
Daniel@0	129
Daniel@0	130
Daniel@0	131
Daniel@0	132
Daniel@0	133 clc
Daniel@0	134 % STEP 2: DATA NORMALIZATION
Daniel@0	135 % ==========================
Daniel@0	136
Daniel@0	137 % Since the SOM algorithm is based on Euclidean distances, the scale
Daniel@0	138 % of the variables is very important in determining what the map will
Daniel@0	139 % be like. If the range of values of some variable is much bigger
Daniel@0	140 % than that of the other variables, that variable will probably
Daniel@0	141 % dominate the map organization completely.
Daniel@0	142
Daniel@0	143 % For this reason, the components of the data set are usually
Daniel@0	144 % normalized, for example so that each component has unit
Daniel@0	145 % variance. This can be done with function SOM_NORMALIZE:
Daniel@0	146
Daniel@0	147 sDiris = som_normalize(sDiris,'var');
Daniel@0	148
Daniel@0	149 % The function also has other normalization methods.
Daniel@0	150
Daniel@0	151 % However, interpreting the values may be harder when they have
Daniel@0	152 % been normalized. Therefore, the normalization operations can be
Daniel@0	153 % reversed with function SOM_DENORMALIZE:
Daniel@0	154
Daniel@0	155 x = sDiris.data(1,:)
Daniel@0	156
Daniel@0	157 orig_x = som_denormalize(x,sDiris)
Daniel@0	158
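% A minimal sketch of what 'var' normalization and its reversal amount
% to, assuming they reduce to shifting each component to zero mean and
% scaling it to unit variance. The toy variables Xs, mu, sigma, Xn and
% Xr are hypothetical and are not part of the Toolbox.

Xs = [1 10; 2 20; 3 30; 4 40];    % toy data: 4 samples, 2 components
mu = mean(Xs,1);                  % per-component mean
sigma = std(Xs,0,1);              % per-component standard deviation
Xn = (Xs - repmat(mu,4,1)) ./ repmat(sigma,4,1);  % zero mean, unit variance
Xr = Xn .* repmat(sigma,4,1) + repmat(mu,4,1);    % back to original scale
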
Daniel@0	159 pause % Strike any key to train the map...
Daniel@0	160
Daniel@0	161
Daniel@0	162
Daniel@0	163
Daniel@0	164
Daniel@0	165 clc
Daniel@0	166 % STEP 3: MAP TRAINING
Daniel@0	167 % ====================
Daniel@0	168
Daniel@0	169 % The function SOM_MAKE is used to train the SOM. By default, it
Daniel@0	170 % first determines the map size, then initializes the map using
Daniel@0	171 % linear initialization, and finally uses the batch algorithm to train
Daniel@0	172 % the map. Function SOM_DEMO1 has a more detailed description of
Daniel@0	173 % the training process.
Daniel@0	174
Daniel@0	175 sMap = som_make(sDiris);
Daniel@0	176
Daniel@0	177
Daniel@0	178 pause % Strike any key to continue...
Daniel@0	179
Daniel@0	180 % The IRIS data set also has labels associated with the data
Daniel@0	181 % samples. The data set consists of 50 samples from each of three
Daniel@0	182 % species of Iris flowers (150 samples in total), and the
Daniel@0	183 % measurements are the length and width of the sepals and petals. The
Daniel@0	184 % label associated with each sample is the species information:
Daniel@0	185 % 'Setosa', 'Versicolor' or 'Virginica'.
Daniel@0	186
Daniel@0	187 % Now, the map can be labelled with these labels. The best
Daniel@0	188 % matching unit of each sample is found from the map, and the
Daniel@0	189 % species label is given to the map unit. Function SOM_AUTOLABEL
Daniel@0	190 % can be used to do this:
Daniel@0	191
Daniel@0	192 sMap = som_autolabel(sMap,sDiris,'vote');
Daniel@0	193
Daniel@0	194 pause % Strike any key to visualize the map...
Daniel@0	195
Daniel@0	196
Daniel@0	197
Daniel@0	198
Daniel@0	199
Daniel@0	200 clc
Daniel@0	201 % STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW
Daniel@0	202 % =====================================================
Daniel@0	203
Daniel@0	204 % The basic visualization of the SOM is done with function SOM_SHOW.
Daniel@0	205
Daniel@0	206 colormap(1-gray)
Daniel@0	207 som_show(sMap,'norm','d')
Daniel@0	208
Daniel@0	209 % Notice that the names of the components are included as the
Daniel@0	210 % titles of the subplots. Notice also that the variable values
Daniel@0	211 % have been denormalized to the original range and scale.
Daniel@0	212
Daniel@0	213 % The component planes ('PetalL', 'PetalW', 'SepalL' and 'SepalW')
Daniel@0	214 % show what kind of values the prototype vectors of the map units
Daniel@0	215 % have. The value is indicated with color, and the colorbar on the
Daniel@0	216 % right shows what the colors mean.
Daniel@0	217
Daniel@0	218 % The 'U-matrix' shows distances between neighboring units and thus
Daniel@0	219 % visualizes the cluster structure of the map. Note that the
Daniel@0	220 % U-matrix visualization has many more hexagons than the
Daniel@0	221 % component planes. This is because distances between map units
Daniel@0	222 % are shown, and not only the distance values at the map units.
Daniel@0	223
Daniel@0	224 % High values on the U-matrix mean large distance between
Daniel@0	225 % neighboring map units, and thus indicate cluster
Daniel@0	226 % borders. Clusters are typically uniform areas of low
Daniel@0	227 % values. Refer to colorbar to see which colors mean high
Daniel@0	228 % values. In the IRIS map, there appear to be two clusters.
Daniel@0	229
Daniel@0	230 pause % Strike any key to continue...
Daniel@0	231
Daniel@0	232 % The subplots are linked together through similar position. In
Daniel@0	233 % each axis, a particular map unit is always in the same place. For
Daniel@0	234 % example:
Daniel@0	235
Daniel@0	236 h=zeros(sMap.topol.msize); h(1,2) = 1;
Daniel@0	237 som_show_add('hit',h(:),'markercolor','r','markersize',0.5,'subplot','all')
Daniel@0	238
Daniel@0	239 % the red marker is on top of the same unit on each axis.
Daniel@0	240
Daniel@0	241 pause % Strike any key to continue...
Daniel@0	242
Daniel@0	243
Daniel@0	244
Daniel@0	245 clf
Daniel@0	246
Daniel@0	247 clc
Daniel@0	248
Daniel@0	249 % STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_SHOW_ADD
Daniel@0	250 % =========================================================
Daniel@0	251
Daniel@0	252 % The SOM_SHOW_ADD function can be used to add markers, labels and
Daniel@0	253 % trajectories on top of figures created with SOM_SHOW. The function
Daniel@0	254 % SOM_SHOW_CLEAR can be used to clear them away.
Daniel@0	255
Daniel@0	256 % Here, the U-matrix is shown on the left, and an empty grid
Daniel@0	257 % named 'Labels' is shown on the right.
Daniel@0	258
Daniel@0	259 som_show(sMap,'umat','all','empty','Labels')
Daniel@0	260
Daniel@0	261 pause % Strike any key to add labels...
Daniel@0	262
Daniel@0	263 % Here, the labels added to the map with SOM_AUTOLABEL function
Daniel@0	264 % are shown on the empty grid.
Daniel@0	265
Daniel@0	266 som_show_add('label',sMap,'Textsize',8,'TextColor','r','Subplot',2)
Daniel@0	267
Daniel@0	268 pause % Strike any key to add hits...
Daniel@0	269
Daniel@0	270 % Important tools in data analysis using the SOM are so-called hit
Daniel@0	271 % histograms. They are formed by taking a data set, finding the BMU
Daniel@0	272 % of each data sample from the map, and increasing a counter in a
Daniel@0	273 % map unit each time it is the BMU. The hit histogram shows the
Daniel@0	274 % distribution of the data set on the map.
Daniel@0	275
Daniel@0	276 % Here, the hit histogram for the whole data set is calculated
Daniel@0	277 % and visualized on the U-matrix.
Daniel@0	278
Daniel@0	279 h = som_hits(sMap,sDiris);
Daniel@0	280 som_show_add('hit',h,'MarkerColor','w','Subplot',1)
Daniel@0	281
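% A minimal sketch of the hit-histogram computation described above,
% written without Toolbox functions. The toy codebook Mc, the toy data
% Xd and the counter vector hdemo are hypothetical names.

Mc = [0 0; 1 1; 2 2];                   % 3 map units, 2 components each
Xd = [0.1 0; 0.9 1.2; 1.1 0.8; 2 2.1];  % 4 data samples
hdemo = zeros(size(Mc,1),1);            % one counter per map unit
for n = 1:size(Xd,1),
  d = sum((Mc - repmat(Xd(n,:),size(Mc,1),1)).^2, 2);  % squared distances
  [dmin,bmu] = min(d);                  % index of the best matching unit
  hdemo(bmu) = hdemo(bmu) + 1;          % increase the counter of the BMU
end
% hdemo is now [1; 2; 1]: one hit, two hits and one hit, respectively.
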
Daniel@0	282 pause % Strike any key to continue...
Daniel@0	283
Daniel@0	284 % Multiple hit histograms can be shown simultaneously. Here, three
Daniel@0	285 % hit histograms corresponding to the three species of Iris
Daniel@0	286 % flowers are calculated and shown.
Daniel@0	287
Daniel@0	288 % First, the old hit histogram is removed.
Daniel@0	289
Daniel@0	290 som_show_clear('hit',1)
Daniel@0	291
Daniel@0	292 % Then, the histograms are calculated. The first 50 samples in
Daniel@0	293 % the data set are of the 'Setosa' species, the next 50 samples
Daniel@0	294 % of the 'Versicolor' species and the last 50 samples of the
Daniel@0	295 % 'Virginica' species.
Daniel@0	296
Daniel@0	297 h1 = som_hits(sMap,sDiris.data(1:50,:));
Daniel@0	298 h2 = som_hits(sMap,sDiris.data(51:100,:));
Daniel@0	299 h3 = som_hits(sMap,sDiris.data(101:150,:));
Daniel@0	300
Daniel@0	301 som_show_add('hit',[h1, h2, h3],'MarkerColor',[1 0 0; 0 1 0; 0 0 1],'Subplot',1)
Daniel@0	302
Daniel@0	303 % Red color is for 'Setosa', green for 'Versicolor' and blue for
Daniel@0	304 % 'Virginica'. One can see that the three species are pretty well
Daniel@0	305 % separated, although 'Versicolor' and 'Virginica' are slightly
Daniel@0	306 % mixed up.
Daniel@0	307
Daniel@0	308 pause % Strike any key to continue...
Daniel@0	309
Daniel@0	310
Daniel@0	311
Daniel@0	312 clf
Daniel@0	313 clc
Daniel@0	314
Daniel@0	315 % STEP 4: VISUALIZING THE SELF-ORGANIZING MAP: SOM_GRID
Daniel@0	316 % =====================================================
Daniel@0	317
Daniel@0	318 % There's also another visualization function: SOM_GRID. This
Daniel@0	319 % allows visualization of the SOM in freely specified coordinates,
Daniel@0	320 % for example the input space (of course, only up to 3D space). This
Daniel@0	321 % function has quite a lot of options, and is pretty flexible.
Daniel@0	322
Daniel@0	323 % Basically, the SOM_GRID visualizes the SOM network: each unit is
Daniel@0	324 % shown with a marker and connected to its neighbors with lines.
Daniel@0	325 % The user has control over:
Daniel@0	326 % - the coordinate of each unit (2D or 3D)
Daniel@0	327 % - the marker type, color and size of each unit
Daniel@0	328 % - the linetype, color and width of the connecting lines
Daniel@0	329 % There are also some other options.
Daniel@0	330
Daniel@0	331 pause % Strike any key to see some visualizations...
Daniel@0	332
Daniel@0	333 % Here are four visualizations made with SOM_GRID:
Daniel@0	334 % - The map grid in the output space.
Daniel@0	335
Daniel@0	336 subplot(2,2,1)
Daniel@0	337 som_grid(sMap,'Linecolor','k')
Daniel@0	338 view(0,-90), title('Map grid')
Daniel@0	339
Daniel@0	340 % - A surface plot of distance matrix: both color and
Daniel@0	341 % z-coordinate indicate average distance to neighboring
Daniel@0	342 % map units. This is closely related to the U-matrix.
Daniel@0	343
Daniel@0	344 subplot(2,2,2)
Daniel@0	345 Co=som_unit_coords(sMap); U=som_umat(sMap); U=U(1:2:size(U,1),1:2:size(U,2));
Daniel@0	346 som_grid(sMap,'Coord',[Co, U(:)],'Surf',U(:),'Marker','none');
Daniel@0	347 view(-80,45), axis tight, title('Distance matrix')
Daniel@0	348
Daniel@0	349 % - The map grid in the input space. The first three components
Daniel@0	350 % determine the 3D coordinates of the map unit, and the size
Daniel@0	351 % of the marker is determined by the fourth component.
Daniel@0	352 % Note that the values have been denormalized.
Daniel@0	353
Daniel@0	354 subplot(2,2,3)
Daniel@0	355 M = som_denormalize(sMap.codebook,sMap);
Daniel@0	356 som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
Daniel@0	357 view(-80,45), axis tight, title('Prototypes')
Daniel@0	358
Daniel@0	359 % - Map grid as above, but the original data has also been
Daniel@0	360 % plotted: coordinates show the values of the first three
Daniel@0	361 % components and color indicates the species of each sample.
Daniel@0	362 % The fourth component of the data is not shown.
Daniel@0	363
Daniel@0	364 subplot(2,2,4)
Daniel@0	365 som_grid(sMap,'Coord',M(:,1:3),'MarkerSize',M(:,4)*2)
Daniel@0	366 hold on
Daniel@0	367 D = som_denormalize(sDiris.data,sDiris);
Daniel@0	368 plot3(D(1:50,1),D(1:50,2),D(1:50,3),'r.',...
Daniel@0	369 D(51:100,1),D(51:100,2),D(51:100,3),'g.',...
Daniel@0	370 D(101:150,1),D(101:150,2),D(101:150,3),'b.')
Daniel@0	371 view(-72,64), axis tight, title('Prototypes and data')
Daniel@0	372
Daniel@0	373 pause % Strike any key to continue...
Daniel@0	374
Daniel@0	375 % STEP 5: ANALYSIS OF RESULTS
Daniel@0	376 % ===========================
Daniel@0	377
Daniel@0	378 % The purpose of this step depends heavily on the purpose of the
Daniel@0	379 % whole data analysis: is it segmentation, modeling, novelty
Daniel@0	380 % detection, classification, or something else? For this reason,
Daniel@0	381 % there is not a single general-purpose analysis function, but
Daniel@0	382 % a number of individual functions which may, or may not, prove
Daniel@0	383 % useful in any specific case.
Daniel@0	384
Daniel@0	385 % Visualization is of course part of the analysis of
Daniel@0	386 % results. Examination of labels and hit histograms is another
Daniel@0	387 % part. Yet another is validation of the quality of the SOM (see
Daniel@0	388 % the use of SOM_QUALITY in SOM_DEMO1).
Daniel@0	389
Daniel@0	390 [qe,te] = som_quality(sMap,sDiris)
Daniel@0	391
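% A minimal sketch of the first quality measure, assuming the average
% quantization error is the mean Euclidean distance from each sample
% to its BMU. The toy codebook Mq, the toy data Xq and the results dq
% and qedemo are hypothetical names; the topographic error is not
% covered here.

Mq = [0 0; 3 4];                  % 2 map units
Xq = [0 0; 3 0; 3 4];             % 3 data samples
dq = zeros(size(Xq,1),1);         % distance from each sample to its BMU
for n = 1:size(Xq,1),
  dq(n) = sqrt(min(sum((Mq - repmat(Xq(n,:),size(Mq,1),1)).^2, 2)));
end
qedemo = mean(dq);                % (0 + 3 + 0)/3 = 1 for this toy data
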
Daniel@0	392 % People have contributed a number of functions to the Toolbox
Daniel@0	393 % which can be used for the analysis. These include functions for
Daniel@0	394 % vector projection, clustering, pdf-estimation, modeling,
Daniel@0	395 % classification, etc. However, ultimately the use of these
Daniel@0	396 % tools is up to you.
Daniel@0	397
Daniel@0	398 % More about visualization is presented in SOM_DEMO3.
Daniel@0	399 % More about data analysis is presented in SOM_DEMO4.
Daniel@0	400
Daniel@0	401 echo off
Daniel@0	402 warning on
Daniel@0	403
Daniel@0	404
Daniel@0	405
Daniel@0	406
