%SOM_DEMO1 Basic properties and behaviour of the Self-Organizing Map.

% Contributed to SOM Toolbox 2.0, February 11th, 2000 by Juha Vesanto
% http://www.cis.hut.fi/projects/somtoolbox/

% Version 1.0beta juuso 071197
% Version 2.0beta juuso 030200

clf reset;
figure(gcf)
echo on

clc
% ==========================================================
%   SOM_DEMO1 - BEHAVIOUR AND PROPERTIES OF SOM
% ==========================================================

% som_make       - Create, initialize and train a SOM.
% som_randinit   - Create a SOM and initialize it randomly.
% som_lininit    - Create a SOM and initialize it linearly.
% som_seqtrain   - Train a SOM with the sequential algorithm.
% som_batchtrain - Train a SOM with the batch algorithm.
% som_bmus       - Find best-matching units (BMUs).
% som_quality    - Measure the quality of a SOM.

% SELF-ORGANIZING MAP (SOM):

% A self-organizing map (SOM) is a "map" of the training data:
% dense where there is a lot of data, and sparse where the data
% density is low.

% The map consists of neurons located on a regular map grid.
% The lattice of the grid can be either hexagonal or rectangular.

subplot(1,2,1)
som_cplane('hexa',[10 15],'none')
title('Hexagonal SOM grid')

subplot(1,2,2)
som_cplane('rect',[10 15],'none')
title('Rectangular SOM grid')

% Each neuron (hexagon on the left, rectangle on the right) has an
% associated prototype vector. After training, neighboring neurons
% have similar prototype vectors.

% The SOM can be used for data visualization, clustering (or
% classification), estimation and a variety of other purposes.
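% As a side note, the positions of the units on the map grid can be
% listed explicitly with SOM_UNIT_COORDS. This is a sketch only: the
% argument order (map size, lattice, shape) is assumed from the
% toolbox conventions, see 'help som_unit_coords'.

Co = som_unit_coords([10 15],'hexa','sheet');
% each row of Co is the position of one map unit on the grid; for the
% hexagonal lattice every other row of units is shifted sideways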
pause % Strike any key to continue...

clf
clc
% INITIALIZE AND TRAIN THE SELF-ORGANIZING MAP
% ============================================

% Here are 300 data points sampled from the unit square:

D = rand(300,2);

% The map will be a 2-dimensional grid of size 10 x 10.

msize = [10 10];

% SOM_RANDINIT and SOM_LININIT can be used to initialize the
% prototype vectors in the map. The map size is actually an
% optional argument. If omitted, it is determined automatically
% based on the number of data vectors and the principal
% eigenvectors of the data set. Below, the random initialization
% algorithm is used.

sMap = som_randinit(D, 'msize', msize);

% Actually, each map unit can be thought of as having two sets
% of coordinates:
%  (1) in the input space: the prototype vector
%  (2) in the output space: the position on the map grid
% In the two spaces, the map looks like this:

subplot(1,3,1)
som_grid(sMap)
axis([0 11 0 11]), view(0,-90), title('Map in output space')

subplot(1,3,2)
plot(D(:,1),D(:,2),'+r'), hold on
som_grid(sMap,'Coord',sMap.codebook)
title('Map in input space')

% The black dots show the positions of the map units, and the gray
% lines show connections between neighboring map units. Since the
% map was initialized randomly, the positions in the input space are
% completely disorganized. The red crosses are training data.

pause % Strike any key to train the SOM...

% During training, the map organizes and folds to the training
% data.
% Here, the sequential training algorithm is used:

sMap = som_seqtrain(sMap,D,'radius',[5 1],'trainlen',10);

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r')
title('Trained map')

pause % Strike any key to view the training process more closely...

clf

clc
% TRAINING THE SELF-ORGANIZING MAP
% ================================

% To get a better idea of what happens during training, let's look
% at how the map gradually unfolds and organizes itself. To make it
% even clearer, the map is now initialized so that it is away
% from the data.

sMap = som_randinit(D,'msize',msize);
sMap.codebook = sMap.codebook + 1;

subplot(1,2,1)
som_grid(sMap,'Coord',sMap.codebook)
hold on, plot(D(:,1),D(:,2),'+r'), hold off
title('Data and original map')

% The training is based on two principles:
%
%  Competitive learning: the prototype vector most similar to a
%  data vector is modified so that it is even more similar to
%  it. This way the map learns the position of the data cloud.
%
%  Cooperative learning: not only the most similar prototype
%  vector, but also its neighbors on the map are moved towards the
%  data vector. This way the map self-organizes.

pause % Strike any key to train the map...

echo off
subplot(1,2,2)
o = ones(5,1);
r = (1-[1:60]/60);
for i=1:60,
  sMap = som_seqtrain(sMap,D,'tracking',0,...
                      'trainlen',5,'samples',...
                      'alpha',0.1*o,'radius',(4*r(i)+1)*o);
  som_grid(sMap,'Coord',sMap.codebook)
  hold on, plot(D(:,1),D(:,2),'+r'), hold off
  title(sprintf('%d/300 training steps',5*i))
  drawnow
end
title('Sequential training after 300 steps')
echo on

pause % Strike any key to continue with 3D data...

clf

clc
% TRAINING DATA: THE UNIT CUBE
% ============================

% Above, the map dimension was equal to the input space dimension:
% both were 2-dimensional. Typically, the input space dimension is
% much higher than the 2 dimensions of the map. In this case the map
% cannot follow the data set perfectly any more, but must find a
% balance between two goals:

%  - data representation accuracy
%  - data set topology representation accuracy

% Here are 500 data points sampled from the unit cube:

D = rand(500,3);

subplot(1,3,1), plot3(D(:,1),D(:,2),D(:,3),'+r')
view(3), axis on, rotate3d on
title('Data')

% The ROTATE3D command enables you to rotate the picture by
% dragging the pointer over the picture with the left mouse
% button pressed down.

pause % Strike any key to train the SOM...

clc
% DEFAULT TRAINING PROCEDURE
% ==========================

% Above, the initialization was done randomly and the training was
% done with the sequential training function (SOM_SEQTRAIN). By
% default, the initialization is linear, and the batch training
% algorithm is used. In addition, the training is done in two
% phases: first with a large neighborhood radius, and then
% finetuning with a small radius.
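% As a rough illustration, the two-phase procedure could be spelled
% out explicitly along these lines. This is a sketch only: SOM_MAKE
% chooses the actual radii and training lengths automatically based
% on the map and data set sizes, so the values below are
% illustrative assumptions, and sTmp is a throwaway variable.

sTmp = som_lininit(D,'msize',msize);
sTmp = som_batchtrain(sTmp,D,'tracking',0,'radius',[5 1],'trainlen',5);  % rough phase
sTmp = som_batchtrain(sTmp,D,'tracking',0,'radius',[1 1],'trainlen',15); % finetuning phase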
% The function SOM_MAKE can be used to both initialize and train
% the map using default parameters:

pause % Strike any key to use SOM_MAKE...

sMap = som_make(D);

% Here, the linear initialization is done again, so that
% the results can be compared.

sMap0 = som_lininit(D);

subplot(1,3,2)
som_grid(sMap0,'Coord',sMap0.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap0.codebook(:,3))
axis([0 1 0 1 0 1]), view(-120,-25), title('After initialization')

subplot(1,3,3)
som_grid(sMap,'Coord',sMap.codebook,...
         'Markersize',2,'Linecolor','k','Surf',sMap.codebook(:,3))
axis([0 1 0 1 0 1]), view(3), title('After training'), hold on

% Here you can see that the 2-dimensional map has folded into the
% 3-dimensional space in order to capture the whole data space.

pause % Strike any key to evaluate the quality of the maps...

clc
% BEST-MATCHING UNITS (BMUs)
% ==========================

% Before turning to quality, an important concept needs to be
% introduced: the best-matching unit (BMU). The BMU of a data
% vector is the unit on the map whose model vector best resembles
% the data vector. In practice the similarity is measured as the
% minimum distance between the data vector and each model vector on
% the map. The BMUs can be calculated with the function SOM_BMUS,
% which returns the index of the unit.

% Here the BMU is searched for the origin point (from the
% trained map):

bmu = som_bmus(sMap,[0 0 0]);

% Here the corresponding unit is shown in the figure.
% You can rotate the figure to see better where the BMU is.

co = sMap.codebook(bmu,:);
text(co(1),co(2),co(3),'BMU','Fontsize',20)
plot3([0 co(1)],[0 co(2)],[0 co(3)],'ro-')

pause % Strike any key to analyze map quality...

clc
% SELF-ORGANIZING MAP QUALITY
% ===========================

% The maps have two primary quality properties:
%  - data representation accuracy
%  - data set topology representation accuracy

% The former is usually measured using the average quantization
% error between data vectors and their BMUs on the map. For the
% latter, several measures have been proposed, e.g. the topographic
% error measure: the percentage of data vectors for which the
% first- and second-BMUs are not adjacent units.

% Both measures have been implemented in the SOM_QUALITY function.
% Here are the quality measures for the trained map:

[q,t] = som_quality(sMap,D)

% And here for the initial map:

[q0,t0] = som_quality(sMap0,D)

% As can be seen, by folding the SOM has reduced the average
% quantization error, but on the other hand the topology
% representation capability has suffered. By using a larger final
% neighborhood radius in the training, the map becomes stiffer and
% preserves the topology of the data set better.

echo off
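
% As a cross-check, the average quantization error can also be
% computed directly from the BMU distances. This is a sketch: the
% second output of SOM_BMUS is assumed to hold the distance of each
% data vector to its BMU (see 'help som_bmus'), in which case its
% mean should match the quantization error q above.

[bmus,qerrs] = som_bmus(sMap,D);
mean_qe = mean(qerrs)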