wolffd@0: function [Bmus,Qerrors] = som_bmus(sMap, sData, which_bmus, mask) wolffd@0: wolffd@0: %SOM_BMUS Find the best-matching units from the map for the given vectors. wolffd@0: % wolffd@0: % [Bmus, Qerrors] = som_bmus(sMap, sData, [which], [mask]) wolffd@0: % wolffd@0: % bmus = som_bmus(sM,sD); wolffd@0: % [bmus,qerrs] = som_bmus(sM,D,[1 2 3]); wolffd@0: % bmus = som_bmus(sM,D,1,[1 1 0 0 1]); wolffd@0: % wolffd@0: % Input and output arguments ([]'s are optional): wolffd@0: % sMap (struct) map struct wolffd@0: % (matrix) codebook matrix, size munits x dim wolffd@0: % sData (struct) data struct wolffd@0: % (matrix) data matrix, size dlen x dim wolffd@0: % [which] (vector) which BMUs are returned, [1] by default wolffd@0: % (string) 'all', 'best' or 'worst' meaning [1:munits], wolffd@0: % [1] and [munits] respectively wolffd@0: % [mask] (vector) mask vector, length=dim, sMap.mask by default wolffd@0: % wolffd@0: % Bmus (matrix) the requested BMUs for each data vector, wolffd@0: % size dlen x length(which) wolffd@0: % Qerrors (matrix) the corresponding quantization errors, size as Bmus wolffd@0: % wolffd@0: % NOTE: for a vector with all components NaN's, bmu=NaN and qerror=NaN wolffd@0: % NOTE: the mask also effects the quantization errors wolffd@0: % wolffd@0: % For more help, try 'type som_bmus' or check out online documentation. wolffd@0: % See also SOM_QUALITY. wolffd@0: wolffd@0: %%%%%%%%%%%%% DETAILED DESCRIPTION %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wolffd@0: % wolffd@0: % som_bmus wolffd@0: % wolffd@0: % PURPOSE wolffd@0: % wolffd@0: % Finds Best-Matching Units (BMUs) for given data vector from a given map. wolffd@0: % wolffd@0: % SYNTAX wolffd@0: % wolffd@0: % Bmus = som_bmus(sMap, sData) wolffd@0: % Bmus = som_bmus(..., which) wolffd@0: % Bmus = som_bmus(..., which, mask) wolffd@0: % [Bmus, Qerrs] = som_bmus(...) wolffd@0: % wolffd@0: % DESCRIPTION wolffd@0: % wolffd@0: % Returns the indexes and corresponding quantization errors of the wolffd@0: % vectors in sMap that best matched the vectors in sData. wolffd@0: % wolffd@0: % By default only the index of the best matching unit (/vector) is wolffd@0: % returned, but the 'which' argument can be used to get others as wolffd@0: % well. For example it might be desirable to get also second- and wolffd@0: % third-best matching units as well (which = [1:3]). wolffd@0: % wolffd@0: % A mask can be used to weight the search process. The mask is used to wolffd@0: % weight the influence of components in the distance calculation, as wolffd@0: % follows: wolffd@0: % wolffd@0: % distance(x,y) = (x-y)' diag(mask) (x-y) wolffd@0: % wolffd@0: % where x and y are two vectors, and diag(mask) is a diagonal matrix with wolffd@0: % the elements of mask vector on the diagonal. wolffd@0: % wolffd@0: % The vectors in the data set (sData) can contain unknown components wolffd@0: % (NaNs), but the map (sMap) cannot. If there are completely empty wolffd@0: % vectors (all NaNs), the returned BMUs and quantization errors for those wolffd@0: % vectors are NaNs. wolffd@0: % wolffd@0: % REQUIRED INPUT ARGUMENTS wolffd@0: % wolffd@0: % sMap The vectors from among which the BMUs are searched wolffd@0: % for. These must not have any unknown components (NaNs). wolffd@0: % (struct) map struct wolffd@0: % (matrix) codebook matrix, size munits x dim wolffd@0: % wolffd@0: % sData The data vector(s) for which the BMUs are searched. wolffd@0: % (struct) data struct wolffd@0: % (matrix) data matrix, size dlen x dim wolffd@0: % wolffd@0: % OPTIONAL INPUT ARGUMENTS wolffd@0: % wolffd@0: % which (vector) which BMUs are returned, wolffd@0: % by default only the best (ie. which = [1]) wolffd@0: % (string) 'all', 'best' or 'worst' meaning [1:munits], wolffd@0: % [1] and [munits] respectively wolffd@0: % mask (vector) mask vector to be used in BMU search, wolffd@0: % by default sMap.mask, or ones(dim,1) in case wolffd@0: % a matrix was given wolffd@0: % wolffd@0: % OUTPUT ARGUMENTS wolffd@0: % wolffd@0: % Bmus (matrix) the requested BMUs for each data vector, wolffd@0: % size dlen x length(which) wolffd@0: % Qerrors (matrix) the corresponding quantization errors, wolffd@0: % size equal to that of Bmus wolffd@0: % wolffd@0: % EXAMPLES wolffd@0: % wolffd@0: % Simplest case: wolffd@0: % bmu = som_bmus(sM, [0.3 -0.4 1.0]); wolffd@0: % % 3-dimensional data, returns BMU for vector [0.3 -0.4 1] wolffd@0: % bmu = som_bmus(sM, [0.3 -0.4 1.0], [3 5]); wolffd@0: % % as above, except returns the 3rd and 5th BMUs wolffd@0: % bmu = som_bmus(sM, [0.3 -0.4 1.0], [], [1 0 1]); wolffd@0: % % as above, except ignores second component in searching wolffd@0: % [bmus qerrs] = som_bmus(sM, D); wolffd@0: % % returns BMUs and corresponding quantization errors wolffd@0: % % for each vector in D wolffd@0: % bmus = som_bmus(sM, sD); wolffd@0: % % returns BMUs for each vector in sD using the mask in sM wolffd@0: % wolffd@0: % SEE ALSO wolffd@0: % wolffd@0: % som_quality Measure the quantization and topographic error of a SOM. wolffd@0: wolffd@0: % Copyright (c) 1997-2000 by the SOM toolbox programming team. wolffd@0: % http://www.cis.hut.fi/projects/somtoolbox/ wolffd@0: wolffd@0: % Version 1.0beta juuso 071197, 101297 wolffd@0: % Version 2.0alpha juuso 201198 080200 wolffd@0: wolffd@0: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wolffd@0: %% check arguments and initialize wolffd@0: wolffd@0: error(nargchk(1, 4, nargin)); % check no. of input args is correct wolffd@0: wolffd@0: % sMap wolffd@0: if isstruct(sMap), wolffd@0: switch sMap.type, wolffd@0: case 'som_map', M = sMap.codebook; wolffd@0: case 'som_data', M = sMap.data; wolffd@0: otherwise, error('Invalid 1st argument.'); wolffd@0: end wolffd@0: else wolffd@0: M = sMap; wolffd@0: end wolffd@0: [munits dim] = size(M); wolffd@0: if any(any(isnan(M))), wolffd@0: error ('Map codebook must not have missing components.'); wolffd@0: end wolffd@0: wolffd@0: % data wolffd@0: if isstruct(sData), wolffd@0: switch sData.type, wolffd@0: case 'som_map', D = sData.codebook; wolffd@0: case 'som_data', D = sData.data; wolffd@0: otherwise, error('Invalid 2nd argument.'); wolffd@0: end wolffd@0: else wolffd@0: D = sData; wolffd@0: end wolffd@0: [dlen ddim] = size(D); wolffd@0: if dim ~= ddim, wolffd@0: error('Data and map dimensions do not match.') wolffd@0: end wolffd@0: wolffd@0: % which_bmus wolffd@0: if nargin < 3 | isempty(which_bmus) | any(isnan(which_bmus)), wolffd@0: which_bmus = 1; wolffd@0: else wolffd@0: if ischar(which_bmus), wolffd@0: switch which_bmus, wolffd@0: case 'best', which_bmus = 1; wolffd@0: case 'worst', which_bmus = munits; wolffd@0: case 'all', which_bmus = [1:munits]; wolffd@0: end wolffd@0: end wolffd@0: end wolffd@0: wolffd@0: % mask wolffd@0: if nargin < 4 | isempty(mask) | any(isnan(mask)), wolffd@0: if isstruct(sMap) & strcmp(sMap.type,'som_map'), wolffd@0: mask = sMap.mask; wolffd@0: elseif isstruct(sData) & strcmp(sData.type,'som_map'), wolffd@0: mask = sData.mask; wolffd@0: else wolffd@0: mask = ones(dim,1); wolffd@0: end wolffd@0: end wolffd@0: if size(mask,1)==1, mask = mask'; end wolffd@0: if all(mask == 0), wolffd@0: error('All components masked off. BMU search cannot be done.'); wolffd@0: end wolffd@0: wolffd@0: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wolffd@0: %% action wolffd@0: wolffd@0: Bmus = zeros(dlen,length(which_bmus)); wolffd@0: Qerrors = Bmus; wolffd@0: wolffd@0: % The BMU search involves calculating weighted Euclidian distances wolffd@0: % to all map units for each data vector. Basically this is done as wolffd@0: % for i=1:dlen, wolffd@0: % for j=1:munits, wolffd@0: % for k=1:dim, wolffd@0: % Dist(j,i) = Dist(j,i) + mask(k) * (D(i,k) - M(j,k))^2; wolffd@0: % end wolffd@0: % end wolffd@0: % end wolffd@0: % where mask is the weighting vector for distance calculation. However, taking wolffd@0: % into account that distance between vectors m and v can be expressed as wolffd@0: % |m - v|^2 = sum_i ((m_i - v_i)^2) = sum_i (m_i^2 + v_i^2 - 2*m_i*v_i) wolffd@0: % this can be made much faster by transforming it to a matrix operation: wolffd@0: % Dist = (M.^2)*mask*ones(1,d) + ones(m,1)*mask'*(D'.^2) - 2*M*diag(mask)*D' wolffd@0: % wolffd@0: % In the case where there are unknown components in the data, each data wolffd@0: % vector will have an individual mask vector so that for that unit, the wolffd@0: % unknown components are not taken into account in distance calculation. wolffd@0: % In addition all NaN's are changed to zeros so that they don't screw up wolffd@0: % the matrix multiplications. wolffd@0: wolffd@0: % calculate distances & bmus wolffd@0: wolffd@0: % This is done a block of data at a time rather than in a wolffd@0: % single sweep to save memory consumption. The 'Dist' matrix has wolffd@0: % size munits*blen which would be HUGE if you did it in a single-sweep wolffd@0: % operation. If you _want_ to use the single-sweep version, just wolffd@0: % set blen = dlen. If you're having problems with memory, try to wolffd@0: % set the value of blen lower. wolffd@0: blen = min(munits,dlen); wolffd@0: wolffd@0: % handle unknown components wolffd@0: Known = ~isnan(D); wolffd@0: W1 = (mask*ones(1,dlen)) .* Known'; wolffd@0: D(find(~Known)) = 0; wolffd@0: unknown = find(sum(Known')==0); % completely unknown vectors wolffd@0: wolffd@0: % constant matrices wolffd@0: WD = 2*diag(mask)*D'; % constant matrix wolffd@0: dconst = ((D.^2)*mask); % constant term in the distances wolffd@0: wolffd@0: i0 = 0; wolffd@0: while i0+1<=dlen, wolffd@0: % calculate distances wolffd@0: inds = [(i0+1):min(dlen,i0+blen)]; i0 = i0+blen; wolffd@0: Dist = (M.^2)*W1(:,inds) - M*WD(:,inds); % plus dconst for each sample wolffd@0: wolffd@0: % find the bmus and the corresponding quantization errors wolffd@0: if all(which_bmus==1), [Q B] = min(Dist); else [Q B] = sort(Dist); end wolffd@0: if munits==1, Bmus(inds,:) = 1; else Bmus(inds,:) = B(which_bmus,:)'; end wolffd@0: Qerrors(inds,:) = Q(which_bmus,:)' + dconst(inds,ones(length(which_bmus),1)); wolffd@0: end wolffd@0: wolffd@0: % completely unknown vectors wolffd@0: if ~isempty(unknown), wolffd@0: Bmus(unknown,:) = NaN; wolffd@0: Qerrors(unknown,:) = NaN; wolffd@0: end wolffd@0: wolffd@0: Qerrors = sqrt(Qerrors); wolffd@0: wolffd@0: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%