wolffd@0: function [C,P]=knn(d, Cp, K) wolffd@0: wolffd@0: %KNN K-Nearest Neighbor classifier using an arbitrary distance matrix wolffd@0: % wolffd@0: % [C,P]=knn(d, Cp, [K]) wolffd@0: % wolffd@0: % Input and output arguments ([]'s are optional): wolffd@0: % d (matrix) of size NxP: This is a precalculated dissimilarity (distance matrix). wolffd@0: % P is the number of prototype vectors and N is the number of data vectors wolffd@0: % That is, d(i,j) is the distance between data item i and prototype j. wolffd@0: % Cp (vector) of size Px1 that contains integer class labels. Cp(j) is the class of wolffd@0: % jth prototype. wolffd@0: % [K] (scalar) the maximum K in K-NN classifier, default is 1 wolffd@0: % C (matrix) of size NxK: integers indicating the class wolffd@0: % decision for data items according to the K-NN rule for each K. wolffd@0: % C(i,K) is the classification for data item i using the K-NN rule wolffd@0: % P (matrix) of size NxkxK: the relative amount of prototypes of wolffd@0: % each class among the K closest prototypes for each classifiee. wolffd@0: % That is, P(i,j,K) is the relative amount of prototypes of class j wolffd@0: % among K nearest prototypes for data item i. wolffd@0: % wolffd@0: % If there is a tie between representatives of two or more classes wolffd@0: % among the K closest neighbors to the classifiee, the class i selected randomly wolffd@0: % among these candidates. wolffd@0: % wolffd@0: % IMPORTANT If K>1 this function uses 'sort' which is considerably slower than wolffd@0: % 'max' which is used for K=1. If K>1 the knn always calculates wolffd@0: % results for all K-NN models from 1-NN up to K-NN. wolffd@0: % wolffd@0: % EXAMPLE 1 wolffd@0: % wolffd@0: % sP; % a SOM Toolbox data struct containing labeled prototype vectors wolffd@0: % [Cp,label]=som_label2num(sP); % get integer class labels for prototype vectors wolffd@0: % sD; % a SOM Toolbox data struct containing vectors to be classified wolffd@0: % d=som_eucdist2(sD,sP); % calculate euclidean distance matrix wolffd@0: % class=knn(d,Cp,10); % classify using 1,2,...,10-rules wolffd@0: % class(:,5); % includes results for 5NN wolffd@0: % label(class(:,5)) % original class labels for 5NN wolffd@0: % wolffd@0: % EXAMPLE 2 (leave-one-out-crossvalidate KNN for selection of proper K) wolffd@0: % wolffd@0: % P; % a data matrix of prototype vectors (rows) wolffd@0: % Cp; % column vector of integer class labels for vectors in P wolffd@0: % d=som_eucdist2(P,P); % calculate euclidean distance matrix PxP wolffd@0: % d(eye(size(d))==1)=NaN; % set self-dissimilarity to NaN: wolffd@0: % % this drops the prototype itself away from its neighborhood wolffd@0: % % leave-one-out-crossvalidation (LOOCV) wolffd@0: % class=knn(d,Cp,size(P,1)); % classify using all possible K wolffd@0: % % calculate and plot LOOC-validated errors for all K wolffd@0: % failratep = ... wolffd@0: % 100*sum((class~=repmat(Cp,1,size(P,1))))./size(P,1); plot(1:size(P,1),failratep) wolffd@0: wolffd@0: % See also SOM_LABEL2NUM, SOM_EUCDIST2, PDIST. wolffd@0: % wolffd@0: % Contributed to SOM Toolbox 2.0, October 29th, 2000 by Johan Himberg wolffd@0: % Copyright (c) by Johan Himberg wolffd@0: % http://www.cis.hut.fi/projects/somtoolbox/ wolffd@0: wolffd@0: % Version 2.0beta Johan 291000 wolffd@0: wolffd@0: %% Init %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wolffd@0: wolffd@0: % Check K wolffd@0: if nargin<3 | isempty(K), wolffd@0: K=1; wolffd@0: end wolffd@0: wolffd@0: if ~vis_valuetype(K,{'1x1'}) wolffd@0: error('Value for K must be a scalar'); wolffd@0: end wolffd@0: wolffd@0: % Check that dist is a matrix wolffd@0: if ~vis_valuetype(d,{'nxm'}), wolffd@0: error('Distance matrix not valid.') wolffd@0: end wolffd@0: wolffd@0: [N_data N_proto]=size(d); wolffd@0: wolffd@0: % Check class label vector: must be numerical and of integers wolffd@0: if ~vis_valuetype(Cp,{[N_proto 1]}); wolffd@0: error(['Class vector is invalid: has to be a N-of-data_rows x 1' ... wolffd@0: ' vector of integers']); wolffd@0: elseif sum(fix(Cp)-Cp)~=0 wolffd@0: error('Class labels in vector ''Cp'' must be integers.'); wolffd@0: end wolffd@0: wolffd@0: if size(d,2) ~= length(Cp), wolffd@0: error('Distance matrix and prototype class vector dimensions do not match.'); wolffd@0: end wolffd@0: wolffd@0: % Check if the classes are given as labels (no class input arg.) wolffd@0: % if they are take them from prototype struct wolffd@0: wolffd@0: % Find all class labels wolffd@0: ClassIndex=unique(Cp); wolffd@0: N_class=length(ClassIndex); % number of different classes wolffd@0: wolffd@0: wolffd@0: %%%% Classification %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wolffd@0: wolffd@0: if K==1, % sort distances only if K>1 wolffd@0: wolffd@0: % 1NN wolffd@0: % Select the closest prototype wolffd@0: [tmp,proto_index]=min(d,[],2); wolffd@0: C=Cp(proto_index); wolffd@0: wolffd@0: else wolffd@0: wolffd@0: % Sort the prototypes for each classifiee according to distance wolffd@0: [tmp, proto_index]=sort(d'); wolffd@0: wolffd@0: %% Select up to K closest prototypes wolffd@0: proto_index=proto_index(1:K,:); wolffd@0: knn_class=Cp(proto_index); wolffd@0: for i=1:N_class, wolffd@0: classcounter(:,:,i)=cumsum(knn_class==ClassIndex(i)); wolffd@0: end wolffd@0: wolffd@0: %% Vote between classes of K neighbors wolffd@0: [winner,vote_index]=max(classcounter,[],3); wolffd@0: wolffd@0: %%% Handle ties wolffd@0: wolffd@0: % Set index to classes that got as much votes as winner wolffd@0: wolffd@0: equal_to_winner=(repmat(winner,[1 1 N_class])==classcounter); wolffd@0: wolffd@0: % set index to ties wolffd@0: [tie_indexi,tie_indexj]=find(sum(equal_to_winner,3)>1); % drop the winner from counter wolffd@0: wolffd@0: % Go through tie cases and reset vote_index randomly to one wolffd@0: % of them wolffd@0: wolffd@0: for i=1:length(tie_indexi), wolffd@0: tie_class_index=find(squeeze(equal_to_winner(tie_indexi(i),tie_indexj(i),:))); wolffd@0: fortuna=randperm(length(tie_class_index)); wolffd@0: vote_index(tie_indexi(i),tie_indexj(i))=tie_class_index(fortuna(1)); wolffd@0: end wolffd@0: wolffd@0: C=ClassIndex(vote_index)'; wolffd@0: end wolffd@0: wolffd@0: %% Build output %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% wolffd@0: wolffd@0: % Relative amount of classes in K neighbors for each classifiee wolffd@0: wolffd@0: if K==1, wolffd@0: P=zeros(N_data,N_class); wolffd@0: if nargout>1, wolffd@0: for i=1:N_data, wolffd@0: P(i,ClassIndex==C(i))=1; wolffd@0: end wolffd@0: end wolffd@0: else wolffd@0: P=shiftdim(classcounter,1)./repmat(shiftdim(1:K,-1), [N_data N_class 1]); wolffd@0: end wolffd@0: