Daniel@0: function [centres, cweights, post, errlog, options] = weighted_kmeans(centres, data, weights, options) Daniel@0: %[centres, cweights, post, errlog, options] = weighted_kmeans(centres,data, weights, options) Daniel@0: % Daniel@0: % weighted_kmeans Trains a k means cluster model on weighted input vectors Daniel@0: % Daniel@0: % Adapted from the Netlab Toolbox by Daniel Wolff, Daniel@0: % This function takes a WEIGHTS vector, containing weights for the Daniel@0: % different data points. This can be used for training with varying Daniel@0: % discretisation intervals. Daniel@0: % Daniel@0: % Description Daniel@0: % CENTRES = weighted_kmeans(NCENTRES, DATA, WEIGHTS, OPTIONS) or Daniel@0: % CENTRES = weighted_kmeans(CENTRES, DATA, WEIGHTS, OPTIONS) uses the batch K-means Daniel@0: % algorithm to set the centres of a cluster model. The matrix DATA Daniel@0: % represents the data which is being clustered, with each row Daniel@0: % corresponding to a vector. The sum of squares error function is used. Daniel@0: % The point at which a local minimum is achieved is returned as Daniel@0: % CENTRES. The error value at that point is returned in OPTIONS(8). Daniel@0: % Daniel@0: % Daniel@0: % POST and ERRLOG Daniel@0: % also return the cluster number (in a one-of-N encoding) for each Daniel@0: % data point in POST and a log of the error values after each cycle in Daniel@0: % ERRLOG. The optional parameters have the following Daniel@0: % interpretations. Daniel@0: % Daniel@0: % OPTIONS(1) is set to 1 to display error values; also logs error Daniel@0: % values in the return argument ERRLOG. If OPTIONS(1) is set to 0, then Daniel@0: % only warning messages are displayed. If OPTIONS(1) is -1, then Daniel@0: % nothing is displayed. Daniel@0: % Daniel@0: % OPTIONS(2) is a measure of the absolute precision required for the Daniel@0: % value of CENTRES at the solution. If the absolute difference between Daniel@0: % the values of CENTRES between two successive steps is less than Daniel@0: % OPTIONS(2), then this condition is satisfied. Daniel@0: % Daniel@0: % OPTIONS(3) is a measure of the precision required of the error Daniel@0: % function at the solution. If the absolute difference between the Daniel@0: % error functions between two successive steps is less than OPTIONS(3), Daniel@0: % then this condition is satisfied. Both this and the previous Daniel@0: % condition must be satisfied for termination. Daniel@0: % Daniel@0: % OPTIONS(14) is the maximum number of iterations; default 100. Daniel@0: % Daniel@0: % See also Daniel@0: % GMMINIT, GMMEM Daniel@0: % Daniel@0: Daniel@0: % Copyright (c) Ian T Nabney (1996-2001) Daniel@0: Daniel@0: [ndata, data_dim] = size(data); Daniel@0: [ncentres, dim] = size(centres); Daniel@0: Daniel@0: if dim ~= data_dim Daniel@0: if dim == 1 && ncentres == 1 && centres > 1 Daniel@0: Daniel@0: if ndata == numel(weights) Daniel@0: Daniel@0: % --- Daniel@0: % allow for number of centres specification Daniel@0: % --- Daniel@0: dim = data_dim; Daniel@0: ncentres = centres; Daniel@0: Daniel@0: options(5) = 1; Daniel@0: else Daniel@0: error('Data dimension does not match number of weights') Daniel@0: end Daniel@0: Daniel@0: else Daniel@0: error('Data dimension does not match dimension of centres') Daniel@0: end Daniel@0: end Daniel@0: Daniel@0: if (ncentres > ndata) Daniel@0: error('More centres than data') Daniel@0: end Daniel@0: Daniel@0: % Sort out the options Daniel@0: if (options(14)) Daniel@0: niters = options(14); Daniel@0: else Daniel@0: niters = 100; Daniel@0: end Daniel@0: Daniel@0: store = 0; Daniel@0: if (nargout > 3) Daniel@0: store = 1; Daniel@0: errlog = zeros(1, niters); Daniel@0: end Daniel@0: Daniel@0: % Check if centres and posteriors need to be initialised from data Daniel@0: if (options(5) == 1) Daniel@0: % Do the initialisation Daniel@0: perm = randperm(ndata); Daniel@0: perm = perm(1:ncentres); Daniel@0: Daniel@0: % Assign first ncentres (permuted) data points as centres Daniel@0: centres = data(perm, :); Daniel@0: end Daniel@0: % Matrix to make unit vectors easy to construct Daniel@0: id = eye(ncentres); Daniel@0: Daniel@0: % save accumulated weight for a center Daniel@0: cweights = zeros(ncentres, 1); Daniel@0: Daniel@0: % Main loop of algorithm Daniel@0: for n = 1:niters Daniel@0: Daniel@0: % Save old centres to check for termination Daniel@0: old_centres = centres; Daniel@0: Daniel@0: % Calculate posteriors based on existing centres Daniel@0: d2 = dist2(data, centres); Daniel@0: % Assign each point to nearest centre Daniel@0: [minvals, index] = min(d2', [], 1); Daniel@0: post = logical(id(index,:)); Daniel@0: Daniel@0: % num_points = sum(post, 1); Daniel@0: % Adjust the centres based on new posteriors Daniel@0: for j = 1:ncentres Daniel@0: if (sum(weights(post(:,j))) > 0) Daniel@0: % --- Daniel@0: % NOTE: this is edited to include the weights. Daniel@0: % Instead of summing the vectors directly, the vectors are weighted Daniel@0: % and then the result is divided by the sum of the weights instead Daniel@0: % of the number of vectors for this class Daniel@0: % --- Daniel@0: cweights(j) = sum(weights(post(:,j))); Daniel@0: Daniel@0: centres(j,:) = sum(diag(weights(post(:,j))) * data(post(:,j),:), 1)... Daniel@0: /cweights(j); Daniel@0: end Daniel@0: end Daniel@0: Daniel@0: % Error value is total squared distance from cluster centres Daniel@0: % edit: weighted by the vectors weight Daniel@0: e = sum(minvals .* weights); Daniel@0: if store Daniel@0: errlog(n) = e; Daniel@0: end Daniel@0: if options(1) > 0 Daniel@0: fprintf(1, 'Cycle %4d Error %11.6f\n', n, e); Daniel@0: end Daniel@0: Daniel@0: if n > 1 Daniel@0: % Test for termination Daniel@0: if max(max(abs(centres - old_centres))) < options(2) & ... Daniel@0: abs(old_e - e) < options(3) Daniel@0: options(8) = e; Daniel@0: return; Daniel@0: end Daniel@0: end Daniel@0: old_e = e; Daniel@0: end Daniel@0: Daniel@0: % If we get here, then we haven't terminated in the given number of Daniel@0: % iterations. Daniel@0: options(8) = e; Daniel@0: if (options(1) >= 0) Daniel@0: disp(maxitmess); Daniel@0: end Daniel@0: