Daniel@0: function [centres, options, post, errlog] = netlabkmeans(centres, data, options) Daniel@0: %KMEANS Trains a k means cluster model. Daniel@0: %(Renamed NETLABKMEANS in MIRtoolbox in order to avoid conflict with Daniel@0: % statistics toolbox...) Daniel@0: % Daniel@0: % Description Daniel@0: % CENTRES = KMEANS(CENTRES, DATA, OPTIONS) uses the batch K-means Daniel@0: % algorithm to set the centres of a cluster model. The matrix DATA Daniel@0: % represents the data which is being clustered, with each row Daniel@0: % corresponding to a vector. The sum of squares error function is used. Daniel@0: % The point at which a local minimum is achieved is returned as Daniel@0: % CENTRES. The error value at that point is returned in OPTIONS(8). Daniel@0: % Daniel@0: % [CENTRES, OPTIONS, POST, ERRLOG] = KMEANS(CENTRES, DATA, OPTIONS) Daniel@0: % also returns the cluster number (in a one-of-N encoding) for each Daniel@0: % data point in POST and a log of the error values after each cycle in Daniel@0: % ERRLOG. The optional parameters have the following Daniel@0: % interpretations. Daniel@0: % Daniel@0: % OPTIONS(1) is set to 1 to display error values; also logs error Daniel@0: % values in the return argument ERRLOG. If OPTIONS(1) is set to 0, then Daniel@0: % only warning messages are displayed. If OPTIONS(1) is -1, then Daniel@0: % nothing is displayed. Daniel@0: % Daniel@0: % OPTIONS(2) is a measure of the absolute precision required for the Daniel@0: % value of CENTRES at the solution. If the absolute difference between Daniel@0: % the values of CENTRES between two successive steps is less than Daniel@0: % OPTIONS(2), then this condition is satisfied. Daniel@0: % Daniel@0: % OPTIONS(3) is a measure of the precision required of the error Daniel@0: % function at the solution. If the absolute difference between the Daniel@0: % error functions between two successive steps is less than OPTIONS(3), Daniel@0: % then this condition is satisfied. Both this and the previous Daniel@0: % condition must be satisfied for termination. Daniel@0: % Daniel@0: % OPTIONS(14) is the maximum number of iterations; default 100. Daniel@0: % Daniel@0: % See also Daniel@0: % GMMINIT, GMMEM Daniel@0: % Daniel@0: Daniel@0: % Copyright (c) Ian T Nabney (1996-2001) Daniel@0: Daniel@0: [ndata, data_dim] = size(data); Daniel@0: [ncentres, dim] = size(centres); Daniel@0: Daniel@0: if dim ~= data_dim Daniel@0: error('Data dimension does not match dimension of centres') Daniel@0: end Daniel@0: Daniel@0: if (ncentres > ndata) Daniel@0: error('More centres than data') Daniel@0: end Daniel@0: Daniel@0: % Sort out the options Daniel@0: if (options(14)) Daniel@0: niters = options(14); Daniel@0: else Daniel@0: niters = 100; Daniel@0: end Daniel@0: Daniel@0: store = 0; Daniel@0: if (nargout > 3) Daniel@0: store = 1; Daniel@0: errlog = zeros(1, niters); Daniel@0: end Daniel@0: Daniel@0: % Check if centres and posteriors need to be initialised from data Daniel@0: if (options(5) == 1) Daniel@0: % Do the initialisation Daniel@0: perm = randperm(ndata); Daniel@0: perm = perm(1:ncentres); Daniel@0: Daniel@0: % Assign first ncentres (permuted) data points as centres Daniel@0: centres = data(perm, :); Daniel@0: end Daniel@0: % Matrix to make unit vectors easy to construct Daniel@0: id = eye(ncentres); Daniel@0: Daniel@0: % Main loop of algorithm Daniel@0: for n = 1:niters Daniel@0: Daniel@0: % Save old centres to check for termination Daniel@0: old_centres = centres; Daniel@0: Daniel@0: % Calculate posteriors based on existing centres Daniel@0: d2 = dist2(data, centres); Daniel@0: % Assign each point to nearest centre Daniel@0: [minvals, index] = min(d2', [], 1); Daniel@0: post = id(index,:); Daniel@0: Daniel@0: num_points = sum(post, 1); Daniel@0: % Adjust the centres based on new posteriors Daniel@0: for j = 1:ncentres Daniel@0: if (num_points(j) > 0) Daniel@0: centres(j,:) = sum(data(find(post(:,j)),:), 1)/num_points(j); Daniel@0: end Daniel@0: end Daniel@0: Daniel@0: % Error value is total squared distance from cluster centres Daniel@0: e = sum(minvals); Daniel@0: if store Daniel@0: errlog(n) = e; Daniel@0: end Daniel@0: if options(1) > 0 Daniel@0: fprintf(1, 'Cycle %4d Error %11.6f\n', n, e); Daniel@0: end Daniel@0: Daniel@0: if n > 1 Daniel@0: % Test for termination Daniel@0: if max(max(abs(centres - old_centres))) < options(2) & ... Daniel@0: abs(old_e - e) < options(3) Daniel@0: options(8) = e; Daniel@0: return; Daniel@0: end Daniel@0: end Daniel@0: old_e = e; Daniel@0: end Daniel@0: Daniel@0: % If we get here, then we haven't terminated in the given number of Daniel@0: % iterations. Daniel@0: options(8) = e; Daniel@0: if (options(1) >= 0) Daniel@0: disp(maxitmess); Daniel@0: end Daniel@0: