annotate code/aglomCluster.m @ 37:d9a9a6b93026 tip

Add README
author DaveM
date Sat, 01 Apr 2017 17:03:14 +0100
parents 4af6fc2100e8
children
rev   line source
DaveM@9 1 function linkList = aglomCluster(data, clusterMethod, distanceMetric, numClusters)
DaveM@8 2 %% aglomCluster(data, clusterMethod, distanceMetric, numClusters)
DaveM@8 3 % This function performs agglomerative clustering on a given data set,
DaveM@8 4 % allowing the interpretation of hierarchical data, and plots a
DaveM@8 5 % dendrogram.
DaveM@8 6 %
DaveM@8 7 % data is a matrix with one observation per row and one feature per column.
DaveM@8 8 % clusterMethod
DaveM@8 9 % * 'average' Unweighted average distance (UPGMA)
DaveM@8 10 % * 'centroid' Centroid distance (UPGMC), appropriate for Euclidean
DaveM@8 11 % distances only
DaveM@8 12 % * 'complete' Furthest distance
DaveM@8 13 % * 'median' Weighted center of mass distance (WPGMC), appropriate
DaveM@8 14 % for Euclidean distances only
DaveM@8 15 % * 'single' Shortest distance
DaveM@8 16 % * 'ward' Inner squared distance (minimum variance algorithm),
DaveM@8 17 % appropriate for Euclidean distances only (default)
DaveM@8 18 % * 'weighted' Weighted average distance (WPGMA)
DaveM@8 19 % distanceMetric
DaveM@8 20 % * 'euclidean' Euclidean distance (default).
DaveM@8 21 % * 'seuclidean' Standardized Euclidean distance. Each coordinate
DaveM@8 22 % difference between rows in X is scaled by dividing by the
DaveM@8 23 % corresponding element of the standard deviation S=nanstd(X). To
DaveM@8 24 % specify another value for S, use D=pdist(X,'seuclidean',S).
DaveM@8 25 % * 'cityblock' City block metric.
DaveM@8 26 % * 'minkowski' Minkowski distance. The default exponent is 2. To
DaveM@8 27 % specify a different exponent, use D = pdist(X,'minkowski',P), where P
DaveM@8 28 % is a scalar positive value of the exponent.
DaveM@8 29 % * 'chebychev' Chebychev distance (maximum coordinate difference).
DaveM@8 30 % * 'mahalanobis' Mahalanobis distance, using the sample covariance
DaveM@8 31 % of X as computed by nancov. To compute the distance with a different
DaveM@8 32 % covariance, use D = pdist(X,'mahalanobis',C), where the matrix C is
DaveM@8 33 % symmetric and positive definite.
DaveM@8 34 % * 'cosine' One minus the cosine of the included angle between points
DaveM@8 35 % (treated as vectors).
DaveM@8 36 % * 'correlation' One minus the sample correlation between points
DaveM@8 37 % (treated as sequences of values).
DaveM@8 38 % * 'spearman' One minus the sample Spearman's rank correlation between
DaveM@8 39 % observations (treated as sequences of values).
DaveM@8 40 % * 'hamming' Hamming distance, which is the percentage of coordinates
DaveM@8 41 % that differ.
DaveM@8 42 % * 'jaccard' One minus the Jaccard coefficient, which is the
DaveM@8 43 % percentage of nonzero coordinates that differ.
DaveM@8 44 % numClusters is the maximum number of leaf nodes shown in the dendrogram;
DaveM@8 45 % if 0 (default), the full tree is displayed.
DaveM@8 46
DaveM@8 47 if (nargin < 2)
DaveM@8 48     clusterMethod = 'ward';
DaveM@8 49 end
DaveM@8 50 if (nargin < 3)
DaveM@8 51     distanceMetric = 'euclidean';
DaveM@8 52 end
DaveM@8 53 if (nargin < 4)
DaveM@8 54     numClusters = 0;
DaveM@8 55 end
DaveM@8 56
DaveM@8 57 distMap = pdist(data, distanceMetric);
DaveM@8 58 linkList = linkage(distMap, clusterMethod);
DaveM@36 59 [~,T] = dendrogram(linkList,numClusters,'Orientation','left');
DaveM@8 60
DaveM@8 61
DaveM@8 62 end
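
The listing above documents the arguments but does not show a call. The following is a minimal usage sketch, not part of the repository: it assumes aglomCluster.m is on the MATLAB path and that the Statistics and Machine Learning Toolbox (which provides pdist, linkage, dendrogram and cluster) is installed. The toy data, the variable name labels, and the follow-up call to cluster are illustrative assumptions.

```matlab
% Hypothetical usage sketch for aglomCluster (toy data, not from the repository)
rng(0);                                   % fixed seed so the toy data is reproducible
data = [randn(10, 2); randn(10, 2) + 5];  % 20 observations (rows), 2 features (columns)

% Ward linkage on Euclidean distances; 0 asks dendrogram to draw the full tree
linkList = aglomCluster(data, 'ward', 'euclidean', 0);

% linkage returns an (m-1)-by-3 matrix for m observations, so 19-by-3 here
disp(size(linkList));

% Assumption: flat cluster labels can be obtained afterwards by cutting the
% tree with the toolbox function cluster
labels = cluster(linkList, 'maxclust', 2);
```

Calling the function with only data falls back to the defaults set in the listing ('ward', 'euclidean', 0).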