function linkList = aglomCluster(data, clusterMethod, distanceMetric, numClusters)
%% aglomCluster(data, clusterMethod, distanceMetric, numClusters)
% This function performs agglomerative clustering on a given data set,
% allowing the interpretation of hierarchical data, and plots a
% dendrogram.
%
% data is a matrix in which each row is an observation and each column is
%   a feature.
% clusterMethod
%   * 'average'   Unweighted average distance (UPGMA)
%   * 'centroid'  Centroid distance (UPGMC), appropriate for Euclidean
%                 distances only
%   * 'complete'  Furthest distance
%   * 'median'    Weighted center of mass distance (WPGMC), appropriate
%                 for Euclidean distances only
%   * 'single'    Shortest distance
%   * 'ward'      Inner squared distance (minimum variance algorithm),
%                 appropriate for Euclidean distances only (default)
%   * 'weighted'  Weighted average distance (WPGMA)
% distanceMetric
%   * 'euclidean'   Euclidean distance (default).
%   * 'seuclidean'  Standardized Euclidean distance. Each coordinate
%       difference between rows in X is scaled by dividing by the
%       corresponding element of the standard deviation S=nanstd(X). To
%       specify another value for S, use D=pdist(X,'seuclidean',S).
%   * 'cityblock'   City block metric.
%   * 'minkowski'   Minkowski distance. The default exponent is 2. To
%       specify a different exponent, use D = pdist(X,'minkowski',P), where P
%       is a scalar positive value of the exponent.
%   * 'chebychev'   Chebychev distance (maximum coordinate difference).
%   * 'mahalanobis' Mahalanobis distance, using the sample covariance
%       of X as computed by nancov. To compute the distance with a different
%       covariance, use D = pdist(X,'mahalanobis',C), where the matrix C is
%       symmetric and positive definite.
%   * 'cosine'      One minus the cosine of the included angle between points
%       (treated as vectors).
%   * 'correlation' One minus the sample correlation between points
%       (treated as sequences of values).
%   * 'spearman'    One minus the sample Spearman's rank correlation between
%       observations (treated as sequences of values).
%   * 'hamming'     Hamming distance, which is the percentage of coordinates
%       that differ.
%   * 'jaccard'     One minus the Jaccard coefficient, which is the
%       percentage of nonzero coordinates that differ.
% numClusters is the maximum number of leaf nodes shown in the dendrogram;
%   if 0 (default), the complete tree is plotted, one leaf per observation.
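%
% Example (a minimal sketch, not part of the original file; it assumes the
% Statistics and Machine Learning Toolbox is available, and X, Z and idx
% are illustrative names):
%   X = [randn(10,2); randn(10,2) + 5];          % two synthetic groups
%   Z = aglomCluster(X, 'average', 'cityblock'); % plots the dendrogram
%   idx = cluster(Z, 'maxclust', 2);             % cut the tree into 2 clusters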

% Fill in defaults for any omitted arguments.
if nargin < 2
    clusterMethod = 'ward';
end
if nargin < 3
    distanceMetric = 'euclidean';
end
if nargin < 4
    numClusters = 0;
end

% Pairwise distances between observations (rows of data).
distMap = pdist(data, distanceMetric);
% Hierarchical cluster tree built from those distances.
linkList = linkage(distMap, clusterMethod);
% Plot the tree; T (leaf node number of each observation) is not returned.
[~, T] = dendrogram(linkList, numClusters, 'Orientation', 'left');

end