Metric Learning to Rank (mlr-1.2)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

AUTHORS:
    Brian McFee <bmcfee@cs.ucsd.edu>
    Daryl Lim <dklim@ucsd.edu>

This code is distributed under the GNU GPL license. See LICENSE for details
or http://www.gnu.org/licenses/gpl-3.0.txt


INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
    Proceedings of the 27th annual International Conference
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.

If you use the Robust MLR code (rmlr_train), please cite:

[2] Lim, D.K.H., McFee, B. and Lanckriet, G.R.G. Robust structural metric learning.
    Proceedings of the 30th annual International Conference on Machine Learning (ICML),
    2013.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive use
of the "bsxfun" function, earlier versions of Matlab will not work.

If you have an older Matlab installation, you can install an alternative bsxfun
implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

however, this may not work, and it will certainly be slower than the native
bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary functions "cummax" and "binarysearch" to accelerate
certain steps of the optimization procedure. The source code for these
functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems), you can simply type

    cd util
    make
    cd ..

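If "make" is not available (e.g., on Windows), the MEX functions can usually be
built from within Matlab instead. The following is only a sketch: it assumes the
sources in "util" are single C files named after the functions (check the
directory for the actual file names and any flags set by the makefile):

    >> cd util
    >> mex cummax.c          % assumed file name
    >> mex binarysearch.c    % assumed file name
    >> cd ..

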
3. Running the demo

To test the installation, first add the path to MLR to your Matlab environment:

    >> addpath(genpath('/path/to/mlr/'));

Then, run the demo script:

    >> mlr_demo

from within Matlab. The demo generates a random training/test split of the Wine data set
(http://archive.ics.uci.edu/ml/datasets/Wine)
and learns a metric with MLR. Both native and learned metrics are displayed in a figure
with a scatter plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training data
(see below).

  [W, Xi, D] = mlr_train(X, Y, C,...)

      X = d*n data matrix
      Y = either n-by-1 vector of labels
          OR
          n-by-2 cell array where
              Y{q,1} contains relevant indices for q, and
              Y{q,2} contains irrelevant indices for q

      C >= 0 slack trade-off parameter    (default=1)

      W  = the learned metric, i.e., the inner product matrix of the learned
           space can be computed by X' * W * X
      Xi = slack value on the learned metric (see [1])
      D  = diagnostics

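As a minimal sketch of this simplest mode, using randomly generated data and the
default settings (the data and variable values here are illustrative, not part
of the package):

    % d = 10 features, n = 100 training points in two classes
    X = randn(10, 100);
    Y = [ones(50,1); 2*ones(50,1)];

    % learn a full metric with slack trade-off C = 1
    [W, Xi, D] = mlr_train(X, Y, 1);

    % squared distance between points i and j in the learned space:
    %   (X(:,i) - X(:,j))' * W * (X(:,i) - X(:,j))

The cell-array form of Y expresses relevance directly: for example, setting
Y{q,1} = [2 3] and Y{q,2} = [4 5] declares points 2 and 3 relevant, and points
4 and 5 irrelevant, to query q.

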
By default, MLR optimizes for Area Under the ROC Curve (AUC). This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

  [W, Xi, D] = mlr_train(X, Y, C, LOSS)
      where LOSS is one of:
          'AUC':      Area under ROC curve        (default)
          'KNN':      KNN accuracy*
          'Prec@k':   Precision-at-k
          'MAP':      Mean Average Precision
          'MRR':      Mean Reciprocal Rank
          'NDCG':     Normalized Discounted Cumulative Gain

  *Note: KNN is correct only for binary classification problems; in
         practice, Prec@k is usually a better alternative.

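For example, a sketch of optimizing Mean Average Precision rather than the
default AUC (X and Y as in the sketch above):

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'MAP');

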
For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation of
the ranked list. This can be done by setting the k parameter:

  [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)
      where k is the number of neighbors for Prec@k or NDCG
      (default=3)

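For instance, a sketch of training for Precision-at-5:

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'Prec@k', 5);

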
By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of the
eigenvalues. This can be changed to one of several alternatives:

  [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)
      where REG defines the regularization on W, and is one of:
          0:  no regularization
          1:  1-norm: trace(W)            (default)
          2:  2-norm: trace(W' * W)
          3:  Kernel: trace(W * X),       assumes X is square
                                          and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics learned
from kernel matrices.

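As a sketch of the single-kernel case, the RBF kernel construction below is
illustrative; any square, positive-definite kernel matrix over the training
points can be used in its place:

    % illustrative RBF kernel over the training points in X (d-by-n)
    sq = sum(X.^2, 1);
    D2 = bsxfun(@plus, sq', sq) - 2 * (X' * X);   % squared Euclidean distances
    K  = exp(-D2 / median(D2(:)));                % n-by-n kernel matrix

    % learn with kernel regularization (REG = 3)
    [W, Xi, D] = mlr_train(K, Y, 1, 'AUC', 3, 3);

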
W corresponds to a linear projection matrix (rotation and scaling). To learn a
restricted model which may only scale, but not rotate the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

  [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)
      Diagonal = 0:   learn a full d-by-d W           (default)
      Diagonal = 1:   learn diagonally-constrained W  (d-by-1)

  Note: the W returned in this case will be the d-by-1 vector corresponding
        to the main diagonal of a full metric, not the full d-by-d matrix.

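A sketch of using a diagonally-constrained W; because the returned W is the main
diagonal of the metric, applying it amounts to a per-feature rescaling (this
interpretation follows from the definition above and is not a separate API):

    % diagonal metric with trace regularization
    [W, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 1, 1);   % W is d-by-1

    % Euclidean distances between columns of Xscaled match the learned metric
    Xscaled = bsxfun(@times, sqrt(W), X);

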
Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at each
call to the cutting plane subroutine. This results in faster, but less
accurate, optimization:

  [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)
      where B > 0 enables stochastic optimization with batch size B

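For example, a sketch of a stochastic run with batches of 100 points and all
other parameters at their documented defaults:

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 1, 0, 100);

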
TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate performance
across all measures by using the "mlr_test" function:

  Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

      W       = d-by-d positive semi-definite matrix
      test_k  = vector of k-values to use for KNN/Prec@k/NDCG
      Xtrain  = d-by-n matrix of training data
      Ytrain  = n-by-1 vector of training labels
                OR
                n-by-2 cell array where
                    Y{q,1} contains relevant indices (in 1..n) for point q
                    Y{q,2} contains irrelevant indices (in 1..n) for point q
      Xtest   = d-by-m matrix of testing data
      Ytest   = m-by-1 vector of testing labels, or m-by-2 cell array
                If using the cell version, indices correspond to
                the training set, and must lie in the range (1..n)

The output structure Perf contains the mean score for:
    AUC, KNN, Prec@k, MAP, MRR, NDCG,
as well as the effective dimensionality of W, and
the best-performing k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with

    Wnative = []

you can quickly evaluate the performance of the native (Euclidean) metric.

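Putting the pieces together, a minimal end-to-end sketch (the data and the
random split are illustrative):

    % illustrative data: d = 10 features, two classes
    X = randn(10, 200);
    Y = [ones(100,1); 2*ones(100,1)];

    % random train/test split
    idx      = randperm(200);
    trainIdx = idx(1:150);
    testIdx  = idx(151:200);

    % train a metric, then evaluate KNN/Prec@k/NDCG at k = 1, 3, 5
    [W, Xi, D] = mlr_train(X(:,trainIdx), Y(trainIdx), 1);
    Perf = mlr_test(W, [1 3 5], X(:,trainIdx), Y(trainIdx), ...
                                X(:,testIdx),  Y(testIdx));

    % baseline: the native (Euclidean) metric
    PerfNative = mlr_test([], [1 3 5], X(:,trainIdx), Y(trainIdx), ...
                                       X(:,testIdx),  Y(testIdx));
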

MULTIPLE KERNEL LEARNING
------------------------

As of version 1.1, MLR supports learning (multiple) kernel metrics, using the method described in

[3] Galleguillos, Carolina, McFee, Brian, Belongie, Serge, and Lanckriet, G.R.G.
    From region similarity to category discovery.
    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

For m kernels, input data must be provided in the form of an n-by-n-by-m matrix K, where K(:,:,i)
contains the i'th kernel matrix of the training data. Training the metric is similar to the
single-kernel case, except that the regularization parameter should be set to 3 (kernel regularization),
e.g.:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3);

returns an n-by-n-by-m matrix W which can be used directly with mlr_test. Multiple kernel learning also supports
diagonally constrained learning, e.g.:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3, 1);

returns an n-by-m matrix W, where W(:,i) is the main diagonal of the i'th kernel's metric. Again, this W can be fed
directly into mlr_test.

For testing with multiple kernel data, Ktest should be an n-by-nTest-by-m matrix, where Ktest(:,:,i) is the
training-by-test slice of the i'th kernel matrix.

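A sketch of assembling the multiple-kernel inputs and evaluating the learned
metric; the data and the two kernels below are illustrative:

    % illustrative data: n = 80 training and nTest = 20 test points, d = 10
    Xtr = randn(10, 80);    Ytrain = [ones(40,1); 2*ones(40,1)];
    Xte = randn(10, 20);    Ytest  = [ones(10,1); 2*ones(10,1)];

    % stack m = 2 kernels: K is n-by-n-by-m, Ktest is n-by-nTest-by-m
    K     = zeros(80, 80, 2);
    Ktest = zeros(80, 20, 2);
    K(:,:,1)     = Xtr' * Xtr;              % linear kernel
    K(:,:,2)     = (1 + Xtr' * Xtr).^2;     % quadratic kernel
    Ktest(:,:,1) = Xtr' * Xte;
    Ktest(:,:,2) = (1 + Xtr' * Xte).^2;

    % train with kernel regularization, then evaluate
    [W, Xi, D] = mlr_train(K, Ytrain, 1, 'auc', 1, 3);
    Perf       = mlr_test(W, 3, K, Ytrain, Ktest, Ytest);

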
FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

Brian McFee <bmcfee@cs.ucsd.edu>