Metric Learning to Rank (mlr-1.2)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

AUTHORS:
    Brian McFee <bmcfee@cs.ucsd.edu>
    Daryl Lim <dklim@ucsd.edu>

This code is distributed under the GNU GPL license. See LICENSE for details,
or http://www.gnu.org/licenses/gpl-3.0.txt


INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
    Proceedings of the 27th annual International Conference
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.

If you use the Robust MLR code (rmlr_train), please cite:

[2] Lim, D.K.H., McFee, B. and Lanckriet, G.R.G. Robust structural metric
    learning. Proceedings of the 30th annual International Conference on
    Machine Learning (ICML), 2013.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive use
of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative
bsxfun implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

However, this may not work, and it will certainly be slower than the native
bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary functions, "cummax" and "binarysearch", to
accelerate certain steps of the optimization procedure. The source code for
these functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems) you can simply type

    cd util
    make
    cd ..


3. Running the demo

To test the installation, first add the path to MLR to your MATLAB
environment:

    >> addpath(genpath('/path/to/mlr/'));

Then, run the demo script from within MATLAB:

    >> mlr_demo

The demo generates a random training/test split of the Wine data set
(http://archive.ics.uci.edu/ml/datasets/Wine) and learns a metric with MLR.
Both native and learned metrics are displayed in a figure with a scatter plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training data
(see below).

    [W, Xi, D] = mlr_train(X, Y, C, ...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels,
            OR an n-by-2 cell array where
                Y{q,1} contains relevant indices for q, and
                Y{q,2} contains irrelevant indices for q

        C >= 0  slack trade-off parameter (default=1)

        W  = the learned metric, i.e., the inner product matrix of the
             learned space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics

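For example, a minimal training run on synthetic data might look like this
(the data sizes and C value are arbitrary, chosen only for illustration):

    >> X = randn(10, 200);               % d=10 features, n=200 training points
    >> Y = ceil(3 * rand(200, 1));       % n-by-1 vector of labels in {1,2,3}
    >> [W, Xi, D] = mlr_train(X, Y, 1);  % C=1, default settings
    >> G = X' * W * X;                   % inner products in the learned space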

By default, MLR optimizes for Area Under the ROC Curve (AUC). This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

        where LOSS is one of:
            'AUC':    Area under ROC curve (default)
            'KNN':    KNN accuracy*
            'Prec@k': Precision-at-k
            'MAP':    Mean Average Precision
            'MRR':    Mean Reciprocal Rank
            'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.

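For example, to optimize Mean Average Precision rather than the default AUC
(the C value is again illustrative):

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'MAP');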

For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation of
the ranked list. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

        where k is the number of neighbors for Prec@k or NDCG
        (default=3)

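For instance, to optimize Precision-at-5 (k=5 is an arbitrary choice here):

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'Prec@k', 5);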

By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of the
eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

        where REG defines the regularization on W, and is one of:
            0: no regularization
            1: 1-norm: trace(W)       (default)
            2: 2-norm: trace(W' * W)
            3: Kernel: trace(W * X),  assumes X is square
               and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics learned
from kernel matrices.

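For example, to train with 2-norm regularization while keeping the default
loss and k (the parameter values are illustrative):

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 2);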

W corresponds to a linear projection matrix (rotation and scaling). To learn a
restricted model which may only scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

        Diagonal = 0: learn a full d-by-d W          (default)
        Diagonal = 1: learn diagonally-constrained W (d-by-1)

    Note: the W returned in this case will be the d-by-1 vector corresponding
    to the main diagonal of a full metric, not the full d-by-d matrix.

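For example, a diagonal metric can be learned and then applied as a
per-feature scaling (a sketch: the sqrt weighting below follows from the
learned inner product X' * diag(W) * X):

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 1, 1);  % W is d-by-1
    >> Xscaled = bsxfun(@times, sqrt(W), X);             % per-feature scaling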

Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at each
call to the cutting plane subroutine. This results in faster, but less
accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

        where B > 0 enables stochastic optimization with batch size B

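For example (the batch size B=100 is arbitrary, for illustration only):

    >> [W, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 1, 0, 100);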


TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate performance
across all measures by using the "mlr_test" function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels,
                 OR an n-by-2 cell array where
                     Ytrain{q,1} contains relevant indices (in 1..n) for point q
                     Ytrain{q,2} contains irrelevant indices (in 1..n) for point q
        Xtest  = d-by-m matrix of testing data
        Ytest  = m-by-1 vector of testing labels, or m-by-2 cell array.
                 If using the cell version, indices correspond to
                 the training set, and must lie in the range (1..n).

The output structure Perf contains the mean score for:
    AUC, KNN, Prec@k, MAP, MRR, NDCG,
as well as the effective dimensionality of W, and
the best-performing k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with
    W = []
you can quickly evaluate the performance of the native (Euclidean) metric.
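
For example, to compare a learned metric against the native metric on one
split (the test_k values are illustrative):

    >> Perf       = mlr_test(W,  [1 3 5], Xtrain, Ytrain, Xtest, Ytest);
    >> PerfNative = mlr_test([], [1 3 5], Xtrain, Ytrain, Xtest, Ytest);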


MULTIPLE KERNEL LEARNING
------------------------

As of version 1.1, MLR supports learning (multiple) kernel metrics, using the
method described in

[3] Galleguillos, Carolina, McFee, Brian, Belongie, Serge, and Lanckriet,
    G.R.G. From region similarity to category discovery. Proceedings of the
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

For m kernels, input data must be provided in the form of an n-by-n-by-m
matrix K, where K(:,:,i) contains the i'th kernel matrix of the training data.
Training the metric is similar to the single-kernel case, except that the
regularization parameter should be set to 3 (kernel regularization), e.g.:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3);

returns an n-by-n-by-m matrix W which can be used directly with mlr_test.
Multiple kernel learning also supports diagonally-constrained learning, e.g.:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3, 1);

returns an n-by-m matrix W, where W(:,i) is the main diagonal of the i'th
kernel's metric. Again, this W can be fed directly into mlr_test.

For testing with multiple kernel data, Ktest should be an n-by-nTest-by-m
matrix, where Ktest(:,:,i) is the training-by-test slice of the i'th kernel
matrix.
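
Testing then follows the same pattern as in the single-kernel case, with the
stacked kernel matrices in place of Xtrain/Xtest (a sketch, assuming K and
Ktest are constructed as described above; the test_k values are illustrative):

    >> Perf = mlr_test(W, [1 3 5], K, Ytrain, Ktest, Ytest);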

FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

    Brian McFee <bmcfee@cs.ucsd.edu>