Metric Learning to Rank (mlr-1.2)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

AUTHORS:
    Brian McFee <bmcfee@cs.ucsd.edu>
    Daryl Lim <dklim@ucsd.edu>

This code is distributed under the GNU GPL license. See LICENSE for details,
or http://www.gnu.org/licenses/gpl-3.0.txt.


INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
    Proceedings of the 27th annual International Conference
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.

If you use the Robust MLR code (rmlr_train), please cite:

[2] Lim, D.K.H., McFee, B. and Lanckriet, G.R.G. Robust structural metric learning.
    Proceedings of the 30th annual International Conference on Machine Learning (ICML),
    2013.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive use
of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative bsxfun
implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

However, this may not work, and it will certainly be slower than the native
bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary functions, "cummax" and "binarysearch", to accelerate
certain steps of the optimization procedure. The source code for these
functions is located in the "util" subdirectory.

A makefile is provided, so on Unix systems you can simply type

    cd util
    make
    cd ..

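If you prefer to build from within MATLAB (for example on Windows, where make
may be unavailable), something along the following lines should work, assuming
the MEX sources in "util" are named cummax.c and binarysearch.c:

    % Sketch: compile the MEX helpers from the MATLAB prompt.
    % (Assumes the source files are util/cummax.c and util/binarysearch.c.)
    cd util
    mex cummax.c
    mex binarysearch.c
    cd ..
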
3. Running the demo

To test the installation, first add the path to MLR to your MATLAB environment:

    >> addpath(genpath('/path/to/mlr/'));

Then, run the demo script:

    >> mlr_demo

from within MATLAB. The demo generates a random training/test split of the Wine
data set (http://archive.ics.uci.edu/ml/datasets/Wine) and learns a metric with
MLR. Both the native and learned metrics are displayed in a figure with a
scatter plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, the training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training data
(see below).

    [W, Xi, D] = mlr_train(X, Y, C, ...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels
            OR
            an n-by-2 cell array where
                Y{q,1} contains relevant indices for q, and
                Y{q,2} contains irrelevant indices for q

        C >= 0 slack trade-off parameter (default=1)

        W  = the learned metric, i.e., the inner product matrix of the learned
             space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics

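As an illustration, a minimal training call on synthetic labeled data might
look like this (the data and parameter values are for demonstration only):

    % Illustrative sketch: 200 points in 10 dimensions, 3 classes.
    X = randn(10, 200);               % d-by-n data matrix
    Y = randi(3, 200, 1);             % n-by-1 vector of class labels
    C = 1;                            % slack trade-off parameter
    [W, Xi, D] = mlr_train(X, Y, C);  % W is a 10-by-10 metric

    % Distances in the learned space are given by the quadratic form
    % (x1 - x2)' * W * (x1 - x2).
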
By default, MLR optimizes for Area Under the ROC Curve (AUC). This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

    where LOSS is one of:
        'AUC':    Area under ROC curve (default)
        'KNN':    KNN accuracy*
        'Prec@k': Precision-at-k
        'MAP':    Mean Average Precision
        'MRR':    Mean Reciprocal Rank
        'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.

For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation of
the ranked list. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

    where k is the number of neighbors for Prec@k or NDCG
    (default=3)

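For example, to optimize Precision-at-5 (reusing the X, Y, and C from the
sketch above):

    % Sketch: optimize Prec@k with a truncation threshold of k = 5.
    [W, Xi, D] = mlr_train(X, Y, C, 'Prec@k', 5);
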
By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of the
eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

    where REG defines the regularization on W, and is one of:
        0: no regularization
        1: 1-norm: trace(W)       (default)
        2: 2-norm: trace(W' * W)
        3: Kernel: trace(W * X),  assumes X is square and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics learned
from kernel matrices.

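For example, a sketch of training with MAP loss and 2-norm regularization (the
k argument is not used by MAP, but still occupies its positional slot):

    % Sketch: MAP loss, k left at its default of 3, 2-norm regularization.
    [W, Xi, D] = mlr_train(X, Y, C, 'MAP', 3, 2);
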
W corresponds to a linear projection matrix (rotation and scaling). To learn a
restricted model which may only scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

    Diagonal = 0: learn a full d-by-d W            (default)
    Diagonal = 1: learn a diagonally-constrained W (d-by-1)

    Note: the W returned in this case will be the d-by-1 vector corresponding
    to the main diagonal of a full metric, not the full d-by-d matrix.

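A sketch of learning a scaling-only metric, and expanding the returned vector
into a full d-by-d matrix when one is needed:

    % Sketch: diagonally-constrained metric with trace regularization.
    [w, Xi, D] = mlr_train(X, Y, C, 'AUC', 3, 1, 1);  % w is d-by-1
    Wfull = diag(w);                                  % full d-by-d metric, if needed
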
Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at each
call to the cutting plane subroutine. This results in faster, but less
accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

    where B > 0 enables stochastic optimization with batch size B

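For instance, a sketch of a large-scale run that estimates gradients from
batches of 100 training points:

    % Sketch: stochastic optimization with batch size B = 100.
    [W, Xi, D] = mlr_train(X, Y, C, 'AUC', 3, 1, 0, 100);
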
TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate performance
across all measures by using the "mlr_test" function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels
                 OR
                 n-by-2 cell array where
                     Ytrain{q,1} contains relevant indices (in 1..n) for point q
                     Ytrain{q,2} contains irrelevant indices (in 1..n) for point q
        Xtest  = d-by-m matrix of testing data
        Ytest  = m-by-1 vector of testing labels, or m-by-2 cell array
                 If using the cell version, indices correspond to
                 the training set, and must lie in the range (1..n)

The output structure Perf contains the mean score for
    AUC, KNN, Prec@k, MAP, MRR, NDCG,
as well as the effective dimensionality of W, and
the best-performing k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By passing an empty matrix as the metric, i.e.,
    Wnative = []
you can quickly evaluate the performance of the native (Euclidean) metric.

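For example, a sketch comparing the learned metric against the native metric
(assuming the mean AUC score is stored in a field named AUC):

    % Sketch: evaluate the learned and native metrics at k = 1, 3, 5.
    test_k = [1, 3, 5];
    Perf       = mlr_test(W,  test_k, Xtrain, Ytrain, Xtest, Ytest);
    PerfNative = mlr_test([], test_k, Xtrain, Ytrain, Xtest, Ytest);

    disp([Perf.AUC, PerfNative.AUC]);   % compare learned vs. native AUC
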
MULTIPLE KERNEL LEARNING
------------------------

As of version 1.1, MLR supports learning (multiple) kernel metrics, using the method described in

[3] Galleguillos, Carolina, McFee, Brian, Belongie, Serge, and Lanckriet, G.R.G.
    From region similarity to category discovery.
    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

For m kernels, input data must be provided in the form of an n-by-n-by-m array K, where K(:,:,i)
contains the i'th kernel matrix of the training data. Training the metric is similar to the
single-kernel case, except that the regularization parameter should be set to 3 (kernel
regularization), e.g.:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3);

returns an n-by-n-by-m matrix W which can be used directly with mlr_test. Multiple kernel
learning also supports diagonally-constrained learning, e.g.:

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3, 1);

returns an n-by-m matrix W, where W(:,i) is the main diagonal of the i'th kernel's metric.
Again, this W can be fed directly into mlr_test.

For testing with multiple kernel data, Ktest should be an n-by-nTest-by-m array, where
Ktest(:,:,i) is the training-by-test slice of the i'th kernel matrix.

FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

    Brian McFee <bmcfee@cs.ucsd.edu>