Metric Learning to Rank (mlr-1.0)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

Brian McFee <bmcfee@cs.ucsd.edu>, 2010.

This code is distributed under the GNU GPL license. See LICENSE for details,
or http://www.gnu.org/licenses/gpl-3.0.txt .



INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
    Proceedings of the 27th annual International Conference
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.



INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive use
of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative
bsxfun implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

however, this replacement is not guaranteed to work, and it will certainly be
slower than the native bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary functions, "cummax" and "binarysearch", to
accelerate certain steps of the optimization procedure. The source code for
these functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems) you can simply type

    cd util
    make
    cd ..

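If make is unavailable (on Windows, for instance), the same MEX functions can
usually be built directly from the MATLAB prompt instead. The .c source file
names below are an assumption, matching the function names:

    % Assumes the sources are named after the functions they implement.
    cd util
    mex cummax.c
    mex binarysearch.c
    cd ..
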

3. Running the demo

To test the installation, run

    mlr_demo

from within MATLAB. The demo generates a random training/test split of the
Wine data set

    http://archive.ics.uci.edu/ml/datasets/Wine

and learns a metric with MLR. Both the native and the learned metrics are
displayed in a scatter plot figure.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, the training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training data
(see below).

    [W, Xi, D] = mlr_train(X, Y, C,...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels
            OR
            an n-by-2 cell array where
                Y{q,1} contains relevant indices for q, and
                Y{q,2} contains irrelevant indices for q

        C >= 0 slack trade-off parameter (default=1)

        W  = the learned metric, i.e., the inner product matrix of the
             learned space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics

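For example, a minimal training run on synthetic data (the matrices below are
illustrative stand-ins, not part of the package) might look like:

    % 10-dimensional data: 100 points in two classes
    X = randn(10, 100);                  % each column is a training vector
    Y = [ones(50,1); 2*ones(50,1)];      % n-by-1 label vector
    C = 1;                               % slack trade-off parameter

    [W, Xi, D] = mlr_train(X, Y, C);     % W is the learned 10-by-10 metric
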

By default, MLR optimizes for Area Under the ROC Curve (AUC). This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

        where LOSS is one of:
            'AUC':    Area under ROC curve (default)
            'KNN':    KNN accuracy*
            'Prec@k': Precision-at-k
            'MAP':    Mean Average Precision
            'MRR':    Mean Reciprocal Rank
            'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.

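For instance, to optimize Mean Average Precision instead of AUC (reusing the
synthetic X, Y, and C from the sketch above):

    [W, Xi, D] = mlr_train(X, Y, C, 'MAP');   % rank by MAP rather than AUC
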

For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation of
the ranked list. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

        where k is the number of neighbors for KNN, Prec@k, or NDCG
        (default=3)

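For example, to optimize precision over the top five entries of each ranked
list:

    [W, Xi, D] = mlr_train(X, Y, C, 'Prec@k', 5);   % Precision-at-5
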

By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of the
eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

        where REG defines the regularization on W, and is one of:
            0: no regularization
            1: 1-norm: trace(W)       (default)
            2: 2-norm: trace(W' * W)
            3: Kernel: trace(W * X),  assumes X is square
                                      and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics learned
from kernel matrices.

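As a sketch of that setting (the RBF kernel and bandwidth heuristic below are
illustrative choices, not part of the package): build an n-by-n kernel matrix
over the training points, pass it in place of X, and select REG=3.

    sq = sum(X.^2, 1);                        % 1-by-n squared norms
    D2 = bsxfun(@plus, sq', sq) - 2*(X'*X);   % pairwise squared distances
    K  = exp(-D2 / median(D2(:)));            % RBF kernel matrix (square, PSD)

    [W, Xi, D] = mlr_train(K, Y, C, 'AUC', 3, 3);   % REG=3: kernel regularization
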

W corresponds to a linear projection matrix (rotation and scaling). To learn a
restricted model which may only scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

        Diagonal = 0: learn a full d-by-d W            (default)
        Diagonal = 1: learn diagonally-constrained W   (d-by-1)

    Note: the W returned in this case will be the d-by-1 vector corresponding
    to the main diagonal of a full metric, not the full d-by-d matrix.

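For example, a diagonal metric with the default loss and regularizer
(continuing the sketch above):

    [w, Xi, D] = mlr_train(X, Y, C, 'AUC', 3, 1, 1);   % w is d-by-1
    W = diag(w);             % expand to the full d-by-d diagonal metric
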


Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at each
call to the cutting plane subroutine. This results in faster, but less
accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

        where B > 0 enables stochastic optimization with batch size B

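For example, keeping the default loss and regularizer but estimating gradients
from batches of 100 points:

    [W, Xi, D] = mlr_train(X, Y, C, 'AUC', 3, 1, 0, 100);   % batch size B=100
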


TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate performance
across all measures by using the "mlr_test" function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels
                 OR
                 n-by-2 cell array where
                     Ytrain{q,1} contains relevant indices (in 1..n) for point q
                     Ytrain{q,2} contains irrelevant indices (in 1..n) for point q
        Xtest  = d-by-m matrix of testing data
        Ytest  = m-by-1 vector of test labels, or m-by-2 cell array
                 If using the cell version, indices correspond to
                 the training set, and must lie in the range (1..n)

The output structure Perf contains the mean score for:

    AUC, KNN, Prec@k, MAP, MRR, NDCG,

as well as the effective dimensionality of W, and the best-performing k-value
for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with

    Wnative = eye(size(Xtrain,1))

you can quickly evaluate the performance of the native (Euclidean) metric.

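Putting it together, a sketch of a full evaluation run (the train/test split
is up to you; Xtrain, Ytrain, Xtest, Ytest are as described above):

    % Learn a metric on the training set
    W = mlr_train(Xtrain, Ytrain, 1);

    % Evaluate the learned and native metrics at several k-values
    test_k  = [1 3 5];
    PerfMLR = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest);

    Wnative = eye(size(Xtrain,1));
    PerfNat = mlr_test(Wnative, test_k, Xtrain, Ytrain, Xtest, Ytest);
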

FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

    Brian McFee <bmcfee@cs.ucsd.edu>