Metric Learning to Rank (mlr-1.0)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

Brian McFee <bmcfee@cs.ucsd.edu>, 2010.

This code is distributed under the GNU GPL license. See LICENSE for details,
or http://www.gnu.org/licenses/gpl-3.0.txt .



INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
    Proceedings of the 27th annual International Conference
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.



INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive use
of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative
bsxfun implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

However, this replacement is not guaranteed to work, and it will certainly be
slower than the native bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary MEX functions, "cummax" and "binarysearch", which
accelerate certain steps of the optimization procedure. The source code for
these functions is located in the "util" subdirectory.

A makefile is provided, so on Unix-like systems you can simply type

    cd util
    make
    cd ..


3. Running the demo

To test the installation, run

    mlr_demo

from within MATLAB. The demo generates a random training/test split of the
Wine data set

    http://archive.ics.uci.edu/ml/datasets/Wine

and learns a metric with MLR. Scatter plots of the data under both the native
(Euclidean) metric and the learned metric are displayed in a figure.

TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, the training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance judgments for the
training data (see below).

    [W, Xi, D] = mlr_train(X, Y, C, ...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels,
            OR
            an n-by-2 cell array where
                Y{q,1} contains the relevant indices for q, and
                Y{q,2} contains the irrelevant indices for q

        C >= 0 slack trade-off parameter (default = 1)

        W  = the learned metric, i.e., the inner product matrix of the
             learned space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics
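
For example, a minimal training call might look like the following sketch.
The data, labels, and dimensions here are synthetic, chosen purely for
illustration:

```matlab
% Sketch: train a metric on synthetic labeled data (illustrative values).
d = 10;                % feature dimension
n = 200;               % number of training points
X = randn(d, n);       % each column is a training vector
Y = randi(3, n, 1);    % n-by-1 label vector (3 classes)
C = 1;                 % slack trade-off parameter

[W, Xi, D] = mlr_train(X, Y, C);

% Distance between points i and j in the learned space:
%   (X(:,i) - X(:,j))' * W * (X(:,i) - X(:,j))
```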


By default, MLR optimizes for Area Under the ROC Curve (AUC). This can be
changed by setting the LOSS parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

        where LOSS is one of:
            'AUC':    Area under the ROC curve (default)
            'KNN':    KNN accuracy*
            'Prec@k': Precision-at-k
            'MAP':    Mean Average Precision
            'MRR':    Mean Reciprocal Rank
            'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.


For KNN/Prec@k/NDCG, a threshold k may be set to determine where the ranked
list is truncated. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

        where k is the number of neighbors for KNN, Prec@k, or NDCG
        (default = 3)

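Putting the two optional arguments together, a training call that optimizes
Precision-at-5 might look like this (the value of C here is illustrative,
not a recommendation):

```matlab
% Sketch: optimize for Precision-at-k with k = 5 instead of the default AUC.
C = 10;
[W, Xi, D] = mlr_train(X, Y, C, 'Prec@k', 5);
```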

By default, MLR regularizes the metric W by its trace, i.e., the 1-norm of
its eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

        where REG defines the regularization on W, and is one of:
            0: no regularization
            1: 1-norm: trace(W)      (default)
            2: 2-norm: trace(W' * W)
            3: Kernel: trace(W * X), assumes X is square
               and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics learned
from kernel matrices.

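For instance, a metric can be learned directly from a precomputed kernel
matrix with REG = 3. The RBF kernel construction below is only one possible
choice, included for illustration:

```matlab
% Sketch: kernel MLR with the kernel regularizer (REG = 3).
% K is an n-by-n RBF kernel built from X; the bandwidth heuristic
% (median squared distance) is illustrative.
sq    = sum(X.^2, 1);
Dist2 = bsxfun(@plus, sq', sq) - 2 * (X' * X);  % squared Euclidean distances
K     = exp(-Dist2 / median(Dist2(:)));         % positive-definite kernel

[W, Xi, D] = mlr_train(K, Y, 1, 'AUC', 3, 3);
% Inner products in the learned space: K' * W * K
```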

W corresponds to a linear projection matrix (rotation and scaling). To learn
a restricted model that can scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the Diagonal parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

        Diagonal = 0: learn a full d-by-d W (default)
        Diagonal = 1: learn a diagonally-constrained W (d-by-1)

    Note: the W returned in this case will be the d-by-1 vector corresponding
    to the main diagonal of a full metric, not the full d-by-d matrix.

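A sketch of diagonal training, including how to expand the returned vector
back into a full metric when a d-by-d matrix is needed (the other argument
values are just the defaults):

```matlab
% Sketch: learn a diagonal (scaling-only) metric.
[w, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 1, 1);

% w is d-by-1; rebuild the equivalent full metric if required:
W = diag(w);
```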


Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at
each call to the cutting-plane subroutine. This results in faster, but less
accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

        where B > 0 enables stochastic optimization with batch size B

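With all arguments spelled out, a stochastic run might look like this sketch
(the batch size of 500 is illustrative; smaller batches are faster but
noisier):

```matlab
% Sketch: stochastic optimization with batch size B = 500.
[W, Xi, D] = mlr_train(X, Y, 1, 'AUC', 3, 1, 0, 500);
```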


TESTING
-------

Once a metric has been trained by mlr_train, you can evaluate its performance
across all measures by using the mlr_test function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels,
                 OR
                 an n-by-2 cell array where
                     Ytrain{q,1} contains relevant indices (in 1..n) for
                     point q, and
                     Ytrain{q,2} contains irrelevant indices (in 1..n) for
                     point q
        Xtest  = d-by-m matrix of testing data
        Ytest  = m-by-1 vector of testing labels, or m-by-2 cell array.
                 If using the cell version, indices correspond to
                 the training set, and must lie in the range (1..n).


The output structure Perf contains the mean score for:
    AUC, KNN, Prec@k, MAP, MRR, NDCG,
as well as the effective dimensionality of W, and
the best-performing k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with

    Wnative = eye(size(X,1))

you can quickly evaluate the performance of the native (Euclidean) metric.

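For example, a sketch of training on a random split and comparing the learned
metric against the Euclidean baseline (the split sizes and variable names are
illustrative):

```matlab
% Sketch: evaluate a learned metric against the Euclidean baseline.
% Assumes X is a d-by-n data matrix and Y an n-by-1 label vector.
n   = size(X, 2);
idx = randperm(n);
trn = idx(1:floor(0.75 * n));       % illustrative 75/25 split
tst = idx(floor(0.75 * n)+1:end);

[W, Xi, D] = mlr_train(X(:,trn), Y(trn), 1);

test_k = [1 3 5];                   % k-values for KNN/Prec@k/NDCG
Perf   = mlr_test(W, test_k, X(:,trn), Y(trn), X(:,tst), Y(tst));

Wnative  = eye(size(X,1));          % identity = native Euclidean metric
PerfBase = mlr_test(Wnative, test_k, X(:,trn), Y(trn), X(:,tst), Y(tst));
```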

FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

    Brian McFee <bmcfee@cs.ucsd.edu>