Metric Learning to Rank (mlr-1.2)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

AUTHORS:
    Brian McFee <bmcfee@cs.ucsd.edu>
    Daryl Lim <dklim@ucsd.edu>

This code is distributed under the GNU GPL license. See LICENSE for details
or http://www.gnu.org/licenses/gpl-3.0.txt



INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

[1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
    Proceedings of the 27th annual International Conference
    on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.

If you use the Robust MLR code (rmlr_train), please cite:

[2] Lim, D.K.H., McFee, B. and Lanckriet, G.R.G. Robust structural metric learning.
    Proceedings of the 30th annual International Conference on Machine Learning
    (ICML), 2013.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive use
of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative bsxfun
implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

However, this may not work, and it will certainly be slower than the native
bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary functions "cummax" and "binarysearch" to accelerate
certain steps of the optimization procedure. The source code for these
functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems) you can simply type

    cd util
    make
    cd ..
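
If "make" is not available (for example, on Windows), the MEX functions can
likely be compiled directly from within MATLAB. The following is a sketch only;
it assumes the sources are named cummax.c and binarysearch.c in the "util"
directory:

    % Illustrative sketch: compile the MEX helpers from within MATLAB.
    % (Assumes the source files are cummax.c and binarysearch.c in util/.)
    cd util;
    mex cummax.c;         % builds the cummax MEX file for your platform
    mex binarysearch.c;   % builds the binarysearch MEX file
    cd ..;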


3. Running the demo

To test the installation, first add the path to MLR to your MATLAB environment:

    >> addpath(genpath('/path/to/mlr/'));

Then, run the demo script:

    >> mlr_demo

from within MATLAB. The demo generates a random training/test split of the Wine
data set (http://archive.ics.uci.edu/ml/datasets/Wine) and learns a metric with
MLR. Both native and learned metrics are displayed in a figure with a scatter
plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training data
(see below).

    [W, Xi, D] = mlr_train(X, Y, C, ...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels,
            OR
            an n-by-2 cell array where
                Y{q,1} contains relevant indices for q, and
                Y{q,2} contains irrelevant indices for q

        C >= 0 slack trade-off parameter (default=1)

        W  = the learned metric, i.e., the inner product matrix of the learned
             space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics
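
As a minimal usage sketch (not part of the package; data and variable names
are illustrative only):

    % Illustrative sketch: train a metric on synthetic data.
    % Assumes the MLR directory is already on the MATLAB path.
    d = 10;  n = 200;
    X = randn(d, n);                 % each column is a training vector
    Y = randi(3, n, 1);              % n-by-1 label vector (3 classes)
    C = 1;                           % slack trade-off parameter

    [W, Xi, D] = mlr_train(X, Y, C); % learn a full d-by-d metric (AUC loss)

    % Inner products in the learned space are given by X' * W * X:
    G = X' * W * X;                  % n-by-n Gram matrix under the learned metric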


By default, MLR optimizes for Area Under the ROC Curve (AUC). This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

    where LOSS is one of:
        'AUC':    Area under ROC curve (default)
        'KNN':    KNN accuracy*
        'Prec@k': Precision-at-k
        'MAP':    Mean Average Precision
        'MRR':    Mean Reciprocal Rank
        'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.


For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation of
the ranked list. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

    where k is the number of neighbors for Prec@k or NDCG
    (default=3)
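
For example, a sketch of optimizing Precision-at-5 (X, Y, and C as above):

    % Illustrative call: optimize Precision-at-k with k = 5.
    [W, Xi, D] = mlr_train(X, Y, C, 'Prec@k', 5);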


By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of the
eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

    where REG defines the regularization on W, and is one of:
        0: no regularization
        1: 1-norm: trace(W)         (default)
        2: 2-norm: trace(W' * W)
        3: Kernel: trace(W * X),    assumes X is square
                                    and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics learned
from kernel matrices.


W corresponds to a linear projection matrix (rotation and scaling). To learn a
restricted model that may only scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

    Diagonal = 0: learn a full d-by-d W            (default)
    Diagonal = 1: learn a diagonally-constrained W (d-by-1)

Note: the W returned in this case will be the d-by-1 vector corresponding
to the main diagonal of a full metric, not the full d-by-d matrix.
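
As an illustrative sketch (not part of the package), a d-by-1 diagonal metric w
can be applied by rescaling the data, since
(x - z)' * diag(w) * (x - z) = || sqrt(w).*x - sqrt(w).*z ||^2 whenever w >= 0:

    % Illustrative: learn and apply a diagonally-constrained metric.
    [w, Xi, D] = mlr_train(X, Y, C, 'AUC', 3, 1, 1);   % w is d-by-1

    % Rescale features by sqrt(w); Euclidean distances in the rescaled space
    % match distances under the learned diagonal metric (assumes w >= 0).
    Xscaled = bsxfun(@times, sqrt(w), X);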



Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at each
call to the cutting plane subroutine. This results in faster, but less
accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

    where B > 0 enables stochastic optimization with batch size B
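
For example, a sketch of a stochastic run with batch size 100 and the other
parameters at the defaults listed above:

    % Illustrative: stochastic optimization with batch size B = 100.
    B = 100;
    [W, Xi, D] = mlr_train(X, Y, C, 'AUC', 3, 1, 0, B);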



TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate performance
across all measures by using the "mlr_test" function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels,
                 OR
                 n-by-2 cell array where
                     Ytrain{q,1} contains relevant indices (in 1..n) for point q
                     Ytrain{q,2} contains irrelevant indices (in 1..n) for point q
        Xtest  = d-by-m matrix of testing data
        Ytest  = m-by-1 vector of test labels, or m-by-2 cell array.
                 If using the cell version, indices correspond to
                 the training set, and must lie in the range (1..n).


The output structure Perf contains the mean score for:
    AUC, KNN, Prec@k, MAP, MRR, NDCG,
as well as the effective dimensionality of W, and
the best-performing k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with
    Wnative = []
you can quickly evaluate the performance of the native (Euclidean) metric.
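
A minimal sketch (variable names illustrative) comparing the learned metric
against the native Euclidean metric:

    % Illustrative: evaluate a learned metric W and the native metric.
    test_k = [1, 3, 5];                                   % k-values for KNN/Prec@k/NDCG

    PerfLearned = mlr_test(W,  test_k, Xtrain, Ytrain, Xtest, Ytest);
    PerfNative  = mlr_test([], test_k, Xtrain, Ytrain, Xtest, Ytest);  % [] = Euclidean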


MULTIPLE KERNEL LEARNING
------------------------

As of version 1.1, MLR supports learning (multiple) kernel metrics, using the
method described in

[3] Galleguillos, Carolina, McFee, Brian, Belongie, Serge, and Lanckriet, G.R.G.
    From region similarity to category discovery.
    Proceedings of the IEEE Conference on Computer Vision and Pattern
    Recognition (CVPR), 2011.

For m kernels, input data must be provided in the form of an n-by-n-by-m matrix
K, where K(:,:,i) contains the i'th kernel matrix of the training data.
Training the metric is similar to the single-kernel case, except that the
regularization parameter should be set to 3 (kernel regularization). For
example,

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3);

returns an n-by-n-by-m matrix W which can be used directly with mlr_test.
Multiple kernel learning also supports diagonally-constrained learning; for
example,

    >> [W, Xi, D] = mlr_train(K, Y, C, 'auc', 1, 3, 1);

returns an n-by-m matrix W, where W(:,i) is the main diagonal of the i'th
kernel's metric. Again, this W can be fed directly into mlr_test.

For testing with multiple kernel data, Ktest should be an n-by-nTest-by-m
matrix, where Ktest(:,:,i) is the training-by-test slice of the i'th kernel
matrix.
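
As a hedged sketch (variable names illustrative), the test-time kernel slices
would then be passed to mlr_test in place of the feature matrices; that the
training kernels stand in for Xtrain and Ktest for Xtest is an assumption here,
inferred from the description above:

    % Illustrative: evaluate a multiple-kernel metric.
    % K     is n-by-n-by-m      (training kernels, as used for mlr_train)
    % Ktest is n-by-nTest-by-m  (training-by-test slices of each kernel)
    test_k = [1, 3, 5];
    Perf = mlr_test(W, test_k, K, Ytrain, Ktest, Ytest);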

FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

Brian McFee <bmcfee@cs.ucsd.edu>