Metric Learning to Rank (mlr-1.0)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

Brian McFee, 2010.

This code is distributed under the GNU GPL license. See LICENSE for details,
or http://www.gnu.org/licenses/gpl-3.0.txt .


INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

    [1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
        Proceedings of the 27th annual International Conference
        on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive
use of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative
bsxfun implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

however, this may not work, and it will certainly be slower than the
native bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary MEX functions, "cummax" and "binarysearch",
which accelerate certain steps of the optimization procedure. The source
code for these functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems) you can simply type

    cd util
    make
    cd ..


3. Running the demo

To test the installation, run

    mlr_demo

from within MATLAB. The demo generates a random training/test split of the
Wine data set

    http://archive.ics.uci.edu/ml/datasets/Wine

and learns a metric with MLR. Both the native and learned metrics are
displayed in a figure with a scatter plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training
data (see below).

    [W, Xi, D] = mlr_train(X, Y, C, ...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels
            OR
            an n-by-2 cell array where
                Y{q,1} contains relevant indices for q, and
                Y{q,2} contains irrelevant indices for q

        C >= 0 slack trade-off parameter (default=1)

        W  = the learned metric, i.e., the inner product matrix of the
             learned space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics


By default, MLR optimizes for Area Under the ROC Curve (AUC).
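A minimal call in this simplest mode might look like the sketch below; the toy data and label values are purely illustrative, not part of the package:

```matlab
% Synthetic toy problem: 10 points in 3 dimensions, two classes.
X = randn(3, 10);               % d-by-n data matrix (one point per column)
Y = [ones(5,1); 2*ones(5,1)];   % n-by-1 label vector
C = 1;                          % slack trade-off parameter

[W, Xi, D] = mlr_train(X, Y, C);

% Inner products in the learned space:
K = X' * W * X;
```

With only X, Y, and C supplied, the remaining options take their default values.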
This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

        where LOSS is one of:
            'AUC':    Area under ROC curve (default)
            'KNN':    KNN accuracy*
            'Prec@k': Precision-at-k
            'MAP':    Mean Average Precision
            'MRR':    Mean Reciprocal Rank
            'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.


For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation
of the ranked list. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

        where k is the number of neighbors for Prec@k or NDCG
        (default=3)


By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of
the eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

        where REG defines the regularization on W, and is one of:
            0: no regularization
            1: 1-norm: trace(W)       (default)
            2: 2-norm: trace(W' * W)
            3: Kernel: trace(W * X),  assumes X is square
                                      and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics
learned from kernel matrices.


W corresponds to a linear projection matrix (rotation and scaling).
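For example, to optimize Precision-at-k at depth 5 under the default trace regularization, a call might look like the following sketch (the parameter values are illustrative; X and Y are a data matrix and labels as described in the TRAINING section):

```matlab
k   = 5;    % truncation depth for Prec@k (also used by KNN/NDCG)
REG = 1;    % trace (1-norm) regularization, the default
C   = 1;    % slack trade-off parameter

[W, Xi, D] = mlr_train(X, Y, C, 'Prec@k', k, REG);
```

When learning from a square, positive-definite kernel matrix, the same call applies with the kernel matrix in place of X and REG set to 3.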
To learn a
restricted model which may only scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

        Diagonal = 0: learn a full d-by-d W (default)
        Diagonal = 1: learn a diagonally-constrained W (d-by-1)

    Note: the W returned in this case will be the d-by-1 vector
    corresponding to the main diagonal of a full metric, not the
    full d-by-d matrix.


Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at
each call to the cutting plane subroutine. This results in faster, but
less accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

        where B > 0 enables stochastic optimization with batch size B


TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate
performance across all measures by using the "mlr_test" function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels
                 OR
                 n-by-2 cell array where
                     Y{q,1} contains relevant indices (in 1..n) for point q
                     Y{q,2} contains irrelevant indices (in 1..n) for point q
        Xtest  = d-by-m matrix of test data
        Ytest  = m-by-1 vector of test labels, or m-by-2 cell array
                 If using the cell version, indices correspond to the
                 training set, and must lie in the range (1..n)


The output structure Perf contains the mean score for:

    AUC, KNN, Prec@k, MAP, MRR, NDCG,

as well as the effective dimensionality of W, and the best-performing
k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with

    Wnative = eye(size(X,1))

you can quickly evaluate the performance of the native (Euclidean) metric.


FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

    Brian McFee
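As a closing illustration, the training and testing interfaces described above can be combined into an end-to-end experiment. The sketch below assumes Xtrain (d-by-n), Ytrain (n-by-1), Xtest (d-by-m), and Ytest (m-by-1) have already been loaded; all variable names and parameter values are placeholders:

```matlab
C      = 1;            % slack trade-off parameter
test_k = [1 3 5];      % k-values to evaluate for KNN/Prec@k/NDCG

% Train a metric optimizing Mean Average Precision.
W    = mlr_train(Xtrain, Ytrain, C, 'MAP');
Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest);

% Baseline: the native (Euclidean) metric.
Wnative = eye(size(Xtrain,1));
PerfNat = mlr_test(Wnative, test_k, Xtrain, Ytrain, Xtest, Ytest);
```

Comparing Perf against PerfNat shows how much the learned metric improves over Euclidean distance on each ranking measure.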