Metric Learning to Rank (mlr-1.0)
http://www-cse.ucsd.edu/~bmcfee/code/mlr/

Brian McFee, 2010.

This code is distributed under the GNU GPL license. See LICENSE for details,
or http://www.gnu.org/licenses/gpl-3.0.txt .


INTRODUCTION
------------
This package contains the MATLAB code for Metric Learning to Rank (MLR).
The latest version of this software can be found at the URL above.

The software included here implements the algorithm described in

    [1] McFee, Brian and Lanckriet, G.R.G. Metric learning to rank.
        Proceedings of the 27th annual International Conference
        on Machine Learning (ICML), 2010.

Please cite this paper if you use this code.


INSTALLATION
------------

1. Requirements

This software requires MATLAB R2007a or later. Because it makes extensive
use of the "bsxfun" function, earlier versions of MATLAB will not work.

If you have an older MATLAB installation, you can install an alternative
bsxfun implementation from

    http://www.mathworks.com/matlabcentral/fileexchange/23005

however, this may not work, and it will certainly be slower than the
native bsxfun implementation.


2. Compiling MEX functions

MLR includes two auxiliary MEX functions, "cummax" and "binarysearch",
which accelerate certain steps of the optimization procedure. The source
code for these functions is located in the "util" subdirectory.

A makefile is provided, so that (on Unix systems) you can simply type

    cd util
    make
    cd ..


3. Running the demo

To test the installation, run

    mlr_demo

from within MATLAB. The demo generates a random training/test split of the
Wine data set

    http://archive.ics.uci.edu/ml/datasets/Wine

and learns a metric with MLR. Both the native and learned metrics are
displayed in a figure with a scatter plot.


TRAINING
--------

There are several modes of operation for training metrics with MLR. In the
simplest mode, training data is contained in a matrix X (each column is a
training vector), and Y contains the labels/relevance for the training
data (see below).

    [W, Xi, D] = mlr_train(X, Y, C, ...)

        X = d-by-n data matrix
        Y = either an n-by-1 vector of labels
            OR
            an n-by-2 cell array where
                Y{q,1} contains relevant indices for q, and
                Y{q,2} contains irrelevant indices for q

        C >= 0 slack trade-off parameter (default=1)

        W  = the learned metric, i.e., the inner product matrix of the
             learned space can be computed by X' * W * X
        Xi = slack value on the learned metric (see [1])
        D  = diagnostics


By default, MLR optimizes for Area Under the ROC Curve (AUC).
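A minimal call in this simplest mode might look like the sketch below; the toy data and label values are purely illustrative, not part of the package:

```matlab
% Synthetic toy problem: 10 points in 3 dimensions, two classes.
X = randn(3, 10);               % d-by-n data matrix (one point per column)
Y = [ones(5,1); 2*ones(5,1)];   % n-by-1 label vector
C = 1;                          % slack trade-off parameter

[W, Xi, D] = mlr_train(X, Y, C);

% Inner products in the learned space:
K = X' * W * X;
```

With only X, Y, and C supplied, the remaining options take their default values.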
This can be
changed by setting the "LOSS" parameter to one of several ranking loss
measures:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS)

        where LOSS is one of:
            'AUC':    Area under ROC curve (default)
            'KNN':    KNN accuracy*
            'Prec@k': Precision-at-k
            'MAP':    Mean Average Precision
            'MRR':    Mean Reciprocal Rank
            'NDCG':   Normalized Discounted Cumulative Gain

    *Note: KNN is correct only for binary classification problems; in
    practice, Prec@k is usually a better alternative.


For KNN/Prec@k/NDCG, a threshold k may be set to determine the truncation
of the ranked list. This can be done by setting the k parameter:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k)

        where k is the number of neighbors for Prec@k or NDCG
        (default=3)


By default, MLR regularizes the metric W by the trace, i.e., the 1-norm of
the eigenvalues. This can be changed to one of several alternatives:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG)

        where REG defines the regularization on W, and is one of:
            0: no regularization
            1: 1-norm: trace(W)       (default)
            2: 2-norm: trace(W' * W)
            3: Kernel: trace(W * X),  assumes X is square
                                      and positive-definite

The last setting, "kernel", is appropriate for regularizing metrics
learned from kernel matrices.


W corresponds to a linear projection matrix (rotation and scaling).
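For example, to optimize Precision-at-k at depth 5 under the default trace regularization, a call might look like the following sketch (the parameter values are illustrative; X and Y are a data matrix and labels as described in the TRAINING section):

```matlab
k   = 5;    % truncation depth for Prec@k (also used by KNN/NDCG)
REG = 1;    % trace (1-norm) regularization, the default
C   = 1;    % slack trade-off parameter

[W, Xi, D] = mlr_train(X, Y, C, 'Prec@k', k, REG);
```

When learning from a square, positive-definite kernel matrix, the same call applies with the kernel matrix in place of X and REG set to 3.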
To learn a
restricted model which may only scale, but not rotate, the data, W can be
constrained to diagonal matrices by setting the "Diagonal" parameter to 1:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal)

        Diagonal = 0: learn a full d-by-d W (default)
        Diagonal = 1: learn a diagonally-constrained W (d-by-1)

    Note: the W returned in this case will be the d-by-1 vector
    corresponding to the main diagonal of a full metric, not the
    full d-by-d matrix.


Finally, we provide a stochastic gradient descent implementation to handle
large-scale problems. Rather than estimating gradients from the entire
training set, this variant uses a random subset of size B (see below) at
each call to the cutting plane subroutine. This results in faster, but
less accurate, optimization:

    [W, Xi, D] = mlr_train(X, Y, C, LOSS, k, REG, Diagonal, B)

        where B > 0 enables stochastic optimization with batch size B


TESTING
-------

Once a metric has been trained by "mlr_train", you can evaluate
performance across all measures by using the "mlr_test" function:

    Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest)

        W      = d-by-d positive semi-definite matrix
        test_k = vector of k-values to use for KNN/Prec@k/NDCG
        Xtrain = d-by-n matrix of training data
        Ytrain = n-by-1 vector of training labels
                 OR
                 n-by-2 cell array where
                     Y{q,1} contains relevant indices (in 1..n) for point q
                     Y{q,2} contains irrelevant indices (in 1..n) for point q
        Xtest  = d-by-m matrix of test data
        Ytest  = m-by-1 vector of test labels, or m-by-2 cell array
                 If using the cell version, indices correspond to the
                 training set, and must lie in the range (1..n)


The output structure Perf contains the mean score for:

    AUC, KNN, Prec@k, MAP, MRR, NDCG,

as well as the effective dimensionality of W, and the best-performing
k-value for KNN, Prec@k, and NDCG.

For information retrieval settings, consider Xtrain/Ytrain as the corpus
and Xtest/Ytest as the queries.

By testing with

    Wnative = eye(size(X,1))

you can quickly evaluate the performance of the native (Euclidean) metric.


FEEDBACK
--------

Please send any bug reports, source code contributions, etc. to

    Brian McFee
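As a closing illustration, the training and testing interfaces described above can be combined into an end-to-end experiment. The sketch below assumes Xtrain (d-by-n), Ytrain (n-by-1), Xtest (d-by-m), and Ytest (m-by-1) have already been loaded; all variable names and parameter values are placeholders:

```matlab
C      = 1;            % slack trade-off parameter
test_k = [1 3 5];      % k-values to evaluate for KNN/Prec@k/NDCG

% Train a metric optimizing Mean Average Precision.
W    = mlr_train(Xtrain, Ytrain, C, 'MAP');
Perf = mlr_test(W, test_k, Xtrain, Ytrain, Xtest, Ytest);

% Baseline: the native (Euclidean) metric.
Wnative = eye(size(Xtrain,1));
PerfNat = mlr_test(Wnative, test_k, Xtrain, Ytrain, Xtest, Ytest);
```

Comparing Perf against PerfNat shows how much the learned metric improves over Euclidean distance on each ranking measure.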