Daniel@0: function [x, options, flog, pointlog] = graddesc(f, x, options, gradf, ... Daniel@0: varargin) Daniel@0: %GRADDESC Gradient descent optimization. Daniel@0: % Daniel@0: % Description Daniel@0: % [X, OPTIONS, FLOG, POINTLOG] = GRADDESC(F, X, OPTIONS, GRADF) uses Daniel@0: % batch gradient descent to find a local minimum of the function F(X) Daniel@0: % whose gradient is given by GRADF(X). A log of the function values Daniel@0: % after each cycle is (optionally) returned in ERRLOG, and a log of the Daniel@0: % points visited is (optionally) returned in POINTLOG. Daniel@0: % Daniel@0: % Note that X is a row vector and F returns a scalar value. The point Daniel@0: % at which F has a local minimum is returned as X. The function value Daniel@0: % at that point is returned in OPTIONS(8). Daniel@0: % Daniel@0: % GRADDESC(F, X, OPTIONS, GRADF, P1, P2, ...) allows additional Daniel@0: % arguments to be passed to F() and GRADF(). Daniel@0: % Daniel@0: % The optional parameters have the following interpretations. Daniel@0: % Daniel@0: % OPTIONS(1) is set to 1 to display error values; also logs error Daniel@0: % values in the return argument ERRLOG, and the points visited in the Daniel@0: % return argument POINTSLOG. If OPTIONS(1) is set to 0, then only Daniel@0: % warning messages are displayed. If OPTIONS(1) is -1, then nothing is Daniel@0: % displayed. Daniel@0: % Daniel@0: % OPTIONS(2) is the absolute precision required for the value of X at Daniel@0: % the solution. If the absolute difference between the values of X Daniel@0: % between two successive steps is less than OPTIONS(2), then this Daniel@0: % condition is satisfied. Daniel@0: % Daniel@0: % OPTIONS(3) is a measure of the precision required of the objective Daniel@0: % function at the solution. If the absolute difference between the Daniel@0: % objective function values between two successive steps is less than Daniel@0: % OPTIONS(3), then this condition is satisfied. Both this and the Daniel@0: % previous condition must be satisfied for termination. Daniel@0: % Daniel@0: % OPTIONS(7) determines the line minimisation method used. If it is Daniel@0: % set to 1 then a line minimiser is used (in the direction of the Daniel@0: % negative gradient). If it is 0 (the default), then each parameter Daniel@0: % update is a fixed multiple (the learning rate) of the negative Daniel@0: % gradient added to a fixed multiple (the momentum) of the previous Daniel@0: % parameter update. Daniel@0: % Daniel@0: % OPTIONS(9) should be set to 1 to check the user defined gradient Daniel@0: % function GRADF with GRADCHEK. This is carried out at the initial Daniel@0: % parameter vector X. Daniel@0: % Daniel@0: % OPTIONS(10) returns the total number of function evaluations Daniel@0: % (including those in any line searches). Daniel@0: % Daniel@0: % OPTIONS(11) returns the total number of gradient evaluations. Daniel@0: % Daniel@0: % OPTIONS(14) is the maximum number of iterations; default 100. Daniel@0: % Daniel@0: % OPTIONS(15) is the precision in parameter space of the line search; Daniel@0: % default FOPTIONS(2). Daniel@0: % Daniel@0: % OPTIONS(17) is the momentum; default 0.5. It should be scaled by the Daniel@0: % inverse of the number of data points. Daniel@0: % Daniel@0: % OPTIONS(18) is the learning rate; default 0.01. It should be scaled Daniel@0: % by the inverse of the number of data points. Daniel@0: % Daniel@0: % See also Daniel@0: % CONJGRAD, LINEMIN, OLGD, MINBRACK, QUASINEW, SCG Daniel@0: % Daniel@0: Daniel@0: % Copyright (c) Ian T Nabney (1996-2001) Daniel@0: Daniel@0: % Set up the options. Daniel@0: if length(options) < 18 Daniel@0: error('Options vector too short') Daniel@0: end Daniel@0: Daniel@0: if (options(14)) Daniel@0: niters = options(14); Daniel@0: else Daniel@0: niters = 100; Daniel@0: end Daniel@0: Daniel@0: line_min_flag = 0; % Flag for line minimisation option Daniel@0: if (round(options(7)) == 1) Daniel@0: % Use line minimisation Daniel@0: line_min_flag = 1; Daniel@0: % Set options for line minimiser Daniel@0: line_options = foptions; Daniel@0: if options(15) > 0 Daniel@0: line_options(2) = options(15); Daniel@0: end Daniel@0: else Daniel@0: % Learning rate: must be positive Daniel@0: if (options(18) > 0) Daniel@0: eta = options(18); Daniel@0: else Daniel@0: eta = 0.01; Daniel@0: end Daniel@0: % Momentum term: allow zero momentum Daniel@0: if (options(17) >= 0) Daniel@0: mu = options(17); Daniel@0: else Daniel@0: mu = 0.5; Daniel@0: end Daniel@0: end Daniel@0: Daniel@0: % Check function string Daniel@0: f = fcnchk(f, length(varargin)); Daniel@0: gradf = fcnchk(gradf, length(varargin)); Daniel@0: Daniel@0: % Display information if options(1) > 0 Daniel@0: display = options(1) > 0; Daniel@0: Daniel@0: % Work out if we need to compute f at each iteration. Daniel@0: % Needed if using line search or if display results or if termination Daniel@0: % criterion requires it. Daniel@0: fcneval = (options(7) | display | options(3)); Daniel@0: Daniel@0: % Check gradients Daniel@0: if (options(9) > 0) Daniel@0: feval('gradchek', x, f, gradf, varargin{:}); Daniel@0: end Daniel@0: Daniel@0: dxold = zeros(1, size(x, 2)); Daniel@0: xold = x; Daniel@0: fold = 0; % Must be initialised so that termination test can be performed Daniel@0: if fcneval Daniel@0: fnew = feval(f, x, varargin{:}); Daniel@0: options(10) = options(10) + 1; Daniel@0: fold = fnew; Daniel@0: end Daniel@0: Daniel@0: % Main optimization loop. Daniel@0: for j = 1:niters Daniel@0: xold = x; Daniel@0: grad = feval(gradf, x, varargin{:}); Daniel@0: options(11) = options(11) + 1; % Increment gradient evaluation counter Daniel@0: if (line_min_flag ~= 1) Daniel@0: dx = mu*dxold - eta*grad; Daniel@0: x = x + dx; Daniel@0: dxold = dx; Daniel@0: if fcneval Daniel@0: fold = fnew; Daniel@0: fnew = feval(f, x, varargin{:}); Daniel@0: options(10) = options(10) + 1; Daniel@0: end Daniel@0: else Daniel@0: sd = - grad./norm(grad); % New search direction. Daniel@0: fold = fnew; Daniel@0: % Do a line search: normalise search direction to have length 1 Daniel@0: [lmin, line_options] = feval('linemin', f, x, sd, fold, ... Daniel@0: line_options, varargin{:}); Daniel@0: options(10) = options(10) + line_options(10); Daniel@0: x = xold + lmin*sd; Daniel@0: fnew = line_options(8); Daniel@0: end Daniel@0: if nargout >= 3 Daniel@0: flog(j) = fnew; Daniel@0: if nargout >= 4 Daniel@0: pointlog(j, :) = x; Daniel@0: end Daniel@0: end Daniel@0: if display Daniel@0: fprintf(1, 'Cycle %5d Function %11.8f\n', j, fnew); Daniel@0: end Daniel@0: if (max(abs(x - xold)) < options(2) & abs(fnew - fold) < options(3)) Daniel@0: % Termination criteria are met Daniel@0: options(8) = fnew; Daniel@0: return; Daniel@0: end Daniel@0: end Daniel@0: Daniel@0: if fcneval Daniel@0: options(8) = fnew; Daniel@0: else Daniel@0: options(8) = feval(f, x, varargin{:}); Daniel@0: options(10) = options(10) + 1; Daniel@0: end Daniel@0: if (options(1) >= 0) Daniel@0: disp(maxitmess); Daniel@0: end