function [net, options, errlog, pointlog] = olgd(net, options, x, t)
%OLGD	On-line gradient descent optimization.
%
%	Description
%	[NET, OPTIONS, ERRLOG, POINTLOG] = OLGD(NET, OPTIONS, X, T) uses
%	on-line gradient descent to find a local minimum of the error
%	function for the network NET computed on the input data X and target
%	values T. A log of the error values after each cycle is (optionally)
%	returned in ERRLOG, and a log of the points visited is (optionally)
%	returned in POINTLOG. Because the gradient is computed on-line (i.e.
%	after each pattern) this can be quite inefficient in Matlab.
%
%	The error function value at the final weight vector is returned in
%	OPTIONS(8).
%
%	The optional parameters have the following interpretations.
%
%	OPTIONS(1) is set to 1 to display error values; also logs error
%	values in the return argument ERRLOG, and the points visited in the
%	return argument POINTLOG. If OPTIONS(1) is set to 0, then only
%	warning messages are displayed. If OPTIONS(1) is -1, then nothing is
%	displayed.
%
%	OPTIONS(2) is the precision required for the value of X at the
%	solution. If the absolute difference between the values of X between
%	two successive steps is less than OPTIONS(2), then this condition is
%	satisfied.
%
%	OPTIONS(3) is the precision required of the objective function at
%	the solution. If the absolute difference between the error functions
%	between two successive steps is less than OPTIONS(3), then this
%	condition is satisfied. Both this and the previous condition must be
%	satisfied for termination. Note that testing the function value at
%	each iteration roughly halves the speed of the algorithm.
%
%	OPTIONS(5) determines whether the patterns are sampled randomly with
%	replacement. If it is 0 (the default), then patterns are sampled in
%	order.
%
%	OPTIONS(6) determines if the learning rate decays. If it is 1 then
%	the learning rate decays at a rate of 1/T. If it is 0 (the default)
%	then the learning rate is constant.
%
%	OPTIONS(9) should be set to 1 to check the user defined gradient
%	function.
%
%	OPTIONS(10) returns the total number of function evaluations
%	(including those in any line searches).
%
%	OPTIONS(11) returns the total number of gradient evaluations.
%
%	OPTIONS(14) is the maximum number of iterations (passes through the
%	complete pattern set); default 100.
%
%	OPTIONS(17) is the momentum; default 0.5.
%
%	OPTIONS(18) is the learning rate; default 0.01.
%
%	See also
%	GRADDESC
%

%	Copyright (c) Ian T Nabney (1996-2001)
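%	Example (illustrative sketch, not from the original file): train a
%	small MLP with OLGD, assuming the standard Netlab MLP and FOPTIONS
%	helpers are on the path; the data matrices X and T are placeholders.
%
%	  net = mlp(2, 3, 1, 'linear');  % 2-3-1 network, linear outputs
%	  options = foptions;            % default 1x18 options vector
%	  options(1) = 1;                % display and log error values
%	  options(14) = 50;              % at most 50 passes through the data
%	  options(18) = 0.05;            % learning rate
%	  [net, options] = olgd(net, options, x, t);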
% Set up the options.
if length(options) < 18
  error('Options vector too short')
end

if (options(14))
  niters = options(14);
else
  niters = 100;
end

% Learning rate: must be positive
if (options(18) > 0)
  eta = options(18);
else
  eta = 0.01;
end
% Save initial learning rate for annealing
lr = eta;
% Momentum term: allow zero momentum
if (options(17) >= 0)
  mu = options(17);
else
  mu = 0.5;
end

pakstr = [net.type, 'pak'];
unpakstr = [net.type, 'unpak'];

% Extract initial weights from the network
w = feval(pakstr, net);

display = options(1);

% Work out if we need to compute f at each iteration: needed if
% displaying results, if the termination criterion requires it, or if
% the caller has asked for an error log.
fcneval = (display > 0 | options(3) | nargout >= 3);

% Check gradients
if (options(9))
  feval('gradchek', w, 'neterr', 'netgrad', net, x, t);
end

dwold = zeros(1, length(w));
% Both must be initialised so that the termination test is well defined
% even when the error function is not evaluated each iteration.
fold = 0;
fnew = 0;
ndata = size(x, 1);

if fcneval
  fnew = neterr(w, net, x, t);
  options(10) = options(10) + 1;
  fold = fnew;
end

j = 1;
if nargout >= 3
  errlog(j, :) = fnew;
  if nargout == 4
    pointlog(j, :) = w;
  end
end

% Main optimization loop.
while j <= niters
  wold = w;
  if options(5)
    % Randomise order of pattern presentation: with replacement
    pnum = ceil(rand(ndata, 1).*ndata);
  else
    % Present patterns in their original order
    pnum = 1:ndata;
  end
  for k = 1:ndata
    % Gradient of the error for a single pattern
    grad = netgrad(w, net, x(pnum(k),:), t(pnum(k),:));
    if options(6)
      % Let learning rate decrease as 1/t over all pattern presentations
      lr = eta/((j-1)*ndata + k);
    end
    % Momentum-smoothed on-line update
    dw = mu*dwold - lr*grad;
    w = w + dw;
    dwold = dw;
  end
  % Count one gradient evaluation per complete pass through the data
  options(11) = options(11) + 1;
  if fcneval
    fold = fnew;
    fnew = neterr(w, net, x, t);
    options(10) = options(10) + 1;
  end
  if display > 0
    fprintf(1, 'Iteration  %5d  Error %11.8f\n', j, fnew);
  end
  j = j + 1;
  if nargout >= 3
    errlog(j, :) = fnew;
    if nargout == 4
      pointlog(j, :) = w;
    end
  end
  if (max(abs(w - wold)) < options(2) & abs(fnew - fold) < options(3))
    % Termination criteria are met
    options(8) = fnew;
    net = feval(unpakstr, net, w);
    return;
  end
end

% Maximum number of iterations reached without satisfying both
% termination criteria.
if fcneval
  options(8) = fnew;
else
  % Return error on entire dataset
  options(8) = neterr(w, net, x, t);
  options(10) = options(10) + 1;
end
if (options(1) >= 0)
  disp(maxitmess);
end

net = feval(unpakstr, net, w);
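% Note (illustrative, assuming the call from the example above): when
% the optional outputs are requested, ERRLOG holds one error value per
% pass and POINTLOG one weight vector per pass, so the training curve
% can be inspected with, for example:
%
%	  [net, options, errlog, pointlog] = olgd(net, options, x, t);
%	  plot(errlog);  % error against number of passes through the data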