Netlab Reference Manual graddesc

graddesc


Purpose

Gradient descent optimization.

Description

[x, options, flog, pointlog] = graddesc(f, x, options, gradf) uses batch gradient descent to find a local minimum of the function f(x) whose gradient is given by gradf(x). A log of the function values after each cycle is (optionally) returned in flog, and a log of the points visited is (optionally) returned in pointlog.

Note that x is a row vector and f returns a scalar value. The point at which f has a local minimum is returned as x. The function value at that point is returned in options(8).
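
As a minimal sketch (not part of the original manual) of calling graddesc directly, assume two hypothetical user-supplied M-files quadf.m and quadgrad.m that implement f(x) = sum(x.^2) and its gradient 2*x:

% quadf.m    (hypothetical):  function e = quadf(x);    e = sum(x.^2);
% quadgrad.m (hypothetical):  function g = quadgrad(x); g = 2*x;

options = zeros(1, 18);      % take the default settings
options(1) = 1;              % display error values and keep the logs
options(14) = 200;           % allow up to 200 iterations

x = [2, -3];                 % starting point (a row vector)
[x, options, flog, pointlog] = graddesc('quadf', x, options, 'quadgrad');
options(8)                   % function value at the minimum found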

graddesc(f, x, options, gradf, p1, p2, ...) allows additional arguments to be passed to f() and gradf().
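
For instance, a hedged sketch with hypothetical functions myerr(x, data) and mygrad(x, data) that take a data matrix as an extra argument:

% The trailing argument data is passed through to both myerr and mygrad.
[x, options] = graddesc('myerr', x, options, 'mygrad', data);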

The optional parameters have the following interpretations.

options(1) is set to 1 to display error values; this also logs the error values in the return argument flog and the points visited in the return argument pointlog. If options(1) is set to 0, then only warning messages are displayed. If options(1) is -1, then nothing is displayed.

options(2) is the absolute precision required for the value of x at the solution. If the absolute difference between the values of x between two successive steps is less than options(2), then this condition is satisfied.

options(3) is a measure of the precision required of the objective function at the solution. If the absolute difference between the objective function values between two successive steps is less than options(3), then this condition is satisfied. Both this and the previous condition must be satisfied for termination.

options(7) determines the line minimisation method used. If it is set to 1 then a line minimiser is used (in the direction of the negative gradient). If it is 0 (the default), then each parameter update is a fixed multiple (the learning rate) of the negative gradient added to a fixed multiple (the momentum) of the previous parameter update; a sketch of both settings is given after the option descriptions below.

options(9) should be set to 1 to check the user defined gradient function gradf with gradchek. This is carried out at the initial parameter vector x.

options(10) returns the total number of function evaluations (including those in any line searches).

options(11) returns the total number of gradient evaluations.

options(14) is the maximum number of iterations; default 100.

options(15) is the precision in parameter space of the line search; default foptions(2).

options(17) is the momentum; default 0.5. It should be scaled by the inverse of the number of data points.

options(18) is the learning rate; default 0.01. It should be scaled by the inverse of the number of data points.
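
The following sketch (not taken from the manual; the error function myerr, its gradient mygrad, the starting point and ndata are all assumed) illustrates the option settings above, including the two update strategies selected by options(7):

ndata = 100;                 % assumed number of data points
x = randn(1, 5);             % assumed starting point (a row vector)

options = zeros(1, 18);
options(1) = 1;              % display error values
options(2) = 1e-4;           % precision required of x at the solution
options(3) = 1e-4;           % precision required of the objective function
options(9) = 1;              % check the gradient with gradchek before starting
options(14) = 500;           % maximum number of iterations

% Default behaviour (options(7) = 0): each update is the negative gradient
% times the learning rate plus the previous update times the momentum.
options(17) = 0.5;           % momentum
options(18) = 0.01/ndata;    % learning rate, scaled by 1/ndata as advised above

% Alternative: set options(7) = 1 to choose the step length along the
% negative gradient with a line minimiser of precision options(15).
% options(7) = 1;
% options(15) = 1e-4;

[x, options] = graddesc('myerr', x, options, 'mygrad');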

Examples

An example of how this function can be used to train a neural network is:
options = zeros(1, 18);
options(18) = 0.1/size(x, 1);
net = netopt(net, options, x, t, 'graddesc');
wolffd@0: wolffd@0: Note how the learning rate is scaled by the number of data points. wolffd@0: wolffd@0:

See Also

conjgrad, linemin, olgd, minbrack, quasinew, scg

Copyright (c) Ian T Nabney (1996-9)