[x, options, flog, pointlog] = graddesc(f, x, options, gradf) uses
batch gradient descent to find a local minimum of the function f(x)
whose gradient is given by gradf(x). A log of the function values
after each cycle is (optionally) returned in flog, and a log of the
points visited is (optionally) returned in pointlog.
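
For example, a minimal call might look like the sketch below, which
assumes the Rosenbrock demo functions rosen and rosegrad supplied with
Netlab are on the path; the option settings shown are illustrative only.

    % Sketch: minimise the Rosenbrock function by batch gradient descent
    options = zeros(1, 18);   % take the default settings
    options(1) = 1;           % print the error value after each cycle
    options(14) = 100;        % at most 100 iterations
    xinit = [-1 1];           % starting point (a row vector)
    [xmin, options, flog, pointlog] = graddesc('rosen', xinit, options, 'rosegrad');
    fmin = options(8);        % function value at the minimum found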

Note that x is a row vector and f returns a scalar value. The point at
which f has a local minimum is returned as x. The function value at
that point is returned in options(8).

graddesc(f, x, options, gradf, p1, p2, ...) allows additional arguments
to be passed to f() and gradf().
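
For instance (a sketch; myerr and mygrad are hypothetical user-written
functions, not part of Netlab):

    % myerr and mygrad are assumed to have the forms
    %   e = myerr(x, data, targets)   and   g = mygrad(x, data, targets)
    % The trailing arguments data and targets are forwarded to both of them.
    [x, options] = graddesc('myerr', x, options, 'mygrad', data, targets);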

The optional parameters have the following interpretations.

options(1) is set to 1 to display error values; this also logs the
error values in the return argument flog, and the points visited in
the return argument pointlog. If options(1) is set to 0, then only
warning messages are displayed. If options(1) is -1, then nothing is
displayed.
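
As a sketch (reusing the Rosenbrock example above), full logging can be
switched on and the error history inspected afterwards:

    options = zeros(1, 18);
    options(1) = 1;           % display and log error values
    [xmin, options, flog, pointlog] = graddesc('rosen', [-1 1], options, 'rosegrad');
    plot(flog);               % error value after each cycle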

options(2) is the absolute precision required for the value of x at
the solution. If the absolute difference between the values of x
between two successive steps is less than options(2), then this
condition is satisfied.

options(3) is a measure of the precision required of the objective
function at the solution. If the absolute difference between the
objective function values between two successive steps is less than
options(3), then this condition is satisfied. Both this and the
previous condition must be satisfied for termination.
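
A sketch of setting both tolerances, together with the iteration limit
of options(14) described below:

    options = zeros(1, 18);
    options(2) = 1e-6;        % successive x values must differ by less than 1e-6
    options(3) = 1e-8;        % ... and successive function values by less than 1e-8
    options(14) = 500;        % but stop after at most 500 iterations in any case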

options(7) determines the line minimisation method used. If it is set
to 1 then a line minimiser is used (in the direction of the negative
gradient). If it is 0 (the default), then each parameter update is a
fixed multiple (the learning rate) of the negative gradient added to a
fixed multiple (the momentum) of the previous parameter update.
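
In the default case (options(7) = 0) the update applied at each cycle
is, in effect, the following; this is a paraphrase of the rule just
described, not the toolbox source:

    eta = options(18);                % learning rate
    mu  = options(17);                % momentum
    dxold = zeros(size(x));           % previous parameter update
    for j = 1:options(14)
      g = feval(gradf, x);            % gradient at the current point
      dx = mu*dxold - eta*g;          % momentum term plus scaled negative gradient
      x = x + dx;
      dxold = dx;
    end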

options(9) should be set to 1 to check the user defined gradient
function gradf with gradchek. This is carried out at the initial
parameter vector x.
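
For example (again assuming the rosen and rosegrad demo functions),
the check can be requested through the options vector, or gradchek can
be called directly; the direct call is a sketch of its usual argument
order:

    options = zeros(1, 18);
    options(9) = 1;                        % run gradchek at the initial x
    xmin = graddesc('rosen', [-1 1], options, 'rosegrad');

    gradchek([-1 1], 'rosen', 'rosegrad'); % the same check, run on its own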

options(10) returns the total number of function evaluations
(including those in any line searches).

options(11) returns the total number of gradient evaluations.
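
After a run, these counters can be read back from the returned options
vector, for example:

    [xmin, options] = graddesc('rosen', [-1 1], options, 'rosegrad');
    nfuncs = options(10);     % function evaluations, including line searches
    ngrads = options(11);     % gradient evaluations
    fmin   = options(8);      % function value at the returned point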

options(14) is the maximum number of iterations; default 100.

options(15) is the precision in parameter space of the line search;
default foptions(2).
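
A sketch of switching on the line-minimisation variant and tightening
its precision:

    options = zeros(1, 18);
    options(7) = 1;           % line minimisation along the negative gradient
    options(15) = 1e-5;       % precision of that line search in parameter space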

options(17) is the momentum; default 0.5. It should be scaled by the
inverse of the number of data points.

options(18) is the learning rate; default 0.01. It should be scaled by
the inverse of the number of data points.

An example of how this function can be used to train a neural network is:

    options = zeros(1, 18);
    options(18) = 0.1/size(x, 1);
    net = netopt(net, options, x, t, 'graddesc');

Note how the learning rate is scaled by the number of data points.

See also: conjgrad, linemin, olgd, minbrack, quasinew, scg

Copyright (c) Ian T Nabney (1996-9)