[x, options, flog, pointlog] = graddesc(f, x, options, gradf) uses
batch gradient descent to find a local minimum of the function f(x)
whose gradient is given by gradf(x). A log of the function values
after each cycle is (optionally) returned in flog, and a log of the
points visited is (optionally) returned in pointlog.

Note that x is a row vector and f returns a scalar value. The point at
which f has a local minimum is returned as x. The function value at
that point is returned in options(8).

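As an illustration of the calling convention, the sketch below minimises a
simple quadratic. It assumes that Netlab is on the MATLAB path and that
function handles are accepted for f and gradf; the objective, its gradient,
the target c and all option values are made up for the example.

```matlab
% Hypothetical objective: squared distance from a target point c.
c = [1 2];                          % illustrative target (row vector)
f = @(x) sum((x - c).^2);           % returns a scalar
gradf = @(x) 2*(x - c);             % gradient, same shape as x

options = zeros(1, 18);
options(1) = -1;                    % display nothing
options(2) = 1e-6;                  % precision required in x
options(3) = 1e-6;                  % precision required in f
options(14) = 200;                  % maximum number of iterations
options(17) = 0.5;                  % momentum
options(18) = 0.1;                  % learning rate

x0 = [0 0];                         % starting point (row vector)
[x, options, flog, pointlog] = graddesc(f, x0, options, gradf);

fmin = options(8);                  % function value at the returned x
```

For this objective x should end up close to c, with fmin close to zero.
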
graddesc(f, x, options, gradf, p1, p2, ...) allows additional
arguments to be passed to f() and gradf().

The optional parameters have the following interpretations.

options(1) is set to 1 to display error values; it also logs error
values in the return argument flog, and the points visited in the
return argument pointlog. If options(1) is set to 0, then only warning
messages are displayed. If options(1) is -1, then nothing is displayed.

options(2) is the absolute precision required for the value of x at
the solution. If the absolute difference between the values of x at
two successive steps is less than options(2), then this condition is
satisfied.

options(3) is a measure of the precision required of the objective
function at the solution. If the absolute difference between the
objective function values at two successive steps is less than
options(3), then this condition is satisfied. Both this and the
previous condition must be satisfied for termination.

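The joint stopping rule described by options(2) and options(3) can be
sketched as follows. This is an illustration, not the routine's source: the
variable names and step values are hypothetical, and taking the largest
componentwise change in x is an assumption about how the difference between
successive values of x is measured.

```matlab
options = zeros(1, 18);
options(2) = 1e-4;     % required precision in x
options(3) = 1e-4;     % required precision in the objective

xold = [1.00000 2.00000];  fold = 0.500000;   % previous step (made-up values)
xnew = [1.00002 1.99999];  fnew = 0.499990;   % current step (made-up values)

x_ok = max(abs(xnew - xold)) < options(2);    % parameter change small enough
f_ok = abs(fnew - fold) < options(3);         % objective change small enough

% Both conditions must hold before the iteration stops.
converged = x_ok && f_ok;
```
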
options(7) determines the line minimisation method used. If it is set
to 1 then a line minimiser is used (in the direction of the negative
gradient). If it is 0 (the default), then each parameter update is a
fixed multiple (the learning rate) of the negative gradient added to a
fixed multiple (the momentum) of the previous parameter update.

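With options(7) set to 0, the update described above amounts to
dx_new = mu*dx_old - eta*grad. The sketch below applies that rule to a
made-up quadratic; it is a standalone illustration with hypothetical names
(eta, mu, dx, c), not the code of graddesc itself.

```matlab
c = [1 2];                      % illustrative minimiser
gradf = @(x) 2*(x - c);         % gradient of sum((x - c).^2)

eta = 0.1;                      % learning rate, cf. options(18)
mu  = 0.5;                      % momentum, cf. options(17)

x  = [0 0];                     % starting point (row vector)
dx = zeros(size(x));            % previous parameter update

for k = 1:100
    dx = mu*dx - eta*gradf(x);  % momentum term plus scaled negative gradient
    x  = x + dx;                % take the step
end
```

Setting mu to 0 in this sketch gives plain gradient descent with a fixed
step size.
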
options(9) should be set to 1 to check the user-defined gradient
function gradf with gradchek. This is carried out at the initial
parameter vector x.

options(10) returns the total number of function evaluations
(including those in any line searches).

options(11) returns the total number of gradient evaluations.

options(14) is the maximum number of iterations; default 100.

options(15) is the precision in parameter space of the line search;
default foptions(2).

options(17) is the momentum; default 0.5. It should be scaled by the
inverse of the number of data points.

options(18) is the learning rate; default 0.01. It should be scaled by
the inverse of the number of data points.

  options = zeros(1, 18);
  options(18) = 0.1/size(x, 1);
  net = netopt(net, options, x, t, 'graddesc');

Note how the learning rate is scaled by the inverse of the number of
data points.

See also: conjgrad, linemin, olgd, minbrack, quasinew, scg

Copyright (c) Ian T Nabney (1996-9)