<html>
<head>
<title>
Netlab Reference Manual graddesc
</title>
</head>
<body>
<H1> graddesc
</H1>
<h2>
Purpose
</h2>
Gradient descent optimization.

<p><h2>
Description
</h2>
<CODE>[x, options, flog, pointlog] = graddesc(f, x, options, gradf)</CODE> uses
batch gradient descent to find a local minimum of the function
<CODE>f(x)</CODE> whose gradient is given by <CODE>gradf(x)</CODE>. A log of the function values
after each cycle is (optionally) returned in <CODE>flog</CODE>, and a log
of the points visited is (optionally) returned in <CODE>pointlog</CODE>.

<p>Note that <CODE>x</CODE> is a row vector
and <CODE>f</CODE> returns a scalar value.
The point at which <CODE>f</CODE> has a local minimum
is returned as <CODE>x</CODE>. The function value at that point is returned
in <CODE>options(8)</CODE>.

<p><CODE>graddesc(f, x, options, gradf, p1, p2, ...)</CODE> allows
additional arguments to be passed to <CODE>f()</CODE> and <CODE>gradf()</CODE>.
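<p>For instance, a direct call might look like the following. This is only a
minimal sketch, not taken from the toolbox: it assumes a user-supplied
objective <CODE>quadf.m</CODE> and gradient <CODE>quadgrad.m</CODE> (hypothetical names), each
taking a row vector and one extra argument <CODE>A</CODE> that is passed through by
<CODE>graddesc</CODE>.
<PRE>

% quadf.m:    function y = quadf(x, A)     y = x*A*x';
% quadgrad.m: function g = quadgrad(x, A)  g = 2*x*A;
A = [2 0; 0 1];          % extra argument forwarded to quadf and quadgrad
x = [1 1];               % starting point (row vector)
options = zeros(1, 18);
options(1) = 1;          % display error values
options(14) = 50;        % maximum number of iterations
[x, options] = graddesc('quadf', x, options, 'quadgrad', A);
fmin = options(8);       % function value at the minimum found
</PRE>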

<p>The optional parameters have the following interpretations.

<p><CODE>options(1)</CODE> is set to 1 to display error values; also logs error
values in the return argument <CODE>flog</CODE>, and the points visited
in the return argument <CODE>pointlog</CODE>. If <CODE>options(1)</CODE> is set to 0,
then only warning messages are displayed. If <CODE>options(1)</CODE> is -1,
then nothing is displayed.

<p><CODE>options(2)</CODE> is the absolute precision required for the value
of <CODE>x</CODE> at the solution. If the absolute difference between
the values of <CODE>x</CODE> on two successive steps is less than
<CODE>options(2)</CODE>, then this condition is satisfied.

<p><CODE>options(3)</CODE> is a measure of the precision required of the objective
function at the solution. If the absolute difference between the
objective function values on two successive steps is less than
<CODE>options(3)</CODE>, then this condition is satisfied.
Both this and the previous condition must be
satisfied for termination.
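<p>As an illustration (not taken from the toolbox), the two tolerances and the
iteration limit might be set as follows before calling <CODE>graddesc</CODE>:
<PRE>

options = zeros(1, 18);
options(2) = 1e-4;     % tolerance on the change in x between successive steps
options(3) = 1e-4;     % tolerance on the change in the objective value
options(14) = 200;     % maximum number of iterations
</PRE>
The run then terminates when both tolerances are satisfied, or when the
iteration limit is reached.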

<p><CODE>options(7)</CODE> determines the line minimisation method used. If it
is set to 1 then a line minimiser is used (in the direction of the negative
gradient). If it is 0 (the default), then each parameter update
is a fixed multiple (the learning rate)
of the negative gradient added to a fixed multiple (the momentum) of
the previous parameter update.
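<p>With <CODE>options(7)</CODE> set to 0, each cycle therefore applies (in outline) the
update below, where <CODE>gradf</CODE> holds the name of the gradient function. This is
a sketch of the rule just described, not the toolbox source:
<PRE>

lr = options(18);                % learning rate
mom = options(17);               % momentum
dxprev = zeros(size(x));         % previous parameter update
for n = 1:options(14)
  grad = feval(gradf, x);        % gradient at the current point
  dx = mom*dxprev - lr*grad;     % momentum term plus scaled negative gradient
  x = x + dx;                    % take the step
  dxprev = dx;
end
</PRE>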

<p><CODE>options(9)</CODE> should be set to 1 to check the user defined gradient
function <CODE>gradf</CODE> with <CODE>gradchek</CODE>. This is carried out at
the initial parameter vector <CODE>x</CODE>.

<p><CODE>options(10)</CODE> returns the total number of function evaluations (including
those in any line searches).

<p><CODE>options(11)</CODE> returns the total number of gradient evaluations.

<p><CODE>options(14)</CODE> is the maximum number of iterations; default 100.

<p><CODE>options(15)</CODE> is the precision in parameter space of the line search;
default <CODE>foptions(2)</CODE>.

<p><CODE>options(17)</CODE> is the momentum; default 0.5. It should be scaled by the
inverse of the number of data points.

<p><CODE>options(18)</CODE> is the learning rate; default 0.01. It should be
scaled by the inverse of the number of data points.

<p><h2>
Examples
</h2>
An example of how this function can be used to train a neural network is:
<PRE>

options = zeros(1, 18);
options(18) = 0.1/size(x, 1);
net = netopt(net, options, x, t, 'graddesc');
</PRE>

Note how the learning rate is scaled by the inverse of the number of data points.
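<p>If a non-default momentum is also wanted, it can be set and scaled in the same
way; the following is an illustrative extension of the example above, not taken
from the toolbox:
<PRE>

options = zeros(1, 18);
options(17) = 0.5/size(x, 1);    % momentum, scaled by the inverse of the number of data points
options(18) = 0.1/size(x, 1);    % learning rate, scaled in the same way
net = netopt(net, options, x, t, 'graddesc');
</PRE>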

<p><h2>
See Also
</h2>
<CODE><a href="conjgrad.htm">conjgrad</a></CODE>, <CODE><a href="linemin.htm">linemin</a></CODE>, <CODE><a href="olgd.htm">olgd</a></CODE>, <CODE><a href="minbrack.htm">minbrack</a></CODE>, <CODE><a href="quasinew.htm">quasinew</a></CODE>, <CODE><a href="scg.htm">scg</a></CODE><hr>
<b>Pages:</b>
<a href="index.htm">Index</a>
<hr>
<p>Copyright (c) Ian T Nabney (1996-9)


</body>
</html>