<html>
<head>
<title>
Netlab Reference Manual graddesc
</title>
</head>
<body>
<H1> graddesc
</H1>
<h2>
Purpose
</h2>
Gradient descent optimization.

<p><h2>
Description
</h2>
<CODE>[x, options, flog, pointlog] = graddesc(f, x, options, gradf)</CODE> uses
batch gradient descent to find a local minimum of the function
<CODE>f(x)</CODE> whose gradient is given by <CODE>gradf(x)</CODE>. A log of the function values
after each cycle is (optionally) returned in <CODE>flog</CODE>, and a log
of the points visited is (optionally) returned in <CODE>pointlog</CODE>.

<p>Note that <CODE>x</CODE> is a row vector
and <CODE>f</CODE> returns a scalar value.
The point at which <CODE>f</CODE> has a local minimum
is returned as <CODE>x</CODE>. The function value at that point is returned
in <CODE>options(8)</CODE>.

<p><CODE>graddesc(f, x, options, gradf, p1, p2, ...)</CODE> allows
additional arguments to be passed to <CODE>f()</CODE> and <CODE>gradf()</CODE>.
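
<p>As an illustrative sketch of a direct call (not taken from the Netlab
manual), the following assumes two hypothetical helper files
<CODE>quadf.m</CODE> and <CODE>quadgrad.m</CODE> implementing a simple
quadratic and its gradient:
<PRE>

% Hypothetical helpers (not part of Netlab):
%   quadf.m:    function y = quadf(x);    y = sum(x.^2);
%   quadgrad.m: function g = quadgrad(x); g = 2*x;

options = zeros(1, 18);   % start from an all-zero options vector
options(1) = 1;           % display error values; fill flog and pointlog
options(14) = 50;         % at most 50 iterations

[x, options, flog, pointlog] = graddesc('quadf', [2 3], options, 'quadgrad');

fmin = options(8);        % function value at the returned point
nf   = options(10);       % total function evaluations
ng   = options(11);       % total gradient evaluations
</PRE>
Any extra arguments <CODE>p1, p2, ...</CODE> appended to the call would be
forwarded unchanged to both <CODE>quadf</CODE> and <CODE>quadgrad</CODE>.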

<p>The optional parameters have the following interpretations.

<p><CODE>options(1)</CODE> is set to 1 to display error values; this also logs the error
values in the return argument <CODE>flog</CODE>, and the points visited
in the return argument <CODE>pointlog</CODE>. If <CODE>options(1)</CODE> is set to 0,
then only warning messages are displayed. If <CODE>options(1)</CODE> is -1,
then nothing is displayed.

<p><CODE>options(2)</CODE> is the absolute precision required for the value
of <CODE>x</CODE> at the solution. If the absolute difference between
the values of <CODE>x</CODE> between two successive steps is less than
<CODE>options(2)</CODE>, then this condition is satisfied.

<p><CODE>options(3)</CODE> is a measure of the precision required of the objective
function at the solution. If the absolute difference between the
objective function values between two successive steps is less than
<CODE>options(3)</CODE>, then this condition is satisfied.
Both this and the previous condition must be
satisfied for termination.
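
<p>Taken together, the two tests above amount to a convergence check of
roughly the following form (a sketch of the logic just described, not code
quoted from <CODE>graddesc</CODE> itself):
<PRE>

% Sketch of the combined termination test: both conditions must hold
xconv = max(abs(xnew - xold)) < options(2);   % precision in x reached
fconv = abs(fnew - fold) < options(3);        % precision in f reached
if all([xconv, fconv])
    % terminate before the iteration limit
end
</PRE>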

<p><CODE>options(7)</CODE> determines the line minimisation method used. If it
is set to 1 then a line minimiser is used (in the direction of the negative
gradient). If it is 0 (the default), then each parameter update
is a fixed multiple (the learning rate)
of the negative gradient added to a fixed multiple (the momentum) of
the previous parameter update.
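
<p>In the default case (<CODE>options(7) = 0</CODE>), the update just
described can be sketched as follows, with <CODE>dxold</CODE> standing for
the previous parameter update:
<PRE>

% Sketch of the fixed-step update with momentum (options(7) = 0):
%   learning rate = options(18), momentum = options(17)
dx = options(17)*dxold - options(18)*feval(gradf, x);
x  = x + dx;
dxold = dx;
</PRE>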

<p><CODE>options(9)</CODE> should be set to 1 to check the user defined gradient
function <CODE>gradf</CODE> with <CODE>gradchek</CODE>. This is carried out at
the initial parameter vector <CODE>x</CODE>.

<p><CODE>options(10)</CODE> returns the total number of function evaluations (including
those in any line searches).

<p><CODE>options(11)</CODE> returns the total number of gradient evaluations.

<p><CODE>options(14)</CODE> is the maximum number of iterations; default 100.

<p><CODE>options(15)</CODE> is the precision in parameter space of the line search;
default <CODE>foptions(2)</CODE>.

<p><CODE>options(17)</CODE> is the momentum; default 0.5. It should be scaled by the
inverse of the number of data points.

<p><CODE>options(18)</CODE> is the learning rate; default 0.01. It should be
scaled by the inverse of the number of data points.
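
<p>For instance, to apply the recommended scaling to both parameters for a
data matrix <CODE>x</CODE> with one row per data point (a sketch, not from
the manual):
<PRE>

ndata = size(x, 1);          % number of data points
options(17) = 0.5/ndata;     % momentum, scaled from its default
options(18) = 0.01/ndata;    % learning rate, scaled from its default
</PRE>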

<p><h2>
Examples
</h2>
An example of how this function can be used to train a neural network is:
<PRE>

options = zeros(1, 18);
options(18) = 0.1/size(x, 1);    % learning rate scaled by the number of data points
net = netopt(net, options, x, t, 'graddesc');
</PRE>

Note how the learning rate is scaled by the number of data points.

<p><h2>
See Also
</h2>
<CODE><a href="conjgrad.htm">conjgrad</a></CODE>, <CODE><a href="linemin.htm">linemin</a></CODE>, <CODE><a href="olgd.htm">olgd</a></CODE>, <CODE><a href="minbrack.htm">minbrack</a></CODE>, <CODE><a href="quasinew.htm">quasinew</a></CODE>, <CODE><a href="scg.htm">scg</a></CODE><hr>
<b>Pages:</b>
<a href="index.htm">Index</a>
<hr>
<p>Copyright (c) Ian T Nabney (1996-9)


</body>
</html>