comparison toolboxes/FullBNT-1.0.7/nethelp3.3/graddesc.htm @ 0:e9a9cd732c1e tip
first hg version after svn
| author | wolffd |
|---|---|
| date | Tue, 10 Feb 2015 15:05:51 +0000 |
| parents | |
| children | |
| old revision | new revision |
|---|---|
| -1:000000000000 | 0:e9a9cd732c1e |
<html>
<head>
<title>
Netlab Reference Manual graddesc
</title>
</head>
<body>
<H1> graddesc
</H1>
<h2>
Purpose
</h2>
Gradient descent optimization.

<p><h2>
Description
</h2>
<CODE>[x, options, flog, pointlog] = graddesc(f, x, options, gradf)</CODE> uses
batch gradient descent to find a local minimum of the function
<CODE>f(x)</CODE> whose gradient is given by <CODE>gradf(x)</CODE>. A log of the function values
after each cycle is (optionally) returned in <CODE>flog</CODE>, and a log
of the points visited is (optionally) returned in <CODE>pointlog</CODE>.

<p>Note that <CODE>x</CODE> is a row vector
and <CODE>f</CODE> returns a scalar value.
The point at which <CODE>f</CODE> has a local minimum
is returned as <CODE>x</CODE>. The function value at that point is returned
in <CODE>options(8)</CODE>.

<p><CODE>graddesc(f, x, options, gradf, p1, p2, ...)</CODE> allows
additional arguments to be passed to <CODE>f()</CODE> and <CODE>gradf()</CODE>.

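<p>For illustration only (this example is not part of the original manual page),
a direct call might look like the following sketch, where <CODE>quadf</CODE> and
<CODE>quadgrad</CODE> are hypothetical M-files on the path returning a scalar
error and its gradient for a row vector <CODE>x</CODE>:
<PRE>

% Hypothetical objective and gradient, e.g. saved as quadf.m and quadgrad.m:
%   function e = quadf(x)        % e = sum(x.^2);
%   function g = quadgrad(x)     % g = 2*x;

options = zeros(1, 18);          % unset entries take their default values
options(1) = 1;                  % display the error value after each cycle
options(14) = 50;                % allow at most 50 iterations

[x, options, flog, pointlog] = graddesc('quadf', [1 2], options, 'quadgrad');
fmin = options(8);               % function value at the minimum found
</PRE>
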
<p>The optional parameters have the following interpretations.

<p><CODE>options(1)</CODE> is set to 1 to display error values; it also logs error
values in the return argument <CODE>flog</CODE>, and the points visited
in the return argument <CODE>pointlog</CODE>. If <CODE>options(1)</CODE> is set to 0,
then only warning messages are displayed. If <CODE>options(1)</CODE> is -1,
then nothing is displayed.

<p><CODE>options(2)</CODE> is the absolute precision required for the value
of <CODE>x</CODE> at the solution. If the absolute difference between
the values of <CODE>x</CODE> between two successive steps is less than
<CODE>options(2)</CODE>, then this condition is satisfied.

<p><CODE>options(3)</CODE> is a measure of the precision required of the objective
function at the solution. If the absolute difference between the
objective function values between two successive steps is less than
<CODE>options(3)</CODE>, then this condition is satisfied.
Both this and the previous condition must be
satisfied for termination.

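<p>For example (an illustrative setting, not from the original manual page), the
two tolerances might be set together before calling the optimiser:
<PRE>

options = zeros(1, 18);
options(2) = 1e-4;   % terminate only once x moves by less than 1e-4 per step
options(3) = 1e-4;   % ... and the objective also changes by less than 1e-4
</PRE>
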
<p><CODE>options(7)</CODE> determines the line minimisation method used. If it
is set to 1 then a line minimiser is used (in the direction of the negative
gradient). If it is 0 (the default), then each parameter update
is a fixed multiple (the learning rate)
of the negative gradient added to a fixed multiple (the momentum) of
the previous parameter update.

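<p>As a rough sketch of that default behaviour (this is not the actual
<CODE>graddesc</CODE> source, and it reuses the hypothetical <CODE>quadgrad</CODE>
from the earlier example), the fixed-step update with momentum looks like:
<PRE>

gradf = 'quadgrad';            % hypothetical gradient function from above
x     = [1 2];                 % current parameter (row) vector
lr    = 0.01;                  % learning rate, options(18)
mom   = 0.5;                   % momentum, options(17)
dxold = zeros(size(x));        % previous parameter update

for n = 1:100
  g = feval(gradf, x);         % gradient of f at the current point
  dx = mom*dxold - lr*g;       % scaled negative gradient plus momentum term
  x = x + dx;                  % take the step
  dxold = dx;                  % remember this update for the next cycle
end
</PRE>
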
<p><CODE>options(9)</CODE> should be set to 1 to check the user defined gradient
function <CODE>gradf</CODE> with <CODE>gradchek</CODE>. This is carried out at
the initial parameter vector <CODE>x</CODE>.

<p><CODE>options(10)</CODE> returns the total number of function evaluations (including
those in any line searches).

<p><CODE>options(11)</CODE> returns the total number of gradient evaluations.

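<p>For instance (an illustrative sketch, not part of the original manual page,
again reusing the hypothetical <CODE>quadf</CODE> and <CODE>quadgrad</CODE>), the
gradient check and the evaluation counters can be used as follows:
<PRE>

options = zeros(1, 18);
options(9) = 1;                % check 'quadgrad' with gradchek at the initial x

[x, options] = graddesc('quadf', [1 2], options, 'quadgrad');
nfevals = options(10);         % total number of function evaluations used
ngevals = options(11);         % total number of gradient evaluations used
</PRE>
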
<p><CODE>options(14)</CODE> is the maximum number of iterations; default 100.

<p><CODE>options(15)</CODE> is the precision in parameter space of the line search;
default <CODE>foptions(2)</CODE>.

<p><CODE>options(17)</CODE> is the momentum; default 0.5. It should be scaled by the
inverse of the number of data points.

<p><CODE>options(18)</CODE> is the learning rate; default 0.01. It should be
scaled by the inverse of the number of data points.

<p><h2>
Examples
</h2>
An example of how this function can be used to train a neural network is:
<PRE>

options = zeros(1, 18);
options(17) = 0.1/size(x, 1);
net = netopt(net, options, x, t, 'graddesc');
</PRE>

Note how the momentum, <CODE>options(17)</CODE>, is scaled by the inverse of the number of data points.

<p><h2>
See Also
</h2>
<CODE><a href="conjgrad.htm">conjgrad</a></CODE>, <CODE><a href="linemin.htm">linemin</a></CODE>, <CODE><a href="olgd.htm">olgd</a></CODE>, <CODE><a href="minbrack.htm">minbrack</a></CODE>, <CODE><a href="quasinew.htm">quasinew</a></CODE>, <CODE><a href="scg.htm">scg</a></CODE><hr>
<b>Pages:</b>
<a href="index.htm">Index</a>
<hr>
<p>Copyright (c) Ian T Nabney (1996-9)


</body>
</html>
