%DEMEV1 Demonstrate Bayesian regression for the MLP.
%
% Description
% The problem consists of an input variable X, which is sampled from a
% Gaussian distribution, and a target variable T generated by computing
% SIN(2*PI*X) and adding Gaussian noise. A 2-layer network with linear
% outputs is trained by minimizing a sum-of-squares error function with
% an isotropic Gaussian regularizer, using the scaled conjugate gradient
% optimizer. The hyperparameters ALPHA and BETA are re-estimated using
% the function EVIDENCE. A graph is plotted of the original function,
% the training data, the trained network function, and the error bars.
%
% See also
% EVIDENCE, MLP, SCG, DEMARD, DEMMLP1
%

% Copyright (c) Ian T Nabney (1996-2001)

clc;
disp('This demonstration illustrates the application of Bayesian')
disp('re-estimation to determine the hyperparameters in a simple regression')
disp('problem. It is based on a local quadratic approximation to a mode of')
disp('the posterior distribution and the evidence maximization framework of')
disp('MacKay.')
disp(' ')
disp('First, we generate a synthetic data set consisting of a single input')
disp('variable x sampled from a Gaussian distribution, and a target variable')
disp('t obtained by evaluating sin(2*pi*x) and adding Gaussian noise.')
disp(' ')
disp('Press any key to see a plot of the data together with the sine function.')
pause;

% Generate the matrix of inputs x and targets t.

ndata = 16;         % Number of data points.
noise = 0.1;        % Standard deviation of noise distribution.
randn('state', 0);
x = 0.25 + 0.07*randn(ndata, 1);
t = sin(2*pi*x) + noise*randn(size(x));

% Plot the data and the original sine function.
h = figure;
nplot = 200;
plotvals = linspace(0, 1, nplot)';
plot(x, t, 'ok')
xlabel('Input')
ylabel('Target')
hold on
axis([0 1 -1.5 1.5])
fplot('sin(2*pi*x)', [0 1], '-g')
legend('data', 'function');

disp(' ')
disp('Press any key to continue')
pause; clc;

disp('Next we create a two-layer MLP network having 3 hidden units and one')
disp('linear output. The model assumes Gaussian target noise governed by an')
disp('inverse variance hyperparameter beta, and uses a simple Gaussian prior')
disp('distribution governed by an inverse variance hyperparameter alpha.')
disp(' ');
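% In outline, the probabilistic model just described is the standard
% evidence-framework setup (a sketch of the equations, not of the code
% paths inside MLP and EVIDENCE):
%   prior:       p(w)      = N(w; 0, (1/alpha)*I)
%   likelihood:  p(t|x, w) = N(t; y(x, w), 1/beta)
% so minimizing the regularized error
%   S(w) = beta*Ed + alpha*Ew,  Ed = 0.5*sum((y - t).^2),  Ew = 0.5*(w'*w)
% finds a mode of the posterior over weights. EVIDENCE then re-estimates
% the hyperparameters using updates of the form
%   gamma = sum_i lambda_i/(lambda_i + alpha)   % lambda_i: eigenvalues of the
%                                               % data Hessian beta*grad2(Ed)
%   alpha <- gamma/(2*Ew),   beta <- (ndata - gamma)/(2*Ed)
% where gamma measures the effective number of well-determined parameters.
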
disp('The network weights and the hyperparameters are initialised and then')
disp('the weights are optimized with the scaled conjugate gradient')
disp('algorithm using the SCG function, with the hyperparameters kept')
disp('fixed. After a maximum of 500 iterations, the hyperparameters are')
disp('re-estimated using the EVIDENCE function. The process of optimizing')
disp('the weights with fixed hyperparameters and then re-estimating the')
disp('hyperparameters is repeated for a total of 3 cycles.')
disp(' ')
disp('Press any key to train the network and determine the hyperparameters.')
pause;

% Set up network parameters.
nin = 1;            % Number of inputs.
nhidden = 3;        % Number of hidden units.
nout = 1;           % Number of outputs.
alpha = 0.01;       % Initial prior hyperparameter.
beta_init = 50.0;   % Initial noise hyperparameter.

% Create and initialize network weight vector.
net = mlp(nin, nhidden, nout, 'linear', alpha, beta_init);

% Set up vector of options for the optimiser.
nouter = 3;             % Number of outer loops.
ninner = 1;             % Number of inner loops.
options = zeros(1,18);  % Default options vector.
options(1) = 1;         % This provides display of error values.
options(2) = 1.0e-7;    % Absolute precision for weights.
options(3) = 1.0e-7;    % Precision for objective function.
options(14) = 500;      % Number of training cycles in inner loop.

% Train using scaled conjugate gradients, re-estimating alpha and beta.
for k = 1:nouter
  net = netopt(net, options, x, t, 'scg');
  [net, gamma] = evidence(net, x, t, ninner);
  fprintf(1, '\nRe-estimation cycle %d:\n', k);
  fprintf(1, '  alpha =  %8.5f\n', net.alpha);
  fprintf(1, '  beta  =  %8.5f\n', net.beta);
  fprintf(1, '  gamma =  %8.5f\n\n', gamma);
  disp(' ')
  disp('Press any key to continue.')
  pause;
end

fprintf(1, 'true beta: %f\n', 1/(noise*noise));

disp(' ')
disp('Network training and hyperparameter re-estimation are now complete.')
disp('Compare the final value for the hyperparameter beta with the true')
disp('value.')
disp(' ')
disp('Notice that the final error value is close to the number of data')
disp(['points (', num2str(ndata),') divided by two.'])
disp(' ')
disp('Press any key to continue.')
pause; clc;
disp('We can now plot the function represented by the trained network. This')
disp('corresponds to the mean of the predictive distribution. We can also')
disp('plot ''error bars'' representing one standard deviation of the')
disp('predictive distribution around the mean.')
disp(' ')
disp('Press any key to add the network function and error bars to the plot.')
pause;

% Evaluate error bars.
[y, sig2] = netevfwd(mlppak(net), net, x, t, plotvals);
sig = sqrt(sig2);
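% In outline, the variance returned here for a test input is
%   sig2 = 1/beta + g'*inv(A)*g
% where g = dy/dw is the gradient of the network output with respect to
% the weights at the trained solution, and A is the Hessian of the
% regularized error S(w). The 1/beta term is the intrinsic target noise;
% the g'*inv(A)*g term is the extra uncertainty from the weight posterior.
% (A sketch of the quantity being computed, not of NETEVFWD's internals.)
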
% Plot the data, the original function, and the trained network function.
[y, z] = mlpfwd(net, plotvals);
figure(h); hold on;
plot(plotvals, y, '-r')
xlabel('Input')
ylabel('Target')
plot(plotvals, y + sig, '-b');
plot(plotvals, y - sig, '-b');
legend('data', 'function', 'network', 'error bars');

disp(' ')
disp('Notice how the confidence interval spanned by the ''error bars'' is')
disp('smaller in the region of input space where the data density is high,')
disp('and becomes larger in regions away from the data.')
disp(' ')
disp('Press any key to end.')
pause; clc; close(h);
%clear all
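% Two properties of the evidence solution, stated in outline:
%  - At the self-consistent solution the updates imply
%    2*beta*Ed ~= ndata - gamma and 2*alpha*Ew ~= gamma, so the final
%    regularized error S = beta*Ed + alpha*Ew is close to ndata/2, as
%    remarked after training above.
%  - The error bars narrow where the data are dense: there the output
%    gradient g lies mainly along weight-space directions that the data
%    Hessian constrains strongly, so g'*inv(A)*g stays near the noise
%    floor 1/beta. Away from the data, g picks up directions constrained
%    only by the weak prior term alpha*I, and the variance grows.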