%DEMEV3 Demonstrate Bayesian regression for the RBF.
%
% Description
% The problem consists of an input variable X which is sampled from a
% Gaussian distribution, and a target variable T generated by computing
% SIN(2*PI*X) and adding Gaussian noise. An RBF network with linear
% outputs is trained by minimizing a sum-of-squares error function with
% an isotropic Gaussian regularizer, using the scaled conjugate gradient
% optimizer. The hyperparameters ALPHA and BETA are re-estimated using
% the function EVIDENCE. A graph is plotted of the original function,
% the training data, the trained network function, and the error bars.
%
% See also
% DEMEV1, EVIDENCE, RBF, SCG, NETEVFWD
%

% Copyright (c) Ian T Nabney (1996-2001)

clc;
disp('This demonstration illustrates the application of Bayesian')
disp('re-estimation to determine the hyperparameters in a simple regression')
disp('problem using an RBF network. It is based on the fact that the')
disp('posterior distribution for the output weights of an RBF is Gaussian')
disp('and uses the evidence maximization framework of MacKay.')
disp(' ')
disp('First, we generate a synthetic data set consisting of a single input')
disp('variable x sampled from a Gaussian distribution, and a target variable')
disp('t obtained by evaluating sin(2*pi*x) and adding Gaussian noise.')
disp(' ')
disp('Press any key to see a plot of the data together with the sine function.')
pause;

% Generate the matrix of inputs x and targets t.

ndata = 16;        % Number of data points.
noise = 0.1;       % Standard deviation of noise distribution.
randn('state', 0);
rand('state', 0);
x = 0.25 + 0.07*randn(ndata, 1);
t = sin(2*pi*x) + noise*randn(size(x));

% Plot the data and the original sine function.
h = figure;
nplot = 200;
plotvals = linspace(0, 1, nplot)';
plot(x, t, 'ok')
xlabel('Input')
ylabel('Target')
hold on
axis([0 1 -1.5 1.5])
fplot('sin(2*pi*x)', [0 1], '-g')
legend('data', 'function');

disp(' ')
disp('Press any key to continue')
pause; clc;

disp('Next we create an RBF network with 3 hidden units and one linear')
disp('output. The model assumes Gaussian target noise governed by an')
disp('inverse variance hyperparameter beta, and uses a simple Gaussian prior')
disp('distribution governed by an inverse variance hyperparameter alpha.')
disp(' ');
disp('The network weights and the hyperparameters are initialised and then')
disp('the output layer weights are optimized with the scaled conjugate gradient')
disp('algorithm using the SCG function, with the hyperparameters kept')
disp('fixed. After a maximum of 50 iterations, the hyperparameters are')
disp('re-estimated using the EVIDENCE function. The process of optimizing')
disp('the weights with fixed hyperparameters and then re-estimating the')
disp('hyperparameters is repeated for a total of 5 cycles.')
disp(' ')
disp('Press any key to train the network and determine the hyperparameters.')
pause;
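% For reference, the hyperparameter updates performed by EVIDENCE follow
% MacKay's evidence approximation; in outline (with Ew = 0.5*w'*w the
% weight error, Ed = 0.5*sum((y-t).^2) the data error, and lambda the
% eigenvalues of the data Hessian scaled by beta):
%
%   gamma = sum(lambda ./ (lambda + alpha));  % well-determined parameters
%   alpha = gamma / (2*Ew);
%   beta  = (ndata - gamma) / (2*Ed);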
% Set up network parameters.
nin = 1;           % Number of inputs.
nhidden = 3;       % Number of hidden units.
nout = 1;          % Number of outputs.
alpha = 0.01;      % Initial prior hyperparameter.
beta_init = 50.0;  % Initial noise hyperparameter.

% Create and initialize network weight vector.
net = rbf(nin, nhidden, nout, 'tps', 'linear', alpha, beta_init);
[net.mask, prior] = rbfprior('tps', nin, nhidden, nout, alpha, alpha);
net = netinit(net, prior);

options = foptions;
options(14) = 5;   % At most 5 EM iterations for basis functions.
options(1) = -1;   % Turn off all messages.
net = rbfsetbf(net, options, x);  % Initialise the basis functions.

% Now train the network.
nouter = 5;
ninner = 2;
options = foptions;
options(1) = 1;
options(2) = 1.0e-5;  % Absolute precision for weights.
options(3) = 1.0e-5;  % Precision for objective function.
options(14) = 50;     % Number of training cycles in inner loop.

% Train using scaled conjugate gradients, re-estimating alpha and beta.
for k = 1:nouter
  net = netopt(net, options, x, t, 'scg');
  [net, gamma] = evidence(net, x, t, ninner);
  fprintf(1, '\nRe-estimation cycle %d:\n', k);
  fprintf(1, '  alpha = %8.5f\n', net.alpha);
  fprintf(1, '  beta  = %8.5f\n', net.beta);
  fprintf(1, '  gamma = %8.5f\n\n', gamma);
  disp(' ')
  disp('Press any key to continue.')
  pause;
end

fprintf(1, 'true beta: %f\n', 1/(noise*noise));

disp(' ')
disp('Network training and hyperparameter re-estimation are now complete.')
disp('Compare the final value for the hyperparameter beta with the true')
disp('value.')
disp(' ')
disp('Notice that the final error value is close to the number of data')
disp(['points (', num2str(ndata),') divided by two.'])
disp(' ')
disp('Press any key to continue.')
pause; clc;
disp('We can now plot the function represented by the trained network. This')
disp('corresponds to the mean of the predictive distribution. We can also')
disp('plot ''error bars'' representing one standard deviation of the')
disp('predictive distribution around the mean.')
disp(' ')
disp('Press any key to add the network function and error bars to the plot.')
pause;

% Evaluate error bars.
[y, sig2] = netevfwd(netpak(net), net, x, t, plotvals);
sig = sqrt(sig2);
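% The predictive variance returned by NETEVFWD combines the target noise
% with the uncertainty in the output weights. For this linear-output RBF
% it is, in outline, sig2 = 1/beta + phi'*inv(A)*phi with
% A = beta*(Phi'*Phi) + alpha*I, where Phi holds the basis function
% activations (plus a bias column) on the training data and phi the
% activations at a test point. A rough check along these lines (a sketch
% only, using the hidden unit activations returned by RBFFWD):
%
%   [tmp, Phi] = rbffwd(net, x);        Phi = [Phi, ones(ndata, 1)];
%   [tmp, Psi] = rbffwd(net, plotvals); Psi = [Psi, ones(nplot, 1)];
%   A = net.beta*(Phi'*Phi) + net.alpha*eye(nhidden + 1);
%   sig2_check = 1/net.beta + sum((Psi/A).*Psi, 2);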
% Plot the data, the original function, and the trained network function.
[y, z] = rbffwd(net, plotvals);
figure(h); hold on;
plot(plotvals, y, '-r')
xlabel('Input')
ylabel('Target')
plot(plotvals, y + sig, '-b');
plot(plotvals, y - sig, '-b');
legend('data', 'function', 'network', 'error bars');

disp(' ')
disp('Notice how the confidence interval spanned by the ''error bars'' is')
disp('smaller in the region of input space where the data density is high,')
disp('and becomes larger in regions away from the data.')
disp(' ')
disp('Press any key to end.')
pause; clc; close(h);
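% A closing remark on the error value noted earlier: at the evidence
% maximum the updates sketched above imply 2*alpha*Ew ~= gamma and
% 2*beta*Ed ~= ndata - gamma, so the total error alpha*Ew + beta*Ed
% reported during training settles near ndata/2 (here 16/2 = 8).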