%DEMEV3 Demonstrate Bayesian regression for the RBF.
%
% Description
% The problem consists of an input variable X which is sampled from a
% Gaussian distribution, and a target variable T generated by computing
% SIN(2*PI*X) and adding Gaussian noise. An RBF network with linear
% outputs is trained by minimizing a sum-of-squares error function with
% an isotropic Gaussian regularizer, using the scaled conjugate gradient
% optimizer. The hyperparameters ALPHA and BETA are re-estimated using
% the function EVIDENCE. A graph is plotted of the original function,
% the training data, the trained network function, and the error bars.
%
% See also
% DEMEV1, EVIDENCE, RBF, SCG, NETEVFWD
%

% Copyright (c) Ian T Nabney (1996-2001)

clc;
disp('This demonstration illustrates the application of Bayesian')
disp('re-estimation to determine the hyperparameters in a simple regression')
disp('problem using an RBF network. It is based on the fact that the')
disp('posterior distribution for the output weights of an RBF is Gaussian')
disp('and uses the evidence maximization framework of MacKay.')
disp(' ')
disp('First, we generate a synthetic data set consisting of a single input')
disp('variable x sampled from a Gaussian distribution, and a target variable')
disp('t obtained by evaluating sin(2*pi*x) and adding Gaussian noise.')
disp(' ')
disp('Press any key to see a plot of the data together with the sine function.')
pause;

% Generate the matrix of inputs x and targets t.

ndata = 16;        % Number of data points.
noise = 0.1;       % Standard deviation of noise distribution.
randn('state', 0);
rand('state', 0);
x = 0.25 + 0.07*randn(ndata, 1);
t = sin(2*pi*x) + noise*randn(size(x));

% Plot the data and the original sine function.
h = figure;
nplot = 200;
plotvals = linspace(0, 1, nplot)';
plot(x, t, 'ok')
xlabel('Input')
ylabel('Target')
hold on
axis([0 1 -1.5 1.5])
fplot('sin(2*pi*x)', [0 1], '-g')
legend('data', 'function');

disp(' ')
disp('Press any key to continue')
pause; clc;

disp('Next we create an RBF network with 3 hidden units and one linear')
disp('output. The model assumes Gaussian target noise governed by an')
disp('inverse variance hyperparameter beta, and uses a simple Gaussian prior')
disp('distribution governed by an inverse variance hyperparameter alpha.')
disp(' ');
disp('The network weights and the hyperparameters are initialised and then')
disp('the output layer weights are optimized with the scaled conjugate gradient')
disp('algorithm using the SCG function, with the hyperparameters kept')
disp('fixed. After a maximum of 50 iterations, the hyperparameters are')
disp('re-estimated using the EVIDENCE function. The process of optimizing')
disp('the weights with fixed hyperparameters and then re-estimating the')
disp('hyperparameters is repeated for a total of 5 cycles.')
disp(' ')
disp('Press any key to train the network and determine the hyperparameters.')
pause;
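% For reference, the hyperparameter updates performed by EVIDENCE follow
% MacKay's evidence approximation; in outline (with Ew = 0.5*w'*w the
% weight error, Ed = 0.5*sum((y-t).^2) the data error, and lambda the
% eigenvalues of the data Hessian scaled by beta):
%
%   gamma = sum(lambda ./ (lambda + alpha));  % well-determined parameters
%   alpha = gamma / (2*Ew);
%   beta  = (ndata - gamma) / (2*Ed);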
% Set up network parameters.
nin = 1;           % Number of inputs.
nhidden = 3;       % Number of hidden units.
nout = 1;          % Number of outputs.
alpha = 0.01;      % Initial prior hyperparameter.
beta_init = 50.0;  % Initial noise hyperparameter.

% Create and initialize network weight vector.
net = rbf(nin, nhidden, nout, 'tps', 'linear', alpha, beta_init);
[net.mask, prior] = rbfprior('tps', nin, nhidden, nout, alpha, alpha);
net = netinit(net, prior);

options = foptions;
options(14) = 5;   % At most 5 EM iterations for basis functions.
options(1) = -1;   % Turn off all messages.
net = rbfsetbf(net, options, x);  % Initialise the basis functions.

% Now train the network.
nouter = 5;
ninner = 2;
options = foptions;
options(1) = 1;
options(2) = 1.0e-5;  % Absolute precision for weights.
options(3) = 1.0e-5;  % Precision for objective function.
options(14) = 50;     % Number of training cycles in inner loop.

% Train using scaled conjugate gradients, re-estimating alpha and beta.
for k = 1:nouter
  net = netopt(net, options, x, t, 'scg');
  [net, gamma] = evidence(net, x, t, ninner);
  fprintf(1, '\nRe-estimation cycle %d:\n', k);
  fprintf(1, '  alpha = %8.5f\n', net.alpha);
  fprintf(1, '  beta  = %8.5f\n', net.beta);
  fprintf(1, '  gamma = %8.5f\n\n', gamma);
  disp(' ')
  disp('Press any key to continue.')
  pause;
end

fprintf(1, 'true beta: %f\n', 1/(noise*noise));

disp(' ')
disp('Network training and hyperparameter re-estimation are now complete.')
disp('Compare the final value for the hyperparameter beta with the true')
disp('value.')
disp(' ')
disp('Notice that the final error value is close to the number of data')
disp(['points (', num2str(ndata),') divided by two.'])
disp(' ')
disp('Press any key to continue.')
pause; clc;
disp('We can now plot the function represented by the trained network. This')
disp('corresponds to the mean of the predictive distribution. We can also')
disp('plot ''error bars'' representing one standard deviation of the')
disp('predictive distribution around the mean.')
disp(' ')
disp('Press any key to add the network function and error bars to the plot.')
pause;

% Evaluate error bars.
[y, sig2] = netevfwd(netpak(net), net, x, t, plotvals);
sig = sqrt(sig2);
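% The predictive variance returned by NETEVFWD combines the target noise
% with the uncertainty in the output weights. For this linear-output RBF
% it is, in outline, sig2 = 1/beta + phi'*inv(A)*phi with
% A = beta*(Phi'*Phi) + alpha*I, where Phi holds the basis function
% activations (plus a bias column) on the training data and phi the
% activations at a test point. A rough check along these lines (a sketch
% only, using the hidden unit activations returned by RBFFWD):
%
%   [tmp, Phi] = rbffwd(net, x);        Phi = [Phi, ones(ndata, 1)];
%   [tmp, Psi] = rbffwd(net, plotvals); Psi = [Psi, ones(nplot, 1)];
%   A = net.beta*(Phi'*Phi) + net.alpha*eye(nhidden + 1);
%   sig2_check = 1/net.beta + sum((Psi/A).*Psi, 2);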
% Plot the data, the original function, and the trained network function.
[y, z] = rbffwd(net, plotvals);
figure(h); hold on;
plot(plotvals, y, '-r')
xlabel('Input')
ylabel('Target')
plot(plotvals, y + sig, '-b');
plot(plotvals, y - sig, '-b');
legend('data', 'function', 'network', 'error bars');

disp(' ')
disp('Notice how the confidence interval spanned by the ''error bars'' is')
disp('smaller in the region of input space where the data density is high,')
disp('and becomes larger in regions away from the data.')
disp(' ')
disp('Press any key to end.')
pause; clc; close(h);
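% A closing remark on the error value noted earlier: at the evidence
% maximum the updates sketched above imply 2*alpha*Ew ~= gamma and
% 2*beta*Ed ~= ndata - gamma, so the total error alpha*Ew + beta*Ed
% reported during training settles near ndata/2 (here 16/2 = 8).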