%DEMEV3 Demonstrate Bayesian regression for the RBF.
%
% Description
% The problem consists of an input variable X sampled from a
% Gaussian distribution, and a target variable T generated by computing
% SIN(2*PI*X) and adding Gaussian noise. An RBF network with linear
% outputs is trained by minimizing a sum-of-squares error function with
% an isotropic Gaussian regularizer, using the scaled conjugate gradient
% optimizer. The hyperparameters ALPHA and BETA are re-estimated using
% the function EVIDENCE. A graph is plotted of the original function,
% the training data, the trained network function, and the error bars.
%
% See also
% DEMEV1, EVIDENCE, RBF, SCG, NETEVFWD
%

% Copyright (c) Ian T Nabney (1996-2001)

clc;
disp('This demonstration illustrates the application of Bayesian')
disp('re-estimation to determine the hyperparameters in a simple regression')
disp('problem using an RBF network. It is based on the fact that the')
disp('posterior distribution for the output weights of an RBF is Gaussian')
disp('and uses the evidence maximization framework of MacKay.')
disp(' ')
disp('First, we generate a synthetic data set consisting of a single input')
disp('variable x sampled from a Gaussian distribution, and a target variable')
disp('t obtained by evaluating sin(2*pi*x) and adding Gaussian noise.')
disp(' ')
disp('Press any key to see a plot of the data together with the sine function.')
pause;

% Generate the matrix of inputs x and targets t.
ndata = 16;        % Number of data points.
noise = 0.1;       % Standard deviation of noise distribution.
randn('state', 0);
rand('state', 0);
x = 0.25 + 0.07*randn(ndata, 1);
t = sin(2*pi*x) + noise*randn(size(x));

% Plot the data and the original sine function.
h = figure;
nplot = 200;
plotvals = linspace(0, 1, nplot)';
plot(x, t, 'ok')
xlabel('Input')
ylabel('Target')
hold on
axis([0 1 -1.5 1.5])
fplot('sin(2*pi*x)', [0 1], '-g')
legend('data', 'function');

disp(' ')
disp('Press any key to continue')
pause; clc;

disp('Next we create an RBF network with 3 hidden units and one linear')
disp('output. The model assumes Gaussian target noise governed by an')
disp('inverse variance hyperparameter beta, and uses a simple Gaussian prior')
disp('distribution governed by an inverse variance hyperparameter alpha.')
disp(' ');
disp('The network weights and the hyperparameters are initialised and then')
disp('the output layer weights are optimized with the scaled conjugate gradient')
disp('algorithm using the SCG function, with the hyperparameters kept')
disp('fixed. After a maximum of 50 iterations, the hyperparameters are')
disp('re-estimated using the EVIDENCE function. The process of optimizing')
disp('the weights with fixed hyperparameters and then re-estimating the')
disp('hyperparameters is repeated for a total of 5 cycles.')
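% The re-estimation performed by EVIDENCE follows MacKay's evidence
% framework. In outline (a sketch of the standard updates, not the exact
% Netlab source): with gamma the number of well-determined parameters,
% Ed the data error and Ew the weight error, each cycle makes the updates
%
%   alpha <- gamma / (2*Ew)
%   beta  <- (ndata - gamma) / (2*Ed)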
disp(' ')
disp('Press any key to train the network and determine the hyperparameters.')
pause;

% Set up network parameters.
nin = 1;           % Number of inputs.
nhidden = 3;       % Number of hidden units.
nout = 1;          % Number of outputs.
alpha = 0.01;      % Initial prior hyperparameter.
beta_init = 50.0;  % Initial noise hyperparameter.

% Create and initialize network weight vector.
net = rbf(nin, nhidden, nout, 'tps', 'linear', alpha, beta_init);
[net.mask, prior] = rbfprior('tps', nin, nhidden, nout, alpha, alpha);
net = netinit(net, prior);

options = foptions;
options(14) = 5;   % At most 5 EM iterations for basis functions.
options(1) = -1;   % Turn off all messages.
net = rbfsetbf(net, options, x);  % Initialise the basis functions.

% Now train the network.
nouter = 5;
ninner = 2;
options = foptions;
options(1) = 1;
options(2) = 1.0e-5;  % Absolute precision for weights.
options(3) = 1.0e-5;  % Precision for objective function.
options(14) = 50;     % Number of training cycles in inner loop.

% Train using scaled conjugate gradients, re-estimating alpha and beta.
for k = 1:nouter
  net = netopt(net, options, x, t, 'scg');
  [net, gamma] = evidence(net, x, t, ninner);
  fprintf(1, '\nRe-estimation cycle %d:\n', k);
  fprintf(1, '  alpha =  %8.5f\n', net.alpha);
  fprintf(1, '  beta  =  %8.5f\n', net.beta);
  fprintf(1, '  gamma =  %8.5f\n\n', gamma);
  disp(' ')
  disp('Press any key to continue.')
  pause;
end

fprintf(1, 'true beta: %f\n', 1/(noise*noise));

disp(' ')
disp('Network training and hyperparameter re-estimation are now complete.')
disp('Compare the final value for the hyperparameter beta with the true')
disp('value.')
disp(' ')
disp('Notice that the final error value is close to the number of data')
disp(['points (', num2str(ndata),') divided by two.'])
disp(' ')
disp('Press any key to continue.')
pause; clc;
disp('We can now plot the function represented by the trained network. This')
disp('corresponds to the mean of the predictive distribution. We can also')
disp('plot ''error bars'' representing one standard deviation of the')
disp('predictive distribution around the mean.')
disp(' ')
disp('Press any key to add the network function and error bars to the plot.')
pause;

% Evaluate error bars.
[y, sig2] = netevfwd(netpak(net), net, x, t, plotvals);
sig = sqrt(sig2);
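% For a network with linear outputs, the predictive variance returned by
% NETEVFWD decomposes (in outline; notation follows MacKay/Bishop rather
% than the Netlab source) as
%
%   sig2 = 1/beta + g'*inv(A)*g
%
% where g is the derivative of the network output with respect to the
% weights and A is the Hessian of the regularized error. The 1/beta term
% is the intrinsic target noise; the second term reflects the remaining
% uncertainty in the weights, and shrinks where the data constrain the
% fit tightly.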
% Plot the data, the original function, and the trained network function.
[y, z] = rbffwd(net, plotvals);
figure(h); hold on;
plot(plotvals, y, '-r')
xlabel('Input')
ylabel('Target')
plot(plotvals, y + sig, '-b');
plot(plotvals, y - sig, '-b');
legend('data', 'function', 'network', 'error bars');

disp(' ')
disp('Notice how the confidence interval spanned by the ''error bars'' is')
disp('smaller in the region of input space where the data density is high,')
disp('and becomes larger in regions away from the data.')
disp(' ')
disp('Press any key to end.')
pause; clc; close(h);
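% A quick numerical check of the claim above that the final regularized
% error is close to ndata/2 (a sketch using the standard Netlab NETERR;
% uncomment and run before CLOSE(H) if desired):
%
%   E = neterr(netpak(net), net, x, t);
%   fprintf(1, 'final error = %g, ndata/2 = %g\n', E, ndata/2);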