%DEMEV3 Demonstrate Bayesian regression for the RBF.
%
% Description
% The problem consists of an input variable X sampled from a
% Gaussian distribution, and a target variable T generated by computing
% SIN(2*PI*X) and adding Gaussian noise. An RBF network with linear
% outputs is trained by minimizing a sum-of-squares error function with
% an isotropic Gaussian regularizer, using the scaled conjugate gradient
% optimizer. The hyperparameters ALPHA and BETA are re-estimated using
% the function EVIDENCE. A graph is plotted of the original function,
% the training data, the trained network function, and the error bars.
%
% See also
% DEMEV1, EVIDENCE, RBF, SCG, NETEVFWD
%

% Copyright (c) Ian T Nabney (1996-2001)

clc;
disp('This demonstration illustrates the application of Bayesian')
disp('re-estimation to determine the hyperparameters in a simple regression')
disp('problem using an RBF network. It is based on the fact that the')
disp('posterior distribution for the output weights of an RBF is Gaussian')
disp('and uses the evidence maximization framework of MacKay.')
disp(' ')
disp('First, we generate a synthetic data set consisting of a single input')
disp('variable x sampled from a Gaussian distribution, and a target variable')
disp('t obtained by evaluating sin(2*pi*x) and adding Gaussian noise.')
disp(' ')
disp('Press any key to see a plot of the data together with the sine function.')
pause;

% Generate the matrix of inputs x and targets t.
ndata = 16;        % Number of data points.
noise = 0.1;       % Standard deviation of noise distribution.
randn('state', 0);
rand('state', 0);
x = 0.25 + 0.07*randn(ndata, 1);
t = sin(2*pi*x) + noise*randn(size(x));

% Plot the data and the original sine function.
h = figure;
nplot = 200;
plotvals = linspace(0, 1, nplot)';
plot(x, t, 'ok')
xlabel('Input')
ylabel('Target')
hold on
axis([0 1 -1.5 1.5])
fplot('sin(2*pi*x)', [0 1], '-g')
legend('data', 'function');

disp(' ')
disp('Press any key to continue')
pause; clc;

disp('Next we create an RBF network with 3 hidden units and one linear')
disp('output. The model assumes Gaussian target noise governed by an')
disp('inverse variance hyperparameter beta, and uses a simple Gaussian prior')
disp('distribution governed by an inverse variance hyperparameter alpha.')
disp(' ');
disp('The network weights and the hyperparameters are initialised and then')
disp('the output layer weights are optimized with the scaled conjugate gradient')
disp('algorithm using the SCG function, with the hyperparameters kept')
disp('fixed. After a maximum of 50 iterations, the hyperparameters are')
disp('re-estimated using the EVIDENCE function. The process of optimizing')
disp('the weights with fixed hyperparameters and then re-estimating the')
disp('hyperparameters is repeated for a total of 5 cycles.')
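% The re-estimation performed by EVIDENCE follows MacKay's evidence
% framework. In outline (a sketch of the standard updates, not the exact
% Netlab source): with gamma the number of well-determined parameters,
% Ed the data error and Ew the weight error, each cycle makes the updates
%
%   alpha <- gamma / (2*Ew)
%   beta  <- (ndata - gamma) / (2*Ed)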
disp(' ')
disp('Press any key to train the network and determine the hyperparameters.')
pause;

% Set up network parameters.
nin = 1;           % Number of inputs.
nhidden = 3;       % Number of hidden units.
nout = 1;          % Number of outputs.
alpha = 0.01;      % Initial prior hyperparameter.
beta_init = 50.0;  % Initial noise hyperparameter.

% Create and initialize network weight vector.
net = rbf(nin, nhidden, nout, 'tps', 'linear', alpha, beta_init);
[net.mask, prior] = rbfprior('tps', nin, nhidden, nout, alpha, alpha);
net = netinit(net, prior);

options = foptions;
options(14) = 5;   % At most 5 EM iterations for basis functions.
options(1) = -1;   % Turn off all messages.
net = rbfsetbf(net, options, x);  % Initialise the basis functions.

% Now train the network.
nouter = 5;
ninner = 2;
options = foptions;
options(1) = 1;
options(2) = 1.0e-5;  % Absolute precision for weights.
options(3) = 1.0e-5;  % Precision for objective function.
options(14) = 50;     % Number of training cycles in inner loop.

% Train using scaled conjugate gradients, re-estimating alpha and beta.
for k = 1:nouter
  net = netopt(net, options, x, t, 'scg');
  [net, gamma] = evidence(net, x, t, ninner);
  fprintf(1, '\nRe-estimation cycle %d:\n', k);
  fprintf(1, '  alpha =  %8.5f\n', net.alpha);
  fprintf(1, '  beta  =  %8.5f\n', net.beta);
  fprintf(1, '  gamma =  %8.5f\n\n', gamma);
  disp(' ')
  disp('Press any key to continue.')
  pause;
end

fprintf(1, 'true beta: %f\n', 1/(noise*noise));

disp(' ')
disp('Network training and hyperparameter re-estimation are now complete.')
disp('Compare the final value for the hyperparameter beta with the true')
disp('value.')
disp(' ')
disp('Notice that the final error value is close to the number of data')
disp(['points (', num2str(ndata),') divided by two.'])
disp(' ')
disp('Press any key to continue.')
pause; clc;
disp('We can now plot the function represented by the trained network. This')
disp('corresponds to the mean of the predictive distribution. We can also')
disp('plot ''error bars'' representing one standard deviation of the')
disp('predictive distribution around the mean.')
disp(' ')
disp('Press any key to add the network function and error bars to the plot.')
pause;

% Evaluate error bars.
[y, sig2] = netevfwd(netpak(net), net, x, t, plotvals);
sig = sqrt(sig2);
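% For a network with linear outputs, the predictive variance returned by
% NETEVFWD decomposes (in outline; notation follows MacKay/Bishop rather
% than the Netlab source) as
%
%   sig2 = 1/beta + g'*inv(A)*g
%
% where g is the derivative of the network output with respect to the
% weights and A is the Hessian of the regularized error. The 1/beta term
% is the intrinsic target noise; the second term reflects the remaining
% uncertainty in the weights, and shrinks where the data constrain the
% fit tightly.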
% Plot the data, the original function, and the trained network function.
[y, z] = rbffwd(net, plotvals);
figure(h); hold on;
plot(plotvals, y, '-r')
xlabel('Input')
ylabel('Target')
plot(plotvals, y + sig, '-b');
plot(plotvals, y - sig, '-b');
legend('data', 'function', 'network', 'error bars');

disp(' ')
disp('Notice how the confidence interval spanned by the ''error bars'' is')
disp('smaller in the region of input space where the data density is high,')
disp('and becomes larger in regions away from the data.')
disp(' ')
disp('Press any key to end.')
pause; clc; close(h);
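% A quick numerical check of the claim above that the final regularized
% error is close to ndata/2 (a sketch using the standard Netlab NETERR;
% uncomment and run before CLOSE(H) if desired):
%
%   E = neterr(netpak(net), net, x, t);
%   fprintf(1, 'final error = %g, ndata/2 = %g\n', E, ndata/2);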