%DEMGPARD Demonstrate ARD using a Gaussian Process.
%
%	Description
%	The data consists of three input variables X1, X2 and X3, and one
%	target variable T.  The target data is generated by computing
%	SIN(2*PI*X1) and adding Gaussian noise, X2 is a copy of X1 with a
%	higher level of added noise, and X3 is sampled randomly from a
%	Gaussian distribution.  A Gaussian Process is trained by optimising
%	the hyperparameters using the scaled conjugate gradient algorithm.
%	The final values of the hyperparameters show that the model
%	successfully identifies the importance of each input.
%
%	See also
%	DEMGP, GP, GPERR, GPFWD, GPGRAD, GPINIT, SCG
%

%	Copyright (c) Ian T Nabney (1996-2001)

clc;
randn('state', 1729);
rand('state', 1729);
disp('This demonstration illustrates the technique of automatic relevance')
disp('determination (ARD) using a Gaussian Process.')
disp(' ');
disp('First, we set up a synthetic data set involving three input variables:')
disp('x1 is sampled uniformly from the range (0,1) and has a low level of')
disp('added Gaussian noise, x2 is a copy of x1 with a higher level of added')
disp('noise, and x3 is sampled randomly from a Gaussian distribution. The')
disp('single target variable is given by t = sin(2*pi*x1) with additive')
disp('Gaussian noise. Thus x1 is very relevant for determining the target')
disp('value, x2 is of some relevance, while x3 should in principle be')
disp('irrelevant.')
disp(' ');
disp('Press any key to see a plot of t against x1.')
pause;

ndata = 100;
x1 = rand(ndata, 1);
x2 = x1 + 0.05*randn(ndata, 1);
x3 = 0.5 + 0.5*randn(ndata, 1);
x = [x1, x2, x3];
t = sin(2*pi*x1) + 0.1*randn(ndata, 1);

% Plot the data and the original function.
h = figure;
plotvals = linspace(0, 1, 200)';
plot(x1, t, 'ob')
hold on
xlabel('Input x1')
ylabel('Target')
axis([0 1 -1.5 1.5])
[fx, fy] = fplot('sin(2*pi*x)', [0 1]);
plot(fx, fy, '-g', 'LineWidth', 2);
legend('data', 'function');

disp(' ');
disp('Press any key to continue')
pause; clc;

disp('The Gaussian Process has a separate hyperparameter for each input.')
disp('The hyperparameters are trained by error minimisation using the scaled')
disp('conjugate gradient optimiser.')
disp(' ');
disp('Press any key to create and train the model.')
disp(' ');
pause;

net = gp(3, 'sqexp');
% Initialise the parameters.
prior.pr_mean = 0;
prior.pr_var = 0.1;
net = gpinit(net, x, t, prior);

% Now train to find the hyperparameters.
options = foptions;
options(1) = 1;
options(14) = 30;

[net, options] = netopt(net, options, x, t, 'scg');

rel = exp(net.inweights);

fprintf(1, ...
  '\nFinal hyperparameters:\n\n  bias:\t\t%10.6f\n  noise:\t%10.6f\n', ...
  exp(net.bias), exp(net.noise));
fprintf(1, '  Vertical scale: %8.6f\n', exp(net.fpar(1)));
fprintf(1, '  Input 1:\t%10.6f\n  Input 2:\t%10.6f\n', ...
  rel(1), rel(2));
fprintf(1, '  Input 3:\t%10.6f\n\n', rel(3));
disp(' ');
disp('We see that the inverse lengthscale associated with input x1 is')
disp('large, that of x2 has an intermediate value, and that of x3 is')
disp('small.')
disp(' ');
disp('This implies that the Gaussian Process is giving greatest emphasis')
disp('to x1 and least emphasis to x3, with intermediate emphasis on')
disp('x2 in the covariance function.')
disp(' ')
disp('Since the target t is statistically independent of x3, we might')
disp('expect the hyperparameter associated with this input to go to')
disp('zero. However, for any finite data set there may be some chance')
disp('correlation between x3 and t, and so the hyperparameter remains')
disp('finite.')
disp(' ')
disp('Press any key to continue.')
pause

disp('Finally, we plot the output of the Gaussian Process along the line')
disp('x1 = x2 = x3, together with the true underlying function.')
xt = linspace(0, 1, 50);
xtest = [xt', xt', xt'];

cn = gpcovar(net, x);
cninv = inv(cn);
[ytest, sigsq] = gpfwd(net, xtest, cninv);
sig = sqrt(sigsq);

figure(h); hold on;
plot(xt, ytest, '-k');
plot(xt, ytest+(2*sig), '-b', xt, ytest-(2*sig), '-b');
axis([0 1 -1.5 1.5]);
fplot('sin(2*pi*x)', [0 1], '--m');

disp(' ');
disp('Press any key to end.')
pause; clc; close(h); clear all