%DEMEV2 Demonstrate Bayesian classification for the MLP.
%
% Description
% A synthetic two-class two-dimensional dataset X is sampled from a
% mixture of four Gaussians.  Each class is associated with two of the
% Gaussians so that the optimal decision boundary is non-linear.  A
% 2-layer network with logistic outputs is trained by minimizing the
% cross-entropy error function with an isotropic Gaussian regularizer
% (one hyperparameter for each of the four standard weight groups),
% using the scaled conjugate gradient optimizer.  The hyperparameter
% vector ALPHA is re-estimated using the function EVIDENCE.  A graph
% is plotted of the optimal, regularised, and unregularised decision
% boundaries.  A further plot of the moderated versus unmoderated
% contours is generated.
%
% See also
% EVIDENCE, MLP, SCG, DEMARD, DEMMLP2
%

% Copyright (c) Ian T Nabney (1996-2001)


clc;

disp('This program demonstrates the use of the evidence procedure on')
disp('a two-class problem.  It also shows the improved generalisation')
disp('performance that can be achieved with moderated outputs; that is,')
disp('predictions where an approximate integration over the true')
disp('posterior distribution is carried out.')
disp(' ')
disp('First we generate a synthetic dataset with two-dimensional input')
disp('sampled from a mixture of four Gaussians.  Each class is')
disp('associated with two of the Gaussians so that the optimal decision')
disp('boundary is non-linear.')
disp(' ')
disp('Press any key to see a plot of the data.')
pause;
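% Background: since each class owns two of the four mixture components,
% Bayes' theorem gives the true class-1 posterior as the sum of the
% posteriors of its two components,
%   P(C1|x) = (p(x|1)P(1) + p(x|2)P(2)) / p(x),
% so the optimal decision boundary is the contour P(C1|x) = 0.5.
% This is computed from the true mixture below using GMMPOST.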
% Generate the matrix of inputs x and targets t.

rand('state', 423);
randn('state', 423);

ClassSymbol1 = 'r.';
ClassSymbol2 = 'y.';
PointSize = 12;
titleSize = 10;

fh1 = figure;
set(fh1, 'Name', 'True Data Distribution');
whitebg(fh1, 'k');

%
% Generate the data
%
n = 200;

% Set up mixture model: 2d data with four centres.
% Class 1 comes from the first two centres, class 2 from the other two.
mix = gmm(2, 4, 'full');
mix.priors = [0.25 0.25 0.25 0.25];
mix.centres = [0 -0.1; 1.5 0; 1 1; 1 -1];
mix.covars(:,:,1) = [0.625 -0.2165; -0.2165 0.875];
mix.covars(:,:,2) = [0.25 0; 0 0.25];
mix.covars(:,:,3) = [0.2241 -0.1368; -0.1368 0.9759];
mix.covars(:,:,4) = [0.2375 0.1516; 0.1516 0.4125];

[data, label] = gmmsamp(mix, n);

%
% Calculate some useful axis limits
%
x0 = min(data(:,1));
x1 = max(data(:,1));
y0 = min(data(:,2));
y1 = max(data(:,2));
dx = x1 - x0;
dy = y1 - y0;
expand = 5/100;		% Add on 5 percent each way
x0 = x0 - dx*expand;
x1 = x1 + dx*expand;
y0 = y0 - dy*expand;
y1 = y1 + dy*expand;
resolution = 100;
step = dx/resolution;
xrange = [x0:step:x1];
yrange = [y0:step:y1];
%
% Generate the grid
%
[X, Y] = meshgrid(xrange, yrange);
%
% Calculate the class conditional densities, the unconditional densities
% and the posterior probabilities
%
px_j = gmmactiv(mix, [X(:) Y(:)]);
px = reshape(px_j*(mix.priors)', size(X));
post = gmmpost(mix, [X(:) Y(:)]);
p1_x = reshape(post(:, 1) + post(:, 2), size(X));
p2_x = reshape(post(:, 3) + post(:, 4), size(X));

plot(data((label<=2),1), data((label<=2),2), ClassSymbol1, ...
  'MarkerSize', PointSize)
hold on
axis([x0 x1 y0 y1])
plot(data((label>2),1), data((label>2),2), ClassSymbol2, ...
  'MarkerSize', PointSize)

% Convert targets to 0-1 encoding.
target = (label <= 2);
disp(' ')
disp('Press any key to continue')
pause; clc;

disp('Next we create a two-layer MLP network with 6 hidden units and')
disp('one logistic output.  We use a separate inverse variance')
disp('hyperparameter for each group of weights (first-layer weights,')
disp('first-layer biases, second-layer weights, output bias), and the')
disp('weights are optimised with the scaled conjugate gradient algorithm.')
disp('After every 100 iterations the hyperparameters are re-estimated')
disp('twice.  There are eight cycles of the whole algorithm.')
disp(' ')
disp('Press any key to train the network and determine the hyperparameters.')
pause;

% Set up network parameters.
nin = 2;		% Number of inputs.
nhidden = 6;		% Number of hidden units.
nout = 1;		% Number of outputs.
alpha = 0.01;		% Initial prior hyperparameter.
aw1 = 0.01;
ab1 = 0.01;
aw2 = 0.01;
ab2 = 0.01;
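% Background: MLPPRIOR builds a prior structure with a separate
% inverse-variance hyperparameter for each of the four weight groups
% listed above.  In outline, EVIDENCE then re-estimates each
% hyperparameter with MacKay's fixed-point updates
%   gamma = sum_i lambda_i/(lambda_i + alpha),   alpha <- gamma/(2*E_W),
% where the lambda_i are eigenvalues of the data Hessian and E_W is the
% weight error term; gamma measures the number of well-determined
% parameters and is printed after each cycle below.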
% Create and initialize network weight vector.
prior = mlpprior(nin, nhidden, nout, aw1, ab1, aw2, ab2);
net = mlp(nin, nhidden, nout, 'logistic', prior);

% Set up vector of options for the optimiser.
nouter = 8;			% Number of outer loops.
ninner = 2;			% Number of inner loops.
options = foptions;		% Default options vector.
options(1) = 1;			% This provides display of error values.
options(2) = 1.0e-5;		% Absolute precision for weights.
options(3) = 1.0e-5;		% Precision for objective function.
options(14) = 100;		% Number of training cycles in inner loop.

% Train using scaled conjugate gradients, re-estimating alpha.
for k = 1:nouter
  net = netopt(net, options, data, target, 'scg');
  [net, gamma] = evidence(net, data, target, ninner);
  fprintf(1, '\nRe-estimation cycle %d:\n', k);
  disp(['  alpha = ', num2str(net.alpha')]);
  fprintf(1, '  gamma = %8.5f\n\n', gamma);
  disp(' ')
  disp('Press any key to continue.')
  pause;
end

disp(' ')
disp('Network training and hyperparameter re-estimation are now complete.')
disp('Notice that the final error value is close to the number of data')
disp(['points (', num2str(n), ') divided by two.'])
disp('Also, the hyperparameter values differ, which suggests that a single')
disp('hyperparameter would not be so effective.')
disp(' ')
disp('Next we train an MLP without Bayesian regularisation on the same')
disp('dataset using 400 iterations of scaled conjugate gradient.')
disp(' ')
disp('Press any key to train the network by maximum likelihood.')
pause;

% Train standard network.
net2 = mlp(nin, nhidden, nout, 'logistic');
options(14) = 400;
net2 = netopt(net2, options, data, target, 'scg');
y2g = mlpfwd(net2, [X(:), Y(:)]);
y2g = reshape(y2g(:, 1), size(X));

disp(' ')
disp('We can now plot the functions represented by the trained networks.')
disp('We show the decision boundaries (output = 0.5) and the optimal')
disp('decision boundary given by applying Bayes'' theorem to the true')
disp('data model.')
disp(' ')
disp('Press any key to add the boundaries to the plot.')
pause;

% Evaluate predictions.
[yg, ymodg] = mlpevfwd(net, data, target, [X(:) Y(:)]);
yg = reshape(yg(:,1), size(X));
ymodg = reshape(ymodg(:,1), size(X));

% Bayesian decision boundary.
[cB, hB] = contour(xrange, yrange, p1_x, [0.5 0.5], 'b-');
[cNb, hNb] = contour(xrange, yrange, yg, [0.5 0.5], 'r-');
[cN, hN] = contour(xrange, yrange, y2g, [0.5 0.5], 'g-');
set(hB, 'LineWidth', 2);
set(hNb, 'LineWidth', 2);
set(hN, 'LineWidth', 2);
Chandles = [hB(1) hNb(1) hN(1)];
legend(Chandles, 'Bayes', 'Reg. Network', 'Network', 3);
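% As a quick sanity check (a small addition, using the variables net,
% net2, data and target defined above), one can also count training-set
% misclassifications at the 0.5 threshold; an overtrained network will
% typically fit the training data more aggressively.
ytrain  = mlpfwd(net,  data);
ytrain2 = mlpfwd(net2, data);
fprintf(1, 'Regularised net training misclassifications:   %d\n', ...
  sum((ytrain > 0.5) ~= target));
fprintf(1, 'Unregularised net training misclassifications: %d\n', ...
  sum((ytrain2 > 0.5) ~= target));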
disp(' ')
disp('Note how the regularised network predictions are closer to the')
disp('optimal decision boundary, while the unregularised network is')
disp('overtrained.')

disp(' ')
disp('We will now compare moderated and unmoderated outputs for the')
disp('regularised network by showing the contour plot of the posterior')
disp('probability estimates.')
disp(' ')
disp('The first plot shows the moderated predictions and the second')
disp('shows the standard (unmoderated) predictions from the same')
disp('network.  These agree at the level 0.5.')
disp('Press any key to continue')
pause
levels = 0:0.1:1;
fh4 = figure;
set(fh4, 'Name', 'Moderated outputs');
hold on
plot(data((label<=2),1), data((label<=2),2), 'r.', 'MarkerSize', PointSize)
plot(data((label>2),1), data((label>2),2), 'y.', 'MarkerSize', PointSize)

[cNby, hNby] = contour(xrange, yrange, ymodg, levels, 'k-');
set(hNby, 'LineWidth', 1);

fh5 = figure;
set(fh5, 'Name', 'Unmoderated outputs');
hold on
plot(data((label<=2),1), data((label<=2),2), 'r.', 'MarkerSize', PointSize)
plot(data((label>2),1), data((label>2),2), 'y.', 'MarkerSize', PointSize)

[cNbm, hNbm] = contour(xrange, yrange, yg, levels, 'k-');
set(hNbm, 'LineWidth', 1);

disp(' ')
disp('Note how the moderated contours are more widely spaced.  This shows')
disp('that there is a larger region where the outputs are close to 0.5')
disp('and a smaller region where the outputs are close to 0 or 1.')
disp(' ')
disp('Press any key to exit')
pause
close(fh1);
close(fh4);
close(fh5);
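% For reference: the moderated outputs plotted above follow MacKay's
% approximation for a logistic output.  With output activation mean a
% and variance s2 (from the Laplace approximation to the weight
% posterior), the moderated prediction is
%   P(C1|x) ~= g(kappa(s2) * a),   kappa(s2) = 1/sqrt(1 + pi*s2/8),
% where g is the logistic sigmoid.  Large posterior variance drives
% kappa towards zero and the prediction towards 0.5, which is why the
% moderated contours are more widely spaced.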