function [net, options, errlog] = rbftrain(net, options, x, t)
%RBFTRAIN Two stage training of RBF network.
%
% Description
% NET = RBFTRAIN(NET, OPTIONS, X, T) uses a two stage training
% algorithm to set the weights in the RBF model structure NET. Each row
% of X corresponds to one input vector and each row of T contains the
% corresponding target vector. The centres are determined by fitting a
% Gaussian mixture model with circular covariances using the EM
% algorithm through a call to RBFSETBF. (The mixture model is
% initialised using a small number of iterations of the K-means
% algorithm.) If the activation functions are Gaussians, then the basis
% function widths are set to the maximum inter-centre squared distance.
%
% For linear outputs, the hidden to output weights that give rise to
% the least squares solution are determined using the pseudo-inverse.
% For Neuroscale outputs, the hidden to output weights are determined
% using the iterative shadow targets algorithm. Although this two stage
% procedure may not give solutions with as low an error as general
% purpose non-linear optimisers, it is much faster.
%
% [NET, OPTIONS, ERRLOG] = RBFTRAIN(NET, OPTIONS, X, T) also returns a
% log of the error values after each cycle of the shadow targets
% algorithm.
%
% The options vector may have two rows: if this is the case, then the
% second row is passed to RBFSETBF, which allows the user to specify a
% different number of iterations for RBF and GMM training. The optional
% parameters to RBFTRAIN have the following interpretations.
%
% OPTIONS(1) is set to 1 to display error values during EM training.
%
% OPTIONS(2) is a measure of the precision required for the value of
% the weights W at the solution.
%
% OPTIONS(3) is a measure of the precision required of the objective
% function at the solution. Both this and the previous condition must
% be satisfied for termination.
%
% OPTIONS(5) is set to 1 if the basis function parameters should
% remain unchanged; default 0.
%
% OPTIONS(6) is set to 1 if the output layer weights should be set
% using PCA. This is only relevant for Neuroscale outputs; default 0.
%
% OPTIONS(14) is the maximum number of iterations for the shadow
% targets algorithm; default 100.
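%
% Example
% A minimal usage sketch for regression training (the data matrices x
% and t and the network sizes are illustrative, not part of the
% library):
%
%    net = rbf(2, 10, 1, 'gaussian'); % 2 inputs, 10 centres, 1 output
%    options = zeros(1, 18);
%    options(1) = 1;                  % display error values
%    options(14) = 20;                % iterations for basis fitting
%    net = rbftrain(net, options, x, t);
%    pred = rbffwd(net, x);           % outputs of the trained network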
%
% See also
% RBF, RBFERR, RBFFWD, RBFGRAD, RBFPAK, RBFUNPAK, RBFSETBF
%

% Copyright (c) Ian T Nabney (1996-2001)

% Check arguments for consistency
switch net.outfn
case 'linear'
   errstring = consist(net, 'rbf', x, t);
case 'neuroscale'
   errstring = consist(net, 'rbf', x);
otherwise
   error(['Unknown output function ', net.outfn]);
end
if ~isempty(errstring)
   error(errstring);
end

% Error log: filled in by the shadow targets algorithm if a third
% output argument is requested
errlog = [];

% Allow options to have two rows: if this is the case, then the second row
% is passed to rbfsetbf
if size(options, 1) == 2
   setbfoptions = options(2, :);
   options = options(1, :);
else
   setbfoptions = options;
end

% Default maximum number of iterations for the shadow targets algorithm
if (~options(14))
   options(14) = 100;
end

% Do we need to test for termination?
test = (options(2) | options(3));

% Set up the basis function parameters to model the input data density,
% unless options(5) is set.
if ~(logical(options(5)))
   net = rbfsetbf(net, setbfoptions, x);
end

% Compute the design (or activations) matrix
[y, act] = rbffwd(net, x);
ndata = size(x, 1);

if strcmp(net.outfn, 'neuroscale') & options(6)
   % Initialise output layer weights by projecting data with PCA
   mu = mean(x);
   [pcvals, pcvecs] = pca(x, net.nout);
   xproj = (x - ones(ndata, 1)*mu)*pcvecs;
   % Now use projected data as targets to compute output layer weights
   temp = pinv([act ones(ndata, 1)]) * xproj;
   net.w2 = temp(1:net.nhidden, :);
   net.b2 = temp(net.nhidden+1, :);
   % Propagate again to compute revised outputs
   [y, act] = rbffwd(net, x);
end

switch net.outfn
case 'linear'
   % Sum of squares error function in regression model
   % Solve for the weights and biases from the activations
   Phi = [act ones(ndata, 1)];
   if ~isfield(net, 'alpha')
      % Solve for the weights and biases using the pseudo-inverse
      temp = pinv(Phi)*t;
   elseif all(size(net.alpha) == [1 1])
      % Use the normal equations with a weight decay regulariser
      hessian = Phi'*Phi + net.alpha*eye(net.nhidden+1);
      temp = pinv(hessian)*(Phi'*t);
   else
      error('Only scalar alpha allowed');
   end
   net.w2 = temp(1:net.nhidden, :);
   net.b2 = temp(net.nhidden+1, :);

case 'neuroscale'
   % Use the shadow targets training algorithm
   if nargin < 4
      % If optional input distances not passed in, then use
      % Euclidean distance
      x_dist = sqrt(dist2(x, x));
   else
      x_dist = t;
   end
   Phi = [act, ones(ndata, 1)];
   % Compute the pseudo-inverse of Phi
   PhiDag = pinv(Phi);
   % Compute y_dist, distances between image points
   y_dist = sqrt(dist2(y, y));

   % Save old weights so that we can check the termination criterion
   wold = netpak(net);
   % Compute initial error (stress) value
   errold = 0.5*(sum(sum((x_dist - y_dist).^2)));

   % Initial value for eta and its adaptation factors
   eta = 0.1;
   k_up = 1.2;
   k_down = 0.1;
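   % The loop below minimises the stress
   %    E = 0.5 * sum_ij (dstar_ij - d_ij)^2,
   % where dstar_ij are the fixed input-space distances (x_dist) and
   % d_ij the distances between the mapped points (y_dist). The step
   % size eta is adapted multiplicatively: scaled up by k_up after each
   % successful step and down by k_down whenever the stress increases.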
   success = 1;  % Force initial gradient calculation

   for j = 1:options(14)
      if success
         % Compute the negative error gradient with respect to network outputs
         D = (x_dist - y_dist)./(y_dist+(y_dist==0));
         temp = y';
         neg_gradient = -2.*sum(kron(D, ones(1, net.nout)) .* ...
            (repmat(y, 1, ndata) - repmat((temp(:))', ndata, 1)), 1);
         neg_gradient = (reshape(neg_gradient, net.nout, ndata))';
      end
      % Compute the shadow targets
      t = y + eta*neg_gradient;
      % Solve for the weights and biases
      temp = PhiDag * t;
      net.w2 = temp(1:net.nhidden, :);
      net.b2 = temp(net.nhidden+1, :);

      % Do housekeeping and test for convergence
      ynew = rbffwd(net, x);
      y_distnew = sqrt(dist2(ynew, ynew));
      err = 0.5.*(sum(sum((x_dist-y_distnew).^2)));
      if err > errold
         success = 0;
         % Restore previous weights
         net = netunpak(net, wold);
         err = errold;
         eta = eta * k_down;
      else
         success = 1;
         eta = eta * k_up;
         y = ynew;
         y_dist = y_distnew;
         % Test for termination before updating the stored weights and
         % error, so that the comparison uses the changes made by this step
         if test & j > 1
            w = netpak(net);
            if (max(abs(w - wold)) < options(2) & ...
                  abs(err - errold) < options(3))
               options(8) = err;
               return;
            end
         end
         errold = err;
         wold = netpak(net);
      end
      if options(1)
         fprintf(1, 'Cycle %4d Error %11.6f\n', j, err)
      end
      if nargout >= 3
         errlog(j) = err;
      end
   end
   options(8) = errold;
   if (options(1) >= 0)
      disp('Warning: Maximum number of iterations has been exceeded');
   end
otherwise
   error(['Unknown output function ', net.outfn]);

end
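% A minimal usage sketch for the Neuroscale branch (the data matrix x
% and the network sizes are illustrative, not part of the library):
%
%    net = rbf(4, 10, 2, 'tps', 'neuroscale');
%    options = zeros(1, 18);
%    options(1) = 1;                % display stress values
%    options(14) = 60;              % shadow targets iterations
%    net = rbftrain(net, options, x);
%    y = rbffwd(net, x);            % 2-dimensional projection of x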