matthiasm@8: function dag = learn_struct_K2(data, ns, order, varargin) matthiasm@8: % LEARN_STRUCT_K2 Greedily learn the best structure compatible with a fixed node ordering matthiasm@8: % best_dag = learn_struct_K2(data, node_sizes, order, ...) matthiasm@8: % matthiasm@8: % data(i,m) = value of node i in case m (can be a cell array). matthiasm@8: % node_sizes(i) is the size of node i. matthiasm@8: % order(i) is the i'th node in the topological ordering. matthiasm@8: % matthiasm@8: % The following optional arguments can be specified in the form of name/value pairs: matthiasm@8: % [default value in brackets] matthiasm@8: % matthiasm@8: % max_fan_in - this the largest number of parents we allow per node [N] matthiasm@8: % scoring_fn - 'bayesian' or 'bic' [ 'bayesian' ] matthiasm@8: % Currently, only networks with all tabular nodes support Bayesian scoring. matthiasm@8: % type - type{i} is the type of CPD to use for node i, where the type is a string matthiasm@8: % of the form 'tabular', 'noisy_or', 'gaussian', etc. [ all cells contain 'tabular' ] matthiasm@8: % params - params{i} contains optional arguments passed to the CPD constructor for node i, matthiasm@8: % or [] if none. [ all cells contain {'prior', 1}, meaning use uniform Dirichlet priors ] matthiasm@8: % discrete - the list of discrete nodes [ 1:N ] matthiasm@8: % clamped - clamped(i,m) = 1 if node i is clamped in case m [ zeros(N, ncases) ] matthiasm@8: % verbose - 'yes' means display output while running [ 'no' ] matthiasm@8: % matthiasm@8: % e.g., dag = learn_struct_K2(data, ns, order, 'scoring_fn', 'bic', 'params', []) matthiasm@8: % matthiasm@8: % To be backwards compatible with BNT2, you can also specify arguments as follows matthiasm@8: % dag = learn_struct_K2(data, node_sizes, order, max_fan_in) matthiasm@8: % matthiasm@8: % This algorithm is described in matthiasm@8: % - Cooper and Herskovits, "A Bayesian method for the induction of probabilistic matthiasm@8: % networks from data", Machine Learning Journal 9:308--347, 1992 matthiasm@8: matthiasm@8: [n ncases] = size(data); matthiasm@8: matthiasm@8: % set default params matthiasm@8: type = cell(1,n); matthiasm@8: params = cell(1,n); matthiasm@8: for i=1:n matthiasm@8: type{i} = 'tabular'; matthiasm@8: %params{i} = { 'prior', 1 }; matthiasm@8: params{i} = { 'prior_type', 'dirichlet', 'dirichlet_weight', 1 }; matthiasm@8: end matthiasm@8: scoring_fn = 'bayesian'; matthiasm@8: discrete = 1:n; matthiasm@8: clamped = zeros(n, ncases); matthiasm@8: matthiasm@8: max_fan_in = n; matthiasm@8: verbose = 0; matthiasm@8: matthiasm@8: args = varargin; matthiasm@8: nargs = length(args); matthiasm@8: if length(args) > 0 matthiasm@8: if isstr(args{1}) matthiasm@8: for i=1:2:nargs matthiasm@8: switch args{i}, matthiasm@8: case 'verbose', verbose = strcmp(args{i+1}, 'yes'); matthiasm@8: case 'max_fan_in', max_fan_in = args{i+1}; matthiasm@8: case 'scoring_fn', scoring_fn = args{i+1}; matthiasm@8: case 'type', type = args{i+1}; matthiasm@8: case 'discrete', discrete = args{i+1}; matthiasm@8: case 'clamped', clamped = args{i+1}; matthiasm@8: case 'params', if isempty(args{i+1}), params = cell(1,n); else params = args{i+1}; end matthiasm@8: end matthiasm@8: end matthiasm@8: else matthiasm@8: max_fan_in = args{1}; matthiasm@8: end matthiasm@8: end matthiasm@8: matthiasm@8: dag = zeros(n,n); matthiasm@8: matthiasm@8: for i=1:n matthiasm@8: ps = []; matthiasm@8: j = order(i); matthiasm@8: u = find(clamped(j,:)==0); matthiasm@8: score = score_family(j, ps, type{j}, scoring_fn, ns, discrete, data(:,u), params{j}); matthiasm@8: if verbose, fprintf('\nnode %d, empty score %6.4f\n', j, score); end matthiasm@8: done = 0; matthiasm@8: while ~done & (length(ps) <= max_fan_in) matthiasm@8: pps = mysetdiff(order(1:i-1), ps); % potential parents matthiasm@8: nps = length(pps); matthiasm@8: pscore = zeros(1, nps); matthiasm@8: for pi=1:nps matthiasm@8: p = pps(pi); matthiasm@8: pscore(pi) = score_family(j, [ps p], type{j}, scoring_fn, ns, discrete, data(:,u), params{j}); matthiasm@8: if verbose, fprintf('considering adding %d to %d, score %6.4f\n', p, j, pscore(pi)); end matthiasm@8: end matthiasm@8: [best_pscore, best_p] = max(pscore); matthiasm@8: best_p = pps(best_p); matthiasm@8: if best_pscore > score matthiasm@8: score = best_pscore; matthiasm@8: ps = [ps best_p]; matthiasm@8: if verbose, fprintf('* adding %d to %d, score %6.4f\n', best_p, j, best_pscore); end matthiasm@8: else matthiasm@8: done = 1; matthiasm@8: end matthiasm@8: end matthiasm@8: if ~isempty(ps) % need this check for matlab 5.2 matthiasm@8: dag(ps, j) = 1; matthiasm@8: end matthiasm@8: end matthiasm@8: matthiasm@8: matthiasm@8: matthiasm@8: