comparison toolboxes/FullBNT-1.0.7/docs/usage_dbn.html @ 0:e9a9cd732c1e tip

first hg version after svn
author wolffd
date Tue, 10 Feb 2015 15:05:51 +0000
1 <HEAD>
2 <TITLE>How to use BNT for DBNs</TITLE>
3 </HEAD>
4
5 <BODY BGCOLOR="#FFFFFF">
6 <!-- white background is better for the pictures and equations -->
7
8 Documentation last updated on 7 June 2004
9
10 <h1>How to use BNT for DBNs</h1>
11
12 <p>
13 <ul>
14 <li> <a href="#spec">Model specification</a>
15 <ul>
16 <li> <a href="#hmm">HMMs</a>
17 <li> <a href="#lds">Kalman filters</a>
18 <li> <a href="#chmm">Coupled HMMs</a>
19 <li> <a href="#water">Water network</a>
20 <li> <a href="#bat">BAT network</a>
21 </ul>
22
23 <li> <a href="#inf">Inference</a>
24 <ul>
25 <li> <a href="#discrete">Discrete hidden nodes</a>
26 <li> <a href="#cts">Continuous hidden nodes</a>
27 </ul>
28
29 <li> <a href="#learn">Learning</a>
30 <ul>
31 <li> <a href="#param_learn">Parameter learning</a>
32 <li> <a href="#struct_learn">Structure learning</a>
33 </ul>
34
35 </ul>
36
37 Note:
we recommend that you first read an introduction
to DBNs, such as
40 <a href="http://www.ai.mit.edu/~murphyk/Papers/dbnchapter.pdf">
41 this book chapter</a>.
42 <br>
43 You may also want to consider using
<a href="http://ssli.ee.washington.edu/~bilmes/gmtk/">GMTK</a>, which is
45 an excellent C++ package for DBNs.
46
47
48 <h1><a name="spec">Model specification</h1>
49
50
51 <!--<h1><a name="dbn_intro">Dynamic Bayesian Networks (DBNs)</h1>-->
52
53 Dynamic Bayesian Networks (DBNs) are directed graphical models of stochastic
54 processes.
55 They generalise <a href="#hmm">hidden Markov models (HMMs)</a>
56 and <a href="#lds">linear dynamical systems (LDSs)</a>
57 by representing the hidden (and observed) state in terms of state
58 variables, which can have complex interdependencies.
59 The graphical structure provides an easy way to specify these
60 conditional independencies, and hence to provide a compact
61 parameterization of the model.
62 <p>
63 Note that "temporal Bayesian network" would be a better name than
64 "dynamic Bayesian network", since
65 it is assumed that the model structure does not change, but
66 the term DBN has become entrenched.
67 We also normally assume that the parameters do not
68 change, i.e., the model is time-invariant.
69 However, we can always add extra
70 hidden nodes to represent the current "regime", thereby creating
71 mixtures of models to capture periodic non-stationarities.
72 <p>
73 There are some cases where the size of the state space can change over
74 time, e.g., tracking a variable, but unknown, number of objects.
75 In this case, we need to change the model structure over time.
76 BNT does not support this.
77 <!--
78 , but see the following paper for a
79 discussion of some of the issues:
80 <ul>
81 <li> <a href="ftp://ftp.cs.monash.edu.au/pub/annn/smc.ps">
82 Dynamic belief networks for discrete monitoring</a>,
83 A. E. Nicholson and J. M. Brady.
84 IEEE Systems, Man and Cybernetics, 24(11):1593-1610, 1994.
85 </ul>
86 -->
87
88
89 <h2><a name="hmm">Hidden Markov Models (HMMs)</h2>
90
91 The simplest kind of DBN is a Hidden Markov Model (HMM), which has
92 one discrete hidden node and one discrete or continuous
93 observed node per slice. We illustrate this below.
94 As before, circles denote continuous nodes, squares denote
95 discrete nodes, clear means hidden, shaded means observed.
96 <!--
97 (The observed nodes can be
98 discrete or continuous; the crucial thing about an HMM is that the
99 hidden nodes are discrete, so the system can model arbitrary dynamics
100 -- providing, of course, that the hidden state space is large enough.)
101 -->
102 <p>
103 <img src="Figures/hmm3.gif">
104 <p>
105 We have "unrolled" the model for three "time slices" -- the structure and parameters are
106 assumed to repeat as the model is unrolled further.
107 Hence to specify a DBN, we need to
108 define the intra-slice topology (within a slice),
109 the inter-slice topology (between two slices),
110 as well as the parameters for the first two slices.
111 (Such a two-slice temporal Bayes net is often called a 2TBN.)
112 <p>
113 We can specify the topology as follows.
114 <PRE>
115 intra = zeros(2);
116 intra(1,2) = 1; % node 1 in slice t connects to node 2 in slice t
117
118 inter = zeros(2);
119 inter(1,1) = 1; % node 1 in slice t-1 connects to node 1 in slice t
120 </pre>
121 We can specify the parameters as follows,
122 where for simplicity we assume the observed node is discrete.
123 <pre>
124 Q = 2; % num hidden states
125 O = 2; % num observable symbols
126
127 ns = [Q O];
128 dnodes = 1:2;
129 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
130 for i=1:4
131 bnet.CPD{i} = tabular_CPD(bnet, i);
132 end
133 </pre>
134 <p>
135 We assume the distributions P(X(t) | X(t-1)) and
136 P(Y(t) | X(t)) are independent of t for t > 1.
137 Hence the CPD for nodes 5, 7, ... is the same as for node 3, so we say they
138 are in the same equivalence class, with node 3 being the "representative"
139 for this class. In other words, we have tied the parameters for nodes
140 3, 5, 7, ...
141 Similarly, nodes 4, 6, 8, ... are tied.
142 Note, however, that (the parameters for) nodes 1 and 2 are not tied to
143 subsequent slices.
144 <p>
145 Above we assumed the observation model P(Y(t) | X(t)) is independent of t for t>1, but
146 it is conventional to assume this is true for all t.
147 So we would like to put nodes 2, 4, 6, ... all in the same class.
We can do this by explicitly defining the equivalence classes, as
149 follows (see <a href="usage.html#tying">here</a> for more details on
150 parameter tying).
151 <p>
152 We define eclass1(i) to be the equivalence class that node i in slice
153 1 belongs to.
154 Similarly, we define eclass2(i) to be the equivalence class that node i in slice
155 2, 3, ..., belongs to.
156 For an HMM, we have
157 <pre>
158 eclass1 = [1 2];
159 eclass2 = [3 2];
160 eclass = [eclass1 eclass2];
161 </pre>
162 This ties the observation model across slices,
163 since e.g., eclass(4) = eclass(2) = 2.
164 <p>
165 By default,
166 eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the
167 number of nodes per slice.
168 <!--This will tie nodes in slices 3, 4, ... to the the nodes in slice 2,
169 but none of the nodes in slice 2 to any in slice 1.-->
But by using the above tying pattern,
171 we now only have 3 CPDs to specify, instead of 4:
172 <pre>
173 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
174 prior0 = normalise(rand(Q,1));
175 transmat0 = mk_stochastic(rand(Q,Q));
176 obsmat0 = mk_stochastic(rand(Q,O));
177 bnet.CPD{1} = tabular_CPD(bnet, 1, prior0);
178 bnet.CPD{2} = tabular_CPD(bnet, 2, obsmat0);
179 bnet.CPD{3} = tabular_CPD(bnet, 3, transmat0);
180 </pre>
181 We discuss how to do <a href="#inf">inference</a> and <a href="#learn">learning</a> on this model
182 below.
183 (See also
184 my <a href="../HMM/hmm.html">HMM toolbox</a>, which is included with BNT.)
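<p>
Once the model is specified, we can draw a random sample of length T from it
using <tt>sample_dbn</tt> (the same function is used to generate synthetic data in the
inference and learning sections below); T is simply the number of slices we choose:
<pre>
T = 10;                    % number of slices to unroll
ev = sample_dbn(bnet, T);  % ss x T cell array, where ss = num nodes per slice
</pre>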
185
186 <p>
187 Some common variants on HMMs are shown below.
188 BNT can handle all of these.
189 <p>
190 <center>
191 <table>
192 <tr>
193 <td><img src="Figures/hmm_gauss.gif">
194 <td><img src="Figures/hmm_mixgauss.gif"
195 <td><img src="Figures/hmm_ar.gif">
196 <tr>
197 <td><img src="Figures/hmm_factorial.gif">
198 <td><img src="Figures/hmm_coupled.gif"
199 <td><img src="Figures/hmm_io.gif">
200 <tr>
201 </table>
202 </center>
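<p>
For example, here is a sketch of how one of these variants, an HMM with a single
Gaussian output per state, might be specified. (This is an illustrative sketch rather
than code taken from the BNT examples; the node sizes and the use of the optional
'observed' argument are assumptions.)
<pre>
Q = 2;  % num hidden states
O = 2;  % dimension of the Gaussian observation (assumed)
intra = zeros(2); intra(1,2) = 1;
inter = zeros(2); inter(1,1) = 1;
ns = [Q O];
dnodes = 1;   % only the hidden node is discrete
onodes = 2;   % the observed node is continuous
eclass1 = [1 2]; eclass2 = [3 2];
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'observed', onodes, ...
              'eclass1', eclass1, 'eclass2', eclass2);
bnet.CPD{1} = tabular_CPD(bnet, 1);    % prior P(X(1))
bnet.CPD{2} = gaussian_CPD(bnet, 2);   % P(Y(t)|X(t)=q) = N(mu_q, Sigma_q)
bnet.CPD{3} = tabular_CPD(bnet, 3);    % transition P(X(t)|X(t-1))
</pre>
The other variants are built in the same way, by changing the intra- and inter-slice
topology and the CPD types.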
203
204
205
206 <h2><a name="lds">Linear Dynamical Systems (LDSs) and Kalman filters</h2>
207
208 A Linear Dynamical System (LDS) has the same topology as an HMM, but
209 all the nodes are assumed to have linear-Gaussian distributions, i.e.,
210 <pre>
211 x(t+1) = A*x(t) + w(t), w ~ N(0, Q), x(0) ~ N(init_x, init_V)
212 y(t) = C*x(t) + v(t), v ~ N(0, R)
213 </pre>
214 Some simple variants are shown below.
215 <p>
216 <center>
217 <table>
218 <tr>
219 <td><img src="Figures/ar1.gif">
220 <td><img src="Figures/sar.gif">
221 <td><img src="Figures/kf.gif">
222 <td><img src="Figures/skf.gif">
223 </table>
224 </center>
225 <p>
226
227 We can create a regular LDS in BNT as follows.
228 <pre>
229
230 intra = zeros(2);
231 intra(1,2) = 1;
232 inter = zeros(2);
233 inter(1,1) = 1;
234 n = 2;
235
236 X = 2; % size of hidden state
237 Y = 2; % size of observable state
238
239 ns = [X Y];
240 dnodes = [];
241 onodes = [2];
242 eclass1 = [1 2];
243 eclass2 = [3 2];
244 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
245
246 x0 = rand(X,1);
V0 = eye(X); % must be positive definite!
248 C0 = rand(Y,X);
249 R0 = eye(Y);
250 A0 = rand(X,X);
251 Q0 = eye(X);
252
253 bnet.CPD{1} = gaussian_CPD(bnet, 1, 'mean', x0, 'cov', V0, 'cov_prior_weight', 0);
254 bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(Y,1), 'cov', R0, 'weights', C0, ...
255 'clamp_mean', 1, 'cov_prior_weight', 0);
256 bnet.CPD{3} = gaussian_CPD(bnet, 3, 'mean', zeros(X,1), 'cov', Q0, 'weights', A0, ...
257 'clamp_mean', 1, 'cov_prior_weight', 0);
258 </pre>
259 We discuss how to do <a href="#inf">inference</a> and <a href="#learn">learning</a> on this model
260 below.
261 (See also
262 my <a href="../Kalman/kalman.html">Kalman filter toolbox</a>, which is included with BNT.)
263 <p>
264
265
266 <h2><a name="chmm">Coupled HMMs</h2>
267
268 Here is an example of a coupled HMM with N=5 chains, unrolled for T=3
269 slices. Each hidden discrete node has a private observed Gaussian
270 child.
271 <p>
272 <img src="Figures/chmm5.gif">
273 <p>
274 We can make this using the function
275 <pre>
N = 5;            % num chains (as in the figure above)
Q = 2;            % binary hidden nodes
discrete_obs = 0; % cts observed nodes
Y = 1;            % scalar observed nodes
bnet = mk_chmm(N, Q, Y, discrete_obs);
280 </pre>
281
282 <!--We will use this model <a href="#pred">below</a> to illustrate online prediction.-->
283
284
285
286 <h2><a name="water">Water network</h2>
287
288 Consider the following model
289 of a water purification plant, developed
290 by Finn V. Jensen, Uffe Kjærulff, Kristian G. Olesen, and Jan
291 Pedersen.
292 <!--
293 The clear nodes represent the hidden state of the system in
294 factored form, and the shaded nodes represent the observations in
295 factored form.
296 -->
297 <!--
298 (Click <a
299 href="http://www-nt.cs.berkeley.edu/home/nir/public_html/Repository/water.htm">here</a>
300 for more details on this model.
301 Following Boyen and Koller, we have added discrete evidence nodes.)
302 -->
303 <!--
304 We have "unrolled" the model for three "time slices" -- the structure and parameters are
305 assumed to repeat as the model is unrolled further.
306 Hence to specify a DBN, we need to
307 define the intra-slice topology (within a slice),
308 the inter-slice topology (between two slices),
309 as well as the parameters for the first two slices.
310 (Such a two-slice temporal Bayes net is often called a 2TBN.)
311 -->
312 <p>
313 <center>
314 <IMG SRC="Figures/water3_75.gif">
315 </center>
316 We now show how to specify this model in BNT.
317 <pre>
318 ss = 12; % slice size
319 intra = zeros(ss);
320 intra(1,9) = 1;
321 intra(3,10) = 1;
322 intra(4,11) = 1;
323 intra(8,12) = 1;
324
325 inter = zeros(ss);
326 inter(1, [1 3]) = 1; % node 1 in slice 1 connects to nodes 1 and 3 in slice 2
327 inter(2, [2 3 7]) = 1;
328 inter(3, [3 4 5]) = 1;
329 inter(4, [3 4 6]) = 1;
330 inter(5, [3 5 6]) = 1;
331 inter(6, [4 5 6]) = 1;
332 inter(7, [7 8]) = 1;
333 inter(8, [6 7 8]) = 1;
334
335 onodes = 9:12; % observed
336 dnodes = 1:ss; % discrete
337 ns = 2*ones(1,ss); % binary nodes
338 eclass1 = 1:12;
339 eclass2 = [13:20 9:12];
340 eclass = [eclass1 eclass2];
341 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
342 for e=1:max(eclass)
343 bnet.CPD{e} = tabular_CPD(bnet, e);
344 end
345 </pre>
346 We have tied the observation parameters across all slices.
347 Click <a href="param_tieing.html">here</a> for a more complex example
of parameter tying.
349
350 <!--
351 Let X(i,t) denote the i'th hidden node in slice t,
352 and Y(i,y) denote the i'th observed node in slice t.
353 We also use the notation Nj to refer to the j'th node in the
354 unrolled network, e.g., N25 = X(1,3), N33 = Y(1,3).
355 <p>
356 We assume the distributions P(X(i,t) | X(i,t-1)) and
357 P(Y(i,t) | X(i,t)) are independent of t for t > 1 and for all i.
358 Hence the CPD for N25, N37, ... is the same as for N13, so we say they
359 are in the same equivalence class, with N13 being the "representative"
360 for this class. In other words, we have tied the parameters for nodes
361 N13, N25, N37, ...
362 Note, however, that the parameters for the nodes in the first slice
363 are not tied, so each equivalence class for nodes 1..12 contains a
364 single node.
365 <p>
366 Above we assumed P(Y(i,t) | X(i,t)) is independent of t for t>1, but
367 it is conventional to assume this is true for all t.
368 So we would like to put N9, N21, N33, ... all in the same class, and
369 similarly for the other observed nodes.
370 We can do this by explicitely defining the equivalence classes, as
371 follows.
372 <p>
373 We define eclass1(i) to be the equivalence class that node i in slice
374 1 belongs to.
375 Similarly, we define eclass2(i) to be the equivalence class that node i in slice
376 2, 3, ..., belongs to.
377 For the water model, we have
378 <pre>
379 </pre>
380 This ties the observation model across slices,
381 since e.g., eclass(9) = eclass(21) = 9, so Y(1,1) and Y(1,2) belong to the
382 same class.
383 <p>
384 By default,
385 eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the
386 number of nodes per slice.
387 This will tie nodes in slices 3, 4, ... to the the nodes in slice 2,
388 but none of the nodes in slice 2 to any in slice 1.
389 By using the above tieing pattern,
390 we now only have 20 CPDs to specify, instead of 24:
391 <pre>
392 bnet = mk_dbn(intra, inter, ns, dnodes, eclass1, eclass2);
393 for e=1:max(eclass)
394 bnet.CPD{e} = tabular_CPD(bnet, e);
395 end
396 </pre>
397 -->
398
399
400
401 <h2><a name="bat">BATnet</h2>
402
403 As an example of a more complicated DBN, consider the following
404 example,
405 which is a model of a car's high level state, as might be used by
406 an automated car.
407 (The model is from Forbes, Huang, Kanazawa and Russell, "The BATmobile: Towards a
408 Bayesian Automated Taxi", IJCAI 95. The figure is from
409 Boyen and Koller, "Tractable Inference for Complex Stochastic
410 Processes", UAI98.
411 For simplicity, we only show the observed nodes for slice 2.)
412 <p>
413 <center>
414 <IMG SRC="Figures/batnet.gif">
415 </center>
416 <p>
417 Since this topology is so complicated,
418 it is useful to be able to refer to the nodes by name, instead of
419 number.
420 <pre>
421 names = {'LeftClr', 'RightClr', 'LatAct', ... 'Bclr', 'BYdotDiff'};
422 ss = length(names);
423 </pre>
424 We can specify the intra-slice topology using a cell array as follows,
425 where each row specifies a connection between two named nodes:
426 <pre>
427 intrac = {...
428 'LeftClr', 'LeftClrSens';
429 'RightClr', 'RightClrSens';
430 ...
431 'BYdotDiff', 'BcloseFast'};
432 </pre>
433 Finally, we can convert this cell array to an adjacency matrix using
434 the following function:
435 <pre>
436 [intra, names] = mk_adj_mat(intrac, names, 1);
437 </pre>
438 This function also permutes the names so that they are in topological
439 order.
440 Given this ordering of the names, we can make the inter-slice
441 connectivity matrix as follows:
442 <pre>
443 interc = {...
444 'LeftClr', 'LeftClr';
445 'LeftClr', 'LatAct';
446 ...
447 'FBStatus', 'LatAct'};
448
449 inter = mk_adj_mat(interc, names, 0);
450 </pre>
451
452 To refer to a node, we must know its number, which can be computed as
453 in the following example:
454 <pre>
455 obs = {'LeftClrSens', 'RightClrSens', 'TurnSignalSens', 'XdotSens', 'YdotSens', 'FYdotDiffSens', ...
456 'FclrSens', 'BXdotSens', 'BclrSens', 'BYdotDiffSens'};
457 for i=1:length(obs)
458 onodes(i) = strmatch(obs{i}, names);
459 end
460 onodes = sort(onodes);
461 </pre>
462 (We sort the onodes since most BNT routines assume that set-valued
463 arguments are in sorted order.)
464 We can now make the DBN:
465 <pre>
466 dnodes = 1:ss;
467 ns = 2*ones(1,ss); % binary nodes
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
469 </pre>
470 To specify the parameters, we must know the order of the parents.
471 See the function BNT/general/mk_named_CPT for a way to do this in the
472 case of tabular nodes. For simplicity, we just generate random
473 parameters:
474 <pre>
475 for i=1:2*ss
476 bnet.CPD{i} = tabular_CPD(bnet, i);
477 end
478 </pre>
479 A complete version of this example is available in BNT/examples/dynamic/bat1.m.
480
481
482
483
484 <h1><a name="inf">Inference</h1>
485
486
487 The general inference problem for DBNs is to compute
488 P(X(i,t0) | Y(:, t1:t2)), where X(i,t) represents the i'th hidden
489 variable at time t and Y(:,t1:t2) represents all the evidence
490 between times t1 and t2.
491 There are several special cases of interest, illustrated below.
492 The arrow indicates t0: it is X(t0) that we are trying to estimate.
493 The shaded region denotes t1:t2, the available data.
494 <p>
495
496 <img src="Figures/filter.gif">
497
498 <p>
499 BNT can currently only handle offline smoothing.
500 (The HMM engine handles filtering and, to a limited extent, prediction.)
501 The usage is similar to static
502 inference engines, except now the evidence is a 2D cell array of
size ss*T, where ss is the number of nodes per slice (the slice size) and T is the
504 number of slices.
505 Also, 'marginal_nodes' takes two arguments, the nodes and the time-slice.
506 For example, to compute P(X(i,t) | y(:,1:T)), we proceed as follows
(where onodes are the indices of the observed nodes in each slice,
508 which correspond to y):
509 <pre>
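% Here 'engine' is any DBN inference engine constructed from bnet (see the
% two subsections below), and T is the number of slices.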
510 ev = sample_dbn(bnet, T);
511 evidence = cell(ss,T);
512 evidence(onodes,:) = ev(onodes, :); % all cells besides onodes are empty
513 [engine, ll] = enter_evidence(engine, evidence);
514 marg = marginal_nodes(engine, i, t);
515 </pre>
516
517
518 <h2><a name="discrete">Discrete hidden nodes</h2>
519
520 If all the hidden nodes are discrete,
521 we can use the junction tree algorithm to perform inference.
522 The simplest approach,
523 <tt>jtree_unrolled_dbn_inf_engine</tt>,
524 unrolls the DBN into a static network and applies jtree; however, for
525 long sequences, this
526 can be very slow and can result in numerical underflow.
527 A better approach is to apply the jtree algorithm to pairs of
528 neighboring slices at a time; this is implemented in
529 <tt>jtree_dbn_inf_engine</tt>.
530
531 <p>
532 A DBN can be converted to an HMM if all the hidden nodes are discrete.
533 In this case, you can use
534 <tt>hmm_inf_engine</tt>. This is faster than jtree for small models
535 because the constant factors of the algorithm are lower, but can be
536 exponentially slower for models with many variables
537 (e.g., > 6 binary hidden nodes).
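<p>
Both engines are constructed directly from the bnet, e.g. (a minimal sketch):
<pre>
engine = jtree_dbn_inf_engine(bnet);
% or, if all the hidden nodes are discrete and the state space is small enough:
engine = hmm_inf_engine(bnet);
</pre>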
538
539 <p>
540 The use of both
541 <tt>jtree_dbn_inf_engine</tt>
542 and
543 <tt>hmm_inf_engine</tt>
544 is deprecated.
545 A better approach is to construct a smoother engine out of lower-level
546 engines, which implement forward/backward operators.
547 You can create these engines as follows.
548 <pre>
549 engine = smoother_engine(hmm_2TBN_inf_engine(bnet));
% or
551 engine = smoother_engine(jtree_2TBN_inf_engine(bnet));
552 </pre>
553 You then call them in the usual way:
554 <pre>
555 engine = enter_evidence(engine, evidence);
556 m = marginal_nodes(engine, nodes, t);
557 </pre>
558 Note: you must declare the observed nodes in the bnet before using
559 hmm_2TBN_inf_engine.
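For example, this can be done when the bnet is created, by passing mk_dbn the
optional 'observed' argument (a sketch, reusing the HMM definitions from above):
<pre>
onodes = 2; % the observed node in each slice (for the HMM above)
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'observed', onodes, ...
              'eclass1', eclass1, 'eclass2', eclass2);
</pre>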
560
561
562 <p>
Unfortunately, when all the hidden nodes are discrete,
564 exact inference takes O(2^n) time, where n is the number of hidden
565 nodes per slice,
566 even if the model is sparse.
567 The basic reason for this is that two nodes become correlated, even if
568 there is no direct connection between them in the 2TBN,
569 by virtue of sharing common ancestors in the past.
570 Hence we need to use approximations.
571 <p>
572 A popular approximate inference algorithm for discrete DBNs, known as BK, is described in
573 <ul>
574 <li>
575 <A HREF="http://robotics.Stanford.EDU/~xb/uai98/index.html">
576 Tractable inference for complex stochastic processes </A>,
577 Boyen and Koller, UAI 1998
578 <li>
579 <A HREF="http://robotics.Stanford.EDU/~xb/nips98/index.html">
580 Approximate learning of dynamic models</a>, Boyen and Koller, NIPS
581 1998.
582 </ul>
583 This approximates the belief state with a product of
584 marginals on a specified set of clusters. For example,
585 in the water network, we might use the following clusters:
586 <pre>
587 engine = bk_inf_engine(bnet, { [1 2], [3 4 5 6], [7 8] });
588 </pre>
589 This engine can now be used just like the jtree engine.
590 Two special cases of the BK algorithm are supported: 'ff' (fully
591 factored) means each node has its own cluster, and 'exact' means there
592 is 1 cluster that contains the whole slice. These can be created as
593 follows:
594 <pre>
595 engine = bk_inf_engine(bnet, 'ff');
596 engine = bk_inf_engine(bnet, 'exact');
597 </pre>
598 For pedagogical purposes, an implementation of BK-FF that uses an HMM
instead of a junction tree is available in
600 <tt>bk_ff_hmm_inf_engine</tt>.
601
602
603
604 <h2><a name="cts">Continuous hidden nodes</h2>
605
606 If all the hidden nodes are linear-Gaussian, <em>and</em> the observed nodes are
607 linear-Gaussian,
608 the model is a <a href="http://www.cs.berkeley.edu/~murphyk/Bayes/kalman.html">
609 linear dynamical system</a> (LDS).
610 A DBN can be converted to an LDS if all the hidden nodes are linear-Gaussian
611 and if they are all persistent. In this case, you can use
612 <tt>kalman_inf_engine</tt>.
For more general linear-Gaussian models, you can use
614 <tt>jtree_dbn_inf_engine</tt> or <tt>jtree_unrolled_dbn_inf_engine</tt>.
615
616 <p>
617 For nonlinear systems with Gaussian noise, the unscented Kalman filter (UKF),
618 due to Julier and Uhlmann, is far superior to the well-known extended Kalman
619 filter (EKF), both in theory and practice.
620 <!--
621 See
622 <A HREF="http://phoebe.robots.ox.ac.uk/default.html">"A General Method for
623 Approximating Nonlinear Transformations of
624 Probability Distributions"</A>.
625 (If the above link is down,
626 try <a href="http://www.ece.ogi.edu/~ericwan/pubs.html">Eric Wan's</a>
627 page, who has done a lot of work on the UKF.)
628 <p>
629 -->
630 The key idea of the UKF is that it is easier to estimate a Gaussian distribution
631 from a set of points than to approximate an arbitrary non-linear
632 function.
633 We start with points that are plus/minus sigma away from the mean along
each dimension, pipe them through the nonlinearity, and
635 then fit a Gaussian to the transformed points.
636 (No need to compute Jacobians, unlike the EKF!)
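<p>
Here is a minimal sketch of this "unscented transform" idea (illustrative only: it is
not part of BNT, and it uses a simple symmetric sigma-point set with equal weights
rather than the exact weighting of Julier and Uhlmann):
<pre>
function [mu_y, Sigma_y] = unscented_transform(f, mu, Sigma)
% Propagate a Gaussian N(mu, Sigma) through a nonlinearity y = f(x).
d = length(mu);
S = chol(Sigma, 'lower');        % Sigma = S*S'
X = [repmat(mu,1,d) + sqrt(d)*S, repmat(mu,1,d) - sqrt(d)*S]; % 2d sigma points
Y = [];
for i=1:2*d
  Y(:,i) = f(X(:,i));            % pipe each point through the nonlinearity
end
mu_y = mean(Y, 2);               % fit a Gaussian to the transformed points
D = Y - repmat(mu_y, 1, 2*d);
Sigma_y = (D*D') / (2*d);
</pre>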
637
638 <p>
639 For systems with non-Gaussian noise, I recommend
640 <a href="http://www.cs.berkeley.edu/~jfgf/smc/">Particle
641 filtering</a> (PF), which is a popular sequential Monte Carlo technique.
642
643 <p>
644 The EKF can be used as a proposal distribution for a PF.
645 This method is better than either one alone.
646 See <a href="http://www.cs.berkeley.edu/~jfgf/upf.ps.gz">The Unscented Particle Filter</a>,
647 by R van der Merwe, A Doucet, JFG de Freitas and E Wan, May 2000.
648 <a href="http://www.cs.berkeley.edu/~jfgf/software.html">Matlab
649 software</a> for the UPF is also available.
650 <p>
651 Note: none of this software is part of BNT.
652
653
654
655 <h1><a name="learn">Learning</h1>
656
657 Learning in DBNs can be done online or offline.
658 Currently only offline learning is implemented in BNT.
659
660
661 <h2><a name="param_learn">Parameter learning</h2>
662
663 Offline parameter learning is very similar to learning in static networks,
664 except now the training data is a cell-array of 2D cell-arrays.
665 For example,
666 cases{l}{i,t} is the value of node i in slice t in sequence l, or []
667 if unobserved.
668 Each sequence can be a different length, and may have missing values
669 in arbitrary locations.
670 Here is a typical code fragment for using EM.
671 <pre>
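% Below, 'engine' is a DBN inference engine constructed from bnet (see the
% Inference section above), and T is the number of slices per sequence.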
672 ncases = 2;
673 cases = cell(1, ncases);
674 for i=1:ncases
675 ev = sample_dbn(bnet, T);
676 cases{i} = cell(ss,T);
677 cases{i}(onodes,:) = ev(onodes, :);
678 end
679 [bnet2, LLtrace] = learn_params_dbn_em(engine, cases, 'max_iter', 10);
680 </pre>
681 If the observed node is vector-valued and stored in an OxT array, you
682 need to assign each vector to a single cell, as in the following
683 example.
684 <pre>
685 data = [xpos(:)'; ypos(:)'];
686 ncases = 1;
687 cases = cell(1, ncases);
688 onodes = bnet.observed;
689 for i=1:ncases
690 cases{i} = cell(ss,T);
691 cases{i}(onodes,:) = num2cell(data(:,1:T), 1);
692 end
693 </pre>
694 <p>
695 For a complete code listing of how to do EM in a simple DBN, click
696 <a href="dbn_hmm_demo.m">here</a>.
697
698 <h2><a name="struct_learn">Structure learning</h2>
699
700 There is currently only one structure learning algorithm for DBNs.
701 This assumes all nodes are tabular and observed, and that there are
702 no intra-slice connections. Hence we can find the optimal set of
703 parents for each node separately, without worrying about directed
704 cycles or node orderings.
705 The function is called as follows
706 <pre>
707 inter = learn_struct_dbn_reveal(cases, ns, max_fan_in, penalty)
708 </pre>
709 A full example is given in BNT/examples/dynamic/reveal1.m.
710 Setting the penalty term to 0 gives the maximum likelihood model; this
711 is equivalent to maximizing the mutual information between parents and
712 child (in the bioinformatics community, this is known as the REVEAL
713 algorithm). A non-zero penalty invokes the BIC criterion, which
714 lessens the chance of overfitting.
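<p>
For example (a sketch; the argument values are illustrative, and cases must hold
fully observed training sequences in the format used by BNT/examples/dynamic/reveal1.m):
<pre>
max_fan_in = 3;  % allow at most 3 parents per node
penalty = 0.5;   % 0 gives maximum likelihood (REVEAL); >0 gives a BIC-style penalty
inter = learn_struct_dbn_reveal(cases, ns, max_fan_in, penalty);
</pre>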
715 <p>
716 <a href="http://www.bioss.sari.ac.uk/~dirk/software/DBmcmc/">
717 Dirk Husmeier has extended MCMC model selection to DBNs</a>.
718
719 </BODY>