1 <HEAD>
2 <TITLE>How to use BNT for DBNs</TITLE>
3 </HEAD>
4
5 <BODY BGCOLOR="#FFFFFF">
6 <!-- white background is better for the pictures and equations -->
7
8 Documentation last updated on 13 November 2002
9
10 <h1>How to use BNT for DBNs</h1>
11
12 <p>
13 <ul>
14 <li> <a href="#spec">Model specification</a>
15 <ul>
16 <li> <a href="#hmm">HMMs</a>
17 <li> <a href="#lds">Kalman filters</a>
18 <li> <a href="#chmm">Coupled HMMs</a>
19 <li> <a href="#water">Water network</a>
20 <li> <a href="#bat">BAT network</a>
21 </ul>
22
23 <li> <a href="#inf">Inference</a>
24 <ul>
25 <li> <a href="#discrete">Discrete hidden nodes</a>
26 <li> <a href="#cts">Continuous hidden nodes</a>
27 </ul>
28
29 <li> <a href="#learn">Learning</a>
30 <ul>
31 <li> <a href="#param_learn">Parameter learning</a>
32 <li> <a href="#struct_learn">Structure learning</a>
33 </ul>
34
35 </ul>
36
Note:
we recommend that you first read an introduction
to DBNs, such as
<a href="http://www.ai.mit.edu/~murphyk/Papers/dbnchapter.pdf">
this book chapter</a>.
42
43
44 <h1><a name="spec">Model specification</h1>
45
46
47 <!--<h1><a name="dbn_intro">Dynamic Bayesian Networks (DBNs)</h1>-->
48
49 Dynamic Bayesian Networks (DBNs) are directed graphical models of stochastic
50 processes.
51 They generalise <a href="#hmm">hidden Markov models (HMMs)</a>
52 and <a href="#lds">linear dynamical systems (LDSs)</a>
53 by representing the hidden (and observed) state in terms of state
54 variables, which can have complex interdependencies.
55 The graphical structure provides an easy way to specify these
56 conditional independencies, and hence to provide a compact
57 parameterization of the model.
58 <p>
59 Note that "temporal Bayesian network" would be a better name than
60 "dynamic Bayesian network", since
61 it is assumed that the model structure does not change, but
62 the term DBN has become entrenched.
63 We also normally assume that the parameters do not
64 change, i.e., the model is time-invariant.
65 However, we can always add extra
66 hidden nodes to represent the current "regime", thereby creating
67 mixtures of models to capture periodic non-stationarities.
68 <p>
69 There are some cases where the size of the state space can change over
70 time, e.g., tracking a variable, but unknown, number of objects.
71 In this case, we need to change the model structure over time.
72 BNT does not support this.
73 <!--
74 , but see the following paper for a
75 discussion of some of the issues:
76 <ul>
77 <li> <a href="ftp://ftp.cs.monash.edu.au/pub/annn/smc.ps">
78 Dynamic belief networks for discrete monitoring</a>,
79 A. E. Nicholson and J. M. Brady.
80 IEEE Systems, Man and Cybernetics, 24(11):1593-1610, 1994.
81 </ul>
82 -->
83
84
85 <h2><a name="hmm">Hidden Markov Models (HMMs)</h2>
86
87 The simplest kind of DBN is a Hidden Markov Model (HMM), which has
88 one discrete hidden node and one discrete or continuous
89 observed node per slice. We illustrate this below.
90 As before, circles denote continuous nodes, squares denote
91 discrete nodes, clear means hidden, shaded means observed.
92 <!--
93 (The observed nodes can be
94 discrete or continuous; the crucial thing about an HMM is that the
95 hidden nodes are discrete, so the system can model arbitrary dynamics
96 -- providing, of course, that the hidden state space is large enough.)
97 -->
98 <p>
99 <img src="Figures/hmm3.gif">
100 <p>
101 We have "unrolled" the model for three "time slices" -- the structure and parameters are
102 assumed to repeat as the model is unrolled further.
103 Hence to specify a DBN, we need to
104 define the intra-slice topology (within a slice),
105 the inter-slice topology (between two slices),
106 as well as the parameters for the first two slices.
107 (Such a two-slice temporal Bayes net is often called a 2TBN.)
108 <p>
109 We can specify the topology as follows.
110 <PRE>
111 intra = zeros(2);
112 intra(1,2) = 1; % node 1 in slice t connects to node 2 in slice t
113
114 inter = zeros(2);
115 inter(1,1) = 1; % node 1 in slice t-1 connects to node 1 in slice t
116 </pre>
117 We can specify the parameters as follows,
118 where for simplicity we assume the observed node is discrete.
119 <pre>
120 Q = 2; % num hidden states
121 O = 2; % num observable symbols
122
123 ns = [Q O];
124 dnodes = 1:2;
125 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
126 for i=1:4
127 bnet.CPD{i} = tabular_CPD(bnet, i);
128 end
129 </pre>
130 <p>
131 We assume the distributions P(X(t) | X(t-1)) and
132 P(Y(t) | X(t)) are independent of t for t > 1.
133 Hence the CPD for nodes 5, 7, ... is the same as for node 3, so we say they
134 are in the same equivalence class, with node 3 being the "representative"
135 for this class. In other words, we have tied the parameters for nodes
136 3, 5, 7, ...
137 Similarly, nodes 4, 6, 8, ... are tied.
138 Note, however, that (the parameters for) nodes 1 and 2 are not tied to
139 subsequent slices.
140 <p>
141 Above we assumed the observation model P(Y(t) | X(t)) is independent of t for t>1, but
142 it is conventional to assume this is true for all t.
143 So we would like to put nodes 2, 4, 6, ... all in the same class.
We can do this by explicitly defining the equivalence classes, as
145 follows (see <a href="usage.html#tying">here</a> for more details on
146 parameter tying).
147 <p>
148 We define eclass1(i) to be the equivalence class that node i in slice
149 1 belongs to.
Similarly, we define eclass2(i) to be the equivalence class that node i in
slices 2, 3, ... belongs to.
152 For an HMM, we have
153 <pre>
154 eclass1 = [1 2];
155 eclass2 = [3 2];
156 eclass = [eclass1 eclass2];
157 </pre>
158 This ties the observation model across slices,
159 since e.g., eclass(4) = eclass(2) = 2.
160 <p>
161 By default,
162 eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the
163 number of nodes per slice.
164 <!--This will tie nodes in slices 3, 4, ... to the the nodes in slice 2,
165 but none of the nodes in slice 2 to any in slice 1.-->
But by using the above tying pattern,
167 we now only have 3 CPDs to specify, instead of 4:
168 <pre>
169 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
170 prior0 = normalise(rand(Q,1));
171 transmat0 = mk_stochastic(rand(Q,Q));
172 obsmat0 = mk_stochastic(rand(Q,O));
173 bnet.CPD{1} = tabular_CPD(bnet, 1, prior0);
174 bnet.CPD{2} = tabular_CPD(bnet, 2, obsmat0);
175 bnet.CPD{3} = tabular_CPD(bnet, 3, transmat0);
176 </pre>
177 We discuss how to do <a href="#inf">inference</a> and <a href="#learn">learning</a> on this model
178 below.
179 (See also
180 my <a href="../HMM/hmm.html">HMM toolbox</a>, which is included with BNT.)
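<p>
As a quick sanity check, we can draw samples from this HMM using
<tt>sample_dbn</tt> (described in the inference section below). This is just a
sketch; the sequence length T is an arbitrary choice.
<pre>
T = 10;                       % number of slices to unroll (arbitrary)
ev = sample_dbn(bnet, T);     % ss-by-T cell array of sampled node values
hidden_seq = cat(2, ev{1,:})  % sampled hidden states, 1-by-T
obs_seq = cat(2, ev{2,:})     % sampled observations, 1-by-T
</pre>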
181
182 <p>
183 Some common variants on HMMs are shown below.
184 BNT can handle all of these.
185 <p>
186 <center>
187 <table>
<tr>
<td><img src="Figures/hmm_gauss.gif">
<td><img src="Figures/hmm_mixgauss.gif">
<td><img src="Figures/hmm_ar.gif">
<tr>
<td><img src="Figures/hmm_factorial.gif">
<td><img src="Figures/hmm_coupled.gif">
<td><img src="Figures/hmm_io.gif">
</table>
198 </center>
199
200
201
202 <h2><a name="lds">Linear Dynamical Systems (LDSs) and Kalman filters</h2>
203
204 A Linear Dynamical System (LDS) has the same topology as an HMM, but
205 all the nodes are assumed to have linear-Gaussian distributions, i.e.,
206 <pre>
207 x(t+1) = A*x(t) + w(t), w ~ N(0, Q), x(0) ~ N(init_x, init_V)
208 y(t) = C*x(t) + v(t), v ~ N(0, R)
209 </pre>
210 Some simple variants are shown below.
211 <p>
212 <center>
213 <table>
214 <tr>
215 <td><img src="Figures/ar1.gif">
216 <td><img src="Figures/sar.gif">
217 <td><img src="Figures/kf.gif">
218 <td><img src="Figures/skf.gif">
219 </table>
220 </center>
221 <p>
222
223 We can create a regular LDS in BNT as follows.
224 <pre>
225
226 intra = zeros(2);
227 intra(1,2) = 1;
228 inter = zeros(2);
229 inter(1,1) = 1;
230 n = 2;
231
232 X = 2; % size of hidden state
233 Y = 2; % size of observable state
234
235 ns = [X Y];
236 dnodes = [];
237 onodes = [2];
238 eclass1 = [1 2];
239 eclass2 = [3 2];
240 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
241
242 x0 = rand(X,1);
243 V0 = eye(X); % must be positive semi definite!
244 C0 = rand(Y,X);
245 R0 = eye(Y);
246 A0 = rand(X,X);
247 Q0 = eye(X);
248
249 bnet.CPD{1} = gaussian_CPD(bnet, 1, 'mean', x0, 'cov', V0, 'cov_prior_weight', 0);
250 bnet.CPD{2} = gaussian_CPD(bnet, 2, 'mean', zeros(Y,1), 'cov', R0, 'weights', C0, ...
251 'clamp_mean', 1, 'cov_prior_weight', 0);
252 bnet.CPD{3} = gaussian_CPD(bnet, 3, 'mean', zeros(X,1), 'cov', Q0, 'weights', A0, ...
253 'clamp_mean', 1, 'cov_prior_weight', 0);
254 </pre>
255 We discuss how to do <a href="#inf">inference</a> and <a href="#learn">learning</a> on this model
256 below.
257 (See also
258 my <a href="../Kalman/kalman.html">Kalman filter toolbox</a>, which is included with BNT.)
259 <p>
260
261
262 <h2><a name="chmm">Coupled HMMs</h2>
263
264 Here is an example of a coupled HMM with N=5 chains, unrolled for T=3
265 slices. Each hidden discrete node has a private observed Gaussian
266 child.
267 <p>
268 <img src="Figures/chmm5.gif">
269 <p>
We can make this model using the function <tt>mk_chmm</tt>, as follows:
<pre>
N = 5; % number of chains
Q = 2; % binary hidden nodes
discrete_obs = 0; % observed nodes are continuous (Gaussian)
Y = 1; % scalar observed nodes
bnet = mk_chmm(N, Q, Y, discrete_obs);
</pre>
277
278 <!--We will use this model <a href="#pred">below</a> to illustrate online prediction.-->
279
280
281
282 <h2><a name="water">Water network</h2>
283
284 Consider the following model
285 of a water purification plant, developed
286 by Finn V. Jensen, Uffe Kjærulff, Kristian G. Olesen, and Jan
287 Pedersen.
288 <!--
289 The clear nodes represent the hidden state of the system in
290 factored form, and the shaded nodes represent the observations in
291 factored form.
292 -->
293 <!--
294 (Click <a
295 href="http://www-nt.cs.berkeley.edu/home/nir/public_html/Repository/water.htm">here</a>
296 for more details on this model.
297 Following Boyen and Koller, we have added discrete evidence nodes.)
298 -->
299 <!--
300 We have "unrolled" the model for three "time slices" -- the structure and parameters are
301 assumed to repeat as the model is unrolled further.
302 Hence to specify a DBN, we need to
303 define the intra-slice topology (within a slice),
304 the inter-slice topology (between two slices),
305 as well as the parameters for the first two slices.
306 (Such a two-slice temporal Bayes net is often called a 2TBN.)
307 -->
308 <p>
309 <center>
310 <IMG SRC="Figures/water3_75.gif">
311 </center>
312 We now show how to specify this model in BNT.
313 <pre>
314 ss = 12; % slice size
315 intra = zeros(ss);
316 intra(1,9) = 1;
317 intra(3,10) = 1;
318 intra(4,11) = 1;
319 intra(8,12) = 1;
320
321 inter = zeros(ss);
322 inter(1, [1 3]) = 1; % node 1 in slice 1 connects to nodes 1 and 3 in slice 2
323 inter(2, [2 3 7]) = 1;
324 inter(3, [3 4 5]) = 1;
325 inter(4, [3 4 6]) = 1;
326 inter(5, [3 5 6]) = 1;
327 inter(6, [4 5 6]) = 1;
328 inter(7, [7 8]) = 1;
329 inter(8, [6 7 8]) = 1;
330
331 onodes = 9:12; % observed
332 dnodes = 1:ss; % discrete
333 ns = 2*ones(1,ss); % binary nodes
334 eclass1 = 1:12;
335 eclass2 = [13:20 9:12];
336 eclass = [eclass1 eclass2];
337 bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'eclass1', eclass1, 'eclass2', eclass2);
338 for e=1:max(eclass)
339 bnet.CPD{e} = tabular_CPD(bnet, e);
340 end
341 </pre>
342 We have tied the observation parameters across all slices.
Click <a href="param_tieing.html">here</a> for a more complex example
of parameter tying.
345
346 <!--
347 Let X(i,t) denote the i'th hidden node in slice t,
348 and Y(i,y) denote the i'th observed node in slice t.
349 We also use the notation Nj to refer to the j'th node in the
350 unrolled network, e.g., N25 = X(1,3), N33 = Y(1,3).
351 <p>
352 We assume the distributions P(X(i,t) | X(i,t-1)) and
353 P(Y(i,t) | X(i,t)) are independent of t for t > 1 and for all i.
354 Hence the CPD for N25, N37, ... is the same as for N13, so we say they
355 are in the same equivalence class, with N13 being the "representative"
356 for this class. In other words, we have tied the parameters for nodes
357 N13, N25, N37, ...
358 Note, however, that the parameters for the nodes in the first slice
359 are not tied, so each equivalence class for nodes 1..12 contains a
360 single node.
361 <p>
362 Above we assumed P(Y(i,t) | X(i,t)) is independent of t for t>1, but
363 it is conventional to assume this is true for all t.
364 So we would like to put N9, N21, N33, ... all in the same class, and
365 similarly for the other observed nodes.
366 We can do this by explicitely defining the equivalence classes, as
367 follows.
368 <p>
369 We define eclass1(i) to be the equivalence class that node i in slice
370 1 belongs to.
371 Similarly, we define eclass2(i) to be the equivalence class that node i in slice
372 2, 3, ..., belongs to.
373 For the water model, we have
374 <pre>
375 </pre>
376 This ties the observation model across slices,
377 since e.g., eclass(9) = eclass(21) = 9, so Y(1,1) and Y(1,2) belong to the
378 same class.
379 <p>
380 By default,
381 eclass1 = 1:ss, and eclass2 = (1:ss)+ss, where ss = slice size = the
382 number of nodes per slice.
383 This will tie nodes in slices 3, 4, ... to the the nodes in slice 2,
384 but none of the nodes in slice 2 to any in slice 1.
385 By using the above tieing pattern,
386 we now only have 20 CPDs to specify, instead of 24:
387 <pre>
388 bnet = mk_dbn(intra, inter, ns, dnodes, eclass1, eclass2);
389 for e=1:max(eclass)
390 bnet.CPD{e} = tabular_CPD(bnet, e);
391 end
392 </pre>
393 -->
394
395
396
397 <h2><a name="bat">BATnet</h2>
398
As an example of a more complicated DBN, consider the following
model of a car's high-level state, as might be used by
an automated car.
403 (The model is from Forbes, Huang, Kanazawa and Russell, "The BATmobile: Towards a
404 Bayesian Automated Taxi", IJCAI 95. The figure is from
405 Boyen and Koller, "Tractable Inference for Complex Stochastic
406 Processes", UAI98.
407 For simplicity, we only show the observed nodes for slice 2.)
408 <p>
409 <center>
410 <IMG SRC="Figures/batnet.gif">
411 </center>
412 <p>
413 Since this topology is so complicated,
414 it is useful to be able to refer to the nodes by name, instead of
415 number.
416 <pre>
417 names = {'LeftClr', 'RightClr', 'LatAct', ... 'Bclr', 'BYdotDiff'};
418 ss = length(names);
419 </pre>
420 We can specify the intra-slice topology using a cell array as follows,
421 where each row specifies a connection between two named nodes:
422 <pre>
423 intrac = {...
424 'LeftClr', 'LeftClrSens';
425 'RightClr', 'RightClrSens';
426 ...
427 'BYdotDiff', 'BcloseFast'};
428 </pre>
We can convert this cell array to an adjacency matrix using
the following function:
431 <pre>
432 [intra, names] = mk_adj_mat(intrac, names, 1);
433 </pre>
434 This function also permutes the names so that they are in topological
435 order.
436 Given this ordering of the names, we can make the inter-slice
437 connectivity matrix as follows:
438 <pre>
439 interc = {...
440 'LeftClr', 'LeftClr';
441 'LeftClr', 'LatAct';
442 ...
443 'FBStatus', 'LatAct'};
444
445 inter = mk_adj_mat(interc, names, 0);
446 </pre>
447
448 To refer to a node, we must know its number, which can be computed as
449 in the following example:
450 <pre>
451 obs = {'LeftClrSens', 'RightClrSens', 'TurnSignalSens', 'XdotSens', 'YdotSens', 'FYdotDiffSens', ...
452 'FclrSens', 'BXdotSens', 'BclrSens', 'BYdotDiffSens'};
for i=1:length(obs)
  onodes(i) = strmatch(obs{i}, names, 'exact');
end
456 onodes = sort(onodes);
457 </pre>
458 (We sort the onodes since most BNT routines assume that set-valued
459 arguments are in sorted order.)
460 We can now make the DBN:
461 <pre>
462 dnodes = 1:ss;
463 ns = 2*ones(1,ss); % binary nodes
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes);
465 </pre>
466 To specify the parameters, we must know the order of the parents.
467 See the function BNT/general/mk_named_CPT for a way to do this in the
468 case of tabular nodes. For simplicity, we just generate random
469 parameters:
470 <pre>
471 for i=1:2*ss
472 bnet.CPD{i} = tabular_CPD(bnet, i);
473 end
474 </pre>
475 A complete version of this example is available in BNT/examples/dynamic/bat1.m.
476
477
478
479
480 <h1><a name="inf">Inference</h1>
481
482
483 The general inference problem for DBNs is to compute
484 P(X(i,t0) | Y(:, t1:t2)), where X(i,t) represents the i'th hidden
485 variable at time t and Y(:,t1:t2) represents all the evidence
486 between times t1 and t2.
487 There are several special cases of interest, illustrated below.
488 The arrow indicates t0: it is X(t0) that we are trying to estimate.
489 The shaded region denotes t1:t2, the available data.
490 <p>
491
492 <img src="Figures/filter.gif">
493
494 <p>
495 BNT can currently only handle offline smoothing.
496 (The HMM engine handles filtering and, to a limited extent, prediction.)
The usage is similar to that of static
inference engines, except that the evidence is now a 2D cell array of
size ss*T, where ss is the number of nodes per slice and T is the
number of slices.
Also, 'marginal_nodes' takes two arguments besides the engine: the nodes and the time-slice.
For example, to compute P(X(i,t) | y(:,1:T)), we proceed as follows
(where onodes are the indices of the observed nodes in each slice,
which correspond to y):
<pre>
T = 5; % number of slices (arbitrary example value)
ev = sample_dbn(bnet, T); % generate some synthetic data
evidence = cell(ss,T);
evidence(onodes,:) = ev(onodes, :); % all cells besides onodes are empty
[engine, ll] = enter_evidence(engine, evidence); % engine is any DBN inference engine (see below)
marg = marginal_nodes(engine, i, t);
</pre>
512
513
514 <h2><a name="discrete">Discrete hidden nodes</h2>
515
516 If all the hidden nodes are discrete,
517 we can use the junction tree algorithm to perform inference.
518 The simplest approach,
519 <tt>jtree_unrolled_dbn_inf_engine</tt>,
520 unrolls the DBN into a static network and applies jtree; however, for
521 long sequences, this
522 can be very slow and can result in numerical underflow.
523 A better approach is to apply the jtree algorithm to pairs of
524 neighboring slices at a time; this is implemented in
525 <tt>jtree_dbn_inf_engine</tt>.
526
527 <p>
528 A DBN can be converted to an HMM if all the hidden nodes are discrete.
529 In this case, you can use
530 <tt>hmm_inf_engine</tt>. This is faster than jtree for small models
531 because the constant factors of the algorithm are lower, but can be
532 exponentially slower for models with many variables
533 (e.g., > 6 binary hidden nodes).
534
535 <p>
536 The use of both
537 <tt>jtree_dbn_inf_engine</tt>
538 and
539 <tt>hmm_inf_engine</tt>
540 is deprecated.
541 A better approach is to construct a smoother engine out of lower-level
542 engines, which implement forward/backward operators.
543 You can create these engines as follows.
<pre>
engine = smoother_engine(hmm_2TBN_inf_engine(bnet));
% or
engine = smoother_engine(jtree_2TBN_inf_engine(bnet));
</pre>
549 You then call them in the usual way:
550 <pre>
551 engine = enter_evidence(engine, evidence);
552 m = marginal_nodes(engine, nodes, t);
553 </pre>
554 Note: you must declare the observed nodes in the bnet before using
555 hmm_2TBN_inf_engine.
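<p>
For example, for the HMM above, we could declare node 2 as observed when the
bnet is constructed. This is a sketch; it assumes mk_dbn accepts an 'observed'
argument, as in the examples shipped with BNT.
<pre>
bnet = mk_dbn(intra, inter, ns, 'discrete', dnodes, 'observed', 2, ...
              'eclass1', eclass1, 'eclass2', eclass2);
engine = smoother_engine(hmm_2TBN_inf_engine(bnet));
</pre>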
556
557
558 <p>
Unfortunately, when all the hidden nodes are discrete,
560 exact inference takes O(2^n) time, where n is the number of hidden
561 nodes per slice,
562 even if the model is sparse.
563 The basic reason for this is that two nodes become correlated, even if
564 there is no direct connection between them in the 2TBN,
565 by virtue of sharing common ancestors in the past.
566 Hence we need to use approximations.
567 <p>
568 A popular approximate inference algorithm for discrete DBNs, known as BK, is described in
569 <ul>
570 <li>
571 <A HREF="http://robotics.Stanford.EDU/~xb/uai98/index.html">
572 Tractable inference for complex stochastic processes </A>,
573 Boyen and Koller, UAI 1998
574 <li>
575 <A HREF="http://robotics.Stanford.EDU/~xb/nips98/index.html">
576 Approximate learning of dynamic models</a>, Boyen and Koller, NIPS
577 1998.
578 </ul>
579 This approximates the belief state with a product of
580 marginals on a specified set of clusters. For example,
581 in the water network, we might use the following clusters:
582 <pre>
583 engine = bk_inf_engine(bnet, { [1 2], [3 4 5 6], [7 8] });
584 </pre>
585 This engine can now be used just like the jtree engine.
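For example, using the same ss-by-T evidence cell array as before (a sketch;
the node index and slice are arbitrary choices):
<pre>
[engine, ll] = enter_evidence(engine, evidence);
marg = marginal_nodes(engine, 3, 2); % approximate P(X(3,2) | y(:,1:T))
</pre>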
586 Two special cases of the BK algorithm are supported: 'ff' (fully
587 factored) means each node has its own cluster, and 'exact' means there
588 is 1 cluster that contains the whole slice. These can be created as
589 follows:
590 <pre>
591 engine = bk_inf_engine(bnet, 'ff');
592 engine = bk_inf_engine(bnet, 'exact');
593 </pre>
For pedagogical purposes, an implementation of BK-FF that uses an HMM
instead of a junction tree is available as
<tt>bk_ff_hmm_inf_engine</tt>.
597
598
599
600 <h2><a name="cts">Continuous hidden nodes</h2>
601
602 If all the hidden nodes are linear-Gaussian, <em>and</em> the observed nodes are
603 linear-Gaussian,
604 the model is a <a href="http://www.cs.berkeley.edu/~murphyk/Bayes/kalman.html">
605 linear dynamical system</a> (LDS).
606 A DBN can be converted to an LDS if all the hidden nodes are linear-Gaussian
607 and if they are all persistent. In this case, you can use
608 <tt>kalman_inf_engine</tt>.
For more general linear-Gaussian models, you can use
610 <tt>jtree_dbn_inf_engine</tt> or <tt>jtree_unrolled_dbn_inf_engine</tt>.
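<p>
These engines follow the same enter_evidence/marginal_nodes interface as the
discrete engines described above. A sketch for the LDS defined earlier (we
believe the marginal on a Gaussian node is returned as a structure with fields
'mu' and 'Sigma'):
<pre>
engine = kalman_inf_engine(bnet);
[engine, ll] = enter_evidence(engine, evidence);
m = marginal_nodes(engine, 1, t);
m.mu    % smoothed mean of the hidden state in slice t
m.Sigma % smoothed covariance of the hidden state in slice t
</pre>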
611
612 <p>
613 For nonlinear systems with Gaussian noise, the unscented Kalman filter (UKF),
614 due to Julier and Uhlmann, is far superior to the well-known extended Kalman
615 filter (EKF), both in theory and practice.
616 <!--
617 See
618 <A HREF="http://phoebe.robots.ox.ac.uk/default.html">"A General Method for
619 Approximating Nonlinear Transformations of
620 Probability Distributions"</A>.
621 (If the above link is down,
622 try <a href="http://www.ece.ogi.edu/~ericwan/pubs.html">Eric Wan's</a>
623 page, who has done a lot of work on the UKF.)
624 <p>
625 -->
626 The key idea of the UKF is that it is easier to estimate a Gaussian distribution
627 from a set of points than to approximate an arbitrary non-linear
628 function.
We start with points that are plus/minus sigma away from the mean along
each dimension, pipe them through the nonlinearity, and
then fit a Gaussian to the transformed points.
632 (No need to compute Jacobians, unlike the EKF!)
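<p>
To make this concrete, here is a minimal sketch of the unscented transform in
Matlab, following the description above. (It omits the weighting scheme used in
the full UKF; <tt>f</tt> stands in for the system nonlinearity.)
<pre>
function [new_mu, new_Sigma] = unscented_transform(f, mu, Sigma)
% Propagate N(mu, Sigma) through the nonlinearity f by transforming
% sigma points and fitting a Gaussian to the results (no Jacobians needed).
n = length(mu);
S = sqrtm(Sigma);                                 % matrix square root of the covariance
pts = [mu, repmat(mu,1,n)+S, repmat(mu,1,n)-S];   % mean plus/minus sigma along each dimension
for i=1:size(pts,2)
  ypts(:,i) = feval(f, pts(:,i));                 % pipe each point through the nonlinearity
end
new_mu = mean(ypts, 2);                           % fit a Gaussian to the transformed points
new_Sigma = cov(ypts');
</pre>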
633
634 <p>
635 For systems with non-Gaussian noise, I recommend
636 <a href="http://www.cs.berkeley.edu/~jfgf/smc/">Particle
637 filtering</a> (PF), which is a popular sequential Monte Carlo technique.
638
639 <p>
640 The EKF can be used as a proposal distribution for a PF.
641 This method is better than either one alone.
642 See <a href="http://www.cs.berkeley.edu/~jfgf/upf.ps.gz">The Unscented Particle Filter</a>,
643 by R van der Merwe, A Doucet, JFG de Freitas and E Wan, May 2000.
644 <a href="http://www.cs.berkeley.edu/~jfgf/software.html">Matlab
645 software</a> for the UPF is also available.
646 <p>
647 Note: none of this software is part of BNT.
648
649
650
651 <h1><a name="learn">Learning</h1>
652
653 Learning in DBNs can be done online or offline.
654 Currently only offline learning is implemented in BNT.
655
656
657 <h2><a name="param_learn">Parameter learning</h2>
658
659 Offline parameter learning is very similar to learning in static networks,
660 except now the training data is a cell-array of 2D cell-arrays.
661 For example,
662 cases{l}{i,t} is the value of node i in slice t in sequence l, or []
663 if unobserved.
664 Each sequence can be a different length, and may have missing values
665 in arbitrary locations.
666 Here is a typical code fragment for using EM.
667 <pre>
668 ncases = 2;
669 cases = cell(1, ncases);
670 for i=1:ncases
671 ev = sample_dbn(bnet, T);
672 cases{i} = cell(ss,T);
673 cases{i}(onodes,:) = ev(onodes, :);
674 end
675 [bnet2, LLtrace] = learn_params_dbn_em(engine, cases, 'max_iter', 10);
676 </pre>
677 If the observed node is vector-valued and stored in an OxT array, you
678 need to assign each vector to a single cell, as in the following
679 example.
680 <pre>
681 data = [xpos(:)'; ypos(:)'];
682 ncases = 1;
683 cases = cell(1, ncases);
684 onodes = bnet.observed;
685 for i=1:ncases
686 cases{i} = cell(ss,T);
687 cases{i}(onodes,:) = num2cell(data(:,1:T), 1);
688 end
689 </pre>
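<p>
After EM has converged, the learned CPDs are stored in bnet2.CPD. Here is a
sketch of how to inspect the learned tables for the HMM example at the top of
this page (assuming the tabular CPD accessor CPD_to_CPT):
<pre>
transmat = CPD_to_CPT(bnet2.CPD{3}); % learned P(X(t) | X(t-1))
obsmat = CPD_to_CPT(bnet2.CPD{2});   % learned P(Y(t) | X(t))
</pre>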
690 <p>
691 For a complete code listing of how to do EM in a simple DBN, click
692 <a href="dbn_hmm_demo.m">here</a>.
693
694 <h2><a name="struct_learn">Structure learning</h2>
695
696 There is currently only one structure learning algorithm for DBNs.
697 This assumes all nodes are tabular and observed, and that there are
698 no intra-slice connections. Hence we can find the optimal set of
699 parents for each node separately, without worrying about directed
700 cycles or node orderings.
701 The function is called as follows
702 <pre>
703 inter = learn_struct_dbn_reveal(cases, ns, max_fan_in, penalty)
704 </pre>
705 A full example is given in BNT/examples/dynamic/reveal1.m.
706 Setting the penalty term to 0 gives the maximum likelihood model; this
707 is equivalent to maximizing the mutual information between parents and
708 child (in the bioinformatics community, this is known as the REVEAL
709 algorithm). A non-zero penalty invokes the BIC criterion, which
710 lessens the chance of overfitting.
711 <p>
712 <a href="http://www.bioss.sari.ac.uk/~dirk/software/DBmcmc/">
713 Dirk Husmeier has extended MCMC model selection to DBNs</a>.
714
715 </BODY>