annotate toolboxes/FullBNT-1.0.7/bnt/examples/static/dtree/test_restaurants.m @ 0:cc4b1211e677 tip

initial commit to HG from Changeset: 646 (e263d8a21543) added further path and more save "camirversion.m"
author Daniel Wolff
date Fri, 19 Aug 2016 13:07:06 +0200
% The training data here is adapted from the Russell95 book; see restaurant.names for a description.
% (1) Using information gain as the split-scoring function, we get the same decision tree as in Russell 95 (page 537),
% and Gain(Patrons) is 0.5409, equal to the result on page 541 of Russell 95 (see the output trace below).
% (Note: the dtree in that book has a small typesetting error: the Type node hangs off the YES branch of the Hungry node, not the NO branch.)
% (2) Using gain ratio (Quinlan 93), the splitting disfavors attributes with more values (e.g. the Type attribute here).
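% The Gain(Patrons) = 0.5409 figure quoted above can be reproduced by hand from the book's 12
% examples (6 WillWait = yes, 6 no; Patrons splits them as None: 0 yes/2 no, Some: 4/0, Full: 2/4).
% A standalone Python sketch (not BNT code) using the standard ID3/C4.5 definitions:

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts):
    """Information gain: parent entropy minus the weighted child entropies."""
    n = sum(parent_counts)
    remainder = sum(sum(c) / n * entropy(c) for c in child_counts)
    return entropy(parent_counts) - remainder

def gain_ratio(parent_counts, child_counts):
    """Quinlan's gain ratio: information gain divided by the split information."""
    split_info = entropy([sum(c) for c in child_counts])
    return info_gain(parent_counts, child_counts) / split_info

# Patrons: None -> 0 yes / 2 no, Some -> 4 / 0, Full -> 2 / 4
patrons = [[0, 2], [4, 0], [2, 4]]
print(round(info_gain([6, 6], patrons), 4))   # 0.5409, as in the info-gain trace
print(round(gain_ratio([6, 6], patrons), 4))  # 0.3707, as in the gain-ratio trace
```

% Both printed values match the root-node "gain" figures in the two output traces below.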

dtreeCPD=tree_CPD;

% load data
fname = fullfile(BNT_HOME, 'examples', 'static', 'uci_data', 'restaurant', 'restaurant.data');
data=load(fname);
data=data';

% Make the data BNT-compliant (values of discrete nodes must run from 1 to n, where n is the node size).
% E.g. if the values are [0 1 6], they must be mapped to [1 2 3].
%data=transform_data(data,'tmp.dat',[]); %here no cts nodes
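% The remapping that transform_data performs can be sketched as follows (a hypothetical
% stand-in in Python, not the BNT routine): each discrete value is replaced by its 1-based
% rank among the sorted distinct values of that attribute.

```python
def to_bnt_values(column):
    """Map arbitrary discrete values to 1..n by rank among sorted distinct values."""
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(column)))}
    return [ranks[v] for v in column]

print(to_bnt_values([0, 1, 6]))     # [1, 2, 3], the example from the comment above
print(to_bnt_values([6, 0, 0, 1]))  # [3, 1, 1, 2]
```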

% learn decision tree from data
ns=2*ones(1,11);  % all 11 nodes binary by default
ns(5:6)=3;        % Patrons and Price take 3 values
ns(9:10)=4;       % Type and WaitEstimate take 4 values
dtreeCPD1=learn_params(dtreeCPD,1:11,data,ns,[]);

% evaluate on data
[score,outputs]=evaluate_tree_performance(dtreeCPD1,1:11,data,ns,[]);
fprintf('Accuracy in training data %6.3f\n',score);

% show decision tree using graphpad



% --------------------------Output trace: using Information-Gain------------------------------
% The splits are Patrons, Hungry, Type, Fri/Sat
% *********************************
% Create node 1 split at 5 gain 0.5409 Th 0. Class 1 Cases 12 Error 6
% Create leaf node(onecla) 2. Class 1 Cases 2 Error 0
% Add subtree node 2 to 1. #nodes 2
% Create leaf node(onecla) 3. Class 2 Cases 4 Error 0
% Add subtree node 3 to 1. #nodes 3
% Create node 4 split at 4 gain 0.2516 Th 0. Class 1 Cases 6 Error 2
% Create leaf node(onecla) 5. Class 1 Cases 2 Error 0
% Add subtree node 5 to 4. #nodes 5
% Create node 6 split at 9 gain 0.5000 Th 0. Class 1 Cases 4 Error 2
% Create leaf node(nullset) 7. Father 6 Class 1
% Create node 8 split at 3 gain 1.0000 Th 0. Class 1 Cases 2 Error 1
% Create leaf node(onecla) 9. Class 1 Cases 1 Error 0
% Add subtree node 9 to 8. #nodes 9
% Create leaf node(onecla) 10. Class 2 Cases 1 Error 0
% Add subtree node 10 to 8. #nodes 10
% Add subtree node 8 to 6. #nodes 10
% Create leaf node(onecla) 11. Class 2 Cases 1 Error 0
% Add subtree node 11 to 6. #nodes 11
% Create leaf node(onecla) 12. Class 1 Cases 1 Error 0
% Add subtree node 12 to 6. #nodes 12
% Add subtree node 6 to 4. #nodes 12
% Add subtree node 4 to 1. #nodes 12
% ********************************
%
% Note:
% ***Create node 4 split at 4 gain 0.2516 Th 0. Class 1 Cases 6 Error 2
% This means we create a new node, number 4, splitting on attribute 4, with an info-gain of 0.2516.
% "Th 0" is the threshold used when splitting a continuous attribute, "Class 1" means the majority class
% at node 4 is 1, "Cases 6" means 6 cases are attached to it, and "Error 2" means 2 cases would be
% misclassified if we changed the class label of every case in the node to the majority class.
% *** Add subtree node 12 to 6. #nodes 12
% This means we add the child node 12 to node 6.
% *** Create leaf node(onecla) 10. Class 2 Cases 1 Error 0
% Here 'onecla' means all cases in this node belong to one class, so there is no need to split further.
% 'nullset' means no training cases fall into this node; we use its parent node's majority class as its class.
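% The Cases/Error bookkeeping described above boils down to: Error = Cases minus the
% majority-class count. A small generic illustration in Python (not BNT code):

```python
from collections import Counter

def majority_and_error(labels):
    """Majority class of a node, and the number of cases that disagree with it."""
    counts = Counter(labels)
    majority, majority_count = counts.most_common(1)[0]
    return majority, len(labels) - majority_count

# A node like "Class 1 Cases 6 Error 2": four cases of class 1, two of class 2
print(majority_and_error([1, 1, 1, 1, 2, 2]))  # (1, 2)
```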
%
%
%
% ---------------Output trace: using GainRatio-----------------------
% The splits are Patrons, Hungry, Fri/Sat, Price
%
%
% Create node 1 split at 5 gain 0.3707 Th 0. Class 1 Cases 12 Error 6
% Create leaf node(onecla) 2. Class 1 Cases 2 Error 0
% Add subtree node 2 to 1. #nodes 2
% Create leaf node(onecla) 3. Class 2 Cases 4 Error 0
% Add subtree node 3 to 1. #nodes 3
% Create node 4 split at 4 gain 0.2740 Th 0. Class 1 Cases 6 Error 2
% Create leaf node(onecla) 5. Class 1 Cases 2 Error 0
% Add subtree node 5 to 4. #nodes 5
% Create node 6 split at 3 gain 0.3837 Th 0. Class 1 Cases 4 Error 2
% Create leaf node(onecla) 7. Class 1 Cases 1 Error 0
% Add subtree node 7 to 6. #nodes 7
% Create node 8 split at 6 gain 1.0000 Th 0. Class 2 Cases 3 Error 1
% Create leaf node(onecla) 9. Class 2 Cases 2 Error 0
% Add subtree node 9 to 8. #nodes 9
% Create leaf node(nullset) 10. Father 8 Class 2
% Create leaf node(onecla) 11. Class 1 Cases 1 Error 0
% Add subtree node 11 to 8. #nodes 11
% Add subtree node 8 to 6. #nodes 11
% Add subtree node 6 to 4. #nodes 11
% Add subtree node 4 to 1. #nodes 11
%
%
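% Why gain ratio disfavors many-valued attributes: an attribute with a distinct value per case
% (12 singleton branches) would score a perfect information gain of 1.0, yet its gain ratio
% falls below Patrons' 0.3707, because the split information log2(12) ~ 3.585 is so large.
% A self-contained Python sketch (not BNT code) of that penalty:

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# 12 cases (6 yes / 6 no) split into 12 singleton branches: the gain is perfect...
children = [[1, 0]] * 6 + [[0, 1]] * 6
n = 12
gain = entropy([6, 6]) - sum(sum(c) / n * entropy(c) for c in children)
print(round(gain, 4))  # 1.0

# ...but the split information equals log2(12), dragging the ratio well below
# Patrons' 0.3707, so gain ratio would never pick this attribute at the root.
split_info = entropy([sum(c) for c in children])  # = log2(12) ~ 3.585
print(round(gain / split_info, 4))  # 0.2789
```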