Decision Trees Tutorial 2: Exercise 1


Constructing a Simple Tree

The dataset:

District   House Type     Income   Previous Customer   Outcome
Suburban   Detached       High     No                  Nothing
Suburban   Detached       High     Yes                 Nothing
Rural      Detached       High     No                  Responded
Urban      Semi-detached  High     No                  Responded
Urban      Semi-detached  Low      No                  Responded
Urban      Semi-detached  Low      Yes                 Nothing
Rural      Semi-detached  Low      Yes                 Responded
Suburban   Terrace        High     No                  Nothing
Suburban   Semi-detached  Low      No                  Responded
Urban      Terrace        Low      No                  Responded
Suburban   Terrace        Low      Yes                 Responded
Rural      Terrace        High     Yes                 Responded
Rural      Detached       Low      No                  Responded
Urban      Terrace        High     Yes                 Nothing

The Decision Tree: Build It Interactively

  • Click on the root node below and start building the tree.
  • Non-leaf nodes can be “pruned” once they have been chosen (by clicking on the node and selecting “prune node completely”).
  • The ratios on the branches indicate how well the chosen attribute at a node splits the remaining data with respect to the target attribute (‘Outcome’).
  • Click on any node to highlight the rows in the data table covered by the rule leading down to that node.
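The interactive tree cannot be reproduced in plain text. As a rough substitute, here is a minimal Python sketch of how the branch ratios could be computed at the root. It assumes that each ratio is the number of rows in a branch that share that branch's most common Outcome, divided by the branch size; the tutorial does not spell out its exact formula. `branch_counts` and `branch_ratios` are hypothetical helper names, not part of the tutorial's software.

```python
from collections import Counter

# The 14 rows of the table above:
# (District, House Type, Income, Previous Customer, Outcome)
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]
ATTRIBUTES = ["District", "House Type", "Income", "Previous Customer"]


def branch_counts(rows, attribute):
    """Map each value of `attribute` to a Counter of the Outcome labels in that branch."""
    col = ATTRIBUTES.index(attribute)
    branches = {}
    for row in rows:
        branches.setdefault(row[col], Counter())[row[-1]] += 1
    return branches


def branch_ratios(rows, attribute):
    """Map each branch value to (majority Outcome, its count, branch size).

    Ties (e.g. 2 vs 2) go to the label seen first in `rows`,
    because Counter.most_common keeps first-encountered order for equal counts.
    """
    ratios = {}
    for value, counts in branch_counts(rows, attribute).items():
        outcome, count = counts.most_common(1)[0]
        ratios[value] = (outcome, count, sum(counts.values()))
    return ratios


if __name__ == "__main__":
    for attribute in ATTRIBUTES:
        for value, (outcome, count, size) in branch_ratios(ROWS, attribute).items():
            print(f"{attribute} = {value}: {count}/{size} {outcome}")
```

Under that assumption the Income split gives 4/7 for High (4 of the 7 high-income rows are Nothing, so only 3 responded) and 6/7 for Low (Responded). Calling `branch_ratios` on a filtered subset of `ROWS` gives the ratios at a deeper node.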

 


6 Responses to Decision Trees Tutorial 2: Exercise 1

  1. Anonymous says:

    4/7 for Income = “High” means the data for the target attribute is split 4/7 into one particular value, and for “Low” it is 3/7… this means it is split 3/7 into a particular value… I hope that helps (i.e. 4/7 does not mean 4 rows have Income = High and 3 have Income = Low; the ratio refers to the target attribute). Thanks.

  2. Ms.MJ says:

    Hey, I guess when we select Income at the root, the “High” branch really should be 3/7, since only 3 out of the 7 high-income instances responded (which is our target attribute)… Please correct me if I am wrong. Thanks :-)

  3. michaelN says:

    OK, so we are interested in the entropy at each node split, since we are trying to reduce it, and this is the probability (as in 6/7) that is shown here. We are not actually concerned with what the value of the target attribute is; whether it is “responded” or “nothing” does not matter, as we are only interested in how the data splits in a way that reduces entropy and maximizes information gain. The values could be anything: the algorithm makes no semantic distinction between the output values, it only cares how the data is split, whatever the actual values are. Hope this helps a bit. Michael

  4. Usama Hasan says:

    Thank you for a great tutorial!

    There seems to be a minor error on this page. When we choose Income at the root node, the “High” branch says “4/7” instead of “3/7”.

  5. As with all Decision Making methods, decision tree analysis should be used in conjunction with common sense – decision trees are just one important part of your Decision Making tool kit.

  6. leo says:

    Hi,

    I tried to use your software to build a tree (by clicking on the root node), but nothing happened.
    Could you help me?

    Thanks in advance,

    Leo
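The comments above debate what the branch ratios mean and how entropy and information gain come into it. The tutorial does not state which splitting criterion its tree builder uses, so the following is only a minimal Python sketch assuming the standard ID3 definitions: Shannon entropy in bits over the Outcome column, and gain as the parent entropy minus the size-weighted entropy of the branches. `entropy` and `information_gain` are illustrative helpers, not part of the tutorial's software.

```python
from collections import Counter
from math import log2

# The 14 rows of the data table: (District, House Type, Income, Previous Customer, Outcome)
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]
ATTRIBUTES = ["District", "House Type", "Income", "Previous Customer"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of Outcome labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute):
    """Outcome entropy before the split minus the size-weighted entropy after it."""
    col = ATTRIBUTES.index(attribute)
    branches = {}
    for row in rows:
        branches.setdefault(row[col], []).append(row[-1])
    after = sum(len(labels) / len(rows) * entropy(labels) for labels in branches.values())
    return entropy([row[-1] for row in rows]) - after


if __name__ == "__main__":
    gains = {a: information_gain(ROWS, a) for a in ATTRIBUTES}
    # Highest gain first; ID3 would split the root on the top attribute.
    for attribute, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
        print(f"{attribute}: {gain:.3f} bits")
```

On this table District has the largest gain (roughly 0.247 bits, with the Rural branch already pure at 4/4 Responded), ahead of Income (roughly 0.152); House Type and Previous Customer both come in near 0.05. The same functions can be called on a filtered subset of `ROWS` to choose the split at a deeper node.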
