Decision Trees Tutorial 9: Exercise 4

Exercise 4: Using Gain Ratio as a Splitting Criterion

The dataset:

Date      District  House Type     Income  Previous Customer  Outcome
3/10/03   Suburban  Detached       High    No                 Nothing
14/9/03   Suburban  Detached       High    Responded          Nothing
2/4/02    Rural     Detached       High    No                 Responded
18/1/03   Urban     Semi-detached  High    No                 Responded
3/4/03    Urban     Semi-detached  Low     No                 Responded
15/10/02  Urban     Semi-detached  Low     Responded          Nothing
15/10/02  Rural     Semi-detached  Low     Responded          Responded
2/3/01    Suburban  Terrace        High    No                 Nothing
4/5/03    Suburban  Semi-detached  Low     No                 Responded
2/1/03    Urban     Terrace        Low     No                 Responded
3/10/03   Suburban  Terrace        Low     Responded          Responded
3/10/03   Rural     Terrace        High    Responded          Responded
8/4/03    Rural     Detached       Low     No                 Responded
6/5/02    Urban     Terrace        High    Responded          Nothing

The Decision Tree: Interactively build it

  • Click on the root node below and start building the tree.
  • Non-leaf nodes can be "pruned" once they have been chosen (by clicking on the node and selecting "prune node completely").
  • The ratios on the branches indicate how well the chosen attribute at a node splits the remaining data based on the target attribute ('Outcome').
  • Click on any node to highlight the rows in the data table that are covered by the rule leading down to that node.
  • At each node, the entropy of the data at that point in the tree will be given.
  • Information gain (entropy reduction) is specified for each attribute.
  • The tree is built by repeatedly choosing splits that reduce entropy, ideally to zero at every leaf.
    When no more nodes can be expanded, the tree has classified all the training data.
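  • Notice that the Date attribute is calculated as having a high information gain: most dates occur only once, so splitting on Date yields many small, nearly pure subsets, even though such a split is unlikely to generalise to new data.
  • The gain ratio of an attribute is now also shown whenever a node is constructed, after the Information Gain value.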
  • See how the two differ and explore the types of trees that each produces.
  • If we assume that the Date has no bearing on the Outcome, which method produces the smaller trees?
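
The values the tool reports at each node can also be reproduced offline. The following is a minimal sketch, not the tutorial's own implementation. It assumes the rows are transcribed correctly from the table above, treats the last column as the target (Outcome), and uses the usual definitions: information gain is the parent entropy minus the weighted entropy of the child subsets, and gain ratio is that gain divided by the entropy of the attribute's own value distribution (the split information). The names ROWS, ATTRIBUTES, entropy and gain_and_ratio are illustrative only.

```python
from collections import Counter
from math import log2

# Rows copied from the data table (assumed transcribed correctly):
# (Date, District, House Type, Income, Previous Customer, Outcome)
ROWS = [
    ("3/10/03", "Suburban", "Detached", "High", "No", "Nothing"),
    ("14/9/03", "Suburban", "Detached", "High", "Responded", "Nothing"),
    ("2/4/02", "Rural", "Detached", "High", "No", "Responded"),
    ("18/1/03", "Urban", "Semi-detached", "High", "No", "Responded"),
    ("3/4/03", "Urban", "Semi-detached", "Low", "No", "Responded"),
    ("15/10/02", "Urban", "Semi-detached", "Low", "Responded", "Nothing"),
    ("15/10/02", "Rural", "Semi-detached", "Low", "Responded", "Responded"),
    ("2/3/01", "Suburban", "Terrace", "High", "No", "Nothing"),
    ("4/5/03", "Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("2/1/03", "Urban", "Terrace", "Low", "No", "Responded"),
    ("3/10/03", "Suburban", "Terrace", "Low", "Responded", "Responded"),
    ("3/10/03", "Rural", "Terrace", "High", "Responded", "Responded"),
    ("8/4/03", "Rural", "Detached", "Low", "No", "Responded"),
    ("6/5/02", "Urban", "Terrace", "High", "Responded", "Nothing"),
]
ATTRIBUTES = ["Date", "District", "House Type", "Income", "Previous Customer"]


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_and_ratio(rows, col):
    """Return (information gain, split information, gain ratio) for one column."""
    labels = [row[-1] for row in rows]
    # Group the target labels by the value of the chosen attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[col] for row in rows])
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, split_info, ratio


for i, name in enumerate(ATTRIBUTES):
    gain, split_info, ratio = gain_and_ratio(ROWS, i)
    print(f"{name:18s} gain={gain:.3f}  split_info={split_info:.3f}  gain_ratio={ratio:.3f}")
```

Comparing the gain and gain-ratio columns for Date with those of the other attributes shows how strongly the split-information denominator penalises an attribute with many distinct values.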

One Response to Decision Trees Tutorial 9: Exercise 4

  1. Dave says:

    First off: Thanks a lot for this tutorial…

     But I think there is a mistake in the example above: 

     when calculating the intrinsic split info for the attribute Previous Customer, you seem to find exactly 1 (since the gain ratio is equal to the gain)… But that doesn't seem possible: -(8/14*log2(8/14) + 6/14*log2(6/14)) != 1… Even when rounding up, the value of the Gain Ratio for 'Previous Customer' should be 0.049, not 0.048…

    Also, the definition of VI in the previous page is rather obscure… shouldn't it simply be: -sum(pi * log2(pi))?
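
The arithmetic in this comment can be checked directly. A minimal sketch, assuming the class counts are read correctly from the data table: 8 rows with Previous Customer = No (6 Responded, 2 Nothing), 6 rows with Previous Customer = Responded (3 Responded, 3 Nothing), and 9 Responded versus 5 Nothing overall. The helper name entropy is illustrative.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# Counts assumed from the data table (see the lead-in above).
root = entropy([9, 5])                    # entropy of Outcome over all 14 rows
remainder = 8 / 14 * entropy([6, 2]) + 6 / 14 * entropy([3, 3])
gain = root - remainder                   # information gain of Previous Customer
split_info = entropy([8, 6])              # entropy of the 8/6 split itself
gain_ratio = gain / split_info

print(f"gain={gain:.3f} split_info={split_info:.3f} gain_ratio={gain_ratio:.3f}")
# prints: gain=0.048 split_info=0.985 gain_ratio=0.049
```

Under these counts the split information is slightly below 1, so the gain ratio is slightly above the gain.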
