Decision Trees Tutorial 7: Exercise 3


Limitations of Information Gain

The dataset:

Date      District  House Type     Income  Previous Customer  Outcome
3/10/03   Suburban  Detached       High    No                 Nothing
14/9/03   Suburban  Detached       High    Responded          Nothing
2/4/02    Rural     Detached       High    No                 Responded
18/1/03   Urban     Semi-detached  High    No                 Responded
3/4/03    Urban     Semi-detached  Low     No                 Responded
15/10/02  Urban     Semi-detached  Low     Responded          Nothing
15/10/02  Rural     Semi-detached  Low     Responded          Responded
2/3/01    Suburban  Terrace        High    No                 Nothing
4/5/03    Suburban  Semi-detached  Low     No                 Responded
2/1/03    Urban     Terrace        Low     No                 Responded
3/10/03   Suburban  Terrace        Low     Responded          Responded
3/10/03   Rural     Terrace        High    Responded          Responded
8/4/03    Rural     Detached       Low     No                 Responded
6/5/02    Urban     Terrace        High    Responded          Nothing

The Decision Tree: Interactively build it

  • Click on the root node below and start building the tree.
  • Non-leaf nodes can be "pruned" once they have been chosen (by clicking on the node and selecting "prune node completely").
  • The ratios on the branches indicate how well the chosen attribute at a node splits the remaining data based on the target attribute (‘outcome’).
  • Click on any node to highlight the rows in the data table that are covered by the rule leading down to that node.
  • At each node, the entropy of the data at that point in the tree will be given.
  • Information gain (entropy reduction) is specified for each attribute.
  • The tree is built here by choosing attributes until the entropy at every leaf is reduced to zero.
    When no more nodes can be expanded, the tree has classified all the training data.
  • Notice that the date attribute is calculated as having a high information gain, because almost every row has its own date and each of those single-row groups has zero entropy.
  • Algorithms such as ID3 would therefore choose it as the root node. It splits the training data effectively, but is it a good classifier? What would happen if we tried to use such a tree for prediction?
[Interactive decision tree: root node]
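
For readers who want to check the figures outside the interactive tree, here is a minimal sketch of the entropy and information gain calculation for the table above. The names `ROWS`, `COLUMNS`, `entropy` and `information_gain` are illustrative choices for this sketch, not part of the tutorial's simulator, and the sketch assumes the usual ID3 definition: gain is the entropy of the outcome minus the weighted entropy of the groups produced by the split.

```python
from collections import Counter
from math import log2

# Column names and rows copied from the dataset table above.
COLUMNS = ["Date", "District", "House Type", "Income", "Previous Customer", "Outcome"]
ROWS = [
    ("3/10/03", "Suburban", "Detached", "High", "No", "Nothing"),
    ("14/9/03", "Suburban", "Detached", "High", "Responded", "Nothing"),
    ("2/4/02", "Rural", "Detached", "High", "No", "Responded"),
    ("18/1/03", "Urban", "Semi-detached", "High", "No", "Responded"),
    ("3/4/03", "Urban", "Semi-detached", "Low", "No", "Responded"),
    ("15/10/02", "Urban", "Semi-detached", "Low", "Responded", "Nothing"),
    ("15/10/02", "Rural", "Semi-detached", "Low", "Responded", "Responded"),
    ("2/3/01", "Suburban", "Terrace", "High", "No", "Nothing"),
    ("4/5/03", "Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("2/1/03", "Urban", "Terrace", "Low", "No", "Responded"),
    ("3/10/03", "Suburban", "Terrace", "Low", "Responded", "Responded"),
    ("3/10/03", "Rural", "Terrace", "High", "Responded", "Responded"),
    ("8/4/03", "Rural", "Detached", "Low", "No", "Responded"),
    ("6/5/02", "Urban", "Terrace", "High", "Responded", "Nothing"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, target_index=-1):
    """Entropy of the target minus the weighted entropy after splitting on one attribute."""
    labels = [row[target_index] for row in rows]
    groups = {}
    for row in rows:
        # Collect the outcome labels that fall under each value of the attribute.
        groups.setdefault(row[attr_index], []).append(row[target_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Gain for every attribute except the target column itself.
gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:18s} {gain:.3f}")
```

On this table the date attribute comes out with the largest gain (about 0.60, against about 0.25 for district), because nine of its eleven distinct values cover a single row each. That is also a hint for the prediction question: a new customer will almost always arrive with a date that has no branch in the tree, so a split on date tells us very little about unseen data.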

4 Responses to Decision Trees Tutorial 7: Exercise 3

  1. Rob Sykes says:

    Hi there,

    I’m really enjoying the tutorials – I’m about to begin an MSc in Data Mining at Greenwich University (London) so trying to get a head start. Your site is very useful indeed!

    I may have missed the point, but on this page the date attribute is not an available option for any of the nodes. Also – and this is a minor point – the options in the grey/blue attribute selector box are being obscured somewhat by the comment text.

    I hope this is helpful.

    Rob Sykes

  2. vimal says:

    u presented ur tutorial with a good simulator, helped me 2 understand.

    Thank you

  3. Henrik Stigell says:

    “Notice that the date attribute is calculated as having a high information gain.” Why is that?
