Decision Trees Tutorial 15: Exercise 6: Pruning


The dataset:

District  House Type     Income  Previous Customer  Outcome
Suburban  Detached       High    No                 Nothing
Suburban  Detached       High    Yes                Nothing
Rural     Detached       High    No                 Responded
Urban     Semi-detached  High    No                 Responded
Urban     Semi-detached  Low     No                 Responded
Urban     Semi-detached  Low     Yes                Nothing
Rural     Semi-detached  Low     Yes                Responded
Suburban  Terrace        High    No                 Nothing
Suburban  Semi-detached  Low     No                 Responded
Urban     Terrace        Low     No                 Responded
Suburban  Terrace        Low     Yes                Responded
Rural     Terrace        High    Yes                Responded
Rural     Detached       Low     No                 Responded
Urban     Terrace        High    Yes                Nothing

The Holdout (validation) data

Move rows here by clicking them in the data table above.

District  House Type     Income  Previous Customer  Outcome

The Decision Tree: Interactively build it

  • Click the root node below to start building the tree.
  • Non-leaf nodes can be pruned once they have been chosen: click the node and select "prune node completely".
  • The ratios on the branches indicate how well the attribute chosen at a node splits the remaining data with respect to the target attribute ("Outcome").
  • Click any node to highlight the rows in the data table that are covered by the rule leading down to that node.
  • Each node shows the entropy of the data that reaches that point in the tree.
  • The information gain (entropy reduction) is shown for each attribute.
  • The tree is built here by splitting until the entropy is reduced to zero. When no more nodes can be expanded, the tree has classified all the training data.
  • Move data from the training set into the testing (validation) set by clicking rows, and then construct a tree. Ideally the testing set should hold 33% or less of the training data (about 3 or 4 instances here).
  • Prune branches by replacing them with a single leaf node in order to improve the combined classification accuracy (training and validation sets together).
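The entropy and information-gain figures mentioned in the list above can be checked by hand. The following is a minimal Python sketch, assuming that all 14 rows are still in the training set and that entropy is measured in bits (log base 2). The names `ROWS`, `entropy` and `information_gain` are illustrative and are not taken from the interactive page.

```python
from math import log2
from collections import Counter

COLUMNS = ("District", "House Type", "Income", "Previous Customer", "Outcome")

# The 14 training rows from the data table, in the same order.
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]


def entropy(rows):
    """Shannon entropy (in bits) of the Outcome column for the given rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return sum((n / total) * log2(total / n) for n in counts.values())


def information_gain(rows, column):
    """Entropy reduction obtained by splitting the rows on one column index."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


# Gain of every attribute at the root (the Outcome column is excluded).
gains = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS[:-1])}
best_attribute = max(gains, key=gains.get)

print(f"root entropy: {entropy(ROWS):.3f}")
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name}: {gain:.3f}")
```

Under these assumptions the root entropy comes out at roughly 0.940 bits, with gains of about 0.247 (District), 0.152 (Income), 0.050 (House Type) and 0.048 (Previous Customer), so District is the split with the largest entropy reduction at the root. Moving rows into the holdout set changes these figures.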
root node

Classification Errors (Totals)

                Correct  Incorrect
Training set
Validation set
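The totals in this table are what decides whether a prune is worthwhile. The sketch below shows one way such a comparison could be made in Python. Everything specific in it is an assumption chosen for illustration: rows 7, 11 and 12 of the data table are taken as the validation set, the tree is a hand-built one that splits on Income at the root (not necessarily the tree the page would suggest), and the names `classify`, `count_errors` and `combined_accuracy` are hypothetical.

```python
COLUMNS = ("District", "House Type", "Income", "Previous Customer", "Outcome")

DATA = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]

# Assumption: rows 7, 11 and 12 (1-based) have been moved to the validation set.
HOLDOUT = {7, 11, 12}
validation = [row for i, row in enumerate(DATA, start=1) if i in HOLDOUT]
training = [row for i, row in enumerate(DATA, start=1) if i not in HOLDOUT]

# A tree is either a leaf (an Outcome string) or (attribute, {value: subtree}).
# This hypothetical tree classifies every training row correctly.
high_subtree = ("District", {
    "Suburban": "Nothing",
    "Rural": "Responded",
    "Urban": ("Previous Customer", {"No": "Responded", "Yes": "Nothing"}),
})
low_subtree = ("Previous Customer", {"No": "Responded", "Yes": "Nothing"})
full_tree = ("Income", {"High": high_subtree, "Low": low_subtree})

# Candidate prune: replace the Low branch with a single leaf carrying the
# majority Outcome of the training rows that reach it ("Responded", 4 of 5).
pruned_tree = ("Income", {"High": high_subtree, "Low": "Responded"})


def classify(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[row[COLUMNS.index(attribute)]]
    return tree


def count_errors(tree, rows):
    """Return (correct, incorrect) totals for the tree on the given rows."""
    correct = sum(classify(tree, row) == row[-1] for row in rows)
    return correct, len(rows) - correct


def combined_accuracy(tree):
    """Fraction of training plus validation rows classified correctly."""
    correct = count_errors(tree, training)[0] + count_errors(tree, validation)[0]
    return correct / (len(training) + len(validation))


for name, tree in (("full", full_tree), ("pruned", pruned_tree)):
    print(name, count_errors(tree, training), count_errors(tree, validation),
          round(combined_accuracy(tree), 3))
```

With this particular split, the full tree gets 11 correct and 0 incorrect on the training set but only 1 correct and 2 incorrect on the validation set. The pruned tree gives up one training row (10 correct, 1 incorrect) and classifies all 3 validation rows correctly, so the combined total rises from 12 to 13 of 14. A different choice of holdout rows or tree can just as easily make pruning the worse option.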

   

One Response to Decision Trees Tutorial 15: Exercise 6: Pruning

  1. Joe Zanotti says:

    Such diagrams are called trees because, although they are admittedly upside down, they start from a root and have branches leading to leaves (the tips of the graph at the bottom).
