Decision Trees Tutorial 12: Exercise 5


Exercise 5: Evaluating using a testing set

The dataset

District   House Type      Income   Previous Customer   Outcome
Suburban   Detached        High     No                  Nothing
Suburban   Detached        High     Responded           Nothing
Rural      Detached        High     No                  Responded
Urban      Semi-detached   High     No                  Responded
Urban      Semi-detached   Low      No                  Responded
Urban      Semi-detached   Low      Responded           Nothing
Rural      Semi-detached   Low      Responded           Responded
Suburban   Terrace         High     No                  Nothing
Suburban   Semi-detached   Low      No                  Responded
Urban      Terrace         Low      No                  Responded
Suburban   Terrace         Low      Responded           Responded
Rural      Terrace         High     Responded           Responded
Rural      Detached        Low      No                  Responded
Urban      Terrace         High     Responded           Nothing

The Holdout (validation) data

Click a row to move that data instance between the training table and the holdout table.

District   House Type      Income   Previous Customer   Outcome

The Decision Tree: Interactively build it

  • Click on the root node below and start building the tree.
  • Non-leaf nodes can be “pruned” once they have been chosen (click the node and select “prune node completely”).
  • The ratios on the branches indicate how well the attribute chosen at a node splits the remaining data with respect to the target attribute (‘Outcome’).
  • Click on any node to highlight the rows in the data table that are covered by the rule leading down to that node.
  • At each node, the entropy of the data at that point in the tree will be given.
  • Information gain (entropy reduction) is specified for each attribute.
  • One way to build a decision tree here is to keep splitting until the entropy at every leaf is zero.
    When no more nodes can be expanded, the tree has classified all the training data.
  • Move data instances from the training set into the testing set (by clicking) and then construct a tree. Ideally the testing set should hold 33% or less of the training data (about 3 or 4 instances here).
  • Compare the classification errors on the testing data for complex trees with those for simple trees.
root node
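
The entropy and information gain figures shown at the nodes can be checked by hand. The following is a minimal sketch in Python, not the code behind the interactive page: it assumes the standard Shannon entropy (in bits) of the Outcome column and the usual ID3-style information gain, and the names `ROWS`, `ATTRIBUTES`, `entropy` and `information_gain` are made up for this illustration. On the full 14-row dataset (9 Responded, 5 Nothing) it gives a root entropy of about 0.940 bits, and District comes out as the attribute with the largest gain (about 0.247).

```python
from collections import Counter
from math import log2

# Attribute names in column order; the last column of each row is the Outcome.
ATTRIBUTES = ["District", "House Type", "Income", "Previous Customer"]
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Responded", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Responded", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Responded", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Responded", "Responded"),
    ("Rural", "Terrace", "High", "Responded", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Responded", "Nothing"),
]


def entropy(rows):
    """Shannon entropy (bits) of the Outcome column of the given rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def information_gain(rows, index):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


root_entropy = entropy(ROWS)
gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)

print(f"Entropy at the root: {root_entropy:.3f}")
for name, value in gains.items():
    print(f"Gain for {name}: {value:.3f}")
print("Attribute with the largest gain:", best)
```

Once rows have been moved into the holdout table, the same functions can be applied to the remaining training rows only, which is why the figures at the root change as the testing set grows.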

Classification Errors (Totals)

Correct Incorrect
Training set
Validation set
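
The totals in this table can also be approximated offline. The sketch below is an illustration under stated assumptions rather than the page's own implementation: it grows an ID3-style tree by greedy information gain, falls back to the majority outcome at a node when a holdout row has an attribute value that the training rows never showed, and uses a hypothetical `max_depth` parameter to stand in for pruning. The choice of holdout rows (`HOLDOUT_INDEXES`) is arbitrary and only there to make the example runnable; pick your own to mirror what you clicked.

```python
from collections import Counter
from math import log2

# Columns: District, House Type, Income, Previous Customer, Outcome.
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Responded", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Responded", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Responded", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Responded", "Responded"),
    ("Rural", "Terrace", "High", "Responded", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Responded", "Nothing"),
]
N_ATTRS = 4


def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())


def split(rows, i):
    """Group rows by their value for attribute column i."""
    groups = {}
    for r in rows:
        groups.setdefault(r[i], []).append(r)
    return groups


def gain(rows, i):
    parts = split(rows, i).values()
    return entropy(rows) - sum(len(g) / len(rows) * entropy(g) for g in parts)


def build(rows, attrs, max_depth):
    """Return an outcome label (leaf) or (attribute, majority, branches)."""
    majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
    if len({r[-1] for r in rows}) == 1 or not attrs or max_depth == 0:
        return majority
    best = max(attrs, key=lambda i: gain(rows, i))
    rest = [a for a in attrs if a != best]
    branches = {v: build(g, rest, max_depth - 1)
                for v, g in split(rows, best).items()}
    return (best, majority, branches)


def predict(tree, row):
    while isinstance(tree, tuple):
        attr, majority, branches = tree
        # Unseen attribute value: fall back to the majority at this node.
        tree = branches.get(row[attr], majority)
    return tree


def errors(tree, rows):
    """Return (correct, incorrect) counts for the given rows."""
    correct = sum(predict(tree, r) == r[-1] for r in rows)
    return correct, len(rows) - correct


HOLDOUT_INDEXES = {1, 6, 9, 13}  # hypothetical choice of 4 holdout rows
train = [r for i, r in enumerate(ROWS) if i not in HOLDOUT_INDEXES]
holdout = [r for i, r in enumerate(ROWS) if i in HOLDOUT_INDEXES]

full_tree = build(train, list(range(N_ATTRS)), max_depth=N_ATTRS)
stump = build(train, list(range(N_ATTRS)), max_depth=1)  # a very simple tree

for name, tree in (("Full tree", full_tree), ("Depth-1 tree", stump)):
    print(name, "training:", errors(tree, train),
          "validation:", errors(tree, holdout))
```

Because no two rows in the dataset share the same four attribute values, the fully grown tree makes no errors on its own training rows; the interesting comparison is how the full tree and the depth-1 tree fare on the holdout rows, which depends on which rows were held out.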

