Tutorial (15): Exercise 6
Decision Trees tutorial > Exercise 6: Pruning
The dataset:
| District | House Type | Income | Previous Customer | Outcome |
| Suburban | Detached | High | No | Nothing |
| Suburban | Detached | High | Yes | Nothing |
| Rural | Detached | High | No | Responded |
| Urban | Semi-detached | High | No | Responded |
| Urban | Semi-detached | Low | No | Responded |
| Urban | Semi-detached | Low | Yes | Nothing |
| Rural | Semi-detached | Low | Yes | Responded |
| Suburban | Terrace | High | No | Nothing |
| Suburban | Semi-detached | Low | No | Responded |
| Urban | Terrace | Low | No | Responded |
| Suburban | Terrace | Low | Yes | Responded |
| Rural | Terrace | High | Yes | Responded |
| Rural | Detached | Low | No | Responded |
| Urban | Terrace | High | Yes | Nothing |
The Holdout (validation) data
Click a row to move that data instance between the training and holdout tables.
| District | House Type | Income | Previous Customer | Outcome |
The Decision Tree: Interactively build it
- Click on the root node below and start building the tree.
- Non-leaf nodes can be "pruned" once they have been chosen (by clicking on the node and selecting "prune node completely").
- The ratios on the branches indicate how well the chosen attribute at a node splits the remaining data based on the target attribute ('outcome').
- Click on any node to highlight the rows in the data table that are covered by the rule leading down to that node.
- At each node, the entropy of the data at that point in the tree will be given.
- Information gain (entropy reduction) is specified for each attribute.
- One way to build the tree here is to keep splitting until the entropy at every leaf is zero. When no more nodes can be expanded, the tree classifies all the training data correctly.
- Move data from the training set into the holdout (validation) set (by clicking) and then construct a tree. Ideally the holdout set should contain 33% or less of the data (about 3 or 4 instances here).
- Prune branches by replacing them with a single leaf node in order to improve combined classification accuracy.
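
The entropy and information gain figures mentioned in the list above can be checked by hand or with a short script. The following is a minimal sketch in Python, assuming the usual definitions (Shannon entropy in bits over the 'Outcome' column, and gain as the entropy at a node minus the weighted entropy of its children). The names `ROWS`, `entropy` and `information_gain` are illustrative and not part of the tutorial page.

```python
from collections import Counter
from math import log2

COLUMNS = ["District", "House Type", "Income", "Previous Customer", "Outcome"]

# The 14 instances from the dataset table; the last field is the target.
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]


def entropy(rows):
    """Shannon entropy (in bits) of the Outcome column for the given rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def information_gain(rows, attribute):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    idx = COLUMNS.index(attribute)
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], []).append(row)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


print(f"entropy at root: {entropy(ROWS):.3f}")
for attribute in COLUMNS[:-1]:
    print(f"gain({attribute}): {information_gain(ROWS, attribute):.3f}")
```

With all 14 rows in the training set, the root entropy is about 0.940 bits (9 'Responded' against 5 'Nothing'), and 'District' gives the largest gain at the root, about 0.247. The numbers change once rows are moved into the holdout table.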
| root node |
Classification Errors (Totals)
| | Correct | Incorrect |
| Training set | | |
| Validation set | | |
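
The totals in the table above can also be reproduced offline to see the effect of pruning a branch. The sketch below is only an illustration under stated assumptions: the split (three rows moved to the validation set) and the hand-built tree `TREE` are hypothetical choices, not the result the page would necessarily produce, and `classify`, `totals` and `prune_branch` are made-up helper names. Pruning here means replacing a subtree with a leaf that predicts the majority outcome of the training rows it covers.

```python
from collections import Counter

COLUMNS = ["District", "House Type", "Income", "Previous Customer", "Outcome"]

# Hypothetical split: 11 rows stay in training, 3 are moved to validation.
TRAIN = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]
VALID = [
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
]

# Hypothetical hand-built tree that fits TRAIN perfectly.
# A node is either a leaf label (str) or a pair (attribute, {value: child}).
TREE = ("Income", {
    "High": ("District", {
        "Suburban": "Nothing",
        "Rural": "Responded",
        "Urban": ("Previous Customer", {"No": "Responded", "Yes": "Nothing"}),
    }),
    "Low": ("Previous Customer", {"No": "Responded", "Yes": "Nothing"}),
})


def classify(node, row):
    """Follow the branches matching `row` until a leaf label is reached."""
    while not isinstance(node, str):
        attribute, children = node
        node = children[row[COLUMNS.index(attribute)]]
    return node


def totals(tree, rows):
    """Return (correct, incorrect) counts for `tree` on `rows`."""
    correct = sum(classify(tree, row) == row[-1] for row in rows)
    return correct, len(rows) - correct


def prune_branch(tree, value, train_rows):
    """Replace the subtree under the root's `value` branch with a majority leaf."""
    attribute, children = tree
    idx = COLUMNS.index(attribute)
    covered = [row[-1] for row in train_rows if row[idx] == value]
    leaf = Counter(covered).most_common(1)[0][0]
    return (attribute, {**children, value: leaf})


pruned = prune_branch(TREE, "Low", TRAIN)
for name, tree in [("full", TREE), ("pruned", pruned)]:
    print(name, "training", totals(tree, TRAIN), "validation", totals(tree, VALID))
```

Under these assumptions the full tree scores 11 correct and 0 incorrect on the training set but only 1 correct and 2 incorrect on the validation set. After pruning the 'Low' branch, the training set drops to 10 correct and 1 incorrect while the validation set rises to 3 correct and 0 incorrect, so the combined total goes from 12 to 13 of 14.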