Decision Trees Tutorial > Evaluating decision trees
How can we measure how good a decision tree is?
We have seen so far that there are differing approaches to constructing decision trees, and that these can lead to quite different trees being produced. There are several criteria we can use to evaluate one decision tree algorithm against another on the same dataset. We might be interested in how concise the tree an algorithm produces is, since a concise tree yields rules that are easy to understand. We might also look at how fast an algorithm runs. Generally, though, we will be most interested in an algorithm's learning ability, that is, how accurate its predictions are. In this case, we need measures that evaluate a given tree's performance in classifying previously unseen instances from a given problem. We cannot base a measure of the tree's accuracy on how well it performs on the training set, as this data has already been used to construct the tree! There are several techniques that we can use.
In some cases where the dataset is relatively small, it is necessary to set some of the training data aside to use as test data. This allows us to estimate the accuracy of a tree on hitherto unseen samples. In this situation we must be careful to ensure that the data used for training is representative of the underlying function or distribution. For instance, one class might be almost entirely absent from the training data, so that error rates are high when we come to run the test data through the tree built from that training data. To avoid this, the random sampling can be arranged so that each class appears in the training and test sets in roughly the same proportions as in the full dataset. This technique is known as stratified holdout and lessens the effect of any bias in the data towards one particular class. This method can be explored in the next exercise, where you can specify a holdout testing set and then see misclassification rates on it.
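A stratified holdout split can be sketched in a few lines of plain Python. This is an illustrative implementation, not taken from the tutorial's exercise software; the function name and parameters are our own. The idea is simply to shuffle and split each class separately, so every class contributes to the test set in proportion to its frequency.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split indices 0..len(labels)-1 into train and test sets so that
    each class appears in the test set in roughly the same proportion
    as in the full dataset."""
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle and split each class separately.
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx
```

With a dataset of eight "a" instances and two "b" instances and a 50% holdout, the test set receives four "a"s and one "b", preserving the 4:1 class ratio that a purely random split could easily destroy.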
K-fold cross validation
To lessen the effects of an unlucky or biased split more confidently, we can perform repeated training and testing. This often-used procedure splits the training data into k equally sized, mutually exclusive subsets, or folds. Each fold is then used in turn as a testing set after a tree has been built on the remaining k-1 folds combined. This cross-validation procedure allows a mean error rate to be calculated, which gives a useful insight into tree accuracy.
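The procedure above can be sketched as follows. This is a minimal illustration in plain Python; the function names are our own, and the tree-building step is left abstract as a callback, since the tutorial does not fix one particular tree algorithm here.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and partition them into k mutually
    exclusive folds of (near-)equal size."""
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)
    return [idxs[i::k] for i in range(k)]

def cross_validate(n, k, train_and_test, seed=0):
    """train_and_test(train_idx, test_idx) should build a tree on the
    training indices and return its error rate on the test indices.
    Returns the mean error rate over all k folds."""
    folds = k_fold_indices(n, k, seed)
    errors = []
    for i, test_fold in enumerate(folds):
        # Training set = every fold except the one held out for testing.
        train_fold = [idx for j, fold in enumerate(folds) if j != i
                      for idx in fold]
        errors.append(train_and_test(train_fold, test_fold))
    return sum(errors) / k
```

Because every instance appears in exactly one test fold, each instance is tested exactly once, and the mean of the k per-fold error rates is the reported estimate.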
Leave-one-out validation
This technique is simply k-fold cross-validation where k is the number of data instances. It has the advantage that the largest possible amount of training data is used in each run, and, since no random sampling is involved, the testing procedure is deterministic. With large datasets, however, it is computationally infeasible, and in certain situations the single-instance test sets (which can never be stratified) produce misleading error estimates.
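A small sketch makes the misleading-error pitfall concrete. The code below (our own illustration, with a deliberately crude majority-vote classifier standing in for a decision tree) runs leave-one-out validation on a dataset with two equally frequent classes: removing one instance always makes the *other* class the training majority, so every single prediction is wrong.

```python
from collections import Counter

def leave_one_out(labels, classify):
    """Hold out each instance exactly once.  classify(train_idx, i)
    returns the predicted label for instance i after training on
    train_idx.  Returns the overall error rate."""
    n = len(labels)
    errors = 0
    for i in range(n):
        train_idx = [j for j in range(n) if j != i]
        if classify(train_idx, i) != labels[i]:
            errors += 1
    return errors / n

def majority_classify(labels):
    """A crude classifier: always predict the most common training label."""
    def classify(train_idx, i):
        counts = Counter(labels[j] for j in train_idx)
        return counts.most_common(1)[0][0]
    return classify
```

On 50 "a"s and 50 "b"s this classifier's true error rate is 50%, yet leave-one-out reports a 100% error rate, because each held-out instance leaves its own class in the minority.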
Bootstrap
The central idea behind this method of validation is the statistical technique of sampling with replacement. A training set is formed by drawing n instances from the original n-instance dataset, with each instance replaced once it has been chosen, so the same instance may be drawn several times; the instances never drawn serve as the test set. Error rates for the training and testing sets are then combined to give a truer representation of the actual error. This is the basis of the .632 bootstrap, so called because a training set sampled this way contains, on average, 63.2% of the distinct instances in the dataset (and so typically includes duplicates).
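The 63.2% figure comes from the fact that each instance escapes all n draws with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. A minimal sketch of the .632 bootstrap, assuming the common weighting err = 0.368 × training error + 0.632 × test error and our own function names, might look like this:

```python
import random

def bootstrap_632(n, train_and_test, rounds=10, seed=0):
    """Sample n instances with replacement as the training set; the
    instances never drawn form the test set.  train_and_test returns
    (train_error, test_error); each round's .632 estimate weights
    these as 0.368 * train_error + 0.632 * test_error, and the
    estimates are averaged over the rounds."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        train_idx = [rng.randrange(n) for _ in range(n)]
        out_of_sample = set(range(n)) - set(train_idx)
        if not out_of_sample:
            continue  # vanishingly rare: every instance was drawn
        train_err, test_err = train_and_test(train_idx, sorted(out_of_sample))
        estimates.append(0.368 * train_err + 0.632 * test_err)
    return sum(estimates) / len(estimates)
```

Because the training error is measured on data the tree has seen, it is optimistic; weighting it by only 0.368 against the pessimistic out-of-sample error is what gives the combined figure its truer character.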
The next exercise will use the technique of holdout sets to check classification errors. See what types of tree result in the fewest classification errors.