Decision Trees Tutorial > Further topics
Combing Multiple Models
The inherent instability of top-down decision tree induction means that different training datasets from a given problem domain will produce quite different trees. One of the main directions in decision tree learning in the late 1990s was to combine multiple trees, or committees of trees in order to obtain better results than those that can be obtained from a single model. This is an approach not just restricted to decision tree learning but one that has been applied to most other Machine Learning methods.
One technique, known as Bagging, or Boostrap-Aggregating, involves producing different trees from a different sample of the problem domain (or from the training set itself) and then aggregating the results on a set of test instances. This helps to reduce the instability and variance in the decison tree learning process by combining mutlitple models. Similarly to Boostrapping, Bagging usually involves sampling (with replacement) the training data in order to produce test data which can be used to evaluate each model. Several of the other methods described previously, such as cross-validation for example, can also be used in bagging however.
Boosting is another relatively new technique that involves generating a sequence of decision trees from the same dataset whereby attention in is paid to the misclassifications of the previous tree. This results in a fixed number of decision trees which can then be used to classify new instances more accurately by aggregating all the different tree’s predictions and choosing the target class that occurs most often. Freund et al (1996) have developed the widely used AdaBoost.M1 which weights instances depending whether or not they were misclassified by previous trees. This allows attention to be directed to the instances that have caused errors in previous iterations. Although trees that use boosting are more accurate on new instances of data, they have been shown to overfit in some situations. Friedman99 has developed a way of reducing this aspect of its performance. Most high-performance modern decision tree learners use some form of boosting to generate more accurate trees. Studies have shown that Bagging and boosting can improve performance significantly (Quinlan96, Bauer99, Dietterich99 etc).
Several approaches have been examined which attempt to hybridise decision tree-learners with some other type of learner or technique. For instance, Carvalho (2000) has shown how a hybrid genetic algorithm decision tree learner can be used to find rules that are small disjuncts more effectively than a decision tree alone. This involves using Quinlan’s C4.5 learner to produce an initial tree which is then decomposed into rules which act as chromosomes in a genetic algorithm. Other researchers have also looked at this type of hybridisation. Work has also been done on combing fuzzy classifiers and neural networks with decision trees (e.g. Sethi90, Leow99, Ping02). This usually involves building a tree using one of the greedy induction algorithms, and then refining it to be as close to optimal as possible using another technique. This method may be beneficial in certain problem domains where a weighted consideration of rules is necessary, such as with medical diagnosis. Neural networks trained with back-propagation can represent a larger number of concepts than a decision tree. A decision tree though, with it’s greedy heuristics has the advantage of being able to quickly map out the general form of the concept space. A lot of the work in this area has of course been concerned with how to map the representations of decision trees to the necessary neural network representations. More recently, hybrid systems combining several different techniques have been developed such as the Neuro-fuzzy ICART algorithm (Li03).
Comparison with other machine learning methods
Decision Tree Induction is just one method in the field of machine learning. Comparisons with other techniques such as Neural Networks have been done (Atlas90, Brown93, Talmon94), but it has been shown that the accuracy of the techniques is similar. Why then use decision trees? Two main reasons can be explained for it’s popularity:
- It produces understandable tree-structures which elucidate the reasoning of the method (many other techniques lack this and are harder to interpret)
- It can be used to produce a disjunction of hypotheses for a problem.
Freund, Y., Schapire, R. E., 1996. Experiments with a new boosting algorithm.
In: Proceedings of International Conference on Machine Learning, 148–156.
Friedman, J., 1999. Greedy function approximation: A gradient boosting machine.
Technical Report, Stanford University, Department of Statistics, Palo Alto, CA.
Quinlan, J., 1996. Bagging, boosting, and C4.5.
In: Proceedings of the Thirteenth National Conference on Artificial Intelliegnce, 725–730.
Bauer, E., Kohavi, R., 1999. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.
Machine Learning 36 (1/2), 105–139
Dietterich, T. G., 1999. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization.
Machine Learning 40 (2), 1–22.
Sethi, I., 1990. Entropy nets: from decision trees to neural networks.
In: Proceedings of the IEEE. Vol. 78, 1605–1613.
Leow, W., Setiono, R., 1999. On mapping decision trees and neural networks.
Knowledge Based Systems 12, 95–99.
Ping, Z., Lihui, C., January 2002. A novel feature extraction method and hybrid tree classification for handwritten numeral recognition.
Pattern Recognition Letters 23, 45.
Li, J., Li, E. and Yu, J. 2003. Designing neurofuzzy system based on ICART algorithm and its application for modeling jet fuel endpoint of hydrocracking process.
Engineering Applications of Artificial Intelligence 16, 11–19
Atlas, L., Cole, R., Muthuswarmy, Y., Lipman, A., Connor, J., Park, D., El-Sharkawi, M., Marks II, R., 1990. A performance comparison of trained multilayer perceptrons and trained classification trees.
In: Proceedings of the IEEE 78 (10), 1614–1619.
Brown.D.E., Corruble, V., Pittard, C., 1993. A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems.
Pattern Recognition 26 (6), 953–961.
Talmon, J., Dassen, R., Karthaus, V., 1994. Neural nets and classification trees: A comparison in the domain of ecg analysis.
In: Gelsema, E., Kanal, L. (Eds.), Pattern Recognition in Practice IV: Multiple paradigms, Comparative studies and hybrid systems. Vol. 16 of Machine Intelligence and Pattern Recognition, 415–423.
Other Recommended Reading
Dietterich, T. G., 1998. Machine-learning research: Four current directions.
The AI Magazine 18 (4), 97–136.
Fleischer, R., June 1999. Decision trees: old and new results.
Information and Computation 152, 44.