So far we have considered simple approaches on small, rather contrived datasets. We now need to look at the more complex situations handled by modern decision tree algorithms such as C4.5 and CART.
Missing Attribute Values
In real-life datasets it is common to find missing values here and there. This may be because someone forgot to record the data properly, or because there simply was no value for a particular attribute. Modern decision tree algorithms take several different approaches to deal with this:
- Simply add another possible value that an attribute can take – ‘blank’. This is then treated like any other value, with branches in the tree labelled by it where necessary. This is useful if some meaning can potentially be attached to a missing attribute.
- Replace the blank value with the value that occurs most frequently in a similar context, for example among training instances with the same class label.
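A minimal sketch of the second strategy, assuming instances are Python dicts with a "label" key (the attribute and key names here are illustrative, not from any particular library); the "similar context" is taken to be the set of instances sharing the same class label:

```python
from collections import Counter

def fill_missing(rows, attr, label_key="label"):
    """Replace a missing (None) value of `attr` with the most
    frequent value observed among rows with the same class label."""
    by_label = {}
    for row in rows:
        if row[attr] is not None:
            by_label.setdefault(row[label_key], []).append(row[attr])
    # most common value of `attr` for each class label
    modes = {lbl: Counter(vals).most_common(1)[0][0]
             for lbl, vals in by_label.items()}
    for row in rows:
        if row[attr] is None:
            row[attr] = modes[row[label_key]]
    return rows
```

For example, an instance with a missing income but the same class label as mostly high-income instances would have its blank filled with ‘High’.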
Numeric Attributes
Our example dataset included only categorical attributes. More realistically, we might have attributes whose values come from a continuous numeric domain. For example, we could record the income of the household not simply as High or Low, but as a numeric value. Splitting on such an attribute during tree building then involves using inequalities to partition the data, so that each branch follows an expression such as ‘the value of attribute A is less than or equal to X’, where X is a threshold chosen from the range of values the attribute takes in the training data. Many other methods exist for dealing with numeric attributes (see Fayyad & Irani, 1992), but most algorithms discretise them in some way, so that they can then be treated just like categorical attributes.
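The inequality-based split described above can be sketched as follows. `best_threshold` is an illustrative helper (not from any named library) that tries the midpoints between consecutive distinct sorted values as candidate thresholds and keeps the one minimising the weighted entropy of the two partitions:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return the split point X for the test 'value <= X' that gives
    the purest two-way partition of the labels."""
    pairs = sorted(zip(values, labels))
    best, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best, best_score = t, score
    return best
```

With incomes 10, 20, 30, 40 and labels no, no, yes, yes, the chosen threshold is 25, giving two perfectly pure branches.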
So far we have described decision tree learning for predicting categories. When the target attribute comes from a continuous-valued domain, each leaf node instead predicts the average target value of the training instances that follow the path down to that leaf. When all the attributes are numeric, the procedure resembles classical regression, in which a linear weighted combination of attribute values is used. Of course, a number of other machine learning tools, such as artificial neural networks, are also adept at dealing with numeric datasets.
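As a small sketch of the numeric-prediction case: a leaf simply stores the mean target of the training instances reaching it. The `variance` helper reflects a common splitting criterion for such trees (variance reduction), which is an assumption here rather than something stated above:

```python
def leaf_prediction(targets):
    # a regression-tree leaf predicts the mean target value of the
    # training instances that reach it
    return sum(targets) / len(targets)

def variance(targets):
    # spread of targets at a node; splits are often chosen to
    # reduce this (an assumed criterion, analogous to entropy)
    m = sum(targets) / len(targets)
    return sum((t - m) ** 2 for t in targets) / len(targets)
```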
Incremental Decision Tree Induction
The method looked at so far involves batch learning: we obtain all our data and then build a decision tree that acts as a classifier. If we get more data after this, we have to build the tree all over again from our old dataset together with the new data. Utgoff (1988, 1994) developed the ID5 algorithm, which can modify a decision tree dynamically as new data arrives, without having to reconstruct the tree from scratch (see also Kalles & Morris, 1995). In a situation where it takes a long time to construct a tree (e.g. if we had a massive dataset initially), rebuilding the tree from scratch whenever just a few more data instances arrive might not be a good idea. Incremental decision tree induction is a solution in such situations.
Misclassification Costs
Future misclassifications that the tree might make may have varying consequences: one type of misclassification might be more serious than another, depending on the problem, the data, and what we are trying to predict. In medical diagnosis, for example, failing to detect a disease may be far more serious than raising a false alarm. Some approaches to decision tree induction therefore allow cost measures to be attached to particular outcomes.
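One way such cost measures can be used, sketched under the assumption of a simple cost matrix `cost[true][predicted]` (the matrix and class names below are illustrative): at a leaf, predict the class with the lowest expected misclassification cost rather than simply the majority class.

```python
def expected_cost(class_counts, predicted, cost):
    """Average cost of predicting `predicted` at a leaf, given counts
    of the true classes there and a cost matrix cost[true][predicted]."""
    n = sum(class_counts.values())
    return sum(c * cost[true][predicted]
               for true, c in class_counts.items()) / n

def cheapest_prediction(class_counts, cost):
    """Pick the prediction minimising expected misclassification cost."""
    return min(cost, key=lambda p: expected_cost(class_counts, p, cost))
```

If missing a sick patient costs 10 and a false alarm costs 1, a leaf with one sick and nine healthy training instances should predict ‘sick’, even though ‘healthy’ is the majority class.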
References
Fayyad, U. M., Irani, K. B., 1992. On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8, 87–102.
Utgoff, P., 1988. ID5: An incremental ID3. In: Proceedings of the 5th International Conference on Machine Learning. Ann Arbor, MI, pp. 107–120.
Utgoff, P., 1994. An improved algorithm for incremental induction of decision trees. In: Cohen, W., Hirsh, H. (Eds.), Proceedings of the 11th International Conference on Machine Learning. Morgan Kaufmann, New Brunswick, NJ, pp. 318–325.
Kalles, D., Morris, T., 1995. Efficient incremental induction of decision trees. Machine Learning, 1–13.