Decision Trees tutorial > Other splitting methods
The observation that Information Gain is unfairly biased has led to other splitting criteria being proposed.
An obvious way to negate the bias or "greediness" of Information Gain is to take into account the number of values of an attribute. This is exactly the approach that can be used. A new, improved calculation for attribute A over data S is:
The second equation for IV (A) measures the information content for the attribute A by looking at each proportion, p, of instances that take value i for the attribute.
Try the practical exercise on the next page to see how gain ratio differs from information gain.
A method used by the influential CART system that was developed in the 1980s (see Breiman, L., Freidman, J., Olshen, R., and Stone, C., 1984. Classification and regression trees. Wadsworth International, CA).
A lot of the modern algorithms use some information-based measures adapted from Information Gain. However other methods involve the use of statistical approaches which base their tree induction algorithms on the distribution of the data. In particular, measures adopted from the classic statistical technique of chi-squared are used. This measures the amount of correlation between two variables. We use correlation tables to calculate this. As an example, here is the correlation table for the attribute Income from the Marketing data:
This table shows the class distributions for each instance according to their Income value. The totals are in the last column and row. We calculate the chi-squared distribution from the following formula:
Calculating this for our Income attribute gives:
For the other attributes the (normalised) chi-squared values are:
- Date = 9.3333
- District = 4.4
- House Type = 2
- Previous Customer = 2
The higher the value, the more correlation there is between the target attribute and the attribute used in the calculation.
Although the Date attribute is still favoured, it is not quite as highly favoured (relatively) compared with the evaulation technique of information gain (see the last exercise).
Several other variations have been proposed such as permutation tests (Frank & Witten 98) and multivariate splits.
Mingers, J. 1989. An Empirical Comparison of Selection Measures for Decision-Tree Induction. Machine Learning 3: 319-342.
Frank, E., Witten, I., 1998. Using a permutation test for attribute selection in decision trees.