Decision Trees Tutorial 8: Other Splitting Criteria

Decision Trees tutorial > Other splitting methods

The observation that Information Gain is unfairly biased has led to other splitting criteria being proposed.

Gain Ratio

An obvious way to negate the bias or "greediness" of Information Gain is to take into account the number of values of an attribute. This is exactly the approach that can be used. A new, improved calculation for attribute A over data S is:

Gain Ratio (S,A) = Gain(S,A)/IV(A)


The second equation for IV (A) measures the information content for the attribute A by looking at each proportion, p, of instances that take value i for the attribute.

Try the practical exercise on the next page to see how gain ratio differs from information gain.


A method used by the influential CART system that was developed in the 1980s (see Breiman, L., Freidman, J., Olshen, R., and Stone, C., 1984. Classification and regression trees. Wadsworth International, CA).

Statistical Approaches

A lot of the modern algorithms use some information-based measures adapted from Information Gain. However other methods involve the use of statistical approaches which base their tree induction algorithms on the distribution of the data. In particular, measures adopted from the classic statistical technique of chi-squared are used. This measures the amount of correlation between two variables. We use correlation tables to calculate this. As an example, here is the correlation table for the attribute Income from the Marketing data:



  Responded Nothing Totals  
High 3 4 7
Low 6 1 7
Totals 9 5 14

This table shows the class distributions for each instance according to their Income value. The totals are in the last column and row. We calculate the chi-squared distribution from the following formula:


Calculating this for our Income attribute gives:

Chi-squared(Income) = 3.7

For the other attributes the (normalised) chi-squared values are:

  • Date = 9.3333
  • District = 4.4
  • House Type = 2
  • Previous Customer = 2

The higher the value, the more correlation there is between the target attribute and the attribute used in the calculation.

Although the Date attribute is still favoured, it is not quite as highly favoured (relatively) compared with the evaulation technique of information gain (see the last exercise).

Other Methods

Several other variations have been proposed such as permutation tests (Frank & Witten 98) and multivariate splits.


Mingers, J. 1989. An Empirical Comparison of Selection Measures for Decision-Tree Induction. Machine Learning 3: 319-342.
Frank, E., Witten, I., 1998. Using a permutation test for attribute selection in decision trees.

3 Responses to Decision Trees Tutorial 8: Other Splitting Criteria

  1. maheshakya says:

    What is the E subscript (ij) in the chi – squared equation?
    (I’m still a novice in statistics)

  2. again for sharing these tutorials! Henrik Stigell

  3. Joe Zanotti says:

    each decision in the tree is modeled parametrically as is the process by which an output is generated from an input and a sequence of decisions. The resulting model yields a likelihood measure of goodness of fit, allowing ML and MAP estimation techniques to be utilized. Joe Zanotti

Leave a Reply to maheshakya Cancel reply

Your email address will not be published. Required fields are marked *

* Copy This Password *

* Type Or Paste Password Here *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>