What have we looked at?

  • Decision tree learning is one of the most important techniques in machine learning and data mining. It is a supervised technique that is often used when a disjunction of hypotheses is required or when dealing mainly (though not exclusively) with categorical attributes.
  • We build decision trees in order to capture underlying relationships in a dataset. This can help us in classification and prediction as well as in data visualisation. The technique is often preferred largely because of the intuitive tree representations of data that it produces.
  • Many possible trees can be built that perfectly classify a given dataset. It is often preferable to have small trees as they are easier to understand.
  • Various algorithms exist to construct decision trees. All of them need some criterion for selecting the attributes to split the data on. We looked at Information Gain, Gain Ratio and the use of the chi-squared distribution.
  • Overfitting is a problem that can occur when a constructed tree captures the training data too closely and so generalises poorly to unseen data.
  • We can prune decision trees to stop them overfitting the training data. This often leads to better prediction accuracy on unseen data as well as smaller trees.
  • Hybrid methods exist that combine decision trees with other machine learning schemes.
  • Decision Tree Learning compares well with other machine learning methods.
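
As a quick reminder of how two of the splitting criteria mentioned above work, here is a minimal sketch of Information Gain and Gain Ratio in Python. The function names, the four-row toy dataset and its attribute names ("outlook", "windy") are all made up for illustration; they are not taken from any particular algorithm implementation or library.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes for `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on `attr`."""
    total = len(labels)
    remainder = sum(
        len(group) / total * entropy(group)
        for group in partition(rows, labels, attr).values()
    )
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        # The attribute takes a single value, so it cannot split the data.
        return 0.0
    return information_gain(rows, labels, attr) / split_info


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

for attr in ("outlook", "windy"):
    print(attr, information_gain(rows, labels, attr), gain_ratio(rows, labels, attr))
```

A tree-building algorithm would typically split on the attribute with the highest score, which here is "outlook", and then repeat the process on each resulting subset.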

Hopefully you will have grasped the concepts and techniques without being too bored :) Thanks!

