Decision Trees Tutorial 1: A simple decision tree

An example dataset

Imagine we have the following data about a fictitious marketing strategy. Say some company sent a promotion out to various houses and recorded a few facts about each house, along with whether the people there responded or not:

District   House Type      Income   Previous Customer   Outcome
--------   -------------   ------   -----------------   ---------
Suburban   Detached        High     No                  Nothing
Suburban   Detached        High     Yes                 Nothing
Rural      Detached        High     No                  Responded
Urban      Semi-detached   High     No                  Responded
Urban      Semi-detached   Low      No                  Responded
Urban      Semi-detached   Low      Yes                 Nothing
Rural      Semi-detached   Low      Yes                 Responded
Suburban   Terrace         High     No                  Nothing
Suburban   Semi-detached   Low      No                  Responded
Urban      Terrace         Low      No                  Responded
Suburban   Terrace         Low      Yes                 Responded
Rural      Terrace         High     Yes                 Responded
Rural      Detached        Low      No                  Responded
Urban      Terrace         High     Yes                 Nothing

Imagine that we had thousands and thousands of instances (records) of this stuff. Here we have only 14, but if we had a lot, then it would be reasonable to assume that there would be some patterns in it. What sort of patterns? What could we find out? Well, we might discover some underlying relationships between the attributes; in particular, it would be good to know which factors most strongly affect a household's response to the promotion. In the data above, for example, we can see that all rural households responded. This would be useful to know, as next time we might only have so many promotional brochures, and so we would like to be selective about where we send them in order to get the most responses.

The example above is pretty trivial and we could probably analyse it manually just by looking, but if we had more instances the general idea would be to build some sort of classifier that could be used to examine the underlying relationships and make future predictions about the target concept (in this case the outcome of a mailed promotion). This is where automated building of decision trees comes in: a technique that can be used to generate some rules about data and then perform generalisation and prediction tasks.

We’ll stick with the example data above in this tutorial and use it to show how we could build a decision tree to analyse it. Of course we’re not going to ‘build’ any trees ourselves (we’d get some software to do it, or some seeds and a pot), but it’s important to examine the techniques involved.
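If it helps to see what “getting some software to do it” might look like, here is a minimal sketch (not part of the original tutorial) using pandas and scikit-learn to fit a tree to the table above. Note that scikit-learn builds binary trees over one-hot encoded columns rather than the multi-way splits drawn below, so its printout will look a little different from the tree in this tutorial.

```python
# Minimal sketch: fit a decision tree to the example data with scikit-learn.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rows = [
    ("Suburban", "Detached",      "High", "No",  "Nothing"),
    ("Suburban", "Detached",      "High", "Yes", "Nothing"),
    ("Rural",    "Detached",      "High", "No",  "Responded"),
    ("Urban",    "Semi-detached", "High", "No",  "Responded"),
    ("Urban",    "Semi-detached", "Low",  "No",  "Responded"),
    ("Urban",    "Semi-detached", "Low",  "Yes", "Nothing"),
    ("Rural",    "Semi-detached", "Low",  "Yes", "Responded"),
    ("Suburban", "Terrace",       "High", "No",  "Nothing"),
    ("Suburban", "Semi-detached", "Low",  "No",  "Responded"),
    ("Urban",    "Terrace",       "Low",  "No",  "Responded"),
    ("Suburban", "Terrace",       "Low",  "Yes", "Responded"),
    ("Rural",    "Terrace",       "High", "Yes", "Responded"),
    ("Rural",    "Detached",      "Low",  "No",  "Responded"),
    ("Urban",    "Terrace",       "High", "Yes", "Nothing"),
]
data = pd.DataFrame(rows, columns=["District", "House Type", "Income",
                                   "Previous Customer", "Outcome"])

X = pd.get_dummies(data.drop(columns="Outcome"))  # one column per attribute=value pair
y = data["Outcome"]

clf = DecisionTreeClassifier(criterion="entropy")  # entropy-based splitting, ID3-style
clf.fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```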

The Decision Tree

How can a tree help here? Well, in order to generate a set of rules we can construct a decision tree. This is done top-down from a root node and involves partitioning the data into subsets containing instances with similar values. Doing this for the dataset above can result in a tree like this:

District
├─ Suburban (3/5)
│    House Type
│    ├─ Detached (2/2) → Nothing
│    ├─ Semi-detached (1/1) → Responded
│    └─ Terrace (1/2)
│         Income
│         ├─ High (1/1) → Nothing
│         └─ Low (1/1) → Responded
├─ Rural (4/4) → Responded
└─ Urban (3/5)
     Previous Customer
     ├─ No (3/3) → Responded
     └─ Yes (2/2) → Nothing
N.B. Each leaf node of the tree, such as “Responded”, covers particular rows in the data: the rows that satisfy the rule running from the root down to that leaf. It is worth checking a few leaves against the table above. Try it!

Explanation

Ok, so the internal nodes of the tree correspond to attributes. At each node the dataset is split into subsets based on the value of the attribute at that node. For instance, at the root node we split the entire dataset into three subsets: one that contains only instances (rows, tuples, whatever) that have the value ‘Suburban’ for the ‘District’ attribute, one that contains only instances where the District attribute is ‘Urban’, and one where all the instances are ‘Rural’ for that attribute.

The numbers on the branches are important here: the denominator is the size of the subset, and the numerator is how many of those instances share the most common value of the target attribute (‘Outcome’) in that subset. This basically says how well the given value of the attribute we split on relates to the target attribute. What does that mean? Look at the tree, at the middle branch of the first node. ‘4/4’ below ‘Rural’ indicates that all four of the instances with District=Rural have the same value for ‘Outcome’ (in this case ‘Responded’). This is good, because we have split the data using this attribute=value pairing to perfectly classify all instances that have this pairing. In other cases the value of the District attribute does not lead to a perfect, or pure, subset. These ideas are related to entropy, which we shall examine later.

Anyway, continuing with the above tree, look at the first branch of the first node. This tells us that when District=Suburban, only 3 of the 5 instances have the same value of the target attribute. In this case, it is necessary to continue splitting up this subset using other attribute tests until we have only pure subsets. The 5 instances which have District=Suburban are then tested with ‘House Type’ and split into further subsets. The tree construction continues until all the subsets are pure (with respect to the target attribute). When this occurs, the branch terminates in a leaf node that specifies what value the target attribute takes for all instances that have been filtered down this branch.
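To make the branch numbers concrete, here is a small sketch (again, not from the tutorial itself) that reproduces them for the root split; it reuses the ‘data’ DataFrame defined in the first sketch.

```python
# For each value of District, count the Outcome values in that subset.
# The branch label is (size of the majority Outcome class) / (size of the subset).
for district, subset in data.groupby("District"):
    counts = subset["Outcome"].value_counts()
    print(f"{district}: {dict(counts)} -> {counts.max()}/{len(subset)}")

# Rural:    {'Responded': 4}                -> 4/4  (pure subset, becomes a leaf)
# Suburban: {'Nothing': 3, 'Responded': 2}  -> 3/5  (impure, split again)
# Urban:    {'Responded': 3, 'Nothing': 2}  -> 3/5  (impure, split again)
```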

Rules From The Tree

Ok, so we can represent the data with a tree? So what? Well, we can extract rules from the tree quite easily: just read off the path to each leaf node. This gives us (from left to right in the tree):

(District=Suburban) AND (House Type=Detached) => (Outcome=Nothing)
(District=Suburban) AND (House Type=Semi-detached) => (Outcome=Responded)
(District=Suburban) AND (House Type=Terrace) AND (Income=High) => (Outcome=Nothing)
(District=Suburban) AND (House Type=Terrace) AND (Income=Low) => (Outcome=Responded)
(District=Rural) => (Outcome=Responded)
(District=Urban) AND (Previous Customer=No) => (Outcome=Responded)
(District=Urban) AND (Previous Customer=Yes) => (Outcome=Nothing)

A disjunction of conjunctions. This is useful for summarising the data and extracting the underlying relationships.
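If it helps, the same set of rules can be written out as ordinary code. The function below is a hypothetical helper, not something from the tutorial; each return corresponds to one of the conjunctions above.

```python
def predict_outcome(district, house_type, income, previous_customer):
    """Classify a household by walking the decision tree's rules."""
    if district == "Rural":
        return "Responded"
    if district == "Suburban":
        if house_type == "Detached":
            return "Nothing"
        if house_type == "Semi-detached":
            return "Responded"
        # house_type == "Terrace": the Income test decides
        return "Nothing" if income == "High" else "Responded"
    # district == "Urban": the Previous Customer test decides
    return "Nothing" if previous_customer == "Yes" else "Responded"
```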

How can this be used for classification and prediction?

Well, say for example that we wanted to predict the outcome of mailing to a certain house. We could, of course, just do a look-up on our dataset to see if the characteristics of this new house matched any we had mailed to before, with the assumption that the new house will respond in the same way. This won’t always be possible though, as our dataset here doesn’t represent all the possible combinations. Instead we use the decision tree to generalise.

E.g. if we know the District we’re going to mail to is Urban and the person was a previous customer, then the tree predicts that the person will not respond (follow the attribute tests and values down the tree).
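Using the hypothetical predict_outcome helper sketched earlier, that prediction looks like this (House Type and Income are never tested on this path, so any values would do):

```python
print(predict_outcome("Urban", "Terrace", "Low", "Yes"))  # -> Nothing
```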

Practically?

Ok, so this illustrates the basic idea of how we can use Decision Tree Learning. You might be thinking that this is a very small, contrived example and that really it’s all fairly random what happened. Well, yes, yes and yes, but the same basic idea is used in practical situations. Imagine if we had thousands of records of data for a concept like the one above, and maybe lots more attributes, perhaps some of them numeric. We wouldn’t be able to analyse such data just by looking at it, and so constructing a decision tree would help. Furthermore, the more data we have, the more chance that we can get a real insight into any underlying function or relationship between the attributes. This is because the tree generalises when used for predictions, and we would be more confident about its accuracy if it had been constructed from many examples instead of just a few. There are many other complications and details to worry about, but all that can be looked at later. Right now, we are going to look at the tree building process a bit more.

One Dataset: Many Trees

For any given dataset there are a lot of possible trees that we could construct. Instead of having ‘District’ as the root node, it could have been ‘Income’, for example. Likewise, the node under the Urban branch could have tested ‘House Type’ instead of ‘Previous Customer’. Have a go at building a tree from this dataset on the next page. You will see that there are many possible trees that can be built to model this data, and they are all perfectly legitimate decision trees. Try to find the shortest one (the fewest nodes). One common way for software to choose among them is sketched below.
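As a taste of the entropy ideas mentioned earlier, here is a sketch (an illustration, not the tutorial’s own method) of how an ID3-style learner would rank candidate root attributes by information gain on this data; it reuses the ‘data’ DataFrame from the first sketch.

```python
import math

def entropy(outcomes):
    """Entropy of the Outcome distribution in a subset."""
    proportions = outcomes.value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in proportions)

base = entropy(data["Outcome"])  # impurity of the whole dataset
for attr in ["District", "House Type", "Income", "Previous Customer"]:
    # Weighted average impurity of the subsets produced by splitting on attr
    remainder = sum(len(subset) / len(data) * entropy(subset["Outcome"])
                    for _, subset in data.groupby(attr))
    print(f"{attr}: information gain = {base - remainder:.3f}")

# 'District' comes out highest here, which is why it makes a sensible root node;
# a tree rooted at 'Income' would still be legitimate, just likely deeper.
```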

10 Responses to Decision Trees Tutorial 1: A simple decision tree

  1. Good Example says:

    Really good example.

  2. Anonymous says:

    Good, simple and clear. Thanks

  3. Ahmad Khalil says:

    First I’d like to thank you for your effort in making this tutorial. The thing is that I found a mistake ( which I didn’t know where to report ) in “Rules From the Tree” section where the second, third, and fourth rules are wrong, and they should be as follows:

    # (District=Suburban) AND (House Type=Semi-Detached) => (Outcome = Responded)

    # (District=Suburban) AND (House Type=Terrace) AND (Income=High) => (Outcome = Nothing)

    # (District=Suburban) AND (House Type=Terrace) AND (Income=Low) => (Outcome = Responded)

    Thanks

  4. michaelN says:

    Thanks, I corrected this. Please let me know if anyone finds any more mistakes or bugs.

  5. Dave says:

    The figure under Suburban in the tree is 3/5. Should this not be 2/5, as 2 suburbans out of 5 replied?

  6. michaelN says:

    Hey David–

    Glad you have enjoyed looking at the site… I hope I fixed the errors that people were finding.

  7. Very easy to understand, will be looking at the rest as well!

  8. michaelN says:

    Thank you, yes, I changed it.

  9. David says:

    I am taking a course on Artificial Intelligence and Machine Learning, and had trouble understanding decision trees. Now I have gained an intuitive understanding of them within minutes of reading this. Thank you very much. You are doing very well in explaining the topic and have some good examples.
