Tutorial (2): Exercise 1

Constructing a Simple Tree

The dataset:

District   House Type     Income  Previous Customer  Outcome
Suburban   Detached       High    No                 Nothing
Suburban   Detached       High    Yes                Nothing
Rural      Detached       High    No                 Responded
Urban      Semi-detached  High    No                 Responded
Urban      Semi-detached  Low     No                 Responded
Urban      Semi-detached  Low     Yes                Nothing
Rural      Semi-detached  Low     Yes                Responded
Suburban   Terrace        High    No                 Nothing
Suburban   Semi-detached  Low     No                 Responded
Urban      Terrace        Low     No                 Responded
Suburban   Terrace        Low     Yes                Responded
Rural      Terrace        High    Yes                Responded
Rural      Detached       Low     No                 Responded
Urban      Terrace        High    Yes                Nothing

The Decision Tree: Interactively build it

  • Click on the root node below and start building the tree.
  • Non-leaf nodes can be "pruned" once they have been chosen (by clicking on the node and selecting "prune node completely").
  • The ratios on the branches indicate how well the attribute chosen at a node splits the remaining data with respect to the target attribute ('Outcome').
  • Click on any node to highlight the rows in the data table that are covered by the rule leading down to that node.
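
The branch ratios described in the list above can be reproduced by hand. The following is only a minimal sketch, assuming that a branch ratio is the share of the most common Outcome among the rows that follow that branch; the names ROWS, COLUMNS and branch_ratios are made up for this illustration and are not part of the interactive tree.

```python
from collections import Counter

# Column order of the dataset table above.
COLUMNS = ("District", "House Type", "Income", "Previous Customer", "Outcome")

# The 14 rows, copied from the dataset table above.
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]


def branch_ratios(rows, attribute, target="Outcome"):
    """Map each value of `attribute` to (majority_count, branch_size).

    Assumption: the ratio on a branch is the count of the most common
    target value divided by the number of rows on that branch.
    """
    a = COLUMNS.index(attribute)
    t = COLUMNS.index(target)
    counts_by_value = {}
    for row in rows:
        counts_by_value.setdefault(row[a], Counter())[row[t]] += 1
    return {
        value: (max(counts.values()), sum(counts.values()))
        for value, counts in counts_by_value.items()
    }


for value, (majority, total) in branch_ratios(ROWS, "Income").items():
    print(f"Income = {value}: {majority}/{total}")
```

Under that assumption, splitting on Income gives 4/7 on the "High" branch (4 "Nothing" against 3 "Responded") and 6/7 on the "Low" branch (6 "Responded" against 1 "Nothing").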

As with all decision-making methods, decision tree analysis should be used in conjunction with common sense: decision trees are just one important part of your decision-making toolkit.

The link to build a tree didn't work

Hi,

I tried to use your software to build a tree (by clicking on the root node), but nothing happened.
Could you help me?

Thanks in advance,

Leo

Thank you

Thank you for a great tutorial!

There seems to be a minor error on this page. When we choose Income at the root node, the "High" branch says "4/7" instead of "3/7".

re: Thank you

4/7 for Income = "High" means that the rows on that branch are split 4/7 into one particular value of the target attribute, and for "Low" the ratio is 6/7, which means those rows are split 6/7 into one particular value. In other words, 4/7 does not mean that 4 rows have Income = "High" and 3 have Income = "Low"; the ratio refers to the target attribute. I hope that helps. Thanks.

With root as Income (error)

Hey, I guess that when we select Income as the root, the "High" branch really should be 3/7, since only 3 of the 7 high-income instances responded (which is our target attribute). Please correct me if I am wrong. Thanks :-)

re: node splits on Income

OK, so we are interested in the entropy at each node split, since we are trying to reduce it, and so it is this probability (as in 6/7) that we show here. We are not concerned with what the value of the target attribute actually is, whether "Responded" or "Nothing"; we are only interested in how the data splits in a way that reduces entropy and maximises information gain. The values could be anything: the algorithm makes no semantic distinction between the output values, it only cares how the data is split. Hope this helps a bit. Michael
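
For readers who want to check the entropy argument by hand, here is a rough sketch of the calculation. It assumes the usual Shannon entropy (base 2) and the standard ID3-style information gain; the page does not say exactly which formula the interactive tree uses, and the names ROWS, COLUMNS, entropy and information_gain are made up for this illustration.

```python
from collections import Counter
from math import log2

# Column order of the dataset table at the top of the page.
COLUMNS = ("District", "House Type", "Income", "Previous Customer", "Outcome")

# The 14 rows, copied from the dataset table at the top of the page.
ROWS = [
    ("Suburban", "Detached", "High", "No", "Nothing"),
    ("Suburban", "Detached", "High", "Yes", "Nothing"),
    ("Rural", "Detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "High", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Semi-detached", "Low", "Yes", "Nothing"),
    ("Rural", "Semi-detached", "Low", "Yes", "Responded"),
    ("Suburban", "Terrace", "High", "No", "Nothing"),
    ("Suburban", "Semi-detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "Low", "No", "Responded"),
    ("Suburban", "Terrace", "Low", "Yes", "Responded"),
    ("Rural", "Terrace", "High", "Yes", "Responded"),
    ("Rural", "Detached", "Low", "No", "Responded"),
    ("Urban", "Terrace", "High", "Yes", "Nothing"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of target labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="Outcome"):
    """Entropy of the target before the split minus the weighted entropy after it."""
    a = COLUMNS.index(attribute)
    t = COLUMNS.index(target)
    groups = {}
    for row in rows:
        groups.setdefault(row[a], []).append(row[t])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[t] for row in rows]) - remainder


for name in COLUMNS[:-1]:
    print(f"{name}: {information_gain(ROWS, name):.3f}")
```

Note that the labels are only counted, never interpreted: swapping "Responded" and "Nothing" everywhere leaves every entropy and gain unchanged. With these formulas, Income has a gain of about 0.152 bits, and District has the largest gain of the four attributes (about 0.247 bits).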
