How to explain decision Tree.

Dear all:

I recently built a customer propensity model using a Decision Tree model
in SAS Enterprise Miner for direct marketing. It should sound very
familiar to all the modeling experts on this board. Basically, I have a
training dataset, which contains a target variable (flag = Y/N) and
several independent variables. I built the model on the training data,
and later, when it is applied to new data sets, the model can predict
the target variable for each observation.

Since I am new to decision trees, I have been reading a lot about them
lately. Now I have built the model but find it difficult to explain. All
the examples I found online are quite straightforward: whether or not
to play golf on a rainy day, or what species a flower is. The leaf node
always gives one answer, yes or no. But in my model, which I built with
SAS Enterprise Miner, every leaf just gives me the percentages of Y and
N, something like 18% for Y and 82% for N in a given leaf. How can I
explain this?

In the original data set, about 13% of the observations are Y and 87%
are N. So I randomly sampled observations to make Y:N = 1:1. Then I set
the prior probabilities in SAS E-Miner to reflect the real-world ratio
of Y:N = 13%:87%. I also set up profits to make the model work. Is this
the reason that the tree leaves don't contain a single value for the
target variable?

Please help. There is no one in my office I can ask, and I am stuck
on how to explain the result.

thanks a million.

Iris.

re: How to explain decision Tree.

Hi Iris,
I'm not sure I understand exactly what you need, but I think the leaf nodes that the SAS model has built are not pure, which means that further splitting could be done to produce a tree that classifies the training data better (resulting in leaf nodes with single values). The tree would then be highly accurate on the training data (for instance, it might have 100% Y or N in a leaf node, i.e. just one target value), but it would generalise poorly to data it was not built with. This is known as overfitting. Most tree builders and tools will not grow a tree to completion, so not all of the training data is classified 100% one way or the other at the leaf nodes. That seems to be what your decision tree model has done with the 82% and 18% for the training data. Stopping early makes such trees better predictors for new data, as they are not so rigidly fitted to the training data. This seems to make sense from what you are saying, I think.
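
To make the point about mixed leaves concrete, here is a minimal sketch in Python. It is not SAS Enterprise Miner output, and the function names and counts are made up for illustration. It treats a leaf's Y/N counts as an estimated probability of Y and only turns that into a single yes/no answer once a cutoff is chosen.

```python
def leaf_probability(n_yes, n_no):
    """Share of Y among the training observations that ended up in a leaf."""
    total = n_yes + n_no
    if total == 0:
        raise ValueError("leaf has no observations")
    return n_yes / total


def leaf_label(n_yes, n_no, cutoff=0.5):
    """Turn the leaf's Y share into a single Y/N answer using a cutoff."""
    return "Y" if leaf_probability(n_yes, n_no) >= cutoff else "N"


# Hypothetical leaf with 18 Y and 82 N, matching the 18% / 82% in the question.
p_yes = leaf_probability(18, 82)
print(p_yes)                             # estimated probability of Y in this leaf
print(leaf_label(18, 82))                # label with a simple majority cutoff
print(leaf_label(18, 82, cutoff=0.13))   # label with the overall Y rate as cutoff
```

With a 0.5 cutoff this leaf would be called N, but its 18% is still above the 13% overall rate, which is why for direct marketing the percentage is often more useful than a single label.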

Can anyone else offer any advice on this? I'm not totally sure, as I have never used SAS. I hope I helped a little and didn't confuse things! :)

build decision tree

Dear all, I would like to create and implement a decision tree that will predict the future or final academic performance of a student, drawing the predictor variables from the year one and year two results. This is my first time using a decision tree, so please give me the guidelines or steps to follow in order to achieve this. Thank you.

re: How to explain decision Tree.

Hi Iris,

Your original dataset has Y-13% N-87%.

Your objective is to figure out rules which will allow you to select the maximum Y.

So when you play around with the tree you will find leaf nodes with, say, Y-70% N-30%, holding 40% of the observations of the starting node. Here you'll say that by targeting that 40% of the population you reach a group that is 70% Y, and 70/13 (about 5.4) is your lift.

Choose rules where you are targeting the maximum Y. One thing: as per industry norms, the leaf node that you choose should have at least 5% of the original observations.
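
As a rough illustration of the lift and the 5% rule of thumb above, here is a small Python sketch. The leaf names, rates and shares are invented for the example, and `leaf_lift` and `pick_leaves` are hypothetical helpers, not anything from SAS Enterprise Miner. It divides each leaf's Y rate by the overall Y rate and drops leaves that hold too few observations.

```python
def leaf_lift(leaf_yes_rate, overall_yes_rate):
    """Lift = Y rate inside the leaf divided by the overall Y rate."""
    return leaf_yes_rate / overall_yes_rate


def pick_leaves(leaves, overall_yes_rate, min_share=0.05):
    """Keep leaves holding at least min_share of all observations, best lift first."""
    kept = [
        (name, leaf_lift(rate, overall_yes_rate))
        for name, rate, share in leaves
        if share >= min_share
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical leaves: (name, Y rate in the leaf, share of all observations).
leaves = [
    ("leaf_a", 0.70, 0.06),
    ("leaf_b", 0.90, 0.02),  # high Y rate, but under 5% of the observations
    ("leaf_c", 0.25, 0.12),
    ("leaf_d", 0.05, 0.80),
]

ranked = pick_leaves(leaves, overall_yes_rate=0.13)
for name, lift in ranked:
    print(name, round(lift, 2))
```

Here leaf_b is dropped despite its high Y rate because it is too small, and leaf_a comes out on top with a lift of 0.70 / 0.13.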

Anks