Missing values
Submitted by rajgupta85 on Sun, 2007-04-01 07:07
Hi,
I'm implementing a decision tree on the .NET platform. Can anyone suggest the best and most effective way to handle missing values in a dataset?
re: missing values
In the tutorial I mentioned a couple of things you can do to handle missing attribute values if you are coding a decision tree algorithm:
* Simply have another possible value that an attribute can take: 'blank'. This is then treated like any other value, with branches in the tree being labelled by it if necessary. This is useful if some meaning can potentially be attached to missing attributes.
* Replace the blank value with the value that occurs most frequently in a similar context.
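For anyone who wants to see the two options side by side, here is a minimal sketch. It is in Python for brevity, but the logic should port directly to .NET. Several things are assumptions of mine, not something from the tutorial: records are plain dictionaries, a missing value is stored as `None`, the function names `fill_with_blank` and `fill_with_mode` are made up, and "similar context" is interpreted as "records with the same class label".

```python
from collections import Counter

MISSING = None  # assumed marker for a missing attribute value


def fill_with_blank(records, attribute, blank="blank"):
    """Option 1: treat a missing value as its own category, 'blank'."""
    return [
        {**r, attribute: blank} if r.get(attribute) is MISSING else dict(r)
        for r in records
    ]


def fill_with_mode(records, attribute, context):
    """Option 2: replace a missing value with the most frequent value
    among records that share the same value of `context`
    (assumed here to be the class label)."""
    per_context = {}     # context value -> Counter of observed attribute values
    overall = Counter()  # fallback when a context has no observed values
    for r in records:
        value = r.get(attribute)
        if value is not MISSING:
            per_context.setdefault(r[context], Counter())[value] += 1
            overall[value] += 1

    filled = []
    for r in records:
        r = dict(r)  # copy so the input is left untouched
        if r.get(attribute) is MISSING:
            counts = per_context.get(r[context]) or overall
            if counts:  # stays missing if nothing was ever observed
                r[attribute] = counts.most_common(1)[0][0]
        filled.append(r)
    return filled


# Hypothetical toy data, only for illustration.
records = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": None, "play": "no"},
    {"outlook": None, "play": "yes"},
]
print(fill_with_blank(records, "outlook"))
print(fill_with_mode(records, "outlook", "play"))
```

With option 1 the tree can grow a 'blank' branch like any other; with option 2 the tree never sees the gap, so any meaning the gap carried is lost.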
Computer Science
This is not an ideal solution when you have 2000 records and half of them are empty or wrong. Another case is when you have only 2-3 values in these 2000 records. Sometimes only 2-3 values are marked A and the remaining 1997 or so are marked B. What should be done in such cases? Please reply, or at least send links to resources.
Re: Missing values
In the case of missing values for many records (e.g. half of a set of 2000), you are always going to be screwed in whatever statistical analysis you use, not only decision trees: you will only be able to formulate incomplete descriptions and reports. As for the other case, with 2 or 3 marked A and the rest B, I'm not sure what you mean. Can you explain?