Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
- Authors: I.H. Witten and E. Frank
- Publisher: Morgan Kaufmann; 1st edition (October 11, 1999)
- ISBN: 1558605525
Written in association with the WEKA software package (see Software section), this is an introductory book on Data Mining aimed at Computer Science students, business people and anyone with an interest in the subject.
The book covers a large area and succeeds in presenting many different techniques and algorithms in a practical way. All the main issues, such as classifier construction, overfitting, pruning and combining multiple models, are discussed in a good amount of detail. The references and pointers to further reading are especially useful if more detail is required. The final chapters look at the WEKA software implementation and how it is structured and coded.
The structure of the book could be better in places (it sometimes seems encyclopedic), and if you are short on time I would not recommend it as an introduction to Decision Tree Learning, as simpler and more concise material is available. However, if you want more detail and coverage than a simple introduction to decision tree learning provides, I would definitely recommend it. Together with the WEKA software, it is unparalleled for anyone wishing to implement and use decision tree learning methods.
C4.5: Programs for Machine Learning
- Author: J.R. Quinlan
- Publisher: Morgan Kaufmann; 1st edition (January 15, 1993)
- ISBN: 1558602380
The work of John Ross Quinlan on his ID3 system, and more recently on C4.5 and See5 (see Software section), has been among the most significant and influential in decision tree learning research. In this book, the C4.5 system is presented through examples and used to illustrate various issues in the use of decision trees in machine learning. Topics such as missing attribute values and overfitting are addressed with respect to the implementation of C4.5, along with other possible solutions. The full C source code is presented, augmented with implementation notes. The code can be downloaded from Ross Quinlan’s homepage.