The Smallest Tree
If you did the practical exercise on the previous page, you will have noticed that the smallest complete tree it is possible to build from the data is this:
There were other trees you could build, some of them much larger and more complex than this one. The interesting thing, though, is that each of the trees describes the data perfectly: no instances are left unaccounted for. Each tree has no exceptions and is a perfectly valid hypothesis over the data. In that case, the shorter hypothesis is preferred. Why?
This principle, attributed to William of Occam (supposedly thought up while shaving!), states that the shortest hypothesis, or solution to a problem, should be preferred over longer, more complicated ones. It is one of the fundamental tenets of the way western science works, and it has attracted much debate and controversy. Suffice it to say that most work on decision trees follows the idea of Occam's razor and looks to construct small, simple, easily understandable tree structures.
Why not just construct the shortest tree every time, then, for every dataset? As simple as it was for our dataset, in the general case this isn't so easy and can take a very long time on some datasets. Formally, finding the smallest tree has been shown to be an NP-complete problem, which means that no known algorithm can solve it in a sensible amount of time in the general case. Instead, algorithms have been developed which find a good tree, usually very close to the best. It's now necessary to start examining some of these decision tree algorithms in more detail.
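To give a flavour of how these algorithms sidestep the NP-complete search, here is a minimal sketch of the greedy, entropy-based choice at the heart of ID3-style tree builders: rather than searching all possible trees, they repeatedly pick the single attribute whose split reduces uncertainty the most. The tiny weather-style dataset and attribute names below are invented purely for illustration.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction achieved by splitting the data on one attribute."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Toy dataset, invented for illustration: "windy" perfectly predicts
# the outcome, while "outlook" tells us nothing.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["play", "stay", "play", "stay"]

# The greedy step: choose the attribute with the highest information
# gain as the root test, then recurse on each branch (not shown here).
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)  # -> windy
```

Repeating this choice at every branch gives a short tree without ever enumerating the full space of trees, which is why the greedy approach runs quickly even though it cannot guarantee the absolute smallest tree.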