Decision Trees: Simplifying the decision-making process
As kids, you must have played the game of “Guess the animal”:
Let’s translate it into a graph:
This is exactly how a Decision Tree is created for any ML Problem.
A Decision Tree is a supervised learning algorithm that uses a set of binary rules to calculate the target. It can be used for both classification (categorical target variable) and regression (continuous target variable), which is why it is also known as CART (Classification & Regression Trees).
Decision trees have three main parts:
- Root Node: The node that performs the first split
- Terminal Nodes/Leaves: Nodes that predict the outcome.
- Branches: Arrows connecting nodes.
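To make the "Guess the animal" analogy concrete, here is a minimal sketch of fitting a CART classifier. The library (scikit-learn), the toy features and the animal labels are all assumptions for illustration; the article itself does not name an implementation.

```python
# Hypothetical "Guess the animal" data: each row is [has_fur, can_fly]
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0], [1, 0], [0, 1], [0, 0]]
y = ["dog", "cat", "bird", "fish"]

# Fit a CART-style tree; random_state fixed for reproducibility
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# An animal with no fur that can fly ends up in the "bird" leaf
print(clf.predict([[0, 1]]))  # → ['bird']
```

Each internal node the tree learns corresponds to one yes/no question from the game (the root node asks the first question, and the leaves name the animal).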
Decision trees work by repeatedly partitioning the data into multiple sub-spaces so that the outcomes in each final sub-space are as homogeneous as possible. The plot below shows sample data for two independent variables, x and y. Each data point is coloured by the outcome variable: red or grey.
CART tries to split this data into subsets so that each subset is as homogeneous as possible.
The first three splits that CART would create are shown below:
If a new observation fell into any of the subsets, its outcome would be decided by the majority label of the observations in that particular subset.
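The majority-vote rule can be sketched directly. The (x, y) points, the labels and the split thresholds below are all made up for illustration, but the logic matches the description above: route a new point to its rectangular subset, then return the most common label there.

```python
from collections import Counter

# Hypothetical labelled points in the (x, y) plane
points = [
    (1.0, 2.5, "red"), (1.5, 1.0, "red"), (2.0, 3.0, "grey"),
    (4.0, 1.0, "grey"), (4.5, 2.5, "grey"), (5.0, 0.5, "red"),
]

def subset_of(x, y):
    # Made-up splits: first on x = 3, then on y = 2 within each half
    if x < 3:
        return "left-low" if y < 2 else "left-high"
    return "right-low" if y < 2 else "right-high"

def predict(x, y):
    # Majority vote among training points sharing the new point's subset
    labels = [c for px, py, c in points if subset_of(px, py) == subset_of(x, y)]
    return Counter(labels).most_common(1)[0][0]

print(predict(1.2, 1.5))  # → red
```

A real CART implementation chooses the split thresholds itself, by searching for the cut that makes the two resulting subsets most homogeneous.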
The tree generated from the above observations would appear like this:
Some real-life applications of Decision Trees include predicting customer churn, fraud detection and credit risk scoring.
Decision trees are highly interpretable and easy to understand, but they are prone to overfitting.
By: Parul Pandey