A Beginner's guide to Regression Trees using Sklearn | Decision Trees
1. Greedy nature of decision trees
2. Equation of Regression Tree
3. Predictions in Regression Trees
4. Prediction using stratification of feature space
5. Disadvantages of predicting using stratification
6. Predicting using Tree Pruning
7. Regression Tree analysis using Sklearn
8. Finding the relation between Tree depth and Mean Square Error
Greedy nature of decision trees
Almost all Decision Tree algorithms use a greedy approach: when generating each node, we don’t consider whether the split leads to the best final tree. We only care about making the best split available at that moment.
Let’s understand this by taking the Striker salary dataset example.
To simplify the calculations, let’s assume that we only have three features in the dataset: Experience, Goals last season, and Salary, where Salary is the feature that we want to predict.
We decided to fit a decision tree model on this data. After an initial analysis, we found that Experience was the most useful feature, as it clearly divided the whole dataset into different parts.
So, rather than thinking ahead, we directly choose Experience as the root node at this point.
We will discuss this example in much more detail later in this post.
Equation of Regression Tree
We already know that the equation of a linear regression model is the equation of a straight line. A Regression Tree has an equation as well.
$f(x) = \displaystyle \sum_{m=1}^{M} C_m \cdot 1(X \in R_m)$
$where\ R_1, R_2, \dots, R_M\ are\ the\ different\ regions,\ C_m\ is\ the\ constant\ predicted\ in\ region\ R_m,\ and\ 1(X \in R_m)\ is\ 1\ when\ X\ falls\ in\ R_m\ and\ 0\ otherwise.$
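The equation above can be read as a lookup: find the region that contains the input and return that region's constant. Here is a minimal sketch of that idea for a single feature. The regions and constants are hypothetical values chosen only for illustration, and the names `regions`, `indicator` and `predict` are not from any library.

```python
# Hypothetical regions and constants, chosen only to illustrate the formula.
# Each entry is (low, high, C_m) and describes the interval [low, high).
regions = [
    (0.0, 4.5, 20.0),            # R_1: x < 4.5, C_1 = 20.0
    (4.5, float("inf"), 50.0),   # R_2: x >= 4.5, C_2 = 50.0
]


def indicator(x, low, high):
    """Return 1 if x lies in the region [low, high), else 0."""
    return 1 if low <= x < high else 0


def predict(x):
    """Evaluate f(x) = sum over m of C_m * 1(x in R_m)."""
    return sum(c_m * indicator(x, low, high) for low, high, c_m in regions)


print(predict(2.0))   # x falls in R_1, so the result is C_1
print(predict(10.0))  # x falls in R_2, so the result is C_2
```

Because the regions do not overlap, exactly one indicator is 1 for any input, so the sum always reduces to a single constant.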
Predictions in Regression Trees
There are a few different ways to make predictions with a regression tree. In this post, we are going to discuss:
- Prediction using Stratification of feature space
- Prediction using Tree Pruning
Prediction using stratification of feature space
As already discussed, we almost always use the greedy approach while fitting trees, and we will do the same here. Each time we determine a node (branch feature), we want to reduce the value of the Residual Sum of Squares (RSS).
Mathematically, the value is given by,
$RSS = \displaystyle \sum_{j=1}^{J} \sum_{i \in R_j} (y_i - \widehat{y}_{R_j})^{2}$
$where\ the\ inner\ sum\ runs\ over\ every\ y_i\ in\ region\ R_j\ and\ \widehat{y}_{R_j}\ is\ the\ mean\ value\ of\ the\ training\ observations\ in\ the\ jth\ box.$
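To make the RSS formula concrete, here is a small sketch that computes it for a given partition. The target values are made up for illustration, and `region_mean` and `rss` are hypothetical helper names.

```python
def region_mean(y_values):
    """Mean of the training targets that fall in one region."""
    return sum(y_values) / len(y_values)


def rss(regions):
    """Residual Sum of Squares for a partition.

    `regions` is a list of lists: the target values y_i grouped by region R_j.
    """
    total = 0.0
    for y_values in regions:
        y_hat = region_mean(y_values)  # prediction for every point in R_j
        total += sum((y - y_hat) ** 2 for y in y_values)
    return total


# Hypothetical target values split into two regions.
example_regions = [[10.0, 12.0, 14.0], [40.0, 50.0]]
print(rss(example_regions))
```

The first region contributes 4 + 0 + 4 = 8 and the second contributes 25 + 25 = 50, so the total RSS for this partition is 58.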
Let’s try to understand it using the hitter's dataset.
First, we plot the whole dataset onto a plane. For simplicity, let’s consider that all the points are confined in a rectangle.
In regression trees, at each step, we have to determine two values.
- The branch feature (node).
- The cutoff value at which the reduction in RSS is maximum.
In our case, Experience was dividing the region in a better way, so the branch feature (root node) that we chose was Experience. We also found that a cutoff value of 4.5 years was giving optimal results.
We now apply the same steps again, recursively. At this point we have two regions: R1, with the salaries of hitters with experience < 4.5 years, and R2, with experience >= 4.5 years.
We now choose one of these two available regions and try to divide it. The only other available feature is the number of hits. We found a clear distinction in the salaries of hitters with 117+ hits last year in region R2, so we split R2 at that value.
Finally, we chose to leave R1 as it is, since there was no clear advantage in dividing it further.
This is what our Regression tree will look like.
In general, we select the feature Xj and the cutoff value s such that splitting the feature space into the regions
$\{X|X_j < s\}\ and\ \{X|X_j \geq s\}$
leads to the maximum reduction in the RSS.
So, the quantity that we want to minimize is,
$\displaystyle \sum_{i:x_i \in R_1(j, s)}(y_i - \widehat{y}_{R_1})^{2} + \sum_{i:x_i \in R_2(j, s)}(y_i - \widehat{y}_{R_2})^{2}$
This process of dividing one of the available regions into smaller ones is repeated until we reach an endpoint, such as a region that is not worth dividing further.
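The search for the best feature and cutoff can be sketched as a brute-force loop over every feature j and every candidate cutoff s. This is a simplified illustration, not how sklearn implements it: the toy data is made up, the candidate cutoffs are assumed to be midpoints between consecutive distinct values, and `sum_squared_error` and `best_split` are hypothetical helper names.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty region)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Greedy search for one split.

    Returns (j, s, rss): the feature index, cutoff and resulting RSS that
    minimise the sum of the RSS of {X | X_j < s} and {X | X_j >= s}.
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate cutoffs: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            s = (low + high) / 2
            left = [y for row, y in zip(rows, targets) if row[j] < s]
            right = [y for row, y in zip(rows, targets) if row[j] >= s]
            total = sum_squared_error(left) + sum_squared_error(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


# Hypothetical toy data: [experience in years, hits last season] -> salary
rows = [[1, 90], [3, 130], [4, 100], [5, 95], [7, 140], [9, 120]]
targets = [20.0, 22.0, 21.0, 50.0, 55.0, 52.0]

j, s, total = best_split(rows, targets)
print(j, s, total)
```

On this toy data the search picks feature 0 (experience) with a cutoff of 4.5, because that split separates the low salaries from the high ones. Building a full tree would repeat `best_split` on each resulting region.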
Disadvantages of predicting using stratification
- This algorithm might produce good results on the training data but is likely to overfit it, leading to poor test set performance. This happens because the resulting tree is too complex.
A good model sits at the sweet spot between overfitting and underfitting: simple enough to generalize to new data, yet flexible enough to keep the error rate low.
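The overfitting described above is easy to observe by comparing training and test error for an unrestricted tree. The sketch below uses synthetic data (a noisy sine curve, which is an assumption made for illustration and not the dataset used in this post).

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy sine curve (assumed purely for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# No depth limit: the tree keeps splitting until it fits the training set.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, full_tree.predict(X_train))
test_mse = mean_squared_error(y_test, full_tree.predict(X_test))
print("train MSE:", train_mse)
print("test MSE:", test_mse)
```

The training error should be close to zero while the test error stays clearly higher, which is the signature of a tree that has memorised noise.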
Predicting using Tree Pruning
Tree Pruning isn’t only used for regression trees; we use it in classification trees as well. As the name suggests, the process involves cutting the tree down into a smaller one. We can do pruning in two ways.
- Pre-pruning or early stopping
- Post Pruning
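Both kinds of pruning can be tried with sklearn. In the sketch below, pre-pruning is done with the `max_depth` and `min_samples_leaf` parameters, and post-pruning with cost-complexity pruning through `ccp_alpha`. The synthetic data and the choice of the middle alpha from the pruning path are assumptions made for illustration; in practice the alpha would be tuned on held-out data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Pre-pruning (early stopping): limit the tree while it is being grown.
pre_pruned = DecisionTreeRegressor(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X, y)

# Post-pruning: grow the full tree, then cut it back with a complexity penalty.
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)
path = full_tree.cost_complexity_pruning_path(X, y)
# Arbitrary pick for the demo: the middle alpha on the pruning path.
alpha = max(path.ccp_alphas[len(path.ccp_alphas) // 2], 0.0)
post_pruned = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0).fit(X, y)

print("pre-pruned depth:", pre_pruned.get_depth())
print("full tree leaves:", full_tree.get_n_leaves())
print("post-pruned leaves:", post_pruned.get_n_leaves())
```

A larger `ccp_alpha` removes more of the tree, so sweeping over `path.ccp_alphas` gives a sequence of progressively smaller trees to compare.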
A practical approach to Tree Pruning using sklearn | Decision Trees
April 5, 2020
Regression Tree analysis using Sklearn
Let’s import the required libraries.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
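We will use the boston data, which is the house pricing data of the Boston region. Note that load_boston was removed in scikit-learn 1.2, so this code needs an older version. You can learn more about the data by running
print(load_boston().DESCR)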
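# Load the features and target, then hold out 33% of the rows as a test set
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Fit an unrestricted regression tree and predict on the test set
clf = DecisionTreeRegressor()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
# Print the mean squared error, followed by its square root (RMSE)
print(mean_squared_error(y_test, predictions))
print(np.sqrt(mean_squared_error(y_test, predictions)))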
18.59688622754491
4.312410721110051
Finding the relation between Tree depth and Mean Square Error
We can train the model with different depth values using a simple Python loop and then plot the relation between them.
mses = []
# Try every depth from 1 up to the depth of the unrestricted tree fitted above
tree_depths = list(range(1, clf.tree_.max_depth + 1))
for depth in tree_depths:
    d_tree_reg = DecisionTreeRegressor(max_depth=depth)
    d_tree_reg.fit(X_train, y_train)
    tree_predictions = d_tree_reg.predict(X_test)
    mses.append(mean_squared_error(y_test, tree_predictions))
# Plot test-set MSE against tree depth
plt.figure(figsize=(10, 6))
plt.grid()
plt.plot(tree_depths, mses)
plt.xlabel("Tree Depth")
plt.ylabel("Mean Square Error")
plt.show()
With a tree depth of 6, we will get the best results and the minimum Mean Square Error.
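Instead of reading the best depth off the plot, it can be picked programmatically. The sketch below assumes two parallel lists like `tree_depths` and `mses` from the loop above. The MSE values here are hypothetical, made up for illustration, and `best_depth` is a hypothetical helper name.

```python
def best_depth(depths, errors):
    """Return the (depth, error) pair with the smallest error."""
    return min(zip(depths, errors), key=lambda pair: pair[1])


# Hypothetical MSE values, made up for illustration only.
tree_depths = [1, 2, 3, 4, 5, 6, 7, 8]
mses = [45.0, 30.0, 22.0, 18.0, 15.0, 12.5, 14.0, 16.5]

depth, error = best_depth(tree_depths, mses)
print(depth, error)
```

Since an unseeded DecisionTreeRegressor can break ties between splits differently on each run, the best depth may vary slightly between runs unless `random_state` is fixed.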
About Author
Ranvir Singh
Greetings! Ranvir is an Engineering professional with 3+ years of experience in Software development.
Original Source: Original Post