How can a decision tree be used to implement a regressor in Python?


A decision tree is the basic building block of the random forest algorithm. It is one of the most popular algorithms in machine learning and can be used for both classification and regression. The decisions produced by a decision tree can be used to explain why a certain prediction was made, which means the whole process is transparent to the user. Decision trees are also known as CART, i.e. Classification And Regression Trees. A decision tree can be visualized as a binary tree (the one studied in data structures and algorithms).

Every internal node in the tree represents a single input variable, and the leaf nodes (also known as terminal nodes) contain the output variable, which is used to make the prediction. When a decision tree is built, the input space is divided into multiple regions. Different splits are tried, and the one giving the lowest cost and the best predictions is selected at each step; these choices are made in a greedy manner.
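As a rough, illustrative sketch (this is not scikit-learn's internal code, and the feature values and targets below are made-up toy numbers), the cost of a candidate split can be measured as the sum of squared errors on each side, and the cheapest threshold picked greedily −

def split_cost(feature_vals, targets, threshold):
    # Sum of squared errors around each side's mean when splitting at `threshold`
    left = [t for x, t in zip(feature_vals, targets) if x <= threshold]
    right = [t for x, t in zip(feature_vals, targets) if x > threshold]
    cost = 0.0
    for group in (left, right):
        if group:
            mean = sum(group) / len(group)
            cost += sum((t - mean) ** 2 for t in group)
    return cost

feature_vals = [1, 2, 5, 7]
targets = [0.1, 0.75, 1.5, 1.73]

# Try every observed value as a threshold and keep the cheapest (greedy choice)
best = min(feature_vals, key=lambda thr: split_cost(feature_vals, targets, thr))
print("Best split threshold:", best)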

The splitting of nodes continues until a stopping condition, such as the maximum depth of the tree, is reached. The idea behind a decision tree is to divide the input dataset into smaller and smaller subsets based on specific feature values, until each subset is as homogeneous as possible with respect to the target variable. Each split is made so as to get the maximum information gain at that step.
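A minimal sketch, assuming scikit-learn is installed and using made-up toy data, of how the max_depth parameter stops this splitting early −

from sklearn.tree import DecisionTreeRegressor

X = [[1, 1], [5, 5], [2, 3], [7, 11]]   # toy feature vectors
y = [0.1, 1.5, 0.75, 1.73]              # toy target values

shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(shallow.get_depth())   # never more than 2 levels of splits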

Every decision tree begins with a root node, and this is the place where the first split is made. An efficient way of deciding how each node should be split needs to be devised.

This is where the Gini value comes into the picture. Gini impurity is one of the most commonly used measures of inequality, where inequality refers to how mixed the target classes (outputs) of the samples in a node are. Gini is the criterion used for classification trees; regression trees such as DecisionTreeRegressor instead minimise a squared-error (variance) criterion.
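As an illustrative sketch (toy labels only, not scikit-learn's implementation), the Gini impurity of a node can be computed from the class proportions in it −

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    proportions = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p ** 2 for p in proportions)

print(gini([0, 0, 1, 1]))   # 0.5 -- a maximally mixed two-class node
print(gini([1, 1, 1, 1]))   # 0.0 -- a pure node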

The following syntax is used while working with DecisionTreeRegressor −

class sklearn.tree.DecisionTreeRegressor(*, criterion='mse', …

Note that in scikit-learn 1.0 and later, the 'mse' criterion has been renamed to 'squared_error'.

Let us understand how DecisionTreeRegressor can be used −

Example

from sklearn import tree

# Toy feature vectors and the corresponding target values
my_data = [[1, 1], [5, 5], [2, 3], [7, 11]]
target_vals = [0.1, 1.5, 0.75, 1.73]

# Create the regressor and fit it to the data
clf = tree.DecisionTreeRegressor()
print("The decision tree regressor has been called")
DTreg = clf.fit(my_data, target_vals)
print("Data has been fit")

# Predict the target for a new feature vector
pred_val = DTreg.predict([[4, 7]])
print("The predicted data is ")
print(pred_val)

Output

The decision tree regressor has been called
Data has been fit
The predicted data is
[1.5]

Explanation

  • The required packages are imported into the environment.
  • The feature vectors and target values are defined.
  • The DecisionTreeRegressor is instantiated and the data is fit to the model.
  • The ‘predict’ function is used to predict the target value for a new feature vector.
  • The output is displayed on the console.
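As a short follow-up sketch (assuming the DTreg model fitted above; the feature names here are made up purely for readability), the learned split rules can be printed with sklearn.tree.export_text to see which thresholds were chosen −

from sklearn.tree import export_text

print(export_text(DTreg, feature_names=["feature_1", "feature_2"]))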
