In this example, we will construct a decision tree for the Iris data. First, we load the data set as before, and split it into the training (N=100) and testing (N=50) data sets.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
# Loading data
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
# splitting the data into training and testing data sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=50,
                                                    random_state=2020)
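Before training, it is worth confirming the split; the following quick check (our own addition, not part of the original workflow) prints the array shapes and the class counts in each subset.
# sanity check of the split (illustrative)
print(X_train.shape, X_test.shape)   # (100, 4) (50, 4)
print(np.bincount(y_train))          # class counts in the training set
print(np.bincount(y_test))           # class counts in the testing set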
Next, we define a decision tree classifier object, DecisionTreeClassifier, available in the sklearn.tree library. As we define the classifier object dt, we use the entropy criterion (criterion='entropy') to describe sample inhomogeneity at each node. We also set the minimum leaf size min_samples_leaf to 3 and the maximum tree depth max_depth to 4 in order to avoid overfitting. We seed the random number generator for this algorithm with random_state=0.
# decision tree classifier
dt = DecisionTreeClassifier(criterion='entropy',
                            min_samples_leaf=3,
                            max_depth=4,
                            random_state=0)
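For intuition, the entropy criterion quantifies the impurity of a node as H = -sum_k p_k log2(p_k), where p_k is the proportion of class k among the samples reaching that node. A minimal sketch of this computation, applied to the full training set (the root node), is shown below; the helper function entropy is our own illustration, not part of scikit-learn.
# illustrative entropy computation (not a scikit-learn function)
def entropy(labels):
    p = np.bincount(labels) / len(labels)   # class proportions at the node
    p = p[p > 0]                            # drop empty classes (0 log 0 = 0)
    return -np.sum(p * np.log2(p))

print(entropy(y_train))   # impurity at the root node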
Then we train the classifier with the fit method.
dt.fit(X_train,y_train)
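Although not shown in the text above, the fitted classifier's score method returns the classification accuracy directly; comparing training and testing accuracy is a quick, informal check for overfitting.
# accuracy on the training and testing sets (quick check)
print(dt.score(X_train, y_train))
print(dt.score(X_test, y_test))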
The trained classifier is then used to generate predictions on the testing data.
# classification on the testing data set
y_pred = dt.predict(X_test)
The confusion matrix and the classification report are generated.
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test, y_pred,
                            target_names=target_names))
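If only the overall accuracy is needed, the accuracy_score function from sklearn.metrics (not imported above) computes the fraction of correctly classified test samples directly.
from sklearn.metrics import accuracy_score
# overall fraction of correct predictions on the testing data
print(accuracy_score(y_test, y_pred))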
The resulting decision tree can be visualized with the plot_tree function available in the sklearn.tree library. The trained classifier is passed as the required input parameter, along with the feature names and the class names.
# plotting the tree
plt.figure(figsize=[7.5,7.5])
plot_tree(dt, feature_names=feature_names, class_names=target_names)
plt.show()
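As an aside, sklearn.tree also provides an export_text function that renders the same fitted tree as indented text, which can be handy when no plotting backend is available.
from sklearn.tree import export_text
# text rendering of the fitted tree
print(export_text(dt, feature_names=feature_names))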
Least squares linear regression is implemented with a LinearRegression object available in the sklearn.linear_model library. In this example, we model the petal width from the Iris data as the dependent variable, and the three other features as the regressors.
from sklearn.linear_model import LinearRegression
# Target is petal width
y = iris.data[:,3]
# All the other variables are input features
X = iris.data[:,:3]
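As a quick check of our own, printing the feature names confirms that column 3 of iris.data is the petal width and columns 0-2 are the remaining measurements.
# confirm which columns play which role
print(feature_names[3])    # the target: petal width (cm)
print(feature_names[:3])   # the three regressors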
Now we fit a regression model with the fit method. The resulting predictor object is referred to as reg.
# linear regression learner
reg = LinearRegression().fit(X,y)
Information about the fitted regression model can be examined with various methods and attributes, such as R-squared (with the score method)
reg.score(X,y)
as well as the regression coefficients (with the coef_ attribute) and the intercept (with the intercept_ attribute).
reg.coef_
reg.intercept_
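To make the fitted model concrete: a prediction is simply the linear combination of the regressors with these coefficients, plus the intercept. The following check (ours, not from the text) verifies that computing it by hand matches reg.predict.
# reconstruct the predictions from the coefficients by hand
y_hat = X @ reg.coef_ + reg.intercept_
print(np.allclose(y_hat, reg.predict(X)))   # True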
Finally, we plot the petal width against the sepal length, with the predicted regression line overlaid on the observed data.
# Observed vs predicted plot
y_pred = reg.predict(X)
plt.plot(X[:,0], y, 'b.', label='observed')   # observed data points
# straight line through the extremes of the predicted values
plt.plot([X[:,0].min(), X[:,0].max()], [y_pred.min(), y_pred.max()],
         'r-', label='predicted')
plt.xlabel('Sepal length')
plt.ylabel('Petal width')
plt.legend()
plt.show()
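Because the model uses three regressors, the straight red line above is only a rough one-dimensional summary of the fit. A scatter plot of observed versus predicted petal width, an additional diagnostic we sketch here, avoids collapsing the fit onto a single feature.
# observed vs predicted values as an additional diagnostic
plt.plot(y, y_pred, 'b.')
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r-')   # perfect-fit reference
plt.xlabel('Observed petal width')
plt.ylabel('Predicted petal width')
plt.show()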