Build and visualize classification decision trees. See how the CART algorithm splits data using Gini impurity or entropy, explore feature importance, and generate Python sklearn code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
# Data
X = np.array([
    [0, 2, 1, 0],
    [0, 2, 1, 1],
    [1, 2, 1, 0],
    [2, 1, 1, 0],
    [2, 0, 0, 0],
    [2, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [2, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 2, 0, 0],
    [2, 1, 1, 1]
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
feature_names = ['Outlook', 'Temp', 'Humidity', 'Windy']
class_names = ['No', 'Yes']
# Decision Tree Classifier
dt = DecisionTreeClassifier(
    criterion='gini',
    max_depth=4,
    min_samples_leaf=1,
    random_state=42
)
dt.fit(X, y)
# Training accuracy
y_pred = dt.predict(X)
print(f"Training Accuracy: {accuracy_score(y, y_pred):.4f}")
print(f"Tree Depth: {dt.get_depth()}")
print(f"Number of Leaves: {dt.get_n_leaves()}")
# Feature importances
print("\nFeature Importances:")
for name, imp in zip(feature_names, dt.feature_importances_):
    print(f"  {name}: {imp:.4f}")
# Text representation
print("\nTree Rules:")
print(export_text(dt, feature_names=feature_names))
# Classification report
print("\nClassification Report:")
print(classification_report(y, y_pred, target_names=class_names))
# Visualize tree
plt.figure(figsize=(16, 10))
plot_tree(dt, feature_names=feature_names, class_names=class_names,
          filled=True, rounded=True, fontsize=10)
plt.title('Decision Tree')
plt.tight_layout()
plt.show()

Gini impurity and entropy both measure node impurity. Gini = 1 − Σ(pᵢ²) and ranges from 0 to 0.5 for binary classification; entropy = −Σ(pᵢ·log₂(pᵢ)) and ranges from 0 to 1 for binary classification. In practice they produce very similar trees: Gini is slightly faster to compute (no logarithm), while entropy tends to produce slightly more balanced trees.
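The two impurity measures above can be sketched directly from their formulas; `gini` and `entropy` here are illustrative helper names, not sklearn API:

```python
import numpy as np

def gini(p):
    """Gini impurity for a vector of class probabilities: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), with 0*log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # drop zero-probability classes to avoid log2(0)
    return -np.sum(p * np.log2(p))

# A pure node has zero impurity under both measures
print(gini([1.0, 0.0]))     # 0.0
# Maximum impurity for a binary node occurs at a 50/50 split
print(gini([0.5, 0.5]))     # 0.5
print(entropy([0.5, 0.5]))  # 1.0
```

This makes the stated ranges concrete: both measures bottom out at 0 for a pure node, with Gini peaking at 0.5 and entropy at 1.0 for a two-class node.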
Use pruning parameters: max_depth limits tree depth, min_samples_leaf requires minimum samples at leaves, min_samples_split requires minimum samples to split a node, and max_leaf_nodes limits total leaves. You can also use cost-complexity pruning (ccp_alpha in sklearn) or ensemble methods like Random Forest.
Information gain measures the reduction in impurity after splitting. It equals the parent's impurity minus the weighted average of the children's impurities. The feature and threshold with the highest information gain are chosen at each node. This is equivalent to maximizing the mutual information between the feature and the target.
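The information-gain calculation can be sketched as follows; `information_gain` takes a boolean mask describing a candidate binary split (the function names are illustrative):

```python
import numpy as np

def entropy(y):
    """Entropy of a label array, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, mask):
    """Parent impurity minus the weighted average impurity of the two
    children produced by the boolean split `mask`."""
    n = len(y)
    left, right = y[mask], y[~mask]
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(y) - child

y = np.array([0, 0, 1, 1])
# Perfect split: gain equals the parent entropy (1 bit for a 50/50 parent)
print(information_gain(y, np.array([True, True, False, False])))  # 1.0
# Uninformative split: each child mirrors the parent, so the gain is 0
print(information_gain(y, np.array([True, False, True, False])))  # 0.0
```

The two extremes bracket every real split: a useful feature sits somewhere between 0 and the parent's entropy.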
Decision trees handle both continuous and categorical features. For continuous features, the tree tests thresholds (e.g., age ≤ 30). For categorical features, sklearn requires one-hot encoding, although the original CART algorithm can handle categorical splits directly. The tree finds the optimal split point for each feature type.
Decision trees are easy to interpret and visualize, require little data preprocessing (no scaling needed), handle both numerical and categorical data, capture non-linear relationships, and provide feature importance. Main disadvantage: they tend to overfit without proper pruning.