Compare Ridge (L2), Lasso (L1), and ElasticNet regularization. Visualize coefficient paths, identify important features, and generate Python sklearn code.
| Feature | Coefficient (original scale) | Coefficient (standardized) |
|---|---|---|
| Intercept | 115.096 | - |
| SqFt | 0.0921 | 36.5244 |
| Beds | 2.309 | 1.8472 |
| Baths | 2.309 | 1.8472 |
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Data
X = np.array([
[1400, 3, 2],
[1600, 3, 2],
[1700, 4, 3],
[1875, 4, 3],
[1100, 2, 1],
[1550, 3, 2],
[2350, 4, 3],
[2450, 5, 4],
[1425, 3, 2],
[1700, 3, 2]
])
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
feature_names = ['SqFt', 'Beds', 'Baths']
# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Ridge Regression (alpha = 1)
ridge = Ridge(alpha=1)
ridge.fit(X_scaled, y)
print("Ridge Regression Results (alpha=1)")
print(f"Intercept: {ridge.intercept_:.4f}")
for name, coef in zip(feature_names, ridge.coef_):
    print(f"  {name}: {coef:.4f}")
print(f"R² Score: {ridge.score(X_scaled, y):.4f}")
# Cross-validation for optimal alpha
alphas = np.logspace(-3, 3, 100)
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_scaled, y)
print(f"\nOptimal alpha (CV): {ridge_cv.alpha_:.4f}")
# Coefficient path
coefs = []
for a in alphas:
    r = Ridge(alpha=a).fit(X_scaled, y)
    coefs.append(r.coef_)
plt.figure(figsize=(10, 6))
for i, name in enumerate(feature_names):
    plt.plot(alphas, [c[i] for c in coefs], label=name)
plt.xscale('log')
plt.xlabel('Alpha (Regularization Strength)')
plt.ylabel('Coefficient Value')
plt.title('Ridge Coefficient Path')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Ridge (L2) adds the sum of squared coefficients as a penalty, shrinking all coefficients toward zero but never exactly to zero. Lasso (L1) adds the sum of absolute coefficients, which can shrink some coefficients to exactly zero, performing feature selection. ElasticNet combines both penalties.
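The difference in shrinkage can be seen by fitting all three penalties on the dataset above (a sketch: alpha=1 matches the Ridge example but is otherwise arbitrary, and l1_ratio=0.5 is an assumed 50/50 mix, not a tuned value):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.preprocessing import StandardScaler

X = np.array([[1400, 3, 2], [1600, 3, 2], [1700, 4, 3], [1875, 4, 3],
              [1100, 2, 1], [1550, 3, 2], [2350, 4, 3], [2450, 5, 4],
              [1425, 3, 2], [1700, 3, 2]])
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
X_scaled = StandardScaler().fit_transform(X)

# Same alpha for all three so only the penalty type differs
models = {
    'Ridge (L2)': Ridge(alpha=1),
    'Lasso (L1)': Lasso(alpha=1),
    'ElasticNet': ElasticNet(alpha=1, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X_scaled, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"{name:12s} coefs={np.round(model.coef_, 4)} zeros={n_zero}")
```

Ridge leaves every coefficient nonzero, while Lasso can drive coefficients exactly to zero; whether it does here depends on the alpha chosen.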
Use cross-validation to find the optimal alpha. In sklearn, use RidgeCV, LassoCV, or ElasticNetCV which automatically test multiple alpha values and select the best one based on cross-validated performance.
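A sketch of the same pattern with LassoCV and ElasticNetCV on the dataset above (the alpha grid and the l1_ratio candidates are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.preprocessing import StandardScaler

X = np.array([[1400, 3, 2], [1600, 3, 2], [1700, 4, 3], [1875, 4, 3],
              [1100, 2, 1], [1550, 3, 2], [2350, 4, 3], [2450, 5, 4],
              [1425, 3, 2], [1700, 3, 2]])
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
X_scaled = StandardScaler().fit_transform(X)
alphas = np.logspace(-3, 3, 100)

# Each CV class refits the model per (alpha, fold) pair and keeps
# the alpha with the best mean cross-validated score
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X_scaled, y)
enet_cv = ElasticNetCV(alphas=alphas, l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X_scaled, y)

print(f"LassoCV optimal alpha:      {lasso_cv.alpha_:.4f}")
print(f"ElasticNetCV optimal alpha: {enet_cv.alpha_:.4f} (l1_ratio={enet_cv.l1_ratio_})")
```

ElasticNetCV also searches over the L1/L2 mix, so it returns both an optimal `alpha_` and an optimal `l1_ratio_`.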
Use Ridge when you believe all features are relevant but want to prevent overfitting. Use Lasso when you suspect many features are irrelevant and want automatic feature selection. Use ElasticNet when you have correlated features: Lasso can arbitrarily select one feature from a correlated group, while ElasticNet tends to keep the group together.
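This dataset happens to illustrate the correlated-feature point directly: Baths equals Beds minus one in every row, so the two columns are identical after standardization. A sketch of how the two penalties treat such a pair (alpha=1 is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.preprocessing import StandardScaler

X = np.array([[1400, 3, 2], [1600, 3, 2], [1700, 4, 3], [1875, 4, 3],
              [1100, 2, 1], [1550, 3, 2], [2350, 4, 3], [2450, 5, 4],
              [1425, 3, 2], [1700, 3, 2]])
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
# Columns: SqFt, Beds, Baths; Baths = Beds - 1 in every row,
# so columns 1 and 2 become identical after standardization
X_scaled = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1).fit(X_scaled, y)
enet = ElasticNet(alpha=1, l1_ratio=0.5).fit(X_scaled, y)
print("Lasso      Beds/Baths coefs:", np.round(lasso.coef_[1:], 4))
print("ElasticNet Beds/Baths coefs:", np.round(enet.coef_[1:], 4))
```

Lasso's coordinate descent leaves at most one of the identical columns active, while ElasticNet's L2 component splits the weight evenly between them.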
Regularization penalizes coefficient magnitude. Without standardization, features with larger scales would have smaller coefficients and receive less penalty, creating an unfair comparison. Standardizing ensures all features are penalized equally.
As alpha → 0, regularized regression approaches OLS (no penalty). As alpha → ∞, all coefficients shrink to zero (underfitting). The optimal alpha balances the bias-variance tradeoff: enough regularization to prevent overfitting, but not so much that the model underfits.
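The two limits can be checked numerically on the dataset above (a sketch; the extreme alpha values are arbitrary illustrations):

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression
from sklearn.preprocessing import StandardScaler

X = np.array([[1400, 3, 2], [1600, 3, 2], [1700, 4, 3], [1875, 4, 3],
              [1100, 2, 1], [1550, 3, 2], [2350, 4, 3], [2450, 5, 4],
              [1425, 3, 2], [1700, 3, 2]])
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
X_scaled = StandardScaler().fit_transform(X)

ols = LinearRegression().fit(X_scaled, y)
ridge_small = Ridge(alpha=1e-8).fit(X_scaled, y)  # alpha -> 0: approaches OLS
ridge_large = Ridge(alpha=1e8).fit(X_scaled, y)   # alpha -> inf: coefs -> 0
print("OLS:       ", np.round(ols.coef_, 4))
print("alpha=1e-8:", np.round(ridge_small.coef_, 4))
print("alpha=1e8: ", np.round(ridge_large.coef_, 4))
```

One wrinkle in this dataset: with the perfectly correlated Beds/Baths columns, OLS is not unique; LinearRegression returns the minimum-norm solution, which is also the alpha → 0 limit of Ridge.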