Fit polynomial curves with degree comparison, R², and Python numpy code.
Orange dots = data points, blue curve = polynomial fit
| Degree | R² | Adj. R² | RMSE |
|---|---|---|---|
| 1 | 0.9474 | 0.9408 | 8.7707 |
| 2 (selected) | 1.0000 | 1.0000 | 0.0921 |
| 3 | 1.0000 | 1.0000 | 0.0766 |
| 4 | 1.0000 | 1.0000 | 0.0616 |
| 5 | 1.0000 | 1.0000 | 0.0432 |
| 6 | 1.0000 | 1.0000 | 0.0292 |
Rule of thumb: choose the lowest degree at which adjusted R² stops improving (here, degree 2).
```python
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 5.0, 10.2, 17.5, 27.0, 38.5, 52.1, 68.0, 86.2, 106.5])

# Fit polynomial of degree 2
coeffs = np.polyfit(x, y, 2)
poly = np.poly1d(coeffs)

# Goodness of fit
y_pred = poly(x)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot
n, p = len(x), 2 + 1  # p = number of fitted parameters (degree + intercept)
adj_r2 = 1 - (1 - r_squared) * (n - 1) / (n - p)

print("Polynomial (degree 2):")
print(f"  Coefficients: {coeffs}")
print(f"  Equation: {poly}")
print(f"  R²: {r_squared:.6f}")
print(f"  Adjusted R²: {adj_r2:.6f}")
print(f"  RMSE: {np.sqrt(ss_res / (n - p)):.4f}")

# Compare degrees 1-6
print("\nDegree comparison:")
for d in range(1, 7):
    if d >= n - 1:
        break  # too few points left for a meaningful fit
    c = np.polyfit(x, y, d)
    p_fn = np.poly1d(c)
    yp = p_fn(x)
    ssr = np.sum((y - yp) ** 2)
    r2 = 1 - ssr / ss_tot
    ar2 = 1 - (1 - r2) * (n - 1) / (n - d - 1)
    print(f"  Degree {d}: R²={r2:.6f}, Adj R²={ar2:.6f}")

# Plot
x_smooth = np.linspace(x.min(), x.max(), 200)
plt.figure(figsize=(10, 6))
plt.scatter(x, y, c='orange', s=60, zorder=5, label='Data')
plt.plot(x_smooth, poly(x_smooth), 'b-', lw=2, label=f'Degree 2 (R²={r_squared:.4f})')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Polynomial Regression (degree 2)')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

Polynomial regression fits a polynomial function y = β₀ + β₁x + β₂x² + … + β_dxᵈ to data. It captures non-linear relationships by adding powers of x as features. Despite the non-linear curve, it is still solved as a linear regression problem (linear in the parameters).
Start with degree 2 (quadratic) and increase only if needed. Use adjusted R², which penalizes unnecessary complexity. Cross-validation is even better: split the data into train and test sets and check whether higher degrees improve test performance. If test R² decreases, you are overfitting.
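The train/test check described above can be sketched as follows, using the same x, y data as the example code. The 7/3 split and the random seed are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 5.0, 10.2, 17.5, 27.0, 38.5, 52.1, 68.0, 86.2, 106.5])

# Hold out 3 of the 10 points as a test set
idx = rng.permutation(len(x))
test, train = idx[:3], idx[3:]

test_r2 = {}
for d in range(1, 5):
    coeffs = np.polyfit(x[train], y[train], d)        # fit on training points only
    resid = y[test] - np.polyval(coeffs, x[test])     # evaluate on held-out points
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y[test] - np.mean(y[test])) ** 2)
    test_r2[d] = 1 - ss_res / ss_tot
    print(f"Degree {d}: test R² = {test_r2[d]:.4f}")
```

If test R² plateaus or drops as the degree rises, the extra terms are fitting noise rather than signal.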
Overfitting occurs when the polynomial degree is too high, causing the curve to pass through every point but generalize poorly to new data. A degree-n polynomial perfectly fits n+1 points but captures noise rather than the true pattern. Signs: high training R² but low test R².
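The "n+1 points" claim is easy to demonstrate. Below, a degree-7 polynomial interpolates 8 noisy points exactly, even though the underlying signal is quadratic (the data here are synthetic, generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
xs = np.linspace(0, 1, 8)
ys = xs**2 + rng.normal(0, 0.3, size=len(xs))  # quadratic signal + noise

# A degree-7 polynomial passes through all 8 points (training fit is exact)
c_hi = np.polyfit(xs, ys, 7)
print("max residual (degree 7):", np.max(np.abs(np.polyval(c_hi, xs) - ys)))

# Compare both fits between the training points, where overfitting shows up
x_mid = 0.5 * (xs[:-1] + xs[1:])
c_lo = np.polyfit(xs, ys, 2)
print("degree 7 at midpoints:", np.round(np.polyval(c_hi, x_mid), 2))
print("degree 2 at midpoints:", np.round(np.polyval(c_lo, x_mid), 2))
```

The degree-7 curve reproduces the noise exactly on the training points and swings between them; the degree-2 curve tracks the underlying trend.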
No. Polynomial regression is linear in its parameters (coefficients) even though the curve is non-linear. True non-linear regression involves parameters that appear non-linearly, like y = a·e^(bx). Polynomial regression can be solved with the normal equation; non-linear regression requires iterative methods.
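The "linear in parameters" point can be made concrete: the degree-2 fit reduces to ordinary least squares on a design matrix whose columns are 1, x, and x². This sketch uses the same data as above, with np.linalg.lstsq standing in for the normal equation (it solves the same least-squares problem, more stably):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 5.0, 10.2, 17.5, 27.0, 38.5, 52.1, 68.0, 86.2, 106.5])

# Design matrix with columns [1, x, x²]: the model y = Xβ is linear in β
X = np.vander(x, 3, increasing=True)

# Ordinary least squares — exactly what the normal equation computes
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("β₀, β₁, β₂ =", np.round(beta, 4))

# Same coefficients as np.polyfit (which returns them highest power first)
print("polyfit:   ", np.round(np.polyfit(x, y, 2)[::-1], 4))
```

A genuinely non-linear model such as y = a·e^(bx) admits no such design matrix, which is why it needs iterative solvers.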
numpy.polyfit(x, y, degree) uses least squares to fit a polynomial. It returns coefficients from highest to lowest degree: [βₐ, βₐ₋₁, …, β₁, β₀]. Use numpy.polyval(coeffs, x) to evaluate predictions. For more control, use numpy.polynomial.polynomial or sklearn PolynomialFeatures + LinearRegression.
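The sklearn route mentioned above can be sketched like this, again with the same data (scikit-learn must be installed; include_bias=False is one reasonable choice here, letting LinearRegression supply the intercept instead of the feature expansion):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.1, 5.0, 10.2, 17.5, 27.0, 38.5, 52.1, 68.0, 86.2, 106.5])

# Expand the single feature x into [x, x²]
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x.reshape(-1, 1))
model = LinearRegression().fit(X_poly, y)

print("intercept:", round(model.intercept_, 4))       # ≙ β₀
print("coefficients:", np.round(model.coef_, 4))      # ≙ [β₁, β₂]
print("R²:", round(model.score(X_poly, y), 6))
```

This gives the same fit as np.polyfit but scales naturally to multiple input features and pipelines.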