Fit a multiple-predictor OLS model: coefficients, R², adjusted R², t-values, and Python code using sklearn and statsmodels.
**Coefficient estimates**

| Variable | Coefficient | Std Error | t-value |
|---|---|---|---|
| Intercept | -88.7238 | - | - |
| Sqft | 0.27 | 0.1043 | 2.589 |
| Beds | -19.2127 | 34.634 | -0.555 |

**Fitted values and residuals**

| # | Actual | Predicted | Residual |
|---|---|---|---|
| 1 | 245 | 231.7 | 13.3 |
| 2 | 312 | 285.71 | 26.29 |
| 3 | 279 | 293.5 | -14.5 |
| 4 | 308 | 340.76 | -32.76 |
| 5 | 399 | 382.31 | 16.69 |
| 6 | 260 | 272.21 | -12.21 |
| 7 | 340 | 347.51 | -7.51 |
| 8 | 420 | 409.31 | 10.69 |
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import statsmodels.api as sm

# Data - each row: [Sqft, Beds, y]
data = np.array([
    [1400, 3, 245],
    [1600, 3, 312],
    [1700, 4, 279],
    [1875, 4, 308],
    [2100, 5, 399],
    [1550, 3, 260],
    [1900, 4, 340],
    [2200, 5, 420],
])
X = data[:, :2]  # Sqft, Beds
y = data[:, 2]   # target

# sklearn
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
print("Coefficients:", dict(zip(["Sqft", "Beds"], model.coef_)))
print(f"Intercept: {model.intercept_:.4f}")
print(f"R²: {r2_score(y, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y, y_pred)):.4f}")

# statsmodels (with standard errors, t-values, and p-values)
X_sm = sm.add_constant(X)
ols = sm.OLS(y, X_sm).fit()
print(ols.summary())
```

Multiple linear regression models the relationship between a dependent variable (y) and two or more independent variables (x₁, x₂, …). It finds coefficients that minimize the sum of squared residuals, extending simple linear regression to multiple predictors.
Always use adjusted R² when comparing models with different numbers of predictors. Regular R² always increases when you add variables (even useless ones), while adjusted R² penalizes for additional predictors that don't improve the model, preventing overfitting.
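The penalty is easy to compute by hand from the formula Adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A minimal sketch, reusing the eight observations above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same 8 observations as above: [Sqft, Beds, y]
data = np.array([
    [1400, 3, 245], [1600, 3, 312], [1700, 4, 279], [1875, 4, 308],
    [2100, 5, 399], [1550, 3, 260], [1900, 4, 340], [2200, 5, 420],
])
X, y = data[:, :2], data[:, 2]

r2 = r2_score(y, LinearRegression().fit(X, y).predict(X))
n, p = X.shape  # n observations, p predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R²: {r2:.4f}, adjusted R²: {adj_r2:.4f}")
```

With n = 8 and p = 2 the correction factor is 7/5, so adjusted R² is always strictly below R² here; adding a useless third predictor would shrink the denominator further and lower adjusted R² even as R² ticks up.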
Key assumptions: (1) linearity between predictors and outcome, (2) independence of residuals, (3) homoscedasticity (constant variance of residuals), (4) normally distributed residuals, and (5) no perfect multicollinearity among predictors.
Each coefficient βⱼ represents the expected change in y for a one-unit increase in xⱼ, holding all other predictors constant. The intercept β₀ is the expected y when all predictors are zero.
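The "holding all other predictors constant" reading can be verified numerically: predict two points that differ by exactly one unit in a single predictor, and the gap between the predictions is that predictor's coefficient. A sketch with the model fitted above (the two houses are hypothetical inputs):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

data = np.array([
    [1400, 3, 245], [1600, 3, 312], [1700, 4, 279], [1875, 4, 308],
    [2100, 5, 399], [1550, 3, 260], [1900, 4, 340], [2200, 5, 420],
])
model = LinearRegression().fit(data[:, :2], data[:, 2])

# Two hypothetical houses identical except for +1 sqft
lo, hi = model.predict(np.array([[1800.0, 4.0], [1801.0, 4.0]]))
print(hi - lo)  # matches model.coef_[0], the Sqft coefficient
```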
Multicollinearity occurs when predictors are highly correlated with one another. It inflates standard errors, makes coefficients unstable, and can flip their signs. Check it with the Variance Inflation Factor (VIF); values above roughly 5-10 indicate problematic multicollinearity.