Fit a multiple-predictor OLS model: coefficients, R², adjusted R², t-values, and Python code using sklearn and statsmodels.
**Coefficient estimates**

| Variable | Coefficient | Std Error | t-value |
|---|---|---|---|
| Intercept | -88.7238 | - | - |
| Sqft | 0.27 | 0.1043 | 2.589 |
| Beds | -19.2127 | 34.634 | -0.555 |

**Fitted values and residuals**

| # | Actual | Predicted | Residual |
|---|---|---|---|
| 1 | 245 | 231.7 | 13.3 |
| 2 | 312 | 285.71 | 26.29 |
| 3 | 279 | 293.5 | -14.5 |
| 4 | 308 | 340.76 | -32.76 |
| 5 | 399 | 382.31 | 16.69 |
| 6 | 260 | 272.21 | -12.21 |
| 7 | 340 | 347.51 | -7.51 |
| 8 | 420 | 409.31 | 10.69 |
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import statsmodels.api as sm

# Data - each row: [Sqft, Beds, y]
data = np.array([
    [1400, 3, 245],
    [1600, 3, 312],
    [1700, 4, 279],
    [1875, 4, 308],
    [2100, 5, 399],
    [1550, 3, 260],
    [1900, 4, 340],
    [2200, 5, 420],
])
X = data[:, :2]  # Sqft, Beds
y = data[:, 2]   # target

# sklearn
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
print("Coefficients:", dict(zip(["Sqft", "Beds"], model.coef_)))
print(f"Intercept: {model.intercept_:.4f}")
print(f"R²: {r2_score(y, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y, y_pred)):.4f}")

# statsmodels (with standard errors, t-values, and p-values)
X_sm = sm.add_constant(X)
ols = sm.OLS(y, X_sm).fit()
print(ols.summary())
```

Multiple linear regression models the relationship between a dependent variable (y) and two or more independent variables (x₁, x₂, …). It finds coefficients that minimize the sum of squared residuals, extending simple linear regression to multiple predictors.
Always use adjusted R² when comparing models with different numbers of predictors. Regular R² always increases when you add variables (even useless ones), while adjusted R² penalizes for additional predictors that don't improve the model, preventing overfitting.
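The penalty is easy to compute by hand from the formula Adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A minimal sketch, reusing the eight observations above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same 8 observations as above: [Sqft, Beds, y]
data = np.array([
    [1400, 3, 245], [1600, 3, 312], [1700, 4, 279], [1875, 4, 308],
    [2100, 5, 399], [1550, 3, 260], [1900, 4, 340], [2200, 5, 420],
])
X, y = data[:, :2], data[:, 2]

r2 = r2_score(y, LinearRegression().fit(X, y).predict(X))
n, p = X.shape  # n observations, p predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(f"R²: {r2:.4f}, adjusted R²: {adj_r2:.4f}")
```

With n = 8 and p = 2 the correction factor is 7/5, so adjusted R² is always strictly below R² here; adding a useless third predictor would shrink the denominator further and lower adjusted R² even as R² ticks up.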
Key assumptions: (1) linearity between predictors and outcome, (2) independence of residuals, (3) homoscedasticity (constant variance of residuals), (4) normally distributed residuals, and (5) no perfect multicollinearity among predictors.
Each coefficient βⱼ represents the expected change in y for a one-unit increase in xⱼ, holding all other predictors constant. The intercept β₀ is the expected y when all predictors are zero.
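The "holding all other predictors constant" reading can be verified numerically: predict two points that differ by exactly one unit in a single predictor, and the gap between the predictions is that predictor's coefficient. A sketch with the model fitted above (the two houses are hypothetical inputs):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

data = np.array([
    [1400, 3, 245], [1600, 3, 312], [1700, 4, 279], [1875, 4, 308],
    [2100, 5, 399], [1550, 3, 260], [1900, 4, 340], [2200, 5, 420],
])
model = LinearRegression().fit(data[:, :2], data[:, 2])

# Two hypothetical houses identical except for +1 sqft
lo, hi = model.predict(np.array([[1800.0, 4.0], [1801.0, 4.0]]))
print(hi - lo)  # matches model.coef_[0], the Sqft coefficient
```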
Multicollinearity occurs when predictors are highly correlated with one another. It inflates standard errors, makes coefficients unstable, and can flip their signs. Check it with the Variance Inflation Factor (VIF); values above roughly 5-10 indicate problematic multicollinearity.