Ordinary Least Squares and the Line of Best Fit

Direct Solution Method

Ordinary Least Squares (OLS) is the standard way to choose the “best” line in linear regression by minimising squared prediction errors.

Key takeaway: OLS defines “best fit” as the line that minimises the total squared residual error across all data points.


Predicted Value and Residual

For each data point \((x_i, y_i)\), the model predicts:

\[ \hat{y}_i = \beta_0 + \beta_1 x_i \]

The residual (error) is:

\[ r_i = y_i - \hat{y}_i \]

Residual intuition: A positive residual means the point lies above the line, a negative residual means it lies below, and a zero residual means the point is exactly on the line.
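The definitions above can be sketched in a few lines of NumPy. The data points and the line parameters (`beta_0 = 0`, `beta_1 = 2`) are made up for illustration and are not a fitted result.

```python
import numpy as np

# Hypothetical data and line parameters, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.5, 5.0, 8.5])
beta_0, beta_1 = 0.0, 2.0

y_hat = beta_0 + beta_1 * x   # predicted values: y_hat_i = beta_0 + beta_1 * x_i
residuals = y - y_hat         # residuals: r_i = y_i - y_hat_i

for xi, ri in zip(x, residuals):
    # The sign of the residual tells us which side of the line the point is on.
    position = "above" if ri > 0 else "below" if ri < 0 else "on"
    print(f"x={xi:.1f}: residual={ri:+.2f} (point is {position} the line)")
```

With these assumed values the residuals are 0, +0.5, -1 and +0.5, so one point sits on the line, two sit above it and one sits below.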


Sum of Squared Errors (SSE)

OLS chooses \(\beta_0\) and \(\beta_1\) to minimise the Sum of Squared Errors:

\[ SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

Why square the residuals:

  • Prevents positive and negative errors cancelling out.
  • Penalises large errors more strongly.
  • Produces a smooth (differentiable) and convex objective, so the minimum can be found in closed form or with gradient-based methods.
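A small sketch of the first point in the list above, assuming three made-up data points that lie exactly on the line y = x. The helper name `sse` is hypothetical. A flat line through the mean has residuals that sum to zero even though it fits poorly, whereas the SSE separates the two candidate lines.

```python
import numpy as np

def sse(beta_0, beta_1, x, y):
    """Sum of squared errors for the line y_hat = beta_0 + beta_1 * x."""
    residuals = y - (beta_0 + beta_1 * x)
    return float(np.sum(residuals ** 2))

# Hypothetical data lying exactly on y = x.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0])

# Candidate 1: a flat line y_hat = 1 (intercept 1, slope 0).
flat_residuals = y - (1.0 + 0.0 * x)    # residuals are -1, 0, +1
raw_sum_flat = float(flat_residuals.sum())

sse_flat = sse(1.0, 0.0, x, y)          # squared residuals: 1 + 0 + 1
sse_true = sse(0.0, 1.0, x, y)          # the true line has no error

print(raw_sum_flat, sse_flat, sse_true)
```

The raw residual sum for the flat line is 0 because the errors cancel, while its SSE is 2 and the SSE of the true line is 0.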

Closed-Form Solution (Single Predictor)

When there is a single predictor variable and the \(x_i\) are not all identical (so that \(\mathrm{var}(x) \neq 0\)), the solution can be written using covariance and variance:

\[ \beta_1 = \frac{\mathrm{cov}(x,y)}{\mathrm{var}(x)} \]

And the intercept:

\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]

Interpretation: \(\beta_1\) is the slope (how much \(y\) changes per unit change in \(x\)). \(\beta_0\) is the intercept (the predicted value when \(x = 0\)).
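As a sketch, the two formulas can be evaluated directly with NumPy. The five data points are invented for illustration. One detail worth noting is that the covariance and the variance should use the same normalisation (here `ddof=1` for both), since the normalising factor then cancels in the ratio.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Use the same ddof for covariance and variance so the ratio is consistent.
cov_xy = np.cov(x, y, ddof=1)[0, 1]   # off-diagonal entry of the 2x2 covariance matrix
var_x = np.var(x, ddof=1)

beta_1 = cov_xy / var_x                   # slope
beta_0 = y.mean() - beta_1 * x.mean()     # intercept

# Cross-check against NumPy's degree-1 polynomial fit, which returns [slope, intercept].
slope_check, intercept_check = np.polyfit(x, y, 1)

print(beta_1, beta_0)
```

For this data the slope comes out at 0.6 and the intercept at 2.2, matching `np.polyfit` up to floating-point rounding.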


Matrix View (General Linear Regression Framework)

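Fitting a model can be expressed as an overdetermined system, with more equations (data points) than unknown parameters. Such a system usually has no exact solution, so least squares instead looks for the \(p\) that minimises \(\lVert Ap - b \rVert^2\):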

\[ Ap = b \]

Where:

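  • \(A\) is the design matrix, with one row per data point (for a straight-line fit, each row is \([1, x_i]\)).
  • \(p\) is the parameter vector (for a straight-line fit, \([\beta_0, \beta_1]^T\)).
  • \(b\) is the output vector of observed values \(y_i\).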

The least-squares solution satisfies the normal equations:

\[ A^T A p = A^T b \]

And, when \(A^T A\) is invertible (that is, when \(A\) has full column rank):

\[ p^* = (A^T A)^{-1}A^T b \]
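A minimal sketch of the matrix view for a straight-line fit, using made-up data. It solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, and compares the result with `np.linalg.lstsq`, which is generally the more numerically robust choice when \(A^T A\) is ill-conditioned.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 4.0, 5.0, 4.0, 5.0])   # output vector

# Design matrix: a column of ones (intercept) and a column of x values (slope).
A = np.column_stack([np.ones_like(x), x])

# Normal equations: (A^T A) p = A^T b, solved as a linear system.
p_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same least-squares problem solved by NumPy's dedicated routine.
p_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(p_normal)   # [intercept, slope]
print(p_lstsq)
```

Both approaches give an intercept of 2.2 and a slope of 0.6 for this data, up to floating-point rounding.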

Practical Meaning

OLS is:

  • Fast and standard for many regression problems.
  • Sensitive to outliers (because squaring makes large errors dominate).
  • A foundation for extensions like regularisation (ridge, LASSO).
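The outlier sensitivity mentioned in the list above can be illustrated with a rough sketch. The data is synthetic and assumed to lie exactly on y = 1 + 2x, and the size of the corruption (+50 on the last point) is arbitrary.

```python
import numpy as np

# Synthetic data lying exactly on y = 1 + 2x (an assumption for illustration).
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x

# Fit on clean data: np.polyfit with degree 1 returns [slope, intercept].
slope_clean, intercept_clean = np.polyfit(x, y, 1)

# Corrupt a single point with a large error and refit.
y_outlier = y.copy()
y_outlier[-1] += 50.0
slope_out, intercept_out = np.polyfit(x, y_outlier, 1)

print(f"clean:   slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"outlier: slope={slope_out:.3f}, intercept={intercept_out:.3f}")
```

A single corrupted point out of ten more than doubles the fitted slope here, because its squared residual dominates the SSE.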
