Gradient Descent for Linear Regression #

Iterative Method – Gradient Descent (batch/stochastic/mini-batch)

Gradient descent is an iterative optimisation method used to minimise the regression cost function by repeatedly updating parameters in the direction that reduces error.

Key takeaway: Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.


Cost Function #

A common cost function for linear regression is the Sum of Squared Errors:

\[ J(\beta_1,\beta_0) = \sum_{i=1}^{n}(\beta_1 x_i + \beta_0 - y_i)^2 \]

A commonly used variant, which has the same minimiser, is the Mean Squared Error:

\[ J(w,b) = \frac{1}{n}\sum_{i=1}^{n}(w x_i + b - y_i)^2 \]

Sometimes the factor \(\frac{1}{2n}\) is used instead of \(\frac{1}{n}\), so that the 2 produced by differentiating the square cancels.
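
To make the cost concrete, here is a minimal NumPy sketch of the MSE above (the function name `mse_cost` and the toy data are illustrative, not from the text):

```python
import numpy as np

def mse_cost(w, b, x, y):
    """Mean squared error J(w, b) for the line y_hat = w*x + b."""
    residuals = w * x + b - y          # prediction errors
    return np.mean(residuals ** 2)     # (1/n) * sum of squared errors

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(mse_cost(2.0, 0.0, x, y))        # small value: the line y = 2x nearly fits
```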


Gradients (Partial Derivatives) #

With the \(\frac{1}{2n}\) scaling, the factor of 2 cancels and the partial derivatives are:

\[ \frac{\partial J}{\partial \beta_1} = \frac{1}{n}\sum_{i=1}^{n}(\beta_1 x_i + \beta_0 - y_i)x_i \]

\[ \frac{\partial J}{\partial \beta_0} = \frac{1}{n}\sum_{i=1}^{n}(\beta_1 x_i + \beta_0 - y_i) \]

Gradient meaning: the gradient points in the direction of steepest increase of the cost, so stepping against it reduces the cost fastest.
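
A direct translation of these two derivatives into NumPy might look like the following sketch (the function and variable names are ours):

```python
import numpy as np

def gradients(b1, b0, x, y):
    """Partial derivatives of the 1/(2n)-scaled cost w.r.t. beta_1 and beta_0."""
    n = len(x)
    errors = b1 * x + b0 - y            # (beta_1 * x_i + beta_0 - y_i)
    dJ_db1 = np.sum(errors * x) / n     # (1/n) * sum(errors * x_i)
    dJ_db0 = np.sum(errors) / n         # (1/n) * sum(errors)
    return dJ_db1, dJ_db0
```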


Update Equations #

Starting from initial values, parameters are updated each iteration:

\[ \beta_1^{(i+1)} := \beta_1^{(i)} - \alpha \frac{\partial J}{\partial \beta_1} \]

\[ \beta_0^{(i+1)} := \beta_0^{(i)} - \alpha \frac{\partial J}{\partial \beta_0} \]

Where:

  • \(\alpha\) is the learning rate.
  • \(i\) is the iteration number.
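
Putting the gradients and updates together, a minimal batch-gradient-descent loop might look like this (the learning rate, iteration count, and toy data are illustrative choices, not from the text):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for the line y_hat = beta_1 * x + beta_0."""
    b1, b0 = 0.0, 0.0                     # initial parameter values
    n = len(x)
    for _ in range(iterations):
        errors = b1 * x + b0 - y
        grad_b1 = np.sum(errors * x) / n  # dJ/d(beta_1)
        grad_b0 = np.sum(errors) / n      # dJ/d(beta_0)
        b1 -= alpha * grad_b1             # update equations above
        b0 -= alpha * grad_b0
    return b1, b0

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])        # roughly y = 2x + 1
print(gradient_descent(x, y))             # approaches about (1.94, 1.15)
```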

Matrix/Vector Form (Least Squares + Gradient Descent) #

For the overdetermined system \(Ap = b\), the SSE can be written as:

\[ SSE = (Ap-b)^T(Ap-b) \]

Its gradient:

\[ \nabla_p(SSE) = 2A^TAp - 2A^Tb \]

Gradient descent update:

\[ p^{(i+1)} = p^{(i)} - \alpha\left(2A^TAp^{(i)} - 2A^Tb\right) \]
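
In NumPy, the vectorised update might be implemented as follows (the data, step size, and iteration count are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([3.1, 4.9, 7.2, 8.8])        # observations (the "b" in Ap = b)
A = np.column_stack([x, np.ones_like(x)]) # design matrix: x column + intercept column

p = np.zeros(2)                           # initial parameters p = (slope, intercept)
alpha = 0.01
for _ in range(5000):
    grad = 2 * A.T @ (A @ p - b)          # equals 2*A^T*A*p - 2*A^T*b
    p -= alpha * grad
print(p)                                  # approaches the least-squares solution
```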

Choosing the Learning Rate \(\alpha\) #

If \(\alpha\) is too large: you can overshoot the minimum, and the cost may oscillate or diverge.

If \(\alpha\) is too small: training is very slow and may need many iterations.

Practical habit: Monitor the cost \(J\) across iterations and ensure it decreases steadily.
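
As a quick sanity check, one might compare cost traces for different learning rates (the values 0.01 and 0.5 below are arbitrary illustrations):

```python
import numpy as np

def run(alpha, iterations=50):
    """Track the cost J across iterations for a given learning rate."""
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.1, 4.9, 7.2, 8.8])
    b1, b0, costs = 0.0, 0.0, []
    n = len(x)
    for _ in range(iterations):
        errors = b1 * x + b0 - y
        costs.append(np.mean(errors ** 2))   # record J before each update
        b1 -= alpha * np.sum(errors * x) / n
        b0 -= alpha * np.sum(errors) / n
    return costs

print(run(0.01)[:5])   # cost decreases steadily
print(run(0.5)[:5])    # too large: cost grows and diverges
```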


References #

  • /docs/ai/machine-learning/03-linear-models-regression/
  • /docs/ai/machine-learning/03-ordinary-least-squares/
