Gradient Descent for Linear Regression #
Iterative Method – Gradient Descent (with batch, stochastic, and mini-batch variants)
Gradient descent is an iterative optimisation method used to minimise the regression cost function by repeatedly updating parameters in the direction that reduces error.
Key takeaway: Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.
Cost Function #
A common cost function for linear regression is the Sum of Squared Errors:
\[ J(\beta_1,\beta_0) = \sum_{i=1}^{n}(\beta_1 x_i + \beta_0 - y_i)^2 \]

An equivalent form often used is the Mean Squared Error:

\[ J(w,b) = \frac{1}{n}\sum_{i=1}^{n}(w x_i + b - y_i)^2 \]

Sometimes \(\frac{1}{2n}\) is used in place of \(\frac{1}{n}\) so that the factor of 2 produced by differentiation cancels.
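As a rough sketch, the MSE cost can be computed directly in NumPy (the function name and sample data here are illustrative, not from the original):

```python
import numpy as np

def mse(w, b, x, y):
    """Mean squared error of the line w*x + b against targets y."""
    residuals = w * x + b - y
    return np.mean(residuals ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x, so the true line has zero cost
print(mse(2.0, 0.0, x, y))  # perfect fit: cost is 0.0
```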
Gradients (Partial Derivatives) #
With the \(\frac{1}{2n}\) scaling of the cost (so the factor of 2 cancels), the partial derivatives are:

\[ \frac{\partial J}{\partial \beta_1} = \frac{1}{n}\sum_{i=1}^{n}(\beta_1 x_i + \beta_0 - y_i)\,x_i \]

\[ \frac{\partial J}{\partial \beta_0} = \frac{1}{n}\sum_{i=1}^{n}(\beta_1 x_i + \beta_0 - y_i) \]

Gradient meaning: the gradient points in the direction of steepest increase of the cost, so stepping against it reduces the cost fastest.
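These partial derivatives translate almost line for line into NumPy. A minimal sketch (the helper name and test data are assumptions for illustration):

```python
import numpy as np

def gradients(b1, b0, x, y):
    """Partial derivatives of the 1/(2n)-scaled squared-error cost."""
    residuals = b1 * x + b0 - y
    d_b1 = np.mean(residuals * x)  # dJ/d(beta_1)
    d_b0 = np.mean(residuals)      # dJ/d(beta_0)
    return d_b1, d_b0

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])
print(gradients(0.0, 0.0, x, y))  # negative gradients: increase both parameters
```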
Update Equations #
Starting from initial values, the parameters are updated at each iteration:

\[ \beta_1^{(i+1)} := \beta_1^{(i)} - \alpha \frac{\partial J}{\partial \beta_1} \]

\[ \beta_0^{(i+1)} := \beta_0^{(i)} - \alpha \frac{\partial J}{\partial \beta_0} \]

where:

- \(\alpha\) is the learning rate.
- \(i\) is the iteration number.
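Putting the gradients and update equations together gives a complete batch gradient descent loop. A minimal sketch, assuming synthetic data from the line \(y = 2x + 1\) (the function name, learning rate, and iteration count are illustrative choices):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iters=1000):
    """Fit y ≈ b1*x + b0 by batch gradient descent on the 1/(2n)-scaled cost."""
    b1, b0 = 0.0, 0.0  # initial parameter values
    for _ in range(iters):
        residuals = b1 * x + b0 - y
        d_b1 = np.mean(residuals * x)
        d_b0 = np.mean(residuals)
        b1 -= alpha * d_b1  # update in the direction that reduces the cost
        b0 -= alpha * d_b0
    return b1, b0

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # noiseless data from y = 2x + 1
b1, b0 = gradient_descent(x, y)
print(b1, b0)  # approaches slope 2 and intercept 1
```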
Matrix/Vector Form (Least Squares + Gradient Descent) #
For the overdetermined system \(Ap = b\), the SSE can be written as:

\[ \mathrm{SSE} = (Ap-b)^T(Ap-b) \]

Its gradient with respect to \(p\) is:

\[ \nabla_p(\mathrm{SSE}) = 2A^TAp - 2A^Tb \]

and the gradient descent update becomes:

\[ p^{(i+1)} = p^{(i)} - \alpha\left(2A^TAp^{(i)} - 2A^Tb\right) \]

Choosing the Learning Rate \(\alpha\) #
If \(\alpha\) is too large, you can overshoot the minimum: the cost may oscillate or even diverge.

If \(\alpha\) is too small, training is very slow and many iterations may be needed to converge.
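Both failure modes can be seen with the matrix-form update from the previous section. A sketch under illustrative assumptions (sample data, the two learning rates, and the helper name are all chosen for demonstration, not prescribed):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
b = 2.0 * x + 1.0                          # targets from the true line y = 2x + 1
A = np.column_stack([x, np.ones_like(x)])  # design matrix for A p ≈ b

def run(alpha, iters=2000):
    """Matrix-form gradient descent: p ← p − α(2AᵀAp − 2Aᵀb)."""
    p = np.zeros(2)
    for _ in range(iters):
        p = p - alpha * (2 * A.T @ A @ p - 2 * A.T @ b)
        if not np.all(np.isfinite(p)):
            break  # the iterates blew up: alpha is too large
    return p

good = run(0.02)  # small enough: converges towards p = [2, 1]
bad = run(0.2)    # too large: overshoots every step and diverges
```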
Practical habit: monitor the cost \(J\) across iterations and check that it decreases steadily.
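This monitoring habit can be sketched by recording the cost at every iteration (the data and hyperparameters below are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
b1, b0, alpha = 0.0, 0.0, 0.05

history = []
for _ in range(200):
    residuals = b1 * x + b0 - y
    history.append(np.mean(residuals ** 2))  # record J before each update
    b1 -= alpha * np.mean(residuals * x)
    b0 -= alpha * np.mean(residuals)

# With a well-chosen alpha the recorded cost falls steadily.
print(history[0], history[-1])
```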
References #
- /docs/ai/machine-learning/03-linear-models-regression/
- /docs/ai/machine-learning/03-ordinary-least-squares/