February 21, 2026
Direct solution method - Ordinary Least Squares and the Line of Best Fit
It is possible to compute the best parameters for linear regression in one shot (closed-form),
instead of iteratively improving them step-by-step.
For linear regression, the direct method is usually Ordinary Least Squares (OLS).
OLS chooses the “best” line by minimising the sum of squared prediction errors (residuals).
Key takeaway:
OLS defines “best fit” as the line that minimises the total squared residual error across all data points.
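To make the "one shot" idea concrete, here is a minimal sketch of an OLS fit for a single feature, assuming NumPy is available. The function name `fit_ols` and the toy data are made up for illustration; `np.linalg.lstsq` is used as the solver, which is one common way to get the least-squares solution rather than the only one.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y ≈ intercept + slope * x by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (for the intercept) next to the feature.
    X = np.column_stack([np.ones_like(x), x])
    # lstsq minimises the sum of squared residuals ||X @ beta - y||^2 directly,
    # with no learning rate and no loop over update steps.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, slope = beta
    return intercept, slope

# Toy data lying exactly on y = 1 + 2x (hypothetical, for illustration only).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_ols(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Because the toy points sit exactly on a line, the fitted intercept and slope should recover that line up to floating-point error. With noisy data the same call returns the line with the smallest total squared residual.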
February 21, 2026
Gradient Descent for Linear Regression
Gradient descent is an iterative optimisation method used to minimise the regression cost function by repeatedly updating parameters in the direction that reduces error.
- Iterative method
- Types: batch / stochastic / mini-batch
Key takeaway:
Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.
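The loop described above can be sketched as follows for a single-feature model with mean squared error as the cost. This is a rough illustration, not a tuned implementation: the function name `gradient_descent`, the default `learning_rate`, `epochs`, and `tol` values, and the toy data are all assumptions chosen so the example converges.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, epochs=2000, tol=1e-12):
    """Batch gradient descent for y ≈ w * x + b with mean squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = 0.0, 0.0                  # initial parameter values
    prev_cost = np.inf
    for _ in range(epochs):
        error = (w * x + b) - y      # prediction error on every sample
        cost = np.mean(error ** 2)   # mean squared error
        # Stop once the cost no longer decreases by a meaningful amount.
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Gradient of the cost with respect to w and b.
        grad_w = 2.0 * np.mean(error * x)
        grad_b = 2.0 * np.mean(error)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data lying exactly on y = 1 + 2x (hypothetical, for illustration only).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(x, y)
print(f"w={w:.4f}, b={b:.4f}")
```

The learning rate matters here: too large and the cost grows instead of shrinking, too small and the loop needs many more epochs to get close to the OLS answer.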
flowchart TD
GD["Gradient<br/>Descent"] -->|minimises| CF["Cost<br/>function"]
GD -->|updates| W["Parameters<br/>(weights)"]
GD -->|uses| GR["Gradient<br/>(slope)"]
GD --> H["Hyperparameters"]
H --> LR["Learning<br/>rate"]
H --> BS["Batch<br/>size"]
H --> EP["Epochs"]
style GD fill:#90CAF9,stroke:#1E88E5,color:#000
style CF fill:#CE93D8,stroke:#8E24AA,color:#000
style W fill:#CE93D8,stroke:#8E24AA,color:#000
style GR fill:#CE93D8,stroke:#8E24AA,color:#000
style H fill:#CE93D8,stroke:#8E24AA,color:#000
style LR fill:#CE93D8,stroke:#8E24AA,color:#000
style BS fill:#CE93D8,stroke:#8E24AA,color:#000
style EP fill:#CE93D8,stroke:#8E24AA,color:#000
Types of GD
flowchart TD
T["Gradient Descent<br/>types"] --> BGD["Batch<br/>GD"]
T --> SGD["Stochastic<br/>GD"]
T --> MGD["Mini-batch<br/>GD"]
BGD --> ALL["All data<br/>per step"]
BGD --> STB["Smooth<br/>updates"]
SGD --> ONE["1 sample<br/>per step"]
SGD --> FAST["Quick<br/>progress"]
SGD --> NOISE["Noisy<br/>updates"]
MGD --> MB["Small batch<br/>per step"]
MGD --> PRACT["Practical<br/>default"]
style T fill:#90CAF9,stroke:#1E88E5,color:#000
style BGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style SGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style MGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style ALL fill:#CE93D8,stroke:#8E24AA,color:#000
style STB fill:#CE93D8,stroke:#8E24AA,color:#000
style ONE fill:#CE93D8,stroke:#8E24AA,color:#000
style FAST fill:#CE93D8,stroke:#8E24AA,color:#000
style NOISE fill:#CE93D8,stroke:#8E24AA,color:#000
style MB fill:#CE93D8,stroke:#8E24AA,color:#000
style PRACT fill:#CE93D8,stroke:#8E24AA,color:#000
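One way to see that the three types differ only in how many samples feed each update is a single loop with a `batch_size` parameter. The sketch below assumes NumPy; the function name `minibatch_gd`, the hyperparameter defaults, and the noiseless toy data are hypothetical choices made for illustration.

```python
import numpy as np

def minibatch_gd(x, y, batch_size, learning_rate=0.1, epochs=2000, seed=0):
    """Gradient descent for y ≈ w * x + b; batch_size selects the variant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(n)               # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = (w * x[idx] + b) - y[idx]    # error on this batch only
            w -= learning_rate * 2.0 * np.mean(error * x[idx])
            b -= learning_rate * 2.0 * np.mean(error)
    return w, b

# Noiseless toy data on y = 1 + 2x (hypothetical, for illustration only).
x = np.linspace(0.0, 1.0, 8)
y = 1.0 + 2.0 * x

results = {}
for name, size in [("batch", len(x)), ("stochastic", 1), ("mini-batch", 4)]:
    # batch_size = n is batch GD, 1 is stochastic GD, anything between is mini-batch.
    results[name] = minibatch_gd(x, y, batch_size=size)
    w, b = results[name]
    print(f"{name:>10}: w={w:.4f}, b={b:.4f}")
```

With these settings, batch GD makes one update per epoch, stochastic GD makes eight, and mini-batch GD makes two. The data is noiseless, so all three settle on the same line; on noisy data and with a fixed learning rate, the stochastic updates would keep fluctuating around the minimum instead of settling exactly.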
Batch
- Every update needs a pass over the entire dataset, so use it only if you have huge compute and a lot of time to train
SGD