Gradient Descent
Gradient Descent for Linear Regression #
Gradient descent is an iterative optimisation method that minimises the regression cost function by repeatedly updating the parameters in the direction opposite the gradient, the direction in which the error falls fastest.
- Iterative method
- Types: batch / stochastic / mini-batch
Key takeaway: Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.
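Concretely, each step applies the standard update rule below (a conventional formulation; $\alpha$ is the learning rate, $m$ the number of training examples, and $h_\theta$ the linear hypothesis, none of which are defined elsewhere in these notes):

$$
\theta_j \leftarrow \theta_j - \alpha \,\frac{\partial J(\theta)}{\partial \theta_j},
\qquad
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
$$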
```mermaid
flowchart TD
    GD["Gradient<br/>Descent"] -->|minimises| CF["Cost<br/>function"]
    GD -->|updates| W["Parameters<br/>(weights)"]
    GD -->|uses| GR["Gradient<br/>(slope)"]
    GD --> H["Hyperparameters"]
    H --> LR["Learning<br/>rate"]
    H --> BS["Batch<br/>size"]
    H --> EP["Epochs"]
    style GD fill:#90CAF9,stroke:#1E88E5,color:#000
    style CF fill:#CE93D8,stroke:#8E24AA,color:#000
    style W fill:#CE93D8,stroke:#8E24AA,color:#000
    style GR fill:#CE93D8,stroke:#8E24AA,color:#000
    style H fill:#CE93D8,stroke:#8E24AA,color:#000
    style LR fill:#CE93D8,stroke:#8E24AA,color:#000
    style BS fill:#CE93D8,stroke:#8E24AA,color:#000
    style EP fill:#CE93D8,stroke:#8E24AA,color:#000
```
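To make the loop concrete, here is a minimal NumPy sketch of batch gradient descent for simple linear regression (function and variable names are illustrative, not from any library):

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=1000):
    """Fit y ≈ X @ w + b by minimising mean squared error."""
    m, n = X.shape
    w = np.zeros(n)                  # weights, initialised to zero
    b = 0.0                          # bias (intercept)
    for _ in range(epochs):
        y_hat = X @ w + b            # predictions on the full batch
        error = y_hat - y            # residuals
        grad_w = (X.T @ error) / m   # ∂J/∂w for the MSE cost
        grad_b = error.mean()        # ∂J/∂b
        w -= lr * grad_w             # step opposite the gradient
        b -= lr * grad_b
    return w, b

# Toy usage: recover the line y = 3x + 2 from noisy samples
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)
w, b = batch_gradient_descent(X, y, lr=0.02, epochs=2000)
print(w, b)  # expect roughly [3.0] and 2.0
```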
Types of GD #
```mermaid
flowchart TD
    T["Gradient Descent<br/>types"] --> BGD["Batch<br/>GD"]
    T --> SGD["Stochastic<br/>GD"]
    T --> MGD["Mini-batch<br/>GD"]
    BGD --> ALL["All data<br/>per step"]
    BGD --> STB["Smooth<br/>updates"]
    SGD --> ONE["1 sample<br/>per step"]
    SGD --> FAST["Quick<br/>progress"]
    SGD --> NOISE["Noisy<br/>updates"]
    MGD --> MB["Small batch<br/>per step"]
    MGD --> PRACT["Practical<br/>default"]
    style T fill:#90CAF9,stroke:#1E88E5,color:#000
    style BGD fill:#C8E6C9,stroke:#2E7D32,color:#000
    style SGD fill:#C8E6C9,stroke:#2E7D32,color:#000
    style MGD fill:#C8E6C9,stroke:#2E7D32,color:#000
    style ALL fill:#CE93D8,stroke:#8E24AA,color:#000
    style STB fill:#CE93D8,stroke:#8E24AA,color:#000
    style ONE fill:#CE93D8,stroke:#8E24AA,color:#000
    style FAST fill:#CE93D8,stroke:#8E24AA,color:#000
    style NOISE fill:#CE93D8,stroke:#8E24AA,color:#000
    style MB fill:#CE93D8,stroke:#8E24AA,color:#000
    style PRACT fill:#CE93D8,stroke:#8E24AA,color:#000
```
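The only real difference between the three variants is how many examples feed each gradient step. A hypothetical sketch where `batch_size` selects the variant:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=100, batch_size=None):
    """batch_size=None -> batch GD; 1 -> SGD; small k (e.g. 32) -> mini-batch GD."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    batch_size = batch_size or m            # None means the full batch
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(m)          # reshuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w + b - y[idx]          # residuals on this batch
            w -= lr * (X[idx].T @ error) / len(idx)  # MSE gradient w.r.t. w
            b -= lr * error.mean()                   # MSE gradient w.r.t. b
    return w, b
```

With `batch_size=m` every step is smooth but expensive; `batch_size=1` gives fast, noisy updates; a small value such as 32 is the usual mini-batch default.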
Batch #
- Computes the gradient over the entire dataset for every update; use it only when the dataset is small or you have ample compute and plenty of training time
SGD #
- The go-to solution in practice; note that the "SGD" optimisers in most libraries actually take mini-batch steps
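In practice you would reach for a library implementation rather than a hand-rolled loop. As one sketch, scikit-learn's SGDRegressor fits a linear model with SGD; the hyperparameter values below mirror the library's defaults and should be tuned per dataset:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: y = 3x + 2 with noise (same setup as the earlier sketch).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Scaling matters: SGD is sensitive to feature magnitudes.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(loss="squared_error", learning_rate="invscaling",
                 eta0=0.01, max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X, y)
print(model.predict(X[:3]))  # should track 3*x + 2 closely
```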