Gradient Descent

Gradient Descent for Linear Regression #

Revision:
Gradient descent is an iterative method that reduces the cost function step by step. It is useful when a closed-form solution is unavailable or too expensive to compute.


Where Gradient Descent Fits in ML ☆ #

Gradient descent is used when we want the model to learn parameters by repeatedly improving them.

For linear regression, it adjusts the slope and intercept until the prediction error, as measured by the cost function, stops decreasing appreciably.

flowchart LR
    A["Initial Parameters"] --> B["Make Predictions"]
    B --> C["Compute Cost"]
    C --> D["Compute Gradient"]
    D --> E["Update Parameters"]
    E --> B

    style A fill:#E1F5FE,stroke:#5b7db1,color:#000
    style B fill:#C8E6C9,stroke:#5f8f6a,color:#000
    style C fill:#FFF9C4,stroke:#b59b3b,color:#000
    style D fill:#EDE7F6,stroke:#8a6fb3,color:#000
    style E fill:#C8E6C9,stroke:#5f8f6a,color:#000
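
The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `fit_line`, the mean squared error cost with a 1/(2m) factor, the learning rate, the step count, and the toy data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0                      # initial parameters
    m = len(xs)
    for _ in range(steps):
        # Make predictions with the current slope and intercept.
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        # Gradient of the assumed cost J = (1 / (2m)) * sum(errors ** 2).
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        # Update parameters by stepping against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data that lies exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the printed slope and intercept should land very close to 2 and 1. A fixed step count is used here for simplicity; stopping when the cost change falls below a tolerance is a common alternative.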

Core Idea ☆ #

The gradient tells us the direction in which the cost increases fastest. To reduce the cost, we therefore step in the opposite direction.
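
For linear regression the gradient can be written out explicitly. The block below is a sketch under assumed notation that does not appear in the original notes: predictions $\hat{y}_i = w x_i + b$, $m$ training examples, and a mean squared error cost with the conventional factor of $\tfrac{1}{2m}$.

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( w x_i + b - y_i \right)^2

\frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( w x_i + b - y_i \right) x_i
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( w x_i + b - y_i \right)
```

The two partial derivatives together form the gradient: they say how the cost changes when the slope or the intercept is nudged upward.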

Gradient Descent and Mini-Batch Gradient Descent

Optimisation: Gradient Descent and Mini-Batch Gradient Descent #

Gradient descent is the core optimisation idea behind neural network training. It updates the model parameters by moving in the opposite direction of the gradient of the loss.

Key takeaway:
Gradient descent uses the gradient to decide how to change the parameters. The learning rate controls how large each update step is.
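
The effect of the learning rate is easiest to see on a toy problem. The sketch below assumes a one-parameter cost f(w) = w**2 (gradient 2*w); the function name `descend` and the two learning rates are hypothetical choices for illustration only.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Run gradient descent on the toy cost f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w


small = descend(0.1)   # each step multiplies w by 0.8, so w shrinks toward 0
large = descend(1.1)   # each step multiplies w by -1.2, so |w| grows
print(abs(small), abs(large))
```

With the small learning rate the parameter moves steadily toward the minimum at 0. With the large one every step overshoots the minimum by more than it started with, so the iterates diverge.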


flowchart TD
    A["Gradient Descent Variants"] --> B["Batch Gradient Descent"]
    A --> C["Stochastic Gradient Descent"]
    A --> D["Mini-batch Gradient Descent"]

    B --> B1["Uses full dataset"]
    B --> B2["One update per epoch"]
    B --> B3["Smooth but slow"]

    C --> C1["Uses one example at a time"]
    C --> C2["Frequent updates"]
    C --> C3["Fast but noisy"]

    D --> D1["Uses small batches"]
    D --> D2["Efficient on hardware"]
    D --> D3["Balanced and practical"]

    style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
    style B fill:#EDE7F6,stroke:#7E57C2
    style C fill:#C8E6C9,stroke:#43A047
    style D fill:#FFF9C4,stroke:#FBC02D
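
The three variants in the diagram differ only in how many examples feed each update, so one function with a `batch_size` parameter can cover all of them. The following is a rough sketch on a linear model with mean squared error; the function name `minibatch_gd`, the hyperparameters, and the toy data are assumptions, not part of the original notes.

```python
import random


def minibatch_gd(xs, ys, batch_size, learning_rate=0.05, epochs=2000, seed=0):
    """Fit y ~ w * x + b, updating once per batch of `batch_size` examples."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):
        rng.shuffle(indices)             # visit examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Average gradient over the current batch only.
            grad_w = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
            updates += 1
    return w, b, updates


# Toy data that lies exactly on the line y = 2x + 1.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [2 * x + 1 for x in xs]

results = {}
for name, size in [("batch", len(xs)), ("stochastic", 1), ("mini-batch", 4)]:
    results[name] = minibatch_gd(xs, ys, batch_size=size)
    w, b, updates = results[name]
    print(f"{name:>10}: w={w:.3f} b={b:.3f} updates={updates}")
```

Setting `batch_size` to the dataset size gives one update per epoch (batch gradient descent), while `batch_size=1` gives one update per example (stochastic gradient descent). Because this toy data is noise-free, all three settings should recover roughly the same line; on noisy data the smaller batch sizes would fluctuate more around the minimum.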

Gradient Descent Rule ☆ #

The gradient tells us the direction in which the loss increases fastest. To reduce the loss, we move in the opposite direction.
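
Written as a formula, the rule takes the following form. The symbols are assumed notation rather than the notes' own: $\theta_t$ for the parameters after step $t$, $\eta > 0$ for the learning rate, and $L$ for the loss.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)
```

The minus sign encodes "move in the opposite direction", and $\eta$ scales how far each step goes.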