Momentum

Momentum Methods

Optimisation: Momentum Methods #

Momentum improves gradient descent by adding a memory of previous update directions. Instead of using only the current gradient, the optimiser accumulates velocity across iterations.

Key takeaway:
Momentum helps the optimiser move faster in consistent directions and reduces zigzag movement in directions where gradients oscillate.


flowchart TD
    A["Momentum-based Optimiser"] --> B["SGD with Momentum"]

    B --> B1["Adds velocity term"]
    B --> B2["Accumulates past gradients"]
    B --> B3["Reduces zig-zag movement"]
    B --> B4["Speeds up movement in useful direction"]
    B --> B5["Helps through shallow regions"]

    B1 --> C1["Current update depends on previous update"]
    B2 --> C2["Builds inertia"]
    B3 --> C3["Smoother path to minimum"]

    style A fill:#C8E6C9,stroke:#43A047,stroke-width:2px
    style B fill:#E1F5FE,stroke:#4A90E2
    style B1 fill:#EDE7F6,stroke:#7E57C2
    style B2 fill:#FFF9C4,stroke:#FBC02D
    style B3 fill:#F8BBD0,stroke:#D81B60
    style B4 fill:#EDE7F6,stroke:#7E57C2
    style B5 fill:#FFF9C4,stroke:#FBC02D

Physical Intuition ☆ #

Momentum is often explained using the analogy of a ball rolling down a hill.