Optimisation

LNN for Regression

Linear Neural Networks for Regression

A linear neural network for regression is a model that predicts a continuous target by taking a weighted sum of input features and applying the identity activation (so the output can be any real number).

  • Single neuron for regression (predicting how much / how many)
  • Data + linear model (single neuron, no hidden layers) + squared loss
  • Training using batch gradient descent algorithm
  • Prediction (inference)
  • Example: Auto MPG (UCI) style prediction with a single neuron (from-scratch code)
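The pieces listed above (linear model, identity activation, squared loss, batch gradient descent, inference) can be sketched from scratch in a few lines. This is a minimal illustration, not the course's own code: the function names `train_linear_neuron` and `predict`, the learning rate, the epoch count, and the synthetic data (used here in place of Auto MPG) are all assumptions.

```python
import numpy as np

def train_linear_neuron(X, y, lr=0.1, epochs=500):
    """Fit y ~ X @ w + b with batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        y_hat = X @ w + b                            # identity activation: output is the weighted sum
        error = y_hat - y
        grad_w = (2.0 / n_samples) * (X.T @ error)   # d(MSE)/dw over the whole batch
        grad_b = (2.0 / n_samples) * error.sum()     # d(MSE)/db over the whole batch
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Inference: return a real-valued prediction for each row of X."""
    return X @ w + b

# Synthetic stand-in data (hypothetical, not Auto MPG): y = 3*x1 - 2*x2 + 5 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + 0.01 * rng.normal(size=200)

w, b = train_linear_neuron(X, y)
y_new = predict(np.array([[1.0, 1.0]]), w, b)
```

On real data such as Auto MPG, the features would normally be standardised first so that a single learning rate suits every weight.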

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b<br/>Single neuron"]
  M --> A["Activation<br/>Identity"]
  A --> L["Loss<br/>MSE (Squared error)"]
  L --> O["Optimiser<br/>Batch GD / Mini-batch GD"]
  O --> P["Parameters<br/>w, b"]
  P --> I["Inference<br/>Predict ŷ (number) for new x"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px

Regression

Regression is a supervised learning task that predicts a continuous-valued output based on input features.

Gradient Descent Algorithm

Gradient Descent Algorithm (GDA) is

  • an optimisation method
  • used to train models
  • by repeatedly updating parameters (weights and biases) to reduce the loss
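The repeated update described above can be written as a short loop. The sketch below is only an illustration under assumed names (`gradient_descent`, `grad_fn`) and an assumed toy loss, (theta - 4)^2, whose minimum is at theta = 4.

```python
def gradient_descent(grad_fn, theta, lr=0.1, steps=100):
    """Repeatedly step against the gradient: theta <- theta - lr * grad(theta)."""
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Toy loss (assumed for illustration): loss(theta) = (theta - 4)^2, so grad = 2 * (theta - 4)
theta_star = gradient_descent(lambda t: 2.0 * (t - 4.0), theta=0.0)
```

Each step moves `theta` a fraction of the way towards 4; the learning rate `lr` controls how large that fraction is.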

In deep learning, the default training approach is almost always mini-batch gradient descent, usually with Adam or SGD + momentum.
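To make "mini-batch" concrete, here is a rough sketch of plain mini-batch gradient descent for a linear model with squared loss. It deliberately omits momentum and Adam; the helper names (`iterate_minibatches`, `train_minibatch_gd`), batch size, learning rate, and synthetic data are assumptions for illustration only.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        rows = order[start:start + batch_size]
        yield X[rows], y[rows]

def train_minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=50, seed=0):
    """Update w and b after every mini-batch instead of after the full dataset."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for X_batch, y_batch in iterate_minibatches(X, y, batch_size, rng):
            error = X_batch @ w + b - y_batch
            w -= lr * 2.0 * (X_batch.T @ error) / len(X_batch)  # MSE gradient on this batch
            b -= lr * 2.0 * error.mean()
    return w, b

# Noise-free synthetic data (hypothetical): y = 1.5*x1 - 0.5*x2 + 2.0*x3 + 0.7
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(256, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + 0.7

w, b = train_minibatch_gd(X, y)
```

Each epoch now performs several cheaper, noisier updates rather than one exact update, which is the trade-off that makes this the usual choice for large datasets.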

Gradient Descent is used in both regression and classification.

It is not tied to the task type: it applies whenever you have trainable parameters and a differentiable loss to minimise.

LNN for Classification

Linear NN for Classification

A Linear Neural Network (LNN) for classification uses no hidden layers.
It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.

Neural-network view:

  • Binary classification → logistic regression (single neuron + sigmoid)
  • Multi-class classification → softmax regression (K output neurons + softmax)
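The two cases above differ only in the output activation and in how probabilities become classes. The sketch below illustrates the forward pass only (no training); the weights, inputs, and the 0.5 threshold are made-up values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Map each row of K scores to K probabilities that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

X = np.array([[0.5, 1.0], [-1.0, -2.0]])   # two examples, two features (made up)

# Binary case: single neuron + sigmoid, then threshold at 0.5
w = np.array([1.0, 1.0])
b = 0.0
p = sigmoid(X @ w + b)
binary_pred = (p >= 0.5).astype(int)

# Multi-class case: K = 3 output neurons + softmax, then argmax
W = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])           # shape (features, K)
b_k = np.zeros(3)
probs = softmax(X @ W + b_k)
multi_pred = probs.argmax(axis=1)
```

In both cases the decision boundary is linear in the inputs, because the activation is applied only after the weighted sum.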

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
  M --> A["Activation<br/>Sigmoid / Softmax"]
  A --> L["Loss<br/>Cross-entropy"]
  L --> O["Optimiser<br/>Mini-batch GD / Adam"]
  O --> P["Updated parameters<br/>w, b"]
  P --> I["Inference<br/>Probabilities → class"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px

Classification

Classification predicts a discrete class label.
Common settings include binary classification (two classes) and multi-class classification (more than two classes).

Optimisation of Deep models

Optimisers are algorithms that update neural network parameters to reduce the loss.

Deep networks typically have millions or billions of parameters, so there is generally no closed-form solution for the best weights.

Instead, training uses iterative optimisation.

Key takeaway:
An optimiser decides how the model moves through the loss landscape towards lower loss.
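As a rough illustration of that takeaway, the sketch below runs two update rules, plain gradient descent and gradient descent with momentum, from the same starting point on the same toy loss. The quadratic loss, its curvature values, the learning rate, and the momentum coefficient are all assumptions chosen for the example; the result says nothing general about which optimiser is better.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic: loss = 0.5 * (1 * theta0^2 + 50 * theta1^2)
CURVATURE = np.array([1.0, 50.0])

def loss(theta):
    return 0.5 * float(np.sum(CURVATURE * theta ** 2))

def grad(theta):
    return CURVATURE * theta

def run_gd(theta, lr=0.02, steps=300):
    """Plain gradient descent: step directly against the current gradient."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

def run_momentum(theta, lr=0.02, beta=0.9, steps=300):
    """Momentum: step along a decaying running sum of past gradients."""
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        velocity = beta * velocity + grad(theta)
        theta = theta - lr * velocity
    return theta

start = np.array([1.0, 1.0])
gd_loss = loss(run_gd(start))
momentum_loss = loss(run_momentum(start))
```

With these particular settings both runs reduce the loss, and the momentum run ends lower because it makes faster progress along the flat direction of the loss surface.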


  • Goal of Optimisation
  • Optimisation Challenges in Deep Learning
  • Gradient Descent
  • Stochastic Gradient Descent
  • Mini-batch Stochastic Gradient Descent
  • Momentum
  • Adagrad and its algorithm
  • RMSProp and its algorithm
  • Adadelta and its algorithm
  • Adam and its algorithm
  • Code Implementation and comparison of algorithms (webinar)

flowchart TD
    A["Optimisers in DNN"] --> B["Gradient Descent Variants"]
    A --> C["Momentum-based Optimiser"]
    A --> D["Adaptive Methods"]
    A --> E["Learning Rate Schedules"]

    D --> D1["Parameter-specific learning rates"]

    E --> E1["Learning rate changes during training"]

    style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
    style B fill:#EDE7F6,stroke:#7E57C2
    style C fill:#C8E6C9,stroke:#43A047
    style D fill:#FFF9C4,stroke:#FBC02D
    style E fill:#F8BBD0,stroke:#D81B60

Goal of Optimisation ☆

The goal is to find parameters \( \theta \) that minimise the loss.
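One common way to write this down, assuming the loss is an average over \( N \) training examples (the symbols \( f \), \( \ell \), \( N \), \( x_i \) and \( y_i \) are introduced here for illustration and do not appear in the original text), is:

\[
\theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(x_i; \theta),\, y_i\big)
\]

Here \( f(x_i; \theta) \) is the model's prediction for input \( x_i \) and \( \ell \) is the per-example loss, for instance squared error for regression or cross-entropy for classification.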