Optimisation

LNN for Regression

Linear Neural Networks for Regression #

A linear neural network for regression is a model that predicts a continuous target by taking a weighted sum of input features and applying the identity activation (so the output can be any real number).
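
As a minimal sketch of that definition, the forward pass of a single linear neuron can be written in a few lines of plain Python. The function name `predict` and the example numbers are illustrative assumptions and do not come from the original notes.

```python
def predict(x, w, b):
    """Single linear neuron: weighted sum of the features plus a bias."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return z  # identity activation: the output is the pre-activation itself


# Hypothetical feature vector and parameters, for illustration only
x = [2.0, 3.0]
w = [0.5, -1.0]
b = 4.0
y_hat = predict(x, w, b)  # 0.5*2.0 + (-1.0)*3.0 + 4.0 = 2.0
```

Because the activation is the identity, nothing squashes the output, so `y_hat` can be any real number.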

  • Single neuron for regression (predicting how much / how many)
  • Data + linear model (single neuron, no hidden layers) + squared loss
  • Training with the batch gradient descent algorithm
  • Prediction (inference)
  • Example: Auto MPG (UCI) style prediction with a single neuron (from-scratch code)

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b<br/>Single neuron"]
  M --> A["Activation<br/>Identity"]
  A --> L["Loss<br/>MSE (Squared error)"]
  L --> O["Optimiser<br/>Batch Gradient Descent<br/>Batch GD / Mini-batch GD"]
  O --> P["Parameters<br/>w, b"]
  P --> I["Inference<br/>Predict ŷ (number) for new x"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
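
The pipeline in the flowchart (data, linear model, identity activation, MSE loss, batch gradient descent, inference) can be sketched from scratch as below. This is an illustrative sketch, not the notes' own Auto MPG code: the data are synthetic, and the helper names (`predict`, `mse`, `train_batch_gd`), the learning rate, and the epoch count are all assumptions.

```python
import random


def predict(x, w, b):
    """Linear model with identity activation."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(X, y, w, b):
    """Mean squared error over the whole dataset."""
    return sum((predict(xi, w, b) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)


def train_batch_gd(X, y, lr=0.1, epochs=500):
    """Batch gradient descent: every update uses all n examples."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(xi, w, b) - yi for xi, yi in zip(X, y)]
        # Gradients of the MSE with respect to each weight and the bias
        grad_w = [2.0 / n * sum(e * xi[j] for e, xi in zip(errors, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic data (assumption): y = 3*x1 - 2*x2 + 1 plus small Gaussian noise
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * x1 - 2.0 * x2 + 1.0 + rng.gauss(0, 0.05) for x1, x2 in X]

w, b = train_batch_gd(X, y)
new_prediction = predict([0.5, -0.5], w, b)  # inference on an unseen x
```

After training, `w` and `b` should land close to the values used to generate the data, and `predict` doubles as the inference step for new inputs.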

Regression #

Regression is a supervised learning task that predicts a continuous-valued output based on input features.

Gradient Descent Algorithm

Gradient Descent Algorithm #

Gradient Descent Algorithm (GDA) is

  • an optimisation method
  • used to train models
  • by repeatedly updating parameters (weights and biases) to reduce the loss
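
The update loop described above can be sketched on a toy one-parameter loss. The quadratic loss, the function names, and the step size are assumptions chosen only to make the rule "move against the gradient" concrete.

```python
def loss(theta):
    """Toy loss (assumption): minimised at theta = 3."""
    return (theta - 3.0) ** 2


def grad(theta):
    """Derivative of the toy loss with respect to theta."""
    return 2.0 * (theta - 3.0)


def gradient_descent(grad_fn, theta0, lr=0.1, steps=100):
    """Repeatedly step the parameter against its gradient."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta


theta_star = gradient_descent(grad, theta0=0.0)
```

In a real model `theta` is the full set of weights and biases, and the gradient comes from the loss over the training data, but the update has the same shape.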

In deep learning, the default training approach is almost always mini-batch gradient descent, usually with Adam or SGD with momentum as the update rule.
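
To illustrate the mini-batch idea, here is a hedged sketch of mini-batch SGD with momentum fitting a one-feature linear model. Adam is not shown. The function name `minibatch_sgd_momentum`, the hyperparameters, and the noiseless synthetic data are assumptions made for the example.

```python
import random


def minibatch_sgd_momentum(xs, ys, lr=0.05, beta=0.9, batch_size=16,
                           epochs=50, seed=0):
    """Fit y ≈ w*x + b using mini-batch gradients and a momentum buffer."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    vw, vb = 0.0, 0.0  # velocity (momentum) terms
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new random batches every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            errs = [(w * xs[i] + b) - ys[i] for i in batch]
            # MSE gradients estimated from this mini-batch only
            gw = 2.0 / len(batch) * sum(e * xs[i] for e, i in zip(errs, batch))
            gb = 2.0 / len(batch) * sum(errs)
            # Momentum: accumulate a decaying sum of past gradients
            vw = beta * vw + gw
            vb = beta * vb + gb
            w -= lr * vw
            b -= lr * vb
    return w, b


# Noiseless synthetic data (assumption): y = 2x + 1
data_rng = random.Random(1)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = minibatch_sgd_momentum(xs, ys)
```

Each update sees only `batch_size` examples, so an epoch performs many cheap updates instead of one expensive full-batch update.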

Gradient Descent is used in both regression and classification.

It’s not tied to the task type — it’s tied to the fact you have:

LNN for Classification

Linear NN for Classification #

A Linear Neural Network (LNN) for classification uses no hidden layers.
It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.

Neural-network view:

  • Binary classification → logistic regression (single neuron + sigmoid)
  • Multi-class classification → softmax regression (K output neurons + softmax)
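
Both cases can be sketched as a forward pass followed by a decision rule. The helper names (`predict_binary`, `predict_multiclass`), the 0.5 threshold, and the layout of `W` as one weight vector per class are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(logits):
    """Turn K scores into K probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_binary(x, w, b, threshold=0.5):
    """Logistic regression: single neuron + sigmoid, then threshold."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return p, int(p >= threshold)


def predict_multiclass(x, W, bs):
    """Softmax regression: K output neurons + softmax, then argmax."""
    logits = [sum(wj * xj for wj, xj in zip(wk, x)) + bk
              for wk, bk in zip(W, bs)]
    probs = softmax(logits)
    return probs, max(range(len(probs)), key=lambda k: probs[k])


# Hypothetical parameters, for illustration only
p, label = predict_binary([1.0, 2.0], [1.0, 1.0], -1.0)
probs, k = predict_multiclass([1.0, 0.0],
                              [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                              [0.0, 0.0, 0.0])
```

In both functions the model itself stays linear. Only the output activation and the final probability-to-class step differ.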

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
  M --> A["Activation<br/>Sigmoid / Softmax"]
  A --> L["Loss<br/>Cross-entropy"]
  L --> O["Optimiser<br/>Mini-batch GD / Adam"]
  O --> P["Updated parameters<br/>w, b"]
  P --> I["Inference<br/>Probabilities → class"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
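
The classification pipeline in the flowchart can be sketched for the binary case as below: sigmoid output, cross-entropy loss, and plain mini-batch gradient descent (Adam is not shown). The two synthetic Gaussian clusters, the function names, and the hyperparameters are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def prob(x, w, b):
    """Predicted probability of class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def cross_entropy(X, y, w, b, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(prob(xi, w, b), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def train_logistic(X, y, lr=0.1, batch_size=20, epochs=50, seed=0):
    """Mini-batch gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw, gb = [0.0] * d, 0.0
            for i in batch:
                err = prob(X[i], w, b) - y[i]  # gradient w.r.t. the logit
                for j in range(d):
                    gw[j] += err * X[i][j]
                gb += err
            w = [wj - lr * gj / len(batch) for wj, gj in zip(w, gw)]
            b -= lr * gb / len(batch)
    return w, b


# Synthetic data (assumption): two Gaussian clusters, one per class
rng = random.Random(0)
X = ([[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(100)]
     + [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)])
y = [0] * 100 + [1] * 100

w, b = train_logistic(X, y)
preds = [int(prob(xi, w, b) >= 0.5) for xi in X]  # probabilities → class
accuracy = sum(int(p == t) for p, t in zip(preds, y)) / len(y)
```

The gradient of the cross-entropy with respect to the logit is `p - y`, which is why the update has the same form as in the regression case.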

Classification #

Classification predicts a discrete class label.
Common settings: