February 15, 2026
Linear Neural Networks for Regression
A linear neural network for regression is a model that predicts a continuous target by taking a weighted sum of input features and applying the identity activation (so the output can be any real number).
- Single neuron for regression (predicting how much / how many)
- Data + linear model (single neuron, no hidden layers) + squared loss
- Training using the batch gradient descent algorithm
- Prediction (inference)
- Example: Auto MPG (UCI)-style prediction with a single neuron (from-scratch code)
flowchart LR
D["Data<br/>X, y"] --> M["Linear model<br/>w, b<br/>Single neuron"]
M --> A["Activation<br/>Identity"]
A --> L["Loss<br/>MSE (Squared error)"]
L --> O["Optimiser<br/>Batch Gradient Descent<br/>Batch GD / Mini-batch GD"]
O --> P["Parameters<br/>w, b"]
P --> I["Inference<br/>Predict ŷ (number) for new x"]
%% Pastel colour scheme
style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
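The pipeline in the flowchart above can be sketched from scratch in plain Python. This is a minimal illustration, not the post's own code: the synthetic data set (two features, true weights 3 and -2, bias 5), the learning rate and the epoch count are all assumptions chosen so the example runs quickly, and the function names are hypothetical.

```python
import random

random.seed(0)

# Synthetic data (an assumption for illustration): y = 3*x1 - 2*x2 + 5 + small noise
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * x1 - 2.0 * x2 + 5.0 + random.gauss(0, 0.01) for x1, x2 in X]

def predict(x, w, b):
    """Single neuron with identity activation: weighted sum plus bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def mse(X, y, w, b):
    """Mean squared error over the whole data set."""
    return sum((predict(x, w, b) - t) ** 2 for x, t in zip(X, y)) / len(X)

def train_batch_gd(X, y, lr=0.1, epochs=500):
    """Batch gradient descent: every update uses all training examples."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(x, w, b) - t for x, t in zip(X, y)]
        # Gradients of the MSE with respect to each weight and the bias
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

w, b = train_batch_gd(X, y)
final_mse = mse(X, y, w, b)
y_new = predict([0.5, -0.5], w, b)  # inference on a new input
```

On this toy data the learned `w` and `b` should land close to the values used to generate it, and `y_new` is a plain real number, as expected for regression.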
Regression
Regression is a supervised learning task that predicts a continuous-valued output based on input features.
February 26, 2026
Gradient Descent Algorithm
Gradient Descent Algorithm (GDA) is
- an optimisation method
- used to train models
- by repeatedly updating parameters (weights and biases) to reduce the loss
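In symbols, one common way to write this update, assuming a learning rate \( \eta \) and a loss \( L(\theta) \) over the parameters \( \theta \) (notation chosen here for illustration), is:

\[
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)
\]

Each step moves the parameters a small distance against the gradient, which points in the direction of steepest local increase of the loss.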
In deep learning, the default training approach is almost always mini-batch gradient descent, usually with Adam or SGD + momentum.
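A minimal sketch of the mini-batch idea, assuming a one-feature linear model with squared loss and noise-free toy data (`y = 2x + 1`). The batch size, learning rate and function name are illustrative choices, and plain gradient steps are used here rather than Adam or momentum.

```python
import random

random.seed(1)

# Toy 1-D data (illustrative assumption): y = 2x + 1
xs = [random.uniform(-1, 1) for _ in range(256)]
ys = [2.0 * x + 1.0 for x in xs]

def minibatch_gd(xs, ys, lr=0.1, epochs=50, batch_size=32):
    """Mini-batch gradient descent for y_hat = w*x + b with squared loss."""
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        random.shuffle(indices)  # new batch order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error on this batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

w_mb, b_mb = minibatch_gd(xs, ys)
```

Each update sees only 32 examples, so one pass over the data produces eight parameter updates instead of one.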
Gradient Descent is used in both regression and classification.
It’s not tied to the task type — it’s tied to the fact you have:
February 15, 2026
Linear NN for Classification
A Linear Neural Network (LNN) for classification uses no hidden layers.
It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.
Neural-network view:
- Binary classification → logistic regression (single neuron + sigmoid)
- Multi-class classification → softmax regression (K output neurons + softmax)
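The two cases above differ only in the output layer. The sketch below shows a forward pass for each, assuming made-up weights and a two-feature input. All numbers and helper names are hypothetical and serve only to show how scores become probabilities and then classes.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def softmax(scores):
    """Map K real-valued scores to K probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, w, b):
    """Weighted sum of the inputs plus a bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

x = [1.0, 2.0]  # hypothetical input features

# Binary: single neuron + sigmoid, then threshold the probability at 0.5
p_pos = sigmoid(linear(x, w=[0.5, -0.25], b=0.1))
binary_class = 1 if p_pos >= 0.5 else 0

# Multi-class: K = 3 output neurons + softmax, then pick the most probable class
W = [[0.2, 0.1], [-0.3, 0.4], [0.0, -0.1]]
B = [0.0, 0.1, -0.1]
probs = softmax([linear(x, w_k, b_k) for w_k, b_k in zip(W, B)])
multi_class = max(range(len(probs)), key=probs.__getitem__)
```

In both cases the linear part is the same weighted sum. Only the activation and the rule for turning probabilities into a label change.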
flowchart LR
D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
M --> A["Activation<br/>Sigmoid / Softmax"]
A --> L["Loss<br/>Cross-entropy"]
L --> O["Optimiser<br/>Mini-batch GD / Adam"]
O --> P["Updated parameters<br/>w, b"]
P --> I["Inference<br/>Probabilities → class"]
%% Pastel colour scheme
style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
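As a rough illustration of the flowchart in the binary case, the sketch below trains a single sigmoid neuron with cross-entropy loss. It makes several assumptions: the toy data is synthetic (class 1 when `x1 + x2 > 0`), the helper names are hypothetical, and it uses full-batch gradient descent rather than the mini-batch GD or Adam named in the diagram, purely to keep the example short.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Toy data (illustrative assumption): class 1 when x1 + x2 > 0
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [1 if x1 + x2 > 0 else 0 for x1, x2 in X]

def forward(X, w, b):
    """Linear model followed by sigmoid: one probability per example."""
    return [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x in X]

def cross_entropy(probs, y, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    total = 0.0
    for p, t in zip(probs, y):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y)

w, b, lr = [0.0, 0.0], 0.0, 0.5
initial_loss = cross_entropy(forward(X, w, b), y)

for _ in range(200):
    probs = forward(X, w, b)
    # For sigmoid + cross-entropy, the gradient with respect to the score is (p - t)
    errors = [p - t for p, t in zip(probs, y)]
    grad_w = [sum(e * x[j] for e, x in zip(errors, X)) / len(X) for j in range(2)]
    grad_b = sum(errors) / len(X)
    w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    b -= lr * grad_b

probs = forward(X, w, b)
final_loss = cross_entropy(probs, y)
predictions = [1 if p >= 0.5 else 0 for p in probs]  # probabilities -> class
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
```

With all-zero initial weights every probability starts at 0.5, so `initial_loss` equals `log(2)`. Training should push `final_loss` below that value.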
Classification
Classification predicts a discrete class label.
Common settings:
Optimisation of Deep Models
Optimisers are algorithms that update neural network parameters to reduce the loss function.
Deep networks usually have millions or billions of parameters and a non-convex loss, so there is generally no closed-form solution for the best parameters.
Instead, training uses iterative optimisation.
Key takeaway:
An optimiser decides how the model moves through the loss landscape towards lower loss.
- Goal of Optimisation
- Optimisation Challenges in Deep Learning
- Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Stochastic Gradient Descent
- Momentum
- Adagrad and its algorithm
- RMSProp and its algorithm
- Adadelta and its algorithm
- Adam and its algorithm
- Code Implementation and comparison of algorithms (webinar)
flowchart TD
A["Optimisers in DNN"] --> B["Gradient Descent Variants"]
A --> C["Momentum-based Optimiser"]
A --> D["Adaptive Methods"]
A --> E["Learning Rate Schedules"]
D --> D1["Parameter-specific learning rates"]
E --> E1["Learning rate changes during training"]
style A fill:#E1F5FE,stroke:#4A90E2,stroke-width:2px
style B fill:#EDE7F6,stroke:#7E57C2
style C fill:#C8E6C9,stroke:#43A047
style D fill:#FFF9C4,stroke:#FBC02D
style E fill:#F8BBD0,stroke:#D81B60
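The four families in the diagram can be compared on a toy problem. The sketch below is a simplified illustration under stated assumptions: the quadratic loss, the hyperparameters and the function names are all chosen here for demonstration, the momentum variant is the heavy-ball form, Adam stands in for the adaptive methods, and `step_decay` is just one of many possible learning-rate schedules.

```python
import math

def loss(theta):
    """Toy quadratic loss (an illustrative assumption), steeper along theta[0]."""
    return 0.5 * (10.0 * theta[0] ** 2 + theta[1] ** 2)

def grad(theta):
    """Gradient of the toy loss."""
    return [10.0 * theta[0], theta[1]]

def run_gd(theta, lr=0.05, steps=300):
    """Plain gradient descent: step against the current gradient."""
    for _ in range(steps):
        g = grad(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def run_momentum(theta, lr=0.01, beta=0.9, steps=300):
    """Heavy-ball momentum: the velocity accumulates past gradients."""
    v = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        theta = [t - lr * vi for t, vi in zip(theta, v)]
    return theta

def run_adam(theta, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    """Adam: per-parameter step sizes from running gradient statistics."""
    m = [0.0] * len(theta)  # running mean of gradients
    s = [0.0] * len(theta)  # running mean of squared gradients
    for k in range(1, steps + 1):
        g = grad(theta)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        s = [beta2 * si + (1 - beta2) * gi * gi for si, gi in zip(s, g)]
        m_hat = [mi / (1 - beta1 ** k) for mi in m]  # bias correction
        s_hat = [si / (1 - beta2 ** k) for si in s]
        theta = [t - lr * mh / (math.sqrt(sh) + eps)
                 for t, mh, sh in zip(theta, m_hat, s_hat)]
    return theta

def step_decay(lr0, step, drop=0.5, every=50):
    """Learning-rate schedule: multiply the rate by `drop` every `every` steps."""
    return lr0 * drop ** (step // every)

start = [1.0, 1.0]
results = {name: loss(fn(start))
           for name, fn in [("gd", run_gd), ("momentum", run_momentum), ("adam", run_adam)]}
```

All three optimisers start from the same point with loss 5.5. Comparing the entries of `results` shows how far each one gets in the same number of steps on this particular toy loss, which says little about their behaviour on real networks.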
Goal of Optimisation ☆
The goal is to find parameters \( \theta \) that minimise the loss.
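Written as a formula, and assuming a training set of \( n \) examples \( (x_i, y_i) \), a model \( f(x; \theta) \) and a per-example loss \( \ell \) (symbols introduced here for illustration), this goal reads:

\[
\theta^{*} = \arg\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i; \theta),\, y_i\big)
\]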