A linear neural network for regression is a model that predicts a continuous target by taking a weighted sum of input features and applying the identity activation (so the output can be any real number).
Single neuron for regression (predicting how much / how many)
- Data + linear model (single neuron, no hidden layers) + squared loss
- Training with batch gradient descent
- Prediction (inference)
- Example: Auto MPG (UCI)-style prediction with a single neuron (from-scratch code; see the sketch below)
```mermaid
flowchart LR
D["Data<br/>X, y"] --> M["Linear model<br/>w, b<br/>Single neuron"]
M --> A["Activation<br/>Identity"]
A --> L["Loss<br/>MSE (Squared error)"]
L --> O["Optimiser<br/>Batch GD / Mini-batch GD"]
O --> P["Parameters<br/>w, b"]
P --> I["Inference<br/>Predict ŷ (number) for new x"]
%% Pastel colour scheme
style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
```
Loss Function: an objective function that quantifies how well the model is doing; the lower the loss, the better the model. In other words, the loss function measures how well or how badly the model is learning.
Optimisation Algorithm: the procedure the learning algorithm uses to minimise the loss, searching for the best possible parameter values. Popular optimisation algorithms for deep learning are based on an approach called gradient descent.
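Below is a minimal from-scratch sketch of this whole pipeline in NumPy. The data is synthetic, standing in for an Auto MPG-style table (loading the actual UCI CSV is omitted), and the "true" parameters, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for Auto MPG-style data (real code would load the UCI CSV):
# two standardised features (e.g. weight, horsepower) and a continuous target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # 200 samples, 2 features
true_w, true_b = np.array([-4.0, -2.5]), 23.0  # hypothetical "true" parameters
y = X @ true_w + true_b + rng.normal(scale=0.5, size=200)

# Single neuron with identity activation: y_hat = Xw + b
w = np.zeros(2)
b = 0.0
lr, epochs = 0.1, 200                          # illustrative hyperparameters

for epoch in range(epochs):
    y_hat = X @ w + b                 # forward pass (identity activation)
    err = y_hat - y
    loss = np.mean(err ** 2)          # MSE (squared error) loss
    grad_w = 2 * X.T @ err / len(y)   # dLoss/dw over the full batch
    grad_b = 2 * err.mean()           # dLoss/db
    w -= lr * grad_w                  # batch gradient descent update
    b -= lr * grad_b

# Inference: predict a number for a new input
x_new = np.array([0.5, -1.0])
print("prediction:", x_new @ w + b, "| learned w, b:", w, b)
```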
A Linear Neural Network (LNN) for classification uses no hidden layers. It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.
```mermaid
flowchart LR
D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
M --> A["Activation<br/>Sigmoid / Softmax"]
A --> L["Loss<br/>Cross-entropy"]
L --> O["Optimiser<br/>Mini-batch GD / Adam"]
O --> P["Updated parameters<br/>w, b"]
P --> I["Inference<br/>Probabilities → class"]
%% Pastel colour scheme
style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
```
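As a sketch of this classification pipeline, here is a minimal softmax regression (linear layer → softmax → cross-entropy) trained with mini-batch gradient descent on synthetic three-class data. The cluster centres, learning rate, and batch size are illustrative assumptions.

```python
import numpy as np

# Synthetic 3-class data: points scattered around three class centres.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(scale=0.8, size=(300, 2))

W = np.zeros((2, 3))   # one weight column per class
b = np.zeros(3)
lr = 0.5               # illustrative learning rate

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(100):
    for i in range(0, len(X), 32):           # mini-batches of 32
        xb, yb = X[i:i + 32], y[i:i + 32]
        probs = softmax(xb @ W + b)          # forward: linear -> softmax
        probs[np.arange(len(yb)), yb] -= 1.0 # dLoss/dlogits = probs - one_hot(y)
        W -= lr * xb.T @ probs / len(yb)     # mini-batch gradient descent update
        b -= lr * probs.mean(axis=0)

# Inference: probabilities -> predicted class
pred = softmax(X @ W + b).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```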
Deep Feedforward Neural Networks (DFNN) or Multi Layer Perceptrons (MLP) for Classification
A Deep Feedforward Neural Network (DFNN), also called a Multi-Layer Perceptron (MLP), is a neural network with one or more hidden layers where information flows forward only (no recurrence). For classification, DFNNs learn non-linear decision boundaries by combining hidden layers with non-linear activation functions.
Core idea:
- A single neuron can only learn linear boundaries.
- Adding hidden layers + non-linearity allows DFNNs to solve problems like XOR (see the sketch after this list).
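A minimal from-scratch sketch of this idea: a one-hidden-layer MLP with a tanh non-linearity learning XOR, which no single linear neuron can represent. The hidden width, learning rate, step count, and random seed are illustrative assumptions.

```python
import numpy as np

# XOR: not linearly separable, so a single neuron cannot fit it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer (4 units, tanh)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer (sigmoid)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    h = np.tanh(X @ W1 + b1)          # hidden layer: the non-linearity is essential
    p = sigmoid(h @ W2 + b2)          # output probability
    # Backprop of binary cross-entropy: dLoss/dlogits = p - y
    d2 = (p - y) / len(X)
    dW2, db2 = h.T @ d2, d2.sum(axis=0)
    d1 = (d2 @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d1, d1.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(np.round(p.squeeze(), 2))  # should approach [0, 1, 1, 0]
```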
Convolutional Neural Networks (CNNs) are specialised neural networks designed for data with spatial structure, especially images. They became the standard model for computer vision because they preserve spatial locality, reuse the same pattern detector across the image, and build representations hierarchically. In practical terms, a CNN starts by learning simple features such as edges and corners, then combines them into textures, shapes, object parts, and finally full semantic categories.
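To make "reuse the same pattern detector across the image" concrete, here is a minimal sketch of a single-channel 2D convolution (strictly, cross-correlation, as in most deep learning libraries). The toy image and the hand-set edge filter are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one small filter over every position: the weights are shared."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Same kernel weights applied at every spatial location
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                     # toy image containing a vertical edge
edge_filter = np.array([[1.0, -1.0]])  # horizontal-difference (edge) detector
print(conv2d(image, edge_filter))      # responds only where the edge is
```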
Once the basic ideas of convolution, pooling, channels, and classifier heads are understood, the next step is to study how successful CNN architectures are designed in practice. The history of deep CNNs is not just a list of famous models. It is a progression of design ideas: smaller filters, more depth, better optimisation, bottlenecks, multi-scale processing, residual connections, and transfer learning.
Key takeaway: Deep CNN architectures evolved by solving specific problems one by one: LeNet established the template, AlexNet proved deep learning could dominate large-scale vision, VGG simplified the design, NiN introduced the powerful 1×1 convolution, GoogLeNet made multi-scale processing efficient, and ResNet solved the optimisation problem of very deep networks.
Recurrent Neural Networks (RNNs) are neural networks designed for sequential data, where the order of inputs matters and the model must use information from earlier time steps to interpret later ones. Unlike a feedforward network, an RNN does not process each input in isolation. It carries a hidden state from one time step to the next, so the network can build a running summary of what it has seen so far.
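A minimal sketch of this recurrence: a vanilla RNN forward pass in NumPy, where the same weights are applied at every time step and the hidden state h carries the running summary of the sequence. The layer sizes and the toy sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 5, 4

W_xh = rng.normal(scale=0.5, size=(input_size, hidden_size))   # input -> hidden
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy sequence of 4 input vectors
h = np.zeros(hidden_size)                    # initial hidden state

for t, x_t in enumerate(xs):
    # The same weights are reused at every step; h mixes the new input
    # with everything seen so far.
    h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    print(f"step {t}: h = {np.round(h, 2)}")
```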