<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep Learning on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/deep-learning/</link><description>Recent content in Deep Learning on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 22 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arshadhs.github.io/tags/deep-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>LNN for Regression</title><link>https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/</guid><description>&lt;h1 id="linear-neural-networks-for-regression">
 Linear Neural Networks for Regression
 
 &lt;a class="anchor" href="#linear-neural-networks-for-regression">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>linear neural network for regression&lt;/strong> is a model that predicts a &lt;strong>continuous&lt;/strong> target by taking a weighted sum of input features and applying the &lt;strong>identity activation&lt;/strong> (so the output can be any real number).&lt;/p>
&lt;ul>
&lt;li>Single neuron for regression (predicting &lt;em>how much&lt;/em> / &lt;em>how many&lt;/em>)&lt;/li>
&lt;li>Data + linear model (single neuron, no hidden layers) + squared loss&lt;/li>
&lt;li>Training using the &lt;strong>batch gradient descent&lt;/strong> algorithm&lt;/li>
&lt;li>Prediction (inference)&lt;/li>
&lt;li>E.g. Auto MPG (UCI)-style prediction with a single neuron (from-scratch code; see the sketch after this list)&lt;/li>
&lt;/ul>
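&lt;p>A minimal from-scratch sketch of this setup, assuming NumPy and a synthetic stand-in for the Auto MPG data (the real dataset would be loaded from UCI):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

# Synthetic stand-in for Auto MPG-style data: one feature, one continuous target
rng = np.random.default_rng(0)
X = rng.uniform(1.5, 4.5, size=100)            # e.g. car weight in tonnes
y = 40 - 7 * X + rng.normal(0, 1, 100)         # roughly linear relation to mpg

w, b = 0.0, 0.0                                # single neuron: y_hat = w*x + b
lr = 0.05                                      # learning rate

for epoch in range(2000):
    y_hat = w * X + b                          # forward pass, identity activation
    error = y_hat - y
    grad_w = (error * X).mean()                # dJ/dw for J = 1/(2N) sum(error^2)
    grad_b = error.mean()                      # dJ/db
    w -= lr * grad_w                           # batch gradient descent update
    b -= lr * grad_b

print(w, b)                                    # inference: y_hat = w * x_new + b
&lt;/code>&lt;/pre>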
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 D[&amp;#34;Data&amp;lt;br/&amp;gt;X, y&amp;#34;] --&amp;gt; M[&amp;#34;Linear model&amp;lt;br/&amp;gt;w, b&amp;lt;br/&amp;gt;Single neuron&amp;#34;]
 M --&amp;gt; A[&amp;#34;Activation&amp;lt;br/&amp;gt;Identity&amp;#34;]
 A --&amp;gt; L[&amp;#34;Loss&amp;lt;br/&amp;gt;MSE (Squared error)&amp;#34;]
 L --&amp;gt; O[&amp;#34;Optimiser&amp;lt;br/&amp;gt;Batch GD / Mini-batch GD&amp;#34;]
 O --&amp;gt; P[&amp;#34;Parameters&amp;lt;br/&amp;gt;w, b&amp;#34;]
 P --&amp;gt; I[&amp;#34;Inference&amp;lt;br/&amp;gt;Predict ŷ (number) for new x&amp;#34;]

 %% Pastel colour scheme
 style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
 style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
 style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
 style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
 style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
 style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
 style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
&lt;/pre>

&lt;hr>
&lt;h2 id="regression">
 Regression
 
 &lt;a class="anchor" href="#regression">#&lt;/a>
 
&lt;/h2>
&lt;p>Regression is a supervised learning task that predicts a continuous-valued output based on input features.&lt;/p></description></item><item><title>Gradient Descent Algorithm</title><link>https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/</guid><description>&lt;h1 id="gradient-descent-algorithm">
 Gradient Descent Algorithm
 
 &lt;a class="anchor" href="#gradient-descent-algorithm">#&lt;/a>
 
&lt;/h1>
&lt;p>Gradient Descent Algorithm (GDA) is&lt;/p>
&lt;ul>
&lt;li>an &lt;strong>optimisation method&lt;/strong>&lt;/li>
&lt;li>used to &lt;strong>train models&lt;/strong>&lt;/li>
&lt;li>by repeatedly updating parameters (weights and biases) to &lt;strong>reduce the loss&lt;/strong>&lt;/li>
&lt;/ul>
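&lt;p>One update step, as a minimal sketch (function and variable names here are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python"># Move each parameter a small step against its gradient of the loss.
def gd_step(w, b, grad_w, grad_b, learning_rate=0.01):
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
    return w, b
&lt;/code>&lt;/pre>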
&lt;blockquote class="book-hint info">
&lt;p>In deep learning, the default training approach is almost always &lt;strong>mini-batch gradient descent&lt;/strong>, usually with &lt;strong>Adam&lt;/strong> or &lt;strong>SGD + momentum&lt;/strong>.&lt;/p>
&lt;/blockquote>
&lt;p>Gradient Descent is &lt;strong>used in both regression and classification&lt;/strong>.&lt;/p>
&lt;p>It’s not tied to the task type — it’s tied to the fact you have:&lt;/p></description></item><item><title>Deep Learning</title><link>https://arshadhs.github.io/docs/ai/deep-learning/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/</guid><description>&lt;h1 id="deep-learning">
 Deep Learning
 
 &lt;a class="anchor" href="#deep-learning">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Subset of ML&lt;/li>
&lt;li>focuses on algorithms, called &lt;strong>Artificial Neural Networks&lt;/strong>, inspired by the structure and function of the brain.&lt;/li>
&lt;li>A &lt;a href="https://arshadhs.github.io/docs/ai/neural-network/">neural network&lt;/a> with multiple hidden layers and multiple nodes in each hidden layer is known as a deep learning system or a deep neural network.&lt;/li>
&lt;li>Allows systems to &lt;strong>automatically learn hierarchical representations&lt;/strong> (features) from raw input, such as images, sound, or text.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="operational-steps-for-neural-architectures">
 Operational Steps for Neural Architectures
 
 &lt;a class="anchor" href="#operational-steps-for-neural-architectures">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>Perceptron (Boolean/Logic)&lt;/th>
 &lt;th>Linear Regression Network&lt;/th>
 &lt;th>Binary Classification (Logistic)&lt;/th>
 &lt;th>DFNN / MLP (Classification)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>1. Input&lt;/strong>&lt;/td>
 &lt;td>Take binary or discrete inputs 
&lt;span>
 \( x_1, \dots, x_n \)
 &lt;/span>

&lt;/td>
 &lt;td>Take numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Take numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Take high-dimensional numerical or categorical features&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>2. Weighted Sum&lt;/strong>&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( z = \sum (w_i x_i) + b \)
 &lt;/span>

&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( \hat{y} = w_0 + w_1 x \)
 &lt;/span>

&lt;/td>
 &lt;td>Single calculation: 
&lt;span>
 \( z = W x + b \)
 &lt;/span>

&lt;/td>
 &lt;td>Multiple stages: 
&lt;span>
 \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \)
 &lt;/span>

 for each layer 
&lt;span>
 \( l \)
 &lt;/span>

&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>3. Activation&lt;/strong>&lt;/td>
 &lt;td>Step Function: Output 1 if 
&lt;span>
 \( z \geq 0 \)
 &lt;/span>

, else 0&lt;/td>
 &lt;td>Identity: The output remains 
&lt;span>
 \( z \)
 &lt;/span>

 (no non-linear change)&lt;/td>
 &lt;td>Sigmoid: Maps 
&lt;span>
 \( z \)
 &lt;/span>

 to a probability between 0 and 1&lt;/td>
 &lt;td>ReLU for hidden layers; Softmax/Sigmoid for the output layer&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>4. Loss / Error&lt;/strong>&lt;/td>
 &lt;td>Error = Target − Output&lt;/td>
 &lt;td>Mean Squared Error (MSE): 
&lt;span>
 \( J = \frac{1}{2N} \sum (y - \hat{y})^2 \)
 &lt;/span>

&lt;/td>
 &lt;td>Binary Cross-Entropy (BCE): penalises based on probability distance&lt;/td>
 &lt;td>BCE or Categorical Cross-Entropy for multiple classes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>5. Optimisation&lt;/strong>&lt;/td>
 &lt;td>Update weights only on misclassification&lt;/td>
 &lt;td>Gradient Descent: compute gradients of the loss and update weights at each iteration&lt;/td>
 &lt;td>Backpropagation: compute error signals 
&lt;span>
 \( \delta \)
 &lt;/span>

 and gradients 
&lt;span>
 \( dW \)
 &lt;/span>

&lt;/td>
 &lt;td>Backpropagation: recursive chain rule to update all hidden layer weights&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>6. Output&lt;/strong>&lt;/td>
 &lt;td>Discrete Boolean value (0 or 1)&lt;/td>
 &lt;td>Continuous numerical value (e.g., house prices)&lt;/td>
 &lt;td>Single probability score or class label&lt;/td>
 &lt;td>A vector of probabilities for multiple classes&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
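&lt;p>The activation row of the table, sketched in NumPy on an illustrative pre-activation vector:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

z = np.array([0.5, -1.2, 2.0])             # pre-activations z = W x + b

step = (z >= 0).astype(int)                # perceptron: hard threshold
identity = z                               # regression: output z unchanged
sigmoid = 1 / (1 + np.exp(-z))             # logistic: probability in (0, 1)
softmax = np.exp(z) / np.exp(z).sum()      # multi-class: probabilities sum to 1
&lt;/code>&lt;/pre>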
&lt;hr>




&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/">Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/">Artificial Neuron and Perceptron&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/030-linear-neural-networks-for-regression/">LNN for Regression&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/035-gradient-descent-algorithm/">Gradient Descent Algorithm&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/">LNN for Classification&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/">Deep Feedforward Neural Networks (DFNN) for Classification&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/">Convolutional Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/">Deep CNN Architectures&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/">CNN Pipeline&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/">Recurrent Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/">Deep Recurrent Neural Networks&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/">Attention Mechanism&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/090-transformer/">Transformer&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/">Optimisation of Deep models&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/">Regularisation for Deep models&lt;/a>
 &lt;/li>
 
 

 
 
&lt;/ul>


&lt;hr>


&lt;pre class="mermaid">
flowchart LR
 %% Input Layer
 subgraph subGraph0[&amp;#34;Input Layer&amp;#34;]
 I1((&amp;#34;Input 1&amp;#34;))
 I2((&amp;#34;Input 2&amp;#34;))
 I3((&amp;#34;Input 3&amp;#34;))
 end

 %% Hidden Layers
 subgraph subGraph1[&amp;#34;Hidden Layer 1&amp;#34;]
 H1a((&amp;#34;H1-1&amp;#34;))
 H1b((&amp;#34;H1-2&amp;#34;))
 H1c((&amp;#34;H1-3&amp;#34;))
 end

 subgraph subGraph2[&amp;#34;Hidden Layer 2&amp;#34;]
 H2a((&amp;#34;H2-1&amp;#34;))
 H2b((&amp;#34;H2-2&amp;#34;))
 H2c((&amp;#34;H2-3&amp;#34;))
 end

 subgraph subGraph3[&amp;#34;Hidden Layer 3&amp;#34;]
 H3a((&amp;#34;H3-1&amp;#34;))
 H3b((&amp;#34;H3-2&amp;#34;))
 H3c((&amp;#34;H3-3&amp;#34;))
 end

 %% Output Layer
 subgraph subGraph4[&amp;#34;Output Layer&amp;#34;]
 O((&amp;#34;Output&amp;#34;))
 end

 %% Connections: Input to Hidden Layer 1
 I1 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c
 I2 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c
 I3 --&amp;gt; H1a &amp;amp; H1b &amp;amp; H1c

 %% Connections: Hidden Layer 1 to Hidden Layer 2
 H1a --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c
 H1b --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c
 H1c --&amp;gt; H2a &amp;amp; H2b &amp;amp; H2c

 %% Connections: Hidden Layer 2 to Hidden Layer 3
 H2a --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c
 H2b --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c
 H2c --&amp;gt; H3a &amp;amp; H3b &amp;amp; H3c

 %% Connections: Hidden Layer 3 to Output
 H3a --&amp;gt; O
 H3b --&amp;gt; O
 H3c --&amp;gt; O

 %% Styling
 style I1 fill:#C8E6C9
 style I2 fill:#C8E6C9
 style I3 fill:#C8E6C9
 style H1a fill:#BBDEFB
 style H1b fill:#BBDEFB
 style H1c fill:#BBDEFB
 style H2a fill:#90CAF9
 style H2b fill:#90CAF9
 style H2c fill:#90CAF9
 style H3a fill:#64B5F6
 style H3b fill:#64B5F6
 style H3c fill:#64B5F6
 style O fill:#FFCDD2
 style subGraph0 stroke:none,fill:transparent
 style subGraph1 stroke:none,fill:transparent
 style subGraph2 stroke:none,fill:transparent
 style subGraph3 stroke:none,fill:transparent
 style subGraph4 stroke:none,fill:transparent
&lt;/pre>

&lt;hr>
&lt;h2 id="types-of-neural-networks">
 Types of Neural Networks
 
 &lt;a class="anchor" href="#types-of-neural-networks">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Standard NN - used for smaller, simpler data (e.g. Real Estate)&lt;/li>
&lt;li>CNN - Convolutional - used for images (e.g. Photo Tagging, Object Detection)&lt;/li>
&lt;li>RNN - Recurrent - used for sequential data such as text (e.g. Speech Recognition, Translation)&lt;/li>
&lt;li>Hybrid NN (e.g. Autonomous Driving)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="components-of-dl">
 Components of DL
 
 &lt;a class="anchor" href="#components-of-dl">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Data&lt;/li>
&lt;li>Learning Algorithm: how to transform the data&lt;/li>
&lt;li>&lt;strong>Loss Function&lt;/strong>: an objective function that &lt;strong>quantifies how well the model is doing&lt;/strong>; the lower the loss, the better the model.&lt;/li>
&lt;li>Optimisation Algorithm: searches for the best possible parameters to &lt;strong>minimise the loss function&lt;/strong>. Popular optimisation algorithms for deep learning are based on an approach called &lt;strong>gradient descent&lt;/strong>.&lt;/li>
&lt;li>Model&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="operational-steps-for-neural-architectures-1">
 Operational Steps for Neural Architectures
 
 &lt;a class="anchor" href="#operational-steps-for-neural-architectures-1">#&lt;/a>
 
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Step&lt;/th>
 &lt;th>Perceptron (Boolean/Logic)&lt;/th>
 &lt;th>Linear Regression Network&lt;/th>
 &lt;th>Binary Classification (Logistic)&lt;/th>
 &lt;th>DFNN / MLP (Classification)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>1. Input&lt;/strong>&lt;/td>
 &lt;td>Binary/discrete inputs 
&lt;span>
 \( x_1, \dots, x_n \)
 &lt;/span>

&lt;/td>
 &lt;td>Numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>Numerical features 
&lt;span>
 \( x \)
 &lt;/span>

&lt;/td>
 &lt;td>High-dimensional numerical or categorical features&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>2. Weighted Sum&lt;/strong>&lt;/td>
 &lt;td>
&lt;span>
 \( z = \sum (w_i x_i) + b \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( \hat{y} = w_0 + w_1 x \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( z = W x + b \)
 &lt;/span>

&lt;/td>
 &lt;td>
&lt;span>
 \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \)
 &lt;/span>

&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>3. Activation&lt;/strong>&lt;/td>
 &lt;td>Step: 1 if 
&lt;span>
 \( z \geq 0 \)
 &lt;/span>

, else 0&lt;/td>
 &lt;td>Identity: output = 
&lt;span>
 \( z \)
 &lt;/span>

&lt;/td>
 &lt;td>Sigmoid: maps 
&lt;span>
 \( z \)
 &lt;/span>

 to probability&lt;/td>
 &lt;td>ReLU (hidden), Softmax/Sigmoid (output)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>4. Loss / Error&lt;/strong>&lt;/td>
 &lt;td>Error = Target − Output&lt;/td>
 &lt;td>
&lt;span>
 \( J = \frac{1}{2N} \sum (Y - \hat{y})^2 \)
 &lt;/span>

&lt;/td>
 &lt;td>Binary Cross-Entropy (BCE)&lt;/td>
 &lt;td>BCE or Categorical Cross-Entropy&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>5. Optimisation&lt;/strong>&lt;/td>
 &lt;td>Update on misclassification&lt;/td>
 &lt;td>Gradient Descent&lt;/td>
 &lt;td>Backpropagation (single layer)&lt;/td>
 &lt;td>Backpropagation (multi-layer chain rule)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>6. Output&lt;/strong>&lt;/td>
 &lt;td>Boolean (0 or 1)&lt;/td>
 &lt;td>Continuous value&lt;/td>
 &lt;td>Probability score&lt;/td>
 &lt;td>Probability vector (multi-class)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>
&lt;h2 id="applications">
 Applications
 
 &lt;a class="anchor" href="#applications">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Computer Vision (e.g., face detection, medical imaging)&lt;/li>
&lt;li>Natural Language Processing (e.g., ChatGPT, translation)&lt;/li>
&lt;li>Self Driving Cars&lt;/li>
&lt;li>Speech Assistants (e.g., Alexa, Siri)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="intution">
 Intution
 
 &lt;a class="anchor" href="#intution">#&lt;/a>
 
&lt;/h2>
&lt;p>Deep Learning is the methodology, DNN is a model.&lt;/p></description></item><item><title>LNN for Classification</title><link>https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/</link><pubDate>Sun, 15 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/040-linear-neural-networks-for-classification/</guid><description>&lt;h1 id="linear-nn-for-classification">
 Linear NN for Classification
 
 &lt;a class="anchor" href="#linear-nn-for-classification">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>Linear Neural Network (LNN) for classification&lt;/strong> uses &lt;strong>no hidden layers&lt;/strong>.&lt;br>
It learns a &lt;strong>linear decision boundary&lt;/strong> and outputs &lt;strong>class probabilities&lt;/strong>, then converts them into predicted classes.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Neural-network view:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Binary classification&lt;/strong> → logistic regression (single neuron + sigmoid)&lt;/li>
&lt;li>&lt;strong>Multi-class classification&lt;/strong> → softmax regression (K output neurons + softmax)&lt;/li>
&lt;/ul>
&lt;/blockquote>
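&lt;p>A minimal Keras sketch of both views (the feature count and class count below are illustrative); each is a single &lt;code>Dense&lt;/code> layer with no hidden layers:&lt;/p>
&lt;pre>&lt;code class="language-python">from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Binary: logistic regression = one neuron + sigmoid
binary = Sequential([Dense(1, activation='sigmoid', input_shape=(4,))])
binary.compile(optimizer='sgd', loss='binary_crossentropy')

# Multi-class: softmax regression = K output neurons + softmax (here K = 3)
multi = Sequential([Dense(3, activation='softmax', input_shape=(4,))])
multi.compile(optimizer='sgd', loss='categorical_crossentropy')
&lt;/code>&lt;/pre>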
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 D[&amp;#34;Data&amp;lt;br/&amp;gt;X, y&amp;#34;] --&amp;gt; M[&amp;#34;Linear model&amp;lt;br/&amp;gt;w, b&amp;#34;]
 M --&amp;gt; A[&amp;#34;Activation&amp;lt;br/&amp;gt;Sigmoid / Softmax&amp;#34;]
 A --&amp;gt; L[&amp;#34;Loss&amp;lt;br/&amp;gt;Cross-entropy&amp;#34;]
 L --&amp;gt; O[&amp;#34;Optimiser&amp;lt;br/&amp;gt;Mini-batch GD / Adam&amp;#34;]
 O --&amp;gt; P[&amp;#34;Updated parameters&amp;lt;br/&amp;gt;w, b&amp;#34;]
 P --&amp;gt; I[&amp;#34;Inference&amp;lt;br/&amp;gt;Probabilities → class&amp;#34;]

 %% Pastel colour scheme
 style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
 style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
 style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
 style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
 style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
 style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
 style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
&lt;/pre>

&lt;hr>
&lt;h2 id="classification">
 Classification
 
 &lt;a class="anchor" href="#classification">#&lt;/a>
 
&lt;/h2>
&lt;p>Classification predicts a &lt;strong>discrete class label&lt;/strong>.&lt;br>
Common settings:&lt;/p></description></item><item><title>LLM - Model</title><link>https://arshadhs.github.io/docs/ai/genai/llm/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/genai/llm/</guid><description>&lt;h1 id="llm--large-language-model">
 LLM – Large Language Model
 
 &lt;a class="anchor" href="#llm--large-language-model">#&lt;/a>
 
&lt;/h1>
&lt;p>Large Language Models (LLMs) are &lt;strong>advanced AI systems&lt;/strong> designed to process, understand, and generate &lt;strong>human-like text&lt;/strong>.&lt;/p>
&lt;p>They learn language by analysing &lt;strong>massive amounts of text data&lt;/strong>, discovering patterns in:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>grammar&lt;/p>
&lt;/li>
&lt;li>
&lt;p>meaning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>context&lt;/p>
&lt;/li>
&lt;li>
&lt;p>relationships between words and sentences&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>LLMs are:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Built on &lt;strong>Deep Learning&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Implemented using &lt;strong>Neural Networks&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Based on &lt;strong>Transformers&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Often combined with tools like:&lt;/p>
&lt;ul>
&lt;li>Retrieval (RAG)&lt;/li>
&lt;li>Agents&lt;/li>
&lt;li>External APIs&lt;/li>
&lt;li>Memory systems&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="what-makes-an-llm-special">
 What makes an LLM special?
 
 &lt;a class="anchor" href="#what-makes-an-llm-special">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Built using &lt;strong>deep neural networks&lt;/strong>&lt;/li>
&lt;li>Trained on &lt;strong>very large datasets&lt;/strong> (books, articles, code, web text)&lt;/li>
&lt;li>Can perform many tasks &lt;strong>without task-specific training&lt;/strong>&lt;/li>
&lt;li>General-purpose language understanding, not single-task models&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="foundation-transformer-architecture">
 Foundation: Transformer Architecture
 
 &lt;a class="anchor" href="#foundation-transformer-architecture">#&lt;/a>
 
&lt;/h2>
&lt;p>LLMs are based on the &lt;strong>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/transformer/">Transformer Architecture&lt;/a>&lt;/strong>, which allows models to understand &lt;strong>context and long-range dependencies&lt;/strong> in text.&lt;/p></description></item><item><title>Deep Feedforward Neural Networks (DFNN) for Classification</title><link>https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/</link><pubDate>Thu, 26 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/050-deep-feedforward/</guid><description>&lt;h1 id="deep-feedforward-neural-networks-dfnn-or-multi-layer-perceptrons-mlp-for-classification">
 Deep Feedforward Neural Networks (DFNN) or Multi Layer Perceptrons (MLP) for Classification
 
 &lt;a class="anchor" href="#deep-feedforward-neural-networks-dfnn-or-multi-layer-perceptrons-mlp-for-classification">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>Deep Feedforward Neural Network (DFNN)&lt;/strong>, also called a &lt;strong>Multi-Layer Perceptron (MLP)&lt;/strong>, is a neural network with one or more &lt;strong>hidden layers&lt;/strong> where information flows &lt;strong>forward only&lt;/strong> (no recurrence).&lt;br>
For classification, DFNNs learn &lt;strong>non-linear decision boundaries&lt;/strong> by combining hidden layers with &lt;strong>non-linear activation functions&lt;/strong>.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Core idea:&lt;/p>
&lt;ul>
&lt;li>A single neuron can only learn &lt;strong>linear&lt;/strong> boundaries.&lt;/li>
&lt;li>Adding &lt;strong>hidden layers + non-linearity&lt;/strong> allows DFNNs to solve problems like &lt;strong>XOR&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;/blockquote>
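&lt;p>A minimal Keras sketch of this idea (the hidden-layer size and epoch count are illustrative; one small hidden layer is enough to fit XOR):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)             # XOR truth table

model = Sequential([
    Dense(4, activation='tanh', input_shape=(2,)),  # hidden layer adds non-linearity
    Dense(1, activation='sigmoid'),                 # probability of class 1
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=2000, verbose=0)             # tiny dataset, many epochs
print(model.predict(X).round().ravel())             # typically [0, 1, 1, 0]
&lt;/code>&lt;/pre>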
&lt;hr>
&lt;h2 id="mlp-as-solution-for-xor">
 MLP as solution for XOR
 
 &lt;a class="anchor" href="#mlp-as-solution-for-xor">#&lt;/a>
 
&lt;/h2>
&lt;p>A single perceptron fails on XOR because XOR is &lt;strong>not linearly separable&lt;/strong>.&lt;/p></description></item><item><title>Convolutional Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/060-cnn-fundamentals/</guid><description>&lt;h1 id="convolutional-neural-networks-cnn">
 Convolutional Neural Networks (CNN)
 
 &lt;a class="anchor" href="#convolutional-neural-networks-cnn">#&lt;/a>
 
&lt;/h1>
&lt;p>Convolutional Neural Networks (CNNs) are specialised neural networks designed for data with spatial structure, especially images. They became the standard model for computer vision because they preserve spatial locality, reuse the same pattern detector across the image, and build representations hierarchically. In practical terms, a CNN starts by learning simple features such as edges and corners, then combines them into textures, shapes, object parts, and finally full semantic categories.&lt;/p></description></item><item><title>Deep CNN Architectures</title><link>https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/065-deep-cnn-architectures/</guid><description>&lt;h1 id="deep-cnn-architectures">
 Deep CNN Architectures
 
 &lt;a class="anchor" href="#deep-cnn-architectures">#&lt;/a>
 
&lt;/h1>
&lt;p>Once the basic ideas of convolution, pooling, channels, and classifier heads are understood, the next step is to study how successful CNN architectures are designed in practice. The history of deep CNNs is not just a list of famous models. It is a progression of design ideas: smaller filters, more depth, better optimisation, bottlenecks, multi-scale processing, residual connections, and transfer learning.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>&lt;strong>Key takeaway:&lt;/strong>&lt;br>
Deep CNN architectures evolved by solving specific problems one by one: &lt;strong>LeNet&lt;/strong> established the template, &lt;strong>AlexNet&lt;/strong> proved deep learning could dominate large-scale vision, &lt;strong>VGG&lt;/strong> simplified the design, &lt;strong>NiN&lt;/strong> introduced powerful &lt;code>1 × 1&lt;/code> ideas, &lt;strong>GoogLeNet&lt;/strong> made multi-scale processing efficient, and &lt;strong>ResNet&lt;/strong> solved the optimisation problem of very deep networks.&lt;/p></description></item><item><title>CNN Pipeline</title><link>https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/</link><pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/067-cnn-model/</guid><description>&lt;h1 id="cnn-pipeline-preprocessing--models">
 CNN Pipeline: Preprocessing &amp;amp; Models
 
 &lt;a class="anchor" href="#cnn-pipeline-preprocessing--models">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Understand CNN concepts deeply&lt;/li>
&lt;li>Build CNN models step-by-step&lt;/li>
&lt;li>Apply CNNs in assignments using Keras&lt;/li>
&lt;/ul>
&lt;blockquote class="book-hint info">
&lt;p>Think of CNN as a pipeline:
Image → Features → Patterns → Prediction&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h1 id="1-image-representation">
 1. Image Representation
 
 &lt;a class="anchor" href="#1-image-representation">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;link rel="stylesheet" href="https://arshadhs.github.io/katex/katex.min.css" />
&lt;script defer src="https://arshadhs.github.io/katex/katex.min.js">&lt;/script>
 &lt;script defer src="https://arshadhs.github.io/katex/auto-render.min.js" onload="renderMathInElement(document.body, {
 &amp;#34;delimiters&amp;#34;: [
 {&amp;#34;left&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$$&amp;#34;, &amp;#34;display&amp;#34;: true},
 {&amp;#34;left&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;$&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\(&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\)&amp;#34;, &amp;#34;display&amp;#34;: false},
 {&amp;#34;left&amp;#34;: &amp;#34;\\[&amp;#34;, &amp;#34;right&amp;#34;: &amp;#34;\\]&amp;#34;, &amp;#34;display&amp;#34;: true}
 ]
});">&lt;/script>
&lt;span>
 \[ 
X \in \mathbb{R}^{H \times W \times C}
 \]
 &lt;/span>
&lt;/span>
&lt;ul>
&lt;li>H = Height&lt;/li>
&lt;li>W = Width&lt;/li>
&lt;li>C = Channels&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="2-convolution-operation">
 2. Convolution Operation
 
 &lt;a class="anchor" href="#2-convolution-operation">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
Z(i,j) = \sum_{m,n} X(i+m, j+n) \cdot K(m,n)
 \]
 &lt;/span>
&lt;/span>
&lt;ul>
&lt;li>Sliding filter extracts features&lt;/li>
&lt;li>Produces feature maps&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="3-stride--padding">
 3. Stride &amp;amp; Padding
 
 &lt;a class="anchor" href="#3-stride--padding">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
\text{Output} = \frac{N - F + 2P}{S} + 1
 \]
 &lt;/span>
&lt;/span>
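&lt;p>For example, a &lt;code>32 × 32&lt;/code> input with filter size &lt;code>F = 5&lt;/code>, padding &lt;code>P = 0&lt;/code>, and stride &lt;code>S = 1&lt;/code> gives &lt;code>(32 - 5 + 0)/1 + 1 = 28&lt;/code>, i.e. a &lt;code>28 × 28&lt;/code> feature map.&lt;/p>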
&lt;hr>
&lt;h1 id="4-activation-relu">
 4. Activation (ReLU)
 
 &lt;a class="anchor" href="#4-activation-relu">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
\mathrm{ReLU}(x) = \max(0, x)
 \]
 &lt;/span>
&lt;/span>
&lt;hr>
&lt;h1 id="5-pooling">
 5. Pooling
 
 &lt;a class="anchor" href="#5-pooling">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Max Pooling → strongest feature&lt;/li>
&lt;li>Average Pooling → smooth&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="6-global-average-pooling">
 6. Global Average Pooling
 
 &lt;a class="anchor" href="#6-global-average-pooling">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
y_k = \frac{1}{HW} \sum_{i,j} x_{i,j,k}
 \]
 &lt;/span>
&lt;/span>
&lt;hr>
&lt;h1 id="7-loss-function">
 7. Loss Function
 
 &lt;a class="anchor" href="#7-loss-function">#&lt;/a>
 
&lt;/h1>
&lt;span style="color: green;">
 &lt;span>
 \[ 
L = - \sum y \log(\hat{y})
 \]
 &lt;/span>
&lt;/span>
&lt;hr>
&lt;h1 id="8-cnn-architecture">
 8. CNN Architecture
 
 &lt;a class="anchor" href="#8-cnn-architecture">#&lt;/a>
 
&lt;/h1>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">graph LR
A[Input Image] --&amp;gt; B[Conv]
B --&amp;gt; C[ReLU]
C --&amp;gt; D[Pooling]
D --&amp;gt; E[Conv Layers]
E --&amp;gt; F[Flatten / GAP]
F --&amp;gt; G[Dense]
G --&amp;gt; H[Output]&lt;/pre>
&lt;hr>
&lt;h1 id="9-training">
 9. Training
 
 &lt;a class="anchor" href="#9-training">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Forward pass&lt;/li>
&lt;li>Loss computation&lt;/li>
&lt;li>Backpropagation&lt;/li>
&lt;li>Weight update&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="10-keras-implementation">
 10. Keras Implementation
 
 &lt;a class="anchor" href="#10-keras-implementation">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="model">
 Model
 
 &lt;a class="anchor" href="#model">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> tensorflow.keras.models &lt;span style="color:#f92672">import&lt;/span> Sequential
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">from&lt;/span> tensorflow.keras.layers &lt;span style="color:#f92672">import&lt;/span> Conv2D, MaxPooling2D, Dense, Flatten
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model &lt;span style="color:#f92672">=&lt;/span> Sequential()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Conv2D(&lt;span style="color:#ae81ff">32&lt;/span>, (&lt;span style="color:#ae81ff">3&lt;/span>,&lt;span style="color:#ae81ff">3&lt;/span>), activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;relu&amp;#39;&lt;/span>, input_shape&lt;span style="color:#f92672">=&lt;/span>(&lt;span style="color:#ae81ff">64&lt;/span>,&lt;span style="color:#ae81ff">64&lt;/span>,&lt;span style="color:#ae81ff">3&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(MaxPooling2D((&lt;span style="color:#ae81ff">2&lt;/span>,&lt;span style="color:#ae81ff">2&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Conv2D(&lt;span style="color:#ae81ff">64&lt;/span>, (&lt;span style="color:#ae81ff">3&lt;/span>,&lt;span style="color:#ae81ff">3&lt;/span>), activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;relu&amp;#39;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(MaxPooling2D((&lt;span style="color:#ae81ff">2&lt;/span>,&lt;span style="color:#ae81ff">2&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Flatten())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Dense(&lt;span style="color:#ae81ff">128&lt;/span>, activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;relu&amp;#39;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>add(Dense(&lt;span style="color:#ae81ff">1&lt;/span>, activation&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;sigmoid&amp;#39;&lt;/span>))
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="compile">
 Compile
 
 &lt;a class="anchor" href="#compile">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>compile(optimizer&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;adam&amp;#39;&lt;/span>, loss&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#39;binary_crossentropy&amp;#39;&lt;/span>, metrics&lt;span style="color:#f92672">=&lt;/span>[&lt;span style="color:#e6db74">&amp;#39;accuracy&amp;#39;&lt;/span>])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="train">
 Train
 
 &lt;a class="anchor" href="#train">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>model&lt;span style="color:#f92672">.&lt;/span>fit(X_train, y_train, epochs&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">10&lt;/span>, batch_size&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">32&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h2 id="predict">
 Predict
 
 &lt;a class="anchor" href="#predict">#&lt;/a>
 
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>pred &lt;span style="color:#f92672">=&lt;/span> model&lt;span style="color:#f92672">.&lt;/span>predict(X_test)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;hr>
&lt;h1 id="11-tips">
 11. Tips
 
 &lt;a class="anchor" href="#11-tips">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Normalize images&lt;/li>
&lt;li>Use small filters&lt;/li>
&lt;li>Avoid too many dense layers&lt;/li>
&lt;/ul>
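&lt;p>For the first tip, a two-line sketch assuming 8-bit pixel values:&lt;/p>
&lt;pre>&lt;code class="language-python"># Scale pixels from [0, 255] to [0, 1] before training
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
&lt;/code>&lt;/pre>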
&lt;hr>
&lt;h1 id="12-summary">
 12. Summary
 
 &lt;a class="anchor" href="#12-summary">#&lt;/a>
 
&lt;/h1>
&lt;blockquote class="book-hint info">
&lt;p>CNN = Automatic feature extractor + classifier&lt;/p></description></item><item><title>Recurrent Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/070-recurrent-nn/</guid><description>&lt;h1 id="recurrent-neural-networks">
 Recurrent Neural Networks
 
 &lt;a class="anchor" href="#recurrent-neural-networks">#&lt;/a>
 
&lt;/h1>
&lt;p>Recurrent Neural Networks (RNNs) are neural networks designed for &lt;strong>sequential data&lt;/strong>, where the order of inputs matters and the model must use information from earlier time steps to interpret later ones. Unlike a feedforward network, an RNN does not process each input in isolation. It carries a &lt;strong>hidden state&lt;/strong> from one time step to the next, so the network can build a running summary of what it has seen so far.&lt;/p></description></item><item><title>Deep Recurrent Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/075-recurrent-nn-deep/</guid><description>&lt;h1 id="deep-recurrent-neural-networks">
 Deep Recurrent Neural Networks
 
 &lt;a class="anchor" href="#deep-recurrent-neural-networks">#&lt;/a>
 
&lt;/h1>
&lt;p>Vanilla RNNs introduce the hidden-state idea, but they struggle on longer and more complex sequences because gradients can vanish across time. Deep recurrent models extend the RNN idea in two important ways:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>make the recurrent architecture richer&lt;/strong>, for example by stacking multiple recurrent layers or using information from both directions,&lt;/li>
&lt;li>&lt;strong>use gates and memory cells&lt;/strong> to control what should be remembered, forgotten, updated, and exposed.&lt;/li>
&lt;/ol>
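&lt;p>Both extensions are one line each in Keras; a minimal sketch (sequence length, feature count, and layer sizes here are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-python">from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

model = Sequential([
    # Bidirectional + return_sequences=True so a second recurrent layer can stack on top
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(100, 16)),
    LSTM(32),                              # second (stacked) recurrent layer
    Dense(1, activation='sigmoid'),        # e.g. binary sequence classification
])
model.compile(optimizer='adam', loss='binary_crossentropy')
&lt;/code>&lt;/pre>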
&lt;p>This is why practical recurrent modelling usually moves from a simple RNN to &lt;strong>stacked RNNs, bidirectional RNNs, GRUs, or LSTMs&lt;/strong>.&lt;/p></description></item></channel></rss>