Linear Neural Networks for Regression #

A linear neural network for regression is a model that predicts a continuous target by taking a weighted sum of input features and applying the identity activation (so the output can be any real number).
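In symbols, for input features \( \mathbf{x} \), weights \( \mathbf{w} \), and bias \( b \):

\[ \hat{y} = \mathbf{w}^\top \mathbf{x} + b \]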

  • Single neuron for regression (predicting how much / how many)
  • Data + linear model (single neuron, no hidden layers) + squared loss
  • Training with the batch gradient descent algorithm
  • Prediction (inference)
  • E.g., Auto MPG (UCI) style prediction with a single neuron (from-scratch code below)

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b<br/>Single neuron"]
  M --> A["Activation<br/>Identity"]
  A --> L["Loss<br/>MSE (Squared error)"]
  L --> O["Optimiser<br/>Batch Gradient DescentBatch GD / Mini-batch GD"]
  O --> P["Parameters<br/>w, b"]
  P --> I["Inference<br/>Predict ŷ (number) for new x"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px
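
Below is a minimal from-scratch sketch of this pipeline in NumPy. The synthetic data, feature scale, learning rate, and epoch count are illustrative assumptions standing in for the Auto MPG dataset:

import numpy as np

# Synthetic stand-in for Auto MPG: one feature (e.g. weight in 1000s of lbs) -> mpg
rng = np.random.default_rng(42)
X = rng.uniform(1.5, 5.0, size=100)
y = 40.0 - 6.0 * X + rng.normal(0, 1.5, 100)   # roughly linear target

w, b = 0.0, 0.0           # single neuron: y_hat = w*x + b (identity activation)
lr, epochs = 0.05, 2000   # illustrative hyperparameters

for _ in range(epochs):
    y_hat = w * X + b
    error = y_hat - y
    # Batch gradient descent: MSE gradients averaged over the whole dataset
    grad_w = 2 * np.mean(error * X)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")          # should approach w ≈ -6, b ≈ 40

# Inference: predict mpg for a new car weighing 3000 lbs (x = 3.0)
print(f"prediction for x=3.0: {w * 3.0 + b:.1f}")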

Regression #

Regression is a supervised learning task that predicts a continuous-valued output based on input features.


Gradient Descent Algorithm #

Gradient Descent Algorithm (GDA) is

  • an optimisation method
  • used to train models
  • by repeatedly updating parameters (weights and biases) to reduce the loss
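
Formally, every variant of gradient descent applies the same update rule, stepping each parameter \( \theta \) against the gradient of the loss \( J \) with learning rate \( \eta \):

\[ \theta \leftarrow \theta - \eta \, \nabla_{\theta} J(\theta) \]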

In deep learning, the default training approach is almost always mini-batch gradient descent, usually with Adam or SGD + momentum.

Gradient Descent is used in both regression and classification.

It’s not tied to the task type; it applies whenever you have a differentiable loss function and parameters that can be adjusted to reduce it.


Deep Learning #

  • A subset of ML
  • Focuses on algorithms inspired by the structure and function of the brain, called Artificial Neural Networks.
  • A neural network with multiple hidden layers and multiple nodes in each hidden layer is known as a deep learning system or a deep neural network.
  • Allows systems to automatically learn hierarchical representations (features) from raw input, such as images, sound, or text.

Operational Steps for Neural Architectures #

| Step | Perceptron (Boolean/Logic) | Linear Regression Network | Binary Classification (Logistic) | DFNN / MLP (Classification) |
| --- | --- | --- | --- | --- |
| 1. Input | Binary or discrete inputs \( x_1, \dots, x_n \) | Numerical features \( x \) | Numerical features \( x \) | High-dimensional numerical or categorical features |
| 2. Weighted Sum | Single calculation: \( z = \sum (w_i x_i) + b \) | Single calculation: \( \hat{y} = w_0 + w_1 x \) | Single calculation: \( z = W x + b \) | Multiple stages: \( z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} \) for each layer \( l \) |
| 3. Activation | Step function: output 1 if \( z \geq 0 \), else 0 | Identity: the output remains \( z \) (no non-linear change) | Sigmoid: maps \( z \) to a probability between 0 and 1 | ReLU for hidden layers; Softmax/Sigmoid for the output layer |
| 4. Loss / Error | Error = Target − Output | Mean Squared Error (MSE): \( J = \frac{1}{2N} \sum (Y - \hat{y})^2 \) | Binary Cross-Entropy (BCE): penalises based on probability distance | BCE or Categorical Cross-Entropy for multiple classes |
| 5. Optimisation | Update weights only on misclassification | Gradient Descent: compute gradients of the loss at each iteration and update weights | Backpropagation: compute error signals \( \delta \) and gradients \( dW \) | Backpropagation: recursive chain rule to update all hidden-layer weights |
| 6. Output | Discrete Boolean value (0 or 1) | Continuous numerical value (e.g., house prices) | Single probability score or class label | A vector of probabilities for multiple classes |


flowchart LR
    %% Input Layer
    subgraph subGraph0["Input Layer"]
        I1(("Input 1"))
        I2(("Input 2"))
        I3(("Input 3"))
    end

    %% Hidden Layers
    subgraph subGraph1["Hidden Layer 1"]
        H1a(("H1-1"))
        H1b(("H1-2"))
        H1c(("H1-3"))
    end

    subgraph subGraph2["Hidden Layer 2"]
        H2a(("H2-1"))
        H2b(("H2-2"))
        H2c(("H2-3"))
    end

    subgraph subGraph3["Hidden Layer 3"]
        H3a(("H3-1"))
        H3b(("H3-2"))
        H3c(("H3-3"))
    end

    %% Output Layer
    subgraph subGraph4["Output Layer"]
        O(("Output"))
    end

    %% Connections: Input to Hidden Layer 1
    I1 --> H1a & H1b & H1c
    I2 --> H1a & H1b & H1c
    I3 --> H1a & H1b & H1c

    %% Connections: Hidden Layer 1 to Hidden Layer 2
    H1a --> H2a & H2b & H2c
    H1b --> H2a & H2b & H2c
    H1c --> H2a & H2b & H2c

    %% Connections: Hidden Layer 2 to Hidden Layer 3
    H2a --> H3a & H3b & H3c
    H2b --> H3a & H3b & H3c
    H2c --> H3a & H3b & H3c

    %% Connections: Hidden Layer 3 to Output
    H3a --> O
    H3b --> O
    H3c --> O

    %% Styling
    style I1 fill:#C8E6C9
    style I2 fill:#C8E6C9
    style I3 fill:#C8E6C9
    style H1a fill:#BBDEFB
    style H1b fill:#BBDEFB
    style H1c fill:#BBDEFB
    style H2a fill:#90CAF9
    style H2b fill:#90CAF9
    style H2c fill:#90CAF9
    style H3a fill:#64B5F6
    style H3b fill:#64B5F6
    style H3c fill:#64B5F6
    style O fill:#FFCDD2
    style subGraph0 stroke:none,fill:transparent
    style subGraph1 stroke:none,fill:transparent
    style subGraph2 stroke:none,fill:transparent
    style subGraph3 stroke:none,fill:transparent
    style subGraph4 stroke:none,fill:transparent

Types of Neural Networks #

  • Standard NN - small and simple, for smaller and simpler data (e.g. Real Estate)
  • CNN - Convolutional - used for Images (e.g. Photo Tagging, Object Detection)
  • RNN - Recurrent - used for sequences such as Text (e.g. Speech Recognition, Translation)
  • Hybrid NN - combinations of the above (e.g. Autonomous Driving)

Components of DL #

  • Data
  • Learning Algorithm : How to transform data
  • Loss Function: an objective function that quantifies how well the model is doing; the lower the loss, the better the model.
  • Optimisation Algorithm: searches for the best possible parameters to minimise the loss function. Popular optimisation algorithms for deep learning are based on an approach called gradient descent.
  • Model


Applications #

  • Computer Vision (e.g., face detection, medical imaging)
  • Natural Language Processing (e.g., ChatGPT, translation)
  • Self-Driving Cars
  • Speech Assistants (e.g., Alexa, Siri)

Intuition #

Deep Learning is the methodology; a DNN is a model.


Linear NN for Classification #

A Linear Neural Network (LNN) for classification uses no hidden layers.
It learns a linear decision boundary and outputs class probabilities, then converts them into predicted classes.

Neural-network view:

  • Binary classification → logistic regression (single neuron + sigmoid)
  • Multi-class classification → softmax regression (K output neurons + softmax)

flowchart LR
  D["Data<br/>X, y"] --> M["Linear model<br/>w, b"]
  M --> A["Activation<br/>Sigmoid / Softmax"]
  A --> L["Loss<br/>Cross-entropy"]
  L --> O["Optimiser<br/>Mini-batch GD / Adam"]
  O --> P["Updated parameters<br/>w, b"]
  P --> I["Inference<br/>Probabilities → class"]

  %% Pastel colour scheme
  style D fill:#E3F2FD,stroke:#1E88E5,stroke-width:1px
  style M fill:#E8F5E9,stroke:#43A047,stroke-width:1px
  style A fill:#FFF3E0,stroke:#FB8C00,stroke-width:1px
  style L fill:#FCE4EC,stroke:#D81B60,stroke-width:1px
  style O fill:#F3E5F5,stroke:#8E24AA,stroke-width:1px
  style P fill:#E0F7FA,stroke:#00838F,stroke-width:1px
  style I fill:#F1F8E9,stroke:#558B2F,stroke-width:1px

Classification #

Classification predicts a discrete class label.
Common settings:

  • Binary classification: two classes (e.g., spam vs. not spam)
  • Multi-class classification: one of K classes (e.g., digit recognition)
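
A minimal Keras sketch of the binary case (logistic regression as a single neuron, no hidden layers); the toy data, optimiser, and epoch count are illustrative assumptions:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy linearly separable data: label is 1 when the two features sum to a positive value
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Logistic regression = single neuron + sigmoid
model = Sequential([Dense(1, activation='sigmoid', input_shape=(2,))])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=50, batch_size=32, verbose=0)

probs = model.predict(X[:5])               # class probabilities
print((probs > 0.5).astype(int).ravel())   # probabilities -> predicted classes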


LLM – Large Language Model #

Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.

They learn language by analysing massive amounts of text data, discovering patterns in:

  • grammar
  • meaning
  • context
  • relationships between words and sentences

Key characteristics:

  • Built on Deep Learning
  • Implemented using Neural Networks
  • Based on Transformers
  • Often combined with tools like:
    • Retrieval (RAG)
    • Agents
    • External APIs
    • Memory systems

What makes an LLM special? #

  • Built using deep neural networks
  • Trained on very large datasets (books, articles, code, web text)
  • Can perform many tasks without task-specific training
  • General-purpose language understanding, not single-task models

Foundation: Transformer Architecture #

LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.


Deep Feedforward Neural Networks (DFNN) or Multi Layer Perceptrons (MLP) for Classification #

A Deep Feedforward Neural Network (DFNN), also called a Multi-Layer Perceptron (MLP), is a neural network with one or more hidden layers where information flows forward only (no recurrence).
For classification, DFNNs learn non-linear decision boundaries by combining hidden layers with non-linear activation functions.

Core idea:

  • A single neuron can only learn linear boundaries.
  • Adding hidden layers + non-linearity allows DFNNs to solve problems like XOR.

MLP as solution for XOR #

A single perceptron fails on XOR because XOR is not linearly separable.
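
A minimal Keras sketch (matching the Keras style used later in this document) showing that one small hidden layer is enough for XOR; the 4 hidden units, tanh activation, and 2000 epochs are illustrative choices, and convergence can vary with random initialisation:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# XOR truth table: not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

model = Sequential()
model.add(Dense(4, activation='tanh', input_shape=(2,)))  # hidden layer adds the non-linearity
model.add(Dense(1, activation='sigmoid'))                 # probability of class 1

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=2000, verbose=0)

print(model.predict(X).round().ravel())  # expected: [0. 1. 1. 0.]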


Convolutional Neural Networks (CNN) #

Convolutional Neural Networks (CNNs) are specialised neural networks designed for data with spatial structure, especially images. They became the standard model for computer vision because they preserve spatial locality, reuse the same pattern detector across the image, and build representations hierarchically. In practical terms, a CNN starts by learning simple features such as edges and corners, then combines them into textures, shapes, object parts, and finally full semantic categories.


Deep CNN Architectures #

Once the basic ideas of convolution, pooling, channels, and classifier heads are understood, the next step is to study how successful CNN architectures are designed in practice. The history of deep CNNs is not just a list of famous models. It is a progression of design ideas: smaller filters, more depth, better optimisation, bottlenecks, multi-scale processing, residual connections, and transfer learning.

Key takeaway:
Deep CNN architectures evolved by solving specific problems one by one: LeNet established the template, AlexNet proved deep learning could dominate large-scale vision, VGG simplified the design, NiN introduced powerful 1 × 1 convolution ideas, GoogLeNet made multi-scale processing efficient, and ResNet solved the optimisation problem of very deep networks.


CNN Pipeline: Preprocessing & Models #

  • Understand CNN concepts deeply
  • Build CNN models step-by-step
  • Apply CNNs in assignments using Keras

Think of CNN as a pipeline: Image → Features → Patterns → Prediction


1. Image Representation #

\[ X \in \mathbb{R}^{H \times W \times C} \]
  • H = Height
  • W = Width
  • C = Channels

2. Convolution Operation #

\[ Z(i,j) = \sum_{m,n} X(i+m, j+n) \cdot K(m,n) \]
  • Sliding filter extracts features
  • Produces feature maps
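
A naive NumPy sketch of this operation (technically cross-correlation, which is what deep learning frameworks implement as "convolution"); the 4 × 4 input and edge-style filter are illustrative:

import numpy as np

def conv2d_valid(X, K):
    """Naive 'valid' convolution (no padding, stride 1), as in the formula above."""
    H, W = X.shape
    m, n = K.shape
    out = np.zeros((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(X[i:i+m, j:j+n] * K)  # elementwise product, then sum
    return out

X = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
K = np.array([[1.0, -1.0], [1.0, -1.0]])       # vertical-edge-style filter (illustrative)
print(conv2d_valid(X, K))                      # 3x3 feature map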

3. Stride & Padding #

\[ \text{Output size} = \left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1 \]
  • N = input size, F = filter size, P = padding, S = stride

For example, N = 64, F = 3, P = 1, S = 1 gives (64 − 3 + 2)/1 + 1 = 64, so the spatial size is preserved.

4. Activation (ReLU) #

\[ \mathrm{ReLU}(x) = \max(0, x) \]

5. Pooling #

  • Max Pooling → strongest feature
  • Average Pooling → smooth

6. Global Average Pooling #

\[ y_k = \frac{1}{HW} \sum_{i,j} x_{i,j,k} \]
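
A quick sketch of what GAP computes, assuming a NumPy array of feature maps with shape (H, W, C); the shapes are illustrative:

import numpy as np

feature_maps = np.random.rand(8, 8, 16)   # H=8, W=8, C=16
gap = feature_maps.mean(axis=(0, 1))      # average each channel over all spatial positions
print(gap.shape)                          # (16,) - one value per channel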

7. Loss Function #

\[ L = - \sum_{k} y_k \log(\hat{y}_k) \]
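
A quick numerical check of the formula (values are illustrative):

import numpy as np

y     = np.array([0.0, 1.0, 0.0])     # one-hot true label
y_hat = np.array([0.2, 0.7, 0.1])     # predicted probabilities (e.g. softmax output)
loss  = -np.sum(y * np.log(y_hat))    # cross-entropy: only the true-class term survives
print(loss)                           # ≈ 0.357 (= -log 0.7)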

8. CNN Architecture #

graph LR
A[Input Image] --> B[Conv]
B --> C[ReLU]
C --> D[Pooling]
D --> E[Conv Layers]
E --> F[Flatten / GAP]
F --> G[Dense]
G --> H[Output]

9. Training #

  • Forward pass
  • Loss computation
  • Backpropagation
  • Weight update

10. Keras Implementation #

Model #

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten

model = Sequential()

# Block 1: 32 filters of size 3x3 over 64x64 RGB input, then downsample by 2
model.add(Conv2D(32, (3,3), activation='relu', input_shape=(64,64,3)))
model.add(MaxPooling2D((2,2)))

# Block 2: 64 filters learn higher-level patterns, then downsample again
model.add(Conv2D(64, (3,3), activation='relu'))
model.add(MaxPooling2D((2,2)))

# Flatten feature maps into a vector for the dense classifier head
model.add(Flatten())

model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # single probability for binary classification

Compile #

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Train #

model.fit(X_train, y_train, epochs=10, batch_size=32)

Predict #

pred = model.predict(X_test)

11. Tips #

  • Normalize images
  • Use small filters
  • Avoid too many dense layers

12. Summary #

CNN = Automatic feature extractor + classifier


Recurrent Neural Networks #

Recurrent Neural Networks (RNNs) are neural networks designed for sequential data, where the order of inputs matters and the model must use information from earlier time steps to interpret later ones. Unlike a feedforward network, an RNN does not process each input in isolation. It carries a hidden state from one time step to the next, so the network can build a running summary of what it has seen so far.
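
In equations, a vanilla RNN updates its hidden state at each time step as \( h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \). A minimal NumPy sketch of this recurrence, with all shapes and values chosen purely for illustration:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The new hidden state summarises the current input and the previous state
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4            # sequence length, feature size, state size
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h  = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(T, input_dim)):   # one update per time step
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # hidden state carried forward

print(h)                                      # running summary of the whole sequence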