<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ML on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/ml/</link><description>Recent content in ML on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 21 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arshadhs.github.io/tags/ml/index.xml" rel="self" type="application/rss+xml"/><item><title>Supervised Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/ml-supervised/</link><pubDate>Sat, 03 Jan 2026 10:29:52 +0100</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/ml-supervised/</guid><description>&lt;h1 id="supervised-learning">
 Supervised Learning
 
 &lt;a class="anchor" href="#supervised-learning">#&lt;/a>
 
&lt;/h1>
&lt;p>Trained using &lt;strong>labelled data&lt;/strong>.&lt;br>
Each example in the training set includes the &lt;strong>correct output&lt;/strong>.&lt;br>
The algorithm learns to &lt;strong>generalise&lt;/strong> and make predictions on unseen data.&lt;br>
Generally more &lt;strong>accurate&lt;/strong> than unsupervised methods.&lt;br>
Requires &lt;strong>human intervention&lt;/strong> for labelling and setup.&lt;br>
Widely used because it is &lt;strong>accurate and efficient&lt;/strong> when trained on good-quality labelled data.&lt;/p>
&lt;hr>
&lt;h2 id="classification">
 Classification
 
 &lt;a class="anchor" href="#classification">#&lt;/a>
 
&lt;/h2>
&lt;p>Output is &lt;strong>discrete&lt;/strong> (e.g. Yes/No, Spam/Not Spam).&lt;br>
Used for &lt;strong>categorising data&lt;/strong> into predefined classes.&lt;br>
Support Vector Machine (SVM) is a common classifier (a linear classifier with margin-based separation).&lt;/p></description></item><item><title>Unsupervised Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/ml-unsupervised/</link><pubDate>Sat, 03 Jan 2026 10:29:52 +0100</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/ml-unsupervised/</guid><description>&lt;h1 id="unsupervised-learning">
 Unsupervised Learning
 
 &lt;a class="anchor" href="#unsupervised-learning">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Works on &lt;strong>unlabelled raw data&lt;/strong>.&lt;/li>
&lt;li>The algorithm &lt;strong>discovers hidden patterns&lt;/strong> without prior knowledge of outcomes.&lt;/li>
&lt;li>Requires &lt;strong>no human intervention&lt;/strong> during training.&lt;/li>
&lt;li>Does not make direct predictions — it &lt;strong>groups or organises data&lt;/strong> instead.&lt;/li>
&lt;li>Carries a &lt;strong>higher risk&lt;/strong> because there’s no ground truth to verify results.&lt;/li>
&lt;li>Common techniques include &lt;strong>Clustering&lt;/strong>, &lt;strong>Association&lt;/strong>, and &lt;strong>Dimensionality Reduction&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
stateDiagram-v2

 %% ML maths-based colours (same palette as supervised)
 classDef probability fill:#d1fae5,stroke:#065f46,stroke-width:1px
 classDef geometry fill:#ffedd5,stroke:#9a3412,stroke-width:1px
 classDef category font-style:italic,font-weight:bold,fill:#f3f4f6,stroke:#374151

 %% Root
 USL: Unsupervised Learning

 %% Main branches
 USL --&amp;gt; CLU:::category
 CLU: Clustering

 USL --&amp;gt; DR:::category
 DR: Dimensionality Reduction

 %% Clustering algorithms
 CLU --&amp;gt; KM:::geometry
 KM: K-Means

 CLU --&amp;gt; HC:::geometry
 HC: Hierarchical Clustering

 CLU --&amp;gt; DB:::geometry
 DB: DBSCAN

 %% Probabilistic models
 USL --&amp;gt; PM:::category
 PM: Probabilistic Models

 PM --&amp;gt; GMM:::probability
 GMM: Gaussian Mixture Model

 PM --&amp;gt; HMM:::probability
 HMM: Hidden Markov Model
&lt;/pre>

&lt;hr>
&lt;h2 id="clustering">
 Clustering
 
 &lt;a class="anchor" href="#clustering">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Groups &lt;strong>similar data points&lt;/strong> together based on shared features.&lt;/li>
&lt;li>Commonly used for &lt;strong>market segmentation&lt;/strong>, &lt;strong>image compression&lt;/strong>, and &lt;strong>anomaly detection&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;h3 id="common-types-of-clustering">
 Common Types of Clustering
 
 &lt;a class="anchor" href="#common-types-of-clustering">#&lt;/a>
 
&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>K-Means Clustering&lt;/strong> – Divides data into &lt;em>K&lt;/em> groups based on similarity.&lt;/li>
&lt;li>&lt;strong>Hierarchical Clustering&lt;/strong> – Builds a hierarchy (tree) of clusters.&lt;/li>
&lt;li>&lt;strong>DBSCAN (Density-Based Spatial Clustering)&lt;/strong> – Groups points close in density; identifies noise/outliers.&lt;/li>
&lt;/ul>
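&lt;p>A minimal sketch of the K-Means loop on 1-D data (the toy points and the two-cluster choice are illustrative):&lt;/p>

```python
# Minimal 1-D K-Means: assign each point to the nearest centroid,
# then recompute each centroid as the mean of its assigned points.
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centroids))}
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # empty clusters keep their previous centroid
        centroids = [sum(m) / len(m) if m else centroids[c]
                     for c, m in clusters.items()]
    return sorted(centroids)

# Two well-separated groups converge to their group means.
data = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
print(kmeans_1d(data, [0.0, 5.0]))  # [1.0, 10.0]
```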
&lt;hr>
&lt;h2 id="association">
 Association
 
 &lt;a class="anchor" href="#association">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Identifies &lt;strong>relationships or correlations&lt;/strong> between variables in a dataset.&lt;/li>
&lt;li>Commonly used in &lt;strong>market basket analysis&lt;/strong> (e.g. &amp;ldquo;Customers who bought X also bought Y&amp;rdquo;).&lt;/li>
&lt;/ul>
&lt;h3 id="common-techniques">
 Common Techniques
 
 &lt;a class="anchor" href="#common-techniques">#&lt;/a>
 
&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Apriori Algorithm&lt;/strong> – Finds frequent itemsets and generates association rules.&lt;/li>
&lt;li>&lt;strong>Eclat Algorithm&lt;/strong> – Similar to Apriori but uses set intersections for faster computation.&lt;/li>
&lt;/ul>
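&lt;p>A brute-force sketch of the support-counting idea behind Apriori (the full algorithm prunes candidate itemsets level by level; the basket data below is made up):&lt;/p>

```python
from itertools import combinations

# Toy market-basket sketch: count how often each item pair occurs
# together, keeping pairs that meet a minimum support threshold.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def frequent_pairs(baskets, min_support=2):
    items = sorted(set().union(*baskets))
    result = {}
    for pair in combinations(items, 2):
        support = sum(1 for t in baskets if set(pair).issubset(t))
        if support >= min_support:
            result[pair] = support
    return result

print(frequent_pairs(baskets))
# {('bread', 'butter'): 2, ('bread', 'milk'): 2}
```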
&lt;hr>
&lt;h2 id="dimensionality-reduction">
 Dimensionality Reduction
 
 &lt;a class="anchor" href="#dimensionality-reduction">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Reduces the &lt;strong>number of input variables&lt;/strong> to simplify data.&lt;/li>
&lt;li>Helps remove noise and redundancy.&lt;/li>
&lt;li>Commonly used in &lt;strong>data pre-processing&lt;/strong> and &lt;strong>visualisation&lt;/strong>.&lt;/li>
&lt;/ul>
&lt;h3 id="common-techniques-1">
 Common Techniques
 
 &lt;a class="anchor" href="#common-techniques-1">#&lt;/a>
 
&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Principal Component Analysis (PCA)&lt;/strong> – Projects data onto fewer dimensions while keeping most variance.&lt;/li>
&lt;li>&lt;strong>Linear Discriminant Analysis (LDA)&lt;/strong> – Focuses on class separation.&lt;/li>
&lt;li>&lt;strong>t-SNE (t-Distributed Stochastic Neighbour Embedding)&lt;/strong> – Used for visualising high-dimensional data.&lt;/li>
&lt;li>&lt;strong>Autoencoders&lt;/strong> – Neural networks that compress and reconstruct data.&lt;/li>
&lt;/ul>
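&lt;p>A sketch of finding the first principal component of 2-D data via power iteration on the covariance matrix (toy data chosen so the answer is known):&lt;/p>

```python
# PCA sketch: the first principal component is the direction of maximum
# variance of the centred data, found here by power iteration on the
# 2x2 covariance matrix.
def first_component(xs, ys, iters=50):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    sxx = sum(a * a for a in cx) / n
    syy = sum(a * a for a in cy) / n
    sxy = sum(a * b for a, b in zip(cx, cy)) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v

# Points lying on the line y = x: the component points along (1, 1).
vx, vy = first_component([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(round(vx, 3), round(vy, 3))  # 0.707 0.707
```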
&lt;hr>


&lt;pre class="mermaid">
mindmap
 root(Unsupervised Learning)
 Clustering
 K Means
 Hierarchical Clustering
 DBSCAN
 Dimensionality Reduction
 PCA
 t SNE
 Autoencoders
 Probabilistic Models
 Gaussian Mixture Model
 Hidden Markov Model
&lt;/pre>

&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/machine-learning/">
 Machine Learning
&lt;/a>&lt;/p></description></item><item><title>Semi-Supervised Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/ml-semi-supervised/</link><pubDate>Sat, 03 Jan 2026 10:29:52 +0100</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/ml-semi-supervised/</guid><description>&lt;h1 id="semi-supervised-learning">
 Semi-Supervised Learning
 
 &lt;a class="anchor" href="#semi-supervised-learning">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>A combination of &lt;strong>labelled&lt;/strong> and &lt;strong>unlabelled data&lt;/strong>.&lt;/li>
&lt;li>Useful when labelling large datasets is &lt;strong>expensive or time-consuming&lt;/strong>.&lt;/li>
&lt;li>Works well with &lt;strong>high-volume datasets&lt;/strong> (e.g. millions of images).&lt;/li>
&lt;li>Only a &lt;strong>small fraction of data&lt;/strong> is labelled (e.g. a few thousand).&lt;/li>
&lt;li>The algorithm learns from both labelled examples and structure in unlabelled data.&lt;/li>
&lt;li>&lt;strong>Ideal for medical imaging&lt;/strong> where labelled data is limited.&lt;/li>
&lt;li>For example, a &lt;strong>radiologist&lt;/strong> can label a small set of medical scans,&lt;br>
and the model uses that to learn from thousands of unlabelled scans.&lt;/li>
&lt;li>Helps improve &lt;strong>accuracy and generalisation&lt;/strong> with minimal manual effort.&lt;/li>
&lt;/ul>
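&lt;p>A toy self-training sketch, assuming a 1-NN base classifier and made-up 1-D data: the model labels the unlabelled point it is most confident about, adds it to the labelled pool, and repeats:&lt;/p>

```python
# Toy self-training: a 1-NN rule labels unlabelled points one at a time,
# always starting with the point closest to the current labelled pool.
def nearest_label(x, labelled):
    return min(labelled, key=lambda pair: abs(pair[0] - x))[1]

def self_train(labelled, unlabelled):
    labelled = list(labelled)
    pool = sorted(unlabelled)
    while pool:
        # "confidence" here is simply closeness to an already-labelled point
        x = min(pool, key=lambda u: min(abs(u - lx) for lx, _ in labelled))
        labelled.append((x, nearest_label(x, labelled)))
        pool.remove(x)
    return sorted(labelled)

seed = [(0.0, "A"), (10.0, "B")]
# points near 0 end up labelled "A", points near 10 labelled "B"
print(self_train(seed, [1.0, 2.0, 9.0, 8.0]))
```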
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/machine-learning/">
 Machine Learning
&lt;/a>&lt;/p></description></item><item><title>Neural Networks</title><link>https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/010-neural-network/</guid><description>&lt;h1 id="neural-networks">
 Neural Networks
 
 &lt;a class="anchor" href="#neural-networks">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>A &lt;strong>network of artificial neurons&lt;/strong> inspired by how neurons function in the &lt;strong>human brain&lt;/strong>.&lt;/li>
&lt;li>At its core, a &lt;strong>mathematical model&lt;/strong> designed to process and learn from data.&lt;/li>
&lt;li>Neural networks form the &lt;strong>foundation of &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">Deep Learning&lt;/a>&lt;/strong> (involves training large and complex networks on vast amounts of data).&lt;/li>
&lt;/ul>
&lt;hr>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart LR
 subgraph subGraph0[&amp;#34;Input Layer&amp;#34;]
 I1((&amp;#34;Input 1&amp;#34;))
 I2((&amp;#34;Input 2&amp;#34;))
 I3((&amp;#34;Input 3&amp;#34;))
 end
 subgraph subGraph1[&amp;#34;Hidden Layer&amp;#34;]
 H1((&amp;#34;Hidden 1&amp;#34;))
 H2((&amp;#34;Hidden 2&amp;#34;))
 H3((&amp;#34;Hidden 3&amp;#34;))
 end
 subgraph subGraph2[&amp;#34;Output Layer&amp;#34;]
 O((&amp;#34;Output&amp;#34;))
 end
 I1 --&amp;gt; H1 &amp;amp; H2 &amp;amp; H3
 I2 --&amp;gt; H1 &amp;amp; H2 &amp;amp; H3
 I3 --&amp;gt; H1 &amp;amp; H2 &amp;amp; H3
 H1 --&amp;gt; O
 H2 --&amp;gt; O
 H3 --&amp;gt; O

 style I1 fill:#C8E6C9
 style I2 fill:#C8E6C9
 style I3 fill:#C8E6C9
 style H1 stroke:#2962FF,fill:#BBDEFB
 style H2 fill:#BBDEFB
 style H3 fill:#BBDEFB
 style O fill:#FFCDD2
 style subGraph0 stroke:none,fill:transparent
 style subGraph1 stroke:none,fill:transparent
 style subGraph2 stroke:none,fill:transparent
&lt;/pre>
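&lt;p>The 3-3-1 network in the diagram can be sketched as a forward pass (the weights, biases and sigmoid activation below are illustrative choices):&lt;/p>

```python
import math

# Forward pass through a 3-3-1 network: each hidden unit takes a
# weighted sum of the inputs plus a bias and applies a sigmoid;
# the output unit does the same over the hidden activations.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(wo * h for wo, h in zip(w_out, hidden)) + b_out)

x = [1.0, 0.5, -0.5]
w_hidden = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.2], [-0.5, 0.6, 0.1]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.5, -0.4, 0.9]
b_out = 0.05
print(round(forward(x, w_hidden, b_hidden, w_out, b_out), 4))
```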

&lt;hr>
&lt;h3 id="structure-of-a-neural-network">
 Structure of a Neural Network
 
 &lt;a class="anchor" href="#structure-of-a-neural-network">#&lt;/a>
 
&lt;/h3>
&lt;p>A typical neural network has &lt;strong>three main layers&lt;/strong>:&lt;/p></description></item><item><title>Machine Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/</link><pubDate>Tue, 06 Aug 2024 23:29:52 +0100</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/</guid><description>&lt;h1 id="machine-learning">
 Machine Learning
 
 &lt;a class="anchor" href="#machine-learning">#&lt;/a>
 
&lt;/h1>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
stateDiagram-v2

 %% ===== CLASS DEFINITIONS (Math-based colours) =====
 classDef algebra fill:#cfe8ff,stroke:#1e3a8a,stroke-width:1px
 classDef probability fill:#d1fae5,stroke:#065f46,stroke-width:1px
 classDef geometry fill:#ffedd5,stroke:#9a3412,stroke-width:1px
 classDef logic fill:#ede9fe,stroke:#5b21b6,stroke-width:1px
 classDef category font-style:italic,font-weight:bold,fill:#aaaaaa,stroke:#374151,stroke-width:3px

 %% ===== ROOT =====
 ML: Machine Learning

 %% ===== SUPERVISED =====
 ML --&amp;gt; SL:::category
 SL: Supervised Learning

 SL --&amp;gt; Regression
 Regression --&amp;gt; LR:::algebra
 LR: Linear Regression

 LR --&amp;gt; NN:::algebra
 NN: Neural Network

 NN --&amp;gt; DT:::logic
 DT: Decision Tree

 SL --&amp;gt; Classification
 Classification --&amp;gt; NB:::probability
 NB: Naive Bayes

 NB --&amp;gt; KNN:::geometry
 KNN: k-Nearest Neighbours

 KNN --&amp;gt; SVM:::algebra
 SVM: Support Vector Machine
 
 %% ===== UNSUPERVISED =====
 ML --&amp;gt; USL:::category
 USL: Unsupervised Learning

 USL --&amp;gt; Clustering
 Clustering --&amp;gt; KM:::geometry
 KM: K-Means

 KM --&amp;gt; GMM:::probability
 GMM: Gaussian Mixture Model

 GMM --&amp;gt; HMM:::probability
 HMM: Hidden Markov Model

 %% ===== REINFORCEMENT =====
 ML --&amp;gt; RL:::category
 RL: Reinforcement Learning

 RL --&amp;gt; DM:::logic
 DM: Decision Making
&lt;/pre>

&lt;hr>
&lt;details >&lt;summary>Mathematical Legend&lt;/summary>
 &lt;div class="markdown-inner">
&lt;h3 id="algebra--linear-algebra-blue">
 Algebra / Linear Algebra (Blue)
 
 &lt;a class="anchor" href="#algebra--linear-algebra-blue">#&lt;/a>
 
&lt;/h3>
&lt;p>Used heavily when models rely on:&lt;/p></description></item><item><title>Artificial Neuron and Perceptron</title><link>https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/020-perceptron/</guid><description>&lt;h1 id="artificial-neuron-and-perceptron">
 Artificial Neuron and Perceptron
 
 &lt;a class="anchor" href="#artificial-neuron-and-perceptron">#&lt;/a>
 
&lt;/h1>
&lt;blockquote class="book-hint info">
&lt;p>Knowledge in neural networks is stored in &lt;strong>connection weights&lt;/strong>, and learning means &lt;strong>modifying those weights&lt;/strong>.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="biological-neuron">
 Biological Neuron
 
 &lt;a class="anchor" href="#biological-neuron">#&lt;/a>
 
&lt;/h2>
&lt;p>A biological neuron is a specialised cell that processes and transmits information through electrical and chemical signals.&lt;/p>
&lt;p>Core components:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Dendrites&lt;/strong>: receive signals from other neurons&lt;/li>
&lt;li>&lt;strong>Cell body (soma)&lt;/strong>: processes incoming signals&lt;/li>
&lt;li>&lt;strong>Axon&lt;/strong>: transmits the output signal&lt;/li>
&lt;li>&lt;strong>Synapses&lt;/strong>: connection points between neurons&lt;/li>
&lt;/ul>
&lt;p>Biological intuition:&lt;/p>
&lt;ul>
&lt;li>many inputs arrive to one neuron&lt;/li>
&lt;li>one neuron can connect out to many neurons&lt;/li>
&lt;li>massive parallelism enables fast perception and recognition&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="artificial-neuron">
 Artificial Neuron
 
 &lt;a class="anchor" href="#artificial-neuron">#&lt;/a>
 
&lt;/h2>
&lt;p>An artificial neuron is a simplified computational model inspired by biological neurons.&lt;/p></description></item><item><title>ML Workflow</title><link>https://arshadhs.github.io/docs/ai/machine-learning/02-ml-workflow/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/02-ml-workflow/</guid><description>&lt;h1 id="machine-learning-workflow">
 Machine learning Workflow
 
 &lt;a class="anchor" href="#machine-learning-workflow">#&lt;/a>
 
&lt;/h1>
&lt;p>Data is the foundation of any machine learning system.
Quality of data matters more than model complexity.&lt;/p>
&lt;h3 id="role-of-data">
 Role of Data
 
 &lt;a class="anchor" href="#role-of-data">#&lt;/a>
 
&lt;/h3>
&lt;p>Data determines:&lt;/p>
&lt;ul>
&lt;li>What patterns the model can learn&lt;/li>
&lt;li>How well it generalises&lt;/li>
&lt;li>Whether bias or noise is introduced&lt;/li>
&lt;/ul>
&lt;p>Bad data → bad model (even with perfect algorithms).&lt;/p>
&lt;hr>
&lt;h3 id="data-preprocessing-wrangling">
 Data Preprocessing, wrangling
 
 &lt;a class="anchor" href="#data-preprocessing-wrangling">#&lt;/a>
 
&lt;/h3>
&lt;p>Raw data is never ready for training.&lt;/p>
&lt;p>&lt;strong>Data Issues&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Noise
&lt;ul>
&lt;li>For &lt;strong>objects&lt;/strong>, noise is an &lt;strong>extraneous object&lt;/strong>&lt;/li>
&lt;li>For &lt;strong>attributes&lt;/strong>, noise refers to &lt;strong>modification of original values&lt;/strong>&lt;/li>
&lt;li>Handle: apply a &lt;strong>log or z-score transform&lt;/strong> to standardise the values&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Outliers
&lt;ul>
&lt;li>Data objects with characteristics that are considerably different than most of the other data objects in the data set&lt;/li>
&lt;li>Handle: Use &lt;strong>IQR&lt;/strong> method&lt;/li>
&lt;li>Find Lower and Upper Bound and &lt;strong>replace Outlier with Lower or Upper Bound&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Missing Values
&lt;ul>
&lt;li>Eliminate data objects or variables&lt;/li>
&lt;li>Handle: Estimate missing values
&lt;ul>
&lt;li>&lt;strong>Mean, Median or Mode&lt;/strong>&lt;/li>
&lt;li>Prefer the &lt;strong>Median&lt;/strong> if the data contains &lt;strong>outliers&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Ignore the missing value during analysis&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Duplicate Data
&lt;ul>
&lt;li>Major issue when merging data from heterogeneous sources&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Inconsistent Codes
&lt;ul>
&lt;li>Find all unique codes and map the inconsistent ones to a single consistent value&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
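&lt;p>A sketch of the IQR method on made-up data: compute the quartiles, derive the bounds, and clip values outside them (the quartile convention chosen here is one of several):&lt;/p>

```python
import statistics

# IQR outlier handling: compute Q1 and Q3, form the bounds
# Q1 - 1.5*IQR and Q3 + 1.5*IQR, and clip values to those bounds.
def clip_outliers(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, lower), upper) for v in values]

data = [10, 12, 11, 13, 12, 95]   # 95 is an outlier
print(clip_outliers(data))  # [10, 12, 11, 13, 12, 15.0]
```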
&lt;p>&lt;strong>Data Preprocessing techniques&lt;/strong>&lt;/p></description></item><item><title>Regression(Linear Models)</title><link>https://arshadhs.github.io/docs/ai/machine-learning/03-linear-models-regression/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/03-linear-models-regression/</guid><description>&lt;h1 id="linear-regression">
 Linear Regression
 
 &lt;a class="anchor" href="#linear-regression">#&lt;/a>
 
&lt;/h1>
&lt;p>Linear Regression is a supervised 
&lt;span style="color: blue;">
 ML
&lt;/span> method used to predict a &lt;strong>numerical&lt;/strong> target by fitting a model that is &lt;strong>linear in its parameters&lt;/strong>.&lt;/p>
&lt;p>In 
&lt;span style="color: blue;">
 ML
&lt;/span>, linear models are a core baseline:
they’re fast, often surprisingly strong, and usually easy to interpret.&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
Linear Regression learns parameters by minimising a squared-error cost.
You can solve it directly (closed form) or iteratively (gradient descent),
and you can extend it using basis functions and regularisation.&lt;/p></description></item><item><title>Ordinary Least Squares</title><link>https://arshadhs.github.io/docs/ai/machine-learning/03-ordinary-least-squares/</link><pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/03-ordinary-least-squares/</guid><description>&lt;h1 id="direct-solution-method---ordinary-least-squares-and-the-line-of-best-fit">
 Direct solution method - Ordinary Least Squares and the Line of Best Fit
 
 &lt;a class="anchor" href="#direct-solution-method---ordinary-least-squares-and-the-line-of-best-fit">#&lt;/a>
 
&lt;/h1>
&lt;p>It is possible to compute the best parameters for linear regression &lt;strong>in one shot&lt;/strong> (closed-form),
instead of iteratively improving them step-by-step.&lt;/p>
&lt;p>For linear regression, the direct method is usually &lt;strong>Ordinary Least Squares (OLS)&lt;/strong>.&lt;/p>
&lt;p>Ordinary Least Squares (OLS) chooses the “best” line by &lt;strong>minimising squared prediction errors&lt;/strong>.&lt;/p>
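&lt;p>For a single feature, the closed-form OLS solution reduces to slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A minimal sketch on made-up points:&lt;/p>

```python
# Closed-form simple OLS: no iteration, just means, covariance and variance.
def ols_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Points generated from y = 2x + 1 are recovered exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(ols_fit(xs, ys))  # (2.0, 1.0)
```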
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
OLS defines “best fit” as the line that minimises the total squared residual error across all data points.&lt;/p></description></item><item><title>Cost Function</title><link>https://arshadhs.github.io/docs/ai/machine-learning/03-cost-function/</link><pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/03-cost-function/</guid><description>&lt;h1 id="cost-function">
 Cost Function
 
 &lt;a class="anchor" href="#cost-function">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>
&lt;p>also known as an objective function&lt;/p>
&lt;/li>
&lt;li>
&lt;p>quantifies &lt;strong>how far the predicted values are from the actual ones&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>measures the model&amp;rsquo;s error over a group of datapoints&lt;/p>
&lt;/li>
&lt;li>
&lt;p>guides optimisation: minimising the cost yields the best-fit line through the data&lt;/p>
&lt;/li>
&lt;li>
&lt;p>used to evaluate the accuracy of a model’s predictions&lt;/p></description></item><item><title>Gradient Descent</title><link>https://arshadhs.github.io/docs/ai/machine-learning/03-gradient-descent-linear-regression/</link><pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/03-gradient-descent-linear-regression/</guid><description>&lt;h1 id="gradient-descent-for-linear-regression">
 Gradient Descent for Linear Regression
 
 &lt;a class="anchor" href="#gradient-descent-for-linear-regression">#&lt;/a>
 
&lt;/h1>
&lt;p>Gradient descent is an iterative optimisation method used to minimise the regression cost function by repeatedly updating parameters in the direction that reduces error.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Iterative method&lt;/strong>&lt;/li>
&lt;li>Types: batch / stochastic / mini-batch&lt;/li>
&lt;/ul>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
Gradient descent starts with initial parameter values and repeatedly updates them using the gradient until the cost stops decreasing.&lt;/p>
&lt;/blockquote>
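&lt;p>A minimal batch gradient descent sketch for simple linear regression (the learning rate, epoch count and data are illustrative):&lt;/p>

```python
# Batch gradient descent for y = w*x + b: each epoch computes the
# gradient of the mean squared error over ALL points, then moves the
# parameters a small step against it.
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) * 2 / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # converges towards 2.0 and 1.0
```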


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart TD
GD[&amp;#34;Gradient&amp;lt;br/&amp;gt;Descent&amp;#34;] --&amp;gt;|minimises| CF[&amp;#34;Cost&amp;lt;br/&amp;gt;function&amp;#34;]
GD --&amp;gt;|updates| W[&amp;#34;Parameters&amp;lt;br/&amp;gt;(weights)&amp;#34;]
GD --&amp;gt;|uses| GR[&amp;#34;Gradient&amp;lt;br/&amp;gt;(slope)&amp;#34;]

GD --&amp;gt; H[&amp;#34;Hyperparameters&amp;#34;]
H --&amp;gt; LR[&amp;#34;Learning&amp;lt;br/&amp;gt;rate&amp;#34;]
H --&amp;gt; BS[&amp;#34;Batch&amp;lt;br/&amp;gt;size&amp;#34;]
H --&amp;gt; EP[&amp;#34;Epochs&amp;#34;]

style GD fill:#90CAF9,stroke:#1E88E5,color:#000

style CF fill:#CE93D8,stroke:#8E24AA,color:#000
style W fill:#CE93D8,stroke:#8E24AA,color:#000
style GR fill:#CE93D8,stroke:#8E24AA,color:#000
style H fill:#CE93D8,stroke:#8E24AA,color:#000
style LR fill:#CE93D8,stroke:#8E24AA,color:#000
style BS fill:#CE93D8,stroke:#8E24AA,color:#000
style EP fill:#CE93D8,stroke:#8E24AA,color:#000
&lt;/pre>

&lt;hr>
&lt;h2 id="types-of-gd">
 Types of GD
 
 &lt;a class="anchor" href="#types-of-gd">#&lt;/a>
 
&lt;/h2>


&lt;pre class="mermaid">
flowchart TD
T[&amp;#34;Gradient Descent&amp;lt;br/&amp;gt;types&amp;#34;] --&amp;gt; BGD[&amp;#34;Batch&amp;lt;br/&amp;gt;GD&amp;#34;]
T --&amp;gt; SGD[&amp;#34;Stochastic&amp;lt;br/&amp;gt;GD&amp;#34;]
T --&amp;gt; MGD[&amp;#34;Mini-batch&amp;lt;br/&amp;gt;GD&amp;#34;]

BGD --&amp;gt; ALL[&amp;#34;All data&amp;lt;br/&amp;gt;per step&amp;#34;]
BGD --&amp;gt; STB[&amp;#34;Smooth&amp;lt;br/&amp;gt;updates&amp;#34;]

SGD --&amp;gt; ONE[&amp;#34;1 sample&amp;lt;br/&amp;gt;per step&amp;#34;]
SGD --&amp;gt; FAST[&amp;#34;Quick&amp;lt;br/&amp;gt;progress&amp;#34;]
SGD --&amp;gt; NOISE[&amp;#34;Noisy&amp;lt;br/&amp;gt;updates&amp;#34;]

MGD --&amp;gt; MB[&amp;#34;Small batch&amp;lt;br/&amp;gt;per step&amp;#34;]
MGD --&amp;gt; PRACT[&amp;#34;Practical&amp;lt;br/&amp;gt;default&amp;#34;]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style BGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style SGD fill:#C8E6C9,stroke:#2E7D32,color:#000
style MGD fill:#C8E6C9,stroke:#2E7D32,color:#000

style ALL fill:#CE93D8,stroke:#8E24AA,color:#000
style STB fill:#CE93D8,stroke:#8E24AA,color:#000
style ONE fill:#CE93D8,stroke:#8E24AA,color:#000
style FAST fill:#CE93D8,stroke:#8E24AA,color:#000
style NOISE fill:#CE93D8,stroke:#8E24AA,color:#000
style MB fill:#CE93D8,stroke:#8E24AA,color:#000
style PRACT fill:#CE93D8,stroke:#8E24AA,color:#000
&lt;/pre>

&lt;h3 id="batch">
 Batch
 
 &lt;a class="anchor" href="#batch">#&lt;/a>
 
&lt;/h3>
&lt;ul>
&lt;li>Use only if you have huge compute and a lot of time to train&lt;/li>
&lt;/ul>
&lt;h3 id="sgd">
 SGD
 
 &lt;a class="anchor" href="#sgd">#&lt;/a>
 
&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>go-to solution&lt;/p></description></item><item><title>Classification(Linear Models)</title><link>https://arshadhs.github.io/docs/ai/machine-learning/04-linear-models-classification/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/04-linear-models-classification/</guid><description>&lt;h1 id="linear-models-for-classification">
 Linear models for Classification
 
 &lt;a class="anchor" href="#linear-models-for-classification">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>categorises data by finding a linear boundary (hyperplane) that separates classes&lt;/li>
&lt;li>calculating a weighted sum of input features plus bias&lt;/li>
&lt;/ul>


&lt;script src="https://arshadhs.github.io/mermaid.min.js">&lt;/script>

 &lt;script>mermaid.initialize({
 "flowchart": {
 "useMaxWidth":true
 },
 "theme": "default"
}
)&lt;/script>




&lt;pre class="mermaid">
flowchart TD
T[&amp;#34;Linear&amp;lt;br/&amp;gt;classification&amp;lt;br/&amp;gt;models&amp;#34;] --&amp;gt; P[&amp;#34;Perceptron&amp;#34;]
T --&amp;gt; LR[&amp;#34;Logistic&amp;lt;br/&amp;gt;regression&amp;#34;]
T --&amp;gt; SVM[&amp;#34;Linear&amp;lt;br/&amp;gt;SVM&amp;#34;]

P --&amp;gt;|uses| STEP[&amp;#34;Step&amp;lt;br/&amp;gt;activation&amp;#34;]
LR --&amp;gt;|uses| SIG[&amp;#34;Sigmoid&amp;lt;br/&amp;gt;+ log loss&amp;#34;]
SVM --&amp;gt;|uses| HNG[&amp;#34;Hinge&amp;lt;br/&amp;gt;loss&amp;#34;]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style P fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000

style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
style HNG fill:#CE93D8,stroke:#8E24AA,color:#000
&lt;/pre>

&lt;h2 id="discriminant-functions">
 Discriminant Functions
 
 &lt;a class="anchor" href="#discriminant-functions">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="decision-theory">
 Decision Theory
 
 &lt;a class="anchor" href="#decision-theory">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="probabilistic-discriminative-classifiers">
 Probabilistic Discriminative Classifiers
 
 &lt;a class="anchor" href="#probabilistic-discriminative-classifiers">#&lt;/a>
 
&lt;/h2>
&lt;hr>
&lt;h2 id="logistic-regression">
 Logistic Regression
 
 &lt;a class="anchor" href="#logistic-regression">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Supervised machine learning algorithm&lt;/li>
&lt;li>Binary &lt;strong>classification&lt;/strong> algorithm&lt;/li>
&lt;li>learns a &lt;strong>linear decision boundary&lt;/strong>, so it works best when classes are roughly linearly separable&lt;/li>
&lt;li>predicts the probability that an input belongs to a specific class&lt;/li>
&lt;li>uses &lt;strong>Sigmoid function&lt;/strong> to convert inputs into a probability value between 0 and 1&lt;/li>
&lt;/ul>
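&lt;p>A prediction sketch: the sigmoid of the linear score z = w·x + b gives the class probability (the weights below are illustrative, not learned):&lt;/p>

```python
import math

# Logistic regression prediction: squash the linear score through a
# sigmoid to get a probability between 0 and 1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

w, b = [1.5, -2.0], 0.5
print(round(predict_proba([2.0, 0.5], w, b), 3))  # 0.924
```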
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$,
then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).&lt;/p></description></item><item><title>Foundation Models</title><link>https://arshadhs.github.io/docs/ai/genai/foundation-model/</link><pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/genai/foundation-model/</guid><description>&lt;h1 id="foundation-model">
 Foundation Model
 
 &lt;a class="anchor" href="#foundation-model">#&lt;/a>
 
&lt;/h1>
&lt;p>AI models trained on massive datasets to perform a wide range of tasks with minimal fine-tuning.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>are large deep learning neural networks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>are large AI models trained on &lt;strong>massive and diverse datasets&lt;/strong> (text, images, audio, or multiple modalities).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Contain &lt;strong>millions or billions of parameters&lt;/strong>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>designed for &lt;strong>general-purpose intelligence&lt;/strong> across a &lt;strong>broad range of tasks&lt;/strong>, not a single task&lt;/p>
&lt;/li>
&lt;li>
&lt;p>acts as &lt;strong>base models&lt;/strong> for building specialised AI applications&lt;/p></description></item><item><title>LLM - Model</title><link>https://arshadhs.github.io/docs/ai/genai/llm/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/genai/llm/</guid><description>&lt;h1 id="llm--large-language-model">
 LLM – Large Language Model
 
 &lt;a class="anchor" href="#llm--large-language-model">#&lt;/a>
 
&lt;/h1>
&lt;p>Large Language Models (LLMs) are &lt;strong>advanced AI systems&lt;/strong> designed to process, understand, and generate &lt;strong>human-like text&lt;/strong>.&lt;/p>
&lt;p>They learn language by analysing &lt;strong>massive amounts of text data&lt;/strong>, discovering patterns in:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>grammar&lt;/p>
&lt;/li>
&lt;li>
&lt;p>meaning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>context&lt;/p>
&lt;/li>
&lt;li>
&lt;p>relationships between words and sentences&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Key characteristics:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Built on &lt;strong>Deep Learning&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Implemented using &lt;strong>Neural Networks&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Based on &lt;strong>Transformers&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Often combined with tools like:&lt;/p>
&lt;ul>
&lt;li>Retrieval (RAG)&lt;/li>
&lt;li>Agents&lt;/li>
&lt;li>External APIs&lt;/li>
&lt;li>Memory systems&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="what-makes-an-llm-special">
 What makes an LLM special?
 
 &lt;a class="anchor" href="#what-makes-an-llm-special">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>Built using &lt;strong>deep neural networks&lt;/strong>&lt;/li>
&lt;li>Trained on &lt;strong>very large datasets&lt;/strong> (books, articles, code, web text)&lt;/li>
&lt;li>Can perform many tasks &lt;strong>without task-specific training&lt;/strong>&lt;/li>
&lt;li>General-purpose language understanding, not single-task models&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="foundation-transformer-architecture">
 Foundation: Transformer Architecture
 
 &lt;a class="anchor" href="#foundation-transformer-architecture">#&lt;/a>
 
&lt;/h2>
&lt;p>LLMs are based on the &lt;strong>&lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/transformer/">Transformer Architecture&lt;/a>&lt;/strong>, which allows models to understand &lt;strong>context and long-range dependencies&lt;/strong> in text.&lt;/p></description></item><item><title>Decision Tree</title><link>https://arshadhs.github.io/docs/ai/machine-learning/05-decision-tree/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/05-decision-tree/</guid><description>&lt;h1 id="decision-tree">
 Decision Tree
 
 &lt;a class="anchor" href="#decision-tree">#&lt;/a>
 
&lt;/h1>
&lt;p>A decision tree classifies an example by asking a sequence of questions about its attributes until it reaches a leaf (final decision).&lt;/p>
&lt;blockquote class="book-hint info">
&lt;p>Key takeaway:
A decision tree grows by repeatedly splitting the training data into &lt;strong>purer&lt;/strong> subsets using an impurity measure
(Entropy / Gini / Classification Error).&lt;/p>
&lt;/blockquote>
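&lt;p>The impurity measures named in the hint above can be computed in a few lines. A minimal plain-Python sketch (illustrative only; the function names are mine, not from the post):&lt;/p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in probs)

def gini(labels):
    """Gini impurity: chance of mislabelling a randomly drawn example."""
    n = len(labels)
    probs = [count / n for count in Counter(labels).values()]
    return 1 - sum(p * p for p in probs)

# A pure node has zero impurity; a 50/50 split is maximally impure.
print(entropy(["yes"] * 4), gini(["yes"] * 4))
print(entropy(["yes", "no", "yes", "no"]), gini(["yes", "no", "yes", "no"]))
```

&lt;p>A split is chosen to maximise the impurity reduction (information gain) between the parent node and the weighted child nodes.&lt;/p>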
&lt;hr>
&lt;h2 id="information-theory">
 Information Theory
 
 &lt;a class="anchor" href="#information-theory">#&lt;/a>
 
&lt;/h2>
&lt;p>Decision trees need a way to measure:
“How mixed are the class labels at a node?”&lt;/p></description></item><item><title>Instance-based Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/06-instance-based-learning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/06-instance-based-learning/</guid><description>&lt;h1 id="instance-based-learning">
 Instance-based Learning
 
 &lt;a class="anchor" href="#instance-based-learning">#&lt;/a>
 
&lt;/h1>
&lt;p>Instance-based learning is a family of methods that &lt;strong>do not build one explicit global model during training&lt;/strong>. Instead, they &lt;strong>store training examples&lt;/strong> and delay most of the work until a new query arrives.&lt;/p>
&lt;p>When a new point must be classified or predicted, the algorithm compares it with previously seen examples, finds the most relevant neighbours, and uses them to produce the answer.&lt;/p>
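&lt;p>The query-time procedure described above can be sketched as a tiny k-nearest-neighbour classifier (a plain-Python illustration, not code from the post):&lt;/p>

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (point, label) pairs. Nothing is learned up front;
    all the work happens when the query arrives, as in instance-based learning.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (0.5, 0.5)))  # the three nearest stored points are all "A"
```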
&lt;p>Instance-based Learning covers three linked ideas:&lt;/p></description></item><item><title>Support Vector Machine</title><link>https://arshadhs.github.io/docs/ai/machine-learning/07-support-vector-machines/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/07-support-vector-machines/</guid><description>&lt;h1 id="support-vector-machine-svm">
 Support Vector Machine (SVM)
 
 &lt;a class="anchor" href="#support-vector-machine-svm">#&lt;/a>
 
&lt;/h1>
&lt;p>A &lt;strong>Support Vector Machine (SVM)&lt;/strong> is a &lt;strong>supervised machine learning algorithm&lt;/strong> used for:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Classification&lt;/strong> (most common)&lt;/li>
&lt;li>&lt;strong>Regression&lt;/strong> (SVR – Support Vector Regression)&lt;/li>
&lt;/ul>
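&lt;p>A minimal soft-margin linear SVM can be trained by sub-gradient descent on the hinge loss. The sketch below is a plain-Python illustration of the idea, not a production implementation (libraries such as scikit-learn provide tuned solvers):&lt;/p>

```python
def train_linear_svm(data, lam=0.01, lr=0.05, epochs=500):
    """Soft-margin linear SVM in 2-D, trained by sub-gradient descent on
    the hinge loss max(0, 1 - y * (w . x + b)) plus an L2 penalty on w.
    `data` is a list of (x, y) pairs with x a 2-tuple and y in {-1, +1}."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin >= 1:
                # outside the margin: only the weight-decay term applies
                w = [wi * (1 - lr * lam) for wi in w]
            else:
                # inside the margin or misclassified: hinge sub-gradient step
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b = b + lr * y
    return w, b

def svm_predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Two linearly separable clusters
data = [((0, 0), -1), ((1, 0), -1), ((0, 1), -1),
        ((4, 4), 1), ((5, 4), 1), ((4, 5), 1)]
w, b = train_linear_svm(data)
print(svm_predict(w, b, (0, 0)), svm_predict(w, b, (5, 5)))
```

&lt;p>The training points whose margins end up closest to 1 are the support vectors; only they determine the final boundary.&lt;/p>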

&lt;blockquote class="book-hint">
 &lt;p>Find the decision boundary that separates classes with the &lt;strong>maximum margin&lt;/strong>.&lt;/p>
&lt;/blockquote>&lt;blockquote class="book-hint default">
&lt;p>A Support Vector Machine is a supervised learning algorithm that finds an optimal hyperplane by maximising the margin between classes, using support vectors and kernel functions to handle non-linear data.&lt;/p></description></item><item><title>Attention Mechanism</title><link>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/080-attention-mechanism/</guid><description>&lt;h1 id="attention-mechanism">
 Attention Mechanism
 
 &lt;a class="anchor" href="#attention-mechanism">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Queries, Keys, and Values&lt;/li>
&lt;li>Attention Pooling by Similarity&lt;/li>
&lt;li>Attention Pooling via Nadaraya–Watson Regression&lt;/li>
&lt;li>Attention Scoring Functions&lt;/li>
&lt;li>Dot Product Attention&lt;/li>
&lt;li>Convenience Functions&lt;/li>
&lt;li>Scaled Dot Product Attention&lt;/li>
&lt;li>Additive Attention&lt;/li>
&lt;li>Bahdanau Attention Mechanism&lt;/li>
&lt;li>Multi-Head Attention&lt;/li>
&lt;li>Self-Attention&lt;/li>
&lt;li>Positional Encoding&lt;/li>
&lt;li>Code implementation (webinar)&lt;/li>
&lt;/ul>
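&lt;p>Several of the topics above combine in scaled dot-product attention, which can be sketched in plain Python (illustrative only, not the webinar code):&lt;/p>

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d)) V for lists-of-lists Q, K, V."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # one attention weight per key, summing to 1
        out.append([sum(wt * v[j] for wt, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                  # one query
K = [[1.0, 0.0], [0.0, 1.0]]      # two keys
V = [[10.0, 0.0], [0.0, 10.0]]    # two values
# The query matches the first key more strongly, so the output leans toward V[0]
print(scaled_dot_product_attention(Q, K, V))
```

&lt;p>Multi-head attention runs several such attention functions in parallel on learned projections of Q, K, and V, then concatenates the results.&lt;/p>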
&lt;hr>
&lt;h2 id="reference">
 Reference
 
 &lt;a class="anchor" href="#reference">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Dive into Deep Learning&lt;/strong>. Cambridge University Press. (&lt;a href="https://d2l.ai/chapter_builders-guide/model-construction.html">Ch 10&lt;/a>, &lt;a href="https://d2l.ai/chapter_convolutional-neural-networks/index.html">Ch 7&lt;/a>)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">
 Deep Learning
&lt;/a>&lt;/p></description></item><item><title>Bayesian Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/08-bayesian-learning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/08-bayesian-learning/</guid><description>&lt;h1 id="bayesian-learning">
 Bayesian Learning
 
 &lt;a class="anchor" href="#bayesian-learning">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="mle-hypothesis">
 MLE Hypothesis
 
 &lt;a class="anchor" href="#mle-hypothesis">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="map-hypothesis">
 MAP Hypothesis
 
 &lt;a class="anchor" href="#map-hypothesis">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="bayes-rule">
 Bayes Rule
 
 &lt;a class="anchor" href="#bayes-rule">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="optimal-bayes-classifier">
 Optimal Bayes Classifier
 
 &lt;a class="anchor" href="#optimal-bayes-classifier">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="naïve-bayes-classifier">
 Naïve Bayes Classifier
 
 &lt;a class="anchor" href="#na%c3%afve-bayes-classifier">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="probabilistic-generative-classifiers">
 Probabilistic Generative Classifiers
 
 &lt;a class="anchor" href="#probabilistic-generative-classifiers">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="bayesian-linear-regression">
 Bayesian Linear Regression
 
 &lt;a class="anchor" href="#bayesian-linear-regression">#&lt;/a>
 
&lt;/h2>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/machine-learning/">
 Machine Learning
&lt;/a>&lt;/p></description></item><item><title>Ensemble Learning</title><link>https://arshadhs.github.io/docs/ai/machine-learning/09-ensemble-learning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/09-ensemble-learning/</guid><description>&lt;h1 id="ensemble-learning">
 Ensemble Learning
 
 &lt;a class="anchor" href="#ensemble-learning">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="combining-classifiers">
 Combining Classifiers
 
 &lt;a class="anchor" href="#combining-classifiers">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="bagging">
 Bagging
 
 &lt;a class="anchor" href="#bagging">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="random-forest">
 Random Forest
 
 &lt;a class="anchor" href="#random-forest">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="boosting">
 Boosting
 
 &lt;a class="anchor" href="#boosting">#&lt;/a>
 
&lt;/h2>
&lt;h3 id="adaboost">
 AdaBoost
 
 &lt;a class="anchor" href="#adaboost">#&lt;/a>
 
&lt;/h3>
&lt;h3 id="gradient-boosting">
 Gradient Boosting
 
 &lt;a class="anchor" href="#gradient-boosting">#&lt;/a>
 
&lt;/h3>
&lt;h3 id="xgboost">
 XGBoost
 
 &lt;a class="anchor" href="#xgboost">#&lt;/a>
 
&lt;/h3>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/machine-learning/">
 Machine Learning
&lt;/a>&lt;/p></description></item><item><title>Optimisation of Deep models</title><link>https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/100-optimise-deep-models/</guid><description>&lt;h1 id="optimisation-of-deep-models">
 Optimisation of Deep models
 
 &lt;a class="anchor" href="#optimisation-of-deep-models">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Goal of Optimization&lt;/li>
&lt;li>Optimization Challenges in Deep Learning&lt;/li>
&lt;li>Gradient Descent&lt;/li>
&lt;li>Stochastic Gradient Descent&lt;/li>
&lt;li>Minibatch Stochastic Gradient Descent&lt;/li>
&lt;li>Momentum&lt;/li>
&lt;li>Adagrad&lt;/li>
&lt;li>RMSProp&lt;/li>
&lt;li>Adadelta&lt;/li>
&lt;li>Adam&lt;/li>
&lt;li>Code implementation and comparison of algorithms (webinar)&lt;/li>
&lt;/ul>
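&lt;p>The first few topics above can be sketched in a few lines. A minimal gradient-descent-with-momentum loop in plain Python (illustrative only, not the webinar code):&lt;/p>

```python
def gd_momentum(grad, x0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent with momentum: the velocity accumulates an
    exponentially weighted sum of past gradients, damping oscillation."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + grad(x)   # update velocity
        x = x - lr * v           # step along the smoothed direction
    return x

# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_star = gd_momentum(lambda x: 2 * (x - 3), x0=0.0)
print(x_star)  # approaches the minimiser x = 3
```

&lt;p>Setting beta to 0 recovers plain gradient descent; stochastic and minibatch variants replace the exact gradient with a noisy estimate computed on a sampled batch.&lt;/p>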
&lt;hr>
&lt;h2 id="reference">
 Reference
 
 &lt;a class="anchor" href="#reference">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Dive into Deep Learning&lt;/strong>. Cambridge University Press. (Ch 12)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">
 Deep Learning
&lt;/a>&lt;/p></description></item><item><title>Evaluation/Comparison</title><link>https://arshadhs.github.io/docs/ai/machine-learning/11-ml-model-evaluation-comparison/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/machine-learning/11-ml-model-evaluation-comparison/</guid><description>&lt;h1 id="machine-learning-model-evaluationcomparison">
 Machine Learning Model Evaluation/Comparison
 
 &lt;a class="anchor" href="#machine-learning-model-evaluationcomparison">#&lt;/a>
 
&lt;/h1>
&lt;h2 id="comparing-machine-learning-models">
 Comparing Machine Learning Models
 
 &lt;a class="anchor" href="#comparing-machine-learning-models">#&lt;/a>
 
&lt;/h2>
&lt;h2 id="emerging-requirements-eg-bias-fairness-interpretability-of-ml-models">
 Emerging requirements, e.g. bias, fairness, and interpretability of ML models
 
 &lt;a class="anchor" href="#emerging-requirements-eg-bias-fairness-interpretability-of-ml-models">#&lt;/a>
 
&lt;/h2>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/machine-learning/">
 Machine Learning
&lt;/a>&lt;/p></description></item><item><title>Regularisation for Deep models</title><link>https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/deep-learning/110-regularisation-deep-models/</guid><description>&lt;h1 id="regularisation-for-deep-models">
 Regularisation for Deep models
 
 &lt;a class="anchor" href="#regularisation-for-deep-models">#&lt;/a>
 
&lt;/h1>
&lt;ul>
&lt;li>Generalization for Regression&lt;/li>
&lt;li>Training Error and Generalization Error&lt;/li>
&lt;li>Underfitting or Overfitting&lt;/li>
&lt;li>Model Selection&lt;/li>
&lt;li>Weight Decay and Norms&lt;/li>
&lt;li>Generalization in Classification&lt;/li>
&lt;li>Environment and Distribution Shift&lt;/li>
&lt;li>Generalization in Deep Learning&lt;/li>
&lt;li>Dropout&lt;/li>
&lt;li>Batch Normalization&lt;/li>
&lt;li>Layer Normalization&lt;/li>
&lt;li>Code implementation (webinar)&lt;/li>
&lt;/ul>
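&lt;p>Of the topics above, dropout is simple to sketch. A minimal inverted-dropout layer in plain Python (illustrative only, not the webinar implementation):&lt;/p>

```python
import random

def dropout_layer(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1 / (1 - p), so the expected activation is
    unchanged. At inference time the layer is the identity."""
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random(0)  # fixed seed here only for reproducibility
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

h = [1.0, 2.0, 3.0, 4.0]
print(dropout_layer(h, p=0.5))                  # surviving units are doubled
print(dropout_layer(h, p=0.5, training=False))  # identity at inference
```

&lt;p>Because the scaling happens at training time, no rescaling is needed at inference, which is why frameworks implement this "inverted" form.&lt;/p>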
&lt;hr>
&lt;h2 id="reference">
 Reference
 
 &lt;a class="anchor" href="#reference">#&lt;/a>
 
&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Dive into Deep Learning&lt;/strong>. Cambridge University Press. (&lt;a href="https://d2l.ai/chapter_introduction/index.html">T1 Ch 3.6–3.7, 4.6–4.7, 5.5–5.6, 8.5, 11.7&lt;/a>)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;a href="https://arshadhs.github.io/">Home&lt;/a> | &lt;a href="https://arshadhs.github.io/docs/ai/deep-learning/">
 Deep Learning
&lt;/a>&lt;/p></description></item></channel></rss>