January 3, 2026
Supervised Learning
# Trained using labelled data. Each example in the training set includes the correct output. The algorithm learns to generalise and make predictions on unseen data. Generally more accurate than unsupervised methods, but requires human intervention for labelling and setup. Widely used because it produces highly accurate results when trained on good-quality labelled data.
Classification
# Output is discrete (e.g. Yes/No, Spam/Not Spam). Used for categorising data into predefined classes. Support Vector Machine (SVM) is a common classifier (a linear classifier with margin-based separation).
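As a concrete illustration, here is a minimal classification sketch using scikit-learn's LinearSVC; the toy feature values and the Spam/Not Spam labels are made up for the example, not real data.

```python
# Minimal supervised classification sketch (assumes scikit-learn is installed).
# The synthetic features and labels below are illustrative, not real spam data.
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Each row is one labelled example: [num_links, num_exclamations]
X = [[0, 1], [1, 0], [8, 5], [7, 6], [0, 0], [9, 4], [1, 1], [6, 7]]
y = [0, 0, 1, 1, 0, 1, 0, 1]  # 0 = Not Spam, 1 = Spam (discrete output)

# Hold out part of the labelled data to check generalisation on unseen examples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LinearSVC()            # linear, margin-based classifier
clf.fit(X_train, y_train)    # learn from labelled examples

print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```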
Differentiation of Univariate Functions
# Differentiation measures the rate of change of a function.
For a function f(x), the derivative is defined by the limit
\[
f'(x) = \lim_{h \to 0} \frac{f(x+h)-f(x)}{h}
\]
Interpretation:
Slope of the tangent line
Instantaneous rate of change
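For example, applying the limit definition to \( f(x) = x^2 \):
\[
f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h}
= \lim_{h \to 0} \frac{2xh + h^2}{h}
= \lim_{h \to 0} (2x + h)
= 2x
\]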
Unsupervised Learning
# Works on unlabelled raw data. The algorithm discovers hidden patterns without prior knowledge of outcomes. Requires no human intervention during training. Does not make direct predictions; it groups or organises data instead. Carries a higher risk because there is no ground truth to verify results against. Common techniques include Clustering, Association, and Dimensionality Reduction.
stateDiagram-v2
%% ML maths-based colours (same palette as supervised)
classDef probability fill:#d1fae5,stroke:#065f46,stroke-width:1px
classDef geometry fill:#ffedd5,stroke:#9a3412,stroke-width:1px
classDef category font-style:italic,font-weight:bold,fill:#f3f4f6,stroke:#374151
%% Root
USL: Unsupervised Learning
%% Main branches
USL --> CLU:::category
CLU: Clustering
USL --> DR:::category
DR: Dimensionality Reduction
%% Clustering algorithms
CLU --> KM:::geometry
KM: K-Means
CLU --> HC:::geometry
HC: Hierarchical Clustering
CLU --> DB:::geometry
DB: DBSCAN
%% Probabilistic models
USL --> PM:::category
PM: Probabilistic Models
PM --> GMM:::probability
GMM: Gaussian Mixture Model
PM --> HMM:::probability
HMM: Hidden Markov Model
Clustering
# Groups similar data points together based on shared features. Commonly used for market segmentation, image compression, and anomaly detection.
Common Types of Clustering
# K-Means Clustering – Divides data into K groups based on similarity.
Hierarchical Clustering – Builds a hierarchy (tree) of clusters.
DBSCAN (Density-Based Spatial Clustering) – Groups points close in density; identifies noise/outliers.
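A minimal K-Means sketch using scikit-learn; the 2-D points and the choice of K = 2 are purely illustrative assumptions.

```python
# Minimal K-Means clustering sketch (assumes scikit-learn and NumPy are installed).
# The points and K = 2 below are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

# Two loose blobs of 2-D points, with no labels provided
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("Cluster assignments:", km.labels_)        # group index for each point
print("Cluster centres:\n", km.cluster_centers_)
```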
Association
# Identifies relationships or correlations between variables in a dataset. Commonly used in market basket analysis (e.g. “Customers who bought X also bought Y”).
Common Techniques
# Apriori Algorithm – Finds frequent itemsets and generates association rules.
Eclat Algorithm – Similar to Apriori but uses set intersections for faster computation.
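A tiny illustration of the idea behind frequent itemsets and support, written in plain Python; the transactions are made up and this is not a full Apriori implementation.

```python
# Counting itemset support, the basic quantity behind Apriori/Eclat.
# The transactions below are made up for illustration.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

# Support of every pair of items = fraction of transactions containing both
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, "support =", count / len(transactions))
```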
Dimensionality Reduction
# Reduces the number of input variables to simplify data. Helps remove noise and redundancy. Commonly used in data pre-processing and visualisation.
Common Techniques
# Principal Component Analysis (PCA) – Projects data onto fewer dimensions while keeping most of the variance.
Linear Discriminant Analysis (LDA) – Focuses on class separation.
t-SNE (t-Distributed Stochastic Neighbour Embedding) – Used for visualising high-dimensional data.
Autoencoders – Neural networks that compress and reconstruct data.
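A minimal PCA sketch with scikit-learn, reducing illustrative 3-D points to 2 dimensions; the data and the choice of 2 components are assumptions for the example.

```python
# Minimal PCA sketch (assumes scikit-learn and NumPy are installed).
# The 3-D points and the target of 2 components are illustrative only.
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4, 0.5], [0.5, 0.7, 0.1], [2.2, 2.9, 0.4],
              [1.9, 2.2, 0.3], [3.1, 3.0, 0.6], [2.3, 2.7, 0.5]])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)   # project onto the top 2 principal components

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced data shape:", X_reduced.shape)
```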
mindmap
root(Unsupervised Learning)
Clustering
K Means
Hierarchical Clustering
DBSCAN
Dimensionality Reduction
PCA
t SNE
Autoencoders
Probabilistic Models
Gaussian Mixture Model
Hidden Markov Model
Partial Differentiation and Gradients
# For \( f(x_1, x_2, \dots, x_n) \), the partial derivative with respect to \( x_i \) is
\[
\frac{\partial f}{\partial x_i}
\]
Gradient vector:
\[
\nabla f =
\begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix}
\]
The gradient points in the direction of steepest ascent.
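For example, for \( f(x_1, x_2) = x_1^2 + 3x_1 x_2 \) the gradient is
\[
\nabla f =
\begin{bmatrix}
2x_1 + 3x_2 \\
3x_1
\end{bmatrix}
\]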
flowchart LR
Input --> Function
Function --> Gradient
Gradient --> Optimisation
Linear Independence
# A set of vectors is linearly independent if none of them can be written as a linear combination of the others.
\[
c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0}
\;\Rightarrow\;
c_1=\cdots=c_k=0
\]
Independence means each vector adds new information.
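For example, in \( \mathbb{R}^2 \) the vectors \( (1,0) \) and \( (0,1) \) are linearly independent, but adding \( (1,1) \) makes the set dependent, since
\[
1\cdot\begin{bmatrix}1\\0\end{bmatrix}
+ 1\cdot\begin{bmatrix}0\\1\end{bmatrix}
- 1\cdot\begin{bmatrix}1\\1\end{bmatrix}
= \mathbf{0}.
\]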
Semi-Supervised Learning
# A combination of labelled and unlabelled data. Useful when labelling large datasets is expensive or time-consuming. Works well with high-volume datasets (e.g. millions of images) where only a small fraction of the data is labelled (e.g. a few thousand examples). The algorithm learns from both the labelled examples and the structure in the unlabelled data. Ideal for medical imaging, where labelled data is limited. For example, a radiologist can label a small set of medical scans, and the model uses that to learn from thousands of unlabelled scans. Helps improve accuracy and generalisation with minimal manual effort.
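A minimal semi-supervised sketch using scikit-learn's LabelSpreading, where unlabelled examples are marked with -1; the data points and parameters are illustrative assumptions.

```python
# Minimal semi-supervised sketch (assumes scikit-learn and NumPy are installed).
# Unlabelled points are marked with -1; the data below is illustrative only.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],      # cluster A
              [8.0, 8.0], [7.9, 8.2], [8.1, 7.8]])     # cluster B
y = np.array([0, -1, -1, 1, -1, -1])  # only one labelled example per cluster

model = LabelSpreading(kernel="knn", n_neighbors=2)
model.fit(X, y)   # propagate the few labels through the unlabelled structure

print("Inferred labels:", model.transduction_)
```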
Gradients of Vector-Valued and Matrix Functions
# Covers gradients when outputs or parameters are vectors/matrices.
If \( f: \mathbb{R}^n \to \mathbb{R}^m \), the derivative is the Jacobian.
\[
J =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
\]
For a scalar-valued \( f(\mathbf{x}) \), the second derivative is the Hessian:
\[
H = \nabla^2 f
\]
The Hessian captures curvature.
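For example, for \( f(x, y) = (x^2 y,\; 5x + \sin y) \) the Jacobian is
\[
J =
\begin{bmatrix}
2xy & x^2 \\
5 & \cos y
\end{bmatrix}
\]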
Reinforcement Learning (RL)
# RL is learning by trial and error.
Reinforcement Learning (RL) is a type of machine learning where an autonomous agent learns to make decisions by interacting with an environment .
Instead of being told the correct answer, the agent:
takes actions
observes outcomes
receives rewards or penalties
gradually learns a strategy that maximises long-term reward
Reinforcement Learning teaches an agent how to act, not what to predict.
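A minimal tabular Q-learning sketch in plain Python; the toy 1-D environment, the reward for reaching the goal state, and the learning-rate/discount values are all illustrative assumptions, not part of the notes above.

```python
# Tabular Q-learning sketch: an agent on a 1-D line of 5 states learns,
# by trial and error, to walk right towards a rewarding goal state.
# Environment, rewards, and hyperparameters are illustrative only.
import random

n_states, goal = 5, 4
actions = [-1, +1]                     # step left or right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != goal:
        # epsilon-greedy action choice: mostly exploit, sometimes explore
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s_next = min(max(s + actions[a], 0), n_states - 1)
        r = 1.0 if s_next == goal else 0.0            # reward only at the goal
        # Q-learning update: move Q(s, a) towards r + gamma * max_a' Q(s', a')
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print("Greedy action per state (0=left, 1=right):", [q.index(max(q)) for q in Q])
```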
Useful Gradient Identities
# \[
\nabla (a^T x) = a
\]
\[
\nabla (x^T A x) = (A + A^T)x
\]
If A is symmetric:
\[
\nabla (x^T A x) = 2Ax
\]
These identities are heavily used in optimisation.
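A quick numerical sanity check of the second identity with NumPy, comparing \( (A + A^T)x \) against a finite-difference gradient of \( x^T A x \); the matrix, point, and step size are illustrative assumptions.

```python
# Numerically check that the gradient of x^T A x is (A + A^T) x.
# The matrix A, the point x, and the step size eps are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))      # not symmetric in general
x = rng.standard_normal(3)

f = lambda v: v @ A @ v              # quadratic form x^T A x

eps = 1e-6
numeric_grad = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)   # central differences
    for e in np.eye(3)
])

analytic_grad = (A + A.T) @ x
print("max abs difference:", np.max(np.abs(numeric_grad - analytic_grad)))
```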
Inner Products and Dot Product
# An inner product maps two vectors to a single scalar.
It allows us to measure:
similarity
vector length
projections
orthogonality
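A short NumPy sketch showing how the dot product gives length, similarity (cosine of the angle), projection, and an orthogonality check; the two example vectors are assumptions for illustration.

```python
# Using the Euclidean dot product to measure length, similarity,
# projection, and orthogonality. The vectors a and b are illustrative only.
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([4.0, -3.0])

length_a = np.sqrt(a @ a)                                        # ||a|| via the inner product
cos_angle = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))    # similarity
proj_b_onto_a = ((a @ b) / (a @ a)) * a                          # projection of b onto a

print("||a|| =", length_a)
print("cos(angle) =", cos_angle)            # 0 here: a and b are orthogonal
print("projection of b onto a:", proj_b_onto_a)
```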
flowchart TD
T["Inner<br/>products<br/>(types)"] --> DOT["Euclidean<br/>Dot product"]
T --> WIP["Weighted<br/>inner product"]
T --> FN["Function-space<br/>(integral)"]
T --> HERM["Complex<br/>Hermitian"]
T --> MAT["Matrix<br/>inner product<br/>(Frobenius)"]
DOT --> Rn["Vectors in<br/>
<span>
\( \mathbb{R}^n \)
</span>
"]
WIP --> SPD["SPD matrix<br/>W"]
FN --> L2["L2 space<br/>functions"]
HERM --> Cn["Vectors in<br/>C^n"]
MAT --> Mnm["Matrices<br/>R^{m×n}"]
style T fill:#90CAF9,stroke:#1E88E5,color:#000
style DOT fill:#C8E6C9,stroke:#2E7D32,color:#000
style WIP fill:#C8E6C9,stroke:#2E7D32,color:#000
style FN fill:#C8E6C9,stroke:#2E7D32,color:#000
style HERM fill:#C8E6C9,stroke:#2E7D32,color:#000
style MAT fill:#C8E6C9,stroke:#2E7D32,color:#000
style Rn fill:#CE93D8,stroke:#8E24AA,color:#000
style SPD fill:#CE93D8,stroke:#8E24AA,color:#000
style L2 fill:#CE93D8,stroke:#8E24AA,color:#000
style Cn fill:#CE93D8,stroke:#8E24AA,color:#000
style Mnm fill:#CE93D8,stroke:#8E24AA,color:#000
Definition
# For vectors \( \mathbf{a}, \mathbf{b} \in \mathbb{R}^n \), the dot product is
\[
\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T \mathbf{b} = \sum_{i=1}^{n} a_i b_i.
\]