Supervised Learning #

  • Trained on labelled data: each example in the training set includes the correct output.
  • The algorithm learns to generalise and make predictions on unseen data.
  • Requires human intervention for labelling and setup.
  • Generally more accurate than unsupervised methods, and produces highly accurate results when trained on good-quality labelled data.
  • Widely used because of this accuracy and efficiency.


Classification #

  • Output is discrete (e.g. Yes/No, Spam/Not Spam).
  • Used for categorising data into predefined classes.
  • Support Vector Machine (SVM) is a common classifier: it separates classes with the linear boundary that maximises the margin between them.
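To make the idea of a linear separator concrete, here is a minimal pure-Python sketch, using a perceptron-style update rather than a full margin-maximising SVM solver; the data points and labels are made up for illustration.

```python
# Minimal sketch of a linear classifier (perceptron update rule), illustrating
# the idea behind linear separators such as SVM. Toy data is made up.

def train_linear(points, labels, epochs=20, lr=0.1):
    """Learn weights w and bias b so that sign(w.x + b) matches labels (+1/-1)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified point
                w[0] += lr * y * x1                    # nudge boundary towards it
                w[1] += lr * y * x2
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

# Linearly separable toy data: class +1 upper-right, class -1 lower-left.
pts = [(2, 2), (3, 3), (-2, -1), (-3, -2)]
ys = [1, 1, -1, -1]
w, b = train_linear(pts, ys)
print([predict(w, b, p) for p in pts])   # matches ys on this separable data
```

A real SVM additionally maximises the margin (distance from the boundary to the nearest points); this sketch only finds *a* separating boundary.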

Unsupervised Learning #

  • Works on unlabelled raw data.
  • The algorithm discovers hidden patterns without prior knowledge of outcomes.
  • Requires no human intervention during training.
  • Does not make direct predictions — it groups or organises data instead.
  • Carries a higher risk because there’s no ground truth to verify results.
  • Common techniques include Clustering, Association, and Dimensionality Reduction.

stateDiagram-v2

  %% ML maths-based colours (same palette as supervised)
  classDef probability fill:#d1fae5,stroke:#065f46,stroke-width:1px
  classDef geometry fill:#ffedd5,stroke:#9a3412,stroke-width:1px
  classDef category font-style:italic,font-weight:bold,fill:#f3f4f6,stroke:#374151

  %% Root
  USL: Unsupervised Learning

  %% Main branches
  USL --> CLU:::category
  CLU: Clustering

  USL --> DR:::category
  DR: Dimensionality Reduction

  %% Clustering algorithms
  CLU --> KM:::geometry
  KM: K-Means

  CLU --> HC:::geometry
  HC: Hierarchical Clustering

  CLU --> DB:::geometry
  DB: DBSCAN

  %% Probabilistic models
  USL --> PM:::category
  PM: Probabilistic Models

  PM --> GMM:::probability
  GMM: Gaussian Mixture Model

  PM --> HMM:::probability
  HMM: Hidden Markov Model

Clustering #

  • Groups similar data points together based on shared features.
  • Commonly used for market segmentation, image compression, and anomaly detection.

Common Types of Clustering #

  • K-Means Clustering – Divides data into K groups based on similarity.
  • Hierarchical Clustering – Builds a hierarchy (tree) of clusters.
  • DBSCAN (Density-Based Spatial Clustering) – Groups points close in density; identifies noise/outliers.
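The K-Means loop above (assign points to the nearest centroid, then recompute centroids as cluster means) can be sketched in a few lines; the 1-D data and initial centroids here are toy values.

```python
# Minimal sketch of K-Means on 1-D toy data: alternate assignment and
# mean-update steps for a fixed number of iterations.

def kmeans_1d(data, centroids, iters=10):
    clusters = {}
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = {c: [] for c in range(len(centroids))}
        for x in data:
            nearest = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(v) / len(v) if v else centroids[c]
                     for c, v in clusters.items()]
    return centroids, clusters

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]
centroids, clusters = kmeans_1d(data, [0.0, 10.0])
print(sorted(round(c, 2) for c in centroids))   # two cluster centres emerge
```

With this data the centroids settle near 1.0 and 9.1, matching the two obvious groups.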

Association #

  • Identifies relationships or correlations between variables in a dataset.
  • Commonly used in market basket analysis (e.g. “Customers who bought X also bought Y”).

Common Techniques #

  • Apriori Algorithm – Finds frequent itemsets and generates association rules.
  • Eclat Algorithm – Similar to Apriori but uses set intersections for faster computation.
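The core of Apriori is counting how often itemsets occur across transactions and keeping those above a support threshold. A minimal sketch for item pairs, on made-up market-basket data:

```python
# Minimal sketch of frequent-itemset counting (the core step of Apriori)
# on made-up transactions: keep item pairs meeting a support threshold.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

min_support = 2  # a pair must appear in at least 2 baskets
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {p for p, n in pair_counts.items() if n >= min_support}
print(sorted(frequent_pairs))
```

The full Apriori algorithm extends this by growing frequent itemsets level by level (pairs, triples, …), pruning any candidate whose subsets are not themselves frequent.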

Dimensionality Reduction #

  • Reduces the number of input variables to simplify data.
  • Helps remove noise and redundancy.
  • Commonly used in data pre-processing and visualisation.

Common Techniques #

  • Principal Component Analysis (PCA) – Projects data onto fewer dimensions while keeping most variance.
  • Linear Discriminant Analysis (LDA) – Focuses on class separation.
  • t-SNE (t-Distributed Stochastic Neighbour Embedding) – Used for visualising high-dimensional data.
  • Autoencoders – Neural networks that compress and reconstruct data.
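A minimal NumPy sketch of PCA on toy 2-D data: centre the data, diagonalise the covariance matrix, and project onto the top eigenvector.

```python
# Minimal PCA sketch on toy 2-D data: centre, take eigenvectors of the
# covariance matrix, project onto the top principal component.
import numpy as np

X = np.array([[2.0, 1.9], [0.5, 0.6], [1.0, 1.1], [1.5, 1.4]])
Xc = X - X.mean(axis=0)                    # centre each feature
cov = Xc.T @ Xc / (len(X) - 1)             # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
top = eigvecs[:, -1]                       # principal component (largest eigenvalue)
projected = Xc @ top                       # 1-D representation of the data
explained = eigvals[-1] / eigvals.sum()    # fraction of variance retained
print(projected.shape, round(float(explained), 3))
```

Because these toy points lie nearly on a line, one component keeps well over 95% of the variance, which is exactly the "most variance in fewer dimensions" property described above.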

mindmap
  root(Unsupervised Learning)
    Clustering
      K Means
      Hierarchical Clustering
      DBSCAN
    Dimensionality Reduction
      PCA
      t SNE
      Autoencoders
    Probabilistic Models
      Gaussian Mixture Model
      Hidden Markov Model

Partial Differentiation and Gradients #

For a function \( f(x_1, x_2, \dots, x_n) \), the partial derivative with respect to \( x_i \) treats all other variables as constants:

\[ \frac{\partial f}{\partial x_i} \]

Gradient vector:

\[ \nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} \]

The gradient points in the direction of steepest ascent.
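A quick numerical check of the gradient, using central differences on the toy function \( f(x, y) = x^2 + y^2 \):

```python
# Minimal sketch: numerical gradient of f(x, y) = x**2 + y**2 via central
# differences, compared against the analytic gradient (2x, 2y).

def f(x, y):
    return x**2 + y**2

def numerical_gradient(func, x, y, h=1e-5):
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy

gx, gy = numerical_gradient(f, 1.0, 2.0)
print(round(gx, 4), round(gy, 4))   # analytic gradient at (1, 2) is (2, 4)
```

Optimisation methods such as gradient descent step in the direction opposite the gradient, since the gradient itself points towards steepest ascent.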

flowchart LR
    Input --> Function
    Function --> Gradient
    Gradient --> Optimisation

Linear Independence #

A set of vectors is linearly independent if none of them can be written as a linear combination of the others.

\[ c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0} \;\Rightarrow\; c_1=\cdots=c_k=0 \]

Independence means each vector adds new information.
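In practice, independence can be tested by comparing the rank of the matrix whose columns are the vectors to the number of vectors: full column rank means only the trivial combination gives zero. A small NumPy sketch with made-up vectors:

```python
# Minimal sketch: test linear independence via matrix rank.
# Full column rank <=> only the trivial combination gives the zero vector.
import numpy as np

def independent(*vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 0.0])   # v3 = v1 + v2, so {v1, v2, v3} is dependent

print(independent(v1, v2))       # independent pair
print(independent(v1, v2, v3))   # dependent triple
```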

Semi-Supervised Learning #

  • A combination of labelled and unlabelled data.
  • Useful when labelling large datasets is expensive or time-consuming.
  • Works well with high-volume datasets (e.g. millions of images).
  • Only a small fraction of data is labelled (e.g. a few thousand).
  • The algorithm learns from both labelled examples and structure in unlabelled data.
  • Ideal for medical imaging where labelled data is limited.
  • For example, a radiologist can label a small set of medical scans,
    and the model uses that to learn from thousands of unlabelled scans.
  • Helps improve accuracy and generalisation with minimal manual effort.
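One common semi-supervised recipe is self-training: fit a model on the small labelled set, pseudo-label the unlabelled pool, then refit on everything. A minimal 1-D sketch with a nearest-centroid "model" and made-up values:

```python
# Minimal self-training sketch on 1-D toy data: fit centroids on the few
# labelled points, pseudo-label the unlabelled pool, then refit.

def nearest_label(x, centroids):
    return min(centroids, key=lambda lbl: abs(x - centroids[lbl]))

labelled = {0.9: "A", 1.1: "A", 9.0: "B"}   # small labelled set
unlabelled = [1.0, 1.2, 8.8, 9.2, 9.4]      # larger unlabelled pool

# Step 1: centroids from the labelled data only.
centroids = {"A": (0.9 + 1.1) / 2, "B": 9.0}

# Step 2: pseudo-label the unlabelled points with the current model.
pseudo = {x: nearest_label(x, centroids) for x in unlabelled}

# Step 3: refit the centroids on labelled + pseudo-labelled data.
for lbl in centroids:
    vals = [x for x, l in labelled.items() if l == lbl]
    vals += [x for x, l in pseudo.items() if l == lbl]
    centroids[lbl] = sum(vals) / len(vals)

print(pseudo[8.8], round(centroids["B"], 2))
```

The refit centroids use far more data than was ever hand-labelled, which is the source of the accuracy gain described above.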

Gradients of Vector-Valued and Matrix Functions #

Covers gradients when outputs or parameters are vectors/matrices.

If \( f: \mathbb{R}^n \to \mathbb{R}^m \), the derivative is the Jacobian.

\[ J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \]

For a scalar-valued \( f: \mathbb{R}^n \to \mathbb{R} \), the matrix of second-order partial derivatives is the Hessian:

\[ H = \nabla^2 f, \qquad H_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j} \]

The Hessian captures the local curvature of \( f \).
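A numerical Jacobian can be built column by column with central differences, one perturbed input variable at a time. A sketch on the toy map \( f(x, y) = (xy,\; x + y) \):

```python
# Minimal sketch: numerical Jacobian of f: R^2 -> R^2 via central differences.
# For f(x, y) = (x*y, x + y) the analytic Jacobian is [[y, x], [1, 1]].

def f(v):
    x, y = v
    return [x * y, x + y]

def jacobian(func, v, h=1e-5):
    n, m = len(v), len(func(v))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):                      # one column per input variable
        up = list(v); up[j] += h
        dn = list(v); dn[j] -= h
        fu, fd = func(up), func(dn)
        for i in range(m):                  # one row per output component
            J[i][j] = (fu[i] - fd[i]) / (2 * h)
    return J

J = jacobian(f, [2.0, 3.0])
print([[round(e, 4) for e in row] for row in J])   # approx [[3, 2], [1, 1]]
```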

Reinforcement Learning (RL) #

Reinforcement Learning (RL) is learning by trial and error: an autonomous agent learns to make decisions by interacting with an environment.

Instead of being told the correct answer, the agent:

  • takes actions
  • observes outcomes
  • receives rewards or penalties
  • gradually learns a strategy that maximises long-term reward

Reinforcement Learning teaches an agent how to act, not what to predict.
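The act-observe-reward loop can be sketched with a tiny two-armed bandit: the agent keeps a running value estimate per action and updates it from each reward. The rewards and learning rate here are toy values, not a full RL algorithm.

```python
# Minimal sketch of trial-and-error learning: a two-armed bandit agent keeps
# incremental value estimates and gradually prefers the better-paying arm.

rewards = {"left": 0.0, "right": 1.0}   # environment: right arm always pays more
values = {"left": 0.0, "right": 0.0}    # agent's estimates, learned from experience
alpha = 0.5                             # learning rate for the incremental update

for step in range(20):
    # Early steps: explore both arms in turn. Later: exploit the better estimate.
    action = ["left", "right"][step % 2] if step < 10 else max(values, key=values.get)
    r = rewards[action]                               # observe the outcome
    values[action] += alpha * (r - values[action])    # move estimate towards reward

best = max(values, key=values.get)
print(best, round(values["right"], 3))
```

The agent was never told that "right" is correct; its preference emerges purely from rewards, which is the distinction between learning *how to act* and learning *what to predict*.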

Inner Products and Dot Product #

An inner product maps two vectors to a single scalar.

It allows us to measure:

  • similarity
  • vector length
  • projections
  • orthogonality

flowchart TD
T["Inner<br/>products<br/>(types)"] --> DOT["Euclidean<br/>Dot product"]
T --> WIP["Weighted<br/>inner product"]
T --> FN["Function-space<br/>(integral)"]
T --> HERM["Complex<br/>Hermitian"]
T --> MAT["Matrix<br/>inner product<br/>(Frobenius)"]

DOT --> Rn["Vectors in<br/>R^n"]
WIP --> SPD["SPD matrix<br/>W"]
FN --> L2["L2 space<br/>functions"]
HERM --> Cn["Vectors in<br/>C^n"]
MAT --> Mnm["Matrices<br/>R^{m×n}"]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style DOT fill:#C8E6C9,stroke:#2E7D32,color:#000
style WIP fill:#C8E6C9,stroke:#2E7D32,color:#000
style FN fill:#C8E6C9,stroke:#2E7D32,color:#000
style HERM fill:#C8E6C9,stroke:#2E7D32,color:#000
style MAT fill:#C8E6C9,stroke:#2E7D32,color:#000

style Rn fill:#CE93D8,stroke:#8E24AA,color:#000
style SPD fill:#CE93D8,stroke:#8E24AA,color:#000
style L2 fill:#CE93D8,stroke:#8E24AA,color:#000
style Cn fill:#CE93D8,stroke:#8E24AA,color:#000
style Mnm fill:#CE93D8,stroke:#8E24AA,color:#000

Definition #

For vectors \( \mathbf{a}, \mathbf{b} \in \mathbb{R}^n \), the dot product is

\[ \mathbf{a} \cdot \mathbf{b} = \mathbf{a}^\top \mathbf{b} = \sum_{i=1}^{n} a_i b_i \]
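A minimal sketch of the Euclidean dot product and the quantities it yields: length (norm), similarity (cosine), and an orthogonality check. The vectors are toy values.

```python
# Minimal sketch: Euclidean dot product, norm, cosine similarity, orthogonality.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 2.0, 2.0], [2.0, 0.0, 1.0]
norm_a = math.sqrt(dot(a, a))                           # |a| = sqrt(9) = 3
cosine = dot(a, b) / (norm_a * math.sqrt(dot(b, b)))    # similarity in [-1, 1]
print(dot(a, b), norm_a, round(cosine, 3))

print(dot([1, 0], [0, 1]))   # orthogonal vectors give a zero dot product
```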