ML

Classification (Linear)

Linear models for Classification #

  • categorise data by finding a linear decision boundary (a hyperplane) that separates the classes
  • score each input by computing a weighted sum of its features plus a bias, $z=w\cdot x+b$, and assign the class from that score
flowchart TD
T["Linear<br/>classification<br/>models"] --> P["Perceptron"]
T --> LR["Logistic<br/>regression"]
T --> SVM["Linear<br/>SVM"]

P -->|uses| STEP["Step<br/>activation"]
LR -->|uses| SIG["Sigmoid<br/>+ log loss"]
SVM -->|uses| HNG["Hinge<br/>loss"]

style T fill:#90CAF9,stroke:#1E88E5,color:#000

style P fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style SVM fill:#C8E6C9,stroke:#2E7D32,color:#000

style STEP fill:#CE93D8,stroke:#8E24AA,color:#000
style SIG fill:#CE93D8,stroke:#8E24AA,color:#000
style HNG fill:#CE93D8,stroke:#8E24AA,color:#000
  • Discriminant Functions
  • Decision Theory
  • Probabilistic Discriminative Classifiers
  • Logistic Regression
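
The three models in the diagram share the same linear score and differ in what they do with it. The sketch below illustrates that idea only; the weights, bias and input are hypothetical values chosen for the example, not learned parameters.

```python
import math

def linear_score(w, x, b):
    """Weighted sum of input features plus bias: z = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def step(z):
    """Perceptron activation: class 1 if the score is non-negative, else 0."""
    return 1 if z >= 0 else 0

def sigmoid(z):
    """Logistic regression: map the score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, p):
    """Log loss for a label y in {0, 1} and a predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def hinge_loss(y_pm, z):
    """Linear SVM hinge loss for a label y_pm in {-1, +1} and a score z."""
    return max(0.0, 1.0 - y_pm * z)

# Hypothetical weights, bias and input, chosen only for illustration.
w, b = [2.0, -1.0], 0.5
x = [1.0, 1.5]
z = linear_score(w, x, b)  # 2*1.0 - 1*1.5 + 0.5

print("score:", z)
print("perceptron class:", step(z))
print("logistic probability:", sigmoid(z))
print("log loss if the true label is 1:", log_loss(1, sigmoid(z)))
print("hinge loss if the true label is +1:", hinge_loss(+1, z))
```

Note that the hinge loss expects labels in {-1, +1}, while the log loss here expects labels in {0, 1}.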

Logistic Regression #

  • Supervised machine learning algorithm
  • Binary classification algorithm
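  • learns a linear decision boundary; it works best when the classes are roughly linearly separable, but it does not require perfect separation
  • predicts the probability that an input belongs to a specific class
  • uses the sigmoid function to convert the linear score into a probability between 0 and 1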

Key takeaway: Logistic regression predicts $P(y=1\mid x)$ using a sigmoid of a linear score $z=w\cdot x+b$, then learns $w,b$ by maximising likelihood (equivalently minimising log-loss).
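
The takeaway above can be sketched in a few lines of plain Python. This is a minimal illustration using batch gradient descent on the mean log-loss; the toy data, the learning rate and the number of epochs are assumptions made for the example, and a real project would normally use a library implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y=1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def mean_log_loss(w, b, X, y):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Minimise the mean log-loss with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(loss)/dz for one example
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical 1-D data whose classes overlap (not perfectly separable).
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print("P(y=1 | x=0.5):", predict_proba(w, b, [0.5]))
print("P(y=1 | x=4.0):", predict_proba(w, b, [4.0]))
print("mean log-loss:", mean_log_loss(w, b, X, y))
```

The classes in the toy data overlap on purpose: the model still fits, and it returns probabilities rather than hard labels.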

Hypothesis Testing

Hypothesis Testing #

Hypothesis testing is a statistical decision-making method used to decide whether sample evidence is strong enough to reject an initial assumption about a population.

It connects probability, sampling distributions, confidence intervals, significance levels, and decision rules.

Key takeaway:
Hypothesis testing is not about proving something with certainty.

It is about asking:

If the null hypothesis were true, how surprising would this sample result be?
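
As one concrete way to answer that question, the sketch below runs a two-sided one-sample z-test. It assumes the population standard deviation is known, and all the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test for a population mean when sigma is known."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Probability of a result at least this extreme if the null is true.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: H0 says the population mean is 50.
z, p_value = one_sample_z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)

alpha = 0.05  # significance level chosen before looking at the data
reject_null = p_value < alpha

print("z statistic:", z)
print("p-value:", p_value)
print("reject H0 at alpha = 0.05:", reject_null)
```

A small p-value means the sample would be surprising if the null hypothesis were true. It does not prove that the alternative is true.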

Foundation Models

Foundation Model #

Foundation models are AI models trained on massive datasets so that they can perform a wide range of tasks with minimal fine-tuning. They:

  • are large deep learning neural networks

  • are trained on massive and diverse datasets (text, images, audio, or multiple modalities)

  • contain millions or billions of parameters

  • are designed to perform a broad range of general tasks rather than a single task

  • act as base models for building specialised AI applications

LLM - Model

LLM – Large Language Model #

Large Language Models (LLMs) are advanced AI systems designed to process, understand, and generate human-like text.

They learn language by analysing massive amounts of text data, discovering patterns in:

  • grammar

  • meaning

  • context

  • relationships between words and sentences

LLMs are:

  • built on deep learning

  • implemented using neural networks

  • based on the Transformer architecture

  • often combined with tools such as:

    • Retrieval (RAG)
    • Agents
    • External APIs
    • Memory systems

What makes an LLM special? #

  • Built using deep neural networks
  • Trained on very large datasets (books, articles, code, web text)
  • Can perform many tasks without task-specific training
  • General-purpose language understanding, not single-task models

Foundation: Transformer Architecture #

LLMs are based on the Transformer Architecture, which allows models to understand context and long-range dependencies in text.
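
The mechanism that lets a Transformer relate every token to every other token is self-attention. The sketch below shows scaled dot-product attention in plain Python. It is a simplified illustration: the token vectors are hypothetical, and the learned query, key and value projections, multiple heads and positional information of a real model are left out.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical token vectors, used directly as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {row}")
```

Every token receives a weight for every other token regardless of distance, which is why attention handles long-range dependencies well.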

Decision Tree

Decision Tree #

A decision tree classifies an example by asking a sequence of questions about its attributes until it reaches a leaf (final decision).

Key takeaway: A decision tree grows by repeatedly splitting the training data into purer subsets using an impurity measure (Entropy / Gini / Classification Error).

  • Information Theory
  • Entropy Based Decision Tree Construction
  • Avoiding Overfitting
  • Minimum Description Length
  • Handling continuous-valued attributes and missing attributes

Information Theory #

Decision trees need a way to measure: “How mixed are the class labels at a node?”
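
The three impurity measures named earlier can be sketched as follows, together with the information gain of one candidate split. The labels are hypothetical, and the helper names are chosen for this example.

```python
import math
from collections import Counter

def class_probs(labels):
    """Fraction of examples in each class at a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    return -sum(p * math.log2(p) for p in class_probs(labels))

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    return 1.0 - sum(p * p for p in class_probs(labels))

def classification_error(labels):
    """Fraction of examples outside the majority class."""
    return 1.0 - max(class_probs(labels))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" labels, and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("entropy:", entropy(parent))
print("gini:", gini(parent))
print("classification error:", classification_error(parent))
print("information gain of the split:", information_gain(parent, [left, right]))
```

A tree-growing procedure would evaluate many candidate splits this way and keep the one with the highest gain.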

Prediction & Forecasting

Prediction & Forecasting #

Prediction and forecasting use statistical models to estimate unknown or future values.

In this module, the focus is on correlation, regression, and time series forecasting.

Key takeaway:
Prediction estimates a value using a model.

Forecasting is prediction where the order of time matters.

  • Correlation
  • Regression
  • Time series analysis
  • Components of time series data
  • Moving average and weighted moving average
  • AR model
  • ARMA model
  • ARIMA model
  • SARIMA and SARIMAX
  • VAR and VARMAX
  • Simple exponential smoothing
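
Three of the simpler methods in this list can be sketched directly. The sales figures, window size, weights and smoothing factor below are hypothetical, and the smoothing level is initialised with the first observation, which is one common convention among several.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` values."""
    return sum(series[-window:]) / window

def weighted_moving_average_forecast(series, weights):
    """Weights are given oldest to newest and are normalised here."""
    recent = series[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

def simple_exponential_smoothing(series, alpha):
    """Level update: level = alpha * x + (1 - alpha) * previous level."""
    level = series[0]  # assumption: start from the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level  # the final level is the one-step-ahead forecast

# Hypothetical monthly sales, oldest first.
sales = [100, 110, 120, 130]

print("moving average (window 3):", moving_average_forecast(sales, 3))
print("weighted moving average:", weighted_moving_average_forecast(sales, [1, 2, 3]))
print("exponential smoothing (alpha 0.5):", simple_exponential_smoothing(sales, 0.5))
```

All three forecasts lag behind the upward trend in this series, because none of these methods models a trend explicitly.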

Prediction vs Forecasting ☆ #

| Concept | Meaning | Example |
|---|---|---|
| Prediction | Estimate an unknown output | Predict house price from area and rooms |
| Forecasting | Predict future values using time order | Forecast sales for next month |

All forecasting is prediction, but not all prediction is forecasting.
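
The distinction can be illustrated with a single least-squares line used in two ways. This is a sketch with made-up numbers: in the first case the input is a feature (area), in the second the input is the time index itself.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Prediction: the order of the rows does not matter (hypothetical data).
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 70 + intercept

# Forecasting: the input is the time index, so the order matters.
monthly_sales = [100, 110, 120, 130]
time_index = list(range(len(monthly_sales)))
trend, level = fit_line(time_index, monthly_sales)
next_month_forecast = trend * len(monthly_sales) + level

print("predicted price for area 70:", predicted_price)
print("forecast for next month:", next_month_forecast)
```

Shuffling the house rows leaves the prediction unchanged, while shuffling the monthly sales would change the fitted trend and therefore the forecast.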

Overall Workflow #

flowchart LR
    A[Data] --> B[Explore Pattern]
    B --> C[Choose Model]
    C --> D[Train or Fit]
    D --> E[Validate]
    E --> F[Predict or Forecast]
    F --> G[Interpret Error]

    style A fill:#E1F5FE
    style B fill:#C8E6C9
    style C fill:#FFF9C4
    style D fill:#EDE7F6
    style E fill:#C8E6C9
    style F fill:#E1F5FE
    style G fill:#FFF9C4

Correlation ☆ #

Correlation measures the direction and strength of the linear relationship between two variables. The Pearson correlation coefficient ranges from -1 to +1.
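
A minimal sketch of the Pearson correlation coefficient follows, with three hypothetical datasets. The last one shows why the word "linear" matters: a perfect quadratic relationship can still have zero correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])         # rises together
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])         # one rises, one falls
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared

print("positive linear:", r_positive)
print("negative linear:", r_negative)
print("quadratic:", r_quadratic)
```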

Gaussian Mixture Model & Expectation Maximization

Gaussian Mixture Model & Expectation Maximization #

A Gaussian Mixture Model represents data as a weighted combination of multiple Gaussian distributions.

It is commonly used for soft clustering and density estimation.

Key takeaway:
K-means gives hard cluster membership.

GMM gives probabilities of belonging to each cluster.

  • Gaussian Mixture Model
  • soft clustering
  • mixing coefficients
  • latent variables
  • likelihood and log-likelihood
  • Expectation-Maximization algorithm
  • E-step and M-step
  • responsibilities
  • convergence
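
The E-step, M-step and responsibilities listed above can be sketched for one-dimensional data with two components. The data, the starting parameters and the fixed iteration count are assumptions for this example; a fuller implementation would monitor the log-likelihood to detect convergence.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(data, weights, means, variances):
    """Responsibilities: probability that each component generated each point."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(data, resp):
    """Re-estimate mixing coefficients, means and variances."""
    n, k = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        weights.append(nj / n)
        means.append(mean)
        variances.append(max(var, 1e-6))  # guard against collapsing variance
    return weights, means, variances

# Hypothetical data with two visible groups, and rough starting guesses.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
weights, means, variances = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]

for _ in range(20):  # fixed number of EM iterations for simplicity
    resp = e_step(data, weights, means, variances)
    weights, means, variances = m_step(data, resp)

print("mixing coefficients:", weights)
print("means:", means)
print("responsibilities for x = 1.0:", resp[0])
```

Each row of `resp` sums to 1, which is the soft membership that distinguishes a GMM from the hard assignment of K-means.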

Motivation ☆ #

Many real datasets are not described well by one Gaussian distribution.

Instance-based Learning

Instance-based Learning #

Instance-based learning is a family of methods that do not build one explicit global model during training. Instead, they store training examples and delay most of the work until a new query arrives.

When a new point must be classified or predicted, the algorithm compares it with previously seen examples, finds the most relevant neighbours, and uses them to produce the answer.
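
The description above matches k-nearest-neighbour classification, sketched below. "Training" here is only storing the examples; the points, the labels, the Euclidean distance and the value of `k` are assumptions for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # All the work happens at query time: measure distance to every example.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical stored examples forming two groups.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, query=(2, 2)))
print(knn_predict(train_points, train_labels, query=(6, 5)))
```

Because every query scans all stored examples, prediction cost grows with the size of the training set.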

Instance-based Learning covers three linked ideas:

Support Vector Machine

Support Vector Machine (SVM) #

Support Vector Machine (SVM) is a supervised machine learning algorithm used for:

  • Classification (most common)
  • Regression (SVR – Support Vector Regression)

It connects many earlier ideas:

  • classification and decision boundaries
  • linear classifiers
  • margins
  • optimisation
  • constrained optimisation
  • kernels for non-linear data

SVM is a discriminative classifier.

That means it does not try to model how each class is generated.

Instead, it tries to find the best separating boundary between classes.
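
For a linear SVM, "best" means the boundary with the widest margin. The sketch below does not train an SVM; it assumes a hyperplane `w`, `b` is already given (hypothetical values in canonical form) and shows how the margin and the support vectors follow from it.

```python
import math

def decision_value(w, b, x):
    """Signed score f(x) = w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, x, y):
    """Signed distance of a labelled point (y in {-1, +1}) from the boundary."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * decision_value(w, b, x) / norm

# Hypothetical separating hyperplane and labelled points.
w, b = [1.0, 1.0], -3.0
points = [([0.0, 0.0], -1), ([1.0, 1.0], -1), ([2.0, 2.0], +1), ([3.0, 3.0], +1)]

# In canonical form, support vectors satisfy y * f(x) = 1.
support_vectors = [
    x for x, y in points if abs(y * decision_value(w, b, x) - 1.0) < 1e-9
]
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))

for x, y in points:
    print(x, "label", y, "distance", geometric_margin(w, b, x, y))
print("support vectors:", support_vectors)
print("margin width:", margin_width)
```

Only the points closest to the boundary determine the margin; moving the other points slightly would leave this boundary unchanged.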

Bayesian Learning

Bayesian Learning #

Bayesian Learning is a probabilistic approach to machine learning.

Instead of only asking, “Which output should the model predict?”, Bayesian Learning asks:

Given the data we have observed, how likely is each hypothesis, class, or parameter value?

This makes Bayesian Learning useful when uncertainty matters.

It is especially important in classification, probabilistic modelling, generative models, and situations where we want to combine prior knowledge with observed data.
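
The question above is answered by Bayes' rule: the posterior is proportional to the prior times the likelihood. The sketch below applies it to two hypotheses; the prior and the test accuracies are hypothetical numbers.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(h | data) = P(data | h) * P(h) / P(data)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(data), summed over all hypotheses
    return {h: value / evidence for h, value in joint.items()}

# Prior knowledge: 1% of people have the condition (hypothetical).
priors = {"disease": 0.01, "healthy": 0.99}
# Likelihood of the observed data (a positive test) under each hypothesis.
likelihoods = {"disease": 0.95, "healthy": 0.05}

post = posterior(priors, likelihoods)
print("P(disease | positive test):", post["disease"])
print("P(healthy | positive test):", post["healthy"])
```

Even with a fairly accurate test, the posterior probability of disease stays well below one half here, because the prior is so small. This is the sense in which Bayesian learning combines prior knowledge with observed data.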