Decision Tree #

A decision tree classifies an example by asking a sequence of questions about its attributes until it reaches a leaf (final decision).
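
For instance, a tiny hand-written tree for the classic play-tennis data reads as nested questions; the attributes and thresholds below are illustrative only:

# a hand-written decision tree over play-tennis-style attributes (illustrative)
def classify(example):
    # root question: what is the outlook?
    if example["outlook"] == "sunny":
        # next question: how humid is it?
        return "play" if example["humidity"] <= 70 else "don't play"
    elif example["outlook"] == "overcast":
        return "play"  # leaf: always play when overcast
    else:  # rainy
        return "don't play" if example["windy"] else "play"

print(classify({"outlook": "sunny", "humidity": 65}))  # -> play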

Key takeaway: A decision tree grows by repeatedly splitting the training data into purer subsets using an impurity measure (Entropy / Gini / Classification Error).


Information Theory #

Decision trees need a way to measure: “How mixed are the class labels at a node?”
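
For a node whose examples fall into k classes with proportions p_1, …, p_k, the three impurity measures named above are:

$$
H(p) = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad
G(p) = 1 - \sum_{i=1}^{k} p_i^2
\qquad
E(p) = 1 - \max_i p_i
$$

All three are zero for a pure node and largest for a uniform mix; e.g. a 50/50 binary node has entropy H = 1 bit and Gini G = 0.5.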

Statistics #

Statistical methods help you turn raw data into reliable conclusions while accounting for uncertainty, variability, and confidence.

Statistics provides the language and tools for reasoning about data, uncertainty, and inference.

ML relies on statistics for understanding data behaviour, drawing conclusions, and validating machine learning models.

  • Collect Data
  • Present & Organise Data (in a systematic manner)
  • Analyse Data
  • Infer about the Data
  • Make Decisions from the Data


| Statistics Topic | What you learn (plain English) | ML Connection |
| --- | --- | --- |
| 1. Basic Probability & Statistics | Summarise data; understand spread; basic probability rules | Data understanding (EDA), feature sanity checks, detecting outliers, interpreting “average behaviour” |
| 2. Conditional Probability & Bayes | Update probability using new information; Bayes’ rule | Naïve Bayes, Bayesian thinking, posterior probabilities, probabilistic classification |
| 3. Probability Distributions | Model randomness with distributions; expectation/variance/covariance | Likelihood models, noise assumptions (Gaussian), sampling, probabilistic modelling foundations |
| 4. Hypothesis Testing | Sampling, CLT, confidence intervals, significance tests, ANOVA, MLE | A/B testing, evaluating model improvements, significance vs noise, parameter estimation (MLE) |
| 5. Prediction & Forecasting | Correlation, regression, time series (AR/MA/ARIMA/SARIMA etc.) | Linear regression, forecasting, sequential data modelling, baseline predictive modelling |
| 6. GMM & EM | Mixtures of Gaussians; iterative estimation with EM | Unsupervised learning (soft clustering), density estimation, latent-variable models |
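
For reference, Bayes’ rule from row 2, which underpins Naïve Bayes and posterior probabilities:

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$

In classification terms, the posterior P(class | features) is proportional to the likelihood P(features | class) times the prior P(class).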

flowchart TD
  A["Statistical Methods<br/>AIML ZC418"] --> B["1. Basic Probability and Statistics"]
  A --> C["2. Conditional Probability and Bayes"]
  A --> D["3. Probability Distributions"]
  A --> E["4. Hypothesis Testing"]
  A --> F["5. Prediction and Forecasting"]
  A --> G["6. Gaussian Mixture Model and EM"]

  B --> B1["Central Tendency<br/>Mean - Median - Mode"]
  B --> B2["Variability<br/>Range - Variance - SD - Quartiles"]
  B --> B3["Basic Probability Concepts"]
  B3 --> B31["Axioms of Probability"]
  B3 --> B32["Definition of Probability"]
  B3 --> B33["Mutually Exclusive vs Independent"]

  C --> C1["Conditional Probability"]
  C --> C2["Independence (conditional)"]
  C --> C3["Bayes Theorem"]
  C --> C4["Naive Bayes (intro)"]

  D --> D1["Random Variables<br/>Discrete and Continuous"]
  D --> D2["Expectation - Variance - Covariance"]
  D --> D3["Transformations of RVs"]
  D --> D4["Key Distributions"]
  D4 --> D41["Bernoulli"]
  D4 --> D42["Binomial"]
  D4 --> D43["Poisson"]
  D4 --> D44["Normal (Gaussian)"]
  D4 --> D45["t - Chi-square - F (intro)"]

  E --> E1["Sampling<br/>Random and Stratified"]
  E --> E2["Sampling Distributions<br/>CLT"]
  E --> E3["Estimation<br/>Confidence Intervals"]
  E --> E4["Hypothesis Tests<br/>Means and Proportions"]
  E --> E5["ANOVA<br/>Single and Dual factor"]
  E --> E6["Maximum Likelihood"]

  F --> F1["Correlation"]
  F --> F2["Regression"]
  F --> F3["Time Series Basics<br/>Components"]
  F --> F4["Moving Averages<br/>Simple and Weighted"]
  F --> F5["Time Series Models"]
  F5 --> F51["AR"]
  F5 --> F52["ARMA / ARIMA"]
  F5 --> F53["SARIMA / SARIMAX"]
  F5 --> F54["VAR / VARMAX"]
  F --> F6["Exponential Smoothing"]

  G --> G1["GMM<br/>Mixture of Gaussians"]
  G --> G2["EM Algorithm<br/>E-step - M-step"]

  B -.-> C
  C -.-> D
  D -.-> E
  E -.-> F
  F -.-> G

Data - Types #

flowchart TD
    A[(Data)] --> B["Categorical (Qualitative)"]
    A --> C["Numerical (Quantitative)"]

    B --> B1[Nominal]
    B --> B2[Ordinal]

    C --> C1[Discrete]
    C --> C2[Continuous]

    C2 --> C21[Interval]
    C2 --> C22[Ratio]

    %% Styling
    style A fill:#E1F5FE,stroke:#333
    style B fill:#90CAF9,stroke:#333
    style B1 fill:#90CAF9,stroke:#333
    style B2 fill:#90CAF9,stroke:#333
    style C fill:#FFF9C4,stroke:#333
    style C1 fill:#FFF9C4,stroke:#333
    style C2 fill:#FFF9C4,stroke:#333
    style C21 fill:#FFF9C4,stroke:#333
    style C22 fill:#FFF9C4,stroke:#333

1. Categorical (Qualitative) #

Categorical data express a qualitative attribute, e.g. hair colour, eye colour.

Instance-based Learning #

Instance-based learning is a family of methods that do not build one explicit global model during training. Instead, they store training examples and delay most of the work until a new query arrives.

When a new point must be classified or predicted, the algorithm compares it with previously seen examples, finds the most relevant neighbours, and uses them to produce the answer.
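
A minimal k-nearest-neighbours sketch, the classic instance-based method, showing the store-then-compare idea; the data and function names are illustrative only:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # "training" is just storing X_train/y_train; all work happens at query time
    dists = np.linalg.norm(X_train - x_query, axis=1)       # distance to every stored example
    nearest = np.argsort(dists)[:k]                         # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority vote among their labels

# toy usage: two well-separated 2-D classes
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # -> 1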

Instance-based Learning covers three linked ideas:

Support Vector Machine (SVM) #

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for:

  • Classification (most common)
  • Regression (SVR – Support Vector Regression)

Goal: find the decision boundary that separates classes with the maximum margin.

A Support Vector Machine is a supervised learning algorithm that finds an optimal hyperplane by maximising the margin between classes, using support vectors and kernel functions to handle non-linear data.
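
In symbols, the standard hard-margin formulation for linearly separable data with labels y_i ∈ {−1, +1}:

$$
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1 \quad \forall i
$$

The margin width is 2/‖w‖, so minimising ‖w‖ maximises the margin; the training points that meet the constraint with equality are the support vectors.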

Attention Mechanism #

  • Queries, Keys, and Values
  • Attention Pooling by Similarity
  • Attention Pooling via Nadaraya–Watson Regression
  • Attention Scoring Functions
  • Dot Product Attention
  • Convenience Functions
  • Scaled Dot Product Attention (see the sketch after this list)
  • Additive Attention
  • Bahdanau Attention Mechanism
  • Multi-Head Attention
  • Self-Attention
  • Positional Encoding
  • Code implementation (webinar)
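
As promised in the list above, a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d)V; the shapes and names are illustrative only:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # query-key similarity, scaled by sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # attention-weighted sum of the values

# toy usage: 2 queries attending over 4 key/value pairs
Q, K, V = np.random.randn(2, 8), np.random.randn(4, 8), np.random.randn(4, 16)
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 16)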

Reference #

  • Dive into Deep Learning. Cambridge University Press. (Ch. 10, Ch. 7)

Transformer #

  • is a neural network architecture

  • based on the multi-head attention mechanism

  • text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table (see the sketch after this list)

  • takes a text sequence as input and produces another text sequence as output

  • foundation for modern Large Language Models (LLMs) like ChatGPT and Gemini

  • Transformer architecture: Model, Positionwise Feed-Forward Networks, Residual Connection and Layer Normalization
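
A toy sketch of the token-to-vector lookup noted in the list; the vocabulary, dimension, and random table are stand-ins for a real tokeniser and a trained embedding:

import numpy as np

# toy vocabulary and embedding table (random here; a trained model learns these values)
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 8
embedding_table = np.random.randn(len(vocab), d_model)

token_ids = [vocab[w] for w in "the cat sat".split()]  # text -> token ids
vectors = embedding_table[token_ids]                   # token ids -> vectors via table lookup
print(vectors.shape)                                   # (3, 8)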

Optimisation of Deep models #

  • Goal of Optimization
  • Optimization Challenges in Deep Learning
  • Gradient Descent
  • Stochastic Gradient Descent
  • Minibatch Stochastic Gradient Descent
  • Momentum (see the sketch after this list)
  • Adagrad and Algorithm
  • RMSProp and Algorithm
  • Adadelta and Algorithm
  • Adam and Algorithm
  • Code Implementation and comparison of algorithms (webinar)
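
As referenced in the list above, a minimal sketch of SGD with momentum in its common velocity form (v ← βv + g, then w ← w − ηv); the toy quadratic loss is illustrative:

import numpy as np

def sgd_momentum(w, grad, v, lr=0.1, beta=0.9):
    v = beta * v + grad  # accumulate a velocity from past gradients
    w = w - lr * v       # step along the velocity instead of the raw gradient
    return w, v

# toy usage: minimise f(w) = 0.5 * ||w||^2, whose gradient is simply w
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum(w, grad=w, v=v)
print(w)  # approaches the minimum at [0, 0]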

Reference #

  • Dive into Deep Learning. Cambridge University Press. (Ch. 12)
