Retrieval-Augmented Generation (RAG)
#
Retrieval-Augmented Generation (RAG) is a system design pattern that improves an LLM’s answers by:
- Retrieving relevant information from an external knowledge source, and then
- Augmenting the LLM prompt with that retrieved context before generating the final response.
RAG helps an LLM look things up first, then answer using evidence.
Why RAG is Useful
#
RAG is commonly used when:
- Your knowledge is in private documents (PDFs, policies, internal wiki)
- You need up-to-date information (things not in the model’s training data)
- You want fewer hallucinations by grounding answers in retrieved sources
- You want traceability (show “where the answer came from”)
RAG does not change the model weights.
It changes what the model sees at inference time by adding retrieved context.
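The retrieve-then-augment flow can be sketched in a few lines. This is a toy illustration only: the word-overlap scoring, the function names (`retrieve`, `build_prompt`) and the sample documents are assumptions made for this example, and real systems typically use embedding-based search and then pass the prompt to an LLM.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Return the k documents that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge source (stands in for private documents).
DOCS = [
    "Employees receive 20 days of annual leave",
    "The office is closed on public holidays",
    "Expense claims must be filed within 30 days",
]

QUERY = "How many days of annual leave do employees receive"
prompt = build_prompt(QUERY, DOCS, k=1)
print(prompt)
```

Note that nothing here touches model weights: the only thing that changes is the text the model would receive.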
May 28, 2026
Mathematical Foundations for Machine Learning
#
Machine Learning is built on mathematical principles that allow models to:
- represent data
- learn patterns
- optimise performance
flowchart LR
DATA[Data]
MATH[Math Models]
OPT[Optimisation]
MODEL[Trained Model]
DATA --> MATH
MATH --> OPT
OPT --> MODEL
Understanding how ML algorithms work internally requires a few core mathematical tools. Algebra deals with relationships between variables and quantities, while Calculus focuses on change and optimisation.
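As a small illustration of how calculus drives optimisation, the sketch below minimises a one-variable function with gradient descent. The function `f(w) = (w - 3)^2`, the learning rate, and the step count are assumptions chosen for this example, not values from the course material.

```python
def gradient(w):
    """Derivative of f(w) = (w - 3)^2 with respect to w."""
    return 2 * (w - 3)


def gradient_descent(start=0.0, learning_rate=0.1, steps=100):
    """Repeatedly move w a small step against the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_min = gradient_descent()
print(w_min)  # approaches 3, the minimiser of f
```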
February 26, 2026
Deep Feedforward Neural Networks (DFNN) or Multi Layer Perceptrons (MLP) for Classification
#
A Deep Feedforward Neural Network (DFNN), also called a Multi-Layer Perceptron (MLP), is a neural network with one or more hidden layers where information flows forward only (no recurrence).
For classification, DFNNs learn non-linear decision boundaries by combining hidden layers with non-linear activation functions.
Core idea:
- A single neuron can only learn linear boundaries.
- Adding hidden layers + non-linearity allows DFNNs to solve problems like XOR.
MLP as solution for XOR
#
A single perceptron fails on XOR because XOR is not linearly separable.
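To make this concrete, here is a minimal sketch of a two-layer network that computes XOR. The weights are hand-picked for illustration (not learned), and the step activation and the names `h_or` and `h_and` are assumptions of this example: one hidden unit behaves like OR, the other like AND, and the output fires when OR is on but AND is off.

```python
def step(z):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """Two hidden units plus one output unit, all with hand-set weights."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR and not AND


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

Each hidden unit draws one straight line in the input plane; the output unit combines the two half-planes, which a single perceptron cannot do.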
Decision Tree
#
A decision tree classifies an example by asking a sequence of questions about its attributes until it reaches a leaf (final decision).
Key takeaway:
A decision tree grows by repeatedly splitting the training data into purer subsets using an impurity measure (Entropy / Gini / Classification Error).
- Information Theory
- Entropy Based Decision Tree Construction
- Avoiding Overfitting
- Minimum Description Length
- Handling Continuous valued attributes, missing attributes
Decision trees need a way to measure:
“How mixed are the class labels at a node?”
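The three impurity measures named above can be sketched as follows. The function names and the sample labels are illustrative; the formulas are the standard ones (entropy in bits, Gini as one minus the sum of squared class proportions, classification error as one minus the largest proportion).

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of examples in each class at a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for a 50/50 binary node."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1 - sum(p * p for p in class_proportions(labels))


def classification_error(labels):
    """1 minus the proportion of the majority class."""
    return 1 - max(class_proportions(labels))


mixed = ["yes", "yes", "no", "no"]
pure = ["yes", "yes", "yes", "yes"]
print(entropy(mixed), gini(mixed), classification_error(mixed))
print(entropy(pure), gini(pure), classification_error(pure))
```

All three measures are zero for a pure node and largest when the classes are evenly mixed, which is why a split that lowers them is considered a good split.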
Prediction & Forecasting
#
Prediction and forecasting use statistical models to estimate unknown or future values.
In this module, the focus is on correlation, regression, and time series forecasting.
Key takeaway:
Prediction estimates a value using a model.
Forecasting is prediction where the order of time matters.
- Correlation
- Regression
- Time series analysis
- Components of time series data
- Moving average and weighted moving average
- AR model
- ARMA model
- ARIMA model
- SARIMA and SARIMAX
- VAR and VARMAX
- Simple exponential smoothing
Prediction vs Forecasting ☆
#
| Concept | Meaning | Example |
|---|---|---|
| Prediction | Estimate an unknown output | Predict house price from area and rooms |
| Forecasting | Predict future values using time order | Forecast sales for next month |
All forecasting is prediction, but not all prediction is forecasting.
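A short sketch of why time order matters in forecasting: a simple moving-average forecast uses only the most recent observations, so reordering the same values changes the result. The sales figures and the function name are made up for illustration.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


sales = [10, 12, 14, 16, 18]           # hypothetical monthly sales, oldest first
forecast = moving_average_forecast(sales)
reversed_forecast = moving_average_forecast(sales[::-1])
print(forecast, reversed_forecast)
```

The two forecasts differ even though both series contain the same numbers, which is exactly what separates forecasting from order-independent prediction.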
Overall Workflow
#
flowchart LR
A[Data] --> B[Explore Pattern]
B --> C[Choose Model]
C --> D[Train or Fit]
D --> E[Validate]
E --> F[Predict or Forecast]
F --> G[Interpret Error]
style A fill:#E1F5FE
style B fill:#C8E6C9
style C fill:#FFF9C4
style D fill:#EDE7F6
style E fill:#C8E6C9
style F fill:#E1F5FE
style G fill:#FFF9C4
Correlation ☆
#
Correlation measures the direction and strength of the linear relationship between two variables.
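As an illustration, the Pearson correlation coefficient can be computed directly from its definition. The function name and the sample data are assumptions of this sketch; the result lies between -1 (perfect negative linear relationship) and +1 (perfect positive linear relationship).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


area = [1, 2, 3, 4]
price_up = [2, 4, 6, 8]      # rises with area
price_down = [8, 6, 4, 2]    # falls with area
print(pearson_r(area, price_up), pearson_r(area, price_down))
```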
April 19, 2026
Convolutional Neural Networks (CNN)
#
Convolutional Neural Networks (CNNs) are specialised neural networks designed for data with spatial structure, especially images. They became the standard model for computer vision because they preserve spatial locality, reuse the same pattern detector across the image, and build representations hierarchically. In practical terms, a CNN starts by learning simple features such as edges and corners, then combines them into textures, shapes, object parts, and finally full semantic categories.
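The idea of reusing one pattern detector across the image can be sketched with a tiny hand-written convolution. This is an illustrative example only: the image, the kernel values, and the function name `conv2d` are assumptions, the kernel is fixed by hand rather than learned, and the operation is the "valid" cross-correlation that deep learning libraries usually call convolution.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Dark region on the left, bright region on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Responds where intensity increases from left to right.
kernel = [[-1, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)
```

The same two weights are applied at every position, and the feature map is non-zero only at the column where the vertical edge sits.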
March 12, 2026
Statistics
#
Statistical methods help you turn raw data into reliable conclusions, while understanding uncertainty, variability, and confidence.
Statistics provides the language and tools for reasoning about data, uncertainty, and inference.
ML relies on statistics for understanding data behaviour, drawing conclusions, and validating models.
- Collect Data
- Present & Organise Data (in a systematic manner)
- Analyse Data
- Infer about the Data
- Take Decision from the Data
| Statistics Topic | What you learn (plain English) | ML Connection |
|---|---|---|
| 1. Basic Probability & Statistics | Summarise data; understand spread; basic probability rules | Data understanding (EDA), feature sanity checks, detecting outliers, interpreting “average behaviour” |
| 2. Conditional Probability & Bayes | Update probability using new information; Bayes’ rule | Naïve Bayes, Bayesian thinking, posterior probabilities, probabilistic classification |
| 3. Probability Distributions | Model randomness with distributions; expectation/variance/covariance | Likelihood models, noise assumptions (Gaussian), sampling, probabilistic modelling foundations |
| 4. Hypothesis Testing | Sampling, CLT, confidence intervals, significance tests, ANOVA, MLE | A/B testing, evaluating model improvements, significance vs noise, parameter estimation (MLE) |
| 5. Prediction & Forecasting | Correlation, regression, time series (AR/MA/ARIMA/SARIMA etc.) | Linear regression, forecasting, sequential data modelling, baseline predictive modelling |
| 6. GMM & EM | Mixtures of Gaussians; iterative estimation with EM | Unsupervised learning (soft clustering), density estimation, latent-variable models |
flowchart TD
A["Statistical Methods<br/>AIML ZC418"] --> B["1. Basic Probability and Statistics"]
A --> C["2. Conditional Probability and Bayes"]
A --> D["3. Probability Distributions"]
A --> E["4. Hypothesis Testing"]
A --> F["5. Prediction and Forecasting"]
A --> G["6. Gaussian Mixture Model and EM"]
B --> B1["Central Tendency<br/>Mean - Median - Mode"]
B --> B2["Variability<br/>Range - Variance - SD - Quartiles"]
B --> B3["Basic Probability Concepts"]
B3 --> B31["Axioms of Probability"]
B3 --> B32["Definition of Probability"]
B3 --> B33["Mutually Exclusive vs Independent"]
C --> C1["Conditional Probability"]
C --> C2["Independence (conditional)"]
C --> C3["Bayes Theorem"]
C --> C4["Naive Bayes (intro)"]
D --> D1["Random Variables<br/>Discrete and Continuous"]
D --> D2["Expectation - Variance - Covariance"]
D --> D3["Transformations of RVs"]
D --> D4["Key Distributions"]
D4 --> D41["Bernoulli"]
D4 --> D42["Binomial"]
D4 --> D43["Poisson"]
D4 --> D44["Normal (Gaussian)"]
D4 --> D45["t - Chi-square - F (intro)"]
E --> E1["Sampling<br/>Random and Stratified"]
E --> E2["Sampling Distributions<br/>CLT"]
E --> E3["Estimation<br/>Confidence Intervals"]
E --> E4["Hypothesis Tests<br/>Means and Proportions"]
E --> E5["ANOVA<br/>Single and Dual factor"]
E --> E6["Maximum Likelihood"]
F --> F1["Correlation"]
F --> F2["Regression"]
F --> F3["Time Series Basics<br/>Components"]
F --> F4["Moving Averages<br/>Simple and Weighted"]
F --> F5["Time Series Models"]
F5 --> F51["AR"]
F5 --> F52["ARMA / ARIMA"]
F5 --> F53["SARIMA / SARIMAX"]
F5 --> F54["VAR / VARMAX"]
F --> F6["Exponential Smoothing"]
G --> G1["GMM<br/>Mixture of Gaussians"]
G --> G2["EM Algorithm<br/>E-step - M-step"]
B -.-> C
C -.-> D
D -.-> E
E -.-> F
F -.-> G
Data - Types
#
flowchart TD
A[(Data)] --> B["Categorical (Qualitative)"]
A --> C["Numerical (Quantitative)"]
B --> B1[Nominal]
B --> B2[Ordinal]
C --> C1[Discrete]
C --> C2[Continuous]
C2 --> C21[Interval]
C2 --> C22[Ratio]
%% Styling
style A fill:#E1F5FE,stroke:#333
style B fill:#90CAF9,stroke:#333
style B1 fill:#90CAF9,stroke:#333
style B2 fill:#90CAF9,stroke:#333
style C fill:#FFF9C4,stroke:#333
style C1 fill:#FFF9C4,stroke:#333
style C2 fill:#FFF9C4,stroke:#333
style C21 fill:#FFF9C4,stroke:#333
style C22 fill:#FFF9C4,stroke:#333
Categorical (Qualitative)
#
Categorical data expresses a qualitative attribute, e.g. hair color or eye color.
Gaussian Mixture Model & Expectation Maximization
#
A Gaussian Mixture Model represents data as a weighted combination of multiple Gaussian distributions.
It is commonly used for soft clustering and density estimation.
Key takeaway:
K-means gives hard cluster membership.
GMM gives probabilities of belonging to each cluster.
- Gaussian Mixture Model
- soft clustering
- mixing coefficients
- latent variables
- likelihood and log-likelihood
- Expectation-Maximization algorithm
- E-step and M-step
- responsibilities
- convergence
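The contrast between hard and soft membership can be sketched for one-dimensional data. The mixture parameters below (weights, means, standard deviations) are made-up values for illustration, and the function names are assumptions; the responsibility formula is the standard one, each component's weighted density divided by the total.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian at x."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))


def responsibilities(x, weights, means, stds):
    """Soft membership: probability that each component generated x."""
    weighted = [
        w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)
    ]
    total = sum(weighted)
    return [value / total for value in weighted]


def hard_assignment(x, means):
    """K-means style membership: index of the nearest centre."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))


# Hypothetical two-component mixture.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

print(hard_assignment(1.0, means))                 # a single cluster index
print(responsibilities(1.0, weights, means, stds)) # one probability per cluster
```

Computing these responsibilities for every data point is what the E-step does; the M-step then re-estimates the weights, means, and variances from them.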
Motivation ☆
#
Many real datasets are not described well by one Gaussian distribution.
Instance-based Learning
#
Instance-based learning is a family of methods that do not build one explicit global model during training. Instead, they store training examples and delay most of the work until a new query arrives.
When a new point must be classified or predicted, the algorithm compares it with previously seen examples, finds the most relevant neighbours, and uses them to produce the answer.
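A minimal k-nearest-neighbours classifier shows this store-then-compare behaviour. The sketch assumes Euclidean distance and majority voting; the function name, the value of `k`, and the toy examples are illustrative choices.

```python
from collections import Counter
from math import dist


def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # "Training" is just keeping `examples`; all work happens at query time.
    neighbours = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical stored examples: (features, label).
examples = [
    ((1.0, 1.0), "red"),
    ((1.2, 0.8), "red"),
    ((0.9, 1.1), "red"),
    ((5.0, 5.0), "blue"),
    ((5.2, 4.9), "blue"),
]

print(knn_classify((1.1, 1.0), examples))
print(knn_classify((5.1, 5.0), examples))
```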
Instance-based Learning covers three linked ideas:
April 19, 2026
Deep CNN Architectures
#
Once the basic ideas of convolution, pooling, channels, and classifier heads are understood, the next step is to study how successful CNN architectures are designed in practice. The history of deep CNNs is not just a list of famous models. It is a progression of design ideas: smaller filters, more depth, better optimisation, bottlenecks, multi-scale processing, residual connections, and transfer learning.
Key takeaway:
Deep CNN architectures evolved by solving specific problems one by one: LeNet established the template, AlexNet proved deep learning could dominate large-scale vision, VGG simplified the design, NiN introduced powerful 1 × 1 ideas, GoogLeNet made multi-scale processing efficient, and ResNet solved the optimisation problem of very deep networks.