Decision Tree

A decision tree classifies an example by asking a sequence of questions about its attributes until it reaches a leaf (final decision).

Key takeaway: A decision tree grows by repeatedly splitting the training data into purer subsets using an impurity measure (Entropy / Gini / Classification Error).

  • Information Theory
  • Entropy Based Decision Tree Construction
  • Avoiding Overfitting
  • Minimum Description Length
  • Handling continuous-valued attributes and missing attribute values

Information Theory

Decision trees need a way to measure: “How mixed are the class labels at a node?”
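One way to make "how mixed" concrete is to compute the three impurity measures named above from the class counts at a node. The Python sketch below does that and adds information gain (parent entropy minus the size-weighted entropy of the children), which entropy-based tree construction uses to rank candidate splits. The function names and the 9-yes/5-no toy node are illustrative assumptions, not taken from the source.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy H = -sum_i p_i * log2(p_i) over the class proportions p_i at a node."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index G = 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def classification_error(labels):
    """Classification (misclassification) error E = 1 - max_i p_i."""
    return 1.0 - max(Counter(labels).values()) / len(labels)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# Hypothetical node with 9 "yes" and 5 "no" examples.
node = ["yes"] * 9 + ["no"] * 5
print(round(entropy(node), 3))               # 0.94
print(round(gini(node), 3))                  # 0.459
print(round(classification_error(node), 3))  # 0.357

# Hypothetical three-way split of the same node into purer subsets.
split = [["yes"] * 2 + ["no"] * 3, ["yes"] * 4, ["yes"] * 3 + ["no"] * 2]
print(round(information_gain(node, split), 3))
```

All three measures are 0 for a pure node and largest when the classes are evenly mixed. The tree-growing loop simply picks, at each node, the split with the largest impurity reduction.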

Prediction & Forecasting

Prediction and forecasting use statistical models to estimate unknown or future values.

In this module, the focus is on correlation, regression, and time series forecasting.

Key takeaway:
Prediction estimates a value using a model.

Forecasting is prediction where the order of time matters.

  • Correlation
  • Regression
  • Time series analysis
  • Components of time series data
  • Moving average and weighted moving average
  • AR model
  • ARMA model
  • ARIMA model
  • SARIMA and SARIMAX
  • VAR and VARMAX
  • Simple exponential smoothing

Prediction vs Forecasting ☆

| Concept | Meaning | Example |
|---|---|---|
| Prediction | Estimate an unknown output | Predict house price from area and rooms |
| Forecasting | Predict future values using time order | Forecast sales for next month |

All forecasting is prediction, but not all prediction is forecasting.

Overall Workflow

flowchart LR
    A[Data] --> B[Explore Pattern]
    B --> C[Choose Model]
    C --> D[Train or Fit]
    D --> E[Validate]
    E --> F[Predict or Forecast]
    F --> G[Interpret Error]

    style A fill:#E1F5FE
    style B fill:#C8E6C9
    style C fill:#FFF9C4
    style D fill:#EDE7F6
    style E fill:#C8E6C9
    style F fill:#E1F5FE
    style G fill:#FFF9C4
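The workflow above can be traced end to end in a few lines. The sketch below makes two assumptions that are not from the source: the data are a hypothetical monthly sales series, and the "model" is a least-squares linear trend, chosen only for brevity. Validation keeps the time order (train on earlier months, test on later ones) and reports the mean absolute error (MAE).

```python
def fit_linear_trend(y):
    """Least-squares fit of y_t = a + b*t for t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b

def forecast(a, b, start, steps):
    """Extrapolate the fitted trend for time indices start .. start + steps - 1."""
    return [a + b * t for t in range(start, start + steps)]

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)

# Data: hypothetical monthly sales figures (illustrative only).
sales = [100, 104, 109, 112, 118, 121, 125, 131, 134, 140]

# Train/fit and validate: keep time order, hold out the last 2 months.
train, test = sales[:8], sales[8:]
a, b = fit_linear_trend(train)
pred = forecast(a, b, start=len(train), steps=len(test))
print("validation MAE:", round(mae(test, pred), 2))

# Forecast: refit on all data, then predict the next month.
a, b = fit_linear_trend(sales)
print("next month:", round(forecast(a, b, start=len(sales), steps=1)[0], 1))
```

A random train/test split would leak future values into training, which is why the hold-out here is always the most recent stretch of the series.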

Correlation ☆

Correlation measures the direction and strength of the linear relationship between two variables.

Statistics

Statistical methods help you turn raw data into reliable conclusions, while understanding uncertainty, variability, and confidence.

Statistics provides the language and tools for reasoning about data, uncertainty, and inference.

Machine learning relies on statistics to understand how data behaves, to draw conclusions from it, and to validate models.

  • Collect Data
  • Present & Organise Data (in a systematic manner)
  • Analyse Data
  • Infer about the Data
  • Take Decision from the Data


| Statistics Topic | What you learn (plain English) | ML Connection |
|---|---|---|
| 1. Basic Probability & Statistics | Summarise data; understand spread; basic probability rules | Data understanding (EDA), feature sanity checks, detecting outliers, interpreting “average behaviour” |
| 2. Conditional Probability & Bayes | Update probability using new information; Bayes’ rule | Naïve Bayes, Bayesian thinking, posterior probabilities, probabilistic classification |
| 3. Probability Distributions | Model randomness with distributions; expectation/variance/covariance | Likelihood models, noise assumptions (Gaussian), sampling, probabilistic modelling foundations |
| 4. Hypothesis Testing | Sampling, CLT, confidence intervals, significance tests, ANOVA, MLE | A/B testing, evaluating model improvements, significance vs noise, parameter estimation (MLE) |
| 5. Prediction & Forecasting | Correlation, regression, time series (AR/MA/ARIMA/SARIMA etc.) | Linear regression, forecasting, sequential data modelling, baseline predictive modelling |
| 6. GMM & EM | Mixtures of Gaussians; iterative estimation with EM | Unsupervised learning (soft clustering), density estimation, latent-variable models |

flowchart TD
  A["Statistical Methods<br/>AIML ZC418"] --> B["1. Basic Probability and Statistics"]
  A --> C["2. Conditional Probability and Bayes"]
  A --> D["3. Probability Distributions"]
  A --> E["4. Hypothesis Testing"]
  A --> F["5. Prediction and Forecasting"]
  A --> G["6. Gaussian Mixture Model and EM"]

  B --> B1["Central Tendency<br/>Mean - Median - Mode"]
  B --> B2["Variability<br/>Range - Variance - SD - Quartiles"]
  B --> B3["Basic Probability Concepts"]
  B3 --> B31["Axioms of Probability"]
  B3 --> B32["Definition of Probability"]
  B3 --> B33["Mutually Exclusive vs Independent"]

  C --> C1["Conditional Probability"]
  C --> C2["Independence (conditional)"]
  C --> C3["Bayes Theorem"]
  C --> C4["Naive Bayes (intro)"]

  D --> D1["Random Variables<br/>Discrete and Continuous"]
  D --> D2["Expectation - Variance - Covariance"]
  D --> D3["Transformations of RVs"]
  D --> D4["Key Distributions"]
  D4 --> D41["Bernoulli"]
  D4 --> D42["Binomial"]
  D4 --> D43["Poisson"]
  D4 --> D44["Normal (Gaussian)"]
  D4 --> D45["t - Chi-square - F (intro)"]

  E --> E1["Sampling<br/>Random and Stratified"]
  E --> E2["Sampling Distributions<br/>CLT"]
  E --> E3["Estimation<br/>Confidence Intervals"]
  E --> E4["Hypothesis Tests<br/>Means and Proportions"]
  E --> E5["ANOVA<br/>Single and Dual factor"]
  E --> E6["Maximum Likelihood"]

  F --> F1["Correlation"]
  F --> F2["Regression"]
  F --> F3["Time Series Basics<br/>Components"]
  F --> F4["Moving Averages<br/>Simple and Weighted"]
  F --> F5["Time Series Models"]
  F5 --> F51["AR"]
  F5 --> F52["ARMA / ARIMA"]
  F5 --> F53["SARIMA / SARIMAX"]
  F5 --> F54["VAR / VARMAX"]
  F --> F6["Exponential Smoothing"]

  G --> G1["GMM<br/>Mixture of Gaussians"]
  G --> G2["EM Algorithm<br/>E-step - M-step"]

  B -.-> C
  C -.-> D
  D -.-> E
  E -.-> F
  F -.-> G
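The central-tendency and variability measures in the first branch of the map can be computed with Python's standard `statistics` module. The sample below is hypothetical. Note that quartile conventions differ between tools, and `statistics.quantiles` defaults to its "exclusive" method.

```python
import statistics as st

# Hypothetical sample of 10 observations.
data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 2]

print("mean:     ", st.mean(data))
print("median:   ", st.median(data))
print("mode:     ", st.mode(data))
print("range:    ", max(data) - min(data))
print("variance: ", st.variance(data))       # sample variance, divides by n - 1
print("std dev:  ", st.stdev(data))          # square root of the sample variance
print("quartiles:", st.quantiles(data, n=4))  # [Q1, Q2, Q3] using the default "exclusive" method
```

Use `st.pvariance` / `st.pstdev` instead when the data are the whole population rather than a sample.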

Data Types

flowchart TD
	A[(Data)] --> B["Categorical (Qualitative)"]
    A --> C["Numerical (Quantitative)"]

    B --> B1[Nominal]
    B --> B2[Ordinal]

    C --> C1[Discrete]
    C --> C2[Continuous]

    C2 --> C21[Interval]
    C2 --> C22[Ratio]

    %% Styling
    style A fill:#E1F5FE,stroke:#333
    style B fill:#90CAF9,stroke:#333
    style B1 fill:#90CAF9,stroke:#333
    style B2 fill:#90CAF9,stroke:#333
    style C fill:#FFF9C4,stroke:#333
    style C1 fill:#FFF9C4,stroke:#333
    style C2 fill:#FFF9C4,stroke:#333
    style C21 fill:#FFF9C4,stroke:#333
    style C22 fill:#FFF9C4,stroke:#333

1. Categorical (Qualitative)

    Categorical data expresses a qualitative attribute, e.g. hair color or eye color.

Gaussian Mixture Model & Expectation Maximization

A Gaussian Mixture Model represents data as a weighted combination of multiple Gaussian distributions.

It is commonly used for soft clustering and density estimation.
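As a sketch of "weighted combination", a 1-D mixture density is p(x) = Σ_k π_k N(x | μ_k, σ_k²), where the mixing coefficients π_k sum to 1. The log-likelihood of a data set is Σ_i log p(x_i). The code below assumes one dimension and hypothetical parameter values.

```python
from math import exp, log, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def gmm_pdf(x, weights, means, sigmas):
    """Mixture density p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2); weights should sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def log_likelihood(data, weights, means, sigmas):
    """Log-likelihood of a data set: sum_i log p(x_i)."""
    return sum(log(gmm_pdf(x, weights, means, sigmas)) for x in data)

# Hypothetical two-component mixture: 30% of the mass around 0, 70% around 5.
weights, means, sigmas = [0.3, 0.7], [0.0, 5.0], [1.0, 1.5]
for x in [0.0, 2.5, 5.0]:
    print(x, round(gmm_pdf(x, weights, means, sigmas), 4))
print("log-likelihood:", log_likelihood([0.2, 4.8, 5.5], weights, means, sigmas))
```

Because each component integrates to 1 and the weights sum to 1, the mixture is itself a valid density. This is what makes GMMs usable for density estimation.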

Key takeaway:
K-means gives hard cluster membership.

GMM gives probabilities of belonging to each cluster.
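To contrast the two, the sketch below assigns the same points both ways. It uses a hard nearest-mean rule in the style of k-means, and soft GMM responsibilities P(k | x) = π_k N(x | μ_k, σ_k²) / Σ_j π_j N(x | μ_j, σ_j²). The two-component, 1-D parameters are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def responsibilities(x, weights, means, sigmas):
    """Soft membership: posterior probability of each component given x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    total = sum(joint)
    return [j / total for j in joint]

def hard_assignment(x, means):
    """K-means style membership: index of the nearest mean, with no notion of uncertainty."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))

# Hypothetical mixture with two equally weighted components at 0 and 4.
weights, means, sigmas = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
for x in [0.5, 2.0, 3.5]:
    soft = [round(r, 3) for r in responsibilities(x, weights, means, sigmas)]
    print(x, "hard:", hard_assignment(x, means), "soft:", soft)
```

At x = 2.0, halfway between the means, the hard rule must still pick one cluster (here the first, by tie-breaking). The soft responsibilities report the honest answer: 0.5 for each.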

  • Gaussian Mixture Model
  • soft clustering
  • mixing coefficients
  • latent variables
  • likelihood and log-likelihood
  • Expectation-Maximization algorithm
  • E-step and M-step
  • responsibilities
  • convergence
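The EM loop ties these terms together. The E-step computes responsibilities under the current parameters. The M-step re-estimates mixing coefficients, means, and variances from those soft counts. The loop stops when the log-likelihood stops improving. The sketch below is 1-D only, and its initialisation (means at evenly spaced quantiles, shared overall variance, equal weights) and the small variance floor are assumptions made for simplicity.

```python
import random
from math import exp, log, pi, sqrt

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return exp(-0.5 * (x - mu) ** 2 / var) / sqrt(2 * pi * var)

def em_gmm_1d(data, k=2, max_iters=200, tol=1e-6):
    """Fit a 1-D Gaussian mixture with EM. Returns (weights, means, variances,
    log_likelihood), where log_likelihood is the last value computed in an E-step."""
    n = len(data)
    ordered = sorted(data)
    # Initialisation (an assumption): means at evenly spaced quantiles,
    # shared overall variance, equal mixing coefficients.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(data) / n
    variances = [sum((x - mu_all) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    prev_ll = -float("inf")
    for _ in range(max_iters):
        # E-step: responsibilities r_ij = P(component j | x_i) under the current parameters.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += log(total)
            resp.append([p / total for p in joint])
        # Convergence: EM does not decrease the log-likelihood, so stop once it stops improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: re-estimate each component from its soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances, ll

# Hypothetical data: 200 draws around 0 and 200 draws around 6.
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
weights, means, variances, ll = em_gmm_1d(data, k=2)
print([round(w, 2) for w in weights], [round(m, 2) for m in means], round(ll, 1))
```

EM converges to a local maximum of the likelihood, so in practice it is common to try several initialisations and keep the fit with the highest log-likelihood.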

Motivation ☆

Many real datasets are not described well by one Gaussian distribution.

Instance-based Learning

Instance-based learning is a family of methods that do not build one explicit global model during training. Instead, they store training examples and delay most of the work until a new query arrives.

When a new point must be classified or predicted, the algorithm compares it with previously seen examples, finds the most relevant neighbours, and uses them to produce the answer.
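The best-known instance of this idea is k-nearest neighbours. The sketch below assumes Euclidean distance, a majority vote, and hypothetical 2-D data. "Training" is just keeping the examples; all the work happens at query time.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples (Euclidean distance)."""
    # No model is built in advance: the stored examples are searched only when a query arrives.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training examples.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (1.8, 1.5)))   # A
print(knn_predict(X, y, (6.2, 6.4)))   # B
```

Sorting every stored example per query is O(n log n). Real implementations often use index structures such as k-d trees to find neighbours faster.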

Instance-based Learning covers three linked ideas:

Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised machine learning algorithm used for:

  • Classification (most common)
  • Regression (SVR – Support Vector Regression)

It connects many earlier ideas:

  • classification and decision boundaries
  • linear classifiers
  • margins
  • optimisation
  • constrained optimisation
  • kernels for non-linear data

SVM is a discriminative classifier.

That means it does not try to model how each class is generated.

Instead, it tries to find the best separating boundary between classes.
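A minimal sketch of finding that boundary for a linear SVM. It minimises the soft-margin objective ½‖w‖² + C Σ_i max(0, 1 − y_i(w·x_i + b)) directly, using batch sub-gradient descent. This is a swapped-in training method chosen for brevity. SVM libraries typically solve the constrained (often dual) optimisation problem instead. The values of `C`, `lr`, `epochs` and the data are illustrative assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Soft-margin linear SVM in primal form, trained by batch sub-gradient descent on
        0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)).
    Labels must be +1 / -1."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 margin term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Report which side of the boundary w . x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable 2-D data.
X = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print("w =", w, "b =", b)
print([predict(w, b, x) for x in X])
```

Only points with y_i(w·x_i + b) < 1 contribute to the hinge term. Points comfortably outside the margin do not affect the boundary, which is the intuition behind "support vectors".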

Bayesian Learning

Bayesian Learning is a probabilistic approach to machine learning.

Instead of only asking, “Which output should the model predict?”, Bayesian Learning asks:

Given the data we have observed, how likely is each hypothesis, class, or parameter value?

This makes Bayesian Learning useful when uncertainty matters.

It is especially important in classification, probabilistic modelling, generative models, and situations where we want to combine prior knowledge with observed data.
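Combining prior knowledge with data is Bayes' rule: P(h | D) = P(D | h) P(h) / Σ_h' P(D | h') P(h'). The sketch below applies it to a hypothetical question: is a coin fair (P(heads) = 0.5) or biased (P(heads) = 0.8), given 8 heads in 10 tosses and a prior that strongly favours "fair"? The hypotheses, prior, and counts are all illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a discrete set of hypotheses:
    P(h | D) = P(D | h) P(h) / sum_h' P(D | h') P(h')."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    evidence = sum(joint.values())  # P(D), the normalising constant
    return {h: p / evidence for h, p in joint.items()}

priors = {"fair": 0.9, "biased": 0.1}  # belief before seeing any data
heads, tails = 8, 2                    # observed data D
# The binomial coefficient is identical for both hypotheses, so it cancels and is omitted.
likelihoods = {
    "fair": 0.5 ** heads * 0.5 ** tails,
    "biased": 0.8 ** heads * 0.2 ** tails,
}
post = posterior(priors, likelihoods)
map_h = max(post, key=post.get)        # maximum a posteriori (MAP) hypothesis
print(post, map_h)
```

The data favour "biased" (its likelihood is higher), but the strong prior keeps "fair" slightly ahead, at about 0.57. With a uniform prior the conclusion flips, which is exactly the prior-versus-evidence trade-off Bayesian learning makes explicit.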

Ensemble Learning

Ensemble Learning is a machine learning approach where we combine multiple models to produce a stronger final prediction.

Instead of depending on one model, an ensemble uses a group of models and combines their outputs.

The main idea is simple:

Many weak or moderately good models can work together to produce a better and more stable model.

Key takeaway:
Ensemble Learning improves prediction by combining several models.
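One common way to combine outputs is a majority (plurality) vote. Other combining rules, such as averaging or weighted voting, also exist. The sketch below shows the vote and computes how often a majority of m models is right. It assumes that each model is correct with probability p and that their errors are independent. That is an idealisation: real models make correlated mistakes, so actual gains are smaller.

```python
from collections import Counter
from math import comb

def majority_vote(predictions):
    """Combine one predicted label per model into a single label by plurality vote."""
    return Counter(predictions).most_common(1)[0][0]

def majority_accuracy(p, m):
    """Probability that a majority of m models (m odd) is correct, assuming each model is
    correct with probability p and errors are independent (a binomial tail sum)."""
    need = m // 2 + 1
    return sum(comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(need, m + 1))

print(majority_vote(["spam", "ham", "spam"]))   # spam
for m in (1, 3, 5, 11):
    print(m, round(majority_accuracy(0.7, m), 3))
```

Under these assumptions, three 70%-accurate models voting together are right 78.4% of the time (0.7³ + 3 · 0.7² · 0.3), and accuracy keeps rising as m grows.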

Unsupervised Learning

Unsupervised Learning is used when we have input data but no target labels.

The model is not told the correct answer. Instead, it tries to discover hidden structure in the data.

  • K-means Clustering and variants
  • Review of EM algorithm
  • GMM based Soft Clustering
  • Applications

Supervised vs Unsupervised Learning

| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data contains target label? | Yes | No |
| Learns from | Input-output pairs | Input features only |
| Main goal | Predict output | Discover structure |
| Example task | Classification, regression | Clustering |
| Example algorithm | Logistic regression, decision tree | K-means, GMM |

  • Works on unlabelled raw data.
  • The algorithm discovers hidden patterns without prior knowledge of outcomes.
  • Requires no human labelling of the training data.
  • Does not predict a target label; instead, it groups or organises the data.
  • Results are harder to validate because there is no ground truth to check them against.
  • Common techniques include clustering, association rule mining, and dimensionality reduction.

The most common example is clustering, where similar records are grouped together.
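A minimal sketch of k-means clustering using Lloyd's algorithm, the usual way k-means is computed. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until the centroids stop moving. Initialising from k randomly chosen data points, the `seed` and `iters` parameters, and the sample points are illustrative assumptions.

```python
import random
from math import dist

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: hard membership, the nearest centroid wins.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Hypothetical unlabelled 2-D points forming two groups.
pts = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)
```

Note that no labels are given: the grouping comes entirely from distances between points. The result can depend on the initial centroids, so k-means is often run several times with different seeds.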