# Common Probability Distributions
February 22, 2026
Once you can describe a random variable using a pmf or pdf, the next step is to use
named distributions that appear repeatedly in real data and in ML models.
Key takeaway:
Named distributions give you ready-made probability models for common patterns:
binary outcomes, counts, and measurement noise.
```mermaid
flowchart TD
  PD["Probability<br/>distributions"] --> DS["Common<br/>distributions"]
  DS --> DIS["Discrete"]
  DS --> CON["Continuous"]
  DIS --> D1["Bernoulli"]
  DIS --> D2["Binomial"]
  DIS --> D3["Poisson"]
  CON --> D4["Normal<br/>(Gaussian)"]
  CON --> D5["t / Chi-square / F<br/>(intro)"]
  style PD fill:#90CAF9,stroke:#1E88E5,color:#000
  style DS fill:#90CAF9,stroke:#1E88E5,color:#000
  style DIS fill:#CE93D8,stroke:#8E24AA,color:#000
  style CON fill:#CE93D8,stroke:#8E24AA,color:#000
  style D1 fill:#C8E6C9,stroke:#2E7D32,color:#000
  style D2 fill:#C8E6C9,stroke:#2E7D32,color:#000
  style D3 fill:#C8E6C9,stroke:#2E7D32,color:#000
  style D4 fill:#C8E6C9,stroke:#2E7D32,color:#000
  style D5 fill:#C8E6C9,stroke:#2E7D32,color:#000
```
## 1) Bernoulli distribution (binary)
Use when:
one trial has two outcomes (success/failure).
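The notes stop at the definition; as a minimal sketch (the function names and the choice of p = 0.3 are illustrative, not from the notes), here is the Bernoulli pmf and a quick simulation showing that the sample proportion of successes approaches p:

```python
import random

def bernoulli_pmf(k, p):
    """P(X = k) for X ~ Bernoulli(p): P(X=1) = p, P(X=0) = 1 - p."""
    return p if k == 1 else (1 - p if k == 0 else 0.0)

def bernoulli_sample(p, n, seed=0):
    """Draw n independent Bernoulli(p) trials (1 = success, 0 = failure)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

p = 0.3
print(bernoulli_pmf(1, p))          # 0.3
print(bernoulli_pmf(0, p))          # 0.7
draws = bernoulli_sample(p, 10_000)
print(sum(draws) / len(draws))      # sample proportion, close to p
```

The mean of a Bernoulli(p) variable is p, which is why the long-run fraction of successes settles near p.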
# Hypothesis Testing
March 12, 2026
Hypothesis testing is a structured way to decide:
Is what we see in a sample just random variation,
or is there evidence of a real effect in the population?
The hypothesis-testing topic sits inside inferential statistics:
we use a sample to make a statement about a population.
- Sampling (random and stratified)
- Sampling distribution and Central Limit Theorem
- Estimation (confidence intervals and confidence level)
- Testing hypotheses (mean, proportion, ANOVA)
- Maximum likelihood (MLE)
Key takeaway:
The logic is always the same: assume there is no real effect (the null hypothesis), measure how surprising the sample is under that assumption, and reject the null only if the observed result would be very unlikely to arise from random variation alone.
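That logic can be sketched with a one-sample z-test for a mean (the simplest case in the list above, assuming a known population standard deviation; the numbers below are a made-up example, not from the notes):

```python
import math

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Two-sided z-test for a mean with known population sd.
    Returns (z statistic, p-value)."""
    se = pop_sd / math.sqrt(n)            # standard error of the mean
    z = (sample_mean - pop_mean) / se
    # two-sided p-value from the standard normal cdf (via erf)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical example: null says the mean is 100;
# a sample of 36 observations has mean 103, population sd is 9.
z, p = one_sample_z_test(103, 100, 9, 36)
print(round(z, 2), round(p, 4))   # z = 2.0, p ≈ 0.0455
```

With p ≈ 0.045 < 0.05, a test at the conventional 5% significance level would reject the null: the sample is more extreme than random variation alone would plausibly produce.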
# Mathematical Foundations for Machine Learning
March 18, 2026
Machine Learning is built on mathematical principles that allow models to:
- represent data
- learn patterns
- optimise performance
```mermaid
flowchart LR
  DATA[Data]
  MATH[Math Models]
  OPT[Optimisation]
  MODEL[Trained Model]
  DATA --> MATH
  MATH --> OPT
  OPT --> MODEL
```
Understanding how ML algorithms work internally requires these core mathematical tools: algebra describes relationships between variables and quantities, while calculus deals with change and optimisation.
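The Optimisation step in the pipeline above is where calculus enters: models are trained by following the gradient downhill. A minimal sketch (the quadratic objective and learning rate are illustrative choices, not from the notes):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimise a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient f'(x) = 2(x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges towards 3
```

The same loop, with the gradient computed over model parameters instead of a single number, is the core of how most ML models are fitted.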
# Prediction & Forecasting
## Correlation
## Regression
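The two sections above are only headings here; as a small self-contained sketch of both ideas (the toy data is made up for illustration), here is the Pearson correlation coefficient and an ordinary least-squares line fit:

```python
def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]   # roughly y = 2x
print(pearson_r(xs, ys))           # close to 1: strong linear relationship
a, b = least_squares_line(xs, ys)
print(a, b)                        # intercept near 0, slope near 2
```

Correlation measures the strength of the linear relationship; regression estimates the line itself, which is what a prediction needs.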
## Time Series Analysis
### Introduction, Components of time series data
### MA model – basic and weighted MA model
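The heading covers both variants; a minimal sketch of each (the sample series and the 1-2-3 weights are illustrative choices, not from the notes):

```python
def simple_moving_average(series, window):
    """Unweighted mean of the last `window` observations."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def weighted_moving_average(series, weights):
    """Weighted mean of recent observations.
    Weights run oldest-to-newest and are normalised to sum to 1."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    k = len(w)
    return [sum(wi * xi for wi, xi in zip(w, series[i - k + 1:i + 1]))
            for i in range(k - 1, len(series))]

data = [10, 12, 13, 12, 15, 16, 18]
print(simple_moving_average(data, 3))
print(weighted_moving_average(data, [1, 2, 3]))  # recent values weigh more
```

The weighted version reacts faster to recent changes, which is why it is often preferred for short-horizon forecasting.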
### Time series models
- AR Model
- ARIMA Model
- SARIMA, SARIMAX, VAR, VARMAX
- Simple exponential smoothing model
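The simplest of the models listed is the AR model; as a sketch of the idea (not the library implementation the course may use), here is an AR(1) process simulated and its coefficient recovered by least squares on lagged pairs:

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, sigma))
    return x

def estimate_ar1(x):
    """Least-squares estimate of phi from lagged pairs (x_{t-1}, x_t)."""
    num = sum(a * b for a, b in zip(x[:-1], x[1:]))
    den = sum(a * a for a in x[:-1])
    return num / den

series = simulate_ar1(phi=0.7, n=5000)
print(estimate_ar1(series))  # close to the true phi = 0.7
```

ARIMA and the seasonal variants extend this idea with differencing, moving-average terms, and seasonal lags, but the fit-a-coefficient-to-lagged-values core is the same.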
# Statistics
March 12, 2026
Statistical methods help you turn raw data into reliable conclusions, while understanding uncertainty, variability, and confidence.
Statistics provides the language and tools for reasoning about data, uncertainty, and inference.
ML relies on statistics for understanding data behaviour, drawing conclusions, and validating machine learning models.
- Collect Data
- Present & Organise Data (in a systematic manner)
- Analyse Data
- Infer about the Data
- Take Decision from the Data
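The "Present & Organise" and "Analyse" steps start with summary statistics; Python's standard library covers the basics (the data values below are made up for illustration):

```python
import statistics

data = [12, 15, 12, 18, 22, 15, 12, 30]

# Central tendency
print(statistics.mean(data))     # 17
print(statistics.median(data))   # 15.0
print(statistics.mode(data))     # 12

# Variability
print(statistics.pstdev(data))   # population standard deviation
```

These are exactly the Central Tendency and Variability items under topic 1 in the roadmap below.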
| Statistics Topic | What you learn (plain English) | ML Connection |
|---|---|---|
| 1. Basic Probability & Statistics | Summarise data; understand spread; basic probability rules | Data understanding (EDA), feature sanity checks, detecting outliers, interpreting “average behaviour” |
| 2. Conditional Probability & Bayes | Update probability using new information; Bayes’ rule | Naïve Bayes, Bayesian thinking, posterior probabilities, probabilistic classification |
| 3. Probability Distributions | Model randomness with distributions; expectation/variance/covariance | Likelihood models, noise assumptions (Gaussian), sampling, probabilistic modelling foundations |
| 4. Hypothesis Testing | Sampling, CLT, confidence intervals, significance tests, ANOVA, MLE | A/B testing, evaluating model improvements, significance vs noise, parameter estimation (MLE) |
| 5. Prediction & Forecasting | Correlation, regression, time series (AR/MA/ARIMA/SARIMA etc.) | Linear regression, forecasting, sequential data modelling, baseline predictive modelling |
| 6. GMM & EM | Mixtures of Gaussians; iterative estimation with EM | Unsupervised learning (soft clustering), density estimation, latent-variable models |
```mermaid
flowchart TD
  A["Statistical Methods<br/>AIML ZC418"] --> B["1. Basic Probability and Statistics"]
  A --> C["2. Conditional Probability and Bayes"]
  A --> D["3. Probability Distributions"]
  A --> E["4. Hypothesis Testing"]
  A --> F["5. Prediction and Forecasting"]
  A --> G["6. Gaussian Mixture Model and EM"]
  B --> B1["Central Tendency<br/>Mean - Median - Mode"]
  B --> B2["Variability<br/>Range - Variance - SD - Quartiles"]
  B --> B3["Basic Probability Concepts"]
  B3 --> B31["Axioms of Probability"]
  B3 --> B32["Definition of Probability"]
  B3 --> B33["Mutually Exclusive vs Independent"]
  C --> C1["Conditional Probability"]
  C --> C2["Independence (conditional)"]
  C --> C3["Bayes Theorem"]
  C --> C4["Naive Bayes (intro)"]
  D --> D1["Random Variables<br/>Discrete and Continuous"]
  D --> D2["Expectation - Variance - Covariance"]
  D --> D3["Transformations of RVs"]
  D --> D4["Key Distributions"]
  D4 --> D41["Bernoulli"]
  D4 --> D42["Binomial"]
  D4 --> D43["Poisson"]
  D4 --> D44["Normal (Gaussian)"]
  D4 --> D45["t - Chi-square - F (intro)"]
  E --> E1["Sampling<br/>Random and Stratified"]
  E --> E2["Sampling Distributions<br/>CLT"]
  E --> E3["Estimation<br/>Confidence Intervals"]
  E --> E4["Hypothesis Tests<br/>Means and Proportions"]
  E --> E5["ANOVA<br/>Single and Dual factor"]
  E --> E6["Maximum Likelihood"]
  F --> F1["Correlation"]
  F --> F2["Regression"]
  F --> F3["Time Series Basics<br/>Components"]
  F --> F4["Moving Averages<br/>Simple and Weighted"]
  F --> F5["Time Series Models"]
  F5 --> F51["AR"]
  F5 --> F52["ARMA / ARIMA"]
  F5 --> F53["SARIMA / SARIMAX"]
  F5 --> F54["VAR / VARMAX"]
  F --> F6["Exponential Smoothing"]
  G --> G1["GMM<br/>Mixture of Gaussians"]
  G --> G2["EM Algorithm<br/>E-step - M-step"]
  B -.-> C
  C -.-> D
  D -.-> E
  E -.-> F
  F -.-> G
```
## Data Types
```mermaid
flowchart TD
  A[(Data)] --> B["Categorical (Qualitative)"]
  A --> C["Numerical (Quantitative)"]
  B --> B1[Nominal]
  B --> B2[Ordinal]
  C --> C1[Discrete]
  C --> C2[Continuous]
  C2 --> C21[Interval]
  C2 --> C22[Ratio]
  %% Styling
  style A fill:#E1F5FE,stroke:#333
  style B fill:#90CAF9,stroke:#333
  style B1 fill:#90CAF9,stroke:#333
  style B2 fill:#90CAF9,stroke:#333
  style C fill:#FFF9C4,stroke:#333
  style C1 fill:#FFF9C4,stroke:#333
  style C2 fill:#FFF9C4,stroke:#333
  style C21 fill:#FFF9C4,stroke:#333
  style C22 fill:#FFF9C4,stroke:#333
```
### Categorical (Qualitative)
Expresses a qualitative attribute, e.g. hair color or eye color.
# Gaussian Mixture Model & Expectation Maximization
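This page contains only the heading; as a minimal, self-contained sketch of the idea (a two-component 1D mixture with made-up data, not the course's implementation), EM alternates soft assignment (E-step) with weighted re-estimation (M-step):

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_gmm_1d(data, iters=50):
    """EM for a two-component 1D Gaussian mixture.
    Returns mixing weights, means, and standard deviations."""
    pi, mu, sd = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            w = [pi[k] * normal_pdf(x, mu[k], sd[k]) for k in (0, 1)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: weighted maximum-likelihood updates
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    return pi, mu, sd

# Hypothetical data: two well-separated clusters around 0 and 8
rng = random.Random(0)
data = [rng.gauss(0, 1) for _ in range(300)] + [rng.gauss(8, 1) for _ in range(300)]
pi, mu, sd = em_gmm_1d(data)
print(sorted(mu))  # means recovered near 0 and 8
```

Unlike hard clustering, each point keeps a fractional membership in every component, which is what makes GMM a soft-clustering and density-estimation tool.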
# Principal Component Analysis (PCA)
- a dimensionality-reduction technique
- reduces the number of features in a dataset while keeping the most important information
- transforms correlated features into a smaller set of uncorrelated components
- uses linear algebra to construct new features called principal components
- finds these by computing eigenvectors (directions) and eigenvalues (importance) of the covariance matrix
- selects the top components with the highest eigenvalues and projects the data onto them to simplify the dataset
PCA prioritizes the directions where the data varies the most because more variation = more useful information.
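The steps above (centre, covariance, eigendecomposition, project) can be sketched directly with numpy; the 2D toy data is made up so that one direction carries almost all the variance:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix.
    Returns the projected data and the chosen directions."""
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, components

# Toy data: two strongly correlated features (y ≈ 2x plus small noise),
# so a single principal component captures almost all the variation
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])
Z, comps = pca(X, n_components=1)
print(Z.shape)  # (200, 1): two features compressed into one component
```

The recovered direction points roughly along (1, 2), the axis the data actually varies on, which is the "more variation = more useful information" idea in practice.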