
Basic Statistics #

Statistics: describes data (what you see).
Probability: models uncertainty (what you don’t know yet).

In this section you will learn to:

  • Summarise a dataset using central tendency and variability
  • Explain core probability ideas using simple examples
  • Apply the axioms of probability
  • Distinguish mutually exclusive vs independent events

flowchart TD
    A[Dataset] --> B[Central Tendency]
    A --> C[Variability]
    B --> B1[Mean]
    B --> B2[Median]
    B --> B3[Mode]
    C --> C1[Range]
    C --> C2[Variance]
    C --> C3[Standard Deviation]
    C --> C4[IQR]

Measures of Central Tendency #

Central tendency tells you where the “middle” of the data is: it summarises a set of scores with a single number that represents the typical performance of the group.
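
As a quick illustration, here is a minimal Python sketch computing the three common measures (the scores are made up for illustration):

```python
from statistics import mean, median, mode

scores = [4, 8, 6, 5, 3, 8, 9]  # hypothetical exam scores

print(mean(scores))    # arithmetic average: ~6.14
print(median(scores))  # middle value when sorted: 6
print(mode(scores))    # most frequent value: 8
```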


Basic Probability #

Probability models uncertainty: what you don’t know yet, but want to reason about.

Key takeaway: Probability is a number between 0 and 1 that measures how likely an event is. The whole topic is about defining events clearly and applying a few core rules consistently.

Probability quantifies uncertainty: a number between 0 and 1.

  • 0 means: impossible
  • 1 means: certain
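
One way to make this concrete is to estimate a probability by simulation: the relative frequency of an event over many repetitions approaches its probability. A minimal sketch (the die and trial count are illustrative):

```python
import random

# Estimate P(even) for a fair six-sided die by relative frequency.
trials = 100_000
even = sum(1 for _ in range(trials) if random.randint(1, 6) % 2 == 0)

print(even / trials)  # close to the exact probability 3/6 = 0.5
```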

Terminology #

Random experiment #

A random experiment is an action whose outcome is not known in advance, for example tossing a coin or rolling a die.


Neural Networks #

  • A network of artificial neurons inspired by how neurons function in the human brain.
  • At its core, a mathematical model designed to process and learn from data.
  • Neural networks form the foundation of Deep Learning, which involves training large, complex networks on vast amounts of data.

flowchart LR
 subgraph subGraph0["Input Layer"]
        I1(("Input 1"))
        I2(("Input 2"))
        I3(("Input 3"))
  end
 subgraph subGraph1["Hidden Layer"]
        H1(("Hidden 1"))
        H2(("Hidden 2"))
        H3(("Hidden 3"))
  end
 subgraph subGraph2["Output Layer"]
        O(("Output"))
  end
    I1 --> H1 & H2 & H3
    I2 --> H1 & H2 & H3
    I3 --> H1 & H2 & H3
    H1 --> O
    H2 --> O
    H3 --> O

    style I1 fill:#C8E6C9
    style I2 fill:#C8E6C9
    style I3 fill:#C8E6C9
    style H1 stroke:#2962FF,fill:#BBDEFB
    style H2 fill:#BBDEFB
    style H3 fill:#BBDEFB
    style O fill:#FFCDD2
    style subGraph0 stroke:none,fill:transparent
    style subGraph1 stroke:none,fill:transparent
    style subGraph2 stroke:none,fill:transparent

Structure of a Neural Network #

A typical neural network has three main layers: an input layer, one or more hidden layers, and an output layer.
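
To make the structure concrete, here is a minimal NumPy sketch of a forward pass through the 3-3-1 network drawn above; the random weights and tanh activation are illustrative choices, not part of the original:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 0.8])   # one sample with 3 input features
W1 = rng.normal(size=(3, 3))     # input -> hidden weights
b1 = np.zeros(3)                 # hidden biases
W2 = rng.normal(size=(3, 1))     # hidden -> output weights
b2 = np.zeros(1)                 # output bias

hidden = np.tanh(x @ W1 + b1)    # hidden-layer activations
output = hidden @ W2 + b2        # single output value
print(output)
```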


Conditional Probability & Bayes’ Theorem #

Probability often changes when we learn new information.

Conditional probability and Bayes’ theorem give a structured way to update beliefs using evidence.

Conditional probability updates probabilities after observing an event.

Bayes’ theorem lets you estimate a hidden cause from observed evidence.
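
In symbols, writing \( H \) for the hidden cause (hypothesis) and \( E \) for the observed evidence:

\[ P(H\mid E)=\frac{P(E\mid H)\,P(H)}{P(E)} \]

Here \( P(H) \) is the prior, \( P(E\mid H) \) the likelihood, \( P(E) \) the evidence, and \( P(H\mid E) \) the posterior.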

Naïve Bayes turns Bayes’ theorem into a practical classifier by assuming conditional independence of features given the class.


flowchart TD

A[Conditional<br/>probability] -->|foundation| B[Bayes<br/>theorem]
D[Independent<br/>events] -->|implies| C[Independence]
C -->|simplifies| A

E[Prior] -->|with likelihood| B
F[Likelihood] -->|updates| H[Posterior]
G[Evidence] -->|normalises| B
B -->|yields| H

I[Naïve<br/>Bayes] -->|uses| B
J[Naïve<br/>assumption] -->|assumes| C
K[Features] -->|given class| J
L[Class] -->|conditions| J
I -->|predicts| M[Classification]
M -->|selects| L

style A fill:#90CAF9,stroke:#1E88E5,color:#000
style B fill:#90CAF9,stroke:#1E88E5,color:#000
style C fill:#90CAF9,stroke:#1E88E5,color:#000

style D fill:#CE93D8,stroke:#8E24AA,color:#000
style E fill:#CE93D8,stroke:#8E24AA,color:#000
style F fill:#CE93D8,stroke:#8E24AA,color:#000
style G fill:#CE93D8,stroke:#8E24AA,color:#000
style J fill:#CE93D8,stroke:#8E24AA,color:#000
style K fill:#CE93D8,stroke:#8E24AA,color:#000
style L fill:#CE93D8,stroke:#8E24AA,color:#000

style H fill:#C8E6C9,stroke:#2E7D32,color:#000
style I fill:#C8E6C9,stroke:#2E7D32,color:#000
style M fill:#C8E6C9,stroke:#2E7D32,color:#000


Quick summary #

  • Conditional probability: updates probability after an event is known.
  • Multiplication rule: computes joint probability from conditional parts.
  • Independence: tested using \( P(A\cap B)=P(A)P(B) \).
  • Total probability: breaks a probability into weighted cases.
  • Bayes’ theorem: reverses conditioning to infer causes from evidence.
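
As a quick numeric sketch of the last two points, the function below infers a cause (disease) from evidence (a positive test); the prevalence and test rates are made up for illustration:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    # Total probability: P(positive) over the two cases (disease / no disease).
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
print(posterior(0.01, 0.95, 0.05))  # ~0.161: a positive test is far from proof
```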

What’s next #

Probability Distributions
Move from events to random variables and distributions.


Machine Learning #

stateDiagram-v2

    %% ===== CLASS DEFINITIONS (Math-based colours) =====
    classDef algebra fill:#cfe8ff,stroke:#1e3a8a,stroke-width:1px
    classDef probability fill:#d1fae5,stroke:#065f46,stroke-width:1px
    classDef geometry fill:#ffedd5,stroke:#9a3412,stroke-width:1px
    classDef logic fill:#ede9fe,stroke:#5b21b6,stroke-width:1px
    classDef category font-style:italic,font-weight:bold,fill:#aaaaaa,stroke:#374151,stroke-width:3px

    %% ===== ROOT =====
    ML: Machine Learning

    %% ===== SUPERVISED =====
    ML --> SL:::category
    SL: Supervised Learning

    SL --> Regression
    Regression --> LR:::algebra
    LR: Linear Regression

    LR --> NN:::algebra
    NN: Neural Network

    NN --> DT:::logic
    DT: Decision Tree

    SL --> Classification
    Classification --> NB:::probability
    NB: Naive Bayes

    NB --> KNN:::geometry
    KNN: k-Nearest Neighbours

    KNN --> SVM:::algebra
    SVM: Support Vector Machine
    
    %% ===== UNSUPERVISED =====
    ML --> USL:::category
    USL: Unsupervised Learning

    USL --> Clustering
    Clustering --> KM:::geometry
    KM: K-Means

    KM --> GMM:::probability
    GMM: Gaussian Mixture Model

    GMM --> HMM:::probability
    HMM: Hidden Markov Model

    %% ===== REINFORCEMENT =====
    ML --> RL:::category
    RL: Reinforcement Learning

    RL --> DM:::logic
    DM: Decision Making

Mathematical Legend

Algebra / Linear Algebra (Blue) #

Used heavily when models rely on weighted sums, dot products, and matrix or vector operations, as in Linear Regression, Neural Networks, and Support Vector Machines above.


AI Stack #

The AI Stack describes the layers required to build an end-to-end AI system, from infrastructure at the bottom to user-facing applications at the top.

Different organisations represent the AI stack differently; this is a simplified conceptual view for learning.

Each layer depends on the one below it.


graph TB

    subgraph APP["Applications"]
        A[User Interfaces & Integrations]
    end

    subgraph ORCH["Orchestration"]
        O[Workflows • Agents • Control Logic]
    end

    subgraph DATA["Data"]
        D[Data Sources • Pipelines • Vector DBs]
    end

    subgraph MODEL["Models"]
        M[ML • DL • Foundation Models • LLMs]
    end

    subgraph INFRA["Infrastructure"]
        I[Cloud • On-prem • GPUs • Storage]
    end

    %% Styling
    style APP fill:#FFCCBC
    style ORCH fill:#90CAF9
    style DATA fill:#BBDEFB
    style MODEL fill:#C8E6C9
    style INFRA fill:#E1F5FE

    style A fill:#FFE0B2
    style O fill:#B3E5FC
    style D fill:#E3F2FD
    style M fill:#DCEDC8
    style I fill:#E1F5FE

1. Infrastructure #

The foundation that provides compute and storage.


Artificial Neuron and Perceptron #

Knowledge in neural networks is stored in connection weights, and learning means modifying those weights.


Biological Neuron #

A biological neuron is a specialised cell that processes and transmits information through electrical and chemical signals.

Core components:

  • Dendrites: receive signals from other neurons
  • Cell body (soma): processes incoming signals
  • Axon: transmits the output signal
  • Synapses: connection points between neurons

Biological intuition:

  • many inputs arrive to one neuron
  • one neuron can connect out to many neurons
  • massive parallelism enables fast perception and recognition

Artificial Neuron #

An artificial neuron is a simplified computational model inspired by biological neurons.
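
A minimal Python sketch of such a neuron, using a step activation and hand-picked weights (the AND-gate weights below are illustrative, not from the original text):

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    z = np.dot(w, x) + b
    return 1 if z > 0 else 0

# Hypothetical weights that make the neuron behave like a 2-input AND gate.
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, neuron(np.array(x), w, b))  # fires only for input (1, 1)
```

Changing the weights changes the function the neuron computes, which is exactly why learning means modifying the weights.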


Machine Learning Workflow #

Data is the foundation of any machine learning system: data quality matters more than model complexity.

Role of Data #

Data determines:

  • What patterns the model can learn
  • How well it generalises
  • Whether bias or noise is introduced

Bad data → bad model (even with perfect algorithms).


Data Preprocessing and Wrangling #

Raw data is never ready for training.

Data Issues

  • Noise
    • For objects, noise is an extraneous object
    • For attributes, noise is a modification of the original values
    • Handle: apply a log transform or z-score standardisation to centre values around the mean
  • Outliers
    • Data objects with characteristics considerably different from most other objects in the data set
    • Handle: use the IQR method: find the lower and upper bounds and replace each outlier with the nearest bound (see the sketch after this list)
  • Missing Values
    • Eliminate the affected data objects or variables, or
    • Estimate the missing values using the mean, median or mode (prefer the median when outliers are present), or
    • Ignore the missing values during analysis
  • Duplicate Data
    • A major issue when merging data from heterogeneous sources
  • Inconsistent Codes
    • List all unique values and map the inconsistent ones to a single canonical code
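
As an example of the outlier handling above, a minimal pandas sketch (the column values are made up; 1.5 × IQR is the conventional bound):

```python
import pandas as pd

# Hypothetical numeric column with one obvious outlier.
s = pd.Series([12, 14, 13, 15, 14, 90, 13])

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Replace values outside the bounds with the nearest bound.
cleaned = s.clip(lower=lower, upper=upper)
print(cleaned.tolist())  # 90 is pulled down to the upper bound
```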

Data Preprocessing techniques


Conditional Probability #

Conditional probability updates the probability of an event when new information is available.

It shows up whenever a question says:

  • “given that…”
  • “among those who…”
  • “out of the items that…”
  • “if it does not fail immediately…”

Key takeaway: Conditional probability is always:

joint probability ÷ probability of the condition.

The condition must not be an impossible event.
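
In symbols:

\[ P(A\mid B)=\frac{P(A\cap B)}{P(B)},\qquad P(B)>0 \]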


Prior vs posterior #

  • Prior probability: probability with no condition (before new information)
  • Posterior probability: the updated probability after conditioning on the new information


Bayes’ Theorem #

2.1 Total probability (needed for Bayes) #

Often we split the world into cases \( E_1,E_2,\dots,E_k \) that:

  • are mutually exclusive
  • cover the whole sample space

Then for any event \( A \) :

\[ P(A)=\sum_{i=1}^{k} P(A\mid E_i)\,P(E_i) \]
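
For instance, with two cases and made-up numbers: if machine \( E_1 \) produces 60% of all items with a 2% defect rate, and machine \( E_2 \) produces the remaining 40% with a 5% defect rate, then

\[ P(\text{defect}) = 0.02\cdot 0.6 + 0.05\cdot 0.4 = 0.032 \]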

Tree intuition: