
Backpropagation and Automatic Differentiation #

Backpropagation applies the chain rule repeatedly and efficiently across a computational graph, propagating gradients from the loss back to every parameter.

Chain rule:

[ \frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx} ]
flowchart LR
    x --> y
    y --> L

Automatic differentiation computes exact derivatives efficiently using computational graphs.
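A minimal sketch of the idea on the toy graph above (x → y → L; the functions and values are illustrative, not from the text): for L = y² with y = 3x, the chain rule gives dL/dx = (dL/dy)·(dy/dx) = 2y · 3.

```python
# Toy computational graph: x -> y = 3x -> L = y**2
def forward(x):
    y = 3 * x
    L = y ** 2
    return y, L

def grad(x):
    """Backward pass: multiply local derivatives along the graph."""
    y, _ = forward(x)
    dL_dy = 2 * y          # local derivative of L w.r.t. y
    dy_dx = 3              # local derivative of y w.r.t. x
    return dL_dy * dy_dx   # chain rule: dL/dx

# Sanity check against a central finite difference
x, h = 2.0, 1e-6
numeric = (forward(x + h)[1] - forward(x - h)[1]) / (2 * h)
```

The exact gradient at x = 2 is 2·6·3 = 36, and the finite-difference estimate agrees to within numerical error, which is the practical appeal of automatic differentiation: exact derivatives at roughly the cost of one extra pass.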




Angles and Orthogonality #

Once we define an inner product, we can define the angle between two vectors.

Angles allow us to measure how aligned or different two vectors are in space.

Key Idea: Angle measures how similar two vectors are. Orthogonality (a zero inner product) means the vectors share no component: no linear similarity.

Why It Matters in Machine Learning #

  • PCA produces orthogonal components
  • Orthogonal features reduce redundancy
  • Gradient directions depend on angle

Angle Formula #

For vectors in n-dimensional space:


AI Development Stages: ANI → AGI → ASI #

Artificial Intelligence is often described in three stages, based on capability and scope:

  • ANI: Task-specific intelligence (today’s AI)
  • AGI: Human-level general intelligence (future goal)
  • ASI: Beyond human intelligence (theoretical)



ANI — Artificial Narrow Intelligence #

  • Also called Weak AI
  • Designed to perform one specific task
  • Operates within a predefined environment
  • Cannot generalise beyond its training
  • Most AI systems today are ANI

Examples: spam filters, recommendation systems, and voice assistants.


Neural Networks #

  • A network of artificial neurons inspired by how neurons function in the human brain.
  • At its core, a mathematical model designed to process and learn from data.
  • Neural networks form the foundation of Deep Learning, which involves training large, complex networks on vast amounts of data.

flowchart LR
 subgraph subGraph0["Input Layer"]
        I1(("Input 1"))
        I2(("Input 2"))
        I3(("Input 3"))
  end
 subgraph subGraph1["Hidden Layer"]
        H1(("Hidden 1"))
        H2(("Hidden 2"))
        H3(("Hidden 3"))
  end
 subgraph subGraph2["Output Layer"]
        O(("Output"))
  end
    I1 --> H1 & H2 & H3
    I2 --> H1 & H2 & H3
    I3 --> H1 & H2 & H3
    H1 --> O
    H2 --> O
    H3 --> O

    style I1 fill:#C8E6C9
    style I2 fill:#C8E6C9
    style I3 fill:#C8E6C9
    style H1 stroke:#2962FF,fill:#BBDEFB
    style H2 fill:#BBDEFB
    style H3 fill:#BBDEFB
    style O fill:#FFCDD2
    style subGraph0 stroke:none,fill:transparent
    style subGraph1 stroke:none,fill:transparent
    style subGraph2 stroke:none,fill:transparent

Structure of a Neural Network #

A typical neural network has three main layers:

  • Input layer: receives the raw features
  • Hidden layer(s): transform the inputs through weighted connections and activations
  • Output layer: produces the prediction

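The layered structure in the diagram can be sketched as a single forward pass (hypothetical weights and a sigmoid activation; biases omitted for brevity):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x, W_hidden, W_out):
    """One pass: 3 inputs -> hidden layer (sigmoid) -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in W_hidden]                       # one row per hidden neuron
    return sigmoid(sum(w * h for w, h in zip(W_out, hidden)))

# Illustrative weights, matching the 3-3-1 diagram above
W_hidden = [[0.1, 0.2, 0.3],
            [0.4, 0.5, 0.6],
            [0.7, 0.8, 0.9]]
W_out = [0.5, 0.5, 0.5]
out = forward([1.0, 0.0, 1.0], W_hidden, W_out)
```

Because the output passes through a sigmoid, it always lies strictly between 0 and 1; learning would consist of adjusting `W_hidden` and `W_out` to push it toward a target.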

Machine Learning #

stateDiagram-v2

    %% ===== CLASS DEFINITIONS (Math-based colours) =====
    classDef algebra fill:#cfe8ff,stroke:#1e3a8a,stroke-width:1px
    classDef probability fill:#d1fae5,stroke:#065f46,stroke-width:1px
    classDef geometry fill:#ffedd5,stroke:#9a3412,stroke-width:1px
    classDef logic fill:#ede9fe,stroke:#5b21b6,stroke-width:1px
    classDef category font-style:italic,font-weight:bold,fill:#aaaaaa,stroke:#374151,stroke-width:3px

    %% ===== ROOT =====
    ML: Machine Learning

    %% ===== SUPERVISED =====
    ML --> SL:::category
    SL: Supervised Learning

    SL --> Regression
    Regression --> LR:::algebra
    LR: Linear Regression

    LR --> NN:::algebra
    NN: Neural Network

    NN --> DT:::logic
    DT: Decision Tree

    SL --> Classification
    Classification --> NB:::probability
    NB: Naive Bayes

    NB --> KNN:::geometry
    KNN: k-Nearest Neighbours

    KNN --> SVM:::algebra
    SVM: Support Vector Machine
    
    %% ===== UNSUPERVISED =====
    ML --> USL:::category
    USL: Unsupervised Learning

    USL --> Clustering
    Clustering --> KM:::geometry
    KM: K-Means

    KM --> GMM:::probability
    GMM: Gaussian Mixture Model

    GMM --> HMM:::probability
    HMM: Hidden Markov Model

    %% ===== REINFORCEMENT =====
    ML --> RL:::category
    RL: Reinforcement Learning

    RL --> DM:::logic
    DM: Decision Making

Mathematical Legend #

Algebra / Linear Algebra (Blue) #

Used heavily when models rely on matrix operations: dot products, linear maps, and projections (e.g. Linear Regression, Neural Networks, SVMs).


Artificial Neuron and Perceptron #

Knowledge in neural networks is stored in connection weights, and learning means modifying those weights.


Biological Neuron #

A biological neuron is a specialised cell that processes and transmits information through electrical and chemical signals.

Core components:

  • Dendrites: receive signals from other neurons
  • Cell body (soma): processes incoming signals
  • Axon: transmits the output signal
  • Synapses: connection points between neurons

Biological intuition:

  • many inputs arrive at one neuron
  • one neuron can connect out to many neurons
  • massive parallelism enables fast perception and recognition

Artificial Neuron #

An artificial neuron is a simplified computational model inspired by biological neurons.
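The simplification can be sketched as a perceptron: a weighted sum of inputs plus a bias, passed through a step activation (the weights below are hand-picked, illustrative values for an AND gate):

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# AND gate: fires only when both inputs are 1
w, b = [1.0, 1.0], -1.5
```

Here the weights play the role of synaptic strengths, and the bias sets the firing threshold: with both inputs active, z = 1 + 1 − 1.5 > 0 and the neuron fires.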


Machine Learning Workflow #

Data is the foundation of any machine learning system. Quality of data matters more than model complexity.

Role of Data #

Data determines:

  • What patterns the model can learn
  • How well it generalises
  • Whether bias or noise is introduced

Bad data → bad model (even with perfect algorithms).


Data Preprocessing, wrangling #

Raw data is never ready for training.

Data Issues

  • Noise
    • For objects, noise is an extraneous object
    • For attributes, noise refers to modification of the original values
    • Handle: apply a log or Z-score transformation to normalise the values
  • Outliers
    • Data objects with characteristics considerably different from most other objects in the data set
    • Handle: use the IQR method
    • Compute the lower and upper bounds (Q1 − 1.5·IQR, Q3 + 1.5·IQR) and replace outliers with the nearest bound
  • Missing Values
    • Eliminate data objects or variables
    • Handle: estimate missing values
      • Mean, median or mode
      • Prefer the median when outliers are present
    • Ignore the missing value during analysis
  • Duplicate Data
    • A major issue when merging data from heterogeneous sources
  • Inconsistent Codes
    • Find all unique values and map the inconsistent ones to a single canonical code

Data Preprocessing Techniques