# Reinforcement Learning (RL)
RL is learning by trial and error: a type of machine learning in which an autonomous agent learns to make decisions by interacting with an environment.
Instead of being told the correct answer, the agent:
- takes actions
- observes outcomes
- receives rewards or penalties
- gradually learns a strategy that maximises long-term reward
Reinforcement Learning teaches an agent how to act, not what to predict.
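As a minimal sketch of this loop, consider a toy two-armed bandit in Python; the payoff probabilities, exploration rate, and step count are all invented for illustration.

```python
import random

# Toy environment: arm 1 pays out more often than arm 0.
# These payoff probabilities are invented for illustration.
TRUE_PAYOFF = [0.3, 0.7]

def pull(arm):
    """Take an action and return a reward of 1 or 0."""
    return 1.0 if random.random() < TRUE_PAYOFF[arm] else 0.0

values = [0.0, 0.0]   # the agent's reward estimate for each arm
counts = [0, 0]
epsilon = 0.1         # exploration rate

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-known arm.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: values[a])
    reward = pull(arm)              # act, then observe the outcome
    counts[arm] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[arm] += (reward - values[arm]) / counts[arm]

print(values)  # estimates should approach [0.3, 0.7]
```

The agent is never told which arm is better; it discovers a strategy purely from the rewards it receives.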
# Useful Gradient Identities
\[
\nabla (a^T x) = a
\]

\[
\nabla (x^T A x) = (A + A^T)x
\]

If \( A \) is symmetric:

\[
\nabla (x^T A x) = 2Ax
\]

These are heavily used in optimisation.
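As a quick sanity check, the NumPy sketch below compares the second identity against central finite differences; the matrix size and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a generic, non-symmetric matrix
x = rng.standard_normal(n)

# Analytic gradient of f(x) = x^T A x from the identity above.
analytic = (A + A.T) @ x

# Central finite differences as an independent numerical check.
eps = 1e-6
numeric = np.empty(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # expect: True
```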
# Inner Products and Dot Product
An inner product maps two vectors to a single scalar.
It allows us to measure:
- similarity
- vector length
- projections
- orthogonality
```mermaid
flowchart TD
T["Inner<br/>products<br/>(types)"] --> DOT["Euclidean<br/>Dot product"]
T --> WIP["Weighted<br/>inner product"]
T --> FN["Function-space<br/>(integral)"]
T --> HERM["Complex<br/>Hermitian"]
T --> MAT["Matrix<br/>inner product<br/>(Frobenius)"]
DOT --> Rn["Vectors in<br/>R^n"]
WIP --> SPD["SPD matrix<br/>W"]
FN --> L2["L2 space<br/>functions"]
HERM --> Cn["Vectors in<br/>C^n"]
MAT --> Mnm["Matrices<br/>R^{m×n}"]
style T fill:#90CAF9,stroke:#1E88E5,color:#000
style DOT fill:#C8E6C9,stroke:#2E7D32,color:#000
style WIP fill:#C8E6C9,stroke:#2E7D32,color:#000
style FN fill:#C8E6C9,stroke:#2E7D32,color:#000
style HERM fill:#C8E6C9,stroke:#2E7D32,color:#000
style MAT fill:#C8E6C9,stroke:#2E7D32,color:#000
style Rn fill:#CE93D8,stroke:#8E24AA,color:#000
style SPD fill:#CE93D8,stroke:#8E24AA,color:#000
style L2 fill:#CE93D8,stroke:#8E24AA,color:#000
style Cn fill:#CE93D8,stroke:#8E24AA,color:#000
style Mnm fill:#CE93D8,stroke:#8E24AA,color:#000
```
## Definition
For vectors \( \mathbf{a}, \mathbf{b} \in \mathbb{R}^n \), the dot product is

\[
\mathbf{a}^T \mathbf{b} = \sum_{i=1}^{n} a_i b_i
\]
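As a small sketch in NumPy, the snippet below computes the Euclidean dot product and a weighted inner product \( \mathbf{a}^T W \mathbf{b} \); the vectors and the diagonal SPD weight matrix are invented for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, -1.0])

# Euclidean dot product: the sum of elementwise products.
print(a @ b)       # 1*4 + 2*0 + 3*(-1) = 1.0

# Weighted inner product <a, b>_W = a^T W b, where W is SPD.
W = np.diag([1.0, 2.0, 0.5])   # an arbitrary SPD example
print(a @ W @ b)   # 1*1*4 + 2*2*0 + 3*0.5*(-1) = 2.5
```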
# Backpropagation and Automatic Differentiation
Backpropagation applies the chain rule repeatedly and efficiently across a computational graph.
Chain rule:
\[
\frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx}
\]
```mermaid
flowchart LR
x --> y
y --> L
```
Automatic differentiation computes exact derivatives efficiently using computational graphs.
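As a hand-rolled sketch of this idea on the tiny graph above, take y = x² and L = sin(y) (functions chosen purely for illustration) and apply the chain rule node by node.

```python
import math

x = 1.5

# Forward pass: evaluate the graph x --> y --> L, storing intermediates.
y = x ** 2          # y(x)
L = math.sin(y)     # L(y)

# Backward pass: multiply local derivatives along the path.
dL_dy = math.cos(y)       # local derivative at the L node
dy_dx = 2 * x             # local derivative at the y node
dL_dx = dL_dy * dy_dx     # chain rule: dL/dx = dL/dy * dy/dx

print(dL_dx)  # matches d/dx sin(x^2) = cos(x^2) * 2x
```

Autodiff frameworks perform exactly this bookkeeping, but over graphs with millions of nodes.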
# Angles and Orthogonality
Once we define an inner product, we can define the angle between two vectors.
Angles allow us to measure how aligned or different two vectors are in space.
Key idea:
Angle measures similarity between vectors.
Orthogonality (a zero inner product) means the vectors share no common component: no similarity at all.
## Why It Matters in Machine Learning
- PCA produces orthogonal components
- Orthogonal features reduce redundancy
- Gradient directions depend on angle
For vectors in n-dimensional space:

\[
\cos\theta = \frac{\mathbf{a}^T \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
\]
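A short NumPy sketch of this formula; the example vectors are invented for illustration.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# cos(theta) = a.b / (|a||b|); clip guards against rounding just outside [-1, 1].
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(theta)  # ~45 degrees: the vectors are partially aligned

# Orthogonal vectors have zero inner product (no similarity).
print(np.array([1.0, 0.0]) @ np.array([0.0, 1.0]))  # 0.0
```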
# AI
A selection of notes that didn’t fit elsewhere or are being worked on.
# AI Development Stages: ANI → AGI → ASI
Artificial Intelligence is often described in three stages, based on capability and scope:
- ANI: Task-specific intelligence (today’s AI)
- AGI: Human-level general intelligence (future goal)
- ASI: Beyond human intelligence (theoretical)

## ANI — Artificial Narrow Intelligence
- Also called Weak AI
- Designed to perform one specific task
- Operates within a predefined environment
- Cannot generalise beyond its training
- Most AI systems today are ANI
Examples: spam filters, recommendation systems, image classifiers, and chess engines.