# Reinforcement Learning (RL)
RL is learning by trial and error: a type of machine learning in which an autonomous agent learns to make decisions by interacting with an environment.
Instead of being told the correct answer, the agent:
- takes actions
- observes outcomes
- receives rewards or penalties
- gradually learns a strategy that maximises long-term reward
Reinforcement Learning teaches an agent how to act, not what to predict.
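As a minimal sketch of this loop, consider a toy two-armed bandit in Python; the payoff probabilities, exploration rate, and step count are all invented for illustration.

```python
import random

# Toy environment: arm 1 pays out more often than arm 0.
# These payoff probabilities are invented for illustration.
TRUE_PAYOFF = [0.3, 0.7]

def pull(arm):
    """Take an action and return a reward of 1 or 0."""
    return 1.0 if random.random() < TRUE_PAYOFF[arm] else 0.0

values = [0.0, 0.0]   # the agent's reward estimate for each arm
counts = [0, 0]
epsilon = 0.1         # exploration rate

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-known arm.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: values[a])
    reward = pull(arm)              # act, then observe the outcome
    counts[arm] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[arm] += (reward - values[arm]) / counts[arm]

print(values)  # estimates should approach [0.3, 0.7]
```

The agent is never told which arm is better; it discovers a strategy purely from the rewards it receives.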
# Useful Gradient Identities
\[
\nabla (a^T x) = a
\]

\[
\nabla (x^T A x) = (A + A^T)x
\]

If \( A \) is symmetric:

\[
\nabla (x^T A x) = 2Ax
\]

These are heavily used in optimisation.
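As a quick sanity check, the NumPy sketch below compares the second identity against central finite differences; the matrix size and random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a generic, non-symmetric matrix
x = rng.standard_normal(n)

# Analytic gradient of f(x) = x^T A x from the identity above.
analytic = (A + A.T) @ x

# Central finite differences as an independent numerical check.
eps = 1e-6
numeric = np.empty(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    numeric[i] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))  # expect: True
```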
# Inner Products and Dot Product
An inner product maps two vectors to a single scalar.
It allows us to measure:
- similarity
- vector length
- projections
- orthogonality
```mermaid
flowchart TD
T["Inner<br/>products<br/>(types)"] --> DOT["Euclidean<br/>Dot product"]
T --> WIP["Weighted<br/>inner product"]
T --> FN["Function-space<br/>(integral)"]
T --> HERM["Complex<br/>Hermitian"]
T --> MAT["Matrix<br/>inner product<br/>(Frobenius)"]
DOT --> Rn["Vectors in<br/>R^n"]
WIP --> SPD["SPD matrix<br/>W"]
FN --> L2["L2 space<br/>functions"]
HERM --> Cn["Vectors in<br/>C^n"]
MAT --> Mnm["Matrices<br/>R^{m×n}"]
style T fill:#90CAF9,stroke:#1E88E5,color:#000
style DOT fill:#C8E6C9,stroke:#2E7D32,color:#000
style WIP fill:#C8E6C9,stroke:#2E7D32,color:#000
style FN fill:#C8E6C9,stroke:#2E7D32,color:#000
style HERM fill:#C8E6C9,stroke:#2E7D32,color:#000
style MAT fill:#C8E6C9,stroke:#2E7D32,color:#000
style Rn fill:#CE93D8,stroke:#8E24AA,color:#000
style SPD fill:#CE93D8,stroke:#8E24AA,color:#000
style L2 fill:#CE93D8,stroke:#8E24AA,color:#000
style Cn fill:#CE93D8,stroke:#8E24AA,color:#000
style Mnm fill:#CE93D8,stroke:#8E24AA,color:#000
```
## Definition
For vectors \( \mathbf{a}, \mathbf{b} \in \mathbb{R}^n \), the dot product is

\[
\mathbf{a}^T \mathbf{b} = \sum_{i=1}^{n} a_i b_i
\]
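As a small sketch in NumPy, the snippet below computes the Euclidean dot product and a weighted inner product \( \mathbf{a}^T W \mathbf{b} \); the vectors and the diagonal SPD weight matrix are invented for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, -1.0])

# Euclidean dot product: the sum of elementwise products.
print(a @ b)       # 1*4 + 2*0 + 3*(-1) = 1.0

# Weighted inner product <a, b>_W = a^T W b, where W is SPD.
W = np.diag([1.0, 2.0, 0.5])   # an arbitrary SPD example
print(a @ W @ b)   # 1*1*4 + 2*2*0 + 3*0.5*(-1) = 2.5
```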
# Backpropagation and Automatic Differentiation
Backpropagation applies the chain rule repeatedly and efficiently across a computational graph.
Chain rule:
\[
\frac{dL}{dx} = \frac{dL}{dy} \cdot \frac{dy}{dx}
\]
```mermaid
flowchart LR
x --> y
y --> L
```
Automatic differentiation computes exact derivatives efficiently using computational graphs.
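As a hand-rolled sketch of this idea on the tiny graph above, take y = x² and L = sin(y) (functions chosen purely for illustration) and apply the chain rule node by node.

```python
import math

x = 1.5

# Forward pass: evaluate the graph x --> y --> L, storing intermediates.
y = x ** 2          # y(x)
L = math.sin(y)     # L(y)

# Backward pass: multiply local derivatives along the path.
dL_dy = math.cos(y)       # local derivative at the L node
dy_dx = 2 * x             # local derivative at the y node
dL_dx = dL_dy * dy_dx     # chain rule: dL/dx = dL/dy * dy/dx

print(dL_dx)  # matches d/dx sin(x^2) = cos(x^2) * 2x
```

Autodiff frameworks perform exactly this bookkeeping, but over graphs with millions of nodes.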
# Angles and Orthogonality
Once we define an inner product, we can define the angle between two vectors.
Angles allow us to measure how aligned or different two vectors are in space.
Key idea:
Angle measures similarity between vectors.
Orthogonality (a zero inner product) means the vectors share no common component: no similarity at all.
## Why It Matters in Machine Learning
- PCA produces orthogonal components
- Orthogonal features reduce redundancy
- Gradient directions depend on angle
For vectors in n-dimensional space:

\[
\cos\theta = \frac{\mathbf{a}^T \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
\]
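A short NumPy sketch of this formula; the example vectors are invented for illustration.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# cos(theta) = a.b / (|a||b|); clip guards against rounding just outside [-1, 1].
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(theta)  # ~45 degrees: the vectors are partially aligned

# Orthogonal vectors have zero inner product (no similarity).
print(np.array([1.0, 0.0]) @ np.array([0.0, 1.0]))  # 0.0
```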
# AI
A selection of notes that didn’t fit elsewhere or are being worked on.
# AI Development Stages: ANI → AGI → ASI
Artificial Intelligence is often described in three stages, based on capability and scope:
- ANI: Task-specific intelligence (today’s AI)
- AGI: Human-level general intelligence (future goal)
- ASI: Beyond human intelligence (theoretical)

## ANI — Artificial Narrow Intelligence
- Also called Weak AI
- Designed to perform one specific task
- Operates within a predefined environment
- Cannot generalise beyond its training
- Most AI systems today are ANI
Examples: spam filters, recommendation systems, image classifiers, and chess engines.