MFML Lecture to Course Content Map

MFML Lecture to Course Content Map #

This file maps the uploaded Maths lecture PDFs and webinar PDFs against the official course handout/contact-session plan. It is intended as an exam preparation index and as a source map for future Hugo Markdown notes.

Course identity #

  • Course: Mathematical Foundations for Machine Learning
  • Course code: AIML ZC416
  • Main areas: linear algebra, vector spaces, matrix decompositions, vector calculus, optimisation, PCA, and SVM.

Official module structure #

| Module | Course handout area | Main ideas | Uploaded lecture coverage |
| --- | --- | --- | --- |
| M1 | Solution of linear systems | Systems of equations, matrices, solving Ax = b | Lecture 1, Webinar 1 |
| M2 | Vector spaces and analytic geometry | Vector spaces, linear independence, basis, rank, norms, inner products, angles, orthogonality, orthonormal basis | Lecture 2, Lecture 3, Webinar 1 |
| M3 | Matrix decomposition methods | Determinant, trace, eigenvalues, eigenvectors, Cholesky, eigendecomposition, diagonalisation, SVD, matrix approximation | Lecture 4, Lecture 5, Webinar 1, Webinar 2 |
| M4 | Vector calculus | Univariate differentiation, partial derivatives, gradients, matrix gradients, Taylor/Maclaurin series, Hessian, backpropagation, automatic differentiation | Lecture 6, Lecture 7, Lecture 8, Webinar 2 |
| M5 | Continuous optimisation | Gradient descent, constrained optimisation, Lagrange multipliers, convex optimisation | Lecture 9, Lecture 14, Webinar 2, Webinar 3, Webinar 4 |
| M6 | Nonlinear optimisation | Learning rate, initialisation, SGD, feature preprocessing, local optima, cliffs/valleys, momentum, AdaGrad, RMSProp, Adam | Lecture 10, Lecture 11, Webinar 3 |
| M7 | Dimensionality reduction, PCA, SVM | PCA perspectives, low-rank approximation, high-dimensional PCA, practical PCA, SVM preliminaries, primal/dual SVM, kernels | Lecture 12, Lecture 13, Lecture 14, Lecture 15, Webinar 4 |

Contact session by lecture #

| Session | Course handout topic | Uploaded file | What the lecture appears to cover | Exam relevance |
| --- | --- | --- | --- | --- |
| 1 | Solution of linear systems | Lecture_1.pdf | Linear algebra introduction, closure, systems of linear equations, matrix representation, solution types (no solution, unique solution, infinite solutions), pivot/free variables, matrix operations, inverse, transpose, compact Ax = b form | Very high for Mid-Sem and Comprehensive |
| 2 | Vector spaces, linear independence, basis, rank | Lecture_2.pdf | Groups, Abelian groups, vector spaces, vector subspaces, closure tests, linear combinations, span, linear independence, basis, rank, nullspace/column space ideas | Very high for Mid-Sem and Comprehensive |
| 3 | Analytic geometry | Lecture_3.pdf | Norms, dot product, inner products, bilinear mappings, symmetric positive-definite matrices, lengths, distances, angles, orthogonality, orthonormal basis, Gram-Schmidt ideas | Very high for Mid-Sem and Comprehensive |
| 4 | Matrix Decomposition I | lecture_4.pdf | Determinant, cofactor formula, determinant behaviour under row operations, rank-det relation, eigenvalues/eigenvectors, Cholesky-related positive definite ideas | Very high for Mid-Sem and Comprehensive |
| 5 | Matrix Decomposition II | lecture_5.pdf | Diagonal matrices, diagonalisation, eigendecomposition, spectral theorem for symmetric matrices, SVD, matrix approximation | Very high for Mid-Sem and Comprehensive |
| 6 | Vector Calculus I | lecture_6.pdf | Differentiation of univariate functions, polynomial derivatives, Taylor polynomial/series, partial derivatives, gradients, vector-valued gradients | Very high for Mid-Sem and Comprehensive |
| 7 | Vector Calculus II | lecture_7_edited.pdf | Matrix gradients, useful gradient identities, backpropagation, automatic differentiation, chain rule through neural-network layers | High for Mid-Sem and Comprehensive |
| 8 | Vector Calculus III | lecture_8.pdf | Taylor/Maclaurin series theory, remainder term, two-variable Taylor series, Hessian matrix, maxima/minima, unconstrained optimisation preliminaries | Very high for Mid-Sem and Comprehensive |
| 9 | Continuous Optimisation | Lecture_9.pdf | Gradient descent, negative gradient direction, local minima, step size, line search, convergence intuition, quadratic examples | Very high for Comprehensive; likely useful for quizzes/problems |
| 10 | Nonlinear Optimisation I | Lecture_10.pdf | Initialisation, objective functions in ML, overfitting, feature processing/preprocessing, SGD and practical optimisation behaviour | High for Comprehensive |
| 11 | Nonlinear Optimisation II | Lecture_11.pdf | Difficult topologies: cliffs, valleys, flat regions, curvature; momentum, AdaGrad, RMSProp, Adam | High for Comprehensive |
| 12 | PCA I | Lecture_12.pdf | Dimensionality reduction, PCA problem setting, centred data, covariance, maximum variance perspective, projection perspective | Very high for Comprehensive |
| 13 | PCA II | Lecture_13.pdf | Practical PCA, eigenvector computation, SVD relationship, low-rank approximation, high-dimensional PCA, key PCA steps | Very high for Comprehensive |
| 14 | Mathematical preliminaries for SVM | Lecture 14.pdf | Constrained optimisation, Lagrangian, quadratic programming, primal/dual, weak/strong duality, Slater condition, KKT conditions, kernels, linear classifiers | Very high for Comprehensive |
| 15 | Primal/dual linear SVM | Lecture_15.pdf | SVM primal problem, dual formulation, KKT conditions, support vectors, hinge loss, linear SVM numerical problem, hard/soft-margin direction | Very high for Comprehensive |
| 16 | Nonlinear SVM / kernels | Not clearly uploaded as a separate Lecture 16 PDF | Kernel functions, nonlinear SVM examples; likely partly covered in Lecture 14/15 and webinars | Very high for Comprehensive; gap to fill if Lecture 16 exists |

Webinar mapping #

| Webinar file | Main role | Best linked lectures | Exam use |
| --- | --- | --- | --- |
| Webinar_1.pdf | Problem sheet on linear systems, REF/RREF, column space, nullspace, row independence, subspaces, inner products, Cauchy-Schwarz, Cholesky, eigenvalues | Lectures 1-5 | Excellent for Mid-Sem problem practice |
| Webinar_2.pdf | Worked problems on maxima/minima, eigenvalues/spectral decomposition, gradient-related calculations and PCA-style examples | Lectures 4-9, 12-13 | Excellent for Mid-Sem revision and Comprehensive practice |
| Webinar_3.pdf | Gradient descent algorithm, step-size derivation for quadratic functions, worked gradient descent examples | Lectures 8-11 | Excellent for optimisation exam problems |
| webinar_4.pdf | Appears linked to optimisation/SVM/PCA practice based on uploaded set; use as problem-solving supplement after Lecture 12 onwards | Lectures 12-15 | Comprehensive exam practice |

Mid-Sem focus #

The course handout states that the Mid-Semester Test covers Weeks 1-8. So for Mid-Sem, focus on:

MFML Exam Revision Index

MFML Exam Revision Index #

This is a practical revision index for the uploaded Mathematical Foundations for Machine Learning material.

Exam split #

| Exam | Coverage | Main files |
| --- | --- | --- |
| Mid-Semester | Weeks/Sessions 1-8 | Lecture 1 to Lecture 8, Webinar 1, Webinar 2 |
| Comprehensive | Sessions 1-16 | Lecture 1 to Lecture 15, webinars, and any missing Lecture 16/kernel material |

High-priority concept checklist #

Linear systems and matrices #

  • Convert equations into matrix form Ax = b
  • Understand solution types: no solution, unique solution, infinite solutions
  • Identify pivot and free variables
  • Understand row operations, REF/RREF, rank, nullity
  • Know matrix inverse and transpose properties

Vector spaces #

  • Definition of vector space and subspace
  • Closure under addition and scalar multiplication
  • Span, linear combination, linear independence
  • Basis, dimension, rank
  • Column space, row space, nullspace

Analytic geometry #

  • Norm properties
  • Manhattan norm and Euclidean norm
  • Inner product definition
  • Symmetric positive-definite matrices
  • Distance, angle, orthogonality
  • Orthonormal basis and Gram-Schmidt

Matrix decompositions #

  • Determinant and trace
  • Cofactor expansion
  • Row operation effect on determinant
  • Eigenvalue equation Av = λv
  • Characteristic equation det(A - λI) = 0
  • Diagonalisation A = PDP^{-1}
  • Spectral theorem for symmetric matrices
  • Cholesky decomposition
  • SVD A = UΣV^T
  • Low-rank approximation
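As a small illustration of the characteristic equation listed above: for a 2×2 matrix, \( \det(A - \lambda I) = 0 \) expands to \( \lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0 \), which can be solved with the quadratic formula. The sketch below is only a revision aid; the function name `eigenvalues_2x2` and the example matrix are illustrative choices, not taken from the lectures.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]].

    Solves lambda^2 - trace*lambda + det = 0 with the quadratic formula.
    Raises ValueError when the eigenvalues are complex.
    """
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex for this matrix")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

# Example (hypothetical matrix): A = [[2, 1], [1, 2]]
lam1, lam2 = eigenvalues_2x2(2, 1, 1, 2)  # 3.0 and 1.0
```

A quick sanity check for hand calculations: the eigenvalues should sum to the trace and multiply to the determinant.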

Vector calculus #

  • Derivative from first principles
  • Partial derivatives
  • Gradient as direction of steepest ascent
  • Gradient of vector-valued functions
  • Matrix-gradient identities
  • Chain rule
  • Backpropagation and automatic differentiation

Taylor series and Hessian #

  • Taylor polynomial
  • Taylor series and Maclaurin series
  • Remainder term
  • Taylor series in two variables
  • Hessian matrix
  • First derivative and second derivative tests
  • Maxima, minima and saddle points
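The second derivative test for a function of two variables can be summarised in a few lines. This is a sketch of the standard test, assuming the Hessian at a critical point is \( \begin{bmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{bmatrix} \); the function name `classify_critical_point` is a hypothetical helper, not something from the course material.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Classify a critical point of f(x, y) from its Hessian entries.

    Uses the determinant D = fxx*fyy - fxy^2:
      D > 0 and fxx > 0 -> local minimum
      D > 0 and fxx < 0 -> local maximum
      D < 0             -> saddle point
      D == 0            -> the test is inconclusive
    """
    det = fxx * fyy - fxy * fxy
    if det > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 + y^2 has Hessian [[2, 0], [0, 2]] at the origin
print(classify_critical_point(2, 0, 2))    # local minimum
# f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] at the origin
print(classify_critical_point(2, 0, -2))   # saddle point
```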

Gradient descent and optimisation #

  • Negative gradient direction
  • Learning rate/step size
  • Line search
  • Convergence and local minima
  • Constrained vs unconstrained optimisation
  • Lagrange multipliers
  • Convex optimisation
  • SGD and optimisation in ML
  • Feature preprocessing and scaling
  • Overfitting in optimisation examples
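For practice with the step-size items above, here is a minimal sketch of gradient descent with exact line search on a quadratic \( f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top Q\mathbf{x} - \mathbf{b}^\top\mathbf{x} \), assuming \( Q \) is symmetric positive definite. In that case the gradient is \( \mathbf{g} = Q\mathbf{x} - \mathbf{b} \) and the step that minimises \( f \) along \( -\mathbf{g} \) is \( \alpha = \mathbf{g}^\top\mathbf{g} / \mathbf{g}^\top Q\mathbf{g} \). The helper names and the example \( Q \), \( \mathbf{b} \) are illustrative assumptions, not values from the webinars.

```python
def matvec(Q, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gradient_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=1000):
    """Minimise 0.5*x^T Q x - b^T x with exact line search.

    Assumes Q is symmetric positive definite, so g^T Q g > 0 for g != 0.
    """
    x = list(x0)
    for _ in range(max_iter):
        g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]  # gradient Qx - b
        gg = dot(g, g)
        if gg < tol * tol:          # gradient norm below tolerance: stop
            break
        alpha = gg / dot(g, matvec(Q, g))  # exact step size for a quadratic
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

# Hypothetical example: the minimiser solves Qx = b, here x = (0.2, 0.4)
x_min = gradient_descent_quadratic([[3.0, 1.0], [1.0, 2.0]], [1.0, 1.0], [0.0, 0.0])
```

The exact step size is only available in closed form because the objective is quadratic; for general functions a fixed learning rate or a numerical line search is used instead.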

Nonlinear optimisation algorithms #

  • Difficult surfaces: cliffs, valleys, flat regions
  • Curvature and why first-order methods can struggle
  • Momentum update and intuition
  • AdaGrad
  • RMSProp
  • Adam
  • Learning rate decay
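The update rules behind two of the items above can be written compactly for a single scalar parameter. This is a sketch under stated assumptions: sign conventions for momentum vary between textbooks, and the hyperparameter values are commonly used defaults rather than values taken from the lectures. The function names `momentum_step` and `adam_step` are hypothetical.

```python
import math

def momentum_step(x, v, grad, lr=0.1, beta=0.9):
    """One momentum update: the velocity accumulates past gradients."""
    v = beta * v - lr * grad
    return x + v, v

def adam_step(x, m, s, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1).

    m: running mean of gradients, s: running mean of squared gradients.
    Both are bias-corrected before use.
    """
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(s_hat) + eps)
    return x, m, s

# Minimise f(x) = x^2 (gradient 2x) with momentum, starting from x = 5
x, v = 5.0, 0.0
for _ in range(200):
    x, v = momentum_step(x, v, 2 * x)
```

One point worth noticing for Adam: on the very first step the bias-corrected ratio is approximately the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale.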

PCA #

  • Dimensionality reduction problem
  • Centred data and covariance matrix
  • Maximum variance view
  • Projection/reconstruction view
  • Principal components as eigenvectors of covariance matrix
  • SVD relation to PCA
  • Low-rank approximation and Eckart-Young theorem
  • PCA in high dimensions
  • Practical PCA steps

SVM #

  • Linear classifiers
  • Margin and support vectors
  • Hard-margin SVM primal formulation
  • Lagrangian for SVM
  • KKT conditions
  • Primal vs dual perspective
  • Role of inner products
  • Kernel trick
  • Hinge loss
  • Soft-margin SVM
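For the hinge loss and soft-margin items, a short sketch may help with numerical problems. It assumes labels \( y \in \{-1, +1\} \), a linear score \( \mathbf{w}^\top\mathbf{x} + b \), the hinge loss \( \max(0,\, 1 - y(\mathbf{w}^\top\mathbf{x} + b)) \), and the soft-margin objective \( \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_i \text{hinge}_i \). The function names and the data points are made up for illustration.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss for one example with label y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)

def soft_margin_objective(w, b, data, C=1.0):
    """0.5*||w||^2 + C * sum of hinge losses over (x, y) pairs."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(hinge_loss(w, b, x, y) for x, y in data)

# Hypothetical weights and points
w, b = [1.0, -1.0], 0.0
data = [
    ([2.0, 0.0], +1),   # margin 2.0  -> loss 0.0 (outside the margin)
    ([0.5, 0.0], +1),   # margin 0.5  -> loss 0.5 (inside the margin)
    ([1.0, 0.0], -1),   # margin -1.0 -> loss 2.0 (misclassified)
]
total = soft_margin_objective(w, b, data, C=1.0)  # 1.0 + 2.5 = 3.5
```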

Suggested revision order #

Phase 1: Foundations #

  1. Lecture 1
  2. Lecture 2
  3. Lecture 3
  4. Webinar 1 problems related to REF, nullspace, column space and subspaces

Phase 2: Matrix decompositions #

  1. Lecture 4
  2. Lecture 5
  3. Webinar 1 and Webinar 2 eigenvalue/eigendecomposition problems

Phase 3: Calculus and optimisation foundations #

  1. Lecture 6
  2. Lecture 7
  3. Lecture 8
  4. Webinar 2 maxima/minima and Hessian problems

Phase 4: Optimisation for ML #

  1. Lecture 9
  2. Lecture 10
  3. Lecture 11
  4. Webinar 3 gradient-descent step-size problems

Phase 5: PCA and SVM #

  1. Lecture 12
  2. Lecture 13
  3. Lecture 14
  4. Lecture 15
  5. Webinar 4 / SVM problems

What to ask me next #

Use these prompts when generating Hugo pages:

MFML Topic to Source Index

MFML Topic to Source Index #

This index tells you where to look when you want to create future notes or revise a topic.

| Topic | Primary source PDFs | Supporting source PDFs | Future Hugo page |
| --- | --- | --- | --- |
| Linear systems | Lecture 1 | Webinar 1 | 01-linear-systems-and-matrices.md |
| Matrix operations | Lecture 1 | Webinar 1 | 01-linear-systems-and-matrices.md |
| Vector spaces | Lecture 2 | Webinar 1 | 02-vector-spaces-subspaces-basis-rank.md |
| Subspaces | Lecture 2 | Webinar 1 | 02-vector-spaces-subspaces-basis-rank.md |
| Linear independence, span, basis | Lecture 2 | Webinar 1 | 02-vector-spaces-subspaces-basis-rank.md |
| Rank and nullity | Lecture 2 | Webinar 1 | 02-vector-spaces-subspaces-basis-rank.md |
| Norms and distances | Lecture 3 | Webinar 1 | 03-analytic-geometry-norms-inner-products.md |
| Inner products | Lecture 3 | Webinar 1 | 03-analytic-geometry-norms-inner-products.md |
| Orthogonality and Gram-Schmidt | Lecture 3 | Webinar 1 | 03-analytic-geometry-norms-inner-products.md |
| Determinant and trace | Lecture 4 | Webinar 1 | 04-determinants-trace-eigenvalues.md |
| Eigenvalues/eigenvectors | Lecture 4 | Webinar 1, Webinar 2 | 04-determinants-trace-eigenvalues.md |
| Cholesky | Lecture 4 | Webinar 1 | 04-determinants-trace-eigenvalues.md |
| Diagonalisation | Lecture 5 | Webinar 2 | 05-eigendecomposition-svd-matrix-approximation.md |
| Eigendecomposition | Lecture 5 | Webinar 2 | 05-eigendecomposition-svd-matrix-approximation.md |
| SVD | Lecture 5 | Lecture 13, Webinar 1 | 05-eigendecomposition-svd-matrix-approximation.md |
| Differentiation | Lecture 6 | Webinar 2 | 06-vector-calculus-gradients.md |
| Gradients | Lecture 6, Lecture 7 | Webinar 2, Webinar 3 | 06-vector-calculus-gradients.md |
| Backpropagation | Lecture 7 | | 07-backpropagation-automatic-differentiation.md |
| Automatic differentiation | Lecture 7 | | 07-backpropagation-automatic-differentiation.md |
| Taylor/Maclaurin series | Lecture 6, Lecture 8 | Webinar 2 | 08-taylor-series-hessian-maxima-minima.md |
| Hessian | Lecture 8 | Webinar 2 | 08-taylor-series-hessian-maxima-minima.md |
| Maxima/minima | Lecture 8 | Webinar 2 | 08-taylor-series-hessian-maxima-minima.md |
| Gradient descent | Lecture 9 | Webinar 3 | 09-gradient-descent-continuous-optimisation.md |
| Step size / line search | Lecture 9 | Webinar 3 | 09-gradient-descent-continuous-optimisation.md |
| Constrained optimisation | Lecture 9, Lecture 14 | Webinar 4 | 14-lagrangian-duality-kkt.md |
| Lagrange multipliers | Lecture 14 | Webinar 4 | 14-lagrangian-duality-kkt.md |
| KKT conditions | Lecture 14, Lecture 15 | Webinar 4 | 14-lagrangian-duality-kkt.md |
| Feature preprocessing | Lecture 10 | | 10-nonlinear-optimisation-sgd-feature-preprocessing.md |
| Overfitting | Lecture 10 | | 10-nonlinear-optimisation-sgd-feature-preprocessing.md |
| SGD | Lecture 10 | Webinar 3 | 10-nonlinear-optimisation-sgd-feature-preprocessing.md |
| Cliffs and valleys | Lecture 11 | | 11-momentum-adagrad-rmsprop-adam.md |
| Momentum | Lecture 11 | Webinar 3 | 11-momentum-adagrad-rmsprop-adam.md |
| AdaGrad, RMSProp, Adam | Lecture 11 | | 11-momentum-adagrad-rmsprop-adam.md |
| PCA foundations | Lecture 12 | Webinar 4 | 12-pca-foundations.md |
| PCA computation | Lecture 13 | Webinar 4 | 13-pca-practical-computation-svd.md |
| Low-rank PCA | Lecture 13 | Lecture 5 | 13-pca-practical-computation-svd.md |
| SVM preliminaries | Lecture 14 | Webinar 4 | 15-support-vector-machines.md |
| Linear SVM | Lecture 15 | Webinar 4 | 15-support-vector-machines.md |
| Hinge loss | Lecture 15 | Webinar 4 | 15-support-vector-machines.md |
| Kernels / nonlinear SVM | Lecture 14/15, possibly missing Lecture 16 | Webinar 4 | 16-nonlinear-svm-kernels.md |

Formula Sheet

Formula Sheet #

This page is a quick reference of definitions and formulas, grouped by module.


Notation #

  • Sample size: \( n \) (sample), \( N \) (population)
  • Sample mean: \( \bar{x} \), population mean: \( \mu \)
  • Sample variance: \( s^2 \), population variance: \( \sigma^2 \)
  • Sample SD: \( s \), population SD: \( \sigma \)
  • Complement: \( A^c \)
  • Intersection (“and”): \( A\cap B \), union (“or”): \( A\cup B \)
  • Conditional probability: \( P(A\mid B) \)

1. Basic Probability & Statistics #

1.1 Measures of Central Tendency #

Arithmetic mean #

Sample mean (ungrouped):

Supervised Learning

Supervised Learning #

Trained using labelled data.
Each example in the training set includes the correct output.
The algorithm learns to generalise and make predictions on unseen data.
Typically gives more accurate predictions than unsupervised methods on tasks where labelled targets exist, because the labels guide training directly.
Requires human effort for labelling and setup.
Widely used in practice; how accurate the results are depends on the quality and quantity of the labelled data.


Classification #

Output is discrete (e.g. Yes/No, Spam/Not Spam).
Used for categorising data into predefined classes.
The Support Vector Machine (SVM) is a common classifier; in its basic form it is a linear classifier that separates the classes with the largest possible margin.

Artificial Intelligence

My AI Notes #

Learning how machines learn! My working notes as I learn AI.


flowchart LR
    AI[Artificial Intelligence]
    ML[Machine Learning]
    DL[Deep Learning]
    FM[Foundation Models]
    LLM[LLM Models]

    AI --> ML
    ML --> DL
    DL --> FM
    FM --> LLM

    style AI fill:#E1F5FE
    style ML fill:#C8E6C9
    style DL fill:#90CAF9
    style FM fill:#64B5F6
    style LLM fill:#FFCCBC

  • Mathematical Foundations for Machine Learning
  • Statistical Methods
  • Machine Learning
  • Deep Neural Networks


  • Machine Learning → The broad field where systems learn patterns from data to make predictions or decisions.
  • Neural Networks → A subset of machine learning that uses interconnected artificial neurons to model complex relationships.
  • Deep Learning → A subset of neural networks that uses many hidden layers to learn high-level features from large datasets.
  • Foundation Models → Large deep learning models trained on massive datasets and reused across many tasks using transfer learning.
  • LLMs (Large Language Models) → A specialised type of foundation model focused on understanding and generating human language.

flowchart TD
AI["Artificial<br/>Intelligence"]
ML["Machine<br/>Learning"]
NN["Neural<br/>Networks"]
DL["Deep<br/>Learning"]
FM["Foundation<br/>Models"]
LLM["LLM<br/>Models"]

AI --> ML
ML --> NN
NN --> DL
DL --> FM
FM --> LLM

LR["Linear<br/>Regression"]
DT["Decision<br/>Trees"]
ML --> LR
ML --> DT

MLP["MLP"]
CNN["CNN"]
NN --> MLP
NN --> CNN

CNNDL["CNN<br/>(deep)"]
RNN["RNN"]
DL --> CNNDL
DL --> RNN

BERT["BERT"]
CLIP["CLIP"]
FM --> BERT
FM --> CLIP

GPT["GPT"]
LLAMA["LLaMA"]
LLM --> GPT
LLM --> LLAMA

TEXT["Text"]
IMAGE["Images"]
AUDIO["Audio"]
VIDEO["Video"]
LLM --> TEXT
LLM --> IMAGE
LLM --> AUDIO
LLM --> VIDEO

style AI fill:#90CAF9,stroke:#1E88E5,color:#000
style ML fill:#90CAF9,stroke:#1E88E5,color:#000
style NN fill:#90CAF9,stroke:#1E88E5,color:#000

style DL fill:#CE93D8,stroke:#8E24AA,color:#000
style FM fill:#CE93D8,stroke:#8E24AA,color:#000

style LLM fill:#C8E6C9,stroke:#2E7D32,color:#000
style LR fill:#C8E6C9,stroke:#2E7D32,color:#000
style DT fill:#C8E6C9,stroke:#2E7D32,color:#000
style MLP fill:#C8E6C9,stroke:#2E7D32,color:#000
style CNN fill:#C8E6C9,stroke:#2E7D32,color:#000
style CNNDL fill:#C8E6C9,stroke:#2E7D32,color:#000
style RNN fill:#C8E6C9,stroke:#2E7D32,color:#000
style BERT fill:#C8E6C9,stroke:#2E7D32,color:#000
style CLIP fill:#C8E6C9,stroke:#2E7D32,color:#000
style GPT fill:#C8E6C9,stroke:#2E7D32,color:#000
style LLAMA fill:#C8E6C9,stroke:#2E7D32,color:#000
style TEXT fill:#C8E6C9,stroke:#2E7D32,color:#000
style IMAGE fill:#C8E6C9,stroke:#2E7D32,color:#000
style AUDIO fill:#C8E6C9,stroke:#2E7D32,color:#000
style VIDEO fill:#C8E6C9,stroke:#2E7D32,color:#000

AI, ML, DL, and Data Science Diagram

Stats Formula Sheet

Stats Formula Sheet #

Keep this page as a quick reference of definitions + formulas.


Notation #

  • Sample size: \( n \) (sample), \( N \) (population)
  • Mean: \( \bar{x} \) (sample), \( \mu \) (population)
  • Variance: \( s^2 \) (sample), \( \sigma^2 \) (population)
  • Standard deviation: \( s \) (sample), \( \sigma \) (population)

Module 1: Basic Statistics #

Measures of Central Tendency #

Sample mean (ungrouped):

Partial Differentiation and Gradients

Partial Differentiation and Gradients #

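For a function \( f(x_1, x_2, \dots, x_n) \), the partial derivative with respect to \( x_i \) is obtained by differentiating with respect to \( x_i \) while holding the other variables fixed:

\[ \frac{\partial f}{\partial x_i} \]

Gradient vector:

\[ \nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} \]

The gradient points in the direction of steepest ascent.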
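A gradient computed by hand can be checked numerically with central differences, which approximate each partial derivative by \( \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x} - h\mathbf{e}_i)}{2h} \). The sketch below is a checking aid only; `numerical_gradient` and the example function are illustrative choices, not part of the course material.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with central differences."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h     # nudge only the i-th coordinate up
        backward[i] -= h    # and down, holding the others fixed
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def f(x):
    # Example function: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]

# Analytic gradient is [2*x1 + 3*x2, 3*x1], which is [8, 3] at (1, 2)
g = numerical_gradient(f, [1.0, 2.0])
```

The numerical values should agree with the analytic gradient up to a small rounding error, which makes this a quick way to catch sign or index mistakes.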

flowchart LR
    Input --> Function
    Function --> Gradient
    Gradient --> Optimisation

Linear Independence

Linear Independence #

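A set of vectors \( \mathbf{v}_1, \dots, \mathbf{v}_k \) is linearly independent if none of them can be written as a linear combination of the others. Equivalently, the only linear combination that gives the zero vector is the trivial one: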

\[ c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k = \mathbf{0} \;\Rightarrow\; c_1=\cdots=c_k=0 \]

Independence means each vector adds new information.
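One practical way to test independence is to stack the vectors as rows of a matrix and compare its rank with the number of vectors: they are independent exactly when the rank equals the count. The following is a minimal sketch using Gaussian elimination with a numerical tolerance; the function names `rank` and `is_linearly_independent` and the tolerance value are assumptions made for illustration.

```python
def rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[float(v) for v in r] for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        if r == n_rows:
            break
        # Pick the largest entry in this column as the pivot
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True when the rank equals the number of vectors."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # True
print(is_linearly_independent([[1, 2, 3], [2, 4, 6]]))             # False
```

In the second call the second vector is twice the first, so it adds no new information and the rank is 1.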