Stochastic Gradient Descent (SGD)
SGD uses mini-batches to trade exact gradients for speed and generalisation.
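A minimal sketch of one SGD step (NumPy only; the least-squares loss, synthetic data, and step size below are illustrative assumptions, not part of these notes):

import numpy as np

def sgd_step(w, X, y, rng, lr=0.01, batch_size=32):
    # One SGD step for linear least squares: L(w) = mean((X w - y)^2) / 2.
    idx = rng.choice(len(X), size=batch_size, replace=False)  # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch_size                  # noisy gradient estimate
    return w - lr * grad                                      # descend along the estimate

# Usage: recover a random linear model from noisy observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)
w = np.zeros(5)
for _ in range(500):
    w = sgd_step(w, X, y, rng)

Each step looks at only 32 of the 1000 examples, so the gradient is cheap but noisy; averaged over many steps the updates still drive w towards w_true.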
Momentum-Based Learning
Momentum smooths updates and helps traverse valleys efficiently.
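A sketch of the classical heavy-ball update (the toy gradient and the coefficient beta = 0.9 are illustrative assumptions):

import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # The velocity v is an exponentially weighted sum of past gradients:
    # it damps oscillation across a valley's steep walls while
    # accelerating along the flat valley floor.
    v = beta * v + grad(w)
    return w - lr * v, v

# Usage: an ill-conditioned quadratic, i.e. a narrow valley.
grad = lambda w: np.array([1.0, 50.0]) * w
w, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(200):
    w, v = momentum_step(w, v, grad)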
Adaptive Methods: AdaGrad, RMSProp, Adam
Adaptive methods adjust the learning rate for each parameter individually, based on the history of its gradients.
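A sketch of the Adam update (the defaults lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8 follow the original Adam paper; the toy gradient is an illustrative assumption):

import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g         # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * g ** 2    # second moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)         # bias corrections; t starts at 1
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Usage: parameters with very different gradient scales still
# receive comparable effective step sizes.
grad = lambda w: np.array([1.0, 100.0]) * w
w, m, v = np.array([1.0, 1.0]), np.zeros(2), np.zeros(2)
for t in range(1, 2001):
    w, m, v = adam_step(w, grad(w), m, v, t)

AdaGrad and RMSProp differ mainly in the second-moment term: AdaGrad accumulates all past squared gradients, while RMSProp (like Adam) uses an exponentially decaying average.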
Tuning Hyperparameters and Preprocessing
- Learning rate schedules (see the sketch after this list)
- Initialisation
- Tuning hyperparameters
- Importance of feature preprocessing
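As a sketch of the first item, two common learning rate schedules (the constants are illustrative assumptions):

import math

def step_decay(lr0, epoch, drop=0.5, every=30):
    # Multiply the rate by `drop` every `every` epochs.
    return lr0 * drop ** (epoch // every)

def cosine_schedule(lr0, epoch, total_epochs):
    # Anneal smoothly from lr0 down to 0 over the whole run.
    return lr0 * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))

# Usage: compare the two over a 90-epoch run.
for epoch in (0, 30, 60, 89):
    print(epoch, step_decay(0.1, epoch), cosine_schedule(0.1, epoch, 90))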
Dimensionality reduction and PCA
PCA and SVM connect linear algebra, geometry, and optimisation.
Principal Component Analysis (PCA)
- A dimensionality reduction technique: it reduces the number of features in a dataset while keeping the most important information.
- Simplifies complex datasets by transforming correlated features into a smaller set of uncorrelated components.
- Uses linear algebra to transform the data into new features called principal components.
- Finds these by computing eigenvectors (directions) and eigenvalues (importance) of the covariance matrix.
- Selects the top components with the highest eigenvalues and projects the data onto them to simplify the dataset.
PCA prioritises the directions where the data varies the most, because more variation means more useful information.
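A NumPy-only sketch that mirrors these steps (the synthetic data are an illustrative assumption):

import numpy as np

def pca(X, k):
    Xc = X - X.mean(axis=0)           # centre the data
    S = np.cov(Xc, rowvar=False)      # covariance matrix
    vals, vecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]    # sort by eigenvalue (importance)
    W = vecs[:, order[:k]]            # top-k eigenvectors (directions)
    return Xc @ W                     # project onto the principal components

# Usage: 3-D data that really lives near a 2-D plane.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(200, 3))
X2 = pca(X, k=2)                      # keeps almost all of the variance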
PCA Theory
- Problem setting
- Maximum variance perspective (sketched in the equation after this list)
- Projection perspective
- Eigenvector computation and low-rank approximations
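As a one-equation sketch of the maximum variance perspective (the notation b_1, S is assumed here, not fixed by this outline), the first principal direction solves

\max_{\lVert b_1 \rVert = 1} \operatorname{Var}(b_1^\top x)
  = \max_{\lVert b_1 \rVert = 1} b_1^\top S\, b_1,
\qquad
S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^\top,

and the maximiser is the eigenvector of the covariance matrix S with the largest eigenvalue \lambda_1, which is itself the variance captured.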
PCA in Practice
Key steps of PCA in practice, including considerations in high dimensions.
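A sketch of those steps with scikit-learn (the data and the choice of three components are illustrative assumptions):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) * rng.uniform(0.1, 10.0, size=10)  # features on very different scales

X_std = StandardScaler().fit_transform(X)  # 1. standardise: zero mean, unit variance
pca = PCA(n_components=3)                  # 2. choose the number of components
X_proj = pca.fit_transform(X_std)          # 3. project onto them
print(pca.explained_variance_ratio_)       # variance captured by each component

In high dimensions (many more features than samples) it is cheaper to eigendecompose the N x N matrix X X^T than the D x D covariance matrix; SVD-based implementations such as scikit-learn's avoid forming the covariance matrix at all.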
Latent Variable Perspective
PCA can be interpreted as modelling data using a smaller number of latent variables.
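Concretely, in the standard probabilistic PCA model (notation assumed here), observed data x \in \mathbb{R}^D are generated from an M-dimensional latent code z with M < D:

z \sim \mathcal{N}(0, I_M), \qquad
x = B z + \mu + \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, \sigma^2 I_D),

where B \in \mathbb{R}^{D \times M}; classical PCA is recovered in the noise-free limit \sigma^2 \to 0.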
Mathematical Preliminaries of SVM
- Primal and dual perspectives (the primal problem is sketched below)
- Geometry of margins
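As a sketch, the hard-margin primal problem (notation assumed here) is

\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad
y_n (w^\top x_n + b) \ge 1, \quad n = 1, \dots, N,

where each constraint keeps example n on the correct side of the hyperplane. The margin, the distance from the hyperplane to the closest example, equals 1 / \lVert w \rVert, so minimising \lVert w \rVert^2 maximises it; introducing a Lagrange multiplier \alpha_n \ge 0 for each constraint yields the dual problem.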