Challenges in Gradient-Based Optimisation
- Local optima and flat regions
- Differential curvature
- Difficult topologies (cliffs and valleys)
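To make differential curvature concrete, here is a minimal sketch. It assumes a toy quadratic f(x, y) = 0.5(100x² + y²), chosen for illustration, whose curvature along x is 100 times its curvature along y. Gradient descent on a quadratic is only stable when the learning rate is below 2 divided by the largest curvature (here 2/100). With a learning rate just under that limit, x flips sign on every step while y creeps slowly towards the minimum. The function name `gd_ill_conditioned` is hypothetical.

```python
def gd_ill_conditioned(lr=0.019, steps=50, a=100.0, b=1.0):
    """Plain gradient descent on the toy quadratic f(x, y) = 0.5 * (a*x**2 + b*y**2).

    Curvature is a along x and b along y (a/b = 100 with the defaults).
    Returns the final x, y and the trajectory of x.
    """
    x, y = 1.0, 1.0
    x_history = []
    for _ in range(steps):
        gx, gy = a * x, b * y  # gradient of f
        x -= lr * gx           # with the defaults, x is multiplied by 1 - 1.9 = -0.9: sign flips each step
        y -= lr * gy           # with the defaults, y is multiplied by 0.981: slow progress
        x_history.append(x)
    return x, y, x_history


x, y, xs = gd_ill_conditioned()
print(f"x = {x:.4f}, y = {y:.4f}")
```

After 50 steps x has shrunk to about 0.005, but y is still about 0.38. The steep direction caps the learning rate, and the flat direction pays for it.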
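SGD estimates the gradient on small mini-batches instead of the full dataset. It trades exact gradients for much cheaper updates, and the resulting noise can also help generalisation.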
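A minimal sketch of mini-batch SGD follows. It assumes a toy one-feature linear regression y ≈ w·x + b with squared-error loss and noise-free synthetic data. The function name, batch size and learning rate are illustrative choices, not taken from this page. Each update uses the gradient averaged over a small random batch, so it is cheap but only an estimate of the full-data gradient.

```python
import random


def minibatch_sgd(xs, ys, lr=0.1, batch_size=8, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # fresh random batches every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the batch-mean squared error: an estimate of the full gradient
            gw = gb = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b


# Hypothetical, noise-free data on the line y = 3x + 2
xs = [i / 50 for i in range(-50, 51)]
ys = [3.0 * x + 2.0 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because the synthetic points lie exactly on a line, every batch gradient vanishes at the optimum and the estimates settle on w = 3, b = 2. With noisy data, a constant learning rate would instead leave the iterates jittering around the optimum.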
Momentum smooths updates and helps traverse valleys efficiently.
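The effect of momentum can be sketched on a toy ill-conditioned quadratic, again assuming f(x, y) = 0.5(100x² + y²) purely for illustration. This uses the heavy-ball form v ← βv − η∇f, θ ← θ + v, which is one common way of writing momentum. Setting β = 0 recovers plain gradient descent. The helper `steps_to_converge` is hypothetical. It counts steps until both coordinates are within a tolerance of the minimum.

```python
def steps_to_converge(lr, beta, a=100.0, b=1.0, tol=1e-3, max_steps=10_000):
    """Heavy-ball momentum on the toy quadratic f(x, y) = 0.5 * (a*x**2 + b*y**2).

    beta = 0 gives plain gradient descent. Returns the first step at which
    both |x| and |y| are below tol (or max_steps if that never happens).
    """
    x, y = 1.0, 1.0
    vx = vy = 0.0
    for t in range(1, max_steps + 1):
        # Velocity: exponentially decaying sum of past gradient steps
        vx = beta * vx - lr * a * x
        vy = beta * vy - lr * b * y
        x += vx
        y += vy
        if abs(x) < tol and abs(y) < tol:
            return t
    return max_steps


print("plain GD:", steps_to_converge(lr=0.019, beta=0.0))
print("momentum:", steps_to_converge(lr=0.019, beta=0.9))
```

With these settings, plain gradient descent needs a few hundred steps because progress along the flat y-direction is slow. With β = 0.9 it finishes in well under 200 steps, because the velocity keeps building up along the consistent gradient on the valley floor.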
Adaptive methods (such as AdaGrad, RMSProp and Adam) adjust the learning rate separately for each parameter.
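As one example of an adaptive method, here is a minimal sketch of Adam for a plain list of scalar parameters. It uses the commonly quoted defaults β₁ = 0.9, β₂ = 0.999 and ε = 1e-8. The gradient function is for a hypothetical ill-conditioned quadratic f(x, y) = 0.5(100x² + y²). Each parameter's step is divided by a running root-mean-square of its own gradients, so the effective learning rate differs per parameter.

```python
import math


def adam(grad_fn, params, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1):
    """Minimal Adam sketch for a list of scalar parameters."""
    p = list(params)
    m = [0.0] * len(p)  # running mean of gradients (first moment)
    v = [0.0] * len(p)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(p)
        for i in range(len(p)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # correct the bias from zero initialisation
            v_hat = v[i] / (1 - beta2 ** t)
            # Each parameter's step is scaled by its own gradient history
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p


def grad(p):
    """Gradient of the hypothetical quadratic f(x, y) = 0.5 * (100*x**2 + y**2)."""
    return [100.0 * p[0], p[1]]


def f(p):
    return 0.5 * (100.0 * p[0] ** 2 + p[1] ** 2)


print(adam(grad, [1.0, 1.0], lr=0.1, steps=1))
print(f(adam(grad, [1.0, 1.0], lr=0.01, steps=500)))
```

The first call moves both coordinates from 1.0 to about 0.9, even though their gradients differ 100-fold. Plain gradient descent with the same learning rate of 0.1 would send x to -9.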
PCA and SVM connect linear algebra, geometry, and optimisation.
Dimensionality reduction means representing high-dimensional data using fewer dimensions while trying to preserve the important structure of the data.
Principal Components Analysis, or PCA, is a linear dimensionality reduction method. It finds mutually orthogonal directions along which the variance of the data is maximal, and projects the data onto those directions.
Key takeaway: PCA chooses the eigenvectors of the covariance matrix corresponding to the largest eigenvalues. These eigenvectors span the principal subspace. Each eigenvalue equals the variance of the data along its eigenvector, so the eigenvectors with the largest eigenvalues are the directions that preserve the most variance.
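The recipe in the takeaway can be sketched in plain Python: centre the data, form the sample covariance matrix, find its top eigenvector, and project onto it. To stay dependency-free, this sketch swaps in power iteration for a full eigendecomposition, so it recovers only the first principal component. The function names and the toy data set (points on the line y = 2x) are illustrative assumptions.

```python
import math
import random


def covariance(data):
    """Centre the data and return (sample covariance matrix, centred rows)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centred


def top_eigenvector(mat, iters=200, seed=0):
    """Power iteration: repeatedly apply mat and renormalise.

    Converges to the eigenvector of the largest eigenvalue, provided that
    eigenvalue is strictly largest and the start vector is not orthogonal to it.
    """
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T mat v gives the matching eigenvalue (the variance along v)
    eigval = sum(v[i] * mat[i][j] * v[j] for i in range(d) for j in range(d))
    return eigval, v


def pca_first_component(data):
    cov, centred = covariance(data)
    eigval, v = top_eigenvector(cov)
    projected = [sum(r[j] * v[j] for j in range(len(v))) for r in centred]
    explained = eigval / sum(cov[i][i] for i in range(len(cov)))  # share of total variance
    return v, projected, explained


data = [[x, 2.0 * x] for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
v, z, ratio = pca_first_component(data)
print(v, z, ratio)
```

For points exactly on y = 2x the covariance matrix has rank one, so the first component is (1, 2)/√5 and it explains all of the variance. For real data, a library routine such as `numpy.linalg.eigh` would be the usual choice.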
A curated list of high-quality online courses to learn Artificial Intelligence, Machine Learning, and Deep Learning from reputable universities and organisations.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Charniak, E. (2019). Introduction to Deep Learning. MIT Press.
Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning Publications.
This page covers the main concepts of data preprocessing and model development.
A complete ML pipeline includes preprocessing, feature engineering, feature selection, and model training.
Raw data is often:
Preprocessing ensures data is suitable for machine learning.
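Two common preprocessing steps are filling in missing values and putting features on a common scale. They can be sketched as follows, assuming missing entries are encoded as `None`. The sketch uses column-mean imputation followed by z-score standardisation. Both are illustrative choices, and `impute_and_standardise` is a hypothetical name. The right methods depend on why values are missing and on the model being trained.

```python
import math


def impute_and_standardise(rows):
    """Fill missing values (None) with the column mean, then standardise each
    column to zero mean and unit (population) standard deviation."""
    d = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(d)]
    out_cols = []
    for col in cols:
        observed = [v for v in col if v is not None]
        mean = sum(observed) / len(observed)
        filled = [mean if v is None else v for v in col]  # mean imputation keeps the mean unchanged
        sd = math.sqrt(sum((v - mean) ** 2 for v in filled) / len(filled))
        # Guard against constant columns, which have zero spread
        out_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in filled])
    return [list(r) for r in zip(*out_cols)]


print(impute_and_standardise([[1.0, 10.0], [2.0, None], [3.0, 30.0]]))
```

In practice the mean and standard deviation should be computed on the training split only and then reused to transform validation and test data, so that no information leaks from those sets.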
Why they occur
Methods