Support Vector Machine (SVM) #
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for:
- Classification (most common)
- Regression (SVR – Support Vector Regression)
Its goal is to find the decision boundary that separates the classes with the maximum margin.
In one sentence: a Support Vector Machine is a supervised learning algorithm that finds an optimal hyperplane by maximising the margin between classes, using support vectors and kernel functions to handle non-linear data.
Intuition #
Imagine two groups of points on a graph.
Many lines can separate them, but SVM asks:
Which boundary stays as far away as possible from both groups?
That safest boundary is the optimal hyperplane.
A larger margin usually leads to better generalisation on unseen data.
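To make this concrete, here is a minimal sketch of fitting a linear SVM classifier with scikit-learn; the synthetic dataset and parameters are purely illustrative.

```python
# Minimal sketch: a linear SVM on a small synthetic 2D dataset.
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two well-separated clusters of points in 2D
X, y = make_blobs(n_samples=200, centers=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Linear kernel: find the maximum-margin separating line
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
```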
Key Concepts #
Hyperplane #
- The decision boundary
- In 2D → a line
- In 3D → a plane
- In higher dimensions → a hyperplane
Margin #
- Distance between the hyperplane and the nearest points
- SVM maximises this margin
Larger margin = better generalisation
Support Vectors #
- The closest points to the hyperplane
- They define the decision boundary
- Removing them changes the model
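As a sketch of these concepts in code: a fitted linear SVM in scikit-learn exposes the hyperplane coefficients and the support vectors directly (the dataset below is synthetic and only for illustration).

```python
# Sketch: inspecting the hyperplane and support vectors of a fitted linear SVM.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane (w . x + b = 0)
b = clf.intercept_[0]   # bias term
print("Hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))

# Only these points define the boundary; removing any other point
# would leave the fitted model unchanged.
print("Support vectors per class:", clf.n_support_)
print("Support vectors:\n", clf.support_vectors_)
```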
Linearly Separable Data #
Data is linearly separable if a straight line (or hyperplane) can perfectly separate the classes.
- No misclassification
- Clean decision boundary
- Ideal case for SVM
This leads to Hard-Margin SVM.
Hard-Margin vs Soft-Margin SVM #
Hard-Margin SVM #
- Perfect separation
- No errors allowed
- Works only when data is clean and separable
Soft-Margin SVM #
- Allows some misclassification
- Controlled by parameter C
| C value | Behaviour |
|---|---|
| Large C | Penalises misclassification heavily → narrower margin, tight fit to training data, risk of overfitting |
| Small C | Tolerates more misclassification → wider margin, usually better generalisation |
Soft-margin SVM is used in practice.
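A rough sketch of how C changes the fit, assuming scikit-learn and an illustrative synthetic dataset with some class overlap (the C values shown are arbitrary examples):

```python
# Sketch: effect of the soft-margin parameter C on a noisy dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, n_clusters_per_class=1,
                           flip_y=0.1, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C:>6}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
```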
Non-Linearly Separable Data #
Many real-world datasets cannot be separated using a straight line.
Examples:
- Concentric circles
- XOR-like patterns
- Complex text or image data
In such cases, a linear boundary is insufficient.
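As a quick illustration (synthetic concentric circles via scikit-learn's `make_circles`; the numbers are illustrative), a linear SVM performs poorly on such data:

```python
# Sketch: a linear boundary fails on concentric circles.
# Accuracy near 0.5 means the model is barely better than guessing.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
linear_clf = SVC(kernel="linear").fit(X, y)
print("Linear kernel accuracy on circles:", linear_clf.score(X, y))
```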
Kernel Trick (Mercer’s Idea) #
The Problem #
Data is not linearly separable in the original feature space.
The Solution #
Map the data into a higher-dimensional space where it becomes separable.
The Key Insight #
This mapping can be done implicitly using a kernel function, without explicitly computing the higher dimension.
This idea is known as the Kernel Trick.
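A small worked sketch of the idea, using the degree-2 polynomial kernel K(x, z) = (x · z)² in 2D, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²); the specific vectors below are arbitrary examples:

```python
# Sketch of the kernel trick: the kernel K(x, z) = (x . z)^2 equals the
# dot product phi(x) . phi(z) in a 3D feature space, computed without
# ever constructing phi(x) explicitly.
import numpy as np

def phi(x):
    """Explicit map into the 3D feature space (only to verify the identity)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = np.dot(x, z) ** 2          # implicit: stays in 2D
explicit_value = np.dot(phi(x), phi(z))   # explicit: works in 3D

print(kernel_value, explicit_value)       # both print 1.0 -> (1*3 + 2*(-1))^2 = 1
```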
Common Kernels #
- Linear – when data is already separable
- Polynomial – captures interactions
- [RBF (Radial Basis Function / Gaussian)](https://www.geeksforgeeks.org/machine-learning/radial-basis-function-kernel-machine-learning/) – most popular
- Sigmoid
The kernel trick allows SVMs to create non-linear decision boundaries while still using a linear algorithm in a transformed space.
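Continuing the circles example from above, the RBF kernel handles the data that the linear kernel could not; a minimal sketch (the `gamma="scale"` setting shown is scikit-learn's default):

```python
# Sketch: the RBF kernel separates the concentric-circles data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print("RBF kernel accuracy on circles:", rbf_clf.score(X, y))  # close to 1.0
```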
SVM for Regression (SVR) #
SVM can also predict continuous values.
- Fits a function inside an ε-tube
- Errors inside the tube are ignored
- Only points outside the margin influence the model
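A minimal SVR sketch, assuming scikit-learn and a noisy synthetic sine curve (the C and epsilon values are illustrative):

```python
# Sketch: Support Vector Regression on a noisy sine curve.
# epsilon sets the width of the tube; points inside it contribute no loss.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(100, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("R^2 on training data:", svr.score(X, y))
print("Support vectors used:", len(svr.support_vectors_))
```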
Applications of SVM #
SVMs are effective for both structured and unstructured data.
Structured Data #
- Tabular datasets
- Finance and risk scoring
- Bioinformatics
Unstructured Data #
- Text classification
- Image recognition
- Handwriting and OCR
- Spam detection
They work especially well in high-dimensional spaces.
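As a sketch of the text-classification case, a common pattern is TF-IDF features followed by a linear SVM (the tiny corpus and labels below are made up purely for illustration):

```python
# Sketch: spam detection with TF-IDF features and a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["win a free prize now", "meeting rescheduled to friday",
         "cheap loans click here", "project update attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)
print(model.predict(["claim your free loan prize"]))  # likely [1]
```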
SVM vs Neural Networks #
| Aspect | SVM | Neural Networks |
|---|---|---|
| Data size | Small–medium | Large |
| Training | Convex (global optimum) | Non-convex |
| Feature learning | ❌ No | ✅ Yes |
| Interpretability | Medium | Low |
Summary #
SVM draws the safest possible boundary between classes.