Instance-based Learning
k-Nearest Neighbor Learning
Distance / similarity choices
Cosine similarity vs Euclidean distance
Cosine similarity compares direction (angle), not magnitude:
\[ \cos(\theta)=\frac{x\cdot y}{\|x\|\|y\|} \]
When to use cosine:
- text/document vectors (high-dimensional, sparse)
- when vector length should not dominate similarity
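A minimal sketch of the cosine formula above, in plain Python. The vectors `doc_a`, `doc_b`, and `doc_c` are made-up term counts used only for illustration, and the choice to raise an error on a zero vector is an assumption rather than a fixed convention.

```python
import math

def cosine_similarity(x, y):
    """Return cos(theta) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption: treat a zero vector as an error, since the angle is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Hypothetical term-count vectors: doc_b is doc_a with every count tripled.
doc_a = [2, 0, 1, 0]
doc_b = [6, 0, 3, 0]
doc_c = [0, 4, 0, 1]

sim_ab = cosine_similarity(doc_a, doc_b)  # same direction, different length
sim_ac = cosine_similarity(doc_a, doc_c)  # no terms in common
print(sim_ab, sim_ac)
```

Here `doc_b` points in the same direction as `doc_a`, so their similarity is 1 (up to floating-point rounding) even though `doc_b` is three times longer, while `doc_a` and `doc_c` share no terms and score 0.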
Euclidean distance measures straight-line distance:
\[ d(x,y)=\|x-y\| \]
When to use Euclidean:
- continuous features on comparable scales (often after normalisation)
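The point about comparable scales can be illustrated with a small sketch. The data below are invented (height in metres, income in dollars), and min-max scaling is just one possible normalisation, chosen here as an assumption; the helper names `euclidean_distance`, `min_max_scale`, and `nearest_index` are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance ||x - y|| between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest_index(rows, query_idx):
    """Index of the row closest to rows[query_idx], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: euclidean_distance(rows[query_idx], rows[i]))

# Hypothetical data: (height in metres, income in dollars); row 0 is the query.
points = [
    [1.60, 30000.0],
    [1.95, 30500.0],
    [1.62, 32000.0],
    [1.75, 60000.0],
]

raw_nearest = nearest_index(points, 0)
scaled_nearest = nearest_index(min_max_scale(points), 0)
print(raw_nearest, scaled_nearest)
```

On the raw data the income column dominates the distance, so the nearest neighbour of row 0 is row 1 despite a 35 cm height difference. After scaling both columns to [0, 1], row 2 (similar height and similar income) becomes the nearest neighbour.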