Characteristic Polynomial #
The characteristic polynomial of a square matrix is the key tool used to compute eigenvalues.
It connects:
- Determinants
- Trace
- Eigenvalues
- Matrix structure
Definition #
Let \( A \in \mathbb{R}^{n \times n} \) and \( \lambda \in \mathbb{R} \).
The characteristic polynomial of \( A \) is defined as:
\[ p_A(\lambda) = \det(A - \lambda I) \]
It is a polynomial in \( \lambda \) of degree \( n \).
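As a quick numerical check of the definition (a minimal sketch using NumPy; the example matrix is arbitrary), \( p_A(\lambda) \) can be evaluated pointwise:

```python
import numpy as np

# Arbitrary example matrix for illustration
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def char_poly(A, lam):
    """Evaluate p_A(lambda) = det(A - lambda * I) at a given lambda."""
    n = A.shape[0]
    return np.linalg.det(A - lam * np.eye(n))

# For this 2x2 matrix, p_A(lambda) = (2 - lambda)(3 - lambda) - 1
#                                  = lambda^2 - 5*lambda + 5
print(char_poly(A, 0.0))  # equals det(A) = 5
```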
General Form #
We can show that:
\[ p_A(\lambda) = c_0 + c_1 \lambda + \dots + c_{n-1} \lambda^{n-1} + (-1)^n \lambda^n \]
where \( c_0, c_1, \dots, c_{n-1} \in \mathbb{R} \).
Important Coefficients #
Two coefficients are especially important.
1. Constant Term #
Set \( \lambda = 0 \):
\[ p_A(0) = \det(A) \]
So:
\( c_0 = \det(A) \)
2. Coefficient of \( \lambda^{n-1} \) #
One can show (via determinant expansion) that:
\[ c_{n-1} = (-1)^{n-1} \operatorname{tr}(A) \]
So the trace of a matrix appears directly inside the characteristic polynomial.
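Both coefficient identities can be verified numerically (a sketch using NumPy; note that `np.poly(A)` returns the monic coefficients of \( \det(\lambda I - A) \), which differs from \( p_A(\lambda) \) by a factor of \( (-1)^n \)):

```python
import numpy as np

# Arbitrary 3x3 example: tr(A) = 6, det(A) = 10
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0]])
n = A.shape[0]

# np.poly(A) gives det(lambda*I - A), highest degree first;
# multiplying by (-1)^n converts to p_A(lambda) = det(A - lambda*I).
coeffs = (-1) ** n * np.poly(A)   # [(-1)^n, c_{n-1}, ..., c_1, c_0]

c0 = coeffs[-1]          # constant term
c_n_minus_1 = coeffs[1]  # coefficient of lambda^(n-1)

print(np.isclose(c0, np.linalg.det(A)))                    # True
print(np.isclose(c_n_minus_1, (-1)**(n-1) * np.trace(A)))  # True
```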
Leading Term #
The highest-degree term is always:
\[ (-1)^n \lambda^n \]
So:
The characteristic polynomial always has degree \( n \).
Why These Coefficients Appear (Intuition from Expansion) #
Consider a \( 3 \times 3 \) matrix:
\[ A - \lambda I = \begin{bmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ a_{21} & a_{22} - \lambda & a_{23} \\ a_{31} & a_{32} & a_{33} - \lambda \end{bmatrix} \]
When expanding the determinant:
The product
\( \prod_{i=1}^{3} (a_{ii} - \lambda) \)
generates the highest powers of \( \lambda \). Other determinant terms contain fewer factors of \( \lambda \) and therefore produce lower powers.
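This expansion can be carried out symbolically (a sketch using SymPy with a fully symbolic \( 3 \times 3 \) matrix) to read off the top coefficients directly:

```python
import sympy as sp

lam = sp.symbols('lambda')

# Fully symbolic 3x3 matrix with entries a11, a12, ..., a33
a = sp.Matrix(3, 3, lambda i, j: sp.Symbol(f'a{i + 1}{j + 1}'))

# Expand det(A - lambda*I) into a polynomial in lambda
p = sp.expand((a - lam * sp.eye(3)).det())

print(p.coeff(lam, 3))  # -1, i.e. (-1)^3
print(p.coeff(lam, 2))  # a11 + a22 + a33, i.e. (-1)^2 * tr(A)
```

Only the diagonal product contributes to the two highest powers; every other term of the expansion stops at \( \lambda^1 \) or below for \( n = 3 \).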
General Case (n × n) #
When expanding along the first row:
The term
\( \prod_{i=1}^{n}(a_{ii} - \lambda) \)
produces powers up to \( \lambda^n \). All other expansion terms contain minors from which at least one \( (a_{ii} - \lambda) \) factor has been removed.
So:
- Only this diagonal product produces the \( \lambda^n \) term
- Only this same product produces the \( \lambda^{n-1} \) term: any expansion term that picks an off-diagonal entry \( a_{ij} \) loses both \( (a_{ii} - \lambda) \) and \( (a_{jj} - \lambda) \), so it has degree at most \( n - 2 \)
Which leads to:
\[ \text{Coefficient of } \lambda^n = (-1)^n \]
and
\[ \text{Coefficient of } \lambda^{n-1} = (-1)^{n-1} \sum_{i=1}^{n} a_{ii} = (-1)^{n-1} \operatorname{tr}(A) \]
Connection to Eigenvalues #
The eigenvalues of \( A \) are exactly the roots of the characteristic polynomial:
\[ \det(A - \lambda I) = 0 \]
Solving this equation gives all eigenvalues of \( A \).
Important Consequences #
From the polynomial structure we obtain:
1️⃣ Product of Eigenvalues #
\[ \prod_{i=1}^{n} \lambda_i = \det(A) \]
2️⃣ Sum of Eigenvalues #
\[ \sum_{i=1}^{n} \lambda_i = \operatorname{tr}(A) \]
Both identities follow from Vieta's formulas relating a polynomial's coefficients to its roots, with eigenvalues counted with algebraic multiplicity (and, in general, taken over \( \mathbb{C} \)).
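Both identities are easy to confirm numerically (a sketch using NumPy; the matrix is arbitrary):

```python
import numpy as np

# Arbitrary example: eigenvalues of [[4,1],[2,3]] are 5 and 2
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eig = np.linalg.eigvals(A)

print(np.isclose(np.prod(eig), np.linalg.det(A)))  # True: 5 * 2 = 10 = det(A)
print(np.isclose(np.sum(eig),  np.trace(A)))       # True: 5 + 2 = 7 = tr(A)
```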
Why This Matters in Machine Learning #
Characteristic polynomials are used to:
- Compute eigenvalues (PCA, SVD foundations)
- Analyse stability of systems
- Understand diagonalisation
- Study covariance matrices
- Analyse Hessians in optimisation
Every time you compute eigenvalues, you are solving
\( \det(A - \lambda I) = 0 \).
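In practice, libraries solve this root-finding problem for you; the two routes agree (a sketch using NumPy — `np.roots` on the characteristic coefficients versus `np.linalg.eigvals` directly; the matrix is arbitrary):

```python
import numpy as np

# Arbitrary symmetric example; eigenvalues are {1, 3, 3}
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])

# Roots of det(lambda*I - A) = 0 -- the same roots as det(A - lambda*I) = 0,
# since the two polynomials differ only by a factor of (-1)^n.
roots = np.roots(np.poly(A))
eig = np.linalg.eigvals(A)

print(np.allclose(np.sort(roots), np.sort(eig)))  # True
```

Note that production eigensolvers do not actually form the polynomial (root-finding on coefficients is numerically fragile); they work on the matrix directly.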
Summary #
- The characteristic polynomial is \( p_A(\lambda) = \det(A - \lambda I) \)
- It is a polynomial of degree \( n \)
- Constant term = determinant: \( c_0 = \det(A) \)
- The \( \lambda^{n-1} \) coefficient involves the trace: \( c_{n-1} = (-1)^{n-1} \operatorname{tr}(A) \)
- Roots of the polynomial are the eigenvalues