Understanding Eigenvalues and Eigenvectors in Machine Learning
Master eigenvalues and eigenvectors for ML - geometric intuition, mathematical foundations, PCA connection, and Python implementations with real-world applications.
Reading Time: 16 minutes | Word Count: 4,000+ | Difficulty: Intermediate
“The eigenvectors of a matrix are the directions in which the linear transformation acts simply—by stretching.” — David C. Lay
If you’ve ever used PCA, spectral clustering, or wondered how Google’s PageRank works, you’ve encountered eigenvalues and eigenvectors. These concepts might seem abstract, but they’re fundamental to many machine learning algorithms.
In this guide, we’ll build intuition from the ground up, connect the math to practical ML applications, and implement everything in Python.
Prerequisites
Before diving in, make sure you understand:
- Vector and matrix basics (see our Linear Algebra Guide)
- Matrix multiplication
- Basic Python and NumPy
What Are Eigenvectors and Eigenvalues?
The Core Idea
When you multiply a matrix by a vector, the result is usually a vector pointing in a different direction:
$$\mathbf{A} \cdot \mathbf{v} = \mathbf{w} \quad \text{(different direction)}$$
But some special vectors, called eigenvectors, only get stretched or compressed—they keep pointing in the same direction:
$$\mathbf{A} \cdot \mathbf{v} = \lambda \mathbf{v} \quad \text{(same direction, scaled by } \lambda \text{)}$$
- $\mathbf{v}$ is an eigenvector of matrix $\mathbf{A}$
- $\lambda$ (lambda) is the corresponding eigenvalue (the scaling factor)
Visual Intuition
import numpy as np
import matplotlib.pyplot as plt
# Define a transformation matrix
A = np.array([[3, 1],
              [1, 3]])
# Regular vector - changes direction
v_regular = np.array([1, 0])
Av_regular = A @ v_regular
# Eigenvector - only changes magnitude
eigenvalues, eigenvectors = np.linalg.eig(A)
v_eigen = eigenvectors[:, 0] # First eigenvector
Av_eigen = A @ v_eigen
print(f"Regular vector: {v_regular} → {Av_regular}")
print(f"Direction changed: {np.arctan2(v_regular[1], v_regular[0]):.2f} → {np.arctan2(Av_regular[1], Av_regular[0]):.2f} radians")
print(f"\nEigenvector: {v_eigen}")
print(f"After transformation: {Av_eigen}")
print(f"Eigenvalue (scaling factor): {eigenvalues[0]:.2f}")
print(f"v_eigen * eigenvalue: {v_eigen * eigenvalues[0]}") # Should match Av_eigen
The Eigenvalue Equation
Mathematically, we’re looking for solutions to:
$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$
Rearranging: $$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$
For non-trivial solutions (non-zero $\mathbf{v}$), we need:
$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
This determinant equation gives us the characteristic polynomial, whose roots are the eigenvalues.
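For a 2×2 matrix this works out neatly: the characteristic polynomial is $\lambda^2 - \text{trace}(\mathbf{A})\lambda + \det(\mathbf{A})$. Here is a minimal sketch (using the matrix from the visual-intuition example above) confirming that the roots of this polynomial match what np.linalg.eig returns:
import numpy as np

A = np.array([[3, 1],
              [1, 3]])

# For a 2×2 matrix: det(A - λI) = λ² - trace(A)·λ + det(A)
coeffs = [1, -np.trace(A), np.linalg.det(A)]
print("Roots of characteristic polynomial:", np.roots(coeffs))  # [4. 2.]
print("Eigenvalues from np.linalg.eig:    ", np.linalg.eig(A)[0])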
Computing Eigenvalues and Eigenvectors
By Hand (2×2 Example)
Let’s find the eigenvalues and eigenvectors of: $$\mathbf{A} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$$
Step 1: Set up the characteristic equation $$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
$$\det\begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix} = 0$$
$$(4-\lambda)(3-\lambda) - 2 \cdot 1 = 0$$ $$\lambda^2 - 7\lambda + 10 = 0$$ $$(\lambda - 5)(\lambda - 2) = 0$$
Eigenvalues: $\lambda_1 = 5$, $\lambda_2 = 2$
Step 2: Find eigenvectors for each eigenvalue
For $\lambda_1 = 5$: $$(\mathbf{A} - 5\mathbf{I})\mathbf{v} = \mathbf{0}$$ $$\begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
From first row: $-v_1 + 2v_2 = 0$, so $v_1 = 2v_2$
Eigenvector: $\mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ (or any scalar multiple)
Similarly for $\lambda_2 = 2$: $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
With NumPy (The Practical Way)
import numpy as np
A = np.array([[4, 2],
              [1, 3]])
# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors (columns):\n", eigenvectors)
# Verify: A @ v = λ * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    Av = A @ v
    lambda_v = lam * v
    print(f"\nEigenvalue {lam:.2f}:")
    print(f" A @ v = {Av}")
    print(f" λ * v = {lambda_v}")
    print(f" Match: {np.allclose(Av, lambda_v)}")
Output:
Eigenvalues: [5. 2.]
Eigenvectors (columns):
[[ 0.894 0.707]
[ 0.447 -0.707]]
Eigenvalue 5.00:
A @ v = [4.472 2.236]
λ * v = [4.472 2.236]
Match: True
Properties of Eigenvalues and Eigenvectors
Key Properties to Remember
| Property | Formula | ML Relevance |
|---|---|---|
| Sum of eigenvalues | $\sum \lambda_i = \text{trace}(\mathbf{A})$ | Model complexity |
| Product of eigenvalues | $\prod \lambda_i = \det(\mathbf{A})$ | Matrix invertibility |
| Symmetric matrices | All $\lambda$ are real | Covariance matrices in PCA |
| Orthogonal eigenvectors | If $\mathbf{A} = \mathbf{A}^T$ | PCA directions are perpendicular |
A = np.array([[4, 2], [1, 3]])
eigenvalues, _ = np.linalg.eig(A)
print(f"Sum of eigenvalues: {np.sum(eigenvalues):.2f}")
print(f"Trace of A: {np.trace(A):.2f}")
print(f"Match: {np.isclose(np.sum(eigenvalues), np.trace(A))}")
print(f"\nProduct of eigenvalues: {np.prod(eigenvalues):.2f}")
print(f"Determinant of A: {np.linalg.det(A):.2f}")
print(f"Match: {np.isclose(np.prod(eigenvalues), np.linalg.det(A))}")
Symmetric Matrices (Crucial for ML)
In ML, we often work with symmetric matrices (like covariance matrices), which have special properties:
- All eigenvalues are real (no complex numbers)
- Eigenvectors are orthogonal (perpendicular to each other)
- Can be diagonalized as $\mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$
# Symmetric matrix (covariance-like)
A_symmetric = np.array([[5, 2],
                        [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A_symmetric)
print("Eigenvalues (all real):", eigenvalues)
# Check orthogonality
dot_product = eigenvectors[:, 0] @ eigenvectors[:, 1]
print(f"Dot product of eigenvectors: {dot_product:.10f}") # ≈ 0 (orthogonal)
# Eigenvectors form an orthonormal basis
Q = eigenvectors
print(f"\nQ @ Q.T (should be identity):\n{Q @ Q.T}")
Eigendecomposition (Diagonalization)
The Big Idea
Any diagonalizable matrix $\mathbf{A}$ can be decomposed as:
$$\mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}$$
Where:
- $\mathbf{V}$ = matrix of eigenvectors (columns)
- $\mathbf{\Lambda}$ = diagonal matrix of eigenvalues
- $\mathbf{V}^{-1}$ = inverse of eigenvector matrix
For symmetric matrices: $\mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$ (simpler!)
Why This Matters for ML
- Matrix Powers: $\mathbf{A}^n = \mathbf{V}\mathbf{\Lambda}^n\mathbf{V}^{-1}$ (efficient computation)
- Matrix Exponential: Used in continuous dynamical systems
- Understanding Transformations: Eigenvalues tell you how the matrix stretches space
# Eigendecomposition
A = np.array([[4, 2], [1, 3]])
eigenvalues, V = np.linalg.eig(A)
# Create diagonal matrix of eigenvalues
Lambda = np.diag(eigenvalues)
# Reconstruct A
A_reconstructed = V @ Lambda @ np.linalg.inv(V)
print("Original A:\n", A)
print("\nReconstructed A (V @ Λ @ V⁻¹):\n", A_reconstructed.real)
print("\nMatch:", np.allclose(A, A_reconstructed))
# Compute A^10 efficiently
A_power_10_naive = np.linalg.matrix_power(A, 10)
Lambda_power_10 = np.diag(eigenvalues ** 10)
A_power_10_eigen = V @ Lambda_power_10 @ np.linalg.inv(V)
print(f"\nA^10 via eigendecomposition matches naive: {np.allclose(A_power_10_naive, A_power_10_eigen.real)}")
Eigenvalues in Machine Learning Applications
1. Principal Component Analysis (PCA)
PCA is perhaps the most common application of eigenvalues in ML.
The Process:
- Compute covariance matrix of data
- Find eigenvalues and eigenvectors
- Sort by eigenvalue (largest = most variance)
- Project data onto top k eigenvectors
from sklearn.datasets import load_iris
# Load data
iris = load_iris()
X = iris.data # (150 samples, 4 features)
# Center the data
X_centered = X - X.mean(axis=0)
# Compute covariance matrix
cov_matrix = np.cov(X_centered.T) # (4, 4)
print(f"Covariance matrix shape: {cov_matrix.shape}")
# Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
# Sort by eigenvalue (descending)
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]
print(f"\nEigenvalues (variance in each PC direction):")
for i, ev in enumerate(eigenvalues):
print(f" PC{i+1}: {ev:.4f} ({100*ev/sum(eigenvalues):.1f}% of variance)")
# Project to 2D
W = eigenvectors[:, :2] # Top 2 eigenvectors
X_pca = X_centered @ W
print(f"\nOriginal shape: {X.shape}")
print(f"PCA shape: {X_pca.shape}") # (150, 2)
# Variance explained by top 2 components
variance_explained = sum(eigenvalues[:2]) / sum(eigenvalues)
print(f"Variance explained by 2 PCs: {100*variance_explained:.1f}%")
2. Spectral Clustering
Spectral clustering uses eigenvalues of the graph Laplacian:
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph
import scipy.sparse as sp
# Create non-linearly separable data
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)
# Build adjacency matrix (k-nearest neighbors graph)
k = 10
A = kneighbors_graph(X, k, mode='connectivity', include_self=False)
A = (A + A.T) / 2 # Make symmetric
# Compute degree matrix
degrees = np.array(A.sum(axis=1)).flatten()
D = np.diag(degrees)
# Laplacian: L = D - A
L = D - A.toarray()
# Compute eigenvalues and eigenvectors of Laplacian
eigenvalues, eigenvectors = np.linalg.eigh(L)
# Second smallest eigenvector (Fiedler vector) for 2 clusters
fiedler_vector = eigenvectors[:, 1]
# Cluster based on sign
cluster_labels = (fiedler_vector > 0).astype(int)
print(f"Eigenvalues of Laplacian (first 5): {eigenvalues[:5]}")
print(f"Number of ~0 eigenvalues = number of connected components")
3. Google’s PageRank
PageRank finds the principal eigenvector of a modified adjacency matrix:
# Simple PageRank implementation
def pagerank(adjacency_matrix, damping=0.85, max_iter=100):
    """
    Compute PageRank using power iteration.
    """
    n = adjacency_matrix.shape[0]
    # Create transition matrix (row-stochastic: each non-dangling row of M sums to 1)
    out_degree = adjacency_matrix.sum(axis=1)
    out_degree[out_degree == 0] = 1  # Handle dangling nodes (avoid division by zero)
    M = adjacency_matrix / out_degree[:, np.newaxis]
    # PageRank matrix with damping
    G = damping * M + (1 - damping) / n * np.ones((n, n))
    # Power iteration (finds principal eigenvector)
    r = np.ones(n) / n
    for _ in range(max_iter):
        r_new = G.T @ r
        if np.allclose(r, r_new):
            break
        r = r_new
    return r
# Example: 4-page web graph
# 0 → 1 → 2 → 3, with a back-edge 2 → 1
adjacency = np.array([
    [0, 1, 0, 0],  # Page 0 links to page 1
    [0, 0, 1, 0],  # Page 1 links to page 2
    [0, 1, 0, 1],  # Page 2 links to pages 1, 3
    [0, 0, 0, 0],  # Page 3 (no outlinks - dangling)
])
ranks = pagerank(adjacency)
print("PageRank scores:", ranks)
print("Most important page:", np.argmax(ranks))
4. Covariance Matrix Analysis
Understanding data spread through eigenvalues:
# Generate correlated 2D data
np.random.seed(42)
mean = [0, 0]
cov = [[3, 2], [2, 3]] # Positive correlation
X = np.random.multivariate_normal(mean, cov, 500)
# Compute sample covariance
sample_cov = np.cov(X.T)
eigenvalues, eigenvectors = np.linalg.eig(sample_cov)
print("Sample covariance matrix:")
print(sample_cov)
print(f"\nEigenvalues: {eigenvalues}")
print("These represent variance along principal directions")
print(f"\nTotal variance: {np.trace(sample_cov):.2f}")
print(f"Sum of eigenvalues: {np.sum(eigenvalues):.2f}")
# Condition number (ratio of largest to smallest eigenvalue)
condition_number = max(eigenvalues) / min(eigenvalues)
print(f"\nCondition number: {condition_number:.2f}")
print("High condition number → features are highly correlated")
Positive Definite Matrices
What Are They?
A symmetric matrix $\mathbf{A}$ is positive definite if:
- All eigenvalues are positive ($\lambda_i > 0$)
- Equivalently: $\mathbf{x}^T\mathbf{A}\mathbf{x} > 0$ for all non-zero $\mathbf{x}$
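These two conditions are equivalent. As a quick empirical check of the quadratic-form condition on random vectors (a sanity check, not a proof; A_check is an illustrative example matrix):
# Empirical check: xᵀAx > 0 for many random non-zero x
rng = np.random.default_rng(0)
A_check = np.array([[4., 2.],
                    [2., 3.]])
quad_forms = [x @ A_check @ x for x in rng.normal(size=(1000, 2))]
print("All sampled xᵀAx > 0:", all(q > 0 for q in quad_forms))
print("Eigenvalues:", np.linalg.eigvalsh(A_check))  # both positive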
Why They Matter in ML
- Covariance matrices are positive semi-definite (eigenvalues ≥ 0)
- Hessian matrices being positive definite → local minimum in optimization
- Kernel matrices must be positive semi-definite
def check_positive_definite(A):
    """Check if matrix is positive definite via eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh for symmetric matrices
    return np.all(eigenvalues > 0), eigenvalues
# Positive definite example (covariance matrix)
A_pd = np.array([[4, 2], [2, 3]])
is_pd, eigs = check_positive_definite(A_pd)
print(f"Matrix A: positive definite = {is_pd}")
print(f"Eigenvalues: {eigs}")
# Not positive definite
A_not_pd = np.array([[1, 2], [2, 1]])
is_pd, eigs = check_positive_definite(A_not_pd)
print(f"\nMatrix B: positive definite = {is_pd}")
print(f"Eigenvalues: {eigs}") # One negative eigenvalue
Numerical Considerations
When Eigenvalues Go Wrong
# Near-singular matrix
A_bad = np.array([[1, 1], [1, 1.0001]])
eigenvalues = np.linalg.eigvalsh(A_bad)
print(f"Eigenvalues: {eigenvalues}")
print(f"Condition number: {max(eigenvalues)/min(eigenvalues):.0f}")
print("Warning: High condition number means numerical instability!")
# Better: use eigh for symmetric matrices (more stable)
A_symmetric = np.array([[4, 2], [2, 3]])
eigenvalues_eig, _ = np.linalg.eig(A_symmetric) # General method
eigenvalues_eigh = np.linalg.eigvalsh(A_symmetric) # For symmetric
print(f"\nnp.linalg.eig: {eigenvalues_eig}")
print(f"np.linalg.eigvalsh: {eigenvalues_eigh}") # Guaranteed real, more stable
Best Practices
- Use eigvalsh/eigh for symmetric matrices (guaranteed real eigenvalues)
- Check the condition number before inverting matrices
- Add regularization when eigenvalues are near zero
# Regularization example (Ridge regression adds λI)
A_singular = np.array([[1, 1], [1, 1]]) # Singular matrix
print(f"Determinant: {np.linalg.det(A_singular)}") # 0 - can't invert
# Add regularization
lambda_reg = 0.1
A_regularized = A_singular + lambda_reg * np.eye(2)
print(f"Regularized determinant: {np.linalg.det(A_regularized)}") # Non-zero
eigenvalues_reg = np.linalg.eigvalsh(A_regularized)
print(f"Regularized eigenvalues: {eigenvalues_reg}") # All positive now
Summary and Key Takeaways
Core Concepts
| Concept | Definition | ML Application |
|---|---|---|
| Eigenvector | $\mathbf{Av} = \lambda\mathbf{v}$ (direction preserved) | Principal components |
| Eigenvalue | Scaling factor $\lambda$ | Variance explained |
| Eigendecomposition | $\mathbf{A} = \mathbf{V\Lambda V^{-1}}$ | Matrix powers, PCA |
| Positive definite | All $\lambda > 0$ | Valid covariance matrices |
When to Use What
| Task | Eigenvalue Application |
|---|---|
| Dimensionality reduction | PCA (covariance matrix eigendecomp) |
| Clustering | Spectral clustering (Laplacian eigendecomp) |
| Graph analysis | PageRank, community detection |
| Optimization | Hessian eigenvalues (check convexity) |
| Data understanding | Covariance eigenvalues (spread, correlation) |
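As a tiny illustration of the optimization row above: for an assumed toy function f(x, y) = x² + 3y², the Hessian is constant, and its eigenvalues reveal the curvature:
# Hessian of f(x, y) = x² + 3y² (a hypothetical toy example)
H = np.array([[2., 0.],
              [0., 6.]])
print("Hessian eigenvalues:", np.linalg.eigvalsh(H))  # all > 0 → positive definite → convex, so a critical point is a minimum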
NumPy Functions
# For general matrices
eigenvalues, eigenvectors = np.linalg.eig(A)
# For symmetric matrices (preferred)
eigenvalues = np.linalg.eigvalsh(A) # Values only
eigenvalues, eigenvectors = np.linalg.eigh(A) # Values and vectors
Practice Exercises
Exercise 1: Implement Power Iteration
def power_iteration(A, max_iter=1000, tol=1e-10):
    """
    Find the largest eigenvalue and corresponding eigenvector
    using power iteration.
    """
    # Your code here: start with random vector, repeatedly multiply by A
    pass
Exercise 2: PCA from Scratch
def pca_from_scratch(X, n_components):
    """
    Implement PCA using eigendecomposition.
    Returns: projected data, explained variance ratio
    """
    # Your code here
    pass
Exercise 3: Analyze Eigenvalues
Given a dataset, compute the covariance matrix eigenvalues and determine:
- How many principal components explain 95% variance?
- What’s the condition number?
- Are there highly correlated features?
Frequently Asked Questions
Q: What if eigenvalues are complex?
A: Complex eigenvalues occur in non-symmetric matrices. In ML, we usually work with symmetric matrices (covariance, kernel matrices) which always have real eigenvalues. If you get complex values, check your matrix construction.
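A minimal illustration: a pure rotation matrix leaves no real direction merely scaled, so its eigenvalues form a complex conjugate pair.
# 45° rotation matrix: eigenvalues are cos(θ) ± i·sin(θ)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print("Eigenvalues:", np.linalg.eig(R)[0])  # complex conjugate pair ≈ 0.707 ± 0.707j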
Q: How many eigenvalues does a matrix have?
A: An $n \times n$ matrix has exactly $n$ eigenvalues (counting multiplicity). Some may be repeated or zero.
Q: What does a zero eigenvalue mean?
A: Zero eigenvalue means the matrix is singular (not invertible). The corresponding eigenvector spans the null space—directions that get collapsed to zero.
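A short sketch of this: for a singular matrix, the eigenvector paired with the zero eigenvalue is a direction that the matrix collapses to the zero vector.
A_singular = np.array([[1., 1.],
                       [1., 1.]])
eigenvalues, eigenvectors = np.linalg.eigh(A_singular)
null_direction = eigenvectors[:, np.argmin(np.abs(eigenvalues))]
print("Eigenvalues:", eigenvalues)                          # one is (numerically) zero
print("A @ null_direction:", A_singular @ null_direction)   # ≈ [0, 0]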
Q: When should I use eigendecomposition vs SVD?
A:
- Eigendecomposition: Square matrices, especially symmetric ones (covariance, Laplacian)
- SVD: Any matrix (including rectangular), more numerically stable
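The two are closely related: for any matrix X, the singular values of X are the square roots of the eigenvalues of XᵀX. A small sketch with an assumed rectangular matrix (X_rect is ours):
# Rectangular matrix: eig() doesn't apply directly, but SVD does
X_rect = np.array([[1., 2.],
                   [3., 4.],
                   [5., 6.]])
U, singular_values, Vt = np.linalg.svd(X_rect, full_matrices=False)
gram_eigenvalues = np.linalg.eigvalsh(X_rect.T @ X_rect)[::-1]  # descending order

print("Singular values:         ", singular_values)
print("sqrt(eigenvalues of XᵀX):", np.sqrt(gram_eigenvalues))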
Next Steps
Now that you understand eigenvalues and eigenvectors:
- Read Next: SVD Explained: Singular Value Decomposition for Data Science
- Practice: Implement PCA from scratch
- Apply: Use spectral clustering on a real dataset
Found this guide helpful? Continue with our Mathematics for Machine Learning series. Next: SVD Explained: Singular Value Decomposition.