Understanding Eigenvalues and Eigenvectors in Machine Learning

Reading Time: 16 minutes | Word Count: 4,000+ | Difficulty: Intermediate | Published: Jan 15, 2025

“The eigenvectors of a matrix are the directions in which the linear transformation acts simply—by stretching.” — David C. Lay

If you’ve ever used PCA, spectral clustering, or wondered how Google’s PageRank works, you’ve encountered eigenvalues and eigenvectors. These concepts might seem abstract, but they’re fundamental to many machine learning algorithms.

In this guide, we’ll build intuition from the ground up, connect the math to practical ML applications, and implement everything in Python.

Prerequisites

Before diving in, make sure you understand:

  • Basic matrix and vector operations, especially matrix-vector multiplication
  • Determinants (at least for 2×2 matrices)
  • Basic NumPy array manipulation


What Are Eigenvectors and Eigenvalues?

The Core Idea

When you multiply a matrix by a vector, the result is usually a vector pointing in a different direction:

$$\mathbf{A} \cdot \mathbf{v} = \mathbf{w} \quad \text{(different direction)}$$

But some special vectors, called eigenvectors, only get stretched or compressed—they keep pointing in the same direction:

$$\mathbf{A} \cdot \mathbf{v} = \lambda \mathbf{v} \quad \text{(same direction, scaled by } \lambda \text{)}$$

  • $\mathbf{v}$ is an eigenvector of matrix $\mathbf{A}$
  • $\lambda$ (lambda) is the corresponding eigenvalue (the scaling factor)

Visual Intuition

import numpy as np
import matplotlib.pyplot as plt

# Define a transformation matrix
A = np.array([[3, 1], 
              [1, 3]])

# Regular vector - changes direction
v_regular = np.array([1, 0])
Av_regular = A @ v_regular

# Eigenvector - only changes magnitude
eigenvalues, eigenvectors = np.linalg.eig(A)
v_eigen = eigenvectors[:, 0]  # First eigenvector
Av_eigen = A @ v_eigen

print(f"Regular vector: {v_regular}{Av_regular}")
print(f"Direction changed: {np.arctan2(v_regular[1], v_regular[0]):.2f}{np.arctan2(Av_regular[1], Av_regular[0]):.2f} radians")

print(f"\nEigenvector: {v_eigen}")
print(f"After transformation: {Av_eigen}")
print(f"Eigenvalue (scaling factor): {eigenvalues[0]:.2f}")
print(f"v_eigen * eigenvalue: {v_eigen * eigenvalues[0]}")  # Should match Av_eigen

The Eigenvalue Equation

Mathematically, we’re looking for solutions to:

$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$

Rearranging: $$(\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = \mathbf{0}$$

For non-trivial solutions (non-zero $\mathbf{v}$), we need:

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$

This determinant equation gives us the characteristic polynomial, whose roots are the eigenvalues.
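
As a quick numerical cross-check (a small sketch, not required for the derivation), NumPy can compute the coefficients of a square matrix's characteristic polynomial with np.poly and its roots with np.roots; the roots are exactly the eigenvalues:

import numpy as np

A = np.array([[3, 1],
              [1, 3]])

# For a 2x2 matrix the characteristic polynomial is λ² - trace(A)·λ + det(A)
coeffs = np.poly(A)       # here ≈ [1., -6., 8.]
roots = np.roots(coeffs)  # roots of the characteristic polynomial

print("Characteristic polynomial coefficients:", coeffs)
print("Roots:", roots)
print("np.linalg.eigvals:", np.linalg.eigvals(A))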


Computing Eigenvalues and Eigenvectors

By Hand (2×2 Example)

Let’s find eigenvalues and eigenvectors of: $$\mathbf{A} = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$$

Step 1: Set up the characteristic equation $$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$

$$\det\begin{bmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{bmatrix} = 0$$

$$(4-\lambda)(3-\lambda) - 2 \cdot 1 = 0$$ $$\lambda^2 - 7\lambda + 10 = 0$$ $$(\lambda - 5)(\lambda - 2) = 0$$

Eigenvalues: $\lambda_1 = 5$, $\lambda_2 = 2$

Step 2: Find eigenvectors for each eigenvalue

For $\lambda_1 = 5$: $$(\mathbf{A} - 5\mathbf{I})\mathbf{v} = \mathbf{0}$$ $$\begin{bmatrix} -1 & 2 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

From first row: $-v_1 + 2v_2 = 0$, so $v_1 = 2v_2$

Eigenvector: $\mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ (or any scalar multiple)

Similarly for $\lambda_2 = 2$: $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$
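
Before reaching for NumPy, it is worth checking the hand-derived (unnormalized) eigenvectors directly; a short verification along these lines confirms $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$ for both pairs:

import numpy as np

A = np.array([[4, 2],
              [1, 3]])

# Hand-derived eigenpairs; the eigenvectors are not normalized, which is fine
for lam, v in [(5, np.array([2, 1])), (2, np.array([1, -1]))]:
    print(f"λ = {lam}: A @ v = {A @ v}, λ * v = {lam * v}, "
          f"match = {np.allclose(A @ v, lam * v)}")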

With NumPy (The Practical Way)

import numpy as np

A = np.array([[4, 2], 
              [1, 3]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)
print("Eigenvectors (columns):\n", eigenvectors)

# Verify: A @ v = λ * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    Av = A @ v
    lambda_v = lam * v
    print(f"\nEigenvalue {lam:.2f}:")
    print(f"  A @ v     = {Av}")
    print(f"  λ * v     = {lambda_v}")
    print(f"  Match: {np.allclose(Av, lambda_v)}")

Output:

Eigenvalues: [5. 2.]
Eigenvectors (columns):
 [[ 0.894  0.707]
  [ 0.447 -0.707]]

Eigenvalue 5.00:
  A @ v     = [4.472 2.236]
  λ * v     = [4.472 2.236]
  Match: True

Properties of Eigenvalues and Eigenvectors

Key Properties to Remember

| Property | Formula | ML Relevance |
|---|---|---|
| Sum of eigenvalues | $\sum \lambda_i = \text{trace}(\mathbf{A})$ | Model complexity |
| Product of eigenvalues | $\prod \lambda_i = \det(\mathbf{A})$ | Matrix invertibility |
| Symmetric matrices | All $\lambda$ are real | Covariance matrices in PCA |
| Orthogonal eigenvectors | If $\mathbf{A} = \mathbf{A}^T$ | PCA directions are perpendicular |

A = np.array([[4, 2], [1, 3]])
eigenvalues, _ = np.linalg.eig(A)

print(f"Sum of eigenvalues: {np.sum(eigenvalues):.2f}")
print(f"Trace of A: {np.trace(A):.2f}")
print(f"Match: {np.isclose(np.sum(eigenvalues), np.trace(A))}")

print(f"\nProduct of eigenvalues: {np.prod(eigenvalues):.2f}")
print(f"Determinant of A: {np.linalg.det(A):.2f}")
print(f"Match: {np.isclose(np.prod(eigenvalues), np.linalg.det(A))}")

Symmetric Matrices (Crucial for ML)

In ML, we often work with symmetric matrices (like covariance matrices), which have special properties:

  1. All eigenvalues are real (no complex numbers)
  2. Eigenvectors are orthogonal (perpendicular to each other)
  3. Can be diagonalized as $\mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$

# Symmetric matrix (covariance-like)
A_symmetric = np.array([[5, 2], 
                         [2, 3]])

eigenvalues, eigenvectors = np.linalg.eig(A_symmetric)
print("Eigenvalues (all real):", eigenvalues)

# Check orthogonality
dot_product = eigenvectors[:, 0] @ eigenvectors[:, 1]
print(f"Dot product of eigenvectors: {dot_product:.10f}")  # ≈ 0 (orthogonal)

# Eigenvectors form an orthonormal basis
Q = eigenvectors
print(f"\nQ @ Q.T (should be identity):\n{Q @ Q.T}")

Eigendecomposition (Diagonalization)

The Big Idea

Any diagonalizable matrix $\mathbf{A}$ can be decomposed as:

$$\mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}$$

Where:

  • $\mathbf{V}$ = matrix of eigenvectors (columns)
  • $\mathbf{\Lambda}$ = diagonal matrix of eigenvalues
  • $\mathbf{V}^{-1}$ = inverse of eigenvector matrix

For symmetric matrices: $\mathbf{A} = \mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T$ (simpler!)
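
Here is a minimal sketch of the symmetric case using np.linalg.eigh, which returns real eigenvalues and orthonormal eigenvectors, so the transpose takes the place of the inverse:

import numpy as np

# Symmetric matrix: eigh returns real eigenvalues and orthonormal eigenvectors
A_sym = np.array([[5, 2],
                  [2, 3]])
eigenvalues, Q = np.linalg.eigh(A_sym)

A_rebuilt = Q @ np.diag(eigenvalues) @ Q.T   # transpose instead of inverse
print("Reconstruction matches:", np.allclose(A_sym, A_rebuilt))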

Why This Matters for ML

  1. Matrix Powers: $\mathbf{A}^n = \mathbf{V}\mathbf{\Lambda}^n\mathbf{V}^{-1}$ (efficient computation)
  2. Matrix Exponential: Used in continuous dynamical systems
  3. Understanding Transformations: Eigenvalues tell you how the matrix stretches space

# Eigendecomposition
A = np.array([[4, 2], [1, 3]])
eigenvalues, V = np.linalg.eig(A)

# Create diagonal matrix of eigenvalues
Lambda = np.diag(eigenvalues)

# Reconstruct A
A_reconstructed = V @ Lambda @ np.linalg.inv(V)
print("Original A:\n", A)
print("\nReconstructed A (V @ Λ @ V⁻¹):\n", A_reconstructed.real)
print("\nMatch:", np.allclose(A, A_reconstructed))

# Compute A^10 efficiently
A_power_10_naive = np.linalg.matrix_power(A, 10)
Lambda_power_10 = np.diag(eigenvalues ** 10)
A_power_10_eigen = V @ Lambda_power_10 @ np.linalg.inv(V)
print(f"\nA^10 via eigendecomposition matches naive: {np.allclose(A_power_10_naive, A_power_10_eigen.real)}")

Eigenvalues in Machine Learning Applications

1. Principal Component Analysis (PCA)

PCA is perhaps the most common application of eigenvalues in ML.

The Process:

  1. Compute covariance matrix of data
  2. Find eigenvalues and eigenvectors
  3. Sort by eigenvalue (largest = most variance)
  4. Project data onto top k eigenvectors

from sklearn.datasets import load_iris

# Load data
iris = load_iris()
X = iris.data  # (150 samples, 4 features)

# Center the data
X_centered = X - X.mean(axis=0)

# Compute covariance matrix
cov_matrix = np.cov(X_centered.T)  # (4, 4)
print(f"Covariance matrix shape: {cov_matrix.shape}")

# Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort by eigenvalue (descending)
sorted_indices = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[sorted_indices]
eigenvectors = eigenvectors[:, sorted_indices]

print(f"\nEigenvalues (variance in each PC direction):")
for i, ev in enumerate(eigenvalues):
    print(f"  PC{i+1}: {ev:.4f} ({100*ev/sum(eigenvalues):.1f}% of variance)")

# Project to 2D
W = eigenvectors[:, :2]  # Top 2 eigenvectors
X_pca = X_centered @ W

print(f"\nOriginal shape: {X.shape}")
print(f"PCA shape: {X_pca.shape}")  # (150, 2)

# Variance explained by top 2 components
variance_explained = sum(eigenvalues[:2]) / sum(eigenvalues)
print(f"Variance explained by 2 PCs: {100*variance_explained:.1f}%")

2. Spectral Clustering

Spectral clustering uses eigenvalues of the graph Laplacian:

from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

# Create non-linearly separable data
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)

# Build adjacency matrix (k-nearest neighbors graph)
k = 10
A = kneighbors_graph(X, k, mode='connectivity', include_self=False)
A = (A + A.T) / 2  # Make symmetric

# Compute degree matrix
degrees = np.array(A.sum(axis=1)).flatten()
D = np.diag(degrees)

# Laplacian: L = D - A
L = D - A.toarray()

# Compute eigenvalues and eigenvectors of Laplacian
eigenvalues, eigenvectors = np.linalg.eigh(L)

# Second smallest eigenvector (Fiedler vector) for 2 clusters
fiedler_vector = eigenvectors[:, 1]

# Cluster based on sign
cluster_labels = (fiedler_vector > 0).astype(int)

print(f"Eigenvalues of Laplacian (first 5): {eigenvalues[:5]}")
print(f"Number of ~0 eigenvalues = number of connected components")

3. Google’s PageRank

PageRank finds the principal eigenvector of a modified adjacency matrix:

# Simple PageRank implementation
def pagerank(adjacency_matrix, damping=0.85, max_iter=100):
    """
    Compute PageRank using power iteration.
    """
    n = adjacency_matrix.shape[0]
    
    # Create transition matrix (row-stochastic: each row sums to 1)
    out_degree = adjacency_matrix.sum(axis=1)
    out_degree[out_degree == 0] = 1  # Avoid division by zero for dangling pages
    M = adjacency_matrix / out_degree[:, np.newaxis]
    
    # PageRank matrix with damping
    G = damping * M + (1 - damping) / n * np.ones((n, n))
    
    # Power iteration (finds principal eigenvector)
    r = np.ones(n) / n
    for _ in range(max_iter):
        r_new = G.T @ r
        if np.allclose(r, r_new):
            break
        r = r_new
    
    return r

# Example: 4-page web graph
#   0 → 1 → 2 → 3
#       ↑___|
adjacency = np.array([
    [0, 1, 0, 0],  # Page 0 links to page 1
    [0, 0, 1, 0],  # Page 1 links to page 2  
    [0, 1, 0, 1],  # Page 2 links to pages 1, 3
    [0, 0, 0, 0],  # Page 3 (no outlinks - dangling)
])

ranks = pagerank(adjacency)
print("PageRank scores:", ranks)
print("Most important page:", np.argmax(ranks))

4. Covariance Matrix Analysis

Understanding data spread through eigenvalues:

# Generate correlated 2D data
np.random.seed(42)
mean = [0, 0]
cov = [[3, 2], [2, 3]]  # Positive correlation
X = np.random.multivariate_normal(mean, cov, 500)

# Compute sample covariance
sample_cov = np.cov(X.T)
eigenvalues, eigenvectors = np.linalg.eig(sample_cov)

print("Sample covariance matrix:")
print(sample_cov)
print(f"\nEigenvalues: {eigenvalues}")
print("These represent variance along principal directions")
print(f"\nTotal variance: {np.trace(sample_cov):.2f}")
print(f"Sum of eigenvalues: {np.sum(eigenvalues):.2f}")

# Condition number (ratio of largest to smallest eigenvalue)
condition_number = max(eigenvalues) / min(eigenvalues)
print(f"\nCondition number: {condition_number:.2f}")
print("High condition number → features are highly correlated")

Positive Definite Matrices

What Are They?

A symmetric matrix $\mathbf{A}$ is positive definite if:

  • All eigenvalues are positive ($\lambda_i > 0$)
  • Equivalently: $\mathbf{x}^T\mathbf{A}\mathbf{x} > 0$ for all non-zero $\mathbf{x}$

Why They Matter in ML

  • Covariance matrices are positive semi-definite (eigenvalues ≥ 0)
  • Hessian matrices being positive definite → local minimum in optimization
  • Kernel matrices must be positive semi-definite

def check_positive_definite(A):
    """Check if matrix is positive definite via eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh for symmetric matrices
    return np.all(eigenvalues > 0), eigenvalues

# Positive definite example (covariance matrix)
A_pd = np.array([[4, 2], [2, 3]])
is_pd, eigs = check_positive_definite(A_pd)
print(f"Matrix A: positive definite = {is_pd}")
print(f"Eigenvalues: {eigs}")

# Not positive definite
A_not_pd = np.array([[1, 2], [2, 1]])
is_pd, eigs = check_positive_definite(A_not_pd)
print(f"\nMatrix B: positive definite = {is_pd}")
print(f"Eigenvalues: {eigs}")  # One negative eigenvalue

Numerical Considerations

When Eigenvalues Go Wrong

# Near-singular matrix
A_bad = np.array([[1, 1], [1, 1.0001]])
eigenvalues = np.linalg.eigvalsh(A_bad)
print(f"Eigenvalues: {eigenvalues}")
print(f"Condition number: {max(eigenvalues)/min(eigenvalues):.0f}")
print("Warning: High condition number means numerical instability!")

# Better: use eigh for symmetric matrices (more stable)
A_symmetric = np.array([[4, 2], [2, 3]])
eigenvalues_eig, _ = np.linalg.eig(A_symmetric)  # General method
eigenvalues_eigh = np.linalg.eigvalsh(A_symmetric)  # For symmetric

print(f"\nnp.linalg.eig: {eigenvalues_eig}")
print(f"np.linalg.eigvalsh: {eigenvalues_eigh}")  # Guaranteed real, more stable

Best Practices

  1. Use eigvalsh/eigh for symmetric matrices (guaranteed real eigenvalues)
  2. Check condition number before inverting matrices
  3. Add regularization when eigenvalues are near zero

# Regularization example (Ridge regression adds λI)
A_singular = np.array([[1, 1], [1, 1]])  # Singular matrix
print(f"Determinant: {np.linalg.det(A_singular)}")  # 0 - can't invert

# Add regularization
lambda_reg = 0.1
A_regularized = A_singular + lambda_reg * np.eye(2)
print(f"Regularized determinant: {np.linalg.det(A_regularized)}")  # Non-zero

eigenvalues_reg = np.linalg.eigvalsh(A_regularized)
print(f"Regularized eigenvalues: {eigenvalues_reg}")  # All positive now

Summary and Key Takeaways

Core Concepts

| Concept | Definition | ML Application |
|---|---|---|
| Eigenvector | $\mathbf{Av} = \lambda\mathbf{v}$ (direction preserved) | Principal components |
| Eigenvalue | Scaling factor $\lambda$ | Variance explained |
| Eigendecomposition | $\mathbf{A} = \mathbf{V\Lambda V^{-1}}$ | Matrix powers, PCA |
| Positive definite | All $\lambda > 0$ | Valid covariance matrices |

When to Use What

| Task | Eigenvalue Application |
|---|---|
| Dimensionality reduction | PCA (covariance matrix eigendecomposition) |
| Clustering | Spectral clustering (Laplacian eigendecomposition) |
| Graph analysis | PageRank, community detection |
| Optimization | Hessian eigenvalues (check convexity) |
| Data understanding | Covariance eigenvalues (spread, correlation) |

NumPy Functions

# For general matrices
eigenvalues, eigenvectors = np.linalg.eig(A)

# For symmetric matrices (preferred)
eigenvalues = np.linalg.eigvalsh(A)  # Values only
eigenvalues, eigenvectors = np.linalg.eigh(A)  # Values and vectors

Practice Exercises

Exercise 1: Implement Power Iteration

def power_iteration(A, max_iter=1000, tol=1e-10):
    """
    Find the largest eigenvalue and corresponding eigenvector
    using power iteration.
    """
    # Your code here: start with random vector, repeatedly multiply by A
    pass

Exercise 2: PCA from Scratch

def pca_from_scratch(X, n_components):
    """
    Implement PCA using eigendecomposition.
    Returns: projected data, explained variance ratio
    """
    # Your code here
    pass

Exercise 3: Analyze Eigenvalues

Given a dataset, compute the covariance matrix eigenvalues and determine:

  1. How many principal components explain 95% variance?
  2. What’s the condition number?
  3. Are there highly correlated features?

Frequently Asked Questions

Q: What if eigenvalues are complex?

A: Complex eigenvalues occur in non-symmetric matrices. In ML, we usually work with symmetric matrices (covariance, kernel matrices) which always have real eigenvalues. If you get complex values, check your matrix construction.
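
For instance, a 2D rotation matrix has no real direction that it merely stretches, so its eigenvalues form a complex conjugate pair:

import numpy as np

theta = np.pi / 2  # rotate by 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.linalg.eigvals(R))  # approximately +1j and -1j: a complex conjugate pair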

Q: How many eigenvalues does a matrix have?

A: An $n \times n$ matrix has exactly $n$ eigenvalues (counting multiplicity). Some may be repeated or zero.
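
For example, the 3×3 identity matrix has the single eigenvalue 1 with algebraic multiplicity 3:

import numpy as np

print(np.linalg.eigvals(np.eye(3)))  # [1. 1. 1.]: eigenvalue 1 repeated three times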

Q: What does a zero eigenvalue mean?

A: Zero eigenvalue means the matrix is singular (not invertible). The corresponding eigenvector spans the null space—directions that get collapsed to zero.
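
A quick illustration: this singular matrix has a zero eigenvalue, and the matching eigenvector is collapsed to the zero vector:

import numpy as np

A_singular = np.array([[1, 1],
                       [1, 1]])
eigenvalues, eigenvectors = np.linalg.eig(A_singular)
print("Eigenvalues:", eigenvalues)               # one eigenvalue is 0 (singular matrix)

zero_index = np.argmin(np.abs(eigenvalues))
v_null = eigenvectors[:, zero_index]
print("Null-space direction:", v_null)           # proportional to [1, -1]
print("A @ v:", A_singular @ v_null)             # collapses to (approximately) [0, 0]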

Q: When should I use eigendecomposition vs SVD?

A:

  • Eigendecomposition: Square matrices, especially symmetric ones (covariance, Laplacian)
  • SVD: Any matrix (including rectangular), more numerically stable
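
A small comparison sketch: for a symmetric positive definite matrix the singular values coincide with the eigenvalues, and SVD additionally handles rectangular matrices, where eigendecomposition is not even defined:

import numpy as np

# Symmetric positive definite matrix: singular values equal the eigenvalues
A = np.array([[4, 2],
              [2, 3]])
print("Eigenvalues (descending):", np.linalg.eigvalsh(A)[::-1])
print("Singular values:         ", np.linalg.svd(A, compute_uv=False))

# Rectangular matrix: SVD still works, eigendecomposition does not apply
B = np.random.default_rng(0).normal(size=(5, 3))
print("Singular values of a 5x3 matrix:", np.linalg.svd(B, compute_uv=False))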

Next Steps

Now that you understand eigenvalues and eigenvectors:

  1. Read Next: SVD Explained: Singular Value Decomposition for Data Science
  2. Practice: Implement PCA from scratch
  3. Apply: Use spectral clustering on a real dataset

Found this guide helpful? Continue with our Mathematics for Machine Learning series. Next: SVD Explained: Singular Value Decomposition.

Author: AI & ML Team