A matrix is a rectangular array of numbers arranged in rows and columns. Matrices provide a powerful
framework for solving systems of linear equations, representing geometric transformations, modelling
Markov processes, and much more. This topic is central to the IB Mathematics AA course at both SL
and HL, with eigenvalues and diagonalisation appearing exclusively at HL.
If A is m×p and B is p×n, then the product C=AB is an m×n matrix
whose entries are:
c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}
This is the dot product of the i-th row of A with the j-th column of B. The inner
dimensions must agree: an m×p matrix can multiply a p×n matrix, producing an
m×n matrix.
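To see the entry formula in action, here is a minimal NumPy sketch (an illustration, not part of the course material) that computes the triple-loop definition and checks it against the built-in product:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2x3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # 3x2

m, p = A.shape
p2, n = B.shape
assert p == p2, "inner dimensions must agree"

C = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        for k in range(p):
            C[i, j] += A[i, k] * B[k, j]   # dot product of row i of A with column j of B

print(np.allclose(C, A @ B))  # True
```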
Critical properties:
Matrix multiplication is associative: (AB)C=A(BC) when the products are defined.
Matrix multiplication is distributive over addition: A(B+C)=AB+AC.
Matrix multiplication is NOT commutative: in general, AB ≠ BA.
The existence of AB does not imply the existence of BA.
AB=O does NOT imply A=O or B=O (there are non-trivial zero divisors).
Warning:
A common error is assuming AB=BA. Always check the order of multiplication. In geometric
transformations, applying A then B corresponds to the product BA (right-to-left reading).
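A quick NumPy check (the matrices below are our own illustrative values, not from the source) makes both pitfalls concrete:

```python
import numpy as np

# Non-commutativity: AB != BA
A = np.array([[1, 1],
              [0, 1]])
B = np.array([[1, 0],
              [1, 1]])
print(A @ B)   # [[2 1], [1 1]]
print(B @ A)   # [[1 1], [1 2]]  -- different result

# Non-trivial zero divisors: AB = O although A != O and B != O
A = np.array([[1, 0],
              [0, 0]])
B = np.array([[0, 0],
              [0, 1]])
print(A @ B)   # [[0 0], [0 0]]
```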
For a 2×2 matrix
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
the determinant is:
\det(A) = |A| = ad - bc
The determinant is a scalar that encodes important information about the matrix, including whether
it is invertible and how it scales area (or volume).
Rule of Sarrus (mnemonic for 3×3 only): Copy the first two columns to the right of the
matrix. Sum the products of the three downward diagonals, then subtract the products of the three
upward diagonals. This is NOT valid for matrices larger than 3×3.
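As a sketch (the helper name sarrus is ours), the rule can be coded directly and compared with NumPy's determinant:

```python
import numpy as np

def sarrus(M):
    """Determinant of a 3x3 matrix by the Rule of Sarrus."""
    (a, b, c), (d, e, f), (g, h, i) = M
    # downward diagonals minus upward diagonals
    return (a*e*i + b*f*g + c*d*h) - (c*e*g + a*f*h + b*d*i)

M = np.array([[2, 1, 3],
              [0, 4, 1],
              [5, 2, 6]])
print(sarrus(M), np.linalg.det(M))   # -11 and -11.0 (up to rounding)
```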
When ad - bc \neq 0, the inverse is:
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
This is obtained by swapping the diagonal entries, negating the off-diagonal entries, and dividing
by the determinant. In general, the transpose of the matrix of cofactors is called the adjugate
(or classical adjoint) of A, and A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A).
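A short NumPy sketch (our own helper, inverse_2x2) verifies the formula by checking that the product with the original matrix gives the identity:

```python
import numpy as np

def inverse_2x2(M):
    a, b = M[0]
    c, d = M[1]
    det = a*d - b*c
    if det == 0:
        raise ValueError("singular matrix: no inverse")
    # swap the diagonal, negate the off-diagonal, divide by the determinant
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
print(np.allclose(inverse_2x2(A) @ A, np.eye(2)))  # True
```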
A system of n linear equations in n unknowns can be written in matrix form as
A\mathbf{x} = \mathbf{b}, where A is the coefficient matrix, \mathbf{x} is the column vector
of unknowns, and \mathbf{b} is the column vector of constants.
If A is invertible, the unique solution is:
\mathbf{x} = A^{-1}\mathbf{b}
For a 2×2 system:
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} e \\ f \end{pmatrix}
the solution is:
\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} e \\ f \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} de - bf \\ -ce + af \end{pmatrix}
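For a concrete check (the system below uses our own numbers), NumPy gives the same answer by the inverse-matrix method and by direct solving:

```python
import numpy as np

# Solve  4x + y = 9,  2x + 3y = 7  via x = A^{-1} b
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([9.0, 7.0])

print(np.linalg.inv(A) @ b)    # [2. 1.] -- inverse-matrix method
print(np.linalg.solve(A, b))   # same result; numerically preferred
```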
Every linear transformation T: \mathbb{R}^2 \to \mathbb{R}^2 can be represented by a 2×2
matrix M such that T(\mathbf{v}) = M\mathbf{v}. The images of the standard basis vectors
\begin{pmatrix} 1 \\ 0 \end{pmatrix} and \begin{pmatrix} 0 \\ 1 \end{pmatrix} form the columns
of M.
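For instance (a standard illustration, not taken from this section), applying this to an anticlockwise rotation by angle θ reads the rotation matrix off column by column:
R(\theta)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad R(\theta)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} \quad\Longrightarrow\quad R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}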
For a stretch parallel to the x-axis with scale factor k, represented by
S_x = \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix}, we have \det(S_x) = k, so the area scale
factor is |k|. If 0 < k < 1, the figure is compressed; if k > 1, it is expanded.
Shears. A horizontal shear with shear factor k fixes every point on the x-axis and shifts
other points horizontally in proportion to their y-coordinate:
H_k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}, \qquad (x, y) \mapsto (x + ky, y)
If transformation A is applied first, followed by transformation B, the composite transformation
is represented by the product BA (note the order: right to left).
\mathbf{v}' = B(A\mathbf{v}) = (BA)\mathbf{v}
Example. A rotation of 90° anticlockwise followed by a reflection in the x-axis:
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}
which is a reflection in the line y = -x.
An invariant point under transformation M is a point \mathbf{v} such that
M\mathbf{v} = \mathbf{v}, i.e. (M - I)\mathbf{v} = \mathbf{0}. The set of invariant points forms
the null space of M - I.
For any 2×2 transformation matrix, the origin is always invariant.
An invariant line is a line that is mapped onto itself (though individual points on the line may
move along it). A line through the origin with direction vector \mathbf{d} is invariant if
M\mathbf{d} = \lambda\mathbf{d} for some scalar \lambda, which means \mathbf{d} is an
eigenvector of M.
A line is point-wise invariant (every point is fixed) if and only if every point on it is an
invariant point, meaning the line lies entirely in the null space of M−I.
Not all invariant lines are point-wise invariant. For example, the stretch parallel to the
x-axis \begin{pmatrix} k & 0 \\ 0 & 1 \end{pmatrix} (with k ≠ 1) leaves the y-axis point-wise
invariant (each point (0, y) maps to itself), but it leaves the x-axis, and every line parallel
to the x-axis, invariant only as a line (points slide along it), not point-wise.
For a reflection, the mirror line is point-wise invariant and the line perpendicular to it through
the origin is invariant as a set (points are reflected across the mirror line but remain on the
perpendicular line).
When a linear transformation acts on a vector, the vector is generally rotated and scaled. Certain
special vectors, called eigenvectors, are only scaled (stretched or compressed) by the
transformation, not rotated. The factor by which an eigenvector is scaled is the corresponding
eigenvalue.
The equation \det(A - \lambda I) = 0 is the characteristic equation of A. For a 2×2
matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}:
\det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = (a - \lambda)(d - \lambda) - bc = 0
Expanding:
\lambda^2 - (a + d)\lambda + (ad - bc) = 0
Notice that a+d=tr(A) (the trace of A) and ad−bc=det(A). Therefore:
\lambda^2 - \mathrm{tr}(A)\lambda + \det(A) = 0
For a 3×3 matrix, the characteristic equation is a cubic:
\det(A - \lambda I) = -\lambda^3 + \mathrm{tr}(A)\lambda^2 - S\lambda + \det(A) = 0
where S is the sum of the principal 2×2 minors (the sum of the determinants of the
matrices obtained by deleting each row and the corresponding column).
Fundamental properties of eigenvalues:
The sum of the eigenvalues equals the trace: \lambda_1 + \lambda_2 + \cdots + \lambda_n = \mathrm{tr}(A).
The product of the eigenvalues equals the determinant: \lambda_1 \lambda_2 \cdots \lambda_n = \det(A).
A matrix is invertible if and only if none of its eigenvalues is zero.
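These properties are easy to confirm numerically; the sketch below (using the same matrix as the worked example later in this section) checks both identities:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals = np.linalg.eigvals(A)

print(eigvals)                                        # 5 and 2 (in some order)
print(np.isclose(eigvals.sum(), np.trace(A)))         # True: sum = trace
print(np.isclose(eigvals.prod(), np.linalg.det(A)))   # True: product = determinant
```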
Once an eigenvalue λ is known, the corresponding eigenvectors are found by solving:
(A - \lambda I)\mathbf{v} = \mathbf{0}
This is a homogeneous system. Since \det(A - \lambda I) = 0, the rows of A - \lambda I are
linearly dependent, and the system has infinitely many solutions. For each eigenvalue these
solutions form a subspace (the eigenspace), which contains at least a line through the origin.
Important: Eigenvectors are determined only up to a non-zero scalar multiple. Any non-zero
multiple of an eigenvector is also an eigenvector for the same eigenvalue.
When the characteristic equation has a repeated root (a repeated eigenvalue), the matrix may or may
not be diagonalisable.
Geometric multiplicity ≤ algebraic multiplicity. The algebraic multiplicity of an
eigenvalue is its multiplicity as a root of the characteristic equation. The geometric
multiplicity is the dimension of the corresponding eigenspace (the number of linearly independent
eigenvectors for that eigenvalue).
A matrix is diagonalisable if and only if the geometric multiplicity of each eigenvalue equals its
algebraic multiplicity.
If the geometric multiplicity is strictly less than the algebraic multiplicity, the matrix is
defective and cannot be diagonalised.
For a 2×2 matrix with a repeated eigenvalue λ:
If A=λI, then every non-zero vector is an eigenvector (geometric multiplicity =2),
and A is diagonalisable (it is already diagonal).
If A ≠ λI but λ is a repeated eigenvalue, the geometric multiplicity is 1,
and A is not diagonalisable.
For a 3×3 matrix, the characteristic equation is a cubic polynomial in λ. The cubic
can have three distinct real roots, one repeated and one distinct real root, or one real root and
two complex conjugate roots. Since the IB course works over \mathbb{R}, only real eigenvalues and
eigenvectors are considered.
From the eigenvector equations for λ₂: v_2 + v_3 = 0, so v_3 = -v_2, and v_1 is free. Two linearly independent eigenvectors:
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}
Since the geometric multiplicity of λ₂ = 2 equals its algebraic multiplicity (both = 2),
the matrix is diagonalisable. The total number of linearly independent eigenvectors is 3.
P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & -1 \end{pmatrix}, \qquad D = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
Example. Find the eigenvalues and eigenvectors of A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}.
Characteristic equation:
\det(A - \lambda I) = \det\begin{pmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{pmatrix} = (4 - \lambda)(3 - \lambda) - 2 = 0
\lambda^2 - 7\lambda + 10 = 0
(\lambda - 5)(\lambda - 2) = 0
\lambda_1 = 5, \qquad \lambda_2 = 2
Eigenvector for \lambda_1 = 5:
\begin{pmatrix} 4 - 5 & 1 \\ 2 & 3 - 5 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\begin{pmatrix} -1 & 1 \\ 2 & -2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
From the first row: -v_1 + v_2 = 0, so v_1 = v_2. The eigenvector is
\begin{pmatrix} 1 \\ 1 \end{pmatrix} (up to scalar multiples).
Eigenvector for \lambda_2 = 2:
\begin{pmatrix} 4 - 2 & 1 \\ 2 & 3 - 2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
From the first row: 2v_1 + v_2 = 0, so v_2 = -2v_1. The eigenvector is
\begin{pmatrix} 1 \\ -2 \end{pmatrix}.
An n×n matrix A is diagonalisable if there exists an invertible matrix P and a
diagonal matrix D such that:
A = PDP^{-1}
The columns of P are the eigenvectors of A, and the diagonal entries of D are the
corresponding eigenvalues.
Theorem. An n×n matrix is diagonalisable if and only if it has n linearly independent
eigenvectors. This is guaranteed when A has n distinct eigenvalues.
Procedure for diagonalising a 2×2 matrix:
Find the eigenvalues by solving det(A−λI)=0.
Find a corresponding eigenvector for each eigenvalue.
Form P = \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 \end{pmatrix} (eigenvectors as columns) and D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, with the eigenvalues in the order matching the columns of P; then A = PDP^{-1}.
One of the most powerful applications of diagonalisation is computing large matrix powers. If
A = PDP^{-1}, then:
A^k = PD^kP^{-1}
where, for the 2×2 case,
D^k = \begin{pmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{pmatrix}
This transforms the problem of computing A^k (which would require k-1 matrix multiplications)
into computing three matrix products, which is dramatically more efficient.
Examples
Find A^5 where A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}.
From the previous example: \lambda_1 = 5, \lambda_2 = 2,
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}
so
P = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}, \qquad D = \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}, \qquad P^{-1} = \frac{1}{3}\begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}
Then:
A^5 = PD^5P^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 3125 & 0 \\ 0 & 32 \end{pmatrix} \cdot \frac{1}{3}\begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 2094 & 1031 \\ 2062 & 1063 \end{pmatrix}
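The same computation is easy to check numerically (a NumPy sketch, using the P and D found above):

```python
import numpy as np

A = np.array([[4, 1],
              [2, 3]])
P = np.array([[1,  1],
              [1, -2]])                  # eigenvectors as columns

A5 = P @ np.diag([5**5, 2**5]) @ np.linalg.inv(P)
print(A5)                                              # [[2094 1031], [2062 1063]]
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))   # True
```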
Gaussian elimination is a systematic method for solving systems of linear equations by reducing the
augmented matrix to row echelon form (REF) or reduced row echelon form (RREF).
Elementary row operations:
Swap two rows (R_i \leftrightarrow R_j).
Multiply a row by a non-zero scalar (R_i \to kR_i, k \neq 0).
Add a multiple of one row to another (R_i \to R_i + kR_j).
Algorithm (for an n×n system):
Form the augmented matrix [A \mid \mathbf{b}].
Use row operations to create zeros below each pivot (forward elimination).
Back-substitute to find the solution.
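The algorithm translates directly into code. Below is a minimal sketch (our own implementation; it adds partial pivoting, which the basic algorithm above does not require but which avoids dividing by a zero pivot):

```python
import numpy as np

def gaussian_solve(A, b):
    """Forward elimination with partial pivoting, then back-substitution."""
    A = A.astype(float)                  # work on copies
    b = b.astype(float)
    n = len(b)
    for col in range(n):
        p = np.argmax(np.abs(A[col:, col])) + col    # partial pivoting
        if p != col:
            A[[col, p]] = A[[p, col]]
            b[[col, p]] = b[[p, col]]
        for row in range(col + 1, n):
            factor = A[row, col] / A[col, col]
            A[row] -= factor * A[col]    # R_row -> R_row - factor * R_col
            b[row] -= factor * b[col]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):       # back-substitution
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[1, 1, 1],
              [0, 2, 5],
              [2, 5, -1]])
b = np.array([6, -4, 27])
print(gaussian_solve(A, b))              # [ 5.  3. -2.]
print(np.linalg.solve(A, b))             # same result
```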
Existence and uniqueness:
If rank(A) = rank([A \mid \mathbf{b}]) = n: unique solution.
If rank(A) = rank([A \mid \mathbf{b}]) < n: infinitely many solutions.
If rank(A) < rank([A \mid \mathbf{b}]): no solution (inconsistent system).
Cramer's rule provides an explicit formula for the solution of a system A\mathbf{x} = \mathbf{b}
when A is an n×n invertible matrix.
For a 2×2 system:
x = \frac{\begin{vmatrix} e & b \\ f & d \end{vmatrix}}{\det(A)}, \qquad y = \frac{\begin{vmatrix} a & e \\ c & f \end{vmatrix}}{\det(A)}
where the numerator for x replaces the first column of A with \mathbf{b}, and the numerator
for y replaces the second column.
General formula (Cramer's rule): For each variable x_i:
x_i = \frac{\det(A_i)}{\det(A)}
where A_i is the matrix A with column i replaced by the vector \mathbf{b}.
Warning:
Cramer's rule is computationally expensive for large systems (O(n!) if the determinants are
expanded by cofactors, compared with O(n^3) for Gaussian elimination), but it is theoretically
important and frequently appears in examination questions for 2×2 and 3×3 systems.
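For small systems the rule is easy to implement; this sketch (our own helper, cramer) follows the general formula directly:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (A square and invertible)."""
    det_A = np.linalg.det(A)
    n = len(b)
    x = np.empty(n)
    for i in range(n):
        Ai = A.astype(float)
        Ai[:, i] = b                     # replace column i with b
        x[i] = np.linalg.det(Ai) / det_A
    return x

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([9.0, 7.0])
print(cramer(A, b))                      # [2. 1.]
```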
The Hill cipher is a classical polygraphic substitution cipher that uses matrix multiplication to
encrypt blocks of text. It demonstrates a direct application of matrices in cryptography.
Encryption procedure:
Convert each letter to a number: A=0, B=1, …, Z=25.
Group the plaintext into blocks of size n (matching the dimension of the key matrix).
Multiply each block (as a column vector) by the n×n key matrix K modulo 26.
Convert the resulting numbers back to letters.
Decryption: Apply the inverse of the key matrix modulo 26:
\mathbf{p} = K^{-1}\mathbf{c} \pmod{26}
The key matrix K must be invertible modulo 26, which requires det(K) to be coprime to 26 (i.e.
gcd(det(K),26)=1).
Examples
Encrypt "HELP" using the key matrix
K=(3152)
Convert to numbers: H=7, E=4, L=11, P=15.
Block 1: (74), Block 2: (1115)
Block 1:
(3152)(74)=(21+207+8)=(4115)
Modulo 26: (1515), which gives "PP".
Block 2:
(3152)(1115)=(33+7511+30)=(10841)
Modulo 26: 108mod26=4, 41mod26=15, giving (415),
which is "EP".
Matrices are fundamental to 2D and 3D computer graphics. Every transformation of a geometric object
(viewing, rotation, scaling, projection) is represented by matrix multiplication.
Homogeneous coordinates. To represent translations (which are not linear transformations) as
matrix multiplications, 2D points (x, y) are extended to homogeneous coordinates (x, y, 1). A
translation by (t_x, t_y) is then:
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} x + t_x \\ y + t_y \\ 1 \end{pmatrix}
Rotation and scaling in homogeneous coordinates use 3×3 matrices with the bottom row
(0,0,1).
Transformation pipeline. A typical graphics pipeline applies a sequence of transformations:
model transform (place object in world space), view transform (position camera), projection
transform (3D to 2D), and viewport transform (map to screen coordinates). Each stage is a matrix
multiplication.
Composition advantage. Instead of applying n separate transformations to each of m vertices,
the transformations are composed into a single matrix M = T_n \cdots T_2 T_1, and each vertex is
transformed with a single multiplication M\mathbf{v}. This reduces the cost from O(nm) to
O(n + m) matrix operations.
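A minimal NumPy sketch of this idea (the helper names translation and rotation are ours): compose once, then transform every vertex with the single combined matrix.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx],
                     [0, 1, ty],
                     [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

# Compose once (right-to-left: rotate first, then translate) ...
M = translation(4, 1) @ rotation(np.pi / 2)

# ... then transform all vertices in a single multiplication.
square = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1],
                   [1, 1, 1, 1]], dtype=float)   # homogeneous (x, y, 1) columns
print(M @ square)
```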
A Markov chain is a stochastic process where the probability of transitioning to any future
state depends only on the current state, not on the sequence of events that preceded it (the Markov
property).
Transition matrix. A transition matrix P is a square matrix where:
pij≥0 for all i,j (entries are probabilities)
Each row sums to 1: ∑jpij=1 for all i
The entry pij represents the probability of moving from state i to state j in one step.
State evolution. If \mathbf{s}^{(k)} is the state probability (row) vector at step k, then:
\mathbf{s}^{(k)} = \mathbf{s}^{(0)} P^k
Steady state. A steady-state (stationary) vector \mathbf{s} satisfies
\mathbf{s}P = \mathbf{s}, or equivalently, \mathbf{s}(P - I) = \mathbf{0}. This means
\mathbf{s} is a left eigenvector of P with eigenvalue 1.
For a regular Markov chain (one where some power of P has all positive entries), the steady-state
distribution exists, is unique, and is independent of the initial state. The eigenvalue 1 is
always the largest eigenvalue of a stochastic matrix (by the Perron-Frobenius theorem).
Examples
A weather model has two states: Sunny (S) and Rainy (R). If it is sunny today, the probability of
sun tomorrow is 0.7 and rain is 0.3. If it is rainy today, the probability of sun tomorrow is
0.4 and rain is 0.6.
Transition matrix:
P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}
Find the steady-state vector \mathbf{s} = \begin{pmatrix} s & r \end{pmatrix}:
\mathbf{s}P = \mathbf{s} gives 0.7s + 0.4r = s and 0.3s + 0.6r = r.
From the first equation: -0.3s + 0.4r = 0, so 3s = 4r.
Since s + r = 1: s + \frac{3}{4}s = 1, giving s = \frac{4}{7} and r = \frac{3}{7}.
Steady state: \mathbf{s} = \begin{pmatrix} \frac{4}{7} & \frac{3}{7} \end{pmatrix}.
In the long run, the weather is sunny \frac{4}{7} \approx 57.1\% of the time.
Eigenvalue verification. The eigenvalues of P are found from:
\det(P - \lambda I) = (0.7 - \lambda)(0.6 - \lambda) - 0.12 = \lambda^2 - 1.3\lambda + 0.3 = 0
(\lambda - 1)(\lambda - 0.3) = 0
\lambda_1 = 1, \lambda_2 = 0.3. Since |\lambda_2| < 1, as k \to \infty the term
\lambda_2^k \to 0 and the system converges to the eigenvector for \lambda_1 = 1.
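Numerically, the steady state can be found from the left eigenvector, or simply by iterating the chain (a NumPy sketch, not required by the syllabus):

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Left eigenvector of P for eigenvalue 1 = ordinary eigenvector of P transposed,
# normalised so that the entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.isclose(eigvals, 1)].ravel())
print(v / v.sum())                                    # [0.5714... 0.4285...] = (4/7, 3/7)

# Or iterate: s^(k) = s^(0) P^k converges for a regular chain.
print(np.array([1.0, 0.0]) @ np.linalg.matrix_power(P, 50))
```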
Danger: Common Pitfalls
Confusing matrix multiplication order: Matrix multiplication is NOT commutative: AB is generally not equal to BA. When applying a transformation matrix to a point, the ORDER matters. For combined transformations, the matrix closest to the point is applied FIRST: if transformation B follows transformation A, the combined matrix is BA (not AB).
Forgetting that the determinant of a singular matrix is zero: A singular matrix has determinant zero and NO inverse. If asked to find the inverse of a 2x2 matrix, always check that the determinant is non-zero first. A zero determinant means the transformation collapses space (maps the plane onto a line), which is why no inverse exists.
Misidentifying the type of transformation from its matrix: A reflection in the y-axis has matrix [[-1, 0], [0, 1]] (negative in top-left). A reflection in the x-axis has matrix [[1, 0], [0, -1]] (negative in bottom-right). Students frequently confuse these two. Also, a rotation of 90 degrees anticlockwise gives [[0, -1], [1, 0]], which students often mix up with the clockwise rotation.
Arithmetic errors when calculating determinants and inverses: For a 2x2 matrix [[a, b], [c, d]], the determinant is ad - bc (not ad + bc). The inverse is (1/det) * [[d, -b], [-c, a]] (note the swap of a and d and the negative signs). A single sign error invalidates the entire calculation. Always double-check by multiplying the matrix by its inverse to get the identity.
A square matrix Q is orthogonal if Q^TQ = QQ^T = I, which means Q^{-1} = Q^T.
Equivalent characterisations:
The columns of Q form an orthonormal set (each column has unit length, and distinct columns are
perpendicular).
The rows of Q form an orthonormal set.
\det(Q) = \pm 1 (since \det(Q^T)\det(Q) = \det(I) = 1 and \det(Q^T) = \det(Q)).
Rotation matrices and reflection matrices in \mathbb{R}^2 are orthogonal:
Rotations have det=+1 and are called proper orthogonal (special orthogonal).
Reflections have det=−1.
Preservation of the inner product. If Q is orthogonal, then for any vectors
\mathbf{u}, \mathbf{v}:
(Q\mathbf{u}) \cdot (Q\mathbf{v}) = \mathbf{u} \cdot \mathbf{v}
In particular, |Q\mathbf{v}| = |\mathbf{v}| and the angle between vectors is preserved.
Orthogonal diagonalisation (spectral theorem). A real symmetric matrix A can always be
orthogonally diagonalised: A = QDQ^T where Q is orthogonal and D is diagonal. This is a
stronger form of diagonalisation that is guaranteed for all symmetric matrices (even those with
repeated eigenvalues), since symmetric matrices always have n linearly independent eigenvectors
that can be chosen orthonormal.
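NumPy's eigh routine, designed for symmetric matrices, returns exactly such an orthonormal set of eigenvectors; a minimal sketch (our own example matrix):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric

eigvals, Q = np.linalg.eigh(S)           # Q has orthonormal eigenvector columns
D = np.diag(eigvals)

print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q is orthogonal (Q^T Q = I)
print(np.allclose(Q @ D @ Q.T, S))       # True: A = Q D Q^T
```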
A reflection in the y-axis is followed by an enlargement with scale factor 3 about the origin.
Find the single matrix that represents this composite transformation. What is the area scale factor?
Solution
Reflection in the y-axis: R_y = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}
Enlargement with scale factor 3: E_3 = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}
Composite (enlargement applied after reflection):
M = E_3 R_y = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -3 & 0 \\ 0 & 3 \end{pmatrix}
Area scale factor: |\det(M)| = |(-3)(3) - 0| = 9.
If you get this wrong, revise: Composite Transformations and Area Scale Factor.
A stretch parallel to the x-axis with scale factor 2 is followed by a stretch parallel to the
y-axis with scale factor 3. Find the single matrix and describe its effect on the unit square.
Solution
M = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}
The unit square (area = 1) is mapped to a rectangle with vertices
(0,0), (2,0), (2,3), (0,3). The new area is 6.
Area scale factor: ∣det(M)∣=6.
If you get this wrong, revise: Enlargements and Stretches section.
Find the invariant points and invariant lines of the transformation represented by
M = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.
Solution
Invariant points: Solve (M - I)\mathbf{x} = \mathbf{0}:
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
From row 2: y = 0. Substituting into row 1: x + 0 = 0, so x = 0.
The only invariant point is the origin (0, 0).
Invariant lines: For an invariant line through the origin, the direction vector must be an
eigenvector. Eigenvalues satisfy (2 - \lambda)^2 = 0, so \lambda = 2 (repeated).
For \lambda = 2: \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, giving v_2 = 0. The only eigenvector direction is \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
The x-axis (y = 0) is the only invariant line through the origin.
If you get this wrong, revise: Invariant Points and Lines section.