Probability Distributions

Discrete Random Variables
Definition
A discrete random variable $X$ is a function that assigns a numerical value to each outcome in a countable sample space. The set of possible values is finite or countably infinite: for example, $\{0, 1, 2, \ldots, n\}$ or $\{0, 1, 2, \ldots\}$.
Probability Mass Function (PMF)
The probability mass function of $X$ is $p(x) = P(X = x)$, assigning a probability to each possible value. It must satisfy:

- $p(x) \ge 0$ for all $x$
- $\displaystyle\sum_{\text{all } x} p(x) = 1$
Cumulative Distribution Function (CDF)
$$F(x) = P(X \le x) = \sum_{t \le x} p(t)$$

The CDF is non-decreasing and right-continuous, with $F(-\infty) = 0$ and $F(\infty) = 1$. For a discrete variable it is a step function with jumps at each value in the range of $X$. The size of each jump at $x = a$ equals $P(X = a)$.
Expected Value
The expected value (mean) of $X$ is the probability-weighted average of all possible values:

$$E(X) = \mu = \sum_{\text{all } x} x \cdot p(x)$$

This represents the long-run average if the experiment is repeated many times. For a function $g(X)$:

$$E(g(X)) = \sum_{\text{all } x} g(x) \cdot p(x)$$

A critical special case is $E(X^2) = \sum x^2 p(x)$.
Variance and Standard Deviation
$$\mathrm{Var}(X) = \sigma^2 = E\!\left[(X - \mu)^2\right] = \sum_{\text{all } x} (x - \mu)^2 \cdot p(x)$$

The computational formula is almost always more convenient:

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

The standard deviation is $\sigma = \sqrt{\mathrm{Var}(X)}$. It has the same units as $X$ and measures the typical distance of values from the mean.
Properties of Expectation and Variance
For any constant $a$ and random variable $X$:

$$E(a) = a, \quad E(aX) = aE(X), \quad E(X + a) = E(X) + a$$

$$\mathrm{Var}(a) = 0, \quad \mathrm{Var}(aX) = a^2 \mathrm{Var}(X), \quad \mathrm{Var}(X + a) = \mathrm{Var}(X)$$

Adding a constant shifts the distribution but does not change its spread. Multiplying by $a$ scales the spread by $|a|$.
Example: A discrete random variable $X$ has PMF:

| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $P(X = x)$ | 0.1 | 0.4 | 0.3 | 0.2 |

$$E(X) = 0(0.1) + 1(0.4) + 2(0.3) + 3(0.2) = 1.6$$

$$E(X^2) = 0(0.1) + 1(0.4) + 4(0.3) + 9(0.2) = 3.4$$

$$\mathrm{Var}(X) = 3.4 - 1.6^2 = 3.4 - 2.56 = 0.84, \quad \sigma = \sqrt{0.84} \approx 0.917$$
Example: Finding an unknown parameter
$P(X = x) = kx$ for $x = 1, 2, 3, 4$. Find $k$ and $E(X)$.

$$k(1 + 2 + 3 + 4) = 1 \implies 10k = 1 \implies k = 0.1$$

$$E(X) = 1(0.1) + 2(0.2) + 3(0.3) + 4(0.4) = 0.1 + 0.4 + 0.9 + 1.6 = 3.0$$
Worked Example: E(X) and Var(X) from a Table

A random variable $X$ has the following PMF:

| $x$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $P(X = x)$ | 0.1 | 0.2 | 0.3 | 0.25 | 0.15 |

$$E(X) = 1(0.1) + 2(0.2) + 3(0.3) + 4(0.25) + 5(0.15) = 0.1 + 0.4 + 0.9 + 1.0 + 0.75 = 3.15$$

$$E(X^2) = 1(0.1) + 4(0.2) + 9(0.3) + 16(0.25) + 25(0.15) = 0.1 + 0.8 + 2.7 + 4.0 + 3.75 = 11.35$$

$$\mathrm{Var}(X) = 11.35 - 3.15^2 = 11.35 - 9.9225 = 1.4275$$

$$\sigma = \sqrt{1.4275} \approx 1.195$$
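The table calculations above can be checked with a short sketch (standard library only); the PMF values are those of the worked example:

```python
# Sketch: computing E(X), E(X^2) and Var(X) from a PMF table,
# using the worked example above (x = 1..5 with the given probabilities).
from math import isclose, sqrt

pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.25, 5: 0.15}

assert isclose(sum(pmf.values()), 1.0)  # probabilities must sum to 1

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean**2                     # Var(X) = E(X^2) - [E(X)]^2
sigma = sqrt(variance)

print(round(mean, 2), round(second_moment, 2), round(variance, 4), round(sigma, 3))
```

The same dictionary pattern works for any finite PMF table.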
Binomial Distribution
Conditions
A random variable $X$ follows a binomial distribution, $X \sim B(n, p)$, when all four conditions hold:

- Fixed number of trials: exactly $n$ identical trials.
- Independent trials: each trial's outcome does not affect any other.
- Two outcomes: each trial yields success (probability $p$) or failure (probability $q = 1 - p$).
- Constant probability: $p$ is the same for every trial.

$X$ counts the number of successes in $n$ trials.
Probability Mass Function
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

where $\dbinom{n}{x} = \dfrac{n!}{x!(n-x)!}$ counts the arrangements of $x$ successes among $n$ trials.
Mean and Variance
$$E(X) = np, \quad \mathrm{Var}(X) = np(1-p), \quad \sigma = \sqrt{np(1-p)}$$

Derivation of $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$

Let $X_1, \ldots, X_n$ be indicator variables: $X_i = 1$ if trial $i$ succeeds, $X_i = 0$ otherwise. Then $X = X_1 + \cdots + X_n$.

$E(X_i) = 1 \cdot p + 0 \cdot (1-p) = p$, so $E(X) = np$ by linearity of expectation.

$\mathrm{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1-p)$, so $\mathrm{Var}(X) = np(1-p)$ by independence.
Shape
- $p = 0.5$: symmetric about $np$.
- $p < 0.5$: positively skewed (right tail longer).
- $p > 0.5$: negatively skewed (left tail longer).

As $n$ increases the distribution approaches a bell shape (by the Central Limit Theorem). The mode of $B(n, p)$ is at $\lfloor (n+1)p \rfloor$.
Cumulative Probabilities
On a GDC, $P(X \le k)$ is computed directly. For "at least" problems, use the complement:

$$P(X \ge k) = 1 - P(X \le k - 1)$$
Normal Approximation to the Binomial
When $n$ is large and $p$ is not too close to 0 or 1 (rule of thumb: $np \ge 5$ and $n(1-p) \ge 5$), the binomial can be approximated by the normal with matching mean and variance:

$$B(n, p) \approx N(np, np(1-p))$$

A continuity correction is required. For example:

$$P(X \le k) \approx P\!\left(Z \le \frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$$

$$P(X = k) \approx P\!\left(\frac{k - 0.5 - np}{\sqrt{np(1-p)}} < Z < \frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
Example: A factory produces bulbs with a 3% defect rate. $X \sim B(20, 0.03)$ is the number of defects in a sample of 20.

$$P(X = 2) = \binom{20}{2}(0.03)^2(0.97)^{18} = 190 \times 0.0009 \times 0.5781 \approx 0.0988$$

$$P(X \le 1) = (0.97)^{20} + 20(0.03)(0.97)^{19} \approx 0.5438 + 0.3364 \approx 0.8802$$

$$P(X \ge 3) = 1 - P(X \le 2) \approx 1 - 0.8802 - 0.0988 = 0.0210$$

$E(X) = 20(0.03) = 0.6$, $\sigma = \sqrt{20(0.03)(0.97)} = \sqrt{0.582} \approx 0.763$
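The bulb-defect numbers can be verified with an explicit binomial PMF built from `math.comb`, a minimal sketch:

```python
# Sketch: verifying the bulb-defect calculations with an explicit binomial PMF,
# using only the standard library (math.comb).
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.03
p2 = binom_pmf(2, n, p)                          # P(X = 2)
p_le1 = binom_pmf(0, n, p) + binom_pmf(1, n, p)  # P(X <= 1)
p_ge3 = 1 - p_le1 - p2                           # P(X >= 3) by complement

print(round(p2, 4), round(p_le1, 4), round(p_ge3, 4))  # 0.0988 0.8802 0.021
```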
Example: IB Paper 2 style
A multiple-choice test has 15 questions with 5 options each. A student guesses all answers, so $X \sim B(15, 0.2)$.

$$P(X = 4) = \binom{15}{4}(0.2)^4(0.8)^{11} \approx 0.1876$$

$$P(X \ge 8) = 1 - P(X \le 7) \approx 0.0042$$

To set a pass mark so that guessing gives at most a 1% chance of passing: $P(X \ge 7) \approx 0.0181$ and $P(X \ge 8) \approx 0.0042$, so the minimum pass mark is 8 correct.
Worked Example: Binomial Probability with Normal Approximation

A company manufactures light bulbs. On average, 8% are defective. A random sample of 100 bulbs is selected. Find the probability that more than 12 are defective.

Let $X \sim B(100, 0.08)$.

Check conditions for the normal approximation: $np = 8 \ge 5$ and $n(1-p) = 92 \ge 5$.

$$\mu = 100(0.08) = 8, \quad \sigma^2 = 100(0.08)(0.92) = 7.36, \quad \sigma = 2.713$$

With continuity correction:

$$P(X > 12) = P(X \ge 13) \approx P\!\left(Z > \frac{12.5 - 8}{2.713}\right) = P(Z > 1.659)$$

$$\approx 1 - \Phi(1.659) \approx 1 - 0.9515 = 0.0485$$

There is approximately a 4.85% chance that more than 12 bulbs are defective.
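A sketch comparing the exact binomial tail against the continuity-corrected normal approximation, with $\Phi$ built from `math.erf`:

```python
# Sketch: comparing the exact binomial tail P(X > 12) for X ~ B(100, 0.08)
# against the normal approximation with continuity correction.
from math import comb, erf, sqrt

n, p = 100, 0.08

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact: P(X > 12) = 1 - P(X <= 12)
exact = 1 - sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(13))

# Approximation: P(X >= 13) is roughly P(Z > (12.5 - mu) / sigma)
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 1 - Phi((12.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))
```

Because $p$ is fairly small here, the binomial is noticeably skewed and the normal approximation is only moderately accurate; the exact tail is somewhat larger than 0.0485.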
Poisson Distribution
Conditions
$X \sim \mathrm{Po}(\lambda)$ models the number of events in a fixed interval of time or space when:

- Events occur singly: no simultaneous events.
- Independence: events in non-overlapping intervals are independent.
- Constant rate: events occur at an average rate $\lambda$ per unit interval.
- Uniformity: the mean number of events is proportional to the size of the interval.
Probability Mass Function
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

where $\lambda > 0$ is the mean number of events and $e \approx 2.71828$.
Mean and Variance
$$E(X) = \lambda, \quad \mathrm{Var}(X) = \lambda$$

That $E(X) = \mathrm{Var}(X)$ is a distinguishing feature. If observed data has mean approximately equal to variance, a Poisson model may be appropriate.

Derivation of $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$

$$E(X) = \sum_{x=0}^{\infty} x \cdot \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!}$$

Substituting $k = x - 1$:

$$= e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^{k+1}}{k!} = \lambda e^{-\lambda} \cdot e^{\lambda} = \lambda$$

For the variance, use $x^2 = x(x-1) + x$: $E(X^2) = E[X(X-1)] + E(X) = \lambda^2 + \lambda$, so $\mathrm{Var}(X) = \lambda^2 + \lambda - \lambda^2 = \lambda$.
Poisson as a Limit of the Binomial
If $n \to \infty$ and $p \to 0$ while $np = \lambda$ stays constant, then $B(n, p) \to \mathrm{Po}(\lambda)$. The Poisson approximates the binomial when $n$ is large, $p$ is small, and $np$ is moderate (typically $n \ge 50$, $p \le 0.1$).
Additivity
If $X \sim \mathrm{Po}(\lambda_1)$ and $Y \sim \mathrm{Po}(\lambda_2)$ are independent, then:

$$X + Y \sim \mathrm{Po}(\lambda_1 + \lambda_2)$$

If the rate is $\lambda$ per unit interval, then the count over $t$ intervals is $\mathrm{Po}(t\lambda)$.
Example: A helpdesk receives $\lambda = 3.5$ calls per hour, so $X \sim \mathrm{Po}(3.5)$.

$$P(X = 5) = \dfrac{e^{-3.5} \cdot 3.5^5}{5!} \approx 0.1322$$

$$P(X \le 2) = e^{-3.5}\!\left(1 + 3.5 + \dfrac{12.25}{2}\right) = 10.625\,e^{-3.5} \approx 0.3208$$

Over 2 hours: $Y \sim \mathrm{Po}(7)$, $P(Y > 7) = 1 - P(Y \le 7) \approx 0.4013$.
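The helpdesk figures, including the two-hour additivity step, can be checked with an explicit Poisson PMF:

```python
# Sketch: checking the helpdesk Poisson calculations (lambda = 3.5 per hour)
# with an explicit Poisson PMF from the standard library.
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.5
p5 = poisson_pmf(5, lam)                            # P(X = 5)
p_le2 = sum(poisson_pmf(x, lam) for x in range(3))  # P(X <= 2)

# Additivity: over 2 hours the count is Po(7).
p_gt7 = 1 - sum(poisson_pmf(x, 7) for x in range(8))  # P(Y > 7)

print(round(p5, 4), round(p_le2, 4), round(p_gt7, 4))  # 0.1322 0.3208 0.4013
```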
Example: Poisson approximation to Binomial
A typesetter makes errors at a rate of 1 per 500 characters. In a passage of 2000 characters, find
the probability of at most 2 errors.
Exact: $X \sim B(2000, 1/500)$, with $\lambda = 2000/500 = 4$.

Approximate: $X \approx \mathrm{Po}(4)$.

$$P(X \le 2) = e^{-4}\!\left(1 + 4 + \dfrac{16}{2}\right) = 13e^{-4} \approx 0.2381$$

Using the exact binomial:

$$P(X \le 2) = (499/500)^{2000} + 2000(1/500)(499/500)^{1999} + \binom{2000}{2}(1/500)^2(499/500)^{1998}$$
This is computationally intensive but gives a result extremely close to 0.2381.
Worked Example: Poisson Distribution

A call centre receives calls at a rate of $\lambda = 4.2$ per 10-minute interval.

(a) Find the probability of receiving exactly 6 calls in a 10-minute interval.

$$P(X = 6) = \frac{e^{-4.2} \cdot 4.2^6}{6!} = \frac{e^{-4.2} \times 5489.0}{720} = 7.624 \times e^{-4.2} \approx 7.624 \times 0.0150 \approx 0.1143$$

(b) Find the probability of receiving at most 3 calls.

$$P(X \le 3) = e^{-4.2}\!\left(1 + 4.2 + \frac{17.64}{2} + \frac{74.088}{6}\right) = e^{-4.2}(1 + 4.2 + 8.82 + 12.348) = 26.368 \times e^{-4.2} \approx 0.3954$$

(c) Over a full hour (six intervals), find the probability of more than 30 calls.

Over one hour: $Y \sim \mathrm{Po}(6 \times 4.2) = \mathrm{Po}(25.2)$.

Using the normal approximation (since $\lambda$ is large):

$$\mu = 25.2, \quad \sigma = \sqrt{25.2} = 5.020$$

$$P(Y > 30) \approx P\!\left(Z > \frac{30.5 - 25.2}{5.020}\right) = P(Z > 1.056) \approx 0.1455$$
Normal Distribution
Definition and Properties
$X \sim N(\mu, \sigma^2)$ has probability density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$

Key properties: bell-shaped, symmetric about $x = \mu$, asymptotic to the $x$-axis, total area 1, inflection points at $x = \mu \pm \sigma$. The mean, median, and mode all equal $\mu$.

$E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.

For any normal variable, $P(X = a) = 0$ for any specific value $a$ (continuous distribution).
The Empirical Rule (68-95-99.7)
$$P(\mu - \sigma < X < \mu + \sigma) \approx 68.27\%$$

$$P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 95.45\%$$

$$P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 99.73\%$$
Standard Normal Distribution
The standard normal is $Z \sim N(0, 1)$. Any normal variable standardises via:

$$Z = \frac{X - \mu}{\sigma}$$

The CDF is $\Phi(z) = P(Z \le z)$. Key properties:

$$\Phi(-z) = 1 - \Phi(z), \quad P(Z > z) = 1 - \Phi(z), \quad P(-z < Z < z) = 2\Phi(z) - 1$$
Probability Calculations
For $X \sim N(\mu, \sigma^2)$, to find $P(a < X < b)$, convert to $z$-scores:

$$P(a < X < b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$$

On the GDC these are computed directly without manual standardisation.
Inverse Normal
Given a probability $p$, the inverse normal finds $x$ such that $P(X \le x) = p$. For the standard normal, $z = \Phi^{-1}(p)$. For a general normal: $x = \mu + z\sigma$.
Finding Unknown Parameters
When $\mu$ or $\sigma$ is unknown, use standardisation with a known probability to set up simultaneous equations. Each known probability gives one equation in the two unknowns; two probabilities are needed.
Example: Bags of flour: $X \sim N(1000, 225)$ (mean 1000 g, $\sigma = 15$ g).

$$P(985 < X < 1020) = P(-1 < Z < 1.333) = \Phi(1.333) - \Phi(-1) \approx 0.9088 - 0.1587 = 0.7501$$

$P(X < 970) = P(Z < -2) = 0.0228$, so about 2.28% of bags are rejected.

For the mass exceeded by only 5%: $P(X \le x) = 0.95$, so $x = 1000 + 1.645(15) = 1024.67$ g.
Example: Unknown parameters
Test scores are normally distributed. 15% score above 80 and 10% score below 45. Find $\mu$ and $\sigma$.

$$\dfrac{80 - \mu}{\sigma} = 1.036 \quad \text{and} \quad \dfrac{45 - \mu}{\sigma} = -1.282$$

Subtracting: $35 = 2.318\sigma$, so $\sigma \approx 15.1$ and $\mu = 80 - 1.036(15.1) \approx 64.4$.
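The two simultaneous equations can be solved directly with the standard library's inverse normal CDF, a minimal sketch:

```python
# Sketch: solving the two simultaneous equations for mu and sigma using the
# standard library's inverse normal CDF (statistics.NormalDist.inv_cdf).
from statistics import NormalDist

z_hi = NormalDist().inv_cdf(0.85)  # 15% score above 80 -> P(X <= 80) = 0.85
z_lo = NormalDist().inv_cdf(0.10)  # 10% score below 45 -> P(X <= 45) = 0.10

# (80 - mu)/sigma = z_hi and (45 - mu)/sigma = z_lo; subtract to eliminate mu.
sigma = (80 - 45) / (z_hi - z_lo)
mu = 80 - z_hi * sigma

print(round(mu, 1), round(sigma, 1))  # 64.4 15.1
```

Using full-precision critical values (rather than the rounded 1.036 and 1.282) changes the answers only in later decimal places.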
Example: Normal approximation to Binomial
$X \sim B(80, 0.4)$. Find $P(X \le 30)$ using a normal approximation.

$\mu = 80(0.4) = 32$, $\sigma^2 = 80(0.4)(0.6) = 19.2$, $\sigma = 4.382$.

With continuity correction:

$$P(X \le 30) \approx P\!\left(Z \le \dfrac{30.5 - 32}{4.382}\right) = P(Z \le -0.342) \approx 0.366$$

Exact binomial: $P(X \le 30) \approx 0.3642$. The approximation is very close.
Worked Example: Normal Distribution with Unknown Parameters

Heights in a population are normally distributed. The 90th percentile is $182\,\mathrm{cm}$ and the 30th percentile is $164\,\mathrm{cm}$. Find the mean and standard deviation.

$$P(X \le 182) = 0.90 \implies \frac{182 - \mu}{\sigma} = 1.282$$

$$P(X \le 164) = 0.30 \implies \frac{164 - \mu}{\sigma} = -0.524$$

Subtracting the second equation from the first:

$$\frac{18}{\sigma} = 1.806 \implies \sigma = \frac{18}{1.806} = 9.97\,\mathrm{cm}$$

From the first equation: $\mu = 182 - 1.282(9.97) = 182 - 12.78 = 169.2\,\mathrm{cm}$.

So $\mu \approx 169\,\mathrm{cm}$ and $\sigma \approx 10\,\mathrm{cm}$.
Continuous Uniform Distribution

Definition

$X \sim U(a, b)$ has PDF:

$$f(x) = \frac{1}{b - a}, \quad a \le x \le b$$

and $f(x) = 0$ otherwise. The PDF is constant over $[a, b]$, so all values in the interval are equally likely.
Mean and Variance
$$E(X) = \frac{a + b}{2}, \quad \mathrm{Var}(X) = \frac{(b - a)^2}{12}, \quad \sigma = \frac{b - a}{2\sqrt{3}}$$

Derivation

$$E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2}$$

$$E(X^2) = \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3}$$

$$\mathrm{Var}(X) = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4} = \frac{4(a^2+ab+b^2) - 3(a^2+2ab+b^2)}{12} = \frac{(b-a)^2}{12}$$
CDF
$$F(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$

For any $[c, d] \subseteq [a, b]$: $P(c \le X \le d) = \dfrac{d - c}{b - a}$.
Example: A bus arrives every 15 minutes, and $X \sim U(0, 15)$ is the waiting time.

$$P(X > 10) = 5/15 = 1/3$$

$E(X) = 7.5$ minutes, $\sigma = \dfrac{15}{2\sqrt{3}} = \dfrac{5\sqrt{3}}{2} \approx 4.33$ minutes.

Given that 5 minutes have already been waited, the remaining wait is $U(0, 10)$:

$$P(\text{wait} \ge 8) = 2/10 = 1/5$$
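A quick Monte Carlo sketch of the waiting-time example; the seed and sample size are arbitrary choices, and the estimates only approach the exact values:

```python
# Sketch: a quick Monte Carlo check of the U(0, 15) waiting-time example.
import random

random.seed(1)
N = 200_000
waits = [random.uniform(0, 15) for _ in range(N)]

mean = sum(waits) / N                       # should approach E(X) = 7.5
p_gt10 = sum(w > 10 for w in waits) / N     # should approach P(X > 10) = 1/3

print(round(mean, 1), round(p_gt10, 2))  # close to 7.5 and 0.33
```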
Geometric Distribution (AHL)
Definition
$X \sim \mathrm{Geo}(p)$ models the number of trials needed for the first success in independent Bernoulli trials with success probability $p$.
Probability Mass Function
$$P(X = x) = (1-p)^{x-1} p, \quad x = 1, 2, 3, \ldots$$

The first $x-1$ trials must be failures, and trial $x$ must succeed: this is the probability of exactly $x-1$ consecutive failures followed by one success.
Mean and Variance
$$E(X) = \frac{1}{p}, \quad \mathrm{Var}(X) = \frac{1-p}{p^2}$$

Derivation of $E(X) = 1/p$ and $\mathrm{Var}(X) = (1-p)/p^2$

$$E(X) = p\sum_{x=1}^{\infty} x(1-p)^{x-1}$$

Using $\displaystyle\sum_{x=1}^{\infty} xr^{x-1} = \frac{1}{(1-r)^2}$ for $|r| < 1$, with $r = 1-p$:

$$E(X) = p \cdot \frac{1}{p^2} = \frac{1}{p}$$

For the variance: $E(X^2) = E[X(X-1)] + E(X) = \dfrac{2(1-p)}{p^2} + \dfrac{1}{p} = \dfrac{2-p}{p^2}$, so $\mathrm{Var}(X) = \dfrac{2-p}{p^2} - \dfrac{1}{p^2} = \dfrac{1-p}{p^2}$.
Useful shortcut
$$P(X > n) = (1-p)^n$$

The first $n$ trials must all be failures. Similarly, $P(X \ge n) = (1-p)^{n-1}$.
Example: A basketball player has a free-throw success rate of 72%, so $X \sim \mathrm{Geo}(0.72)$.

$$P(X = 3) = (0.28)^2(0.72) = 0.0784 \times 0.72 \approx 0.0564$$

$$P(X > 5) = (0.28)^5 \approx 0.00172$$

$E(X) = 1/0.72 \approx 1.389$ attempts.
Worked Example: Geometric Distribution

A die is rolled repeatedly until a 6 appears, so $X \sim \mathrm{Geo}(1/6)$.

(a) Find the probability that the first 6 appears on the 4th roll.

$$P(X = 4) = \left(\frac{5}{6}\right)^3 \times \frac{1}{6} = \frac{125}{1296} \approx 0.0965$$

(b) Find the probability that at least 10 rolls are needed.

$$P(X \ge 10) = (1 - p)^{10-1} = \left(\frac{5}{6}\right)^9 \approx 0.1938$$

(c) Find the expected number of rolls.

$$E(X) = \frac{1}{p} = \frac{1}{1/6} = 6$$

On average, 6 rolls are needed to get the first 6.
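The die-rolling answers can be confirmed analytically and the mean checked by simulation; the seed and trial count are arbitrary choices:

```python
# Sketch: checking the die-rolling geometric example both analytically and by
# simulation (number of rolls until the first 6).
import random
from math import isclose

assert isclose((5/6)**3 * (1/6), 125/1296)  # (a) P(X = 4)
assert round((5/6)**9, 4) == 0.1938         # (b) P(X >= 10) = (1-p)^9

random.seed(42)
def rolls_until_six():
    n = 1
    while random.randint(1, 6) != 6:  # keep rolling until a 6 appears
        n += 1
    return n

trials = [rolls_until_six() for _ in range(100_000)]
print(round(sum(trials) / len(trials), 1))  # close to E(X) = 6
```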
Negative Binomial Distribution (AHL)
Definition
$X \sim \mathrm{NB}(r, p)$ models the number of trials needed to obtain exactly $r$ successes. The geometric distribution is the special case $\mathrm{NB}(1, p)$.
Probability Mass Function
$$P(X = x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, r+2, \ldots$$

In the first $x-1$ trials there are $r-1$ successes (in $\dbinom{x-1}{r-1}$ ways), and trial $x$ is the $r$-th success.
Mean and Variance
$$E(X) = \frac{r}{p}, \quad \mathrm{Var}(X) = \frac{r(1-p)}{p^2}$$

Note the parallel with the geometric: multiplying $r$ by a factor scales both $E(X)$ and $\mathrm{Var}(X)$ by the same factor.

Example: A coin has $P(\text{heads}) = 0.4$, and $X \sim \mathrm{NB}(3, 0.4)$ counts the flips needed for 3 heads.

$$P(X = 7) = \dbinom{6}{2}(0.4)^3(0.6)^4 = 15 \times 0.064 \times 0.1296 \approx 0.1244$$

$E(X) = 3/0.4 = 7.5$, $\mathrm{Var}(X) = 3(0.6)/0.16 = 11.25$, $\sigma = \sqrt{11.25} \approx 3.354$.
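A sketch verifying the $\mathrm{NB}(3, 0.4)$ example with an explicit PMF, and checking that the mean computed from the PMF matches $r/p$ (truncating the infinite sum where the tail is negligible):

```python
# Sketch: verifying the negative binomial example NB(3, 0.4) with an explicit
# PMF, and checking the mean against r/p.
from math import comb, isclose

def nb_pmf(x, r, p):
    """P(X = x) where X ~ NB(r, p) counts trials needed for the r-th success."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 3, 0.4
print(round(nb_pmf(7, r, p), 4))  # 0.1244

# Mean via the PMF agrees with r/p = 7.5 (sum truncated; the tail is tiny).
mean = sum(x * nb_pmf(x, r, p) for x in range(r, 200))
assert isclose(mean, r / p, rel_tol=1e-9)
```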
Central Limit Theorem (AHL)
Statement
If $X_1, X_2, \ldots, X_n$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$, then for large $n$:

$$\bar{X}_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

This holds regardless of the shape of the original distribution. The rule of thumb is $n \ge 30$.
Distribution of the Sum
The sum $S_n = X_1 + \cdots + X_n$ is approximately $S_n \sim N(n\mu, n\sigma^2)$ for large $n$.
Standard Error
$$\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

As $n$ increases, the standard error decreases: larger samples give more precise estimates of the population mean.
Example: Apple masses have mean 150 g and $\sigma = 20$ g. For a sample of 36, find $P(\bar{X} > 155)$.

$\bar{X} \sim N(150, 400/36)$, so $P(\bar{X} > 155) = P\!\left(Z > \dfrac{5}{20/6}\right) = P(Z > 1.5) = 0.0668$.
Example: Sum of uniform variables
X ∼ U ( 2 , 10 ) X \sim U(2, 10) X ∼ U ( 2 , 10 ) . Sample of 50 observations. Find P ( s u m > 310 ) P(\mathrm{sum} \gt 310) P ( sum > 310 ) .
μ = 6 \mu = 6 μ = 6 , σ 2 = 64 / 12 = 16 / 3 \sigma^2 = 64/12 = 16/3 σ 2 = 64/12 = 16/3 . Sum has mean 300 300 300 and variance 50 ( 16 / 3 ) = 800 / 3 50(16/3) = 800/3 50 ( 16/3 ) = 800/3 .
P\!\left(Z \gt \dfrac{10}{\sqrt{800/3}}\right) = P(Z \gt 0.612) \approx 0.270.
Confidence Intervals (AHL)
Concept
A C % C\% C % confidence interval gives a range of plausible values for an unknown population parameter.
If the sampling process were repeated many times, approximately C % C\% C % of constructed intervals would
contain the true parameter. The confidence level does not mean there is a C % C\% C % probability that the
parameter lies in any particular interval.
Confidence Interval for the Mean (σ \sigma σ known)
x ˉ ± z α / 2 ⋅ σ n \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} x ˉ ± z α /2 ⋅ n σ
where z α / 2 z_{\alpha/2} z α /2 satisfies P ( Z > z α / 2 ) = α / 2 P(Z \gt z_{\alpha/2}) = \alpha/2 P ( Z > z α /2 ) = α /2 and α = 1 − C / 100 \alpha = 1 - C/100 α = 1 − C /100 .
Confidence level | z_{\alpha/2}
90% | 1.645
95% | 1.960
99% | 2.576
When σ \sigma σ is unknown and n n n is large (n ≥ 30 n \ge 30 n ≥ 30 ), replace σ \sigma σ with the sample standard
deviation s s s .
Margin of Error and Sample Size
Margin of error: E = z α / 2 ⋅ σ n E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}} E = z α /2 ⋅ n σ . To halve E E E , quadruple n n n .
Required sample size for margin E E E : n = ( z α / 2 ⋅ σ E ) 2 n = \left(\dfrac{z_{\alpha/2} \cdot \sigma}{E}\right)^2 n = ( E z α /2 ⋅ σ ) 2
(round up to the next integer).
Bottle volumes: N ( μ , 25 ) N(\mu, 25) N ( μ , 25 ) , σ = 5 \sigma = 5 σ = 5 ml. Sample of 25 gives x ˉ = 498 \bar{x} = 498 x ˉ = 498 ml.
95% CI: 498 ± 1.960 × 5 / 25 = 498 ± 1.96 498 \pm 1.960 \times 5/\sqrt{25} = 498 \pm 1.96 498 ± 1.960 × 5/ 25 = 498 ± 1.96 , so ( 496.04 , 499.96 ) (496.04, 499.96) ( 496.04 , 499.96 ) ml.
For margin 1 ml at 95%: n = ( 1.960 × 5 / 1 ) 2 = 96.04 n = (1.960 \times 5/1)^2 = 96.04 n = ( 1.960 × 5/1 ) 2 = 96.04 , round up to 97.
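Both the interval and the sample-size calculation follow directly (a minimal Python sketch; variable names are illustrative):

```python
from math import sqrt, ceil

z95, sigma, n, xbar = 1.960, 5, 25, 498
E = z95 * sigma / sqrt(n)                      # margin of error = 1.96 ml
print(round(xbar - E, 2), round(xbar + E, 2))  # 496.04 499.96
n_req = ceil((z95 * sigma / 1) ** 2)           # sample size for a 1 ml margin
print(n_req)                                   # 97
```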
Combining Random Variables
Linear Combinations
For any random variables X X X , Y Y Y and constants a a a , b b b :
E ( a X + b Y ) = a E ( X ) + b E ( Y ) E(aX + bY) = aE(X) + bE(Y) E ( a X + bY ) = a E ( X ) + b E ( Y )
This is the linearity of expectation and holds always, even without independence.
Variance of Sums
For independent X X X and Y Y Y :
V a r ( a X + b Y ) = a 2 V a r ( X ) + b 2 V a r ( Y ) \mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) Var ( a X + bY ) = a 2 Var ( X ) + b 2 Var ( Y )
V a r ( X + Y ) = V a r ( X ) + V a r ( Y ) , V a r ( X − Y ) = V a r ( X ) + V a r ( Y ) \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y), \quad \mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) Var ( X + Y ) = Var ( X ) + Var ( Y ) , Var ( X − Y ) = Var ( X ) + Var ( Y )
Note the plus sign even for differences: subtracting a variable still adds variability.
The general formula (not necessarily independent):
V a r ( X + Y ) = V a r ( X ) + V a r ( Y ) + 2 C o v ( X , Y ) \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y) Var ( X + Y ) = Var ( X ) + Var ( Y ) + 2 Cov ( X , Y )
where C o v ( X , Y ) = E ( X Y ) − E ( X ) E ( Y ) = 0 \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0 Cov ( X , Y ) = E ( X Y ) − E ( X ) E ( Y ) = 0 when X X X and Y Y Y are independent.
Linearity of expectation always holds. The simple variance formula
V a r ( X + Y ) = V a r ( X ) + V a r ( Y ) \mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) Var ( X + Y ) = Var ( X ) + Var ( Y ) requires independence.
Independent Copies
If X 1 , … , X n X_1, \ldots, X_n X 1 , … , X n are iid with mean μ \mu μ and variance σ 2 \sigma^2 σ 2 :
E ( X 1 + ⋯ + X n ) = n μ , V a r ( X 1 + ⋯ + X n ) = n σ 2 E(X_1 + \cdots + X_n) = n\mu, \quad \mathrm{Var}(X_1 + \cdots + X_n) = n\sigma^2 E ( X 1 + ⋯ + X n ) = n μ , Var ( X 1 + ⋯ + X n ) = n σ 2
E ( X ˉ ) = μ , V a r ( X ˉ ) = σ 2 n E(\bar{X}) = \mu, \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} E ( X ˉ ) = μ , Var ( X ˉ ) = n σ 2
Combining Normal Variables
If X ∼ N ( μ X , σ X 2 ) X \sim N(\mu_X, \sigma_X^2) X ∼ N ( μ X , σ X 2 ) and Y ∼ N ( μ Y , σ Y 2 ) Y \sim N(\mu_Y, \sigma_Y^2) Y ∼ N ( μ Y , σ Y 2 ) are independent, then:
a X + b Y ∼ N ( a μ X + b μ Y , a 2 σ X 2 + b 2 σ Y 2 ) aX + bY \sim N(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2) a X + bY ∼ N ( a μ X + b μ Y , a 2 σ X 2 + b 2 σ Y 2 )
This is exact (not an approximation) for normal variables, and requires no CLT.
X ∼ B ( 10 , 0.3 ) X \sim B(10, 0.3) X ∼ B ( 10 , 0.3 ) , Y ∼ B ( 15 , 0.4 ) Y \sim B(15, 0.4) Y ∼ B ( 15 , 0.4 ) , independent.
E ( X + Y ) = 3 + 6 = 9 E(X + Y) = 3 + 6 = 9 E ( X + Y ) = 3 + 6 = 9
V a r ( X + Y ) = 10 ( 0.3 ) ( 0.7 ) + 15 ( 0.4 ) ( 0.6 ) = 2.1 + 3.6 = 5.7 \mathrm{Var}(X + Y) = 10(0.3)(0.7) + 15(0.4)(0.6) = 2.1 + 3.6 = 5.7 Var ( X + Y ) = 10 ( 0.3 ) ( 0.7 ) + 15 ( 0.4 ) ( 0.6 ) = 2.1 + 3.6 = 5.7
V a r ( 2 X − 3 Y ) = 4 ( 2.1 ) + 9 ( 3.6 ) = 8.4 + 32.4 = 40.8 \mathrm{Var}(2X - 3Y) = 4(2.1) + 9(3.6) = 8.4 + 32.4 = 40.8 Var ( 2 X − 3 Y ) = 4 ( 2.1 ) + 9 ( 3.6 ) = 8.4 + 32.4 = 40.8
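The expectations and variances above follow from the linearity and independence rules (a minimal numeric sketch; the names are illustrative):

```python
nx, px = 10, 0.3   # X ~ B(10, 0.3)
ny, py = 15, 0.4   # Y ~ B(15, 0.4), independent of X

EX, EY = nx * px, ny * py
VX, VY = nx * px * (1 - px), ny * py * (1 - py)

print(round(EX + EY, 1))          # E(X + Y) = 9.0
print(round(VX + VY, 1))          # Var(X + Y) = 5.7
print(round(4 * VX + 9 * VY, 1))  # Var(2X - 3Y) = 40.8
```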
Example: Normal combinations
Bus ride X ∼ N ( 25 , 16 ) X \sim N(25, 16) X ∼ N ( 25 , 16 ) , walk Y ∼ N ( 10 , 9 ) Y \sim N(10, 9) Y ∼ N ( 10 , 9 ) , independent.
X + Y ∼ N ( 35 , 25 ) X + Y \sim N(35, 25) X + Y ∼ N ( 35 , 25 ) . P ( X + Y > 40 ) = P ( Z > 1 ) = 0.1587 P(X + Y \gt 40) = P(Z \gt 1) = 0.1587 P ( X + Y > 40 ) = P ( Z > 1 ) = 0.1587 .
Machine A produces rods: X ∼ N ( 50.0 , 0.04 ) X \sim N(50.0, 0.04) X ∼ N ( 50.0 , 0.04 ) . Machine B: Y ∼ N ( 50.2 , 0.09 ) Y \sim N(50.2, 0.09) Y ∼ N ( 50.2 , 0.09 ) .
X − Y ∼ N ( − 0.2 , 0.13 ) X - Y \sim N(-0.2, 0.13) X − Y ∼ N ( − 0.2 , 0.13 ) .
P ( X − Y > 0 ) = P ( Z > 0.2 0.13 ) = P ( Z > 0.555 ) ≈ 0.2894 P(X - Y \gt 0) = P\!\left(Z \gt \dfrac{0.2}{\sqrt{0.13}}\right) = P(Z \gt 0.555) \approx 0.2894 P ( X − Y > 0 ) = P ( Z > 0.13 0.2 ) = P ( Z > 0.555 ) ≈ 0.2894 .
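The rod comparison can be checked the same way (a sketch; `phi` built from `math.erf` is an assumed helper):

```python
from math import erf, sqrt

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# D = X - Y ~ N(50.0 - 50.2, 0.04 + 0.09) = N(-0.2, 0.13)
z = (0 - (-0.2)) / sqrt(0.13)
print(round(1 - phi(z), 3))   # P(X - Y > 0) ≈ 0.29
```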
Worked Example: Combining Random Variables X ∼ B ( 12 , 0.3 ) X \sim B(12, 0.3) X ∼ B ( 12 , 0.3 ) and Y ∼ P o ( 5 ) Y \sim \mathrm{Po}(5) Y ∼ Po ( 5 ) are independent. Find:
(a) E ( 3 X − 2 Y ) E(3X - 2Y) E ( 3 X − 2 Y )
E ( 3 X − 2 Y ) = 3 E ( X ) − 2 E ( Y ) = 3 ( 12 × 0.3 ) − 2 ( 5 ) = 3 ( 3.6 ) − 10 = 10.8 − 10 = 0.8 E(3X - 2Y) = 3E(X) - 2E(Y) = 3(12 \times 0.3) - 2(5) = 3(3.6) - 10 = 10.8 - 10 = 0.8 E ( 3 X − 2 Y ) = 3 E ( X ) − 2 E ( Y ) = 3 ( 12 × 0.3 ) − 2 ( 5 ) = 3 ( 3.6 ) − 10 = 10.8 − 10 = 0.8
(b) V a r ( 3 X − 2 Y ) \mathrm{Var}(3X - 2Y) Var ( 3 X − 2 Y )
V a r ( 3 X − 2 Y ) = 9 V a r ( X ) + 4 V a r ( Y ) = 9 ( 12 × 0.3 × 0.7 ) + 4 ( 5 ) \mathrm{Var}(3X - 2Y) = 9\mathrm{Var}(X) + 4\mathrm{Var}(Y) = 9(12 \times 0.3 \times 0.7) + 4(5) Var ( 3 X − 2 Y ) = 9 Var ( X ) + 4 Var ( Y ) = 9 ( 12 × 0.3 × 0.7 ) + 4 ( 5 )
= 9 ( 2.52 ) + 20 = 22.68 + 20 = 42.68 = 9(2.52) + 20 = 22.68 + 20 = 42.68 = 9 ( 2.52 ) + 20 = 22.68 + 20 = 42.68
Note: the variance of the difference uses addition (plus signs for both terms), and the constants
are squared.
IB Exam-Style Questions
Question 1 (Paper 1)
X ∼ B ( 20 , 0.35 ) X \sim B(20, 0.35) X ∼ B ( 20 , 0.35 ) . Find P ( 5 ≤ X ≤ 8 ) P(5 \le X \le 8) P ( 5 ≤ X ≤ 8 ) .
P ( 5 ≤ X ≤ 8 ) = P ( X ≤ 8 ) − P ( X ≤ 4 ) ≈ 0.7625 − 0.1260 = 0.6365 P(5 \le X \le 8) = P(X \le 8) - P(X \le 4) \approx 0.7625 - 0.1260 = 0.6365 P ( 5 ≤ X ≤ 8 ) = P ( X ≤ 8 ) − P ( X ≤ 4 ) ≈ 0.7625 − 0.1260 = 0.6365
Question 2 (Paper 1)
X ∼ P o ( 4.2 ) X \sim \mathrm{Po}(4.2) X ∼ Po ( 4.2 ) . Find P ( X ≥ 3 ) P(X \ge 3) P ( X ≥ 3 ) .
P(X \ge 3) = 1 - P(X \le 2) = 1 - e^{-4.2}(1 + 4.2 + 8.82) \approx 1 - 0.2102 = 0.7898
Question 3 (Paper 2)
Daily rainfall: X ∼ N ( 2.8 , 1.44 ) X \sim N(2.8, 1.44) X ∼ N ( 2.8 , 1.44 ) (mean 2.8 mm, σ = 1.2 \sigma = 1.2 σ = 1.2 mm).
P ( X > 4 ) = P ( Z > 1 ) = 0.1587 P(X \gt 4) = P(Z \gt 1) = 0.1587 P ( X > 4 ) = P ( Z > 1 ) = 0.1587
Expected days per year exceeding 4 mm: 365 × 0.1587 ≈ 58 365 \times 0.1587 \approx 58 365 × 0.1587 ≈ 58 days.
Rainfall exceeded on only 5% of days: x = 2.8 + 1.645 ( 1.2 ) = 4.774 x = 2.8 + 1.645(1.2) = 4.774 x = 2.8 + 1.645 ( 1.2 ) = 4.774 mm.
Question 4 (Paper 2, AHL)
X ∼ U ( 0 , a ) X \sim U(0, a) X ∼ U ( 0 , a ) has P ( X > 3 ) = 0.4 P(X \gt 3) = 0.4 P ( X > 3 ) = 0.4 . Find a a a and V a r ( X ) \mathrm{Var}(X) Var ( X ) .
a − 3 a = 0.4 ⟹ 0.6 a = 3 ⟹ a = 5 \dfrac{a-3}{a} = 0.4 \implies 0.6a = 3 \implies a = 5 a a − 3 = 0.4 ⟹ 0.6 a = 3 ⟹ a = 5
V a r ( X ) = 25 / 12 ≈ 2.083 \mathrm{Var}(X) = 25/12 \approx 2.083 Var ( X ) = 25/12 ≈ 2.083
Question 5 (Paper 2, AHL)
X ∼ G e o ( 0.15 ) X \sim \mathrm{Geo}(0.15) X ∼ Geo ( 0.15 ) . Find the smallest n n n with P ( X ≤ n ) ≥ 0.8 P(X \le n) \ge 0.8 P ( X ≤ n ) ≥ 0.8 .
P ( X ≤ n ) = 1 − 0.85 n ≥ 0.8 ⟹ 0.85 n ≤ 0.2 P(X \le n) = 1 - 0.85^n \ge 0.8 \implies 0.85^n \le 0.2 P ( X ≤ n ) = 1 − 0.8 5 n ≥ 0.8 ⟹ 0.8 5 n ≤ 0.2
n ≥ ln ( 0.2 ) / ln ( 0.85 ) ≈ 9.90 n \ge \ln(0.2)/\ln(0.85) \approx 9.90 n ≥ ln ( 0.2 ) / ln ( 0.85 ) ≈ 9.90 , so n = 10 n = 10 n = 10 .
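The inequality can also be solved programmatically (a minimal sketch):

```python
from math import log, ceil

p, target = 0.15, 0.8
# smallest n with P(X <= n) = 1 - (1-p)^n >= target
n = ceil(log(1 - target) / log(1 - p))
print(n)                                  # 10
print(1 - (1 - p) ** (n - 1) >= target)   # False: n = 9 is not enough
print(1 - (1 - p) ** n >= target)         # True
```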
Question 6 (Paper 2, AHL)
Component lengths: N ( μ , 0.25 ) N(\mu, 0.25) N ( μ , 0.25 ) , σ = 0.5 \sigma = 0.5 σ = 0.5 mm. Sample of 30 gives x ˉ = 100.2 \bar{x} = 100.2 x ˉ = 100.2 mm.
90% CI: 100.2 ± 1.645 × 0.5 / 30 = 100.2 ± 0.150 100.2 \pm 1.645 \times 0.5/\sqrt{30} = 100.2 \pm 0.150 100.2 ± 1.645 × 0.5/ 30 = 100.2 ± 0.150 , so ( 100.05 , 100.35 ) (100.05, 100.35) ( 100.05 , 100.35 ) mm.
The claim μ = 100 \mu = 100 μ = 100 mm is not supported at 90% confidence, since 100 falls below the interval.
Question 7 (Paper 2, AHL)
X ∼ N B ( 4 , 0.25 ) X \sim \mathrm{NB}(4, 0.25) X ∼ NB ( 4 , 0.25 ) . Find P ( X = 10 ) P(X = 10) P ( X = 10 ) and E ( X ) E(X) E ( X ) .
P ( X = 10 ) = ( 9 3 ) ( 0.25 ) 4 ( 0.75 ) 6 = 84 × 0.003906 × 0.1780 ≈ 0.0584 P(X = 10) = \dbinom{9}{3}(0.25)^4(0.75)^6 = 84 \times 0.003906 \times 0.1780 \approx 0.0584 P ( X = 10 ) = ( 3 9 ) ( 0.25 ) 4 ( 0.75 ) 6 = 84 × 0.003906 × 0.1780 ≈ 0.0584
E ( X ) = 4 / 0.25 = 16 E(X) = 4/0.25 = 16 E ( X ) = 4/0.25 = 16
Question 8 (Paper 2, AHL)
The masses of male students are N ( 72 , 36 ) N(72, 36) N ( 72 , 36 ) and female students are N ( 58 , 25 ) N(58, 25) N ( 58 , 25 ) , independent. Find
the probability that a randomly chosen male is heavier than a randomly chosen female.
Let M ∼ N ( 72 , 36 ) M \sim N(72, 36) M ∼ N ( 72 , 36 ) and F ∼ N ( 58 , 25 ) F \sim N(58, 25) F ∼ N ( 58 , 25 ) . Then D = M − F ∼ N ( 72 − 58 , 36 + 25 ) = N ( 14 , 61 ) D = M - F \sim N(72-58, 36+25) = N(14, 61) D = M − F ∼ N ( 72 − 58 , 36 + 25 ) = N ( 14 , 61 ) .
P(D \gt 0) = P\!\left(Z \gt \dfrac{0 - 14}{\sqrt{61}}\right) = P(Z \gt -1.793) = \Phi(1.793) \approx 0.9635
Summary of Distributions
Discrete Distributions
Distribution | Notation | PMF | E(X) | Var(X) | Support
Binomial | B(n, p) | \dbinom{n}{x}p^x(1-p)^{n-x} | np | np(1-p) | 0, 1, \ldots, n
Poisson | \mathrm{Po}(\lambda) | \dfrac{e^{-\lambda}\lambda^x}{x!} | \lambda | \lambda | 0, 1, 2, \ldots
Geometric (AHL) | \mathrm{Geo}(p) | (1-p)^{x-1}p | \dfrac{1}{p} | \dfrac{1-p}{p^2} | 1, 2, 3, \ldots
Neg. Binomial (AHL) | \mathrm{NB}(r, p) | \dbinom{x-1}{r-1}p^r(1-p)^{x-r} | \dfrac{r}{p} | \dfrac{r(1-p)}{p^2} | r, r+1, \ldots
Continuous Distributions
Distribution | Notation | PDF | E(X) | Var(X) | Support
Normal | N(\mu, \sigma^2) | \dfrac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} | \mu | \sigma^2 | (-\infty, \infty)
Uniform (AHL) | U(a, b) | \dfrac{1}{b-a} | \dfrac{a+b}{2} | \dfrac{(b-a)^2}{12} | [a, b]
Key Relationships
Relationship | Condition
B(n, p) \approx \mathrm{Po}(np) | n large, p small, np moderate
B(n, p) \approx N(np, np(1-p)) | np \ge 5, n(1-p) \ge 5, with continuity correction
\mathrm{Geo}(p) = \mathrm{NB}(1, p) | Special case
X + Y \sim \mathrm{Po}(\lambda_1 + \lambda_2) | Independent Poisson variables
aX + bY \sim N(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2) | Independent normal variables
\bar{X}_n \approx N(\mu, \sigma^2/n) | CLT, large n
E(aX + bY) = aE(X) + bE(Y) | Always
\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) | X, Y independent
Common Pitfalls
Confusing p p p and λ \lambda λ : For Poisson, λ \lambda λ is a rate, not a probability. Unlike
binomial p p p , there is no upper bound of 1 on λ \lambda λ .
Forgetting conditions : Before applying a distribution, verify all conditions. For binomial:
fixed n n n , independence, two outcomes, constant p p p .
Variance of differences : V a r ( X − Y ) = V a r ( X ) + V a r ( Y ) \mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) Var ( X − Y ) = Var ( X ) + Var ( Y ) (plus, not
minus) for independent variables.
Continuity correction : When approximating a discrete distribution with a continuous one,
apply a continuity correction. For example, P ( X ≤ 5 ) P(X \le 5) P ( X ≤ 5 ) becomes P ( X < 5.5 ) P(X \lt 5.5) P ( X < 5.5 ) under the normal
approximation.
Standardisation direction : Φ ( z ) \Phi(z) Φ ( z ) goes from z z z -score to probability; Φ − 1 ( p ) \Phi^{-1}(p) Φ − 1 ( p ) goes
from probability to z z z -score.
Geometric support : X ∼ G e o ( p ) X \sim \mathrm{Geo}(p) X ∼ Geo ( p ) counts trials starting from 1 (IB convention).
Poisson additivity : Requires independence. If events are correlated, the sum is not Poisson.
Confidence interval interpretation : A 95% CI does not mean there is a 95% probability that
μ \mu μ lies in the interval. It means 95% of similarly constructed intervals contain μ \mu μ .
Squaring constants in variance : V a r ( 3 X ) = 9 V a r ( X ) \mathrm{Var}(3X) = 9\mathrm{Var}(X) Var ( 3 X ) = 9 Var ( X ) , not
3 V a r ( X ) 3\mathrm{Var}(X) 3 Var ( X ) .
Always define your random variable and state the distribution with parameters at the start. For
normal problems, sketch the bell curve and shade the relevant area. When combining variables,
clearly state whether independence is assumed. For confidence intervals, state the level and
interpret in context.
Problem Set
Problem 1
A discrete random variable X X X has PMF P ( X = x ) = x + 1 15 P(X = x) = \frac{x + 1}{15} P ( X = x ) = 15 x + 1 for x = 0 , 1 , 2 , 3 , 4 x = 0, 1, 2, 3, 4 x = 0 , 1 , 2 , 3 , 4 . Find
E ( X ) E(X) E ( X ) , V a r ( X ) \mathrm{Var}(X) Var ( X ) , and P ( X ≥ 2 ) P(X \ge 2) P ( X ≥ 2 ) .
Solution Verify: ∑ x = 0 4 x + 1 15 = 1 + 2 + 3 + 4 + 5 15 = 15 15 = 1 \sum_{x=0}^{4}\frac{x+1}{15} = \frac{1+2+3+4+5}{15} = \frac{15}{15} = 1 ∑ x = 0 4 15 x + 1 = 15 1 + 2 + 3 + 4 + 5 = 15 15 = 1 .
E ( X ) = 0 ( 1 15 ) + 1 ( 2 15 ) + 2 ( 3 15 ) + 3 ( 4 15 ) + 4 ( 5 15 ) E(X) = 0\!\left(\frac{1}{15}\right) + 1\!\left(\frac{2}{15}\right) + 2\!\left(\frac{3}{15}\right) + 3\!\left(\frac{4}{15}\right) + 4\!\left(\frac{5}{15}\right) E ( X ) = 0 ( 15 1 ) + 1 ( 15 2 ) + 2 ( 15 3 ) + 3 ( 15 4 ) + 4 ( 15 5 )
= 0 + 2 + 6 + 12 + 20 15 = 40 15 = 8 3 ≈ 2.667 = \frac{0 + 2 + 6 + 12 + 20}{15} = \frac{40}{15} = \frac{8}{3} \approx 2.667 = 15 0 + 2 + 6 + 12 + 20 = 15 40 = 3 8 ≈ 2.667
E ( X 2 ) = 0 + 1 ( 2 15 ) + 4 ( 3 15 ) + 9 ( 4 15 ) + 16 ( 5 15 ) = 0 + 2 + 12 + 36 + 80 15 = 130 15 = 26 3 E(X^2) = 0 + 1\!\left(\frac{2}{15}\right) + 4\!\left(\frac{3}{15}\right) + 9\!\left(\frac{4}{15}\right) + 16\!\left(\frac{5}{15}\right) = \frac{0 + 2 + 12 + 36 + 80}{15} = \frac{130}{15} = \frac{26}{3} E ( X 2 ) = 0 + 1 ( 15 2 ) + 4 ( 15 3 ) + 9 ( 15 4 ) + 16 ( 15 5 ) = 15 0 + 2 + 12 + 36 + 80 = 15 130 = 3 26
V a r ( X ) = 26 3 − ( 8 3 ) 2 = 26 3 − 64 9 = 78 − 64 9 = 14 9 ≈ 1.556 \mathrm{Var}(X) = \frac{26}{3} - \left(\frac{8}{3}\right)^2 = \frac{26}{3} - \frac{64}{9} = \frac{78 - 64}{9} = \frac{14}{9} \approx 1.556 Var ( X ) = 3 26 − ( 3 8 ) 2 = 3 26 − 9 64 = 9 78 − 64 = 9 14 ≈ 1.556
P ( X ≥ 2 ) = 3 15 + 4 15 + 5 15 = 12 15 = 4 5 = 0.8 P(X \ge 2) = \frac{3}{15} + \frac{4}{15} + \frac{5}{15} = \frac{12}{15} = \frac{4}{5} = 0.8 P ( X ≥ 2 ) = 15 3 + 15 4 + 15 5 = 15 12 = 5 4 = 0.8
If you get this wrong, revise: Discrete Random Variables section.
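A numeric check of Problem 1 (a minimal sketch; the dict-based PMF is an illustrative representation):

```python
pmf = {x: (x + 1) / 15 for x in range(5)}
assert abs(sum(pmf.values()) - 1) < 1e-12   # valid PMF: probabilities sum to 1

EX = sum(x * p for x, p in pmf.items())
EX2 = sum(x**2 * p for x, p in pmf.items())
print(round(EX, 3))                              # E(X) = 8/3 ≈ 2.667
print(round(EX2 - EX**2, 3))                     # Var(X) = 14/9 ≈ 1.556
print(round(sum(p for x, p in pmf.items() if x >= 2), 1))  # P(X >= 2) = 0.8
```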
Problem 2
X ∼ B ( 25 , 0.35 ) X \sim B(25, 0.35) X ∼ B ( 25 , 0.35 ) . Find P ( X = 10 ) P(X = 10) P ( X = 10 ) , P ( X ≤ 5 ) P(X \le 5) P ( X ≤ 5 ) , and P ( X ≥ 15 ) P(X \ge 15) P ( X ≥ 15 ) .
Solution P(X = 10) = \binom{25}{10}(0.35)^{10}(0.65)^{15} \approx 0.1409
P(X \le 5) = \sum_{x=0}^{5}\binom{25}{x}(0.35)^x(0.65)^{25-x} \approx 0.0826
P(X \ge 15) = 1 - P(X \le 14) \approx 1 - 0.9907 = 0.0093
If you get this wrong, revise: Binomial Distribution section.
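Exact values for B(25, 0.35) can be computed directly (a minimal sketch using `math.comb`; `pmf` is an illustrative helper):

```python
from math import comb

n, p = 25, 0.35
def pmf(k):
    # P(X = k) for X ~ B(25, 0.35)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(pmf(10), 4))                             # P(X = 10) ≈ 0.1409
print(round(sum(pmf(k) for k in range(6)), 4))       # P(X <= 5) ≈ 0.0826
print(round(1 - sum(pmf(k) for k in range(15)), 4))  # P(X >= 15) ≈ 0.0093
```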
Problem 3
A bookshop sells an average of 3.2 rare books per week. X ∼ P o ( 3.2 ) X \sim \mathrm{Po}(3.2) X ∼ Po ( 3.2 ) is the number sold
in a week. Find P ( X = 4 ) P(X = 4) P ( X = 4 ) , P ( X = 0 ) P(X = 0) P ( X = 0 ) , and P ( X > 5 ) P(X \gt 5) P ( X > 5 ) .
Solution P ( X = 4 ) = e − 3.2 ⋅ 3.2 4 4 ! = 104.858 × e − 3.2 24 ≈ 0.1781 P(X = 4) = \frac{e^{-3.2} \cdot 3.2^4}{4!} = \frac{104.858 \times e^{-3.2}}{24} \approx 0.1781 P ( X = 4 ) = 4 ! e − 3.2 ⋅ 3. 2 4 = 24 104.858 × e − 3.2 ≈ 0.1781
P ( X = 0 ) = e − 3.2 ≈ 0.0408 P(X = 0) = e^{-3.2} \approx 0.0408 P ( X = 0 ) = e − 3.2 ≈ 0.0408
P ( X > 5 ) = 1 − P ( X ≤ 5 ) = 1 − e − 3.2 ( 1 + 3.2 + 10.24 2 + 32.768 6 + 104.858 24 + 335.544 120 ) P(X \gt 5) = 1 - P(X \le 5) = 1 - e^{-3.2}\!\left(1 + 3.2 + \frac{10.24}{2} + \frac{32.768}{6} + \frac{104.858}{24} + \frac{335.544}{120}\right) P ( X > 5 ) = 1 − P ( X ≤ 5 ) = 1 − e − 3.2 ( 1 + 3.2 + 2 10.24 + 6 32.768 + 24 104.858 + 120 335.544 )
= 1 - e^{-3.2}(1 + 3.2 + 5.12 + 5.461 + 4.369 + 2.796) = 1 - e^{-3.2}(21.946) \approx 1 - 0.8946 = 0.1054
If you get this wrong, revise: Poisson Distribution section.
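The Poisson terms above can be summed in code rather than by hand (a minimal sketch; `po_pmf` is an illustrative helper):

```python
from math import exp, factorial

lam = 3.2
def po_pmf(k):
    # P(X = k) for X ~ Po(3.2)
    return exp(-lam) * lam**k / factorial(k)

print(round(po_pmf(4), 4))                             # P(X = 4) ≈ 0.1781
print(round(po_pmf(0), 4))                             # P(X = 0) ≈ 0.0408
print(round(1 - sum(po_pmf(k) for k in range(6)), 4))  # P(X > 5) ≈ 0.1054
```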
Problem 4
Exam scores follow N ( 65 , 64 ) N(65, 64) N ( 65 , 64 ) (mean 65, variance 64). Find the probability that a randomly chosen
student scores above 75, and the score that is exceeded by only 10% of students.
Solution μ = 65 \mu = 65 μ = 65 , σ = 64 = 8 \sigma = \sqrt{64} = 8 σ = 64 = 8 .
P ( X > 75 ) = P ( Z > 75 − 65 8 ) = P ( Z > 1.25 ) = 1 − Φ ( 1.25 ) ≈ 1 − 0.8944 = 0.1056 P(X \gt 75) = P\!\left(Z \gt \frac{75 - 65}{8}\right) = P(Z \gt 1.25) = 1 - \Phi(1.25) \approx 1 - 0.8944 = 0.1056 P ( X > 75 ) = P ( Z > 8 75 − 65 ) = P ( Z > 1.25 ) = 1 − Φ ( 1.25 ) ≈ 1 − 0.8944 = 0.1056
For the 90th percentile (exceeded by only 10%):
P ( X ≤ x ) = 0.90 ⟹ x − 65 8 = 1.282 ⟹ x = 65 + 1.282 ( 8 ) = 75.26 P(X \le x) = 0.90 \implies \frac{x - 65}{8} = 1.282 \implies x = 65 + 1.282(8) = 75.26 P ( X ≤ x ) = 0.90 ⟹ 8 x − 65 = 1.282 ⟹ x = 65 + 1.282 ( 8 ) = 75.26
A score of approximately 75.3 is exceeded by only 10% of students.
If you get this wrong, revise: Normal Distribution section.
Problem 5
The waiting time for a train is uniformly distributed between 0 and 12 minutes. Find the probability
that the waiting time is (a) less than 5 minutes, (b) between 7 and 10 minutes, (c) more than 8
minutes given that it has already been 3 minutes.
Solution X ∼ U ( 0 , 12 ) X \sim U(0, 12) X ∼ U ( 0 , 12 ) .
(a) P ( X < 5 ) = 5 / 12 ≈ 0.4167 P(X \lt 5) = 5/12 \approx 0.4167 P ( X < 5 ) = 5/12 ≈ 0.4167
(b) P ( 7 < X < 10 ) = ( 10 − 7 ) / 12 = 3 / 12 = 0.25 P(7 \lt X \lt 10) = (10 - 7)/12 = 3/12 = 0.25 P ( 7 < X < 10 ) = ( 10 − 7 ) /12 = 3/12 = 0.25
(c) Given that 3 minutes have already passed, the conditional distribution of X is uniform on (3, 12), so the remaining wait is U(0, 9). (The uniform distribution is not memoryless; the conditional distribution simply remains uniform, on a shorter interval.)
P(\mathrm{remaining} \gt 5) = 4/9 \approx 0.4444
Alternatively: P ( X > 8 ∣ X > 3 ) = P ( X > 8 ) / P ( X > 3 ) = ( 4 / 12 ) / ( 9 / 12 ) = 4 / 9 P(X \gt 8 \mid X \gt 3) = P(X \gt 8)/P(X \gt 3) = (4/12)/(9/12) = 4/9 P ( X > 8 ∣ X > 3 ) = P ( X > 8 ) / P ( X > 3 ) = ( 4/12 ) / ( 9/12 ) = 4/9 .
If you get this wrong, revise: Continuous Uniform Distribution section.
Problem 6
X ∼ G e o ( 0.25 ) X \sim \mathrm{Geo}(0.25) X ∼ Geo ( 0.25 ) . Find the smallest n n n such that P ( X ≤ n ) ≥ 0.95 P(X \le n) \ge 0.95 P ( X ≤ n ) ≥ 0.95 .
Solution P ( X ≤ n ) = 1 − ( 1 − p ) n = 1 − 0.75 n ≥ 0.95 P(X \le n) = 1 - (1 - p)^n = 1 - 0.75^n \ge 0.95 P ( X ≤ n ) = 1 − ( 1 − p ) n = 1 − 0.7 5 n ≥ 0.95
0.75 n ≤ 0.05 0.75^n \le 0.05 0.7 5 n ≤ 0.05
n ln ( 0.75 ) ≤ ln ( 0.05 ) n \ln(0.75) \le \ln(0.05) n ln ( 0.75 ) ≤ ln ( 0.05 )
n ≥ ln ( 0.05 ) ln ( 0.75 ) = − 2.996 − 0.288 = 10.40 n \ge \frac{\ln(0.05)}{\ln(0.75)} = \frac{-2.996}{-0.288} = 10.40 n ≥ l n ( 0.75 ) l n ( 0.05 ) = − 0.288 − 2.996 = 10.40
So n = 11 n = 11 n = 11 trials are needed.
If you get this wrong, revise: Geometric Distribution section.
Problem 7
X ∼ N B ( 3 , 0.2 ) X \sim \mathrm{NB}(3, 0.2) X ∼ NB ( 3 , 0.2 ) . Find P ( X = 8 ) P(X = 8) P ( X = 8 ) and V a r ( X ) \mathrm{Var}(X) Var ( X ) .
Solution P ( X = 8 ) = ( 7 2 ) ( 0.2 ) 3 ( 0.8 ) 5 = 21 × 0.008 × 0.32768 = 0.05505 P(X = 8) = \binom{7}{2}(0.2)^3(0.8)^5 = 21 \times 0.008 \times 0.32768 = 0.05505 P ( X = 8 ) = ( 2 7 ) ( 0.2 ) 3 ( 0.8 ) 5 = 21 × 0.008 × 0.32768 = 0.05505
E ( X ) = 3 0.2 = 15 E(X) = \frac{3}{0.2} = 15 E ( X ) = 0.2 3 = 15
V a r ( X ) = 3 ( 0.8 ) 0.04 = 2.4 0.04 = 60 \mathrm{Var}(X) = \frac{3(0.8)}{0.04} = \frac{2.4}{0.04} = 60 Var ( X ) = 0.04 3 ( 0.8 ) = 0.04 2.4 = 60
σ = 60 ≈ 7.75 \sigma = \sqrt{60} \approx 7.75 σ = 60 ≈ 7.75
If you get this wrong, revise: Negative Binomial Distribution section.
Problem 8
The masses of packets of sugar are normally distributed with mean 500 g 500\,\mathrm{g} 500 g and standard
deviation 5 g 5\,\mathrm{g} 5 g . A sample of 36 packets is selected. Find the probability that the sample
mean is between 498 g 498\,\mathrm{g} 498 g and 503 g 503\,\mathrm{g} 503 g .
Solution By the CLT:
X ˉ ∼ N ( 500 , 25 36 ) \bar{X} \sim N\!\left(500, \frac{25}{36}\right) X ˉ ∼ N ( 500 , 36 25 )
σ X ˉ = 5 6 ≈ 0.833 \sigma_{\bar{X}} = \frac{5}{6} \approx 0.833 σ X ˉ = 6 5 ≈ 0.833
P ( 498 < X ˉ < 503 ) = P ( 498 − 500 5 / 6 < Z < 503 − 500 5 / 6 ) = P ( − 2.4 < Z < 3.6 ) P(498 \lt \bar{X} \lt 503) = P\!\left(\frac{498 - 500}{5/6} \lt Z \lt \frac{503 - 500}{5/6}\right) = P(-2.4 \lt Z \lt 3.6) P ( 498 < X ˉ < 503 ) = P ( 5/6 498 − 500 < Z < 5/6 503 − 500 ) = P ( − 2.4 < Z < 3.6 )
= Φ ( 3.6 ) − Φ ( − 2.4 ) = 0.9998 − 0.0082 = 0.9916 = \Phi(3.6) - \Phi(-2.4) = 0.9998 - 0.0082 = 0.9916 = Φ ( 3.6 ) − Φ ( − 2.4 ) = 0.9998 − 0.0082 = 0.9916
If you get this wrong, revise: Central Limit Theorem section.
Problem 9
A 95% confidence interval for the mean diameter of bolts is ( 10.02 m m , 10.18 m m ) (10.02\,\mathrm{mm}, 10.18\,\mathrm{mm}) ( 10.02 mm , 10.18 mm )
based on a sample of size 50. The population standard deviation is known to be
σ = 0.4 m m \sigma = 0.4\,\mathrm{mm} σ = 0.4 mm . Find the sample mean and verify the confidence interval.
Solution The sample mean is the midpoint of the interval:
x ˉ = 10.02 + 10.18 2 = 10.10 m m \bar{x} = \frac{10.02 + 10.18}{2} = 10.10\,\mathrm{mm} x ˉ = 2 10.02 + 10.18 = 10.10 mm
The margin of error is half the width:
E = 10.18 − 10.02 2 = 0.08 m m E = \frac{10.18 - 10.02}{2} = 0.08\,\mathrm{mm} E = 2 10.18 − 10.02 = 0.08 mm
Verify: E = z α / 2 ⋅ σ n = 1.960 × 0.4 50 = 1.960 × 0.0566 = 0.1109 E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.960 \times \frac{0.4}{\sqrt{50}} = 1.960 \times 0.0566 = 0.1109 E = z α /2 ⋅ n σ = 1.960 × 50 0.4 = 1.960 × 0.0566 = 0.1109
The calculated margin of error (0.1109 0.1109 0.1109 ) exceeds the stated margin (0.08 0.08 0.08 ). This suggests the
confidence interval was constructed with a different confidence level or the stated σ \sigma σ does
not match the data. If we solve for the confidence level that gives E = 0.08 E = 0.08 E = 0.08 :
z α / 2 = 0.08 0.4 / 50 = 0.08 0.0566 = 1.413 z_{\alpha/2} = \frac{0.08}{0.4/\sqrt{50}} = \frac{0.08}{0.0566} = 1.413 z α /2 = 0.4/ 50 0.08 = 0.0566 0.08 = 1.413
This corresponds to approximately 84% confidence, not 95%.
If you get this wrong, revise: Confidence Intervals section.
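The implied critical value and confidence level can be backed out numerically (a sketch; note that 0.08/(0.4/\sqrt{50}) is exactly \sqrt{2}):

```python
from math import erf, sqrt

lo, hi, n, sigma = 10.02, 10.18, 50, 0.4
xbar = (lo + hi) / 2               # sample mean = 10.10 mm
E = (hi - lo) / 2                  # stated margin = 0.08 mm
z = E / (sigma / sqrt(n))          # implied critical value = sqrt(2) ≈ 1.414
level = erf(z / sqrt(2))           # two-sided level = 2*Phi(z) - 1
print(round(z, 3), round(level, 3))  # 1.414 0.843
```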
Problem 10
X ∼ B ( 15 , 0.4 ) X \sim B(15, 0.4) X ∼ B ( 15 , 0.4 ) and Y ∼ B ( 20 , 0.3 ) Y \sim B(20, 0.3) Y ∼ B ( 20 , 0.3 ) are independent. Find E ( X + Y ) E(X + Y) E ( X + Y ) ,
V a r ( X − Y ) \mathrm{Var}(X - Y) Var ( X − Y ) , and P ( X + Y = 10 ) P(X + Y = 10) P ( X + Y = 10 ) .
Solution E ( X ) = 15 ( 0.4 ) = 6 , E ( Y ) = 20 ( 0.3 ) = 6 E(X) = 15(0.4) = 6, \quad E(Y) = 20(0.3) = 6 E ( X ) = 15 ( 0.4 ) = 6 , E ( Y ) = 20 ( 0.3 ) = 6
E ( X + Y ) = 6 + 6 = 12 E(X + Y) = 6 + 6 = 12 E ( X + Y ) = 6 + 6 = 12
V a r ( X ) = 15 ( 0.4 ) ( 0.6 ) = 3.6 , V a r ( Y ) = 20 ( 0.3 ) ( 0.7 ) = 4.2 \mathrm{Var}(X) = 15(0.4)(0.6) = 3.6, \quad \mathrm{Var}(Y) = 20(0.3)(0.7) = 4.2 Var ( X ) = 15 ( 0.4 ) ( 0.6 ) = 3.6 , Var ( Y ) = 20 ( 0.3 ) ( 0.7 ) = 4.2
V a r ( X − Y ) = 3.6 + 4.2 = 7.8 \mathrm{Var}(X - Y) = 3.6 + 4.2 = 7.8 Var ( X − Y ) = 3.6 + 4.2 = 7.8
For P ( X + Y = 10 ) P(X + Y = 10) P ( X + Y = 10 ) , enumerate pairs ( x , y ) (x, y) ( x , y ) where x + y = 10 x + y = 10 x + y = 10 , 0 ≤ x ≤ 15 0 \le x \le 15 0 ≤ x ≤ 15 , 0 ≤ y ≤ 20 0 \le y \le 20 0 ≤ y ≤ 20 :
This requires summing over x = 0 x = 0 x = 0 to x = 10 x = 10 x = 10 :
P ( X + Y = 10 ) = ∑ x = 0 10 P ( X = x ) P ( Y = 10 − x ) P(X + Y = 10) = \sum_{x=0}^{10} P(X = x)P(Y = 10 - x) P ( X + Y = 10 ) = ∑ x = 0 10 P ( X = x ) P ( Y = 10 − x )
This is computationally intensive without a GDC, but the key principle is clear: because X and Y have different success probabilities (p = 0.4 and p = 0.3), the sum X + Y is not binomial, so its distribution must be found by convolution.
If you get this wrong, revise: Combining Random Variables section.
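The convolution can be evaluated on a GDC or in a few lines of code (a minimal sketch; `binom_pmf` and `conv_pmf` are illustrative helpers):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def conv_pmf(s):
    # P(X + Y = s) for independent X ~ B(15, 0.4), Y ~ B(20, 0.3)
    return sum(binom_pmf(x, 15, 0.4) * binom_pmf(s - x, 20, 0.3)
               for x in range(max(0, s - 20), min(15, s) + 1))

total = sum(conv_pmf(s) for s in range(36))   # support of X + Y is 0..35
print(abs(total - 1) < 1e-9)                  # True: a valid PMF
print(round(conv_pmf(10), 4))                 # P(X + Y = 10)
```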
Problem 11
The lifetimes of batteries are normally distributed with mean 500 h o u r s 500\,\mathrm{hours} 500 hours and standard
deviation 50 h o u r s 50\,\mathrm{hours} 50 hours . Find the probability that a randomly selected battery lasts more than
550 h o u r s 550\,\mathrm{hours} 550 hours . If four batteries are selected independently, find the probability that at
least three last more than 550 h o u r s 550\,\mathrm{hours} 550 hours .
Solution P ( X > 550 ) = P ( Z > 550 − 500 50 ) = P ( Z > 1 ) = 1 − 0.8413 = 0.1587 P(X \gt 550) = P\!\left(Z \gt \frac{550 - 500}{50}\right) = P(Z \gt 1) = 1 - 0.8413 = 0.1587 P ( X > 550 ) = P ( Z > 50 550 − 500 ) = P ( Z > 1 ) = 1 − 0.8413 = 0.1587
Let Y Y Y be the number (out of 4) lasting more than 550 hours. Y ∼ B ( 4 , 0.1587 ) Y \sim B(4, 0.1587) Y ∼ B ( 4 , 0.1587 ) .
P ( Y ≥ 3 ) = P ( Y = 3 ) + P ( Y = 4 ) P(Y \ge 3) = P(Y = 3) + P(Y = 4) P ( Y ≥ 3 ) = P ( Y = 3 ) + P ( Y = 4 )
= ( 4 3 ) ( 0.1587 ) 3 ( 0.8413 ) + ( 0.1587 ) 4 = \binom{4}{3}(0.1587)^3(0.8413) + (0.1587)^4 = ( 3 4 ) ( 0.1587 ) 3 ( 0.8413 ) + ( 0.1587 ) 4
= 4 ( 0.003997 ) ( 0.8413 ) + 0.000635 = 0.01345 + 0.000635 = 0.01409 = 4(0.003997)(0.8413) + 0.000635 = 0.01345 + 0.000635 = 0.01409 = 4 ( 0.003997 ) ( 0.8413 ) + 0.000635 = 0.01345 + 0.000635 = 0.01409
Approximately 1.4% chance that at least three out of four batteries last more than 550 hours.
If you get this wrong, revise: Normal Distribution and Binomial Distribution sections.
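The two stages of Problem 11 (a normal tail, then a binomial) combine in a few lines (a sketch; `phi` built from `math.erf` is an assumed helper):

```python
from math import erf, sqrt, comb

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

p = 1 - phi((550 - 500) / 50)                 # one battery lasts > 550 h
print(round(p, 4))                            # ≈ 0.1587
p3plus = comb(4, 3) * p**3 * (1 - p) + p**4   # Y ~ B(4, p), P(Y >= 3)
print(round(p3plus, 4))                       # ≈ 0.0141
```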
Problem 12
Use the Poisson approximation to the binomial to estimate the probability of getting 3 or more
sixes when rolling a fair die 60 times.
Solution X ∼ B ( 60 , 1 / 6 ) X \sim B(60, 1/6) X ∼ B ( 60 , 1/6 ) . λ = n p = 60 / 6 = 10 \lambda = np = 60/6 = 10 λ = n p = 60/6 = 10 .
Approximate: X ≈ P o ( 10 ) X \approx \mathrm{Po}(10) X ≈ Po ( 10 ) .
Check conditions: n = 60 ≥ 50 n = 60 \ge 50 n = 60 ≥ 50 , p = 1 / 6 ≤ 0.1 p = 1/6 \le 0.1 p = 1/6 ≤ 0.1 ? No, p = 0.167 > 0.1 p = 0.167 \gt 0.1 p = 0.167 > 0.1 . The Poisson
approximation is less accurate here but still usable as an estimate.
P ( X ≥ 3 ) = 1 − P ( X ≤ 2 ) = 1 − e − 10 ( 1 + 10 + 100 2 ) P(X \ge 3) = 1 - P(X \le 2) = 1 - e^{-10}\!\left(1 + 10 + \frac{100}{2}\right) P ( X ≥ 3 ) = 1 − P ( X ≤ 2 ) = 1 − e − 10 ( 1 + 10 + 2 100 )
= 1 − 61 e − 10 = 1 − 61 ( 0.0000454 ) = 1 − 0.00277 = 0.9972 = 1 - 61e^{-10} = 1 - 61(0.0000454) = 1 - 0.00277 = 0.9972 = 1 − 61 e − 10 = 1 − 61 ( 0.0000454 ) = 1 − 0.00277 = 0.9972
Exact binomial: P ( X ≤ 2 ) = ( 60 0 ) ( 5 / 6 ) 60 + ( 60 1 ) ( 1 / 6 ) ( 5 / 6 ) 59 + ( 60 2 ) ( 1 / 6 ) 2 ( 5 / 6 ) 58 P(X \le 2) = \binom{60}{0}(5/6)^{60} + \binom{60}{1}(1/6)(5/6)^{59} + \binom{60}{2}(1/6)^2(5/6)^{58} P ( X ≤ 2 ) = ( 0 60 ) ( 5/6 ) 60 + ( 1 60 ) ( 1/6 ) ( 5/6 ) 59 + ( 2 60 ) ( 1/6 ) 2 ( 5/6 ) 58
This gives approximately P(X \le 2) \approx 0.00149, so P(X \ge 3) \approx 0.9985. The Poisson estimate of P(X \le 2) is nearly double the exact value because p \gt 0.1, but both tails are so small that the two answers for P(X \ge 3) agree to two decimal places.
If you get this wrong, revise: Poisson as a Limit of the Binomial section.
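The exact binomial tail and the Poisson estimate can be compared directly (a minimal sketch):

```python
from math import comb, exp

p = 1 / 6
# exact: X ~ B(60, 1/6)
exact_le2 = sum(comb(60, k) * p**k * (1 - p)**(60 - k) for k in range(3))
# approximation: Po(10), P(X <= 2) = e^{-10}(1 + 10 + 50)
poisson_le2 = exp(-10) * (1 + 10 + 50)

print(round(exact_le2, 5))                                 # ≈ 0.00149
print(round(poisson_le2, 5))                               # ≈ 0.00277
print(round(1 - exact_le2, 4), round(1 - poisson_le2, 4))  # 0.9985 0.9972
```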
Diagnostic Test
Ready to test your understanding of Probability Distributions? The diagnostic test contains the hardest questions within the IB specification for this topic, each with a full worked solution.
Unit tests probe edge cases and common misconceptions. Integration tests combine Probability Distributions with other IB mathematics topics to test synthesis under exam conditions.
See Diagnostic Guide for instructions on self-marking and building a personal test matrix.