SAMPLING - Theory & Formulas
Cambridge AS & A Level Mathematics
Part 1: Introduction to Sampling
Key Definitions
Population: Complete set of ALL items of interest
Sample: Part of the population (size = n)
Representative Sample: Accurately reflects population characteristics
Biased Sample: Does NOT properly represent population
Random Sample: ALL possible samples of size n have equal probability of selection
Why Use Samples?
| Reason | Example |
|---|---|
| Cost-Effective | Test 50 products vs 10,000 |
| ⏰ Time-Saving | Survey 100 people vs millions |
| Destructive Testing | Crash testing helmets |
| Impossible to Survey All | All fish in the ocean |
Random Sampling Methods
Using Random Number Tables:
- Number population: 000 to 499 (for 500 items)
- Pick starting point in table
- Read digits matching your numbering
- Ignore numbers outside range
- Ignore repeats
Using Excel:
=RAND() → Random number 0 to 1
=INT(250*RAND())+1 → Random integer 1 to 250
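The same selection can be sketched in Python. This is a minimal illustration rather than a required method: the function name `simple_random_sample` is hypothetical, and the population size of 250 is borrowed from the Excel formula above. `random.sample` draws without replacement, so repeats never occur and every subset of size n is equally likely.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Pick n distinct labels from 1..population_size without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(range(1, population_size + 1), n)

# Assumed example: choose 10 items from a population numbered 1 to 250.
sample_ids = simple_random_sample(250, 10, seed=1)
print(sorted(sample_ids))
```

Because the draw is without replacement, the "ignore repeats" and "ignore numbers outside range" steps of the table method are handled automatically.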
⚠️ Types of Bias
| Type | Example |
|---|---|
| Location Bias | Survey only at gym about exercise |
| Time Bias | Survey Monday afternoon only |
| Leading Questions | "Don't you agree that...?" |
| Small Sample | Ask only 10 people |
Part 2: Distribution of Sample Means
Sample Mean (X̄)
Definition: Average of all observations in sample
X̄ = (x₁ + x₂ + ... + xₙ) / n
⚡ Different samples → Different sample means!
FUNDAMENTAL FORMULAS
1. Expected Value of Sample Mean
E(X̄) = μ
On average, the sample mean equals the population mean (X̄ is an unbiased estimator of μ).
2. Variance of Sample Mean
Var(X̄) = σ² / n
Variance decreases as sample size increases!
3. Standard Deviation of Sample Mean
SD(X̄) = σ / √n
Also called: Standard Error (SE)
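The three formulas can be checked by simulation. The sketch below assumes a population that is not in the notes: one roll of a fair six-sided die, for which μ = 3.5 and σ² = 35/12. The sample size, number of repetitions, and seed are likewise arbitrary choices made for illustration.

```python
import random
import statistics

# Assumed population: one roll of a fair six-sided die.
MU = 3.5           # population mean
SIGMA2 = 35 / 12   # population variance

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(n)])
        for _ in range(reps)
    ]

n = 10
means = simulate_sample_means(n, reps=20_000)

# Compare the simulated values with E(X̄) = μ, Var(X̄) = σ²/n, SD(X̄) = σ/√n.
print("mean of sample means:", statistics.fmean(means), "theory:", MU)
print("variance of sample means:", statistics.pvariance(means), "theory:", SIGMA2 / n)
print("SD of sample means:", statistics.pstdev(means), "theory:", (SIGMA2 / n) ** 0.5)
```

The simulated figures should sit close to the theoretical ones, and the gap should shrink as the number of repetitions grows.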
⭐ THE CENTRAL LIMIT THEOREM (CLT)
Most Important Theorem in Statistics!
X̄ ~ N(μ, σ²/n) approximately
When n is large (usually n ≥ 30)
What it means:
- Sample means follow an approximately NORMAL distribution
- Mean = μ (population mean)
- Variance = σ²/n
- Works EVEN IF original population is NOT normal (provided its variance is finite)
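One way to see the theorem at work is to watch the skewness of the sample means shrink as n grows, since a normal distribution has skewness 0. The sketch below assumes a strongly right-skewed population (exponential with rate 1), which is an illustrative choice and not taken from the notes. The helper names `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(0)
skew_by_n = {n: skewness(sample_means(n, 20_000, rng)) for n in (1, 5, 40)}
for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness should fall toward 0 as n increases, which is the sense in which X̄ becomes approximately normal even though the population is not.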
How Large Should n Be?
| Original Population | Rule-of-Thumb Minimum n |
|---|---|
| Normal Distribution | Any n (X̄ is exactly normal) |
| Approximately Symmetric | n ≥ 20 |
| Skewed Distribution | n ≥ 30 |
| Any Distribution (Safe) | n ≥ 50 |
Complete Formula Summary
| Concept | Formula |
|---|---|
| Population Mean | μ = E(X) |
| Population Variance | σ² = Var(X) |
| Sample Mean | X̄ = Σxᵢ / n |
| Expected Value | E(X̄) = μ |
| Variance | Var(X̄) = σ²/n |
| Standard Error | SE = σ/√n |
| Distribution (CLT) | X̄ ~ N(μ, σ²/n) |
| Z-Score | Z = (X̄ - μ)/(σ/√n) |
Working with Sample Totals
If T = sample total of n observations:
T = n × X̄
E(T) = nμ
Var(T) = nσ²
T ~ N(nμ, nσ²) when n is large
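Working with the total or with the mean gives the same probability, because T > t is the same event as X̄ > t/n. The sketch below illustrates this with hypothetical numbers (μ = 10, σ² = 4, n = 36, t = 380) that are not from the notes. The function names are also assumptions.

```python
from math import sqrt
from statistics import NormalDist

def total_distribution(mu, sigma2, n):
    """Normal model for the total T: mean n*mu, variance n*sigma2."""
    return NormalDist(n * mu, sqrt(n * sigma2))

def mean_distribution(mu, sigma2, n):
    """Normal model for the sample mean: mean mu, variance sigma2/n."""
    return NormalDist(mu, sqrt(sigma2 / n))

# Hypothetical numbers chosen only for illustration.
mu, sigma2, n = 10.0, 4.0, 36
t = 380.0

p_total = 1 - total_distribution(mu, sigma2, n).cdf(t)    # P(T > t)
p_mean = 1 - mean_distribution(mu, sigma2, n).cdf(t / n)  # P(X̄ > t/n)
print(p_total, p_mean)
```

Both routes standardise to the same Z value, so the choice between them is a matter of convenience.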
⚡ Continuity Correction
For DISCRETE, integer-valued distributions (Binomial, Poisson):
Continuity Correction = ± 1/(2n)
NOT ± 1/2, because X̄ changes in steps of 1/n, so half a step is 1/(2n)
| Probability | Correction |
|---|---|
| P(X̄ < a) | P(X̄ < a - 1/(2n)) |
| P(X̄ ≤ a) | P(X̄ < a + 1/(2n)) |
| P(X̄ > a) | P(X̄ > a + 1/(2n)) |
| P(X̄ ≥ a) | P(X̄ > a - 1/(2n)) |
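The four rows of the table reduce to one rule: move the boundary half a step (1/(2n)) in the direction that keeps or excludes the value a as required. A minimal sketch, assuming an integer-valued population; the helper name `corrected_bound` is hypothetical.

```python
def corrected_bound(a, n, relation):
    """Continuity-corrected boundary for an event on the sample mean.

    `relation` is one of '<', '<=', '>', '>='. Assumes the underlying variable
    is integer-valued, so the sample mean moves in steps of 1/n.
    """
    half_step = 1 / (2 * n)
    if relation in ("<", ">="):
        return a - half_step   # P(X̄ < a) and P(X̄ >= a) use a - 1/(2n)
    if relation in ("<=", ">"):
        return a + half_step   # P(X̄ <= a) and P(X̄ > a) use a + 1/(2n)
    raise ValueError(f"unknown relation: {relation!r}")

# Assumed example values: a = 16, n = 50.
for rel in ("<", "<=", ">", ">="):
    print(rel, corrected_bound(16, 50, rel))
```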
Problem Solving Steps
Step 1: Identify μ, σ² (or σ), n
Step 2: Check if CLT applies (n ≥ 30 or population normal)
Step 3: Write distribution: X̄ ~ N(μ, σ²/n)
Step 4: Calculate SE: σ/√n
Step 5: Find Z-score: Z = (X̄ - μ)/(σ/√n)
Step 6: Use normal tables to find probability
Step 7: If the population is discrete, apply the continuity correction to the boundary value before calculating Z in Step 5
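The steps above can be sketched as a single function. This is an illustration under stated assumptions, not a prescribed method: the name `prob_mean_at_most` and the numbers in the usage lines (μ = 100, σ² = 225, n = 36) are hypothetical, and `statistics.NormalDist` stands in for the normal tables.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_at_most(a, mu, sigma2, n, discrete=False):
    """Approximate P(X̄ <= a), assuming X̄ ~ N(mu, sigma2/n) is appropriate."""
    se = sqrt(sigma2 / n)            # Step 4: standard error
    if discrete:
        a = a + 1 / (2 * n)          # Step 7: continuity correction for '<='
    z = (a - mu) / se                # Step 5: Z-score
    return NormalDist().cdf(z)       # Step 6: replaces the table lookup

# Hypothetical usage: SE = 15/6 = 2.5, so Z = (105 - 100)/2.5 = 2.
p = prob_mean_at_most(105, mu=100, sigma2=225, n=36)
print(round(p, 4))
```

Step 2 (checking that the normal model applies) is left to the user; the function does not test it.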
Example 1: Pears in Bags
Problem: Pears: μ = 45 g, σ² = 52 g², n = 6. Find P(Total > 300 g)
Solution: Total > 300 g means X̄ > 50 g (300/6). Since n = 6 is small, this relies on the pear masses being normally distributed.
X̄ ~ N(45, 52/6) = N(45, 8.67)
SE = √8.67 = 2.94
Z = (50 - 45)/2.94 = 1.70
P(Z > 1.70) = 1 - 0.9554 = 0.0446
Answer: 4.46%
Example 2: Water for Exercise
Problem: μ = 500 ml, σ = 50 ml, n = 25. 13 L is available. What is the probability that this is enough?
Solution: Need X̄ < 520 ml (13000/25)
X̄ ~ N(500, 100) (σ²/n = 2500/25)
SE = 10
Z = (520 - 500)/10 = 2.0
P(Z < 2.0) = 0.9772
Answer: 0.9772, so the water is enough with probability 97.72%
Example 3: Binomial (with Continuity Correction)
Problem: X ~ B(60, 0.25), n = 50. Find P(X̄ ≤ 16)
Solution:
μ = 60 × 0.25 = 15
σ² = 60 × 0.25 × 0.75 = 11.25
X̄ ~ N(15, 11.25/50) = N(15, 0.225)
Correction: +1/(2 × 50) = +0.01
P(X̄ ≤ 16) = P(X̄ < 16.01)
Z = (16.01 - 15)/√0.225 = 2.13
P(Z < 2.13) = 0.983
Answer: 98.3%
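The three worked examples can be re-checked numerically. The sketch below uses only the figures given in the examples and lets `statistics.NormalDist` replace the table lookup; the variable names are arbitrary. Small differences in the fourth decimal place are expected, because the hand solutions round Z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

# Example 1: P(X̄ > 50) with X̄ ~ N(45, 52/6)
p1 = 1 - NormalDist(45, sqrt(52 / 6)).cdf(50)

# Example 2: P(X̄ < 520) with X̄ ~ N(500, 2500/25)
p2 = NormalDist(500, sqrt(2500 / 25)).cdf(520)

# Example 3: P(X̄ < 16.01) with X̄ ~ N(15, 11.25/50),
# where 16.01 = 16 + 1/(2n) is the continuity-corrected boundary.
n = 50
p3 = NormalDist(60 * 0.25, sqrt(60 * 0.25 * 0.75 / n)).cdf(16 + 1 / (2 * n))

print(round(p1, 4), round(p2, 4), round(p3, 4))
```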
Quick Reference Card
| If you know... | You can find... |
|---|---|
| μ, σ², n | E(X̄) = μ, Var(X̄) = σ²/n |
| Population normal | X̄ is normal for ANY n |
| n ≥ 30 | X̄ ~ N(μ, σ²/n) by CLT |
| Discrete distribution | Use continuity correction ±1/(2n) |
| Sample total T | T ~ N(nμ, nσ²) |
✅ Key Takeaways
- Random sampling: Every possible sample of size n has an equal chance of selection
- E(X̄) = μ: Sample mean targets population mean
- Var(X̄) = σ²/n: Bigger sample = smaller variance
- CLT: X̄ is approximately normal when n is large
- Works for ANY population distribution with finite variance
- Continuity correction: ±1/(2n) for discrete
