Z-Score: Comparing Values Using Standardization
What if you had one number and wanted to know whether it’s common or exceptional?
For example, is a score of 90 on a test considered average — or much higher than typical?
That’s where the Z-score comes in.
📚 This post is part of the "Intro to Statistics" series
🔙 Previously: Measuring Variability: Variance and Standard Deviation
🎯 What is a Z-Score?
A Z-score (or standard score) tells you:
❓ “How many standard deviations is this value away from the mean?”
It answers:
- Is this value above or below average?
- Is it unusual or common in this distribution?
🧮 Z-Score Formula
\[ z = \frac{x - \bar{x}}{\sigma} \]
This formula transforms a raw score \( x \) into a standardized score:
- The numerator \( x - \bar{x} \) tells us how far the value is from the mean
- The denominator \( \sigma \) scales this difference using standard deviation
- The result is a unit-free number (z-score) showing its relative position
Where:
- \( x \): the observation
- \( \bar{x} \): the mean
- \( \sigma \): the standard deviation
📊 Example: One Observation
Suppose:
- Mean = 70
- Standard Deviation = 10
- Observation = 85
Then:
\[ z = \frac{85 - 70}{10} = 1.5 \]
🟢 The value is 1.5 standard deviations above the mean.
Now try:
\[ z = \frac{60 - 70}{10} = -1 \]
🔵 This one is 1 standard deviation below the mean.
📈 How to Interpret Z-Scores
- Positive z-score → Above the mean
- Negative z-score → Below the mean
- z = 0 → Exactly the mean
Z-scores show where a value lies on the distribution curve.
📌 When the distribution is skewed:
- Right-skewed → Large z-scores occur more often in the tail
- Left-skewed → Negative z-scores dominate the lower tail
📉 Empirical Rules and Z-Score Ranges
There’s a general understanding of how much data falls in certain z-score ranges:
Z-Score Range | Approx. % of Data |
---|---|
-1 to +1 | ~68% |
-2 to +2 | ~75% |
-3 to +3 | ~89% |
✅ So most values (especially in bell-shaped distributions) lie between -2 and +2.
🔁 Z-Score Always Balances
If you compute z-scores for a full dataset, their sum is always zero:
\[ \sum z = 0 \]
That’s because the deviations above and below the mean cancel out.
🧪 Multiple Z-Scores from a Dataset
Let’s say we have a dataset of exam scores:
{70, 80, 90}
Step 1 — Find the mean and standard deviation:
- Mean = \( \bar{x} = 80 \)
- Standard deviation =
\[ \sigma = \sqrt{\frac{(70-80)^2 + (80-80)^2 + (90-80)^2}{3}} = \sqrt{66.67} \approx 8.16 \]
Step 2 — Compute z-scores for each value:
\[ z_{70} = \frac{70 - 80}{8.16} \approx -1.22 \]
\[ z_{80} = \frac{80 - 80}{8.16} = 0 \]
\[ z_{90} = \frac{90 - 80}{8.16} \approx 1.22 \]
These scores tell us:
- 70 is below average
- 80 is exactly the average
- 90 is above average
✅ The sum of these z-scores ≈ 0, confirming the rule.
⚖️ Comparing Across Distributions
Let’s say we have two distributions:
Test A:
- Mean = 60, SD = 5
- Observation = 70
\[ z = \frac{70 - 60}{5} = 2 \]
Test B:
- Mean = 85, SD = 10
- Observation = 90
\[ z = \frac{90 - 85}{10} = 0.5 \]
📌 Although 90 is numerically higher, it is less exceptional in Test B than 70 is in Test A.
🌍 This is Called Standardization
Standardization means:
Expressing a value in terms of how far it is from the mean, using the standard deviation.
It lets us:
- Compare scores from different tests
- Identify outliers
- Normalize data for machine learning
🧠 Level Up: How Z-Scores Power Real Analysis
Z-scores are more than just a tool for comparing test scores. They're the foundation for some of the most powerful techniques in statistics and machine learning:
- 🎯 Probability: Z-scores help us estimate how likely a value is in a normal distribution — using z-tables
- 📏 Confidence Intervals: Z-scores define the range of values we expect sample means to fall within
- 🚨 Outlier Detection: Observations with
|z| > 2
or|z| > 3
are often flagged as potential outliers - 🔄 Standardization: Machine learning models often require data to be normalized using z-scores
You’ll see these ideas come to life as we explore probability and inference in upcoming posts.
📌 Try It Yourself
Q: A test score has a z-score of -2.1. What does this tell us about the score?
💡 Show Answer
It means the score is 2.1 standard deviations below the mean — significantly lower than average.
🧠 Summary
Concept | What It Means |
---|---|
Z-score | # of standard deviations from the mean |
Positive z | Above average |
Negative z | Below average |
z = 0 | Exactly the mean |
Sum of all z | Always equals 0 in a full dataset |
Use case | Comparing values across distributions |
✅ Up Next
Next, we’ll walk through a real-life example that uses everything we’ve learned:
- Mean
- Median
- Standard deviation
- Z-scores
And how to interpret and compare them together.
Stay tuned!