Z-Score: Comparing Values Using Standardization
Have you ever scored 90 on a test and wondered: Is that impressive or just average?
To answer that, you need more than just the number — you need to know how it compares to others. That’s exactly what a Z-score helps you do.
A Z-score tells you how far a value is from the mean in terms of standard deviations. It’s one of the most useful tools in statistics and machine learning for comparing values across different distributions, detecting outliers, and standardizing data for models.
In this post, you’ll learn what Z-scores are, how to calculate and interpret them, and how they’re used in real-world analysis.
📚 This post is part of the "Intro to Statistics" series
🔙 Previously: Measuring Variability: Variance and Standard Deviation
🎯 What is a Z-Score?
A Z-score (or standard score) tells you:
❓ “How many standard deviations is this value away from the mean?”
It answers:
- Is this value above or below average?
- Is it unusual or common in this distribution?
🤖 Why Z-Scores Matter in Machine Learning
Z-scores are used to:
- Standardize features (important for distance- and gradient-based models like KNN, SVM, and logistic regression)
- Flag outliers, one feature at a time
- Compare different variables on the same scale
- Help gradient-based training converge faster
Z-scores make raw values comparable across features and distributions.
🧮 Z-Score Formula
\[ z = \frac{x - \bar{x}}{\sigma} \]
This formula transforms a raw score \( x \) into a standardized score:
- The numerator \( x - \bar{x} \) tells us how far the value is from the mean
- The denominator \( \sigma \) scales this difference using standard deviation
- The result is a unit-free number (z-score) showing its relative position
Where:
- \( x \): the observation
- \( \bar{x} \): the mean
- \( \sigma \): the standard deviation
📊 Example: One Observation
Suppose:
- Mean = 70
- Standard Deviation = 10
- Observation = 85
Then:
\[ z = \frac{85 - 70}{10} = 1.5 \]
🟢 The value is 1.5 standard deviations above the mean.
Now try:
\[ z = \frac{60 - 70}{10} = -1 \]
🔵 This one is 1 standard deviation below the mean.
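The two calculations above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `z_score` and its argument names are made up for illustration, and it assumes you already know the mean and standard deviation.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        # A z-score is undefined when there is no spread in the data.
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```

The guard against a zero standard deviation is a design choice of this sketch; other code might return 0 or NaN instead.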
📈 How to Interpret Z-Scores
- Positive z-score → Above the mean
- Negative z-score → Below the mean
- z = 0 → Exactly the mean
Z-scores show where a value lies on the distribution curve.
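📌 When the distribution is skewed, z-scores are not symmetric around zero:
- Right-skewed → The long upper tail produces large positive z-scores, while negative z-scores tend to stay close to zero
- Left-skewed → The long lower tail produces large negative z-scores, while positive z-scores tend to stay close to zero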
📉 Empirical Rules and Z-Score Ranges
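For any distribution, whatever its shape, Chebyshev's inequality guarantees that at least \( 1 - 1/k^2 \) of the data lies within \( k \) standard deviations of the mean:
Z-Score Range | Minimum % of Data |
---|---|
-1 to +1 | No guarantee (~68% if the data is normal) |
-2 to +2 | At least 75% |
-3 to +3 | At least ~89% |
✅ So most values lie between -2 and +2, and bell-shaped distributions are even more concentrated than these minimums.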
🖼️ Visual Insight: Z-Scores and the Normal Distribution
Z-scores are easiest to interpret when the data is normally distributed. In that case the empirical (68-95-99.7) rule applies:
- ~68% of values fall between z = -1 and +1
- ~95% fall between z = -2 and +2
- ~99.7% fall between z = -3 and +3
✅ Use this to visually estimate how common or rare a value is based on its Z-score.
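If you want to see where the 68 / 95 / 99.7 figures come from, they can be computed from the normal distribution itself. The sketch below uses only Python's standard library; the function name `normal_coverage` is an illustrative choice, and the result only applies when the data really is normal.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"|z| <= {k}: {normal_coverage(k):.1%}")
# |z| <= 1: 68.3%
# |z| <= 2: 95.4%
# |z| <= 3: 99.7%
```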
🔁 Z-Score Always Balances
If you compute z-scores for a full dataset using that dataset's own mean and standard deviation, their sum is always zero:
\[ \sum z = 0 \]
That’s because the deviations above and below the mean cancel out.
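Here is a short derivation of why the cancellation happens. It assumes \( n \) values \( x_1, \dots, x_n \), with \( \bar{x} \) and \( \sigma \) computed from those same values and \( \sigma > 0 \):

\[ \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sigma} = \frac{1}{\sigma} \left( \sum_{i=1}^{n} x_i - n\bar{x} \right) = \frac{1}{\sigma} \left( n\bar{x} - n\bar{x} \right) = 0 \]

The middle step uses the definition of the mean, \( \sum x_i = n\bar{x} \). If you standardize with a mean taken from somewhere else (for example, a training set), the sum is generally not zero.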
🧪 Multiple Z-Scores from a Dataset
Let’s say we have a dataset of exam scores:
{70, 80, 90}
Step 1 — Find the mean and standard deviation:
- Mean = \( \bar{x} = 80 \)
- Standard deviation (population form, dividing by \( n = 3 \)) =
\[ \sigma = \sqrt{\frac{(70-80)^2 + (80-80)^2 + (90-80)^2}{3}} = \sqrt{66.67} \approx 8.16 \]
Step 2 — Compute z-scores for each value:
\[ z_{70} = \frac{70 - 80}{8.16} \approx -1.22 \]
\[ z_{80} = \frac{80 - 80}{8.16} = 0 \]
\[ z_{90} = \frac{90 - 80}{8.16} \approx 1.22 \]
These scores tell us:
- 70 is below average
- 80 is exactly the average
- 90 is above average
✅ The sum of these z-scores ≈ 0, confirming the rule.
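The same two steps can be checked in Python. This sketch uses the standard library's `statistics.pstdev`, which is the population standard deviation (dividing by n), to match the calculation above; with the sample version (`statistics.stdev`) the z-scores would come out smaller in magnitude.

```python
import statistics

scores = [70, 80, 90]

# Step 1: mean and population standard deviation
mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)

# Step 2: one z-score per value
z_scores = [(x - mean) / sd for x in scores]

print([round(z, 2) for z in z_scores])  # [-1.22, 0.0, 1.22]
print(round(sum(z_scores), 10))         # 0.0
```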
⚖️ Comparing Across Distributions
Let’s say we have two distributions:
Test A:
- Mean = 60, SD = 5
- Observation = 70
\[ z = \frac{70 - 60}{5} = 2 \]
Test B:
- Mean = 85, SD = 10
- Observation = 90
\[ z = \frac{90 - 85}{10} = 0.5 \]
📌 Although 90 is numerically higher, it is less exceptional in Test B than 70 is in Test A.
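To make the comparison concrete, here is a small sketch that standardizes both scores and picks the more exceptional one. The dictionary layout and variable names are just for illustration.

```python
tests = {
    "Test A": {"mean": 60, "sd": 5, "score": 70},
    "Test B": {"mean": 85, "sd": 10, "score": 90},
}

# Convert each raw score to a z-score so the two tests share one scale.
z_by_test = {
    name: (t["score"] - t["mean"]) / t["sd"] for name, t in tests.items()
}

# The larger z-score is the result that stands out more within its own test.
most_exceptional = max(z_by_test, key=z_by_test.get)

print(z_by_test)         # {'Test A': 2.0, 'Test B': 0.5}
print(most_exceptional)  # Test A
```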
🌍 This is Called Standardization
Standardization means:
Expressing a value in terms of how far it is from the mean, using the standard deviation.
It lets us:
- Compare scores from different tests
- Identify outliers
- Normalize data for machine learning
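As a rough illustration of standardizing features, the sketch below rescales two columns that live on very different scales. The data is made up, the helper name `standardize` is hypothetical, and mapping a constant column to zeros is an assumption of this sketch rather than a universal rule.

```python
import statistics


def standardize(values):
    """Return the z-score of every value, using the column's own mean and SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD
    if sd == 0:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Made-up example features on very different scales
heights_cm = [150, 160, 170, 180, 190]
incomes = [30_000, 45_000, 50_000, 65_000, 110_000]

z_heights = standardize(heights_cm)
z_incomes = standardize(incomes)

print([round(z, 2) for z in z_heights])
print([round(z, 2) for z in z_incomes])
```

After standardization, both columns have mean 0 and standard deviation 1, so neither dominates a distance calculation just because of its units.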
🧠 Level Up: How Z-Scores Power Real Analysis
Z-scores are useful not just for detecting outliers or comparing scores, but also in **statistical inference** and **ML pipelines**. Here's how they're used in real-world applications:
- 🎯 Probability: Z-scores help us estimate how likely a value is in a normal distribution — using z-tables
- 📏 Confidence Intervals: Critical z-values (such as 1.96 for 95% confidence) set how wide the interval around a sample mean is
- 🚨 Outlier Detection: Observations with |z| > 2 or |z| > 3 are often flagged as potential outliers
- 🔄 Standardization: Machine learning models often require data to be normalized using z-scores
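Here is a minimal sketch of threshold-based outlier flagging. The function name `flag_outliers` and the sample readings are made up, and the cutoff is a convention you choose, not a fixed rule.

```python
import statistics


def flag_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD
    if sd == 0:
        return []  # no spread, so nothing can stand out
    return [v for v in values if abs((v - mean) / sd) > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]

print(flag_outliers(readings, threshold=2.0))  # [95]
print(flag_outliers(readings, threshold=3.0))  # []
```

The second call returns nothing because, with the population standard deviation, no |z| in a dataset of n values can exceed the square root of n - 1 (about 2.45 for seven values). In small samples, a cutoff of 3 may never trigger.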
You’ll see these ideas come to life as we explore probability and inference in upcoming posts.
📌 Try It Yourself
Q: A student received a test score with a z-score of -2.1. What does this tell you about the score compared to the rest of the class?
💡 Answer
✅ The score is 2.1 standard deviations below the mean, which is well below average. If the class scores are roughly normal, that places it in about the bottom 2% of the group.
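You can check the percentile yourself with Python's standard library. This sketch assumes the scores follow a normal distribution; for other shapes the share below z = -2.1 can be quite different.

```python
from statistics import NormalDist

# Share of a standard normal distribution that lies below z = -2.1
p_below = NormalDist().cdf(-2.1)

print(f"{p_below:.1%}")  # 1.8%
```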
🧠 Summary
Concept | What It Means | Practical Use |
---|---|---|
Z-score | Distance from mean in standard deviations | Normalization, outlier detection |
Positive z | Above average | High-performing observation |
Negative z | Below average | Underperformance or anomaly |
z = 0 | Exactly average | Benchmark reference point |
Sum of all z | Zero in a complete dataset | Confirms correct standardization |
💬 Have a question or want to compare z-scores from your own dataset?
Drop it in the comments — happy to help!
✅ Up Next
Next, we’ll walk through a real-life example that uses everything we’ve learned:
- Mean
- Median
- Standard deviation
- Z-scores
And how to interpret and compare them together.
Stay tuned!