Z-Score: Comparing Values Using Standardization
Understand z-scores and standardization: how far a value sits from the mean in standard deviation units, and why that makes z-scores a powerful tool for comparing values across distributions and for preparing data in machine learning tasks.
Have you ever scored 90 on a test and wondered: Is that impressive or just average?
To answer that, you need more than just the number — you need to know how it compares to others. That’s exactly what a Z-score helps you do.
A Z-score tells you how far a value is from the mean in terms of standard deviations. It’s one of the most useful tools in statistics and machine learning for comparing values across different distributions, detecting outliers, and standardizing data for models.
In this post, you’ll learn what Z-scores are, how to calculate and interpret them, and how they’re used in real-world analysis.
📚 This post is part of the "Intro to Statistics" series
🔙 Previously: Measuring Variability: Variance and Standard Deviation
🎯 What is a Z-Score?
A Z-score (or standard score) tells you:
❓ “How many standard deviations is this value away from the mean?”
It answers:
- Is this value above or below average?
- Is it unusual or common in this distribution?
🤖 Why Z-Scores Matter in Machine Learning
Z-scores are used to:
- Standardize features (essential for models like KNN, SVM, logistic regression)
- Detect outliers in high-dimensional data
- Compare different variables on the same scale
- Improve feature scaling and model convergence
Z-scores make raw values comparable across features and distributions.
🧠 How Z score standardization fits into an ML pipeline
In machine learning, Z score standardization is usually applied as one step in the preprocessing pipeline.
The idea is:
- Use the training data only to estimate:
  - Mean of each numeric feature \( \bar{x}_j \)
  - Standard deviation of each numeric feature \( s_j \)
- Transform the training features: \[ z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \]
- When you get validation or test data, use the same \( \bar{x}_j \) and \( s_j \) from the training set to transform them.
This is very important to avoid data leakage. If you compute the mean and standard deviation on the full dataset before splitting, information from the test set leaks into the training process.
In code, a typical pattern looks like:
```python
# 1. Fit on training data
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# 2. Transform training data
X_train_std = (X_train - mu) / sigma

# 3. Transform validation or test data with the same parameters
X_test_std = (X_test - mu) / sigma
```
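If you work with scikit-learn, the same pattern is commonly expressed with `StandardScaler`. A minimal sketch, assuming `X_train` and `X_test` are numeric NumPy arrays or DataFrames:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit learns the per-feature mean and standard deviation from the training data only
X_train_std = scaler.fit_transform(X_train)

# the test data is transformed with the parameters learned on the training set
X_test_std = scaler.transform(X_test)
```

One small detail: `StandardScaler` (like NumPy's `.std()`) divides by the standard deviation computed with `ddof=0`, while pandas' `.std()` defaults to `ddof=1`. For datasets of any reasonable size the difference is negligible.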
⚖️ Z score vs other scaling methods
Z score standardization is not the only way to scale features. Here is how it compares to two common alternatives.
| Method | Formula (idea) | Good for | Limitations |
|---|---|---|---|
| Z score | \( z = \dfrac{x - \bar{x}}{s} \) | Most numeric features that are roughly symmetric | Sensitive to strong outliers |
| Min max scaling | \( x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}} \) | Features that must stay in a fixed range such as [0, 1] | Very sensitive to extreme values |
| Robust scaling | \( x' = \dfrac{x - \text{median}}{\text{IQR}} \) | Features with heavy outliers or very skewed data | Harder to interpret than Z scores |
Practical guidance in ML:
- Use Z score standardization when features are numeric and not extremely skewed. It works well with KNN, SVM, linear and logistic regression, and neural networks.
- Use min max scaling when a model or activation function expects inputs in a bounded range, such as [0, 1] or [-1, 1]. This is common in some neural network setups and image processing.
- Use robust scaling when you have strong outliers that would distort the mean and standard deviation. It is useful before models that are sensitive to outliers.
The key idea is to choose a scaling method that matches the shape of your data and the requirements of your model, rather than applying Z score standardization automatically in every situation.
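To make the sensitivity differences concrete, here is a small sketch that applies all three formulas by hand to a hypothetical feature with one strong outlier (the numbers are made up for illustration):

```python
import numpy as np

# hypothetical feature with one strong outlier
x = np.array([10.0, 12.0, 11.0, 13.0, 100.0])

# Z score: sensitive to the outlier because it uses the mean and standard deviation
z = (x - x.mean()) / x.std(ddof=1)

# min-max scaling: squeezes the typical values because the outlier defines the range
min_max = (x - x.min()) / (x.max() - x.min())

# robust scaling: the median and IQR are barely affected by the outlier
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)

print(z.round(2), min_max.round(2), robust.round(2), sep="\n")
```

Notice how the outlier compresses the min-max values of the typical points toward 0, while the robust version keeps them spread out around the median.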
🚨 Using Z scores for outlier detection
Z scores are often used as a simple tool for spotting unusual values.
If the data are approximately normal:
- Values with \(|z| \le 2\) are usually considered typical
- Values with \(2 < |z| \le 3\) are possible outliers that deserve a closer look
- Values with \(|z| > 3\) are strong candidates for outliers
In practice:
- A large positive Z score means the value is far above the mean
- A large negative Z score means the value is far below the mean
Outliers in machine learning
In a machine learning context, you can use Z scores to:
- Detect suspicious data points that may be data entry errors
- Flag extreme values before training a model that is sensitive to outliers
- Decide whether to remove, cap, or transform extreme observations
Common strategies:
- Remove obvious data errors after verification
- Cap extreme Z scores at a chosen threshold (for example, clip all values with \( |z| > 3 \))
- Transform skewed features (for example with a log or square root) before standardization
Important notes:
- Z score outlier rules work best when the distribution is roughly normal
- In high dimensional feature spaces, seeing some large Z scores is normal, so always combine Z scores with domain knowledge and plots instead of removing points blindly
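As a minimal sketch of this rule in NumPy (the data values and the helper name `flag_outliers` are hypothetical):

```python
import numpy as np

def flag_outliers(x, threshold=3.0):
    """Return a boolean mask marking values whose |z| exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)  # z-scores using the sample standard deviation
    return np.abs(z) > threshold

# hypothetical cholesterol values with one extreme observation
x = [200, 210, 195, 205, 198, 202, 207, 199, 203, 400]
print(flag_outliers(x, threshold=2.0))  # only the value 400 is flagged
```

With very small samples, a single outlier inflates the standard deviation so much that its own z-score stays modest, which is one more reason to combine this rule with plots and domain knowledge.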
🎯 Z scores, probabilities, and percentiles
When a variable is approximately normally distributed, Z scores connect directly to probabilities and percentiles.
From Z score to probability
If a standardized variable follows a standard normal distribution \( N(0, 1) \):
- A given Z score \( z \) tells you how far the value is from the mean in standard deviations
- The cumulative probability \( P(Z \le z) \) is the area under the curve to the left of \( z \)
Typical reference values:
- \( z = 0 \) → about 50 percent of values are below the mean
- \( z = 1 \) → about 84 percent of values are below
- \( z = 1.96 \) → about 97.5 percent of values are below
- \( z = 2 \) → about 97.7 percent of values are below
These probabilities come from the standard normal table or from statistical software.
Z scores and percentiles
A percentile tells you the percentage of observations that fall at or below a value.
- If a value has \( z = 0 \), it is at the 50th percentile
- If a value has \( z \approx 1.64 \), it is around the 95th percentile
- If a value has \( z \approx -1.64 \), it is around the 5th percentile
Example:
If exam scores are approximately normal and a student has \( z = 1.2 \), then their score is higher than about 89 percent of students. In other words, they are roughly at the 89th percentile.
Why this matters in ML
When model outputs or performance metrics are roughly normal, Z scores allow you to:
- Translate a raw value into a percentile (how extreme is this compared to typical behavior)
- Build approximate confidence intervals using cutoffs like \( z = 1.96 \) for 95 percent
- Compare how unusual different metrics are on a common scale
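If you want these conversions in code, the standard normal CDF in SciPy maps a Z score to a cumulative probability, and `ppf` goes the other way; a small sketch:

```python
from scipy.stats import norm

# cumulative probability P(Z <= z) for a few reference Z scores
for z in [0.0, 1.0, 1.2, 1.64, 1.96, 2.0]:
    print(f"z = {z:5.2f} -> percentile ~ {norm.cdf(z) * 100:.1f}%")

# going the other way: which Z score marks the 95th percentile?
print(norm.ppf(0.95))  # about 1.645
```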
🧪 ML example: standardizing a feature before modeling
Imagine you are building a model to predict whether a patient has high cardiovascular risk using features like:
- Age (years)
- Systolic blood pressure (mmHg)
- Cholesterol level (mg/dL)
Suppose in your training data the cholesterol feature has:
- Sample mean \( \bar{x} = 200 \) mg/dL
- Sample standard deviation \( s = 40 \) mg/dL
For a new patient in the training set with cholesterol \( x = 260 \) mg/dL, the Z score is:
\[ z = \frac{x - \bar{x}}{s} = \frac{260 - 200}{40} = \frac{60}{40} = 1.5 \]
Interpretation:
- The patient’s cholesterol is 1.5 standard deviations above the training mean.
- In standardized form, the raw value 260 becomes a feature value of 1.5.
If you also standardize age and blood pressure, each feature will have:
- Mean close to 0
- Standard deviation close to 1
This helps many models, such as logistic regression, SVM, KNN, and neural networks, because:
- Features are on a comparable scale
- Gradient based optimization can converge more reliably
- Distance based models do not get dominated by features with large numeric ranges
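The same calculation in code, using the hypothetical training statistics from this example:

```python
# training statistics for the cholesterol feature (hypothetical values from above)
mean_chol = 200.0  # mg/dL
std_chol = 40.0    # mg/dL

# standardize a new patient's cholesterol using the training parameters
x = 260.0
z = (x - mean_chol) / std_chol
print(z)  # 1.5 -> 1.5 standard deviations above the training mean
```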
🧮 Z-Score Formula
\[ z = \frac{x - \bar{x}}{\sigma} \]
This formula transforms a raw score \( x \) into a standardized score:
- The numerator \( x - \bar{x} \) tells us how far the value is from the mean
- The denominator \( \sigma \) scales this difference using standard deviation
- The result is a unit-free number (z-score) showing its relative position
Where:
- \( x \): the observation
- \( \bar{x} \): the mean
- \( \sigma \): the standard deviation
(In practice, we usually use the sample standard deviation \( s \) computed from the data, as explained in the next section.)
📊 Population vs sample Z-scores
In theory, the Z-score is defined using the population mean and standard deviation:
\[ z = \frac{x - \mu}{\sigma} \]
In practice, we almost never know the population parameters, so we estimate them from a sample:
- Sample mean: \( \bar{x} \)
- Sample standard deviation: \( s \)
Then each Z-score is computed as:
\[ z_i = \frac{x_i - \bar{x}}{s} \]
Key idea:
- If you compute Z-scores for a dataset using its own sample mean \( \bar{x} \) and sample standard deviation \( s \), the positive and negative deviations balance and the Z-scores have:
- Sum equal to zero
- Mean equal to zero
In machine learning preprocessing, you are almost always using this sample based version. You estimate \( \bar{x} \) and \( s \) from your training data, then use those same values to standardize both training and test sets.
📊 Example: One Observation
Suppose:
- Mean = 70
- Standard Deviation = 10
- Observation = 85
Then:
\[ z = \frac{85 - 70}{10} = 1.5 \]
🟢 The value is 1.5 standard deviations above the mean.
Now try:
\[ z = \frac{60 - 70}{10} = -1 \]
🔵 This one is 1 standard deviation below the mean.
📈 How to Interpret Z-Scores
- Positive z-score → Above the mean
- Negative z-score → Below the mean
- z = 0 → Exactly the mean
Z-scores show where a value lies on the distribution curve.
📌 When the distribution is skewed:
- Right-skewed → a few large positive z-scores appear in the long right tail
- Left-skewed → a few large negative z-scores appear in the long left tail
🔎 Important:
You can compute a Z-score for any numeric distribution, not only normal ones.
- When the data are approximately normal, Z-scores line up nicely with the empirical rule (about 68% within \(-1\) to \(+1\), 95% within \(-2\) to \(+2\), 99.7% within \(-3\) to \(+3\)).
- When the distribution is strongly skewed or has heavy tails, a Z-score still tells you how many standard deviations a value is from the mean, but the usual “unusual if \(|z| > 2\) or \(|z| > 3\)” rule might not match the true probabilities.
📉 Empirical Rules and Z-Score Ranges
There’s a general understanding of how much data falls in certain z-score ranges:
| Z-Score Range | Approx. % of Data |
|---|---|
| -1 to +1 | ~68% |
| -2 to +2 | ~95% |
| -3 to +3 | ~99.7% |
✅ So most values (especially in bell-shaped distributions) lie between -2 and +2.
🖼️ Visual Insight: Z-Scores and the Normal Distribution
The Z-score works best with normally distributed data. Here’s how the values are typically spread:
- ~68% of values fall between z = -1 and +1
- ~95% fall between z = -2 and +2
- ~99.7% fall between z = -3 and +3
✅ Use this to visually estimate how common or rare a value is based on its Z-score.
🔁 Z-Score Always Balances
If you compute z-scores for a dataset using its own sample mean and sample standard deviation, their sum is zero:
\[ \sum z = 0 \]
Here is the important detail:
- Suppose you have values \(x_1, x_2, \dots, x_n\) with sample mean \(\bar{x}\) and sample standard deviation \(s\).
- If you define each z-score as \(z_i = \dfrac{x_i - \bar{x}}{s}\), then the positive and negative deviations cancel and \(\sum_{i=1}^{n} z_i = 0\).
- If you use any other mean or standard deviation (for example population values or parameters from another dataset), the sum of z-scores will not usually be zero.
🧪 Multiple Z-Scores from a Dataset
Let’s say we have a dataset of exam scores:
{70, 80, 90}
Step 1 — Find the mean and standard deviation:
- Mean = \( \bar{x} = 80 \)
- Sample standard deviation =
\[ s = \sqrt{\frac{(70-80)^2 + (80-80)^2 + (90-80)^2}{3 - 1}} = \sqrt{\frac{100 + 0 + 100}{2}} = \sqrt{100} = 10 \]
Step 2 — Compute z-scores for each value:
\[ z_{70} = \frac{70 - 80}{10} = -1 \]
\[ z_{80} = \frac{80 - 80}{10} = 0 \]
\[ z_{90} = \frac{90 - 80}{10} = 1 \]
These scores tell us:
- 70 is below average
- 80 is exactly the average
- 90 is above average
✅ The sum of these z-scores ≈ 0, confirming the rule.
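A quick check of this rule with NumPy on the same three scores:

```python
import numpy as np

scores = np.array([70.0, 80.0, 90.0])

# standardize using the dataset's own sample mean and sample standard deviation
z = (scores - scores.mean()) / scores.std(ddof=1)

print(z)        # [-1.  0.  1.]
print(z.sum())  # 0.0 (up to floating point rounding)
```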
⚖️ Comparing Across Distributions
Let’s say we have two distributions:
Test A:
- Mean = 60, SD = 5
- Observation = 70
\[ z = \frac{70 - 60}{5} = 2 \]
Test B:
- Mean = 85, SD = 10
- Observation = 90
\[ z = \frac{90 - 85}{10} = 0.5 \]
📌 Although 90 is numerically higher, it is less exceptional in Test B than 70 is in Test A.
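In code, the comparison is just two standardizations with different parameters (a tiny sketch; the helper name `z_score` is ours):

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

print(z_score(70, mean=60, sd=5))   # 2.0 -> Test A: 2 standard deviations above its mean
print(z_score(90, mean=85, sd=10))  # 0.5 -> Test B: only 0.5 standard deviations above its mean
```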
🌍 This is Called Standardization
Standardization means:
Expressing a value in terms of how far it is from the mean, using the standard deviation.
It lets us:
- Compare scores from different tests
- Identify outliers
- Normalize data for machine learning
✅ Best practices when using Z scores
- Standardize numeric features for sensitive models. Use Z score standardization for models that rely on distances or gradients, such as KNN, SVM, linear and logistic regression, and neural networks.
- Fit scaling on the training set only. Always compute the mean and standard deviation using the training data, then apply those same parameters to validation and test sets to avoid data leakage.
- Inspect distributions before scaling. Look at histograms or summary statistics to see if features are roughly symmetric or heavily skewed before deciding how to scale them.
- Use Z scores to compare different variables. When features are on very different scales, convert them to Z scores to compare how extreme an individual value is across variables.
- Combine Z scores with domain knowledge. Treat unusually large or small Z scores as signals to investigate, not automatic reasons to delete data.
⚠️ Common pitfalls with Z scores
- Computing Z scores on the full dataset before splitting. This leaks information from the test set into the training process and can give overly optimistic performance estimates.
- Interpreting Z as probability. A Z score is a distance in standard deviation units, not a probability. Probabilities come from the normal distribution using that Z value.
- Assuming normality when the data are strongly skewed. The usual ideas about typical and unusual Z scores (like 68, 95, 99.7 percent) only hold well when the distribution is close to normal.
- Applying Z scores to inappropriate variables. Do not standardize arbitrary category codes or identifiers where numeric distances have no real meaning.
- Removing all points with large Z scores automatically. Many real data sets naturally contain extreme but valid values. Always check context before dropping or capping them.
🧠 Level Up: How Z-Scores Power Real Analysis
Z-scores are useful not just for detecting outliers or comparing scores, but also in **statistical inference** and **ML pipelines**. Here's how they're used in real-world applications:
- 🎯 Probability: Z-scores help us estimate how likely a value is in a normal distribution — using z-tables
- 📏 Confidence Intervals: Z-scores define the range of values we expect sample means to fall within
- 🚨 Outlier Detection: Observations with \( |z| > 2 \) or \( |z| > 3 \) are often flagged as potential outliers
- 🔄 Standardization: Machine learning models often require data to be normalized using z-scores
You’ll see these ideas come to life as we explore probability and inference in upcoming posts.
📌 Try It Yourself
Q: A student received a test score with a z-score of -2.1. What does this tell you about the score compared to the rest of the class?
💡 Show Answer
✅ It means the score is 2.1 standard deviations below the mean, which is well below average. If the class scores are approximately normal, that places the student roughly in the bottom 2% of the group.
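If you want to check this with code (assuming the class scores are roughly normal):

```python
from scipy.stats import norm

print(norm.cdf(-2.1))  # about 0.018, i.e. roughly the bottom 2% of the class
```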
🧠 Summary
| Concept | What It Means | Practical Use |
|---|---|---|
| Z-score | Distance from mean in standard deviations | Normalization, outlier detection |
| Positive z | Above average | High-performing observation |
| Negative z | Below average | Underperformance or anomaly |
| z = 0 | Exactly average | Benchmark reference point |
| Sum of all z | Zero when using the dataset’s own mean and standard deviation | Confirms correct standardization on that dataset |
💬 Have a question or want to compare z-scores from your own dataset?
Drop it in the comments — happy to help!
✅ Up Next
Next, we’ll walk through a real-life example that uses everything we’ve learned:
- Mean
- Median
- Standard deviation
- Z-scores
And how to interpret and compare them together.
Stay tuned!
