Z-Score: Comparing Values Using Standardization

What if you had one number and wanted to know whether it’s common or exceptional?
For example, is a score of 90 on a test considered average — or much higher than typical?

That’s where the Z-score comes in.


📚 This post is part of the "Intro to Statistics" series

🔙 Previously: Measuring Variability: Variance and Standard Deviation

🔜 Next: Real Example: Putting It All Together


🎯 What is a Z-Score?

A Z-score (or standard score) tells you:

❓ “How many standard deviations is this value away from the mean?”

It answers:

  • Is this value above or below average?
  • Is it unusual or common in this distribution?

🧮 Z-Score Formula

\[ z = \frac{x - \bar{x}}{\sigma} \]

This formula transforms a raw score \( x \) into a standardized score:

  • The numerator \( x - \bar{x} \) tells us how far the value is from the mean
  • The denominator \( \sigma \) scales this difference using standard deviation
  • The result is a unit-free number (z-score) showing its relative position

Where:

  • \( x \): the observation
  • \( \bar{x} \): the mean
  • \( \sigma \): the standard deviation

📊 Example: One Observation

Suppose:

  • Mean = 70
  • Standard Deviation = 10
  • Observation = 85

Then:

\[ z = \frac{85 - 70}{10} = 1.5 \]

🟢 The value is 1.5 standard deviations above the mean.

Now try:

\[ z = \frac{60 - 70}{10} = -1 \]

🔵 This one is 1 standard deviation below the mean.
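The two calculations above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `z_score` and its parameter names are chosen here for illustration and are not part of any library.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```

The guard on `sd` is a design choice for the sketch: a standard deviation of zero means every value equals the mean, so a z-score is undefined.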


📈 How to Interpret Z-Scores

  • Positive z-score → Above the mean
  • Negative z-score → Below the mean
  • z = 0 → Exactly the mean

Z-scores show where a value lies on the distribution curve.

📌 When the distribution is skewed:

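  • Right-skewed → The long upper tail tends to produce large positive z-scores, while negative z-scores stay closer to 0
  • Left-skewed → The long lower tail tends to produce large negative z-scores, while positive z-scores stay closer to 0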

📉 Empirical Rules and Z-Score Ranges

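Two rules of thumb describe how much data falls within a given z-score range. The empirical rule applies to bell-shaped (approximately normal) distributions, while Chebyshev's inequality gives a guaranteed minimum for any distribution:

| Z-Score Range | Approx. % of Data (bell-shaped) | Minimum % of Data (any distribution) |
| --- | --- | --- |
| -1 to +1 | ~68% | no guarantee |
| -2 to +2 | ~95% | at least 75% |
| -3 to +3 | ~99.7% | at least 89% |

✅ So in a bell-shaped distribution about 95% of values lie between -2 and +2, and even in a heavily skewed one at least 75% do.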
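The percentages for z-score ranges come from two different results, and a short sketch makes the difference visible. For a normal distribution, the share of values with |z| < k is erf(k/√2); for any distribution, Chebyshev's inequality guarantees at least 1 - 1/k². The function names below are illustrative, not from a library.

```python
import math


def normal_coverage(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations."""
    return math.erf(k / math.sqrt(2))


def chebyshev_lower_bound(k: float) -> float:
    """Guaranteed minimum share within k standard deviations, any distribution."""
    return max(1 - 1 / k**2, 0.0)


for k in (1, 2, 3):
    print(f"|z| < {k}: normal ~ {normal_coverage(k):.1%}, "
          f"any distribution >= {chebyshev_lower_bound(k):.1%}")
```

Running this gives roughly 68%, 95%, and 99.7% for the normal case, against guaranteed minimums of 0%, 75%, and about 89%.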


🔁 Z-Score Always Balances

If you compute z-scores for a full dataset, using that dataset's own mean and standard deviation, their sum is always zero:

\[ \sum z = 0 \]

That’s because the deviations above and below the mean cancel out.
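The cancellation can be written out in one line. The notation here is assumed for illustration: \( n \) is the number of values and \( z_i \) is the z-score of the value \( x_i \).

\[ \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sigma} = \frac{1}{\sigma}\left(\sum_{i=1}^{n} x_i - n\bar{x}\right) = \frac{1}{\sigma}\left(n\bar{x} - n\bar{x}\right) = 0 \]

The middle step uses the definition of the mean, \( \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \), so the sum of the values equals \( n\bar{x} \).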


🧪 Multiple Z-Scores from a Dataset

Let’s say we have a dataset of exam scores:
{70, 80, 90}

Step 1 — Find the mean and standard deviation:

  • Mean = \( \bar{x} = 80 \)
  • Standard deviation (population formula, dividing by \( n = 3 \)) =
    \[ \sigma = \sqrt{\frac{(70-80)^2 + (80-80)^2 + (90-80)^2}{3}} = \sqrt{66.67} \approx 8.16 \]

Step 2 — Compute z-scores for each value:

\[ z_{70} = \frac{70 - 80}{8.16} \approx -1.22 \]

\[ z_{80} = \frac{80 - 80}{8.16} = 0 \]

\[ z_{90} = \frac{90 - 80}{8.16} \approx 1.22 \]

These scores tell us:

  • 70 is below average
  • 80 is exactly the average
  • 90 is above average

✅ The sum of these z-scores ≈ 0, confirming the rule.
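The same steps can be checked with Python's standard `statistics` module. This sketch assumes the population standard deviation (`pstdev`, which divides by n), matching the hand calculation above.

```python
from statistics import mean, pstdev

scores = [70, 80, 90]

mu = mean(scores)       # 80
sigma = pstdev(scores)  # population SD, about 8.16

# One z-score per exam score
z_scores = [(x - mu) / sigma for x in scores]

print([round(z, 2) for z in z_scores])  # [-1.22, 0.0, 1.22]
print(abs(sum(z_scores)) < 1e-9)        # True: the z-scores sum to zero
```

Using `statistics.stdev` instead would apply the sample formula (dividing by n - 1), giving a standard deviation of 10 and z-scores of -1, 0, and 1. The sum is still zero.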


⚖️ Comparing Across Distributions

Let’s say we have two distributions:

Test A:

  • Mean = 60, SD = 5
  • Observation = 70
    \[ z = \frac{70 - 60}{5} = 2 \]

Test B:

  • Mean = 85, SD = 10
  • Observation = 90
    \[ z = \frac{90 - 85}{10} = 0.5 \]

📌 Although 90 is numerically higher, it is less exceptional in Test B than 70 is in Test A.
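A small sketch of the comparison, using the means and standard deviations given for the two tests. The helper `z_score` and the dictionary layout are illustrative choices.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd


results = {
    "Test A": z_score(70, mean=60, sd=5),   # 2.0
    "Test B": z_score(90, mean=85, sd=10),  # 0.5
}

# The more exceptional result is the one with the larger z-score,
# not the one with the larger raw score.
most_exceptional = max(results, key=results.get)
print(results)
print(most_exceptional)  # Test A
```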


🌍 This is Called Standardization

Standardization means:

Expressing a value in terms of how far it is from the mean, using the standard deviation.

It lets us:

  • Compare scores from different tests
  • Identify outliers
  • Normalize data for machine learning
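As a rough illustration of standardizing a whole column of data, the sketch below rescales a list of values so that the result has mean 0 and standard deviation 1. The data are made up, and `standardize` is a hypothetical helper written for this example, not a library function.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert each value to its z-score within the list."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


heights_cm = [150, 160, 170, 180, 190]  # made-up example data
z = standardize(heights_cm)

print([round(v, 2) for v in z])
# After standardizing, the mean is 0 and the SD is 1 (up to rounding error)
print(abs(mean(z)) < 1e-9, abs(pstdev(z) - 1) < 1e-9)
```

Because the result is unit-free, a column measured in centimetres and a column measured in kilograms end up on the same scale.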

🧠 Level Up: How Z-Scores Power Real Analysis

Z-scores are more than just a tool for comparing test scores. They're the foundation for some of the most powerful techniques in statistics and machine learning:

  • 🎯 Probability: Z-scores help us estimate how likely a value is in a normal distribution — using z-tables
  • 📏 Confidence Intervals: Critical z-values (for example, 1.96 for 95% confidence) set how wide the interval around a sample mean needs to be
  • 🚨 Outlier Detection: Observations with |z| > 2 or |z| > 3 are often flagged as potential outliers
  • 🔄 Standardization: Many machine learning models train more reliably when features are rescaled to z-scores (mean 0, standard deviation 1)

You’ll see these ideas come to life as we explore probability and inference in upcoming posts.
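Two of these uses can be previewed with the standard library. The sketch below flags values whose |z| exceeds a threshold and looks up a tail probability with `statistics.NormalDist`, which plays the role of a z-table. The data and the helper `flag_outliers` are made up for illustration.

```python
from statistics import NormalDist, mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Return the values whose |z-score| exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs((v - mu) / sigma) > threshold]


data = [52, 48, 50, 51, 49, 50, 47, 53, 50, 95]  # made-up example data
print(flag_outliers(data, threshold=2.0))  # [95]

# Probability that a standard normal value falls below z = -2.1
print(round(NormalDist().cdf(-2.1), 4))  # 0.0179
```

In this small dataset the value 95 has a z-score just under 3, because the outlier itself inflates the standard deviation. That is one reason the threshold is a judgment call rather than a fixed rule.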


📌 Try It Yourself

Q: A test score has a z-score of -2.1. What does this tell us about the score?

💡 Answer

It means the score is 2.1 standard deviations below the mean, well below average. In a bell-shaped (normal) distribution, only about 2% of scores fall that low.


🧠 Summary

| Concept | What It Means |
| --- | --- |
| Z-score | Number of standard deviations from the mean |
| Positive z | Above average |
| Negative z | Below average |
| z = 0 | Exactly the mean |
| Sum of all z | Always equals 0 in a full dataset |
| Use case | Comparing values across distributions |

✅ Up Next

Next, we’ll walk through a real-life example that uses everything we’ve learned:

  • Mean
  • Median
  • Standard deviation
  • Z-scores

And how to interpret and compare them together.

Stay tuned!

This post is licensed under CC BY 4.0 by the author.