
Understanding Dispersion: Range, IQR, and the Box Plot

Understanding how data spreads is just as important as knowing its center. In this post, you’ll learn three tools for describing dispersion: the range, the interquartile range (IQR), and the box plot. You’ll also see how they help you identify outliers, gauge variability, and improve machine learning models.


📚 This post is part of the "Intro to Statistics" series

🔙 Previously: Measuring the Center: Mean, Median, and Mode Explained

🔜 Next: Measuring Variability: Variance and Standard Deviation


📏 Range: A Simple Start

  • Range = Largest value − Smallest value

\[ \text{Range} = x_{\max} - x_{\min} \]

  • Gives a basic idea of spread
  • But it’s not reliable for measuring variability if there are outliers

💡 Example:
If data = [5, 6, 6, 7, 95] →
\[ \text{Range} = 95 - 5 = 90 \]

🛑 That huge gap is because of one extreme value (an outlier).
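
The same calculation can be reproduced in a few lines of Python. This is only a minimal sketch: the helper name `value_range` is made up for illustration and does not come from any library.

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)

data = [5, 6, 6, 7, 95]
print(value_range(data))       # 90

# Dropping the single extreme value shrinks the range dramatically.
print(value_range(data[:-1]))  # 2
```

One value out of five is responsible for 88 of the 90 units of range, which is why the range alone can mislead.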


📦 Interquartile Range (IQR)

To handle outliers, we use the IQR, which is based on quartiles:

| Quartile | Meaning |
|----------|---------|
| Q1 | 25% of data is below this point |
| Q2 | 50% (median) |
| Q3 | 75% of data is below this point |

🧮 Formula:

\[ \text{IQR} = Q_3 - Q_1 \]

  • IQR focuses on the middle 50% of the data
  • It is largely unaffected by extreme values, because it ignores the lowest 25% and the highest 25% of the data

💡 Example:

Given ordered data:
[2, 4, 5, 7, 8, 10, 12, 15, 20, 22]

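  • \( Q_1 = 5 \) (25th percentile, taken here as the median of the lower half [2, 4, 5, 7, 8])
  • \( Q_3 = 15 \) (75th percentile, taken here as the median of the upper half [10, 12, 15, 20, 22])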

\[ \text{IQR} = 15 - 5 = 10 \]

This means the middle half of data spans 10 units.
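
To check the example by hand, here is a small sketch that computes quartiles with the median-of-halves method, which is the method that yields Q1 = 5 and Q3 = 15 for this data. The function name `quartiles_by_halves` is an assumption made for illustration. Other conventions exist, and the standard library's `statistics.quantiles` interpolates between neighboring values, so it reports slightly different quartiles for the same data.

```python
from statistics import median, quantiles

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    The sorted data is split at the median; with an odd number of
    values, the median itself is left out of both halves.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)

data = [2, 4, 5, 7, 8, 10, 12, 15, 20, 22]
q1, q2, q3 = quartiles_by_halves(data)
print(q1, q3, q3 - q1)  # 5 15 10

# An interpolating method gives slightly different quartiles.
q1_i, _, q3_i = quantiles(data, n=4, method="inclusive")
print(q1_i, q3_i, q3_i - q1_i)  # 5.5 14.25 8.75
```

Both answers are legitimate. When comparing IQR values across tools, it helps to check which quartile convention each tool uses.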


📊 Box Plot: Best of Both Worlds

A box plot visually summarizes:

  • The minimum and maximum values (excluding outliers)
  • Q1, Q2 (median), and Q3
  • Any outliers (points below Q1 − 1.5×IQR or above Q3 + 1.5×IQR)

It’s one of the best visual tools to understand:

  • Center
  • Spread
  • Skewness
  • Outliers

🖼️ Visual: Anatomy of a Box Plot

[Figure: box plot showing minimum, Q1, median, Q3, maximum, and outliers]

  • The plot splits the data into four sections (two whiskers and two halves of the box), each holding about 25% of the values
  • The box spans from Q1 to Q3
  • The line in the middle is the median (Q2)
  • Points outside the whiskers are outliers
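
The numbers behind a box plot can be computed directly. The sketch below is one possible way to do it, assuming median-of-halves quartiles and the common 1.5×IQR rule for the outlier fences. The function name `box_plot_stats` and the extra value 60 are made up for illustration; plotting libraries may use a different quartile convention and so draw slightly different boxes.

```python
from statistics import median

def box_plot_stats(values, k=1.5):
    """Compute the numbers a box plot draws (median-of-halves quartiles)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    low_fence = q1 - k * iqr    # points below this are outliers
    high_fence = q3 + k * iqr   # points above this are outliers
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

# The IQR example data plus one made-up extreme value (60) for illustration.
data = [2, 4, 5, 7, 8, 10, 12, 15, 20, 22, 60]
stats = box_plot_stats(data)
print(stats)
```

Here the fences land at −17.5 and 42.5, so the whiskers stop at 2 and 22 and the value 60 is drawn as a separate outlier point.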

🎯 Why Not Just Use the Mean?

While central tendency is important, it’s not enough.
We need to know how spread out the data is — especially when comparing groups.

🧠 The box plot helps you see both center and variability.


🤖 Why Dispersion Matters in Machine Learning

  • Outlier Detection: IQR and box plots help identify outliers, which can strongly affect model performance.
  • Feature Selection: Features with very low or very high dispersion may be less useful or require special handling.
  • Comparing Groups: Box plots make it easy to compare distributions across classes or experimental groups.
  • Data Preprocessing: Understanding spread helps guide normalization, scaling, and robust imputation strategies.

In machine learning, understanding and visualizing data dispersion is essential for building reliable, interpretable models and for effective data cleaning.
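
As one illustration of the preprocessing point, the sketch below compares min-max scaling with a median/IQR-based scaling on a feature that contains an outlier. It assumes median-of-halves quartiles, and the function names `robust_scale` and `min_max_scale` are hypothetical helpers written for this example, not a library API.

```python
from statistics import median

def min_max_scale(values):
    """Scale to [0, 1] using the minimum and maximum (range-based)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def robust_scale(values):
    """Center on the median and divide by the IQR (median-of-halves)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    iqr = median(ordered[-half:]) - median(ordered[:half])
    if iqr == 0:
        raise ValueError("IQR is zero; the middle half has no spread")
    center = median(ordered)
    return [(x - center) / iqr for x in values]

feature = [10, 12, 13, 14, 15, 70]  # contains one extreme value

print([round(v, 2) for v in min_max_scale(feature)])
print([round(v, 2) for v in robust_scale(feature)])
```

With min-max scaling, the five typical values are squeezed into less than a tenth of the [0, 1] interval because the range is set by the outlier. With median/IQR scaling, the typical values keep a usable spread and the outlier simply sits far away.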


📌 Try It Yourself

Q: Consider these two datasets:

  1. 📦 Dataset A: {10, 12, 12, 13, 13, 13, 14, 15}
  2. 📦 Dataset B: {10, 12, 13, 14, 15, 70}

Both datasets have nearly the same median (13 for Dataset A, 13.5 for Dataset B). Which one has a larger range, and what does that tell you about its spread?

💡 Answer

Dataset B — it has a much larger range:
70 - 10 = 60 vs 15 - 10 = 5 in Dataset A.

This tells us that Dataset B includes a more extreme value — possibly an outlier — which greatly increases its range.
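
A quick way to verify the answer is to compute both statistics in Python. This is a minimal sketch using only the standard library; the variable names are chosen for illustration.

```python
from statistics import median

dataset_a = [10, 12, 12, 13, 13, 13, 14, 15]
dataset_b = [10, 12, 13, 14, 15, 70]

for name, values in [("A", dataset_a), ("B", dataset_b)]:
    spread = max(values) - min(values)
    print(name, "median:", median(values), "range:", spread)
# A median: 13.0 range: 5
# B median: 13.5 range: 60
```

The medians differ by only half a unit, while the ranges differ by a factor of 12.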


Bonus: Why might the IQR be a better measure of spread than the range in some cases?

💡 Answer

✅ The Interquartile Range (IQR) measures the spread of the middle 50% of the data.
Because it ignores the lowest and highest quarters of the data, it is largely unaffected by outliers, which makes it a more reliable indicator of typical variability in skewed datasets.
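
To see the difference on the two datasets from the exercise, the sketch below computes the IQR with the median-of-halves method. The helper name `iqr_by_halves` is made up for illustration, and other quartile conventions would give somewhat different numbers.

```python
from statistics import median

def iqr_by_halves(values):
    """IQR = Q3 - Q1, with quartiles taken as medians of each half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

dataset_a = [10, 12, 12, 13, 13, 13, 14, 15]
dataset_b = [10, 12, 13, 14, 15, 70]

print(iqr_by_halves(dataset_a))  # 1.5
print(iqr_by_halves(dataset_b))  # 3
```

The range of Dataset B is 12 times that of Dataset A (60 vs 5), but its IQR is only twice as large (3 vs 1.5), so the value 70 has far less influence on the IQR.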


🧠 Level Up: Understanding Variability Beyond the Range

While the range gives a simple measure of spread, it’s very sensitive to outliers — extreme values can distort your understanding.

  • 📊 The IQR zeroes in on the middle 50% of data, making it more robust when outliers exist.
  • 📦 The box plot visually separates the central bulk of data from outliers, showing you skewness and spread at a glance.
  • 🔍 These tools are especially important in fields like finance, biology, and machine learning where outliers are common.

Mastering these measures will help you make better decisions and spot patterns that average measures alone can miss.


💬 Got a question or suggestion?
Feel free to leave a comment below — I’d love to hear your thoughts or help clarify any part of this topic.


🔁 Summary

| Measure | What it tells us | Sensitive to outliers? |
|---------|------------------|------------------------|
| Range | Max − Min | ✅ Yes |
| IQR | Spread of middle 50% | ❌ No |
| Box Plot | Visual of quartiles & outliers | ❌ No |

✅ Up Next

Next, we’ll go deeper into numeric measures of variability:

  • Variance
  • Standard Deviation

And we’ll learn how to calculate and visualize them!

Stay tuned.

This post is licensed under CC BY 4.0 by the author.