What Are Random Variables and How Do We Visualize Their Distributions?

What’s the difference between a PMF, a PDF, and a CDF, and how do they relate to random variables? In this post, we’ll break down the two key types of random variables (discrete and continuous), show how to visualize their distributions, and explain how these concepts power real-world applications in statistics and machine learning.

📚 This post is part of the "Intro to Statistics" series

🔙 Previously: Understanding Independence and Bayes’ Rule

🔜 Next: Summary Statistics of Probability Distributions


🎲 What Is a Random Variable?

A random variable is a variable whose value is a numerical outcome of a random phenomenon.

It can take different values depending on how the randomness plays out: the result of a die roll, the temperature in your city, or the height of a randomly chosen person.


🧱 Types of Random Variables

| Type | Description | Examples |
| --- | --- | --- |
| Discrete | Takes a countable number of values | Number of calls per day, die roll result |
| Continuous | Takes any value within an interval (uncountably many possibilities) | Height, temperature, weight |

📊 How Do We Work With Random Variables?

We use probability distributions to describe how likely each outcome is.

A probability distribution can be expressed as:

  • A table
  • A graph
  • An equation

Depending on the variable type, we use:

| Type | Distribution Function |
| --- | --- |
| Discrete | Probability Mass Function (PMF) |
| Continuous | Probability Density Function (PDF) |

🔍 Visual: PMF (Discrete Distribution)

[Figure: PMF plot]

Each bar shows the probability of an exact outcome.
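
To make the bar picture concrete, here is a minimal sketch that builds a PMF by counting outcomes. The two-dice experiment is an assumption chosen for illustration; it is not the example behind the plot above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice
# and count how often each sum occurs.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF: map each possible sum to its exact probability.
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in pmf.items():
    # Crude text bar chart: one '#' per outcome (out of 36) that gives this sum.
    print(f"{total:2d}  {str(p):>5}  {'#' * counts[total]}")

# A valid PMF must sum to exactly 1.
assert sum(pmf.values()) == 1
```

Using `Fraction` keeps the probabilities exact, so the sum-to-one check does not depend on floating-point rounding.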


🔍 Visual: PDF (Continuous Distribution)

[Figure: PDF plot]

The area under the curve, not its height, represents probability.
For a continuous variable, \( P(X = 5) = 0 \) for any single value, so we always ask about intervals: \( P(a \le X \le b) \).
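
The sketch below illustrates the "area, not height" idea numerically. It assumes a standard normal variable (an illustrative choice, not taken from the plot) and uses `statistics.NormalDist` from the Python standard library; `area_under_pdf` is a hypothetical helper written for this example.

```python
from statistics import NormalDist

# Assumption: X follows a standard normal distribution (mean 0, std dev 1).
X = NormalDist(mu=0.0, sigma=1.0)

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate P(a <= X <= b) with the trapezoidal rule on the PDF."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

p_interval = area_under_pdf(X, -1.0, 1.0)  # roughly 0.68
p_point = area_under_pdf(X, 0.5, 0.5)      # zero-width interval, so zero area

# Density is not probability: a narrow normal peaks well above 1.
narrow = NormalDist(mu=0.0, sigma=0.1)
peak_height = narrow.pdf(0.0)              # roughly 3.99

print(p_interval, p_point, peak_height)
```

A density value above 1 is not a contradiction, because only the area over an interval has to stay between 0 and 1.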


⚖️ Why Are Discrete Probabilities Simpler?

With discrete random variables, calculating probabilities is straightforward. Because distinct outcomes are mutually exclusive, you just add up their probabilities:

\( P(X = 2 \text{ or } X = 3) = P(X = 2) + P(X = 3) \)

In contrast, with continuous variables you need to integrate the density over an interval, which often requires calculus or software.
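
As a worked example of the continuous case, assume (purely for illustration) that \( X \) is uniform on \( [0, 10] \), so its density is \( f(x) = \frac{1}{10} \) on that interval. The probability of landing between 2 and 5 is then the area of a rectangle:

\[
P(2 \le X \le 5) = \int_{2}^{5} f(x)\,dx = \int_{2}^{5} \frac{1}{10}\,dx = \frac{5 - 2}{10} = 0.3
\]

For most real-world densities the integral has no such simple shape, which is where software comes in.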


📈 Cumulative Distribution Function (CDF)

The Cumulative Distribution Function answers:

What is the probability that \( X \) is less than or equal to some value?

We can compute CDFs for both discrete and continuous variables.


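🧪 Example: CDF (Discrete)

| x | P(X = x) | P(X ≤ x) |
| --- | --- | --- |
| 1 | 0.1 | 0.1 |
| 2 | 0.3 | 0.4 |
| 3 | 0.2 | 0.6 |
| 4 | 0.25 | 0.85 |
| 5 | 0.15 | 1.0 |

[Figure: discrete CDF]

Each step adds the probability of the current value to the running total from the previous values.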
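
The running total in the table can be reproduced in a few lines. This is a minimal sketch using the values from the table; `cdf_at` is a hypothetical helper name introduced here.

```python
from bisect import bisect_right
from itertools import accumulate

xs = [1, 2, 3, 4, 5]
pmf = [0.10, 0.30, 0.20, 0.25, 0.15]

# Running total of the PMF; rounding hides floating-point noise
# such as 0.6000000000000001.
cdf = [round(total, 10) for total in accumulate(pmf)]

def cdf_at(x):
    """Return P(X <= x) for any real x, using the step shape of a discrete CDF."""
    idx = bisect_right(xs, x)  # number of support points that are <= x
    return 0.0 if idx == 0 else cdf[idx - 1]

for x, p, c in zip(xs, pmf, cdf):
    print(f"x={x}  P(X=x)={p:.2f}  P(X<=x)={c:.2f}")
```

Between two support points the CDF stays flat, so `cdf_at(2.5)` gives the same value as `cdf_at(2)`.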


📊 Example: CDF (Continuous)

[Figure: continuous CDF]

This curve shows \( P(X \le x) \) for every point, and it never decreases as \( x \) grows.
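
For a continuous variable, a CDF also turns interval probabilities into a simple subtraction: \( P(a \le X \le b) = F(b) - F(a) \). The sketch below assumes a standard normal variable for illustration and relies on `statistics.NormalDist.cdf` from the standard library.

```python
from statistics import NormalDist

# Assumption: X is standard normal; any continuous distribution behaves the same way.
X = NormalDist(mu=0.0, sigma=1.0)

# Evaluate the CDF on a grid from -4.0 to 4.0 in steps of 0.5.
grid = [x / 2 for x in range(-8, 9)]
values = [X.cdf(x) for x in grid]

# A CDF never goes down as x increases.
is_non_decreasing = all(a <= b for a, b in zip(values, values[1:]))

# Interval probability as a difference of two CDF values (roughly 0.68).
p_between = X.cdf(1.0) - X.cdf(-1.0)

print(is_non_decreasing, round(p_between, 4))
```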


📉 Distribution vs Cumulative: Visual Comparison

| View | What It Shows |
| --- | --- |
| PDF / PMF | Probability of individual values (PMF) or a density whose area gives probability (PDF) |
| CDF | Cumulative probability up to a certain point |

🎨 Visual Comparison

PDF → Use the area under the curve to find probability
CDF → Read probability directly from the graph


📌 Key Properties of the CDF

  • Never decreases (it can stay flat, but it never goes down)
  • Starts at 0 on the far left and reaches (or approaches) 1 on the far right
  • You can find \( x \) for a given probability, or the probability for a given \( x \)

🎯 What Is a Quantile?

A quantile tells us the value at a certain cumulative probability.

  • The median is the 0.5 quantile β†’ 50% of values lie below
  • The 0.9 quantile means 90% of values are below that point

πŸ” Visual Example

Quantile Concept

If the 90th percentile is 8.1, then \( P(X \le 8.1) = 0.90 \)
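
Quantiles can be computed by running the CDF backwards. The sketch below shows one continuous case (a standard normal, assumed for illustration, using `NormalDist.inv_cdf`) and one discrete case built on the CDF table from the discrete example earlier in this post. `discrete_quantile` is a hypothetical helper that returns the smallest value whose cumulative probability reaches `p`, which is one common convention among several.

```python
from bisect import bisect_left
from statistics import NormalDist

# Continuous case. Assumption: X is standard normal.
X = NormalDist(mu=0.0, sigma=1.0)
median = X.inv_cdf(0.5)   # the 0.5 quantile, at the centre of the distribution
q90 = X.inv_cdf(0.9)      # roughly 1.28
round_trip = X.cdf(q90)   # feeding the quantile back into the CDF gives about 0.9

# Discrete case, using the cumulative column of the earlier table.
xs = [1, 2, 3, 4, 5]
cdf = [0.10, 0.40, 0.60, 0.85, 1.00]

def discrete_quantile(p):
    """Smallest x whose cumulative probability is at least p (for 0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return xs[bisect_left(cdf, p)]

print(median, q90, discrete_quantile(0.5), discrete_quantile(0.9))
```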


🤖 Why It Matters for Machine Learning

In machine learning, understanding random variables and their distributions is essential for:

  • Modeling prediction uncertainty (e.g., probabilistic classifiers)
  • Evaluating models (e.g., CDFs in ROC analysis)
  • Data generation (e.g., sampling from PDFs or PMFs)
  • Feature engineering (quantile normalization, log transformations)

Mastering these basics helps you build better models with interpretable results.
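
As a small illustration of the data-generation point, this sketch draws samples from a PMF and checks that the observed frequencies land close to the stated probabilities. The PMF values are reused from the discrete CDF example; the seed and sample size are arbitrary choices.

```python
import random
from collections import Counter

xs = [1, 2, 3, 4, 5]
pmf = [0.10, 0.30, 0.20, 0.25, 0.15]

# A seeded generator keeps the run reproducible.
rng = random.Random(42)

# random.choices draws with replacement, using the PMF as weights.
samples = rng.choices(xs, weights=pmf, k=100_000)

# Empirical PMF: the share of samples that landed on each value.
freq = Counter(samples)
empirical = {x: freq[x] / len(samples) for x in xs}

for x, p in zip(xs, pmf):
    print(f"x={x}  true={p:.2f}  empirical={empirical[x]:.3f}")
```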


🧠 Level Up: Why CDFs Are Powerful
CDFs help you answer questions in reverse:
  • β€œWhat’s the probability of X being below a threshold?” (➑️ read from CDF)
  • β€œWhat value corresponds to 75% of cases?” (➑️ find the x-value for \( P(X) = 0.75 \)) You can even invert the function to get values back from probabilities. CDFs are especially useful in:
  • Risk modeling
  • Threshold setting
  • Statistical simulations
  • Machine learning (quantile regression)
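
Inverting a CDF is also the basis of inverse transform sampling, a standard simulation technique: draw a uniform number in \( [0, 1) \), treat it as a cumulative probability, and map it back to a value. The sketch below assumes an exponential distribution because its CDF, \( F(x) = 1 - e^{-\lambda x} \), inverts in closed form. The rate of 2.0 and the helper name are illustrative choices.

```python
import math
import random

def exponential_inverse_cdf(p, rate):
    """Quantile function of Exponential(rate): solves 1 - exp(-rate * x) = p for x."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p) / rate

rate = 2.0               # illustrative rate parameter
rng = random.Random(0)   # seeded for reproducibility

# Push uniform draws through the inverse CDF to get exponential samples.
samples = [exponential_inverse_cdf(rng.random(), rate) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate = 0.5

# "What value corresponds to 75% of cases?" is a single call to the inverse CDF.
q75 = exponential_inverse_cdf(0.75, rate)  # ln(4) / 2, roughly 0.693

print(round(sample_mean, 3), round(q75, 3))
```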

✅ Best Practices

  • Use CDFs when comparing distributions or setting thresholds.
  • Use a PMF for count data and a PDF for continuous measurements.
  • Use visualizations to detect skewness or outliers in distributions.
  • Verify that the total probability is 1: a PMF should sum to 1, and a PDF should integrate to 1.
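
The last check is easy to automate. Here is a minimal sketch with two hypothetical helpers, `is_valid_pmf` and `integrate`. The density \( f(x) = 2x \) on \( [0, 1] \) is made up for the example.

```python
import math

def is_valid_pmf(probs, tol=1e-9):
    """True if every probability is non-negative and the total is 1 (within tol)."""
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)

def integrate(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of pdf over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical density used only for illustration: f(x) = 2x on [0, 1].
def triangle_pdf(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

pmf_ok = is_valid_pmf([0.10, 0.30, 0.20, 0.25, 0.15])  # sums to 1
pmf_bad = is_valid_pmf([0.5, 0.6])                     # sums to 1.1
pdf_total = integrate(triangle_pdf, 0.0, 1.0)          # should be very close to 1

print(pmf_ok, pmf_bad, round(pdf_total, 6))
```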

    ⚠️ Common Pitfalls
    • ❌ Misinterpreting PDF height as probability β€” area matters.
    • ❌ Forgetting that continuous variables can't have P(X = a) > 0.
    • ❌ Confusing quantiles with raw values or assuming symmetry.
    • ❌ Using PMF formulas on continuous data or vice versa.

📌 Try It Yourself

Q: What’s the key difference between discrete and continuous random variables?

✅ Discrete variables take countable values (like 0, 1, 2, ...), while continuous variables can take any value within a range (like height or time).


Q: What is the distribution function for discrete random variables called?

✅ It’s called the Probability Mass Function (PMF). It gives the probability of each possible value.


Q: Why is it easier to calculate probabilities with discrete variables than continuous ones?

✅ Because you can just sum the individual probabilities. Continuous variables need integration over intervals.


Q: What does the Cumulative Distribution Function (CDF) tell us?

✅ It gives the cumulative probability that a variable is less than or equal to a certain value.


Q: What is a quantile in statistics?

✅ A quantile is a value below which a given percentage of data falls. For example, the median is the 0.5 quantile.


Bonus: What’s the probability that a continuous variable takes an exact value?

✅ Zero. We calculate probability over intervals; single values have zero probability in continuous distributions.


✅ Summary

| Concept | Description |
| --- | --- |
| Random Variable | Represents the numeric outcome of a random event |
| Discrete | Countable outcomes (use PMF) |
| Continuous | Any value in an interval (use PDF) |
| PMF / PDF | Describe the probability distribution |
| CDF | Accumulated probability up to x |
| Quantile | Inverse of the CDF: get x for a given probability |

💬 Got a question or suggestion?

Leave a comment below. I’d love to hear your thoughts or help if something was unclear.


🔜 Up Next

In the next post, we’ll explore summary statistics like:

  • Mean
  • Variance
  • Standard deviation
  • Expected value

These help us describe how a probability distribution behaves.

Stay tuned!

This post is licensed under CC BY 4.0 by the author.