
Mean, Variance, and Standard Deviation of Random Variables


How do we summarize a random variable with a single number?
What happens to the mean and variance if we shift or scale the variable?
This post explains the mean, variance, and standard deviation for both discrete and continuous random variables, with concrete examples.


📚 This post is part of the "Intro to Statistics" series

🔙 Previously: What Are Random Variables and How Do We Visualize Their Distributions?

🔜 Next: Introduction to the Normal Distribution


๐Ÿ“ What Is the Mean of a Random Variable?

The mean (or expected value) of a random variable \( X \) is the probability-weighted average of all its possible values.


🧮 Mean of a Discrete Random Variable

\[ \mu_X = E(X) = \sum_i x_i P(x_i) \]

This means each value \( x_i \) is weighted by its probability \( P(x_i) \).

Example:

| \( x_i \)    | 1   | 2   | 3   | 4   |
|--------------|-----|-----|-----|-----|
| \( P(x_i) \) | 0.1 | 0.3 | 0.4 | 0.2 |

Calculate:

\[ E(X) = 1 \times 0.1 + 2 \times 0.3 + 3 \times 0.4 + 4 \times 0.2 = 0.1 + 0.6 + 1.2 + 0.8 = 2.7 \]
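The same weighted sum is easy to check with a few lines of Python (values and probabilities taken from the table above):

```python
# Discrete mean: weight each outcome by its probability and sum.
values = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.4, 0.2]

mean = sum(x * p for x, p in zip(values, probs))
print(round(mean, 4))  # 2.7
```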




๐Ÿ“ Mean of a Continuous Random Variable

\[ \mu_X = E(X) = \int_{-\infty}^{\infty} x f(x) \, dx \]

Where \( f(x) \) is the probability density function (PDF).

Example:

If

\[ f(x) = \frac{1}{2} \quad \text{for } 0 \leq x \leq 2, \quad 0 \text{ otherwise} \]

Then

\[ E(X) = \int_0^2 x \times \frac{1}{2} \, dx = \frac{1}{2} \int_0^2 x \, dx = \frac{1}{2} \times \left[ \frac{x^2}{2} \right]_0^2 = \frac{1}{2} \times 2 = 1 \]
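As a quick numerical sanity check, the integral above can be approximated with a midpoint Riemann sum in plain Python (no libraries; the density and interval are the ones from the example):

```python
# Approximate E(X) = ∫ x f(x) dx for the uniform density f(x) = 1/2
# on [0, 2], using a midpoint Riemann sum over n subintervals.
def f(x):
    return 0.5 if 0 <= x <= 2 else 0.0

n = 100_000
dx = 2 / n
mean = sum(((i + 0.5) * dx) * f((i + 0.5) * dx) * dx for i in range(n))
print(round(mean, 6))  # 1.0
```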




🔄 Mean Under Linear Transformations

If we transform \( X \) as:

\[ Y = a + bX \]

then

\[ E(Y) = a + b E(X) \]


Example (using the discrete mean above, with \( a = 3 \) and \( b = 2 \)):

\[ E(Y) = 3 + 2 \times 2.7 = 3 + 5.4 = 8.4 \]
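We can verify the shortcut by computing \( E(Y) \) both ways: transforming every outcome first, and applying \( E(Y) = a + bE(X) \) directly (same \( a = 3 \), \( b = 2 \) as above):

```python
# For Y = a + bX, E(Y) = a + b * E(X). Check both routes agree.
values = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.4, 0.2]
a, b = 3, 2

mean_x = sum(x * p for x, p in zip(values, probs))
# Route 1: transform every outcome, then take the weighted average.
mean_y_direct = sum((a + b * x) * p for x, p in zip(values, probs))
# Route 2: use the shortcut formula.
mean_y_formula = a + b * mean_x
print(round(mean_y_direct, 4), round(mean_y_formula, 4))  # 8.4 8.4
```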




📊 What Is Variance?

Variance measures the spread or deviation of values around the mean:

\[ \text{Var}(X) = E[(X - \mu)^2] \]


🧮 Variance for a Discrete Random Variable

\[ \text{Var}(X) = \sum_i (x_i - \mu)^2 P(x_i) \]

Using the discrete example above (\( \mu = 2.7 \)):

\[ \text{Var}(X) = (1 - 2.7)^2 \times 0.1 + (2 - 2.7)^2 \times 0.3 + (3 - 2.7)^2 \times 0.4 + (4 - 2.7)^2 \times 0.2 \]

\[ = (2.89)(0.1) + (0.49)(0.3) + (0.09)(0.4) + (1.69)(0.2) \]

\[ = 0.289 + 0.147 + 0.036 + 0.338 \]

\[ = 0.81 \]
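The same calculation in Python, reusing the table from the mean example:

```python
# Discrete variance: probability-weighted average squared deviation.
values = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.4, 0.2]

mu = sum(x * p for x, p in zip(values, probs))               # 2.7
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
print(round(var, 4))  # 0.81
```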




๐Ÿ“ Variance for Continuous Random Variable

\[ \text{Var}(X) = \int_{-\infty}^\infty (x - \mu)^2 f(x) \, dx \]

For the continuous example above (\( \mu=1 \)):

\[ \text{Var}(X) = \int_0^2 (x - 1)^2 \times \frac{1}{2} \, dx = \frac{1}{2} \int_0^2 (x^2 - 2x + 1) \, dx \]

Calculate:

\[ = \frac{1}{2} \left[ \frac{x^3}{3} - x^2 + x \right]_0^2 = \frac{1}{2} \left( \frac{8}{3} - 4 + 2 \right) = \frac{1}{2} \times \frac{2}{3} = \frac{1}{3} \approx 0.333 \]
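The same midpoint-sum approach used for the continuous mean confirms this value numerically:

```python
# Approximate Var(X) = ∫ (x - mu)^2 f(x) dx for f(x) = 1/2 on [0, 2],
# with mu = 1 from the mean example, using a midpoint Riemann sum.
def f(x):
    return 0.5 if 0 <= x <= 2 else 0.0

mu = 1.0
n = 100_000
dx = 2 / n
var = sum((((i + 0.5) * dx) - mu) ** 2 * f((i + 0.5) * dx) * dx
          for i in range(n))
print(round(var, 4))  # 0.3333
```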


🔄 Variance Under Linear Transformations

For \( Y = a + bX \), variance changes as:

\[ \text{Var}(Y) = b^2 \text{Var}(X) \]

Adding or subtracting a constant \( a \) does not affect variance.


โœ๏ธ Proof Sketch:

\[ \text{Var}(Y) = E[(Y - E[Y])^2] \]

\[ = E[(a + bX - (a + bE[X]))^2] \]

\[ = E[(b(X - E[X]))^2] \]

\[ = b^2 E[(X - E[X])^2] \]

\[ = b^2 \text{Var}(X) \]


Example:

Using the discrete variance above (\( \text{Var}(X) = 0.81 \)) with \( b = 2 \):

\[ \text{Var}(Y) = 2^2 \times 0.81 = 4 \times 0.81 = 3.24 \]
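As with the mean, we can check the rule by transforming every outcome and recomputing the variance from scratch (again with \( a = 3 \), \( b = 2 \)):

```python
# Check Var(a + bX) = b^2 * Var(X) on the discrete example.
values = [1, 2, 3, 4]
probs = [0.1, 0.3, 0.4, 0.2]
a, b = 3, 2

mu = sum(x * p for x, p in zip(values, probs))
var_x = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Route 1: transform the outcomes, then compute the variance directly.
y_vals = [a + b * x for x in values]
mu_y = sum(y * p for y, p in zip(y_vals, probs))
var_y_direct = sum((y - mu_y) ** 2 * p for y, p in zip(y_vals, probs))

# Route 2: the shortcut formula (note: a has no effect).
var_y_formula = b ** 2 * var_x
print(round(var_y_direct, 4), round(var_y_formula, 4))  # 3.24 3.24
```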


๐Ÿ“ Standard Deviation and Scaling

Standard deviation \( \sigma \) is the square root of variance:

\[ \sigma_X = \sqrt{\text{Var}(X)} \]

For \( Y = a + bX \):

\[ \sigma_Y = \sqrt{\text{Var}(Y)} = \sqrt{b^2 \text{Var}(X)} = |b| \sigma_X \]


Example (continued):

\[ \sigma_X = \sqrt{0.81} = 0.9 \]

\[ \sigma_Y = 2 \times 0.9 = 1.8 \]
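In code, this is just a square root plus the \( |b| \) scaling rule:

```python
import math

# sigma_Y = |b| * sigma_X for Y = a + bX; the shift a drops out.
var_x = 0.81
b = 2

sigma_x = math.sqrt(var_x)
sigma_y = abs(b) * sigma_x
print(round(sigma_x, 4), round(sigma_y, 4))  # 0.9 1.8
```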


🔢 Variance of a Sum and Difference

For any two random variables \( X \) and \( Y \):

\[ \text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y) \pm 2\,\text{Cov}(X, Y) \]
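To see the covariance term at work, here is a check on a small paired dataset (made up for illustration; population formulas, dividing by \( n \)):

```python
# Verify Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on paired data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Variance of the elementwise sum, computed directly.
s = [x + y for x, y in zip(xs, ys)]
ms = sum(s) / n
var_sum = sum((v - ms) ** 2 for v in s) / n

print(round(var_sum, 4), round(var_x + var_y + 2 * cov, 4))  # 4.0 4.0
```

Because the covariance here is nonzero, the variances alone would not add up; only with the \( 2\,\text{Cov}(X, Y) \) term do the two sides match.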


🤖 Why It Matters for Machine Learning

Understanding expected value and variance is foundational to many machine learning algorithms:

  • 📊 Feature Scaling: Standardization and min-max scaling are linear transformations; knowing how they affect the mean and variance helps you avoid introducing bias.
  • 🧠 Loss Functions: Common losses like MSE are built on variance concepts; minimizing the squared deviation between predicted and actual values improves model performance.
  • 📈 Model Interpretation: Many models assume the data has constant variance (homoscedasticity). Violating this assumption can lead to poor generalization.

💡 Mastering these statistical fundamentals makes it easier to debug models, improve feature engineering, and understand algorithm behavior under the hood.


🧠 Level Up: Understanding Variance Properties
  • Adding a constant \( a \) to a random variable shifts the mean but leaves the variance unchanged.
  • Multiplying by \( b \) scales the variance by \( b^2 \).
  • Standard deviation scales by \( |b| \), the absolute value of the multiplier.
  • The variance of a sum depends on covariance; independent variables have zero covariance, so their variances add.

✅ Best Practices for Working with Random Variable Metrics
  • Always check whether your variable is discrete or continuous before choosing formulas.
  • Use the linear transformation rules to simplify calculations and scaling checks.
  • Remember that standard deviation is more interpretable because it is in the same units as the variable.
  • Apply the variance rules carefully when dealing with sums of variables.

⚠️ Common Pitfalls to Avoid
  • ❌ Confusing the sample and population variance formulas.
  • ❌ Forgetting to square the scaling factor when applying transformations to variance.
  • ❌ Assuming that adding a constant affects the variance (it doesn't).
  • ❌ Neglecting covariance when computing the variance of a sum.

📌 Try It Yourself: Mean, Variance & Linear Transformations

Q1: What does the expected value (mean) of a discrete random variable represent?

💡 Show Answer

✅ It's the probability-weighted average of all possible outcomes.

In other words, it's the long-run average value you'd expect over many trials.


Q2: For a linear transformation \( Y = a + bX \), how do you compute the expected value \( E(Y) \)?

💡 Show Answer

✅ \( E(Y) = a + b \cdot E(X) \)

The expected value shifts and scales along with the transformation.


Q3: If you add a constant \( a \) to every value of a random variable, how does it affect the variance?

💡 Show Answer

✅ It doesn't change the variance.

Variance measures only spread, not location; adding a constant shifts all values equally.


Q4: For \( Y = a + bX \), how is the variance of \( Y \) related to the variance of \( X \)?

💡 Show Answer

✅ \( \text{Var}(Y) = b^2 \cdot \text{Var}(X) \)

Scaling by \( b \) multiplies the spread by \( b^2 \), while the constant \( a \) has no effect.


✅ Summary

| Concept | Formula / Description |
|---|---|
| Mean (discrete) | \( \mu = \sum_i x_i P(x_i) \) |
| Mean (continuous) | \( \mu = \int x f(x) \, dx \) |
| Variance (discrete) | \( \sigma^2 = \sum_i (x_i - \mu)^2 P(x_i) \) |
| Variance (continuous) | \( \sigma^2 = \int (x - \mu)^2 f(x) \, dx \) |
| Mean under linear transform | \( E(a + bX) = a + b\,E(X) \) |
| Variance under linear transform | \( \text{Var}(a + bX) = b^2\,\text{Var}(X) \) |
| Variance of sum/difference | \( \text{Var}(X \pm Y) = \text{Var}(X) + \text{Var}(Y) \pm 2\,\text{Cov}(X, Y) \) |
| Standard deviation | \( \sigma = \sqrt{\text{Var}(X)} \) |

💬 Got a question or suggestion?

Leave a comment below; I'd love to hear your thoughts or help if something was unclear.


🔜 Up Next

Next, we'll explore the Normal Distribution, a fundamental continuous distribution that appears everywhere in statistics and data science.

Stay tuned!

This post is licensed under CC BY 4.0 by the author.