Understanding Normal Distribution

📌 What is Normal Distribution (Gaussian Distribution)?

The normal distribution (or Gaussian distribution) is a continuous probability distribution for a real-valued random variable. Many natural phenomena and measurement errors are approximately normally distributed. Its graph is symmetric and bell-shaped.


📚 This post is part of the "Intro to Statistics" series

🔙 Previously: Mean, Variance, and Standard Deviation of Random Variables

🔜 Next: Z Distribution


The Probability Density Function (PDF) for Normal Distribution

The equation for the PDF of a normal distribution is:

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

Where:

  • \( \mu \) is the mean (location parameter) of the distribution, which defines where the peak of the bell curve is located.
  • \( \sigma \) is the standard deviation (scale parameter), which controls the width of the bell curve.
  • \( \exp \) is the exponential function, \( \exp(t) = e^{t} \). Because the exponent is negative and grows with the squared distance from \( \mu \), the density falls off rapidly and symmetrically on both sides of the mean.

This equation connects the statistical world to real-world distributions.
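
As a minimal sketch, the PDF formula above can be translated term by term into Python. The function name `normal_pdf` and the values of `mu` and `sigma` are illustrative assumptions, not part of the original post; the result is cross-checked against `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate the normal PDF at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Illustrative parameters (assumed for this example only).
mu, sigma = 10.0, 2.0

# Cross-check the hand-written formula against the standard library.
for x in (6.0, 10.0, 13.5):
    assert math.isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# The curve is highest at x = mu, where the exponent is zero.
peak = normal_pdf(mu, mu, sigma)
print(f"density at the mean: {peak:.4f}")
```

Note that the value returned is a density, not a probability: probabilities come from areas under the curve.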


📊 Understanding the Equation

The PDF is the exponential of a negative squared term, so it describes values distributed symmetrically around the mean, with the density decreasing as \( x \) moves away from \( \mu \).

  • The area under the curve represents the total probability, and this total area equals 1.
  • The variable \( x \) can take any value from \( -\infty \) to \( +\infty \), meaning the distribution extends infinitely in both directions.
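
The claim that the total area equals 1 can be checked numerically. The sketch below integrates the PDF with a simple trapezoid rule over \( \mu \pm 10\sigma \), a range wide enough that the neglected tails are negligible. The helper names and parameter values are assumptions made for illustration.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal PDF with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def trapezoid_area(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] using n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Illustrative parameters (assumed for this example only).
mu, sigma = 5.0, 1.5

area = trapezoid_area(
    lambda x: normal_pdf(x, mu, sigma),
    mu - 10.0 * sigma,
    mu + 10.0 * sigma,
)
print(f"approximate total area: {area:.6f}")
```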

🔄 Important Characteristics of Normal Distribution

  • \( \mu \) describes the location of the distribution, i.e., where the center of the bell curve lies.
  • \( \sigma \) defines the scale of the distribution, i.e., how spread out the values are around the mean.
  • The probability for any given range can be found using the cumulative distribution function (CDF).
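
For reference, the CDF of a normal distribution with PDF \( f \) can be written out as follows. The symbols \( F \), \( a \), and \( b \) are notation introduced here for illustration, and \( \operatorname{erf} \) is the standard error function:

\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = \frac{1}{2}\left[ 1 + \operatorname{erf}\left( \frac{x - \mu}{\sigma\sqrt{2}} \right) \right] \]

\[ P(a < X \le b) = F(b) - F(a) \]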

🧮 Example of Normal Distribution

For any normal distribution:

  • About 68% of values lie between \( \mu - \sigma \) and \( \mu + \sigma \).
  • About 95% of values lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \).
  • About 99.7% of values lie between \( \mu - 3\sigma \) and \( \mu + 3\sigma \).
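
These percentages are rounded. A short sketch, using the fact that the probability of falling within \( k \) standard deviations of the mean is \( \operatorname{erf}(k / \sqrt{2}) \) for any \( \mu \) and \( \sigma \), shows the more precise values (roughly 68.27%, 95.45%, and 99.73%). The function name `prob_within` is an assumption made for this example.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4%}")
```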

📈 Visualizing the 68%, 95%, and 99.7% Rule

Here's a visual showing the 68%, 95%, and 99.7% areas under the curve:

[Figure: Normal Distribution - Empirical Rule]


How to Calculate Probabilities Using Normal Distribution

To calculate the probability that a variable \( X \) lies within a specific range:

  • We use the Cumulative Distribution Function (CDF), which gives the area under the curve from \( -\infty \) to a specified \( x \).
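
As a hedged sketch of this calculation, the probability of a range is the CDF at the upper bound minus the CDF at the lower bound. The example uses `statistics.NormalDist` from the Python standard library; the function name `prob_between` and the height figures (mean 170 cm, standard deviation 10 cm) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """Probability that a normal variable with mean mu and std sigma lies in (a, b]."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Hypothetical example: heights with mean 170 cm and standard deviation 10 cm.
mu, sigma = 170.0, 10.0
p = prob_between(160.0, 185.0, mu, sigma)
print(f"P(160 < X <= 185) = {p:.4f}")
```

Here the bounds sit 1 standard deviation below and 1.5 standard deviations above the mean, so the result is a little over 77%.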

🧠 Level Up: Understanding the Normal Distribution in Detail
  • The normal distribution is foundational in statistics. It is used in hypothesis testing, confidence intervals, and in many natural and social sciences.
  • The 68-95-99.7 rule: This empirical rule highlights the percentage of data that falls within 1, 2, and 3 standard deviations from the mean.
  • The central limit theorem states that, for independent samples from a distribution with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution.
  • In practice, many natural phenomena and errors in measurement are approximately normal because they arise as the sum of many small, independent effects, which is the situation the central limit theorem describes.
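
The central limit theorem can be illustrated with a small simulation. The sketch below draws samples from an exponential distribution (strongly skewed, with mean 1 and standard deviation 1), computes many sample means, and checks that they cluster around 1 with spread close to \( 1/\sqrt{n} \). The choice of distribution, sample size, trial count, and seed are all assumptions made for this example.

```python
import random
import statistics


def simulate_sample_means(sample_size: int, trials: int, seed: int = 0) -> list[float]:
    """Return the means of `trials` samples, each of `sample_size` exponential draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(trials)
    ]


means = simulate_sample_means(sample_size=50, trials=5000)

center = statistics.fmean(means)   # expected to be close to 1.0
spread = statistics.stdev(means)   # expected to be close to 1 / sqrt(50)

# Fraction of sample means within one standard deviation of their center;
# for a normal distribution this would be about 0.68.
within_one = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"center: {center:.3f}, spread: {spread:.3f}, within one sd: {within_one:.3f}")
```

Even though individual exponential draws are far from bell-shaped, the sample means behave approximately like a normal distribution.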

📌 Try It Yourself: Normal Distribution

Q1: What is the normal distribution also known as?

💡 Show Answer

It is also known as the **Gaussian distribution**.

Q2: What does the standard deviation \( \sigma \) control in a normal distribution?

💡 Show Answer

The standard deviation \( \sigma \) controls the spread of the distribution (the width of the bell curve).

Q3: What is the probability that a value \( x \) lies between \( \mu - 3\sigma \) and \( \mu + 3\sigma \) in any normal distribution?

💡 Show Answer

About 99.7% of values lie between \( \mu - 3\sigma \) and \( \mu + 3\sigma \).

Q4: What percentage of values lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \)?

💡 Show Answer

About 95% of values lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \).

Q5: What is the cumulative distribution function (CDF) used for in the normal distribution?

💡 Show Answer

The CDF gives \( P(X \le x) \), the probability that a random variable \( X \) is at most \( x \). Differences of CDF values give the probability that \( X \) falls within a specific range.

Q6: In a normal distribution, how much of the distribution falls within one standard deviation of the mean?

💡 Show Answer

About 68% of the distribution lies between \( \mu - \sigma \) and \( \mu + \sigma \).


Summary of Key Points

  • The normal distribution is symmetric and bell-shaped.
  • The mean \( \mu \) determines the location of the peak.
  • The standard deviation \( \sigma \) controls the spread.
  • About 68% of values lie within one standard deviation (\( \mu \pm \sigma \)).
  • About 95% of values lie within two standard deviations (\( \mu \pm 2\sigma \)).
  • About 99.7% of values lie within three standard deviations (\( \mu \pm 3\sigma \)).

🔜 Up Next

Next, we'll explore the Z-distribution, a standardized version of the normal distribution (mean 0, standard deviation 1) that is used to calculate probabilities and percentiles.

Stay tuned!

This post is licensed under CC BY 4.0 by the author.