Understanding the Normal Distribution
What Is the Normal Distribution (Gaussian Distribution)?
The normal distribution (or Gaussian distribution) is a continuous probability distribution for a real-valued random variable. It is a good approximate model for many natural phenomena and for measurement errors. Its graph is symmetric and bell-shaped.
This post is part of the "Intro to Statistics" series
Previously: Mean, Variance, and Standard Deviation of Random Variables
Next: Z Distribution
The Probability Density Function (PDF) for Normal Distribution
The equation for the PDF of a normal distribution is:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]
Where:
- \( \mu \) is the mean (location parameter) of the distribution, which defines where the peak of the bell curve is located.
- \( \sigma \) is the standard deviation (scale parameter), which controls the width of the bell curve.
- \( \exp \) is the exponential function, \( \exp(t) = e^{t} \). Because its argument here is never positive, the density is highest at \( x = \mu \) and falls off rapidly as \( x \) moves away from the mean.
Together, \( \mu \) and \( \sigma \) completely determine the curve.
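To make the formula concrete, here is a minimal sketch that evaluates the PDF directly from the equation above. The function name `normal_pdf` and the default parameters are illustrative choices, not part of the original post.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate the normal PDF at x (hypothetical helper for illustration)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Normalizing constant 1 / (sigma * sqrt(2 * pi))
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    # Exponent -(x - mu)^2 / (2 * sigma^2), which is never positive
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# The curve is highest at the mean and symmetric around it.
print(normal_pdf(0.0), normal_pdf(1.0), normal_pdf(-1.0))
```

Evaluating the function at \( \mu \pm d \) for any distance \( d \) returns the same height, which is the symmetry described above.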
Understanding the Equation
The PDF depends on \( x \) only through the squared distance \( (x - \mu)^2 \), so values are distributed symmetrically around the mean.
- The area under the curve represents the total probability, which equals 1.
- The variable \( x \) can take any value from \( -\infty \) to \( +\infty \), meaning the distribution extends infinitely in both directions.
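The claim that the total area equals 1 can be checked numerically. The sketch below uses a simple trapezoidal rule over \( \mu \pm 10\sigma \); the helper names, the integration range, and the example values \( \mu = 5 \), \( \sigma = 2 \) are assumptions chosen for illustration, and the tails beyond that range are treated as negligible.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density, written out from the PDF formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def area_under_pdf(mu: float, sigma: float, half_width_sds: float = 10.0, steps: int = 20_000) -> float:
    """Approximate the area under the PDF with the trapezoidal rule."""
    lo = mu - half_width_sds * sigma
    hi = mu + half_width_sds * sigma
    h = (hi - lo) / steps
    # Endpoints get half weight, interior points full weight.
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h


area = area_under_pdf(mu=5.0, sigma=2.0)
print(area)  # very close to 1
```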
Important Characteristics of Normal Distribution
- \( \mu \) describes the location of the distribution, i.e., where the center of the bell curve lies.
- \( \sigma \) defines the scale of the distribution, i.e., how spread out the values are around the mean.
- The probability for any given range can be found using the cumulative distribution function (CDF).
Example of Normal Distribution
For any normal distribution:
- About 68% of values lie between \( \mu - \sigma \) and \( \mu + \sigma \).
- About 95% of values lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \).
- About 99.7% of values lie between \( \mu - 3\sigma \) and \( \mu + 3\sigma \).
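These percentages are rounded. One way to see the more precise values is through the error function: for a normal distribution, the probability of landing within \( k \) standard deviations of the mean is \( \operatorname{erf}(k / \sqrt{2}) \). The sketch below uses Python's `math.erf`; the function name `coverage_within` is a hypothetical helper.

```python
import math


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4%}")
```

The result does not depend on \( \mu \) or \( \sigma \), which is why the rule holds for any normal distribution.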
Visualizing the 68%, 95%, and 99.7% Rule
Figure: the 68%, 95%, and 99.7% areas under the normal curve.
How to Calculate Probabilities Using Normal Distribution
To calculate the probability that a variable \( X \) lies within a specific range:
- We use the Cumulative Distribution Function (CDF), \( F(x) = P(X \le x) \), which gives the area under the curve from \( -\infty \) to a specified \( x \).
- The probability that \( X \) lies between \( a \) and \( b \) is then \( F(b) - F(a) \).
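As a rough sketch of a range calculation, the example below subtracts two CDF values using `statistics.NormalDist` from the Python standard library. The scenario (scores with mean 100 and standard deviation 15) and the helper name `prob_between` are made up for illustration.

```python
from statistics import NormalDist


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ Normal(mu, sigma), computed as CDF(b) - CDF(a)."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(b) - dist.cdf(a)


# Hypothetical example: scores with mean 100 and standard deviation 15.
p = prob_between(90.0, 120.0, mu=100.0, sigma=15.0)
print(f"P(90 < X < 120) = {p:.4f}")
```

Because the distribution is continuous, the probability of any single exact value is zero, so it makes no difference whether the endpoints are included.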
Level Up: Understanding the Normal Distribution in Detail
- The normal distribution is foundational in statistics. It is used in hypothesis testing, confidence intervals, and in many natural and social sciences.
- The 68-95-99.7 rule: This empirical rule gives the approximate percentage of data that falls within 1, 2, and 3 standard deviations of the mean.
- The central limit theorem states that, for independent observations from a distribution with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution.
- In practice, many natural phenomena and measurement errors are approximately normal because they arise as the sum of many small, independent effects, which is the situation the central limit theorem describes.
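The central limit theorem can be illustrated with a small simulation. The sketch below draws repeated samples from a strongly skewed exponential distribution and checks that the sample means behave roughly like a normal variable. The sample size, number of trials, seed, and helper name `sample_means` are arbitrary choices for illustration, not a prescribed method.

```python
import random
import statistics


def sample_means(n: int, trials: int, seed: int = 0) -> list[float]:
    """Means of `trials` samples of size n from an exponential distribution with rate 1."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


means = sample_means(n=50, trials=5000)
center = statistics.fmean(means)   # should be near the true mean, 1.0
spread = statistics.stdev(means)   # should be near 1 / sqrt(50)
# Fraction of sample means within one standard deviation of their center.
within_one = sum(abs(m - center) <= spread for m in means) / len(means)
print(center, spread, within_one)
```

Even though individual exponential values are far from bell-shaped, the fraction of sample means within one standard deviation should land near the 68% expected for a normal distribution.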
Try It Yourself: Normal Distribution
Q1: What is the normal distribution also known as?
It is also known as the **Gaussian distribution**.
Q2: What does the standard deviation \( \sigma \) control in a normal distribution?
The standard deviation \( \sigma \) controls the spread of the distribution (the width of the bell curve).
Q3: What is the probability that a value \( x \) lies between \( \mu - 3\sigma \) and \( \mu + 3\sigma \) in any normal distribution?
The probability is about 99.7%: roughly 99.7% of values lie between \( \mu - 3\sigma \) and \( \mu + 3\sigma \).
Q4: What percentage of values lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \)?
About 95% of values lie between \( \mu - 2\sigma \) and \( \mu + 2\sigma \).
Q5: What is the cumulative distribution function (CDF) used for in the normal distribution?
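The CDF gives the probability that a random variable \( X \) is less than or equal to a given value. The probability that \( X \) falls within a specific range is the difference between the CDF values at the two endpoints.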
Q6: In a normal distribution, how much of the distribution falls within one standard deviation of the mean?
About 68% of the distribution lies between \( \mu - \sigma \) and \( \mu + \sigma \).
Summary of Key Points
- The normal distribution is symmetric and bell-shaped.
- The mean \( \mu \) determines the location of the peak.
- The standard deviation \( \sigma \) controls the spread.
- About 68% of values lie within one standard deviation (\( \mu \pm \sigma \)).
- About 95% of values lie within two standard deviations (\( \mu \pm 2\sigma \)).
- About 99.7% of values lie within three standard deviations (\( \mu \pm 3\sigma \)).
Up Next
Next, we'll explore the Z-Distribution, a standardized version of the normal distribution that is used to calculate probabilities and percentiles.
Stay tuned!