
Understanding Independence and Bayes’ Rule


Understanding uncertainty is at the heart of statistics, and Bayes’ Rule is one of the most powerful tools for dealing with it. This post shows how Bayes’ Theorem helps us update probabilities with new information, and how it connects to independence, conditional probability, and real-world reasoning, from medical diagnoses to machine learning models.

📚 This post is part of the "Intro to Statistics" series

🔙 Previously: Making Sense of Union, Tables, and Conditional Thinking

🔜 Next: Probability Distributions & Cumulative Thinking


🔗 What Is Independence?

Two events are independent when the occurrence of one doesn’t affect the probability of the other.

📌 Mathematically:

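Each of the following conditions is equivalent to independence (the conditional forms assume \( P(B) > 0 \) and \( P(A) > 0 \), respectively):

\[ P(A \mid B) = P(A) \]

\[ P(B \mid A) = P(B) \]

\[ P(A \cap B) = P(A) \cdot P(B) \]

If any one of them holds, all three hold, and the events are independent.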
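The product rule is easy to check by counting outcomes. This is a minimal sketch that assumes a single roll of a fair six-sided die as the sample space; the die and the events `even`, `at_most_4`, and `at_most_3` are illustrative choices, not part of the original example. Exact fractions are used so floating-point rounding can't hide or fake independence.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption)
OMEGA = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def is_independent(a, b):
    """Check the product rule P(A ∩ B) == P(A) * P(B) exactly."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
at_most_3 = {1, 2, 3}

print(is_independent(even, at_most_4))  # True:  1/3 == 1/2 * 2/3
print(is_independent(even, at_most_3))  # False: 1/6 != 1/2 * 1/2
```

Knowing the roll is at most 4 leaves the chance of "even" at exactly 1/2, while knowing it is at most 3 lowers it to 1/3.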


🔄 Independence vs Disjoint

| Concept | Description |
|---|---|
| Disjoint (Mutually Exclusive) | Events can’t both happen. \( P(A \cap B) = 0 \) |
| Independent | One event doesn’t affect the other’s probability |

✅ Key Insight:

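  • If A and B are disjoint, then \( P(A \cap B) = 0 \).
  • If both events have positive probability, then \( P(A) \cdot P(B) > 0 \), so the product rule for independence fails. Therefore:

    Disjoint events with nonzero probabilities are always dependent.
    Independent events are never disjoint (unless one of them has probability 0).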
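A quick numerical check of this insight, again assuming a single roll of a fair die as an illustrative sample space (the events `low` and `high` are hypothetical): learning that B happened drops the probability of A to zero.

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (illustrative assumption)

def prob(event):
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(A | B) = P(A ∩ B) / P(B); requires P(B) > 0."""
    return prob(a & b) / prob(b)

low = {1, 2}   # A: roll a 1 or 2
high = {5, 6}  # B: roll a 5 or 6 (disjoint from A)

print(prob(low & high))                 # 0
print(prob(low) * prob(high))           # 1/9
print(cond_prob(low, high), prob(low))  # 0 1/3
```

Since \( P(A \mid B) = 0 \) while \( P(A) = 1/3 \), the two disjoint events are dependent.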


🧪 Example: Email Spam Detection

Suppose:

  • 40% of all emails are spam → \( P(S) = 0.4 \)
  • 80% of spam emails contain the word “free” → \( P(F \mid S) = 0.8 \)
  • 10% of non-spam emails contain “free” → \( P(F \mid \bar{S}) = 0.1 \)

🌳 Build a Tree Diagram

Figure: Tree diagram showing all outcomes for Spam vs Free

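Multiplying along each branch gives the joint probabilities, and adding the two branches that end in “free” gives the total probability of seeing “free” (the law of total probability):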

  • \( P(S \cap F) = 0.4 \cdot 0.8 = 0.32 \)
  • \( P(\bar{S} \cap F) = 0.6 \cdot 0.1 = 0.06 \)
  • \( P(F) = 0.32 + 0.06 = 0.38 \)
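The same tree can be sketched in Python: multiply along each branch, then add the branches that end in “free”. This is a minimal illustration using the numbers above; the dictionary layout and the `not_spam` label are just one convenient choice, not anything prescribed by the example.

```python
# Inputs from the spam example
p_spam = 0.4             # P(S)
p_free_given_spam = 0.8  # P(F | S)
p_free_given_not = 0.1   # P(F | not S)

# Multiply along each branch of the tree to get all four joint probabilities
branches = {
    ("spam", "free"): p_spam * p_free_given_spam,
    ("spam", "no free"): p_spam * (1 - p_free_given_spam),
    ("not_spam", "free"): (1 - p_spam) * p_free_given_not,
    ("not_spam", "no free"): (1 - p_spam) * (1 - p_free_given_not),
}

for (label, word), p in branches.items():
    print(f"P({label} and {word}) = {p:.2f}")

# The four branches cover every outcome, so they must sum to 1
assert abs(sum(branches.values()) - 1) < 1e-9

# Law of total probability: add every branch that ends in "free"
p_free = sum(p for (label, word), p in branches.items() if word == "free")
print(f"P(F) = {p_free:.2f}")  # P(F) = 0.38
```

The two branches not used in the example, \( P(S \cap \bar{F}) = 0.08 \) and \( P(\bar{S} \cap \bar{F}) = 0.54 \), complete the tree.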

📘 Bayes’ Theorem

Now, we want:

If I see “free” in an email, what’s the probability it’s spam?

📌 Formula:

\[ P(S \mid F) = \frac{P(S \cap F)}{P(F)} = \frac{P(F \mid S) \cdot P(S)}{P(F)} = \frac{0.32}{0.38} \approx 0.842 \]
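The formula can be wrapped in a small helper for yes/no hypotheses. This is a hedged sketch: the name `bayes_posterior` and the parameter `false_alarm_rate` (meaning \( P(E \mid \bar{H}) \)) are hypothetical, chosen for this illustration.

```python
def bayes_posterior(prior, likelihood, false_alarm_rate):
    """P(H | E) for a yes/no hypothesis H given evidence E.

    prior            = P(H)
    likelihood       = P(E | H)
    false_alarm_rate = P(E | not H)
    """
    joint = likelihood * prior                         # P(H ∩ E), the numerator
    evidence = joint + false_alarm_rate * (1 - prior)  # P(E), total probability
    return joint / evidence

# Spam example: H = spam, E = the email contains "free"
p = bayes_posterior(prior=0.4, likelihood=0.8, false_alarm_rate=0.1)
print(round(p, 3))  # 0.842
```

Called as `bayes_posterior(0.02, 0.90, 0.05)` and `bayes_posterior(0.001, 0.99, 0.01)`, the same helper gives about 0.27 and 0.09, matching the two worked examples that follow.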


🧠 Understanding Bayes’ Rule: Components

| Term | Meaning |
|---|---|
| Prior | What you believe before seeing the evidence → \( P(S) = 0.4 \) |
| Likelihood | Probability of the evidence given the hypothesis → \( P(F \mid S) = 0.8 \) |
| Evidence | Total probability of seeing “free” → \( P(F) = 0.38 \) |
| Posterior | Updated belief → \( P(S \mid F) \approx 0.842 \) |

🧠 Bayes’ Theorem in Action: Two Real-World Examples


📚 Example 1: Student Cheating Detection

A teacher knows that only 2% of students cheat on exams.

She uses a plagiarism detector that:

  • Correctly identifies cheaters 90% of the time
  • Wrongly flags innocent students 5% of the time

Now, a student gets flagged. What are the chances they actually cheated?

Let:

  • \( C \): student cheated
  • \( + \): student flagged by the detector

We know:

  • \( P(C) = 0.02 \),  \( P(\bar{C}) = 0.98 \)
  • \( P(+ \mid C) = 0.90 \)
  • \( P(+ \mid \bar{C}) = 0.05 \)

✏️ Bayes’ Theorem:

\[ P(C \mid +) = \frac{P(+ \mid C) \cdot P(C)}{P(+ \mid C) \cdot P(C) + P(+ \mid \bar{C}) \cdot P(\bar{C})} \]

🔍 Interpretation:

  • Numerator = Likelihood × Prior = \( 0.90 \cdot 0.02 = 0.018 \)
    → This is the joint probability of being a cheater and being flagged
  • Denominator = Total probability of being flagged
    → Includes both cheaters and non-cheaters who were flagged: \[ P(+) = 0.018 + (0.05 \cdot 0.98) = 0.018 + 0.049 = 0.067 \]

✅ Final Answer:

\[ P(C \mid +) = \frac{0.018}{0.067} \approx 0.269 \]

Even if flagged, there’s only about a 26.9% chance the student actually cheated.

Figure: True vs false positives that make up the total evidence, \( P(\text{Flagged}) \)
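The same result can be checked with natural frequencies, counting people instead of multiplying probabilities. This minimal sketch assumes a hypothetical group of 10,000 students; integer arithmetic keeps the counts exact.

```python
# Natural-frequency view of the cheating example
students = 10_000  # hypothetical group size (an assumption for illustration)

cheaters = students * 2 // 100  # 2% cheat -> 200
innocent = students - cheaters  # 9,800

flagged_cheaters = cheaters * 90 // 100  # 90% caught -> 180 true positives
flagged_innocent = innocent * 5 // 100   # 5% wrongly flagged -> 490 false positives

flagged_total = flagged_cheaters + flagged_innocent  # 670
posterior = flagged_cheaters / flagged_total

print(f"{flagged_cheaters} of {flagged_total} flagged students cheated")
print(f"P(C | +) ≈ {posterior:.3f}")  # P(C | +) ≈ 0.269
```

Seen this way, the 490 innocent students who get flagged outnumber the 180 real cheaters, which is why the posterior stays low.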


💊 Example 2: Random Drug Testing at Work

A company screens employees for a rare performance-enhancing drug.

  • Only 1 in 1,000 uses it → \( P(D) = 0.001 \)
  • The test is 99% accurate:
    • \( P(+ \mid D) = 0.99 \)
    • \( P(+ \mid \bar{D}) = 0.01 \)

An employee tests positive. What’s the probability they actually use the drug?


✏️ Bayes’ Theorem:

\[ P(D \mid +) = \frac{P(+ \mid D) \cdot P(D)}{P(+ \mid D) \cdot P(D) + P(+ \mid \bar{D}) \cdot P(\bar{D})} \]

🔍 Interpretation:

  • Numerator = Likelihood × Prior = \( 0.99 \cdot 0.001 = 0.00099 \)
    → This is the joint probability of actually using the drug and testing positive
  • Denominator = Total probability of testing positive: \[ = 0.00099 + (0.01 \cdot 0.999) = 0.00099 + 0.00999 = 0.01098 \]

✅ Final Answer:

\[ P(D \mid +) = \frac{0.00099}{0.01098} \approx 0.09 \]

Despite a positive result, there’s only a ~9% chance the employee actually uses the drug — because the condition is rare, and false positives dominate the denominator.
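To see false positives dominating in practice, one can simulate a large workforce. This is a Monte Carlo sketch under the example's stated rates; the function name, the 500,000-employee population, and the seed are arbitrary assumptions, so the estimate should land near the exact 0.09 without matching it digit for digit.

```python
import random

def simulate_drug_testing(n_employees, prevalence=0.001, sensitivity=0.99,
                          false_positive_rate=0.01, seed=42):
    """Estimate P(D | +) by simulating employees and their test results."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    true_pos = false_pos = 0
    for _ in range(n_employees):
        uses_drug = rng.random() < prevalence
        # Users test positive with P(+ | D); non-users with P(+ | not D)
        rate = sensitivity if uses_drug else false_positive_rate
        if rng.random() < rate:
            if uses_drug:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos / (true_pos + false_pos), true_pos, false_pos

estimate, tp, fp = simulate_drug_testing(500_000)
print(f"true positives: {tp}, false positives: {fp}")
print(f"simulated P(D | +) ≈ {estimate:.3f}")
```

The printed counts show roughly ten false positives for every true positive, the same ratio the denominator \( 0.00999 \) vs \( 0.00099 \) expresses.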


Figure: Flow of belief update, from prior and likelihood to posterior


🧠 Level Up: Mastering Bayes — What’s Really Going On?

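Bayes’ Theorem might look like just another formula, but it’s really a way of reversing conditional logic: it turns \( P(\text{evidence} \mid \text{hypothesis}) \) into \( P(\text{hypothesis} \mid \text{evidence}) \).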

  • 🎯 The numerator is the probability that both the hypothesis and evidence are true (joint probability).
  • 🧪 The denominator is the total probability of the observed evidence — from all possible sources.

So Bayes’ Theorem simply asks:

If this result just happened, how likely is it that it was caused by what I suspected?

🧠 You’re updating your belief (the prior) based on what you just saw (the evidence), and how likely that evidence is under each possible explanation (likelihood).

Bayes is not just math — it’s decision logic under uncertainty.
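The “all possible sources” idea extends to more than two hypotheses: compute each hypothesis's joint probability with the evidence, then divide by their sum. A minimal sketch; the helper name `posterior_over_hypotheses` is hypothetical, and it assumes the hypotheses are mutually exclusive and together cover every possibility.

```python
def posterior_over_hypotheses(priors, likelihoods):
    """Bayes' rule for mutually exclusive, exhaustive hypotheses.

    priors[h]      = P(h)
    likelihoods[h] = P(evidence | h)
    """
    # Numerator for each hypothesis: the joint probability P(h ∩ evidence)
    joints = {h: priors[h] * likelihoods[h] for h in priors}
    # Denominator: total probability of the evidence from all possible sources
    evidence = sum(joints.values())
    return {h: joint / evidence for h, joint in joints.items()}

# Spam example: hypotheses "spam" and "not spam"; evidence = the word "free"
post = posterior_over_hypotheses(
    priors={"spam": 0.4, "not spam": 0.6},
    likelihoods={"spam": 0.8, "not spam": 0.1},
)
print({h: round(p, 3) for h, p in post.items()})  # {'spam': 0.842, 'not spam': 0.158}
```

Because every posterior is divided by the same total, the results always sum to 1; adding a third hypothesis only means adding one more entry to each dictionary.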


✅ Best Practices for Bayes’ Rule
  • Understand the difference between prior, likelihood, and posterior before applying the formula.
  • Use tree diagrams or tables to break down problems clearly.
  • Verify independence, for example by checking \( P(A \cap B) = P(A) \cdot P(B) \), before relying on it.
  • Compute the denominator with the law of total probability: add up the joint probability of the evidence under every possible hypothesis.

⚠️ Common Pitfalls
  • ❌ Confusing P(A | B) with P(B | A).
  • ❌ Ignoring base rates (priors), especially when they are very small.
  • ❌ Mislabeling dependent events as independent.
  • ❌ Forgetting to normalize with the full evidence probability in the denominator.

📌 Try It Yourself: Independence & Bayes

Q1: If \( P(A \mid B) = P(A) \), what does this imply?

💡 Answer:

Independence — it means that knowing B occurred does not change the probability of A.

So A and B are independent events.


Q2: Can disjoint events be independent?

💡 Answer:

No, not if both events have nonzero probability. Disjoint events cannot happen together, so learning that one occurred tells you the other definitely didn’t: \( P(A \mid B) = 0 \neq P(A) \).

That makes them dependent by definition.


Q3: What is the formula for Bayes’ Theorem?

💡 Answer:

Bayes’ Rule:

\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]

It allows us to reverse conditional probabilities based on observed evidence.


Bonus: Why is Bayes’ Rule so powerful in real-world applications?

💡 Answer:

✅ It helps update probabilities when new evidence appears.

Whether in medical tests, spam filters, or AI predictions, Bayes’ Rule combines prior knowledge with new evidence to support better decisions.


🧠 Summary

| Concept | Meaning |
|---|---|
| Independence | One event does not affect the other’s probability |
| Disjoint | Events can’t happen together |
| Joint from marginals | \( P(A \cap B) = P(A) \cdot P(B) \) holds only if the events are independent |
| Bayes’ Rule | Updates belief with new data |
| Prior | Initial belief |
| Likelihood | How likely the data is under a hypothesis |
| Posterior | Updated probability |
| Evidence | Total probability of the observed condition |

💬 Got a question or suggestion?

Leave a comment below — I’d love to hear your thoughts or help if something was unclear.


✅ Up Next

Next, we’ll dive into probability distributions and how cumulative distributions help us model real-world events over time.

Stay tuned!

This post is licensed under CC BY 4.0 by the author.