Bayes' Theorem
Introduction and Intuition
Bayes' Theorem is one of the most powerful and widely used tools in probability and statistics. It allows us to update our beliefs about an event based on new evidence, and is essential in:
- Statistics
- Machine Learning (especially Bayesian models)
- Medical diagnosis
- Inverse problems in engineering and science
Conditional Probability Refresher
We define the probability of event A given that event B has occurred as:
\(P(A|B) = \frac{P(A \cap B)}{P(B)}\), where \(P(B) > 0\).
This represents an update to the probability of A when we know that B has happened.
Multiplication Rule
From the definition above, we also get: \(P(A \cap B) = P(A|B) \cdot P(B)\)
This is known as the multiplication rule, and it can also be written as: \(P(B \cap A) = P(B|A) \cdot P(A)\)
This symmetry is important and is used to derive Bayes' Theorem.
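As a quick numeric check, the sketch below estimates each probability from simulated die rolls (the events A = "roll is even" and B = "roll is at least 4" are illustrative choices, not from the text above) and confirms that \(P(A \cap B)\) matches \(P(A|B) \cdot P(B)\):

```python
import random

random.seed(0)
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]

# A: the roll is even; B: the roll is at least 4
p_b = sum(r >= 4 for r in rolls) / N  # estimate of P(B); true value 1/2
p_a_and_b = sum(r >= 4 and r % 2 == 0 for r in rolls) / N  # P(A ∩ B); true value 1/3
b_rolls = [r for r in rolls if r >= 4]
p_a_given_b = sum(r % 2 == 0 for r in b_rolls) / len(b_rolls)  # P(A|B); true value 2/3

# Multiplication rule: P(A ∩ B) = P(A|B) * P(B)
print(p_a_and_b, p_a_given_b * p_b)  # both ≈ 1/3
```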
Why Do We Want to Reverse the Conditioning?
Often, we know:
- \(P(\text{symptom} \mid \text{disease})\): how likely the symptom is if someone has the disease.
But what we want is:
- \(P(\text{disease} \mid \text{symptom})\): how likely someone has the disease given the symptom (this is the inverse problem).
This reversal is essential in diagnostics, prediction, and decision making.
Bayes’ Theorem (Basic Form)
\(P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)}\)
This is derived by noting that the multiplication rule gives the joint probability in two ways:
- \(P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)\)
Dividing through by \(P(A)\) yields the theorem.
Bayesian Terminology
- Posterior \(P(B|A)\): updated belief about B after seeing evidence A.
- Likelihood (Update) \(P(A|B)\): probability of seeing evidence A if B is true.
- Prior \(P(B)\): belief about B before seeing evidence.
- Evidence (Marginal) \(P(A)\): total probability of observing A (normalizing constant).
Bayes’ Theorem expresses: \(\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}\)
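A one-line helper makes the correspondence explicit (the function name and example numbers are illustrative, not from the text):

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Bayes' Theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return likelihood * prior / evidence

# Illustrative numbers: P(A|B) = 0.9, P(B) = 0.2, P(A) = 0.3
print(posterior(likelihood=0.9, prior=0.2, evidence=0.3))  # 0.6
```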
Bayes’ Theorem (Extended Form: Law of Total Probability)
Often, \(P(A)\) is hard to compute directly. But we can break it into components using the law of total probability:
\(P(A) = P(A|B) \cdot P(B) + P(A|B^c) \cdot P(B^c)\)
So the full expression becomes:
\(P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A|B) \cdot P(B) + P(A|B^c) \cdot P(B^c)}\)
This is extremely useful when we only know conditional probabilities and priors.
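A minimal sketch of the two-hypothesis case, where the evidence term is assembled from the pieces we actually know (the function name is illustrative):

```python
def posterior_binary(p_a_given_b: float, p_b: float, p_a_given_not_b: float) -> float:
    """P(B|A), with P(A) expanded via the law of total probability."""
    p_not_b = 1.0 - p_b
    evidence = p_a_given_b * p_b + p_a_given_not_b * p_not_b  # P(A)
    return p_a_given_b * p_b / evidence
```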
Bayes’ Theorem (General Form for Partitioned Sample Space)
If events \(B_1, B_2, \dots, B_n\) form a partition of the sample space (disjoint and exhaustive), then:
\(P(B_i|A) = \frac{P(A|B_i) \cdot P(B_i)}{\sum_{j=1}^{n} P(A|B_j) \cdot P(B_j)}\)
This is useful in multiclass classification and hypothesis testing with multiple possible causes.
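A sketch for a partition of n hypotheses; the denominator is the same sum for every \(B_i\), so it acts as a normalizer (hypothesis names and numbers are illustrative):

```python
def posteriors(priors, likelihoods):
    """P(B_i|A) for each hypothesis B_i in a partition (disjoint, exhaustive)."""
    evidence = sum(likelihoods[b] * priors[b] for b in priors)  # P(A)
    return {b: likelihoods[b] * priors[b] / evidence for b in priors}

# Three mutually exclusive, exhaustive causes of an observation A
priors = {"B1": 0.5, "B2": 0.3, "B3": 0.2}       # P(B_i)
likelihoods = {"B1": 0.1, "B2": 0.4, "B3": 0.8}  # P(A|B_i)
print(posteriors(priors, likelihoods))  # values sum to 1
```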
Example: Cancer Screening Paradox
Setup:
- The test is 99% accurate:
  - \(P(\text{+}|\text{disease}) = 0.99\)
  - \(P(\text{+}|\text{no disease}) = 0.01\)
- The disease is rare:
  - \(P(\text{disease}) = 0.001\), i.e. only 1 in 1000 people have the disease
  - \(P(\text{no disease}) = 0.999\)
We want: \(P(\text{disease}|\text{+})\)
Apply Bayes’ Theorem:
\(P(\text{disease}|\text{+}) = \frac{P(\text{+}|\text{disease}) \cdot P(\text{disease})}{P(\text{+}|\text{disease}) \cdot P(\text{disease}) + P(\text{+}|\text{no disease}) \cdot P(\text{no disease})} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999} = \frac{0.00099}{0.01098} \approx 0.09\)
Interpretation:
Even with a 99% accurate test, the chance you actually have cancer given a positive result is only about 9%.
Why? Because the disease is so rare, the false positives dominate.
This is why positive results in screening often lead to secondary confirmatory tests.
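The arithmetic above is easy to verify; a minimal sketch plugging in the setup's numbers:

```python
p_pos_given_disease = 0.99  # P(+|disease)
p_pos_given_healthy = 0.01  # P(+|no disease)
p_disease = 0.001           # P(disease)

evidence = (p_pos_given_disease * p_disease
            + p_pos_given_healthy * (1 - p_disease))  # P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / evidence
print(round(p_disease_given_pos, 4))  # 0.0902, i.e. about 9%
```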
Sequential Updates and Priors
Bayesian reasoning supports sequential data updates:
- Start with a prior (e.g., the coin is fair).
- Gather data (e.g., coin shows tails 10 times).
- Compute the posterior (updated belief).
- Use that posterior as the new prior for the next observation.
- Repeat.
This principle is at the heart of Bayesian machine learning, e.g., in:
- Bayesian optimization
- Bayesian networks
- Bayesian deep learning
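A minimal sketch of the coin example above, using a Beta prior (a conjugate choice assumed here for illustration) so each observation updates the belief in closed form; the posterior after one observation becomes the prior for the next:

```python
# Beta(alpha, beta) belief over the coin's probability of heads.
# Beta(1, 1) is the uniform prior; each heads adds 1 to alpha,
# each tails adds 1 to beta, and the posterior becomes the new prior.
alpha, beta = 1.0, 1.0

observations = ["T"] * 10  # the coin shows tails 10 times

for obs in observations:
    if obs == "H":
        alpha += 1
    else:
        beta += 1
    # Posterior mean of P(heads) after this observation
    print(f"after {obs}: E[P(heads)] = {alpha / (alpha + beta):.3f}")
```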
Bayesian thinking is about updating beliefs with new evidence. In practice, prior knowledge and base rates dramatically affect interpretation, especially for rare events. Think carefully about what you can measure (e.g., test results) vs. what you really want to know (e.g., actual condition).