Bayes’ Theorem #

2.1 Total probability (needed for Bayes) #

Often we split the world into cases \( E_1,E_2,\dots,E_k \) that:

  • are mutually exclusive
  • cover the whole sample space

Then for any event \( A \) :

\[ P(A)=\sum_{i=1}^{k} P(A\mid E_i)\,P(E_i) \]

Tree intuition:

```mermaid
flowchart TD
  S[Start] --> E1[Case E1]
  S --> E2[Case E2]
  S --> E3[Case E3]
  E1 --> A1["A happens"]
  E2 --> A2["A happens"]
  E3 --> A3["A happens"]
```
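
As a quick numeric sketch of the sum (the three case probabilities below are made up for illustration, not taken from the text):

```python
# Law of total probability: P(A) = sum over i of P(A | E_i) * P(E_i)
priors = [0.5, 0.3, 0.2]          # P(E1), P(E2), P(E3); a partition, so they sum to 1
likelihoods = [0.10, 0.40, 0.70]  # P(A | E1), P(A | E2), P(A | E3)

p_a = sum(p_e * p_a_given_e for p_e, p_a_given_e in zip(priors, likelihoods))
print(round(p_a, 2))  # 0.5*0.10 + 0.3*0.40 + 0.2*0.70 = 0.31
```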

2.2 Bayes’ theorem (two-event form) #

Bayes’ theorem (also known as Bayes’ rule or Bayes’ law) gives the conditional probability of an event \( A \) when another event \( B \) has already occurred. It combines prior knowledge with new evidence, adjusting probabilities as information comes in, which is what makes it useful for decisions under uncertainty.

In words: the conditional probability of \( A \) given \( B \) equals the probability of \( B \) given \( A \), times the probability of \( A \), divided by the probability of \( B \).

Bayes’ theorem “reverses” conditioning:

\[ P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)},\quad P(B)>0 \]

Language you will hear often:

  • Prior: \( P(A) \)
  • Likelihood: \( P(B\mid A) \)
  • Evidence: \( P(B) \)
  • Posterior: \( P(A\mid B) \)
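
A minimal sketch of the two-event form as code, using the four terms above as parameter names (the function and its names are illustrative, not a standard API):

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """P(A | B) = P(B | A) * P(A) / P(B), valid only for P(B) > 0."""
    if evidence <= 0:
        raise ValueError("Evidence P(B) must be positive")
    return likelihood * prior / evidence
```

Both worked examples below (2.4 and 2.5) amount to a single call to this function once the evidence is computed via total probability.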

Hypotheses #

Hypotheses are the mutually exclusive events \( E_1,E_2,\dots,E_n \) that partition the sample space.

Each hypothesis represents a distinct scenario that could explain an observed event.

Prior Probability #

The prior probability \( P(E_i) \) is the initial probability of an event before any new data is taken into account.

It reflects existing knowledge or assumptions about the event.

Example: The probability of a person having a disease before taking a test.

Posterior Probability #

Posterior probability \( P(E_i\mid A) \) is the updated probability of an event after considering new information.

It is derived using Bayes’ Theorem.

Example: The probability of having a disease given a positive test result.

Conditional Probability #

The probability of an event \( A \) given that another event \( B \) has occurred is termed conditional probability.

It is denoted as \( P(A\mid B) \) and represents the probability of \( A \) when event \( B \) has already happened.

Joint Probability #

When the probability of two or more events occurring together is measured, it is called joint probability.

For two events \( A \) and \( B \) , it is denoted by \( P(A\cap B) \) .
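
Joint and conditional probability are linked by rearranging the definition of conditional probability (the multiplication rule):

\[ P(A\cap B)=P(B\mid A)\,P(A)=P(A\mid B)\,P(B) \]

This identity is exactly the numerator of Bayes’ theorem.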

Random Variables #

Real-valued variables whose values are determined by the outcome of a random experiment are called random variables.

Probabilities involving such variables can be estimated empirically from repeated trials; such an estimate is called the experimental probability.


Difference between Conditional Probability and Bayes’ Theorem #

| Topic | Bayes’ Theorem | Conditional Probability |
| --- | --- | --- |
| Meaning | Derived using conditional probability; used to find the “reverse” probability. | Probability of event \( A \) when event \( B \) has already occurred. |
| Formula | \( P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B)} \) | \( P(A\mid B)=\frac{P(A\cap B)}{P(B)} \) |
| Purpose | Update the probability of an event based on new evidence. | Find the probability of one event based on the occurrence of another. |
| Focus | Uses prior knowledge and evidence to compute a revised probability. | Direct relationship between two events. |

2.3 Bayes’ theorem (general partition form) #

If \( E_1,\dots,E_k \) is a partition of the sample space and \( B \) is observed, then:

\[ P(E_j\mid B)=\frac{P(B\mid E_j)\,P(E_j)}{\sum_{i=1}^{k} P(B\mid E_i)\,P(E_i)} \]

This form is ideal for diagnosis, fault detection, and “which cause is most likely?” problems.
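
A sketch of the partition form, reusing the total-probability sum from 2.1 for the denominator (the list names and example numbers are illustrative):

```python
def posteriors(priors, likelihoods):
    """P(E_j | B) for every j, given P(E_i) and P(B | E_i) over a partition."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(B)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# "Which cause is most likely?" = the hypothesis with the largest posterior.
post = posteriors([0.5, 0.3, 0.2], [0.10, 0.40, 0.70])
print(max(range(len(post)), key=lambda j: post[j]))  # index of the most likely E_j
```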


2.4 Worked example (rare disease, why Bayes can surprise) #

Suppose:

  • Disease rate is 1 in 1000: \( P(D)=0.001 \)
  • Test sensitivity 99%: \( P(+\mid D)=0.99 \)
  • False positive rate 2%: \( P(+\mid D^c)=0.02 \)

First compute evidence:

\[ P(+)=P(+\mid D)P(D)+P(+\mid D^c)P(D^c) \] \[ P(+)=0.99(0.001)+0.02(0.999)=0.02097 \]

Now apply Bayes:

\[ P(D\mid +)=\frac{0.99(0.001)}{0.02097}\approx 0.047 \]

Why this matters: Even a very accurate test can produce many false positives when the condition is rare. The prior probability (base rate) strongly affects the posterior.
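
The arithmetic above checks out numerically; here is a direct transcription of this example’s numbers:

```python
p_d = 0.001   # prior P(D): disease rate 1 in 1000
sens = 0.99   # sensitivity P(+ | D)
fpr = 0.02    # false positive rate P(+ | D^c)

p_pos = sens * p_d + fpr * (1 - p_d)  # evidence P(+) by total probability
p_d_given_pos = sens * p_d / p_pos    # posterior P(D | +) by Bayes
print(round(p_pos, 5), round(p_d_given_pos, 3))  # 0.02097 0.047
```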


2.5 Worked example (communication channel) #

A binary channel:

  • 1 is transmitted 40% of the time: \( P(T1)=0.4 \)
  • 0 is correctly received with probability 0.90
  • 1 is correctly received with probability 0.95

Let \( R1 \) = “1 received”, and let \( T1 \), \( T0 \) denote “1 transmitted” and “0 transmitted”.

Then: \( P(R1\mid T1)=0.95 \) , \( P(R1\mid T0)=1-0.90=0.10 \) , \( P(T0)=0.6 \) .

Total probability:

\[ P(R1)=P(R1\mid T1)P(T1)+P(R1\mid T0)P(T0) \] \[ P(R1)=0.95(0.4)+0.10(0.6)=0.44 \]

Bayes:

\[ P(T1\mid R1)=\frac{P(R1\mid T1)P(T1)}{P(R1)}=\frac{0.95(0.4)}{0.44}\approx 0.864 \]
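
The same two-step pattern (total probability, then Bayes) in code, with this example’s numbers:

```python
p_t1 = 0.4            # P(T1): 1 is transmitted 40% of the time
p_r1_given_t1 = 0.95  # P(R1 | T1): 1 correctly received
p_r1_given_t0 = 0.10  # P(R1 | T0) = 1 - 0.90 (0 flipped to 1)

p_r1 = p_r1_given_t1 * p_t1 + p_r1_given_t0 * (1 - p_t1)  # evidence P(R1)
p_t1_given_r1 = p_r1_given_t1 * p_t1 / p_r1               # posterior P(T1 | R1)
print(round(p_r1, 2), round(p_t1_given_r1, 3))  # 0.44 0.864
```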

Practice prompts #

  1. Write \( P(A\cap B\cap C\cap D) \) using the multiplication rule.
  2. If \( P(A)=0.3 \) and \( P(B\mid A)=0.8 \) , find \( P(A\cap B) \) .
  3. Create a 3-branch total probability tree from your work (e.g., device type, customer segment, failure mode), and compute an overall probability.

Quick answer for 2): \( P(A\cap B)=0.3\times 0.8=0.24 \)

