
Joint and conditional probability

In this section, we will study relationships between random variables in more detail. So far, we have often assumed different random variables to be independent of each other. But the more interesting case is when one random variable can provide information about another. As an example, denote the data that you write on a flash drive as X and let Y be the data that you read from it. If X and Y are independent, then the flash drive is completely useless. Ideally, you want Y=X, but in reality this is seldom the case, because noise is everywhere. So X and Y are neither independent nor can they be assumed to be equal. But Y does provide information about X. We can study such relationships through joint and conditional probabilities.

Joint probability distributions

We have seen that a distribution can fully describe a single random variable. We have also seen that for two independent random variables, X and Y, we have P(X=i,Y=j)=P(X=i)P(Y=j). What about random variables that are not independent?

Suppose a coin is flipped twice. Let X denote the result of the first flip (X=1 represents heads and X=0 represents tails). Also, let Y be the total number of heads. Clearly, Y and X are not independent. For example, if we know Y=2, then necessarily X=1. How can we fully describe this relationship? We can do so using their joint probability distribution, which is essentially the set of values P(X=i,Y=j) for all possible i,j. We can show it using a table. Let’s first find the probability values:

P(X=0,Y=0)=P({TT})=1/4,  P(X=0,Y=1)=P({TH})=1/4,  P(X=0,Y=2)=P({})=0,
P(X=1,Y=0)=P({})=0,  P(X=1,Y=1)=P({HT})=1/4,  P(X=1,Y=2)=P({HH})=1/4.

Now we can represent the distribution as a table (or a 3D plot):

P(X=i,Y=j) Y=0 Y=1 Y=2
X=0 1/4 1/4 0
X=1 0 1/4 1/4


If we have the joint probability distribution, how can we find the distribution of each random variable? To find P(Y=i) say, we can sum up the corresponding row or column. In our example,

P(Y=1)=P(X=0,Y=1)+P(X=1,Y=1)=1/4+1/4=1/2.

We can add this information to the table to present all the information at once:

P(X=i,Y=j) Y=0 Y=1 Y=2 P(X=i)
X=0 1/4 1/4 0 1/2
X=1 0 1/4 1/4 1/2
P(Y=j) 1/4 1/2 1/4
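As a quick check, we can enumerate the four equally likely outcomes of the two flips and recover both the joint table and the marginal distributions. This is a small sketch in Python (the variable names are ours), using exact fractions to match the table:

```python
from itertools import product
from fractions import Fraction

# Enumerate the four equally likely outcomes of two coin flips
# (1 = heads, 0 = tails). X is the first flip, Y the total number
# of heads. Each outcome has probability 1/4.
joint = {}
for first, second in product([0, 1], repeat=2):
    x, y = first, first + second
    joint[(x, y)] = joint.get((x, y), 0) + Fraction(1, 4)

# Marginals: sum the joint probabilities over the other variable,
# i.e., sum a row or a column of the table.
p_x = {i: sum(p for (x, y), p in joint.items() if x == i) for i in (0, 1)}
p_y = {j: sum(p for (x, y), p in joint.items() if y == j) for j in (0, 1, 2)}

print(p_y[1])  # P(Y=1) = 1/2, as computed above
```

Summing a row gives P(X=i) and summing a column gives P(Y=j), exactly as in the table.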


(Feller, 1968) A red ball, a blue ball, and a green ball are randomly distributed into three bins numbered 1,2,3. Note that since for each ball we have three options, there are 27 outcomes. All 27 outcomes are equally likely.

Let X1 be the number of balls in the first bin and X2 the number of balls in the second bin.


Let X, Y be independent binary random variables, each taking values 0 and 1 with equal probability. Let Z=X⊕Y, where the sum is in F2 (like the ordinary sum except that 1⊕1=0). Find the joint distribution of X and Z as well as the distribution of X and the distribution of Z, and show them in a single table.


Conditional probability

Suppose we have collected the following data over a year about traffic collisions and weather on a given road:

Bad weather Good weather
Days with car crashes 5 5
Days with no car crash 60 295

Let’s assume that this data is representative and we can use it to estimate probabilities.

Now, we’d like to consider the effect of weather on the probability of car crash. To find the probability of a car crash when the weather is bad, we limit our attention to the 65 days in which we have bad weather. Among these, in 5 days there were car crashes. So we can estimate this probability as

P(car crash given that the weather is bad)=5/65≈7.7%.

This is a conditional probability. We assume some event to hold and then find the probability of another event under that assumption. We write the conditional probability of an event A assuming B as P(A|B). So, writing C for the event that there is a car crash and B for bad weather, above we have found

P(C|B)=5/65≈7.7%.

Similarly, with G denoting good weather,

P(C|G)=5/300≈1.7%,

which is smaller as can be expected.
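We can mirror this computation in code. The sketch below (names are ours) stores the day counts from the table and computes the conditional probability of a crash for each weather condition:

```python
from fractions import Fraction

# Day counts from the traffic/weather table above.
crashes = {"bad": 5, "good": 5}
no_crash = {"bad": 60, "good": 295}

def p_crash_given(weather):
    # Restrict attention to days with the given weather: crash days
    # with this weather divided by all days with this weather.
    total = crashes[weather] + no_crash[weather]
    return Fraction(crashes[weather], total)

print(float(p_crash_given("bad")))   # 5/65  ≈ 7.7%
print(float(p_crash_given("good")))  # 5/300 ≈ 1.7%
```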

The key point is that when computing the conditional probability of A given B, in the denominator, we only consider those cases (days in our example) in which B has occurred, and in the numerator, we consider cases in which both A and B occurred. While above we were dealing with cases (days), we can apply the same logic to probabilities (and get the same result):

P(C|B)=P(C∩B)/P(B)=(5/365)/(65/365)=5/65≈7.7%.
The probability of an event A given that B has occurred is defined as P(A|B)=P(A∩B)/P(B). We require that P(B)≠0.


If two events A and B are independent, then

P(A|B)=P(A∩B)/P(B)=P(A)P(B)/P(B)=P(A).

So whether or not B has occurred does not affect the probability of A.
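To illustrate, consider a small example of our own (not from the text above): two fair dice, with A the event that the first die is even and B the event that the second die shows more than 4. These are independent, so conditioning on B should not change the probability of A:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# A = "first die is even", B = "second die shows more than 4".
A = [(a, b) for a, b in outcomes if a % 2 == 0]
B = [(a, b) for a, b in outcomes if b > 4]
AB = [o for o in A if o in B]

p_A = Fraction(len(A), len(outcomes))    # 18/36 = 1/2
p_A_given_B = Fraction(len(AB), len(B))  # 6/12  = 1/2
print(p_A == p_A_given_B)  # independence: conditioning on B changes nothing
```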

Prove that P(Aᶜ|B)=1−P(A|B).


A positive integer less than or equal to 18 is chosen at random, with all outcomes equally likely.


The law of total probability

Conditional probability can be helpful for finding the (unconditional) probability of events too. Note that

P(A)=P(B)P(A|B)+P(Bᶜ)P(A|Bᶜ).
Prove the above equality.


A coin is flipped. If tails shows, it is flipped two more times, and if heads shows, it is flipped once more. What is the probability that we observe two heads?


This approach can be extended further. For example, suppose B1, B2, B3 are events that are pairwise disjoint (the intersection of any two of them is empty) and B1∪B2∪B3=Ω. Then,

(1)   P(A)=P(B1)P(A|B1)+P(B2)P(A|B2)+P(B3)P(A|B3).

Let’s see an example. Suppose a random 2-digit number (between 10 and 99, inclusive) is selected by choosing each digit at random. What is the probability of the event A that it is larger than 63? It is easy to see that all choices are equally likely so we can easily find the desired probability by finding the number of choices that are larger than 63 and dividing by the total number of cases,

(99−63)/(99−10+1)=36/90=0.4.

Let us also find it using conditional probability. The chosen number is larger than 63 if ‘the first digit (from the left) is larger than 6’ or ‘the first digit is equal to 6 and the second digit is larger than 3’. Let B1,B2,B3 be the events that the first digit is less than 6, equal to 6, and larger than 6, respectively. Using (1)

P(A)=(5/9)·0+(1/9)·(6/10)+(3/9)·1=36/90=0.4.
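Both computations can be checked mechanically. The sketch below (variable names are ours) counts the favorable two-digit numbers directly and then applies the law of total probability, conditioning on the first digit:

```python
from fractions import Fraction

# Direct count: two-digit numbers (10..99) larger than 63.
numbers = list(range(10, 100))
direct = Fraction(sum(1 for n in numbers if n > 63), len(numbers))

# Law of total probability, conditioning on the first digit:
#   B1: first digit < 6  (prob 5/9, P(A|B1) = 0)
#   B2: first digit = 6  (prob 1/9, P(A|B2) = 6/10, second digit > 3)
#   B3: first digit > 6  (prob 3/9, P(A|B3) = 1)
total_prob = (Fraction(5, 9) * 0
              + Fraction(1, 9) * Fraction(6, 10)
              + Fraction(3, 9) * 1)

print(direct, total_prob)  # both equal 36/90 = 2/5
```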

The Monty Hall problem

In the Monty Hall game, a prize is hidden behind one of three doors. After you pick a door, the host opens another one without the prize and gives you the option to switch to the unopened door. There are two strategies. One is sticking with your original choice and the other is switching to the unopened door. Does the strategy you pick affect your chance of winning?

You can play the game below. A car and two goats are hidden behind the doors, and your goal is to win the car (although I have heard goats make pretty good pets!). Play a few rounds of this game. After each round, clicking any of the doors or prizes resets the game.

🚪 🚪 🚪

After the host reveals the door, should you switch or not? Play the game at least 10 times with each strategy. Is the number of wins larger with the strategy you think is better?
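If you cannot play the interactive version, you can let a computer play for you. Below is a minimal simulation sketch in Python (function and parameter names are ours) that estimates the winning probability for each strategy:

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    # Simulate many rounds: the prize is behind a random door, you pick
    # a random door, the host opens a door that is neither your pick
    # nor the prize, and you either stay or switch to the remaining door.
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens some door that hides no prize and was not picked.
        # (Which one he opens does not affect the win rate of either strategy.)
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(monty_hall(switch=False))  # close to 1/3
print(monty_hall(switch=True))   # close to 2/3
```

Staying wins only when your first pick was right (probability 1/3); switching wins in exactly the remaining cases (probability 2/3).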


Conditional distributions

Conditional probability also naturally applies to random variables. As an example, consider a coin being flipped twice. Let X be the number of heads in the first flip (so either 0 or 1) and Y be the total number of heads. Let us find P(X=i|Y=j) for i{0,1} and j{0,1,2}. Let us repeat the joint distribution table found above

P(X=i,Y=j) Y=0 Y=1 Y=2 P(X=i)
X=0 1/4 1/4 0 1/2
X=1 0 1/4 1/4 1/2
P(Y=j) 1/4 1/2 1/4

Then,

P(X=0|Y=0)=P(X=0,Y=0)/P(Y=0)=(1/4)/(1/4)=1,
P(X=1|Y=0)=1−P(X=0|Y=0)=0,
P(X=0|Y=1)=P(X=0,Y=1)/P(Y=1)=(1/4)/(1/2)=1/2,
P(X=1|Y=1)=1−P(X=0|Y=1)=1/2,
P(X=0|Y=2)=P(X=0,Y=2)/P(Y=2)=0/(1/4)=0,
P(X=1|Y=2)=1−P(X=0|Y=2)=1.

The function P(X=i|B) for an event B and all possible values of i, is called the conditional distribution of X given B.
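The same computation can be written compactly in code. This sketch (names are ours) takes the joint table above and divides each column by the corresponding marginal P(Y=j):

```python
from fractions import Fraction

# Joint distribution of X (first flip) and Y (total heads), as in the table.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (0, 2): Fraction(0),
         (1, 0): Fraction(0),    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4)}

def cond_dist_X_given_Y(j):
    # P(X=i | Y=j) = P(X=i, Y=j) / P(Y=j), where P(Y=j) is the column sum.
    p_y = sum(joint[(i, j)] for i in (0, 1))
    return {i: joint[(i, j)] / p_y for i in (0, 1)}

print(cond_dist_X_given_Y(1))  # P(X=0|Y=1) = P(X=1|Y=1) = 1/2
print(cond_dist_X_given_Y(2))  # P(X=0|Y=2) = 0, P(X=1|Y=2) = 1
```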

Let X1 and X2 be defined as in this exercise. Find the distribution P(X2=i|X1=1).


A die is rolled twice, showing first X and then Y. Let Z=X+Y. What is P(Z=4)? Find the conditional distribution of X given Z=4. That is, find P(X=i|Z=4) for all possible values of i.


A die is rolled and X is shown. A coin is flipped X times. The number of heads is denoted by Y. What is the conditional distribution of Y given X=3, i.e., P(Y=j|X=3)?