
Joint and conditional probability

In this section, we will study relationships between random variables in more detail. So far, we have often assumed different random variables to be independent of each other. But the more interesting case is when one random variable can provide information about another. As an example, denote the data that you write on a flash drive as X and let Y be the data that you read from it. If X and Y are independent, then the flash drive is completely useless. Ideally, you want Y=X, but in reality this is seldom the case, because noise is everywhere. So X and Y are neither independent nor can they be assumed to be equal. But Y does provide information about X. We can study such relationships through joint and conditional probabilities.

Joint probability distributions

We have seen that a distribution can fully describe a single random variable. We have also seen that for two independent random variables, X and Y, we have P(X=i,Y=j)=P(X=i)P(Y=j). What about random variables that are not independent?

Suppose a coin is flipped twice. Let X denote the result of the first flip (X=1 represents heads and X=0 represents tails). Also, let Y be the total number of heads. Clearly, Y and X are not independent. For example, if we know Y=2, then necessarily X=1. How can we fully describe this relationship? We can do so using their joint probability distribution, which is essentially the set of values P(X=i,Y=j) for all possible i,j. We can show it using a table. Let’s first find the probability values:

P(X=0,Y=0)=P({TT})=1/4,  P(X=0,Y=1)=P({TH})=1/4,  P(X=0,Y=2)=P({})=0,
P(X=1,Y=0)=P({})=0,  P(X=1,Y=1)=P({HT})=1/4,  P(X=1,Y=2)=P({HH})=1/4.

Now we can represent the distribution as a table (or a 3D plot):

P(X=i,Y=j) Y=0 Y=1 Y=2
X=0 1/4 1/4 0
X=1 0 1/4 1/4


If we have the joint probability distribution, how can we find the distribution of each random variable? To find P(Y=i) say, we can sum up the corresponding row or column. In our example,

P(Y=1)=P(X=0,Y=1)+P(X=1,Y=1)=1/4+1/4=1/2.

We can add this information to the table to present all the information at once:

P(X=i,Y=j) Y=0 Y=1 Y=2 P(X=i)
X=0 1/4 1/4 0 1/2
X=1 0 1/4 1/4 1/2
P(Y=j) 1/4 1/2 1/4
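As a quick check, we can enumerate the four equally likely outcomes of the two flips and recover both the joint table and the marginal distributions. This is a small sketch in Python (the variable names are ours), using exact fractions to match the table:

```python
from itertools import product
from fractions import Fraction

# Enumerate the four equally likely outcomes of two coin flips
# (1 = heads, 0 = tails). X is the first flip, Y the total number
# of heads. Each outcome has probability 1/4.
joint = {}
for first, second in product([0, 1], repeat=2):
    x, y = first, first + second
    joint[(x, y)] = joint.get((x, y), 0) + Fraction(1, 4)

# Marginals: sum the joint probabilities over the other variable,
# i.e., sum a row or a column of the table.
p_x = {i: sum(p for (x, y), p in joint.items() if x == i) for i in (0, 1)}
p_y = {j: sum(p for (x, y), p in joint.items() if y == j) for j in (0, 1, 2)}

print(p_y[1])  # P(Y=1) = 1/2, as computed above
```

Summing a row gives P(X=i) and summing a column gives P(Y=j), exactly as in the table.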


(Feller, 1968) A red ball, a blue ball, and a green ball are randomly distributed into three bins numbered 1,2,3. Note that since for each ball we have three options, there are 27 outcomes. All 27 outcomes are equally likely.

Let X1 be the number of balls in the first bin and X2 the number of balls in the second bin.


Let X, Y be independent binary random variables, each taking values 0 and 1 with equal probability. Let Z=X⊕Y, where the sum is in F2 (like the ordinary sum except that 1⊕1=0). Find the joint distribution of X and Z as well as the distribution of X and the distribution of Z, and show them in a single table.


Conditional probability

Suppose we have collected the following data over a year about traffic collisions and weather on a given road:

Bad weather Good weather
Days with car crashes 5 5
Days with no car crash 60 295

Let’s assume that this data is representative and we can use it to estimate probabilities.

Now, we’d like to consider the effect of weather on the probability of car crash. To find the probability of a car crash when the weather is bad, we limit our attention to the 65 days in which we have bad weather. Among these, in 5 days there were car crashes. So we can estimate this probability as

P(car crash given that the weather is bad)=5/65≈7.7%.

This is a conditional probability. We assume some event to hold and then find the probability of another event under that assumption. We write the conditional probability of an event A assuming B as P(A|B). So, writing C for the event that there is a car crash and B for bad weather, above we have found

P(C|B)=5/65≈7.7%.

Similarly, with G denoting good weather,

P(C|G)=5/300≈1.7%,

which is smaller as can be expected.
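We can mirror this computation in code. The sketch below (names are ours) stores the day counts from the table and computes the conditional probability of a crash for each weather condition:

```python
from fractions import Fraction

# Day counts from the traffic/weather table above.
crashes = {"bad": 5, "good": 5}
no_crash = {"bad": 60, "good": 295}

def p_crash_given(weather):
    # Restrict attention to days with the given weather: crash days
    # with this weather divided by all days with this weather.
    total = crashes[weather] + no_crash[weather]
    return Fraction(crashes[weather], total)

print(float(p_crash_given("bad")))   # 5/65  ≈ 7.7%
print(float(p_crash_given("good")))  # 5/300 ≈ 1.7%
```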

The key point is that when computing the conditional probability of A given B, in the denominator, we only consider those cases (days in our example) in which B has occurred, and in the numerator, we consider cases in which both A and B occurred. While above we were dealing with cases (days), we can apply the same logic to probabilities (and get the same result):

P(C|B)=P(C∩B)/P(B)=(5/365)/(65/365)=5/65≈7.7%.
The probability of an event A given that B has occurred is defined as P(A|B)=P(A∩B)/P(B). We require that P(B)≠0.


If two events A and B are independent, then

P(A|B)=P(A∩B)/P(B)=P(A)P(B)/P(B)=P(A).

So whether or not B has occurred does not affect the probability of A.
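To illustrate, consider a small example of our own (not from the text above): two fair dice, with A the event that the first die is even and B the event that the second die shows more than 4. These are independent, so conditioning on B should not change the probability of A:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# A = "first die is even", B = "second die shows more than 4".
A = [(a, b) for a, b in outcomes if a % 2 == 0]
B = [(a, b) for a, b in outcomes if b > 4]
AB = [o for o in A if o in B]

p_A = Fraction(len(A), len(outcomes))    # 18/36 = 1/2
p_A_given_B = Fraction(len(AB), len(B))  # 6/12  = 1/2
print(p_A == p_A_given_B)  # independence: conditioning on B changes nothing
```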

Prove that P(Aᶜ|B)=1−P(A|B).


A positive integer less than or equal to 18 is chosen at random, with all outcomes equally likely.


The law of total probability

Conditional probability can be helpful for finding the (unconditional) probability of events too. Note that

P(A)=P(B)P(A|B)+P(Bᶜ)P(A|Bᶜ).
Prove the above equality.


A coin is flipped. If tails shows, it is flipped two more times, and if heads shows, it is flipped once more. What is the probability that we observe two heads?


This approach can be extended further. For example, suppose B1, B2, B3 are events that are pairwise disjoint (the intersection of any two of them is empty) and B1∪B2∪B3=Ω. Then,

(1)   P(A)=P(B1)P(A|B1)+P(B2)P(A|B2)+P(B3)P(A|B3).

Let’s see an example. Suppose a random 2-digit number (between 10 and 99, inclusive) is selected by choosing each digit at random. What is the probability of the event A that it is larger than 63? It is easy to see that all choices are equally likely so we can easily find the desired probability by finding the number of choices that are larger than 63 and dividing by the total number of cases,

(99−63)/(99−10+1)=36/90=0.4.

Let us also find it using conditional probability. The chosen number is larger than 63 if ‘the first digit (from the left) is larger than 6’ or ‘the first digit is equal to 6 and the second digit is larger than 3’. Let B1,B2,B3 be the events that the first digit is less than 6, equal to 6, and larger than 6, respectively. Using (1)

P(A)=(5/9)·0+(1/9)·(6/10)+(3/9)·1=36/90=0.4.
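Both computations can be checked mechanically. The sketch below (variable names are ours) counts the favorable two-digit numbers directly and then applies the law of total probability, conditioning on the first digit:

```python
from fractions import Fraction

# Direct count: two-digit numbers (10..99) larger than 63.
numbers = list(range(10, 100))
direct = Fraction(sum(1 for n in numbers if n > 63), len(numbers))

# Law of total probability, conditioning on the first digit:
#   B1: first digit < 6  (prob 5/9, P(A|B1) = 0)
#   B2: first digit = 6  (prob 1/9, P(A|B2) = 6/10, second digit > 3)
#   B3: first digit > 6  (prob 3/9, P(A|B3) = 1)
total_prob = (Fraction(5, 9) * 0
              + Fraction(1, 9) * Fraction(6, 10)
              + Fraction(3, 9) * 1)

print(direct, total_prob)  # both equal 36/90 = 2/5
```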

The Monty Hall problem

In the Monty Hall game, a prize is hidden behind one of three doors. After you pick a door, the host opens another one without the prize and gives you the option to switch to the unopened door. There are two strategies. One is sticking with your original choice and the other is switching to the unopened door. Does the strategy you pick affect your chance of winning?

You can play the game below. A car and two goats are hidden behind the doors, and your goal is to win the car (although I have heard goats make pretty good pets!). Play a few rounds of this game. After each round, clicking any of the doors or prizes resets the game.

🚪 🚪 🚪

After the host reveals the door, should you switch or not? Play the game at least 10 times with each strategy. Is the number of wins larger with the strategy you think is better?
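If you cannot play the interactive version, you can let a computer play for you. Below is a minimal simulation sketch in Python (function and parameter names are ours) that estimates the winning probability for each strategy:

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    # Simulate many rounds: the prize is behind a random door, you pick
    # a random door, the host opens a door that is neither your pick
    # nor the prize, and you either stay or switch to the remaining door.
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens some door that hides no prize and was not picked.
        # (Which one he opens does not affect the win rate of either strategy.)
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print(monty_hall(switch=False))  # close to 1/3
print(monty_hall(switch=True))   # close to 2/3
```

Staying wins only when your first pick was right (probability 1/3); switching wins in exactly the remaining cases (probability 2/3).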


Conditional distributions

Conditional probability also naturally applies to random variables. As an example, consider a coin being flipped twice. Let X be the number of heads in the first flip (so either 0 or 1) and Y be the total number of heads. Let us find P(X=i|Y=j) for i{0,1} and j{0,1,2}. Let us repeat the joint distribution table found above

P(X=i,Y=j) Y=0 Y=1 Y=2 P(X=i)
X=0 1/4 1/4 0 1/2
X=1 0 1/4 1/4 1/2
P(Y=j) 1/4 1/2 1/4

Then,

P(X=0|Y=0)=P(X=0,Y=0)/P(Y=0)=(1/4)/(1/4)=1,
P(X=1|Y=0)=1−P(X=0|Y=0)=0,
P(X=0|Y=1)=P(X=0,Y=1)/P(Y=1)=(1/4)/(1/2)=1/2,
P(X=1|Y=1)=1−P(X=0|Y=1)=1/2,
P(X=0|Y=2)=P(X=0,Y=2)/P(Y=2)=0/(1/4)=0,
P(X=1|Y=2)=1−P(X=0|Y=2)=1.

The function P(X=i|B) for an event B and all possible values of i, is called the conditional distribution of X given B.
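The same computation can be written compactly in code. This sketch (names are ours) takes the joint table above and divides each column by the corresponding marginal P(Y=j):

```python
from fractions import Fraction

# Joint distribution of X (first flip) and Y (total heads), as in the table.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4), (0, 2): Fraction(0),
         (1, 0): Fraction(0),    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4)}

def cond_dist_X_given_Y(j):
    # P(X=i | Y=j) = P(X=i, Y=j) / P(Y=j), where P(Y=j) is the column sum.
    p_y = sum(joint[(i, j)] for i in (0, 1))
    return {i: joint[(i, j)] / p_y for i in (0, 1)}

print(cond_dist_X_given_Y(1))  # P(X=0|Y=1) = P(X=1|Y=1) = 1/2
print(cond_dist_X_given_Y(2))  # P(X=0|Y=2) = 0, P(X=1|Y=2) = 1
```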

Let X1 and X2 be defined as in this exercise. Find the distribution P(X2=i|X1=1).


A die is rolled twice, showing first X and then Y. Let Z=X+Y. What is P(Z=4)? Find the conditional distribution of X given Z=4. That is, find P(X=i|Z=4) for all possible values of i.


A die is rolled and X is shown. A coin is flipped X times. The number of heads is denoted by Y. What is the conditional distribution of Y given X=3, i.e., P(Y=j|X=3)?