
Random variables, distributions, and independence

In this section, we will first introduce random variables, which will enable us to model diverse random experiments in a unified mathematical framework. We will also learn about probability distributions, which are useful for describing and predicting how likely random events are to occur, and expected values, which summarize a distribution with a single number.

Random variables

Sample spaces of random experiments are fairly arbitrary objects (e.g., {heads, tails}, the values showing on dice, playing cards). This generality makes probability theory powerful, but it is also inconvenient: mathematics works best with concrete definitions.

So how can we develop a uniform framework for coin flips, die rolls, card draws, etc.? We define random variables to represent random outcomes. A random variable (RV) assigns a real number to each possible outcome of a trial or experiment. Sometimes the RV can be defined in a natural way; for example, for a die roll, the random variable can be defined as the number showing on the die. In other cases, we have to define an arbitrary mapping, such as $\text{heads}\mapsto 0,\ \text{tails}\mapsto 1$.

In fact, we already represented outcomes with numbers in the last section, anticipating this idea; we just didn't use the term random variable (it's a good idea to review this table and the examples after it).

Random variables are denoted by capital letters, such as $X$. When a particular outcome occurs, the random variable takes the corresponding value. For example, let $X$ be the random variable defined for a coin toss, where we assign $\text{heads}\mapsto 0,\ \text{tails}\mapsto 1$. Then if heads shows, we say $X=0$, while if tails shows, we say $X=1$.

Random variables allow us to translate general ideas about probability to numeric concepts:

A random variable whose range is discrete is called a discrete random variable, and one whose range is continuous is called a continuous random variable.

Define a random variable for

Consider a random variable $X$, defined as the number showing on a die. We can now represent events as membership in a set of numbers. For example, the event that an odd number shows on the die can be written as $X\in\{1,3,5\}$, and its probability can be written as $P(X\in\{1,3,5\})$.
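As an illustrative sketch, assuming a fair die so that all six values of $X$ are equally likely, an event can be stored as a Python set of values and its probability computed by counting. The names `omega`, `prob`, and `odd` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# Assumption: a fair die, so all values in omega are equally likely.
omega = {1, 2, 3, 4, 5, 6}  # possible values of X

def prob(event):
    """P(X in event) = (number of values in the event) / (number of values)."""
    return Fraction(len(event & omega), len(omega))

odd = {1, 3, 5}   # the event "an odd number shows", i.e., X in {1, 3, 5}
print(prob(odd))  # 1/2
```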

A die is rolled and the number showing on it is denoted by X. Find the probability of the events below.

$P(X=2)=$ ____, $P(3\le X\le 5)=$ ____


When an experiment has more than one component, for example, when an action is repeated, we can describe the outcomes as a tuple of random variables. For example, suppose a die is rolled twice. Let the result of the first roll be denoted by X and the result of the second roll be denoted by Y. We can represent each outcome with a pair of numbers, i.e., (X,Y). For example, (2,3) represents the first die showing 2 and the second die showing 3.

Random variables can also be functions of other random variables. Continuing the two dice example, we can define a third random variable as the sum of X and Y, i.e., Z=X+Y.

What is the range of the random variable $Z$ defined above (i.e., the set of its possible values)? What is $P(Z=3)$? (I strongly recommend reviewing this example and also this example.)


A fair coin is flipped twice (all outcomes are equally likely). Let $X$ be the number of heads. What is $P(X=1)$?


A binary sequence of length 3 is chosen at random with all choices equally likely. Let X be the number of 0s and Y be the number of 1s in this sequence. What is the probability that X>Y?


When dealing with more than one random variable, say $X$ and $Y$, we write $P(X=i,Y=j)$ for the probability of the intersection of the two events $X=i$ and $Y=j$, i.e., $P(\{X=i\}\cap\{Y=j\})$.
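To make the intersection concrete, here is a small sketch assuming two rolls of a fair die with all 36 ordered pairs equally likely. The events $\{X=i\}$ and $\{Y=j\}$ are built as sets of outcomes and intersected with `&`; the helper names `prob` and `joint` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: two rolls of a fair die; all 36 ordered pairs (x, y) equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) by counting."""
    return Fraction(len(event), len(omega))

def joint(i, j):
    """P(X = i, Y = j), computed as the probability of {X = i} ∩ {Y = j}."""
    x_is_i = {w for w in omega if w[0] == i}  # event {X = i}
    y_is_j = {w for w in omega if w[1] == j}  # event {Y = j}
    return prob(x_is_i & y_is_j)

print(joint(2, 3))  # 1/36
```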

Describing probabilities: distributions

Probability over discrete sets

Suppose a die is rolled twice, with all outcomes equally likely. Consider the random variable $Z$ defined as the sum of the two die rolls. The sample space is

$$\Omega=\{(x,y):\ x,y\in\mathbb{N},\ 1\le x\le 6,\ 1\le y\le 6\}$$

and the range of the random variable Z is the set {2,3,4,5,6,7,8,9,10,11,12}. We can find the probability of each of these values:

$$
\begin{aligned}
P(Z=2)&=1/36 &&\text{since } \{Z=2\}=\{(1,1)\}\\
P(Z=3)&=2/36 &&\text{since } \{Z=3\}=\{(1,2),(2,1)\}\\
P(Z=4)&=3/36 &&\text{since } \{Z=4\}=\{(1,3),(2,2),(3,1)\}\\
P(Z=5)&=4/36 &&\text{since } \{Z=5\}=\{(1,4),(2,3),(3,2),(4,1)\}\\
P(Z=6)&=5/36 &&\text{since } \{Z=6\}=\{(1,5),(2,4),(3,3),(4,2),(5,1)\}\\
P(Z=7)&=6/36 &&\text{since } \{Z=7\}=\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\}\\
P(Z=8)&=5/36 &&\text{since } \{Z=8\}=\{(2,6),(3,5),(4,4),(5,3),(6,2)\}\\
P(Z=9)&=4/36 &&\text{since } \{Z=9\}=\{(3,6),(4,5),(5,4),(6,3)\}\\
P(Z=10)&=3/36 &&\text{since } \{Z=10\}=\{(4,6),(5,5),(6,4)\}\\
P(Z=11)&=2/36 &&\text{since } \{Z=11\}=\{(5,6),(6,5)\}\\
P(Z=12)&=1/36 &&\text{since } \{Z=12\}=\{(6,6)\}
\end{aligned}
$$
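The list above can be reproduced mechanically. The following sketch, assuming two fair dice with all 36 outcomes equally likely, enumerates the outcomes and counts how many give each value of $Z$. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: two fair dice, all 36 outcomes (x, y) equally likely.
# Count how many outcomes give each value z of Z = X + Y.
counts = Counter(x + y for x, y in product(range(1, 7), repeat=2))
pmf_Z = {z: Fraction(counts[z], 36) for z in sorted(counts)}

for z in pmf_Z:
    # Print unreduced fractions to match the list above (Fraction would show 2/36 as 1/18).
    print(f"P(Z={z}) = {counts[z]}/36")

print(sum(pmf_Z.values()))  # 1
```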

Now that we have these probabilities, we can plot $P(Z=z)$ for $z\in\{2,3,\dots,12\}$, as shown below:

[Figure: plot of $P(Z=z)$ versus $z$]

The function $P(Z=z)$ is the distribution of $Z$: it gives the probability of each value that $Z$ can take. This is useful for a few reasons. First, once we determine the distribution, we no longer need to go back to the sample space to recalculate the probabilities of outcomes of interest. Second, we can calculate the probability of any event involving $Z$ by summing the probabilities of the relevant values. Finally, as we will see, it allows us to define quantities such as the expected value, which help us better understand the behavior of $Z$.
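As a sketch of the second point, the probability of an event about $Z$ can be found by summing the pmf over the values where the event holds, without revisiting the 36 outcomes. The pmf below is typed in from the probabilities listed above; `prob_event` is a hypothetical helper name.

```python
from fractions import Fraction

# pmf of Z (sum of two fair dice), taken from the probabilities listed above.
numerators = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
pmf_Z = {z: Fraction(n, 36) for z, n in zip(range(2, 13), numerators)}

def prob_event(holds, pmf):
    """P(event) = sum of P(Z = z) over the values z for which the event holds."""
    return sum((p for z, p in pmf.items() if holds(z)), Fraction(0))

print(prob_event(lambda z: z >= 10, pmf_Z))     # 1/6
print(prob_event(lambda z: z % 2 == 0, pmf_Z))  # 1/2
```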

Let Z be the sum of two die rolls as above.

In this example, our random variable was discrete. The distribution of a discrete random variable is called a probability mass function (pmf). Note that pmfs do not apply to continuous random variables. We will discuss continuous probability distributions later, albeit much more briefly.

For brevity, instead of P(X=x) we may write P(x) or p(x).

Let X denote the number showing when a die is rolled. Plot the distribution of X.


A binary sequence of length 3 is chosen at random with all choices equally likely. Let $X$ be the number of 0s in this sequence.

Independence

Suppose we have a fair die, which when rolled has equal probability of showing each of the numbers 1 to 6, and a fair coin with equal probability of heads and tails. Let X denote the number showing on the die and Y be equal to 0 if heads shows and equal to 1 if tails shows. If you know what X is, does that provide any information about Y? It doesn’t seem so. For example, if X=4, that doesn’t affect the probability of Y=0. The latter probability is still 1/2.

Two random variables X and Y are called independent if the value of one does not affect the other.

We described above what independence intuitively means. What about a mathematical definition?

Two random variables X and Y are called independent if for any possible values i and j, P(X=i,Y=j)=P(X=i)P(Y=j).

This definition may seem strange at first sight but it agrees with our intuitive understanding of independence. For example, what is the probability P(X=4,Y=0), with X and Y defined as above? The mathematical definition of independence says if X and Y are independent, as we believe they are, then

P(X=4,Y=0)=P(X=4)×P(Y=0)=1/6×1/2=1/12.
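Here is a small sketch that checks the definition for every pair $(i,j)$ at once, assuming the 12 (die, coin) pairs are equally likely, as we would expect from a fair die and a fair coin used separately. As in the text, $Y=0$ stands for heads and $Y=1$ for tails; the helper `P` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Assumption: all 12 (die, coin) pairs are equally likely; coin value 0 = heads, 1 = tails.
omega = list(product(range(1, 7), [0, 1]))

def P(event):
    """Probability of an event given as a predicate on outcomes (x, y)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Compare P(X=i, Y=j) with P(X=i) P(Y=j) for every possible pair (i, j).
independent = all(
    P(lambda w: w == (i, j)) == P(lambda w: w[0] == i) * P(lambda w: w[1] == j)
    for i in range(1, 7)
    for j in (0, 1)
)
print(P(lambda w: w == (4, 0)))  # 1/12
print(independent)               # True
```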

Suppose the experiment is performed many times. Would we intuitively expect to see X=4,Y=0 in about a twelfth of the trials? Yes! In about 1/6 of the trials, we would expect to see X=4. Since the coin doesn't care about the die, among the trials in which 4 shows, the coin should show heads (Y=0) in about half of them. So overall, in about 1/6×1/2 of the trials, we would expect to see both X=4 and Y=0.

Let’s give this a try:

Click the button below to simulate 60 die roll and coin flip trials (heads are shown with 🙂 and tails with 🏛️). In what fraction of the trials do you observe a 4? In what fraction do you observe heads? In what fraction do you observe both 4 and heads?
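Outside the interactive page, a similar experiment can be sketched with Python's `random` module, assuming a fair die and a fair coin ("H" for heads, "T" for tails). The function name `simulate` and its `seed` parameter are hypothetical choices for this illustration.

```python
import random

def simulate(n_trials, seed=None):
    """Roll a fair die and flip a fair coin n_trials times; return observed fractions."""
    rng = random.Random(seed)
    trials = [(rng.randint(1, 6), rng.choice("HT")) for _ in range(n_trials)]
    frac_4 = sum(die == 4 for die, _ in trials) / n_trials
    frac_heads = sum(coin == "H" for _, coin in trials) / n_trials
    frac_both = sum(die == 4 and coin == "H" for die, coin in trials) / n_trials
    return frac_4, frac_heads, frac_both

f4, fh, fb = simulate(60)
print(f"4: {f4:.3f}, heads: {fh:.3f}, both: {fb:.3f}, product: {f4 * fh:.3f}")
```

With only 60 trials the fractions vary noticeably from run to run; with many more trials, e.g. `simulate(100_000, seed=1)`, they should settle close to 1/6, 1/2, and 1/12.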



Suppose two dice are rolled, with all outcomes equally likely. Let X be the first die, Y be the second die, and Z be their sum. Note that to prove independence you need to prove it for all possible values, but to prove that two random variables are not independent, it suffices to show that the equality in the definition does not hold for one pair of values.

The definition of independence also extends to events:

Two events $A$ and $B$ are called independent if $P(A\cap B)=P(A)P(B)$.

Sometimes the components of an experiment are physically independent of each other, for example, when two dice are rolled one by one. In such cases, independence is very natural. But events can also be independent when they are physically related or result from a single action. For example, consider a deck of cards from which you draw a card at random. The color of the card can be red or black; its suit can be heart, diamond, club, or spade; and its rank can be A, 2, 3, 4, …, J, Q, K. Let's check the independence of a few events:

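$$
\begin{aligned}
&P(\text{heart})=\tfrac{1}{4},\quad P(\text{ace})=\tfrac{1}{13},\quad P(\text{heart}\cap\text{ace})=\tfrac{1}{52}=\tfrac{1}{4}\times\tfrac{1}{13}\\
&P(\text{red})=\tfrac{1}{2},\quad P(\text{red}\cap\text{heart})=\tfrac{1}{4}\neq\tfrac{1}{2}\times\tfrac{1}{4}\\
&P(\text{black})=\tfrac{1}{2},\quad P(\text{black}\cap\text{heart})=0\neq\tfrac{1}{2}\times\tfrac{1}{4}\\
&P(\text{ace}\cap\text{red})=\tfrac{2}{52}=\tfrac{1}{13}\times\tfrac{1}{2}
\end{aligned}
$$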

The second and third examples show that color and suit are not independent. The last example works for any rank, not just ace, and for any color, not just red, so rank and color are independent. Similarly, the first example works for any rank and any suit, so rank and suit are independent.
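These checks can also be automated. The sketch below assumes a standard 52-card deck with each card equally likely to be drawn, and tests the definition $P(A\cap B)=P(A)P(B)$ for the four examples, then for every rank/color and rank/suit pair. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["heart", "diamond", "club", "spade"]
color = {"heart": "red", "diamond": "red", "club": "black", "spade": "black"}
deck = list(product(ranks, suits))  # 52 cards (rank, suit), each drawn with probability 1/52

def P(event):
    """Probability of an event given as a predicate on a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def independent(A, B):
    """Check the definition: P(A ∩ B) == P(A) * P(B)."""
    return P(lambda c: A(c) and B(c)) == P(A) * P(B)

def is_heart(card):
    return card[1] == "heart"

def is_ace(card):
    return card[0] == "A"

def is_red(card):
    return color[card[1]] == "red"

def is_black(card):
    return color[card[1]] == "black"

print(independent(is_heart, is_ace))    # True
print(independent(is_red, is_heart))    # False
print(independent(is_black, is_heart))  # False
print(independent(is_ace, is_red))      # True

# Rank vs. color for every rank and color; rank vs. suit for every rank and suit.
rank_color = all(
    independent(lambda c, r=r: c[0] == r, lambda c, k=k: color[c[1]] == k)
    for r in ranks for k in ("red", "black")
)
rank_suit = all(
    independent(lambda c, r=r: c[0] == r, lambda c, s=s: c[1] == s)
    for r in ranks for s in suits
)
print(rank_color, rank_suit)  # True True
```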

Let $A$ denote the event that it rains and let $B$ denote the event that UVa wins a given basketball game. We can assume $A$ and $B$ are independent, with $P(A)=0.3$ and $P(B)=0.8$. Find


Show that if $A$ and $B$ are independent, then $A$ and $B^c$ are also independent.


Events defined using random variables: If two random variables $X$ and $Y$ are independent, then any event defined in terms of $X$ alone is independent of any event defined in terms of $Y$ alone. (This is an important result, but we will not prove it.) For example, if $X$ and $Y$ are two independent die rolls, then

$$P(X<4,Y=3)=P(X<4)P(Y=3)=\frac{3}{6}\times\frac{1}{6}=\frac{3}{36}=\frac{1}{12}.$$

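To check this directly, note that the event $\{X<4,\ Y=3\}$ is $\{(1,3),(2,3),(3,3)\}$, which contains 3 of the 36 equally likely outcomes and so has probability $3/36=1/12$.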
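The same comparison can be sketched by enumeration, assuming two fair dice with all 36 outcomes equally likely: the joint probability is counted directly and compared with the product of the two separate probabilities. The helper `P` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Assumption: two independent fair dice, all 36 outcomes (x, y) equally likely.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event given as a predicate on outcomes (x, y)."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

joint = P(lambda w: w[0] < 4 and w[1] == 3)                       # counted directly
product_of_marginals = P(lambda w: w[0] < 4) * P(lambda w: w[1] == 3)
print(joint, product_of_marginals)  # 1/12 1/12
```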

What about more than two random variables?

Bernoulli and binomial distributions

Bernoulli distribution

The distribution of a random variable X that takes only two values, typically 0 and 1, is called a Bernoulli distribution, where the probability of 1 is usually denoted by p, i.e., p=P(X=1). Such a random variable results from an experiment with two outcomes such as flipping a coin, playing a game (no draw), performing any task that may lead to success or failure, etc. The distribution is determined in full by p:

$$P(X=1)=p,\qquad P(X=0)=1-p.$$

As an example, for p=0.3, the plot of the pmf is given below.

[Figure: pmf $P(X=x)$ of a Bernoulli random variable with $p=0.3$]

If the distribution of $X$ is Bernoulli with probability of 1 equal to $p$, we write $X\sim\text{Bernoulli}(p)$. The most common case is $p=1/2$, resulting from a fair coin.
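A Bernoulli($p$) value can be simulated from a uniform random number $U$ in $[0,1)$ by returning 1 when $U<p$, which happens with probability $p$. The sketch below uses Python's `random` module; the function name `bernoulli` and the seed are illustrative choices, not part of the notes.

```python
import random

def bernoulli(p, rng):
    """Draw X ~ Bernoulli(p): return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(42)  # fixed seed so runs are repeatable
draws = [bernoulli(0.3, rng) for _ in range(100_000)]
fraction_of_ones = sum(draws) / len(draws)
print(fraction_of_ones)  # should be close to p = 0.3
```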

Binomial distribution

An archer hits the target with probability p. She participates in a competition that involves shooting three times and we can assume each shot is independent from the others. Let X denote the number of times she hits the target. What is the distribution of X?

Let’s show each outcome as a binary sequence of length 3, with 1 denoting hitting the target. We have

$$
\begin{aligned}
\{X=0\}&=\{000\}\\
\{X=1\}&=\{001,010,100\}\\
\{X=2\}&=\{011,101,110\}\\
\{X=3\}&=\{111\}
\end{aligned}
$$

Let us now find the probability of each event. X=3 corresponds to three hits. Since each hit has probability p and they are independent,

$$P(X=3)=p\times p\times p=p^3.$$

Similarly, the probability of X=0 is

$$P(X=0)=(1-p)\times(1-p)\times(1-p)=(1-p)^3.$$

The case of X=1 is trickier. There are three outcomes in this event. But they all have the same probability:

$$
\begin{aligned}
P(\{001\})&=(1-p)(1-p)p\\
P(\{010\})&=(1-p)p(1-p)\\
P(\{100\})&=p(1-p)(1-p)
\end{aligned}
$$

So $P(X=1)=3p(1-p)^2$. Similarly, $P(X=2)=3p^2(1-p)$.
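The four formulas can be cross-checked numerically. This sketch picks a hypothetical value $p=0.3$, assumes independent shots, adds up the probabilities of all 8 hit/miss sequences grouped by number of hits, and compares the totals with the closed-form expressions derived above.

```python
from itertools import product

p = 0.3  # hypothetical hit probability, chosen only for illustration

# Add up the probabilities of all 8 hit/miss sequences (1 = hit), grouped by number of hits.
# Assumes shots are independent, so a sequence's probability is a product of p's and (1-p)'s.
dist = {k: 0.0 for k in range(4)}
for seq in product([0, 1], repeat=3):
    prob = 1.0
    for shot in seq:
        prob *= p if shot == 1 else 1 - p
    dist[sum(seq)] += prob

# Closed-form expressions derived above.
formulas = {0: (1 - p) ** 3, 1: 3 * p * (1 - p) ** 2, 2: 3 * p**2 * (1 - p), 3: p**3}
for k in range(4):
    print(k, round(dist[k], 6), round(formulas[k], 6))
```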

Now let us consider the general case, when the archer shoots $N$ times. What is the probability of $k$ hits, i.e., $P(X=k)$? The probability of a particular sequence of $k$ hits and $N-k$ misses is $p^k(1-p)^{N-k}$. But we also need to consider the number of such sequences. Each sequence of $k$ hits and $N-k$ misses is equivalent to a binary sequence with $k$ 1s and $N-k$ 0s. There are $\binom{N}{k}$ such sequences. Putting things together,

$$P(X=k)=\binom{N}{k}p^k(1-p)^{N-k}.$$

This is called the binomial distribution, written as $X\sim\text{Binomial}(N,p)$.
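The formula translates directly into code. A minimal sketch, assuming Python 3.8 or later for `math.comb`; the function name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(k, N, p):
    """P(X = k) for X ~ Binomial(N, p): comb(N, k) * p**k * (1 - p)**(N - k)."""
    if not 0 <= k <= N:
        return 0.0  # values outside 0..N are impossible
    return comb(N, k) * p**k * (1 - p) ** (N - k)

pmf = [binomial_pmf(k, 3, 0.5) for k in range(4)]
print(pmf)  # [0.125, 0.375, 0.375, 0.125]
```

For $N=3$ this reproduces $(1-p)^3$, $3p(1-p)^2$, $3p^2(1-p)$, and $p^3$; for any $N$ and $p$, the values over $k=0,\dots,N$ should sum to 1 (up to floating-point rounding).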

Below, you can plot the Binomial pmf for different values of N and p.

[Interactive plot: binomial pmf $P(X=k)$ versus $k$, shown for $N=50$, $p=0.5$]


An archer hits the target with probability 0.1. If she shoots 10 times, find the following probabilities: she hits the target once; she hits the target at least once; and she hits the target 5 or 6 times.


Probability over continuous sets

What if we are interested in the amount of rainfall, blood pressure, etc.? Here the probability is over a continuous set. The distribution is then described by a probability density function, or pdf for short, which is a continuous curve. An example is shown below. Intuitively, wherever the pdf is larger, nearby values are more likely: a short interval around that point has higher probability than an interval of the same length where the pdf is smaller.

[Figure: an example pdf]

The pdf shown above is for a random variable X with Gaussian distribution, whose formula is

$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

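where $\mu$ and $\sigma$ are parameters that determine the distribution (similar to $N$ and $p$ for the binomial distribution): $\mu$ sets the center of the curve and $\sigma$ its spread. In the figure, $\mu=0$ and $\sigma=1$.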

If the pdf is given by a function f(x), then we can compute event probabilities using integrals:

$$P(a\le X\le b)=\int_a^b f(x)\,dx$$

If we are interested in the probability that X is between -1 and 1, we can find it as

$$P(-1\le X\le 1)=\int_{-1}^{1} f(x)\,dx\approx 0.6827,$$

which is the area of the shaded region in the graph below.

[Figure: the same pdf, with the area between $x=-1$ and $x=1$ shaded]
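For the curious, the value 0.6827 can be reproduced numerically. This sketch approximates the integral with the trapezoidal rule (our choice of method; any numerical integration would do) and compares it with the closed form $\operatorname{erf}(1/\sqrt{2})$, which equals $P(-1\le X\le 1)$ when $\mu=0$ and $\sigma=1$. The function names are hypothetical; `math.erf` is from Python's standard library.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """The Gaussian pdf f(x) with parameters mu and sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

approx = integrate(gaussian_pdf, -1.0, 1.0)
exact = math.erf(1 / math.sqrt(2))  # closed form of P(-1 <= X <= 1) for mu=0, sigma=1
print(round(approx, 4), round(exact, 4))  # 0.6827 0.6827
```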

Below, we simulate 20 values from this distribution:

0.48889, 1.0347, 0.72689, -0.30344, 0.29387, -0.78728, 0.8884, -1.1471, -1.0689, -0.8095, -2.9443, 1.4384, 0.32519, -0.75493, 1.3703, -1.7115, -0.10224, -0.24145, 0.31921, 0.31286

Of these, 13 lie in the interval $[-1,1]$, a fraction of $13/20=0.65$, which is close to the probability $0.6827$.
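A short sketch to recount the 20 listed values, then repeat the experiment with many more draws using `random.gauss` from Python's standard library (the seed and sample size are arbitrary choices). With more draws, the fraction inside $[-1,1]$ should move closer to 0.6827.

```python
import random

# The 20 simulated values listed above.
samples = [0.48889, 1.0347, 0.72689, -0.30344, 0.29387, -0.78728, 0.8884, -1.1471,
           -1.0689, -0.8095, -2.9443, 1.4384, 0.32519, -0.75493, 1.3703, -1.7115,
           -0.10224, -0.24145, 0.31921, 0.31286]
inside = sum(-1 <= x <= 1 for x in samples)
print(inside, inside / len(samples))  # 13 0.65

# Many more draws from the Gaussian with mu = 0, sigma = 1 (seed chosen arbitrarily).
rng = random.Random(0)
n = 100_000
fraction = sum(-1 <= rng.gauss(0.0, 1.0) <= 1 for _ in range(n)) / n
print(round(fraction, 3))  # should be close to 0.683
```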

Our discussion of continuous distributions will for the most part be limited to the above. We won't need to compute integrals, and the Gaussian distribution is the only one we will consider, although there are many other common continuous distributions.
