Expectation and variance
Distributions are very useful as they can fully describe random variables. But there are two problems that make it necessary to find other ways of describing random variables. The first is that determining the distribution of a random variable in a real-world problem is usually difficult. For example, what is the distribution of the amount of rain in a particular location in June? This is hard to say without measuring the rainfall in June for many years. The second is that distributions are complicated and can be difficult to interpret. Even if we have the distribution of rainfall, it might be difficult to use it to make decisions about crops and the like.
So we are after ways to describe random variables via metrics that are both easier to determine and easier to understand. The most common such metrics are the expected value, which describes the average, and variance, which tells us how much variation about the average we can expect.
Expected value (mean)
If we roll a die many times, what will be the average of the values that we obtain? Intuitively, we may expect the average to be the average of the outcomes, i.e., $\frac{1+2+3+4+5+6}{6}=\frac{21}6=3.5$, since they are equally likely. Let’s see if this agrees with what we know from probability theory. Let the number of times we repeat the experiment be $N$, where $N$ is a very large number. The number of times 1 will show is approximately $\frac N6$ since the probability of 1 is $\frac16$. The same holds for any other number. So the average will approximately be
\[\frac{\frac N6\times1+\frac N6 \times 2+\frac N6 \times 3+\frac N6 \times 4+\frac N6 \times 5+\frac N6 \times 6}{N}=\frac{1+2+3+4+5+6}{6}=3.5,\]which agrees with our guess.
Let’s also test this via simulation. The following are the numbers obtained when a die is rolled 60 times:
\[424332216461654325215554335255644631643624431132536542626231\]We would expect each number to appear around 10 times. Based on the actual counts, the average is
\[\frac{7\times1+11 \times 2+11 \times 3+11 \times 4+10 \times 5+10\times 6}{60}=3.6.\]What if we have a loaded die, where the probability of $i$ is $p_i$ for \(i\in \{1,2,3,4,5,6\}\)? This time, $i$ appears in $N$ trials approximately $Np_i$ times if $N$ is large. The same argument as above then tells us the average over many throws is approximately
\[\frac{Np_1\times1+Np_2 \times 2+Np_3\times3+Np_4 \times 4+Np_5 \times 5+Np_6 \times 6}{N} =p_1+2p_2+3p_3+4p_4+5p_5+6p_6 = \sum_{i=1}^6 i p_i.\]Our discussion above motivates the expected value of a random variable, defined so that if the experiment is repeated many times, the average of the observed values will be close to the expected value. For a discrete random variable $X$, this gives the definition \[E[X]=\sum_{x} x\, p_x,\]where $p_x$ is the probability that $X$ takes the value $x$.
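As a quick sanity check, here is a minimal Python sketch that compares the empirical average of many simulated rolls with \(\sum_{i=1}^6 i p_i\); the loaded-die probabilities used below are arbitrary example values.

```python
import random

# Hypothetical probabilities for a loaded die (example values only).
p = [0.1, 0.1, 0.2, 0.2, 0.2, 0.2]   # p[i-1] is the probability of rolling i
faces = [1, 2, 3, 4, 5, 6]

# Expected value from the formula sum_i i * p_i.
expected = sum(i * pi for i, pi in zip(faces, p))

# Empirical average over many simulated rolls.
N = 1_000_000
rolls = random.choices(faces, weights=p, k=N)
average = sum(rolls) / N

print(f"sum of i*p_i        : {expected:.4f}")   # 3.9000
print(f"average of the rolls: {average:.4f}")    # close to 3.9
```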
Prediction and expected value
Suppose we want to use probability to predict the outcome of an experiment. Given a random variable $X$ for the outcome and a known probability distribution for that random variable, how do we come up with the best prediction of $X$? Here are some candidates:
- The most likely outcome (mode)
- The average if the experiment is repeated many times (expected value/mean)
- The “middle” outcome (median)
The mode is often not a good choice, as the most likely value could be isolated and far from the other likely values. The mean is a decent choice: in addition to being close to the average over many trials, it minimizes the expected squared difference between the prediction and the actual outcome. But it is susceptible to outliers. That is, a single very large outcome, even if it has very small probability, has a significant effect on the mean. For this reason, the median is sometimes preferred. (If Bill Gates walks into a bar, the mean income increases substantially, but the median changes little.)
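To see how strongly an outlier can pull the mean while leaving the median almost unchanged, here is a minimal sketch with made-up income figures.

```python
from statistics import mean, median

# Hypothetical incomes (in dollars) of the people in a bar -- made-up numbers.
incomes = [40_000, 45_000, 52_000, 60_000, 75_000]
print(mean(incomes), median(incomes))   # 54400 and 52000

# A billionaire walks in (the income figure is purely illustrative).
incomes.append(1_000_000_000)
print(mean(incomes), median(incomes))   # roughly 166.7 million, but still 56000
```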
Expected values of functions of random variables
Sometimes we need to compute the expected value of a function of a random variable. An example of this is the power of a signal, which is equal to the square of the value of the signal. There are two ways in which we can find the expectation of a function of a random variable. As an example, consider the following random variable
For the expected value of $X$, we have
What is the expected value of $X^2$? For each value that $X$ can take, we can find the value of $X^2$ and multiply by the corresponding probability,
As natural as this approach is, it does not directly follow the definition of the expected value given above. To follow that definition, we first need to define another random variable, $Y$, as $Y=X^2$. Then we find the distribution of $Y$ and compute its expected value. We have
and
It is not difficult to show that both methods lead to the same result. This fact is called the Law of the Unconscious Statistician, or LOTUS. Specifically, for a function $g$,\[E[g(X)] = \sum_{x} g(x)\, p_x.\]
For example, for a random variable $X$, the expected value of $X^2$ is
\[E[X^2] = \sum_{x} x^2 p_x.\]
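The following sketch uses a small made-up distribution for $X$ to illustrate that the two methods agree: summing \(x^2 p_x\) directly (LOTUS) versus first constructing the distribution of \(Y = X^2\) and then applying the definition of the expected value.

```python
# A made-up distribution for X: value -> probability.
p_X = {-2: 0.3, -1: 0.2, 1: 0.1, 2: 0.4}

# Method 1 (LOTUS): sum x^2 * p_x over the values of X.
e_x2_lotus = sum(x**2 * p for x, p in p_X.items())

# Method 2: build the distribution of Y = X^2, then apply the definition of E[Y].
p_Y = {}
for x, p in p_X.items():
    p_Y[x**2] = p_Y.get(x**2, 0) + p   # -2 and 2 both map to y = 4

e_y = sum(y * p for y, p in p_Y.items())

print(e_x2_lotus, e_y)   # both equal 3.1 (up to floating-point rounding)
```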
Linearity of expectation
From the last exercise, we see that the expected value of the sum of two die rolls is equal to the sum of the expected values of the die rolls. Is this a coincidence? No, it follows from an important property of expectation.
For two random variables $X$ and $Y$, we have $$E[X+Y] = E[X]+E[Y]$$ and more generally, $$E[aX+bY+c] = aE[X]+bE[Y]+c,$$ where $a,b,c$ are constants (numbers).
Let $X_i$ for $i=1,\dotsc,N,$ be random variables with the same mean $\mu$. Furthermore, let $X = X_1+\dotsm+X_N$. By the linearity of expectation,\begin{equation}\label{eq:LinExpIdentical}E[X] = E[X_1]+\dotsm+E[X_N] = N\mu.\end{equation}
A special case of \eqref{eq:LinExpIdentical} occurs when $X_i\sim Bernoulli(p)$ for all $i$. Since $E[X_i]=p$, we have\begin{equation}\label{eq:LinExpIdBer}E[X]=Np.\end{equation}
We can use \eqref{eq:LinExpIdBer} to find the expected value of random variables with binomial distribution. Let us consider our archer again, who hits the target in each try with probability $p$. The number $X$ of hits in $N$ tries has a binomial distribution, i.e.,
\[X\sim Binomial(N,p).\]What is the expected number of hits? Define random variables $X_i$ for $i=1,2,\dotsc,N$ as follows: If the $i$th try is successful, then $X_i=1$. Otherwise, $X_i=0$. Then $X=X_1+\dotsm+X_N$ and
\[E[X]=E[X_1]+\dotsm+E[X_N]=Np.\]
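A short simulation can confirm this; the sketch below uses arbitrary example values \(N=20\) and \(p=0.3\) and compares the average number of hits over many simulated sessions with \(Np\).

```python
import random

N, p = 20, 0.3        # number of tries and hit probability (example values)
trials = 100_000      # number of simulated sessions of N tries each

total_hits = 0
for _ in range(trials):
    # X = X_1 + ... + X_N, where X_i = 1 if the i-th try hits the target.
    total_hits += sum(1 for _ in range(N) if random.random() < p)

print(total_hits / trials)   # close to N * p = 6
```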
Note that in \eqref{eq:LinExpIdBer}, we did not need to assume independence between the random variables $X_1,X_2,\dotsc,X_N$. This makes the equation very powerful, allowing us to obtain results that would be difficult to find otherwise.
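To emphasize that independence is not required, here is a minimal sketch with two clearly dependent random variables: the value $X$ of a die roll and the indicator $Y$ that the same roll is even. The simulated average of \(X+Y\) still matches \(E[X]+E[Y] = 3.5 + 0.5 = 4\).

```python
import random

trials = 200_000
total = 0
for _ in range(trials):
    roll = random.randint(1, 6)
    x = roll                        # X: the value of the roll
    y = 1 if roll % 2 == 0 else 0   # Y: indicator that the very same roll is even
    total += x + y                  # X and Y are dependent, yet linearity still applies

print(total / trials)   # close to E[X] + E[Y] = 3.5 + 0.5 = 4
```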
A. What is the probability that two people have the same birthday (without considering leap years)?
B. In a class with 100 students, what is the expected number of pairs of students who have the same birthday (without considering leap years)?
C. What is the smallest number $N$ of students such that we can expect at least one pair of students to share a birthday (without considering leap years)?
Variance
Is the mean enough?
Suppose someone offers you a game in which your expected winning is $100. Will you accept?
Which game would you play?
- You always win exactly $100.
- You win $0 with probability ½ and $200 with probability ½.
- You win $1200 with probability ½ and lose $1000 with probability ½.
All three have the same mean. So what’s different between them?
The mean does not capture the variability, that is, how much the outcome is expected to change from trial to trial. Random outcomes may stay consistently close to the expected value or vary widely around it. How do we quantify such behavior?
The variance of a random variable $X$ with mean $\mu$ is defined as \(Var(X) = E[(X-\mu)^2]\), the expected squared deviation of $X$ from its mean. The square root of the variance of $X$ is called its standard deviation and is usually denoted by \(\sigma_X\).
Why don’t we define the variance as the expected deviation \(E[X-\mu]\) instead? Because this quantity is always 0: \(E[X-\mu] = E[X] - \mu = \mu - \mu = 0\); positive and negative deviations cancel on average.
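Returning to the three games above, the following sketch computes the mean, the expected deviation \(E[X-\mu]\), and the variance of each game directly from its distribution; all three have mean 100 and zero expected deviation, but very different variances.

```python
# Each game is a list of (winnings, probability) pairs, taken from the games above.
games = {
    "game 1": [(100, 1.0)],
    "game 2": [(0, 0.5), (200, 0.5)],
    "game 3": [(1200, 0.5), (-1000, 0.5)],
}

for name, dist in games.items():
    mu = sum(x * p for x, p in dist)               # E[X]
    dev = sum((x - mu) * p for x, p in dist)       # E[X - mu], always 0
    var = sum((x - mu) ** 2 * p for x, p in dist)  # E[(X - mu)^2]
    print(f"{name}: mean = {mu}, E[X - mu] = {dev}, variance = {var}")
# game 1: variance 0;  game 2: variance 10000;  game 3: variance 1210000
```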
We can also find the variance of a function of a random variable. From the definition,\[Var(g(X)) = E\big[(g(X)-\mu_g)^2\big] = \sum_{x} \big(g(x)-\mu_g\big)^2 p_x,\]
where $\mu_g = E[g(X)]$.
Properties of variance
Just like expected value, the variance has certain properties:
- If we scale random variable X by a constant \(a\), then \(Var(aX) = \vert a \vert^2Var(X)\).
- If we add a constant to a random variable, the variance does not change.
- The variance of the sum of independent random variables is the sum of the variances: If $X$ and $Y$ are independent, then $Var(X+Y) = Var(X)+Var(Y)$.
Viewing a binomial random variable \(X \sim \text{Binomial}(N,p)\) as the sum of $N$ independent Bernoulli($p$) random variables $X_1,\dotsc,X_N$, and using the fact that each Bernoulli($p$) variable has variance \(p(1-p)\), we find that\[Var(X) = Var(X_1)+\dotsm+Var(X_N) = Np(1-p).\]
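As a sanity check, the sketch below compares the empirical variance of many simulated \(\text{Binomial}(N,p)\) draws (again with arbitrary example values \(N=20\), \(p=0.3\)) with \(Np(1-p)\).

```python
import random
from statistics import pvariance

N, p = 20, 0.3     # example parameters
trials = 100_000

# Each sample is a Binomial(N, p) draw, generated as a sum of N Bernoulli(p) variables.
samples = [sum(1 for _ in range(N) if random.random() < p) for _ in range(trials)]

print(pvariance(samples))   # close to N * p * (1 - p) = 4.2
```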