Expected value, also called mathematical expectation, is the weighted average value of a Random Variable under a Probability Distribution.
It is written with the expectation operator:

$$\mathbb{E}[X]$$

Read this as “the expected value of $X$.”
Intuition
Expected value is the long-run average outcome if the same random process is repeated many times.
It is not necessarily the value that is most likely to happen. For example, the expected value of a fair six-sided die is:

$$\mathbb{E}[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$

But rolling a $3.5$ is impossible.
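The long-run-average reading can be checked with a short simulation. This is a minimal sketch, not part of the original note: the seed, the number of rolls, and the helper name `roll_die` are assumptions chosen for illustration.

```python
import random
from fractions import Fraction

def roll_die(rng):
    """Return one roll of a fair six-sided die (hypothetical helper)."""
    return rng.randint(1, 6)

# Exact expected value: each face 1..6 has probability 1/6.
exact = sum(Fraction(face, 6) for face in range(1, 7))

# Long-run average over many simulated rolls.
rng = random.Random(0)  # fixed seed so the run is repeatable
num_rolls = 100_000
rolls = [roll_die(rng) for _ in range(num_rolls)]
sample_mean = sum(rolls) / num_rolls

print(f"exact expectation: {float(exact)}")
print(f"sample mean:       {sample_mean:.4f}")
print(f"3.5 ever rolled?   {3.5 in rolls}")
```

The sample mean should land close to the exact value even though no individual roll ever equals it.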
Discrete random variables
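For a discrete random variable $X$ that takes each value $x$ with probability $P(X = x)$:

$$\mathbb{E}[X] = \sum_x x \, P(X = x)$$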
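Example (illustrative numbers): suppose $X$ takes the values $0$, $1$, and $2$ with probabilities $0.2$, $0.5$, and $0.3$.

Then:

$$\mathbb{E}[X] = 0(0.2) + 1(0.5) + 2(0.3) = 1.1$$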
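The discrete sum translates directly into code. The sketch below assumes a hypothetical distribution stored as a `{value: probability}` dictionary. The function name `expectation` and the numbers are illustrative, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def expectation(pmf):
    """Sum of value * probability over a {value: probability} mapping."""
    total_prob = sum(pmf.values())
    if total_prob != 1:
        raise ValueError(f"probabilities sum to {total_prob}, expected 1")
    return sum(value * prob for value, prob in pmf.items())

# Hypothetical distribution, chosen only for illustration.
pmf = {0: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}
mean = expectation(pmf)
print(mean, float(mean))
```

The check that the probabilities sum to one is a design choice for the sketch. It catches a malformed distribution before the weighted sum is trusted.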
Continuous random variables
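For a continuous random variable, expectation is computed with an integral instead of a sum:

$$\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, p(x) \, dx$$

where $p(x)$ is the probability density function of $X$.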
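When the integral has no convenient closed form, it can be approximated numerically. The following sketch uses a simple midpoint rule on an exponential density. The choice of density, the rate parameter, the truncation point, and the function names are all assumptions made for illustration.

```python
import math

def expectation_continuous(pdf, lower, upper, steps=200_000):
    """Approximate the integral of x * pdf(x) over [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th slice
        total += x * pdf(x) * width
    return total

rate = 2.0  # hypothetical rate parameter

def exponential_pdf(x):
    """Density of an exponential distribution with the rate above."""
    return rate * math.exp(-rate * x)

# The density is negligible beyond x = 25, so the infinite upper limit is truncated there.
estimate = expectation_continuous(exponential_pdf, 0.0, 25.0)
print(f"numerical: {estimate:.6f}   closed form 1/rate: {1 / rate:.6f}")
```

The exponential distribution is convenient here because its mean, `1 / rate`, is known exactly and can be compared against the numerical result.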
Expectation of a function
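Expectation can also be taken over a function of a random variable:

$$\mathbb{E}[f(X)] = \sum_x f(x) \, P(X = x)$$

For continuous variables:

$$\mathbb{E}[f(X)] = \int_{-\infty}^{\infty} f(x) \, p(x) \, dx$$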
This is useful because many objectives in machine learning are written as expectations over losses, rewards, or log probabilities.
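A small exact computation shows what "expectation of a function" means in practice. In this sketch the function is squaring, used as a stand-in for a loss, and the variable is a fair die. Both choices and the helper name `expect` are assumptions made for illustration.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # fair die: every face equally likely

def expect(f):
    """E[f(X)] = sum over x of f(x) * P(X = x)."""
    return sum(f(x) * prob for x in faces)

mean = expect(lambda x: x)                # E[X]
mean_of_square = expect(lambda x: x * x)  # E[X^2]

print("E[X]     =", mean)
print("E[X^2]   =", mean_of_square)
print("(E[X])^2 =", mean ** 2)
```

The last two lines differ, which illustrates that the function must be applied inside the expectation: in general $\mathbb{E}[f(X)]$ is not the same as $f(\mathbb{E}[X])$.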
Subscript notation
Sometimes the distribution is written under the expectation symbol:

$$\mathbb{E}_{x \sim p}[f(x)]$$

This means:
- sample $x$ from the distribution $p$
- compute $f(x)$
- average the result over many samples
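The three steps above can be sketched as follows. The distribution (a standard normal), the function (squaring), the seed, and the sample count are assumptions picked so that the exact answer, 1, is known.

```python
import random

def f(x):
    return x * x  # hypothetical function whose expectation we want

rng = random.Random(0)  # fixed seed so the run is repeatable
num_samples = 200_000

# Step 1: sample x from the distribution p (here a standard normal).
samples = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
# Step 2: compute f(x) for every sample.
values = [f(x) for x in samples]
# Step 3: average the results.
estimate = sum(values) / num_samples

print(f"estimate of E[x^2]: {estimate:.3f}   (exact value is 1)")
```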
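For example:

$$\mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$$

means “the expected return when trajectories $\tau$ are sampled from the policy $\pi_\theta$.”

This notation appears often in Policy Gradient:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$$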
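To make the trajectory version concrete, here is a toy sketch. The environment is entirely hypothetical: a policy that picks action 1 with probability `theta` at every step and a reward of 1 for that action. It is chosen because the exact expected return is simply `theta * horizon`.

```python
import random

def sample_trajectory(theta, horizon, rng):
    """Sample one trajectory as a list of (action, reward) pairs from a toy policy."""
    trajectory = []
    for _ in range(horizon):
        action = 1 if rng.random() < theta else 0  # policy picks action 1 with probability theta
        reward = float(action)                     # toy reward: 1 for action 1, else 0
        trajectory.append((action, reward))
    return trajectory

def total_return(trajectory):
    """Return of a trajectory: the sum of rewards along it."""
    return sum(reward for _, reward in trajectory)

rng = random.Random(0)
theta, horizon, num_trajectories = 0.7, 10, 20_000
returns = [total_return(sample_trajectory(theta, horizon, rng))
           for _ in range(num_trajectories)]
estimated_return = sum(returns) / num_trajectories

print(f"estimated expected return: {estimated_return:.3f}   exact: {theta * horizon:.3f}")
```

Real policy-gradient code differs in many ways, but the pattern of sampling trajectories, scoring each one, and averaging is the same.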
Conditional expectation
Expectation can be conditioned on some information:

$$\mathbb{E}[X \mid Y = y]$$

Read this as “the expected value of $X$ given that $Y = y$.”

For example, in reinforcement learning:

$$V^\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$$

This means the value of a state is the expected future return given that the agent is currently in state $s$.
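Conditioning can be illustrated with an exact calculation on two dice. This sketch is an assumption-laden toy rather than an RL example: it computes the expected sum of two fair dice given the value of the first, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice: every (first, second) pair has probability 1/36.
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

def conditional_expectation_of_sum(first_value):
    """E[first + second | first = first_value], computed from the joint distribution."""
    matching = {pair: p for pair, p in joint.items() if pair[0] == first_value}
    prob_condition = sum(matching.values())  # P(first = first_value)
    return sum((a + b) * p for (a, b), p in matching.items()) / prob_condition

for first_value in range(1, 7):
    print(first_value, conditional_expectation_of_sum(first_value))
```

Each conditional expectation equals the known first roll plus 3.5, while the unconditional expected sum is 7. Conditioning restricts the average to the outcomes consistent with the given information.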
Linearity of expectation
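Expectation is linear:

$$\mathbb{E}[aX + bY] = a \, \mathbb{E}[X] + b \, \mathbb{E}[Y]$$

This is true even if $X$ and $Y$ are not independent.

Useful special cases:

$$\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$

$$\mathbb{E}[cX] = c \, \mathbb{E}[X]$$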
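Linearity can be verified exactly on a pair of variables that are as dependent as possible. In this sketch, which uses hypothetical names and arbitrary constants, $X$ is a die roll and $Y = 7 - X$ is completely determined by it.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)

def expect(f):
    """Expectation of f(X) for one roll X of a fair die."""
    return sum(f(x) * prob for x in faces)

# Y = 7 - X is fully determined by X, so X and Y are not independent.
e_x = expect(lambda x: x)
e_y = expect(lambda x: 7 - x)

a, b = 2, 3  # arbitrary constants
lhs = expect(lambda x: a * x + b * (7 - x))  # E[aX + bY]
rhs = a * e_x + b * e_y                      # a E[X] + b E[Y]

print(lhs, rhs, lhs == rhs)
```

Both sides come out to the same fraction, even though knowing $X$ tells you $Y$ exactly.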
Sample estimate
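In practice, expectations are often estimated by averaging samples:

$$\mathbb{E}_{x \sim p}[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

where each $x_i$ is a sample drawn from $p$.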
This is why many ML and RL objectives use expectations in theory but averages over minibatches or sampled trajectories in code.
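The minibatch case can be sketched with synthetic numbers. Everything here is an assumption for illustration: the "dataset" is a list of made-up per-example losses, and the expectation being estimated is the mean loss over that dataset.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical per-example losses standing in for a full dataset.
losses = [rng.uniform(0.0, 2.0) for _ in range(10_000)]
full_mean = sum(losses) / len(losses)

def minibatch_mean(batch_size):
    """Average the loss over one randomly drawn minibatch."""
    batch = rng.sample(losses, batch_size)
    return sum(batch) / batch_size

for batch_size in (8, 64, 512, 4096):
    estimate = minibatch_mean(batch_size)
    print(f"batch size {batch_size:>4}: estimate {estimate:.4f}  "
          f"error {abs(estimate - full_mean):.4f}")
```

Larger batches tend to give estimates closer to the full average, at a higher cost per step. Any single run can still show a small batch landing close by chance.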
Common readings
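| Notation | Read as |
|---|---|
| $\mathbb{E}[X]$ | expected value of $X$ |
| $\mathbb{E}_{x \sim p}[x]$ | expected value of $x$ when $x$ is sampled from $p$ |
| $\mathbb{E}[f(X)]$ | expected value of a function of $X$ |
| $\mathbb{E}[X \mid Y]$ | expected value of $X$ given $Y$ |
| $\frac{1}{N} \sum_{i=1}^{N} f(x_i)$ | empirical/sample average estimate of expectation |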
Summary
Expectation notation means taking an average with respect to a probability distribution.
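The key idea is:

$$\mathbb{E}_{x \sim p}[f(x)] = \text{the average of } f(x) \text{ when } x \text{ is drawn from } p$$

In ML and RL, this expectation is usually estimated by averaging over sampled data, such as minibatches or sampled trajectories.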