Random Variables #

A random variable is a way to attach numbers to outcomes of a random experiment.

It lets us move from “what happened?” to “what number should we analyse?”

Key takeaway: A random variable is a function from the sample space to real numbers. Once you define the random variable clearly, the rest (pmf/pdf/cdf, mean, variance) becomes systematic.


1) Definition #

Random variable: a rule that assigns a number to each outcome.

\[ X: S \to \mathbb{R} \]

Where:

  • \( S \) is the sample space
  • \( \mathbb{R} \) is the set of real numbers

2) Types of random variables #

Discrete random variable #

A discrete random variable can take a countable set of values (e.g., 0, 1, 2, 3, …).

Examples:

  • number of heads in 2 coin tosses
  • number of customers arriving in an hour
  • number of cars in a drive-thru queue

Continuous random variable #

A continuous random variable can take any value in an interval.

Examples:

  • time to finish a task
  • height of a person
  • temperature

Rule of thumb: Counts → usually discrete. Measurements → usually continuous.


3) Probability functions: pmf, pdf, cdf #

3.1 Discrete: probability mass function (pmf) #

The pmf gives the probability of each possible value.

\[ p(x)=P(X=x) \]

A valid pmf must satisfy:

\[ p(x)\ge 0 \quad \text{for all }x \]

\[ \sum_x p(x)=1 \]

Example: number of heads in 2 tosses #

Let \( X \) = number of heads. The outcomes are HH, HT, TH, TT, each with probability 1/4 for a fair coin.

So:

  • \( P(X=0)=1/4 \)
  • \( P(X=1)=2/4 \)
  • \( P(X=2)=1/4 \)
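As a quick check, the pmf above can be reproduced by enumerating the sample space. This is a minimal Python sketch that assumes a fair coin (so the four outcomes are equally likely); the names `outcomes`, `heads` and `pmf` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT.
outcomes = ["".join(t) for t in product("HT", repeat=2)]

def heads(outcome):
    # The random variable X: maps an outcome to its number of heads.
    return outcome.count("H")

# Assumption: fair coin, so every outcome has probability 1/4.
counts = Counter(heads(o) for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# Validity checks from the definition of a pmf.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1
```

Using `Fraction` keeps the probabilities exact, so the values match the hand calculation (with 2/4 reduced to 1/2).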

3.2 Continuous: probability density function (pdf) #

For a continuous random variable, the probability of any single value is zero: \( P(X=c)=0 \) for every \( c \). Probabilities are therefore assigned to intervals, as areas under the pdf \( f \):

\[ P(a\le X\le b)=\int_a^b f(x)\,dx \]

A valid pdf must satisfy:

\[ f(x)\ge 0 \quad \text{for all }x \]

\[ \int_{-\infty}^{\infty} f(x)\,dx=1 \]
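To make the interval idea concrete, here is a small numerical sketch. It assumes a hypothetical density \( f(x)=2x \) on \( 0\le x\le 1 \) (zero elsewhere), which is not from the notes above, and approximates the integrals with a simple midpoint rule.

```python
def f(x):
    # Hypothetical pdf used only for illustration: f(x) = 2x on [0, 1].
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(func, a, b, n=10_000):
    # Midpoint rule with n equal subintervals on [a, b].
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Total area under the pdf (f is zero outside [0, 1]); should be close to 1.
total = integrate(f, 0.0, 1.0)

# P(0.25 <= X <= 0.5); the exact value is 0.5^2 - 0.25^2 = 0.1875.
prob = integrate(f, 0.25, 0.5)
```

The same `integrate` helper works for any pdf you can write as a Python function, although a dedicated numerical library would be the more usual choice in practice.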

3.3 Cumulative distribution function (cdf) #

The cdf accumulates probability up to a point.

\[ F(x)=P(X\le x) \]

Discrete case: it is the running total of pmf values.

Continuous case: it is the area under the pdf to the left of x.

Discrete: the cdf is a step function that jumps at each possible value. Continuous: the cdf has no jumps, and it is smooth wherever the pdf is continuous.
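The "running total" description of a discrete cdf can be sketched as follows. The pmf is the two-toss example from Section 3.1; the function name `cdf` is just illustrative.

```python
from fractions import Fraction

# pmf of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    # F(x) = P(X <= x): add up the pmf over all values not exceeding x.
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# The cdf is flat between possible values and jumps at 0, 1 and 2.
steps = [cdf(x) for x in (-1, 0, 0.5, 1, 2, 10)]
```

Evaluating `cdf` between the possible values (for example at 0.5) returns the same result as at the previous possible value, which is the step shape described above.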


4) Mean (Expectation) and Variance #

4.1 Expectation #

Discrete:

\[ E(X)=\sum_x x\,p(x) \]

Continuous:

\[ E(X)=\int_{-\infty}^{\infty} x\,f(x)\,dx \]

Meaning: the long-run average value of X if you repeat the experiment many times.
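The long-run-average interpretation can be illustrated with a short simulation. This sketch assumes the two-toss example with a fair coin, and the seed and sample size are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact expectation from the pmf of X = number of heads in two tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
expected = sum(x * p for x, p in pmf.items())

# Simulate the experiment many times and average the observed values of X.
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
samples = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
average = sum(samples) / n
```

Here `expected` is exactly 1, and `average` should land close to 1, getting closer (on average) as `n` grows.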


4.2 Variance and standard deviation #

Variance measures spread around the mean \( \mu=E(X) \).

Discrete:

\[ V(X)=\sum_x (x-\mu)^2\,p(x) \]

Continuous:

\[ V(X)=\int_{-\infty}^{\infty} (x-\mu)^2\,f(x)\,dx \]

Shortcut (both discrete and continuous):

\[ V(X)=E(X^2)-[E(X)]^2 \]

Standard deviation:

\[ \sigma_X = \sqrt{V(X)} \]
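As a sanity check that the definition and the shortcut agree, here is a sketch using the two-toss pmf from Section 3.1 (fair coin assumed); the variable names are illustrative.

```python
import math
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())        # E(X)
e_x2 = sum(x * x * p for x, p in pmf.items())  # E(X^2)

# Variance from the definition: sum of (x - mu)^2 * p(x).
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut: E(X^2) - [E(X)]^2.
var_shortcut = e_x2 - mu ** 2

# Standard deviation is the square root of the variance.
sigma = math.sqrt(var_shortcut)
```

Both routes give a variance of 1/2 for this example, so the standard deviation is \( \sqrt{1/2}\approx 0.707 \).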

5) Rules to memorise #

Expected value rule (for constants \( a \) and \( b \)):

\[ E(aX+b)=aE(X)+b \]

Variance rule (adding a constant \( b \) shifts the distribution but does not change its spread):

\[ V(aX+b)=a^2V(X) \]
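Both rules can be checked numerically on a small example. This sketch reuses the two-toss pmf and picks \( a=3 \), \( b=2 \) arbitrarily; `mean` and `variance` are hypothetical helper names.

```python
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

pmf_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 3, 2  # arbitrary constants chosen for illustration

# pmf of Y = aX + b: each value x moves to a*x + b with the same probability.
# (With a != 0 the new values stay distinct, so no probabilities need merging.)
pmf_Y = {a * x + b: p for x, p in pmf_X.items()}

mean_rule_holds = mean(pmf_Y) == a * mean(pmf_X) + b
variance_rule_holds = variance(pmf_Y) == a ** 2 * variance(pmf_X)
```

Note that `b` appears in the mean of \( Y \) but drops out of its variance.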

6) Two random variables: joint, marginal, conditional #

6.1 Joint distribution (discrete) #

\[ p(x,y)=P(X=x,\,Y=y) \]

6.2 Marginal distributions #

Sum over the other variable:

\[ p_X(x)=\sum_y p(x,y) \]

\[ p_Y(y)=\sum_x p(x,y) \]

6.3 Conditional distribution #

\[ p_{Y\mid X}(y\mid x)=\frac{p(x,y)}{p_X(x)}\quad (p_X(x)>0) \]
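The three definitions in this section can be traced on a small joint table. The example below is an assumption made for illustration: two fair coin tosses, with \( X=1 \) if the first toss is a head (0 otherwise) and \( Y \) the total number of heads.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p(x, y); pairs not listed have probability 0.
quarter = Fraction(1, 4)
joint = {(0, 0): quarter, (0, 1): quarter, (1, 1): quarter, (1, 2): quarter}

# Marginals: sum the joint pmf over the other variable.
p_X = defaultdict(Fraction)
p_Y = defaultdict(Fraction)
for (x, y), p in joint.items():
    p_X[x] += p
    p_Y[y] += p

def conditional_Y_given_X(x):
    # p_{Y|X}(y | x) = p(x, y) / p_X(x), defined only when p_X(x) > 0.
    if p_X.get(x, 0) == 0:
        raise ValueError("p_X(x) must be positive")
    return {y: p / p_X[x] for (xx, y), p in joint.items() if xx == x}

given_first_head = conditional_Y_given_X(1)
```

Given that the first toss is a head, the total number of heads is 1 or 2 with probability 1/2 each, which is what `given_first_head` contains.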

7) Covariance (relationship between X and Y) #

Covariance measures whether two variables move together.

\[ \operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y) \]

  • Positive covariance: large X tends to come with large Y.
  • Negative covariance: large X tends to come with small Y.
  • Zero covariance: no linear relationship, although X and Y can still be dependent in a non-linear way.
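The last point is easy to miss, so here is a sketch of a pair with zero covariance that is clearly dependent. The example is an assumption chosen for illustration: \( X \) is uniform on \( \{-1,0,1\} \) and \( Y=X^2 \).

```python
from fractions import Fraction

def expectation(joint, g):
    # E[g(X, Y)] = sum over (x, y) of g(x, y) * p(x, y).
    return sum(g(x, y) * p for (x, y), p in joint.items())

def covariance(joint):
    e_x = expectation(joint, lambda x, y: x)
    e_y = expectation(joint, lambda x, y: y)
    e_xy = expectation(joint, lambda x, y: x * y)
    return e_xy - e_x * e_y

# Hypothetical joint pmf: X uniform on {-1, 0, 1}, Y = X^2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

cov = covariance(joint)

# Independence would require p(0, 0) = p_X(0) * p_Y(0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
independent_at_00 = joint[(0, 0)] == p_x0 * p_y0
```

Here `cov` is 0, yet \( p(0,0)=1/3 \) while \( p_X(0)\,p_Y(0)=1/9 \), so `independent_at_00` is `False`: \( Y \) is completely determined by \( X \) even though the covariance vanishes.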

8) Transformation of random variables (basic idea) #

Sometimes we define a new random variable: \( Y=g(X) \) .

Discrete: compute \( P(Y=y) \) by summing probabilities of all \( x \) values that map to y.
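A minimal sketch of the discrete recipe, assuming a hypothetical \( X \) that is uniform on \( \{-1,0,1\} \) and the transformation \( g(x)=x^2 \):

```python
from collections import defaultdict
from fractions import Fraction

third = Fraction(1, 3)
pmf_X = {-1: third, 0: third, 1: third}  # illustrative pmf, not from the notes

def g(x):
    return x * x

# P(Y = y) is the sum of p_X(x) over every x with g(x) = y.
pmf_Y = defaultdict(Fraction)
for x, p in pmf_X.items():
    pmf_Y[g(x)] += p
```

Both \( x=-1 \) and \( x=1 \) map to \( y=1 \), so their probabilities add: \( P(Y=1)=2/3 \) and \( P(Y=0)=1/3 \).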

Continuous (monotone g): use a change-of-variables approach.
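One standard form of the change-of-variables result, added here as a supplement and assuming \( g \) is strictly monotone and differentiable with inverse \( g^{-1} \):

\[ f_Y(y)=f_X\big(g^{-1}(y)\big)\,\left|\frac{d}{dy}\,g^{-1}(y)\right| \]

For example, if \( X \) has pdf \( f_X(x)=1 \) on \( 0\le x\le 1 \) and \( Y=2X \), then \( g^{-1}(y)=y/2 \), so \( f_Y(y)=1\cdot\tfrac{1}{2}=\tfrac{1}{2} \) on \( 0\le y\le 2 \). The density halves because the same total probability is spread over an interval twice as long.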

You will use transformations later for standardisation, log transforms, and reshaping data into distributions that are easier to work with.


Mini-check #

  1. If X is continuous, what is \( P(X=c) \) ?
  2. For a pmf, what must \( \sum_x p(x) \) equal?
  3. State the shortcut formula for variance.

Answers:

  1. 0
  2. 1
  3. \( V(X)=E(X^2)-[E(X)]^2 \)

References #

  • Devore: Ch. 3 (discrete random variables) and Ch. 4 (continuous random variables)
