\(
\newcommand{\blah}{blah-blah-blah}
\newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}}
\newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}}
\newcommand{\bb}[1]{\mathbf{#1}}
\newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}}
\newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)}
\newcommand{\defn}{\stackrel{\vartriangle}{=}}
\newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)}
\newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)}
\newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)}
\newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]}
\newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]}
\newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]}
\newcommand{\eql}{\;\; = \;\;}
\newcommand{\eqt}{& \;\; = \;\;&}
\newcommand{\dv}[2]{\frac{#1}{#2}}
\newcommand{\half}{\frac{1}{2}}
\newcommand{\mmod}{\!\!\! \mod}
\newcommand{\ops}[1]{\;\; #1 \;\;}
\newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;}
\newcommand{\prob}[1]{\mbox{Pr}\left[ #1 \right]}
\newcommand{\probm}[1]{\mbox{Pr}\left[ \mbox{#1} \right]}
\newcommand{\simm}[1]{\sim\mbox{#1}}
\definecolor{dkblue}{RGB}{0,0,120}
\definecolor{dkred}{RGB}{120,0,0}
\definecolor{dkgreen}{RGB}{0,120,0}
\)
Module 7: Probability and Simulation
7.1 A sample application
A queueing simulation:
- Experiments like coin flips are easy to simulate
→ The simulation code is itself simple.
- The subject of simulation is concerned with more complex systems
→ Here, the simulation code itself may be intricate.
- Queueing systems are classic examples in simulation
→ Many simulation principles can be learned from these systems.
- Consider a simple two-queue example:
- There are two service stations (kiosks).
Each station has a separate queue for customers.
- Customers arrive to the system from outside.
→ Their interarrival times are random.
- Each customer picks a queue to join and waits for service.
- The service time for each customer is random.
- Generally, we simulate for a purpose: to estimate something
- The average waiting time for a customer.
- Is it better for customers to select the shortest queue
or flip a coin to decide which queue to join?
Exercise 1:
Download, compile and execute QueueControl.java,
setting animate=true and
clicking "reset" followed by "go".
(You will also need RandTool.java)
- You can step through and observe the sequence of events.
- Use animate=false and estimate the average wait time
for customers.
- What queue-selection policy are customers using? Change the
policy to improve the waiting time.
Elements of a simulation:
- Methods for generating randomness
e.g., random interarrivals and random service times.
- How to track time
e.g., advance the clock from one key point to the next.
- How to store key aspects and ignore inessential ones
e.g., store waiting customers, but not departed customers.
- How to record statistics
e.g., collect wait times at customer departure moments.
- What data structures are useful
e.g., a priority queue
Exercise 2:
Consider the time spent in the system by each customer.
The simulation computes the average "system time."
- Is this a discrete or continuous quantity?
- Are the system times of two successive customers independent?
Before learning how to write simulations, we will need to
understand a few core concepts in probability:
- Random variables and distributions.
- Expectation and laws of large numbers.
- Dependence, joint and conditional distributions.
- Statistics: how to estimate quantities, and how to know how many samples you need.
7.2 Random variables
A random variable is an observable numeric quantity from
an experiment:
- Coin flip example:
- "Heads" and "Tails" are not numeric, but 1 and 0 are.
- For multiple coin flips, one can count the number of heads.
- More coin-flip related random variables:
- How many flips does it take to get two heads in a row?
Random variable: number of such flips
- How many heads in 10 coin flips?
Random variable: number of heads
- Dice example:
- Sum of numbers that show up in two rolls
Random variable: the sum
- Bus-stop example:
- How long before the next bus arrives?
Random variable: actual waiting time
- How many buses arrive in a 10-minute period?
Random variable: number of such buses.
Exercise 3:
Identify some random variables of
interest in the Queueing example.
Range of a random variable: the possible values it can take
- Example: number of heads in 10 flips
\(\Rightarrow\) range = \(\{0,1,2,...,10\}\)
- Example: sum of two dice
\(\Rightarrow\) range = \(\{2,3,...,12\}\)
Exercise 4:
What are the ranges for these random variables?
- The number of buses that arrive in a 10-minute period.
- The waiting time for the next bus to arrive.
Notation
- Sometimes we use the short form rv = random variable.
- Typically, we use capital letters to denote rv's:
- Let \(X = \) number of heads in 10 coin flips.
- Let \(Y = \) the waiting time for the next bus to arrive.
Let's look at the \(n\)-coin flip example in more detail:
- Let \(X = \) number of heads in \(n\) coin flips.
- Suppose \(\probm{heads} = 0.6\) for a single flip.
- Consider the events when \(n=3\) (3 flips)
Outcome | \(\probm{outcome}\) | value of \(X\) |
\(HHH\) | 0.6 * 0.6 * 0.6 | \(X = 3\) |
\(HHT\) | 0.6 * 0.6 * 0.4 | \(X = 2\) |
\(HTH\) | 0.6 * 0.4 * 0.6 | \(X = 2\) |
\(HTT\) | 0.6 * 0.4 * 0.4 | \(X = 1\) |
\(THH\) | 0.4 * 0.6 * 0.6 | \(X = 2\) |
\(THT\) | 0.4 * 0.6 * 0.4 | \(X = 1\) |
\(TTH\) | 0.4 * 0.4 * 0.6 | \(X = 1\) |
\(TTT\) | 0.4 * 0.4 * 0.4 | \(X = 0\) |
- First, note:
- We've assumed independence between coin flips
e.g., \(\probm{HHT}\) = Pr[1st is heads] * Pr[2nd is heads] * Pr[3rd is tails]
- Each individual outcome results in an observable numeric value for the rv of interest (\(X\)).
- Notice the following
- The range of \(X\) is \(X \in \{0,1,2,3\}\).
- \(\prob{X = 3} \eql \probm{HHH} \eql 0.6 * 0.6 * 0.6 \eql 0.216\)
- \(\prob{X = 0} \eql \probm{TTT} \eql 0.4 * 0.4 * 0.4 \eql 0.064\)
- The two remaining probabilities of interest are:
\(\prob{X=1}\) and \(\prob{X=2}\)
- \(\prob{X=1} \eql \probm{HTT} + \probm{THT} + \probm{TTH} \eql 3 * 0.4 * 0.6 * 0.4 \eql 0.288\)
- \(\prob{X=2} \eql \probm{HHT} + \probm{HTH} + \probm{THH} \eql 3 * 0.6 * 0.6 * 0.4 \eql 0.432\)
- We've made use of Axiom 3 (disjoint events).
- Thus, this list is complete for \(X\):
\(\prob{X=0} \eql 0.064\)
\(\prob{X=1} \eql 0.288\)
\(\prob{X=2} \eql 0.432\)
\(\prob{X=3} \eql 0.216\)
- Clearly,
\(\prob{X=0} + \prob{X=1} + \prob{X=2} + \prob{X=3} \eql 1\)
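- As a sanity check, this whole pmf can be computed by brute-force enumeration of
the eight outcomes. A minimal self-contained sketch (not part of the course code;
the class name is ours):
// Sketch: compute the pmf of X = #heads in 3 flips by enumerating
// all 2^3 outcomes. Assumes Pr[heads] = 0.6.
public class ThreeFlipPmf {
    public static void main (String[] args) {
        double p = 0.6;
        double[] pmf = new double [4];                  // Pr[X=0..3].
        for (int outcome = 0; outcome < 8; outcome++) {
            double prob = 1;
            int numHeads = 0;
            for (int flip = 0; flip < 3; flip++) {
                if ( ((outcome >> flip) & 1) == 1 ) {   // This flip is heads.
                    prob *= p;
                    numHeads ++;
                }
                else {                                  // This flip is tails.
                    prob *= (1 - p);
                }
            }
            pmf[numHeads] += prob;                      // Axiom 3: disjoint outcomes add.
        }
        for (int k = 0; k <= 3; k++) {
            System.out.println ("Pr[X=" + k + "] = " + pmf[k]);
        }
    }
}
Running it prints 0.064, 0.288, 0.432 and 0.216, matching the list above.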
Exercise 5:
What is \(\prob{X \leq 2}\) above? Write down \(\prob{X \leq k}\)
for all relevant values of \(k\).
Next, suppose \(\probm{heads} = p\) in the above example. Write
down \(\prob{X=i}, i\in \{0,1,2,3\},\) in terms of \(p\).
Discrete vs. continuous random variables:
- Discrete: when the range is finite or countably infinite.
- Continuous: when the range is an interval (or union of intervals) of real numbers.
- To specify probabilities for a discrete rv \(X\), we need to
specify \(\prob{X=k}\) for every possible value \(k\).
- For continuous rv's it's a little more complicated
\(\Rightarrow\) We will get to this later.
- This specification of probabilities is called a distribution.
7.3 Discrete distributions
Consider the following rv and distribution:
- Random variable \(X\) has range \(\{0,1\}\).
- Thus, one needs to specify
\(\prob{X=0} = ?\)
\(\prob{X=1} = ?\)
- For example,
\(\prob{X=0} = 0.4\)
\(\prob{X=1} = 0.6\)
- One application is coin tossing: \(X=1\) (heads) or
\(X=0\) (tails).
- This type of rv can be used for any success-failure type of
experiment where \(0=\mbox{failure}\) and \(1=\mbox{success}\).
- In general, we write this type of distribution as
\(\prob{X=0} = 1-p\)
\(\prob{X=1} = p\)
where \(p\) is the success probability.
- This type of rv is called a Bernoulli rv.
- The associated distribution is called a Bernoulli distribution.
- Here \(p\) is called a parameter of the distribution.
\(\Rightarrow\) In a particular problem, \(p\) would have
some value like \(p=0.6\).
- Note: \(\prob{X=0} + \prob{X=1} = 1\), as we would expect.
To summarize:
- A discrete rv \(X\) takes on values in some set \(A\)
where \(A\) is finite or countably infinite.
- Probabilities are specified by giving \(\prob{X=k}\) for
each \(k \in A\).
- This list of probabilities is called the distribution of \(X\).
- Because some distributions are used for multiple purposes,
we often parameterize them to study general properties.
It is now possible to work with distributions in the abstract
without specifying an experiment.
- For the distribution to be valid, those probabilities should
sum to 1.
$$
\sum_{k} \prob{X=k} \eql 1
$$
Exercise 6:
What would be an example of a discrete rv
in the Queueing application?
An example of a distribution (in the abstract):
- Consider a rv \(X\) with range \(\{1,2,...\}\).
- Suppose the distribution of \(X\) is given by
$$
\prob{X=k} \eql (1-p)^{k-1} p
$$
where \(p\) is a parameter such that \(0 \lt p \lt 1\).
- Here are a few sample values:
\(\prob{X=1} = p\)
\(\prob{X=2} = (1-p) p\)
\(\prob{X=3} = (1-p)^2 p\)
\(\vdots\)
- This distribution is called the Geometric distribution.
- Note: this distribution has one parameter (\(p\)).
- We use the short form
\(X \sim \mbox{Geometric}(p)\)
to mean
"rv \(X\) has the Geometric distribution with parameter \(p\)"
- One application: coin flips
Let \(X = \) the number of flips needed to get the first heads.
\(\Rightarrow\) \(\prob{X=k} = \probm{k-1 tails in a row, then heads}\)
\(\Rightarrow\) \(\prob{X=k} = \probm{tails} \probm{tails}
\mbox{... (k-1) times ... } \probm{tails} \probm{heads}\)
\(\Rightarrow\)
\(\prob{X=k} = (1-p)^{k-1} p\)
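To see this pmf emerge from repeated trials, here is a minimal self-contained
sketch (it uses java.util.Random directly; the course's Coin class would work
equally well, and the class name below is ours):
import java.util.Random;

// Sketch: estimate the Geometric(p) pmf by simulating flips until the
// first heads, then compare against the formula (1-p)^(k-1) p.
public class GeometricCheck {
    public static void main (String[] args) {
        double p = 0.6;
        int numTrials = 1000000;
        Random rand = new Random ();
        int[] counts = new int [20];
        for (int t = 0; t < numTrials; t++) {
            int k = 1;                          // Number of flips used.
            while (rand.nextDouble () >= p) {   // Tails: flip again.
                k ++;
            }
            if (k < counts.length) counts[k] ++;
        }
        for (int k = 1; k <= 5; k++) {
            double estimate = (double) counts[k] / numTrials;
            double exact = Math.pow (1-p, k-1) * p;
            System.out.println ("k=" + k + "  estimate=" + estimate + "  exact=" + exact);
        }
    }
}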
Exercise 7:
Verify by hand that
\(\sum_k \prob{X=k} = 1\)
for the above distribution.
Exercise 8:
If \(p=0.6\), what is the probability that the first heads appears
on the 3rd flip? Verify your answer using Coin.java
and CoinExample.java.
Exercise 9:
Suppose I compare two parameter values for the Geometric distribution:
\(p=0.6\) and \(p=0.8\).
For which of the two values of \(p\) is \(\prob{X=3}\) higher?
Exercise 10:
Compute (by hand) \(\prob{X\gt k}\) when \(X\sim\mbox{Geometric}(p)\).
The discrete uniform distribution:
- A rv \(X\) has the discrete uniform distribution if:
- It has a finite range, i.e., \(X \in A\) where
\(|A|\) is finite.
- And, for each \(k \in A\), $$\prob{X=k} \eql \frac{1}{|A|}$$
- Example: die roll
- Range is finite: \(X \in \{1,2,...,6\}\).
- \(\prob{X=k} = \frac{1}{6}\).
- Example: card drawing.
- Range is finite: \(X \in \{0,1,2,...,51\}\).
- \(\prob{X=k} = \frac{1}{52}\).
- Quite often, useful discrete-uniform rv's have a range
of \(\{0,1,...,m-1\}\) or \(\{1,2,...,m\}\).
In the latter case, we write \(X\sim\mbox{Discrete-uniform}(1,m)\).
The Binomial distribution:
- A rv \(X\) has the Binomial distribution with parameters
\(n\) and \(p\) (two parameters) if:
- \(X\) has range \(\{0,1,...,n\}\).
- And
$$ \prob{X=k} \eql \nchoose{n}{k} p^k (1-p)^{n-k} $$
- Short form: \(X\simm{Binomial}(n,p)\).
- Example: coin flips
- A coin with \(\prob{H} = p\) is flipped \(n\) times.
- \(X\) counts the number of heads.
- To see why this is appropriate for coin flips:
- Suppose we want to compute \(\prob{X=k}\)
\(\Rightarrow\) Probability of observing exactly \(k\) heads.
- Think of the \(n\) flips as a string over the alphabet \(\{H,T\}\).
e.g. HHTTH = 3 heads in 5 flips
- Now consider \(k\) heads in \(n\) flips:
- The probability of getting exactly \(k\) heads and
exactly \((n-k)\) tails is:
\(p^k (1-p)^{n-k}\)
- But there are \(\nchoose{n}{k}\) ("n choose k") different
ways of arranging the \(k\) heads.
\(\Rightarrow\) Each is a possible outcome
\(\Rightarrow\)
\(\prob{X=k} \eql \nchoose{n}{k} p^k (1-p)^{n-k} \)
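- As a quick numerical check, the case \(n=3, p=0.6, k=2\) reproduces the value
computed earlier by listing outcomes:
$$
\prob{X=2} \eql \nchoose{3}{2} (0.6)^2 (0.4)^1 \eql 3 \times 0.144 \eql 0.432
$$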
Exercise 11:
Suppose we flip a coin \(n\) times and count the number of heads using
a coin for which \(\prob{H} = p\).
- Write code to compute \(\prob{X=k}\) using the formula
\(\prob{X=k} \eql \nchoose{n}{k} p^k (1-p)^{n-k} \)
Write your code in Binomial.java.
- Plot a graph of \(\prob{X=k}\) vs. \(k\) for the case
\(n=10, p=0.6\) and for the case \(n=10, p=0.2\).
- Write a simulation to estimate \(\prob{X=3}\) when \(n=10, p=0.6\).
You can use Coin.java
and CoinExample2.java for
this purpose. Verify the estimate using the earlier formula.
Exercise 12:
For the above experiment, verify that the sum of probabilities
is 1. How would you prove this analytically for the Binomial distribution?
The Poisson distribution:
- \(X\simm{Poisson}(\gamma)\) if:
- \(X\) has range \(\{0,1,2,...\}\)
- And
$$\prob{X=k} \eql e^{-\gamma}\frac{\gamma^k}{k!}$$
- Thus, the Poisson rv counts zero or more occurrences when the
number of possible occurrences can be arbitrarily large.
\(\Rightarrow\) Unlike the number of heads in a fixed number of flips.
- Note: this distribution has only one parameter (\(\gamma\)).
- An application? It turns out that "the number of buses that
arrive in a fixed interval" is a good application.
\(\Rightarrow\) Why this is so is not trivial to show.
Exercise 13:
Add code to Poisson.java to
compute \(\prob{X=k}\) and plot a graph of \(\prob{X=k}\) vs. \(k\)
when \(\gamma=2\).
Use the Taylor series for \(e^x\) to
prove that \(\sum_k \prob{X=k}\) adds up to 1.
Exercise 14:
Download BusStop.java and
BusStopExample3.java,
and modify the latter to estimate the probability that exactly three
buses arrive during the interval \([0,2]\).
Compare this with \(\prob{X=3}\) when \(X\simm{Poisson}(2)\).
7.4 The cumulative distribution function (CDF)
Consider \(X\simm{Binomial}(n,p)\):
- We know that \(X\)'s range is \(\{0,1,...,n\}\) and that
the distribution specifies \(\prob{X=k}\).
- Now consider the probability \(\prob{X \leq y}\) for some
number \(y\).
- Example: \(\prob{X \leq 5}\)
- In this case:
$$
\prob{X \leq 5} \eql \prob{X=0} + \prob{X=1} + \ldots +
\prob{X=5}
$$
Exercise 15:
Plot \(\prob{X \leq k}\) vs \(k\) for \(X\simm{Binomial}(10,0.6)\).
- Next, observe that
$$\eqb{
\prob{X \leq 5} & \eql & \prob{X=0} + \prob{X=1} +
\prob{X=2} \\
& & + \prob{X=3} + \prob{X=4} + \prob{X=5} \\
& \eql & \prob{X \leq 3} \;\; + \;\; \prob{3 \lt X \leq 5}
}$$
- In general, for \(m \lt k\)
\(\prob{X \leq k} = \prob{X \leq m} + \prob{m \lt X \leq k}\)
- Now, consider \(\prob{X \leq 5.32}\) when \(X\simm{Binomial}(n,p)\).
Does this make sense?
- It can if we write
$$\prob{X \leq 5.32} \eql \prob{X \leq 5} + \prob{5 \lt X
\leq 5.32}$$
- Obviously, \(\prob{5 \lt X \leq 5.32} = 0\)
\(\Rightarrow\) \(\prob{X \leq 5.32} = \prob{X \leq 5}\).
- In this way, \(\prob{X \leq y}\) is well defined for any
value of \(y\).
- Since it is defined for every \(y\), it is a
function of \(y\):
\(F(y) \eql \prob{X \leq y}\)
\(\Rightarrow\) called the Cumulative Distribution Function (CDF)
of rv \(X\)
- CDFs are usually defined for all possible \(y\):
\(y \in (-\infty, +\infty)\)
- Important: When we write \(F(y)\) do NOT think of
\(y\) as the "y" axis. Instead: it's just an argument (input) to the
function \(F\). In fact, when we plot \(F(y)\) vs \(y\), then \(y\)
will be on the horizontal axis, while \(F(y)\) will be on the vertical axis.
Exercise 16:
Plot (by hand, on paper) the CDF \(F(y)\) for \(X\simm{Bernoulli}(0.6)\).
Next, consider the sequence of real numbers
\(y_n = (1 - \frac{1}{n})\)
and the sequence
\(z_n = (1 + \frac{1}{n})\)
Compute by hand or by programming the following limits:
- \(\lim_{n\to\infty} y_n\)
- \(\lim_{n\to\infty} z_n\)
- \(\lim_{n\to\infty} F(y_n)\)
- \(\lim_{n\to\infty} F(z_n)\)
What do you notice?
About the CDF for discrete rv's:
- The CDF is right-continuous
\(\Rightarrow\) Approaching \(y\) from the right, \(F \to F(y)\); approaching
from the left may give a smaller value (at a jump).
- The CDF \(F(y)\) never decreases with increasing \(y\).
Exercise 17:
Why is this true?
Exercise 18:
Suppose rv \(X\) denotes the outcome (\(\{1,2,...,6\}\)) of
a die roll. Draw by hand the CDF.
Exercise 19:
Consider the distribution for the 3-coin-flip example:
\(\prob{X=0} = 0.064\)
\(\prob{X=1} = 0.288\)
\(\prob{X=2} = 0.432\)
\(\prob{X=3} = 0.216\)
Sketch the CDF on paper.
Exercise 20:
For the two CDFs we've seen so far (Bernoulli, Discrete-Uniform), is
it possible to differentiate the function \(F(y)\)? If so, what
is the derivative for the Bernoulli case?
Consider a CDF \(F(y)\) of a rv \(X\):
Exercise 21:
Suppose \(X\simm{Geometric}(p)\).
- Argue that \(\prob{X\gt k} = (1-p)^k\).
- Then write down the CDF.
- Write down an expression for \(\prob{3.11 \lt X \leq 5.43}\) in
terms of the CDF.
7.5 Continuous random variables
Exercise 22:
Recall the Bus-stop example. Suppose rv \(X\) denotes
the first interarrival time. Download
BusStop.java,
RandTool.java,
and
BusStopExample.java.
Then modify the latter to estimate the following:
- \(\prob{X \lt 1}\)
- \(\prob{X \lt 1.1}\)
- \(\prob{X=1}\)
Exercise 23:
What is an example of a continuous rv associated
with the QueueControl.java
application?
The difficulty with continuous rv's:
- First, for a continuous rv \(X\), \(\prob{X=x} = 0\) for each individual value \(x\).
\(\Rightarrow\) This means we can't specify the distribution using a list.
- Second, it's not clear how we can get probabilities to sum
to 1.
e.g., for a rv that takes any real value in \([0,1]\), how
do we sum probabilities?
- Fortunately, the CDF offers a way out.
Let's write some code to estimate a CDF:
- First, identify the range \([a,b]\) of interest.
- Then, break this up into \(M\) small intervals:
\([a,a+\delta ], [a+\delta ,a+2\delta ], \ldots, [a+(M-1)\delta ,a+M\delta ]\)
- Maintain a counter intervalCount[i] for each interval.
- For any random value \(x\), identify the interval
\(j = \lfloor \frac{x-a}{\delta} \rfloor \)
and increment the count of every interval \(i \geq j\).
- Pseudocode:
Algorithm: cdfEstimation (M, a, b)
Input: number of intervals M, range [a,b]
1. δ = (b-a) / M
2. for n=1 to numTrials
3. x = generateRV ()
4. j = floor ( (x-a) / δ )
5. for i=j to M-1
6. intervalCount[i] = intervalCount[i] + 1
7. endfor
8. endfor
// Now build and print the CDF
9. for i=0 to M-1
10. midPoint = a + iδ + δ/2 // Interval: [a+iδ,a+(i+1)δ]
11. print (midPoint, intervalCount[i] / numTrials)
12. endfor
- Sample code:
(source file)
static Function makeUniformCDF ()
{
double a = 0, b = 1;
int M = 50; // Number of intervals.
double delta = (b-a) / M; // Interval size.
double[] intervalCounts = new double [M];
double numTrials = 1000000;
for (int t=0; t<numTrials; t++) {
// Random sample:
double y = RandTool.uniform ();
// Find the right interval:
int k = (int) Math.floor ((y-a) / delta);
// Increment the count for every interval above and including k.
for (int i=k; i<M; i++) {
intervalCounts[i] ++;
}
}
// Now compute probabilities for each interval.
double[] cdf = new double [M];
for (int k=0; k<M; k++) {
cdf[k] = intervalCounts[k] / numTrials;
}
// Build the CDF. Use mid-point of each interval.
Function F = new Function ("Uniform cdf");
for (int k=0; k<M; k++) {
double midPoint = a + k * delta + delta/2;
F.add (midPoint, cdf[k]);
}
return F;
}
Exercise 24:
Download and execute UniformCDF.java.
You will also
need Function.java,
RandTool.java,
and
SimplePlotPanel.java.
Modify the code to print \(F(1)-F(0.75)\).
Getting probabilities out of a CDF:
- Recall that for any rv \(X\) with CDF \(F\):
\(\prob{y_1 \lt X \leq y_2}
= F(y_2) - F(y_1)\)
- Thus, for example, if \(X\) has the uniform CDF
\(\prob{0.75 \lt X \leq 1.0}
= F(1) - F(0.75)\)
- Important: the CDF gets you the probability that
\(X\)'s outcome lands in any particular interval of interest:
\(\probm{X falls in any interval (a,b]} = F(b) - F(a)\)
Exercise 25:
The program GaussianCDF.java
estimates the CDF of a Gaussian rv. Execute the program
to plot the CDF. Then, use this CDF to compute the following probabilities:
- \(\prob{0 \lt X \leq 2}\).
- \(\prob{X \gt 0}\).
A small technicality that goes to the heart of a continuous rv:
Exercise 26:
Modify UniformCDF.java
and GaussianCDF.java
to compute the derivative of each. What is the shape of \(F'(y)\)
in each case?
Exercise 27:
If \(X\) denotes the first interarrival time in the bus-stop
problem, estimate the CDF of \(X\) as follows:
- Assume that values fall in the range \([0,3]\) (i.e.,
disregard values outside this range).
- Use ExponentialCDF.java
as a template, and add modified code from
UniformCDF.java
- Next, compute the derivative of this function and
display it.
About CDFs and derivatives:
- The concept of CDF applies to both discrete and continuous rv's.
- The CDF of a discrete rv is not differentiable (it jumps at each value in the range).
- For continuous rv's, the derivative is called a
probability density function (pdf).
- The pdf has two uses:
- It's useful in some calculations as we'll see below.
- Most books use the pdf
\(\Rightarrow\) You need to be comfortable with the pdf to read them.
- For completeness, note that the list of probabilities for
a discrete rv is called a probability mass function
(pmf), e.g.
\(\prob{X=0} = 0.064\)
\(\prob{X=1} = 0.288\)
\(\prob{X=2} = 0.432\)
\(\prob{X=3} = 0.216\)
- Customarily, a discrete rv's distribution is specified by
giving its pmf.
\(\Rightarrow\) The CDF is not really useful for discrete rv's
- Customarily, a continuous rv's distribution is specified by
giving its pdf.
\(\Rightarrow\) The CDF is equivalent (because you can take its derivative to get the pdf)
- To summarize:
- Discrete rv's use a pmf.
- Continuous rv's use a pdf.
- Both use a CDF.
7.6 Expectation - the discrete case
The expected value of a rv is a calculation
performed on its distribution:
- If \(X\) is a discrete rv with range \(A\), then
$$
\mbox{Expected value of }X \eql \sum_{k\in A} k \; \prob{X=k}
$$
- We write this as
$$
E[X] \eql \sum_{k\in A} k \; \prob{X=k}
$$
- Note: this is merely a calculation, and (so far) has nothing
to do with the English word "expect".
- \(E[X]\) is a number (not a function).
- Example: recall the 3-coin-flip example where \(X\) =
number-of-heads (we'll use \(\prob{H}=0.6\)).
- The range: \(A = \{0,1,2,3\}\).
- The pmf:
\(\prob{X=0} = 0.064\)
\(\prob{X=1} = 0.288\)
\(\prob{X=2} = 0.432\)
\(\prob{X=3} = 0.216\)
- Thus,
$$\eqb{
E[X] & \eql & \sum_{k\in \{0,1,2,3\}} k \; \prob{X=k} \\
& \eql &
0 \; \prob{X=0} + 1 \; \prob{X=1} + 2 \; \prob{X=2} +
3\; \prob{X=3} \\
& \eql &
0*0.064 + 1*0.288 + 2*0.432 + 3*0.216
}$$
Exercise 28:
Complete the calculation above.
What would you get if \(\prob{H}=0.5\)?
A few more examples:
- \(X \simm{Bernoulli}(p)\)
- Recall the pmf:
\(\prob{X=0} = 1-p\)
\(\prob{X=1} = p\)
- Thus,
$$\eqb{
E[X] & \eql & 0 \; \prob{X=0} + 1\; \prob{X=1}\\
& \eql & 0*(1-p) + 1*p = p
}$$
- In particular, for a fair coin \(p=0.5\) and so \(E[X]=0.5\).
- \(X\simm{Geometric}(p)\)
- The range: \(X\in \{1,2,\ldots \}\)
- The pmf:
\(\prob{X=k} = (1-p)^{k-1} p\)
- Thus,
$$\eqb{
E[X] & \eql & \sum_{k=1}^\infty k \; \prob{X=k} \\
& \eql & \sum_{k=1}^\infty k (1-p)^{k-1} p \\
& \eql & \frac{1}{p}
}$$
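(The last step follows by differentiating the geometric series: for
\(|q| \lt 1\), \(\sum_{k=0}^\infty q^k = \frac{1}{1-q}\), so
\(\sum_{k=1}^\infty k q^{k-1} = \frac{1}{(1-q)^2}\); with \(q=1-p\),
the sum becomes \(p \cdot \frac{1}{p^2} = \frac{1}{p}\).)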
- \(X \simm{Binomial}(n,p)\)
- The range: \(X \in \{0,1,2,...,n\}\)
- The pmf:
\(\prob{X=k} \eql \nchoose{n}{k} p^k (1-p)^{n-k} \)
- Thus,
$$\eqb{
E[X] & \eql & \sum_{k=0}^{n} k\; \prob{X=k} \\
& \eql & \sum_{k=0}^{n} k \nchoose{n}{k} p^k (1-p)^{n-k} \\
& \eql & np
}$$
(The last step is not at all obvious.)
Exercise 29:
How does this relate to the 3-coin-flip example?
- \(X\simm{Discrete-uniform}(1,m)\)
- The range: \(X\in \{1,2,...,m\}\)
- The pmf:
\(\prob{X=k} = \frac{1}{m}\)
- Thus,
$$\eqb{
E[X] & \eql & \sum_{k=1}^m k \; \prob{X=k} \\
& \eql & \sum_{k=1}^m k \frac{1}{m} \\
& \vdots & \\
& \eql & \frac{m+1}{2}
}$$
- \(X\simm{Poisson}(\gamma)\)
- The range: \(X\in \{0,1,2,...\}\)
- The pmf:
\(\prob{X=k} \eql e^{-\gamma}\frac{\gamma^k}{k!}\)
- Thus,
$$\eqb{
E[X] & \eql & \sum_{k=0}^\infty k \; \prob{X=k} \\
& \eql & \sum_{k=0}^\infty k e^{-\gamma}\frac{\gamma^k}{k!}\\
& \vdots & \\
& \eql & \gamma
}$$
Exercise 30:
Fill in the steps for the last two expectations (discrete-uniform, Poisson).
What is the meaning of this expected value?
- First, let's return to the 3-coin-flip example where
\(\prob{X=0} = 0.064\)
\(\prob{X=1} = 0.288\)
\(\prob{X=2} = 0.432\)
\(\prob{X=3} = 0.216\)
- Now suppose we perform \(n\) trials
Let \(n_0 = \) number of occurrences of \(0\)
in the trials.
Similarly, let \(n_1 = \) number of occurrences of \(1\)
in the trials.
... etc
- Next, let
\(S_n = \) the sum of values obtained.
- Notice that
\(\frac{1}{n} S_n = \) average of values obtained.
- For example, suppose we ran some trials and obtained
\(3, 1, 1, 0, 2, 0, 1, 2, 2, 2\)
In this case
\(n_0 = 2\)
\(n_1 = 3\)
\(n_2 = 4\)
\(n_3 = 1\)
\(S_n = 14\)
This is evident, if we group the values:
0, 0, 1, 1, 1, 2, 2, 2, 2, 3
- Next, notice that
\(S_n \eql 0 + 1*3 + 2*4 + 3*1\)
\(= 0*n_0 + 1*n_1 + 2*n_2 + 3*n_3 \)
\(= \sum_k k\; n_k\)
- Now consider the limit
$$\eqb{
\lim_{n\to\infty} \frac{1}{n} S_n
\eqt
\lim_{n\to\infty} \frac{1}{n} \sum_{k} k n_k \\
\eqt
\lim_{n\to\infty} \sum_{k} k\; \frac{n_k}{n} \\
}$$
Exercise 31:
What does \(\frac{n_k}{n}\) become in the limit?
Unfold the sum for the 3-coin-flip example to see why this is true.
- Thus,
$$\eqb{
\lim_{n\to\infty} \frac{1}{n} S_n
\eqt
\sum_{k} k\; \prob{X=k} \\
\eqt
E[X]
}$$
- Thus, even though \(E[X]\) is a calculation done with the
distribution, it turns out to be the average (mean) outcome in the
long run (limit).
(provided of course we believe \(\frac{n_k}{n}\to \prob{X=k}\)).
Exercise 32:
Download Coin.java and
CoinExample3.java
and let \(X=\) the number of heads in 3 coin flips.
- Compute the average value of \(X\) using
\(\frac{1}{n} S_n\)
- Estimate \(\prob{X=k}\) using \(\frac{n_k}{n}\).
- Compute \(\sum_{k} k\; \frac{n_k}{n}\) using the estimates
of \(\frac{n_k}{n}\).
Compare with the \(E[X]\) calculation you made earlier.
Exercise 33:
Use Coin.java and
CoinExample4.java
and let \(X=\) the number of flips needed to get
the first heads when \(\prob{Heads}=0.1\).
Compute the average value of \(X\) using
\(\frac{1}{n} S_n\)
as you did in the previous exercise.
Compare with the \(E[X]\) calculation from earlier.
7.7 Expectation - the continuous case
There are two technical difficulties with the continuous case:
- Before explaining, recall the calculation for the
discrete case:
\(E[X] \eql \sum_{k\in A} k \prob{X=k}\)
- One problem with the continuous case: \(\prob{X=k} = 0\), for
any outcome \(k\).
- Another problem: we don't know how to do uncountable sums.
Let's first try an approximation:
- We will try to discretize and apply the discrete definition.
- Suppose we have a continuous rv with range \([a,b]\).
- We'll divide \([a,b]\) into \(M\) intervals
\([a,a+\delta ], [a+\delta ,a+2\delta ], ..., [a+(M-1)\delta ,a+M\delta ]\)
- For convenience, we'll write
\(x_i = a+i\delta \)
- Then, the intervals become
\([x_0,x_1],
[x_1,x_2], ...,
[x_{M-1}, x_M]\)
- We can compute \(\probm{X in i-th interval}\) as follows:
\(\prob{X \in [x_i,x_{i+1}]} = F(x_{i+1}) - F(x_i)\)
- A representative point in the interval would be the
mid-point \(m_i = (x_i + x_{i+1})/2\)
- Thus, we could define
\(E[X] \eql \sum_{i\lt M} m_i (F(x_{i+1}) - F(x_i)) \)
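Here is a minimal self-contained sketch of this computation for an Exponential
rv, using its closed-form CDF \(F(x) = 1 - e^{-\gamma x}\) (introduced in
Section 7.9) instead of an estimated one; truncating the range to \([0,3]\) is
an assumption that introduces a small error (the class name is ours):
// Sketch: approximate E[X] via sum_i m_i (F(x_{i+1}) - F(x_i)) using the
// closed-form Exponential CDF F(x) = 1 - e^{-gamma x} (see Section 7.9).
// Truncating the range to [a,b] = [0,3] is an assumption; the true range
// is [0, infinity), so a small tail is ignored.
public class DiscretizedExpectation {
    static double gamma = 4;

    static double F (double x) {
        return (x <= 0) ? 0 : 1 - Math.exp (-gamma * x);
    }

    public static void main (String[] args) {
        double a = 0, b = 3;
        int M = 1000;                                    // Number of intervals.
        double delta = (b - a) / M;
        double expectation = 0;
        for (int i = 0; i < M; i++) {
            double xi = a + i * delta;
            double mi = xi + delta / 2;                  // Interval mid-point.
            expectation += mi * (F (xi + delta) - F (xi));
        }
        // Should be close to 1/gamma = 0.25.
        System.out.println ("Approximate E[X] = " + expectation);
    }
}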
Exercise 34:
Try this computation with the uniform, Gaussian and exponential
distributions
using
UniformCDF2.java,
GaussianCDF2.java, and
ExponentialCDF2.java.
Explore what happens when more intervals are used in the
expectation computation than in the CDF estimation.
Now for an important insight:
- Let's return to our working definition of \(E[X]\) for a continuous rv:
$$
E[X] \eql \sum_{i\lt M}\; m_i (F(x_{i+1}) - F(x_i))
$$
- Multiply and divide by the interval length \(\delta \)
$$
E[X] \eql \sum_{i\lt M} \; m_i \frac{(F(x_{i+1}) -
F(x_i))}{\delta} \delta
$$
- Also, recall that
\(
\delta \eql (x_{i+1} - x_i)
\)
Or
\(
x_{i+1} \eql x_i + \delta
\)
- As \(M\) grows (and \(\delta \) shrinks),
$$
\frac{(F(x_{i+1}) - F(x_i))}{\delta} \eql
\frac{(F(x_i+\delta) - F(x_i))}{\delta} \; \to \; F'(x_i)
$$
- And the mean
\(m_i \; \to \; x_i \)
- Thus, in the limit, we could argue that
$$
E[X]
\eql
\lim_{M\to\infty} \sum_{i\lt M} m_i \frac{(F(x_{i+1}) -
F(x_i))}{\delta} \delta
\eql
\int_a^b x F'(x) dx
$$
- We use the term probability density function
(pdf) to name \(f(x) = F'(x)\), the derivative of the CDF \(F\).
- In this case,
$$
E[X]
\eql
\int_a^b x f(x) dx
$$
\(\Rightarrow\) This is (almost) the usual textbook definition of \(E[X]\).
- By making \(f(x)=0\) outside the rv's range, this is
equivalent to:
$$
E[X]
\eql
\int_{-\infty}^\infty x f(x) dx
$$
\(\Rightarrow\) Textbook definition of \(E[X]\).
As an aside, since we've formally introduced pdfs, calculus
tells us that:
- If \(f(x)\) is some density function, then there is
some \(F\) such that
$$
\int_a^b f(x)dx \eql F(b) - F(a)
$$
(First fundamental theorem of calculus)
- But because \(F(b) - F(a) = \prob{a \leq X \leq b}\),
$$
\prob{a \leq X \leq b} \eql \int_a^b f(x)dx
$$
- As a corollary,
$$\eqb{
F(y) \eqt \prob{X \leq y} \\
\eqt \prob{-\infty \lt X \leq y} \\
\eqt \int_{-\infty}^y f(x) dx
}$$
- Finally,
\(F' = f\)
(Second fundamental theorem of calculus).
Exercise 35:
Find a description and picture of the standard Gaussian pdf.
What is the maximum value (height) of the pdf? Use a calculator
or write a program to compute this value as a decimal.
7.8 Histograms, pmfs and pdfs
What is the connection?
- We are used to the intuitive idea of a histogram,
which shows the spread of frequency-of-occurrence.
- We sense that pmfs and pdfs show something similar
\(\Rightarrow\) This is more or less true, with some modifications.
Types of histograms:
- Let's start with the most basic histogram:
Suppose we are given 7 data values: 1.6, 0.7, 2.4, 0.3, 1.8, 1.7, 0.4
- Step 1: find an appropriate range
\(\Rightarrow\) We'll use \([0, 2.5]\).
- Step 2: decide the granularity (number of bins)
\(\Rightarrow\) We'll use 5 bins.
- Step 3: Identify, for each bin, how many data values
fall into that bin.
- Note: the height of each bin is the number
of data values falling into that bin.
- An alternative type of histogram is a relative
or proportional histogram:
- Here, we compute the proportion of data in each bin:
Exercise 36:
Consider the 3-coin-flip example (with \(\probm{Heads}=0.6\)).
Use a bin width of 1.0
and build a proportional histogram when the data is \(3, 1, 1, 0, 2, 0, 1, 2, 2, 2\).
Let's write some code for the 3-coin flip example:
- First, observe that the data values are integers.
We will build bins that separate the possible outcomes.
- The program:
(source file)
double numTrials = 100000;
Coin coin = new Coin (0.6);
// Build intervals to surround the values of interest.
PropHistogram hist = new PropHistogram (-0.5, 3.5, 4);
for (int t=0; t<numTrials; t++) {
// #heads in 3 flips.
int numHeads = coin.flip() + coin.flip() + coin.flip();
hist.add (numHeads);
}
// View the histogram, text and graph.
System.out.println (hist);
hist.display ();
Exercise 37:
Run CoinExample5.java
to see what you get for a large number of samples.
You will also need Coin.java
and PropHistogram.java.
The relationship between the proportional histogram and pmf:
- Recall: pmf is for discrete rv's
\(\Rightarrow\) the list \(\prob{X=k}\) for each \(k\)
- We have selected each bin so that the height of bin \(k\)
is the proportion of times \(k\) occurs in the data.
- Clearly, in the limit,
Height of bin \(k \;\to\; \prob{X=k}\)
- Thus, the proportional histogram becomes the pmf in the limit.
Does this work for continuous rv's?
Exercise 38:
Use PropHistogram.java
and execute GaussianExample.java.
Does it work? Change the number of intervals from 10 to
20. What do you notice? Is the height correct?
Histograms for continuous rv's:
Exercise 39:
Modify computeHeights() in
PropHistogram.java
to compute heights as above for continuous rv's.
Then, execute GaussianExample.java.
Change the number of intervals from 10 to
20. Does it work?
Summary:
- In a regular histogram,
bin height = number of data in bin
- For discrete rv's, proportional histogram → pmf in the limit.
bin height = proportion of data in bin
- Another way of saying it: a proportional histogram
estimates the true pmf.
- For continuous rv's, density histogram → pdf in the limit.
bin height = proportion-of-data-in-bin / bin-width
- A density histogram estimates the true pdf.
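As a concrete illustration, here is a minimal self-contained density-histogram
sketch for standard-Gaussian samples (it uses java.util.Random's nextGaussian()
rather than the course's PropHistogram class; the class name is ours):
import java.util.Random;

// Sketch: a density histogram for standard-Gaussian samples.
// Bin height = (proportion of data in bin) / (bin width), so that
// the heights approximate the pdf.
public class DensityHistogram {
    public static void main (String[] args) {
        double a = -4, b = 4;
        int M = 20;                                 // Number of bins.
        double width = (b - a) / M;
        int numTrials = 1000000;
        int[] counts = new int [M];
        Random rand = new Random ();
        for (int t = 0; t < numTrials; t++) {
            double x = rand.nextGaussian ();
            int bin = (int) Math.floor ((x - a) / width);
            if (bin >= 0 && bin < M) counts[bin] ++;   // Ignore rare values outside [a,b].
        }
        for (int k = 0; k < M; k++) {
            double height = ((double) counts[k] / numTrials) / width;
            double midPoint = a + k * width + width / 2;
            System.out.println (midPoint + "\t" + height);
        }
        // The printed heights should trace out the Gaussian pdf.
    }
}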
Exercise 40:
Estimate the density of the time spent in the system
by a random customer in the QueueControl example.
To do this, you need to build a density histogram of
values of the variable
timeInSystem
in QueueControl.java.
7.9 Some well-known continuous rv's
Three important distributions (pdf shapes):
- Uniform:
- Exponential (γ = 4):
- Gaussian (also called Normal):
The Uniform distribution: if \(X\simm{Uniform}(a,b)\)
- \(a \lt b\) are two real numbers.
\(\Rightarrow\) Most common case: \(a=0, b=1\)
- \(X\) takes values in \([a,b]\) (\(a,b\) inclusive).
- \(X\) has pdf (defined, for generality, on the whole real line):
$$
f(x) \eql \left\{
\begin{array}{ll}
0 & \;\;\; x \lt a \\
\frac{1}{b-a} & \;\;\; a \leq x \leq b \\
0 & \;\;\; x \gt b \\
\end{array}
\right.
$$
- \(X\) has CDF
$$
F(x) \eql \left\{
\begin{array}{ll}
0 & \;\;\; x \lt a \\
\frac{x-a}{b-a} & \;\;\; a \leq x \leq b \\
1 & \;\;\; x \gt b \\
\end{array}
\right.
$$
- \(X\) has expected value
$$\eqb{
E[X] \eqt \int_{-\infty}^\infty x f(x) dx \\
\eqt
\int_a^b \frac{x}{b-a} dx \\
\eqt
\frac{b^2 - a^2}{2(b-a)} \eql \frac{a+b}{2}
}$$
Exercise 41:
Suppose \(X\simm{Uniform}(a,b)\) and \([c,d]\) is a
sub-interval of \([a,b]\). Use either the pdf or
cdf to compute \(\prob{X\in [c,d]}\).
Exercise 42:
Suppose \(X\simm{Uniform}(0,1)\) and let
rv \(Y=1-X\). What is the distribution of \(Y\)?
- What is the range of \(Y\)?
- Use PropHistogram
to estimate the density (remember to scale by interval-width) of
\(Y\). Recall that RandTool.uniform() generates
values from the Uniform(0,1) distribution.
- How would you derive the pdf or CDF of \(Y\)?
The Exponential distribution: if \(X\simm{Exponential}(\gamma)\)
- The parameter \(\gamma\gt 0\) is a real number.
- \(X\) takes values in the positive half of the real-line.
- \(X\) has pdf (defined, for generality, on the whole real line):
$$
f(x) \eql
\left\{
\begin{array}{ll}
0 & \;\;\; x \leq 0\\
\gamma e^{-\gamma x} & \;\;\; x \gt 0
\end{array}
\right.
$$
- \(X\) has CDF
$$
F(x) \eql
\left\{
\begin{array}{ll}
0 & \;\;\; x \leq 0\\
1 - e^{-\gamma x} & \;\;\; x \gt 0
\end{array}
\right.
$$
- \(X\) has expected value
$$\eqb{
E[X] \eqt \int_{-\infty}^\infty x f(x) dx \\
\eqt
\int_0^\infty x \gamma e^{-\gamma x} dx \\
\eqt
\frac{1}{\gamma}
}$$
Exercise 43:
Suppose \(X\simm{Exponential}(\gamma)\).
- Write down an expression for \(\prob{X\gt a}\).
- Compute \(\prob{X\gt b+a \;|\; X\gt b}\).
Exercise 44:
Suppose \(X\simm{Exponential}(\gamma)\) with CDF \(F(x)\).
Write down an expression for \(F^{-1}(y)\), the inverse
of \(F\).
The Normal distribution: if \(X\simm{N}(\mu,\sigma^2)\)
- The parameter \(\mu\) is any real number.
- The parameter \(\sigma\) is any positive real number.
- \(X\) takes any real value.
- \(X\) has pdf
$$
f(x) \eql \frac{1}{\sqrt{2\pi} \sigma} e^{- \frac{(x-\mu)^2}{2\sigma^2}}
$$
- There is no closed-form CDF.
- \(X\) has expected value \(E[X] = \mu\).
7.10 Generating random values from distributions
We have used random values from various distributions,
but how does one write a program to generate them?
Key ideas:
- Since we are writing a (hopefully small) program,
the output cannot really be truly random.
\(\Rightarrow\) the goal is to make pseudo-random generators.
- Use the strange properties of numbers to make
a (pseudo-random) generator for Uniform(0,1) values.
- To generate values for some other distribution:
- Generate Uniform(0,1) values.
- Transform them so that the transformed values have
the desired distribution.
Note: Random-number generation is a topic of research by itself:
- Much work on high-quality Uniform(0,1) (or bit-stream)
generators.
- Similarly, there is much work on efficient transformations for
particular distributions.
Desirable properties for a pseudo-random generator:
- Should be repeatable:
\(\Rightarrow\) It should be possible to reproduce the sequence given some kind of initial "seed".
- Should be portable:
\(\Rightarrow\) It shouldn't be dependent on language, OS or hardware.
- Should be efficient:
\(\Rightarrow\) The calculations shouldn't take too long (simple
arithmetic, preferably).
- Should have some theoretical basis:
\(\Rightarrow\) Ideally, some number-theory results about its properties.
- Should also (despite being deterministic and repeatable) have good statistical properties:
\(\Rightarrow\) Should satisfy statistical tests for randomness.
Exercise 45:
What kinds of statistical tests? Is it enough that the
histogram resemble the \(U[0,1]\) pdf?
A simple algorithm: the Lehmer generator:
- For some large number \(M\), consider the set
\(S = \{1,2,\ldots,M-1\}\).
- We will generate a number \(K\) in \(S\) and return \(U=\frac{K}{M}\).
\(\Rightarrow\) Output is between 0 and 1.
- If \(K\) is uniformly distributed over \(\{1,2,\ldots,M-1\}\)
\(\Rightarrow\) \(U\) is approximately \(Uniform(0,1)\).
- Note: this means some values are never generated, e.g.,
\(\frac{3}{2M}\), the point halfway between \(\frac{1}{M}\) and \(\frac{2}{M}\).
- Only remaining question: how do we pick uniformly among
the discrete elements in \(S=\{1,2,\ldots,M-1\}\)?
- Key idea:
- Consider two large numbers \(a\) and \(b\) (but both
\(a,b \lt M\)).
- Suppose we compute \(x = ab\) mod \(M\).
- Then \(x\) could be anywhere in \(\{1,2,\ldots,M-1\}\).
- Then, pick another two values for \(a\) and \(b\) and repeat.
- But how do we pick \(a\) and \(b\)?
The logic seems circular at this point.
- The key insight: use the previous value \(x\).
- Fix the values of \(a\) and \(b\).
- Compute \(x_1 = ab\) mod \(M\).
- Then, compute \(x_2 = ax_1\) mod \(M\).
- Then, compute \(x_3 = ax_2\) mod \(M\).
- ... etc
- We will call \(b = x_0\), the seed.
- It can be shown (number theory) that if \(M\) is prime,
the sequence \(x_1, x_2, \ldots\) is periodic with some period \(p\),
and does not repeat within a period.
Furthermore, for some values of \(a\), the period is \(p = M-1\).
- Example: \(M=2^{31}-1\) and \(a=48271\).
- Pseudocode:
Algorithm: uniform ()
// Assumes M and a are initialized.
// Assumes x (global) has been initialized to seed value.
1. x = (a*x) mod M
2. return x
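A direct Java translation of this pseudocode (a sketch; the constants are the
ones above, and long arithmetic keeps the product \(a x\) well below overflow):
// Sketch: the Lehmer generator with M = 2^31 - 1 and a = 48271.
// a*x is at most about 10^14, well within the range of a long.
public class Lehmer {
    static final long M = 2147483647L;   // 2^31 - 1 (prime).
    static final long a = 48271L;
    static long x = 12345L;              // The seed; any value in 1..M-1.

    static double uniform () {
        x = (a * x) % M;
        return (double) x / M;           // A value strictly between 0 and 1.
    }

    public static void main (String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println (uniform ());
        }
    }
}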
What about its properties?
- Some properties are obviously satisfied: portable, repeatable,
efficient (provided \(mod\) is efficient).
- The Lehmer generator has reasonably good statistical
properties for many applications.
- However, there are (stranger) generators with better
statistical properties.
Using a \(U[0,1]\) generator for discrete rv's:
- First, consider \(X\simm{Bernoulli}(0.6)\) as an example:
- Recall: \(X\) takes values 0 or 1 and \(\prob{X=1}=0.6\).
- Key idea: generate a uniform value \(U\).
- If \(0\lt U\lt 0.6\), return \(X=1\), else return \(X=0\).
- Pseudocode:
Algorithm: generateBernoulli (p)
Input: Bernoulli parameter p=Pr[success]
1. u = uniform ()
2. if u < p
3. return 1
4. else
5. return 0
6. endif
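In Java, this is essentially a one-liner (a sketch; RandTool.uniform() is the
course's Uniform(0,1) generator):
// Sketch: Bernoulli(p) from a Uniform(0,1) value.
static int generateBernoulli (double p)
{
    return (RandTool.uniform () < p) ? 1 : 0;
}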
- This idea can be generalized: e.g., consider rv \(X\)
with pmf
\(\prob{X=0} = 0.064\)
\(\prob{X=1} = 0.288\)
\(\prob{X=2} = 0.432\)
\(\prob{X=3} = 0.216\)
Generate uniform value \(U\).
if U < 0.064
return 0
else if U < 0.064 + 0.288
return 1
else if U < 0.064 + 0.288 + 0.432
return 2
else // if U < 1
return 3
Exercise 46:
Add code to DiscreteGenExample.java
to implement the above generator, and to test it by building a histogram.
Let's generalize this to any discrete pmf of the type \(\prob{X=k} = ...\).
- We'll use the notation \(p_k = \prob{X=k}\).
- Define \(Q_k = \sum_{i\leq k} \prob{X=i}\)
- Then, the algorithm is:
Given uniform value \(U\), return the smallest
\(k\) such that \(U \leq Q_k\).
- Pseudocode:
Algorithm: generateFromDiscretePmf ()
// Assumes Q[k] have been computed.
1. u = uniform ()
2. k = 0
3. while Q[k] < u
4. k = k + 1
5. endwhile
6. return k
- This presents a problem when \(X\simm{Geometric}(p)\).
Here, \(X \in \{1,2,...\}\)
\(\Rightarrow\) Cannot pre-compute \(Q_k\)'s.
- Alternatively, we can compute the \(Q_k\)'s "as we go":
Algorithm: generateFromDiscretePmf ()
1. u = uniform ()
2. k = 0
3. Q = Pr[X=0]
4. while Q < u
5. k = k + 1
6. Q = Q + Pr[X=k]
7. endwhile
8. return k
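Here is a sketch of this "as we go" idea for a Poisson(\(\gamma\)) rv, whose
range is also infinite (the Geometric case in the exercise below is analogous);
computing \(\prob{X=k}\) incrementally avoids evaluating factorials. RandTool.uniform()
is again the course's Uniform(0,1) generator:
// Sketch: generate a Poisson(gamma) value by accumulating the CDF "as we go".
// pmf(k) = e^{-gamma} gamma^k / k!, built up via Pr[X=k] = Pr[X=k-1] * gamma/k.
static int generatePoisson (double gamma)
{
    double u = RandTool.uniform ();
    int k = 0;
    double pk = Math.exp (-gamma);       // Pr[X=0].
    double Q = pk;                       // Running CDF: Pr[X <= k].
    while (Q < u) {
        k ++;
        pk = pk * gamma / k;             // Pr[X=k] from Pr[X=k-1].
        Q = Q + pk;
    }
    return k;
}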
Exercise 47:
Add code to GeometricGenerator.java
to implement the above generator. The test code is already written.
Let's return to the 3-coin-flip example for a key insight:
- Recall the pmf:
\(\prob{X=0} = 0.064\)
\(\prob{X=1} = 0.288\)
\(\prob{X=2} = 0.432\)
\(\prob{X=3} = 0.216\)
- From which we can infer the CDF:
\(\prob{X \leq 0} = 0.064\)
\(\prob{X \leq 1} = 0.352 \) (= 0.064 + 0.288)
\(\prob{X\leq 2} = 0.784\) (= 0.064 + 0.288 + 0.432)
\(\prob{X\leq 3} = 1.000\) (= 0.064 + 0.288 + 0.432 + 0.216)
- Let's plot the CDF:
- Recall our generation algorithm:
Given uniform value \(U\), return the smallest \(k\)
such that \(U \leq Q_k\).
- Thus, it looks like we are using the CDF "in reverse".
We will use this idea for continuous rv's.
Using the inverse-CDF for continuous rv's:
- Suppose \(X\) is a continuous rv with CDF \(F(x)\).
- Suppose, further, that the inverse \(F^{-1}(y)\)
can be computed.
- Consider \(F^{-1}(U)\) where \(U\) is a uniform value.
- When \(U\) is a rv, so is \(g(U)\) for any function \(g\).
\(\Rightarrow\) In particular, \(F^{-1}(U)\) is a rv.
- Let \(H(x)\) be the CDF of the rv \(F^{-1}(U)\).
\(\Rightarrow\) Thus, \(H(x) = \prob{F^{-1}(U) \leq x}\).
- Let's work this through a few steps:
\(H(x)\)
\(= \prob{F^{-1}(U) \leq x}\)
\(= \prob{U \leq F(x)}\) (applying the nondecreasing \(F\) to both sides)
\(= F(x)\) (since \(\prob{U \leq y} = y\) for a Uniform(0,1) rv)
\(\Rightarrow\) \(F^{-1}(U)\) has the same CDF as \(X\).
\(\Rightarrow\) \(F^{-1}(U)\) has the same distribution as \(X\).
- Thus, we can use \(F^{-1}(U)\) to produce random
values of \(X\):
\(\Rightarrow\) Given uniform value \(U\), return \(F^{-1}(U)\).
- Pseudocode:
Algorithm: generateFromContinuousCDF ()
// Assumes the inverse CDF F⁻¹ is available.
1. u = uniform ()
2. return F⁻¹(u)
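As the simplest illustration of this idea, take \(X\simm{Uniform}(a,b)\):
inverting \(F(x) = \frac{x-a}{b-a}\) gives \(F^{-1}(u) = a + (b-a)u\). A sketch
(RandTool.uniform() supplies the Uniform(0,1) input):
// Sketch: Uniform(a,b) via the inverse CDF, F⁻¹(u) = a + (b-a)u.
static double generateUniform (double a, double b)
{
    double u = RandTool.uniform ();
    return a + (b - a) * u;
}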
Exercise 48:
Add code to ExponentialGenerator.java
to implement the above idea. Use the inverse-CDF you computed
earlier. The test code is written to produce a histogram.
Use your modified version of
PropHistogram.java to make
a density histogram. Compare the result with the actual density
(using \(\gamma = 4\)). How do you know your code worked?
About random generation:
- The inverse-CDF idea does not work easily for distributions like
the Normal:
\(\Rightarrow\) One needs to numerically approximate some inverse-CDFs.
- There are a host of techniques for specific distributions.
\(\Rightarrow\) random generation is a specialty in itself (several books).
- For simple simulations, the straightforward generators will suffice.
- For very large simulations (large numbers of samples)
\(\Rightarrow\) You'd want to use a more industrial-strength generator.
- Example: the Mersenne-Twister algorithm for uniform values.
- Another issue: many simulations require generating multiple
streams of random values
\(\Rightarrow\) Need to be careful that you don't introduce correlations
through careless use of generators.
7.11 Summary
What have we learned in this module?
- Prior to this, we learned how to reason about events:
- Events, sample spaces, probability measure.
- Combining events (intersection, union).
- Conditional probability, Bayes' rule, total-probability.
- A random variable (rv) measures numeric output from an experiment.
(As opposed to the outcome "King of Spades")
- Why is this useful?
You can perform arithmetic and other types of calculations
with numeric output, e.g., expected value.
- Two types: discrete and continuous.
- Discrete rv's have pmfs. Continuous rv's have pdfs.
- Both have CDFs defined the same way: \(F(y) = \prob{X \leq y}\).
- A proportional histogram becomes the pmf (in the limit).
- A density histogram becomes the pdf (in the limit).
(Minor technical issue: there are two limits here - the
number of samples, the size of the interval).
- The important discrete distributions: Bernoulli,
Discrete-Uniform, Binomial, Geometric, Poisson.
- The important continuous ones: Uniform, Exponential, Normal.
- The expected value is a calculation
performed on a pmf or pdf.
- For discrete rv's:
$$
E[X] \eql \sum_{k\in A} k \; \prob{X=k}
$$
- For continuous rv's:
$$
E[X]
\eql
\int_{-\infty}^\infty x f(x) dx
$$
© 2008, Rahul Simha