Discrete Random Variables class: center, middle, inverse, title-slide .title[ # Discrete Random Variables ] .subtitle[ ## MA336 Statistics ] .author[ ### Fei Ye Department of Mathematics and Computer Science ] .date[ ### July, 2022 ] --- ## Learning Goals for Probability and Probability Distribution - Demonstrate understanding of random variables - Demonstrate understanding of characteristics of binomial distributions. - Calculate accurate probabilities of discrete random variables and interpret them in a variety of settings. --- ## Random Variables - A **random variable**, usually written `\(X\)`, is a variable whose values are numerical quantities of possible outcomes a random experiment. - A **discrete random variable** takes on only a finite or countable number of distinct values. **Example:** - Rolling a fair dice, the number of dots on the top faces is a discrete random variables takes on the possible values: 1, 2, 3, ,4, 5, 6. - Flipping a fair coin 10 times, the number of heads is a discrete random variable takes on the possible values: 1, 2, 3, ..., 10. - A **continuous random variable** takes on values which form an interval of numbers. **Example:** - The height of an randomly select 10 year-old boy in US is normally between 129 cm and 157 cm. So the height is a continuous random variable. - The measure the voltage at an randomly electrical outlet normally is between 118 and 122. So the measure of voltage is a continuous random variable. --- ## Practice: Discrete or continuous Classify each random variable as either discrete or continuous. 1. The number of boys in a randomly selected three-child family. 2. The temperature of a cup of coffee served at a restaurant. 3. The number of math majors in randomly selected group of 10 students. 4. The amount of rain recorded in a small town one day. --- ## Probability Distributions - The **probability distribution** of a discrete random variable `\(X\)` is defined by the probability `\(P(X=x)\)` associated with each possible value `\(x\)` of the variable `\(X\)`. The function `\(p_X(x)=P(X=x)\)` is called the **probability mass function**. - A probability distribution of a discrete random variable is usually characterized by a table of all possible values `\(X\)` together with probabilities `\(P(X)\)`, or a probability histogram, or a formula. - A random variable `\(X\)` (discrete and continuous) always has a **cumulative distribution function**: `\(F_X(x)=P(X\leq x)\)` (= `\(\sum\limits_{x_i\leq x} P(x_i)\)` if `\(X\)` is discrete). --- ## Basic Properties of Probability Distributions - Basic rules of probability: - `\(0\leq P(X=x)\leq 1\)`. - the sum of all the probabilities is 1, that is `\(P(X\leq x_{max})=1\)`. - In particular, `\(0\leq F_X(x)\leq 1\)`. - The cumulative distribution function `\(F_X(x)\)` is non-decreasing. - The probability distribution can be recovered from its cumulative distribution function. Indeed, for a *discrete* random variable `\(X\)`, we have `$$P(X=x_i)=P(X\le x_i)-P(X\le x_{i-1}),$$` where `\(P(X\le x_i)=\sum\limits_{k=1}^i P(X=x_k)\)`. --- ## Example: Probability Distribution of Flipping Two Fair Coins Let `\(X\)` be the number of heads that are observed when tossing two fair coins. 1. Construct the probability distribution for `\(X\)`. 2. Find `\(P(X\le 1)\)` and `\(P(X\le 2)\)`. -- **Solution:** The possible values of the number of heads are `\(0\)`, `\(1\)` and `\(2\)`. The probability distribution can be characterized by the following table: .center[ | `\(x\)` | 0 | 1 | 2 | |-----|-----|-----|-----| | `\(P(X=x)\)` |0.25 | 0.5 | 0.25| ] From the table, we may find the following cumulative distributions: `$$P(X\leq 1)=P(X=0)+P(X=1)=0.25+0.5=0.75.$$` `$$P(X\leq 2)= P(X=0)+P(X=1)+P(X=2)=0.25+0.5+0.25=1.$$` --- ## Example: Probability Histogram of an Unfair Coin The probability distribution of an unfair coin is characterized by the following histogram. Find the probability of getting at most 1 head. .center[ <img src="data:image/png;base64,#MA336-Week-7-Discrete-Random-Variables_files/figure-html/unnamed-chunk-2-1.png" width="360" /> ] -- **Solution:** Let `\(X\)` be the number of heads. From the probability histogram, we know that `\(P(X=0)=0.36\)`, and `\(P(X=1)=0.47\)`. Then the probability of getting at most 1 head is $$ P(X\le 1)=P(X=0)+P(X=1)=0.36+0.47=0.83. $$ --- ## Practice: Probability Distribution of Rolling a Pair of Fair Dice A pair of fair 6-sided dice were rolled. Let `\(X\)` denote the sum of the number of dots on the top faces. 1. Construct the probability distribution of `\(X\)`. 2. Find the probability that `\(X\)` takes an odd value. --- ## Practice: Probability Distribution of Bus Waiting Time <iframe src="https://www.myopenmath.com/embedq2.php?id=916595&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Mean and Standard Deviation of a Discrete Random Variable Let `\(X\)` be a discrete random variable and `\(p_X(x)=P(X=x)\)` the probability mass function. - The **expected value** `\(E(X)\)` (also called **mean** and denoted by `\(\mu\)`) of the discrete random variable `\(X\)` is the number `$$\mu=E(X)=\sum xp_X(x).$$` - The **variance** `\(\mathrm{Var}(X)\)` (also denoted by `\(\sigma^2\)`) of the discrete random variable `\(X\)` is the number `$$\sigma^2=\mathrm{Var(X)}=\sum (x-E(X))^2p_X(x).$$` - The **standard deviation** `\(\sigma\)` of a discrete random variable `\(X\)` is the square root of its variance: `$$\sigma=\sqrt{\sum (x-E(X))^2p_X(x)}.$$` <!-- --- --> <!-- ## Some Properties of Expected Value (Optional) --> <!-- - The expected value of a linear combination two random variables `\(X\)` and `\(Y\)` is the linear combination of their expected values, that is `$$E(aX+bY)=aE(X)+bE(Y).$$` --> <!-- - The expected value in general is not multiplicative, that is `\(E(XY)\ne E(X)E(Y)\)`. --> <!-- - If the two random variable `\(X\)` and `\(Y\)` are independent, then `$$E(XY)=E(X)E(Y).$$` --> <!-- - The variance can be computed using expected values: --> <!-- `$$\mathrm{Var}(X)=E(X^{2})-(E[X])^{2}.$$` --> <!-- - Let `\(X\)` and `\(Y\)` be two *independent* variables. Then the variance of a linear combination `\(aX+bY\)` equals --> <!-- `$$\mathrm{Var}(aX+bY)=a^2\mathrm{Var}(X)+b^2\mathrm{Var}(Y)$$` --> --- ## Example: Expected Gain One thousand raffle tickets are sold for $2 each. Each has an equal chance of winning. First prize is $500, second prize is $300, and third prize is $100. Find the expected value of net gain, and interpret its meaning. **Solution:** Let `\(X\)` denote the net gain from purchasing one ticket. The probability distribution for `\(X\)` is .center[ | `\(x\)` | 498 | 298 | 98 | -2 | |------|--------|--------|--------|----------| | `\(P(X=x)\)` | `\(\frac{1}{1000}\)` | `\(\frac{1}{1000}\)` | `\(\frac{1}{1000}\)` | `\(\frac{997}{1000}\)` | ] The expected gain is `$$E(X)= 498\cdot \frac{1}{1000}+ 298\cdot\frac{1}{1000}+98\cdot \frac{1}{1000}+(-2)\cdot \frac{997}{1000}=-1.1,$$` which means that when buying one ticket, the buyer may expect a loss of $1.1. --- ## Example: Waiting Time The wait times (rounded to multiples of 5) in the cafeteria at a Community College has the following probability distribution. Find the expected waiting time and the standard deviation. .center[ | `\(x\)` (minutes) | 5 | 10 | 15 | 20 | 25 | | -------------------- | -------- | -------- | -------- | -------- | -------- | | `\(P(X=x)\)` | 0.13 | 0.25 | 0.31 | 0.21 | 0.1 | ] **Solution:** The expected waiting time is `$$E(X)= 5\cdot 0.13 +10\cdot 0.25+15\cdot 0.31+20\cdot 0.21 + 25\cdot 0.1 = 14.5.$$` The standard deviation is then `$$\scriptsize\begin{aligned}\sigma=&\sqrt{(5-14.5)^2\cdot 0.13 +(10-14.5)^2\cdot 0.25+(15-14.5)^2\cdot 0.31+(20-14.5)^2\cdot 0.21 + (25-14.5)^2\cdot 0.1}\\ \approx& 5.9. \end{aligned}$$` --- ## Example: Unfair Die (1/2) The probability distribution of an unfair die is given in the following table. | `\(x\)` | 1| 2| 3| 4| 5| 6| |---|---|---|---|---|---| | `\(P(X=x)\)` | 0.18| 0.12| `\(\,?\,\)` |0.14|0.23|0.17| 1. Find `\(P(X=3)\)`. 2. Find the mean, variance and standard deviation of this probability distribution. -- **Solution:** Since the sum of probabilities must be 1, we know that $$ P(X=3)=1-(0.18+0.12+0.14+0.23+0.17)=0.16 $$ --- ## Example: Unfair Die (2/2) **Solution: (Continued)** The mean is the weighted sum $$ % \scriptstyle `\begin{aligned} \mu=&0.18\cdot 1+0.12\cdot 2+0.16\cdot 3+0.14\cdot 4+0.23\cdot 5+0.17\cdot 6\\ =&3.63. \end{aligned}` $$ The variance is $$ \scriptstyle `\begin{aligned} \sigma^2=&0.18(1-3.63)^2+0.12(2-3.63)^2+0.16(3-3.63)^2+0.14(4-3.63)^2+0.23(5-3.63)^2+0.17(6-3.63)^2\\ =&3.0331. \end{aligned}` $$ Therefore, the standard deviation is $$ \sigma=\sqrt{\sigma^2}=\sqrt{3.0331}\approx 1.7416. $$ --- ## Practice: Lottery Tickets Seven thousand lottery tickets are sold for $5 each. One ticket will win $2,000, two tickets will win $750 each, and five tickets will win $100 each. Let `\(X\)` denote the net gain from the purchase of a randomly selected ticket. 1. Construct the probability distribution of `\(X\)`. 2. Compute the expected value `\(E(X)\)` of `\(X\)`. Interpret its meaning. 3. Compute the standard deviation `\(\sigma\)` of `\(X\)`. .footmark[ Source: [https://saylordotorg.github.io/text_introductory-statistics/s08-02-probability-distributions-for-.html](https://saylordotorg.github.io/text_introductory-statistics/s08-02-probability-distributions-for-.html) ] --- ## Binomial Distribution - A **binomial experiment** is a probability experiment satisfying: 1. The experiment has a fixed number `\(n\)` of independent trials. 2. Each trial has only two possible outcomes: a success (S) or a failure (F). 3. The probability `\(p\)` of a success is the same for each trial. - The discrete random variable `\(X\)` counting the number of successes in the `\(n\)` trials is the **binomial random variable**. We say `\(X\)` has a **binomial distribution** with parameters `\(n\)` and `\(p\)` and write it as `\(X\sim B(n, p)\)`. - For `\(X\sim B(n, p)\)`, the **probability of getting exactly `\(x\)` successes in `\(n\)` trials** is `$$P(X=x)=B(x,n,p)={_n C_x} p^x(1-p)^{n-x}=\frac{n!}{(n-x)!x!}p^x(1-p)^{n-x},$$` where `\(n!=n(n-1)\cdots 1\)`, read as `\(n\)` factorial, for `\(n>0\)` and `\(0!=1.\)` - The notation `\({_n C_x}=\frac{n!}{(n-x)!x!}\)` is read as `\(n\)` choose `\(x\)`, which is the number of ways to choose `\(x\)` objects from a set of `\(n\)` objects. --- ## Example: Probability from Drawing a Card Multiple Times (1/2) A card is randomly selected from a standard deck and replaced. This experiment is repeated a total of `\(5\)` times. - Find the probability of getting exactly `\(3\)` clubs. - Find the probability of getting at least `\(3\)` clubs. -- **Solution:** This is a binomial experiment. The number to total trials is `\(n=5\)`. The number of successes is `\(3\)`. The chance of a success is `\(p=\frac{13}{52}=\frac14\)`. Apply the binomial probability formula, we have `$$P(X=3)=\frac{5!}{3!2!} \left(\frac{1}{4}\right)^3\left(\frac34\right)^2=10\cdot\frac{9}{4^5}\approx 0.088.$$` The probability `\(P(X=3)\)` can also be found from the binomial distribution table or by using the Excel function `BINOM.DIST(3,5,1/4,FALSE)`. To probability of getting at least `\(3\)` club is `$$P(X\geq 3) =1-P(X\leq 2)=1-(P(0)+P(1)+P(2))\approx 1-0.8965=0.1035.$$` --- ## Example: Probability from Drawing a Card Multiple Times (2/2) **Solution:(continued)** To calculate `\(P(X\leq 2)\)`, we may also use the binomial distribution table or the Excel function `BINOM.DIST()`. **Method 1:** As `\(n=5\)` and `\(p=0.25\)`, we use the following portion of the cumulative binomial distribution table. .center[Binomial Probability Table *n=5* | n | x | 0.1 | 0.15 | 0.2 | 0.25 | 0.3 | 0.35 | 0.4 | | ---- | ---- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | | 5 | 0 | 0.5905 | 0.4437 | 0.3277 | .red[0.2373] | 0.1681 | 0.116 | 0.0778 | | 5 | 1 | 0.3281 | 0.3915 | 0.4096 | .red[0.3955] | 0.3602 | 0.3124 | 0.2592 | | 5 | 2 | 0.0729 | 0.1382 | 0.2048 | .red[0.2637] | 0.3087 | 0.3364 | 0.3456 | | 5 | 3 | 0.0081 | 0.0244 | 0.0512 | 0.0879 | 0.1323 | 0.1811 | 0.2304 | ] `\(P(X\le 2) \approx 0.2373+0.3955+0.2637= 0.8965.\)` **Method 2:** In Excel, `\(P(X\le 2)\)` `=BINOM.DIST(2,5,0.25,TRUE)` `\(\approx 0.8965\)`. ??? In a calculator, use `binompdf(n, p, x)` for `\(P(X=x)\)` and `binomcdf(n, p, x)` for `\(P(X\leq x)\)`. --- ## Practice: Find Probability of from a Binomial Distribution Let `\(X\)` be a binomial random variable with parameters `\(n = 5\)`, `\(p=0.2\)`. Find the probabilities 1. `\(P(X=3),\)` 2. `\(P(X<3),\)` 3. `\(P(X>3).\)` --- ## Practice: Machine Defect Rate <iframe src="https://www.myopenmath.com/embedq2.php?id=7211&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Mean and Standard Deviation of Binomial Distribution (4/5) - The mean of a binomial distribution of `\(n\)` trials is `$$\mu =\sum xP(X=x)=\sum x\cdot \dfrac{n!}{(n-x)!x!}p^x(1-p)^{n-x} = np.$$` - The variance of a binomial distribution of `\(n\)` trials is `$$\sigma^2 =\sum (x-np)^2P(X=x)=\sum x^2P(X=x)-(np)^2=np(1-p).$$` - The standard deviation of a binomial distribution of `\(n\)` trials is `$$\sigma=\sqrt{np(1-p)}.$$` - We consider an event `\(E\)` **unusual** if the probability `\(P(E)\leq 5\%\)`. --- ## Example: Expected Value and SD of Cracked Eggs The probability that an egg in a retail package is cracked or broken is 0.02. 1. Find the average number of cracked or broken eggs in a one dozen carton. 2. Find the standard deviation. 3. Is getting at least two broken eggs unusual? -- **Solution:** Since there are 12 eggs and the chance of getting a cracked egg is 0.02, the average number of cracked is `$$\mu =np=12\cdot 0.02=0.24.$$` The standard deviation is `$$\sigma=\sqrt{12\cdot 0.02\cdot(1-0.02)}\approx 0.4850.$$` Recall the Empirical rule: 95% data are within 2 standard deviation away from the mean. Since `\(2>0.24+2\cdot 0.4850\)`, the chance of getting at least two cracked eggs is less than 5%, which is considered as unusual. --- ## Practice: Quality of Grapefruit Adverse growing conditions have caused 5% of grapefruit grown in a certain region to be of inferior quality. Grapefruit are sold by the dozen. 1. Find the average number of inferior quality grapefruit per box of a dozen. 2. A box that contains two or more grapefruit of inferior quality will cause a strong adverse customer reaction. Find the probability that a box of one dozen grapefruit will contain two or more grapefruit of inferior quality. --- ## Practice: Mean and SD of a Binomial Distribution <iframe src="https://www.myopenmath.com/embedq2.php?id=916606&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> <!-- --- --> <!-- # Extra Practice Problems --> <!-- --- --> <!-- ## Practice: Find the mean of a discrete probability distribution --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=7197&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Find the standard deviation of a discrete probability distribution --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=7198&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: The number of sales of new employees --> <!-- .pull-left[ --> <!-- A company tracks the number of sales new employees make each day during a 100-day probationary period. The results for one new employee are shown at the right. --> <!-- 1. Find the probability of each outcome. --> <!-- 2. Construct a probability distribution table. --> <!-- 3. Find the mean of the probability distribution. --> <!-- 4. Find the variance and standard deviation. --> <!-- ]<br> --> <!-- .pull-right[ --> <!-- | Sales per day `\(x\)` | Number of days `\(f\)` | --> <!-- |-------------------|--------------------| --> <!-- | 0 | 16 | --> <!-- | 1 | 19 | --> <!-- | 2 | 15 | --> <!-- | 3 | 21 | --> <!-- | 4 | 9 | --> <!-- | 5 | 10 | --> <!-- | 6 | 8 | --> <!-- | 7 | 2 | --> <!-- ] --> <!-- --- --> <!-- ## Practice: Probability within one SD of a discrete probability distribution --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=14762&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Probability from a poll --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=7210&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Chance of continuous successes of a type of surgery --> <!-- A type of surgery has a 90% chance of success. The surgery is performed on three patients. Find the probability of the surgery being successful on exactly two patients. --> --- class: center middle # Lab Instructions in Excel --- ## Excel Functions for Binomial Let `\(X\)` be a binomial random variable with parameters `\(n\)` and `\(p\)`, that is `\(X\sim B(n, p)\)`. In Excel, `\(P(X=x)\)` is given by `BINOM.DIST(x, n, p, FALSE)` and `\(P(X\le x)\)` is given by `BINOM.DIST(x, n, p, TRUE)`. You may click input function `\(f_x\)` and then search `binom` to find the function.