Continuous Random Variables class: center, middle, inverse, title-slide .title[ # Continuous Random Variables ] .subtitle[ ## MA336 Statistics ] .author[ ### Fei Ye Department of Mathematics and Computer Science ] .date[ ### July 2022 ] --- ## Learning Goals for Probability and Probability Distribution - Demonstrate understanding of characteristics of normal distributions. - Calculate accurate probabilities of continuous random variables and interpret them in a variety of settings. - Calculate the standardized value (or `\(z\)`-score). --- ## Probability Distribution of a Continuous Random Variable - The probability distribution of a continuous random variable `\(X\)` is characterized by its **probability density function** `\(f(X)\)` satisfying that the probability `\(P(a\leq X\leq b)\)` equals the area above the interval `\([a, b]\)` but under the graph of the density function `\(f(X)\)` which is also called a **density curve**. .center[ <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-2-1.png" width="360" /> <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-3-1.png" width="360" /> ] --- ## Properties of Probability Distribution of a Continuous Random Variable - The probability density function `\(f\)` is nonnegative, that is `\(f(X)\ge 0\)`. - The total area under a density curve is 1. - The cumulative probability `\(P(X\le b)\)` of a random variable `\(X\)` equals the area under the density curve to the left side of `\(b\)`. - By the addition rule of probability, we have - `$$P(a\le X\le b)=P(X\le b)-P(X\le a)$$` - `$$P(X\ge b)=1-P(X\le b)$$` - As a line segment has no area, we have `\(P(X\le a)=P(X< a)\)` as well as `\(P(X\ge b)=P(X>b)\)` --- ## Example: An Uniform Distribution Let `\(X\)` be the amount of time that a commuter must wait for a train. Suppose `\(X\)` has a probability density function $$ f(X)= `\begin{cases} 0.1, & 0\leq X\leq 10\\ 0, & \text{otherwise} \end{cases}` $$ What is the probability that the commuter's waiting time is less than 4 minutes? -- **Solution:** The probability `\(P(X\leq 4)\)` is the area under the horizontal line `\(y=0.1\)` to the left of `\(X=4\)`. Since `\(f(X)=0\)` for `\(X<0\)`, the area is the area of the rectangle with width 4 and height 0.1. So the probability is `\(P(X\leq 4)=0.1\cdot 4=0.4\)`. .center[ <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-4-1.png" width="360" /> ] --- ## Normal Distribution - A **normal distribution** has a **density function** `\(f(x)=\frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}},\)` where `\(\mu\)` is the mean, `\(\sigma\)` is the standard deviation, `\(\pi\approx 3.14159\)` and `\(e\approx 2.71828\)`. The graph of `\(f\)` is called a **normal curve**. - We write `\(X\sim N(\mu, \sigma^2)\)` for a normal random variable `\(X\)` with the mean `\(\mu\)` and the standard deviation `\(\sigma\)`. - A normal distribution has the following properties: - *The mean, median, and mode are equal*. - The normal curve is *bell shaped and __symmetric__* with respect to the mean. - The *total area* under the curve and above the `\(x\)`-axis is `\(1\)`. - The normal curve *approaches, but never touches, the `\(x\)`-axis* as `\(x\)` goes to `\(\pm\infty\)`. - Between `\(\mu-\sigma\)` and `\(\mu+\sigma\)`, the graph *curves downward*. On the left side of `\(\mu-\sigma\)` or the right side of `\(\mu+\sigma\)`, the graph *curves upward*. A point at which the curve changes the direction of curving is called an **inflection point**. --- ## Normal Curves with Different Means and Standard Deviations .center[ <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-5-1.png" width="504" /> <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-6-1.png" width="504" /> ] --- ## The Empirical Rule for Normal Distributions For any normal distribution, the proportion of data values within 1, 2, and 3 standard deviations away from the mean are approximately 68.3%, 95.4% and 99.7% respectively. .center[ <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-7-1.png" width="720" /> ] --- ## Example: Foot length (1/2) Suppose that foot length of a randomly chosen adult male is a normal random variable with the mean `\(\mu=11\)` and the standard deviation `\(\sigma=1.5\)`. - How likely is a male's foot length to be smaller than 9.5 inches - How likely is a male's foot length to be bigger than 8 inches -- **Solution:** Let's first sketch the normal curve. .center[ <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-8-1.png" width="432" /> <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-9-1.png" width="432" /> ] --- ## Example: Foot length (2/2) **Solution: (Continued)** Note that `\(9.5=11-1.5=\mu-\sigma\)`. By the symmetry of normal curve, we know that the probability `\(P(X<9.5)\)` is the shaded area on the left. Because the probability of getting a foot length within 1 standard deviation away from the mean is 0.683. Then `$$\scriptstyle P(X<9.5)=\frac12(1-P(9.5<X<12.5))\approx\frac12(1-0.683)=0.1585.$$` Note that `\(8=11-2\cdot 1.5=\mu-2\sigma\)`. Because the probability of getting a foot length within 2 standard deviation away from the mean is 0.954. Then `$$\scriptstyle P(X>8)=(1-P(X<8))=1-\frac12(1- P(8<X<14))=1-\frac12(1-0.954)=0.977.$$` .footmark[ The probability `\(P(X<x)\)` for a normal random variable `\(X\)` can be calculated using the Excel function `NORM.DIST(x, mean, sd, TRUE)`. In this case, `\(P(X<9.5)=\)` `NORM.DIST(9.5, 11, 1.5, TRUE)` `\(\approx 0.1587.\)` `\(P(X>8)=1-P(X\le 8)=1-\)` `NORM.DIST(8, 11, 1.5, TRUE)` `\(\approx 1-0.02275=0.97725\)`. ] --- ## Standard Normal Distribution - A normal distribution is called a **standard normal distribution** if the mean is `\(\mu=0\)` and the standard deviation is `\(\sigma=1\)`. - A random normal variable can be **standardized** by the following formula `\(z=\frac{x-\mu}{\sigma}.\)` We call the value `\(z\)` the `\(Z\)`-**score** of `\(x\)`. In Excel, the `\(Z\)`-score of `\(x\)` can be calculated using the function `STANDARDIZE()`. - Standardization preserves probability: `$$P(a<X<b)=P\left(\frac{a-\mu}{\sigma}< Z < \frac{b-\mu}{\sigma}\right).$$` - The probability `\(P(Z< z)\)` of a standard normal random variable `\(Z\)` can be found using the Excel function `NORM.S,DIST(z, TRUE)` or the [standard normal distribution table](https://yfei.page/teaching/statistics/normal-tables.html). - The probability `\(P(X< x)\)` of a normal random variable `\(X\)` can be calculated using the Excel function `NORM.DIST(x, mean, sd, TRUE)`. --- ## Example: Find the Standard Score Let `\(X\)` be a norma random variable with the mean `\(\mu = 8\)` and the standard deviation `\(\sigma=2\)`. 1. Find the `\(Z\)`-score for the value `\(X=13\)`. 2. Find the `\(X\)`-value for the `\(Z\)`-score `\(z=-0.6\)`. -- **Solution:** The `\(z\)`-score for the value `\(X=13\)` is `$$z=\dfrac{x-\mu}{\sigma}=\dfrac{13-8}{2}=\dfrac{5}{2}=2.5.$$` The `\(X\)`-value for the the `\(Z\)`-score `\(z=-0.6\)` is `$$x=z\cdot\sigma+\mu=-0.6\cdot 2+8=-1.2+8=6.8.$$` --- ## Example: Probability of a Standard Normal Random Variable (1/2) Let `\(Z\)` be a standard normal random variable. .pull-left[ 1. Find `\(P(Z<1.21)\)`. 2. Find `\(P(Z\geq 1.21)\)`. 3. Find `\(P(0<Z\leq 1.21)\)`. ] .pull-right[ | Z | 0 | 0.01 | 0.02 | |-----|--------|--------|--------| | 1.2 | .red[0.8849] | 0.8869 | 0.8888 | | 1.3 | 0.9856 | 0.9856 | 0.9857 | ] -- **Solution:** To find the probability, we may use the standard normal distribution table, or the Excel function `NORM.S.DIST(z,TRUE)`. 1. From the table, we see that `\(P(Z<1.21)\approx 0.8869\)`. 2. Since the total area under the normal curve is 1, we get `$$P(Z\geq 1.21)\approx 1-0.8869=0.1131.$$` 3. By the symmetry, `\(P(Z<0)=0.5\)`. Then the probability `$$P(0<Z<1.21)\approx 0.8869-0.5=0.3869.$$` --- ## Example: Heights of 25-year-old women The heights of 25-year-old women in a certain region are approximately normally distributed with mean 62 inches and standard deviation 4 inches. Find the probability that a randomly selected 25-year-old woman is more than 67 inches tall. -- **Solution:** Let's first sketch the normal curve. .pull-left[ <img src="data:image/png;base64,#MA336-Week-8-Continuous-Random-Variables_files/figure-html/unnamed-chunk-10-1.png" width="432" /> ] .pull-right[ | Z | 0.04 | 0.05 | 0.06 | | ---- | ------ | ------ | ------ | | 1.2 | 0.8925 | .red[0.8944] | 0.8962 | ] The probaiblity is `\(P(X>67)=1-P(X<67)\)`. To calculate `\(P(X<67)\)`, one way is to use the standard normal distribution table. First find the `\(Z\)`-score is `\(z=\frac{67-62}{4}=1.25\)`. Then `\(P(Z<1.25)\approx 0.8944\)`. Another way is to use the Excel function `NORM.DIST(67, 62, 4, TRUE)`. Then `\(P(X>67)=1-P(X\le 67)\approx 1-0.8944=0.1056\)`. --- ## Cutoff Value for a Given Tail Area - The `\(k\)`-th percentile for a random variable `\(X\)` is the value `\(x_k\)` that cuts off a left tail with the area `\(k/100\)`, that is `\(P(X<x_k)=\frac{k}{100}\)`, where `\(0\leq k\leq 100\)`. - Let `\(c\)` be a nonnegative number less than or equal to 1. The `\((100c)\)`-th percentile for the standard normal distribution is usually denoted as `\(-z_c\)`, that is `\(P(Z<-z_c)=c\)`. By symmetry, `\(z_c\)` is the value such that `\(P(Z> z_c)=c\)`, that is `\(P(Z<z_c)=1-c\)`. - For a noraml random variable `\(X\)` with the mean `\(\mu\)` and standard deviation `\(\sigma\)`, the cutoff value `\(x^*\)` with a **tail area** `\(c\)`, can be calculated using the standardization formula, that is, `$$x^*=z^*\cdot \sigma+\mu,$$` where `\(z^*\)` is the cutoff `\(z\)`-score with the tail area `\(c\)`, that is `\(z^*=-z_c\)` given that `\(c\)` is the left-tail area and `\(z^*=z_c\)` given that `\(c\)` is the right tail area. --- ## Example: Cutoff Value for a Normal Random Variable Let `\(X\)` be the normal random variable with mean `\(6\)` and standard deviation `\(3\)`. Suppose the value `\(x^*\)` cuts off a left-tail area `\(0.05\)`. Find the value `\(x^*\)`. **Solution:** One way to find the value `\(x^*\)` is to use the Excel function `NORM.INV(0.05, 6, 3)`: `$$x^* \approx 1.065.$$` Another way is to use the standardization formula. Using the standard normal distrution table or the Excel function `NORM.S.INV(0.05)`, we find that `\(-z_{0.05}=-1.645\)`. Then `$$x^*=-z_{0.05}\cdot 3+6=1.065.$$` | z | − 0.04 | − 0.05 | |------|--------|---------| | -1.6 | 0.0505 | 0.04947 | .footmark[ **Note:** if the value `\(c\)` is between two cells in the standard deviation table, we take `\(z^*\)` be the average of the two `\(z\)`-scores associated to the values in the two cells. ] --- ## Example: Math Course Placement Scores on a standardized college placement examination are normally distributed with mean 60 and standard deviation 13. Students whose scores are in the top 5% will be placed in a Calculus II course. Find the minimum score needed to be placed in a Calculus II course. **Solution:** Let `\(x^*\)` be the minimum score. From the question, we know that `\(P(X\geq x^*)=0.05\)`. Equivalently, `\(P(X<x^*)=1-0.05=0.95\)`. Using the function `NORM.INV(0.95, 60, 13)`, we find the `\(x^*\)` score is `$$x^*=81.38.$$` Another way is to find `\(z_{0.05}\)` first, then use the standardization formula. Use the standard normal distribution table or the Excel function, you will find that the `\(z\)`-score is `\(z^*=z_{0.05}=1.64\)`. Then `$$x^*=z^*\sigma+\mu=1.64\cdot 13+60=81.32$$` So the minimum score needed is `\(82\)`. .footmark[ **Note:** The minimum score is the same as the 95th percentile. ] --- ## Practice: Dash washing time <iframe src="https://www.myopenmath.com/embedq2.php?id=7251&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Practice: Find probabilities of a normal random variable 1. Let `\(Z\)` be a standard normal random variable. Find the probabilities: `$$\text{1.}\,\, P(Z<1.58)\quad \text{2.}\,\, P(-0.6<Z<1.67)\quad \text{3.}\,\, P(Z>0.19).$$` 2. Let `\(X\)` be a normal random variable with `\(\mu=5\)` and `\(\sigma=2\)`. Find the probabilities: `$$\text{1.}\,\, P(-2<X<8)\quad \text{2.}\,\, P(X>-1) \quad \text{3.}\,\, P(X<4).$$` --- ## Practice: Fruit weight <iframe src="https://www.myopenmath.com/embedq2.php?id=7280&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Practice: Shortest lifespan <iframe src="https://www.myopenmath.com/embedq2.php?id=7281&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Practice: Battery life <iframe src="https://www.myopenmath.com/embedq2.php?id=268987&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Practice: Sum of probabilities of two normal random variables Let `\(Z\)` be a normal random variable with `\(\mu=0\)` and `\(\sigma=1\)`. Let `\(X\)` be a normal random variable with `\(\mu=4.3\)` and `\(\sigma=1.7\)`. Determine the values `\(P(Z>1) + P(X<6)\)` and explain how do you find the value. <!-- --- --> <!-- # More Practice on Normal Distributions --> <!-- --- --> <!-- ## Practice: Area under a normal curve --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=267521&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Find cutoff `\(Z\)`-scores --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=7277&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Numbers of Chocolate Chips in Acceptable Cookies --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=293469&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Blood Pressure --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=283090&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> --- class: center middle # Lab Instructions in Excel --- ## Excel Functions for Normal Distributions - Let `\(Z\)` be a standard normal random varaible. In Excel, `\(P(Z<z)\)` is given by `NORM.S.DIST(z, TRUE)`. - Let `\(X\)` be a normal random variable with mean `\(\mu\)` and standard deviation `\(\sigma\)`, that is `\(X\sim N(\mu, \sigma^2)\)`. In Excel, `\(P(X<x)\)` is given by `NORM.DIST(x, mean, sd, TRUE)`. - When a cumulative probability `\(p=P(X<x)\)` of a normal random variable `\(X\)` is given, we can find `\(x\)` using `NORM.INV(p, mean, sd)`. - When a cumulative probability `\(p=P(Z<z)\)` of a standard normal random variable `\(Z\)` is given, we can find `\(z\)` using `NORM.S.INV(p)`.