Confidence Intervals for Proportion

class: center, middle, inverse, title-slide

.title[
# Confidence Intervals for Proportion
]
.subtitle[
## MA336 Statistics
]
.author[
### Fei Ye Department of Mathematics and Computer Science
]
.date[
### August 2022
]

---

<style type="text/css">
.remark-slide table, .remark-slide table thead th { border-top: 0px; border-bottom: 0px; }
.remark-slide thead, .remark-slide tr:nth-child(even){ background-color: white; }
table{ border-collapse: collapse; }
.remark-slide thead:empty { display: none; }
</style>

## Learning Goals for Confidence Intervals

- Construct and interpret a confidence interval for one population proportion.

- Describe how the following will affect the width of the confidence interval:
  - increasing the sample size;
  - increasing the confidence level.

---

## Confidence Interval for a Proportion (1/2)

- Recall that the standard error of sample proportions is `\(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\)`, where `\(n\)` is the sample size and `\(p\)` is the population proportion. When estimating the population proportion `\(p\)`, however, `\(p\)` is unknown and we only have a point estimate `\(\hat{p}\)` (phat) to use. For the standard error, we therefore use the estimate
  `$$\sigma_{\hat{P}}\approx\hat{\sigma}_{\hat{P}}=\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}.$$`

- Based on the central limit theorem, when `\(n\)` is large enough, at the `\(100(1-\alpha)\%\)` confidence level, the margin of error for `\(p\)` is defined as
  `$$E=z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$`
  In Excel, `\(z_{\alpha/2}\)`=`NORM.S.INV((1 + confidence level)/2)`. The margin of error can also be obtained by `CONFIDENCE.NORM(1-confidence level, SQRT(phat*(1-phat)), n)`.
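
The margin-of-error formula above can also be sketched in Python using only the standard library. This is a minimal illustration, not part of the course materials: the function name `margin_of_error`, its parameter names, and the numbers in the usage line are assumptions made for the example. `NormalDist().inv_cdf` plays the role of Excel's `NORM.S.INV`.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(phat, n, confidence_level):
    """Margin of error z_{alpha/2} * sqrt(phat * (1 - phat) / n)."""
    # Critical value z with P(Z < z) = (1 + confidence level) / 2
    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    # Estimated standard error of the sample proportion
    standard_error = sqrt(phat * (1 - phat) / n)
    return z * standard_error


# Hypothetical illustration: sample proportion 0.5, n = 80, 95% level
E = margin_of_error(0.5, 80, 0.95)
print(f"margin of error = {E:.4f}")
```

The sketch is only meaningful when the large-sample condition for the normal approximation holds.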
---

## Confidence Interval for a Proportion (2/2)

- The confidence interval for `\(p\)` is defined by
  `$$[\hat{p}-E,\hat{p}+E]=\left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right],$$`
  where [the critical value `\(z_{\alpha/2}\)` satisfies `\(P(Z< z_{\alpha/2})=1-\alpha/2\)`](https://saylordotorg.github.io/text_introductory-statistics/s11-01-large-sample-estimation-of-a-p.html) for the standard normal variable `\(Z\)`.

- In practice, the sample size `\(n\)` is considered large enough if `\(n\hat{p}\ge 10\)` and `\(n(1-\hat{p})\ge 10\)`.

- The confidence interval defined above is known as the [normal approximation (or Wald's) confidence interval](https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval). It is popular in introductory statistics books. However, it is unreliable when the sample size is small or the sample proportion is close to 0 or 1. Indeed, if the sample proportion is 0 or 1, the confidence interval defined here will have zero length.

???

By the central limit theorem, the random variable `\(\hat{p}\)` is approximately normally distributed when `\(n\)` is large. The chance that `\(p\in \left[\hat{p}-z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \hat{p}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]\)` is approximately the same as the chance that `\(\hat{p}\in \left[p-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}, p+z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right]\)`. That shows `\(z_{\alpha/2}\)` satisfies
`$$P\left(-z_{\alpha/2}<\dfrac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}<z_{\alpha/2}\right)=1-\alpha.$$`

---

## Example: Estimating the Proportion of Students Taking Busses (1/2)

In a random sample of 100 students in college, 65 said that they come to college by bus.

1. Give a point estimate of the proportion of all students who come to college by bus.

2. Construct a 99% confidence interval for that proportion.

--

**Solution:** A good point estimate is the sample proportion.
Here the sample proportion is `\(\hat{p}=65/100=0.65\)`.

Since `\(n\hat{p}=100\cdot 0.65=65>10\)` and `\(n(1-\hat{p})=100\cdot 0.35=35>10\)`, the sample is large enough, and the estimated standard error is
`$$\hat{\sigma}_{\hat{P}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.65(1-0.65)}{100}}\approx0.048.$$`

---

## Example: Estimating the Proportion of Students Taking Busses (2/2)

**Solution: (Continued)** At the 99% level of confidence, the value `\(1-\alpha/2=(1+\text{confidence level})/2=(1+0.99)/2=0.995\)`. The critical value `\(z_{\alpha/2}\)` is determined by the equation `\(P(Z<z_{\alpha/2})=0.995\)`. Using the Excel function `NORM.S.INV(0.995)`, we find the critical value `\(z_{\alpha/2}\approx 2.576\)`.

Thus the margin of error, computed with the less rounded standard error 0.0477, is
`$$E=z_{\alpha/2}\cdot \hat{\sigma}_{\hat{P}}\approx 2.576\cdot 0.0477\approx 0.123,$$`
and the confidence interval at the 99% level is
`$$[\hat{p}-E, \hat{p}+E]\approx [0.65-0.123, 0.65+0.123]=[0.527, 0.773].$$`

Conclusion: we are 99% confident that the proportion of all students at the college who take the bus is in the interval `\([0.527, 0.773]\)`.

.footmark[ **Note:** The margin of error can also be obtained by the Excel function `CONFIDENCE.NORM(1-0.99, SQRT(65/100*(1-65/100)), 100)`. ]

<!-- --- -->

<!-- ## Example: Estimating a Population Proportion (1/2) -->

<!-- Foothill College’s athletic department wants to calculate the proportion of students who have attended a women’s basketball game at the college. They use student email addresses, randomly choose 220 students, and email them. Of the 145 who responded, 22 had attended a women’s basketball game. Calculate and interpret the approximate 90% confidence interval for the proportion of all Foothill College students who have attended a women’s basketball game. -->

<!-- -- -->

<!-- **Solution:** Although 220 students were surveyed, only 145 responded. So the sample consists of those 145 students. The sample proportion is `\(\hat{p}=\frac{22}{145}\approx 0.152\)`.
The estimated standard error is `$$\hat{\sigma}_{\hat{P}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}=\sqrt{\frac{0.152\cdot(1-0.152)}{145}}\approx 0.03.$$` -->

<!-- --- -->

<!-- ## Example: Estimating a Population Proportion (2/2) -->

<!-- **Solution: (Continued)** At the 90% confidence level, `\(\alpha=1-0.9=0.1\)` and the critical value is `\(z_{\alpha/2}\)`=`NORM.S.INV(1-0.1/2)` `\(\approx 1.645\)`. Therefore, the margin of error is `$$E=z_{\alpha/2}\cdot\hat{\sigma}_{\hat{P}}\approx 1.645\cdot0.03\approx 0.049,$$` and the confidence interval is `$$[\hat{p}-E, \hat{p}+E]\approx[0.152-0.049, 0.152+0.049]=[0.103, 0.201].$$` Conclusion: we are 90% confident that the proportion of all Foothill College students who have attended a women’s basketball game is between 0.103 and 0.201. -->

<!-- .footmark[ Source: [Estimating a Population Proportion in the book Concepts in Statistics](https://courses.lumenlearning.com/wmopen-concepts-statistics/chapter/estimating-a-population-proportion-1-of-3/) ] -->

---

## Factors Affecting the Width of Confidence Intervals

- The width of a confidence interval, which equals twice the margin of error, gives a measure of the precision of the estimate.

- Recall that, for a population proportion or mean,
  `$$\text{Margin of Error} = \text{Critical Value}\cdot \frac{\text{(estimated) Population SD}}{\sqrt{\text{Sample Size}}}.$$`

- The formula tells us the precision of a confidence interval is affected by the confidence level, the variability, and the sample size.
  - Larger confidence levels give larger critical values and larger errors.
  - Populations (and samples) with more variability give larger errors.
  - Larger sample sizes give smaller errors.

---

## Sample Size Determination

- In practice, we may desire a margin of error of `\(E\)`. With a fixed confidence level `\(100(1-\alpha)\%\)`, the larger the sample size, the smaller the margin of error.
- When estimating a population proportion, if we can produce a reasonable guess `\(\hat{p}\)` for the population proportion, then an appropriate minimum sample size for the study is determined by
  `$$n=\left(\frac{z_{\alpha/2}}{{E}}\right)^2\cdot \hat{p}(1-\hat{p}).$$`

- When estimating a population mean, if we can produce a reasonable guess `\(\sigma\)` for the population standard deviation, then an appropriate minimum sample size is given by
  `$$n=\left(\dfrac{z_{\alpha/2}\cdot \sigma}{{E}}\right)^2.$$`

---

## Example: Minimum Sample Size - Error in Proportion

Suppose you want to estimate the proportion of students at QCC who live in Queens. By surveying your classmates, you find around 70% live in Queens. Use this as a guess to determine how many students would need to be included in a random sample if you wanted the margin of error for a 95% confidence interval to be less than or equal to 2%.

**Solution:** We may use `\(\hat{p}=0.7\)` as a reasonable guess for the population proportion. At the 95% level, the critical value is `\(z_{\alpha/2}=\)` `NORM.S.INV((1+0.95)/2)` `\(\approx 1.96\)`. The desired margin of error is `\(E=0.02\)`. Then the appropriate minimum sample size is determined by
`$$n=\left(\frac{z_{\alpha/2}}{{E}}\right)^2\cdot \hat{p}(1-\hat{p})\approx(1.96/0.02)^2\cdot 0.7\cdot(1-0.7)=2016.84.$$`

Since the sample size has to be an integer, to get an error of no more than 2% at the 95% level, the sample size should be at least 2017.

---

## Example: Minimum Sample Size - Error in Mean

Find the minimum sample size necessary to construct a 99% confidence interval for the population mean with a margin of error `\(E =0.2\)`. Assume that the estimated population standard deviation is `\(\sigma=1.3\)`.

**Solution:** At the 99% level, the critical value is `\(z_{\alpha/2}=\)` `NORM.S.INV((1+0.99)/2)` `\(\approx 2.576\)`. The desired margin of error is `\({E}=0.2\)`. The estimated population standard deviation is `\(\sigma=1.3\)`.
Then the minimum sample size is approximately
`$$n=\left(\dfrac{z_{\alpha/2}\cdot \sigma}{{E}}\right)^2\approx (2.576\cdot 1.3/0.2)^2 \approx 280.4.$$`

To get an error of no more than 0.2 at the 99% level, the sample size should be at least 281.

---

## Practice: Confidence Interval of Proportion of Kids

<iframe src="https://www.myopenmath.com/embedq2.php?id=7347&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe>

---

## Practice: Confidence Intervals for Product Quality

To understand the reasons for returned goods, the manager of a store examines the records on 40 products that were returned in the last year. Reasons were coded by 1 for “defective,” 2 for “unsatisfactory,” and 0 for all other reasons, with the results shown in the table.

<table class="table" style="margin-left: auto; margin-right: auto;"> <tbody> <tr> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(42, 120, 142, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(187, 223, 39, 1) !important;">2</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(42, 120, 142, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">2</span> </td> <td
style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(42, 120, 142, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> </tr> <tr> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(187, 223, 39, 1) !important;">1</span> </td> <td style="text-align:center;"> <span style=" color: rgba(42, 120, 142, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(187, 223, 39, 1) !important;">2</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> 
<span style=" color: rgba(46, 180, 124, 1) !important;">2</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(187, 223, 39, 1) !important;">2</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> <td style="text-align:center;"> <span style=" color: rgba(46, 180, 124, 1) !important;">0</span> </td> </tr> </tbody> </table> 1. Give a point estimate of the proportion of all returns that are because of something wrong with the product, that is, either defective or performed unsatisfactorily. 2. Construct an 80% confidence interval for the proportion of all returns that are because of something wrong with the product. 
--- ## Practice: Sample Size for Mean with Given Error <iframe src="https://www.myopenmath.com/embedq2.php?id=19799&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --- ## Practice: Sample Size for Proportion with Given Error <iframe src="https://www.myopenmath.com/embedq2.php?id=7348&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> <!-- --- --> <!-- ## Practice: Confidence Interval of Proportion Given Table --> <!-- <iframe src="https://www.myopenmath.com/embedq2.php?id=213816&seed=2020&showansafter" width="100%" height="560px" data-external="1"></iframe> --> <!-- --- --> <!-- ## Practice: Minimum Sample Size - Mean --> <!-- A software engineer wishes to estimate, to within 5 seconds, the mean time that a new application takes to start up, with 95% confidence. Estimate the minimum size sample required if the standard deviation of start up times for similar software is 12 seconds. .footmark[ Source: [Exercise 7 in Section 7.4 in Introductory Statistics](https://saylordotorg.github.io/text_introductory-statistics/s11-04-sample-size-considerations.html) ] --> <!-- --- --> <!-- ## Practice: Minimum Sample Size - Proportion --> <!-- The administration at a college wishes to estimate, to within two percentage points, the proportion of all its entering freshmen who graduate within four years, with 90% confidence. Estimate the minimum size sample required. .footmark[ Source: [Exercise 13 in Section 7.4 in Introductory Statistics](https://saylordotorg.github.io/text_introductory-statistics/s11-04-sample-size-considerations.html) ] --> --- class: center middle # Lab Instructions in Excel --- ## Excel Functions for Normal Distributions and Marginal Errors - When a cumulative probability `\(p=P(Z<z)\)` of a standard normal random variable `\(Z\)` is given, we can find `\(z\)` using `NORM.S.INV(p)`. 
- If a sample of size `\(n\)` has the sample proportion `\(\hat{p}\)` and the sampling distribution is approximately normal, the margin of error for the proportion can be obtained by the Excel function `CONFIDENCE.NORM(1-confidence level, SQRT(phat*(1-phat)), n)`.
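
For readers working outside Excel, the two functions above can be mirrored with Python's standard library. This is a sketch under stated assumptions: the helper names `norm_s_inv` and `confidence_norm` are made up for this illustration and only imitate the Excel functions' argument order (significance level, standard deviation, sample size). They are not an official port. The usage lines rework the bus example from the earlier slides.

```python
from math import sqrt
from statistics import NormalDist


def norm_s_inv(p):
    """Analogue of Excel's NORM.S.INV: the z with P(Z < z) = p."""
    return NormalDist().inv_cdf(p)


def confidence_norm(alpha, standard_dev, size):
    """Analogue of Excel's CONFIDENCE.NORM: z_{alpha/2} * standard_dev / sqrt(size)."""
    return norm_s_inv(1 - alpha / 2) * standard_dev / sqrt(size)


# Bus example from the slides: 65 of 100 students, 99% confidence level
phat, n = 65 / 100, 100
E = confidence_norm(1 - 0.99, sqrt(phat * (1 - phat)), n)
print(f"critical value  = {norm_s_inv(0.995):.3f}")
print(f"margin of error = {E:.3f}")
print(f"interval        = [{phat - E:.3f}, {phat + E:.3f}]")
```

Note that the second argument is `sqrt(phat * (1 - phat))` without the division by `n`, because the function itself divides by the square root of the sample size.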