Determine whether the study meets the conditions under which inferences on a population parameter may be performed.
Demonstrate understanding of confidence level \(1-\alpha\).
Explain when and why to use the normal distribution or the t-distribution for a given study.
Determine the appropriate degrees of freedom associated with the t-distribution.
Determine the critical values using tables or Excel functions.
Describe how the following will affect the width of the confidence interval:
Construct and interpret a confidence interval for one population mean.
When estimating a population parameter, we may consider the statistic of a random sample as an estimate of the population parameter. But we expect some chance error.
Estimating an unknown parameter by a single number calculated from a sample is called a point estimation. The single number (statistic) from the sample is called a point estimate.
A point estimate gives no indication of how reliable the estimate is or how large the error is.
From a box of 20 pencils of two colors, black and blue, 10 pencils were randomly drawn. 6 out of the 10 pencils are black. What proportion of the pencils in the box are black?
Solution: Since the sample proportion is 0.6, one may make a point estimation that 60% of the box, or 12 are black pencils. However, we don't know how close the sample proportion is to the population proportion.
To increase the chance of capturing the true value, we estimate an unknown parameter by an interval, obtained by allowing for chance error on either side of a point estimate.
Estimating an unknown parameter using an interval of values which likely contains the true value of the parameter is called an interval estimation. The interval is called an interval estimate.
The reliability of an interval estimate is measured by the probability \(1-\alpha\)
that the interval estimate will capture the true value of the parameter. This probability \(1-\alpha\)
is called the confidence level.
The 90%, 95% and 99% levels of confidence are frequently used in statistical studies. The 95% level of confidence is usually the standard choice of confidence level for scientific polls published in the media and online.
Recall that the standard error of a statistic, denoted by SE, is the standard deviation of the sampling distribution.
A randomly selected 100 students at a college have an average GPA 3.0. How likely does the interval \([3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]\)
contain the average GPA \(\mu\)
of that college?
Solution: The probability that the interval \([3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]\) contains the population mean \(\mu\) equals the probability that the sample mean lies in the interval \([\mu-2\cdot\text{SE}, \mu+2\cdot\text{SE}]\). Since the sample size is 100, the sampling distribution of the sample mean is approximately normal, so \([\mu-2\cdot\text{SE}, \mu+2\cdot\text{SE}]\) contains about 95.5% of all possible sample means.
That means we can be 95.5% confident that the average GPA \(\mu\) of that college is in the interval \([3.0-2\cdot\text{SE}, 3.0+2\cdot\text{SE}]\).
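The 95.5% figure can be checked numerically. The following is a minimal Python sketch, not part of the original example: it computes \(P(-2<Z<2)\) for a standard normal variable, then simulates repeated samples from a hypothetical normal population and counts how often the interval "sample mean ± 2·SE" captures the true mean. The values `MU`, `SIGMA`, `N` and `TRIALS` are assumptions chosen only for illustration.

```python
import random
from statistics import NormalDist

# Theoretical coverage of a "statistic ± 2·SE" interval when the
# sampling distribution is normal: P(-2 < Z < 2).
std_normal = NormalDist()  # mean 0, standard deviation 1
theoretical = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"P(-2 < Z < 2) = {theoretical:.4f}")  # P(-2 < Z < 2) = 0.9545

# Simulation with a hypothetical population (assumed values, not from the slides).
MU, SIGMA, N, TRIALS = 3.0, 0.5, 100, 2000
se = SIGMA / N ** 0.5  # standard error of the sample mean

random.seed(0)  # fixed seed so the run is reproducible
hits = 0
for _ in range(TRIALS):
    sample_mean = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    # Does the interval built around this sample mean capture MU?
    if sample_mean - 2 * se <= MU <= sample_mean + 2 * se:
        hits += 1

coverage = hits / TRIALS
print(f"simulated coverage = {coverage:.3f}")
```

The simulated coverage should land close to the theoretical value, with some run-to-run variation if the seed is changed.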
When the sampling distribution of a statistic is approximately symmetric, we take interval estimates in the following form \([\text{Statistic}- \text{E}, \text{Statistic}+ \text{E}],\)
where the value \(\text{E}\)
is called the marginal error or margin of error.
Given a confidence level \(100(1-\alpha)\%\)
, the marginal error \(\text{E}\)
is the value such that \(100(1-\alpha)\%\)
of the intervals \([\text{Statistic}- \text{E}, \text{Statistic}+ \text{E}]\)
contain the true parameter \(\mu_\text{par}\)
. Equivalently, the marginal error \(\text{E}\)
is the value such that \(100(1-\alpha)\%\)
of statistics are in the interval \([\mu_\text{par}- \text{E}, \mu_\text{par}+ \text{E}]\)
.
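The equivalence of the two descriptions follows from rearranging the inequalities. A short derivation, using the same symbols as above:

$$\text{Statistic}-\text{E}\le \mu_\text{par}\le \text{Statistic}+\text{E} \iff -\text{E}\le \mu_\text{par}-\text{Statistic}\le \text{E} \iff \mu_\text{par}-\text{E}\le \text{Statistic}\le \mu_\text{par}+\text{E}.$$

So the interval around the statistic captures the parameter exactly when the statistic falls within \(\text{E}\) of the parameter.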
Denote by \(X\)
the random variable for the sample statistic. Then \(\text{E}\)
is determined by the following probability equation
$$P(\mu_\text{par}-\text{E}< X < \mu_\text{par}+\text{E})=1-\alpha.$$
If the distribution of \(X\)
is symmetric, then the marginal error \(E\)
is the value such that
$$P(X-\mu_\text{par}<\text{E})=1-\alpha/2.$$
Because the parameter \(\mu_\text{par}\) is unknown, we standardize the random variable \(X\) by \(Z=\frac{X-\mu_\text{par}}{\text{SE}}\) and get
$$\textstyle P\left(-\frac{\text{E}}{\text{SE}}<Z<\frac{\text{E}}{\text{SE}}\right)=1-\alpha,$$
where the random variable \(Z\)
has a mean \(0\)
and standard deviation \(1\)
.
The above probability equation suggests the following formula
$$\textstyle \text{Marginal Error}=\text{Critical value}\cdot \text{Standard Error},$$
where the critical value is the value \(z_{\alpha/2}\)
so that \(P(-z_{\alpha/2}<Z<z_{\alpha/2})=1-\alpha\)
.
Let \(X\)
be a point estimate. We call the interval \([X-z_{\alpha/2}\text{SE}, X+z_{\alpha/2}\text{SE}]\)
a confidence interval at the \(100(1-\alpha)\%\)
level of confidence.
Suppose the population standard deviation \(\sigma\)
is given. By the central limit theorem, if \(n>30\)
or the population distribution is approximately normal, then the sampling distribution of the sample mean is approximately normal with standard error \(\sigma/\sqrt{n}\)
.
At the confidence level \(1-\alpha\)
, the marginal error for a population mean is
\(E=z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\)
and the confidence interval is
$$\left[\bar{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \bar{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right],$$
where the critical value \(z_{\alpha/2}\)
satisfies that \(P(Z<z_{\alpha/2})=1-\alpha/2\)
for the standard normal variable \(Z\)
.
In Excel, \(z_{\alpha/2}\)
=NORM.S.INV((1+confidence level)/2)
.
The marginal error \(E=z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\)
can also be obtained by the Excel function
CONFIDENCE.NORM(1-confidence level, sigma, n)
.
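For readers working outside Excel, the same two quantities can be computed with Python's standard library. This is a minimal sketch: `z_critical` and `margin_of_error_z` are hypothetical helper names introduced here, intended to mirror NORM.S.INV((1+confidence level)/2) and CONFIDENCE.NORM(1-confidence level, sigma, n) respectively.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Return z_{alpha/2}, the value with P(Z < z) = 1 - alpha/2."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


def margin_of_error_z(confidence_level: float, sigma: float, n: int) -> float:
    """Return E = z_{alpha/2} * sigma / sqrt(n) for a known population SD."""
    return z_critical(confidence_level) * sigma / sqrt(n)


print(f"{z_critical(0.95):.4f}")  # 1.9600
```

The argument is the confidence level itself (for example 0.95), not \(\alpha\), which is a design choice made here to match how the confidence level is usually stated.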
A sample of size 15 is drawn from a normally distributed population with standard deviation 6. Find the critical value \(z_{\alpha/2}\) needed to construct a confidence interval at each of the following confidence levels: 90% and 98%.
Solution: One can find the critical value \(z_{\alpha/2}\) by using the normal distribution table. Here, we will use the Excel function NORM.S.INV(prob).
At the 90% confidence level, NORM.S.INV((1+0.9)/2) gives the critical value
$$z_{\alpha/2}=1.6449.$$
At the 98% confidence level, NORM.S.INV((1+0.98)/2) gives the critical value
$$z_{\alpha/2}=2.3263.$$
A random sample of 50 students from a college gives a mean GPA 2.51. Suppose the standard deviation of GPA of all students at the college is 0.43. Construct a 99% confidence interval for the mean GPA of all students at the college.
Solution: We first gather information from the question: \(n=50\), \(\bar{x}=2.51\), \(\sigma=0.43\), and \(1-\alpha=0.99\).
Now let's find the critical value and the standard error.
\(z_{\alpha/2}\) =NORM.S.INV((1+0.99)/2) \(\approx 2.576\)
\(\sigma_{\bar{x}}=\sigma/\sqrt{n}=0.43/\sqrt{50}\approx 0.0608.\)
Then the marginal error is \(\text{E}=z_{\alpha/2}\cdot\sigma_{\bar{x}}\approx 2.576\cdot 0.0608\approx 0.16.\)
We can conclude with 99% confidence that the average GPA of all students is between \(2.51-0.16=2.35\) and \(2.51+0.16=2.67\).
Note: The marginal error \(E\)
can also be obtained by the Excel function
CONFIDENCE.NORM(1-0.99, 0.43, 50)
.
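The GPA computation can be reproduced without intermediate rounding. The sketch below uses only Python's standard library and the numbers given in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the GPA example.
n, x_bar, sigma, confidence_level = 50, 2.51, 0.43, 0.99

z = NormalDist().inv_cdf((1 + confidence_level) / 2)  # critical value z_{alpha/2}
standard_error = sigma / sqrt(n)                      # sigma / sqrt(n)
margin = z * standard_error                           # marginal error E
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.3f}, SE = {standard_error:.4f}, E = {margin:.4f}")
# z = 2.576, SE = 0.0608, E = 0.1566
print(f"99% CI: [{lower:.2f}, {upper:.2f}]")
# 99% CI: [2.35, 2.67]
```

Carrying the standard error at full precision matters here: multiplying 2.576 by the rounded value 0.06 would give about 0.15 rather than 0.16.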
\(t\)-Distribution
When the population standard deviation is unknown, we may replace \(\sigma\) by the sample standard deviation \(s\) and use \(s/\sqrt{n}\) as an estimate of the standard error for the sampling distribution of the sample mean.
When we use the estimated standard error \(s / \sqrt{n}\)
to build a confidence interval, the normal distribution may NOT be appropriate for calculating the critical value.
If the random variable \(\bar{x}\)
is approximately normal, then the random variable \(t=\dfrac{\bar{x}-\mu}{s / \sqrt{n}}\)
has a Student's \(t\)
-distribution with \(n-1\) degrees of freedom
.
This result was discovered by William Gosset, an employee of the Guinness brewing company, who published his result using the name Student.
Unlike in the case of a sample proportion, the sample standard deviation \(s\)
is not determined by the sample mean \(\bar{x}\)
.
\(t\)-Distribution
The \(t\)-distributions are a family of curves, called \(t\)-curves.
The \(t\)-distribution has the following important properties.
The total area under a \(t\)-curve is 1.
The \(t\)-distribution has slightly more variation (i.e. \(t\)-curves are slightly “fatter”) than the standard normal distribution.
As the degrees of freedom increase, the \(t\)-distribution becomes closer to the standard normal distribution.
In practice, when the sample size is large enough (\(n>30\)), people use the normal distribution as an approximation for the Student \(t\)-distribution.
Suppose the sampling distribution is approximately normal. At the confidence level \(1-\alpha\)
, the margin of error is \(E=t_{\alpha/2}\frac{s}{\sqrt{n}},\)
and the confidence interval for a population mean \(\mu\)
is
$$\left[\bar{x}-t_{\alpha/2}\frac{s}{\sqrt{n}}, \bar{x}+t_{\alpha/2}\frac{s}{\sqrt{n}}\right],$$
where \(t_{\alpha/2}\)
is the critical value such that \(P(T<t_{\alpha/2})=1-\alpha/2\)
for a Student \(t\)
-distribution with \(n-1\) degrees of freedom
.
In Excel, the critical value \(t_{\alpha/2}\)
can be calculated by T.INV((1+confidence level)/2, n-1)
or T.INV.2T(1-confidence level, n-1)
, where \(n\)
is the sample size.
The marginal error \(E=t_{\alpha/2}\frac{s}{\sqrt{n}}\)
can also be obtained by the Excel function
CONFIDENCE.T(1-confidence level, s, n)
.
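Python's standard library has no inverse \(t\)-distribution function, so the sketch below approximates one numerically: it integrates the \(t\) density with Simpson's rule and finds the critical value by bisection. This is an illustrative approach, not how Excel computes T.INV; the names `t_pdf`, `t_cdf`, `t_critical` and `margin_of_error_t` are hypothetical helpers introduced here.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T < x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence_level: float, df: int) -> float:
    """Return t_{alpha/2} with P(T < t) = 1 - alpha/2, found by bisection."""
    target = (1 + confidence_level) / 2
    low, high = 0.0, 1.0
    while t_cdf(high, df) < target:  # widen the bracket until it contains the root
        high *= 2
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def margin_of_error_t(confidence_level: float, s: float, n: int) -> float:
    """Return E = t_{alpha/2} * s / sqrt(n), with n - 1 degrees of freedom."""
    return t_critical(confidence_level, n - 1) * s / sqrt(n)
```

With a large number of degrees of freedom, `t_critical(0.95, 1000)` comes out close to the normal critical value 1.96, which illustrates why the normal distribution is used as an approximation for large samples.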
\(t\)-Distributions
A sample of size 15 is drawn from a normally distributed population. Find the critical value \(t_{\alpha/2}\) needed to construct a confidence interval at each of the following confidence levels: 99% and 95%.
Solution: To find the critical value \(t_{\alpha/2}\)
, we may use the Excel function T.INV(left tail area, df)
or T.INV.2T(tail areas, df)
.
Since the confidence level is \(1-\alpha=0.99\)
, the critical value is
\(t_{\alpha/2}\)
=T.INV.2T(1-0.99, 15-1)
=T.INV((1+0.99)/2, 15-1)
=2.9768.
Since the confidence level is \(1-\alpha=0.95\)
, the critical value is
\(t_{\alpha/2}\)
=T.INV.2T(1-0.95, 15-1)
=T.INV((1+0.95)/2, 15-1)
=2.1448.
A sample of size 16 is randomly drawn from a normally distributed population. The sample has a mean 79 and standard deviation 7. Construct a confidence interval for that population mean at the 90% level of confidence.
Solution: Since the population is normally distributed, and the population standard deviation is unknown, we apply the formula \(\text{E}=t_{\alpha/2}\cdot\dfrac{s}{\sqrt{n}}\)
for marginal error.
Since the sample size is 16, the degrees of freedom are df=15.
At 90% confidence level, the critical value is \(t_{\alpha/2}=\)
T.INV((1+0.9)/2, 16-1)
\(\approx 1.753\)
.
Then the marginal error is \(\text{E}=1.753\cdot 7/\sqrt{16}\approx 3.07\). Thus \(\bar{x}-\text{E}=79-3.07=75.93\) and \(\bar{x}+\text{E}=79+3.07=82.07\).
With 90% confidence, we may conclude that the population mean is in the interval \([75.93, 82.07]\).
Note: The marginal error \(E\)
can also be obtained by CONFIDENCE.T(1-0.9, 7, 16).
The data below show the numbers of hours worked by 40 randomly selected employees from several grocery stores in the county.
30 | 26 | 33 | 26 | 26 | 33 | 31 | 31 | 21 | 37 | 27 | 20 | 34 | 35 | 30 | 24 | 38 | 34 | 39 | 31 |
22 | 30 | 23 | 23 | 31 | 44 | 31 | 33 | 33 | 26 | 27 | 28 | 25 | 35 | 23 | 32 | 29 | 31 | 25 | 27 |
Construct a 99% confidence interval for the mean number of hours worked.
Solution: Since the sample size is 40 (>30), by the central limit theorem, the sample mean is approximately normally distributed.
Applying the Excel functions AVERAGE() and STDEV.S() to the data, we find \(\bar{x}\approx 29.6\)
and \(s\approx 5.3\)
.
Since \(\alpha=1-0.99=0.01\)
, the marginal error is \(\text{E}=\)
CONFIDENCE.T(0.01, 5.3, 40)
\(\approx 2.3\)
. Thus \(\bar{x}-\text{E}=29.6-2.3=27.3\)
and \(\bar{x}+\text{E}=29.6+2.3=31.9\).
With 99% confidence, one may conclude that the average number of hours worked by employees in all grocery stores in the county is between 27.3 and 31.9 hours.
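The summary statistics and the interval for the hours-worked data can be reproduced with Python's standard library. In this sketch the critical value `T_CRITICAL = 2.708` is an assumption: it is the approximate table value of \(t_{\alpha/2}\) for 39 degrees of freedom at the 99% level (what T.INV.2T(0.01, 39) is expected to return), hard-coded because the standard library has no inverse \(t\) function.

```python
from math import sqrt
from statistics import mean, stdev

# Hours worked by the 40 sampled employees (from the table in the example).
hours = [
    30, 26, 33, 26, 26, 33, 31, 31, 21, 37, 27, 20, 34, 35, 30, 24, 38, 34, 39, 31,
    22, 30, 23, 23, 31, 44, 31, 33, 33, 26, 27, 28, 25, 35, 23, 32, 29, 31, 25, 27,
]

n = len(hours)
x_bar = mean(hours)  # counterpart of AVERAGE()
s = stdev(hours)     # sample standard deviation, counterpart of STDEV.S()

T_CRITICAL = 2.708   # assumed table value for df = 39 at 99% confidence
margin = T_CRITICAL * s / sqrt(n)

print(f"mean = {x_bar:.1f}, s = {s:.1f}, E = {margin:.1f}")
# mean = 29.6, s = 5.3, E = 2.3
print(f"99% CI: [{x_bar - margin:.1f}, {x_bar + margin:.1f}]")
# 99% CI: [27.3, 31.9]
```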
\(t\)-Distribution
Population is approximately normally distributed:
\(\sigma\) is known: use the normal distribution.
\(\sigma\) is unknown: use the \(t\)-distribution.
Population distribution is unknown, but the sample size is large enough, i.e. \(n>30\):
\(\sigma\) is known: use the normal distribution.
\(\sigma\) is unknown: either one can be used, but the \(t\)-distribution is more accurate.
Warning: If the population distribution is unknown and the sample size is small, neither the \(t\)-distribution nor the normal distribution is reliable.
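The decision rules above can be written as a small function. This is only a sketch of the rules as stated on this slide; `choose_distribution` and its return labels are hypothetical names, and in the "either one can be used" case the function returns the \(t\)-distribution because it is described as more accurate.

```python
def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Return "normal", "t", or "neither" following the slide's rules.

    population_normal: whether the population is approximately normal.
    sigma_known: whether the population standard deviation is given.
    n: the sample size; n > 30 counts as large enough.
    """
    large_sample = n > 30
    if not population_normal and not large_sample:
        # Unknown population shape and a small sample: neither is reliable.
        return "neither"
    if sigma_known:
        return "normal"
    # sigma unknown: use t (also the more accurate choice when n > 30).
    return "t"


print(choose_distribution(population_normal=True, sigma_known=False, n=16))  # t
```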
For small samples, there is a method called the Shapiro–Wilk test, which can be used to check whether the data are consistent with a normally distributed population, so that we may assume the sampling distribution is approximately normal.
Even when \(n>30\)
, a visual inspection of normality (using a histogram, for example) is necessary.
Decide whether the following statements are true or false. Explain your reasoning.
\(t\)-Distributions
Suppose a Student's \(t\)-distribution has \(\text{df}=n-1\) degrees of freedom.
Find a probability for a given \(t\)
-value.
The area of the left tail of the \(t\)
-value may be calculated by the function T.DIST(t,df,true)
.
The area of the right tail of the \(t\)
-value may be calculated by the function T.DIST.RT(t,df)
.
The area of two tails of the \(t\)
-value (here \(t\)
>0) may be calculated by function T.DIST.2T(t,df)
.
Find the critical value for a given probability \(p\)
.
When the area of the left tail is given, the function T.INV(p,df)
may be used.
When the area of both tails is given, the function T.INV.2T(p,df)
may be used. This function is convenient for constructing confidence intervals.
If the population standard deviation \(\sigma\)
is given and the sampling distribution is approximately normal, the marginal error can be obtained by the Excel function
CONFIDENCE.NORM(1-confidence level, population SD, sample size)
If the population standard deviation \(\sigma\)
is NOT given and the sampling distribution is approximately normal, the marginal error can be obtained by the Excel function
CONFIDENCE.T(1-confidence level, sample SD, sample size)