Demonstrate understanding of the sampling distribution of a statistic.
Explain how the central limit theorem applies in inference.
Determine whether a sampling distribution is approximately a normal distribution.
Calculate key characteristics (mean, standard error) of the sampling distribution of a statistic.
Estimate the probability of an event using the sampling distribution.
When using sample statistics to estimate population parameter, there will be a chance error
$$\text{Population Parameter}=\text{Sample Statistic}+\text{Chance Error}.$$
To understand the chance error, we need to know how sample statistics distribute. Consider samples of the same size \(n\)
randomly chosen from the population with replacement.
The probability distribution of a sample statistic is called a sampling distribution.
The sampling distribution varies as the sample size changes. In general, A larger sample size will result a smaller standard deviation of the sampling distribution.
The standard deviation of a sampling distribution is also called the standard error.
The Central Limit Theorem:
As the sample size \(n\)
increases, the sampling distribution of the sample mean, from a population with the mean \(\mu\)
and the standard deviation \(\sigma\)
, will approach to a normal distribution with the mean \(\mu_{\bar{X}}=\mu\)
and the standard deviation \(\sigma_{\bar{X}}=\dfrac{\sigma}{\sqrt{n}}\)
.
Remark: In terms of standardization, the central limit theorem says that the random variable \(\bar{Z}=\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}\)
has an approximately standard normal distribution.
For most distributions (not highly skewed), when sample size \(n>30\)
, the sampling distribution of the sample mean \(\bar{X}\)
can be approximated reasonably well by a normal distribution. The larger the sample size, the better the approximation will be.
When the population is normally distributed, the sampling distribution of the sample means will be normally distributed for any sample size.
If the population distribution is highly skewed, relying on CLT can be risky.
Randomly draw samples of size 2 with replacement from the numbers 1, 3, 4.
Randomly draw samples of size 2 with replacement from the numbers 1, 3, 4.
Solution: Using the Excel function AVERAGE()
, we may find means of samples and means of sample means.
Using the Excel function STDEV.P()
, we may find the standard deviation of the population and the standard deviation of sample means.
\(\color{red}{\mu}\) |
\(\color{red}{\sigma}\) |
\(\color{blue}{\mu_{\bar{X}}}\) |
\(\color{blue}{\sigma_{\bar{X}}}\) |
---|---|---|---|
2.7 | 1.25 | 2.7 | 0.88 |
sample | (1,1) | (1,3) | (1,4) | (3,1) | (3,3) | (3,4) | (4,1) | (4,3) | (4,4) |
---|---|---|---|---|---|---|---|---|---|
\(\bar{X}\) |
1 | 2 | 2.5 | 2 | 3 | 3.5 | 2.5 | 3.5 | 4 |
It can be verified that \(\mu_{\bar{X}}=\mu\)
and \(\sigma/\sqrt{n}=1.25/\sqrt{2}\approx 0.88=\sigma_{\bar{X}}\)
.
Solution: (Continued) The following are the distribution of the population and the distribution of sample means.
Example: Suppose the mean length of time that a caller is placed on hold when telephoning a customer service center is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of time on hold in a random sample of 1,000 calls will be within 0.5 second of the population mean.
Example: Suppose the mean length of time that a caller is placed on hold when telephoning a customer service center is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of time on hold in a random sample of 1,000 calls will be within 0.5 second of the population mean.
Solution: Since the sample size \(n=1000>30\)
is large enough, by the Central Limit Theorem, we know that the mean length of time is approximately normally distributed.
The mean of the sampling distribution is \(\mu_{\bar{X}}=\mu=23.8\)
.
The standard deviation of the sampling distribution is \(\mu_{\bar{X}}=\dfrac{\sigma}{\sqrt{n}}=\dfrac{4.6}{\sqrt{1000}}\)
.
The probability is
NORM.DIST(24.3, 23.8, 4.6/SQRT(1000),TRUE)-NORM.DIST(23.3, 23.8, 4.6/SQRT(1000),TRUE)
Suppose speeds of vehicles on a particular stretch of roadway are normally distributed with mean 36.6 mph and standard deviation 1.7 mph.
\(X\)
of a randomly selected vehicle is between 35 and 40 mph.\(\bar{X}\)
of 10 randomly selected vehicles is between 35 and 40 mph.Solution: Since the population is normally distributed \(\mu=36.6\)
and \(\sigma=1.7\)
, the sampling distribution of the sample mean is also normal distributed but with \(\mu_{\bar{x}}=\mu=36.6\)
and \(\sigma_{\bar{X}}=\sigma/\sqrt{n}=1.7/\sqrt{10}\)
.
The probability that the speed of a vehicle is between 35 and 40 is \(P(35< X< 40)=P(X< 40)-P(X<35)\approx 0.8039\)
which can be obtained by NORM.DIST(40, 36.6, 1.7, TRUE)-NORM.DIST(35, 36.6, 1.7, TRUE)
.
The probability getting a sample of size 10 with the mean between 35 and 40 is
\(P(35<\bar{X}< 40)=P(\bar{X}< 40)-P(\bar{X}<35)\approx 0.9985\)
which can be obtained by
NORM.DIST(40, 36.6, 1.7/SQRT(10), TRUE)-NORM.DIST(35, 36.6, 1.7/SQRT(10), TRUE)
When working with categorical variables, we often study the proportion of a data set.
The proportion of a specific characteristic in a data set can be viewed as the mean of the data set by identifying the specific characteristic with 1 and others with \(0\)
.
Example: Consider the following data set 1, 0, 1, 1, 0, 0, 1, 0, 1, 1 Find proportion of red numbers and the mean of the data set.
Solution: The proportion of red numbers is \(\frac{6}{10}=0.6\)
. So is the mean: \(\frac{6\cdot 1 + 4\cdot 0}{10}=0.6\)
.
Consider a population consisting of 1s and 0s. Let \(p\)
be the proportion of 1s. Then standard deviation is
$$\sigma=\sqrt{(1-p)^2p+(0-p)^2(1-p)}=\sqrt{p(1-p)}.$$
For a sampling distribution of sample proportion, we write \(\hat{P}\)
for the random variable of sample proportions.
Central Limit Theorem for Proportion:
For large samples, the distribution of sample proportions \(\hat{P}\)
is approximately normal, with the mean \(\mu_{\hat{P}}=p\)
and standard deviation \(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\)
, where \(p\)
is the population proportion.
As a sample proportion is always between 0 and 1, and 99.7% of sample proportions lie within 3 standard deviation away from the population proportion, when using the central limit theorem for proportion, we require the sample size \(n\)
satisfying the following condition: the interval \(\left[p-3\sqrt{\frac{p(1-p)}{n}}, p+3\sqrt{\frac{p(1-p)}{n}}\right]\)
lies wholly in the interval \([0, 1]\)
.
In practice, if \(n\)
satisfies the following two inequalities: \(np\ge 10\)
and \(n(1-p)\ge 10\)
, then we consider \(n\)
is large enough for assuming that the sampling distribution of the sample proportion is approximately normal.
When the population proportion \(p\)
is unknown, to apply the central limit theorem for proportion, we require the sample size \(n\)
satisfying the same conditions with \(p\)
replaced by the sample proportion \(\hat{p}\)
. That is, the sample size \(n\)
should satisfies \(n\hat{p}\ge 10\)
and \(n(1-\hat{p})\ge 10\)
.
Suppose that in a population of voters in a certain region 53% are in favor of a particular law. Nine hundred randomly selected voters are asked if they favor the law.
Find the probability that the sample proportion computed from a random sample of size 900 will be at least 2% above true population proportion.
Suppose that in a population of voters in a certain region 53% are in favor of a particular law. Nine hundred randomly selected voters are asked if they favor the law.
Find the probability that the sample proportion computed from a random sample of size 900 will be at least 2% above true population proportion.
Solution: We first verify that the sampling distribution is approximately normal.
Since \(p=0.53\)
and \(n=900\)
, \(np=900\cdot 0.53>10\)
and \(n(1-p)=900(1-0.53)>10\)
. By the central limit theorem, the sampling distribution is approximately normal.
The standard deviation of the sampling distribution is \(\sigma_{\hat{P}}=\sqrt{\frac{0.53(1-0.53)}{900}}\approx 0.017\)
.
Then the probability that the random sample has a proportion at least 2% above 53% is
$$P(\hat{P}>0.55)=1-P(\hat{P}\le 0.55)\approx 0.1197$$
which can be obtained by
1-NORM.DIST(0.55, 0.53, SQRT(0.53*(1-0.53)/900),TRUE)
.
Suppose that in 36% of all car accidents involve injury. Find the probability that the injury rate in a random sample of 250 car accidents is between 30% and 45%.
Suppose that in 36% of all car accidents involve injury. Find the probability that the injury rate in a random sample of 250 car accidents is between 30% and 45%.
Solution: Firs we verify that the sample size is large enough to assume the sample proportion is approximately normally distributed by the Central Limit Theorem.
The injury rate of all car accidents is \(p=10\%=0.3\)
and the sample size is \(250\)
. Because \(np=250\cdot 0.36=90>10\)
and \(n(1-p)=250\cdot(1-0.36)=160>10\)
, the sample size is considered large enough.
Let \(\hat{P}\)
be the sample proportion of a random sample. By the Central Limit Theorem, the distribution of \(\hat{P}\)
is approximately normally with the mean \(p=0.36\)
and standard deviation \(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\approx 0.03\)
Using the Excel function, NORM.DIST(x, mean, SD, TRUE)
, we find the probability of a random sample of 250 car accidents with the injury rate between 30% and 45% is
An unknown distribution has a mean of 28 and a standard deviation 6. Samples of size n = 30 are drawn randomly from the population. Find the probability that the sample mean is between 27 and 30.
The numerical population of grade point averages at a college has mean 2.61 and standard deviation 0.5. If a random sample of size 100 is taken from the population, what is the probability that the sample mean will be between 2.51 and 2.71?
In a mayoral election, based on a poll, a newspaper reported that the current mayor received 45% of the vote. If this is true, what is the probability that a random sample of 100 voters had less than 35% voting for the current mayor?
A population has mean 73.5 and standard deviation 2.5.
\(\bar{X}\)
for samples of size 30.A normally distributed population has mean 57.7 and standard deviation 12.1.
\(\bar{X}\)
for samples of size 16.Suppose the mean amount of cholesterol in eggs labeled “large” is 186 milligrams, with standard deviation 7 milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs will be within 2 milligrams of the population mean.
Suppose that 8% of all males suffer some form of color blindness. Find the probability that in a random sample of 250 men at least 10% will suffer some form of color blindness.
An airline claims that 72% of all its flights to a certain region arrive on time. In a random sample of 30 recent arrivals, 19 were on time. You may assume that the normal distribution applies.
NORM.DIST()
FunctionLet \(X\)
be a normal random variable with mean \(\mu\)
and standard deviation \(\sigma\)
, that is \(X\sim \mathcal{N}(\mu, \sigma^2)\)
. In Excel, \(P(X<x)\)
is given by NORM.DIST(x, mean, sd, TRUE)
.
Recall the mean of a data set can obtained by the Excel function AVERAGE()
.
Given the population mean \(\mu\)
and standard deviation \(\sigma\)
, if the sample size \(n\)
is bigger than 30 and the sample mean is \(\bar{x}\)
. The probability of getting another sample of the same size but smaller mean can be obtained by the following Excel function:
NORM.DIST(
\(\bar{x},\mu,\sigma\)
/sqrt(n),TRUE)
.
Demonstrate understanding of the sampling distribution of a statistic.
Explain how the central limit theorem applies in inference.
Determine whether a sampling distribution is approximately a normal distribution.
Calculate key characteristics (mean, standard error) of the sampling distribution of a statistic.
Estimate the probability of an event using the sampling distribution.
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Alt + f | Fit Slides to Screen |
Esc | Back to slideshow |