Distinguish between a population and a sample.
Determine whether a study is an observational study or an experiment.
Determine the goal of a statistical study and what types of conclusions are appropriate.
Recognize typical forms of sampling biases such as convenience sample and voluntary response.
Explain why randomization should be used.
Describe how to implement a randomized design: Simple random sample, Stratified random sample, Cluster random sample, Systematic random sample.
Determine whether the conclusion of an experiment design is appropriate.
Identify the lurking variable and confounding variable.
Data consists of information from observation, counts, measurements, responses or experiments.
A population is the collection of all objects that are of interest.
A parameter is a number that is a property of the population.
A sample is a subset of a population.
A statistic is a number, such as a percentage, that represents a property of a sample.
Determine if the group is a population or sample
In statistics, a variable is a characteristic or attribute of interest that we record for individuals or objects. Variables fall into two types according to their values.
Categorical variables (or qualitative variables) represent attributes, labels, or nonnumerical entries, such as names and colors.
Quantitative variables represent numerical measurements or counts, such as weights and number of students in each class.
Variable | Type |
---|---|
Age | Quantitative |
Hair color | Qualitative |
GPA | Quantitative |
Educational attainment (AS, BS, MS, etc.) | Qualitative |
Identify the population, sample, the variable of study, the type of the variable, the population parameter and the sample statistics.
An administrator wishes to estimate the passing rate of a certain course. She takes a random sample of 50 students and obtains their letter grades of that course. Among the 50 students, 32 students earned a grade C or better.
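As a quick check on the arithmetic, the sample statistic in this example can be computed directly. The sketch below uses Python, which is an assumption of this illustration (the course itself works in Excel), and the variable names are hypothetical.

```python
# Counts taken from the example above.
sample_size = 50   # students in the random sample
passed = 32        # sampled students who earned a C or better

# The sample statistic: the proportion of sampled students who passed.
sample_passing_rate = passed / sample_size

print(f"Sample passing rate: {sample_passing_rate:.0%}")  # prints "Sample passing rate: 64%"
```

The population parameter, the passing rate among all students in the course, remains unknown; the 64% is only an estimate of it.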
A statistical study can usually be categorized as an observational study or an experiment, according to how the study is carried out.
An observational study observes individuals and measures variables of interest. The main purpose of an observational study is to describe a group of individuals or to investigate an association between two variables.
An experiment intentionally manipulates one variable in an attempt to cause an effect on another variable. The primary goal of an experiment is to provide evidence for a cause-and-effect relationship between two variables.
Which type of study will best answer each question?
1. What proportion of all college students in the United States have taken classes at a community college?
2. Does the use of computer-aided instruction in college math classes improve test scores?
Answer: 1. Observational study; 2. Experiment.
See Types of Statistical Studies (2 of 4) in the textbook Concepts in Statistics for more examples.
Identify the type of statistical study:
A study took a random sample of adults and asked them about their bedtime habits. The data showed that people who drank a cup of tea before bedtime were more likely to go to sleep earlier than those who didn't drink tea.
A. Observational
B. Experimental
Another study took a group of adults and randomly divided them into two groups. One group was told to drink tea every night for a week, while the other group was told not to drink tea that week. Researchers then compared when each group fell asleep.
A. Observational
B. Experimental
Source: Khan Academy
Type of Research Question | Examples |
---|---|
Make an estimate about the population (often an estimate about an average value or a proportion with a given characteristic) | What is the average number of hours that community college students work each week? What proportion of all U.S. college students are enrolled at a community college? |
Test a claim about the population (often a claim about an average value or a proportion with a given characteristic) | Is the average course load for a community college student greater than 12 units? Do the majority of community college students qualify for federal student loans? |
Compare two populations (often a comparison of population averages or proportions with a given characteristic) | In community colleges, do female students have a higher GPA than male students? Are college athletes more likely than non-athletes to receive academic advising? |
Investigate a correlation between two variables in the population | Is there a correlation between the number of hours high school students spend each week on Facebook and their GPA? Is academic counseling associated with quicker completion of a college degree? |
A research question that focuses on a cause-and-effect relationship is common in disciplines that use experiments, such as medicine or psychology.
In a study of a relationship between two variables, one variable is the explanatory variable, and the other is the response variable.
Determine whether each question is a cause-and-effect question. What are the explanatory and response variables?
Answer:
This question investigates a cause-and-effect relationship. The explanatory variable is computer-aided instruction and the response variable is the test score.
This question investigates a correlation between variables in a population and is not a cause-and-effect question. The explanatory variable is tutoring and the response variable is performance.
In general, we should not make cause-and-effect statements from observational studies unless the impact of confounding variables can be substantially reduced.
A researcher studies the medical records of 500 randomly selected patients. Based on the information in the records, he divides the patients into two groups: those given the recommendation to take an aspirin every day and those with no such recommendation. He reports the percentage of each group that developed heart disease.
Determine whether the study supports the conclusion that taking aspirin lowers the risk of heart attacks.
Answer: The conclusion claims a cause-and-effect relationship, which requires an experiment. This study is observational: the researcher did not assign patients to take aspirin, so confounding variables are not controlled and the conclusion is not appropriate.
Does higher educational attainment lead to a higher salary?
To make accurate inferences, the sample must be representative of the population.
A sampling plan describes exactly how we will choose the sample.
A sampling plan is biased if it systematically favors certain outcomes.
In random sampling, every individual or object has an equal chance of being selected.
Simple random sample: every group of the same size has an equal chance of being selected. A table of random numbers, a calculator, or software is often used to generate the random numbers.
Stratified random sample: The population is first split into homogeneous groups (strata), i.e., subjects in the same group share the same attribute. Then the same proportion or the same number of subjects is randomly selected from each group.
Random numbers can be generated with the Excel function RANDBETWEEN(). In the latest version of Excel, a new function, RANDARRAY(), is also available.
Cluster sample: The population is first split into groups. Then some groups are selected randomly.
Systematic sample: First, a starting number is chosen randomly. Then every \(n\)-th piece of data is taken.
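The four sampling designs above can be sketched in a few lines of code. This is a minimal illustration in Python (an assumption; the course itself uses Excel). The population, its "year" and "section" attributes, and all function names are hypothetical and chosen only for the example.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 students, each with a class year (the stratum)
# and a classroom section (the cluster).
population = [
    {"id": i, "year": ("freshman", "sophomore")[i % 2], "section": i // 10}
    for i in range(100)
]

def simple_random_sample(pop, n):
    """Every group of n individuals is equally likely to be chosen."""
    return random.sample(pop, n)

def stratified_random_sample(pop, key, n_per_stratum):
    """Split into strata by `key`, then draw randomly within each stratum."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Randomly choose whole clusters and keep everyone in them."""
    clusters = sorted({person[key] for person in pop})
    chosen = set(random.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in chosen]

def systematic_sample(pop, step):
    """Pick a random start among the first `step` items, then take every step-th item."""
    start = random.randrange(step)
    return pop[start::step]

srs = simple_random_sample(population, 10)
stratified = stratified_random_sample(population, "year", 5)
cluster = cluster_sample(population, "section", 2)
systematic = systematic_sample(population, 10)
```

Here the stratified sample takes the same number of students from each class year, while the cluster sample keeps every student in the two randomly chosen sections.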
Biased sampling
Undercoverage
Suppose that you want to estimate the proportion of students at your college that use the library.
Which sampling plan will produce the most reliable results?
Select 100 students at random from students in the library.
Select 200 students at random from students who use the Tutoring Center.
Select 300 students who have checked out a book from the library.
Select 50 students at random from the college.
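Answer: The fourth sampling plan is the most reliable. The first three suffer from undercoverage: each draws only from a subgroup of students rather than from the whole college.
In general, the larger the sample size, the more accurate the conclusion. However, we must still avoid biased sampling; a large sample drawn with a biased plan remains unreliable.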
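A small simulation can make this point concrete. The sketch below is in Python (an assumption; it is not part of the course's Excel workflow), and the college of 5,000 students with a 30% library-use rate is entirely hypothetical.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical college: 5,000 students, 30% of whom use the library.
college = [True] * 1500 + [False] * 3500   # True = uses the library

# Biased plan: 300 students drawn only from library users (undercoverage).
library_users = [student for student in college if student]
biased_sample = random.sample(library_users, 300)

# Random plan: 50 students drawn at random from the whole college.
random_sample = random.sample(college, 50)

biased_estimate = sum(biased_sample) / len(biased_sample)
random_estimate = sum(random_sample) / len(random_sample)

print(f"Biased estimate (n=300): {biased_estimate:.2f}")
print(f"Random estimate (n=50):  {random_estimate:.2f}")
```

The biased plan always reports that 100% of students use the library, no matter how large the sample is, whereas the much smaller random sample gives an estimate that varies around the true 30%.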
Students at an elementary school are given a questionnaire that they are asked to return after their parents have completed it. One of the questions asked is, "Do you find that your work schedule makes it difficult for you to spend time with your kids after school?" Of the parents who replied, 85% said "no". Based on these results, the school officials conclude that a great majority of the parents have no difficulty spending time with their kids after school.
Source: Problem 1.39 (a) in OpenIntro Statistics
To establish a cause-and-effect relationship, we want to make sure that the explanatory variable is the only thing that impacts the response variable. We therefore want to get rid of all other factors that might affect the response. These factors are called confounding variables.
Control reduces the effects of extraneous variables (i.e., variables other than the independent variable and the dependent variable). These extraneous variables are called lurking variables.
Three control strategies are control groups, placebos, and blinding.
A control group is a baseline group that receives no treatment or a neutral treatment.
A neutral treatment that has no "real" effect on the dependent variable is called a placebo, and a participant's positive response to a placebo is called the placebo effect.
Blinding is the practice of not telling participants whether they are receiving a placebo. Double-blinding is the practice of not telling either the participants or the researchers which group is receiving the treatment and which is receiving a placebo.
Randomization, i.e., assigning subjects to groups by chance, helps ensure that the estimated treatment effect is statistically valid.
Replication reduces variability in experimental results and increases their significance.
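Random assignment to a treatment group and a control group can be sketched as follows. This is a minimal Python illustration (an assumption; the course itself uses Excel), and the function name `randomly_assign` and the participant IDs are hypothetical.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1 to 20.
treatment, control = randomly_assign(range(1, 21), seed=3)

print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```

Because chance alone decides who lands in each group, lurking variables tend to be balanced between the groups rather than lined up with the treatment.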
A company tested their new golf ball by having 20 professional golfers each hit 100 shots with the company's new ball and 100 shots with the golfer's current ball (in a random order). The labels were removed, so the golfers didn't know which balls were which. The golfers, on average, hit their shots significantly farther with the new ball.
The company cites this study in an advertisement claiming that this new ball will help all golfers hit farther shots.
Is the company's claim appropriate? Why?
Source: Khan Academy
A confounding variable is a variable other than the explanatory variable that has at least a partial effect on the response variable.
Example: In a study of the relationship between a type of fertilizer and tomato size, the amount of sunshine is a confounding variable because it also contributes to the growth of the tomatoes.
A lurking variable is a variable other than the explanatory and response variables that affects both the explanatory variable and the response variable.
Example: There is a positive association between the number of firefighters at a fire and the amount of damage. However, both are affected by the size of the fire.
Example: Randomly generate a number between 0 and 1.
Step 1: Choose a cell, say A1.
Step 2: Click the Insert Function button \(f_x\).
Step 3: In the popup window, search for "random" and select RAND.
Step 4: Click OK. You will get a randomly generated number.
Alternatively, you may enter the function =RAND() manually in the cell and press Enter.
Example: Generate 10 random two-digit integers.
Step 1: Generate a random integer in a cell, say A1, using the Excel function =RANDBETWEEN(bottom,top), here =RANDBETWEEN(10,99).
Step 2: Move the mouse cursor to the lower-right corner of the cell A1. A solid plus sign (+) will appear.
Step 3: Hold down the left mouse button and drag horizontally or vertically to fill 10 cells.
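For readers working outside Excel, the same task can be sketched in Python (an assumption; it is not part of the course's Excel workflow). The sketch shows one draw in which repeats are possible and one in which they are not; the variable names are hypothetical.

```python
import random

# Ten two-digit integers; repeats are possible, as with RANDBETWEEN(10, 99).
with_repeats = [random.randint(10, 99) for _ in range(10)]

# Ten distinct two-digit integers: random.sample draws without replacement.
without_repeats = random.sample(range(10, 100), 10)

print("With possible repeats:", with_repeats)
print("Without repeats:      ", without_repeats)
```

Drawing without replacement is what a sampling plan usually needs, since the same individual should not be selected twice.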
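Using RANDBETWEEN, you will find that some numbers may be repeated. If you are using the latest version of Excel, you may use RANDARRAY to generate all the numbers at once. Note that RANDARRAY on its own can also repeat values; to avoid repetition, combine it with SORTBY and SEQUENCE to shuffle the candidate numbers.
Example: Generate 10 random two-digit integers without repetition.
Step 1: In a cell, say A1, apply the Excel function =RANDARRAY(rows,columns,min,max,integer). To generate 10 two-digit integers, set rows=10, columns=1, min=10, max=99, and integer to TRUE.
Step 2: Because the result may still contain repeats, replace the formula with =INDEX(SORTBY(SEQUENCE(90,1,10),RANDARRAY(90)),SEQUENCE(10)). This lists the 90 integers from 10 to 99, shuffles them by random sort keys, and keeps the first 10, so no number can repeat.
Insert a table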
Step 1: In the menu bar, select Insert.
Step 2: Look for Table and click it.
Step 3: In the popup window, enter the range defined by two diagonally opposite corner cells. To do so, hold Shift and click the two corner cells of your table.
Step 4: Click OK. You will see the table.
Remark: Tables are normally used for more than one variable, that is, more than one characteristic or attribute being studied, such as attendance rate and grade. In a table, each column usually holds the entries of a data set for one variable, and each row corresponds to an individual entry.
Insert or delete cells, rows or columns
Step 1: Left-click to highlight the cell(s), row, or column that you want to insert or delete.
Step 2: Right-click the highlighted cell, row, or column.
Step 3: In the popup menu, select Insert or Delete and follow the instructions.
We will use the Analysis ToolPak frequently for analyzing data.
To install the Analysis ToolPak add-in:
Step 1: In the Excel menu bar, select File.
Step 2: Click Options.
Step 3: In the popup window, click Add-ins.
Step 4: At the bottom of the window, find Manage: Excel Add-ins and click Go next to it.
Step 5: In the new popup window, select Analysis ToolPak and then click OK.