Summarize and interpret the relationship between two qualitative (categorical) variables using two-way tables.
Find and interpret conditional, joint, and marginal probabilities from a two-way frequency table.
Create and analyze two-way tables to answer probability questions.
As we organize and analyze data from two categorical variables, we make use of two-way tables.
Information in a two-way frequency table:
Values of the two variables are displayed in the left column and the top row.
The body of the table consists of frequency counts associated with pairs of values of the two variables.
The right column and the bottom row, which are called the margins of the table, consist of row totals and column totals, respectively.
A number in a margin is called a marginal frequency.
A number in the body of the table is called a joint frequency.
The following table summarizes the responses of a random sample of 1,200 U.S. college students as part of a larger survey.
Gender | About Right | Overweight | Underweight | Row Totals |
---|---|---|---|---|
Female | 560 | 163 | 37 | 760 |
Male | 295 | 72 | 73 | 440 |
Column Totals | 855 | 235 | 110 | 1,200 |
A two-way relative frequency table is obtained from a two-way frequency table by converting frequencies in a two-way table to relative frequencies.
Marginal probability
$$P(X)=\frac{\text{Marginal frequency in}~ X}{\text{Total}}$$
Conditional probability
$$P(X|Y)=\frac{\text{Joint frequency}}{\text{Marginal Frequency in}~Y} \quad \text{or}\quad P(Y|X)=\frac{\text{Joint frequency}}{\text{Marginal Frequency in}~X}$$
Joint probability
$$P(X ~\text{and}~ Y)=\frac{\text{Joint frequency}}{\text{Total}}$$
Note that \(P(X~\text{and}~Y)=P(X)\cdot P(Y|X)=P(Y)\cdot P(X|Y).\)
The following table shows joint and marginal probabilities of body image and gender.
Gender | About Right | Overweight | Underweight | Row Totals |
---|---|---|---|---|
Female | \(\frac{560}{1200}=46.67\%\) | \(\frac{163}{1200}=13.58\%\) | \(\frac{37}{1200}=3.08\%\) | \(\frac{760}{1200}=63.33\%\) |
Male | \(\frac{295}{1200}=24.58\%\) | \(\frac{72}{1200}=6.00\%\) | \(\frac{73}{1200}=6.08\%\) | \(\frac{440}{1200}=36.67\%\) |
Column Totals | \(\frac{855}{1200}=71.25\%\) | \(\frac{235}{1200}=19.58\%\) | \(\frac{110}{1200}=9.17\%\) | \(\frac{1200}{1200}=100.00\%\) |
The following table shows the conditional probabilities of each body image for a randomly selected female or male student.
Gender | About Right | Overweight | Underweight | Row Totals |
---|---|---|---|---|
Female | \(\frac{560}{760}=73.68\%\) | \(\frac{163}{760}=21.45\%\) | \(\frac{37}{760}=4.87\%\) | \(\frac{760}{760}=100.00\%\) |
Male | \(\frac{295}{440}=67.05\%\) | \(\frac{72}{440}=16.36\%\) | \(\frac{73}{440}=16.59\%\) | \(\frac{440}{440}=100.00\%\) |
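The relative frequencies in the two tables above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the nested dictionary `counts` and the helper names `totals`, `joint_probabilities`, and `conditional_on_row` are hypothetical names chosen for illustration.

```python
# Body image counts from the two-way frequency table
# (rows: gender, columns: body image).
counts = {
    "Female": {"About Right": 560, "Overweight": 163, "Underweight": 37},
    "Male": {"About Right": 295, "Overweight": 72, "Underweight": 73},
}


def totals(table):
    """Return (row totals, column totals, grand total) for a nested dict of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(row_totals.values())


def joint_probabilities(table):
    """Joint probability of each cell: joint frequency / total."""
    _, _, total = totals(table)
    return {row: {col: n / total for col, n in cells.items()}
            for row, cells in table.items()}


def conditional_on_row(table):
    """Conditional probability of each column given the row: joint frequency / row total."""
    row_totals, _, _ = totals(table)
    return {row: {col: n / row_totals[row] for col, n in cells.items()}
            for row, cells in table.items()}


row_totals, col_totals, total = totals(counts)
joint = joint_probabilities(counts)
conditional = conditional_on_row(counts)

for gender in counts:
    print(f"P({gender}) = {row_totals[gender] / total:.2%}")  # marginal probability
    for image in counts[gender]:
        print(f"  P({gender} and {image}) = {joint[gender][image]:.2%}, "
              f"P({image} | {gender}) = {conditional[gender][image]:.2%}")
```

Dividing by the grand total gives the joint and marginal probabilities; dividing by a row total gives the conditional probabilities given gender.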
The following table summarizes the full-time enrollment at a community college.
Gender | Arts-Sci | Bus-Econ | Info Tech | Health Science | Graphics Design | Culinary Arts | Row Totals |
---|---|---|---|---|---|---|---|
Female | 4,660 | 435 | 494 | 421 | 105 | 83 | 6,198 |
Male | 4,334 | 490 | 564 | 223 | 97 | 94 | 5,802 |
Column Totals | 8,994 | 925 | 1,058 | 644 | 202 | 177 | 12,000 |
What proportion of the total number of students are male students?
Solution:
$$P(\text{Male})=\dfrac{5802}{12000}= 0.4835=48.35\%.$$
If we select a male student at random, what is the probability that he is in the Info Tech program?
Solution:
$$P(\text{Info Tech}|\text{Male})=\dfrac{564}{5802}\approx 0.097=9.7\%.$$
If a student is selected at random, what is the probability that the student is a male and in the Info Tech program?
Solution:
$$P(\text{Male and Info Tech})=\dfrac{564}{12000}= 0.047=4.7\%.$$
The probabilities are related:
Solution:
$$P(\text{Male and Info Tech})=\dfrac{564}{12000}=\dfrac{5802}{12000}\cdot \dfrac{564}{5802}=P(\text{Male})\cdot P(\text{Info Tech}|\text{Male}).$$
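The relationship among the three probabilities can be checked exactly with Python's `fractions` module. The sketch below is an illustration rather than part of the original lesson; the variable names are hypothetical, and the counts are read from the enrollment table.

```python
from fractions import Fraction

# Counts read from the enrollment table.
male_total = 5802        # marginal frequency of "Male"
male_info_tech = 564     # joint frequency of "Male" and "Info Tech"
grand_total = 12000

p_male = Fraction(male_total, grand_total)                # marginal probability
p_info_given_male = Fraction(male_info_tech, male_total)  # conditional probability
p_male_and_info = Fraction(male_info_tech, grand_total)   # joint probability

# Multiplication rule: P(Male and Info Tech) = P(Male) * P(Info Tech | Male).
product = p_male * p_info_given_male

print(f"P(Male)              = {float(p_male):.4f}")
print(f"P(Info Tech | Male)  = {float(p_info_given_male):.4f}")
print(f"P(Male and Info Tech) = {float(p_male_and_info):.4f}")
print("Multiplication rule holds:", product == p_male_and_info)
```

Exact fractions are used so that the equality check is not affected by floating-point rounding.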
This table relates the weights and heights of a group of individuals participating in an observational study.
Weight/Height | Tall | Medium | Short |
---|---|---|---|
Obese | 18 | 28 | 14 |
Normal | 20 | 51 | 28 |
Underweight | 12 | 25 | 9 |
Source: https://courses.lumenlearning.com/introstats1/chapter/contingency-tables/
To understand association between categorical variables, it helps to ask the opposite question: what would no association look like?
If the conditional probabilities are nearly equal for all categories, there may be no association between the variables. Conversely, if the conditional probabilities are different enough, we can be confident that there is an association.
In general, the bigger the differences in the conditional probabilities, the stronger the association between the variables.
Two variables \(X\) and \(Y\) are independent if \(P(X~\text{and}~Y)=P(X)\cdot P(Y)\). Equivalently, \(P(X|Y)=P(X)\) and \(P(Y|X)=P(Y)\).
Is body image related to gender?
Gender | About Right | Overweight | Underweight | Row Totals |
---|---|---|---|---|
Female | 560 | 163 | 37 | 760 |
Male | 295 | 72 | 73 | 440 |
Column Totals | 855 | 235 | 110 | 1,200 |
Solution: Using Excel (stacked bar chart), we may compare the conditional body image distributions for females and males side by side.
Our analysis shows that the conditional distributions of body image for males and females are quite different. We can conclude that there is enough difference to believe that the two categorical variables are in fact related.
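The independence condition \(P(X~\text{and}~Y)=P(X)\cdot P(Y)\) can also be examined numerically. The sketch below compares each observed joint probability with the product of its marginal probabilities. The helper `independence_gaps` is a hypothetical name introduced for illustration, and the comparison is descriptive only: the slides give no cutoff for how large a gap must be, and this is not a formal statistical test.

```python
# Body image counts by gender, from the two-way frequency table.
counts = {
    "Female": {"About Right": 560, "Overweight": 163, "Underweight": 37},
    "Male": {"About Right": 295, "Overweight": 72, "Underweight": 73},
}


def independence_gaps(table):
    """Return {(row, col): P(row and col) - P(row) * P(col)} for every cell.

    Under independence every gap would be 0 (up to sampling variation).
    """
    total = sum(sum(cells.values()) for cells in table.values())
    p_row = {row: sum(cells.values()) / total for row, cells in table.items()}
    p_col = {}
    for cells in table.values():
        for col, n in cells.items():
            p_col[col] = p_col.get(col, 0.0) + n / total
    return {
        (row, col): n / total - p_row[row] * p_col[col]
        for row, cells in table.items()
        for col, n in cells.items()
    }


gaps = independence_gaps(counts)
for (gender, image), gap in gaps.items():
    print(f"{gender:6s} {image:12s} observed - expected = {gap:+.4f}")
```

The largest gaps occur in the Underweight column, which matches the visible difference between 4.87% of females and 16.59% of males in the conditional distributions.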
When calculating the probability of a negative outcome, we often refer to the probability as a risk.
In general, we are interested in determining how much a new treatment reduces the risk compared to a reference risk.
The percentage reduction of risk is
$$\text{percentage reduction of risk}=\frac{\text{reference risk}-\text{new treatment risk}}{\text{reference risk}}.$$
Researchers in the Physicians’ Health Study (1989) designed a randomized double-blind experiment to determine whether aspirin reduces the risk of heart attack. Here are the final results.
Treatment | Heart Attack | No Heart Attack | Row Totals |
---|---|---|---|
Aspirin | 139 | 10,898 | 11,037 |
Placebo | 239 | 10,795 | 11,034 |
Column Totals | 378 | 21,693 | 22,071 |
Does aspirin lower the risk of having a heart attack?
Solution: To answer this question, we compare two conditional probabilities:
$$P(\text{heart attack}|\text{aspirin}) = \dfrac{139}{11037} \approx 0.013$$
$$P(\text{heart attack}|\text{placebo}) = \dfrac{239}{11034} \approx 0.022$$
The result shows that taking aspirin reduced the risk from 0.022 to 0.013.
The percentage reduction of risk is
$$\frac{0.022-0.013}{0.022}\approx 0.41=41\%.$$
Therefore, we conclude that taking aspirin results in a 41% reduction in risk.
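The risk comparison can be sketched in Python as follows. The function names `risk` and `percentage_risk_reduction` are hypothetical, and the reduction is computed as reference risk minus new treatment risk, divided by the reference risk, so that a lower risk under treatment gives a positive reduction. The 41% figure comes from the risks rounded to three decimal places; using the unrounded risks gives a slightly larger value, just under 42%.

```python
def risk(events, group_size):
    """Risk of a negative outcome: number of events / size of the group."""
    return events / group_size


def percentage_risk_reduction(new_risk, reference_risk):
    """(reference risk - new treatment risk) / reference risk."""
    return (reference_risk - new_risk) / reference_risk


# Counts from the Physicians' Health Study table.
aspirin_risk = risk(139, 11037)   # P(heart attack | aspirin)
placebo_risk = risk(239, 11034)   # P(heart attack | placebo), the reference risk

# Reduction from the rounded risks (as in the worked solution)
# and from the unrounded risks.
reduction_rounded = percentage_risk_reduction(0.013, 0.022)
reduction_unrounded = percentage_risk_reduction(aspirin_risk, placebo_risk)

print(f"Risk with aspirin: {aspirin_risk:.3f}")
print(f"Risk with placebo: {placebo_risk:.3f}")
print(f"Reduction from rounded risks:   {reduction_rounded:.1%}")
print(f"Reduction from unrounded risks: {reduction_unrounded:.1%}")
```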
A hypothetical two-way table, also known as a hypothetical 1000 two-way table, is a two-way table constructed from given probability conditions with 1000 as the total frequency. It can be used to answer complex probability questions.
Sometimes, the total frequency can be taken to be 10,000 or a higher power of 10 so that joint frequencies are integers.
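A pregnant woman often opts to have an ultrasound to predict the gender of her baby. Assume the following facts are known:
48% of babies are girls.
When the baby is a girl, the ultrasound predicts a girl 9 times out of 10.
When the baby is a boy, the ultrasound predicts a boy 3 times out of 4.
Use the above facts to answer the following questions.
If the examination predicts a girl, how likely is it that the baby will be a girl?
If the examination predicts a boy, how likely is it that the baby will be a boy?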
Solution: Assume that we have ultrasound predictions for 1,000 random babies.
\(P(\text{Girl})=48\%\).
\(P(\text{Predicted as girl}|\text{Girl})=9/10\).
\(P(\text{Predicted as boy}|\text{Boy})=3/4\).
Solution: (Continued)
Using those facts, we may create a two-way frequency table.
Prediction | Girl | Boy | Row Totals |
---|---|---|---|
Predict Girl | \(480\cdot\frac{9}{10}= 432\) | \(520-390 = 130\) | \(432+130=562\) |
Predict Boy | \(480-432=48\) | \(520\cdot\frac34=390\) | \(48+390=438\) |
Column Totals | \(48\%\cdot 1000=480\) | \(1000-480=520\) | \(1000\) |
If the examination predicts a girl, the probability that the baby is in fact a girl is
$$P(\text{Girl}|\text{predict girl})=\frac{432}{562} \approx 0.769=76.9\%.$$
If the examination predicts a boy, the probability that the baby is in fact a boy is
$$P(\text{Boy} | \text{predict boy}) = \frac{390}{438} \approx 0.890=89\%.$$
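The construction of the hypothetical 1000 table can be sketched in Python. The function `hypothetical_table` and its parameter names are hypothetical, introduced only for this illustration. Exact fractions are used so that the joint frequencies come out as whole numbers whenever the chosen total allows it.

```python
from fractions import Fraction


def hypothetical_table(p_girl, p_pred_girl_given_girl, p_pred_boy_given_boy, total=1000):
    """Build a hypothetical two-way table of predictions versus actual gender."""
    girls = p_girl * total                              # column total for Girl
    boys = total - girls                                # column total for Boy
    girl_pred_girl = girls * p_pred_girl_given_girl     # girls correctly predicted
    girl_pred_boy = girls - girl_pred_girl              # girls predicted as boys
    boy_pred_boy = boys * p_pred_boy_given_boy          # boys correctly predicted
    boy_pred_girl = boys - boy_pred_boy                 # boys predicted as girls
    return {
        "Predict Girl": {"Girl": girl_pred_girl, "Boy": boy_pred_girl},
        "Predict Boy": {"Girl": girl_pred_boy, "Boy": boy_pred_boy},
    }


table = hypothetical_table(Fraction(48, 100), Fraction(9, 10), Fraction(3, 4))

# Row totals: how many babies receive each prediction.
predict_girl_total = sum(table["Predict Girl"].values())
predict_boy_total = sum(table["Predict Boy"].values())

# Conditional probabilities given the prediction: joint frequency / row total.
p_girl_given_predict_girl = table["Predict Girl"]["Girl"] / predict_girl_total
p_boy_given_predict_boy = table["Predict Boy"]["Boy"] / predict_boy_total

print(f"P(Girl | predict girl) = {float(p_girl_given_predict_girl):.3f}")
print(f"P(Boy | predict boy)   = {float(p_boy_given_predict_boy):.3f}")
```

If a joint frequency is not a whole number for a given set of probabilities, a larger value such as `total=10000` can be passed instead.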
The table below is based on a 1988 study of accident records conducted by the Florida State Department of Highway Safety.
Seat Belt Use | Nonfatal | Fatal | Row Totals |
---|---|---|---|
Seat Belt | 412,368 | 510 | 412,878 |
No Seat Belt | 162,527 | 1,601 | 164,128 |
Column Totals | 574,895 | 2,111 | 577,006 |
Does wearing a seat belt lower the risk of an accident resulting in a fatality?
A large company has instituted a mandatory employee drug screening program. Assume that the drug test used is known to be 99% accurate. That is, if an employee is a drug user, the test will come back positive (“drug detected”) 99% of the time. If an employee is a non-drug user, then the test will come back negative (“no drug detected”) 99% of the time. Assume that 2% of the employees of the company are drug users.
If an employee’s drug test comes back positive, what is the probability that the test is wrong and the employee is in fact a non-drug user?
To create a stacked bar chart of a two-way table in Excel:
First select the data table.
Look for and click Insert Column or Bar Chart in the menu Insert -> Charts.
In the dropdown menu, choose the third option in 2-D Column (100% Stacked Column) or the third option in 2-D Bar (100% Stacked Bar).
To switch row/column, right-click the row axis or the column axis in the output graph and choose the option Select Data... to make the switch.
The following table summarizes results from a study on program selection and gender.
Gender | Arts-Sci | Bus-Econ | Info Tech | Health Science | Graphics Design | Culinary Arts | Row Totals |
---|---|---|---|---|---|---|---|
Female | 4,660 | 435 | 494 | 421 | 105 | 83 | 6,198 |
Male | 4,334 | 490 | 564 | 223 | 97 | 94 | 5,802 |
Column Totals | 8,994 | 925 | 1,058 | 644 | 202 | 177 | 12,000 |
Use Excel to answer the following questions about the study.
Is there an association between gender and program selection? Why or why not?
If they are associated, is the association strong or weak?