Correlation Between Variables: Contingency Tables and Scatter Plots
To understand the relationship between two variables, we use correlation.
But how we analyze that relationship depends on the type of data we're working with: categorical or quantitative.
This post is part of the "Intro to Statistics" series
Previously: A Real-World Statistics Example
Next: Understanding Pearson's R
Real-Life Case: Study Habits and Exam Performance
Imagine a high school counselor wants to investigate the relationship between how often students study and whether they pass or fail a weekly quiz.
She surveys 30 students and records two things:
- Study Time Category: Rarely, Sometimes, Often
- Quiz Result: Pass or Fail
Step 1: The Contingency Table
A contingency table is used for categorical variables. It shows how often each combination of categories occurs.
Study Frequency \ Quiz Result | Pass | Fail | Total |
---|---|---|---|
Rarely | 3 | 7 | 10 |
Sometimes | 6 | 4 | 10 |
Often | 9 | 1 | 10 |
Total | 18 | 12 | 30 |
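The table above can be rebuilt by counting pairs. Here is a minimal sketch in plain Python. The post only reports the counts, so the 30 individual responses are reconstructed from them as an assumption, and the variable names are illustrative.

```python
from collections import Counter

# Assumption: the raw survey responses are not given in the post, so we
# rebuild them from the counts in the table. Each record is
# (study_frequency, quiz_result).
records = (
    [("Rarely", "Pass")] * 3 + [("Rarely", "Fail")] * 7
    + [("Sometimes", "Pass")] * 6 + [("Sometimes", "Fail")] * 4
    + [("Often", "Pass")] * 9 + [("Often", "Fail")] * 1
)

# Count how often each (row, column) combination occurs.
cell_counts = Counter(records)

rows = ["Rarely", "Sometimes", "Often"]
cols = ["Pass", "Fail"]

# Print one line per study-frequency group, with its row total.
print(f"{'':<10}{'Pass':>6}{'Fail':>6}{'Total':>7}")
for r in rows:
    counts = [cell_counts[(r, c)] for c in cols]
    print(f"{r:<10}{counts[0]:>6}{counts[1]:>6}{sum(counts):>7}")

# Column totals go in the last line of the table.
col_totals = [sum(cell_counts[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<10}{col_totals[0]:>6}{col_totals[1]:>6}{sum(col_totals):>7}")
```

Each cell is simply the number of students who fall into that combination of categories, and the totals are sums across a row or down a column.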
Step 2: Conditional Proportions
The raw counts don't tell the full story. So we calculate the percentage of each outcome within each group.
For example:
- Among students who study Rarely, 3/10 passed = 30%
- Among those who study Often, 9/10 passed = 90%
Study Frequency | % Passed | % Failed |
---|---|---|
Rarely | 30% | 70% |
Sometimes | 60% | 40% |
Often | 90% | 10% |
These are conditional proportions: percentages within each row.
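The row percentages can be computed directly from the counts. This is a small sketch rather than a definitive implementation. The dictionary layout and names are assumptions made for illustration.

```python
# Counts from the contingency table: {study frequency: (passed, failed)}.
counts = {
    "Rarely": (3, 7),
    "Sometimes": (6, 4),
    "Often": (9, 1),
}

# Conditional proportion = cell count / row total.
pass_rate = {}
for group, (passed, failed) in counts.items():
    row_total = passed + failed
    pass_rate[group] = passed / row_total
    print(f"{group:<10} passed: {passed / row_total:.0%}  failed: {failed / row_total:.0%}")
```

Because every row is divided by its own total, the two percentages in each row always add up to 100%.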
Step 3: Understanding Proportions (Quick Summary)
We use conditional proportions to look within groups, and marginal proportions to summarize a variable on its own.
- Conditional example: among those who study Rarely, 3/10 passed = 30%
- Marginal example: overall pass rate, 18/30 = 60%
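To make the difference concrete, here is a short sketch that computes one marginal and one conditional proportion from the same counts. The variable names are illustrative assumptions.

```python
# Counts from the contingency table: {study frequency: (passed, failed)}.
counts = {"Rarely": (3, 7), "Sometimes": (6, 4), "Often": (9, 1)}

total_students = sum(p + f for p, f in counts.values())  # all 30 students
total_passed = sum(p for p, _ in counts.values())         # 18 passes overall

# Marginal proportion: ignores study frequency entirely.
marginal_pass_rate = total_passed / total_students

# Conditional proportion: restricted to one row of the table.
rarely_passed, rarely_failed = counts["Rarely"]
conditional_pass_rate_rarely = rarely_passed / (rarely_passed + rarely_failed)

print(f"Overall pass rate: {marginal_pass_rate:.0%}")
print(f"Pass rate among 'Rarely': {conditional_pass_rate_rarely:.0%}")
```

The only thing that changes between the two is the denominator: all 30 students for the marginal proportion, the 10 students in one row for the conditional one.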
Want a full breakdown with examples, visual tables, and when to use each?
Read: Conditional vs. Marginal Proportions
Step 4: Interpreting the Categorical Correlation
In this sample, the more often a student studies, the more likely they are to pass.
We can see a positive association in the conditional proportions:
- Rarely study: low pass rate (30%)
- Often study: high pass rate (90%)
But a contingency table on its own doesn't quantify how strong the association is; it only shows the pattern.
Step 5: Let's Make It Quantitative
Now let's change the scenario:
The counselor asks students for their exact number of study hours per week and records their quiz scores out of 100.
Here's a sample:
Hours Studied | Quiz Score |
---|---|
2 | 50 |
3 | 55 |
5 | 65 |
7 | 70 |
8 | 76 |
10 | 85 |
12 | 92 |
Step 6: Scatter Plot
This type of plot is perfect for quantitative variables.
It helps us visually assess correlation:
- Each point = one student
- X-axis: Hours studied
- Y-axis: Quiz score
You'll notice that students who study more hours tend to score higher.
In this sample, that is a strong positive relationship.
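For a quick look without any plotting library, the sketch below draws a rough text version of the scatter plot from the sample data. The function `text_scatter` is a hypothetical helper written for this illustration, not part of any library. In practice a plotting library would draw a proper chart.

```python
hours = [2, 3, 5, 7, 8, 10, 12]
scores = [50, 55, 65, 70, 76, 85, 92]

def text_scatter(xs, ys, y_step=10):
    """Rough text scatter plot: one row per band of y values, one column per x unit."""
    x_max = max(xs)
    y_top = (max(ys) // y_step) * y_step
    y_bottom = (min(ys) // y_step) * y_step
    lines = []
    # Walk from the highest score band down to the lowest.
    for band in range(y_top, y_bottom - 1, -y_step):
        row = [" "] * (x_max + 1)
        for x, y in zip(xs, ys):
            if band <= y < band + y_step:
                row[x] = "*"  # one star per student
        lines.append(f"{band:>3} |" + "".join(row))
    lines.append("    +" + "-" * (x_max + 1))
    return "\n".join(lines)

plot = text_scatter(hours, scores)
print(plot)

# The hours are listed in increasing order, so we can also check that
# every student scored higher than the one before.
rising = all(later > earlier for earlier, later in zip(scores, scores[1:]))
print("Scores rise with hours:", rising)
```

The stars climb from the lower left to the upper right, which is the visual signature of a positive relationship.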
Level Up: Choosing the Right Correlation Approach Based on Data Types
Correlation analysis isn't one-size-fits-all. The types of your variables determine the best method:
- For two quantitative variables, measures like Pearson's r capture linear relationships.
- For two categorical variables, contingency tables and tests like Chi-square help assess association.
- For one quantitative and one categorical variable, methods like point-biserial correlation (when the categorical variable has two values) or ANOVA (comparing group means) are used.
Understanding your data types helps you pick an appropriate analysis technique.
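The decision rule above can be written down as a small lookup. This is only an illustrative sketch: `suggest_method` and its type labels are hypothetical names, and the returned strings simply restate the list above.

```python
def suggest_method(x_type, y_type):
    """Map a pair of variable types to the kind of method named in the list above."""
    types = {x_type, y_type}
    if types == {"quantitative"}:
        return "scatter plot, Pearson's r"
    if types == {"categorical"}:
        return "contingency table, Chi-square test"
    if types == {"quantitative", "categorical"}:
        return "point-biserial correlation or ANOVA"
    raise ValueError(f"unknown variable types: {x_type!r}, {y_type!r}")

# Study frequency vs pass/fail: both categorical.
print(suggest_method("categorical", "categorical"))
# Hours studied vs quiz score: both quantitative.
print(suggest_method("quantitative", "quantitative"))
```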
Try It Yourself
Q: If your data has outliers that raise the mean, which measure of center is more reliable: mean or median?
A: Median, because it's resistant to outliers, unlike the mean, which gets pulled toward them.
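A quick check with Python's standard `statistics` module shows the effect. The study hours come from the sample earlier in the post. The extra 60-hour value is a made-up outlier added purely for illustration.

```python
from statistics import mean, median

hours = [2, 3, 5, 7, 8, 10, 12]   # sample from the post
with_outlier = hours + [60]        # hypothetical outlier, not from the survey

mean_before, median_before = mean(hours), median(hours)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"Without outlier: mean = {mean_before:.2f}, median = {median_before}")
print(f"With outlier:    mean = {mean_after:.2f}, median = {median_after}")
```

One extreme value roughly doubles the mean, while the median barely moves.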
Conclusion
Type of Data | Tool to Use | Example |
---|---|---|
Categorical (Nominal/Ordinal) | Contingency Table | Study Frequency vs Pass/Fail |
Quantitative | Scatter Plot | Hours Studied vs Quiz Score |
Choose the right tool based on your variable types.
Up Next
Next, we'll calculate the Pearson correlation coefficient (r), a number that tells us how strong a linear relationship really is.