Correlation Between Variables: Contingency Tables and Scatter Plots

To describe the relationship between two variables, we look at how they vary together: their association, or correlation.

How we analyze that relationship depends on the type of data we're working with: categorical or quantitative.


📚 This post is part of the "Intro to Statistics" series

🔙 Previously: A Real-World Statistics Example

🔜 Next: Understanding Pearson's R


🎓 Real-Life Case: Study Habits and Exam Performance

Imagine a high school counselor wants to investigate the relationship between how often students study and whether they pass or fail a weekly quiz.

She surveys 30 students and records two things:

  • 📚 Study Time Category: Rarely, Sometimes, Often
  • ✅ Quiz Result: Pass or Fail

🧮 Step 1: The Contingency Table

A contingency table is used for two categorical variables. Each cell counts how often a particular combination of categories occurs.

| Study Frequency \ Quiz Result | Pass | Fail | Total |
|---|---|---|---|
| Rarely | 3 | 7 | 10 |
| Sometimes | 6 | 4 | 10 |
| Often | 9 | 1 | 10 |
| Total | 18 | 12 | 30 |
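As a minimal sketch of how such a table can be tallied, the snippet below counts category pairs with Python's standard library. The post only gives the final counts, so the per-student `records` list is a hypothetical reconstruction from those counts, not the counselor's actual survey data.

```python
from collections import Counter

# Cell counts copied from the contingency table above.
cell_counts = {
    ("Rarely", "Pass"): 3, ("Rarely", "Fail"): 7,
    ("Sometimes", "Pass"): 6, ("Sometimes", "Fail"): 4,
    ("Often", "Pass"): 9, ("Often", "Fail"): 1,
}

# Hypothetical raw survey: one (study frequency, quiz result) record per student.
records = [pair for pair, n in cell_counts.items() for _ in range(n)]

# Tally the joint counts plus the row and column totals.
table = Counter(records)
row_totals = Counter(freq for freq, _ in records)
col_totals = Counter(result for _, result in records)

print(f"{'':<10}{'Pass':>6}{'Fail':>6}{'Total':>7}")
for freq in ("Rarely", "Sometimes", "Often"):
    print(f"{freq:<10}{table[freq, 'Pass']:>6}{table[freq, 'Fail']:>6}{row_totals[freq]:>7}")
print(f"{'Total':<10}{col_totals['Pass']:>6}{col_totals['Fail']:>6}{len(records):>7}")
```

The row and column totals printed here are the margins of the table.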

🔍 Step 2: Conditional Proportions

The raw counts don't tell the full story, so we calculate the percentage of each outcome within each group.

For example:

  • Among students who study Rarely, 3/10 passed = 30%
  • Among those who study Often, 9/10 passed = 90%

| Study Frequency | % Passed | % Failed |
|---|---|---|
| Rarely | 30% | 70% |
| Sometimes | 60% | 40% |
| Often | 90% | 10% |

✅ These are conditional proportions: percentages within each row.
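The row-wise percentages can be reproduced with a short sketch like the one below. The nested-dictionary layout and the helper name `row_proportions` are illustrative choices, not something the post prescribes.

```python
# Counts from the contingency table: study frequency -> quiz result -> count.
table = {
    "Rarely": {"Pass": 3, "Fail": 7},
    "Sometimes": {"Pass": 6, "Fail": 4},
    "Often": {"Pass": 9, "Fail": 1},
}

def row_proportions(table):
    """Divide every count by its own row total (conditional proportions)."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {outcome: count / total for outcome, count in row.items()}
    return result

conditional = row_proportions(table)
for group, row in conditional.items():
    print(f"{group:<10} passed={row['Pass']:.0%}  failed={row['Fail']:.0%}")
```

Each row sums to 100% because the denominator is that row's total, not the overall sample size.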


📊 Step 3: Understanding Proportions: Quick Summary

We use conditional proportions to look within groups, and marginal proportions to summarize a variable on its own.

  • Conditional example:
    Among those who study Rarely → 3/10 passed = 30%
  • Marginal example:
    Overall pass rate → 18/30 = 60%
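The difference comes down to which total goes in the denominator. Here is a small sketch using the counts from the table; the variable names are illustrative.

```python
# Counts from the contingency table: study frequency -> quiz result -> count.
table = {
    "Rarely": {"Pass": 3, "Fail": 7},
    "Sometimes": {"Pass": 6, "Fail": 4},
    "Often": {"Pass": 9, "Fail": 1},
}

n = sum(sum(row.values()) for row in table.values())  # all 30 students

# Marginal proportion: one variable on its own, divided by the grand total.
marginal_pass = sum(row["Pass"] for row in table.values()) / n  # 18 / 30

# Conditional proportion: restricted to one group, divided by that group's total.
rarely_total = sum(table["Rarely"].values())
pass_given_rarely = table["Rarely"]["Pass"] / rarely_total  # 3 / 10

print(f"Overall pass rate:         {marginal_pass:.0%}")
print(f"Pass rate among 'Rarely':  {pass_given_rarely:.0%}")
```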

📚 Want a full breakdown with examples, visual tables, and when to use each?
👉 Read: Conditional vs. Marginal Proportions →


🔍 Step 4: Interpreting the Categorical Correlation

In this sample, the more often students study, the more likely they are to pass.
The conditional proportions show a positive association:

  • Rarely study → low pass rate (30%)
  • Sometimes study → middling pass rate (60%)
  • Often study → high pass rate (90%)

➡️ But a contingency table on its own doesn't quantify the strength of the association; it only shows the pattern.


🔄 Step 5: Let's Make It Quantitative

Now let's change the scenario:

The counselor asks students for their exact number of study hours per week and records their quiz scores out of 100.

Here's a sample:

| Hours Studied | Quiz Score |
|---|---|
| 2 | 50 |
| 3 | 55 |
| 5 | 65 |
| 7 | 70 |
| 8 | 76 |
| 10 | 85 |
| 12 | 92 |

📈 Step 6: Scatter Plot

A scatter plot is well suited to two quantitative variables.
It helps us visually assess correlation:

  • Each point = one student
  • X-axis: Hours studied
  • Y-axis: Quiz score

[Figure: Scatter Plot, Study Hours vs Quiz Score]
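For readers who want to redraw the picture, here is a rough, dependency-free sketch that prints a character-grid scatter plot of the sample. The helper `text_scatter` is a hypothetical name made up for this illustration, and it assumes the x and y values are not all identical; in practice a plotting library such as matplotlib would be the usual choice.

```python
hours = [2, 3, 5, 7, 8, 10, 12]
scores = [50, 55, 65, 70, 76, 85, 92]

def text_scatter(xs, ys, width=40, height=12):
    """Return a rough character-grid scatter plot as a list of strings."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

print("Quiz score (vertical) vs. hours studied (horizontal)")
print("\n".join(text_scatter(hours, scores)))
```

Each `*` is one student; the points climb from the bottom-left corner to the top-right.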

You'll notice that students who study more hours tend to score higher, and the points fall close to a straight line.
This is a strong positive relationship.
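To put a number on "strong positive", here is a sketch that computes Pearson's r for the sample directly from its definition. The function name `pearson_r` is an illustrative choice; this is only a preview of the coefficient, not a full treatment.

```python
from math import sqrt

hours = [2, 3, 5, 7, 8, 10, 12]
scores = [50, 55, 65, 70, 76, 85, 92]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # about 0.997 for this sample
```

A value this close to +1 matches what the scatter plot shows: an upward trend with very little scatter around a line.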


🧠 Level Up: Choosing the Right Correlation Approach Based on Data Types

Correlation analysis isn't one-size-fits-all. The types of the variables determine the appropriate method:

  • 📊 For two quantitative variables, measures like Pearson's r capture linear relationships.
  • 📋 For two categorical variables, contingency tables and tests like Chi-square help assess association.
  • 🔄 For mixed variable types (one categorical, one quantitative), methods like point-biserial correlation (when the categorical variable is binary) or ANOVA are used.

Understanding your data types helps you pick an appropriate analysis technique.
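As an illustration of the categorical case, the sketch below computes the Chi-square statistic for the study-frequency table by hand. The helper name `chi_square` is hypothetical. The p-value shortcut `exp(-stat / 2)` is valid only for exactly 2 degrees of freedom, which is what a 3×2 table has. Treat the result as rough: the expected Fail count in each row is 4, below the common rule of thumb of 5.

```python
from math import exp

# Observed counts from the contingency table.
observed = {
    "Rarely": {"Pass": 3, "Fail": 7},
    "Sometimes": {"Pass": 6, "Fail": 4},
    "Often": {"Pass": 9, "Fail": 1},
}

def chi_square(observed):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = {g: sum(row.values()) for g, row in observed.items()}
    col_totals = {}
    for row in observed.values():
        for outcome, count in row.items():
            col_totals[outcome] = col_totals.get(outcome, 0) + count
    n = sum(row_totals.values())
    stat = 0.0
    for g, row in observed.items():
        for outcome, count in row.items():
            # Count expected if study frequency and result were unrelated.
            expected = row_totals[g] * col_totals[outcome] / n
            stat += (count - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

stat, dof = chi_square(observed)
# Upper-tail probability; this closed form holds only when dof == 2.
p_value = exp(-stat / 2) if dof == 2 else None
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.4f}")
```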


📌 Try It Yourself

Q: If your data has outliers that raise the mean, which measure of center is more reliable: mean or median?

💡 Answer

✅ Median, because it is resistant to outliers, whereas the mean gets pulled toward them.
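A quick check of that answer, using made-up weekly study hours (the numbers are hypothetical, chosen only to show the effect of one extreme value):

```python
from statistics import mean, median

# Hypothetical weekly study hours for five students.
typical = [2, 3, 5, 7, 8]
# The same group plus one extreme value.
with_outlier = typical + [40]

print(f"Without outlier: mean={mean(typical):.2f}, median={median(typical)}")
print(f"With outlier:    mean={mean(with_outlier):.2f}, median={median(with_outlier)}")
```

The single extreme value roughly doubles the mean, while the median moves only from 5 to 6.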


✅ Conclusion

| Type of Data | Tool to Use | Example |
|---|---|---|
| Categorical (Nominal/Ordinal) | Contingency Table | Study Frequency vs Pass/Fail |
| Quantitative | Scatter Plot | Hours Studied vs Quiz Score |

🧠 Choose the right tool based on your variable types.


🔜 Up Next

Next, we'll calculate the Pearson correlation coefficient (r), a number that tells us how strong a linear relationship is and in which direction it runs.

This post is licensed under CC BY 4.0 by the author.