How to Build Frequency Tables in Python (With Charts)
Before building a machine learning model or exploring data in Python, you need to understand how your data is distributed. This guide walks you through creating frequency tables for both categorical and continuous data, then visualizing them as bar charts and histograms with Python libraries like pandas, numpy, and matplotlib.
This post is part of the "Intro to Statistics" series
Previously: Choosing the Right Graph: How to Visualize Your Data
Next: Measuring the Center: Mean, Median, and Mode Explained
Understand your data.
That's where frequency tables come in: they help you summarize your raw data and reveal hidden patterns. In this post, you'll learn how to create frequency tables in Python and visualize them with charts.
We'll cover:
- What frequency tables are
- Why they matter
- How to build them for both categorical and numerical data
- How to plot them with bar charts and histograms
What is a Frequency Table?
A frequency table shows how often each value appears in your data. It's a way to take messy raw values and turn them into something readable: a summary that helps you spot patterns fast.
Imagine you asked 20 people about their favorite fruit. Here's the data:
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana',
          'apple', 'banana', 'orange', 'apple', 'banana', 'banana',
          'orange', 'apple', 'apple', 'banana', 'banana', 'apple',
          'orange', 'banana']
We want to know: how many chose each fruit?
Step 1: Count Frequencies for Categorical Data
from collections import Counter
fruit_counts = Counter(fruits)
print(fruit_counts)
Output:
Counter({'banana': 9, 'apple': 7, 'orange': 4})
This is your frequency table. It tells you that:
- 9 people chose banana
- 7 people chose apple
- 4 people chose orange
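Raw counts are easier to compare across surveys of different sizes when they are expressed as proportions. The sketch below repeats the same fruits list so that it runs on its own, then derives relative frequencies from the Counter. The names total and relative are illustrative choices, not part of the original example.

```python
from collections import Counter

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana',
          'apple', 'banana', 'orange', 'apple', 'banana', 'banana',
          'orange', 'apple', 'apple', 'banana', 'banana', 'apple',
          'orange', 'banana']

fruit_counts = Counter(fruits)
total = sum(fruit_counts.values())  # number of responses

# most_common() yields (value, count) pairs ordered from most to least frequent
relative = {fruit: count / total for fruit, count in fruit_counts.most_common()}

for fruit, share in relative.items():
    print(f"{fruit:<8}{fruit_counts[fruit]:>3}{share:>8.0%}")
```

With this data, banana accounts for 45% of the answers, apple for 35%, and orange for 20%.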
Alternative: Using pandas for Frequency Tables
If your data is in a pandas DataFrame (which is common in real projects), you can use value_counts() for one column or pd.crosstab() for two-way tables:
import pandas as pd
df = pd.DataFrame({'fruit': fruits})
# One-way frequency table
print(df['fruit'].value_counts())
# Two-way frequency table (needs a second categorical column, one value per row)
# df['color'] = ['red', 'yellow', ...]  # Add another column if you have one
# print(pd.crosstab(df['fruit'], df['color']))
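Counts and percentages are often wanted side by side. As a minimal sketch, value_counts(normalize=True) returns proportions instead of counts, and the two results can be combined into one table. The column names count and percent are chosen here for illustration, and the shortened fruit list is sample data only.

```python
import pandas as pd

# Shortened sample data so the snippet runs on its own
df = pd.DataFrame({'fruit': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana']})

counts = df['fruit'].value_counts()                # absolute frequencies
shares = df['fruit'].value_counts(normalize=True)  # relative frequencies (sum to 1)

# Both Series are indexed by fruit, so they line up by label
freq_table = pd.DataFrame({'count': counts, 'percent': (shares * 100).round(1)})
print(freq_table)
```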
Two-Way Frequency Tables (Contingency Tables)
To examine the relationship between two categorical variables, use pd.crosstab():
# Example: Suppose you have 'fruit' and 'color' columns
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana'],
    'color': ['red', 'yellow', 'green', 'orange', 'yellow', 'green']
})
print(pd.crosstab(df['fruit'], df['color']))
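pd.crosstab() also accepts margins and normalize parameters, which add totals or convert counts to proportions. A short sketch using the same six-row example (the variable names with_totals and row_shares are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana'],
    'color': ['red', 'yellow', 'green', 'orange', 'yellow', 'green']
})

# margins=True appends an 'All' row and column holding the totals
with_totals = pd.crosstab(df['fruit'], df['color'], margins=True)
print(with_totals)

# normalize='index' turns each row into proportions that sum to 1
row_shares = pd.crosstab(df['fruit'], df['color'], normalize='index')
print(row_shares.round(2))
```

Row proportions answer questions like "of all bananas, what share is yellow?", which raw counts alone make harder to read when groups differ in size.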
Step 2: Visualize It with a Bar Chart
import matplotlib.pyplot as plt
plt.bar(fruit_counts.keys(), fruit_counts.values(), color='skyblue', edgecolor='black')
plt.title("Favorite Fruit Survey")
plt.xlabel("Fruit")
plt.ylabel("Frequency")
plt.show()
This simple chart lets you see immediately which fruit is most popular. Bar charts are a natural fit for categorical data.
Step 3: Frequency Table for Numerical Data
What if your data is continuous, like heights or ages?
Let's simulate 50 students' heights:
import numpy as np
heights = np.random.normal(loc=165, scale=10, size=50)
bins = [140, 150, 160, 170, 180, 190]
Now create a frequency table using intervals:
counts, bin_edges = np.histogram(heights, bins=bins)
for i in range(len(counts)):
    print(f"{int(bin_edges[i])}-{int(bin_edges[i+1])}: {counts[i]}")
Output (varies by run):
140-150: 2
150-160: 6
160-170: 21
170-180: 15
180-190: 6
This means:
- The largest group of students (21 of 50) falls in the 160-170 cm range
- Few are shorter than 150 cm (2 students) or taller than 180 cm (6 students)
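Two details of np.histogram() are easy to miss. Every bin is half-open, [left, right), except the last one, which also includes its right edge. And values that fall outside the outermost edges are not counted at all. The sketch below uses a seeded generator so the run is repeatable (the seed 42 is an arbitrary choice) and checks how many simulated heights land outside the bins.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed: same heights on every run
heights = rng.normal(loc=165, scale=10, size=50)
bins = [140, 150, 160, 170, 180, 190]

counts, bin_edges = np.histogram(heights, bins=bins)

# Values below 140 or above 190 belong to no bin and are silently skipped
outside = int(np.sum((heights < bins[0]) | (heights > bins[-1])))

print("counted:", counts.sum(), "outside the bins:", outside)
```

If the number outside the bins is not zero, widening the first or last edge keeps the table's total equal to the sample size.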
Step 4: Plot It with a Histogram
plt.hist(heights, bins=bins, color='lightgreen', edgecolor='black')
plt.title("Height Distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
plt.show()
Unlike bar charts, histograms have connected bars, because they're designed for continuous data.
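The bin edges in this example were chosen by hand. When there is no natural choice, NumPy can suggest edges from the data: np.histogram_bin_edges() accepts named rules such as 'sturges', 'fd', or 'auto'. The sketch below uses simulated heights with an arbitrary seed, and is meant as a starting point for experimentation rather than a recommendation of any one rule.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
heights = rng.normal(loc=165, scale=10, size=50)

# Compare how many bins each rule suggests for the same data
for rule in ('sturges', 'fd', 'auto'):
    edges = np.histogram_bin_edges(heights, bins=rule)
    print(f"{rule:<8} {len(edges) - 1} bins")

# The returned edges can be reused, e.g. plt.hist(heights, bins=sturges_edges)
sturges_edges = np.histogram_bin_edges(heights, bins='sturges')
```

The suggested edges span the observed minimum and maximum, so no value is left out of the table.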
Tip:
For discrete numerical data (like test scores), you can also use value_counts() to create a frequency table:
import pandas as pd
scores = [85, 90, 85, 88, 92, 85, 90]
# sort_index() orders the table by score instead of by count
print(pd.Series(scores).value_counts().sort_index())
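A sorted table of discrete values extends naturally to cumulative frequencies, which answer questions like "how many students scored 88 or lower?". A minimal sketch with the same scores, where the column names count and cumulative are illustrative:

```python
import pandas as pd

scores = [85, 90, 85, 88, 92, 85, 90]
freq = pd.Series(scores).value_counts().sort_index()

table = pd.DataFrame({
    'count': freq,
    'cumulative': freq.cumsum(),  # running total of counts, lowest score first
})
print(table)
```

Here the cumulative column reads 3, 4, 6, 7, so four of the seven scores are 88 or lower.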
Histogram Best Practices:
- Always label your x-axis and y-axis clearly, including units (e.g., "Height (cm)").
- Use a consistent y-axis scale when comparing multiple histograms.
- Start the y-axis at zero unless there is a strong reason not to, and indicate clearly if you do otherwise.
Why Frequency Tables Matter
- They help you understand distributions at a glance
- Let you detect outliers or unexpected values
- Help in preparing datasets for machine learning (e.g., binning or encoding)
They're also a first step toward more advanced tools: descriptive statistics, histograms, boxplots, and beyond.
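As a small illustration of the "unexpected values" point, a frequency table makes inconsistent spellings stand out, because each variant shows up as its own low-count category. The messy responses below are invented for this example.

```python
from collections import Counter

# Invented survey responses with inconsistent casing, stray spaces, and one typo
responses = ['apple', 'Apple', 'banana ', 'banana', 'aple', 'banana', 'apple']

raw_counts = Counter(responses)
print(raw_counts)  # 'Apple', 'banana ' and 'aple' appear as separate categories

# Normalizing case and whitespace merges the variants; the typo still stands out
clean_counts = Counter(r.strip().lower() for r in responses)
print(clean_counts)
```

After cleaning, only the typo remains as a category with a single occurrence, which is the kind of entry worth checking by hand.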
Common Pitfalls
- Forgetting to sort frequency tables by value or category.
- Using too many or too few bins in histograms, which can hide patterns or create noise.
- Not handling missing values before counting frequencies.
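On the missing-values point: value_counts() in pandas leaves out missing entries by default, so a column can look complete when it is not. Passing dropna=False gives the gaps their own row. The sample data below is invented for illustration.

```python
import pandas as pd

# Invented sample with two missing answers
fruit = pd.Series(['apple', 'banana', None, 'apple', None, 'orange'])

default_counts = fruit.value_counts()           # missing values are left out
full_counts = fruit.value_counts(dropna=False)  # missing values get their own row

print(default_counts.sum(), "of", len(fruit), "rows counted by default")
print(full_counts)
```

Comparing the table's total with the number of rows is a quick check that nothing was dropped unnoticed.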
Level Up: Leveraging Frequency Tables for Deeper Data Insights
Frequency tables are more than simple counts. They're powerful tools for exploring and preparing data:
- For categorical data, frequency tables reveal the distribution and highlight dominant categories.
- For numerical data, grouping values into intervals helps uncover patterns and anomalies.
- Building frequency tables programmatically (e.g., with Python's pandas) enables scalable and reproducible analysis.
- Visualizing frequency tables with bar charts or histograms bridges raw numbers to intuitive understanding.
Mastering frequency tables will improve your data wrangling and make your visualizations more meaningful.
Try It Yourself
Q: You have a list of 100 product categories (like Electronics, Clothing, Books, etc.).
What type of chart and table would help you best understand the distribution of these categories?
Answer:
Use a frequency table to count how many times each category appears, and a bar chart to visualize it.
Since this is categorical data, bar charts and frequency tables are ideal for summarizing and comparing counts.
Bonus: What if instead you had 100 numerical values showing product prices?
Answer:
Use a frequency table with intervals (like 0-50, 50-100) and a histogram to visualize the distribution.
Since prices are continuous numerical data, histograms show how values are spread across ranges.
Summary
Task | Tool |
---|---|
Count categories | Counter() |
Visualize categories | Bar chart |
Group continuous data | numpy.histogram() |
Visualize continuous data | Histogram |
Got a question or suggestion?
Feel free to leave a comment in the section below. I'd love to hear your thoughts or help with your dataset!
Coming Next
In the next post, we'll take this frequency data and calculate powerful summary statistics like:
- Mean
- Median
- Standard deviation
Stay tuned!