How to Build Frequency Tables in Python (With Charts)

Before building a machine learning model or exploring data in Python, you need to understand how your data is distributed. This guide walks you through creating frequency tables for both categorical and continuous data, then visualizing them as bar charts and histograms with Python libraries like pandas, numpy, and matplotlib.


📚 This post is part of the "Intro to Statistics" series

🔙 Previously: Choosing the Right Graph: How to Visualize Your Data

🔜 Next: Measuring the Center: Mean, Median, and Mode Explained


👉 Understand your data.

That’s where frequency tables come in: they help you summarize your raw data and reveal hidden patterns. In this post, you’ll learn how to create frequency tables in Python and visualize them with charts.

We’ll cover:

✔️ What frequency tables are
✔️ Why they matter
✔️ How to build them for both categorical and numerical data
✔️ How to plot them with bar charts and histograms


📊 What is a Frequency Table?

A frequency table shows how often each value appears in your data. It’s a way to take messy raw data and turn it into something readable: a summary that helps you spot patterns fast.

🟡 Imagine you asked 20 people about their favorite fruit. Here’s the data:

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana',
          'apple', 'banana', 'orange', 'apple', 'banana', 'banana',
          'orange', 'apple', 'apple', 'banana', 'banana', 'apple',
          'orange', 'banana']

We want to know: how many chose each fruit?


🔢 Step 1: Count Frequencies for Categorical Data

from collections import Counter

fruit_counts = Counter(fruits)
print(fruit_counts)

📌 Output:

Counter({'banana': 9, 'apple': 7, 'orange': 4})

🎯 This is your frequency table. It tells you that:

  • 9 people chose banana 🍌
  • 7 people chose apple 🍎
  • 4 people chose orange 🍊
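
If you want the rows ordered by popularity, or just the top answer, `Counter.most_common()` is a handy follow-up. Here is a minimal sketch that reuses the survey list from above; the variable names `top_fruit` and `top_count` are only illustrative.

```python
from collections import Counter

# Same 20 survey responses as in the example above
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana',
          'apple', 'banana', 'orange', 'apple', 'banana', 'banana',
          'orange', 'apple', 'apple', 'banana', 'banana', 'apple',
          'orange', 'banana']

fruit_counts = Counter(fruits)

# Sanity check: the counts should add up to the number of responses
assert sum(fruit_counts.values()) == len(fruits)

# most_common() returns (value, count) pairs sorted by count, highest first
for fruit, count in fruit_counts.most_common():
    print(f"{fruit:<8}{count:>3}")

# most_common(1) returns a one-element list holding the most frequent value
top_fruit, top_count = fruit_counts.most_common(1)[0]
print(f"Most popular: {top_fruit} ({top_count} of {len(fruits)})")
```

Checking that the counts sum to the number of responses is a cheap way to catch values that were dropped or double-counted.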

🐼 Alternative: Using pandas for Frequency Tables

If your data is in a pandas DataFrame (which is common in real projects), you can use value_counts() for one column or pd.crosstab() for two-way tables:

import pandas as pd

df = pd.DataFrame({'fruit': fruits})

# One-way frequency table
print(df['fruit'].value_counts())

# Two-way frequency table (needs a second column with one value per row)
# df['color'] = ['red', 'yellow', ...]  # add another column if you have one
# print(pd.crosstab(df['fruit'], df['color']))
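
Counts are often easier to compare as proportions. As a rough sketch, `value_counts(normalize=True)` returns each category's share of the rows, and the two results can sit side by side in one table. The name `freq_table` and the column labels are my own choices, not part of the original example.

```python
import pandas as pd

# Same 20 survey responses as in the example above
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'banana',
          'apple', 'banana', 'orange', 'apple', 'banana', 'banana',
          'orange', 'apple', 'apple', 'banana', 'banana', 'apple',
          'orange', 'banana']
df = pd.DataFrame({'fruit': fruits})

counts = df['fruit'].value_counts()                # absolute frequencies
shares = df['fruit'].value_counts(normalize=True)  # relative frequencies (sum to 1)

# Both Series are indexed by fruit name, so they line up row by row
freq_table = pd.DataFrame({'count': counts, 'share': shares})
print(freq_table)
```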

🔄 Two-Way Frequency Tables (Contingency Tables)

To examine the relationship between two categorical variables, use pd.crosstab():

# Example: suppose you have 'fruit' and 'color' columns
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana'],
    'color': ['red', 'yellow', 'green', 'orange', 'yellow', 'green']
})
print(pd.crosstab(df['fruit'], df['color']))
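
Two optional `pd.crosstab()` arguments are worth knowing once the basic table works: `margins=True` adds row and column totals, and `normalize='index'` converts each row to proportions. The sketch below uses the same six-row example; `with_totals` and `row_shares` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana'],
    'color': ['red', 'yellow', 'green', 'orange', 'yellow', 'green'],
})

# margins=True appends an "All" row and column holding the totals
with_totals = pd.crosstab(df['fruit'], df['color'], margins=True)
print(with_totals)

# normalize='index' turns each row into proportions that sum to 1
row_shares = pd.crosstab(df['fruit'], df['color'], normalize='index')
print(row_shares.round(2))
```

Row proportions answer questions like "of all bananas, what share were yellow?", which raw counts make harder to read when groups differ in size.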

📊 Step 2: Visualize It with a Bar Chart

import matplotlib.pyplot as plt

plt.bar(fruit_counts.keys(), fruit_counts.values(), color='skyblue', edgecolor='black')
plt.title("Favorite Fruit Survey")
plt.xlabel("Fruit")
plt.ylabel("Frequency")
plt.show()

🧠 This simple chart helps you immediately see which fruit is most popular β€” bar charts are perfect for categorical data.


📏 Step 3: Frequency Table for Numerical Data

What if your data is continuous? Like heights or ages?

Let’s simulate 50 students’ heights:

import numpy as np

heights = np.random.normal(loc=165, scale=10, size=50)
bins = [140, 150, 160, 170, 180, 190]

Now create a frequency table using intervals:

counts, bin_edges = np.histogram(heights, bins=bins)

for i in range(len(counts)):
    print(f"{int(bin_edges[i])}–{int(bin_edges[i+1])}: {counts[i]}")

📌 Output (varies by run):

140–150: 2
150–160: 6
160–170: 21
170–180: 15
180–190: 6

This means:

  • The largest group (21 of the 50 students in this run) falls in the 160–170 cm range
  • Only 2 students are shorter than 150 cm, and 6 are taller than 180 cm
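
One detail the interval labels hide is what happens to values that sit exactly on an edge or outside the bins. With explicit edges, `np.histogram` treats every bin as half-open, `[left, right)`, except the last one, which also includes its right edge. Values outside the range are ignored without any warning. The sample below is hand-picked to show this (these are not real measurements), and `dropped` is an illustrative name.

```python
import numpy as np

bins = [140, 150, 160, 170, 180, 190]

# Hand-picked values that sit on or outside the bin edges
sample = np.array([139.9, 140.0, 150.0, 189.9, 190.0, 195.0])

counts, bin_edges = np.histogram(sample, bins=bins)
print(counts)  # one count per interval

# 150.0 lands in 150–160 (not 140–150); 190.0 lands in the last bin.
# 139.9 and 195.0 are outside [140, 190] and are not counted at all,
# so comparing totals reveals how many values were left out.
dropped = len(sample) - counts.sum()
print(f"{dropped} value(s) fell outside the bins")
```

With simulated heights this rarely matters, but on real data it is worth checking that the counts add up to the number of rows.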

📉 Step 4: Plot It with a Histogram

plt.hist(heights, bins=bins, color='lightgreen', edgecolor='black')
plt.title("Height Distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
plt.show()

Unlike bar charts, histograms have bars that touch with no gaps between them, because each bar covers an interval that starts where the previous one ends. They’re designed for continuous data.
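
The bin edges above were chosen by hand. If you would rather let a rule of thumb pick them, `np.histogram_bin_edges` accepts rule names such as `'sturges'`, `'fd'` (Freedman-Diaconis), and `'auto'`. This is a small sketch for comparing them; the seed is fixed only so reruns give the same numbers, and which rule suits your data is a judgment call.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
heights = rng.normal(loc=165, scale=10, size=50)

for rule in ('sturges', 'fd', 'auto'):
    # Compute edges from the data using the named rule
    edges = np.histogram_bin_edges(heights, bins=rule)
    counts, _ = np.histogram(heights, bins=edges)
    print(f"{rule:<8} {len(edges) - 1} bins, counts = {counts.tolist()}")
```

`plt.hist` accepts the same rule names through its `bins` argument, so you can pass `bins='auto'` there directly.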


Tip:
For discrete numerical data (like test scores), you can also use value_counts() to create a frequency table:

import pandas as pd

scores = [85, 90, 85, 88, 92, 85, 90]
# sort_index() orders the table by score instead of by count
print(pd.Series(scores).value_counts().sort_index())
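
Once the table is sorted by score, a running total is a natural next column: it tells you how many values are at or below each score. A minimal sketch using the same scores, with column names of my own choosing:

```python
import pandas as pd

scores = [85, 90, 85, 88, 92, 85, 90]

# Counts ordered by score (the index) rather than by frequency
freq = pd.Series(scores).value_counts().sort_index()

table = pd.DataFrame({
    'count': freq,
    'cumulative': freq.cumsum(),                 # running total of counts
    'cum_share': freq.cumsum() / freq.sum(),     # running total as a proportion
})
print(table)
```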

Histogram Best Practices:

  • Always label your x-axis and y-axis clearly, including units (e.g., “Height (cm)”).
  • Use a consistent y-axis scale when comparing multiple histograms.
  • Start the y-axis at zero unless there is a strong reason not to, and indicate clearly if you do otherwise.
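
Here is one way these practices could look when comparing two histograms side by side. The two groups are made up for illustration, the output file name is hypothetical, and the off-screen backend is an assumption you can drop when working interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
# Two made-up groups, purely for illustration
group_a = rng.normal(loc=165, scale=10, size=50)
group_b = rng.normal(loc=172, scale=8, size=50)

# Use the same bin edges for both panels so the bars are comparable
bins = [130, 140, 150, 160, 170, 180, 190, 200]

# sharey=True gives both panels the same y-axis scale
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
for ax, data, name in zip(axes, (group_a, group_b), ("Group A", "Group B")):
    ax.hist(data, bins=bins, color='lightgreen', edgecolor='black')
    ax.set_title(name)
    ax.set_xlabel("Height (cm)")  # axis label with units
axes[0].set_ylabel("Frequency")
axes[0].set_ylim(bottom=0)  # keep the baseline at zero
fig.savefig("height_comparison.png")  # hypothetical output file name
```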

🧠 Why Frequency Tables Matter

  • They help you understand distributions at a glance
  • Let you detect outliers or unexpected values
  • Help in preparing datasets for machine learning (e.g., binning or encoding)
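
The post doesn't say which kind of binning or encoding it has in mind, so the sketch below assumes two common choices: `pd.cut` for binning a numeric column, and frequency encoding (replacing each label with its share of the rows) for a categorical one. The values and column names are invented for the example.

```python
import pandas as pd

# --- Binning: turn a numeric column into interval categories ---
heights = pd.Series([148, 155, 162, 171, 185])  # made-up values
bins = [140, 150, 160, 170, 180, 190]
# right=False makes each interval [left, right)
height_bin = pd.cut(heights, bins=bins, right=False)
print(height_bin.value_counts().sort_index())

# --- Frequency encoding: replace each label with how common it is ---
df = pd.DataFrame({'fruit': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana']})
shares = df['fruit'].value_counts(normalize=True)  # share of rows per category
df['fruit_freq'] = df['fruit'].map(shares)
print(df)
```

Both steps start from the same frequency table you would build for exploration, which is why it pays to look at it first.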

They’re also a first step toward more advanced tools: descriptive statistics, histograms, boxplots, and beyond.


⚠️ Common Pitfalls

  • Forgetting to sort frequency tables by value or category.
  • Using too many or too few bins in histograms, which can hide patterns or create noise.
  • Not handling missing values before counting frequencies.
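
The missing-value pitfall is easy to overlook because pandas drops missing entries from `value_counts()` by default. A short sketch with hypothetical survey responses shows the difference that `dropna=False` makes:

```python
import pandas as pd

# Hypothetical survey responses with two missing answers
responses = pd.Series(['apple', None, 'banana', 'apple', None, 'orange'])

# Default behaviour: missing values are silently left out of the table
print(responses.value_counts())

# dropna=False gives missing values their own row
print(responses.value_counts(dropna=False))

n_missing = responses.isna().sum()
print(f"{n_missing} of {len(responses)} responses are missing")
```

Whether you then drop, fill, or report the missing rows depends on your data; the point is to make the choice deliberately.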

🧠 Level Up: Leveraging Frequency Tables for Deeper Data Insights

Frequency tables are more than simple counts β€” they’re powerful tools for exploring and preparing data:

  • πŸ”’ For categorical data, frequency tables reveal the distribution and highlight dominant categories.
  • πŸ“Š For numerical data, grouping values into intervals in frequency tables helps uncover patterns and anomalies.
  • πŸ§‘β€πŸ’» Building frequency tables programmatically (e.g., with Python’s pandas) enables scalable and reproducible analysis.
  • 🎨 Visualizing frequency tables with bar charts or histograms bridges raw numbers to intuitive understanding.

Mastering frequency tables will improve your data wrangling and make your visualizations more meaningful.


📌 Try It Yourself

Q: You have a list of 100 product categories (like Electronics, Clothing, Books, etc.).

What type of chart and table would help you best understand the distribution of these categories?

💡 Answer

✅ Use a frequency table to count how many times each category appears, and a bar chart to visualize it.

Since this is categorical data, bar charts and frequency tables are ideal for summarizing and comparing counts.


Bonus: What if instead you had 100 numerical values showing product prices?

💡 Answer

✅ Use a frequency table with intervals (like 0–50, 50–100) and a histogram to visualize the distribution.

Since prices are continuous numerical data, histograms show how values are spread across ranges.


✅ Summary

| Task | Tool |
|---|---|
| Count categories | Counter() |
| Visualize categories | Bar chart |
| Group continuous data | numpy.histogram() |
| Visualize continuous data | Histogram |

💬 Got a question or suggestion?
Feel free to leave a comment in the section below. I’d love to hear your thoughts or help with your dataset!


🚀 Coming Next

In the next post, we’ll take this frequency data and calculate powerful summary statistics like:

  • Mean
  • Median
  • Standard deviation

Stay tuned!

This post is licensed under CC BY 4.0 by the author.