Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Pandas GroupBy – Count the occurrences of each combination
When analyzing data, we often need to count how many times each combination of values appears across multiple columns. In Pandas, we can use DataFrame.groupby() with size() to count occurrences of each unique combination.
Creating a Sample DataFrame
Let's start by creating a DataFrame with car sales data ?
import pandas as pd
# Create sample data
data = {
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'Sold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
Car Place Sold
0 BMW Delhi 95
1 Mercedes Hyderabad 80
2 Lamborghini Chandigarh 80
3 Audi Bangalore 75
4 Mercedes Hyderabad 90
5 Porsche Mumbai 90
6 RollsRoyce Mumbai 95
7 BMW Delhi 50
Counting Combinations with groupby() and size()
The groupby() method groups rows by unique combinations of specified columns, and size() returns the count of rows in each group ?
import pandas as pd
data = {
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'Sold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
print("Counting occurrences of each Car-Place combination:")
result = df.groupby(['Car', 'Place']).size()
print(result)
Counting occurrences of each Car-Place combination: Car Place Audi Bangalore 1 BMW Delhi 2 Lamborghini Chandigarh 1 Mercedes Hyderabad 2 Porsche Mumbai 1 RollsRoyce Mumbai 1 dtype: int64
Alternative Methods
Using value_counts()
For counting combinations, you can also use value_counts() on multiple columns ?
import pandas as pd
data = {
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'Sold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
print("Using value_counts():")
result = df[['Car', 'Place']].value_counts()
print(result)
Using value_counts(): Car Place BMW Delhi 2 Mercedes Hyderabad 2 Audi Bangalore 1 Lamborghini Chandigarh 1 Porsche Mumbai 1 RollsRoyce Mumbai 1 Name: count, dtype: int64
Converting to DataFrame
To get the result as a DataFrame instead of a Series, use reset_index() ?
import pandas as pd
data = {
'Car': ['BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'Sold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
result_df = df.groupby(['Car', 'Place']).size().reset_index(name='Count')
print("Result as DataFrame:")
print(result_df)
Result as DataFrame:
Car Place Count
0 Audi Bangalore 1
1 BMW Delhi 2
2 Lamborghini Chandigarh 1
3 Mercedes Hyderabad 2
4 Porsche Mumbai 1
5 RollsRoyce Mumbai 1
Comparison
| Method | Return Type | Best For |
|---|---|---|
groupby().size() |
Series with MultiIndex | Quick counting with hierarchical index |
value_counts() |
Series (sorted by count) | When you want results sorted by frequency |
groupby().size().reset_index() |
DataFrame | When you need a flat DataFrame structure |
Conclusion
Use groupby().size() to count occurrences of each unique combination across multiple columns. For a flat DataFrame result, add reset_index(), or use value_counts() for frequency-sorted results.
