Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
How to do groupby on a multiindex in Pandas?
A MultiIndex DataFrame in Pandas has multiple levels of row or column indices. You can perform groupby operations on specific levels of the MultiIndex using the level parameter or by referencing index names directly.
Creating Sample Data
Let's create a sample sales dataset to demonstrate groupby operations on MultiIndex ?
import pandas as pd
# Create sample sales data
data = {
'Car': ['BMW', 'Mercedes', 'Lamborgini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'UnitsSold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
Car Place UnitsSold
0 BMW Delhi 95
1 Mercedes Hyderabad 80
2 Lamborgini Chandigarh 80
3 Audi Bangalore 75
4 Mercedes Hyderabad 90
5 Porsche Mumbai 90
6 RollsRoyce Mumbai 95
7 BMW Delhi 50
Setting MultiIndex
Convert the DataFrame to use MultiIndex with 'Car' and 'Place' as index levels ?
import pandas as pd
# Create sample data
data = {
'Car': ['BMW', 'Mercedes', 'Lamborgini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'UnitsSold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
# Set Car and Place columns as MultiIndex
df = df.set_index(['Car', 'Place'])
print("MultiIndex DataFrame:")
print(df)
MultiIndex DataFrame:
UnitsSold
Car Place
BMW Delhi 95
Mercedes Hyderabad 80
Lamborgini Chandigarh 80
Audi Bangalore 75
Mercedes Hyderabad 90
Porsche Mumbai 90
RollsRoyce Mumbai 95
BMW Delhi 50
GroupBy on MultiIndex Level
Group by the first level ('Car') and calculate mean units sold ?
import pandas as pd
# Create and set up MultiIndex DataFrame
data = {
'Car': ['BMW', 'Mercedes', 'Lamborgini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
'UnitsSold': [95, 80, 80, 75, 90, 90, 95, 50]
}
df = pd.DataFrame(data)
df = df.set_index(['Car', 'Place'])
# GroupBy on the 'Car' level (level 0)
result = df.groupby(level=['Car'])['UnitsSold'].mean()
print("Mean units sold by Car:")
print(result)
Mean units sold by Car: Car Audi 75.0 BMW 72.5 Lamborgini 80.0 Mercedes 85.0 Porsche 90.0 RollsRoyce 95.0 Name: UnitsSold, dtype: float64
GroupBy Multiple Levels
You can group by multiple index levels simultaneously ?
import pandas as pd
# Create sample data with more entries for demonstration
data = {
'Car': ['BMW', 'BMW', 'Mercedes', 'Mercedes', 'Audi', 'Audi'],
'Place': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
'Year': [2020, 2020, 2020, 2020, 2021, 2021],
'UnitsSold': [95, 85, 80, 90, 75, 70]
}
df = pd.DataFrame(data)
df = df.set_index(['Car', 'Place', 'Year'])
# GroupBy multiple levels
result = df.groupby(level=['Car', 'Place'])['UnitsSold'].sum()
print("Total units sold by Car and Place:")
print(result)
Total units sold by Car and Place:
Car Place
Audi Delhi 75
Mumbai 70
BMW Delhi 95
Mumbai 85
Mercedes Delhi 80
Mumbai 90
Name: UnitsSold, dtype: int64
Alternative GroupBy Methods
You can also group by index names directly or use level numbers ?
import pandas as pd
data = {
'Car': ['BMW', 'Mercedes', 'BMW', 'Mercedes'],
'Place': ['Delhi', 'Mumbai', 'Mumbai', 'Delhi'],
'UnitsSold': [95, 80, 85, 90]
}
df = pd.DataFrame(data)
df = df.set_index(['Car', 'Place'])
# Method 1: Using level numbers
result1 = df.groupby(level=0)['UnitsSold'].mean()
print("Using level number:")
print(result1)
# Method 2: Using index names
result2 = df.groupby('Car')['UnitsSold'].mean()
print("\nUsing index name:")
print(result2)
Using level number: Car BMW 90.0 Mercedes 85.0 Name: UnitsSold, dtype: float64 Using index name: Car BMW 90.0 Mercedes 85.0 Name: UnitsSold, dtype: float64
Conclusion
Use groupby(level=['index_name']) to group by specific MultiIndex levels. You can group by multiple levels simultaneously and use either level numbers or index names for grouping operations.
