How to do groupby on a multiindex in Pandas?

A MultiIndex DataFrame in Pandas has multiple levels of row or column indices. You can perform groupby operations on specific levels of the MultiIndex using the level parameter or by referencing index names directly.

Creating Sample Data

Let's create a sample sales dataset to demonstrate groupby operations on MultiIndex ?

import pandas as pd

# Create sample sales data
data = {
    'Car': ['BMW', 'Mercedes', 'Lamborgini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
    'UnitsSold': [95, 80, 80, 75, 90, 90, 95, 50]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
         Car       Place  UnitsSold
0        BMW       Delhi         95
1   Mercedes   Hyderabad         80
2  Lamborgini  Chandigarh         80
3       Audi   Bangalore         75
4   Mercedes   Hyderabad         90
5    Porsche      Mumbai         90
6 RollsRoyce      Mumbai         95
7        BMW       Delhi         50

Setting MultiIndex

Convert the DataFrame to use MultiIndex with 'Car' and 'Place' as index levels ?

import pandas as pd

# Create sample data
data = {
    'Car': ['BMW', 'Mercedes', 'Lamborgini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
    'UnitsSold': [95, 80, 80, 75, 90, 90, 95, 50]
}

df = pd.DataFrame(data)

# Set Car and Place columns as MultiIndex
df = df.set_index(['Car', 'Place'])
print("MultiIndex DataFrame:")
print(df)
MultiIndex DataFrame:
                    UnitsSold
Car        Place             
BMW        Delhi           95
Mercedes   Hyderabad       80
Lamborgini Chandigarh      80
Audi       Bangalore       75
Mercedes   Hyderabad       90
Porsche    Mumbai          90
RollsRoyce Mumbai          95
BMW        Delhi           50

GroupBy on MultiIndex Level

Group by the first level ('Car') and calculate mean units sold ?

import pandas as pd

# Create and set up MultiIndex DataFrame
data = {
    'Car': ['BMW', 'Mercedes', 'Lamborgini', 'Audi', 'Mercedes', 'Porsche', 'RollsRoyce', 'BMW'],
    'Place': ['Delhi', 'Hyderabad', 'Chandigarh', 'Bangalore', 'Hyderabad', 'Mumbai', 'Mumbai', 'Delhi'],
    'UnitsSold': [95, 80, 80, 75, 90, 90, 95, 50]
}

df = pd.DataFrame(data)
df = df.set_index(['Car', 'Place'])

# GroupBy on the 'Car' level (level 0)
result = df.groupby(level=['Car'])['UnitsSold'].mean()
print("Mean units sold by Car:")
print(result)
Mean units sold by Car:
Car
Audi          75.0
BMW           72.5
Lamborgini    80.0
Mercedes      85.0
Porsche       90.0
RollsRoyce    95.0
Name: UnitsSold, dtype: float64

GroupBy Multiple Levels

You can group by multiple index levels simultaneously ?

import pandas as pd

# Create sample data with more entries for demonstration
data = {
    'Car': ['BMW', 'BMW', 'Mercedes', 'Mercedes', 'Audi', 'Audi'],
    'Place': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Year': [2020, 2020, 2020, 2020, 2021, 2021],
    'UnitsSold': [95, 85, 80, 90, 75, 70]
}

df = pd.DataFrame(data)
df = df.set_index(['Car', 'Place', 'Year'])

# GroupBy multiple levels
result = df.groupby(level=['Car', 'Place'])['UnitsSold'].sum()
print("Total units sold by Car and Place:")
print(result)
Total units sold by Car and Place:
Car      Place 
Audi     Delhi     75
         Mumbai    70
BMW      Delhi     95
         Mumbai    85
Mercedes Delhi     80
         Mumbai    90
Name: UnitsSold, dtype: int64

Alternative GroupBy Methods

You can also group by index names directly or use level numbers ?

import pandas as pd

data = {
    'Car': ['BMW', 'Mercedes', 'BMW', 'Mercedes'],
    'Place': ['Delhi', 'Mumbai', 'Mumbai', 'Delhi'],
    'UnitsSold': [95, 80, 85, 90]
}

df = pd.DataFrame(data)
df = df.set_index(['Car', 'Place'])

# Method 1: Using level numbers
result1 = df.groupby(level=0)['UnitsSold'].mean()
print("Using level number:")
print(result1)

# Method 2: Using index names
result2 = df.groupby('Car')['UnitsSold'].mean()
print("\nUsing index name:")
print(result2)
Using level number:
Car
BMW         90.0
Mercedes    85.0
Name: UnitsSold, dtype: float64

Using index name:
Car
BMW         90.0
Mercedes    85.0
Name: UnitsSold, dtype: float64

Conclusion

Use groupby(level=['index_name']) to group by specific MultiIndex levels. You can group by multiple levels simultaneously and use either level numbers or index names for grouping operations.

Updated on: 2026-03-26T01:27:23+05:30

459 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements