Python - Count distinct in Pandas Aggregation with Numpy


To count distinct values, use nunique in Pandas. We will group by a column and also compute the sum using NumPy's sum().
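As a quick illustration of what a distinct count means, the minimal sketch below compares count() with nunique() on a single Series. The variable names (places, total, distinct) are chosen here for illustration only; the values mirror the Place column used in the example:

```python
import pandas as pd

# Illustrative Series; same values as the "Place" column in the example
places = pd.Series(['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'])

total = places.count()       # number of non-null values
distinct = places.nunique()  # number of distinct non-null values

print(total, distinct)
```

count() counts every non-null entry, while nunique() counts each repeated value only once.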

First, import the required libraries −

import pandas as pd
import numpy as np

Create a DataFrame with 3 columns. The columns have duplicate values −

dataFrame = pd.DataFrame(
   {
      "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
      "Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
      "Units": [100, 150, 50, 110, 90]
   }
)

Group by the Car column and pass a dictionary to agg(): nunique counts the distinct Place values in each group, and NumPy's sum() totals the Units −

dataFrame = dataFrame.groupby("Car").agg({"Units": np.sum, "Place": pd.Series.nunique})
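Recent pandas releases may emit a FutureWarning when np.sum is passed to agg() and suggest the string "sum" instead. The sketch below shows the same aggregation with string aliases, followed by a named-aggregation variant. The names df, result, named, total_units and distinct_places are assumptions made for this illustration and are not part of the original example:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
        "Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
        "Units": [100, 150, 50, 110, 90],
    }
)

# String aliases in place of np.sum and pd.Series.nunique
result = df.groupby("Car").agg({"Units": "sum", "Place": "nunique"})

# Named aggregation: the output column names here are illustrative
named = df.groupby("Car").agg(
    total_units=("Units", "sum"),
    distinct_places=("Place", "nunique"),
)

print(result)
print(named)
```

Named aggregation is convenient when the result columns should describe the aggregation rather than reuse the source column names.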

Example

Following is the code −

import pandas as pd
import numpy as np

dataFrame = pd.DataFrame(
   {
      "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
      "Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
      "Units": [100, 150, 50, 110, 90]
   }
)

print("DataFrame ...\n", dataFrame, sep="")

# sum the Units and count the distinct Place values for each Car
dataFrame = dataFrame.groupby("Car").agg({"Units": np.sum, "Place": pd.Series.nunique})

print("\nUpdated DataFrame ...\n", dataFrame, sep="")

Output

This will produce the following output −

DataFrame ...
     Car       Place  Units
0    BMW       Delhi    100
1   Audi   Bangalore    150
2    BMW       Delhi     50
3  Lexus  Chandigarh    110
4  Lexus  Chandigarh     90

Updated DataFrame ...
       Units  Place
Car
Audi     150      1
BMW      150      1
Lexus    200      1
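In the output above every car appears in only one place, so the distinct count is 1 for each group. The sketch below uses a hypothetical variation of the data, in which the second BMW row is moved to Mumbai, to show a group with more than one distinct value. The changed row and the names df and summary are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical variation: the second BMW row uses 'Mumbai' instead of 'Delhi'
df = pd.DataFrame(
    {
        "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
        "Place": ['Delhi', 'Bangalore', 'Mumbai', 'Chandigarh', 'Chandigarh'],
        "Units": [100, 150, 50, 110, 90],
    }
)

# Sum the Units and count the distinct Place values for each Car
summary = df.groupby("Car").agg({"Units": "sum", "Place": "nunique"})

print(summary)
```

With this data the BMW group reports two distinct places, while its Units total is unchanged.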

Updated on: 16-Sep-2021
