Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python – Group and calculate the sum of column values of a Pandas DataFrame
In Pandas, you can group DataFrame rows by specific columns and calculate aggregated values like sum, mean, or count. This is particularly useful for analyzing time-series data where you want to group by periods like months, quarters, or years.
Creating a Sample DataFrame
Let's create a DataFrame with car sales data to demonstrate grouping and summing ?
import pandas as pd
# Create DataFrame with car sales data
dataFrame = pd.DataFrame({
"Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
"Date_of_Purchase": [
pd.Timestamp("2021-06-10"),
pd.Timestamp("2021-07-11"),
pd.Timestamp("2021-06-25"),
pd.Timestamp("2021-06-29"),
pd.Timestamp("2021-03-20"),
pd.Timestamp("2021-01-22"),
pd.Timestamp("2021-01-06"),
pd.Timestamp("2021-01-04"),
pd.Timestamp("2021-05-09")
],
"Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})
print("DataFrame...")
print(dataFrame)
DataFrame...
Car Date_of_Purchase Reg_Price
0 Audi 2021-06-10 1000
1 Lexus 2021-07-11 1400
2 Tesla 2021-06-25 1100
3 Mercedes 2021-06-29 900
4 BMW 2021-03-20 1700
5 Toyota 2021-01-22 1800
6 Nissan 2021-01-06 1300
7 Bentley 2021-01-04 1150
8 Mustang 2021-05-09 1350
Grouping by Month and Calculating Sum
Use pd.Grouper with frequency 'M' to group by month and calculate the sum of registration prices ?
import pandas as pd
# Create DataFrame
dataFrame = pd.DataFrame({
"Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
"Date_of_Purchase": [
pd.Timestamp("2021-06-10"),
pd.Timestamp("2021-07-11"),
pd.Timestamp("2021-06-25"),
pd.Timestamp("2021-06-29"),
pd.Timestamp("2021-03-20"),
pd.Timestamp("2021-01-22"),
pd.Timestamp("2021-01-06"),
pd.Timestamp("2021-01-04"),
pd.Timestamp("2021-05-09")
],
"Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})
# Group by month and calculate sum
monthly_sum = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', axis=0, freq='M')).sum()
print("Group DataFrame by month...")
print(monthly_sum)
Group DataFrame by month...
Reg_Price
Date_of_Purchase
2021-01-31 4250.0
2021-02-28 NaN
2021-03-31 1700.0
2021-04-30 NaN
2021-05-31 1350.0
2021-06-30 3000.0
2021-07-31 1400.0
Understanding pd.Grouper Parameters
The pd.Grouper function accepts several key parameters ?
| Parameter | Description | Example Value |
|---|---|---|
key |
Column name to group by | 'Date_of_Purchase' |
freq |
Frequency for time grouping | 'M' (month), 'Q' (quarter), 'Y' (year) |
axis |
Axis to group along | 0 (rows), 1 (columns) |
Removing NaN Values
To exclude months with no data (NaN values), use dropna() ?
import pandas as pd
# Create DataFrame
dataFrame = pd.DataFrame({
"Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"],
"Date_of_Purchase": [
pd.Timestamp("2021-06-10"),
pd.Timestamp("2021-07-11"),
pd.Timestamp("2021-06-25"),
pd.Timestamp("2021-06-29"),
pd.Timestamp("2021-03-20"),
pd.Timestamp("2021-01-22"),
pd.Timestamp("2021-01-06"),
pd.Timestamp("2021-01-04"),
pd.Timestamp("2021-05-09")
],
"Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350]
})
# Group by month, sum, and remove NaN values
monthly_sum_clean = dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', freq='M')).sum().dropna()
print("Monthly sum without NaN values...")
print(monthly_sum_clean)
Monthly sum without NaN values...
Reg_Price
Date_of_Purchase
2021-01-31 4250.0
2021-03-31 1700.0
2021-05-31 1350.0
2021-06-30 3000.0
2021-07-31 1400.0
Conclusion
Use pd.Grouper with frequency parameters to group time-series data by periods. The sum() method calculates totals for each group, and dropna() removes empty periods from results.
