Python Pandas - Draw a Bar Plot and use median as the estimate of central tendency
A bar plot in Seaborn shows a point estimate for each category as a rectangular bar, with the confidence interval around that estimate drawn as an error bar. By default, seaborn.barplot() uses the mean as the estimate; pass the estimator parameter to use the median as the measure of central tendency instead.
Required Libraries
Import the necessary libraries for creating bar plots with median estimation:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
Creating Sample Data
Let's create sample cricket data to demonstrate median estimation in bar plots:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample cricket data
data = {
    'Academy': ['Mumbai', 'Delhi', 'Chennai', 'Mumbai', 'Delhi',
                'Chennai', 'Mumbai', 'Delhi', 'Chennai'],
    'Matches': [45, 32, 28, 52, 38, 35, 41, 29, 33]
}
df = pd.DataFrame(data)
print("Sample Cricket Data:")
print(df)
Sample Cricket Data:
   Academy  Matches
0   Mumbai       45
1    Delhi       32
2  Chennai       28
3   Mumbai       52
4    Delhi       38
5  Chennai       35
6   Mumbai       41
7    Delhi       29
8  Chennai       33
Bar Plot with Default Mean Estimator
First, let's see the default bar plot, which uses the mean as the estimator:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample data
data = {
    'Academy': ['Mumbai', 'Delhi', 'Chennai', 'Mumbai', 'Delhi',
                'Chennai', 'Mumbai', 'Delhi', 'Chennai'],
    'Matches': [45, 32, 28, 52, 38, 35, 41, 29, 33]
}
df = pd.DataFrame(data)
# Bar plot with default mean estimator
plt.figure(figsize=(8, 5))
sns.barplot(data=df, x='Academy', y='Matches')
plt.title('Bar Plot with Mean Estimator (Default)')
plt.ylabel('Average Matches')
plt.show()
Bar Plot with Median Estimator
Now let's create a bar plot using the median as the central tendency estimator:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample data
data = {
    'Academy': ['Mumbai', 'Delhi', 'Chennai', 'Mumbai', 'Delhi',
                'Chennai', 'Mumbai', 'Delhi', 'Chennai'],
    'Matches': [45, 32, 28, 52, 38, 35, 41, 29, 33]
}
df = pd.DataFrame(data)
# Bar plot with median estimator
plt.figure(figsize=(8, 5))
sns.barplot(data=df, x='Academy', y='Matches', estimator=np.median)
plt.title('Bar Plot with Median Estimator')
plt.ylabel('Median Matches')
plt.show()
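To check that the bars really show medians, one option is to read the bar heights back from the Axes object that seaborn.barplot() returns and compare them with the medians computed by pandas. The following is a minimal sketch, not the only way to do this: the Agg backend is selected only as an assumption so the code runs without a display, and the names bar_heights and expected are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to use your normal backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Same sample data as in the examples above
data = {
    'Academy': ['Mumbai', 'Delhi', 'Chennai', 'Mumbai', 'Delhi',
                'Chennai', 'Mumbai', 'Delhi', 'Chennai'],
    'Matches': [45, 32, 28, 52, 38, 35, 41, 29, 33]
}
df = pd.DataFrame(data)

# Draw the median bar plot on an explicit Axes so the bars can be inspected
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=df, x='Academy', y='Matches', estimator=np.median, ax=ax)

# Each bar is a rectangle patch on the Axes; its height is the plotted estimate
bar_heights = [float(p.get_height()) for p in ax.patches]

# Medians computed directly with pandas, keeping the order of first appearance
expected = df.groupby('Academy', sort=False)['Matches'].median().tolist()

print("Bar heights:   ", bar_heights)
print("Pandas medians:", expected)
```

If the two lists contain the same values, the plot is using the median. The confidence intervals are bootstrapped and drawn as separate line artists, so they do not affect the bar heights read here.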
Comparison of Mean vs Median
Let's compare the actual values to understand the difference:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create sample data
data = {
    'Academy': ['Mumbai', 'Delhi', 'Chennai', 'Mumbai', 'Delhi',
                'Chennai', 'Mumbai', 'Delhi', 'Chennai'],
    'Matches': [45, 32, 28, 52, 38, 35, 41, 29, 33]
}
df = pd.DataFrame(data)
# Calculate mean and median for each academy
summary = df.groupby('Academy')['Matches'].agg(['mean', 'median']).round(2)
print("Mean vs Median Comparison:")
print(summary)
Mean vs Median Comparison:
         mean  median
Academy
Chennai  32.0    33.0
Delhi    33.0    32.0
Mumbai   46.0    45.0
Key Parameters
| Parameter | Description | Example |
|---|---|---|
| estimator | Function to estimate central tendency | np.median, np.mean |
| errorbar (replaces ci, which is deprecated since seaborn 0.12) | Error bar type and size | ('ci', 95) (default), None |
| orient | Plot orientation | 'v' (vertical), 'h' (horizontal) |
When to Use Median vs Mean
Use the median when your data is skewed or contains outliers: a single extreme value can pull the mean far from the typical value, while the median changes little. Use the mean for roughly symmetric data without significant outliers.
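A small worked example makes the robustness point concrete. The sketch below takes the three Mumbai values from the sample data and appends one hypothetical outlier (200, which is not part of the original data) to show how far each statistic moves.

```python
import numpy as np

# Mumbai values from the sample data
mumbai = np.array([45, 52, 41])

# Hypothetical outlier added for illustration only
with_outlier = np.append(mumbai, 200)

mean_before = np.mean(mumbai)
median_before = np.median(mumbai)
mean_after = np.mean(with_outlier)
median_after = np.median(with_outlier)

print(f"Mean:   {mean_before} -> {mean_after}")
print(f"Median: {median_before} -> {median_after}")
# Mean:   46.0 -> 84.5
# Median: 45.0 -> 48.5
```

One extreme value nearly doubles the mean, while the median shifts by only 3.5, which is why a median bar plot gives a steadier picture of the typical value for skewed data.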
Conclusion
The estimator parameter in seaborn.barplot() allows you to use median instead of mean for central tendency. This is particularly useful when dealing with skewed data or outliers that might distort the mean.
