Python Pandas - Draw swarms of observations on top of a violin plot with Seaborn

A violin plot shows the distribution of data across categories, while a swarm plot draws every individual data point, nudging positions so that points do not overlap. Layering a swarm plot on top of a violin plot shows both the shape of each distribution and the observations behind it.

Creating Sample Data

Let's create sample cricket data to demonstrate this visualization:

import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample cricket data
np.random.seed(42)
data = {
    'Role': ['Batsman'] * 20 + ['Bowler'] * 20 + ['All-rounder'] * 15,
    'Matches': (
        list(np.random.normal(45, 12, 20)) +  # Batsmen
        list(np.random.normal(38, 8, 20)) +   # Bowlers  
        list(np.random.normal(42, 10, 15))    # All-rounders
    )
}

# Truncate to integers and clamp to a minimum of 10 matches
data['Matches'] = [max(10, int(x)) for x in data['Matches']]
df = pd.DataFrame(data)

print(df.head(10))

Running this prints the first ten rows:

      Role  Matches
0  Batsman       50
1  Batsman       43
2  Batsman       52
3  Batsman       63
4  Batsman       42
5  Batsman       42
6  Batsman       63
7  Batsman       54
8  Batsman       39
9  Batsman       51

Creating Violin Plot with Swarm Overlay

Combine violin and swarm plots to show distribution and individual points:

import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
data = {
    'Role': ['Batsman'] * 20 + ['Bowler'] * 20 + ['All-rounder'] * 15,
    'Matches': (
        list(np.random.normal(45, 12, 20)) +
        list(np.random.normal(38, 8, 20)) +
        list(np.random.normal(42, 10, 15))
    )
}
data['Matches'] = [max(10, int(x)) for x in data['Matches']]
df = pd.DataFrame(data)

# Set theme and create the plot
sb.set_theme(style="whitegrid")
plt.figure(figsize=(10, 6))

# Draw violin plot first (background)
sb.violinplot(x="Role", y="Matches", data=df, inner=None, alpha=0.7)

# Draw swarm plot on top
sb.swarmplot(x="Role", y="Matches", data=df, color="white", size=5, edgecolor="black", linewidth=0.5)

plt.title("Cricket Matches by Player Role")
plt.xlabel("Player Role")
plt.ylabel("Number of Matches")
plt.show()

Customizing the Visualization

Enhance the plot with colors and styling options:

import seaborn as sb
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create sample data
np.random.seed(42)
data = {
    'Role': ['Batsman'] * 20 + ['Bowler'] * 20 + ['All-rounder'] * 15,
    'Matches': (
        list(np.random.normal(45, 12, 20)) +
        list(np.random.normal(38, 8, 20)) +
        list(np.random.normal(42, 10, 15))
    )
}
data['Matches'] = [max(10, int(x)) for x in data['Matches']]
df = pd.DataFrame(data)

# Create enhanced visualization
plt.figure(figsize=(12, 7))

# Custom color palette
colors = ["lightblue", "lightgreen", "lightcoral"]

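# Draw violin plot with custom colors
# (seaborn 0.13+ warns when palette is passed without hue;
#  on those versions, add hue="Role", legend=False)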
sb.violinplot(x="Role", y="Matches", data=df, 
              palette=colors, inner=None, alpha=0.6)

# Draw swarm plot with dark points
sb.swarmplot(x="Role", y="Matches", data=df, 
             color="black", size=4, alpha=0.8)

plt.title("Distribution of Matches Played by Cricket Player Roles", 
          fontsize=14, fontweight='bold')
plt.xlabel("Player Role", fontsize=12)
plt.ylabel("Number of Matches", fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()

Key Features

Component      Purpose                        Key parameters / notes
violinplot()   Shows distribution shape       x, y, data, palette, inner
swarmplot()    Shows individual points        x, y, data, color, size
Overlay        Combines both visualizations   Plot violin first, then swarm
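
The overlay step (violin first, then swarm) can also be written against an explicit Axes object. The following is a minimal sketch rather than the article's own method: it assumes the same sample data as the earlier examples, and the `order` list, the `Agg` backend line, and the output filename `violin_swarm.png` are illustrative choices that do not appear in the original.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sb

# Rebuild the same sample data used in the examples above
np.random.seed(42)
data = {
    "Role": ["Batsman"] * 20 + ["Bowler"] * 20 + ["All-rounder"] * 15,
    "Matches": (
        list(np.random.normal(45, 12, 20))
        + list(np.random.normal(38, 8, 20))
        + list(np.random.normal(42, 10, 15))
    ),
}
data["Matches"] = [max(10, int(x)) for x in data["Matches"]]
df = pd.DataFrame(data)

# Passing the same order and the same Axes to both calls keeps
# each swarm centered on its matching violin
order = ["Batsman", "Bowler", "All-rounder"]
fig, ax = plt.subplots(figsize=(10, 6))

# Background layer: distribution shape only (no inner box or sticks)
sb.violinplot(x="Role", y="Matches", data=df, order=order,
              inner=None, color="lightblue", ax=ax)

# Foreground layer: one marker per observation
sb.swarmplot(x="Role", y="Matches", data=df, order=order,
             color="black", size=4, ax=ax)

ax.set_xlabel("Player Role")
ax.set_ylabel("Number of Matches")
fig.savefig("violin_swarm.png")  # hypothetical output filename
```

For interactive use, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving. Because both layers share one Axes, drawing the violin first is what keeps the points visible on top.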

Benefits of Combined Plot

This combination provides several advantages:

  • Distribution Shape: Violin plot shows data density and spread
  • Individual Points: Swarm plot reveals actual data points
  • Outlier Detection: Easy to spot unusual values
  • Sample Size: Number of points indicates group size
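
The sample-size and outlier points above can be cross-checked numerically. This is a rough sketch under stated assumptions: it rebuilds the same sample data, and the 1.5 × IQR fence is a conventional rule chosen here for illustration, not something the plot itself computes. The column name `Outlier` is likewise hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the same sample data used in the examples above
np.random.seed(42)
data = {
    "Role": ["Batsman"] * 20 + ["Bowler"] * 20 + ["All-rounder"] * 15,
    "Matches": (
        list(np.random.normal(45, 12, 20))
        + list(np.random.normal(38, 8, 20))
        + list(np.random.normal(42, 10, 15))
    ),
}
data["Matches"] = [max(10, int(x)) for x in data["Matches"]]
df = pd.DataFrame(data)

matches_by_role = df.groupby("Role")["Matches"]

# Group size and spread: the numbers behind each violin and swarm
summary = matches_by_role.agg(["count", "median", "min", "max"])
print(summary)

# Per-role quartiles, mapped back onto each row by its Role
q1 = df["Role"].map(matches_by_role.quantile(0.25))
q3 = df["Role"].map(matches_by_role.quantile(0.75))
iqr = q3 - q1

# Flag rows outside the 1.5 * IQR fences (assumed rule, see lead-in)
df["Outlier"] = (df["Matches"] < q1 - 1.5 * iqr) | (df["Matches"] > q3 + 1.5 * iqr)
print(df[df["Outlier"]])
```

The `count` column should match the number of markers in each swarm, and any flagged rows correspond to points sitting far out in the tails of a violin.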

Conclusion

Combining violin and swarm plots shows both distribution patterns and individual observations in a single figure. Draw violinplot() first as the background distribution, then swarmplot() on the same axes for the individual points.

Updated on: 2026-03-26T13:35:48+05:30
