Python Pandas - Comparing Categorical Data
Comparing categorical data is an essential task for getting insights and understanding the relationships between different categories of the data. In Python, Pandas provides various ways to perform comparisons using comparison operators (==, !=, >, >=, <, and <=) on categorical data. These comparisons can be made in three main scenarios −
Equality comparison (== and !=).
All comparisons (==, !=, >, >=, <, and <=).
Comparing categorical data to a scalar value.
Note that any non-equality comparison (>, >=, <, or <=) between a categorical Series and a list-like object such as a list, a NumPy array, or a non-categorical Series raises a TypeError. The same happens when comparing categorical data whose categories differ or, for ordered data, are listed in a different order. The reason is that a custom category ordering could be interpreted in two ways: one that takes the ordering into account and one that does not.
In this tutorial, we will learn how to compare categorical data in the Pandas library using the comparison operators ==, !=, >, >=, <, and <=.
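As a quick illustration of the TypeError rule described above, the following minimal sketch compares an ordered categorical Series with a plain list. The data and variable names are illustrative assumptions. Converting both sides with np.asarray is one way to compare the underlying values while ignoring the category order.

```python
import numpy as np
import pandas as pd

# Ordered categorical in which 3 < 2 < 1 (illustrative data)
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
other = [2, 2, 2]

# Equality against a same-length list is evaluated element by element
eq_result = (s == other).tolist()

# An ordering comparison against a plain list is rejected
try:
    s > other
    raised = False
except TypeError:
    raised = True

# Comparing the underlying values ignores the category order entirely
value_cmp = (np.asarray(s) > np.asarray(other)).tolist()

print("Equality with a list:", eq_result)
print("TypeError raised for '>':", raised)
print("Value-based '>':", value_cmp)
```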
Equality Comparisons of Categorical Data
In Pandas, comparing categorical data for equality is possible with a variety of objects such as lists, arrays, or Series objects of the same length as the categorical data.
Example
The following example demonstrates how to perform equality (==) and inequality (!=) comparisons between a categorical Series and list-like objects of the same length.
import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np
# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Equality comparison
print("Equality comparison (s == s2):")
print(s == s2)
print("\nInequality comparison (s != s2):")
print(s != s2)
# Equality comparison with a NumPy array
print("\nEquality comparison with NumPy array:")
print(s == np.array([1, 2, 3, 1, 2, 3, 2, 1]))
Following is the output of the above code −
Equality comparison (s == s2):
0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
dtype: bool

Inequality comparison (s != s2):
0     True
1    False
2     True
3    False
4     True
5    False
6     True
7    False
dtype: bool

Equality comparison with NumPy array:
0     True
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool
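The same-length requirement for equality comparisons can be checked with a short sketch like the one below. The data is illustrative, and the sketch assumes a mismatched length surfaces as a ValueError, which is how Pandas reports it when a Series is compared with a list of a different length.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

# Same length: compared position by position, even if a value is not a category
same_length = (s == np.array([1, 0, 3])).tolist()

# Different length: the comparison is rejected
try:
    s == [1, 2]
    length_error = False
except ValueError:
    length_error = True

print("Same-length comparison:", same_length)
print("ValueError on length mismatch:", length_error)
```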
All Comparisons of Categorical Data
Pandas allows you to perform the ordering comparisons (>, >=, <, and <=) between ordered categorical data that share the same categories and ordering.
Example
This example demonstrates how to perform non-equality comparisons (>, >=, <, and <=) on ordered categorical data. Because the categories are ordered as 3 < 2 < 1, the value 1 is the largest.
import pandas as pd
from pandas.api.types import CategoricalDtype
# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Creating another categorical Series for comparison
s2 = pd.Series([2, 2, 2, 1, 1, 3, 3, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Greater than comparison
print("Greater than comparison:\n", s > s2)
# Less than comparison
print("\nLess than comparison:\n", s < s2)
# Greater than or equal to comparison
print("\nGreater than or equal to comparison:\n", s >= s2)
# Less than or equal to comparison
print("\nLess than or equal to comparison:\n", s <= s2)
Following is the output of the above code −
Greater than comparison:
 0     True
1    False
2     True
3    False
4    False
5    False
6     True
7    False
dtype: bool

Less than comparison:
 0    False
1    False
2    False
3    False
4     True
5    False
6    False
7    False
dtype: bool

Greater than or equal to comparison:
 0     True
1     True
2     True
3     True
4    False
5     True
6     True
7     True
dtype: bool

Less than or equal to comparison:
 0    False
1     True
2    False
3     True
4     True
5     True
6    False
7     True
dtype: bool
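The ordering operators only apply to ordered categorical data. The following sketch, using illustrative string data, shows that an unordered categorical Series rejects > with a TypeError, and that converting it with the cat.as_ordered() accessor method makes the comparison possible. Here the categories are inferred from the data, so the resulting order is assumed to be the sorted one (a < b < c).

```python
import pandas as pd

# Unordered categorical; categories are inferred from the data
unordered = pd.Series(["b", "a", "c"], dtype="category")

try:
    unordered > unordered
    unordered_raised = False
except TypeError:
    unordered_raised = True

# Mark the existing categories as ordered: a < b < c
ordered = unordered.cat.as_ordered()
greater_than_a = (ordered > "a").tolist()

print("TypeError for unordered data:", unordered_raised)
print("ordered > 'a':", greater_than_a)
```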
Comparing Categorical Data to Scalars
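Categorical data can also be compared to scalar values. The equality operators (== and !=) work with any scalar, while the ordering operators (>, >=, <, and <=) require ordered categorical data and a scalar that is itself one of the categories. The categorical values are compared to the scalar based on the order of their categories, not on their numeric or lexical value.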
Example
The following example demonstrates how categorical data can be compared to a scalar value. Because the categories are ordered as 3 < 2 < 1, only the value 1 is greater than 2.
import pandas as pd
# Creating a categorical Series
s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
# Compare to a scalar
print("Comparing categorical data to a scalar:")
print(s > 2)
Following is the output of the above code −
Comparing categorical data to a scalar:
0     True
1    False
2    False
dtype: bool
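What happens when the scalar is not one of the categories can be sketched as follows. The value 5 is an arbitrary illustrative choice: equality checks simply report no match, while an ordering comparison has no position to compare against and raises a TypeError.

```python
import pandas as pd

s = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))

missing = 5  # not one of the categories [3, 2, 1]

# Equality and inequality still work with a scalar outside the categories
eq_result = (s == missing).tolist()
ne_result = (s != missing).tolist()

# An ordering comparison with a non-category scalar is rejected
try:
    s > missing
    raised = False
except TypeError:
    raised = True

print("s == 5:", eq_result)
print("s != 5:", ne_result)
print("TypeError for s > 5:", raised)
```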
Comparing Categorical Data with Different Categories
Comparing two categorical Series whose categories differ, or whose ordered categories are listed in a different order, raises a TypeError.
Example
The following example demonstrates how to handle the TypeError raised when comparing two categorical Series objects whose ordered categories are listed in different orders.
import pandas as pd
from pandas.api.types import CategoricalDtype
# Creating a categorical Series
s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
# Creating another categorical Series; its categories are inferred from the data as [1, 2, 3]
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))
try:
    print("Attempting to compare differently ordered two Series objects:")
    print(s > s3)
except TypeError as e:
    print("TypeError:", str(e))
Following is the output of the above code −
Attempting to compare differently ordered two Series objects:
TypeError: Categoricals can only be compared if 'categories' are the same.
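One possible way to avoid this error is to give both Series the same categorical dtype before comparing them. The sketch below reuses the data from the example above and casts the second Series with astype(s.dtype). This is an illustrative approach, not the only one, and the name s3_aligned is an assumption introduced here.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 1, 1, 2, 3, 1, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))
s3 = pd.Series([2, 2, 2, 1, 1, 3, 1, 2]).astype(CategoricalDtype(ordered=True))

# Recast s3 so that it shares the categories and ordering of s (3 < 2 < 1)
s3_aligned = s3.astype(s.dtype)

# The comparison now follows the shared category order
result = (s > s3_aligned).tolist()
print(result)
```

The values of s3 are unchanged by the cast; only the category order used for comparison is replaced.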