Comparing two Pandas series and printing the difference
In this article, we will compare two Pandas series and print the differences between them. By difference, we mean the index positions where elements did not match, along with the actual values from both series.
What is Series Comparison?
Pandas provides the Series.compare() method (added in pandas 1.1.0) to identify differences between two series. It returns a DataFrame containing only the positions where the values differ, with one column per series. Both series must have identical index labels.
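To make the idea concrete, the same result can be approximated by hand with a boolean mask. This is only a sketch of the concept, not the library's actual implementation; the names a, b, mask and manual are illustrative.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4])
b = pd.Series([1, 5, 3, 8])

# Positions where the values differ; two NaNs in the same position count as equal.
mask = a.ne(b) & ~(a.isna() & b.isna())

# Build the same two-column layout that compare() uses.
manual = pd.DataFrame({"self": a[mask], "other": b[mask]})
print(manual)

# The built-in method reports the same index labels.
builtin = a.compare(b)
print(builtin.index.equals(manual.index))
```

The hand-built frame keeps the original integer dtype, while compare() returns floats by default.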
Basic Example
Let's start with a simple comparison between two series:
import pandas as pd
s1 = pd.Series([10, 20, 30, 40, 50, 60])
s2 = pd.Series([10, 30, 30, 40, 55, 60])
print("S1:")
print(s1)
print("\nS2:")
print(s2)
difference = s1.compare(s2)
print("\nDifference between the series:")
print(difference)
S1:
0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64

S2:
0    10
1    30
2    30
3    40
4    55
5    60
dtype: int64

Difference between the series:
   self  other
1  20.0   30.0
4  50.0   55.0
Understanding the Output
The comparison output shows a DataFrame with two columns:
- self − Values from the first series (s1)
- other − Values from the second series (s2)
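Only rows where values differ are displayed. In our example, differences occur at index 1 (20 vs 30) and index 4 (50 vs 55). The values appear as floats (20.0 rather than 20) because compare() masks the matching positions with NaN before dropping them, which converts integer data to float64.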
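The index of the returned DataFrame gives the mismatching positions directly. A minimal sketch of printing one line per difference, reusing the two series from the basic example; the names positions and lines are illustrative.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50, 60])
s2 = pd.Series([10, 30, 30, 40, 55, 60])

difference = s1.compare(s2)

# Index labels where the two series disagree.
positions = difference.index.tolist()
print("Differing positions:", positions)

# One line per mismatch, reading the values back from the original series
# so that the integers are not shown as floats.
lines = [f"index {i}: s1={s1.loc[i]}, s2={s2.loc[i]}" for i in positions]
print("\n".join(lines))
```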
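Comparing Series with Different Lengths
compare() requires both series to have identical index labels; calling it directly on series of different lengths raises ValueError ("Can only compare identically-labeled Series objects"). Reindex the shorter series to the longer one's index first, so that the missing positions become NaN:
import pandas as pd
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([1, 2, 5])
# Give s2 the same index as s1; the missing position 3 becomes NaN
s2_aligned = s2.reindex(s1.index)
difference = s1.compare(s2_aligned)
print("Difference between series of different lengths:")
print(difference)
Difference between series of different lengths:
   self  other
2   3.0    5.0
3   4.0    NaN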
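Series with different labels can also be aligned explicitly before the comparison. A minimal sketch using Series.align, assuming an outer join on the index is what is wanted; the names left and right are illustrative.

```python
import pandas as pd

left = pd.Series([1, 2, 3, 4])
right = pd.Series([1, 2, 5])

# Align both series on the union of their labels; missing entries become NaN.
left_aligned, right_aligned = left.align(right, join="outer")

# Both series now share the same index, so compare() accepts them.
difference = left_aligned.compare(right_aligned)
print(difference)
```

An outer join keeps labels that exist in only one of the series, so a value with no counterpart is reported as a difference against NaN.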
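Using keep_shape and keep_equal Parameters
By default, compare() drops matching rows. To see both matching and differing values, pass keep_shape=True to retain every row and keep_equal=True to show the equal values instead of NaN. With keep_equal=True alone, a series comparison still returns only the differing rows:
import pandas as pd
s1 = pd.Series([10, 20, 30])
s2 = pd.Series([10, 25, 30])
difference = s1.compare(s2, keep_shape=True, keep_equal=True)
print("Comparison with equal values shown:")
print(difference)
Comparison with equal values shown:
   self  other
0    10     10
1    20     25
2    30     30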
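The keep_shape parameter interacts with keep_equal. A short sketch of two combinations on the same data, assuming pandas 1.1 or later; the names same_shape and with_values are illustrative.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30])
s2 = pd.Series([10, 25, 30])

# keep_shape=True keeps every row; equal positions are shown as NaN.
same_shape = s1.compare(s2, keep_shape=True)
print(same_shape)

# Adding keep_equal=True fills those positions with the actual values.
with_values = s1.compare(s2, keep_shape=True, keep_equal=True)
print(with_values)
```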
Conclusion
The compare() method is a concise way to identify differences between two Pandas series that share the same index labels. By default it returns a DataFrame containing only the differing positions; the keep_shape and keep_equal parameters control whether matching rows and their values are included.
