How to check if Pandas column has value from list of string?
In pandas, we often need to check if values in a DataFrame column match any items from a given list of strings. This is useful for filtering data, data validation, or finding specific patterns in your dataset.
Pandas is a powerful Python library for data analysis and manipulation. It provides several methods to check if column values exist in a list of strings, with isin() being the most commonly used approach.
Creating Sample Data
Let's first create a sample DataFrame to demonstrate the different methods:
import pandas as pd
data = {
"Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
"Class": [10, 30, 20, 50],
"Subject": ["Python", "Java", "C", "Android"]
}
df = pd.DataFrame(data)
print(df)
Name Class Subject
0 Anil 10 Python
1 Srivatsav 30 Java
2 Prasad 20 C
3 Mitra 50 Android
Using pandas isin() Method
The isin() method is the most direct way to check whether column values exist in a list. It returns a Boolean Series that is True for each row whose value exactly matches (case-sensitively) one of the items in the list:
Syntax
df["column"].isin(values)
Example
import pandas as pd
data = {
"Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
"Class": [10, 30, 20, 50],
"Subject": ["Python", "Java", "C", "Android"]
}
df = pd.DataFrame(data)
search_list = ["Anil", "Prasad"]
# Check which names are in the search list
result = df["Name"].isin(search_list)
print("Boolean result:")
print(result)
print("\nFiltered rows:")
print(df[result])
Boolean result:
0 True
1 False
2 True
3 False
Name: Name, dtype: bool
Filtered rows:
Name Class Subject
0 Anil 10 Python
2 Prasad 20 C
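To answer the yes/no question of whether the column contains any of the listed strings at all, the Boolean Series can be reduced with any(). The following is a minimal sketch that reuses the sample data from above; the variable names has_any, match_count, and not_in_list are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
    "Class": [10, 30, 20, 50],
    "Subject": ["Python", "Java", "C", "Android"],
})
search_list = ["Anil", "Prasad"]

# Boolean Series: True where Name is one of the search strings
mask = df["Name"].isin(search_list)

# True if at least one value in the column appears in the list
has_any = bool(mask.any())

# Number of rows that matched
match_count = int(mask.sum())

# Rows whose Name is NOT in the list (~ inverts the Boolean mask)
not_in_list = df[~mask]

print(has_any)
print(match_count)
print(not_in_list["Name"].tolist())
```

Here has_any is True, match_count is 2, and the remaining names are Srivatsav and Mitra.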
Using NumPy isin() Function
NumPy also provides an isin() function. It works similarly to the pandas method but returns a plain NumPy Boolean array instead of a Series, which is convenient when the rest of your code already works with arrays:
Example
import pandas as pd
import numpy as np
data = {
"Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
"Class": [10, 30, 20, 50],
"Subject": ["Python", "Java", "C", "Android"]
}
df = pd.DataFrame(data)
search_names = ["Anil", "Srivatsav"]
# Using numpy isin()
result = np.isin(df["Name"], search_names)
print("NumPy isin() result:")
print(result)
NumPy isin() result:
[ True True False False]
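The array returned by np.isin() can be used as a row mask on the DataFrame, and the function's invert parameter flips the test. A short sketch under the same sample data (the names matched and excluded are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
    "Class": [10, 30, 20, 50],
    "Subject": ["Python", "Java", "C", "Android"],
})
search_names = ["Anil", "Srivatsav"]

# np.isin returns a NumPy Boolean array, not a pandas Series
mask = np.isin(df["Name"], search_names)

# A Boolean array of the same length as the DataFrame selects rows
matched = df[mask]

# invert=True marks the values that are NOT in the list
excluded = df[np.isin(df["Name"], search_names, invert=True)]

print(type(mask).__name__)
print(matched["Name"].tolist())
print(excluded["Name"].tolist())
```

Because the result carries no index, it lines up with the DataFrame by position only, so it should be applied to the same DataFrame it was computed from.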
Checking Multiple Columns
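You can check several columns against the same list by building one Boolean mask per column and combining the masks with | (either column matches) or & (both columns match):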
import pandas as pd
data = {
"Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
"Class": [10, 30, 20, 50],
"Subject": ["Python", "Java", "C", "Android"]
}
df = pd.DataFrame(data)
# Check if any value in Name or Subject columns matches the list
search_values = ["Anil", "Java", "C"]
name_match = df["Name"].isin(search_values)
subject_match = df["Subject"].isin(search_values)
print("Name matches:")
print(name_match)
print("\nSubject matches:")
print(subject_match)
print("\nRows with either match:")
print(df[name_match | subject_match])
Name matches:
0 True
1 False
2 False
3 False
Name: Name, dtype: bool
Subject matches:
0 False
1 True
2 True
3 False
Name: Subject, dtype: bool
Rows with either match:
Name Class Subject
0 Anil 10 Python
1 Srivatsav 30 Java
2 Prasad 20 C
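As an alternative to building one mask per column, DataFrame.isin() can test several columns in one call, and any(axis=1) then collapses the result to one Boolean per row. This is a sketch of that variant, not the only way to do it; cols and row_match are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anil", "Srivatsav", "Prasad", "Mitra"],
    "Class": [10, 30, 20, 50],
    "Subject": ["Python", "Java", "C", "Android"],
})
search_values = ["Anil", "Java", "C"]
cols = ["Name", "Subject"]

# DataFrame.isin returns a Boolean DataFrame with the same shape as df[cols]
cell_match = df[cols].isin(search_values)

# True for a row if any of the selected columns matched
row_match = cell_match.any(axis=1)

print(row_match.tolist())
print(df[row_match])
```

Replacing any(axis=1) with all(axis=1) would keep only rows where every selected column matches.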
Comparison of Methods
| Method | Performance | Best For |
|---|---|---|
| pandas.isin() | Fast | Single column checking |
| numpy.isin() | Very Fast | Complex array operations |
| Boolean indexing | Fast | Multiple conditions |
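Relative speed depends on the column's dtype, the data size, and the library versions, so it is worth measuring on your own data. The sketch below assumes a synthetic column of 20,000 strings (vocab, s, and the sizes are made up for illustration) and only shows how such a comparison could be set up; the timings it prints will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 20,000 strings drawn from a 1,000-word vocabulary (assumed sizes)
rng = np.random.default_rng(0)
vocab = np.array([f"name_{i}" for i in range(1000)])
s = pd.Series(rng.choice(vocab, size=20_000))
search_list = vocab[:50].tolist()

# Both approaches should flag exactly the same rows
pandas_mask = s.isin(search_list)
numpy_mask = np.isin(s.to_numpy(), search_list)
same = bool((pandas_mask.to_numpy() == numpy_mask).all())

# Time a few repetitions of each call
t_pandas = timeit.timeit(lambda: s.isin(search_list), number=3)
t_numpy = timeit.timeit(lambda: np.isin(s.to_numpy(), search_list), number=3)

print("Same result:", same)
print(f"pandas isin: {t_pandas:.4f} s")
print(f"numpy isin:  {t_numpy:.4f} s")
```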
Conclusion
Use pandas.isin() for simple column value checking against a list. For more complex array operations or large datasets, consider numpy.isin(), and benchmark on your own data, since relative speed depends on dtype and size. pandas.isin() returns a Boolean Series and numpy.isin() returns a Boolean array; either can be used as a mask to filter a DataFrame.
