How to use regular expressions (Regex) to filter valid emails in a Pandas series?

A regular expression (regex) is a sequence of characters that defines a search pattern. In this article, we use a regex to separate valid from invalid email addresses in a Pandas Series.

We will create a Pandas Series of email addresses and check each one against the pattern, first with Python's re module and then with Pandas string methods.

Email Validation Regex Pattern

The regex pattern used in this article, ^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$, has the following components, in order:

  • ^: Anchor for the start of the string
  • [a-z0-9]+: One or more lowercase letters or digits
  • [\._]?: Optional dot or underscore character
  • [a-z0-9]+: One or more lowercase letters or digits after the optional separator
  • [@]: Required @ symbol
  • \w+: One or more word characters (letters, digits, underscore) for the domain name
  • [.]: Literal dot character
  • \w{2,3}: 2 or 3 word characters for the domain extension
  • $: Anchor for the end of the string
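The component list above can be written out as a commented pattern. The following is a minimal sketch that uses re.VERBOSE so each piece sits next to its description; the helper name is_valid_email and the sample addresses are assumptions made for illustration and do not come from the article's data.

```python
import re

# The article's pattern, spread over several lines with re.VERBOSE so that
# whitespace and comments inside the pattern are ignored by the regex engine.
EMAIL_PATTERN = re.compile(
    r"""
    ^              # start of the string
    [a-z0-9]+      # one or more lowercase letters or digits
    [\._]?         # optional dot or underscore
    [a-z0-9]+      # one or more lowercase letters or digits
    [@]            # required @ symbol
    \w+            # domain name
    [.]            # literal dot
    \w{2,3}        # domain extension: 2 or 3 word characters
    $              # end of the string
    """,
    re.VERBOSE,
)


def is_valid_email(address):
    """Return True if the address matches the pattern (hypothetical helper)."""
    return EMAIL_PATTERN.search(address) is not None


print(is_valid_email("john.doe@example.com"))   # True
print(is_valid_email("john..doe@example.com"))  # False: two separators in a row
print(is_valid_email("johndoe@example"))        # False: no domain extension
```

Because the pattern requires two [a-z0-9]+ groups, the part before @ must contain at least two characters, so an address such as a@b.com is rejected.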

Example

Let's create a Pandas Series of email addresses and check each one against the regex:

import pandas as pd
import re

# Create a series with different email addresses
emails = pd.Series(['jimmyadams123@gmail.com', 'hellowolrd.com', 'user@domain.org', 'invalid.email', 'test123@yahoo.co.uk'])

# Define regex pattern for email validation
regex_pattern = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

print("Email Validation Results:")
print("-" * 40)
for email in emails:
    if re.search(regex_pattern, email):
        print(f"{email}: Valid Email")
    else:
        print(f"{email}: Invalid Email")

Output

Email Validation Results:
----------------------------------------
jimmyadams123@gmail.com: Valid Email
hellowolrd.com: Invalid Email
user@domain.org: Valid Email
invalid.email: Invalid Email
test123@yahoo.co.uk: Invalid Email
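
Note that test123@yahoo.co.uk is a well-formed address, yet the pattern rejects it: after the domain name, the pattern allows only a single dot followed by 2 or 3 characters, so .co.uk does not fit. The sketch below shows one possible looser pattern. It is an assumption made for illustration, not the article's pattern and not a complete validator for every legal address. It uses re.fullmatch(), which requires the whole string to match, so the ^ and $ anchors are not needed.

```python
import re

# Hypothetical, looser pattern (an illustrative assumption, not a full
# validator): allows dots, underscores, percent, plus and hyphen before the @,
# any number of dot-separated domain labels, and an extension of 2+ letters.
RELAXED_PATTERN = re.compile(
    r"[a-z0-9._%+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}",
    re.IGNORECASE,  # accept uppercase letters as well
)


def is_valid_relaxed(address):
    """Return True if the entire address matches the relaxed pattern."""
    return RELAXED_PATTERN.fullmatch(address) is not None


samples = [
    "jimmyadams123@gmail.com",
    "hellowolrd.com",
    "user@domain.org",
    "invalid.email",
    "test123@yahoo.co.uk",
]

for address in samples:
    label = "Valid Email" if is_valid_relaxed(address) else "Invalid Email"
    print(f"{address}: {label}")
```

With this pattern, test123@yahoo.co.uk is reported as valid, while the two strings without an @ symbol are still rejected.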

Using Pandas str.contains() Method

You can also filter emails directly using Pandas string methods, without an explicit loop:

import pandas as pd

emails = pd.Series(['jimmyadams123@gmail.com', 'hellowolrd.com', 'user@domain.org', 'invalid.email'])

# Filter valid emails using regex pattern
regex_pattern = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
valid_emails = emails[emails.str.contains(regex_pattern, regex=True, na=False)]

print("Valid emails:")
print(valid_emails)

Output

Valid emails:
0    jimmyadams123@gmail.com
2           user@domain.org
dtype: object
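
As an alternative, Series.str.fullmatch() (available in pandas 1.1 and later) checks that the entire string matches, so the pattern can be written without the ^ and $ anchors. The sketch below also includes a missing value to show the effect of na=False. The None entry is an assumption added for illustration and is not part of the article's data.

```python
import pandas as pd

# Sample data with one missing entry (added for illustration only)
emails = pd.Series(
    ["jimmyadams123@gmail.com", None, "user@domain.org", "invalid.email"]
)

# Same pattern as above, without ^ and $: fullmatch anchors both ends itself
pattern = r"[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}"

# na=False treats missing values as non-matching instead of propagating them
mask = emails.str.fullmatch(pattern, na=False)
valid_emails = emails[mask]

print(valid_emails)
```

Here only the entries at index 0 and 2 are kept. The missing value and invalid.email are both filtered out.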

Creating Boolean Mask

Generate a boolean Series that marks each email as valid or invalid:

import pandas as pd

emails = pd.Series(['jimmyadams123@gmail.com', 'hellowolrd.com', 'user@domain.org', 'invalid.email'])
regex_pattern = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

# Create boolean mask
is_valid = emails.str.contains(regex_pattern, regex=True, na=False)

# Create DataFrame with results
result_df = pd.DataFrame({
    'Email': emails,
    'Is_Valid': is_valid
})

print(result_df)

Output

                     Email  Is_Valid
0  jimmyadams123@gmail.com      True
1           hellowolrd.com     False
2          user@domain.org      True
3            invalid.email     False
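
Once the mask exists, it can be reused to split the Series and to count matches. This is a short sketch with the same data and pattern as above. The variable names valid_emails, invalid_emails, and valid_count are chosen here for illustration.

```python
import pandas as pd

emails = pd.Series(
    ["jimmyadams123@gmail.com", "hellowolrd.com", "user@domain.org", "invalid.email"]
)
regex_pattern = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"

# Boolean mask: True where the email matches the pattern
is_valid = emails.str.contains(regex_pattern, regex=True, na=False)

valid_emails = emails[is_valid]     # rows where the mask is True
invalid_emails = emails[~is_valid]  # ~ inverts the mask
valid_count = int(is_valid.sum())   # True counts as 1, False as 0

print("Valid:", valid_emails.tolist())
print("Invalid:", invalid_emails.tolist())
print("Number of valid emails:", valid_count)
```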

Conclusion

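Regular expressions offer a compact way to validate email addresses in a Pandas Series. Use str.contains() with an anchored pattern to filter the whole Series in one call, or loop with re.search() when each address needs additional custom logic. Keep in mind that the pattern shown here is deliberately simple: it rejects uppercase letters before the @ symbol and multi-part extensions such as .co.uk.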

Updated on: 2026-03-25T17:58:40+05:30
