Python regex to find sequences of one upper case letter followed by lower case letters
When processing text, you often need to find sequences of one uppercase letter followed by lowercase letters. Python's built-in re module provides an efficient way to match such patterns using regular expressions.
Understanding the Pattern
The regex pattern [A-Z][a-z]+ matches one uppercase letter followed by one or more lowercase letters. Its parts, together with the optional $ anchor used when checking a whole string, are:
- [A-Z] matches exactly one uppercase ASCII letter
- [a-z]+ matches one or more lowercase ASCII letters
- $ requires the match to end at the end of the string
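The $ anchor only constrains where a match ends, not where it begins. The sketch below illustrates this by running the anchored pattern through search(), match(), and fullmatch(). The sample string and variable names are made up for this illustration.

```python
import re

anchored = r'[A-Z][a-z]+$'   # pattern with the end-of-string anchor
sample = "PyThOn"            # hypothetical mixed-case input

# search() scans forward, so the anchored pattern can still match a suffix
suffix_hit = re.search(anchored, sample)
suffix_text = suffix_hit.group() if suffix_hit else None

# match() only tries position 0, so together with $ the whole string must fit
from_start = re.match(anchored, sample)

# fullmatch() requires the whole string to match and needs no anchor
whole = re.fullmatch(r'[A-Z][a-z]+', sample)

print(suffix_text)  # On
print(from_start)   # None
print(whole)        # None
```

If the goal is to validate an entire string, match() with $ or fullmatch() is usually the safer choice than search().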
Basic Pattern Matching
Here's a simple function to check whether an entire string follows the pattern:
import re

def match_string(my_string):
    # match() anchors at the start and $ at the end, so the whole string must fit
    pattern = r'[A-Z][a-z]+$'
    if re.match(pattern, my_string):
        return 'The string meets the required condition'
    else:
        return 'The string does not meet the required condition'

# Test with different strings
test_strings = ["Python", "python", "PYTHON", "PyThOn", "P"]
for string in test_strings:
    print(f"String: {string}")
    print(match_string(string))
    print()
String: Python
The string meets the required condition

String: python
The string does not meet the required condition

String: PYTHON
The string does not meet the required condition

String: PyThOn
The string does not meet the required condition

String: P
The string does not meet the required condition
Finding All Matches in Text
To find all sequences within a longer text, use the findall() method:
import re
text = "Hello World! Python Programming Language Java Script"
pattern = '[A-Z][a-z]+'
matches = re.findall(pattern, text)
print("Found sequences:", matches)
# Find matches with their positions
for match in re.finditer(pattern, text):
    print(f"Match: '{match.group()}' at position {match.start()}-{match.end()}")
Found sequences: ['Hello', 'World', 'Python', 'Programming', 'Language', 'Java', 'Script']
Match: 'Hello' at position 0-5
Match: 'World' at position 6-11
Match: 'Python' at position 13-19
Match: 'Programming' at position 20-31
Match: 'Language' at position 32-40
Match: 'Java' at position 41-45
Match: 'Script' at position 46-52
Comparison of Regex Methods
| Method | Returns | Use Case |
|---|---|---|
| search() | First match object or None | Check if the pattern occurs anywhere |
| findall() | List of all matches | Get all matching strings |
| finditer() | Iterator of match objects | Get matches with positions |
| match() | Match object if the pattern matches at the beginning, else None | Check the start of a string; add $ or use fullmatch() to validate the entire string |
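To make the differences in the table concrete, here is a small sketch that runs all four functions against the same input. The sample text is invented for this illustration.

```python
import re

pattern = r'[A-Z][a-z]+'
text = "say Hello to World"   # hypothetical sample text

first = re.search(pattern, text)        # first match anywhere in the text
all_words = re.findall(pattern, text)   # list of matched strings
spans = [(m.group(), m.span()) for m in re.finditer(pattern, text)]
at_start = re.match(pattern, text)      # None: the text starts with lowercase 's'

print(first.group())   # Hello
print(all_words)       # ['Hello', 'World']
print(spans)           # [('Hello', (4, 9)), ('World', (13, 18))]
print(at_start)        # None
```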
Practical Example
Extract capitalized words, a rough stand-in for proper nouns, from a sentence:
import re
sentence = "Alice and Bob visited New York and Paris last Summer."
pattern = '[A-Z][a-z]+'
proper_nouns = re.findall(pattern, sentence)
print("Proper nouns found:", proper_nouns)
# Count occurrences
print(f"Total proper nouns: {len(proper_nouns)}")
Proper nouns found: ['Alice', 'Bob', 'New', 'York', 'Paris', 'Summer']
Total proper nouns: 6
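Because the pattern has no word boundaries, it also matches fragments inside longer tokens such as camelCase identifiers. One possible refinement, assuming only whole words are wanted, is to wrap the pattern in \b anchors. The sample text below is hypothetical.

```python
import re

# Hypothetical text containing mixed-case tokens
text = "JavaScript and Python use camelCase names like myValue."

# Without boundaries, pieces of longer tokens are matched too
loose = re.findall(r'[A-Z][a-z]+', text)

# With \b on both sides, only whole capitalized words are matched
strict = re.findall(r'\b[A-Z][a-z]+\b', text)

print(loose)   # ['Java', 'Script', 'Python', 'Case', 'Value']
print(strict)  # ['Python']
```

Whether the loose or the strict form is appropriate depends on the data: the strict form skips tokens like JavaScript entirely.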
Conclusion
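Use the regex pattern [A-Z][a-z]+ to find sequences of one uppercase letter followed by lowercase letters. The search() method checks whether the pattern occurs anywhere, while findall() extracts all matches from text. To validate an entire string, combine match() with the $ anchor or use fullmatch().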
