How does nested character class subtraction work in Python?


Nested Character Class Subtraction

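Because the full character class syntax is allowed inside the subtracted class, the subtracted class can itself contain a subtraction. [0-9-[0-7-[0-3]]] first subtracts 0-3 from 0-7, yielding [0-9-[4-7]], which is equivalent to [0-38-9] and matches any single character in the string 012389. Note that this subtraction syntax belongs to regex flavors such as .NET, XML Schema and XPath. Python's built-in re module does not support it, so in Python the same result has to be expressed another way, for example with a negative lookahead or an explicit class.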
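As a minimal sketch of how the nested subtraction above could be reproduced with Python's built-in re module, each "-[...]" can be emulated with a negative lookahead placed before the base class. The variable names (inner, nested, flat) are chosen here for illustration only, and the lookahead rewrite is a stand-in technique, not the subtraction syntax itself.

```python
import re

# Python's re has no class-subtraction operator, so "base-[sub]" is emulated
# as "(?!sub)base": refuse the position if the subtracted class would match.
inner = r"(?![0-3])[0-7]"        # stands in for [0-7-[0-3]], i.e. 4-7
nested = rf"(?!{inner})[0-9]"    # stands in for [0-9-[0-7-[0-3]]]
flat = r"[0-38-9]"               # the equivalent explicit class

digits = "0123456789"
nested_matches = "".join(re.findall(nested, digits))
flat_matches = "".join(re.findall(flat, digits))

print(nested_matches, flat_matches)  # prints: 012389 012389
```

Both patterns keep the same six digits, which is the set described in the paragraph above.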

The class subtraction must always be the last element in the character class. [0-9-[4-7]a-d] is not a valid regular expression. It should be rewritten as [0-9a-d-[4-7]]. The subtraction applies to the whole class, so here 4-7 is removed from the combined set 0-9 and a-d, leaving 0-3, 8-9 and a-d.
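The "whole class" behaviour can be sketched in Python in the same way, assuming the lookahead emulation of subtraction (Python's built-in re module has no subtraction operator). The sample string and variable names are illustrative.

```python
import re

# Emulates [0-9a-d-[4-7]]: the lookahead guards the entire base class,
# so 4-7 is removed no matter which part of the class would have matched.
pattern = re.compile(r"(?![4-7])[0-9a-d]")

kept = "".join(pattern.findall("0123456789abcdef"))
print(kept)  # prints: 012389abcd
```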

While we can use nested character class subtraction, we cannot subtract two classes sequentially. To subtract ASCII characters and Arabic characters from a class with all Unicode letters, combine the ASCII and Arabic characters into one class, and subtract that, as in [\p{L}-[\p{IsBasicLatin}\p{IsArabic}]].
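A rough Python approximation of this combined subtraction is sketched below. It rests on several assumptions: Python's built-in re module supports neither \p{...} nor class subtraction, so [^\W\d_] is used as an approximation of "any Unicode letter" (it can also admit a few non-decimal numeric characters), and the two blocks are written as explicit code point ranges (U+0000 to U+007F for Basic Latin, U+0600 to U+06FF for Arabic). The names LETTER and EXCLUDED are hypothetical.

```python
import re

LETTER = r"[^\W\d_]"                        # approximation of \p{L}
EXCLUDED = r"[\u0000-\u007F\u0600-\u06FF]"  # Basic Latin + Arabic blocks, combined in one class

# One lookahead over the combined class plays the role of the single subtraction.
pattern = re.compile(rf"(?!{EXCLUDED}){LETTER}")

# a (ASCII), e-acute (Latin-1 Supplement), Arabic beh, Cyrillic de
sample = "a\u00e9\u0628\u0434"
remaining = pattern.findall(sample)
print(remaining)
```

Only the letters outside both blocks remain, here the e-acute and the Cyrillic letter.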

Updated on: 30-Jul-2019
