What are regular expression repetition cases in Python?


We can build regular expressions that recognise repeated characters or groups of characters by using a few special characters. The following metacharacters can be used to search for a repeated character or a repeated set of characters.

The question mark is the first repetition operator, or quantifier. It makes the preceding token optional by instructing the engine to try matching it zero times or once.

The asterisk, or star, instructs the engine to try matching the preceding token zero or more times. The plus instructs the engine to try matching the preceding token one or more times. The regex <[A-Za-z][A-Za-z0-9]*> matches an HTML tag without any attributes. The angle brackets are literals. The first character class matches a letter. The second character class matches a letter or a digit. The star repeats the second character class. Because we used the star, it is acceptable if the second character class matches nothing. Therefore, our regex will match a single-letter tag such as <B>.

When matching <HTML>, the first character class matches H. Thanks to the star, the second character class is repeated three times, matching T, M, and L in turn. I could also have used <[A-Za-z0-9]+>. I refrained from doing so because that regex would also match <1>, which is not a valid HTML tag. However, if you are certain that the text you are searching does not contain such invalid tags, that regex may be adequate.
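As a quick check of the tag patterns discussed above, the following sketch compares the star version with the looser plus version using Python's re module. The sample strings and the variable names are illustrative assumptions, not part of the original example.

```python
import re

# A letter first, then zero or more letters or digits.
tag_pattern = re.compile(r"<[A-Za-z][A-Za-z0-9]*>")
# Looser alternative: one or more letters or digits, in any order.
loose_pattern = re.compile(r"<[A-Za-z0-9]+>")

samples = ["<B>", "<HTML>", "<H1>", "<1>"]  # hypothetical sample tags

for text in samples:
    strict = bool(tag_pattern.fullmatch(text))
    loose = bool(loose_pattern.fullmatch(text))
    print(f"{text!r}: strict={strict}, loose={loose}")
```

Only the looser pattern accepts <1>, which is why the stricter pattern with a leading letter class is the safer choice when invalid tags may be present.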

Limiting Repetition

You can also specify how many times a token may be repeated using an additional quantifier. The syntax is {min,max}, where min is the minimum number of matches (zero or a positive integer) and max is the maximum number of matches (an integer equal to or greater than min). If the comma is present but max is omitted, the maximum number of matches is infinite. As a result, {0,1} is equivalent to ?, {0,} is equivalent to *, and {1,} is equivalent to +. If both the comma and max are omitted, the engine repeats the token exactly min times.
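The equivalences between the shorthand quantifiers and the brace syntax can be checked with a short sketch. The sample text "aaa" and the variable names are assumptions chosen for illustration.

```python
import re

text = "aaa"  # hypothetical sample input

# Each pair should behave identically: shorthand vs. explicit braces.
pairs = [
    (r"a?", r"a{0,1}"),
    (r"a*", r"a{0,}"),
    (r"a+", r"a{1,}"),
]

for shorthand, braces in pairs:
    short_match = re.match(shorthand, text).group()
    brace_match = re.match(braces, text).group()
    print(f"{shorthand!r} -> {short_match!r}, {braces!r} -> {brace_match!r}")

# With no comma, the token is repeated exactly that many times.
exact = re.fullmatch(r"a{3}", text)
print(bool(exact))
```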

To match a number between 1000 and 9999, use the expression \b[1-9][0-9]{3}\b. The expression \b[1-9][0-9]{2,4}\b matches a number between 100 and 99999. Note the use of word boundaries (\b).
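The two number-range patterns can be tried out as follows. This is a minimal sketch that assumes the intended patterns are \b[1-9][0-9]{3}\b and \b[1-9][0-9]{2,4}\b; the sample text and variable names are made up for illustration.

```python
import re

four_digits = re.compile(r"\b[1-9][0-9]{3}\b")      # 1000 to 9999
three_to_five = re.compile(r"\b[1-9][0-9]{2,4}\b")  # 100 to 99999

text = "42 100 999 1000 9999 12345 123456"  # hypothetical sample input

print(four_digits.findall(text))    # ['1000', '9999']
print(three_to_five.findall(text))  # ['100', '999', '1000', '9999', '12345']
```

Without the word boundaries, the first pattern would also match four digits inside a longer number such as 123456.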

Metacharacters Used in Repetition

Character: ?
Meaning: Zero or one of the preceding character. Note the "zero" part, because it can trip you up if you aren't careful.
Example: pythonl?y matches pythony and pythonly.

Character: *
Meaning: Zero or more of the preceding character.
Example: pythonl*y matches both of the above plus pythonlly, pythonllly, and so on.

Character: +
Meaning: One or more of the preceding character.
Example: pythonl+y matches pythonly, pythonlly, pythonllly, and so on.

Character: {n,m}
Meaning: Between n and m repetitions of the preceding character.
Example: fo{1,2} matches fo or foo.
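The examples from the table above can be verified with a small sketch. It uses re.fullmatch so that each word must match the pattern in its entirety; the word lists and variable names are illustrative assumptions.

```python
import re

words = ["pythony", "pythonly", "pythonlly", "pythonllly"]
patterns = [r"pythonl?y", r"pythonl*y", r"pythonl+y"]

# Record which words each pattern accepts in full.
accepted = {}
for pattern in patterns:
    accepted[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(pattern, "->", accepted[pattern])

# fo{1,2}: the quantifier applies only to the "o", not to "fo".
fo_matches = [w for w in ["f", "fo", "foo", "fooo"] if re.fullmatch(r"fo{1,2}", w)]
print("fo{1,2} ->", fo_matches)
```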

All of these repetition characters can also be applied to groups of characters. For example:

>>> import re  # Python's regular expression module, imported implicitly in all examples
>>> re.match(r'(.an){1,2}s', 'cans')
<re.Match object; span=(0, 4), match='cans'>

The same pattern will also match 'cancans' or 'pans' or 'canpans', but not 'bananas'.
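The claims about the grouped pattern can be checked directly. In this sketch the candidate list and the results dictionary are assumptions added for illustration; note that re.match only tries to match at the start of the string.

```python
import re

pattern = re.compile(r"(.an){1,2}s")

candidates = ["cans", "cancans", "pans", "canpans", "bananas"]

# Map each word to the matched text, or None when there is no match.
results = {}
for word in candidates:
    m = pattern.match(word)
    results[word] = m.group() if m else None
    print(f"{word!r}: {results[word]!r}")

# A repeated group keeps only its last repetition.
last_repetition = pattern.match("canpans").group(1)
print(last_repetition)
```

For 'bananas', the group matches "ban" once, but the next three characters are "ana" rather than a character followed by "an", and the following character is not "s", so the match fails.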

Example

The following code uses the + quantifier to find a run of repeated letters in a string.

# import the re module
import re

# store a string
s = "sheeeeeeeeple"
print("Given String -", s)

# search for 'h' followed by one or more 'e'
match = re.search(r"he+", s)

# print the matched text
print("Matched String -", match.group())

Output

Given String - sheeeeeeeeple
Matched String - heeeeeeee

Code Explanation

First, we import the re module. Then we store the string sheeeeeeeeple in a variable s. Using the search method with the pattern he+, we look for an “h” followed by one or more repetitions of “e” in the string s and store the result in a variable match. Then, using the group method, we print the matched text.

Example 2

The following code applies the same approach to a different string, this time searching for a repeated “l”.

# import the re module
import re

# store a string
s = "tutoriallllllllsPoint"
print("Given String -", s)

# search for 'a' followed by one or more 'l'
match = re.search(r"al+", s)

# print the matched text
print("Matched String -", match.group())

Output

Given String - tutoriallllllllsPoint
Matched String - allllllll

Code Explanation

First, we import the re module. Then we store the string tutoriallllllllsPoint in a variable s. Using the search method with the pattern al+, we look for an “a” followed by one or more repetitions of “l” in the string s and store the result in a variable match. Then, using the group method, we print the matched text.

Conclusion

A repetition is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.

An expression followed by ‘?’ may be repeated zero times or once. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator {} may be used. Thus, a{2} is the letter ‘a’ repeated exactly twice, a{2,4} represents the letter ‘a’ repeated between 2 and 4 times, and a{2,} represents the letter ‘a’ repeated at least twice with no upper limit. Note that there must be no whitespace inside the {}, and that in practice the lower and upper bounds may be arbitrarily large. All repeat expressions refer to the shortest possible previous sub-expression: a single character, a character set, or a sub-expression grouped with ().
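The bounds operator and the "shortest previous sub-expression" rule can be illustrated with a short sketch. The sample strings and variable names are assumptions; the final check reflects how CPython's re module treats braces that contain whitespace, namely as literal text.

```python
import re

text = "aaaaa"  # five a's, hypothetical sample input

exactly_two = re.match(r"a{2}", text).group()    # exactly two
two_to_four = re.match(r"a{2,4}", text).group()  # between two and four (greedy)
two_or_more = re.match(r"a{2,}", text).group()   # two or more
print(exactly_two, two_to_four, two_or_more)

# A quantifier binds to the shortest preceding sub-expression.
print(bool(re.fullmatch(r"ab{2}", "abb")))     # only "b" is repeated
print(bool(re.fullmatch(r"(ab){2}", "abab")))  # the group "ab" is repeated
print(bool(re.fullmatch(r"[ab]{2}", "ba")))    # the character set is repeated

# With whitespace inside the braces, the braces are matched literally.
spaced = re.compile(r"a{2, 4}")
print(bool(spaced.fullmatch("aaa")))
print(bool(spaced.fullmatch("a{2, 4}")))
```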

Updated on: 23-Nov-2022
