What is a regex expression that can prune repeating identical characters down to a maximum of two?

Time:07-28

I'm having trouble explaining this well enough for a search engine to pick up on what I'm looking for. The behavior I want is essentially this:

string = "aaaaaaaaare yooooooooou okkkkkk"

would become "aare yoou okk", where the maximum number of repeats for any given character is two.

Matching the excess duplicates and then removing them with re.sub seems like the approach to take, but I can't figure out the regex I need.

The only attempt I feel is even worth posting is this: (\w)\1{3,0}

It matched only the first run of a character repeated more than three times, so I got just one match, and that match covered the whole block of repeated characters, not just the ones exceeding the maximum of 2. Any help is appreciated!
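A small check of why that attempt behaves this way, assuming Python's `re` module and assuming the intended quantifier was `{3,}` (at least 3, no upper bound). Written literally, `{3,0}` means "at least 3, at most 0", which Python refuses to compile. With `{3,}`, each match spans the entire run, and `re.search` returns only the first match while `re.finditer` yields all of them.

```python
import re

text = "aaaaaaaaare yooooooooou okkkkkk"

# {3,0} means "at least 3, at most 0", which Python's re rejects.
try:
    re.compile(r'(\w)\1{3,0}')
except re.error as exc:
    print("invalid pattern:", exc)

# A character followed by 3 or more copies of itself, i.e. runs of 4+.
pattern = re.compile(r'(\w)\1{3,}')

# re.search stops at the first match...
first = pattern.search(text).group(0)
print(first)

# ...while re.finditer yields every match; each one covers the whole run,
# not just the characters beyond the limit.
runs = [m.group(0) for m in pattern.finditer(text)]
print(runs)
```

Because the match always includes the characters that should be kept, the usual fix is to replace the whole run rather than trying to match only the excess.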

Answer:

The regexp should be (\w)\1{2,} to match a character followed by at least 2 repetitions. That's 3 or more when you include the initial character.

The replacement string is then \1\1, which replaces the whole run with just two copies of the captured character.

import re
string = "aaaaaaaaare yooooooooou okkkkkk"
# (\w) captures one character; \1{2,} matches 2+ further copies of it
new_string = re.sub(r'(\w)\1{2,}', r'\1\1', string)  # 'aare yoou okk'
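If the limit needs to be something other than two, the same pattern can be built from a parameter. The sketch below assumes Python's `re`; `limit_repeats` and `limit_repeats_groupby` are hypothetical helper names for illustration, not library functions. The second one is a non-regex cross-check using `itertools.groupby`. Keep in mind that `\w` also matches digits and underscores (so "1000000" becomes "100"), while runs of punctuation or whitespace are left alone; using `(.)` instead of `(\w)` would cap every character.

```python
import re
from itertools import groupby


def limit_repeats(text: str, max_repeats: int = 2) -> str:
    """Hypothetical helper: cap runs of the same word character at max_repeats."""
    if max_repeats < 1:
        raise ValueError("max_repeats must be at least 1")
    # (\w) captures one character; \1{N,} requires N more copies of it,
    # so the pattern only fires on runs longer than max_repeats.
    pattern = r'(\w)\1{%d,}' % max_repeats
    # Replace the whole run with exactly max_repeats copies of the character.
    return re.sub(pattern, lambda m: m.group(1) * max_repeats, text)


def limit_repeats_groupby(text: str, max_repeats: int = 2) -> str:
    """Non-regex cross-check; caps runs of any character, including punctuation."""
    return "".join(ch * min(len(list(run)), max_repeats) for ch, run in groupby(text))


sample = "aaaaaaaaare yooooooooou okkkkkk"
print(limit_repeats(sample))          # aare yoou okk
print(limit_repeats(sample, 1))       # are you ok
print(limit_repeats_groupby(sample))  # aare yoou okk
```

The regex version keeps the answer's pattern and only swaps the fixed `\1\1` replacement for a function, since the number of copies is no longer known when the pattern is written.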