Match same element with n occurrences

Time:03-30

I want to match runs of the same character that occur exactly n times.

For example, match the letters that repeat exactly 3 times in this string: "aaaaabbbcccccccccdddee"

This should return "bbb" and "ddd".

If I spelled out what to match, like "b{3}" or "d{3}", this would be easier, but I want to match any character.

The closest I have come is this regex: (.)\1{2}(?!\1), which returns "aaa", "bbb", "ccc", "ddd".

And I can't add a negative lookbehind such as (?<!\1), because it is rejected as "non-fixed width".
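To make the problem concrete, here is a minimal sketch that reproduces the attempt. It assumes Python's `re` module, which the question does not specify; the variable names are only illustrative.

```python
import re

text = "aaaaabbbcccccccccdddee"

# The attempt from the question: three identical characters
# that are not followed by a fourth copy.
attempt = re.compile(r"(.)\1{2}(?!\1)")

# finditer + group(0) gives the whole match; findall would return only group 1.
matches = [m.group(0) for m in attempt.finditer(text)]
print(matches)  # ['aaa', 'bbb', 'ccc', 'ddd']
```

The tails of the longer runs ("aaa" and "ccc") slip through because nothing checks the character before the match.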

CodePudding user response:

One possibility is to use a regex that looks for a character which is not followed by itself (or the beginning of the line), followed by three identical characters, which are in turn not followed by a fourth copy of the same character, i.e.

(?:(.)(?!\1)|^)((.)\3{2})(?!\3)

Demo on regex101

The match is captured in group 2. The issue with this, though, is that it consumes the character before the match, so it cannot find adjacent matches: as shown in the demo, it only matches aaa, ccc and eee in aaabbbcccdddeee.
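The adjacency problem can be seen in a short sketch, assuming Python's `re` module (the thread does not name a regex flavor); `runs_consuming` is a hypothetical helper name.

```python
import re

# Consuming version: the character before the run becomes part of the match.
consuming = re.compile(r"(?:(.)(?!\1)|^)((.)\3{2})(?!\3)")

def runs_consuming(s):
    # The run of three is in group 2; group 0 may include the preceding character.
    return [m.group(2) for m in consuming.finditer(s)]

print(runs_consuming("aaabbbcccdddeee"))  # ['aaa', 'ccc', 'eee']
```

"bbb" and "ddd" are missed because the last character of the previous match has already been consumed and cannot serve as the leading character.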

This issue can be resolved by making the entire regex a lookahead, a technique which allows for capturing overlapping matches as described in this question. So:

(?=(?:(.)(?!\1)|^)((.)\3{2})(?!\3))

Again, the match is captured in group 2.

Demo on regex101
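A minimal sketch of the lookahead version, again assuming Python's `re` module; `exact_triples_lookahead` is a hypothetical helper name. Each match is zero-width, so the result has to be read from group 2.

```python
import re

# The whole pattern sits inside a lookahead, so nothing is consumed
# and adjacent runs can each be found.
overlapping = re.compile(r"(?=(?:(.)(?!\1)|^)((.)\3{2})(?!\3))")

def exact_triples_lookahead(s):
    # group(0) is always the empty string here; the run is in group 2.
    return [m.group(2) for m in overlapping.finditer(s)]

print(exact_triples_lookahead("aaaaabbbcccccccccdddee"))  # ['bbb', 'ddd']
print(exact_triples_lookahead("aaabbbcccdddeee"))  # ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
```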

CodePudding user response:

This gets sticky because you cannot put a backreference inside a negated character class, so we'll use a lookbehind followed by a negative lookahead like this:

(?<=(.))((?!\1).)\2\2(?!\2)

This says: look behind at the preceding character without including it in the match, then look ahead to be certain the next character is different. Next, consume that character into capture group 2, be certain that the next two characters match it, and be certain that the one after does not.

Unfortunately, this does not work for a run of 3 at the beginning of the string, because the lookbehind needs a preceding character. I had to add a whole alternation clause to handle that case. So the final regex is:

(?:(?<=(.))((?!\1).)\2\2(?!\2))|^(.)\3\3(?!\3)

This handles all cases.
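As a quick check, here is a sketch that runs the final regex, assuming Python's `re` module (the answer does not name a flavor); `exact_triples_lookbehind` is a hypothetical helper name. Unlike the lookahead approach, the run is the whole match, so group 0 is used.

```python
import re

# First branch: run of three preceded by a different character.
# Second branch: run of three at the very start of the string.
pattern = re.compile(r"(?:(?<=(.))((?!\1).)\2\2(?!\2))|^(.)\3\3(?!\3)")

def exact_triples_lookbehind(s):
    # Use group(0): findall would return tuples of the capture groups instead.
    return [m.group(0) for m in pattern.finditer(s)]

print(exact_triples_lookbehind("aaaaabbbcccccccccdddee"))  # ['bbb', 'ddd']
print(exact_triples_lookbehind("aaabbbcccdddeee"))  # ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
```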

CodePudding user response:

You could match what you don't want to keep, which is the same character repeated 4 or more times.

Then use an alternation to capture what you do want to keep, which is the same character repeated exactly 3 times.

The desired matches are in capture group 2.

(.)\1{3,}|((.)\3\3)
  • (.) Capture group 1, match a single character
  • \1{3,} Repeat the same char in group 1, 3 or more times
  • | Or
  • ( Capture group 2
    • (.)\3\3 Capture group 3, match a single character, followed by 2 backreferences to group 3 so the same character appears 3 times in total
  • ) Close group 2

Regex demo
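A minimal sketch of how the matches could be filtered in practice, assuming Python's `re` module; `exact_triples_alternation` is a hypothetical helper name. Matches from the first branch (4 or more repeats) leave group 2 unset, so they are skipped.

```python
import re

# Branch 1 swallows runs of 4+; branch 2 captures runs of exactly 3 in group 2.
pattern = re.compile(r"(.)\1{3,}|((.)\3\3)")

def exact_triples_alternation(s):
    # group(2) is None whenever the "4 or more" branch matched.
    return [m.group(2) for m in pattern.finditer(s) if m.group(2) is not None]

print(exact_triples_alternation("aaaaabbbcccccccccdddee"))  # ['bbb', 'ddd']
```

This approach needs no lookarounds at all, at the cost of a filtering step in the calling code.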
