Regex match group at least n times or not at all, in random order with other matching groups

Time:06-30

I am working with Java regexes, but I guess the principles apply to any regex flavor.

I have these requirements for the segment a regex should match:

  • contain 'a' at least 3 times
  • contain 'b' at least 3 times
  • occurrences of 'a' and 'b' can be in any order

Inspired by this post I came up with the following regex (regex101):

(?=([b]*[a]){3})(?=([a]*[b]){3})[ab]+

I am struggling with adding a new requirement:

  • Match if there is no 'c' or at least 3 'c'
  • as above, 'c' can occur anywhere in the segment

Examples for valid sequences:

aaabbb
ababab
aaabbbccc
abcabcabc
ababcabcc

Examples for invalid sequences (as a whole):

aaabbbc
aabbb
abbccc
abcabca

My thoughts so far:

  • Having at least 3 'c'

    (?=([bc]*[a]){3})(?=([ac]*[b]){3})(?=([ab]*[c]){3,})[abc]+

  • Combining this with the solution above in a crude manner (regex101), which is basically just one large "either none or at least 3" alternation:

    ((?=([bc]*[a]){3})(?=([ac]*[b]){3})(?=([ab]*[c]){3,})[abc]+|(?=([b]*[a]){3})(?=([a]*[b]){3})[ab]+)

Finally, the question: is there a better way to achieve this using other methods, like or-ing the 'c'-requirement look-ahead, nested look-aheads, or something entirely different?

CodePudding user response:

(?=^(?:.*a){3}.*$)(?=^(?:.*b){3}.*$)(?=^(?:.*c){3}.*$|^[^c]*$).*

Short Explanation

  • (?=^(?:.*a){3}.*$) Assert that string contains at least 3 a
  • (?=^(?:.*b){3}.*$) Assert that string contains at least 3 b
  • (?=^(?:.*c){3}.*$|^[^c]*$) Assert that string contains at least 3 c or the string does not contain any c
  • .* Match the whole string that passes all assertions

Also, see the regex demo and Java example
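
As a minimal sketch of how this pattern could be exercised from Java against the sample strings in the question: the class name `LookaheadCheck` and the helper `isValid` are made up for illustration and are not part of the answer. `Matcher.matches()` is used so that the whole input has to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class LookaheadCheck {
    // Pattern copied from the answer above. It contains no backslashes,
    // so no additional escaping is needed inside the Java string literal.
    private static final Pattern VALID = Pattern.compile(
        "(?=^(?:.*a){3}.*$)(?=^(?:.*b){3}.*$)(?=^(?:.*c){3}.*$|^[^c]*$).*");

    // True if the entire string passes all three look-ahead assertions.
    static boolean isValid(String s) {
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "aaabbb", "ababab", "aaabbbccc", "abcabcabc", "ababcabcc", // valid in the question
            "aaabbbc", "aabbb", "abbccc", "abcabca"                    // invalid in the question
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Because the pattern is built on `.`, it does not restrict the alphabet: a string such as `aaabbbxyz` also passes, which may or may not be what is wanted.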

CodePudding user response:

You could assert at least 3 a chars and at least 3 b chars, and then optionally match at least 3 c chars.

Add anchors ^ and $ to assert the start and the end of the string.

Note that you don't have to put a single char like [a] in a character class:

^(?=([bc]*a){3})(?=([ca]*b){3})[ab]*(?:c[ab]*c[ab]*c[abc]*)?$

Explanation

  • ^ Start of string
  • (?=([bc]*a){3}) Assert 3 times an a char
  • (?=([ca]*b){3}) Assert 3 times a b char
  • [ab]* Match optional chars a b
  • (?: Non capture group
    • c[ab]*c[ab]*c Match 3 times a c char
    • [abc]* Match optional a,b and c chars
  • )? Close the non capture group and make it optional
  • $ End of string

Regex demo

As you don't really need the capture groups, you can use non-capture groups (?:...) instead for the repetition:

^(?=(?:[bc]*a){3})(?=(?:[ca]*b){3})[ab]*(?:c[ab]*c[ab]*c[abc]*)?$

Regex demo
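
A small Java sketch of the non-capturing variant, assuming the input is validated as a whole string. The class name `AnchoredCheck` and the method `isValid` are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class AnchoredCheck {
    // Non-capturing variant from the answer above.
    // The look-aheads count a and b; the consuming part allows either
    // no c at all, or at least three c chars interleaved with a and b.
    private static final Pattern VALID = Pattern.compile(
        "^(?=(?:[bc]*a){3})(?=(?:[ca]*b){3})[ab]*(?:c[ab]*c[ab]*c[abc]*)?$");

    static boolean isValid(String s) {
        // find() is sufficient here because the pattern carries its own ^ and $ anchors.
        return VALID.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "aaabbb", "ababab", "aaabbbccc", "abcabcabc", "ababcabcc", // valid in the question
            "aaabbbc", "aabbb", "abbccc", "abcabca"                    // invalid in the question
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Unlike a pattern built on `.`, this one only consumes a, b and c, so an input such as `aaabbbxyz` is rejected.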

CodePudding user response:

You can use

(?<![abc])              # No "a", "b", "c" allowed immediately on the left
(?=(?:[bc]*a){3})       # At least three "a"s
(?=(?:[ac]*b){3})       # At least three "b"s
(?:                     # Either
   (?=[ab]*(?![abc]))   #  only "a" or "b"s allowed until a location not followed with "a", "b" or "c"
 |                      #  or
   (?=(?:[ab]*c){3})    # At least three "c"s
)
[abc]+                  # Match and consume one or more "a", "b" or "c" chars

See the regex demo.

As a single line:

(?<![abc])(?=(?:[bc]*a){3})(?=(?:[ac]*b){3})(?:(?=[ab]*(?![abc]))|(?=(?:[ab]*c){3}))[abc]+
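
Since this pattern is unanchored and guarded by a look-behind, it can pull qualifying segments out of a longer text. The following Java sketch illustrates that with `Matcher.find()`; the class name `SegmentFinder`, the method `findSegments` and the sample input are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class SegmentFinder {
    // Single-line pattern from the answer above, split across two
    // string literals purely for readability.
    private static final Pattern SEGMENT = Pattern.compile(
        "(?<![abc])(?=(?:[bc]*a){3})(?=(?:[ac]*b){3})"
        + "(?:(?=[ab]*(?![abc]))|(?=(?:[ab]*c){3}))[abc]+");

    // Collects every run of a/b/c chars that satisfies the counting rules.
    static List<String> findSegments(String text) {
        List<String> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(text);
        while (m.find()) {
            segments.add(m.group());
        }
        return segments;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch: the middle run has a single c.
        System.out.println(findSegments("aaabbb xx aaabbbc abcabcabc"));
    }
}
```

For the sample input, the run `aaabbbc` is skipped as a whole: the look-behind prevents the match from restarting in the middle of a run, so only `aaabbb` and `abcabcabc` are returned.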