Regex to match any number of repetitions except k

Time:10-07

I'm trying to find all text which doesn't match the pattern ^([^\.]*\.){3}[^\.]*$, i.e. any text that is not made of segments separated by exactly 3 periods (.). For example, XXX.XXX.XXX.XXX does match the pattern, while XX.XX and XX.XX.XX.XX.XX do not.

Any character except the period can be used in place of X. Essentially, I want to count the number of periods in the string and filter by count <> 3 (the expression above does the opposite: it matches when the count == 3).

How do you say "match 1, 2 or 4 times"?

CodePudding user response:

First, I'm going to simplify your question down to the core piece: how to match any number of repetitions except k. To that end, I'll simplify the expression down to x. After all, this should work with any expression, so might as well start off simple.

Regex provides two useful constructs for us:

  1. The "n or more" construct {n,}
    • This specifies that you want this expression to repeat n or more times.
  2. The "range" construct {n,m}
    • This specifies that you want this expression to repeat any number of times between n and m, inclusive.

We can put these together using regex's OR notation (|) to match "between 1 and k - 1 times" ({1,k-1}) and "k + 1 or more times" ({k+1,}) separately. We use k - 1 and k + 1 as bounds because both of these quantifiers are inclusive and we want to exclude k. If we wanted k to be, say, 3, we would end up with the following expression:

^(x{1,2}|x{4,})$
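
As a quick sanity check, here is a minimal Python sketch that runs this expression against strings of 0 to 6 copies of x. The names PATTERN and results are only for illustration.

```python
import re

# k = 3: accept 1-2 repetitions or 4 and more, but never exactly 3.
PATTERN = re.compile(r"^(x{1,2}|x{4,})$")

# results[n] is True when a string of n copies of "x" matches.
results = [PATTERN.match("x" * n) is not None for n in range(7)]

for n, matched in enumerate(results):
    print(n, matched)
```

Only the lengths 0 and 3 should be rejected: 3 because it is the excluded count, and 0 because both alternatives require at least one x.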

Now, this could be a problem if you happen to have a really long expression, since you would have to type the expression out twice. Luckily, we can reuse a capturing group we defined earlier. The syntax is (?n), where n is the number of the capturing group you are referring to. Unlike a backreference such as \1, which matches the same text the group captured, (?n) runs the group's pattern again. In this case, we'll put our pattern in the first capturing group, and we'll refer to it using (?1). This gives us:

^(x)((?1){0,1}|(?1){3,})$

Notice that I used {0,1} and {3,} as my quantifiers because we already matched the expression once at the beginning of our pattern. One caveat here is that not all flavors of regex support this syntax. PCRE (and PCRE2) supports it, but Python's built-in re module, Go, Java, and ECMAScript do not.
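
In a flavor without (?1), one workaround is to assemble the pattern in code so the sub-expression is still written only once. This is not part of the method described above, just a sketch in Python; the helper name repeat_except is hypothetical, and it assumes k is at least 2 so that {1,k-1} is a valid range.

```python
import re


def repeat_except(sub: str, k: int) -> str:
    """Build a fragment matching `sub` repeated 1..k-1 or k+1.. times."""
    if k < 2:
        # {1,0} would be an invalid quantifier, so this sketch rejects k < 2.
        raise ValueError("this sketch assumes k >= 2")
    group = f"(?:{sub})"  # non-capturing wrapper so the quantifier covers all of `sub`
    return f"(?:{group}{{1,{k - 1}}}|{group}{{{k + 1},}})"


fragment = repeat_except("x", 3)
print(fragment)  # (?:(?:x){1,2}|(?:x){4,})
print(re.fullmatch(fragment, "xxx") is None)  # True: exactly 3 is rejected
```

The generated string is the same shape as the first method, so it works anywhere the {n,m} and {n,} quantifiers do.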

Now all we've got left is to plug in your pattern. Super easy, we can just drop it in where x is in our previous patterns:

Using the first method, if you aren't using PCRE or you have a short expression:

^(([^\.]*\.){1,2}|([^\.]*\.){4,})[^\.]*$
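
A short Python sketch of this first-method pattern against the strings from the question, plus one string with no period at all. The names NOT_THREE and samples are only for illustration.

```python
import re

# 1-2 periods or 4 and more, followed by a final period-free segment.
NOT_THREE = re.compile(r"^(([^\.]*\.){1,2}|([^\.]*\.){4,})[^\.]*$")

samples = ["XXX.XXX.XXX.XXX", "XX.XX", "XX.XX.XX.XX.XX", "XX"]
matched = {s: NOT_THREE.match(s) is not None for s in samples}

for s, ok in matched.items():
    print(s, ok)
```

Note that "XX" is rejected as well: because the lower bound is 1, a string with zero periods does not match, even though its count is not 3. If zero periods should also pass, the first quantifier would need to be {0,2}.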

And using the second method, if you're using PCRE and you have a long expression:

^([^\.]*\.)((?1){0,1}|(?1){3,})[^\.]*$

CodePudding user response:

You could use a negative lookahead containing the pattern you don't want and, if it does not match, go on to match anything:

The general idea:

(?!^not_this_pattern$)^[\s\S]*$

[\s\S] matches any character, including a newline (in contrast to ., which by default does not)

And for this example:

(?!^([^\.]*\.){3}[^\.]*$)^[\s\S]*$
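
To see how the lookahead version behaves, this Python sketch compares it with a plain period count on a few sample strings. The names NOT_THREE, samples and rows are assumptions made for the example.

```python
import re

# Reject the string if the whole of it has exactly 3 periods, else match everything.
NOT_THREE = re.compile(r"(?!^([^\.]*\.){3}[^\.]*$)^[\s\S]*$")

samples = ["XXX.XXX.XXX.XXX", "XX.XX", "XX.XX.XX.XX.XX", "XX", ""]

# Each row: (string, result of the regex, result of counting periods directly).
rows = [(s, NOT_THREE.match(s) is not None, s.count(".") != 3) for s in samples]

for s, by_regex, by_count in rows:
    print(repr(s), by_regex, by_count)
```

On these samples the two columns agree, including for strings with no period at all, which this pattern accepts.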


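Or alternatively use the alternation operator | so that the pattern repeats either 2 times or fewer, or 4 times or more. The lower bound is written out as {0,2} because not every flavor accepts {,2}:

^((?:[^\.]*\.[^\.]*){0,2}|(?:[^\.]*\.[^\.]*){4,})$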
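
A brief Python sketch of the alternation variant, with the lower bound written as {0,2}. The names ALT and checks are only for illustration.

```python
import re

# Up to 2 period-containing chunks, or 4 and more; each chunk holds exactly one period.
ALT = re.compile(r"^((?:[^\.]*\.[^\.]*){0,2}|(?:[^\.]*\.[^\.]*){4,})$")

checks = {
    s: ALT.match(s) is not None
    for s in ["XXX.XXX.XXX.XXX", "XX.XX", "XX.XX.XX.XX.XX", "XX", ""]
}

for s, ok in checks.items():
    print(repr(s), ok)
```

One edge case worth knowing: the empty string matches (zero repetitions), but a non-empty string without any period, such as "XX", does not, because every repetition requires a period.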

