Restrict negative lookahead to be between substrings regex

Time:11-19

In my regex pattern, I would like to make sure a certain substring only occurs once in between two other substrings.

So, let's take for example these strings:

string_a = "this and that"
string_b = "this and and that"

I want to return a match for string_a but not for string_b, because 'and' occurs twice there between this/that. I would do that with a negative lookahead-tempered dot:

my_pattern = "this(?:(?!and.*and).)*that"

This matches string_a and not string_b, so far so good.

However, the following string is also not matched (just like string_b):

string_c = "this and that and"

Evidently, the negative lookahead scans the rest of the whole string, rather than only the part between "this" and "that" as I had anticipated and hoped.
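To make the problem reproducible, here is a minimal sketch using Python's re module (the variable names original, samples, results and lookahead_hit are only for illustration). It runs the pattern against the three strings and then shows that the and.*and part of the lookahead, applied from the first "and" in string_c, happily runs past "that":

```python
import re

# Pattern from the question: the `.*` inside the lookahead is not
# limited to the text before "that".
original = re.compile(r"this(?:(?!and.*and).)*that")

samples = {
    "string_a": "this and that",
    "string_b": "this and and that",
    "string_c": "this and that and",
}

# True where the pattern finds a match, False otherwise.
results = {name: bool(original.search(text)) for name, text in samples.items()}
print(results)

# What the lookahead sees at the first "and" of string_c (index 5):
# `and.*and` succeeds because `.*` crosses "that" to reach the trailing "and".
lookahead_hit = re.match(r"and.*and", samples["string_c"][5:])
print(lookahead_hit.group(0) if lookahead_hit else None)
```

Only string_a is matched; string_c is rejected because the second "and" after "that" is still visible to the lookahead.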

How can I do this instead?

CodePudding user response:

You can use another tempered greedy token to temper the .* inside the lookahead:

this(?:(?!this|that|and(?:(?!that).)*?and).)*?that

See the regex demo.

Details:

  • this - a fixed string
  • (?:(?!this|that|and(?:(?!that).)*?and).)*? - any character other than a line break, zero or more occurrences but as few as possible, where the character does not start a this or that sequence, and does not start a pattern that matches and, followed by zero or more non-line-break characters (as few as possible) none of which starts a that sequence, followed by another and
  • that - a fixed string.
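As a quick check of the pattern above, here is a small sketch in Python (assuming the re module; first_match and tempered are hypothetical names used only for this example). It applies the pattern to the three strings from the question, plus one extra string without any "and":

```python
import re

# Pattern from the answer: the inner tempered token (?:(?!that).)*? keeps
# the "second and" check from looking beyond the next "that".
tempered = re.compile(r"this(?:(?!this|that|and(?:(?!that).)*?and).)*?that")


def first_match(text):
    """Return the first matched substring, or None if there is no match."""
    m = tempered.search(text)
    return m.group(0) if m else None


for text in (
    "this and that",       # string_a: one "and" between this/that
    "this and and that",   # string_b: two "and" between this/that
    "this and that and",   # string_c: second "and" comes after "that"
    "this that",           # extra case: no "and" at all
):
    print(f"{text!r}: {first_match(text)!r}")
```

string_a and string_c both yield "this and that", while string_b yields no match. Note that "this that" also matches, so the pattern enforces "at most once" rather than "exactly once"; if exactly one "and" is required, an additional check would be needed.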