How to not match a substring in any location of the main string

Time:02-18


This might seem to be a repetitive question here but I have tried all other SO posts and the suggestions are not working for me.
Basically, I want to exclude strings that have a particular substring in them, either at the beginning, middle or at the end.

Here is an example:
Max_Num_HR, HR_Max_Num, Max_HR_Num
I want to exclude the strings that contain either _HR (at the end), HR_ (at the beginning) or _HR_ (in between).

What I have tried so far:
r"(^((?!HR_).*))(?<!_HR)$"
This will successfully exclude strings that have HR_ (at the beginning) and _HR (at the end), but not _HR_ (in between)
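
To make the failure concrete, here is a minimal sketch that runs the attempted pattern against the three test strings, assuming Python's re module (the variable names are only illustrative):

```python
import re

# The pattern from the attempt above, unchanged.
attempt = re.compile(r"(^((?!HR_).*))(?<!_HR)$")

tests = ["Max_Num_HR", "HR_Max_Num", "Max_HR_Num"]

# True means the string is matched (kept), False means it is excluded.
results = {s: bool(attempt.search(s)) for s in tests}

for s, kept in results.items():
    print(f"{s}: {'matched' if kept else 'excluded'}")
# Max_Num_HR: excluded
# HR_Max_Num: excluded
# Max_HR_Num: matched   <- the case that should also be excluded
```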

I have looked at "How to exclude a string in the middle of a RegEx string?",
but the solution there did not work for me.

I understand that the first segment of my code (^((?!HR_).*)) will exclude everything that contains HR_ since I have a ^ at the beginning followed by a negative lookahead. The second segment (?<!_HR)$ will begin at the end of the string and perform a negative lookbehind to see if _HR is not included at the end. Going with this train of thought, I tried including (?!_HR_) in between the two segments, but to no avail.

So, how do I get it to exclude all three HR_, _HR_, _HR considering Max_Num_HR, HR_Max_Num, Max_HR_Num as the test case?

CodePudding user response:

The pattern is missing the assertion for _HR_ somewhere in the string.

You can also place the negative lookbehind that asserts no _HR at the end after the dollar sign, as in $(?<!_HR), to prevent some backtracking over the .+

Note that for a match only you don't need the capture groups.

^(?!HR_)(?!.*_HR_).+$(?<!_HR)
  • ^ Start of string
  • (?!HR_) Assert not HR_ at the start
  • (?!.*_HR_) Assert not _HR_ anywhere in the string
  • .+$ Match 1 or more chars so that an empty string is not matched, and assert end of string
  • (?<!_HR) Assert not _HR directly to the left
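
As a quick check, the pattern can be run against the test case from the question. This is a minimal sketch assuming Python's re module; the extra sample strings Max_Num and the empty string are added here for illustration and are not from the question:

```python
import re

# Start anchor, two negative lookaheads, at least one character,
# then end of string with a negative lookbehind for a trailing _HR.
pattern = re.compile(r"^(?!HR_)(?!.*_HR_).+$(?<!_HR)")

samples = ["Max_Num_HR", "HR_Max_Num", "Max_HR_Num", "Max_Num", ""]

# True means the string is matched (kept), False means it is excluded.
kept = {s: bool(pattern.match(s)) for s in samples}

for s, ok in kept.items():
    print(f"{s!r}: {'matched' if ok else 'excluded'}")
# 'Max_Num_HR': excluded
# 'HR_Max_Num': excluded
# 'Max_HR_Num': excluded
# 'Max_Num': matched
# '': excluded
```

The empty string is excluded only because .+ requires at least one character.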


CodePudding user response:

One way to avoid matching strings that contain 'HR_' at the beginning, '_HR_' in the middle or '_HR' at the end is to match a regular expression having a beginning-of-string anchor followed by a negative lookahead, followed by .*:

^(?!HR_|.+_HR_.|.+_HR$).*


Note that lines containing '_HR_' at the beginning or end are matched, as per the specification.

The negative lookahead reads, "do not match 'HR_' at the beginning of the string, or '_HR_' when preceded by at least one character and followed by one character (possibly more than one), or '_HR' at the end of the string".

The entire string is matched if and only if the negative lookahead succeeds.

The negative lookahead could of course be replaced by three negative lookaheads:

^(?!HR_)(?!.+_HR_.)(?!.+_HR$).*
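
The sketch below compares the single-lookahead and three-lookahead forms on the question's test case, assuming Python's re module. The strings _HR_Max and Max_HR_ are extra samples added here to illustrate the note that '_HR_' at the very beginning or end is still matched:

```python
import re

# One lookahead with three alternatives versus three separate lookaheads.
single = re.compile(r"^(?!HR_|.+_HR_.|.+_HR$).*")
split = re.compile(r"^(?!HR_)(?!.+_HR_.)(?!.+_HR$).*")

samples = ["Max_Num_HR", "HR_Max_Num", "Max_HR_Num",
           "Max_Num", "_HR_Max", "Max_HR_"]

# True means the string is matched (kept), False means it is excluded.
single_kept = {s: bool(single.match(s)) for s in samples}
split_kept = {s: bool(split.match(s)) for s in samples}

for s in samples:
    print(f"{s}: single={single_kept[s]}, split={split_kept[s]}")
# Max_Num_HR: single=False, split=False
# HR_Max_Num: single=False, split=False
# Max_HR_Num: single=False, split=False
# Max_Num: single=True, split=True
# _HR_Max: single=True, split=True
# Max_HR_: single=True, split=True
```

Both forms agree on every sample here. If '_HR_' at the very beginning or end should be excluded as well, the pattern from the first answer, which uses (?!.*_HR_), does that.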