Regex - Exclude pattern if a certain word appears after the desired word

Time:07-29

I want my regex to match the appearance of a certain word, except if it is followed by another specific word.

More specifically, I would like it to match "union" as a whole word (in the sense of union or loyalty to a group, so with word boundaries on both sides of the word, which excludes words like "reunion") in all cases except when the text says "union europea" (which refers to an administration and does not appeal to a group in the same way).

The pattern union\b does not help, because it also matches the "union" in "union europea".

CodePudding user response:

You can use a negative lookahead:

pattern = r'\W(union)\W(?!europea)'

As pointed out by @Michael Ruth, you probably don't want to capture words other than union. So, with some test data:

unionize
union 
union europea
reunion 

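This pattern captures union only in the second case (i.e., it does not capture reunion or unionize). Each \W matches a single non-word character, so words that merely contain union (such as reunion and unionize) are not captured. Note that \W must consume an actual character, so this pattern will not match union at the very start or very end of the string.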
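A minimal sketch of how this pattern behaves on the test data, assuming the four lines are joined into a single string with newlines and that "union " and "reunion " keep their trailing space (the variable names text and matches are only for illustration):

```python
import re

# Pattern from this answer, written as a raw string so that \W reaches
# the regex engine unchanged.
pattern = r'\W(union)\W(?!europea)'

# Assumed layout of the test data: one sample per line.
text = "unionize\nunion \nunion europea\nreunion \n"

# Collect the start offset and text of capture group 1 for every match.
matches = [(m.start(1), m.group(1)) for m in re.finditer(pattern, text)]
print(matches)  # [(9, 'union')]

# Because \W has to consume a character, a bare "union" with nothing
# before or after it is not matched.
print(re.search(pattern, "union"))  # None
```

Only the standalone union on the second line is found; the newline before it and the space after it are what satisfy the two \W tokens.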

CodePudding user response:

Use

pattern = r'\bunion\b(?!\W*europea)'

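The negative lookahead (?!\W*europea) rejects a match when union is followed by zero or more non-word characters and then the string europea. The \b anchors on both sides ensure that only the whole word union is matched, including at the start or end of the string.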
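As a quick check of this pattern, here is a small sketch that runs it against a few sample strings. The samples beyond the four in the first answer are made up for illustration:

```python
import re

# Pattern from this answer: whole-word "union" not followed by
# optional non-word characters and then "europea".
pattern = re.compile(r'\bunion\b(?!\W*europea)')

# Hypothetical sample strings; the last two are illustrative additions.
samples = [
    "unionize",
    "union",
    "union europea",
    "reunion",
    "la union, europea",
    "union de trabajadores",
]

# Map each sample to whether the pattern finds a match in it.
results = {s: bool(pattern.search(s)) for s in samples}
for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
```

Only "union" and "union de trabajadores" match. Two possible refinements, depending on the data: pass re.IGNORECASE to re.compile if capitalized forms such as "Union Europea" should also be excluded, and add \b after europea if longer words that merely start with europea should not block a match.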
