I want my regex to match the appearance of a certain word, except if it is followed by another specific word.
More specifically, I would like it to match "union" (in the sense of union or loyalty to a group, so it should not match inside words like "reunion", i.e. with word boundaries at the beginning and end of the word) in all cases, except when the text says "union europea" (which is understood as an administration and does not refer to a group in the same way).
Using the pattern union\b does not help, because it would also match the aforementioned phrase.
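To make the problem concrete, here is a minimal sketch of what union\b does on a few inputs. The sample strings and the names naive, samples and naive_hits are assumptions chosen for illustration, not part of the original question.

```python
import re

# The pattern from the question: a word boundary only at the end.
naive = re.compile(r'union\b')

# Hypothetical sample inputs.
samples = ['unionize', 'union', 'union europea', 'reunion']

# Keep every sample in which the pattern finds a match.
naive_hits = [s for s in samples if naive.search(s)]
print(naive_hits)  # ['union', 'union europea', 'reunion']
```

Only "unionize" is rejected. Both "union europea" and "reunion" still match, so the pattern needs a leading boundary as well as a way to exclude the following word.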
CodePudding user response:
You can use a negative lookahead:
pattern = r'\W(union)\W(?!europea)'
As pointed out by @Michael Ruth, you probably don't want to capture words other than union. So, with some test data:
unionize
union
union europea
reunion
This pattern only captures union in the second case (i.e., it does not capture reunion or unionize). The \W are non-word characters, so additional letters (like those from reunion and unionize) are not captured.
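A quick way to check this is to run the pattern over the test data with re.findall. This is only a sketch: the variable names text, captured and bare are assumptions, and the test data is joined with newlines so that every word has a non-word character next to it.

```python
import re

pattern = r'\W(union)\W(?!europea)'

# Test data from the answer, one entry per line.
text = "unionize\nunion\nunion europea\nreunion"

# findall returns the contents of the single capture group.
captured = re.findall(pattern, text)
print(captured)  # ['union']

# Each \W must consume a character, so a bare "union" with nothing
# before or after it is not matched by this pattern.
bare = re.findall(pattern, "union")
print(bare)  # []
```

The second call shows a limitation worth keeping in mind: because \W has to match an actual character on each side, the word is missed at the very start or end of the input.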
CodePudding user response:
Use
pattern = r'\bunion\b(?!\W*europea)'
(?!\W*europea) excludes matches where union is followed by non-word characters (if any) and then the string europea.
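As a rough check, the pattern can be tried against a handful of inputs. The sample strings (including the one with a comma and two spaces, and the longer sentence) and the names samples and matched are assumptions added for illustration.

```python
import re

pattern = re.compile(r'\bunion\b(?!\W*europea)')

# Hypothetical sample inputs.
samples = [
    'unionize',
    'union',
    'union europea',
    'union,  europea',
    'reunion',
    'la union hace la fuerza',
]

# Keep every sample in which the pattern finds a match.
matched = [s for s in samples if pattern.search(s)]
print(matched)  # ['union', 'la union hace la fuerza']
```

Because \b is a zero-width assertion, this version also matches union at the very start or end of the input, and \W* lets the lookahead skip any run of spaces or punctuation before europea.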