Regex matches on exclusion/inclusion of a word with various possessiveness then a key word

Time:02-10

  1. I want to find various kinds of matches on the word "car", but not if it's preceded by "Jane", "Jane's", "Janes", or "Jane(s)".

The following 2 regexes partially work for exclusion and inclusion, but I can't get the other variants to work:

  • (?<!\bJane) car
  • Jane car

For example:

  • the car is red - Match
  • here is Jane car is red - None
  • here is Janes car is red - None
  • here is Jane's car is red - None
  2. I also want to find the cases where Jane is in the phrase:

    • the car is red - None
    • here is Jane car is red - Match
    • here is Janes car is red - Match
    • here is Jane's car is red - Match
  3. And where car is not preceded by Jane(s):

  • here Jane(s) car is red - None
  4. And, of course, the opposite:
  • here is Jane(s) car is red - Match

Edit

If I have a document with "red car\n and Janes car", this should be a Match, as there is a reference to "car" without the word Jane/Janes/Jane's/etc. in front of it.

For additional clarity: I will be using re.findall to find all the occurrences of "car" that do not have the word Jane in front of them.

CodePudding user response:

If you want to match only when none of the different forms of Jane occurs directly before car, you can rule those out with a negative lookahead at the start of the line, and then still match car:

^(?!.*\bJane(?:'?s|\(s\))? car\b).*\bcar\b.*
  • ^ Start of string
  • (?! Negative lookahead
    • .*\bJane(?:'?s|\(s\))? Match Jane, Janes, Jane's or Jane(s) anywhere in the line
    • car\b Match a space and the word car
  • ) Close the lookahead
  • .*\bcar\b.* Match the whole line with the word car between word boundaries
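
As a quick check, the pattern above can be run against the example phrases from the question. This is only a sketch: the names NO_JANE_CAR, samples and results are made up for illustration, and it assumes each phrase is tested as its own string with re.search.

```python
import re

# Pattern from the answer: the lookahead rejects a line that contains
# Jane / Janes / Jane's / Jane(s) directly before " car"; the rest of the
# pattern then requires a standalone word "car".
NO_JANE_CAR = re.compile(r"^(?!.*\bJane(?:'?s|\(s\))? car\b).*\bcar\b.*")

# Example phrases taken from the question (hypothetical variable name).
samples = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
]

# True where the phrase matches, False where the lookahead rejects it.
results = {s: bool(NO_JANE_CAR.search(s)) for s in samples}

for phrase, matched in results.items():
    print(f"{phrase!r}: {'Match' if matched else 'None'}")
```

Only the first phrase should report Match. Note that the lookahead rejects the whole line as soon as one Jane form appears before car, even if the same line also contains another car on its own.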

If one of the forms of Jane followed by car must be present, you can match it directly:

^.*\bJane(?:'?s|\(s\))? car\b.*
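
The inclusion pattern can be checked the same way. Again this is a sketch under the same assumptions; HAS_JANE_CAR, phrases and included are illustrative names, not part of the original answer.

```python
import re

# Pattern from the answer: match a line only if some form of Jane
# is directly followed by " car".
HAS_JANE_CAR = re.compile(r"^.*\bJane(?:'?s|\(s\))? car\b.*")

# Example phrases taken from the question (hypothetical variable name).
phrases = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
]

# True where a Jane form precedes car, False otherwise.
included = {p: bool(HAS_JANE_CAR.search(p)) for p in phrases}

for phrase, matched in included.items():
    print(f"{phrase!r}: {'Match' if matched else 'None'}")
```

Here every phrase except "the car is red" should report Match.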


To match all occurrences of car except the ones that have Jane in front of them, you can match what you don't want to keep and capture what you do want to keep.

Then in Python you can use re.findall, which returns the capture group values (an empty string for every match where the group did not take part), and remove the empty entries from the result.

\bJane(?:'?s|\(s\))? car\b|\b(car)\b
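
A minimal sketch of that approach, using the document from the question's edit as input. The function name cars_without_jane and the variable names are made up for illustration.

```python
import re

# First alternative: consume "Jane car" in all its forms without capturing.
# Second alternative: capture a standalone "car" in group 1.
PATTERN = re.compile(r"\bJane(?:'?s|\(s\))? car\b|\b(car)\b")

def cars_without_jane(text):
    """Return every 'car' that is not directly preceded by a form of Jane."""
    # With a single capture group, re.findall returns the group 1 value for
    # each match; matches of the first alternative yield '' and are dropped.
    return [m for m in PATTERN.findall(text) if m]

document = "red car\n and Janes car"
found = cars_without_jane(document)
print(found)       # ['car']
print(len(found))  # 1
```

The length of the returned list is the number of occurrences of car without Jane in front of them, so a non-empty list corresponds to a Match for the whole document.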
