- I want to look for various types of matches on the word "car", but not if it's preceded by "Jane", "Jane's", "Janes", or "Jane(s)".
The following 2 regexes partially work for exclusion and inclusion, but I can't get the other variants to work:
- (?<!\bJane) car
- Jane car
For example:
- the car is red - Match
- here is Jane car is red - None
- here is Janes car is red - None
- here is Jane's car is red - None
I also want to find the cases where Jane is in the phrase:
- the car is red - None
- here is Jane car is red - Match
- here is Janes car is red - Match
- here is Jane's car is red - Match
and where car is not preceded by Jane(s):
- here Jane(s) car is red - None
- and of course the opposite
- here is Jane(s) car is red - Match
Edit
If I have a document with "red car\n and Janes car" this should be a Match as there is a reference to "car" without the word Jane/Janes/Jane's/etc. in front of it.
In fact, for additional clarity, I will be doing a re.findall for all the occurrences of "car" without the word Jane in front of them.
CodePudding user response:
If you want to match lines where the different forms of Jane should not occur in front of car, you can rule those out with a negative lookahead and then still match car:
^(?!.*\bJane(?:'?s|\(s\))? car\b).*\bcar\b.*
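Explanation:
- ^ Start of string
- (?! Negative lookahead, assert that what follows is not:
- .*\bJane(?:'?s|\(s\))? Match Jane, Janes, Jane's or Jane(s) anywhere in the line
- car\b (with a leading space) Match a space and the word car
- ) Close the lookahead
- .*\bcar\b.* Match the whole line with the word car between word boundaries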
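As a quick check, here is a minimal Python sketch that runs the pattern above against the sample lines from the question. The names NO_JANE_CAR, samples and results are only illustrative, and it assumes each line is tested on its own with re.match.

```python
import re

# Pattern from the answer: reject a line that contains a "Jane ... car" form,
# otherwise require the standalone word "car" somewhere in the line.
NO_JANE_CAR = re.compile(r"^(?!.*\bJane(?:'?s|\(s\))? car\b).*\bcar\b.*")

# Sample lines taken from the question.
samples = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
]

# Map each line to True (Match) or False (None).
results = {line: bool(NO_JANE_CAR.match(line)) for line in samples}

for line, matched in results.items():
    print(f"{line!r}: {'Match' if matched else 'None'}")
```

Only the first sample should be reported as a Match, because every other line has one of the Jane forms directly in front of car.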
If the different forms of Jane followed by car should be there, you can match it:
^.*\bJane(?:'?s|\(s\))? car\b.*
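The opposite case can be tried the same way. This is a small sketch, again assuming one line per test; JANE_CAR, jane_samples and jane_results are illustrative names.

```python
import re

# Pattern from the answer: the line must contain Jane, Janes, Jane's or Jane(s)
# followed by a space and the word "car".
JANE_CAR = re.compile(r"^.*\bJane(?:'?s|\(s\))? car\b.*")

# Sample lines taken from the question.
jane_samples = [
    "the car is red",
    "here is Jane car is red",
    "here is Janes car is red",
    "here is Jane's car is red",
    "here is Jane(s) car is red",
]

# Map each line to True (Match) or False (None).
jane_results = {line: bool(JANE_CAR.match(line)) for line in jane_samples}

for line, matched in jane_results.items():
    print(f"{line!r}: {'Match' if matched else 'None'}")
```

Here the first sample should give None and the four Jane variants should each give a Match.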
To match all occurrences of car except the ones that have Jane in front of them, you can match what you don't want to keep and capture what you do want to keep.
Then in Python you can use re.findall, which returns the capture group values. The unwanted matches show up as empty strings, which you can then remove from the result.
\bJane(?:'?s|\(s\))? car\b|\b(car)\b
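A minimal sketch of that approach in Python, using the two-line document from the question's edit. The names PATTERN, text, raw and cars are illustrative.

```python
import re

# The first alternative consumes the "Jane ... car" forms without capturing anything;
# the second alternative captures a standalone "car" in group 1.
PATTERN = re.compile(r"\bJane(?:'?s|\(s\))? car\b|\b(car)\b")

text = "red car\n and Janes car"

# With a single capture group, re.findall returns the group value per match,
# which is an empty string whenever the first alternative matched.
raw = PATTERN.findall(text)

# Keep only the occurrences of "car" that were actually captured.
cars = [m for m in raw if m]

print(raw)   # ['car', '']
print(cars)  # ['car']
```

The length of cars is then the number of occurrences of car that are not preceded by one of the Jane forms, so a non-empty list corresponds to a Match for the whole document.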