Here is a dataset I have:
ID | Details |
---|---|
1 | she wants to rent out her unit |
2 | he wants to rent 2 bedroom apartment |
3 | looking for renting in a downtown |
4 | wants to rent out her villa |
I need to use a regex to get only the rows that contain 'rent', but not 'rent out'.
Here is what I want the result to look like:
ID | Details |
---|---|
2 | he wants to rent 2 bedroom apartment |
3 | looking for renting in a downtown |
I have tried this:
df[df['Details'].str.contains(r'rent|renting')]
However, it still selects the rows with 'rent out'.
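A minimal pandas reproduction of the attempt, assuming df is built from the sample table above (the DataFrame construction is a reconstruction, not the asker's code). It shows why the pattern fails: 'rent' is a plain substring of 'rent out', so the alternation matches every row.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Details": [
        "she wants to rent out her unit",
        "he wants to rent 2 bedroom apartment",
        "looking for renting in a downtown",
        "wants to rent out her villa",
    ],
})

# 'rent' occurs inside 'rent out' and 'renting', so every row matches
mask = df["Details"].str.contains(r"rent|renting")
print(df.loc[mask, "ID"].tolist())  # [1, 2, 3, 4]
```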
Answer:
Use r'rent(?! out)':
df['Details'].str.contains(r'rent(?! out)')
This will match lines that contain:
- 'rent' not followed by ' out', and
- 'renting', which is also 'rent' not followed by ' out'.
Test regex here: https://regex101.com/r/feHovb/1
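A runnable sketch of this answer's pattern, assuming the same sample DataFrame rebuilt from the question's table. Because str.contains only returns a boolean mask, the sketch indexes df with it to get the matching rows.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Details": [
        "she wants to rent out her unit",
        "he wants to rent 2 bedroom apartment",
        "looking for renting in a downtown",
        "wants to rent out her villa",
    ],
})

# Negative lookahead: accept 'rent' only when it is not immediately followed by ' out'
mask = df["Details"].str.contains(r"rent(?! out)")

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result["ID"].tolist())  # [2, 3]
```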
Answer:
You need to use
df['Details'].str.contains(r'\brent(?:ing)?\b(?!\s*out\b)')
See the regex demo. Details:
- \b - a word boundary
- rent(?:ing)? - rent or renting (basically, almost the same as (?:renting|rent))
- \b - a word boundary
- (?!\s*out\b) - a negative lookahead that fails the match if there are zero or more whitespace characters followed by out as a whole word immediately to the right of the current location.
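A sketch comparing both answers' patterns, assuming the question's sample data plus a few hypothetical strings (not from the question) chosen to exercise edge cases: the word boundaries stop 'parent' from matching, and \s* lets the lookahead reject 'rent' when more than one space comes before 'out'.

```python
import pandas as pd

# Sample data reconstructed from the question's table (assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Details": [
        "she wants to rent out her unit",
        "he wants to rent 2 bedroom apartment",
        "looking for renting in a downtown",
        "wants to rent out her villa",
    ],
})

# Whole-word 'rent'/'renting', rejected when 'out' follows after any whitespace
pattern = r"\brent(?:ing)?\b(?!\s*out\b)"
result = df[df["Details"].str.contains(pattern)]
print(result["ID"].tolist())  # [2, 3]

# Hypothetical edge-case strings, not part of the question's data
extra = pd.Series([
    "for rent",
    "rent  out a room",      # two spaces before 'out'
    "my parent lives here",  # 'rent' inside another word
    "renting out my flat",
])
simple_hits = extra.str.contains(r"rent(?! out)").tolist()
boundary_hits = extra.str.contains(pattern).tolist()
print(simple_hits)    # [True, True, True, True]
print(boundary_hits)  # [True, False, False, False]
```

With these inputs, the unanchored rent(?! out) accepts all four extra strings, while the word-boundary version keeps only 'for rent'.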