Regex: match 'rent' only if it is not followed by the word 'out'

Time:03-26

Here is a dataset I have:

ID Details
1 she wants to rent out her unit
2 he wants to rent 2 bedroom apartment
3 looking for renting in a downtown
4 wants to rent out her villa

I need to use a regex to get only the rows that contain 'rent', but not 'rent out'.

Here is how I want the result to look:

ID Details
2 he wants to rent 2 bedroom apartment
3 looking for renting in a downtown

I have tried this one:

df[df['Details'].str.contains(r'rent|renting')]

However, it still selects the rows with 'rent out'.
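A minimal sketch of why that attempt keeps the 'rent out' rows, rebuilding the sample data as a DataFrame (`df` follows the question; `mask` is an illustrative name). The alternation `rent|renting` only asks whether either substring occurs somewhere in the text, and 'rent out' contains 'rent', so nothing excludes those rows.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Details": [
        "she wants to rent out her unit",
        "he wants to rent 2 bedroom apartment",
        "looking for renting in a downtown",
        "wants to rent out her villa",
    ],
})

# Plain alternation: a row matches if "rent" (or "renting") appears anywhere.
# Nothing inspects the text after "rent", so the "rent out" rows match too.
mask = df["Details"].str.contains(r"rent|renting")
print(df.loc[mask, "ID"].tolist())  # [1, 2, 3, 4]
```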

Answer:

Use: r'rent(?! out)'

df[df['Details'].str.contains(r'rent(?! out)')]

This will match lines that contain:

  • rent not followed by " out", and
  • renting, which is also rent not followed by " out".
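A minimal, self-contained sketch of this answer applied to the sample data from the question (`mask` and `result` are illustrative names). Note that the lookahead `(?! out)` checks for exactly one space followed by `out`; in "renting" the text after "rent" is "ing", so the lookahead does not block it.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Details": [
        "she wants to rent out her unit",
        "he wants to rent 2 bedroom apartment",
        "looking for renting in a downtown",
        "wants to rent out her villa",
    ],
})

# Negative lookahead: "rent" matches only when it is NOT immediately
# followed by " out" (a single space, then "out").
mask = df["Details"].str.contains(r"rent(?! out)")
result = df[mask]
print(result["ID"].tolist())  # [2, 3]
```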

Test regex here: https://regex101.com/r/feHovb/1

Answer:

You need to use

df[df['Details'].str.contains(r'\brent(?:ing)?\b(?!\s*out\b)')]

See the regex demo. Details:

  • \b - a word boundary
  • rent(?:ing)? - rent or renting (basically, almost the same as (?:renting|rent))
  • \b - a word boundary
  • (?!\s*out\b) - a negative lookahead that fails the match if there are zero or more whitespace characters followed by out as a whole word immediately to the right of the current location.
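A sketch comparing the two answers, assuming the sample data from the question. The strings in `samples` are hypothetical inputs, not part of the original dataset; they are chosen to show where the word boundaries and `\s*` make a difference: more than one space before out, and rent appearing inside a longer word. `pattern`, `mask` and `samples` are illustrative names.

```python
import re

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Details": [
        "she wants to rent out her unit",
        "he wants to rent 2 bedroom apartment",
        "looking for renting in a downtown",
        "wants to rent out her villa",
    ],
})

# Second answer: whole-word "rent"/"renting", not followed by
# optional whitespace plus the whole word "out".
pattern = r"\brent(?:ing)?\b(?!\s*out\b)"
mask = df["Details"].str.contains(pattern)
print(df.loc[mask, "ID"].tolist())  # [2, 3]

# Hypothetical inputs (not from the question) where the answers differ.
samples = ["rent  out the flat", "an apparent bargain"]
for s in samples:
    first = bool(re.search(r"rent(?! out)", s))
    second = bool(re.search(pattern, s))
    print(f"{s!r}: first answer={first}, second answer={second}")
# 'rent  out the flat': first answer=True, second answer=False
# 'an apparent bargain': first answer=True, second answer=False
```

On the four sample rows both patterns give the same result; the second one is stricter only on inputs like these, where two spaces slip past `(?! out)` and the `\b` anchors reject "rent" inside "apparent".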