regex pattern to match whole word or word followed by another

Time:03-15

I'm starting to learn regex in order to match words in pandas DataFrame columns and replace them with other values.

df['col1']=df['col1'].str.replace(r'(?i)unlimi\w*', 'Unlimited', regex=True)

This pattern matches different variations of the word Unlimited. But some values in the column contain not just one word but two or more, for example:

[Unlimited, Unlimited (on-net), Unlimited (on-off-net)]

I was wondering if there is a way to match all of the values in the previous example with a single regex.

CodePudding user response:

You can use

df['col1']=df['col1'].str.replace(r'(?i)unlimi\w*(?:\s*\([^()]*\))?', 'Unlimited', regex=True)


The (?i)unlimi\w*(?:\s*\([^()]*\))? regex matches

  • (?i) - the regex to the right is case insensitive
  • unlimi - a fixed string
  • \w* - zero or more word chars
  • (?:\s*\([^()]*\))? - an optional sequence of
    • \s* - zero or more whitespaces
    • \( - a ( char
    • [^()]* - zero or more chars other than ( and )
    • \) - a ) char.
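
To check the breakdown above, here is a minimal sketch that applies the same pattern with Python's built-in re module, which is the regex engine pandas uses for str.replace(..., regex=True). The sample values are hypothetical: the first three come from the question, the last two are made up to show a misspelled variant and a non-matching value.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'(?i)unlimi\w*(?:\s*\([^()]*\))?')

# Hypothetical sample values; the first three are from the question.
samples = [
    "Unlimited",
    "Unlimited (on-net)",
    "Unlimited (on-off-net)",
    "unlimitted(on-net)",   # made-up misspelling, no space before "("
    "500 minutes",          # made-up value that should stay unchanged
]

# Replace every match with the canonical spelling.
normalized = [PATTERN.sub("Unlimited", s) for s in samples]
print(normalized)
# ['Unlimited', 'Unlimited', 'Unlimited', 'Unlimited', '500 minutes']
```

Because the parenthesized group is optional, the bare word and the variants with a trailing "(...)" part all collapse to the same replacement, while values without "unlimi" are left alone.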