regex pattern to match whole word or word followed by another

I'm starting to learn regex in order to match words in pandas DataFrame columns and replace them with other values.

df['col1'] = df['col1'].str.replace(r'(?i)unlimi\w*', 'Unlimited', regex=True)

This pattern matches different variations of the word Unlimited. But some values in the column contain not just one word but two or more, for example:

[Unlimited, Unlimited (on-net), Unlimited (on-off-net)]

I was wondering if there is a way to match all of the words in the previous example with a single regex line.

CodePudding user response:

You can use

df['col1'] = df['col1'].str.replace(r'(?i)unlimi\w*(?:\s*\([^()]*\))?', 'Unlimited', regex=True)

See the regex demo.

The (?i)unlimi\w*(?:\s*\([^()]*\))? regex matches:

(?i) - makes the rest of the pattern case-insensitive
unlimi - the literal text unlimi
\w* - zero or more word chars (letters, digits, underscores)
(?:\s*\([^()]*\))? - an optional sequence of
- \s* - zero or more whitespaces
- \( - a ( char
- [^()]* - zero or more chars other than ( and )
- \) - a ) char.
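
To check the pattern outside pandas, here is a minimal sketch using only the standard re module. The first three sample values come from the question; the remaining ones are hypothetical variants added only to illustrate the optional parenthesised part and a non-matching value.

```python
import re

# Pattern from the answer: "unlimi", any trailing word characters,
# then an optional parenthesised suffix such as " (on-net)".
PATTERN = re.compile(r'(?i)unlimi\w*(?:\s*\([^()]*\))?')

samples = [
    'Unlimited',               # from the question
    'Unlimited (on-net)',      # from the question
    'Unlimited (on-off-net)',  # from the question
    'unlimitted',              # hypothetical misspelling
    'UNLIMITED(on-net)',       # hypothetical: no space before the parenthesis
    '500 minutes',             # hypothetical value that should stay unchanged
]

# Replace every match with the canonical spelling.
normalized = [PATTERN.sub('Unlimited', s) for s in samples]
print(normalized)
```

Because the parenthesised group is optional, a bare Unlimited and Unlimited (on-net) are both consumed in full and collapse to the same replacement. The same pattern string can be passed to df['col1'].str.replace(..., regex=True) as in the answer.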