I'm starting to learn regex in order to match words in pandas DataFrame columns and replace them with other values.
df['col1']=df['col1'].str.replace(r'(?i)unlimi\w*', 'Unlimited', regex=True)
This pattern matches different variations of the word Unlimited. But some values in the column contain not just one word but two or more, for example:
[Unlimited, Unlimited (on-net), Unlimited (on-off-net)]
I was wondering if there is a way to match all of the words in the previous example with a single regex line.
CodePudding user response:
You can use
df['col1']=df['col1'].str.replace(r'(?i)unlimi\w*(?:\s*\([^()]*\))?', 'Unlimited', regex=True)
The (?i)unlimi\w*(?:\s*\([^()]*\))? regex matches:
- (?i) - makes the regex to its right case insensitive
- unlimi - a fixed string
- \w* - zero or more word chars
- (?:\s*\([^()]*\))? - an optional sequence of
  - \s* - zero or more whitespaces
  - \( - a ( char
  - [^()]* - zero or more chars other than ( and )
  - \) - a ) char.
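To check the pattern outside of pandas, here is a minimal sketch using Python's built-in re module. The helper name normalize and the last two sample values are made up for illustration; the first three values are the ones from the question.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r'(?i)unlimi\w*(?:\s*\([^()]*\))?')


def normalize(value: str) -> str:
    """Replace each 'unlimi...' variant, plus an optional (...) suffix, with 'Unlimited'."""
    return PATTERN.sub('Unlimited', value)


# First three values come from the question; the last two are hypothetical.
samples = [
    'Unlimited',
    'Unlimited (on-net)',
    'Unlimited (on-off-net)',
    'UNLIMITED(on-net)',
    '500 MB',
]

for s in samples:
    print(repr(s), '->', repr(normalize(s)))
```

Values that do not contain "unlimi" (such as '500 MB' here) are left unchanged. Note that [^()]* does not handle nested parentheses, so a value like "Unlimited (a (b))" would only be partially replaced.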