How to replace a dynamic text phrase within parentheses in a string?

Time:05-12

I have a dataframe that has one column of strings. I need to clean this column and remove any text inside parentheses from each string. For example:

Names
Mike (5 friends)
Tom
Joe (2 friends)
Alex

I want df to look like this after:

Names
Mike
Tom
Joe
Alex

Currently my code looks like this

import re

for i in df["Names"]:
    if i contains r"\([^()]*\)"
        i = re.sub(r"\([^()]*\)", "", i)

But I am getting a syntax error on the if statement line. What should the "contains" condition in my if statement be to make this work, while staying dynamic for the number inside the parentheses? Thanks.

I used the following code on an isolated string and it worked as I wanted. I'm having trouble understanding why this same line wouldn't also work as my "contains" condition:

re.sub(r"\([^()]*\)", "", i)

CodePudding user response:

Use str.replace:

df["Names"] = df["Names"].str.replace(r'\s*\(.*?\)$', '', regex=True)

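The pattern matches optional whitespace followed by a parenthesized group at the end of the string, so "Mike (5 friends)" becomes "Mike". Pass regex=True explicitly, because recent pandas versions treat the pattern as a literal string by default.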
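
As for the syntax error in the question: `contains` is not a Python keyword, so `if i contains ...` cannot be parsed. A regex test is normally written with `re.search`, and the test is not really needed here, since `re.sub` returns the string unchanged when nothing matches. Reassigning the loop variable `i` also would not write anything back to the column. Here is a minimal sketch of those points, using a plain list in place of the dataframe column (the names `PAREN`, `names`, `has_paren` and `cleaned` are only illustrative):

```python
import re

# The pattern from the question: "(" + any run of non-parenthesis characters + ")"
PAREN = re.compile(r"\([^()]*\)")

names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]

# Python has no `contains` keyword; a regex test is spelled with search()
has_paren = [bool(PAREN.search(name)) for name in names]

# The test is optional: sub() returns the string unchanged when nothing matches.
# A new list is built because rebinding a loop variable would not change `names`.
cleaned = [PAREN.sub("", name).strip() for name in names]

print(has_paren)  # [True, False, True, False]
print(cleaned)    # ['Mike', 'Tom', 'Joe', 'Alex']
```

The same reasoning is why the pandas version assigns the result back to df["Names"] instead of looping over the values.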

CodePudding user response:

Try this. It goes through the string letter by letter and trims it where an opening parenthesis is found.

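def CustomParser(word):
    # Index of the first "(", or -1 if there is none
    trim_position = -1
    for j in range(len(word)):
        letter = word[j:j + 1]
        if letter == "(":
            trim_position = j
            break

    # No parenthesis found: return the word as it is
    if trim_position == -1:
        return word.strip()
    return word[:trim_position].strip()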
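
The same trimming idea can probably be written without an explicit loop by using the built-in `str.partition`. This is only a sketch of an alternative, not the code from this answer, and `strip_after_paren` is a made-up name. Note that it drops everything from the first "(" onward, including any text that follows the closing parenthesis:

```python
def strip_after_paren(word):
    # partition() splits at the first "(" and returns (before, separator, after);
    # when there is no "(", `before` is the whole string.
    before, _sep, _after = word.partition("(")
    return before.strip()


names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]
print([strip_after_paren(w) for w in names])  # ['Mike', 'Tom', 'Joe', 'Alex']
```

On a dataframe column, a function like this would typically be applied with df["Names"].map(strip_after_paren).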

CodePudding user response:


df['Names'] = df['Names'].str.replace(r"\(.[^\)]+.", '', regex=True).str.strip()

OR

import re
lst = ["Mike (5 friends)",
"Tom",
"Joe (2 friends)",
"Alex",]

new_lst = [re.sub(r'\(.[^\)]+.', '', a).strip() for a in lst]
print(new_lst)

OUTPUT

['Mike', 'Tom', 'Joe', 'Alex']
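
One thing to be aware of, assuming the pattern in this answer is meant to read `\(.[^\)]+.` (with a `+` quantifier): it needs at least two characters between the parentheses, so a short group such as "(5)" is left in place. The simpler pattern from the question, `\([^()]*\)`, handles that case too. A small comparison, where "Bob (5)" is a made-up extra sample and the variable names are only illustrative:

```python
import re

ANSWER_PATTERN = r"\(.[^\)]+."   # assumed reading of the pattern in this answer
SIMPLE_PATTERN = r"\([^()]*\)"   # the pattern from the question

samples = ["Mike (5 friends)", "Bob (5)"]  # "Bob (5)" is a hypothetical case

with_answer = [re.sub(ANSWER_PATTERN, "", s).strip() for s in samples]
with_simple = [re.sub(SIMPLE_PATTERN, "", s).strip() for s in samples]

print(with_answer)  # ['Mike', 'Bob (5)']
print(with_simple)  # ['Mike', 'Bob']
```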