How to replace a dynamic text phrase within parentheses in a string?

I have a dataframe that has one column of strings. I need to clean this column and remove any text within parentheses from each string. For example:

Names
Mike (5 friends)
Tom
Joe (2 friends)
Alex

I want df to look like this after:

Names
Mike
Tom
Joe
Alex

Currently my code looks like this

import re

for i in df["Names"]:
    if i contains r"\([^()]*\)"
        i = re.sub(r"\([^()]*\)", "", i)

But I am getting a syntax error on the if statement line. What do I need to set my if statement's "contains" condition to in order to make this work, while staying dynamic for the number inside the parentheses? Thanks.

I used the following code on an isolated string and it worked as I wanted. I'm having trouble understanding why this same line wouldn't also work as my "contains" condition:

re.sub(r"\([^()]*\)", ""

CodePudding user response：

Use str.replace:

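# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
df["Names"] = df["Names"].str.replace(r'\s*\(.*?\)$', '', regex=True)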

Here is a regex demo showing that the above replacement logic is working.
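
As a minimal sketch of what this pattern does, assuming plain Python strings rather than a DataFrame column (the list `names` and the helper `strip_trailing_parens` are hypothetical, for illustration only): `re.sub` returns the string unchanged when nothing matches, so a separate "contains" check should not be needed. If a check is wanted anyway, `re.search` is one way to write it.

```python
import re

# Pattern from the answer above: optional whitespace, then a parenthesized
# group anchored at the end of the string.
PATTERN = re.compile(r"\s*\(.*?\)$")

def strip_trailing_parens(text):
    """Hypothetical helper: remove a trailing '(...)' group, if any."""
    return PATTERN.sub("", text)

# Sample data taken from the question
names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]

cleaned = [strip_trailing_parens(n) for n in names]
print(cleaned)     # ['Mike', 'Tom', 'Joe', 'Alex']

# A "contains" test, if one is needed, can be expressed with re.search.
has_parens = [bool(PATTERN.search(n)) for n in names]
print(has_parens)  # [True, False, True, False]
```

Note that because of the `$` anchor, this only removes a parenthesized group at the very end of the string.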

CodePudding user response：

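Try this. It goes through the string letter by letter and trims it at the first opening parenthesis it finds.

def CustomParser(word):
    trim_position = -1
    for j in range(len(word)):
        letter = word[j]
        if letter == "(":
            trim_position = j
            break  # stop at the first opening parenthesis

    # No parenthesis found: return the whole string unchanged
    if trim_position == -1:
        return word.strip()
    return word[:trim_position].strip()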
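
For comparison, here is a shorter sketch of the same idea (cut the string at the first opening parenthesis) using the built-in `str.partition`. The function name `trim_at_paren` is hypothetical. Unlike the regex answers, this drops everything after the first "(", including any text that follows the closing parenthesis.

```python
def trim_at_paren(word):
    """Hypothetical helper: keep only the text before the first '('."""
    # partition returns (before, separator, after); if "(" is absent,
    # `before` is the whole string and the other two parts are empty.
    before, _sep, _after = word.partition("(")
    return before.strip()

# Sample data taken from the question
names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]
print([trim_at_paren(n) for n in names])  # ['Mike', 'Tom', 'Joe', 'Alex']
```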

CodePudding user response：


# Remove the parenthesized text, then strip the leftover trailing space
df['Names'] = df['Names'].str.replace(r"\(.[^\)]+.", '', regex=True).str.strip()

import re
lst = ["Mike (5 friends)",
"Tom",
"Joe (2 friends)",
"Alex",]

new_lst = [re.sub(r'\(.[^\)]+.', '', a).strip() for a in lst]
print(new_lst)

OUTPUT

['Mike', 'Tom', 'Joe', 'Alex']