I have a dataframe with one column of strings. I need to clean this column by removing any text inside parentheses from each string. For example:
Names |
---|
Mike (5 friends) |
Tom |
Joe (2 friends) |
Alex |
I want df to look like this after:
Names |
---|
Mike |
Tom |
Joe |
Alex |
Currently my code looks like this:
import re
for i in df["Names"]:
    if i contains r"\([^()]*\)"
        i = re.sub(r"\([^()]*\)", "", i)
But I am getting a syntax error on the if statement line. What do I need to set my if statement's "contains" condition to in order to make this work, while staying dynamic for the number inside the parentheses? Thanks.
I used the following code on an isolated string and it worked as I wanted. I'm having trouble understanding why this same line wouldn't also work as my "contains" condition:
re.sub(r"\([^()]*\)", ""
CodePudding user response:
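Use str.replace with regex=True (since pandas 2.0 the pattern is treated as a literal string by default):
df["Names"] = df["Names"].str.replace(r'\s*\(.*?\)$', '', regex=True)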
Here is a regex demo showing that the above replacement logic is working.
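To see the vectorized approach end to end, here is a minimal sketch that rebuilds the example column from the question and strips the parenthesized part. It uses the question's own pattern with an optional leading-whitespace prefix; the trailing .str.strip() is an extra safety step and an assumption, not part of the answer above.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({"Names": ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]})

# \s* also removes the space before "(", [^()]* matches any text inside the parentheses.
# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching.
df["Names"] = df["Names"].str.replace(r"\s*\([^()]*\)", "", regex=True).str.strip()

print(df["Names"].tolist())  # ['Mike', 'Tom', 'Joe', 'Alex']
```

No loop or "contains" check is required here: rows without parentheses are simply left unchanged.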
CodePudding user response:
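Try this. It walks through the string character by character and trims it at the first opening parenthesis.
def CustomParser(word):
    # Index of the first "(", or -1 if the string has none
    trim_position = -1
    for j in range(len(word)):
        if word[j] == "(":
            trim_position = j
            break
    # No parenthesis found: return the string unchanged apart from stripping
    if trim_position == -1:
        return word.strip()
    # Keep everything before the "(" and drop surrounding whitespace
    return word[:trim_position].strip()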
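The same "cut at the first opening parenthesis" idea can also be sketched without an explicit loop by using str.partition. The helper name trim_at_paren is hypothetical and used only for illustration. Note that, unlike the regex versions, this drops everything after the "(" as well, which is fine for the example data but is an assumption about the input.

```python
def trim_at_paren(text):
    """Hypothetical helper: keep only the text before the first '('."""
    # str.partition returns (before, separator, after); when "(" is absent,
    # "before" is the whole string, so names without parentheses pass through.
    before, _, _ = text.partition("(")
    return before.strip()


names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]
print([trim_at_paren(n) for n in names])  # ['Mike', 'Tom', 'Joe', 'Alex']
```

On a dataframe, a function like this could be applied to the column with df["Names"].map(trim_at_paren).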
CodePudding user response:
df['Names'] = df['Names'].str.replace(r"\(.[^\)]+.", '', regex=True).str.strip()
OR
import re
lst = ["Mike (5 friends)",
       "Tom",
       "Joe (2 friends)",
       "Alex",]
new_lst = [re.sub(r'\(.[^\)]+.', '', a).strip() for a in lst]
print(new_lst)
OUTPUT
['Mike', 'Tom', 'Joe', 'Alex']
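As for the "contains" condition asked about in the question: Python has no contains keyword, which is why that line is a syntax error. The regex equivalent is re.search, though the check is optional because re.sub returns the string unchanged when nothing matches. The sketch below illustrates both points; the names PAREN and clean are hypothetical.

```python
import re

# Pattern from the question, with optional leading whitespace added
PAREN = re.compile(r"\s*\([^()]*\)")


def clean(name):
    # re.search plays the role of a "contains" test for a regex
    if PAREN.search(name):
        return PAREN.sub("", name).strip()
    return name


names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]
cleaned = [clean(n) for n in names]
print(cleaned)  # ['Mike', 'Tom', 'Joe', 'Alex']

# The guard is redundant: sub alone gives the same result
print([PAREN.sub("", n) for n in names] == cleaned)  # True
```

Note also that reassigning the loop variable i inside a for loop over df["Names"] does not write anything back to the dataframe; the result has to be assigned to the column.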