I have a dataframe that has one column of strings. I need to clean this column and remove any text within parentheses from each string. For example:
Names |
---|
Mike (5 friends) |
Tom |
Joe (2 friends) |
Alex |
I want df to look like this after:
Names |
---|
Mike |
Tom |
Joe |
Alex |
Currently my code looks like this
import re
for i in df["Names"]:
    if i contains r"\([^()]*\)"
        i = re.sub(r"\([^()]*\)", "", i)
But I am getting a syntax error on the if statement line. What do I need to set my if statement's "contains" condition to in order to make this work, while staying dynamic for the number inside the parentheses? Thanks
I used the following code on an isolated string and it worked as I wanted. I'm having trouble understanding why this same line wouldn't also work as my "contains" condition
re.sub(r"\([^()]*\)", "", i)
CodePudding user response:
Use str.replace:
df["Names"] = df["Names"].str.replace(r'\s*\(.*?\)$', '', regex=True)
Here is a regex demo showing that the above replacement logic is working.
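As a quick local check of that pattern, here is a minimal sketch that uses Python's re module directly as a stand-in for the pandas call. The list `names` is a hypothetical copy of the sample column from the question, not part of the original answer.

```python
import re

# Hypothetical copy of the "Names" column from the question.
names = ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]

# Same pattern as in the answer: optional whitespace, then a
# parenthesised group that runs to the end of the string.
pattern = r"\s*\(.*?\)$"

cleaned = [re.sub(pattern, "", name) for name in names]
print(cleaned)
```

Names without a trailing parenthesised group do not match the pattern, so they pass through unchanged.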
CodePudding user response:
df['Names'] = df['Names'].str.replace(r"\(.[^\)]+.", '', regex=True).str.strip()
OR
import re
lst = ["Mike (5 friends)",
       "Tom",
       "Joe (2 friends)",
       "Alex",]
new_lst = [re.sub(r'\(.[^\)]+.', '', a).strip() for a in lst]
print(new_lst)
OUTPUT
['Mike', 'Tom', 'Joe', 'Alex']
CodePudding user response:
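Try this. It goes through the string letter by letter and trims it off where the first opening parenthesis is found.
def CustomParser(word):
    # Default to the full length so names without "(" are left intact.
    trim_position = len(word)
    for j in range(len(word)):
        letter = word[j:j+1]
        if letter == "(":
            trim_position = j
            break  # stop at the first opening parenthesis
    return word[:trim_position].strip()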
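The same "cut at the first opening parenthesis" idea can also be sketched without an explicit loop, using the built-in str.partition. The function name `strip_parenthetical` is hypothetical and chosen only for illustration.

```python
def strip_parenthetical(word):
    # partition returns (text before "(", "(", text after); when there is
    # no "(" the first element is simply the whole string.
    before, _, _ = word.partition("(")
    return before.strip()


for sample in ["Mike (5 friends)", "Tom"]:
    print(repr(strip_parenthetical(sample)))
```

Assuming the dataframe from the question, something like df["Names"].map(strip_parenthetical) should apply it to the whole column. Note that, like the loop above, this drops everything after the first "(" rather than only the parenthesised part.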
CodePudding user response:
You need to use
df["Names"] = df["Names"].str.replace(r'\s*\([^()]*\)$', '', regex=True)
If there can be trailing whitespaces:
df["Names"] = df["Names"].str.replace(r'\s*\([^()]*\)\s*$', '', regex=True)
Details:
\s* - zero or more whitespaces
\( - a ( char
[^()]* - zero or more chars other than ( and )
\) - a ) char
$ - end of string.
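The breakdown above can be written directly into the pattern with re.VERBOSE, which lets each piece carry its own comment. This is only a sketch with Python's re module. The names `PAREN_SUFFIX` and `names` are hypothetical, and the pattern is the trailing-whitespace variant.

```python
import re

# Verbose form of the pattern; whitespace and "#" comments inside the
# pattern are ignored by the regex engine in this mode.
PAREN_SUFFIX = re.compile(
    r"""
    \s*      # zero or more whitespace characters
    \(       # a literal "("
    [^()]*   # zero or more characters other than "(" and ")"
    \)       # a literal ")"
    \s*      # optional trailing whitespace
    $        # end of string
    """,
    re.VERBOSE,
)

# Hypothetical sample data, including one value with trailing spaces.
names = ["Mike (5 friends)", "Tom", "Joe (2 friends)  ", "Alex"]
cleaned = [PAREN_SUFFIX.sub("", name) for name in names]
print(cleaned)
```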
NOTE on regex=True:
According to the pandas 1.2.0 release notes:
The default value of regex for Series.str.replace() will change from True to False in a future release. In addition, single character regular expressions will not be treated as literal strings when regex=True is set (GH24804).
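To see why the explicit flag matters, here is a small sketch that runs the same pattern with regex=True and with regex=False. The dataframe is a hypothetical reconstruction of the one in the question.

```python
import pandas as pd

# Hypothetical reconstruction of the dataframe from the question.
df = pd.DataFrame({"Names": ["Mike (5 friends)", "Tom", "Joe (2 friends)", "Alex"]})
pattern = r"\s*\([^()]*\)\s*$"

# Treated as a regular expression: the parenthesised suffix is removed.
as_regex = df["Names"].str.replace(pattern, "", regex=True)

# Treated as a literal string: the pattern text never occurs in the
# data, so nothing is replaced.
as_literal = df["Names"].str.replace(pattern, "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

Passing regex explicitly means the result does not depend on which default the installed pandas version uses.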