Values in my DataFrame look like this:
id val
big_val_167 80
renv_100 100
color_100 200
color_60/write_10 200
I want to remove everything in the id column's values from the first underscore that is followed by a digit onward. The desired result looks like this:
id val
big_val 80
renv 100
color 200
color 200
How can I do that? I know that str.replace() can be used, but I don't understand how to write the regular expression for it.
CodePudding user response:
You can use a regular expression (re.search) to find the first occurrence of an underscore followed by a digit, and then keep only the part of the string before that match.
Code:
import re
import pandas as pd
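def fix_id(value):
    # Find the first occurrence of an underscore followed by a digit.
    digit_search = re.search(r"_\d", value)
    # Leave the value unchanged when it has no such suffix.
    if digit_search is None:
        return value
    return value[:digit_search.start()]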
# Your df
df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                   "val": [80, 100, 200, 200]})
df["id"] = df["id"].apply(fix_id)
print(df)
Output:
        id  val
0  big_val   80
1     renv  100
2    color  200
3    color  200
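Since the question asked about str.replace() specifically, here is a minimal sketch of that route, assuming the same sample data as above. The pattern _\d.* matches an underscore followed by a digit and then everything up to the end of the string, and the match is replaced with an empty string. Passing regex=True explicitly is the safe choice, because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({"id": ["big_val_167", "renv_100", "color_100", "color_60/write_10"],
                   "val": [80, 100, 200, 200]})

# "_\d" is an underscore followed by a digit; ".*" consumes the rest of the string.
# Values without such a suffix are left unchanged.
df["id"] = df["id"].str.replace(r"_\d.*", "", regex=True)
print(df)
```

This avoids calling a Python function per row with apply, and it needs no special handling for ids that have no numeric suffix, since a pattern that does not match simply replaces nothing.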