I'm very new to pandas and would like some guidance from you smart folks.
df.head():
feature_category
transgender_gender
725-750_crif_score
<25_age
<575_crif_score
I want to make a separate column containing the string after the first underscore.
Desired df:
feature_category     feature_name
transgender_gender   gender
725-750_crif_score   crif_score
<25_age              age
<575_crif_score      crif_score
How can I achieve the desired result?
Answer:
You could use the str.split method with n=1, which limits the number of splits to one, so everything after the first underscore stays together. Then use the str accessor to select the second part:
df['feature_name'] = df['feature_category'].str.split('_', n=1).str[1]
Output:
feature_category feature_name
0 transgender_gender gender
1 725-750_crif_score crif_score
2 <25_age age
3 <575_crif_score crif_score
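To see why n=1 matters, here is a minimal sketch on the sample values from the question. Without a limit, '725-750_crif_score' is split at every underscore, so .str[1] returns only 'crif'. The 'second_piece' column is a hypothetical name used only for the comparison.

```python
import pandas as pd

# Sample values taken from the question
df = pd.DataFrame({'feature_category': [
    'transgender_gender', '725-750_crif_score', '<25_age', '<575_crif_score']})

# Pass n by keyword: split only at the first underscore
df['feature_name'] = df['feature_category'].str.split('_', n=1).str[1]

# Hypothetical comparison column: without n, every underscore splits,
# so .str[1] keeps only the second piece ('crif' instead of 'crif_score')
df['second_piece'] = df['feature_category'].str.split('_').str[1]

print(df)
```

Passing n by keyword works across pandas versions, whereas recent releases no longer accept it positionally.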
Answer:
Use str.extract:
df['feature_name'] = df['feature_category'].str.extract(r'_(.*)', expand=False)
print(df)
# Output
feature_category feature_name
0 transgender_gender gender
1 725-750_crif_score crif_score
2 <25_age age
3 <575_crif_score crif_score
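The pattern _(.*) matches the first underscore and captures every character after it. Because the search scans left to right, the capture starts at the first underscore even when the string contains more than one.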
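A minimal sketch of the extract approach, assuming a hypothetical extra value 'nounderscore' to show what happens when a row has no underscore: the regex finds no match, so that row becomes NaN. With expand=False and a single capture group, str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample values from the question, plus one hypothetical value with no underscore
s = pd.Series(['transgender_gender', '725-750_crif_score', '<25_age',
               '<575_crif_score', 'nounderscore'])

# expand=False + one capture group -> a Series of captured text (NaN if no match)
names = s.str.extract(r'_(.*)', expand=False)

print(names)
```

The split-based answer behaves the same way on such rows, since .str[1] on a one-element list also yields NaN.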