Split a dataframe column on the first occurrence of a character

I'm very new to pandas and would like some guidance from you smart folks.

df.head():

feature_category
transgender_gender
725-750_crif_score
<25_age
<575_crif_score

I want to make a separate column containing the string after the first underscore.

df:

feature_category                feature_name
transgender_gender              gender
725-750_crif_score              crif_score
<25_age                         age
<575_crif_score                 crif_score

Please guide me on how to achieve the desired result.

CodePudding user response：

You could use the str.split method with the parameter n=1, which limits the number of splits to 1, so the string is split only at the first underscore. Then use the str accessor to select the second part:

df['feature_name'] = df['feature_category'].str.split('_', n=1).str[1]

Output:

     feature_category feature_name
0  transgender_gender       gender
1  725-750_crif_score   crif_score
2             <25_age          age
3     <575_crif_score   crif_score
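
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question. The way the frame is constructed is my own assumption about the data. Note that n is passed by keyword, since newer pandas releases no longer accept it positionally.

```python
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "feature_category": [
        "transgender_gender",
        "725-750_crif_score",
        "<25_age",
        "<575_crif_score",
    ]
})

# Split at most once (on the first underscore), then keep the right-hand piece.
df["feature_name"] = df["feature_category"].str.split("_", n=1).str[1]
print(df)
```

A value with no underscore at all would split into a single piece, so .str[1] would give NaN for that row.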

CodePudding user response：

Use str.extract:

df['feature_name'] = df['feature_category'].str.extract('_(.*)')
print(df)

# Output
     feature_category feature_name
0  transgender_gender       gender
1  725-750_crif_score   crif_score
2             <25_age          age
3     <575_crif_score   crif_score

The pattern _(.*) captures every character after the first underscore.
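
As a small sketch of how the pattern behaves, the snippet below runs str.extract on a standalone Series with expand=False, so the result is a Series rather than a one-column DataFrame. The last value, "plain", is a made-up entry (not from the question) added to show what happens when there is no underscore.

```python
import pandas as pd

# First three values come from the question; "plain" is a hypothetical extra row.
s = pd.Series(["transgender_gender", "725-750_crif_score", "<25_age", "plain"])

# The regex matches the first underscore and captures the rest of the string.
# expand=False returns a Series because the pattern has a single capture group.
names = s.str.extract(r"_(.*)", expand=False)
print(names)
```

Rows where the pattern does not match come back as NaN, so they can be filled or filtered afterwards if needed.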