I have the following table
df = pd.DataFrame({'favs':{0:'chicken_panfry1_t360_ketchup',
1:'chicken_bake2_t450_out_bbq',
2:'chicken_boiled2_season_gravy'}})
That looks like this
favs
0 chicken_panfry1_t360_ketchup
1 chicken_bake2_t450_out_bbq
2 chicken_boiled2_season_gravy
I would like to split the column at the last underscore to create 2 new columns that look like this.
favs recipe sauce
0 chicken_panfry1_t360_ketchup chicken_panfry1_t360 ketchup
1 chicken_bake2_t450_out_bbq chicken_bake2_t450_out bbq
2 chicken_boiled2_season_gravy chicken_boiled2_season gravy
This is what I've tried
df[['recipe','sauce']]=df['favs'].str.split(r'.*_', expand=True)
This creates the sauce column correctly but the recipe column is blank. It looks like this. Unsure of how to correct it.
favs recipe sauce
0 chicken_panfry1_t360_ketchup ketchup
1 chicken_bake2_t450_out_bbq bbq
2 chicken_boiled2_season_gravy gravy
CodePudding user response:
You need Series.str.extract
with the (.*)_(.*)
regex pattern:
df[['recipe','sauce']]=df['favs'].str.extract(r'(.*)_(.*)', expand=True)
See the regex demo.
The (.*)_(.*)
regex matches and captures the part before the last _
into Group 1 (with the first (.*)
) and the part after last _
into the second column (with the second (.*)
).
CodePudding user response:
No need for a regex, you can simply rsplit
limiting to 1 split:
df[['recipe','sauce']] = df['favs'].str.rsplit('_', n=1, expand=True)
output:
favs recipe sauce
0 chicken_panfry1_t360_ketchup chicken_panfry1_t360 ketchup
1 chicken_bake2_t450_out_bbq chicken_bake2_t450_out bbq
2 chicken_boiled2_season_gravy chicken_boiled2_season gravy