Splitting a column into multiple using regular expression-CodePudding

I have the following table

df = pd.DataFrame({'favs':{0:'chicken_panfry1_t360_ketchup',
                          1:'chicken_bake2_t450_out_bbq',
                          2:'chicken_boiled2_season_gravy'}})

That looks like this

   favs
0 chicken_panfry1_t360_ketchup
1 chicken_bake2_t450_out_bbq
2 chicken_boiled2_season_gravy

I would like to split the column at the last underscore to create 2 new columns that look like this.

 favs                              recipe                     sauce
0 chicken_panfry1_t360_ketchup     chicken_panfry1_t360       ketchup
1 chicken_bake2_t450_out_bbq       chicken_bake2_t450_out     bbq
2 chicken_boiled2_season_gravy     chicken_boiled2_season     gravy

This is what I've tried

df[['recipe','sauce']]=df['favs'].str.split(r'.*_', expand=True)

This creates the sauce column correctly but the recipe column is blank. It looks like this. Unsure of how to correct it.

 favs                              recipe       sauce
0 chicken_panfry1_t360_ketchup                  ketchup
1 chicken_bake2_t450_out_bbq                    bbq
2 chicken_boiled2_season_gravy                  gravy

CodePudding user response：

You need Series.str.extract with the (.*)_(.*) regex pattern:

df[['recipe','sauce']]=df['favs'].str.extract(r'(.*)_(.*)', expand=True)

See the regex demo.

The (.*)_(.*) regex matches and captures the part before the last _ into Group 1 (with the first (.*)) and the part after last _ into the second column (with the second (.*)).

CodePudding user response：

No need for a regex, you can simply rsplit limiting to 1 split:

df[['recipe','sauce']] = df['favs'].str.rsplit('_', n=1, expand=True)

output:

                           favs                  recipe    sauce
0  chicken_panfry1_t360_ketchup    chicken_panfry1_t360  ketchup
1    chicken_bake2_t450_out_bbq  chicken_bake2_t450_out      bbq
2  chicken_boiled2_season_gravy  chicken_boiled2_season    gravy