Home > Blockchain >  Pandas melt dataframe while separating value columns
Pandas melt dataframe while separating value columns

Time:05-12

I have a pandas dataframe like so:

id  cost  val1  val1_avg  val1_tot  val2  val2_avg  val2_tot  val3  val3_avg  val3_tot  
1   5     29    24        45        15    18        78        25    32        56

Now I want to melt this dataframe on id and cost, such that I also separate value, avg value & total value for val1, val2, & val3. So finally I want to get below dataframe:

id  cost variable  value  value_avg  value_tot
1    5   val1      29     24         45
1    5   val3      15     18         78
1    5   val3      25     32         56

How can I achieve this?

CodePudding user response:

You can try pandas.wide_to_long

df.columns = [col.split('_')[1]   '_'   col.split('_')[0] if '_' in col else col for col in df.columns]

out = pd.wide_to_long(df, ["val", "avg_val", "tot_val"], i=["id", "cost"], j='variable').reset_index()
out['variable'] = 'val'   out['variable'].astype(str)

out.columns = [col.split('_')[1]   '_'   col.split('_')[0] if '_' in col else col for col in out.columns]
print(out)

   id  cost variable  val  val_avg  val_tot
0   1     5     val1   29       24       45
1   1     5     val2   15       18       78
2   1     5     val3   25       32       56

CodePudding user response:

Here is an approach:

from itertools import zip_longest
(df
 .set_index(['id', 'cost'])
 .pipe(lambda d: d.set_axis(pd.MultiIndex
                              .from_arrays(zip_longest(*d.columns.str.split('(?=_)'),
                                                       fillvalue='')),
                            axis=1
                           ))
 .rename_axis(columns=('variable', None))
 .stack(0)
 .add_prefix('value')
 .reset_index()
)

Alternative with melt pivot:

(df
 .melt(id_vars=['id', 'cost'])
 .assign(col=lambda d: 'value' d['variable'].str.split('(?=_)').str[1].fillna(''),
         variable=lambda d: d['variable'].str.split('_').str[0],
        )
 .pivot(['id', 'cost', 'variable'], 'col', 'value')
 .reset_index()
)

output:

   id  cost variable  value  value_avg  value_tot
0   1     5     val1     29         24         45
1   1     5     val2     15         18         78
2   1     5     val3     25         32         56
  • Related