I have a dataframe as such:
                           offer_id   hurdle  hurdle_lvl  reward_value
0  5c0c1545a944456aa28dcf578e0cbdd2  35000.0           1         500.0
1  5c0c1545a944456aa28dcf578e0cbdd2  40000.0           2        1500.0
2  5c0c1545a944456aa28dcf578e0cbdd2  45000.0           3        3000.0
3  f21306541ae046edbdf0a79daea3a005    500.0           1          25.0
4  f21306541ae046edbdf0a79daea3a005    750.0           2         100.0
5  f21306541ae046edbdf0a79daea3a005  25000.0           2        1500.0
I need to reshape it into the following:
                           offer_id  hurdle_1  hurdle_2  hurdle_3  reward_1  reward_2  reward_3
0  5c0c1545a944456aa28dcf578e0cbdd2   35000.0   40000.0   45000.0     500.0    1500.0    3000.0
1  f21306541ae046edbdf0a79daea3a005     500.0     750.0   25000.0      25.0     100.0    1500.0
In other words, I want to turn the hurdle and reward_value rows into columns, keyed by the hurdle_lvl column. Any help is greatly appreciated.
I tried pivot_table:
y.pivot_table(index=y.groupby('hurdle_lvl').cumcount(), columns='hurdle_lvl', values=['hurdle','reward_value'])
But this gives me a dataframe like the following:
             hurdle                    reward_value
hurdle_lvl        1        2        3        1        2        3
0           35000.0  40000.0  45000.0    500.0   1500.0   3000.0
1             500.0    750.0  30000.0     25.0    100.0   1500.0
The problem is that I lose the offer_id mapping. Is there any way to keep it in the pivoted table?
CodePudding user response:
Use pivot_table with offer_id as the index, and sum the values that share the same offer_id and hurdle_lvl:
out = df.astype({'hurdle_lvl': str}) \
        .pivot_table(['hurdle', 'reward_value'], 'offer_id', 'hurdle_lvl',
                     aggfunc='sum', fill_value=0)
out.columns = out.columns.to_flat_index().str.join('_')
Output:
>>> out
                                  hurdle_1  hurdle_2  hurdle_3  reward_value_1  reward_value_2  reward_value_3
offer_id
5c0c1545a944456aa28dcf578e0cbdd2     35000     40000     45000             500            1500            3000
f21306541ae046edbdf0a79daea3a005       500     25750         0              25            1600               0
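Note that the second offer has two rows with hurdle_lvl 2 in the posted data, which is why hurdle_2 and reward_value_2 are summed and the level 3 columns are 0. If those rows were really meant to be levels 2 and 3, as the desired output in the question suggests, one possible variant is to renumber the levels by position within each offer before pivoting. This is only a sketch under that assumption: the names ordered, lvl and wide are illustrative, and it also assumes that sorting by hurdle within an offer gives the intended level order.

```python
import pandas as pd

# Sample data copied from the question (row 5 keeps hurdle_lvl 2 as posted).
df = pd.DataFrame({
    "offer_id": ["5c0c1545a944456aa28dcf578e0cbdd2"] * 3
                + ["f21306541ae046edbdf0a79daea3a005"] * 3,
    "hurdle": [35000.0, 40000.0, 45000.0, 500.0, 750.0, 25000.0],
    "hurdle_lvl": [1, 2, 3, 1, 2, 2],
    "reward_value": [500.0, 1500.0, 3000.0, 25.0, 100.0, 1500.0],
})

# Assumption: the level is the row's rank by hurdle within its offer,
# so it is recomputed with cumcount() instead of read from hurdle_lvl.
ordered = df.sort_values(["offer_id", "hurdle"])
lvl = (ordered.groupby("offer_id").cumcount() + 1).astype(str)

wide = (
    ordered.assign(lvl=lvl)
           .rename(columns={"reward_value": "reward"})
           .pivot(index="offer_id", columns="lvl", values=["hurdle", "reward"])
)

# Flatten the ('hurdle', '1') style column tuples into 'hurdle_1', ...
wide.columns = ["_".join(col) for col in wide.columns]

# Turn offer_id back into a regular column, as in the desired layout.
wide = wide.reset_index()
print(wide)
```

Because each (offer_id, lvl) pair is now unique, plain pivot is enough and no aggregation is needed. If the duplicate level in the data is intentional, the pivot_table approach with aggfunc='sum' above is the one to keep.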