I have a data frame as shown below.
id type d_count d_amt c_count c_amt
1 restaurant 1 10 NaN NaN
1 restaurant 3 8 NaN NaN
1 fees NaN NaN 2 100
1 health 1 20 NaN NaN
1 restaurant NaN NaN 2 15
1 restaurant 1 10 NaN NaN
1 restaurant NaN NaN 1 3
2 wage NaN NaN 1 10
2 wage NaN NaN 3 30
2 fees NaN NaN 2 100
2 health 1 20 NaN NaN
2 fees NaN NaN 2 15
2 fees 1 10 NaN NaN
2 fees 1 3 NaN NaN
From the above I would like to prepare below data frames.
Explanation: For each customer calculate their different type
total amounts
Expected output:
df1:
id restaurant_d_amt_sum health_d_amt_sum fees_d_amt_sum
1 28 20 NaN
2 NaN 20 13
df2:
id fees_c_amt_sum restaurant_c_amt_sum wage_c_amt_sum
1 100 18 NaN
2 115 NaN 40
CodePudding user response:
df.groupby(['id','type'])['d_amt'].sum().unstack().add_suffix('_d_amt_sum').reset_index()
type id fees_d_amt_sum health_d_amt_sum restaurant_d_amt_sum wage_d_amt_sum
0 1 0.0 20.0 28.0 NaN
1 2 13.0 20.0 NaN 0.0
df.groupby(['id','type'])['c_amt'].sum().unstack().add_suffix('_c_amt_sum').reset_index()
type id fees_c_amt_sum health_c_amt_sum restaurant_c_amt_sum wage_c_amt_sum
0 1 100.0 0.0 18.0 NaN
1 2 115.0 0.0 NaN 40.0
CodePudding user response:
df.pivot_table(columns='type', index='id', aggfunc='sum')
Output:
c_amt c_count d_amt d_count
type fees health restaurant wage fees health restaurant wage fees health restaurant wage fees health restaurant wage
id
1 100.0 0.0 18.0 NaN 2.0 0.0 3.0 NaN 0.0 20.0 28.0 NaN 0.0 1.0 5.0 NaN
2 115.0 0.0 NaN 40.0 4.0 0.0 NaN 4.0 13.0 20.0 NaN 0.0 2.0 1.0 NaN 0.0