I have a dataframe
Month | Acct_id| Sku | Cost $
2020-01-01 |1 |book |10
2020-01-02 |2 |phone |100
2020-01-01 |3 |book |5
Now, I want to create dummies of "Sku" but instead of binary values, I want the values to be the sum of "Cost" and the unique count of account like this:
Month | book | phone | total_accounts
2020-01-01 | $15 | $0 | 2
2020-01-02 | $0 | $100 | 1
I am using
dummies=df.set_index('Month')['Sku'].str.get_dummies().sum(level=0).reset_index()
But the output gives only binary values and also, it does NOT grab the cost and account column at all! How do I tweak this?
CodePudding user response:
The main idea here get the groupby
and pivot
, however, with unstack
out = df.groupby(['Month','Sku']).\
agg(nan= ('Cost$','sum'),total_accounts= ('Acct_id','nunique')).\
set_index('total_accounts',append=True)['nan'].unstack(level='Sku',fill_value=0).reset_index()
Out[35]:
Sku Month total_accounts book phone
0 2020-01-01 2 15 0
1 2020-01-02 1 0 100