Home > Enterprise >  How to sum values instead of creating binary values when creating pandas dummies?
How to sum values instead of creating binary values when creating pandas dummies?

Time:10-12

I have a dataframe

Month      | Acct_id| Sku    | Cost $
2020-01-01 |1       |book    |10
2020-01-02 |2       |phone   |100
2020-01-01 |3       |book    |5

Now, I want to create dummies of "Sku" but instead of binary values, I want the values to be the sum of "Cost" and the unique count of account like this:

Month      | book   | phone | total_accounts
2020-01-01 | $15    | $0     | 2
2020-01-02 | $0     | $100   | 1

I am using

dummies=df.set_index('Month')['Sku'].str.get_dummies().sum(level=0).reset_index()

But the output gives only binary values and also, it does NOT grab the cost and account column at all! How do I tweak this?

CodePudding user response:

The main idea here get the groupby and pivot , however, with unstack

out = df.groupby(['Month','Sku']).\
      agg(nan= ('Cost$','sum'),total_accounts= ('Acct_id','nunique')).\
             set_index('total_accounts',append=True)['nan'].unstack(level='Sku',fill_value=0).reset_index()
Out[35]: 
Sku       Month  total_accounts  book  phone
0    2020-01-01               2    15      0
1    2020-01-02               1     0    100
  • Related