Home > Software engineering >  How to create sum of binary dummy variable grouped by month in pandas?
How to create sum of binary dummy variable grouped by month in pandas?

Time:10-13

I have a dataframe

Month      | Acct_id| Sku    
2020-01-01 |1       |book   
2020-01-02 |2       |phone 
2020-01-01 |3       |book  

Now, I want to create dummies of the "Sku" column and sum of the resulting binary values when grouping by month. Additionally, I also want to get unique count for the "Acct_id" column like this:

Month      | book   | phone | total_accounts 
2020-01-01 | 2      | 0     | 2
2020-01-02 | 0      | 1     | 1

I am using

dummies=df.set_index('Month')['Sku'].str.get_dummies().sum(level=0).reset_index()

But the output gives only binary values and it not summing. Also, it does NOT grab the account column the way I want it! How do I tweak this?

CodePudding user response:

Use crosstab

res = pd.crosstab(index=df["Month"], columns=df["Sku"], margins=True, margins_name="total_counts").drop("total_counts")
print(res)

Output

Sku         book  phone  total_counts
Month                                
2020-01-01     2      0             2
2020-01-02     0      1             1

If you need to strictly match the output, just do as @ddejohn suggested:

res = pd.crosstab(index=df["Month"], columns=df["Sku"], margins=True, margins_name="total_counts").drop("total_counts")
res = res.reset_index().rename_axis(None, axis=1)
print(res)

Output

        Month  book  phone  total_counts
0  2020-01-01     2      0             2
1  2020-01-02     0      1             1

CodePudding user response:

Let us try

out = pd.crosstab(df.Month, df.Sku).join(df.groupby('Month')['Acct_id'].nunique().to_frame('total_accounts'))
Out[52]: 
            book  phone  total_accounts
Month                                  
2020-01-01     2      0               2
2020-01-02     0      1               1
  • Related