I have a dataframe
Month | Acct_id| Sku
2020-01-01 |1 |book
2020-01-02 |2 |phone
2020-01-01 |3 |book
Now, I want to create dummies of the "Sku" column and sum of the resulting binary values when grouping by month. Additionally, I also want to get unique count for the "Acct_id" column like this:
Month | book | phone | total_accounts
2020-01-01 | 2 | 0 | 2
2020-01-02 | 0 | 1 | 1
I am using
dummies=df.set_index('Month')['Sku'].str.get_dummies().sum(level=0).reset_index()
But the output gives only binary values and it not summing. Also, it does NOT grab the account column the way I want it! How do I tweak this?
CodePudding user response:
Use crosstab
res = pd.crosstab(index=df["Month"], columns=df["Sku"], margins=True, margins_name="total_counts").drop("total_counts")
print(res)
Output
Sku book phone total_counts
Month
2020-01-01 2 0 2
2020-01-02 0 1 1
If you need to strictly match the output, just do as @ddejohn suggested:
res = pd.crosstab(index=df["Month"], columns=df["Sku"], margins=True, margins_name="total_counts").drop("total_counts")
res = res.reset_index().rename_axis(None, axis=1)
print(res)
Output
Month book phone total_counts
0 2020-01-01 2 0 2
1 2020-01-02 0 1 1
CodePudding user response:
Let us try
out = pd.crosstab(df.Month, df.Sku).join(df.groupby('Month')['Acct_id'].nunique().to_frame('total_accounts'))
Out[52]:
book phone total_accounts
Month
2020-01-01 2 0 2
2020-01-02 0 1 1