I have a df with a few categorical columns. I applied one-hot encoding (OHE) to convert them into binary columns, but I also want to sum those columns per user as they are converted. For example:
user | product
2 | A
2 | A
3 | B
Currently:
user | product_A | product_B | product_c
2 | 1 | 0 | 0
2 | 1 | 0 | 0
3 | 0 | 1 | 0
But I want:
user | product_A | product_B | product_c
2 | 2 | 0 | 0
3 | 0 | 1 | 0
How can I sum the encoded columns per user as the last step? Thanks
CodePudding user response:
Three ways to do this.
With your one-hot encoded df:
result = df.groupby(by=["user"]).sum()
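As a minimal end-to-end sketch of this first option, assuming the encoding was done with pd.get_dummies (the question does not say which encoder was used) and using the sample rows from the question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"user": [2, 2, 3], "product": ["A", "A", "B"]})

# One-hot encode the categorical column (assumption: get_dummies was the encoder).
# dtype=int gives 0/1 columns instead of booleans.
encoded = pd.get_dummies(df, columns=["product"], dtype=int)

# Collapse to one row per user by summing the indicator columns.
result = encoded.groupby(by=["user"]).sum()
print(result)
```

Only the categories present in the data (A and B here) get a column, and user ends up as the index of the result.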
Or with your original dataframe:
result = (
    df.groupby("user")
    .value_counts()
    .unstack(level="product")  # the column in the question is named "product"
    .fillna(0)
    .astype(int)
)
And also with the original dataframe:
result = (
    df.assign(n=1)  # helper column of ones to sum
    .pivot_table(
        index="user", columns="product",
        aggfunc="sum", fill_value=0
    )
    .loc[:, "n"]  # drop the outer "n" column level
)
CodePudding user response:
Just run the following on the df you obtained after OHE:
df.groupby('user', as_index=False).agg('sum')
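The only difference from a plain groupby(...).sum() is as_index=False, which keeps user as a regular column instead of moving it into the index. A small sketch, with the encoded frame typed in by hand from the question's "Currently" table:

```python
import pandas as pd

# The one-hot encoded frame from the question, entered manually.
encoded = pd.DataFrame(
    {
        "user": [2, 2, 3],
        "product_A": [1, 1, 0],
        "product_B": [0, 0, 1],
        "product_c": [0, 0, 0],
    }
)

# as_index=False: "user" stays a normal column, rows get a default 0..n-1 index.
flat = encoded.groupby("user", as_index=False).agg("sum")

# Default: "user" becomes the index of the result.
indexed = encoded.groupby("user").sum()

print(flat)
print(indexed)
```

Use whichever shape is more convenient downstream; indexed.reset_index() converts the second form into the first.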