How to sum OHE'd columns in dataframe?

Time:09-27

I have a df with a few categorical columns. I applied one-hot encoding (OHE) to convert them into binary columns, but I also want to sum those columns per user as they are converted. For example:

user | product 
2    | A
2    | A
3    | B


Currently:

user | product_A | product_B | product_c
2    | 1         | 0         | 0
2    | 1         | 0         | 0
3    | 0         | 1         | 0

But I want:

user | product_A | product_B | product_c
2    | 2         | 0         | 0
3    | 0         | 1         | 0

How can I do the sum as the last step? Thanks

CodePudding user response:

Here are three ways to do this.

With your one-hot-encoded df:

result = df.groupby(by=["user"]).sum()
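
For reference, here is a minimal end-to-end sketch of that first approach on the sample data from the question. It assumes the encoding was done with pd.get_dummies and that the columns are named "user" and "product"; the variable names are only illustrative, and your own encoding step may differ.

```python
import pandas as pd

# Sample data from the question (column names assumed).
df = pd.DataFrame({"user": [2, 2, 3], "product": ["A", "A", "B"]})

# One-hot encode "product"; dtype=int yields 0/1 instead of booleans.
ohe = pd.get_dummies(df, columns=["product"], dtype=int)

# Sum the indicator columns for each user.
result = ohe.groupby(by=["user"]).sum()
print(result)
```

Only categories that actually occur in the data get a column, so this sample produces product_A and product_B but no third product column.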

Or with your original dataframe:

# Count each (user, product) pair, then spread the products into columns.
result = (
    df.groupby("user")
      .value_counts()
      .unstack(level="product")
      .fillna(0)
      .astype(int)
)

And also with the original dataframe:

# n=1 marks each row as one occurrence; summing n counts rows per (user, product).
result = (
    df.assign(n=1)
      .pivot_table(
          index="user", columns="product",
          aggfunc="sum", fill_value=0
      )
      .loc[:, "n"]
)
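
As a quick sanity check, the two original-dataframe approaches can be run side by side on the sample data from the question. This is only a sketch: it assumes the columns are named "user" and "product" and a pandas version recent enough to provide DataFrameGroupBy.value_counts (1.4 or later, as far as I know). The names counts and pivot are illustrative.

```python
import pandas as pd

# Sample data from the question (column names assumed).
df = pd.DataFrame({"user": [2, 2, 3], "product": ["A", "A", "B"]})

# Approach 2: count (user, product) pairs, then unstack products into columns.
counts = (
    df.groupby("user")
      .value_counts()
      .unstack(level="product")
      .fillna(0)
      .astype(int)
)

# Approach 3: add a helper column of ones and sum it in a pivot table.
pivot = (
    df.assign(n=1)
      .pivot_table(index="user", columns="product", aggfunc="sum", fill_value=0)
      .loc[:, "n"]
)

# Both tables should hold the same counts.
print((counts.to_numpy() == pivot.to_numpy()).all())
```

Both results use the bare category labels (A, B) as column names; counts.add_prefix("product_") should give the product_A style names shown in the question.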

CodePudding user response:

Just run the following on the df you get after OHE:

df.groupby('user', as_index=False).agg('sum')
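
To illustrate what as_index=False changes, here is a small sketch on a hand-built one-hot-encoded frame (the frame and the names flat and indexed are assumptions for illustration). With as_index=False, "user" stays a regular column, which matches the layout asked for in the question; without it, "user" becomes the index.

```python
import pandas as pd

# Hand-built stand-in for the one-hot-encoded df from the question.
ohe = pd.DataFrame({
    "user": [2, 2, 3],
    "product_A": [1, 1, 0],
    "product_B": [0, 0, 1],
})

# "user" is kept as an ordinary column.
flat = ohe.groupby("user", as_index=False).agg("sum")

# Default behaviour: "user" becomes the index of the result.
indexed = ohe.groupby("user").sum()

print(list(flat.columns))     # includes "user"
print(list(indexed.columns))  # "user" is in the index instead
```

Calling indexed.reset_index() should give the same table as flat, so either form can be converted to the other.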