I have a data frame as shown below
df:
cust_id product
1 tv
1 phone
2 bat
2 ball
3 bat
4 ball
4 bat
4 tv
4 phone
5 tv
6 bat
7 bat
7 ball
7 tv
7 phone
8 phone
8 tv
from the above I would like to prepare below which basically perform groupby cust_id
and one hot encoding on the column product
.
Expected output:
product ball bat phone tv
cust_id
1 0 0 1 1
2 1 1 0 0
3 0 1 0 0
4 1 1 1 1
5 0 0 0 1
6 0 1 0 0
7 1 1 1 1
8 0 0 1 1
I have tried following code but that did not work.
try-1:
one_hot = pd.get_dummies(df.groupby('cust_id')['product'])
one_hot
try-2
df.pivot_table(index=['cust_id'],
columns='product',
values='product')
try-3
df1 = (df.sort_values(['cust_id', 'product'])
.groupby('cust_id')['product'].agg(list)
.reset_index(name='product_list')
)
df1
CodePudding user response:
pd.crosstab(df['cust_id'],df['product'])
product ball bat phone tv
cust_id
1 0 0 1 1
2 1 1 0 0
3 0 1 0 0
4 1 1 1 1
5 0 0 0 1
6 0 1 0 0
7 1 1 1 1
8 0 0 1 1
CodePudding user response:
Because hot encoding
alwayes return 0,1
as necessary aggregate max
:
one_hot = pd.get_dummies(df.set_index('cust_id')['product']).groupby(level=0).max()
And if count values per columns need clip
for maximal 1
value:
one_hot = pd.crosstab(df["cust_id"], df["product"]).clip(upper=1)
Sample:
print (df)
cust_id product
0 1 ball
1 1 ball
2 1 ball
3 1 phone
4 2 bat
5 2 ball
6 3 bat
7 4 ball
8 4 bat
9 4 bat
10 4 tv
11 4 phone
one_hot = pd.get_dummies(df.set_index('cust_id')['product']).groupby(level=0).max()
print (one_hot)
ball bat phone tv
cust_id
1 1 0 1 0
2 1 1 0 0
3 0 1 0 0
4 1 1 1 1
#count values - possible another integers like 0,1
print (pd.crosstab(df["cust_id"], df["product"]))
product ball bat phone tv
cust_id
1 3 0 1 0
2 1 1 0 0
3 0 1 0 0
4 1 2 1 1
one_hot = pd.crosstab(df["cust_id"], df["product"]).clip(upper=1)
print (one_hot)
product ball bat phone tv
cust_id
1 1 0 1 0
2 1 1 0 0
3 0 1 0 0
4 1 1 1 1
CodePudding user response:
Try:
x = pd.crosstab(df["cust_id"], df["product"])
print(x)
Prints:
product ball bat phone tv
cust_id
1 0 0 1 1
2 1 1 0 0
3 0 1 0 0
4 1 1 1 1
5 0 0 0 1
6 0 1 0 0
7 1 1 1 1
8 0 0 1 1