I have customer dataframe and it composes of customer id and product in each time.
customerID product
1 shirt style A
1 sky wing
2 sky wing
3 shirt style B
3 shirt style D
3 queen couple
4 big cup
5 small pool
5 sky wing
5 shirt style A
I want to transform from row to column by taking value to column name. The value in each is yes word.
customerID p_shirt style A p_sky wing p_shirt style B p_shirt style D p_queen couple p_big cup p_small pool
1 yes yes
2 yes
3 yes yes yes
4 yes
5 yes yes yes
CodePudding user response:
Try using pd.get_dummies
:
pd.get_dummies(df, prefix='p', columns=['product']).groupby('customerID').sum().replace({1: 'yes', 0: ''}).reset_index()
Output:
customerID p_big cup p_queen couple p_shirt style A p_shirt style B p_shirt style D p_sky wing p_small pool
0 1 yes yes
1 2 yes
2 3 yes yes yes
3 4 yes
4 5 yes yes yes
But in this case:
pd.get_dummies(df, prefix='p').groupby('customerID').sum().replace({1: 'yes', 0: ''}).reset_index()
Is enough, you don't need to specify column name here.
CodePudding user response:
You can use .str.get_dummies()
to get a dummy table. Add column label prefix p_
by .add_prefix()
. Then, group by customerID
and finally use .replace
to replace 1 with 'yes' and 0 with empty string, as folllows:
(df.set_index('customerID')['product'].str.get_dummies()
.add_prefix('p_')
.groupby('customerID').max()
.replace({1: 'yes', 0: ''})
.reset_index()
)
Result:
customerID p_big cup p_queen couple p_shirt style A p_shirt style B p_shirt style D p_sky wing p_small pool
0 1 yes yes
1 2 yes
2 3 yes yes yes
3 4 yes
4 5 yes yes yes