The dataset is shows a transaction id and multiple binary flags of which department the product is associated with. Transaction is not necessarily a unique id because one transaction can have multiple items from different departments.
- Example
import pandas as pd
d = {'Trans_ID': [149857724, 149857724, 394875934, 16544562, 16544562], 'Item': ['Windex','Car Horn', 'Steering Wheel', 'Horse Feed', 'Bleech'
],'Cleaning_Supplies_Dept_Flag': [1, 0, 0, 0, 1],
'Automobile_Parts_Dept_Flag': [0, 1, 1, 0, 0], 'Horse_Supplies_Dept_Flag': [0, 0, 0, 1, 0]}
ex = pd.DataFrame(data=d)
ex
Essentially, My goal is drop the Item name but keep the flags
- Final Product
d = {'Trans_ID': [149857724, 394875934, 16544562], 'Cleaning_Supplies_Dept_Flag': [1, 0, 1],
'Automobile_Parts_Dept_Flag': [1, 0,1],
'Horse_Supplies_Dept_Flag': [0,0,1]}
result = pd.DataFrame(data=d)
result
I have tried transposing, squeezing, stacking, and melting the dataframe; however, I am unable to get it the result format below.
CodePudding user response:
#groupby and sum
out=ex.groupby('Trans_ID').sum().reset_index()
Trans_ID Cleaning_Supplies_Dept_Flag Automobile_Parts_Dept_Flag Horse_Supplies_Dept_Flag
0 16544562 1 0 1
1 149857724 1 1 0
2 394875934 0 1 0
CodePudding user response:
You might want:
ex.groupby('Trans_ID', as_index=False, sort=False).max()
However the value for 16544562
/Automobile_Parts_Dept_Flag
is different. Is this a mistake? Else, please explain the logic.
output:
Trans_ID Cleaning_Supplies_Dept_Flag Automobile_Parts_Dept_Flag Horse_Supplies_Dept_Flag
0 149857724 1 1 0
1 394875934 0 1 0
2 16544562 1 0 1