Home > Back-end >  How can I group by below table from Customer ID and Product Code and get them to one row?
How can I group by below table from Customer ID and Product Code and get them to one row?

Time:05-07

How can I group by below table from Customer ID and Product Code and get them to one row as below using Python?

Customer ID Product Code Days since the last transaction
A 1 10
A 1 23
A 1 7
A 2 8
A 2 9
A 3 6
B 1 18
B 2 4
B 3 4
B 3 12
C 2 27
C 2 15

Need to get below table by grouping them by Customer ID and Product Code.

Customer ID Product Code D1 D2 D3
A 1 10 23 7
A 2 8 9 N/A
A 3 6 N/A N/A
B 1 18 N/A N/A
B 2 4 N/A N/A
B 3 4 12 N/A
C 2 27 15 N/A

CodePudding user response:

df[''] = df.groupby(['Customer ID', 'Product Code']).cumcount()
df = df.pivot(index=['Customer ID', 'Product Code'], columns='')
print(df)

Output:

                         Days since the last transaction
                                                       0     1    2
Customer ID Product Code
A           1                                       10.0  23.0  7.0
            2                                        8.0   9.0  NaN
            3                                        6.0   NaN  NaN
B           1                                       18.0   NaN  NaN
            2                                        4.0   NaN  NaN
            3                                        4.0  12.0  NaN
C           2                                       27.0  15.0  NaN

CodePudding user response:

Below python code also worked for me.

#keep only the needed data
grouped = df.groupby(['Customer_ID','Product Code'], as_index=False).agg({"Days since the last transaction": lambda x: x.tolist()[:3] [x.iat[-1]]}).explode("Days since the last transaction")

#get the count for the age columns
grouped["idx"] = grouped.groupby(['Customer_ID','Product Code']).cumcount().add(1)

#pivot to get the required structure
output = grouped.pivot(["Customer_ID","Product Code"],"idx","Days since the last transaction").add_prefix("Days since the last transaction").reset_index().rename_axis(None, axis=1)

output.head()
  • Related