I have a data frame as shown below
df
cust_id product
1 tv
1 phone
2 bat
2 ball
3 bat
4 ball
4 bat
4 tv
4 phone
5 tv
6 bat
7 bat
7 ball
7 tv
8 phone
8 tv
from the above I would like to prepare below data frame as shown below. Where product_list
column consist of list of products the elements in the list should be sorted in acsending order.
Expected output:
cust_id product_list
1 ['phone', 'tv']
2 ['ball', 'bat']
3 ['bat']
4 ['ball', 'bat', 'phone', 'tv']
5 ['tv']
6 ['bat']
7 ['ball', 'bat', 'phone', 'tv']
8 ['phone', 'tv']
I tried below code
df1 = df.groupby('cust_id').agg(product_list=('product','unique')).reset_index()
df1
cust_id product_list
0 1 [tv, phone]
1 2 [bat, ball]
2 3 [bat]
3 4 [ball, bat, tv, phone]
4 5 [tv]
5 6 [bat]
6 7 [bat, ball, tv]
7 8 [phone, tv]
But which is not exactly what I want.
I tried below code as well
s = df.groupby('cust_id')['product'].apply(list).reset_index()
s.rename({'product':'product_list'}, axis=1, inplace=True)
s
what I got is as shown below
cust_id product_list
0 1 [tv, phone]
1 2 [bat, ball]
2 3 [bat]
3 4 [ball, bat, tv, phone]
4 5 [tv]
5 6 [bat]
6 7 [bat, ball, tv]
7 8 [phone, tv]
CodePudding user response:
IIUC you need the values in quotes. here is one way to do it
df.groupby('cust_id')['product'].apply(lambda x: [', '.join("'" item "'" for item in sorted(x))]).reset_index()
cust_id product
0 1 ['phone', 'tv']
1 2 ['ball', 'bat']
2 3 ['bat']
3 4 ['ball', 'bat', 'phone', 'tv']
4 5 ['tv']
5 6 ['bat']
6 7 ['ball', 'bat', 'tv']
7 8 ['phone', 'tv']
CodePudding user response:
Sort the data before aggregation:
df1 = (df.sort_values(['cust_id', 'product'])
.groupby('cust_id')['product'].agg(list)
.reset_index(name='product_list')
)
Output:
cust_id product_list
0 1 [phone, tv]
1 2 [ball, bat]
2 3 [bat]
3 4 [ball, bat, phone, tv]
4 5 [tv]
5 6 [bat]
6 7 [ball, bat, tv]
7 8 [phone, tv]