Home > OS >  Create a column containing list of products that each customer has using groupby in pandas on custom
Create a column containing list of products that each customer has using groupby in pandas on custom

Time:09-08

I have a data frame as shown below

df

cust_id         product
1               tv
1               phone
2               bat
2               ball
3               bat
4               ball
4               bat
4               tv
4               phone
5               tv
6               bat
7               bat
7               ball
7               tv
8               phone
8               tv

from the above I would like to prepare below data frame as shown below. Where product_list column consist of list of products the elements in the list should be sorted in acsending order.

Expected output:

cust_id           product_list
1                 ['phone', 'tv']
2                 ['ball', 'bat']
3                 ['bat']
4                 ['ball', 'bat', 'phone', 'tv']
5                 ['tv']
6                 ['bat']
7                 ['ball', 'bat', 'phone', 'tv']       
8                 ['phone', 'tv']

I tried below code

df1 = df.groupby('cust_id').agg(product_list=('product','unique')).reset_index()
df1

    cust_id   product_list
0   1         [tv, phone]
1   2         [bat, ball]
2   3         [bat]
3   4         [ball, bat, tv, phone]
4   5         [tv]
5   6         [bat]
6   7         [bat, ball, tv]
7   8         [phone, tv]

But which is not exactly what I want.

I tried below code as well

s = df.groupby('cust_id')['product'].apply(list).reset_index()
s.rename({'product':'product_list'}, axis=1, inplace=True)
s

what I got is as shown below

cust_id product_list
0   1   [tv, phone]
1   2   [bat, ball]
2   3   [bat]
3   4   [ball, bat, tv, phone]
4   5   [tv]
5   6   [bat]
6   7   [bat, ball, tv]
7   8   [phone, tv]

CodePudding user response:

IIUC you need the values in quotes. here is one way to do it

df.groupby('cust_id')['product'].apply(lambda x: [', '.join("'"   item   "'" for item in sorted(x))]).reset_index()

 
    cust_id     product
0         1     ['phone', 'tv']
1         2     ['ball', 'bat']
2         3     ['bat']
3         4     ['ball', 'bat', 'phone', 'tv']
4         5     ['tv']
5         6     ['bat']
6         7     ['ball', 'bat', 'tv']
7         8     ['phone', 'tv']

CodePudding user response:

Sort the data before aggregation:

df1 = (df.sort_values(['cust_id', 'product'])
         .groupby('cust_id')['product'].agg(list)
         .reset_index(name='product_list')
      )

Output:

   cust_id            product_list
0        1             [phone, tv]
1        2             [ball, bat]
2        3                   [bat]
3        4  [ball, bat, phone, tv]
4        5                    [tv]
5        6                   [bat]
6        7         [ball, bat, tv]
7        8             [phone, tv]
  • Related