Combine two columns into one column by making them into a dictionary - pandas - groupby

Time:08-31

I have a data frame as shown below:

df

cust_id   product       score
1         bat           0.8
2         ball          0.3
2         phone         0.6
3         tv            1.0
2         bat           1.0
4         phone         0.2
1         ball          0.6 

From the above, I would like to build the data frame below.

Note: each dictionary should be sorted by score in descending order.

Expected Output:

cust_id   product_dict
1         {'bat': 0.8, 'ball': 0.6}
2         {'bat': 1.0, 'phone': 0.6, 'ball': 0.3}
3         {'tv': 1.0}
4         {'phone': 0.2}

I tried the code below, but it did not work.

s = df.groupby(['cust_id','product'])['score'].apply(list)

d = {x: s.xs(x).to_dict() for x in s.index.levels[0]}

df1 = df.groupby('cust_id').agg(num_of_products=('product','nunique'))

df1.insert(2, 'product_dict', df1.index.map(d))
df1 = df1.reset_index()

CodePudding user response:

You can try:

(df.set_index('product')
   .groupby('cust_id')
   .apply(lambda x: x['score'].to_dict())
   .reset_index(name='product_dict')
)

Output:

   cust_id                             product_dict
0        1                {'bat': 0.8, 'ball': 0.6}
1        2  {'ball': 0.3, 'phone': 0.6, 'bat': 1.0}
2        3                              {'tv': 1.0}
3        4                           {'phone': 0.2}

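Note: to get the requested order, sort the data by score before the groupby. Rows keep their relative order within each group, so each dictionary is built in descending score order: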

(df.sort_values('score', ascending=False)
   .set_index('product')
   .groupby('cust_id')
   .apply(lambda x: x['score'].to_dict())
   .reset_index(name='product_dict')
)

Output:

   cust_id                             product_dict
0        1                {'bat': 0.8, 'ball': 0.6}
1        2  {'bat': 1.0, 'phone': 0.6, 'ball': 0.3}
2        3                              {'tv': 1.0}
3        4                           {'phone': 0.2}
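
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question, and the `result` name is only for illustration. As a small departure from the snippet above, it selects the `score` column explicitly before `apply`, so the lambda never sees the grouping column. It assumes Python 3.7+, where dicts keep insertion order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cust_id": [1, 2, 2, 3, 2, 4, 1],
    "product": ["bat", "ball", "phone", "tv", "bat", "phone", "ball"],
    "score": [0.8, 0.3, 0.6, 1.0, 1.0, 0.2, 0.6],
})

result = (
    df.sort_values("score", ascending=False)   # highest scores first
      .set_index("product")                    # product becomes the dict key
      .groupby("cust_id")[["score"]]           # keep only the score column per group
      .apply(lambda g: g["score"].to_dict())   # one {product: score} dict per customer
      .reset_index(name="product_dict")
)

print(result)
```

The sort happens once on the whole frame, and the groupby then keeps that row order inside each customer's group.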

CodePudding user response:

Another possible solution:

(df.groupby('cust_id')
 .apply(lambda g: dict(zip(g['product'], g['score'])))
 .reset_index()
 .rename({0: "product_dict"}, axis=1)
 )

Output:

   cust_id                             product_dict
0        1                {'bat': 0.8, 'ball': 0.6}
1        2  {'ball': 0.3, 'phone': 0.6, 'bat': 1.0}
2        3                              {'tv': 1.0}
3        4                           {'phone': 0.2}

If the dictionaries need to be sorted by score, sort the data frame before grouping:

(df.sort_values('score', ascending=False)
 .groupby('cust_id')
 .apply(lambda g: dict(zip(g['product'], g['score'])))
 .reset_index()
 .rename({0: "product_dict"}, axis=1)
 )

Output:

   cust_id                             product_dict
0        1                {'bat': 0.8, 'ball': 0.6}
1        2  {'bat': 1.0, 'phone': 0.6, 'ball': 0.3}
2        3                              {'tv': 1.0}
3        4                           {'phone': 0.2}
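
A possible variation, in case you would rather not depend on the row order of the input: sort the (product, score) pairs inside each group while building the dictionary. This is only a sketch. The helper name `to_sorted_dict` is made up for illustration, and the sample frame is rebuilt by hand from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "cust_id": [1, 2, 2, 3, 2, 4, 1],
    "product": ["bat", "ball", "phone", "tv", "bat", "phone", "ball"],
    "score": [0.8, 0.3, 0.6, 1.0, 1.0, 0.2, 0.6],
})


def to_sorted_dict(group: pd.DataFrame) -> dict:
    """Build {product: score} for one customer, highest score first."""
    pairs = zip(group["product"], group["score"])
    return dict(sorted(pairs, key=lambda kv: kv[1], reverse=True))


result = (
    df.groupby("cust_id")[["product", "score"]]  # only the columns the helper needs
      .apply(to_sorted_dict)
      .reset_index(name="product_dict")
)

print(result)
```

Here the ordering is decided per group inside the helper, so the result does not change if the frame is shuffled beforehand. The trade-off is one small Python-level sort per customer instead of a single sort on the whole frame.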