I have a data frame df, shown below:
cust_id  products         rec_product  conf  sup
1        ['phone', 'tv']  ball         0.68  0.12
1        ['phone', 'tv']  bat          0.21  0.34
1        ['phone', 'tv']  book         0.02  0.25
2        ['bat']          ball         0.97  0.18
2        ['bat']          book         0.65  0.65
2        ['bat']          phone        0.23  0.36
2        ['bat']          tv           0.03  0.48
I want to combine the rec_product and conf columns into a dictionary per customer after a groupby, and do the same for rec_product and sup.
Expected output:
cust_id products prod_conf prod_sup
1 ['phone', 'tv'] {'ball':0.68, 'bat':0.21, 'book':0.02} {'ball':0.12, 'bat':0.34, 'book':0.25}
2 ['bat'] {'ball':0.97, 'book':0.65, 'phone':0.23, 'tv':0.03} {'ball':0.18, 'book':0.65, 'phone':0.36, 'tv':0.48}
I tried the code below and it works, but I would like to know whether there is a method that runs faster and uses less memory.
Combine rec_product and conf into one column:
prod_conf_df = df.sort_values(['cust_id', 'conf'], ascending=[True,
False]).set_index('rec_product').groupby(['cust_id', 'products']).\
apply(lambda x: x['conf'].to_dict()).reset_index(name='prod_conf')
Combine rec_product and sup into one column:
prod_sup_df = df.sort_values(['cust_id', 'conf'], ascending=[True,
False]).set_index('rec_product').groupby(['cust_id']).\
apply(lambda x: x['sup'].to_dict()).reset_index(name='prod_sup')
Combine the two data frames above into one:
combined_df = pd.merge(prod_conf_df, prod_sup_df, on='cust_id', how='inner')
Answer:
Instead of calling groupby and apply several times, I would suggest doing all the aggregations with a single groupby inside a comprehension:
def dictify(k, g):
    return {
        'cust_id': k,
        'products' : g['products'].iat[0],
        'prod_conf': dict(zip(g['rec_product'], g['conf'])),
        'prod_sup' : dict(zip(g['rec_product'], g['sup']))
    }
s = df.sort_values(['cust_id', 'conf'], ascending=[True, False])
s = pd.DataFrame(dictify(k, g) for k, g in s.groupby('cust_id', sort=False))
Result
cust_id products prod_conf prod_sup
0 1 ['phone', 'tv'] {'ball': 0.68, 'bat': 0.21, 'book': 0.02} {'ball': 0.12, 'bat': 0.34, 'book': 0.25}
1 2 ['bat'] {'ball': 0.97, 'book': 0.65, 'phone': 0.23, 'tv': 0.03} {'ball': 0.18, 'book': 0.65, 'phone': 0.36, 'tv': 0.48}
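For anyone who wants to try this end to end, here is a minimal self-contained sketch of the single-groupby approach. The sample frame is rebuilt from the table in the question, and it assumes the products column holds real Python lists rather than strings, which the question does not confirm. The name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
# Assumption: 'products' holds actual lists, not their string representation.
df = pd.DataFrame({
    'cust_id': [1, 1, 1, 2, 2, 2, 2],
    'products': [['phone', 'tv']] * 3 + [['bat']] * 4,
    'rec_product': ['ball', 'bat', 'book', 'ball', 'book', 'phone', 'tv'],
    'conf': [0.68, 0.21, 0.02, 0.97, 0.65, 0.23, 0.03],
    'sup': [0.12, 0.34, 0.25, 0.18, 0.65, 0.36, 0.48],
})

def dictify(k, g):
    # Build one output row from the sub-frame g of customer k.
    return {
        'cust_id': k,
        'products': g['products'].iat[0],
        'prod_conf': dict(zip(g['rec_product'], g['conf'])),
        'prod_sup': dict(zip(g['rec_product'], g['sup'])),
    }

# Sort once so each dictionary is ordered by descending conf,
# then make a single pass over the groups.
s = df.sort_values(['cust_id', 'conf'], ascending=[True, False])
result = pd.DataFrame([dictify(k, g) for k, g in s.groupby('cust_id', sort=False)])

print(result)
```

Because the grouping key is only cust_id, this also works when products contains lists, which are unhashable and so cannot be used as a groupby key themselves.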