I need to be able to find the values from one df in another by looking at each value per column per row.
Basically I have this df which is my main_df (note this is a mock up of the data in reality I have hundreds of columns and thousands of rows):
user_id | fav_fruit | fav_veg | basket |
---|---|---|---|
1 | apple | potato | apple |
2 | pear | potato | fruit |
3 | banana | carrot | fruit |
4 | apple | broccoli | carrot |
And I have another df whihch is my auxiliary_df:
category | answer | value |
---|---|---|
fav_fruit | apple | 0.5 |
fav_fruit | pear | 0.5 |
fav_fruit | banana | 0.8 |
fav_veg | potato | 0.7 |
fav_veg | carrot | 1 |
fav_veg | broccoli | 1 |
basket | apple | 3 |
basket | fruit | 5 |
basket | carrot | 3 |
I need to sear for each user id (row loop) find the column name in the auxiliary_df - category column, and within the answers in that category map what i have in my main_df as value with the auxiliary_df to get the value of that table so at the end I get the following:
user_id | fav_fruit | fav_veg | basket |
---|---|---|---|
1 | 0.5 | 0.7 | 3 |
2 | 0.5 | 0.7 | 5 |
3 | 0.8 | 1 | 5 |
4 | 0.5 | 1 | 3 |
So at this point I repalced all the values from my main_df with the numerical values from the auxiliary_df. I cannot ma just by answer since many answers are repeated across categories so unsure how t map it properly. I dont know how to do this, I tried to do a dictionary to map but couldnt make it work to get the correct dection of category with answer and value. Also, I cannot hardcode anything since is hundreds of columns and values that change regularly.
CodePudding user response:
You can .pivot()
the auxiliary_df
and .map()
:
tmp = auxiliary_df.pivot(index='answer', columns='category', values='value')
main_df = main_df.set_index('user_id')
for c in main_df:
main_df[c] = main_df[c].map(tmp[c])
print(main_df.reset_index())
Prints:
user_id fav_fruit fav_veg basket
0 1 0.5 0.7 3.0
1 2 0.5 0.7 5.0
2 3 0.8 1.0 5.0
3 4 0.5 1.0 3.0