Hi have the following Dataframe that contains sends and open totals df_send_open:
date user_id name send open
0 2022-03-31 35 sally 50 20
1 2022-03-31 47 bob 100 55
2 2022-03-31 01 john 500 102
3 2022-03-31 45 greg 47 20
4 2022-03-30 232 william 60 57
5 2022-03-30 147 mary 555 401
6 2022-03-30 35 sally 20 5
7 2022-03-29 41 keith 65 55
8 2022-03-29 147 mary 100 92
My other Dataframe contains calls and cancelled totals df_call_cancel:
date user_id name call cancel
0 2022-03-31 21 percy 54 21
1 2022-03-31 47 bob 150 21
2 2022-03-31 01 john 100 97
3 2022-03-31 45 greg 101 13
4 2022-03-30 232 william 61 55
5 2022-03-30 147 mary 5 3
6 2022-03-30 35 sally 13 5
7 2022-03-29 41 keith 14 7
8 2022-03-29 147 mary 102 90
Like a VLOOKUP in excel, i want to add the additional columns from df_call_cancel to df_send_open, however I need to do it on the unique combination of BOTH date and user_id and this is where i'm tripping up.
I have two desired Dataframes outcomes (not sure which to go forward with so thought i'd ask for both solutions):
Desired Dataframe 1:
date user_id name send open call cancel
0 2022-03-31 35 sally 50 20 0 0
1 2022-03-31 47 bob 100 55 150 21
2 2022-03-31 01 john 500 102 100 97
3 2022-03-31 45 greg 47 20 101 13
4 2022-03-30 232 william 60 57 61 55
5 2022-03-30 147 mary 555 401 5 3
6 2022-03-30 35 sally 20 5 13 5
7 2022-03-29 41 keith 65 55 14 7
8 2022-03-29 147 mary 100 92 102 90
Dataframe 1 only joins the call and cancel columns if the combination of date and user_id exists in df_send_open as this is the primary dataframe.
Desired Dataframe 2:
date user_id name send open call cancel
0 2022-03-31 35 sally 50 20 0 0
1 2022-03-31 47 bob 100 55 150 21
2 2022-03-31 01 john 500 102 100 97
3 2022-03-31 45 greg 47 20 101 13
4 2022-03-31 21 percy 0 0 54 21
5 2022-03-30 232 william 60 57 61 55
6 2022-03-30 147 mary 555 401 5 3
7 2022-03-30 35 sally 20 5 13 5
8 2022-03-29 41 keith 65 55 14 7
9 2022-03-29 147 mary 100 92 102 90
Dataframe 2 will do the same as df1 but will also add any new date and user combinations in df_call_cancel that isn't in df_send_open (see percy).
Many thanks.
CodePudding user response:
merged_df1 = df_send_open.merge(df_call_cancel, how='left', on=['date', 'user_id'])
merged_df2 = df_send_open.merge(df_call_cancel, how='outer', on=['date', 'user_id']).fillna(0)
This should work for your 2 cases, one left and one outer join.