Home > Software engineering >  Pandas: Joining two Dataframes based on two criteria matches
Pandas: Joining two Dataframes based on two criteria matches

Time:04-13

Hi have the following Dataframe that contains sends and open totals df_send_open:

   date        user_id   name    send   open
0  2022-03-31  35        sally   50     20
1  2022-03-31  47        bob     100    55
2  2022-03-31  01        john    500    102
3  2022-03-31  45        greg    47     20
4  2022-03-30  232       william 60     57
5  2022-03-30  147       mary    555    401
6  2022-03-30  35        sally   20     5
7  2022-03-29  41        keith   65     55
8  2022-03-29  147       mary    100    92

My other Dataframe contains calls and cancelled totals df_call_cancel:

   date        user_id   name    call   cancel
0  2022-03-31  21        percy   54     21
1  2022-03-31  47        bob     150    21
2  2022-03-31  01        john    100    97
3  2022-03-31  45        greg    101    13
4  2022-03-30  232       william 61     55
5  2022-03-30  147       mary    5      3
6  2022-03-30  35        sally   13     5
7  2022-03-29  41        keith   14     7
8  2022-03-29  147       mary    102    90

Like a VLOOKUP in excel, i want to add the additional columns from df_call_cancel to df_send_open, however I need to do it on the unique combination of BOTH date and user_id and this is where i'm tripping up.

I have two desired Dataframes outcomes (not sure which to go forward with so thought i'd ask for both solutions):

Desired Dataframe 1:

   date        user_id  name    send   open  call  cancel
0  2022-03-31  35       sally   50     20    0     0
1  2022-03-31  47       bob     100    55    150   21
2  2022-03-31  01       john    500    102   100   97
3  2022-03-31  45       greg    47     20    101   13
4  2022-03-30  232      william 60     57    61    55
5  2022-03-30  147      mary    555    401   5     3
6  2022-03-30  35       sally   20     5     13    5 
7  2022-03-29  41       keith   65     55    14    7
8  2022-03-29  147      mary    100    92    102   90

Dataframe 1 only joins the call and cancel columns if the combination of date and user_id exists in df_send_open as this is the primary dataframe.

Desired Dataframe 2:

   date        user_id  name    send   open  call  cancel
0  2022-03-31  35       sally   50     20    0     0
1  2022-03-31  47       bob     100    55    150   21
2  2022-03-31  01       john    500    102   100   97
3  2022-03-31  45       greg    47     20    101   13
4  2022-03-31  21       percy   0      0     54    21
5  2022-03-30  232      william 60     57    61    55
6  2022-03-30  147      mary    555    401   5     3
7  2022-03-30  35       sally   20     5     13    5 
8  2022-03-29  41       keith   65     55    14    7
9  2022-03-29  147      mary    100    92    102   90

Dataframe 2 will do the same as df1 but will also add any new date and user combinations in df_call_cancel that isn't in df_send_open (see percy).

Many thanks.

CodePudding user response:

merged_df1 = df_send_open.merge(df_call_cancel, how='left', on=['date', 'user_id'])

merged_df2 = df_send_open.merge(df_call_cancel, how='outer', on=['date', 'user_id']).fillna(0)

This should work for your 2 cases, one left and one outer join.

  • Related