Home > Back-end >  How to merge two dataframes on datetime, conditionally if date is the same?
How to merge two dataframes on datetime, conditionally if date is the same?

Time:03-16

I have two dataframes

df1:

       datetime                 value  date_only
324    2022-03-09 09:26:12.185  48     2022-03-09
626    2022-03-10 08:39:59.087  24     2022-03-10
923    2022-03-11 09:20:41.032  30     2022-03-11
1218   2022-03-12 09:54:11.748  44     2022-03-12

and df2:

      datetime                      amount  date_only
22    2022-03-08 09:53:12.040358    5.0     2022-03-08
313   2022-03-09 08:34:55.160470    16.0    2022-03-09
339   2022-03-09 10:39:25.053984    3.0     2022-03-09
623   2022-03-10 08:30:14.754819    7.0     2022-03-10
916   2022-03-11 08:51:41.312365    14.0    2022-03-11
1217  2022-03-12 09:40:40.462450    13.0    2022-03-12
1472  2022-03-13 07:52:02.369454    8.0     2022-03-13
1774  2022-03-14 07:58:23.352999    8.0     2022-03-14
2095  2022-03-15 08:50:33.755545    7.0     2022-03-15

I am trying to merge the two dataframes so that everything from the df1 is included, and only rows where df2 occurs on the same date (using date_only) are included, but are interspersed chronologically based on datetime (or index), as follows:

       datetime                     value   date_only    amount
313    2022-03-09 08:34:55.160470   null    2022-03-09   16.0
324    2022-03-09 09:26:12.185      48      2022-03-09   null
339    2022-03-09 10:39:25.053984   null    2022-03-09   3.0
623    2022-03-10 08:30:14.754819   null    2022-03-10   7.0
626    2022-03-10 08:39:59.087      24      2022-03-10   null
916    2022-03-11 08:51:41.312365   null    2022-03-11   14.0
923    2022-03-11 09:20:41.032      30      2022-03-11   null
1217   2022-03-12 09:40:40.462450   null    2022-03-12   13.0
1218   2022-03-12 09:54:11.748      44      2022-03-12   null

Note that rows from df2 where date was NOT in df1 (March 8/13/14/15) are not included in the merged df.

I have tried following this question with a similar topic but they are using cumcount() to achieve something else. I have also tried implementing my own merges, joins, concats, etc. to no avail, e.g.

test_merge = pd.merge(df1, df2, how='left', on='date_only', left_index=True, right_index=True)

Can anyone help me figure out the proper syntax for this merge? Or do I need some other pre-processing step before the merge (new column, etc.) to make it work? Thank you so much.

CodePudding user response:

IIUC:

out = pd.concat([df1, df2.loc[df2['date_only'].isin(df1['date_only'])]]).sort_index()
print(out)

# Output
                        datetime  value   date_only  amount
313   2022-03-09 08:34:55.160470    NaN  2022-03-09    16.0
324      2022-03-09 09:26:12.185   48.0  2022-03-09     NaN
339   2022-03-09 10:39:25.053984    NaN  2022-03-09     3.0
623   2022-03-10 08:30:14.754819    NaN  2022-03-10     7.0
626      2022-03-10 08:39:59.087   24.0  2022-03-10     NaN
916   2022-03-11 08:51:41.312365    NaN  2022-03-11    14.0
923      2022-03-11 09:20:41.032   30.0  2022-03-11     NaN
1217  2022-03-12 09:40:40.462450    NaN  2022-03-12    13.0
1218     2022-03-12 09:54:11.748   44.0  2022-03-12     NaN

CodePudding user response:

IIUC, you do not need a merge. Just concat your data after filtering the rows of df2 by dates in df1 and sort_values:

out = (pd
 .concat([df1, df2[df2['date_only'].isin(df1['date_only'])]])
 .sort_values(by=['datetime'])
)

output:

                        datetime  value   date_only  amount
313   2022-03-09 08:34:55.160470    NaN  2022-03-09    16.0
324      2022-03-09 09:26:12.185   48.0  2022-03-09     NaN
339   2022-03-09 10:39:25.053984    NaN  2022-03-09     3.0
623   2022-03-10 08:30:14.754819    NaN  2022-03-10     7.0
626      2022-03-10 08:39:59.087   24.0  2022-03-10     NaN
916   2022-03-11 08:51:41.312365    NaN  2022-03-11    14.0
923      2022-03-11 09:20:41.032   30.0  2022-03-11     NaN
1217  2022-03-12 09:40:40.462450    NaN  2022-03-12    13.0
1218     2022-03-12 09:54:11.748   44.0  2022-03-12     NaN

  • Related