I have two dataframes
df1:
datetime value date_only
324 2022-03-09 09:26:12.185 48 2022-03-09
626 2022-03-10 08:39:59.087 24 2022-03-10
923 2022-03-11 09:20:41.032 30 2022-03-11
1218 2022-03-12 09:54:11.748 44 2022-03-12
and df2:
datetime amount date_only
22 2022-03-08 09:53:12.040358 5.0 2022-03-08
313 2022-03-09 08:34:55.160470 16.0 2022-03-09
339 2022-03-09 10:39:25.053984 3.0 2022-03-09
623 2022-03-10 08:30:14.754819 7.0 2022-03-10
916 2022-03-11 08:51:41.312365 14.0 2022-03-11
1217 2022-03-12 09:40:40.462450 13.0 2022-03-12
1472 2022-03-13 07:52:02.369454 8.0 2022-03-13
1774 2022-03-14 07:58:23.352999 8.0 2022-03-14
2095 2022-03-15 08:50:33.755545 7.0 2022-03-15
I am trying to merge the two dataframes so that everything from the df1 is included, and only rows where df2 occurs on the same date (using date_only) are included, but are interspersed chronologically based on datetime (or index), as follows:
datetime value date_only amount
313 2022-03-09 08:34:55.160470 null 2022-03-09 16.0
324 2022-03-09 09:26:12.185 48 2022-03-09 null
339 2022-03-09 10:39:25.053984 null 2022-03-09 3.0
623 2022-03-10 08:30:14.754819 null 2022-03-10 7.0
626 2022-03-10 08:39:59.087 24 2022-03-10 null
916 2022-03-11 08:51:41.312365 null 2022-03-11 14.0
923 2022-03-11 09:20:41.032 30 2022-03-11 null
1217 2022-03-12 09:40:40.462450 null 2022-03-12 13.0
1218 2022-03-12 09:54:11.748 44 2022-03-12 null
Note that rows from df2 where date was NOT in df1 (March 8/13/14/15) are not included in the merged df.
I have tried following this question with a similar topic but they are using cumcount() to achieve something else. I have also tried implementing my own merges, joins, concats, etc. to no avail, e.g.
test_merge = pd.merge(df1, df2, how='left', on='date_only', left_index=True, right_index=True)
Can anyone help me figure out the proper syntax for this merge? Or do I need some other pre-processing step before the merge (new column, etc.) to make it work? Thank you so much.
CodePudding user response:
IIUC:
out = pd.concat([df1, df2.loc[df2['date_only'].isin(df1['date_only'])]]).sort_index()
print(out)
# Output
datetime value date_only amount
313 2022-03-09 08:34:55.160470 NaN 2022-03-09 16.0
324 2022-03-09 09:26:12.185 48.0 2022-03-09 NaN
339 2022-03-09 10:39:25.053984 NaN 2022-03-09 3.0
623 2022-03-10 08:30:14.754819 NaN 2022-03-10 7.0
626 2022-03-10 08:39:59.087 24.0 2022-03-10 NaN
916 2022-03-11 08:51:41.312365 NaN 2022-03-11 14.0
923 2022-03-11 09:20:41.032 30.0 2022-03-11 NaN
1217 2022-03-12 09:40:40.462450 NaN 2022-03-12 13.0
1218 2022-03-12 09:54:11.748 44.0 2022-03-12 NaN
CodePudding user response:
IIUC, you do not need a merge. Just concat
your data after filtering the rows of df2 by dates in df1 and sort_values
:
out = (pd
.concat([df1, df2[df2['date_only'].isin(df1['date_only'])]])
.sort_values(by=['datetime'])
)
output:
datetime value date_only amount
313 2022-03-09 08:34:55.160470 NaN 2022-03-09 16.0
324 2022-03-09 09:26:12.185 48.0 2022-03-09 NaN
339 2022-03-09 10:39:25.053984 NaN 2022-03-09 3.0
623 2022-03-10 08:30:14.754819 NaN 2022-03-10 7.0
626 2022-03-10 08:39:59.087 24.0 2022-03-10 NaN
916 2022-03-11 08:51:41.312365 NaN 2022-03-11 14.0
923 2022-03-11 09:20:41.032 30.0 2022-03-11 NaN
1217 2022-03-12 09:40:40.462450 NaN 2022-03-12 13.0
1218 2022-03-12 09:54:11.748 44.0 2022-03-12 NaN