Be the following DataFrames in python pandas:
| date | counter |
|-----------------------------|------------------|
| 2022-01-01 10:00:02 00:00 | 34 |
| 2022-01-03 11:03:02 00:00 | 23 |
| 2022-02-01 12:00:05 00:00 | 12 |
| 2022-03-01 21:04:02 00:00 | 7 |
| date | holiday |
|-----------------------------|------------------|
| 2022-01-01 | True |
| 2022-01-02 | False |
| 2022-01-03 | True |
| 2022-02-01 | True |
| 2022-02-02 | True |
| 2022-02-03 | True |
| 2022-03-01 | False |
| 2022-03-02 | True |
| 2022-03-03 | False |
How could I merge both DataFrames taking into account that I don't want to create an auxiliary column with the date?
| date | counter | holiday |
|-----------------------------|------------------|--------------|
| 2022-01-01 10:00:02 00:00 | 34 | True |
| 2022-01-03 11:03:02 00:00 | 23 | True |
| 2022-02-01 12:00:05 00:00 | 12 | True |
| 2022-03-01 21:04:02 00:00 | 7 | False |
Thank you for your help in advance.
CodePudding user response:
Use Series.map
with datetimes without times by Series.dt.normalize
- then helper column is not created in df2
output:
df2['holiday'] = df2['date'].dt.normalize().map(df1.set_index('date')['holiday'])
Another idea with merge_asof
, but for avoid error need remove timezones by Series.dt.tz_convert
:
df = pd.merge_asof(df1.assign(date = df1['date'].dt.tz_convert(None)).sort_values('date'),
df2, on='date')
print (df)
date counter holiday
0 2022-01-01 10:00:02 34 True
1 2022-01-03 11:03:02 23 True
2 2022-02-01 12:00:05 12 True
3 2022-03-01 21:04:02 7 False