Home > Enterprise >  Perform a merge by date field without creating an auxiliary column in the DataFrame
Perform a merge by date field without creating an auxiliary column in the DataFrame

Time:05-13

Be the following DataFrames in python pandas:

|      date                   |     counter      |
|-----------------------------|------------------|
|  2022-01-01 10:00:02 00:00  |         34       |
|  2022-01-03 11:03:02 00:00  |         23       |
|  2022-02-01 12:00:05 00:00  |         12       |
|  2022-03-01 21:04:02 00:00  |         7        |
|      date                   |     holiday      |
|-----------------------------|------------------|
|  2022-01-01                 |       True       |
|  2022-01-02                 |       False      |
|  2022-01-03                 |       True       |
|  2022-02-01                 |       True       |
|  2022-02-02                 |       True       |
|  2022-02-03                 |       True       |
|  2022-03-01                 |       False      |
|  2022-03-02                 |       True       |
|  2022-03-03                 |       False      |

How could I merge both DataFrames taking into account that I don't want to create an auxiliary column with the date?

|      date                   |     counter      |   holiday    |
|-----------------------------|------------------|--------------|
|  2022-01-01 10:00:02 00:00  |         34       |    True      |
|  2022-01-03 11:03:02 00:00  |         23       |    True      |
|  2022-02-01 12:00:05 00:00  |         12       |    True      |
|  2022-03-01 21:04:02 00:00  |         7        |    False     |

Thank you for your help in advance.

CodePudding user response:

Use Series.map with datetimes without times by Series.dt.normalize - then helper column is not created in df2 output:

df2['holiday'] = df2['date'].dt.normalize().map(df1.set_index('date')['holiday'])

Another idea with merge_asof, but for avoid error need remove timezones by Series.dt.tz_convert:

df = pd.merge_asof(df1.assign(date = df1['date'].dt.tz_convert(None)).sort_values('date'), 
                   df2, on='date')
print (df)
                 date  counter  holiday
0 2022-01-01 10:00:02       34     True
1 2022-01-03 11:03:02       23     True
2 2022-02-01 12:00:05       12     True
3 2022-03-01 21:04:02        7    False
  • Related