I have 2 dataframes, the first is dfweather(Datetimeindex is vdatetime):
vdatetime station date temp
2017-01-01 00:52:00 00:00 IAD 1/1/2017 0:52 45.00
2017-01-01 01:52:00 00:00 IAD 1/1/2017 1:52 45.00
2017-01-01 02:52:00 00:00 IAD 1/1/2017 2:52 45.00
2017-01-01 03:52:00 00:00 IAD 1/1/2017 3:52 44.10
2017-01-01 04:52:00 00:00 IAD 1/1/2017 4:52 42.10
The 2nd dataframe is dfload(Datetimeindex is vdatetime):
vdatetime date load
2017-01-01 00:00:00 00:00 1/1/2017 0:00 483.52698
2017-01-01 00:05:00 00:00 1/1/2017 0:05 480.69008
2017-01-01 00:10:00 00:00 1/1/2017 0:10 480.28880
2017-01-01 00:15:00 00:00 1/1/2017 0:15 478.72021
2017-01-01 00:20:00 00:00 1/1/2017 0:20 476.37618
As above dfweather only has weather data at fixed points name hourly:52 mins but the load dataframe has load data every 5 minutes. I'd like to bring in temp in dfweather to the closest time value in dfload (absolute value so either before or after), so for example if dfload time is 1/1/2017 at 0:20 and dfweather's closet value is 1/1/2017 0:02, then it brings in the weather at 0:02 at the dfload point at 0:20 on 1/1/2017. I'm not sure how to complete this.
CodePudding user response:
Try using merge_asof
:
pd.merge_asof(dfload, dfweather, on='vdatetime', direction='nearest', suffixes=('load', 'weather'))
vdatetime date_load load station date_weather temp
0 2017-01-01 00:00:00 00:00 1/1/2017 0:00 483.52698 IAD 1/1/2017 0:52 45.0
1 2017-01-01 00:05:00 00:00 1/1/2017 0:05 480.69008 IAD 1/1/2017 0:52 45.0
2 2017-01-01 00:10:00 00:00 1/1/2017 0:10 480.28880 IAD 1/1/2017 0:52 45.0
3 2017-01-01 00:15:00 00:00 1/1/2017 0:15 478.72021 IAD 1/1/2017 0:52 45.0
4 2017-01-01 00:20:00 00:00 1/1/2017 0:20 476.37618 IAD 1/1/2017 0:52 45.0