I am trying to overlap timestamp from df2 with df1 time window. When ever there is no match I am getting the following error. how can I get the output with out the following error?
Error
KeyError: "[Timestamp('2022-01-01 03:12:02')] not in index"
input
from datetime import datetime, date
import pandas as pd
df1 = pd.DataFrame({'id': ['aa0', 'aa1', 'aa2', 'aa3'],
'number': [1, 2, 2, 1],
'color': ['blue', 'red', 'yellow', "green"],
'date1': [datetime(2022,1,1,1,1,1),
datetime(2022,1,1,2,4,1),
datetime(2022,1,1,3,8,1),
datetime(2022,1,1,4,12,1)],
'date2': [datetime(2022,1,1,2,1,1),
datetime(2022,1,1,3,6,1),
datetime(2022,1,1,3,10,1),
datetime(2022,1,1,4,14,1)] })
input2
df2 = pd.DataFrame({'id': ['A', 'B', 'C', 'D'],
'value': [10,20,30,40],
'date': [datetime(2022,1,1,1,12,1),
datetime(2022,1,1,1,40,1),
datetime(2022,1,1,3,12,2),
datetime(2022,1,1,4,12,2)] })
Expected output
(2022-01-01 01:01:01, 2022-01-01 02:01:01] 15.0
(2022-01-01 04:12:01, 2022-01-01 04:14:01] 40.0
Code
idx = pd.IntervalIndex.from_arrays(pd.to_datetime(df1['date1']),
pd.to_datetime(df1['date2']))
mapper = pd.Series(idx, index=idx)
df2.groupby(mapper[pd.to_datetime(df2['date'])].values)['value'].mean()
CodePudding user response:
I think I figured it out. It is not the best but works.
df1['date'] = pd.to_datetime(df1['date1']).dt.date
df2['date'] = pd.to_datetime(df2['dates']).dt.date
df3 = pd.merge(df1, df2, on=['date'], how='left')
mask = (df3['dates'] > df3['date1']) & (df3['dates'] < df3['date2'])
df4 = df3.loc[mask]
df4.groupby(['date1', 'date2'])['value'].mean()
CodePudding user response:
One option is with conditional_join from pyjanitor, which solves inequality joins such as this:
# pip install pyjanitor
import pandas as pd
import janitor
df1['date1'] = pd.to_datetime(df1['date1'])
df1['date2'] = pd.to_datetime(df1['date2'])
df2['date'] = pd.to_datetime(df2['date'])
(
df1
.filter(like='date')
.conditional_join(
df2.filter(['value', 'date']),
('date1', 'date', '<='),
('date2', 'date', '>='))
.groupby(['date1', 'date2'])
.value
.mean()
)
date1 date2
2022-01-01 01:01:01 2022-01-01 02:01:01 15.0
2022-01-01 04:12:01 2022-01-01 04:14:01 40.0
Name: value, dtype: float64