Say I have two pandas time-series dataframes:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9],index=pd.date_range('2020-01-01', periods=9, freq='10min'))
b = pd.DataFrame(['a','b','c'],index=pd.date_range('2020-01-01', periods=3, freq='30min'))
Is there a way to vectorize the following algorithm, using pandas?
next_b = []
for row in a.itertuples():
df = b[b.index >= row.Index]
if len(df.index):
next_b.append(df.index[0])
else:
next_b.append(None)
a['next_b'] = next_b
output:
0 next_b
2020-01-01 00:00:00 1 2020-01-01 00:00:00
2020-01-01 00:10:00 2 2020-01-01 00:30:00
2020-01-01 00:20:00 3 2020-01-01 00:30:00
2020-01-01 00:30:00 4 2020-01-01 00:30:00
2020-01-01 00:40:00 5 2020-01-01 01:00:00
2020-01-01 00:50:00 6 2020-01-01 01:00:00
2020-01-01 01:00:00 7 2020-01-01 01:00:00
2020-01-01 01:10:00 8 NaT
2020-01-01 01:20:00 9 NaT
CodePudding user response:
with merge_asof
, you can do
res = pd.merge_asof(a, b.assign(next_b=b.index),
left_index=True, right_index=True,
direction='forward', suffixes=('','_b'))
print(res)
# 0 0_b next_b
# 2020-01-01 00:00:00 1 a 2020-01-01 00:00:00
# 2020-01-01 00:10:00 2 b 2020-01-01 00:30:00
# 2020-01-01 00:20:00 3 b 2020-01-01 00:30:00
# 2020-01-01 00:30:00 4 b 2020-01-01 00:30:00
# 2020-01-01 00:40:00 5 c 2020-01-01 01:00:00
# 2020-01-01 00:50:00 6 c 2020-01-01 01:00:00
# 2020-01-01 01:00:00 7 c 2020-01-01 01:00:00
# 2020-01-01 01:10:00 8 NaN NaT
# 2020-01-01 01:20:00 9 NaN NaT