I have two dataframes and like to add a column to the first one with volume information from the second one. The first dataframe should have a volume column with the sum of the volume from the second dataframe determined between the index rows of the first dataframe. Can someone point me in the right direction please? So 507 in the example would be 260 61 186 from the other dataframe(*) between the corresponding times.
Thanks!!
dataframe 1
adjclose
datetime
2022-10-20 02:55:00 30492.0
2022-10-20 03:10:00 30402.0
2022-10-20 03:20:00 30457.0
2022-10-20 03:45:00 30376.0
2022-10-20 04:05:00 30425.0
2022-10-20 04:10:00 30373.0
2022-10-20 04:45:00 30436.0
2022-10-20 04:55:00 30401.0
2022-10-20 05:25:00 30437.0
Dataframe 2
volume
datetime
2022-10-20 02:40:00 119.0
2022-10-20 02:45:00 210.0
2022-10-20 02:50:00 179.0
2022-10-20 02:55:00 260.0*
2022-10-20 03:00:00 61.0*
2022-10-20 03:05:00 186.0*
2022-10-20 03:10:00 204.0
2022-10-20 03:15:00 333.0
2022-10-20 03:20:00 122.0
2022-10-20 03:25:00 75.0
Desired result
adjclose volume
datetime
2022-10-20 02:35:00 30433.0 507
2022-10-20 02:55:00 30492.0 etc
2022-10-20 03:10:00 30402.0 etc
2022-10-20 03:20:00 30457.0
2022-10-20 03:45:00 30376.0
2022-10-20 04:05:00 30425.0
2022-10-20 04:10:00 30373.0
2022-10-20 04:45:00 30436.0
2022-10-20 04:55:00 30401.0
2022-10-20 05:25:00 30437.0
CodePudding user response:
You can use a merge_asof
:
df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)
s = pd.Series(range(len(df1)), index=df1.index, name='id')
df1['volume'] = s.map(
pd.merge_asof(df2, df1.assign(id=s)['id'],
left_index=True, right_index=True)
.groupby('id').sum()['volume']
)
NB. ensure df1 and df2's indexes are sorted in ascending order.
output:
adjclose volume
datetime
2022-10-20 02:55:00 30492.0 507.0
2022-10-20 03:10:00 30402.0 537.0
2022-10-20 03:20:00 30457.0 197.0
2022-10-20 03:45:00 30376.0 NaN
2022-10-20 04:05:00 30425.0 NaN
2022-10-20 04:10:00 30373.0 NaN
2022-10-20 04:45:00 30436.0 NaN
2022-10-20 04:55:00 30401.0 NaN
2022-10-20 05:25:00 30437.0 NaN
CodePudding user response:
You ca use pandas DataFrame.join
Join columns of another DataFrame.
Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
Here is the documentation with some examples