Home > Software engineering >  Combining time series data frames
Combining time series data frames

Time:10-21

I have two dataframes and like to add a column to the first one with volume information from the second one. The first dataframe should have a volume column with the sum of the volume from the second dataframe determined between the index rows of the first dataframe. Can someone point me in the right direction please? So 507 in the example would be 260 61 186 from the other dataframe(*) between the corresponding times.

Thanks!!

dataframe 1

                    adjclose   
datetime    
2022-10-20 02:55:00 30492.0    
2022-10-20 03:10:00 30402.0    
2022-10-20 03:20:00 30457.0    
2022-10-20 03:45:00 30376.0
2022-10-20 04:05:00 30425.0
2022-10-20 04:10:00 30373.0
2022-10-20 04:45:00 30436.0
2022-10-20 04:55:00 30401.0
2022-10-20 05:25:00 30437.0

Dataframe 2

                    volume
datetime    
2022-10-20 02:40:00 119.0
2022-10-20 02:45:00 210.0
2022-10-20 02:50:00 179.0
2022-10-20 02:55:00 260.0*
2022-10-20 03:00:00 61.0*
2022-10-20 03:05:00 186.0*
2022-10-20 03:10:00 204.0
2022-10-20 03:15:00 333.0
2022-10-20 03:20:00 122.0
2022-10-20 03:25:00 75.0

Desired result

                    adjclose volume
datetime    
2022-10-20 02:35:00 30433.0   507
2022-10-20 02:55:00 30492.0   etc
2022-10-20 03:10:00 30402.0   etc
2022-10-20 03:20:00 30457.0
2022-10-20 03:45:00 30376.0
2022-10-20 04:05:00 30425.0
2022-10-20 04:10:00 30373.0
2022-10-20 04:45:00 30436.0
2022-10-20 04:55:00 30401.0
2022-10-20 05:25:00 30437.0

CodePudding user response:

You can use a merge_asof:

df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)

s = pd.Series(range(len(df1)), index=df1.index, name='id')

df1['volume'] = s.map(
    pd.merge_asof(df2, df1.assign(id=s)['id'],
                  left_index=True, right_index=True)
   .groupby('id').sum()['volume']
)

NB. ensure df1 and df2's indexes are sorted in ascending order.

output:

                     adjclose  volume
datetime                             
2022-10-20 02:55:00   30492.0   507.0
2022-10-20 03:10:00   30402.0   537.0
2022-10-20 03:20:00   30457.0   197.0
2022-10-20 03:45:00   30376.0     NaN
2022-10-20 04:05:00   30425.0     NaN
2022-10-20 04:10:00   30373.0     NaN
2022-10-20 04:45:00   30436.0     NaN
2022-10-20 04:55:00   30401.0     NaN
2022-10-20 05:25:00   30437.0     NaN

CodePudding user response:

You ca use pandas DataFrame.join

Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Here is the documentation with some examples

  • Related