How to combine time series data with different granularity

Time:08-12

I have two dataframes:

df1
   value1   value2      timestamp     
0   193     68       2022-08-05 01:37:00
1   214     122      2022-08-06 01:55:00    
2   260     75       2022-08-07 02:16:00    

value1 and value2 are recorded once a day.

df2
     value3      timestamp  
0     0.01      2022-08-05 00:00:00 
1     0.0       2022-08-05 00:01:00 
2     0.2       2022-08-05 00:02:00 
3     1.0       2022-08-05 00:03:00 
...
4561  0.5       2022-08-07 12:30:00

value3 is recorded every minute. I need to combine these into one dataframe, aligned by timestamp.

value1 and value2 should stay the same between recordings (i.e. from 2022-08-06 01:55:00 until 2022-08-07 02:16:00, value1 should display 214 for every minute).

combined_df
       value1   value2    value3      timestamp     
0       193     68        0.02        2022-08-06 01:54:00
1       214     122       0.0         2022-08-06 01:55:00
2       214     122       0.1         2022-08-06 01:56:00
3       214     122       0.5         2022-08-06 01:57:00
...
4120    260     75        0.5         2022-08-07 02:16:00 

CodePudding user response:

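Use merge_asof, which by default matches each row of df2 with the last row of df1 whose timestamp is at or before it: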

print(df2)
      value3            timestamp
0       0.01  2022-08-06 01:54:00
1       0.00  2022-08-06 01:55:00
2       0.20  2022-08-06 01:56:00
3       1.00  2022-08-06 01:57:00
4561    0.50  2022-08-07 12:30:00

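import pandas as pd

# merge_asof needs datetime keys, sorted ascending in both frames
df1['timestamp'] = pd.to_datetime(df1['timestamp'])
df2['timestamp'] = pd.to_datetime(df2['timestamp'])

cols = ['value1','value2','value3','timestamp']
# for each df2 row, take the last df1 row at or before its timestamp
df = pd.merge_asof(df2, df1, on='timestamp')[cols]

print(df)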
   value1  value2  value3           timestamp
0     193      68    0.01 2022-08-06 01:54:00
1     214     122    0.00 2022-08-06 01:55:00
2     214     122    0.20 2022-08-06 01:56:00
3     214     122    1.00 2022-08-06 01:57:00
4     260      75    0.50 2022-08-07 12:30:00
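
For a version that can be run end to end, here is a minimal sketch. The daily rows are copied from the question, the per-minute rows are the small sample printed above (not the full series), and the name `combined` is only illustrative. Both frames are sorted explicitly because merge_asof requires sorted keys, and `direction='backward'` (the default) is spelled out to show which daily row is picked:

```python
import pandas as pd

# Daily readings, copied from the question.
df1 = pd.DataFrame({
    "value1": [193, 214, 260],
    "value2": [68, 122, 75],
    "timestamp": ["2022-08-05 01:37:00", "2022-08-06 01:55:00",
                  "2022-08-07 02:16:00"],
})

# A few per-minute readings (illustrative subset of df2).
df2 = pd.DataFrame({
    "value3": [0.01, 0.0, 0.2, 1.0, 0.5],
    "timestamp": ["2022-08-06 01:54:00", "2022-08-06 01:55:00",
                  "2022-08-06 01:56:00", "2022-08-06 01:57:00",
                  "2022-08-07 12:30:00"],
})

# Parse the strings so both keys share a datetime dtype.
df1["timestamp"] = pd.to_datetime(df1["timestamp"])
df2["timestamp"] = pd.to_datetime(df2["timestamp"])

# merge_asof requires both frames to be sorted on the key.
df1 = df1.sort_values("timestamp")
df2 = df2.sort_values("timestamp")

# For each minute, take the most recent daily row at or before it.
combined = pd.merge_asof(df2, df1, on="timestamp", direction="backward")
combined = combined[["value1", "value2", "value3", "timestamp"]]
print(combined)
```

Any df2 row that is earlier than the first df1 reading has nothing to match, so value1 and value2 come back as NaN for it.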

CodePudding user response:

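You can also merge the two dataframes on exact timestamps and then forward-fill the daily values. This assumes both timestamp columns are datetimes and that every timestamp in df1 also appears in df2; a daily reading whose timestamp is missing from df2 is dropped by the right join:

combined_df = df1.merge(right=df2, on='timestamp', how='right').sort_values('timestamp')
combined_df[['value1', 'value2']] = combined_df[['value1', 'value2']].ffill()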
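
The merge-then-fill approach can be sketched on a small runnable example. The per-minute grid here is synthetic: `value3` is a constant placeholder rather than real data, and `minutes` and `combined` are illustrative names. The sketch assumes each daily timestamp lands exactly on one of the one-minute timestamps, since an exact-match merge cannot pair them otherwise:

```python
import pandas as pd

# Two daily readings, copied from the question.
df1 = pd.DataFrame({
    "value1": [193, 214],
    "value2": [68, 122],
    "timestamp": pd.to_datetime(["2022-08-05 01:37:00",
                                 "2022-08-06 01:55:00"]),
})

# Synthetic one-minute grid; value3 is a placeholder, not real data.
minutes = pd.date_range("2022-08-05 01:36:00", "2022-08-06 01:56:00", freq="min")
df2 = pd.DataFrame({"value3": 0.0, "timestamp": minutes})

# Keep every minute from df2; daily values appear only on exact matches.
combined = df2.merge(df1, on="timestamp", how="left")

# Carry each daily reading forward until the next one arrives.
combined[["value1", "value2"]] = combined[["value1", "value2"]].ffill()

print(combined.head(3))
print(combined.tail(3))
```

Minutes before the first daily reading stay NaN after the forward fill, and the filled columns become floats because of those missing values.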