Home > Software design >  Pandas: merge two time series and get the mean values during the period when these two have overlapp
Pandas: merge two time series and get the mean values during the period when these two have overlapp

Time:05-02

I got two pandas dataframes as following:

ts1
Out[50]: 
                     soil_moisture_ids41  
date_time                                 
2007-01-07 05:00:00               0.1830  
2007-01-07 06:00:00               0.1825  
2007-01-07 07:00:00               0.1825  
2007-01-07 08:00:00               0.1825  
2007-01-07 09:00:00               0.1825  
...                                 ...  
2017-10-10 20:00:00               0.0650  
2017-10-10 21:00:00               0.0650  
2017-10-10 22:00:00               0.0650  
2017-10-10 23:00:00               0.0650  
2017-10-11 00:00:00               0.0650  

[94316 rows x 3 columns]

and the other one is

ts2
Out[51]: 
                     soil_moisture_ids42  
date_time                                                        
2016-07-20 00:00:00                0.147  
2016-07-20 01:00:00                0.148  
2016-07-20 02:00:00                0.149  
2016-07-20 03:00:00                0.150  
2016-07-20 04:00:00                0.152  
...                                 ...  
2019-12-31 19:00:00                0.216 
2019-12-31 20:00:00                0.216 
2019-12-31 21:00:00                0.215 
2019-12-31 22:00:00                0.215 
2019-12-31 23:00:00                0.215 

[30240 rows x 3 columns]

You could see that, from 2007-01-07 to 2016-07-19, only ts1 has the data points. And from 2016-07-20 to 2017-10-11 there are some overlapped time series. Now I want to combine these two data frames. During the overlapped period, I want to get the mean values over ts1 and ts2. During the non-overlapped period, (2007-01-07 to 2016-07-19 and 2017-10-12 to 2019-12-31), the values at each time stamp is set as the value from ts1 or ts2. So how can I do it?

Thanks!

CodePudding user response:

Use concat with aggregate mean, if only one value get same ouput, if multiple get mean. Also finally DatatimeIndex is sorted:

s = pd.concat([ts1, ts2]).groupby(level=0).mean()
  • Related