I got two pandas dataframes as following:
ts1
Out[50]:
soil_moisture_ids41
date_time
2007-01-07 05:00:00 0.1830
2007-01-07 06:00:00 0.1825
2007-01-07 07:00:00 0.1825
2007-01-07 08:00:00 0.1825
2007-01-07 09:00:00 0.1825
... ...
2017-10-10 20:00:00 0.0650
2017-10-10 21:00:00 0.0650
2017-10-10 22:00:00 0.0650
2017-10-10 23:00:00 0.0650
2017-10-11 00:00:00 0.0650
[94316 rows x 3 columns]
and the other one is
ts2
Out[51]:
soil_moisture_ids42
date_time
2016-07-20 00:00:00 0.147
2016-07-20 01:00:00 0.148
2016-07-20 02:00:00 0.149
2016-07-20 03:00:00 0.150
2016-07-20 04:00:00 0.152
... ...
2019-12-31 19:00:00 0.216
2019-12-31 20:00:00 0.216
2019-12-31 21:00:00 0.215
2019-12-31 22:00:00 0.215
2019-12-31 23:00:00 0.215
[30240 rows x 3 columns]
You could see that, from 2007-01-07
to 2016-07-19
, only ts1
has the data points. And from 2016-07-20
to 2017-10-1
1 there are some overlapped time series. Now I want to combine these two data frames. During the overlapped period, I want to get the mean values over ts1
and ts2
. During the non-overlapped period, (2007-01-07
to 2016-07-19
and 2017-10-12
to 2019-12-31
), the values at each time stamp is set as the value from ts1
or ts2
. So how can I do it?
Thanks!
CodePudding user response:
Use concat
with aggregate mean
, if only one value get same ouput, if multiple get mean
. Also finally DatatimeIndex
is sorted:
s = pd.concat([ts1, ts2]).groupby(level=0).mean()