I have two time series with same sampling frequency but different end dates. I want to combine them into one and keep the total time range instead of the intersection. Leave the data outside the intersection NaN.
I've tried:
df_to_merge= [df1, df2]
df_merged = reduce(lambda left,right: pd.merge(left,right, on='timestamp'), df_to_merge)
The data:
df1
timestamp col1
2010-10-10 00:00 10
2010-10-10 00:01 15
...
2010-10-15 00:00 10
df2
timestamp col2
2010-10-07 00:00 20
2010-10-10 00:01 25
...
2010-10-18 00:00 20
Desired outcome:
timestamp col1 col2
2010-10-07 00:00 NaN 20
2010-10-07 00:01 NaN 25
...
2010-10-10 00:01 10 30
2010-10-15 00:00 10 40
..
2010-10-18 00:00 NaN 20
CodePudding user response:
You can perform a join operation:
df_merged = df1.join(df2,how='right')
By using right
you ensure all values from the right (longer df) will be kept.
For example:
df1 = pd.DataFrame({'timestamp':pd.to_datetime(pd.Series(['2020-10-10 23:32',
'2020-10-13 23:28'])),
'col1':[5,8]})
df1 = df1.set_index('timestamp').resample('1d').fillna(method='ffill')
col1
timestamp
2020-10-10 NaN
2020-10-11 5.0
2020-10-12 5.0
2020-10-13 5.0
And
df2 = pd.DataFrame({'timestamp':pd.to_datetime(pd.Series(['2020-10-08 23:32',
'2020-10-15 23:28'])),
'col2':[50,80]})
df2 = df2.set_index('timestamp').resample('1d').fillna(method='ffill')
col1
timestamp
2020-10-08 NaN
2020-10-09 50.0
2020-10-10 50.0
2020-10-11 50.0
2020-10-12 50.0
2020-10-13 50.0
2020-10-14 50.0
2020-10-15 50.0
Return:
col1 col2
timestamp
2020-10-08 NaN NaN
2020-10-09 NaN 50.0
2020-10-10 NaN 50.0
2020-10-11 5.0 50.0
2020-10-12 5.0 50.0
2020-10-13 5.0 50.0
2020-10-14 NaN 50.0
2020-10-15 NaN 50.0