Home > OS >  How to combine two timeseries in an upsert manner?
How to combine two timeseries in an upsert manner?

Time:02-21

Want an Upsert on the index

  • df1 is long time-series (historical)
  • df2 is an update which will have new entries, but may overlap df1. On overlap, df2 takes precedence.
  • df3 is the output that I want.

I tried calling an outer merge, but my indices are being ignored.

I also tried setting left_index and right_index, but that didn't work, either.

import pandas as pd

idx1 = pd.date_range('2015-10-05 10:00:00', '2015-10-05 10:04:00', freq='1min')
df1 = pd.DataFrame({'x': [1,2,3,4,5]}, index=idx1)

idx2 = pd.date_range('2015-10-05 10:03:00', '2015-10-05 10:07:00', freq='1min')
df2 = pd.DataFrame({'x': [10,20,30,40,50]}, index=idx2)

idx3 = pd.date_range('2015-10-05 10:00:00', '2015-10-05 10:07:00', freq='1min')
df3 = pd.DataFrame({'x': [1,2,3,10,20,30,40,50]}, index=idx3)

# pd.merge(df1, df2, how='outer')      # index being ignored
# df = pd.merge(df_hist, df_update, left_index=True, right_index=True, how='outer')  # also bad

CodePudding user response:

You can append then groupby : Notice the order of append do matter in your case, since you would like to keep df2 value

out = df2.append(df1).groupby(level=0).head(1).sort_index()
Out[231]: 
                      x
2015-10-05 10:00:00   1
2015-10-05 10:01:00   2
2015-10-05 10:02:00   3
2015-10-05 10:03:00  10
2015-10-05 10:04:00  20
2015-10-05 10:05:00  30
2015-10-05 10:06:00  40
2015-10-05 10:07:00  50

CodePudding user response:

Just pd.concat your dataframes and then use x.loc a lambda function with duplicated(keep='last'):

df3 = pd.concat([df1, df2]).loc[lambda x: ~x.index.duplicated(keep='last')]

Output:

>>> df3
                      x
2015-10-05 10:00:00   1
2015-10-05 10:01:00   2
2015-10-05 10:02:00   3
2015-10-05 10:03:00  10
2015-10-05 10:04:00  20
2015-10-05 10:05:00  30
2015-10-05 10:06:00  40
2015-10-05 10:07:00  50

CodePudding user response:

We can use combine_first here:

df3 = df2.combine_first(df1)
df3

Output:

                      x
2015-10-05 10:00:00   1
2015-10-05 10:01:00   2
2015-10-05 10:02:00   3
2015-10-05 10:03:00  10
2015-10-05 10:04:00  20
2015-10-05 10:05:00  30
2015-10-05 10:06:00  40
2015-10-05 10:07:00  50
​
  • Related