Pandas fill missing Time-Series data. Only if more than one day is missing

Time:05-04

I have two time series with different frequencies. I would like to fill in missing values using the lower-frequency data.

Here is what I mean; I hope this makes it clear:

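import pandas as pd

# pd.Timestamp replaces the deprecated pd.datetime alias (removed in pandas 2.0)
index = [pd.Timestamp(2022, 1, 10, 1),
         pd.Timestamp(2022, 1, 10, 2),
         pd.Timestamp(2022, 1, 12, 7),
         pd.Timestamp(2022, 1, 14, 12)]
df1 = pd.DataFrame([1, 2, 3, 4], index=index)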

2022-01-10 01:00:00 1
2022-01-10 02:00:00 2
2022-01-12 07:00:00 3
2022-01-14 12:00:00 4

index = pd.date_range(start=pd.Timestamp(2022, 1, 9),
                      end=pd.Timestamp(2022, 1, 15),
                      freq='D')
df2 = pd.DataFrame([n + 99 for n in range(len(index))], index=index)

2022-01-09  99
2022-01-10  100
2022-01-11  101
2022-01-12  102
2022-01-13  103
2022-01-14  104
2022-01-15  105

The final df should take a value from df2 only for calendar days on which df1 has no entry at all. So the result should be:

2022-01-09 00:00:00 99
2022-01-10 01:00:00 1
2022-01-10 02:00:00 2
2022-01-11 00:00:00 101
2022-01-12 07:00:00 3
2022-01-13 00:00:00 103
2022-01-14 12:00:00 4
2022-01-15 00:00:00 105
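To make the rule concrete, here is a minimal sketch that lists the calendar days of df2 for which df1 has no row at all; these are exactly the days taken from df2 in the desired result. It assumes the df1 and df2 from the question, rebuilt here with pd.to_datetime so the block runs on its own.

```python
import pandas as pd

# Rebuild the two frames from the question
df1 = pd.DataFrame(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2022-01-10 01:00", "2022-01-10 02:00",
                          "2022-01-12 07:00", "2022-01-14 12:00"]),
)
days = pd.date_range("2022-01-09", "2022-01-15", freq="D")
df2 = pd.DataFrame([n + 99 for n in range(len(days))], index=days)

# Calendar days (floored to midnight) that have at least one df1 observation
covered_days = df1.index.normalize().unique()

# Daily df2 timestamps whose day is not covered by df1 (result is sorted)
missing_days = df2.index.difference(covered_days)
print(list(missing_days.strftime("%Y-%m-%d")))
# ['2022-01-09', '2022-01-11', '2022-01-13', '2022-01-15']
```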

Any idea how to do this? Thanks a lot for any hint!

Answer:

You can filter df2 to keep only the dates that do not appear anywhere in df1, then concatenate the result with df1:

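import numpy as np
import pandas as pd

# Calendar date (without the time part) of every timestamp in each frame
idx1 = pd.to_datetime(df1.index).date
idx2 = pd.to_datetime(df2.index).date

# Keep the df2 rows whose date never occurs in df1, append them to df1,
# and restore chronological order
df3 = pd.concat([df1, df2[~np.isin(idx2, idx1)]]).sort_index()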

Output:

                       0
2022-01-09 00:00:00   99
2022-01-10 01:00:00    1
2022-01-10 02:00:00    2
2022-01-11 00:00:00  101
2022-01-12 07:00:00    3
2022-01-13 00:00:00  103
2022-01-14 12:00:00    4
2022-01-15 00:00:00  105
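The same filter can also be sketched with pandas alone: DatetimeIndex.normalize() floors each timestamp to midnight, so the comparison stays within a DatetimeIndex instead of arrays of Python date objects. The helper name fill_missing_days is hypothetical, and the sketch assumes both frames have a DatetimeIndex and share the same columns.

```python
import pandas as pd


def fill_missing_days(high: pd.DataFrame, low: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add rows of `low` only for days absent from `high`."""
    # Midnight-floored timestamps of the days that `high` already covers
    covered = high.index.normalize()
    # Keep only the `low` rows whose normalized day is not covered
    extra = low[~low.index.normalize().isin(covered)]
    # Combine both frames and restore chronological order
    return pd.concat([high, extra]).sort_index()


df1 = pd.DataFrame(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2022-01-10 01:00", "2022-01-10 02:00",
                          "2022-01-12 07:00", "2022-01-14 12:00"]),
)
days = pd.date_range("2022-01-09", "2022-01-15", freq="D")
df2 = pd.DataFrame([n + 99 for n in range(len(days))], index=days)

df3 = fill_missing_days(df1, df2)
print(df3[0].tolist())
# [99, 1, 2, 101, 3, 103, 4, 105]
```

If df2 were not already at midnight, normalizing it as well keeps the comparison on whole days rather than exact timestamps.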