How do I resample a DataFrame whose index is half time series and half integer?

Time:09-01

                             close         0
2020-01-02 09:00:00+00:00  0.467291       NaN
2020-01-02 09:30:00+00:00  0.467267       NaN
2020-01-02 10:00:00+00:00  0.467729       NaN
2020-01-02 10:30:00+00:00  0.467923       NaN
2020-01-02 11:00:00+00:00  0.466707       NaN
...                             ...       ...
1500                            NaN  0.140868
1501                            NaN  0.136557
1502                            NaN  0.131828
1503                            NaN  0.128827
1504                            NaN  0.128978

Consider this dataframe. Is there a way to "ffill" the index so that the integer-labeled rows continue the time series, as shown below? Note that the values from column 0 fill the close column "sideways".

                             close
2020-01-02 09:00:00+00:00  0.467291
2020-01-02 09:30:00+00:00  0.467267
2020-01-02 10:00:00+00:00  0.467729
2020-01-02 10:30:00+00:00  0.467923
2020-01-02 11:00:00+00:00  0.466707
...                             ...
2020-17-02 09:30:00+00:00  0.161267
2020-17-02 10:00:00+00:00  0.165729
2020-17-02 10:30:00+00:00  0.164923
2020-17-02 11:00:00+00:00  0.163707

Answer:

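You can split your df into two dataframes: one holding the timestamp-indexed rows of close, the other holding the integer-indexed rows of column 0.

If you still have the two original dataframes from before they were merged, you can use those directly.

Then give the second dataframe a date index that continues from the last timestamp of the first, rename its column to close, and concatenate the two: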

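import pandas as pd

# Last index label with a valid value in 'close' (the last real timestamp)
last_ts = df['close'].last_valid_index()

df1 = df.loc[ : last_ts, ['close']]
df2 = df.iloc[len(df1) : , [1]]     # 1 is the positional index of column 0
# Continue the time series 30 minutes after the last timestamp
df2.index = pd.date_range(start = last_ts + pd.Timedelta('30 min'),
                          periods = len(df2),
                          freq='30 min')
df2.columns = ['close']

result = pd.concat([df1, df2])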

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan],
                   [2, np.nan],
                   [np.nan, 4]],
                  index = list(pd.date_range(start='2022', periods=2, freq='30 min')) + [1],
                  columns=['close', 0])

                     close    0
2022-01-01 00:00:00    1.0  NaN
2022-01-01 00:30:00    2.0  NaN
1                      NaN  4.0

Result:

                     close
2022-01-01 00:00:00    1.0
2022-01-01 00:30:00    2.0
2022-01-01 01:00:00    4.0
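
If you need this more than once, the same split-and-reindex idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the name `continue_time_index` and its parameters are made up for illustration, the timestamps are evenly spaced at `freq`, and every index label that is not a timestamp belongs to the continuation rows. It picks rows by the type of their index label instead of slicing, and converts the first part's index to a `DatetimeIndex` so the result ends up with a proper datetime index.

```python
import numpy as np
import pandas as pd


def continue_time_index(df, value_col="close", extra_col=0, freq="30min"):
    """Hypothetical helper: move `extra_col` under `value_col` and
    continue the time index at a fixed step of `freq`."""
    # True for rows labeled with a timestamp, False for integer-labeled rows
    is_ts = np.array([isinstance(label, pd.Timestamp) for label in df.index])

    head = df.loc[is_ts, value_col]
    head.index = pd.DatetimeIndex(head.index)  # object index -> DatetimeIndex

    tail_values = df.loc[~is_ts, extra_col].to_numpy()
    # Assumption: the series simply continues one step after the last timestamp
    new_index = pd.date_range(start=head.index[-1] + pd.Timedelta(freq),
                              periods=len(tail_values),
                              freq=freq)
    tail = pd.Series(tail_values, index=new_index, name=value_col)

    return pd.concat([head, tail]).to_frame()


# Same toy data as in the example above
df = pd.DataFrame([[1, np.nan],
                   [2, np.nan],
                   [np.nan, 4]],
                  index=list(pd.date_range(start="2022-01-01", periods=2, freq="30min")) + [1],
                  columns=["close", 0])

result = continue_time_index(df)
print(result)
```

If your real data has gaps (for example, no rows outside trading hours), a plain fixed-step `date_range` will not reproduce them, so the new index would need to be built differently in that case.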