Generate a period timestamps in a Dataframe with multiple columns and fill missing values using forw

Time:01-10

I have a DataFrame with multiple columns and it looks like this:

date       col1 col2  col3  
2023-01-01  Y   N     NaN           
2023-01-02  Y   N     Y             

Knowing the start and end timestamps of df['date'], I want to generate the timestamps in between at a desired frequency. I can do that with the code below:

new_date = pd.Series(pd.date_range(start=df.index[0], end=df.index[-1], freq= '15min'))

However, I would also like the remaining columns to have the same number of rows as new_date, with the in-between rows filled with the preceding value. I know I can fill values using .ffill(), but I don't know how to generate the missing rows. The end result should look like this:

date                col1 col2  col3
2023-01-01 00:00:00 Y    N     NaN
2023-01-01 00:15:00 Y    N     NaN
2023-01-01 00:30:00 Y    N     NaN
    ...
2023-01-02 00:00:00 Y    N     Y

Answer:

You can reindex with those datetimes and forward fill the missing values. With this approach, if the starting value is NaN, the rows that follow stay NaN as well:

>>> df.reindex(new_date, method="ffill")

                    col1 col2 col3
2023-01-01 00:00:00    Y    N  NaN
2023-01-01 00:15:00    Y    N  NaN
2023-01-01 00:30:00    Y    N  NaN
2023-01-01 00:45:00    Y    N  NaN
2023-01-01 01:00:00    Y    N  NaN
...                  ...  ...  ...
2023-01-01 23:00:00    Y    N  NaN
2023-01-01 23:15:00    Y    N  NaN
2023-01-01 23:30:00    Y    N  NaN
2023-01-01 23:45:00    Y    N  NaN
2023-01-02 00:00:00    Y    N    Y

[97 rows x 3 columns]

where new_date is pd.date_range(start=df.index[0], end=df.index[-1], freq="15min").
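For completeness, here is a minimal end-to-end sketch of the approach above. It assumes that date starts out as a regular column (as in the question) and is moved into the index first, since reindex works on the index; the sample frame is rebuilt by hand from the two rows shown in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; "date" is assumed to be a column here.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-01-01", "2023-01-02"]),
        "col1": ["Y", "Y"],
        "col2": ["N", "N"],
        "col3": [np.nan, "Y"],
    }
)

# reindex operates on the index, so make the dates the (sorted) index first.
df = df.set_index("date").sort_index()

# Every 15 minutes between the first and last timestamp, both ends included.
new_date = pd.date_range(start=df.index[0], end=df.index[-1], freq="15min")

# Each new timestamp takes the row of the closest earlier original timestamp.
# A NaN in that original row is carried along as NaN, not filled from elsewhere.
result = df.reindex(new_date, method="ffill")

# asfreq builds the same regular grid from the index itself, so it is a
# shorter way to get an equivalent frame when the range is first-to-last.
alt = df.asfreq("15min", method="ffill")

print(result)
```

Note that method="ffill" requires a monotonically increasing index, which is why the sketch sorts it before reindexing.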
