I have a DataFrame with multiple columns and it looks like this:
date col1 col2 col3
2023-01-01 Y N NaN
2023-01-02 Y N Y
Knowing the start and end timestamps of df['date'], I want to generate the in-between timestamps at a desired frequency. I can do that with the code below:
new_date = pd.Series(pd.date_range(start=df.index[0], end=df.index[-1], freq= '15min'))
However, I would also like the remaining columns to have the same number of rows as len(new_date), with each in-between row filled from the row before it.
I know I can fill the values with .ffill(), but I don't know how to generate the missing rows. The end result should look like this:
date col1 col2 col3
2023-01-01 00:00:00 Y N NaN
2023-01-01 00:15:00 Y N NaN
2023-01-01 00:30:00 Y N NaN
...
2023-01-02 00:00:00 Y N Y
CodePudding user response:
You can reindex with those datetimes and forward-fill the missing rows. This way, if the starting value is NaN, the rows that follow it stay NaN as well:
>>> df.reindex(new_date, method="ffill")
col1 col2 col3
2023-01-01 00:00:00 Y N NaN
2023-01-01 00:15:00 Y N NaN
2023-01-01 00:30:00 Y N NaN
2023-01-01 00:45:00 Y N NaN
2023-01-01 01:00:00 Y N NaN
... ... ... ...
2023-01-01 23:00:00 Y N NaN
2023-01-01 23:15:00 Y N NaN
2023-01-01 23:30:00 Y N NaN
2023-01-01 23:45:00 Y N NaN
2023-01-02 00:00:00 Y N Y
[97 rows x 3 columns]
where new_date is pd.date_range(start=df.index[0], end=df.index[-1], freq="15min").
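For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the reindex approach. It assumes that 'date' is already the DataFrame's index (the question's code uses df.index, so that seems to be the case) and rebuilds the two sample rows by hand; the variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question. Assumption: 'date' is the index.
# If it were a regular column, call df = df.set_index("date") first.
df = pd.DataFrame(
    {"col1": ["Y", "Y"], "col2": ["N", "N"], "col3": [np.nan, "Y"]},
    index=pd.to_datetime(["2023-01-01", "2023-01-02"]),
)
df.index.name = "date"

# Every 15-minute timestamp between the first and last index value, inclusive.
new_date = pd.date_range(start=df.index[0], end=df.index[-1], freq="15min")

# reindex creates the missing rows; method="ffill" copies each new row
# from the closest earlier row of the original frame, NaN included.
result = df.reindex(new_date, method="ffill")

print(len(result))  # 97
print(result.tail(2))
```

Note that reindex with method="ffill" needs a monotonically increasing (or decreasing) index, so sorting the frame with df.sort_index() beforehand may be necessary if the dates are not already in order.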