I have Pandas DataFrame and I'm trying to figure out how to insert values between the datetime values in my DF. Ideally I'd have a row for each hour, with the 'price' just being filled in by the previous price.
If I have a table like this:
day_hour price
---------------------------------
2020-01-01 05:00:00 12.083333
2020-01-01 09:00:00 11.047170
2020-01-01 10:00:00 11.678125
2020-01-01 11:00:00 11.920000
2020-01-01 12:00:00 12.012500
How would I turn it into this?:
day_hour price
---------------------------------
2020-01-01 05:00:00 12.083333
2020-01-01 06:00:00 12.083333
2020-01-01 07:00:00 12.083333
2020-01-01 08:00:00 12.083333
2020-01-01 09:00:00 11.047170
2020-01-01 10:00:00 11.678125
2020-01-01 11:00:00 11.920000
2020-01-01 12:00:00 12.012500
CodePudding user response:
We can set_index
then resample.ffill
to hourly (1H
) and reset_index
to restore the column:
df = df.set_index('day_hour').resample('1H').ffill().reset_index()
df
:
day_hour price
0 2020-01-01 05:00:00 12.083333
1 2020-01-01 06:00:00 12.083333
2 2020-01-01 07:00:00 12.083333
3 2020-01-01 08:00:00 12.083333
4 2020-01-01 09:00:00 11.047170
5 2020-01-01 10:00:00 11.678125
6 2020-01-01 11:00:00 11.920000
7 2020-01-01 12:00:00 12.012500
Setup:
import pandas as pd
df = pd.DataFrame({
'day_hour': pd.to_datetime(['2020-01-01 05:00:00', '2020-01-01 09:00:00',
'2020-01-01 10:00:00', '2020-01-01 11:00:00',
'2020-01-01 12:00:00']),
'price': [12.083333, 11.04717, 11.678125, 11.92, 12.0125]
})