How to add empty/dummy row with continuous datetime index in pandas?

This is my dataframe

                                 consumption  hour
start_time
2022-09-30 14:00:00+02:00            199.0  14.0
2022-09-30 15:00:00+02:00            173.0  15.0
2022-09-30 16:00:00+02:00            173.0  16.0
2022-09-30 17:00:00+02:00            156.0  17.0
2022-09-30 18:00:00+02:00            142.0  18.0
2022-09-30 19:00:00+02:00            163.0  19.0
2022-09-30 20:00:00+02:00            138.0  20.0
2022-09-30 21:00:00+02:00            183.0  21.0
2022-09-30 22:00:00+02:00            138.0  22.0
2022-09-30 23:00:00+02:00            143.0  23.0

I want output like this (the rows marked with * are the new ones):

                                 consumption  hour
start_time
2022-09-30 14:00:00+02:00            199.0  14.0
2022-09-30 15:00:00+02:00            173.0  15.0
2022-09-30 16:00:00+02:00            173.0  16.0
2022-09-30 17:00:00+02:00            156.0  17.0
2022-09-30 18:00:00+02:00            142.0  18.0
2022-09-30 19:00:00+02:00            163.0  19.0
2022-09-30 20:00:00+02:00            138.0  20.0
2022-09-30 21:00:00+02:00            183.0  21.0
2022-09-30 22:00:00+02:00            138.0  22.0
2022-09-30 23:00:00+02:00            143.0  23.0
*2022-10-01 00:00:00+02:00             0.0   0.0*
*2022-10-01 01:00:00+02:00             0.0   1.0*

My index is a datetime (start_time). I want to append rows that continue the datetime sequence, with dummy or zero values. How can I do this in pandas?

Answer:

Create a helper DataFrame holding the new rows and append it to the original with concat:

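import pandas as pd

N = 2  # number of hourly rows to append
# helper frame: N zero-consumption rows starting one hour after the last timestamp
df1 = (pd.DataFrame({'consumption': 0},
                    index=pd.date_range(df.index.max() + pd.Timedelta('1h'),
                                        df.index.max() + pd.Timedelta(f'{N}h'),
                                        freq='h'))  # 'h' = hourly ('H' is deprecated since pandas 2.2)
         .assign(hour=lambda x: x.index.hour))  # derive hour from the new index

df = pd.concat([df, df1])
print(df)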
                           consumption  hour
2022-09-30 14:00:00+02:00        199.0  14.0
2022-09-30 15:00:00+02:00        173.0  15.0
2022-09-30 16:00:00+02:00        173.0  16.0
2022-09-30 17:00:00+02:00        156.0  17.0
2022-09-30 18:00:00+02:00        142.0  18.0
2022-09-30 19:00:00+02:00        163.0  19.0
2022-09-30 20:00:00+02:00        138.0  20.0
2022-09-30 21:00:00+02:00        183.0  21.0
2022-09-30 22:00:00+02:00        138.0  22.0
2022-09-30 23:00:00+02:00        143.0  23.0
2022-10-01 00:00:00+02:00          0.0   0.0
2022-10-01 01:00:00+02:00          0.0   1.0
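
The concat approach can be tried end to end with a sketch like the one below. The sample values are copied from the question. The helper name append_zero_hours, the fixed +02:00 offset, and the continuity check at the end are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data assumed from the question: ten hourly rows at a fixed +02:00 offset.
idx = pd.date_range(pd.Timestamp("2022-09-30 14:00:00+02:00"),
                    periods=10, freq="h", name="start_time")
df = pd.DataFrame(
    {"consumption": [199.0, 173.0, 173.0, 156.0, 142.0,
                     163.0, 138.0, 183.0, 138.0, 143.0],
     "hour": idx.hour.to_numpy(dtype=float)},
    index=idx,
)


def append_zero_hours(frame, n):
    """Hypothetical helper: append n hourly rows with consumption 0."""
    start = frame.index.max() + pd.Timedelta(hours=1)
    new_index = pd.date_range(start, periods=n, freq="h", name=frame.index.name)
    extra = pd.DataFrame(
        {"consumption": 0.0, "hour": new_index.hour.to_numpy(dtype=float)},
        index=new_index,
    )
    return pd.concat([frame, extra])


out = append_zero_hours(df, 2)

# Continuity check: every step between consecutive timestamps is one hour.
steps = out.index.to_series().diff().dropna()
is_continuous = bool(steps.eq(pd.Timedelta(hours=1)).all())

print(out.tail(3))
print("continuous:", is_continuous)
```

Building the new rows with the same column dtypes as the original (floats here) keeps concat from changing the dtype of any column.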

Or use DataFrame.reindex with a new index that extends the original range by N hours:

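N = 2  # number of hourly rows to append
# reindex onto the full hourly range; rows missing from df get consumption 0
df = (df.reindex(pd.date_range(df.index.min(),
                               df.index.max() + pd.Timedelta(f'{N}h'),
                               freq='h'), fill_value=0)  # 'h' = hourly ('H' is deprecated since pandas 2.2)
        .assign(hour=lambda x: x.index.hour))  # recompute hour for every row

print(df)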
                           consumption  hour
2022-09-30 14:00:00+02:00        199.0    14
2022-09-30 15:00:00+02:00        173.0    15
2022-09-30 16:00:00+02:00        173.0    16
2022-09-30 17:00:00+02:00        156.0    17
2022-09-30 18:00:00+02:00        142.0    18
2022-09-30 19:00:00+02:00        163.0    19
2022-09-30 20:00:00+02:00        138.0    20
2022-09-30 21:00:00+02:00        183.0    21
2022-09-30 22:00:00+02:00        138.0    22
2022-09-30 23:00:00+02:00        143.0    23
2022-10-01 00:00:00+02:00          0.0     0
2022-10-01 01:00:00+02:00          0.0     1
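
One difference between the two approaches is worth seeing on a small case: because reindex runs over the whole range from the first timestamp onward, it also fills any hours missing inside the existing data, whereas concat only appends at the end. The sketch below assumes a cut-down version of the question's data with the 16:00 row deliberately removed; that gap and the shortened frame are made up for illustration.

```python
import pandas as pd

# Assumed sample: three hourly readings with the 16:00 row missing on purpose.
idx = pd.DatetimeIndex(
    ["2022-09-30 14:00:00+02:00",
     "2022-09-30 15:00:00+02:00",
     "2022-09-30 17:00:00+02:00"],
    name="start_time",
)
df = pd.DataFrame({"consumption": [199.0, 173.0, 156.0]}, index=idx)

N = 2  # hours to add after the last timestamp

# Full hourly range: first timestamp through last timestamp plus N hours.
full_range = pd.date_range(df.index.min(),
                           df.index.max() + pd.Timedelta(hours=N),
                           freq="h", name="start_time")

# Rows absent from df (the 16:00 gap and the two new hours) get consumption 0;
# hour is then recomputed from the index for every row.
out = (df.reindex(full_range, fill_value=0)
         .assign(hour=lambda x: x.index.hour))

print(out)
```

If a zero in an interior gap could be mistaken for a real reading, leaving out fill_value makes reindex insert NaN instead, which keeps missing hours distinguishable.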