How to add empty/dummy row with continuous datetime index in pandas?

Time:11-21

This is my dataframe

                                 consumption  hour
start_time
2022-09-30 14:00:00+02:00            199.0  14.0
2022-09-30 15:00:00+02:00            173.0  15.0
2022-09-30 16:00:00+02:00            173.0  16.0
2022-09-30 17:00:00+02:00            156.0  17.0
2022-09-30 18:00:00+02:00            142.0  18.0
2022-09-30 19:00:00+02:00            163.0  19.0
2022-09-30 20:00:00+02:00            138.0  20.0
2022-09-30 21:00:00+02:00            183.0  21.0
2022-09-30 22:00:00+02:00            138.0  22.0
2022-09-30 23:00:00+02:00            143.0  23.0

I want output like this:

                                 consumption  hour
start_time
2022-09-30 14:00:00+02:00            199.0  14.0
2022-09-30 15:00:00+02:00            173.0  15.0
2022-09-30 16:00:00+02:00            173.0  16.0
2022-09-30 17:00:00+02:00            156.0  17.0
2022-09-30 18:00:00+02:00            142.0  18.0
2022-09-30 19:00:00+02:00            163.0  19.0
2022-09-30 20:00:00+02:00            138.0  20.0
2022-09-30 21:00:00+02:00            183.0  21.0
2022-09-30 22:00:00+02:00            138.0  22.0
2022-09-30 23:00:00+02:00            143.0  23.0
*2022-10-01 00:00:00+02:00           00.0   00.0*
*2022-10-01 01:00:00+02:00           00.0   01.0*

Here my index is a datetime index (start_time). I want to append rows that continue the datetime sequence, with dummy or zero values in the other columns. How can I do this in pandas?

CodePudding user response:

Create a helper DataFrame holding the N new hourly rows and append it to the original with concat:

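import pandas as pd

N = 2  # number of hourly rows to append

# Helper frame: N zero-filled rows starting one hour after the last timestamp.
# Note: pandas >= 2.2 prefers the lowercase alias freq='h'.
df1 = (pd.DataFrame({'consumption': 0},
                     index=pd.date_range(df.index.max() + pd.Timedelta('1h'),
                                         df.index.max() + pd.Timedelta(f'{N}h'),
                                         freq='H'))
         .assign(hour=lambda x: x.index.hour))

df = pd.concat([df, df1])
print(df)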
                           consumption  hour
2022-09-30 14:00:00+02:00        199.0  14.0
2022-09-30 15:00:00+02:00        173.0  15.0
2022-09-30 16:00:00+02:00        173.0  16.0
2022-09-30 17:00:00+02:00        156.0  17.0
2022-09-30 18:00:00+02:00        142.0  18.0
2022-09-30 19:00:00+02:00        163.0  19.0
2022-09-30 20:00:00+02:00        138.0  20.0
2022-09-30 21:00:00+02:00        183.0  21.0
2022-09-30 22:00:00+02:00        138.0  22.0
2022-09-30 23:00:00+02:00        143.0  23.0
2022-10-01 00:00:00+02:00          0.0   0.0
2022-10-01 01:00:00+02:00          0.0   1.0
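
For anyone who wants to try the concat approach without the original data, here is a minimal self-contained sketch. The sample frame contains only the last three rows from the question, the helper name append_empty_hours is hypothetical (it is not a pandas function), and the new index is built with periods=N instead of an explicit end timestamp, which is an alternative to the snippet above rather than the same call.

```python
import pandas as pd

# Last three rows of the question's data, rebuilt by hand for illustration.
idx = pd.to_datetime([
    "2022-09-30 21:00:00+02:00",
    "2022-09-30 22:00:00+02:00",
    "2022-09-30 23:00:00+02:00",
]).rename("start_time")
df = pd.DataFrame(
    {"consumption": [183.0, 138.0, 143.0], "hour": [21.0, 22.0, 23.0]},
    index=idx,
)


def append_empty_hours(frame, n, fill=0.0):
    """Hypothetical helper: return `frame` plus `n` hourly rows after its last timestamp."""
    step = pd.Timedelta(hours=1)
    # The timezone is inherited from the tz-aware start timestamp.
    new_index = pd.date_range(frame.index.max() + step, periods=n, freq=step,
                              name=frame.index.name)
    extra = pd.DataFrame({"consumption": fill}, index=new_index)
    # Derive the hour column from the new index, as float to match the original dtype.
    extra["hour"] = extra.index.hour.astype(float)
    return pd.concat([frame, extra])


result = append_empty_hours(df, 2)
print(result)
```

Using periods=n avoids computing the end timestamp by hand, so the number of appended rows is stated directly.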

Or use DataFrame.reindex with a new index that spans the original range plus N extra hours, then recompute hour from the index:

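N = 2  # number of hourly rows to append

# Reindex over the original range extended by N hours; new rows get 0,
# then hour is recomputed from the index for every row.
df = (df.reindex(pd.date_range(df.index.min(),
                               df.index.max() + pd.Timedelta(f'{N}h'),
                               freq='H'), fill_value=0)
        .assign(hour=lambda x: x.index.hour))

print(df)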
                           consumption  hour
2022-09-30 14:00:00+02:00        199.0    14
2022-09-30 15:00:00+02:00        173.0    15
2022-09-30 16:00:00+02:00        173.0    16
2022-09-30 17:00:00+02:00        156.0    17
2022-09-30 18:00:00+02:00        142.0    18
2022-09-30 19:00:00+02:00        163.0    19
2022-09-30 20:00:00+02:00        138.0    20
2022-09-30 21:00:00+02:00        183.0    21
2022-09-30 22:00:00+02:00        138.0    22
2022-09-30 23:00:00+02:00        143.0    23
2022-10-01 00:00:00+02:00          0.0     0
2022-10-01 01:00:00+02:00          0.0     1
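
One behavioural difference to check against your own data: because reindex is given the full hourly range from the first timestamp onward, it also creates zero-filled rows for any hours missing inside the existing data, whereas concat only appends at the end. The sketch below illustrates this with made-up values and a deliberately missing 22:00 row; the names gappy and filled are just for this example.

```python
import pandas as pd

# Made-up sample with a gap: the 22:00 row is left out on purpose.
idx = pd.to_datetime([
    "2022-09-30 21:00:00+02:00",
    "2022-09-30 23:00:00+02:00",
]).rename("start_time")
gappy = pd.DataFrame({"consumption": [183.0, 143.0]}, index=idx)

N = 2  # hours to add after the last timestamp
step = pd.Timedelta(hours=1)

# Full hourly range: original span plus N extra hours.
full_index = pd.date_range(gappy.index.min(),
                           gappy.index.max() + N * step,
                           freq=step, name="start_time")

# Rows absent from `gappy` (the interior gap and the N new hours) are filled with 0.
filled = (gappy.reindex(full_index, fill_value=0)
               .assign(hour=lambda x: x.index.hour))
print(filled)
```

If interior gaps should stay missing rather than become zeros, the concat approach is likely the safer choice.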