Pandas dataframe expand rows in specific times-CodePudding

I have a dataframe:

df =      T1           C1
     01/01/2022 11:20   2
     01/01/2022 15:40   8
     01/01/2022 17:50   3

I want to expand it such that I will have a row for each round timestamp:

df_new =      T1            C1
         01/01/2022 11:20   2
         01/01/2022 12:00   2
         01/01/2022 13:00   2
         01/01/2022 14:00   2
         01/01/2022 15:00   2
         01/01/2022 15:40   8
         01/01/2022 16:00   8
         01/01/2022 17:00   8
         01/01/2022 17:50   3

CodePudding user response：

You can add the extra dates and ffill:

df['T1'] = pd.to_datetime(df['T1'])

extra = pd.date_range(df['T1'].min().ceil('H'), df['T1'].max().floor('H'), freq='1h')

(pd.concat([df, pd.DataFrame({'T1': extra})])
   .sort_values(by='T1', ignore_index=True)
   .ffill()
 )

Output:


                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0

CodePudding user response：

Another possible solution, based on pandas.DataFrame.resample:

df['T1'] = pd.to_datetime(df['T1'])

(pd.concat([df, df.set_index('T1').resample('1H').ffill().assign(C1=np.nan)
            .reset_index()])
 .sort_values('T1').ffill().dropna().reset_index(drop=True))

Output:

                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0

CodePudding user response：

Here is a way to do what your question asks that will ensure:

there are no duplicate times in T1 in the output, even if any of the times in the original are round hours
the results will be of the same type as the values in the C1 column of the input (in this case, integers not floats).

hours = pd.date_range(df.T1.min().ceil("H"), df.T1.max().floor("H"), freq="60min")
idx_new = df.set_index('T1').join(pd.DataFrame(index=hours), how='outer', sort=True).index
df_new = df.set_index('T1').reindex(index = idx_new, method='ffill').reset_index().rename(columns={'index':'T1'})

Output:

                   T1  C1
0 2022-01-01 11:20:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   8
8 2022-01-01 17:50:00   3

Example of how round dates in the input are handled:

df = pd.DataFrame({
#'T1':pd.to_datetime(['01/01/2022 11:20','01/01/2022 15:40','01/01/2022 17:50']),
'T1':pd.to_datetime(['01/01/2022 11:00','01/01/2022 15:40','01/01/2022 17:00']),
'C1':[2,8,3]})

Input:

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 15:40:00   8
2 2022-01-01 17:00:00   3

Output (no duplicates):

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   3