Home > Blockchain >  Pandas dataframe expand rows in specific times
Pandas dataframe expand rows in specific times

Time:01-10

I have a dataframe:

df =      T1           C1
     01/01/2022 11:20   2
     01/01/2022 15:40   8
     01/01/2022 17:50   3

I want to expand it such that I will have a row for each round timestamp:

df_new =      T1            C1
         01/01/2022 11:20   2
         01/01/2022 12:00   2
         01/01/2022 13:00   2
         01/01/2022 14:00   2
         01/01/2022 15:00   2
         01/01/2022 15:40   8
         01/01/2022 16:00   8
         01/01/2022 17:00   8
         01/01/2022 17:50   3

CodePudding user response:

You can add the extra dates and ffill:

df['T1'] = pd.to_datetime(df['T1'])

extra = pd.date_range(df['T1'].min().ceil('H'), df['T1'].max().floor('H'), freq='1h')

(pd.concat([df, pd.DataFrame({'T1': extra})])
   .sort_values(by='T1', ignore_index=True)
   .ffill()
 )

Output:


                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0

CodePudding user response:

Another possible solution, based on pandas.DataFrame.resample:

df['T1'] = pd.to_datetime(df['T1'])

(pd.concat([df, df.set_index('T1').resample('1H').ffill().assign(C1=np.nan)
            .reset_index()])
 .sort_values('T1').ffill().dropna().reset_index(drop=True))

Output:

                   T1   C1
0 2022-01-01 11:20:00  2.0
1 2022-01-01 12:00:00  2.0
2 2022-01-01 13:00:00  2.0
3 2022-01-01 14:00:00  2.0
4 2022-01-01 15:00:00  2.0
5 2022-01-01 15:40:00  8.0
6 2022-01-01 16:00:00  8.0
7 2022-01-01 17:00:00  8.0
8 2022-01-01 17:50:00  3.0

CodePudding user response:

Here is a way to do what your question asks that will ensure:

  • there are no duplicate times in T1 in the output, even if any of the times in the original are round hours
  • the results will be of the same type as the values in the C1 column of the input (in this case, integers not floats).
hours = pd.date_range(df.T1.min().ceil("H"), df.T1.max().floor("H"), freq="60min")
idx_new = df.set_index('T1').join(pd.DataFrame(index=hours), how='outer', sort=True).index
df_new = df.set_index('T1').reindex(index = idx_new, method='ffill').reset_index().rename(columns={'index':'T1'})

Output:

                   T1  C1
0 2022-01-01 11:20:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   8
8 2022-01-01 17:50:00   3

Example of how round dates in the input are handled:

df = pd.DataFrame({
#'T1':pd.to_datetime(['01/01/2022 11:20','01/01/2022 15:40','01/01/2022 17:50']),
'T1':pd.to_datetime(['01/01/2022 11:00','01/01/2022 15:40','01/01/2022 17:00']),
'C1':[2,8,3]})

Input:

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 15:40:00   8
2 2022-01-01 17:00:00   3

Output (no duplicates):

                   T1  C1
0 2022-01-01 11:00:00   2
1 2022-01-01 12:00:00   2
2 2022-01-01 13:00:00   2
3 2022-01-01 14:00:00   2
4 2022-01-01 15:00:00   2
5 2022-01-01 15:40:00   8
6 2022-01-01 16:00:00   8
7 2022-01-01 17:00:00   3
  • Related