Home > Software design >  How to calculate time difference in minutes and populate the dataframe according
How to calculate time difference in minutes and populate the dataframe according

Time:09-22

I have a time series data, converted to a dataframe. It has multiple columns, where the first column is timestamps and rest of the column names are timestamps with values.

The dataframe looks like

date                   2022-01-02 10:20:00   2022-01-02 10:25:00  2022-01-02 10:30:00  2022-01-02 10:35:00   2022-01-02 10:40:00   2022-01-02 10:45:00  2022-01-02 10:50:00   2022-01-02 10:55:00   2022-01-02 11:00:00
2022-01-02 10:30:00        25.5                  26.3                  26.9                 NaN                  NaN                NaN                        NaN                  NaN                   NaN
2022-01-02 10:45:00        60.3                  59.3                  59.2                 58.4                 56.9               58.0                     NaN                  NaN                   NaN
2022-01-02 11:00:00        43.7                  43.9                  48                   48                   48.1               48.9                        49                  49.5                  49.5

Note that if value in date column matches with columns names, there are NaNs after the intersecting column.

The dataframe I am trying to achieve is as below where the column names are the minutes before date (40,35,30,25,20,15,10,5,0) and the same values are populated accordingly:

For example : 1) 2022-01-02 10:30:00 - 2022-01-02 10:30:00 = 0 mins, hence the corresponding value there should be 26.9. 2) 2022-01-02 10:30:00 - 2022-01-02 10:25:00 = 5 mins, hence the value there should be 26.3 and so on.

Note - values with * are dummy values to represent.(The real dataframe has many more columns)

date                   40mins      35mins       30mins      25mins       20mins       15mins        10mins      5mins      0mins
2022-01-02 10:30:00     24*        24*           24.8*       24.8*         25*          25*          25.5        26.3      26.9   
2022-01-02 10:45:00     59*        58*           60*         60.3         59.3          59.2         58.4        56.9      58.0   
2022-01-02 11:00:00     43.7        43.9         48          48           48.1          48.9         49          49.5      49.5   

I would highly appreciate some help here. Apologies if I have not framed the question well. Please ask for clarification if needed.

CodePudding user response:

IIUC, you can melt, compute the timedelta and filter, then pivot back:

(df.melt('date', var_name='date2') # reshape the columns to rows
   # convert the date strings to datetime
   # and compute the timedelta
   .assign(date=lambda d: pd.to_datetime(d['date']),
           date2=lambda d: pd.to_datetime(d['date2']),
           delta=lambda d: d['date'].sub(d['date2'])
                           .dt.total_seconds().floordiv(60)
          )
   # filter out negative timedelta
   .loc[lambda d: d['delta'].ge(0)]
   # reshape the rows back to columns
   .pivot('date', 'delta', 'value')
   # rename columns from integer to "Xmins"
   .rename(columns=lambda x: f'{x:.0f}mins') 
   # remove columns axis label
   .rename_axis(columns=None)
)

output:

                     0mins  5mins  10mins  15mins  20mins  25mins  30mins  35mins  40mins
date                                                                                     
2022-01-02 10:30:00   26.9   26.3    25.5     NaN     NaN     NaN     NaN     NaN     NaN
2022-01-02 10:45:00   58.0   56.9    58.4    59.2    59.3    60.3     NaN     NaN     NaN
2022-01-02 11:00:00   49.5   49.5    49.0    48.9    48.1    48.0    48.0    43.9    43.7
  • Related