I have a time series data, converted to a dataframe. It has multiple columns, where the first column is timestamps and rest of the column names are timestamps with values.
The dataframe looks like
date 2022-01-02 10:20:00 2022-01-02 10:25:00 2022-01-02 10:30:00 2022-01-02 10:35:00 2022-01-02 10:40:00 2022-01-02 10:45:00 2022-01-02 10:50:00 2022-01-02 10:55:00 2022-01-02 11:00:00
2022-01-02 10:30:00 25.5 26.3 26.9 NaN NaN NaN NaN NaN NaN
2022-01-02 10:45:00 60.3 59.3 59.2 58.4 56.9 58.0 NaN NaN NaN
2022-01-02 11:00:00 43.7 43.9 48 48 48.1 48.9 49 49.5 49.5
Note that if value in date column matches with columns names, there are NaNs after the intersecting column.
The dataframe I am trying to achieve is as below where the column names are the minutes before date (40,35,30,25,20,15,10,5,0) and the same values are populated accordingly:
For example : 1) 2022-01-02 10:30:00 - 2022-01-02 10:30:00 = 0 mins, hence the corresponding value there should be 26.9. 2) 2022-01-02 10:30:00 - 2022-01-02 10:25:00 = 5 mins, hence the value there should be 26.3 and so on.
Note - values with * are dummy values to represent.(The real dataframe has many more columns)
date 40mins 35mins 30mins 25mins 20mins 15mins 10mins 5mins 0mins
2022-01-02 10:30:00 24* 24* 24.8* 24.8* 25* 25* 25.5 26.3 26.9
2022-01-02 10:45:00 59* 58* 60* 60.3 59.3 59.2 58.4 56.9 58.0
2022-01-02 11:00:00 43.7 43.9 48 48 48.1 48.9 49 49.5 49.5
I would highly appreciate some help here. Apologies if I have not framed the question well. Please ask for clarification if needed.
CodePudding user response:
IIUC, you can melt
, compute the timedelta and filter, then pivot
back:
(df.melt('date', var_name='date2') # reshape the columns to rows
# convert the date strings to datetime
# and compute the timedelta
.assign(date=lambda d: pd.to_datetime(d['date']),
date2=lambda d: pd.to_datetime(d['date2']),
delta=lambda d: d['date'].sub(d['date2'])
.dt.total_seconds().floordiv(60)
)
# filter out negative timedelta
.loc[lambda d: d['delta'].ge(0)]
# reshape the rows back to columns
.pivot('date', 'delta', 'value')
# rename columns from integer to "Xmins"
.rename(columns=lambda x: f'{x:.0f}mins')
# remove columns axis label
.rename_axis(columns=None)
)
output:
0mins 5mins 10mins 15mins 20mins 25mins 30mins 35mins 40mins
date
2022-01-02 10:30:00 26.9 26.3 25.5 NaN NaN NaN NaN NaN NaN
2022-01-02 10:45:00 58.0 56.9 58.4 59.2 59.3 60.3 NaN NaN NaN
2022-01-02 11:00:00 49.5 49.5 49.0 48.9 48.1 48.0 48.0 43.9 43.7