I would like to assign rows to time buckets and then compute the difference within each bucket for analysis.
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'A-1 A-1 A-1 A-1 A-1 A-1'.split(),
                   'Date': '23.10.2021 23.10.2021 23.10.2021 23.10.2021 23.10.2021 23.10.2021'.split(),
                   'Time': '06:05:31 06:11:13 06:19:22 06:25:03 06:33:12 06:44:05'.split(),
                   'Cumulative': '12 17 19 23 29 38'.split()})
print(df)
out:
A Date Time Cumulative
0 A-1 23.10.2021 06:05:31 12
1 A-1 23.10.2021 06:11:13 17
2 A-1 23.10.2021 06:19:22 19
3 A-1 23.10.2021 06:25:03 23
4 A-1 23.10.2021 06:33:12 29
5 A-1 23.10.2021 06:44:05 38
What I'd like is to round the times up to 15-minute intervals and find the difference within each interval. First step:
A Date Time Cumulative TimeBuckets
0 A-1 23.10.2021 06:05:31 12 06:15:00
1 A-1 23.10.2021 06:11:13 17 06:15:00
2 A-1 23.10.2021 06:19:22 19 06:30:00
3 A-1 23.10.2021 06:25:03 23 06:30:00
4 A-1 23.10.2021 06:33:12 29 06:45:00
5 A-1 23.10.2021 06:44:05 38 06:45:00
In the final stage, a separate dataframe would hold, for each time bucket, the difference between the maximum and minimum values:
A Diff TimeBuckets
0 A-1 5 06:15:00
1 A-1 4 06:30:00
2 A-1 9 06:45:00
CodePudding user response:
IIUC, you could use dt.ceil and GroupBy.agg:
(df.assign(Cumulative=df['Cumulative'].astype(int),
           TimeBuckets=pd.to_datetime(df['Time']).dt.ceil('15min').dt.time
           )
   .groupby('TimeBuckets', as_index=False)
   .agg({'A': 'first', 'Cumulative': lambda x: x.max()-x.min()})
)
output:
TimeBuckets A Cumulative
0 06:15:00 A-1 5
1 06:30:00 A-1 4
2 06:45:00 A-1 9
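For reference, here is a self-contained sketch of both steps that can be run as-is. It rests on a few assumptions that are not in the question: the timestamp is parsed from Date and Time together with an explicit format, the grouping is done per A as well as per bucket, and the names stamps, step1 and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': 'A-1 A-1 A-1 A-1 A-1 A-1'.split(),
    'Date': '23.10.2021 23.10.2021 23.10.2021 23.10.2021 23.10.2021 23.10.2021'.split(),
    'Time': '06:05:31 06:11:13 06:19:22 06:25:03 06:33:12 06:44:05'.split(),
    'Cumulative': '12 17 19 23 29 38'.split(),
})

# Step 1: parse Date and Time together (assumed format: day.month.year),
# then round each timestamp up to the next 15-minute mark.
stamps = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d.%m.%Y %H:%M:%S')
step1 = df.assign(
    Cumulative=df['Cumulative'].astype(int),      # the sample values are strings
    TimeBuckets=stamps.dt.ceil('15min').dt.time,  # keep only the time of day
)

# Step 2: per (A, TimeBuckets) group, compute max minus min of Cumulative.
result = (
    step1.groupby(['A', 'TimeBuckets'], as_index=False)
         .agg(Diff=('Cumulative', lambda s: s.max() - s.min()))
)
print(result)
```

Note that .dt.time drops the date, so if the data spans several days it may be safer to group on the full rounded timestamp (stamps.dt.ceil('15min')) instead.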