I have a pandas dataframe where it has start_time, end_time and booking duration.
Please find below example dataframe
id | Start_time | End_time | Duration |
---|---|---|---|
1 | 2013-11-20 09:00:00 | 2013-11-20 09:30:00 | 0 days 0:30:00 |
2 | 2013-11-20 09:00:00 | 2013-11-20 12:10:00 | 0 days 3:10:00 |
3 | 2013-11-20 10:00:00 | 2013-11-20 11:00:00 | 0 days 1:00:00 |
4 | 2013-11-20 08:00:00 | 2013-11-20 09:40:00 | 0 days 1:40:00 |
I am trying to get the timeslots from the above dataframe
Expected output:
id | Start_time | End_time | Duration | Timeslots |
---|---|---|---|---|
1 | 2013-11-20 09:00:00 | 2013-11-20 09:30:00 | 0 days 0:30:00 | 9 - 10 |
2 | 2013-11-20 09:00:00 | 2013-11-20 12:10:00 | 0 days 3:10:00 | 9-10 |
2 | 2013-11-20 09:00:00 | 2013-11-20 12:10:00 | 0 days 3:10:00 | 10-11 |
2 | 2013-11-20 09:00:00 | 2013-11-20 12:10:00 | 0 days 3:10:00 | 11-12 |
3 | 2013-11-20 10:00:00 | 2013-11-20 11:00:00 | 0 days 1:00:00 | 10 - 11 |
4 | 2013-11-20 08:00:00 | 2013-11-20 09:40:00 | 0 days 1:40:00 | 8-9 |
4 | 2013-11-20 08:00:00 | 2013-11-20 09:40:00 | 0 days 1:40:00 | 9-10 |
What I have tried till now
I can get the slots from start_time and end_time but i am missing the expected output
id | Start_time | End_time | Duration | TimeSlot |
---|---|---|---|---|
1 | 2013-11-20 09:00:00 | 2013-11-20 09:30:00 | 0 days 0:30:00 | 9-9:30 |
2 | 2013-11-20 09:00:00 | 2013-11-20 12:10:00 | 0 days 3:10:00 | 9-12:10 |
3 | 2013-11-20 10:00:00 | 2013-11-20 11:00:00 | 0 days 1:00:00 | 10-11 |
4 | 2013-11-20 08:00:00 | 2013-11-20 09:40:00 | 0 days 1:40:00 | 8 - 9:40 |
Can anyone give some hints please
CodePudding user response:
Try:
def get_slots(row):
dti = pd.date_range(row['Start_time'].floor('H'),
row['End_time'].ceil('H'), freq='H')
return [f"{s.hour:02}-{e.hour:02}" for s, e in zip(dti, dti[1:])]
out = df.assign(Timeslots=df.apply(get_slots, axis=1)).explode('Timeslots')
print(out)
# Output:
id Start_time End_time Duration Timeslots
0 1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 00:30:00 09-10
1 2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00 09-10
1 2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00 10-11
1 2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00 11-12
1 2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00 12-13
2 3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 01:00:00 10-11
3 4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 01:40:00 08-09
3 4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 01:40:00 09-10
Setup to be reproducible:
import pandas as pd
from pandas import Timestamp, Timedelta
data = {
'id': [1, 2, 3, 4],
'Start_time': [Timestamp('2013-11-20 09:00:00'), Timestamp('2013-11-20 09:00:00'),
Timestamp('2013-11-20 10:00:00'), Timestamp('2013-11-20 08:00:00')],
'End_time': [Timestamp('2013-11-20 09:30:00'), Timestamp('2013-11-20 12:10:00'),
Timestamp('2013-11-20 11:00:00'), Timestamp('2013-11-20 09:40:00')],
'Duration': [Timedelta('0 days 00:30:00'), Timedelta('0 days 03:10:00'),
Timedelta('0 days 01:00:00'), Timedelta('0 days 01:40:00')]
}
df = pd.DataFrame(data)