I created a dataframe of time slots with interval of 30 minutes, from 1/10/2020 till 3/10/2020 as follows:
timeInterval=pd.DataFrame()
startYear = 2020
startMonth = 10
startDay = 1
endYead = 2020
endMonth = 10
endDay = 3
interval = 30
startDate = str(datetime(startYear,startMonth,startDay).date())
endDate = str(datetime(endYear,endMonth,endDay).date())
endDateMinus1 = str(datetime(endYear,endMonth,endDay)-timedelta(seconds=1))
timeInterval['Start']=pd.date_range(start=startDate ' 00:00:00', end=endDateMinus1,freq=str(interval) 'T')
timeInterval['End']= pd.date_range(start=startDate ' 00:' str(interval) ':00', end=endDate ' 00:00:00',freq=str(interval) 'T')
Start End
0 2020-10-01 00:00:00 2020-10-01 00:30:00
1 2020-10-01 00:30:00 2020-10-01 01:00:00
2 2020-10-01 01:00:00 2020-10-01 01:30:00
3 2020-10-01 01:30:00 2020-10-01 02:00:00
4 2020-10-01 02:00:00 2020-10-01 02:30:00
... ... ...
91 2020-10-02 21:30:00 2020-10-02 22:00:00
92 2020-10-02 22:00:00 2020-10-02 22:30:00
93 2020-10-02 22:30:00 2020-10-02 23:00:00
94 2020-10-02 23:00:00 2020-10-02 23:30:00
95 2020-10-02 23:30:00 2020-10-03 00:00:00
I have a DataFrame df
with 100k rows, which I need to go through and count according to the time slot it falls in. An example DataFrame as follows:
Entry_Time Exit_Time Sector
0 2020-10-01 22:24:00 2020-10-01 22:50:55 North
1 2020-10-01 22:32:00 2020-10-01 22:53:00 West
2 2020-10-01 22:44:00 2020-10-01 23:01:53 Central
3 2020-10-01 22:50:55 2020-10-01 23:04:07 North
4 2020-10-01 22:53:00 2020-10-01 23:03:54 North
5 2020-10-01 23:01:53 2020-10-01 23:13:44 West
6 2020-10-01 23:04:07 2020-10-01 23:26:48 Central
7 2020-10-01 23:13:44 2020-10-01 23:28:00 Central
8 2020-10-02 15:02:00 2020-10-02 15:09:31 West
9 2020-10-02 15:09:31 2020-10-02 15:25:47 North
I need to find the time slot that each row of the df
falls in, in accordance to the timeInterval
dataframe. So The expected result can be something like below:
Entry_Time Exit_Time Sector Timeslot
0 2020-10-01 22:24:00 2020-10-01 22:50:55 North 2020-10-01 22:00:00 - 2020-10-01 22:30:00,2020-10-01 22:30:00 - 2020-10-01 23:00:00
1 2020-10-01 22:32:00 2020-10-01 22:53:00 West 2020-10-01 22:30:00 - 2020-10-01 23:00:00
2 2020-10-01 22:44:00 2020-10-01 23:01:53 Central 2020-10-01 22:30:00 - 2020-10-01 23:00:00,2020-10-01 23:00:00 - 2020-10-01 23:30:00
3 2020-10-01 22:50:55 2020-10-01 23:04:07 North 2020-10-01 22:30:00 - 2020-10-01 23:00:00,2020-10-01 23:00:00 - 2020-10-01 23:30:00
4 2020-10-01 22:53:00 2020-10-01 23:03:54 North 2020-10-01 22:30:00 - 2020-10-01 23:00:00,2020-10-01 23:00:00 - 2020-10-01 23:30:00
5 2020-10-01 23:01:53 2020-10-01 23:13:44 West 2020-10-01 23:00:00 - 2020-10-01 23:30:00
6 2020-10-01 23:04:07 2020-10-01 23:26:48 Central 2020-10-01 23:00:00 - 2020-10-01 23:30:00
7 2020-10-01 23:13:44 2020-10-01 23:28:00 Central 2020-10-01 23:00:00 - 2020-10-01 23:30:00
8 2020-10-02 15:02:00 2020-10-02 15:09:31 West 2020-10-02 15:00:00 - 2020-10-02 15:30:00
9 2020-10-02 15:09:31 2020-10-02 15:25:47 North 2020-10-02 15:00:00 - 2020-10-02 15:30:00
CodePudding user response:
Since these are fixed intervals you can simply use the .dt
accessor to properly floor and ceil the start and end columns respectively, then use pd.date_range on those bounds:
>>> interval = pd.Timedelta(minutes=30)
>>> df['Entry_Time'].dt.floor(freq=interval)
0 2020-10-01 22:00:00
1 2020-10-01 22:30:00
2 2020-10-01 22:30:00
3 2020-10-01 22:30:00
4 2020-10-01 22:30:00
5 2020-10-01 23:00:00
6 2020-10-01 23:00:00
7 2020-10-01 23:00:00
8 2020-10-02 15:00:00
9 2020-10-02 15:00:00
Name: Entry_Time, dtype: datetime64[ns]
>>> bounds = pd.concat([
... df['Entry_Time'].dt.floor(freq=interval),
... df['Exit_Time'].dt.ceil(freq=interval)
... ], axis='columns')
>>> df.join(bounds.agg(lambda s: pd.date_range(*s, freq=interval).to_list(),
... axis='columns').rename('Timeslot'))
Entry_Time Exit_Time Sector Timeslot
0 2020-10-01 22:24:00 2020-10-01 22:50:55 North [2020-10-01 22:00:00, 2020-10-01 22:30:00, 202...
1 2020-10-01 22:32:00 2020-10-01 22:53:00 West [2020-10-01 22:30:00, 2020-10-01 23:00:00]
2 2020-10-01 22:44:00 2020-10-01 23:01:53 Central [2020-10-01 22:30:00, 2020-10-01 23:00:00, 202...
3 2020-10-01 22:50:55 2020-10-01 23:04:07 North [2020-10-01 22:30:00, 2020-10-01 23:00:00, 202...
4 2020-10-01 22:53:00 2020-10-01 23:03:54 North [2020-10-01 22:30:00, 2020-10-01 23:00:00, 202...
5 2020-10-01 23:01:53 2020-10-01 23:13:44 West [2020-10-01 23:00:00, 2020-10-01 23:30:00]
6 2020-10-01 23:04:07 2020-10-01 23:26:48 Central [2020-10-01 23:00:00, 2020-10-01 23:30:00]
7 2020-10-01 23:13:44 2020-10-01 23:28:00 Central [2020-10-01 23:00:00, 2020-10-01 23:30:00]
8 2020-10-02 15:02:00 2020-10-02 15:09:31 West [2020-10-02 15:00:00, 2020-10-02 15:30:00]
9 2020-10-02 15:09:31 2020-10-02 15:25:47 North [2020-10-02 15:00:00, 2020-10-02 15:30:00]
Note that I am supposing your columns are proper datetime datatypes.
If not, replace df['Entry_Time'].dt.floor(freq=interval)
with pd.to_datetime(df['Entry_Time']).dt.floor(freq=interval)
and similarly with Exit_Time
.