Finding Time Slot of Dataframe according to Entry Time and Exit Time-CodePudding

I created a dataframe of time slots with interval of 30 minutes, from 1/10/2020 till 3/10/2020 as follows:

timeInterval=pd.DataFrame()
startYear = 2020
startMonth = 10
startDay = 1
endYead = 2020
endMonth = 10
endDay = 3
interval  = 30

startDate = str(datetime(startYear,startMonth,startDay).date())
endDate   = str(datetime(endYear,endMonth,endDay).date())

endDateMinus1 = str(datetime(endYear,endMonth,endDay)-timedelta(seconds=1))
timeInterval['Start']=pd.date_range(start=startDate ' 00:00:00', end=endDateMinus1,freq=str(interval) 'T')
timeInterval['End']= pd.date_range(start=startDate ' 00:' str(interval) ':00', end=endDate ' 00:00:00',freq=str(interval) 'T')

           Start                    End
0    2020-10-01 00:00:00    2020-10-01 00:30:00
1    2020-10-01 00:30:00    2020-10-01 01:00:00
2    2020-10-01 01:00:00    2020-10-01 01:30:00
3    2020-10-01 01:30:00    2020-10-01 02:00:00
4    2020-10-01 02:00:00    2020-10-01 02:30:00
...          ...                  ...
91   2020-10-02 21:30:00    2020-10-02 22:00:00
92   2020-10-02 22:00:00    2020-10-02 22:30:00
93   2020-10-02 22:30:00    2020-10-02 23:00:00
94   2020-10-02 23:00:00    2020-10-02 23:30:00
95   2020-10-02 23:30:00    2020-10-03 00:00:00

I have a DataFrame df with 100k rows, which I need to go through and count according to the time slot it falls in. An example DataFrame as follows:

    Entry_Time            Exit_Time             Sector
0   2020-10-01 22:24:00   2020-10-01 22:50:55   North
1   2020-10-01 22:32:00   2020-10-01 22:53:00   West
2   2020-10-01 22:44:00   2020-10-01 23:01:53   Central
3   2020-10-01 22:50:55   2020-10-01 23:04:07   North
4   2020-10-01 22:53:00   2020-10-01 23:03:54   North
5   2020-10-01 23:01:53   2020-10-01 23:13:44   West
6   2020-10-01 23:04:07   2020-10-01 23:26:48   Central
7   2020-10-01 23:13:44   2020-10-01 23:28:00   Central
8   2020-10-02 15:02:00   2020-10-02 15:09:31   West
9   2020-10-02 15:09:31   2020-10-02 15:25:47   North

I need to find the time slot that each row of the df falls in, in accordance to the timeInterval dataframe. So The expected result can be something like below:

    Entry_Time            Exit_Time             Sector    Timeslot
0   2020-10-01 22:24:00   2020-10-01 22:50:55   North     2020-10-01 22:00:00 - 2020-10-01 22:30:00,2020-10-01 22:30:00 - 2020-10-01 23:00:00
1   2020-10-01 22:32:00   2020-10-01 22:53:00   West      2020-10-01 22:30:00 - 2020-10-01 23:00:00
2   2020-10-01 22:44:00   2020-10-01 23:01:53   Central   2020-10-01 22:30:00 - 2020-10-01 23:00:00,2020-10-01 23:00:00 - 2020-10-01 23:30:00
3   2020-10-01 22:50:55   2020-10-01 23:04:07   North     2020-10-01 22:30:00 - 2020-10-01 23:00:00,2020-10-01 23:00:00 - 2020-10-01 23:30:00
4   2020-10-01 22:53:00   2020-10-01 23:03:54   North     2020-10-01 22:30:00 - 2020-10-01 23:00:00,2020-10-01 23:00:00 - 2020-10-01 23:30:00 
5   2020-10-01 23:01:53   2020-10-01 23:13:44   West      2020-10-01 23:00:00 - 2020-10-01 23:30:00
6   2020-10-01 23:04:07   2020-10-01 23:26:48   Central   2020-10-01 23:00:00 - 2020-10-01 23:30:00
7   2020-10-01 23:13:44   2020-10-01 23:28:00   Central   2020-10-01 23:00:00 - 2020-10-01 23:30:00
8   2020-10-02 15:02:00   2020-10-02 15:09:31   West      2020-10-02 15:00:00 - 2020-10-02 15:30:00
9   2020-10-02 15:09:31   2020-10-02 15:25:47   North     2020-10-02 15:00:00 - 2020-10-02 15:30:00

CodePudding user response：

Since these are fixed intervals you can simply use the .dt accessor to properly floor and ceil the start and end columns respectively, then use pd.date_range on those bounds:

>>> interval = pd.Timedelta(minutes=30)
>>> df['Entry_Time'].dt.floor(freq=interval)
0   2020-10-01 22:00:00
1   2020-10-01 22:30:00
2   2020-10-01 22:30:00
3   2020-10-01 22:30:00
4   2020-10-01 22:30:00
5   2020-10-01 23:00:00
6   2020-10-01 23:00:00
7   2020-10-01 23:00:00
8   2020-10-02 15:00:00
9   2020-10-02 15:00:00
Name: Entry_Time, dtype: datetime64[ns]
>>> bounds = pd.concat([
...   df['Entry_Time'].dt.floor(freq=interval),
...   df['Exit_Time'].dt.ceil(freq=interval)
... ], axis='columns')
>>> df.join(bounds.agg(lambda s: pd.date_range(*s, freq=interval).to_list(),
...                    axis='columns').rename('Timeslot'))
           Entry_Time           Exit_Time   Sector                                           Timeslot
0 2020-10-01 22:24:00 2020-10-01 22:50:55    North  [2020-10-01 22:00:00, 2020-10-01 22:30:00, 202...
1 2020-10-01 22:32:00 2020-10-01 22:53:00     West         [2020-10-01 22:30:00, 2020-10-01 23:00:00]
2 2020-10-01 22:44:00 2020-10-01 23:01:53  Central  [2020-10-01 22:30:00, 2020-10-01 23:00:00, 202...
3 2020-10-01 22:50:55 2020-10-01 23:04:07    North  [2020-10-01 22:30:00, 2020-10-01 23:00:00, 202...
4 2020-10-01 22:53:00 2020-10-01 23:03:54    North  [2020-10-01 22:30:00, 2020-10-01 23:00:00, 202...
5 2020-10-01 23:01:53 2020-10-01 23:13:44     West         [2020-10-01 23:00:00, 2020-10-01 23:30:00]
6 2020-10-01 23:04:07 2020-10-01 23:26:48  Central         [2020-10-01 23:00:00, 2020-10-01 23:30:00]
7 2020-10-01 23:13:44 2020-10-01 23:28:00  Central         [2020-10-01 23:00:00, 2020-10-01 23:30:00]
8 2020-10-02 15:02:00 2020-10-02 15:09:31     West         [2020-10-02 15:00:00, 2020-10-02 15:30:00]
9 2020-10-02 15:09:31 2020-10-02 15:25:47    North         [2020-10-02 15:00:00, 2020-10-02 15:30:00]

Note that I am supposing your columns are proper datetime datatypes. If not, replace df['Entry_Time'].dt.floor(freq=interval) with pd.to_datetime(df['Entry_Time']).dt.floor(freq=interval) and similarly with Exit_Time.