I have a dataframe df_to_db:
enter_dt_tm | exit_dt_tm | total_time | 08:00-23:00 |
---|---|---|---|
2021-07-20 13:24:00 | 2021-07-20 22:51:00 | 0 days 09:27:00 | 0 days 09:27:00 |
2021-05-31 02:52:00 | 2021-05-31 06:16:00 | 0 days 03:24:00 | 0 days 00:00:00 |
2021-06-01 23:50:00 | 2021-06-01 23:58:00 | 0 days 00:08:00 | 0 days 00:00:00 |
2021-06-01 23:58:00 | 2021-06-02 07:58:00 | 0 days 08:00:00 | 0 days 00:00:00 |
2021-06-01 07:59:00 | 2021-06-01 23:12:00 | 0 days 15:13:00 | 0 days 15:00:00 |
And I want to print only the rows where there are greater than 15 minutes between "08:00-23:00" AND the "enter_dt_tm" timestamp is not between 07:40:00 am and 08:00:00 am (this would eliminate all but the first row).
I am writing the code as follows but am having issues:
test_df = df_to_db[df_to_db['08:00-23:00'] >= '0 days 00:15:00' & df_to_db['enter_dt_tm'].dt.time <= '07:40:00' & df_to_db['enter_dt_tm'].dt.time >= '08:00:00']
print(test_df)
The first portion where I specify 08:00-23:00 is greater than 15 minutes works, but I can't seem to figure out how to work with .dt.time to get my code to work. I have tried writing it in different formats with no luck (e.g. (7, 40, 0), 7, 40, 0) etc.
Any help would be greatly appreciated.
EDIT: The error I am receiving is: TypeError: '<=' not supported between instances of 'datetime.time' and 'str'
CodePudding user response:
You can use:
import datetime
import pandas as pd
import numpy as np
# convert `enter_dt_tm` to datetime format
df_to_db['enter_dt_tm'] = pd.to_datetime(df_to_db['enter_dt_tm'])
# convert `08:00-23:00` to timedelta format
df_to_db['08:00-23:00'] = pd.to_timedelta(df_to_db['08:00-23:00'])
# set boolean mask for `08:00-23:00` >= 15 minutes
m1 = df_to_db['08:00-23:00'] >= np.timedelta64(15,'m')
# set boolean mask for `enter_dt_tm` NOT between 07:40:00 am and 08:00:00 am
m2 = ~ df_to_db['enter_dt_tm'].dt.time.between(datetime.time(7, 40, 0), datetime.time(8, 0, 0))
# filter by both boolean masks
df_to_db[m1 & m2]
Result:
enter_dt_tm exit_dt_tm total_time 08:00-23:00
0 2021-07-20 13:24:00 2021-07-20 22:51:00 0 days 09:27:00 0 days 09:27:00
CodePudding user response:
Please try to provide sample data or a working example when you ask questions.
import pandas as pd
dat = [
['2021-07-20 13:24:00', '2021-07-20 22:51:00', '0 days 09:27:00', '0 days 09:27:00'],
['2021-05-31 02:52:00', '2021-05-31 06:16:00', '0 days 03:24:00', '0 days 00:00:00'],
['2021-06-01 23:50:00', '2021-06-01 23:58:00', '0 days 00:08:00', '0 days 00:00:00'],
['2021-06-01 23:58:00', '2021-06-02 07:58:00', '0 days 08:00:00', '0 days 00:00:00'],
['2021-06-01 07:59:00', '2021-06-01 23:12:00', '0 days 15:13:00', '0 days 15:00:00']
]
df = pd.DataFrame(dat, columns=['enter_dt_tm', 'exit_dt_tm', 'total_time', '08:00-23:00'])
for c in ['enter_dt_tm', 'exit_dt_tm']:
df[c] = pd.to_datetime(df[c])
In[2]: df
Out[2]:
enter_dt_tm exit_dt_tm total_time 08:00-23:00
0 2021-07-20 13:24:00 2021-07-20 22:51:00 0 days 09:27:00 0 days 09:27:00
1 2021-05-31 02:52:00 2021-05-31 06:16:00 0 days 03:24:00 0 days 00:00:00
2 2021-06-01 23:50:00 2021-06-01 23:58:00 0 days 00:08:00 0 days 00:00:00
3 2021-06-01 23:58:00 2021-06-02 07:58:00 0 days 08:00:00 0 days 00:00:00
4 2021-06-01 07:59:00 2021-06-01 23:12:00 0 days 15:13:00 0 days 15:00:00
Two things:
- The error you're getting is because you are trying to compare a time to a string. Cast your time-component string appropriately first as such:
pd.Timestamp('07:40:000').time()
- You're filtering the time component of
enter_dt_tm
to before 7:40 am AND after 8:00 am. Such a time does not exist. You can instead check to see if the time is before 7:40 am OR after 8:00 am. Alternatively, you can check to see if the time is between that window, then invert the logical.
entry_time_mask = (df.enter_dt_tm.dt.time >= pd.Timestamp('07:40:00').time()) & (df.enter_dt_tm.dt.time <= pd.Timestamp('08:00:00').time())
time_diff_mask = df['08:00-23:00'] > '0 days 00:15:00'
# combine two boolean masks, inverting the entry_time_mask
# here, ~ means "not"
df[~entry_time_mask & time_diff_mask]
Out[7]:
enter_dt_tm exit_dt_tm total_time 08:00-23:00
0 2021-07-20 13:24:00 2021-07-20 22:51:00 0 days 09:27:00 0 days 09:27:00
CodePudding user response:
Try grouping the part you want maybe?
`test_df = df_to_db[df_to_db['08:00-23:00'] >= '0 days 00:15:00' & (df_to_db['or_enter_dt_tm'].dt.time <= '07:40:00' & df_to_db['or_enter_dt_tm'].dt.time >= '08:00:00')]
`
and/or
get them to be in the same format
`test_df = df_to_db[df_to_db['08:00-23:00'] >= '0 days 00:15:00' & df_to_db['or_enter_dt_tm'].dt.time <= '07:40:00'.dt.time & df_to_db['or_enter_dt_tm'].dt.time >= '08:00:00'.dt.time]