Home > database >  Pandas- How to work with pandas .dt.time timestamps
Pandas- How to work with pandas .dt.time timestamps

Time:09-21

I have a dataframe df_to_db:

enter_dt_tm exit_dt_tm total_time 08:00-23:00
2021-07-20 13:24:00 2021-07-20 22:51:00 0 days 09:27:00 0 days 09:27:00
2021-05-31 02:52:00 2021-05-31 06:16:00 0 days 03:24:00 0 days 00:00:00
2021-06-01 23:50:00 2021-06-01 23:58:00 0 days 00:08:00 0 days 00:00:00
2021-06-01 23:58:00 2021-06-02 07:58:00 0 days 08:00:00 0 days 00:00:00
2021-06-01 07:59:00 2021-06-01 23:12:00 0 days 15:13:00 0 days 15:00:00

And I want to print only the rows where there are greater than 15 minutes between "08:00-23:00" AND the "enter_dt_tm" timestamp is not between 07:40:00 am and 08:00:00 am (this would eliminate all but the first row).

I am writing the code as follows but am having issues:

test_df = df_to_db[df_to_db['08:00-23:00'] >= '0 days 00:15:00' & df_to_db['enter_dt_tm'].dt.time <= '07:40:00' & df_to_db['enter_dt_tm'].dt.time >= '08:00:00']

print(test_df)

The first portion where I specify 08:00-23:00 is greater than 15 minutes works, but I can't seem to figure out how to work with .dt.time to get my code to work. I have tried writing it in different formats with no luck (e.g. (7, 40, 0), 7, 40, 0) etc.

Any help would be greatly appreciated.

EDIT: The error I am receiving is: TypeError: '<=' not supported between instances of 'datetime.time' and 'str'

CodePudding user response:

You can use:

import datetime
import pandas as pd
import numpy as np

# convert `enter_dt_tm` to datetime format
df_to_db['enter_dt_tm'] = pd.to_datetime(df_to_db['enter_dt_tm'])

# convert `08:00-23:00` to timedelta format
df_to_db['08:00-23:00'] = pd.to_timedelta(df_to_db['08:00-23:00'])

# set boolean mask for `08:00-23:00` >= 15 minutes
m1 = df_to_db['08:00-23:00'] >= np.timedelta64(15,'m')

# set boolean mask for `enter_dt_tm` NOT between 07:40:00 am and 08:00:00 am
m2 = ~ df_to_db['enter_dt_tm'].dt.time.between(datetime.time(7, 40, 0), datetime.time(8, 0, 0))

# filter by both boolean masks
df_to_db[m1 & m2]

Result:

          enter_dt_tm           exit_dt_tm       total_time     08:00-23:00
0 2021-07-20 13:24:00  2021-07-20 22:51:00  0 days 09:27:00 0 days 09:27:00

CodePudding user response:

Please try to provide sample data or a working example when you ask questions.

import pandas as pd
dat = [
       ['2021-07-20 13:24:00', '2021-07-20 22:51:00',   '0 days 09:27:00',  '0 days 09:27:00'],
       ['2021-05-31 02:52:00',  '2021-05-31 06:16:00',  '0 days 03:24:00',  '0 days 00:00:00'],
       ['2021-06-01 23:50:00',  '2021-06-01 23:58:00',  '0 days 00:08:00',  '0 days 00:00:00'],
       ['2021-06-01 23:58:00',  '2021-06-02 07:58:00',  '0 days 08:00:00',  '0 days 00:00:00'],
       ['2021-06-01 07:59:00',  '2021-06-01 23:12:00',  '0 days 15:13:00',  '0 days 15:00:00']
      ]

df = pd.DataFrame(dat, columns=['enter_dt_tm', 'exit_dt_tm', 'total_time', '08:00-23:00'])
for c in ['enter_dt_tm', 'exit_dt_tm']:
    df[c] = pd.to_datetime(df[c])
In[2]: df
Out[2]: 
          enter_dt_tm          exit_dt_tm       total_time      08:00-23:00
0 2021-07-20 13:24:00 2021-07-20 22:51:00  0 days 09:27:00  0 days 09:27:00
1 2021-05-31 02:52:00 2021-05-31 06:16:00  0 days 03:24:00  0 days 00:00:00
2 2021-06-01 23:50:00 2021-06-01 23:58:00  0 days 00:08:00  0 days 00:00:00
3 2021-06-01 23:58:00 2021-06-02 07:58:00  0 days 08:00:00  0 days 00:00:00
4 2021-06-01 07:59:00 2021-06-01 23:12:00  0 days 15:13:00  0 days 15:00:00

Two things:

  1. The error you're getting is because you are trying to compare a time to a string. Cast your time-component string appropriately first as such: pd.Timestamp('07:40:000').time()
  2. You're filtering the time component of enter_dt_tm to before 7:40 am AND after 8:00 am. Such a time does not exist. You can instead check to see if the time is before 7:40 am OR after 8:00 am. Alternatively, you can check to see if the time is between that window, then invert the logical.
entry_time_mask = (df.enter_dt_tm.dt.time >= pd.Timestamp('07:40:00').time()) & (df.enter_dt_tm.dt.time <= pd.Timestamp('08:00:00').time())

time_diff_mask = df['08:00-23:00'] > '0 days 00:15:00'

# combine two boolean masks, inverting the entry_time_mask
# here, ~ means "not"
df[~entry_time_mask & time_diff_mask] 

Out[7]: 
          enter_dt_tm          exit_dt_tm       total_time      08:00-23:00
0 2021-07-20 13:24:00 2021-07-20 22:51:00  0 days 09:27:00  0 days 09:27:00

CodePudding user response:

Try grouping the part you want maybe?

`test_df = df_to_db[df_to_db['08:00-23:00'] >= '0 days 00:15:00' & (df_to_db['or_enter_dt_tm'].dt.time <= '07:40:00' & df_to_db['or_enter_dt_tm'].dt.time >= '08:00:00')]
`

and/or

get them to be in the same format

`test_df = df_to_db[df_to_db['08:00-23:00'] >= '0 days 00:15:00' & df_to_db['or_enter_dt_tm'].dt.time <= '07:40:00'.dt.time & df_to_db['or_enter_dt_tm'].dt.time >= '08:00:00'.dt.time]
  • Related