Find the overlap between two timestamps in R to assign shifts

Problem

I have a large flight crew schedule dataset with a start and end time for each shift, and I want to identify whether an employee worked a night shift. A night shift is any shift with some portion between 01:00:00 and 05:59:59. I have looked at functions such as %overlaps%, but they do not seem to work with timestamps alone. Some sample data (in UTC):

library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
                 end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))

Expected output

Ideally, I would like to get the following output, with a Boolean variable indicating whether the employee worked a night shift:

              start |                 end | night.shift
2018-09-19 23:30:00 | 2018-09-20 07:05:00 | TRUE
2018-09-19 17:00:00 | 2018-09-19 21:00:00 | FALSE
2018-09-22 04:30:00 | 2018-09-22 12:00:00 | TRUE

Thanks in advance!

CodePudding user response：

You can use interval() or %--% to create an Interval object and int_overlaps() to test if two intervals overlap.

library(dplyr)
library(lubridate)

df %>%
  as_tibble() %>%
  mutate(
    night.shift = int_overlaps(
      (date(end) + hms("01:00:00")) %--% (date(end) + hms("05:59:59")),
      start %--% end
    )
  )

# # A tibble: 3 × 3
#   start               end                 night.shift
#   <dttm>              <dttm>              <lgl>      
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE       
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE      
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE

Reference

Utilities for creation and manipulation of Interval objects
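
For readers who want to see the overlap test spelled out, here is a minimal base-R sketch of the same idea. It assumes UTC timestamps as in the question, and the helpers `overlaps`, `at_clock` and `night_on` are hypothetical names introduced for illustration, not part of lubridate. Unlike the snippet above, it tests the night window on both the start date and the end date of each shift.

```r
# Sample data from the question, built with base R only (UTC).
df <- data.frame(
  start = as.POSIXct(c("2018-09-19 23:30:00", "2018-09-19 17:00:00",
                       "2018-09-22 04:30:00"), tz = "UTC"),
  end   = as.POSIXct(c("2018-09-20 07:05:00", "2018-09-19 21:00:00",
                       "2018-09-22 12:00:00"), tz = "UTC")
)

# Hypothetical helper: closed intervals [s1, e1] and [s2, e2] overlap
# when each one starts no later than the other one ends.
overlaps <- function(s1, e1, s2, e2) s1 <= e2 & s2 <= e1

# Hypothetical helper: the given clock time on the same UTC date as t.
at_clock <- function(t, clock) {
  as.POSIXct(paste(format(t, "%Y-%m-%d", tz = "UTC"), clock), tz = "UTC")
}

# Does the shift [start, end] overlap the night window on the date of `day`?
night_on <- function(day, start, end) {
  overlaps(at_clock(day, "01:00:00"), at_clock(day, "05:59:59"), start, end)
}

# Check the window on the start date and on the end date.
df$night.shift <- with(df, night_on(start, start, end) | night_on(end, start, end))
print(df)
```

Checking both dates matters for a shift such as 02:00:00 to 00:30:00 the next day: the night window on the end date has not opened yet when that shift ends, so only the start-date window can match.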

CodePudding user response：

This is super janky and not optimized, but it works (and was fun to figure out). You'll want to vectorize it if possible.

library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
                 end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))
night <- interval( hms::as_hms(3600), hms::as_hms(21599), tz = "UTC")
print(night)

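# Loop over the shifts, comparing time of day only (seconds since midnight).
for (i in seq_len(nrow(df))) {
    s <- df$start[i]
    f <- df$end[i]
    start_seconds <- hms::as_hms(60*60*hour(s) + 60*minute(s) + second(s))
    end_seconds <- hms::as_hms(60*60*hour(f) + 60*minute(f) + second(f))

    # The date is dropped here, so a shift that crosses midnight has
    # end_seconds < start_seconds; check such rows against the expected output.
    shift <- interval(start_seconds, end_seconds, tz = "UTC")

    print(int_overlaps(night, shift))
}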
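
One possible way to vectorize the loop above is plain arithmetic on epoch seconds, which keeps the date instead of reducing each timestamp to a time of day. This is only a sketch: `worked_night` is a hypothetical helper, and it assumes UTC timestamps with start no later than end.

```r
# Sample data from the question, built with base R only (UTC).
df <- data.frame(
  start = as.POSIXct(c("2018-09-19 23:30:00", "2018-09-19 17:00:00",
                       "2018-09-22 04:30:00"), tz = "UTC"),
  end   = as.POSIXct(c("2018-09-20 07:05:00", "2018-09-19 21:00:00",
                       "2018-09-22 12:00:00"), tz = "UTC")
)

# Hypothetical helper; assumes UTC timestamps and start <= end.
# night_from / night_to are seconds after midnight (01:00:00 and 05:59:59).
worked_night <- function(start, end, night_from = 3600, night_to = 21599) {
  s <- as.numeric(start)                 # epoch seconds
  e <- as.numeric(end)
  midnight <- floor(s / 86400) * 86400   # 00:00:00 UTC on the start date
  # If the shift starts after that day's window has closed, the next
  # window it can reach is the one on the following day.
  missed <- s > midnight + night_to
  window_open <- midnight + night_from + ifelse(missed, 86400, 0)
  # The chosen window closes at or after the shift start, so the two
  # overlap exactly when the window opens no later than the shift ends.
  window_open <= e
}

df$night.shift <- worked_night(df$start, df$end)
print(df)
```

Because the function picks the first night window that closes at or after the shift start, it needs no loop and does not depend on how many calendar dates the shift spans.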

CodePudding user response：

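Using seq.POSIXt to enumerate every second of the shift and check whether a night-window boundary (01:00:00 or 05:59:59 on the start or end date) is among them: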

transform(df, night.shift=mapply(\(x, y) any(
  as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %in% 
    seq.POSIXt(x, y, by='sec')), 
  start, end))
#                 start                 end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00        TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00       FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00        TRUE

Or, almost twice as fast, use %inrange% from the data.table package:

library(data.table)

transform(df, night.shift=mapply(\(x, y) any(
  as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %inrange% 
    c(x, y)), 
  start, end))
#                 start                 end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00        TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00       FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00        TRUE
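
If adding data.table is not an option, the same boundary check can be sketched in base R without mapply. The helpers `boundary`, `within_shift` and `has_boundary` are hypothetical names, and the code assumes UTC timestamps as in the question. It compares the four boundaries (01:00:00 and 05:59:59 on the start and end dates) against the shift directly, endpoints included.

```r
# Sample data from the question, built with base R only (UTC).
df <- data.frame(
  start = as.POSIXct(c("2018-09-19 23:30:00", "2018-09-19 17:00:00",
                       "2018-09-22 04:30:00"), tz = "UTC"),
  end   = as.POSIXct(c("2018-09-20 07:05:00", "2018-09-19 21:00:00",
                       "2018-09-22 12:00:00"), tz = "UTC")
)

# Hypothetical helper: the given clock time on the same UTC date as t.
boundary <- function(t, clock) {
  as.POSIXct(paste(format(t, "%Y-%m-%d", tz = "UTC"), clock), tz = "UTC")
}

# TRUE when boundary b lies within [start, end], endpoints included.
within_shift <- function(b, start, end) b >= start & b <= end

# TRUE when any of the four night-window boundaries falls inside the shift.
has_boundary <- function(start, end) {
  within_shift(boundary(start, "01:00:00"), start, end) |
    within_shift(boundary(start, "05:59:59"), start, end) |
    within_shift(boundary(end, "01:00:00"), start, end) |
    within_shift(boundary(end, "05:59:59"), start, end)
}

df$night.shift <- has_boundary(df$start, df$end)
print(df)

# A shift that lies entirely inside the night window contains neither
# boundary, so a boundary-only check reports FALSE for it.
inside <- as.POSIXct(c("2018-09-19 02:00:00", "2018-09-19 03:00:00"), tz = "UTC")
print(has_boundary(inside[1], inside[2]))
```

The last call shows a limit of checking boundaries alone: a shift from 02:00:00 to 03:00:00 is night work by the definition in the question, yet it contains neither boundary. If such short shifts can occur in the data, an interval-overlap test is the safer choice.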