Find the overlap between two timestamps in R to assign shifts

Time:04-30

Problem

Currently, I have a large flight crew schedule dataset with start and end times, and my goal is to identify whether an employee worked a night shift. A night shift is defined as any shift with some portion falling between 01:00:00 and 05:59:59. I have looked at functions such as %overlaps%, but these don't seem to work with times of day alone. Some sample data (in UTC):

library(lubridate)
df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
                 end = ymd_hms(c('2018-09-20 07:05:00', "2018-09-19 21:00:00", "2018-09-22 12:00:00")))


Ideally, I would like to get the following output, with a Boolean variable indicating whether the employee worked a night shift:

              start |                 end | night.shift
2018-09-19 23:30:00 | 2018-09-20 07:05:00 | TRUE
2018-09-19 17:00:00 | 2018-09-19 21:00:00 | FALSE
2018-09-22 04:30:00 | 2018-09-22 12:00:00 | TRUE

Thanks in advance!

CodePudding user response:

You can use interval() or %--% to create an Interval object and int_overlaps() to test if two intervals overlap.

library(dplyr)
library(lubridate)

df %>%
  as_tibble() %>%
  mutate(
    night.shift = int_overlaps(
      # night window 01:00:00-05:59:59 on the calendar date of the shift's end
      (date(end) + hms("01:00:00")) %--% (date(end) + hms("05:59:59")),
      start %--% end
    )
  )

# # A tibble: 3 × 3
#   start               end                 night.shift
#   <dttm>              <dttm>              <lgl>      
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00 TRUE       
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00 FALSE      
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00 TRUE 
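The answer above builds the night window only on the end date, so a shift that overlaps 01:00:00-05:59:59 on its start date but ends after midnight (say 02:00 to 00:30 the next day) would not be flagged. A minimal vectorized sketch that checks the window on both the start date and the end date, assuming no shift spans more than two calendar dates; night_window() is a hypothetical helper introduced here for illustration:

```r
library(lubridate)

df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
                 end = ymd_hms(c("2018-09-20 07:05:00", "2018-09-19 21:00:00", "2018-09-22 12:00:00")))

# Hypothetical helper: the 01:00:00-05:59:59 night window on each given calendar date
night_window <- function(d) {
  midnight <- as_datetime(d)  # Date -> POSIXct at 00:00:00 UTC
  (midnight + hms("01:00:00")) %--% (midnight + hms("05:59:59"))
}

shift <- df$start %--% df$end

# A shift is a night shift if it overlaps the window on its start date or on its end date
df$night.shift <- int_overlaps(night_window(date(df$start)), shift) |
  int_overlaps(night_window(date(df$end)), shift)

df
```

For the sample rows this gives TRUE, FALSE, TRUE, matching the desired output. A shift covering three or more calendar dates could contain a middle night entirely, which this sketch does not test.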
Reference

Utilities for creation and manipulation of Interval objects

CodePudding user response:

This is super janky and not optimized, but it works (and was fun to figure out). You'll want to vectorize it if possible.

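library(lubridate)

df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
                 end = ymd_hms(c("2018-09-20 07:05:00", "2018-09-19 21:00:00", "2018-09-22 12:00:00")))

# Night window as times of day: 01:00:00 (3600 s) to 05:59:59 (21599 s) after midnight
night <- interval(hms::as_hms(3600), hms::as_hms(21599), tz = "UTC")
print(night)

for (i in seq_len(nrow(df))) {
  s <- df$start[i]
  f <- df$end[i]
  # Keep only the time-of-day part of each timestamp, in seconds after midnight
  start_seconds <- hms::as_hms(60*60*hour(s) + 60*minute(s) + second(s))
  end_seconds <- hms::as_hms(60*60*hour(f) + 60*minute(f) + second(f))

  # Caveat: the date is dropped, so for a shift that crosses midnight (row 1)
  # end_seconds is earlier than start_seconds and the overlap test may not flag it
  shift <- interval(start_seconds, end_seconds, tz = "UTC")

  print(int_overlaps(night, shift))
}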
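The loop can be vectorized by working with seconds after midnight directly. A minimal sketch, assuming UTC timestamps (no daylight-saving jumps); s, e, night_start and night_end are names introduced here for illustration. Measuring the end from midnight of the start date, rather than of its own date, keeps overnight shifts such as row 1 correctly ordered, so the only extra case is a later night window beginning at 01:00:00 the next day (90000 s):

```r
library(lubridate)

df <- data.frame(start = ymd_hms(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00")),
                 end = ymd_hms(c("2018-09-20 07:05:00", "2018-09-19 21:00:00", "2018-09-22 12:00:00")))

night_start <- 3600   # 01:00:00 in seconds after midnight
night_end   <- 21599  # 05:59:59 in seconds after midnight

# Start as seconds after midnight of the start date (vectorized, no loop)
s <- 60*60*hour(df$start) + 60*minute(df$start) + second(df$start)
# End measured from the same midnight, so it exceeds 86400 for overnight shifts
e <- s + as.numeric(difftime(df$end, df$start, units = "secs"))

# Overlap with the window on the start date, or reach into any later night window
# (the earliest of which begins at night_start + 86400 seconds)
df$night.shift <- (s <= night_end & e >= night_start) | e >= night_start + 86400

df
```

For the sample rows this yields TRUE, FALSE, TRUE. Because s is always below 86400, reaching 90000 s is enough to guarantee overlap with the next night, so the test holds for shifts of any length.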

CodePudding user response:

Using seq.POSIXt

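# For each shift, build 01:00:00 and 05:59:59 on its start and end dates, then
# test whether either boundary instant is one of the shift's seconds
transform(df, night.shift=mapply(\(x, y) any(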
  as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %in% 
    seq.POSIXt(x, y, by='sec')), 
  start, end))
#                 start                 end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00        TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00       FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00        TRUE

Or, almost twice as fast, use %inrange% from the data.table package:

library(data.table)

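# Same test as the seq.POSIXt version, but %inrange% checks the boundary instants
# against [x, y] directly instead of expanding the shift second by second
transform(df, night.shift=mapply(\(x, y) any(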
  as.POSIXct(outer(as.Date(c(x, y)), c('01:00:00', '05:59:59'), paste), tz='GMT') %inrange% 
    c(x, y)), 
  start, end))
#                 start                 end night.shift
# 1 2018-09-19 23:30:00 2018-09-20 07:05:00        TRUE
# 2 2018-09-19 17:00:00 2018-09-19 21:00:00       FALSE
# 3 2018-09-22 04:30:00 2018-09-22 12:00:00        TRUE
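Both versions above only test whether the boundary instants 01:00:00 and 05:59:59 fall inside the shift, so a shift lying entirely within the window (for example 02:00 to 04:00) contains neither instant and is reported FALSE. A minimal base-R sketch of the same mapply()/transform() pattern that instead tests interval overlap (window start <= shift end and window end >= shift start); overlaps_night() is a hypothetical helper name, and timestamps are assumed to be UTC with no shift spanning more than two calendar dates:

```r
df <- data.frame(
  start = as.POSIXct(c("2018-09-19 23:30:00", "2018-09-19 17:00:00", "2018-09-22 04:30:00"), tz = "UTC"),
  end   = as.POSIXct(c("2018-09-20 07:05:00", "2018-09-19 21:00:00", "2018-09-22 12:00:00"), tz = "UTC")
)

# Hypothetical helper: does the shift [x, y] overlap the 01:00:00-05:59:59
# window on its start date or on its end date?
overlaps_night <- function(x, y) {
  days <- as.Date(c(x, y), tz = "UTC")
  lo <- as.POSIXct(paste(days, "01:00:00"), tz = "UTC")
  hi <- as.POSIXct(paste(days, "05:59:59"), tz = "UTC")
  # Two intervals overlap when each one starts no later than the other ends
  any(lo <= y & hi >= x)
}

df <- transform(df, night.shift = mapply(overlaps_night, start, end))
df
```

This needs no per-second sequence, and for the sample rows it returns TRUE, FALSE, TRUE while also flagging a 02:00 to 04:00 shift.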