Home > Software engineering >  Calculate how long each row's time period overlaps with other rows
Calculate how long each row's time period overlaps with other rows

Time:09-05

Each row records the start time and end time of a time period. To simply, we assume all are in the same day, so we don't need to bother other more difficult issues.

I need to calculate how long each row's time period overlaps with other rows. And the number of rows that have such overlaps. Suppose I have a data set. I want column D and column E.

There is no sample data. I just use this fake data set to make an example.

dat <- data.frame(id=1:4, 
                                    Start = c("02:50","02:55","03:15","03:25"),
                   End = c("03:10", "03:05", "03:20", "03:30"),
                   num_overlap = c(1,1,0,0),
                   time_overlap = c(10,10,0,0))

dat

CodePudding user response:

Here's a solution using the foverlaps method within the data.table package:

Packages:

library(data.table)
library(chron) # To convert the times provided into a `times` format

Your data:

dat <- data.frame(id=1:4, 
              Start = c("02:50","02:55","03:15","03:25"),
              End = c("03:10", "03:05", "03:20", "03:30"))

I assume your times represent hours and minutes, so need to add the seconds ":00" and format as a times object:

dat$Start = times(paste0(dat$Start, ":00"))
dat$End = times(paste0(dat$End,":00"))

We will use the data.table package so we coerce the data frame to a data.table object

setDT(dat)

Set the Start and End time columns as keys for merging and then call the foverlaps function

setkey(dat, "Start", "End")

# Merge onto itself to find overlaps
overlaps = foverlaps(dat, dat, type = "any")

(See more on foverlaps here at https://www.rdocumentation.org/packages/data.table/versions/1.14.2/topics/foverlaps)

foverlaps keeps cases where the match was with itself (id = i.id) so we remove them

overlaps = overlaps[id != i.id]

Next, compute the amount of overlap, for each overlap that occurred

overlaps[, time_overlap := pmin(i.End, End) - pmax(i.Start, Start)]

Convert to minutes per https://stackoverflow.com/a/29067748/3674399

overlaps[, time_overlap := 60 * 24 * as.numeric(time_overlap)]

Keep only id and time_overlap, and summarize by id

overlaps = overlaps[, list(id, time_overlap)]
overlaps = overlaps[, list(num_overlap = .N, time_overlap = sum(time_overlap)), by = "id"]

Finally, merge with original dataset now, and fill any missing values

result = merge(dat, overlaps, by = "id", all.x = T)
result[is.na(num_overlap), num_overlap := 0]
result[is.na(time_overlap), time_overlap := 0]

The result is as follows:

   id    Start      End num_overlap time_overlap
1:  1 02:50:00 03:10:00           1           10
2:  2 02:55:00 03:05:00           1           10
3:  3 03:15:00 03:20:00           0            0
4:  4 03:25:00 03:30:00           0            0

If you are new to the data.table syntax, please see for example, this intro here: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

  •  Tags:  
  • r
  • Related