Each row records the start time and end time of a time period. To simply, we assume all are in the same day, so we don't need to bother other more difficult issues.
I need to calculate how long each row's time period overlaps with other rows. And the number of rows that have such overlaps. Suppose I have a data set. I want column D and column E.
There is no sample data. I just use this fake data set to make an example.
dat <- data.frame(id=1:4,
Start = c("02:50","02:55","03:15","03:25"),
End = c("03:10", "03:05", "03:20", "03:30"),
num_overlap = c(1,1,0,0),
time_overlap = c(10,10,0,0))
dat
CodePudding user response:
Here's a solution using the foverlaps
method within the data.table
package:
Packages:
library(data.table)
library(chron) # To convert the times provided into a `times` format
Your data:
dat <- data.frame(id=1:4,
Start = c("02:50","02:55","03:15","03:25"),
End = c("03:10", "03:05", "03:20", "03:30"))
I assume your times represent hours and minutes, so need to add the seconds ":00" and format as a times
object:
dat$Start = times(paste0(dat$Start, ":00"))
dat$End = times(paste0(dat$End,":00"))
We will use the data.table
package so we coerce the data frame to a data.table object
setDT(dat)
Set the Start
and End
time columns as keys for merging and then call the foverlaps
function
setkey(dat, "Start", "End")
# Merge onto itself to find overlaps
overlaps = foverlaps(dat, dat, type = "any")
(See more on foverlaps
here at https://www.rdocumentation.org/packages/data.table/versions/1.14.2/topics/foverlaps)
foverlaps
keeps cases where the match was with itself (id = i.id) so we remove them
overlaps = overlaps[id != i.id]
Next, compute the amount of overlap, for each overlap that occurred
overlaps[, time_overlap := pmin(i.End, End) - pmax(i.Start, Start)]
Convert to minutes per https://stackoverflow.com/a/29067748/3674399
overlaps[, time_overlap := 60 * 24 * as.numeric(time_overlap)]
Keep only id and time_overlap, and summarize by id
overlaps = overlaps[, list(id, time_overlap)]
overlaps = overlaps[, list(num_overlap = .N, time_overlap = sum(time_overlap)), by = "id"]
Finally, merge with original dataset now, and fill any missing values
result = merge(dat, overlaps, by = "id", all.x = T)
result[is.na(num_overlap), num_overlap := 0]
result[is.na(time_overlap), time_overlap := 0]
The result is as follows:
id Start End num_overlap time_overlap
1: 1 02:50:00 03:10:00 1 10
2: 2 02:55:00 03:05:00 1 10
3: 3 03:15:00 03:20:00 0 0
4: 4 03:25:00 03:30:00 0 0
If you are new to the data.table syntax, please see for example, this intro here: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html