Pulling values of co-occurring/overlapping time data in R

I have a set of data that looks like this in a data frame:

channel	start.time	stop.time	vp
A	0	9.719	N
A	9.719	11.735	EE
C	0.264	2.032	N
B	26.514	28.264	CH1
D	82.316	82.702	self
D	10.354	11.666	other
C	80.251	82.719	CH2
B	27.564	30.819	CH1
D	25.621	27.693	N
A	10.354	11.666	other
B	80.251	82.719	CH2
B	61.564	64.819	CH1
A	60.621	62.693	N

The first column, 'channel', is the place in which an observation of an event occurred; the start.time and stop.time columns show when the event started and stopped; and 'vp' shows the kind of event that it was.

I want to see what events occur simultaneously (not only 100% simultaneous, but also when overlapping).

I want to be able to do something like "For every 'A' event with the value 'N', give me a list of the observations that co-occur with it." Then I could get a vector or factor that would read something like: B:CH1, C:CH3, D:N, etc. I don't really care about getting the stop or start times, nor the durations or amount of overlap. I simply want a list that shows what I get in channels B, C, and D if I specify a particular value of 'vp' in channel 'A'.

I imagine some kind of for-loop is in order here, but I can't even begin to imagine how to get it to look at the overlapping times, nor how to concatenate the output and give the "C:N" output I want.

Any suggestions would be appreciated.

CodePudding user response：

You could use foverlaps from the data.table package (dt is your data.frame):

library(data.table)

setDT(dt)
setkey(dt, start.time, stop.time)
dt[, p := paste(channel, vp, sep = ":")]
overlap_dt <- foverlaps(dt, dt)[p != i.p,]

split(overlap_dt$p, overlap_dt$i.p)

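For each 'process' (a channel:vp label), this returns all other 'processes' that overlap with it. Because the filter compares labels, overlaps between two different rows that share a label (for example the two overlapping B:CH1 rows) are dropped as well: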

$`A:EE`
[1] "A:N"     "D:other" "A:other"

$`A:N`
[1] "C:N"   "A:EE"  "B:CH1"

$`A:other`
[1] "A:EE"    "D:other"

$`B:CH1`
[1] "D:N" "D:N" "A:N"

$`B:CH2`
[1] "C:CH2"  "D:self"

$`C:CH2`
[1] "B:CH2"  "D:self"

$`C:N`
[1] "A:N"

$`D:N`
[1] "B:CH1" "B:CH1"

$`D:other`
[1] "A:EE"    "A:other"

$`D:self`
[1] "C:CH2" "B:CH2"
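
To get only what the question asks for (everything in the other channels that co-occurs with an 'A' event whose vp is 'N'), one option is to overlap just those rows against the rest of the table. This is a minimal sketch rather than part of the answer above; the names query, others and hits are made up for illustration, and it assumes the columns are named start.time and stop.time.

```r
library(data.table)

# The example data from the question
dt <- data.table(
  channel    = c("A", "A", "C", "B", "D", "D", "C", "B", "D", "A", "B", "B", "A"),
  start.time = c(0, 9.719, 0.264, 26.514, 82.316, 10.354, 80.251,
                 27.564, 25.621, 10.354, 80.251, 61.564, 60.621),
  stop.time  = c(9.719, 11.735, 2.032, 28.264, 82.702, 11.666, 82.719,
                 30.819, 27.693, 11.666, 82.719, 64.819, 62.693),
  vp         = c("N", "EE", "N", "CH1", "self", "other", "CH2",
                 "CH1", "N", "other", "CH2", "CH1", "N")
)
dt[, p := paste(channel, vp, sep = ":")]

# Rows we are asking about, and the rows they may overlap with
query  <- dt[channel == "A" & vp == "N"]
others <- dt[channel != "A"]
setkey(others, start.time, stop.time)   # foverlaps needs y to be keyed

# nomatch = 0L drops query rows that overlap nothing
hits <- foverlaps(query, others,
                  by.x = c("start.time", "stop.time"),
                  nomatch = 0L)

# Labels from the other channels that co-occur with A:N
unique(hits$p)
```

With the example data this should leave just C:N and B:CH1, which matches the A:N entry in the list above once the A:EE row is excluded.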

CodePudding user response：

Here is another data.table approach:

library(data.table)
# if your data is not in data.table format already, make it so now
setDT(DT)
# create a unique key
DT[, id := .I]
setkey(DT, id)
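# self join on subset by row: for each row, collect every other row
# whose interval overlaps it and paste the labels as "channel:vp"
DT[DT, overlaps := {
  temp <- DT[id != i.id & start.time <= i.stop.time & stop.time >= i.start.time, ]
  paste(temp$channel, temp$vp, sep = ":", collapse = ", ")
}, by = .EACHI]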

#    channel start.time stop.time    vp id              overlaps
# 1:       A      0.000     9.719     N  1             A:EE, C:N
# 2:       A      9.719    11.735    EE  2 A:N, D:other, A:other
# 3:       C      0.264     2.032     N  3                   A:N
# 4:       B     26.514    28.264   CH1  4            B:CH1, D:N
# 5:       D     82.316    82.702  self  5          C:CH2, B:CH2
# 6:       D     10.354    11.666 other  6         A:EE, A:other
# 7:       C     80.251    82.719   CH2  7         D:self, B:CH2
# 8:       B     27.564    30.819   CH1  8            B:CH1, D:N
# 9:       D     25.621    27.693     N  9          B:CH1, B:CH1
#10:       A     10.354    11.666 other 10         A:EE, D:other
#11:       B     80.251    82.719   CH2 11         D:self, C:CH2
#12:       B     61.564    64.819   CH1 12                   A:N
#13:       A     60.621    62.693     N 13                 B:CH1

# post processing if needed
DT[, paste0("overlap", 1:length(tstrsplit(DT$overlaps, ", "))) := tstrsplit(overlaps, ", ")]
DT[, `:=`(id = NULL, overlaps = NULL)][]

#    channel start.time stop.time    vp overlap1 overlap2 overlap3
# 1:       A      0.000     9.719     N     A:EE      C:N     <NA>
# 2:       A      9.719    11.735    EE      A:N  D:other  A:other
# 3:       C      0.264     2.032     N      A:N     <NA>     <NA>
# 4:       B     26.514    28.264   CH1    B:CH1      D:N     <NA>
# 5:       D     82.316    82.702  self    C:CH2    B:CH2     <NA>
# 6:       D     10.354    11.666 other     A:EE  A:other     <NA>
# 7:       C     80.251    82.719   CH2   D:self    B:CH2     <NA>
# 8:       B     27.564    30.819   CH1    B:CH1      D:N     <NA>
# 9:       D     25.621    27.693     N    B:CH1    B:CH1     <NA>
#10:       A     10.354    11.666 other     A:EE  D:other     <NA>
#11:       B     80.251    82.719   CH2   D:self    C:CH2     <NA>
#12:       B     61.564    64.819   CH1      A:N     <NA>     <NA>
#13:       A     60.621    62.693     N    B:CH1     <NA>     <NA>

Sample data:

DT <- fread("channel    start.time  stop.time   vp
A   0   9.719   N
A   9.719   11.735  EE
C   0.264   2.032   N
B   26.514  28.264  CH1
D   82.316  82.702  self
D   10.354  11.666  other
C   80.251  82.719  CH2
B   27.564  30.819  CH1
D   25.621  27.693  N
A   10.354  11.666  other
B   80.251  82.719  CH2
B   61.564  64.819  CH1
A   60.621  62.693  N")
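
The overlap test in the self-join above (start.time <= i.stop.time & stop.time >= i.start.time) treats intervals that merely touch at an endpoint as overlapping, which is why A:N (0 to 9.719) and A:EE (9.719 to 11.735) are reported together. The following base R sketch spells that test out and adds a switch for excluding touching intervals. The function names overlaps and co_occurring and the strict argument are hypothetical helpers, not part of either answer.

```r
# Two intervals [s1, e1] and [s2, e2] overlap when each one starts
# no later than the other one ends. With strict = TRUE, intervals
# that only share an endpoint are not counted.
overlaps <- function(s1, e1, s2, e2, strict = FALSE) {
  if (strict) s1 < e2 & e1 > s2 else s1 <= e2 & e1 >= s2
}

# The example data from the question
df <- data.frame(
  channel    = c("A", "A", "C", "B", "D", "D", "C", "B", "D", "A", "B", "B", "A"),
  start.time = c(0, 9.719, 0.264, 26.514, 82.316, 10.354, 80.251,
                 27.564, 25.621, 10.354, 80.251, 61.564, 60.621),
  stop.time  = c(9.719, 11.735, 2.032, 28.264, 82.702, 11.666, 82.719,
                 30.819, 27.693, 11.666, 82.719, 64.819, 62.693),
  vp         = c("N", "EE", "N", "CH1", "self", "other", "CH2",
                 "CH1", "N", "other", "CH2", "CH1", "N"),
  stringsAsFactors = FALSE
)

# For every event with the given channel and vp, list the "channel:vp"
# labels of events in the other channels that overlap it.
co_occurring <- function(df, channel, vp, strict = FALSE) {
  query  <- df[df$channel == channel & df$vp == vp, ]
  others <- df[df$channel != channel, ]
  hits <- lapply(seq_len(nrow(query)), function(i) {
    keep <- overlaps(others$start.time, others$stop.time,
                     query$start.time[i], query$stop.time[i], strict)
    paste(others$channel[keep], others$vp[keep], sep = ":")
  })
  unique(unlist(hits))
}

co_occurring(df, "A", "N")    # "C:N" "B:CH1"
co_occurring(df, "A", "EE")   # "D:other"

overlaps(0, 9.719, 9.719, 11.735)                 # TRUE: touching counts
overlaps(0, 9.719, 9.719, 11.735, strict = TRUE)  # FALSE
```

This loops over the query rows only, so it stays simple for small tables; for large ones the keyed foverlaps or the non-equi self-join is likely the better fit.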