Pulling values of co-occurring/overlapping time data in R

Time:12-27

I have a data frame that looks like this:

channel start.time stop.time vp
A 0 9.719 N
A 9.719 11.735 EE
C 0.264 2.032 N
B 26.514 28.264 CH1
D 82.316 82.702 self
D 10.354 11.666 other
C 80.251 82.719 CH2
B 27.564 30.819 CH1
D 25.621 27.693 N
A 10.354 11.666 other
B 80.251 82.719 CH2
B 61.564 64.819 CH1
A 60.621 62.693 N

The first column, 'channel', is the place where an event was observed; the 'start.time' and 'stop.time' columns show when the event started and stopped; and 'vp' shows what kind of event it was.

I want to see which events occur simultaneously (not only events that are exactly simultaneous, but also events that overlap).

I want to be able to do something like: "For every 'A' event with the value 'N', give me a list of the observations that co-occur with it." Then I could get a vector or factor that reads something like B:CH1, C:CH3, D:N, etc. I don't care about the start or stop times, the durations, or the amount of overlap. I simply want a list that shows what I get in channels B, C and D when I specify a particular 'vp' value for channel A.

I imagine some kind of for-loop is in order here, but I can't even begin to imagine how to make it look at overlapping times, nor how to concatenate the output into the "C:N" form I want.
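A minimal base-R sketch of the for-loop idea, assuming the "stop time" column is named stop.time. The helper name co_occurring() is hypothetical. For each matching row it keeps rows from other channels whose interval overlaps (each interval starts no later than the other one ends), then pastes channel and vp together:

```r
# Sample data from the question; the "stop time" column is assumed to be stop.time
events <- data.frame(
  channel    = c("A", "A", "C", "B", "D", "D", "C", "B", "D", "A", "B", "B", "A"),
  start.time = c(0, 9.719, 0.264, 26.514, 82.316, 10.354, 80.251,
                 27.564, 25.621, 10.354, 80.251, 61.564, 60.621),
  stop.time  = c(9.719, 11.735, 2.032, 28.264, 82.702, 11.666, 82.719,
                 30.819, 27.693, 11.666, 82.719, 64.819, 62.693),
  vp         = c("N", "EE", "N", "CH1", "self", "other", "CH2",
                 "CH1", "N", "other", "CH2", "CH1", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every row matching target_channel/target_vp, list the
# "channel:vp" labels of rows in OTHER channels whose interval overlaps it.
co_occurring <- function(df, target_channel, target_vp) {
  targets <- which(df$channel == target_channel & df$vp == target_vp)
  result <- list()
  for (i in targets) {
    # closed intervals overlap when each one starts no later than the other ends
    hit <- df$start.time <= df$stop.time[i] &
      df$stop.time >= df$start.time[i] &
      df$channel != target_channel
    result[[as.character(i)]] <- paste(df$channel[hit], df$vp[hit], sep = ":")
  }
  result
}

co_occurring(events, "A", "N")
```

For the sample data this returns one element per A:N row, named by row number: `1` is "C:N" and `13` is "B:CH1". Wrap the call in unlist(..., use.names = FALSE) if you want a single character vector instead.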

Any suggestions would be appreciated.

Answer:

You could use foverlaps() from the data.table package (dt is your data.frame, with the stop-time column named stop.time):

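library(data.table)

setDT(dt)                                   # convert the data.frame to a data.table by reference
setkey(dt, start.time, stop.time)           # foverlaps() needs y keyed, with start and end as the last two key columns
dt[, p := paste(channel, vp, sep = ":")]    # event label such as "A:N"
overlap_dt <- foverlaps(dt, dt)[p != i.p, ] # self overlap join; clashing columns from x get the i. prefix; drop same-label pairs

split(overlap_dt$p, overlap_dt$i.p)         # for each label, the labels that overlap it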

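For each event label (channel:vp), this returns all other labels that overlap it. Note that rows sharing a label are pooled (both A:N rows end up under `A:N`), and the filter p != i.p also drops overlaps between two different rows with the same label, such as the two overlapping B:CH1 rows: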

$`A:EE`
[1] "A:N"     "D:other" "A:other"

$`A:N`
[1] "C:N"   "A:EE"  "B:CH1"

$`A:other`
[1] "A:EE"    "D:other"

$`B:CH1`
[1] "D:N" "D:N" "A:N"

$`B:CH2`
[1] "C:CH2"  "D:self"

$`C:CH2`
[1] "B:CH2"  "D:self"

$`C:N`
[1] "A:N"

$`D:N`
[1] "B:CH1" "B:CH1"

$`D:other`
[1] "A:EE"    "A:other"

$`D:self`
[1] "C:CH2" "B:CH2"
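Without data.table, the same self-join can be sketched in base R by enumerating every pair of rows with expand.grid() and keeping the overlapping ones. This is an illustrative alternative, not how foverlaps() works internally: it builds all n x n pairs, which is fine for small data but grows quickly. It keeps one result per row number rather than per label, and the last step filters to the question's query (A:N rows, partners in other channels). The names events, pairs, by_row and a_n are made up for this sketch, and the stop-time column is assumed to be stop.time:

```r
# Sample data from the question (stop-time column assumed to be named stop.time)
events <- data.frame(
  channel    = c("A", "A", "C", "B", "D", "D", "C", "B", "D", "A", "B", "B", "A"),
  start.time = c(0, 9.719, 0.264, 26.514, 82.316, 10.354, 80.251,
                 27.564, 25.621, 10.354, 80.251, 61.564, 60.621),
  stop.time  = c(9.719, 11.735, 2.032, 28.264, 82.702, 11.666, 82.719,
                 30.819, 27.693, 11.666, 82.719, 64.819, 62.693),
  vp         = c("N", "EE", "N", "CH1", "self", "other", "CH2",
                 "CH1", "N", "other", "CH2", "CH1", "N"),
  stringsAsFactors = FALSE
)
events$p <- paste(events$channel, events$vp, sep = ":")

# Every ordered pair of distinct row indices (x = event, y = candidate partner)
n <- nrow(events)
pairs <- expand.grid(x = seq_len(n), y = seq_len(n))
pairs <- pairs[pairs$x != pairs$y, ]

# Keep pairs whose closed intervals overlap
keep <- events$start.time[pairs$x] <= events$stop.time[pairs$y] &
  events$stop.time[pairs$x] >= events$start.time[pairs$y]
pairs <- pairs[keep, ]

# One element per row (named by row number): labels of the rows that overlap it
by_row <- split(events$p[pairs$y], pairs$x)

# The question's query: for each A:N row, overlapping events in other channels
q <- pairs[events$p[pairs$x] == "A:N" & events$channel[pairs$y] != "A", ]
a_n <- split(events$p[q$y], q$x)
a_n
```

Here a_n comes out as `1` = "C:N" and `13` = "B:CH1", while by_row[["4"]] keeps both the B:CH1 and the D:N partner of the first B:CH1 row.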

Answer:

Here is another data.table approach, which keeps one result per row:

library(data.table)
# if your data is not in data.table format already, make it so now
setDT(DT)
# create a unique row id and key on it
DT[, id := .I]
setkey(DT, id)
# self-join: for each row (by = .EACHI), find every other row whose interval overlaps it
DT[DT, overlaps := {
  temp <- DT[id != i.id & start.time <= i.stop.time & stop.time >= i.start.time, ]
  paste(temp$channel, temp$vp, sep = ":", collapse = ", ")
}, by = .EACHI]

#    channel start.time stop.time    vp id              overlaps
# 1:       A      0.000     9.719     N  1             A:EE, C:N
# 2:       A      9.719    11.735    EE  2 A:N, D:other, A:other
# 3:       C      0.264     2.032     N  3                   A:N
# 4:       B     26.514    28.264   CH1  4            B:CH1, D:N
# 5:       D     82.316    82.702  self  5          C:CH2, B:CH2
# 6:       D     10.354    11.666 other  6         A:EE, A:other
# 7:       C     80.251    82.719   CH2  7         D:self, B:CH2
# 8:       B     27.564    30.819   CH1  8            B:CH1, D:N
# 9:       D     25.621    27.693     N  9          B:CH1, B:CH1
#10:       A     10.354    11.666 other 10         A:EE, D:other
#11:       B     80.251    82.719   CH2 11         D:self, C:CH2
#12:       B     61.564    64.819   CH1 12                   A:N
#13:       A     60.621    62.693     N 13                 B:CH1

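# post-processing if needed: spread the comma-separated overlaps into one column each
# (tstrsplit pads shorter rows with NA)
DT[, paste0("overlap", 1:length(tstrsplit(DT$overlaps, ", "))) := tstrsplit(overlaps, ", ")]
# drop the helper columns and print
DT[, `:=`(id = NULL, overlaps = NULL)][]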

#    channel start.time stop.time    vp overlap1 overlap2 overlap3
# 1:       A      0.000     9.719     N     A:EE      C:N     <NA>
# 2:       A      9.719    11.735    EE      A:N  D:other  A:other
# 3:       C      0.264     2.032     N      A:N     <NA>     <NA>
# 4:       B     26.514    28.264   CH1    B:CH1      D:N     <NA>
# 5:       D     82.316    82.702  self    C:CH2    B:CH2     <NA>
# 6:       D     10.354    11.666 other     A:EE  A:other     <NA>
# 7:       C     80.251    82.719   CH2   D:self    B:CH2     <NA>
# 8:       B     27.564    30.819   CH1    B:CH1      D:N     <NA>
# 9:       D     25.621    27.693     N    B:CH1    B:CH1     <NA>
#10:       A     10.354    11.666 other     A:EE  D:other     <NA>
#11:       B     80.251    82.719   CH2   D:self    C:CH2     <NA>
#12:       B     61.564    64.819   CH1      A:N     <NA>     <NA>
#13:       A     60.621    62.693     N    B:CH1     <NA>     <NA>
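Both answers rely on the same test: two intervals overlap when each starts no later than the other ends (start.time <= i.stop.time & stop.time >= i.start.time in the join above). Because the comparisons are inclusive, intervals that merely touch count as co-occurring, which is why A:N [0, 9.719] and A:EE [9.719, 11.735] are listed as overlapping. A small sketch of the inclusive test next to a strict variant, in case touching should not count; the function names are hypothetical:

```r
# TRUE when the closed intervals [s1, e1] and [s2, e2] overlap:
# each interval must start no later than the other one ends (touching counts).
overlaps_closed <- function(s1, e1, s2, e2) s1 <= e2 & e1 >= s2

# Strict variant: intervals that only share an endpoint do not count.
overlaps_open <- function(s1, e1, s2, e2) s1 < e2 & e1 > s2

# A:N [0, 9.719] and A:EE [9.719, 11.735] share only the instant 9.719
overlaps_closed(0, 9.719, 9.719, 11.735)  # TRUE
overlaps_open(0, 9.719, 9.719, 11.735)    # FALSE

# A:N [0, 9.719] and C:N [0.264, 2.032] genuinely overlap
overlaps_closed(0, 9.719, 0.264, 2.032)   # TRUE
```

Both functions are vectorised with &, so either can be dropped into the subset condition of the join in place of the inline comparison.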

Sample data:

DT <- fread("channel    start.time  stop.time   vp
A   0   9.719   N
A   9.719   11.735  EE
C   0.264   2.032   N
B   26.514  28.264  CH1
D   82.316  82.702  self
D   10.354  11.666  other
C   80.251  82.719  CH2
B   27.564  30.819  CH1
D   25.621  27.693  N
A   10.354  11.666  other
B   80.251  82.719  CH2
B   61.564  64.819  CH1
A   60.621  62.693  N")