I have a data set that looks like this as a data frame:
channel | start.time | stop.time | vp |
---|---|---|---|
A | 0 | 9.719 | N |
A | 9.719 | 11.735 | EE |
C | 0.264 | 2.032 | N |
B | 26.514 | 28.264 | CH1 |
D | 82.316 | 82.702 | self |
D | 10.354 | 11.666 | other |
C | 80.251 | 82.719 | CH2 |
B | 27.564 | 30.819 | CH1 |
D | 25.621 | 27.693 | N |
A | 10.354 | 11.666 | other |
B | 80.251 | 82.719 | CH2 |
B | 61.564 | 64.819 | CH1 |
A | 60.621 | 62.693 | N |
The first column, 'channel', is the place in which an event was observed; the start.time and stop.time columns show when the event started and stopped; and 'vp' shows the kind of event it was.
I want to see which events occur simultaneously (not only 100% simultaneously, but also when they partially overlap).
I want to be able to do something like "For every 'A' event with the value of 'N', give me a list of the observations that co-occur with them." Then I could get a vector or factor that would read something like: B:CH1, C:CH3, D:N, etc. I don't really care about getting the stop or start times, nor the durations or amount of overlap. I simply want a list that shows what I get in channels B, C, and D if I specify a particular value of 'vp' in channel 'A'.
I imagine some kind of for-loop is in order here, but I can't even begin to imagine how to get it to look at the overlapping times, nor how to concatenate the output and give the "C:N" output I want.
Any suggestions would be appreciated.
CodePudding user response:
You could use foverlaps from the data.table package (dt is your data.frame):
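library(data.table)
setDT(dt)
# foverlaps needs the interval columns as the last two key columns
setkey(dt, start.time, stop.time)
# label each observation as "channel:vp"
dt[, p := paste(channel, vp, sep = ":")]
# self overlap join; p != i.p drops each row's match with itself
# (and any other pair of rows that share the same label)
overlap_dt <- foverlaps(dt, dt)[p != i.p,]
# list the overlapping labels for every label
split(overlap_dt$p, overlap_dt$i.p)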
For each 'process', this returns all other 'processes' that overlap with it:
$`A:EE`
[1] "A:N" "D:other" "A:other"
$`A:N`
[1] "C:N" "A:EE" "B:CH1"
$`A:other`
[1] "A:EE" "D:other"
$`B:CH1`
[1] "D:N" "D:N" "A:N"
$`B:CH2`
[1] "C:CH2" "D:self"
$`C:CH2`
[1] "B:CH2" "D:self"
$`C:N`
[1] "A:N"
$`D:N`
[1] "B:CH1" "B:CH1"
$`D:other`
[1] "A:EE" "A:other"
$`D:self`
[1] "C:CH2" "B:CH2"
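To answer the specific question (what co-occurs with the 'A' events whose vp is 'N'), one possible sketch is to query only those rows and then drop the matches that come from channel A itself. The names query, hits and result are my own, and I am assuming that intervals which merely touch at an endpoint should count as overlapping, which is what type = "any" does.

```r
library(data.table)

dt <- fread("channel start.time stop.time vp
A 0 9.719 N
A 9.719 11.735 EE
C 0.264 2.032 N
B 26.514 28.264 CH1
D 82.316 82.702 self
D 10.354 11.666 other
C 80.251 82.719 CH2
B 27.564 30.819 CH1
D 25.621 27.693 N
A 10.354 11.666 other
B 80.251 82.719 CH2
B 61.564 64.819 CH1
A 60.621 62.693 N")

# the table being searched must be keyed on its interval columns
setkey(dt, start.time, stop.time)

# hypothetical query: every 'A' event whose vp is 'N'
query <- dt[channel == "A" & vp == "N"]

# columns of dt keep their names, columns of query get an "i." prefix
hits <- foverlaps(query, dt, by.x = c("start.time", "stop.time"), type = "any")

# keep only observations from the other channels
hits <- hits[channel != i.channel]

result <- sort(unique(paste(hits$channel, hits$vp, sep = ":")))
result
# [1] "B:CH1" "C:N"
```

Changing the filter that builds query (for example to vp == "EE") gives the same kind of list for any other value.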
CodePudding user response:
Here is another data.table approach:
library(data.table)
# if your data is not in data.table format already, make it so now
setDT(DT)
# create a unique key
DT[, id := .I]
setkey(DT, id)
# self join on subset by row
DT[DT, overlaps := {
  # all rows other than the current one whose interval overlaps it
  temp <- DT[id != i.id & start.time <= i.stop.time & stop.time >= i.start.time, ]
  paste(temp$channel, temp$vp, sep = ":", collapse = ", ")
}, by = .EACHI]
# channel start.time stop.time vp id overlaps
# 1: A 0.000 9.719 N 1 A:EE, C:N
# 2: A 9.719 11.735 EE 2 A:N, D:other, A:other
# 3: C 0.264 2.032 N 3 A:N
# 4: B 26.514 28.264 CH1 4 B:CH1, D:N
# 5: D 82.316 82.702 self 5 C:CH2, B:CH2
# 6: D 10.354 11.666 other 6 A:EE, A:other
# 7: C 80.251 82.719 CH2 7 D:self, B:CH2
# 8: B 27.564 30.819 CH1 8 B:CH1, D:N
# 9: D 25.621 27.693 N 9 B:CH1, B:CH1
#10: A 10.354 11.666 other 10 A:EE, D:other
#11: B 80.251 82.719 CH2 11 D:self, C:CH2
#12: B 61.564 64.819 CH1 12 A:N
#13: A 60.621 62.693 N 13 B:CH1
# post processing if needed
DT[, paste0("overlap", 1:length(tstrsplit(DT$overlaps, ", "))) := tstrsplit(overlaps, ", ")]
DT[, `:=`(id = NULL, overlaps = NULL)][]
# channel start.time stop.time vp overlap1 overlap2 overlap3
# 1: A 0.000 9.719 N A:EE C:N <NA>
# 2: A 9.719 11.735 EE A:N D:other A:other
# 3: C 0.264 2.032 N A:N <NA> <NA>
# 4: B 26.514 28.264 CH1 B:CH1 D:N <NA>
# 5: D 82.316 82.702 self C:CH2 B:CH2 <NA>
# 6: D 10.354 11.666 other A:EE A:other <NA>
# 7: C 80.251 82.719 CH2 D:self B:CH2 <NA>
# 8: B 27.564 30.819 CH1 B:CH1 D:N <NA>
# 9: D 25.621 27.693 N B:CH1 B:CH1 <NA>
#10: A 10.354 11.666 other A:EE D:other <NA>
#11: B 80.251 82.719 CH2 D:self C:CH2 <NA>
#12: B 61.564 64.819 CH1 A:N <NA> <NA>
#13: A 60.621 62.693 N B:CH1 <NA> <NA>
Sample data:
DT <- fread("channel start.time stop.time vp
A 0 9.719 N
A 9.719 11.735 EE
C 0.264 2.032 N
B 26.514 28.264 CH1
D 82.316 82.702 self
D 10.354 11.666 other
C 80.251 82.719 CH2
B 27.564 30.819 CH1
D 25.621 27.693 N
A 10.354 11.666 other
B 80.251 82.719 CH2
B 61.564 64.819 CH1
A 60.621 62.693 N")
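
The overlap test used in the self join (start.time <= i.stop.time & stop.time >= i.start.time) can also be checked in isolation. This is only a small base R sketch; intervals_overlap is a hypothetical helper name, and it treats the intervals as closed, so two events that merely touch at an endpoint count as overlapping.

```r
# Two closed intervals [s1, e1] and [s2, e2] overlap when each one
# starts no later than the other one ends.
intervals_overlap <- function(s1, e1, s2, e2) {
  s1 <= e2 & s2 <= e1
}

intervals_overlap(0, 9.719, 0.264, 2.032)    # TRUE: second lies inside the first
intervals_overlap(0, 9.719, 9.719, 11.735)   # TRUE: they touch at 9.719
intervals_overlap(0, 9.719, 10.354, 11.666)  # FALSE: second starts after the first ends
```

If touching events should not count, replacing <= with < in the helper would be one way to express that.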