Insert rows for missing time measurements of a negative event-CodePudding

I have data from an experiment where Subjects rated an Event called f:

df <- structure(list(Subject = c("A", "A", "A", "B", "B", "B"),
                     Timestamp = c("00:00:00.146 - 00:00:00.889", 
                                   "00:00:01.568 - 00:00:02.183", "00:00:03.642 - 00:00:04.522", 
                                   "00:00:00.000 - 00:00:00.660", "00:00:01.247 - 00:00:02.229", 
                                   "00:00:03.697 - 00:00:04.926"),
                     Starttime_ms = c(146, 1568, 3642, 0, 1247, 3697), 
                     Endtime_ms = c(889, 2183, 4522, 660, 2229, 4926), 
                     Duration = c(743, 615, 880, 660, 982, 1229), 
                     Event = c("f", "f", "f", "f", "f", "f")), row.names = c(NA, 6L), class = "data.frame")

What is missing are the respective measurements for those in-between timespans where the Subjects did not rate the event. That's an event too, just a negative one; let's call it nf. How can I insert rows for those missing timespans to obtain the complete record including both positive and negative events f and nf shown below?

Expected:

  Subject                   Timestamp Starttime_ms Endtime_ms Duration Event
1       A 00:00:00.000 - 00:00:00.146          0          146      146    nf
2       A 00:00:00.146 - 00:00:00.889          146        889      743     f
3       A 00:00:00.889 - 00:00:01.568          889       1568      679    nf
4       A 00:00:01.568 - 00:00:02.183         1568       2183      615     f
5       A 00:00:02.183 - 00:00:03.642         2183       3642     1459    nf
6       A 00:00:03.642 - 00:00:04.522         3642       4522      880     f
7       B 00:00:00.000 - 00:00:00.660            0        660      660     f
8       B 00:00:00.660 - 00:00:01.247          660       1247      587    nf
9       B 00:00:01.247 - 00:00:02.229         1247       2229      982     f
10      B 00:00:02.229 - 00:00:03.697         2229       3697     1468    nf
11      B 00:00:03.697 - 00:00:04.926         3697       4926     1229     f

I've experimented with data.tablebut since I'm not fluent with this, this attempt only brought partial success:

unq <- c(0, sort(unique(setDT(df)[, c(Starttime_ms, Endtime_ms)])))
df[.(unq[-length(unq)], unq[-1]), on=c("Starttime_ms", "Endtime_ms")]

While I'm open to any solution my preference is for a dplyr one.

CodePudding user response：

This is a tidyverse solution.

library(tidyverse)
library(lubridate)

df %>%
  separate(Timestamp, c("start", "end"), sep = " - ") %>%
  group_by(Subject) %>%
  mutate(Starttime_ms = lag(end, default = "00:00:00.000"),
         Endtime_ms = start,
         Event = "nf",
         across(ends_with("_ms"), ~ hms(.x) * 1000),
         Duration = Endtime_ms - Starttime_ms,
         across(where(is.period), as.numeric)) %>%
  unite(Timestamp, start, end, sep = " - ") %>%
  bind_rows(df) %>%
  arrange(Starttime_ms, .by_group = TRUE) %>%
  filter(Duration > 0) %>%
  ungroup()

Output

# A tibble: 11 × 6
   Subject Timestamp                   Starttime_ms Endtime_ms Duration Event
   <chr>   <chr>                              <dbl>      <dbl>    <dbl> <chr>
 1 A       00:00:00.146 - 00:00:00.889            0        146      146 nf   
 2 A       00:00:00.146 - 00:00:00.889          146        889      743 f    
 3 A       00:00:01.568 - 00:00:02.183          889       1568      679 nf   
 4 A       00:00:01.568 - 00:00:02.183         1568       2183      615 f    
 5 A       00:00:03.642 - 00:00:04.522         2183       3642     1459 nf   
 6 A       00:00:03.642 - 00:00:04.522         3642       4522      880 f    
 7 B       00:00:00.000 - 00:00:00.660            0        660      660 f    
 8 B       00:00:01.247 - 00:00:02.229          660       1247      587 nf   
 9 B       00:00:01.247 - 00:00:02.229         1247       2229      982 f    
10 B       00:00:03.697 - 00:00:04.926         2229       3697     1468 nf   
11 B       00:00:03.697 - 00:00:04.926         3697       4926     1229 f

CodePudding user response：

You were nearly there. The issue is to complete the time sequence separately for each group. This is what would do:

library(data.table)
# create sequence of complete periods for each Subject
unq <- setDT(df)[, {
  tmp <- sort(unique(c(0, Starttime_ms, Endtime_ms)))
  list(Starttime_ms = head(tmp, -1L),
       Endtime_ms = tail(tmp, -1L))
}, by = Subject]
# join and complete missing values (except Timestamp)
df[unq, on = .(Subject, Starttime_ms, Endtime_ms)][
  is.na(Event), c("Duration", "Event") := .(Endtime_ms - Starttime_ms, "nf")][]

    Subject                   Timestamp Starttime_ms Endtime_ms Duration Event
 1:       A                        <NA>            0        146      146    nf
 2:       A 00:00:00.146 - 00:00:00.889          146        889      743     f
 3:       A                        <NA>          889       1568      679    nf
 4:       A 00:00:01.568 - 00:00:02.183         1568       2183      615     f
 5:       A                        <NA>         2183       3642     1459    nf
 6:       A 00:00:03.642 - 00:00:04.522         3642       4522      880     f
 7:       B 00:00:00.000 - 00:00:00.660            0        660      660     f
 8:       B                        <NA>          660       1247      587    nf
 9:       B 00:00:01.247 - 00:00:02.229         1247       2229      982     f
10:       B                        <NA>         2229       3697     1468    nf
11:       B 00:00:03.697 - 00:00:04.926         3697       4926     1229     f

I am still unsure what the best way is to complete Timestamp.

unq contains all periods for each subject to join on:

unq

    Subject Starttime_ms Endtime_ms
 1:       A            0        146
 2:       A          146        889
 3:       A          889       1568
 4:       A         1568       2183
 5:       A         2183       3642
 6:       A         3642       4522
 7:       B            0        660
 8:       B          660       1247
 9:       B         1247       2229
10:       B         2229       3697
11:       B         3697       4926