Calculate time between events with multiple observations using dplyr-CodePudding

I have data in the following format:

    mydata <- data.frame(id=c(1,1,1,2,2,2,2),event=c(1,1,2,1,2,2,3), time=c(2,2,3,6,8,8,11))
                         
    mydata

  id event time
1  1     1    2
2  1     1    2
3  1     2    3
4  2     1    6
5  2     2    8
6  2     2    8
7  2     3   11

I wish to calculate the time between each event, but am having trouble because some events have multiple observations. The resulting column should look like this:

  id event time event_dt
1  1     1    2        0
2  1     1    2        0
3  1     2    3        1
4  2     1    6        0
5  2     2    8        2
6  2     2    8        2
7  2     3   11        3

I would like to do this using dplyr, if possible.

CodePudding user response：

I added in minimum time in case you had multiple times (if possible) for a given event. Then, you can use lag to capture the time difference, then this can be joined back to the original dataframe.

library(tidyverse)

mydata %>%
  dplyr::group_by(id, event) %>%
  dplyr::mutate_at(vars("time"), min) %>% 
  dplyr::distinct() %>% 
  dplyr::ungroup(event) %>% 
  dplyr::mutate(event_dt = time - lag(time)) %>% 
  dplyr::left_join(., mydata, by = c("id", "event", "time")) %>% 
  tidyr::replace_na(., list(event_dt=0))

Output

# A tibble: 7 × 4
# Groups:   id [2]
     id event  time event_dt
  <dbl> <dbl> <dbl>    <dbl>
1     1     1     2        0
2     1     1     2        0
3     1     2     3        1
4     2     1     6        0
5     2     2     8        2
6     2     2     8        2
7     2     3    11        3

Data

mydata <- structure(list(
  id = c(1, 1, 1, 2, 2, 2, 2),
  event = c(1, 1, 2, 1, 2, 2, 3),
  time = c(2, 2, 3, 6, 8, 8, 11)
),
class = "data.frame",
row.names = c(NA,-7L))

CodePudding user response：

Calculate the differences across each unique id/event/time combination, then merge it back:

mydata %>% 
  distinct(id, event, time) %>%
  group_by(id) %>%
  mutate(event_dt = c(0, diff(time))) %>%
  right_join(mydata)

#Joining, by = c("id", "event", "time")
## A tibble: 7 x 4
## Groups:   id [2]
#     id event  time event_dt
#  <dbl> <dbl> <dbl>    <dbl>
#1     1     1     2        0
#2     1     1     2        0
#3     1     2     3        1
#4     2     1     6        0
#5     2     2     8        2
#6     2     2     8        2
#7     2     3    11        3