I have data in the following format:
mydata <- data.frame(id=c(1,1,1,2,2,2,2),event=c(1,1,2,1,2,2,3), time=c(2,2,3,6,8,8,11))
mydata
id event time
1 1 1 2
2 1 1 2
3 1 2 3
4 2 1 6
5 2 2 8
6 2 2 8
7 2 3 11
I wish to calculate the time between each event, but am having trouble because some events have multiple observations. The resulting column should look like this:
id event time event_dt
1 1 1 2 0
2 1 1 2 0
3 1 2 3 1
4 2 1 6 0
5 2 2 8 2
6 2 2 8 2
7 2 3 11 3
I would like to do this using dplyr, if possible.
CodePudding user response:
I added in minimum time in case you had multiple times (if possible) for a given event. Then, you can use lag
to capture the time difference, then this can be joined back to the original dataframe.
library(tidyverse)
mydata %>%
dplyr::group_by(id, event) %>%
dplyr::mutate_at(vars("time"), min) %>%
dplyr::distinct() %>%
dplyr::ungroup(event) %>%
dplyr::mutate(event_dt = time - lag(time)) %>%
dplyr::left_join(., mydata, by = c("id", "event", "time")) %>%
tidyr::replace_na(., list(event_dt=0))
Output
# A tibble: 7 × 4
# Groups: id [2]
id event time event_dt
<dbl> <dbl> <dbl> <dbl>
1 1 1 2 0
2 1 1 2 0
3 1 2 3 1
4 2 1 6 0
5 2 2 8 2
6 2 2 8 2
7 2 3 11 3
Data
mydata <- structure(list(
id = c(1, 1, 1, 2, 2, 2, 2),
event = c(1, 1, 2, 1, 2, 2, 3),
time = c(2, 2, 3, 6, 8, 8, 11)
),
class = "data.frame",
row.names = c(NA,-7L))
CodePudding user response:
Calculate the differences across each unique id/event/time
combination, then merge it back:
mydata %>%
distinct(id, event, time) %>%
group_by(id) %>%
mutate(event_dt = c(0, diff(time))) %>%
right_join(mydata)
#Joining, by = c("id", "event", "time")
## A tibble: 7 x 4
## Groups: id [2]
# id event time event_dt
# <dbl> <dbl> <dbl> <dbl>
#1 1 1 2 0
#2 1 1 2 0
#3 1 2 3 1
#4 2 1 6 0
#5 2 2 8 2
#6 2 2 8 2
#7 2 3 11 3