This is an example of a dataset I am working with
library(tidyverse)
id <- c(1,1,1,2,2,2,3,3,4)
date <- c("2020-01-01","2020-02-04", "2021-06-08","2022-05-08","2021-08-09","2021-11-29","2020-04-12", "2022-09-12", "2022-09-22")
df <- data.frame(id,date)
# A tibble: 9 × 2
id date
<dbl> <chr>
1 1 2020-01-01
2 1 2020-02-04
3 1 2021-06-08
4 2 2022-05-08
5 2 2021-08-09
6 2 2021-11-29
7 3 2020-04-12
8 3 2022-09-12
9 4 2022-09-22
I would like to group these observations by the ID value and look at the dates to assign another ID based on the days that fall within 90 days of each other like below
# A tibble: 9 × 3
id date ep_id
<dbl> <chr> <dbl>
1 1 2020-01-01 1
2 1 2020-02-04 1
3 1 2021-06-08 2
4 2 2022-05-08 2
5 2 2021-08-09 1
6 2 2021-11-29 2
7 3 2020-04-12 1
8 3 2022-09-12 2
9 4 2022-09-22 1
I can't rack my brain how I would go about this. I would need to be able to add ep_ID from 1:n depending on how many dates fall within 90 days of each other.
CodePudding user response:
We may do
library(dplyr)
df %>%
mutate(date = as.Date(date), rn = row_number()) %>%
arrange(id, date) %>%
group_by(id) %>%
mutate(ep_id = cumsum(c(1, diff(date) > 90))) %>%
ungroup %>%
arrange(rn) %>%
select(-rn)
-output
# A tibble: 9 × 3
id date ep_id
<dbl> <date> <dbl>
1 1 2020-01-01 1
2 1 2020-02-04 1
3 1 2021-06-08 2
4 2 2022-05-08 3
5 2 2021-08-09 1
6 2 2021-11-29 2
7 3 2020-04-12 1
8 3 2022-09-12 2
9 4 2022-09-22 1