Home > Software design >  Dynamically assign a ID based on date interval for groups
Dynamically assign a ID based on date interval for groups

Time:10-26

This is an example of a dataset I am working with

library(tidyverse)

id <- c(1,1,1,2,2,2,3,3,4)
date <- c("2020-01-01","2020-02-04", "2021-06-08","2022-05-08","2021-08-09","2021-11-29","2020-04-12", "2022-09-12", "2022-09-22")

df <- data.frame(id,date)

# A tibble: 9 × 2
     id date      
  <dbl> <chr>     
1     1 2020-01-01
2     1 2020-02-04
3     1 2021-06-08
4     2 2022-05-08
5     2 2021-08-09
6     2 2021-11-29
7     3 2020-04-12
8     3 2022-09-12
9     4 2022-09-22

I would like to group these observations by the ID value and look at the dates to assign another ID based on the days that fall within 90 days of each other like below

# A tibble: 9 × 3
     id date       ep_id
  <dbl> <chr>      <dbl>
1     1 2020-01-01     1
2     1 2020-02-04     1
3     1 2021-06-08     2
4     2 2022-05-08     2
5     2 2021-08-09     1
6     2 2021-11-29     2
7     3 2020-04-12     1
8     3 2022-09-12     2
9     4 2022-09-22     1

I can't rack my brain how I would go about this. I would need to be able to add ep_ID from 1:n depending on how many dates fall within 90 days of each other.

CodePudding user response:

We may do

library(dplyr)
df %>% 
  mutate(date = as.Date(date), rn = row_number()) %>%
  arrange(id, date) %>% 
  group_by(id) %>%
  mutate(ep_id = cumsum(c(1, diff(date) > 90))) %>%
  ungroup %>%
  arrange(rn) %>%
  select(-rn)

-output

# A tibble: 9 × 3
     id date       ep_id
  <dbl> <date>     <dbl>
1     1 2020-01-01     1
2     1 2020-02-04     1
3     1 2021-06-08     2
4     2 2022-05-08     3
5     2 2021-08-09     1
6     2 2021-11-29     2
7     3 2020-04-12     1
8     3 2022-09-12     2
9     4 2022-09-22     1
  • Related