Home > Software engineering >  Group Data Frame by Lag Calculation
Group Data Frame by Lag Calculation

Time:01-05

I have the following data that I need to group based on the id column and the lag date being within a certain time frame such as 3 days.

Sorry if this is a duplicate, can't seem to word the question right to get some help.

id <- c("123", "123", "123", "123", "123", "456", "456")  
date <- as.Date(c("2022/10/01", "2022/10/04","2022/10/07", "2022/12/01", "2022/12/04", "2022/03/01", "2022/06/09"))

x <- data.frame(id, date)

The desired output:

id date group
123 2022/10/01 1
123 2022/10/04 1
123 2022/10/07 1
123 2022/12/01 2
123 2022/12/04 2
456 2022/03/01 3
456 2022/06/09 4

CodePudding user response:

Here I count the cumulative times the id changes or gaps larger than three days:

library(dplyr)
x |>
  arrange(id, date) |>    # in case not already ordered
  mutate(group = cumsum(id != lag(id) | 
                        date > lag(date, default = as.Date("2000-01-01"))   3))

Result

   id       date group
1 123 2022-10-01     1
2 123 2022-10-04     1
3 123 2022-10-07     1
4 123 2022-12-01     2
5 123 2022-12-04     2
6 456 2022-03-01     3
7 456 2022-06-09     4
  • Related