I have the following data that I need to group based on the id column and the lag date being within a certain time frame such as 3 days.
Sorry if this is a duplicate, can't seem to word the question right to get some help.
id <- c("123", "123", "123", "123", "123", "456", "456")
date <- as.Date(c("2022/10/01", "2022/10/04","2022/10/07", "2022/12/01", "2022/12/04", "2022/03/01", "2022/06/09"))
x <- data.frame(id, date)
The desired output:
id | date | group |
---|---|---|
123 | 2022/10/01 | 1 |
123 | 2022/10/04 | 1 |
123 | 2022/10/07 | 1 |
123 | 2022/12/01 | 2 |
123 | 2022/12/04 | 2 |
456 | 2022/03/01 | 3 |
456 | 2022/06/09 | 4 |
CodePudding user response:
Here I count the cumulative times the id changes or gaps larger than three days:
library(dplyr)
x |>
arrange(id, date) |> # in case not already ordered
mutate(group = cumsum(id != lag(id) |
date > lag(date, default = as.Date("2000-01-01")) 3))
Result
id date group
1 123 2022-10-01 1
2 123 2022-10-04 1
3 123 2022-10-07 1
4 123 2022-12-01 2
5 123 2022-12-04 2
6 456 2022-03-01 3
7 456 2022-06-09 4