Using dplyr, I want to do a group by, followed by a date comparison for the following data frame.
df <- data.frame(ID = c(1,1,2,2,3,3,4,4,5,6),
                 X1 = c("A","A","B","C","A","B","B","B","C","A"),
                 X2 = sample(10:30,10,replace = TRUE),
                 dat = as.Date(c("2021-01-01","2021-01-01","2021-02-01","2021-02-01","2021-01-03",
                                 "2021-10-05","2021-05-05","2021-05-06","2021-09-14","2021-06-04")))
The grouping should be on ID and X1 (X2 can be ignored). For each ID, the dates of rows that share the same X1 value should be compared, and only the IDs whose dates differ by at most 1 day (in either direction) should be kept. The desired output is:
ID X1
1 1 A
2 1 A
3 4 B
4 4 B
CodePudding user response:
Group by ID and X1, then keep only the groups that have 2 or more rows and in which the difference between consecutive dates is at most 1 day.
You can try -
library(dplyr)

df %>%
  group_by(ID, X1) %>%
  filter(n() >= 2, all(abs(diff(dat)) <= 1)) %>%
  ungroup()
# ID X1 X2 dat
# <dbl> <chr> <int> <date>
#1 1 A 30 2021-01-01
#2 1 A 19 2021-01-01
#3 4 B 24 2021-05-05
#4 4 B 30 2021-05-06
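The n() >= 2 condition is doing real work here. As a small base R illustration (the variable names single and vacuous are made up for this sketch): diff() on a single date returns a zero-length result, and all() of an empty logical vector is TRUE, so without the row-count check every one-row group would pass the date test.

```r
# A group containing just one date, e.g. ID 5 / X1 "C" in the question's data
single <- as.Date("2021-09-14")

# diff() needs at least two values, so this is a zero-length difftime
gap <- diff(single)
length(gap)

# all() over an empty logical vector is TRUE, so the date test alone
# would keep this single-row group
vacuous <- all(abs(gap) <= 1)
vacuous
```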
If you are only interested in the ID and X1 columns, add %>% select(ID, X1).
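Putting it together, here is a possible variant, offered as a sketch rather than the only way to do it. Instead of diff(dat), it compares the earliest and latest date in each group, which does not depend on row order and requires all dates in a group to lie within 1 day of each other. That is a slightly different rule from the consecutive-difference check when a group has more than two rows, so it assumes this stricter reading is what is wanted. X2 is left out since it is random and not needed, and res is just a name chosen for the example.

```r
library(dplyr)

# Data from the question, without the random X2 column
df <- data.frame(
  ID  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  X1  = c("A", "A", "B", "C", "A", "B", "B", "B", "C", "A"),
  dat = as.Date(c("2021-01-01", "2021-01-01", "2021-02-01", "2021-02-01", "2021-01-03",
                  "2021-10-05", "2021-05-05", "2021-05-06", "2021-09-14", "2021-06-04"))
)

res <- df %>%
  group_by(ID, X1) %>%
  # keep groups with at least 2 rows whose overall date spread is <= 1 day
  filter(n() >= 2, as.numeric(max(dat) - min(dat)) <= 1) %>%
  ungroup() %>%
  select(ID, X1)

res
```

On the sample data this keeps the two rows for ID 1 / A and the two rows for ID 4 / B, matching the desired output in the question.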