I have a df which looks like the following:
ID Start End Contact
1 2022-08-08 2022-08-08 2022-08-08
1 2022-08-08 2022-08-08 2022-08-10
2 2022-08-05 2022-08-14 2022-08-06
2 2022-08-05 2022-08-14 2022-08-13
2 2022-08-05 2022-08-14 2022-08-16
3 2022-08-27 2022-08-31 2022-08-29
3 2022-08-27 2022-08-31 2022-09-01
3 2022-08-27 2022-08-31 2022-09-03
4 2022-08-28 2022-08-31 2022-08-28
4 2022-08-28 2022-08-31 2022-08-30
I want to remove the rows where the Contact dates are > than the End date for each ID so that the data then looks like this:
ID Start End Contact
1 2022-08-08 2022-08-08 2022-08-08
2 2022-08-05 2022-08-14 2022-08-06
2 2022-08-05 2022-08-14 2022-08-13
3 2022-08-27 2022-08-31 2022-08-29
4 2022-08-28 2022-08-31 2022-08-28
4 2022-08-28 2022-08-31 2022-08-30
I have tried :df %>% group_by(ID) %>% filter(Contact <= End)
but this doesn't give the desired output. Please help.
CodePudding user response:
df <- clipr::read_clip_tbl() %>%
tibble::as_tibble() %>%
tidyr::separate(., col = "ID.Start.......End.........Contact", into = c("ID", "Start", "End", "Contract"), sep = " ")
df %>%
dplyr::mutate(Start = as.Date(Start), End = as.Date(End), Contract = as.Date(Contract)) %>%
dplyr::filter(Contract <= End)
# A tibble: 6 × 4
ID Start End Contract
<chr> <date> <date> <date>
1 1 2022-08-08 2022-08-08 2022-08-08
2 2 2022-08-05 2022-08-14 2022-08-06
3 2 2022-08-05 2022-08-14 2022-08-13
4 3 2022-08-27 2022-08-31 2022-08-29
5 4 2022-08-28 2022-08-31 2022-08-28
6 4 2022-08-28 2022-08-31 2022-08-30
Please excuse the hackey way of importing and changing to dates.
CodePudding user response:
It runs fine for me:
library(dplyr)
data.frame(
ID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
Start = c("2022-08-08","2022-08-08","2022-08-05",
"2022-08-05","2022-08-05","2022-08-27","2022-08-27",
"2022-08-27","2022-08-28","2022-08-28"),
End = c("2022-08-08","2022-08-08","2022-08-14",
"2022-08-14","2022-08-14","2022-08-31","2022-08-31",
"2022-08-31","2022-08-31","2022-08-31"),
Contact = c("2022-08-08","2022-08-10","2022-08-06",
"2022-08-13","2022-08-16","2022-08-29","2022-09-01",
"2022-09-03","2022-08-28","2022-08-30")
) %>%
group_by(ID) %>%
filter(Contact <= End)
# A tibble: 6 × 4
# Groups: ID [4]
ID Start End Contact
<int> <chr> <chr> <chr>
1 1 2022-08-08 2022-08-08 2022-08-08
2 2 2022-08-05 2022-08-14 2022-08-06
3 2 2022-08-05 2022-08-14 2022-08-13
4 3 2022-08-27 2022-08-31 2022-08-29
5 4 2022-08-28 2022-08-31 2022-08-28
6 4 2022-08-28 2022-08-31 2022-08-30