Home > front end >  filtering and manipulating rows in df in r
filtering and manipulating rows in df in r

Time:10-07

I have a df which looks like the following:

ID Start       End         Contact
1  2022-08-08  2022-08-08  2022-08-08
1  2022-08-08  2022-08-08  2022-08-10
2  2022-08-05  2022-08-14  2022-08-06
2  2022-08-05  2022-08-14  2022-08-13
2  2022-08-05  2022-08-14  2022-08-16
3  2022-08-27  2022-08-31  2022-08-29
3  2022-08-27  2022-08-31  2022-09-01
3  2022-08-27  2022-08-31  2022-09-03
4  2022-08-28  2022-08-31  2022-08-28
4  2022-08-28  2022-08-31  2022-08-30 

I want to remove the rows where the Contact dates are > than the End date for each ID so that the data then looks like this:

ID Start       End         Contact
1  2022-08-08  2022-08-08  2022-08-08
2  2022-08-05  2022-08-14  2022-08-06
2  2022-08-05  2022-08-14  2022-08-13
3  2022-08-27  2022-08-31  2022-08-29
4  2022-08-28  2022-08-31  2022-08-28
4  2022-08-28  2022-08-31  2022-08-30

I have tried :df %>% group_by(ID) %>% filter(Contact <= End) but this doesn't give the desired output. Please help.

CodePudding user response:

df <- clipr::read_clip_tbl() %>% 
  tibble::as_tibble() %>% 
  tidyr::separate(., col = "ID.Start.......End.........Contact", into = c("ID", "Start", "End", "Contract"), sep = "  ")
    
df %>% 
  dplyr::mutate(Start = as.Date(Start), End = as.Date(End), Contract = as.Date(Contract)) %>% 
  dplyr::filter(Contract <= End)

# A tibble: 6 × 4
  ID    Start      End        Contract  
  <chr> <date>     <date>     <date>    
1 1     2022-08-08 2022-08-08 2022-08-08
2 2     2022-08-05 2022-08-14 2022-08-06
3 2     2022-08-05 2022-08-14 2022-08-13
4 3     2022-08-27 2022-08-31 2022-08-29
5 4     2022-08-28 2022-08-31 2022-08-28
6 4     2022-08-28 2022-08-31 2022-08-30

Please excuse the hackey way of importing and changing to dates.

CodePudding user response:

It runs fine for me:

library(dplyr)
data.frame(
          ID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
       Start = c("2022-08-08","2022-08-08","2022-08-05",
                 "2022-08-05","2022-08-05","2022-08-27","2022-08-27",
                 "2022-08-27","2022-08-28","2022-08-28"),
         End = c("2022-08-08","2022-08-08","2022-08-14",
                 "2022-08-14","2022-08-14","2022-08-31","2022-08-31",
                 "2022-08-31","2022-08-31","2022-08-31"),
     Contact = c("2022-08-08","2022-08-10","2022-08-06",
                 "2022-08-13","2022-08-16","2022-08-29","2022-09-01",
                 "2022-09-03","2022-08-28","2022-08-30")
) %>% 
  group_by(ID) %>%
  filter(Contact <= End)



# A tibble: 6 × 4
# Groups:   ID [4]
     ID Start      End        Contact   
  <int> <chr>      <chr>      <chr>     
1     1 2022-08-08 2022-08-08 2022-08-08
2     2 2022-08-05 2022-08-14 2022-08-06
3     2 2022-08-05 2022-08-14 2022-08-13
4     3 2022-08-27 2022-08-31 2022-08-29
5     4 2022-08-28 2022-08-31 2022-08-28
6     4 2022-08-28 2022-08-31 2022-08-30
  • Related