I have a data as:
ID Date1 VarA 1 2005-01-02 x 1 2021-01-02 20 1 2021-01-01 y 2 2020-12-20 No 2 2020-12-19 10 3 1998-05-01 0
Here is the R-code to reproduce the data
example = data.frame(ID = c(1,1,1,2,2,3),
Date1 = c('2005-01-02',
'2021-01-02',
'2021-01-01',
'2020-12-20',
'2020-12-19',
'1998-05-01'),
VarA = c('x','20','y','No', '10','0'))
I would prefer the solution to do following:
First, flag the maximum date in data.
ID Date1 VarA Last_visit 1 2005-01-02 x 0 1 2021-01-02 20 1 1 2021-01-01 y 0 2 2020-12-20 No 1 2 2020-12-19 10 0 3 1998-05-01 0 1
Finally, It should retain only where the Last_visit=1
ID Date1 VarA Last_visit 1 2021-01-02 20 1 2 2020-12-20 No 1 3 1998-05-01 0 1
I am requesting the intermediate steps as well to perform a sanity check. Thanks!
CodePudding user response:
We create a new column after grouping by 'ID'
library(dplyr)
example %>%
group_by(ID) %>%
mutate(Last_visit = (row_number() %in% which.max(as.Date(Date1)))) %>%
ungroup
and then filter/slice
based on the column
example %>%
group_by(ID) %>%
mutate(Last_visit = (row_number() %in% which.max(as.Date(Date1)))) %>%
slice_max(n = 1, order_by = Last_visit) %>%
ungroup
-output
# A tibble: 3 × 4
ID Date1 VarA Last_visit
<dbl> <chr> <chr> <int>
1 1 2021-01-02 20 1
2 2 2020-12-20 No 1
3 3 1998-05-01 0 1
Another option is to convert the 'Date1' to Date
class first, then do an arrange
and use distinct
example %>%
mutate(Date1 = as.Date(Date1)) %>%
arrange(ID, desc(Date1)) %>%
distinct(ID, .keep_all = TRUE) %>%
mutate(Last_visit = 1)
ID Date1 VarA Last_visit
1 1 2021-01-02 20 1
2 2 2020-12-20 No 1
3 3 1998-05-01 0 1