I have a dataset with groups and dates like this:
> df
Group Date
1 01-01-2016
1 01-02-2016
1 01-03-2016
2 01-04-2016
2 01-05-2016
2 01-06-2016
I would like to only keep the most recent plus the second most recent rows. So I would like to end up with this:
> df
Group Date
1 01-02-2016
1 01-03-2016
2 01-05-2016
2 01-06-2016
Until now I got it sorted by date like this:
sorted_data <- df %>% arrange(Group,Date)
And I also found that just getting the most recent date row, I could do this:
df %>%
group_by(Group) %>%
slice(which.max(as.Date(Date, '%d-%m-%Y')))
But I'm not sure how to keep the 2 most recent rows, does someone know?
CodePudding user response:
Does this work:
library(dplyr)
df %>% mutate(Date = lubridate::dmy(Date)) %>% group_by(Group) %>% slice_max(Date,n= 2)
# A tibble: 4 × 2
# Groups: Group [2]
Group Date
<dbl> <date>
1 1 2016-03-01
2 1 2016-02-01
3 2 2016-06-01
4 2 2016-05-01