Home > Back-end >  R - dplyr keep 2 most recent (date) rows
R - dplyr keep 2 most recent (date) rows

Time:10-11

I have a dataset with groups and dates like this:

> df
   Group  Date 
   1      01-01-2016
   1      01-02-2016
   1      01-03-2016
   2      01-04-2016
   2      01-05-2016
   2      01-06-2016

I would like to only keep the most recent plus the second most recent rows. So I would like to end up with this:

> df
       Group  Date
       1      01-02-2016
       1      01-03-2016
       2      01-05-2016
       2      01-06-2016

Until now I got it sorted by date like this:

sorted_data <- df %>% arrange(Group,Date)

And I also found that just getting the most recent date row, I could do this:

df %>% 
  group_by(Group) %>%
  slice(which.max(as.Date(Date, '%d-%m-%Y')))

But I'm not sure how to keep the 2 most recent rows, does someone know?

CodePudding user response:

Does this work:

library(dplyr)

df %>% mutate(Date = lubridate::dmy(Date)) %>% group_by(Group) %>% slice_max(Date,n= 2)
# A tibble: 4 × 2
# Groups:   Group [2]
  Group Date      
  <dbl> <date>    
1     1 2016-03-01
2     1 2016-02-01
3     2 2016-06-01
4     2 2016-05-01
  • Related