Delete a subgroup based on the last observation per group-CodePudding

hi and thanks for reading me

I am working with a database that contains two variables to group by dates, but I would like to eliminate those groups in which the last date is less than 2020 although I still cannot do this, does anyone have any idea how to do it? The code that I have so far is the following:

df <- data.frame(

Servicio = c("Servicio 1", "Servicio 1","Servicio 1","Servicio 1","Servicio 1","Servicio 1",
           "Servicio 2", "Servicio 2", "Servicio 2", "Servicio 2","Servicio 2","Servicio 2"),
Subservicio = c("Sub 1","Sub 1","Sub 1","Sub 2","Sub 2","Sub 2","Sub 1","Sub 1","Sub 1","Sub 2"
              ,"Sub 2","Sub 2"),
  fecha = c("2020-01-01","2020-02-01","2020-03-01","2020-01-01","2020-02-01","2020-03-01",
        "2019-01-01","2019-02-01","2019-03-01","2020-01-01","2020-02-01","2020-03-01")

)


df |>
   group_by(Servicio, Subservicio) |>
    slice(tail(row_number(), 1))

Thanks for the help

CodePudding user response：

library(tidyverse)
library(lubridate)

df %>%
   group_by(Servicio, Subservicio) %>%
   filter(last(year(ymd(fecha))) >= 2020)


# A tibble: 9 x 3
# Groups:   Servicio, Subservicio [3]
  Servicio   Subservicio fecha     
  <chr>      <chr>       <chr>     
1 Servicio 1 Sub 1       2020-01-01
2 Servicio 1 Sub 1       2020-02-01
3 Servicio 1 Sub 1       2020-03-01
4 Servicio 1 Sub 2       2020-01-01
5 Servicio 1 Sub 2       2020-02-01
6 Servicio 1 Sub 2       2020-03-01
7 Servicio 2 Sub 2       2020-01-01
8 Servicio 2 Sub 2       2020-02-01
9 Servicio 2 Sub 2       2020-03-01

CodePudding user response：

Another approach using the data.table package. Please find below the following reprex:

Reprex

library(data.table)
library(lubridate)

setDT(df)[df[,.I[year(ymd(fecha))[.N]>= 2020], by = .(Servicio, Subservicio)]$V1]

#>      Servicio Subservicio      fecha
#> 1: Servicio 1       Sub 1 2020-01-01
#> 2: Servicio 1       Sub 1 2020-02-01
#> 3: Servicio 1       Sub 1 2020-03-01
#> 4: Servicio 1       Sub 2 2020-01-01
#> 5: Servicio 1       Sub 2 2020-02-01
#> 6: Servicio 1       Sub 2 2020-03-01
#> 7: Servicio 2       Sub 2 2020-01-01
#> 8: Servicio 2       Sub 2 2020-02-01
#> 9: Servicio 2       Sub 2 2020-03-01

^{Created on 2021-11-01 by the reprex package (v0.3.0)}