hi and thanks for reading me
I am working with a database that contains two variables to group by dates, but I would like to eliminate those groups in which the last date is less than 2020 although I still cannot do this, does anyone have any idea how to do it? The code that I have so far is the following:
df <- data.frame(
Servicio = c("Servicio 1", "Servicio 1","Servicio 1","Servicio 1","Servicio 1","Servicio 1",
"Servicio 2", "Servicio 2", "Servicio 2", "Servicio 2","Servicio 2","Servicio 2"),
Subservicio = c("Sub 1","Sub 1","Sub 1","Sub 2","Sub 2","Sub 2","Sub 1","Sub 1","Sub 1","Sub 2"
,"Sub 2","Sub 2"),
fecha = c("2020-01-01","2020-02-01","2020-03-01","2020-01-01","2020-02-01","2020-03-01",
"2019-01-01","2019-02-01","2019-03-01","2020-01-01","2020-02-01","2020-03-01")
)
df |>
group_by(Servicio, Subservicio) |>
slice(tail(row_number(), 1))
Thanks for the help
CodePudding user response:
library(tidyverse)
library(lubridate)
df %>%
group_by(Servicio, Subservicio) %>%
filter(last(year(ymd(fecha))) >= 2020)
# A tibble: 9 x 3
# Groups: Servicio, Subservicio [3]
Servicio Subservicio fecha
<chr> <chr> <chr>
1 Servicio 1 Sub 1 2020-01-01
2 Servicio 1 Sub 1 2020-02-01
3 Servicio 1 Sub 1 2020-03-01
4 Servicio 1 Sub 2 2020-01-01
5 Servicio 1 Sub 2 2020-02-01
6 Servicio 1 Sub 2 2020-03-01
7 Servicio 2 Sub 2 2020-01-01
8 Servicio 2 Sub 2 2020-02-01
9 Servicio 2 Sub 2 2020-03-01
CodePudding user response:
Another approach using the data.table
package. Please find below the following reprex:
Reprex
library(data.table)
library(lubridate)
setDT(df)[df[,.I[year(ymd(fecha))[.N]>= 2020], by = .(Servicio, Subservicio)]$V1]
#> Servicio Subservicio fecha
#> 1: Servicio 1 Sub 1 2020-01-01
#> 2: Servicio 1 Sub 1 2020-02-01
#> 3: Servicio 1 Sub 1 2020-03-01
#> 4: Servicio 1 Sub 2 2020-01-01
#> 5: Servicio 1 Sub 2 2020-02-01
#> 6: Servicio 1 Sub 2 2020-03-01
#> 7: Servicio 2 Sub 2 2020-01-01
#> 8: Servicio 2 Sub 2 2020-02-01
#> 9: Servicio 2 Sub 2 2020-03-01
Created on 2021-11-01 by the reprex package (v0.3.0)