Home > Software design >  Creating new dataset with only the most recent date in R
Creating new dataset with only the most recent date in R

Time:06-16

I am trying to create a new table where table1 is grouped by ID and only the most recent date is picked from table1$new.

I have the tables below:

1st Table:

ID Status Date
001 1003 2021-02-01
001 1003 2021-02-01
001 1003 2021-02-21
001 1003 2021-04-21
002 1003 2021-05-01
002 1003 2021-05-01
002 1003 2021-05-21
002 1003 2021-07-21
003 1004 2021-05-01
003 1004 2021-05-01
003 1004 2021-05-21
003 1004 2021-07-21

Current Output

ID Status Date
001 1003 2021-02-01
001 1003 2021-02-01
001 1003 2021-02-21
001 1003 2021-04-21
002 1003 2021-05-01
002 1003 2021-05-01
002 1003 2021-05-21
002 1003 2021-07-21

Desired Output:

ID New
001 2021-4-21
002 2021-07-21

Here is my code as of now

desired <- table1 %>% 
group_by(ID) %>% 
subset(Status == '1003', select = c("ID", "Date")) %>% rename(New = Date)

I think my first mistake is in the group_by and I am not sure how to only keep the most recent date. Any help would be appreciated.

CodePudding user response:

I would use dplyr::slice_max() to select the most recent date.

library(tidyverse)
d <- tibble::tribble(
  ~ID, ~Status,        ~Date,
  "001",   1003L, "2021-02-01",
  "001",   1003L, "2021-02-01",
  "001",   1003L, "2021-02-21",
  "001",   1003L, "2021-04-21",
  "002",   1003L, "2021-05-01",
  "002",   1003L, "2021-05-01",
  "002",   1003L, "2021-05-21",
  "002",   1003L, "2021-07-21",
  "003",   1004L, "2021-05-01",
  "003",   1004L, "2021-05-01",
  "003",   1004L, "2021-05-21",
  "003",   1004L, "2021-07-21"
)

res <- d |> 
  mutate(Date = lubridate::ymd(Date)) |> 
  dplyr::filter(Status == 1003) |> 
  group_by(ID) |> 
  slice_max(Date, n = 1) |> 
  ungroup() |> 
  select(ID, New = Date) # rename variable in the selection of variables
res
#> # A tibble: 2 x 2
#>   ID    New       
#>   <chr> <date>    
#> 1 001   2021-04-21
#> 2 002   2021-07-21

Created on 2022-06-15 by the reprex package (v2.0.1)

  • Related