Home > Mobile >  R: Subset a data frame based on higest date for multiple columns
R: Subset a data frame based on higest date for multiple columns

Time:05-04

I have a dafa frame which looks roughly like this:

Location          Date         code     total_cases   total_vaccinations
Afghanistan       2022-04-23     NA          5.00               NA
Afghanistan       2022-04-22     3           3.00                2
Afghanistan       2022-04-21     2           3.00               NA
Albania           2022-04-24     3           9.00               NA
Albania           2022-04-23     NA          9.00               NA
Albania           2022-04-22     5           7.00               NA
Albania           2022-04-21     7           3.00               NA
Bolivia           2022-04-24     2           NA                  1
Bolivia           2022-04-23     3           3.00                0
........

My problem is trying to make a new data frame which will contain each country once and each row will contain the most recent values * which isn't NA, if available *. For the above table the result should look like this:

Location          Date      code     total_cases   total_vaccinations
Afghanistan   2022-04-23     3           5.00                2
Albania       2022-04-24     3           5.00                NA
Bolivia       2022-04-24     2           3.00                1

So far I tried:

  new_data <- main_data %>%
  group_by(Location) %>%
  arrange(desc(Date)) %>%
  filter(date==max(Date))

But that doesn't work. Would appricate any help.

CodePudding user response:

A possible solution, based on tidyverse:

library(tidyverse)

df %>% 
  group_by(Location) %>% 
  arrange(Date) %>% 
  fill(-Date, .direction="down") %>% 
  slice_max(Date) %>% 
  ungroup

#> # A tibble: 3 × 5
#>   Location    Date        code total_cases total_vaccinations
#>   <chr>       <chr>      <int>       <dbl>              <int>
#> 1 Afghanistan 2022-04-23     3           5                  2
#> 2 Albania     2022-04-24     3           9                 NA
#> 3 Bolivia     2022-04-24     2           3                  1
  • Related