I have a dafa frame which looks roughly like this:
Location Date code total_cases total_vaccinations
Afghanistan 2022-04-23 NA 5.00 NA
Afghanistan 2022-04-22 3 3.00 2
Afghanistan 2022-04-21 2 3.00 NA
Albania 2022-04-24 3 9.00 NA
Albania 2022-04-23 NA 9.00 NA
Albania 2022-04-22 5 7.00 NA
Albania 2022-04-21 7 3.00 NA
Bolivia 2022-04-24 2 NA 1
Bolivia 2022-04-23 3 3.00 0
........
My problem is trying to make a new data frame which will contain each country once and each row will contain the most recent values * which isn't NA, if available *. For the above table the result should look like this:
Location Date code total_cases total_vaccinations
Afghanistan 2022-04-23 3 5.00 2
Albania 2022-04-24 3 5.00 NA
Bolivia 2022-04-24 2 3.00 1
So far I tried:
new_data <- main_data %>%
group_by(Location) %>%
arrange(desc(Date)) %>%
filter(date==max(Date))
But that doesn't work. Would appricate any help.
CodePudding user response:
A possible solution, based on tidyverse
:
library(tidyverse)
df %>%
group_by(Location) %>%
arrange(Date) %>%
fill(-Date, .direction="down") %>%
slice_max(Date) %>%
ungroup
#> # A tibble: 3 × 5
#> Location Date code total_cases total_vaccinations
#> <chr> <chr> <int> <dbl> <int>
#> 1 Afghanistan 2022-04-23 3 5 2
#> 2 Albania 2022-04-24 3 9 NA
#> 3 Bolivia 2022-04-24 2 3 1