Getting rid of NA values in R when trying to aggregate columns

Time:07-30

I'm trying to aggregate this data frame down to the last value recorded for each country. For some reason, the value that ends up in the resulting tibble is not correct.

aggre_data <- combined %>% 
    group_by(location) %>%
    summarise(Last_value_vacc = last(people_vaccinated_per_hundred))
aggre_data

I believe it has something to do with the NA values scattered throughout the data frame. However, I did try:

aggre_data <- combined %>% 
    group_by(location) %>%
    summarise(Last_value_vacc = last(people_vaccinated_per_hundred(na.rm = TRUE)))
aggre_data

CodePudding user response:

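combined %>% 
  group_by(location) %>% 
  arrange(date) %>% # or whatever column defines the order
  # last() has no na.rm argument, so drop the NAs before taking the last value
  summarise(Last_value_vacc = last(people_vaccinated_per_hundred[!is.na(people_vaccinated_per_hundred)]))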
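
To see the NA handling in isolation, here is a minimal sketch on toy data. The `toy` tibble and its values are made up for illustration and are not from the question. The idea is to drop the missing values before calling last(), since last() on its own returns whatever sits in the final position, NA included.

```r
library(dplyr)

# Toy data (hypothetical values): each location ends with an NA
toy <- tibble(
  location = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2022-07-01", "2022-07-02", "2022-07-03",
                   "2022-07-01", "2022-07-02")),
  people_vaccinated_per_hundred = c(10.5, 11.2, NA, 40.1, NA)
)

aggre_toy <- toy %>%
  group_by(location) %>%
  arrange(date, .by_group = TRUE) %>%
  # keep only the non-missing values, then take the last of them
  summarise(
    Last_value_vacc = last(
      people_vaccinated_per_hundred[!is.na(people_vaccinated_per_hundred)]
    )
  )

aggre_toy
```

If you are on dplyr 1.1.0 or later, last(x, na_rm = TRUE) should do the same thing. Note the underscore in the argument name.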

CodePudding user response:

If this is your dataset:

df <- structure(list(continent = c("Asia", "Asia", "Asia", "Asia", 
"Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", 
"Asia", "Asia", "Asia", "Asia"), location = c("China", "China", 
"China", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", 
"Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan"), 
    date = c("7/16/2022", "7/17/2022", "7/18/2022", "7/19/2022", 
    "7/20/2022", "7/21/2022", "7/22/2022", "7/14/2022", "7/15/2022", 
    "7/16/2022", "7/17/2022", "7/18/2022", "7/19/2022", "7/20/2022", 
    "7/21/2022", "7/22/2022"), total_vaccinations = c(NA, 6706843L, 
    NA, NA, NA, NA, NA, NA, NA, NA, 6706843L, NA, NA, NA, NA, 
    1L), people_vaccinated = c(NA, 5969406L, NA, NA, NA, NA, 
    NA, NA, NA, NA, 5969406L, NA, NA, NA, NA, 2L), people_fully_vaccinated = c(NA, 
    5309804L, NA, NA, NA, NA, NA, NA, NA, NA, 5309804L, NA, NA, 
    NA, NA, 3L)), class = "data.frame", row.names = c(NA, -16L
))

   continent    location      date total_vaccinations people_vaccinated people_fully_vaccinated
1       Asia       China 7/16/2022                 NA                NA                      NA
2       Asia       China 7/17/2022            6706843           5969406                 5309804
3       Asia       China 7/18/2022                 NA                NA                      NA
4       Asia Afghanistan 7/19/2022                 NA                NA                      NA
5       Asia Afghanistan 7/20/2022                 NA                NA                      NA
6       Asia Afghanistan 7/21/2022                 NA                NA                      NA
7       Asia Afghanistan 7/22/2022                 NA                NA                      NA
8       Asia Afghanistan 7/14/2022                 NA                NA                      NA
9       Asia Afghanistan 7/15/2022                 NA                NA                      NA
10      Asia Afghanistan 7/16/2022                 NA                NA                      NA
11      Asia Afghanistan 7/17/2022            6706843           5969406                 5309804
12      Asia Afghanistan 7/18/2022                 NA                NA                      NA
13      Asia Afghanistan 7/19/2022                 NA                NA                      NA
14      Asia Afghanistan 7/20/2022                 NA                NA                      NA
15      Asia Afghanistan 7/21/2022                 NA                NA                      NA
16      Asia Afghanistan 7/22/2022                  1                 2                       3

and you apply this code:

library(dplyr)
df %>% 
  group_by(location) %>% 
  arrange(date, .by_group = TRUE) %>% 
  summarise(Last_value_vacc = last(people_vaccinated))

then you get this (China is NA because its most recent row has no value):

  location    Last_value_vacc
  <chr>                 <int>
1 Afghanistan               2
2 China                    NA
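
If the goal is the last non-missing value per country, one possible variation is to filter out the NA rows before summarising. The sketch below rebuilds the relevant columns of the sample data in compact form so that it runs on its own. It also assumes the date strings are month/day/year and parses them with as.Date(), because sorting them as plain text can go wrong once the data spans several months.

```r
library(dplyr)

# Compact rebuild of the sample data (only the columns needed here)
df <- data.frame(
  location = c(rep("China", 3), rep("Afghanistan", 13)),
  date = c("7/16/2022", "7/17/2022", "7/18/2022", "7/19/2022",
           "7/20/2022", "7/21/2022", "7/22/2022",
           paste0("7/", 14:22, "/2022")),
  people_vaccinated = c(NA, 5969406L, rep(NA, 8), 5969406L, rep(NA, 4), 2L)
)

result <- df %>%
  # assumption: dates are month/day/year
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  # drop rows without a value so last() only sees real observations
  filter(!is.na(people_vaccinated)) %>%
  group_by(location) %>%
  arrange(date, .by_group = TRUE) %>%
  summarise(Last_value_vacc = last(people_vaccinated))

result
```

On this sample it returns 2 for Afghanistan and 5969406 for China. Keep in mind that filter() removes a country entirely if all of its values are NA, so such a country would not appear in the result.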