Home > other >  Fill in Multiple NAs with Lagged Values R
Fill in Multiple NAs with Lagged Values R

Time:02-24

I am trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city - so all NAs for Omaha should be 44.50, and the NAs for Lincoln should be 62.50. Here is the code I have been using - it replaces the first NA (April) for each group with the correct value, but does not fill past that.

df <- df %>% 
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))

Data before running code:

year   month      city     cost
2021   January    Omaha     45.50  
2021   February   Omaha     46.75
2021   March      Omaha     44.50
2021   April      Omaha     NA
2021   May        Omaha     NA
2021   June       Omaha     NA
2021   January    Lincoln   55.25
2021   February   Lincoln   53.80
2021   March      Lincoln   62.50
2021   April      Lincoln   NA
2021   May        Lincoln   NA
2021   June       Lincoln   NA

CodePudding user response:

Use:

library(tidyverse)

df %>% 
  group_by(city) %>%
  fill(cost)

# A tibble: 12 x 4
# Groups:   city [2]
    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5

CodePudding user response:

With your code, you would want to use last rather than lag (though fill is the much better option here). We also need to wrap cost in na.omit.

library(tidyverse)

df %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))

Output

    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5

Data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January", 
"February", "March", "April", "May", "June", "January", "February", 
"March", "April", "May", "June"), city = c("Omaha", "Omaha", 
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln", 
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5, 
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-12L))
  • Related