Fill NA values per group based on the first value of the group

I am trying to fill the NA values in my data frame. However, I would like to fill them with the first value of each group rather than the previous value.

library(tidyr)

df <- data.frame(
  group = c(rep("A", 4), rep("B", 4)),
  val = c(1, 2, NA, NA, 4, 3, NA, NA)
)
df
#>   group val
#> 1     A   1
#> 2     A   2
#> 3     A  NA
#> 4     A  NA
#> 5     B   4
#> 6     B   3
#> 7     B  NA
#> 8     B  NA

# fill() carries the last observed value forward, which is not what I want
fill(df, val, .direction = "down")
#>   group val
#> 1     A   1
#> 2     A   2
#> 3     A   2 # -> should be 1
#> 4     A   2 # -> should be 1
#> 5     B   4
#> 6     B   3
#> 7     B   3 # -> should be 4
#> 8     B   3 # -> should be 4

Can I do this with tidyr::fill()? Or is there another (more or less elegant) way to do this? I need to use it inside a longer pipe (%>%) chain.

Thank you very much!

CodePudding user response:

Use tidyr::replace_na() and dplyr::first() (or val[[1]]) inside a grouped mutate():

library(dplyr)
library(tidyr)

df %>% 
  group_by(group) %>% 
  mutate(val = replace_na(val, first(val))) %>% 
  ungroup()

#> # A tibble: 8 × 2
#>   group   val
#>   <chr> <dbl>
#> 1 A         1
#> 2 A         2
#> 3 A         1
#> 4 A         1
#> 5 B         4
#> 6 B         3
#> 7 B         4
#> 8 B         4

PS - @richarddmorey points out the case where the first value of a group is itself NA. In that case the code above leaves that group's NA values as NA. If you would rather replace them with the first non-missing value per group, you can subset the vector using !is.na():

df %>% 
  group_by(group) %>% 
  mutate(val = replace_na(val, first(val[!is.na(val)]))) %>% 
  ungroup()
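
To make the difference between the two variants concrete, here is a small sketch. The data frame df2 and the column names by_first and by_first_non_na are made up for illustration and are not part of the question; group A deliberately starts with NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical data (not from the question): group A starts with NA
df2 <- data.frame(
  group = c(rep("A", 3), rep("B", 3)),
  val = c(NA, 2, NA, 4, NA, 3)
)

res <- df2 %>%
  group_by(group) %>%
  mutate(
    # first(val) is NA in group A, so its NAs are left untouched
    by_first = replace_na(val, first(val)),
    # the first non-missing value in group A is 2, so its NAs become 2
    by_first_non_na = replace_na(val, first(val[!is.na(val)]))
  ) %>%
  ungroup()

res$by_first
#> [1] NA  2 NA  4  4  3
res$by_first_non_na
#> [1] 2 2 2 4 4 3
```

As far as I can tell, a group that contains only NA values stays NA with either variant, because there is no non-missing value to pick from.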

Created on 2022-11-17 with reprex v2.0.2

CodePudding user response:

This should also work. It uses dplyr's case_when():

library(dplyr)

 df %>% 
   group_by(group) %>% 
   mutate(val = case_when(
     is.na(val) ~ val[1],
     TRUE ~ val
   ))

Output:

  group   val
  <chr> <dbl>
1 A         1
2 A         2
3 A         1
4 A         1
5 B         4
6 B         3
7 B         4
8 B         4
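
A more compact way to write the same idea, offered only as a sketch and not taken from either answer, is dplyr::coalesce(), which returns the first non-missing value across its arguments at each position. The example assumes the df from the question and adds ungroup() so the result is no longer grouped for later steps in the chain.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  group = c(rep("A", 4), rep("B", 4)),
  val = c(1, 2, NA, NA, 4, 3, NA, NA)
)

res <- df %>%
  group_by(group) %>%
  # first(val) has length 1 and is recycled against val within each group
  mutate(val = coalesce(val, first(val))) %>%
  ungroup()

res$val
#> [1] 1 2 1 1 4 3 4 4
```

Like the replace_na() version, this keeps NA when the first value of a group is itself NA.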