I am currently working with time series data which looks something like the following:
ID | Var1 | Var2 | Var3 | Var4 | Var5 |
---|---|---|---|---|---|
1 | A | NA | A | NA | A |
2 | B | C | NA | NA | B |
3 | A | A | NA | NA | A |
4 | A | B | NA | NA | B |
5 | C | NA | B | NA | B |
df <- data.frame("ID" = c(1, 2, 3, 4, 5),
"Var1" = c("A", "B", "A", "A", "C"),
"Var2" = c(NA, "C", "A", "B", NA),
"Var3" = c("A", NA, NA, NA, "B"),
"Var4" = c(NA, NA, NA, NA, NA),
"Var5" = c("A", "B", "A", "B", "B"))
I wish to fill in the "NA" values if the first non-missing previous and first non-missing next value are consistent. That is, the desired result would be
ID | Var1 | Var2 | Var3 | Var4 | Var5 |
---|---|---|---|---|---|
1 | A | A | A | A | A |
2 | B | C | NA | NA | B |
3 | A | A | A | A | A |
4 | A | B | B | B | B |
5 | C | NA | B | B | B |
Where the data for ID = 2 is not replaced, since Var2 and Var5 do not match. Moreover, the missing value for ID = 2 at Var2 is not replaced, since Var1 and Var3 are not consistent. I am struggling with how to accomplish this, and any help would be appreciated.
CodePudding user response:
- Pivot longer to make use of
tidyr::fill()
. - Use
fill()
to createfill_down
andfill_up
columns, which will include the previous and next non-missing values, respectively. - If previous non-missing == next non-missing, use that value; otherwise keep value as is. (This will also keep non-missing values as is, because in this case previous non-missing will always == next non-missing.)
- Pivot back to original format.
library(tidyverse)
df_filled <- df %>%
pivot_longer(!ID) %>%
mutate(
fill_down = value,
fill_up = value
) %>%
group_by(ID) %>%
fill(fill_down) %>%
fill(fill_up, .direction = "up") %>%
mutate(value = if_else(fill_down == fill_up, fill_down, value)) %>%
ungroup() %>%
pivot_wider(id_cols = ID)
df_filled
# # A tibble: 5 x 6
# ID Var1 Var2 Var3 Var4 Var5
# <dbl> <chr> <chr> <chr> <chr> <chr>
# 1 1 A A A A A
# 2 2 B C NA NA B
# 3 3 A A A A A
# 4 4 A B B B B
# 5 5 C NA B B B