After merging two time-series datasets I have some missings in a time-invariant variable(value_2) which I would like to complete. However, some units are missing this variable completely, as can be seen, in my explanatory data:
ID Year value_1 value_2
1 1 2000 a NA
2 1 2001 b NA
3 1 2002 r 2
4 1 2004 s 2
5 2 2000 w NA
6 2 2001 r NA
7 2 2002 d NA
8 2 2004 s NA
9 3 2000 e 3
10 3 2001 f 3
11 3 2002 s NA
12 3 2004 w2 NA
13 4 2000 e NA
14 4 2001 r NA
15 4 2002 e NA
16 4 2004 w NA
How can I complete the missing data the best, while keeping the value_2 values for the units without any observation at NA? The output should look like this:
ID Year value_1 value_2
1 1 2000 a 2
2 1 2001 b 2
3 1 2002 r 2
4 1 2004 s 2
5 2 2000 w NA
6 2 2001 r NA
7 2 2002 d NA
8 2 2004 s NA
9 3 2000 e 3
10 3 2001 f 3
11 3 2002 s 3
12 3 2004 w2 3
13 4 2000 e NA
14 4 2001 r NA
15 4 2002 e NA
16 4 2004 w NA
CodePudding user response:
You can use tidyr::fill()
by groups.
library(dplyr)
df %>%
group_by(ID) %>%
tidyr::fill(value_2, .direction = "downup") %>%
ungroup()
# A tibble: 16 × 4
ID Year value_1 value_2
<dbl> <dbl> <chr> <dbl>
1 1 2000 a 2
2 1 2001 b 2
3 1 2002 r 2
4 1 2004 s 2
5 2 2000 w NA
6 2 2001 r NA
7 2 2002 d NA
8 2 2004 s NA
9 3 2000 e 3
10 3 2001 f 3
11 3 2002 s 3
12 3 2004 w2 3
13 4 2000 e NA
14 4 2001 r NA
15 4 2002 e NA
16 4 2004 w NA
CodePudding user response:
How about this:
library(dplyr)
d <- tibble::tribble(
~ID, ~Year, ~value_1, ~value_2,
1, 2000, "a", NA,
1, 2001, "b", NA,
1, 2002, "r", 2,
1, 2004, "s", 2,
2, 2000, "w", NA,
2, 2001, "r", NA,
2, 2002, "d", NA,
2, 2004, "s", NA,
3, 2000, "e", 3,
3, 2001, "f", 3,
3, 2002, "s", NA,
3, 2004, "w2", NA,
4, 2000, "e", NA,
4, 2001, "r", NA,
4, 2002, "e", NA,
4, 2004, "w", NA)
d %>% group_by(ID) %>%
arrange(value_2, .by_group = TRUE) %>%
mutate(value_2 = first(value_2)) %>%
arrange(ID, Year)
#> # A tibble: 16 × 4
#> # Groups: ID [4]
#> ID Year value_1 value_2
#> <dbl> <dbl> <chr> <dbl>
#> 1 1 2000 a 2
#> 2 1 2001 b 2
#> 3 1 2002 r 2
#> 4 1 2004 s 2
#> 5 2 2000 w NA
#> 6 2 2001 r NA
#> 7 2 2002 d NA
#> 8 2 2004 s NA
#> 9 3 2000 e 3
#> 10 3 2001 f 3
#> 11 3 2002 s 3
#> 12 3 2004 w2 3
#> 13 4 2000 e NA
#> 14 4 2001 r NA
#> 15 4 2002 e NA
#> 16 4 2004 w NA
Created on 2023-01-19 by the reprex package (v2.0.1)