I have a data frame grouped by 'id' and a variable 'age' which contains missing values, NA
.
Within each 'id', I want to replace missing values of 'age', but only "fill up" before the first non-NA
value.
data <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3),
age=c(NA,6,NA,8,NA,NA,NA,NA,3,8,NA,NA,NA,7,NA,9))
id age
1 1 NA
2 1 6 # first non-NA in id = 1. Fill up from here
3 1 NA
4 1 8
5 1 NA
6 1 NA
7 2 NA
8 2 NA
9 2 3 # first non-NA in id = 2. Fill up from here
10 2 8
11 2 NA
12 3 NA
13 3 NA
14 3 7 # first non-NA in id = 3. Fill up from here
15 3 NA
16 3 9
Expected output:
1 1 6
2 1 6
3 1 NA
4 1 8
5 1 NA
6 1 NA
7 2 3
8 2 3
9 2 3
10 2 8
11 2 NA
12 3 7
13 3 7
14 3 7
15 3 NA
16 3 9
I tried using fill
with .direction = "up"
like this:
library(dplyr)
library(tidyr)
data1 <- data %>% group_by(id) %>%
fill(!is.na(age[1]), .direction = "up")
CodePudding user response:
You could use cumall(is.na(age))
to find the positions before the first non-NA value.
library(dplyr)
data %>%
group_by(id) %>%
mutate(age2 = replace(age, cumall(is.na(age)), age[!is.na(age)][1])) %>%
ungroup()
# A tibble: 16 × 3
id age age2
<dbl> <dbl> <dbl>
1 1 NA 6
2 1 6 6
3 1 NA NA
4 1 8 8
5 1 NA NA
6 1 NA NA
7 2 NA 3
8 2 NA 3
9 2 3 3
10 2 8 8
11 2 NA NA
12 3 NA 7
13 3 NA 7
14 3 7 7
15 3 NA NA
16 3 9 9
CodePudding user response:
Another option (agnostic about where the missing and non-missing values start) could be:
data %>%
group_by(id) %>%
mutate(rleid = with(rle(is.na(age)), rep(seq_along(lengths), lengths)),
age2 = ifelse(rleid == min(rleid[is.na(age)]),
age[rleid == (min(rleid[is.na(age)]) 1)][1],
age))
id age rleid age2
<dbl> <dbl> <int> <dbl>
1 1 NA 1 6
2 1 6 2 6
3 1 NA 3 NA
4 1 8 4 8
5 1 NA 5 NA
6 1 NA 5 NA
7 2 NA 1 3
8 2 NA 1 3
9 2 3 2 3
10 2 8 2 8
11 2 NA 3 NA
12 3 NA 1 7
13 3 NA 1 7
14 3 7 2 7
15 3 NA 3 NA
16 3 9 4 9