Home > Net >  Fill missing values (NA) before the first non-NA value by group
Fill missing values (NA) before the first non-NA value by group

Time:08-05

I have a data frame grouped by 'id' and a variable 'age' which contains missing values, NA.

Within each 'id', I want to replace missing values of 'age', but only "fill up" before the first non-NA value.

data <- data.frame(id=c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3),
                   age=c(NA,6,NA,8,NA,NA,NA,NA,3,8,NA,NA,NA,7,NA,9))

   id age
1   1  NA
2   1   6 # first non-NA in id = 1. Fill up from here
3   1  NA
4   1   8
5   1  NA
6   1  NA
7   2  NA
8   2  NA
9   2   3 # first non-NA in id = 2. Fill up from here
10  2   8
11  2  NA
12  3  NA
13  3  NA
14  3   7 # first non-NA in id = 3. Fill up from here
15  3  NA
16  3   9

Expected output:

1   1   6
2   1   6
3   1  NA
4   1   8
5   1  NA
6   1  NA
7   2   3
8   2   3
9   2   3
10  2   8
11  2  NA
12  3   7
13  3   7
14  3   7
15  3  NA
16  3   9

I tried using fill with .direction = "up" like this:

library(dplyr)
library(tidyr)

data1 <- data %>% group_by(id) %>%  
  fill(!is.na(age[1]), .direction = "up")

CodePudding user response:

You could use cumall(is.na(age)) to find the positions before the first non-NA value.

library(dplyr)

data %>%
  group_by(id) %>%
  mutate(age2 = replace(age, cumall(is.na(age)), age[!is.na(age)][1])) %>%
  ungroup()

# A tibble: 16 × 3
      id   age  age2
   <dbl> <dbl> <dbl>
 1     1    NA     6
 2     1     6     6
 3     1    NA    NA
 4     1     8     8
 5     1    NA    NA
 6     1    NA    NA
 7     2    NA     3
 8     2    NA     3
 9     2     3     3
10     2     8     8
11     2    NA    NA
12     3    NA     7
13     3    NA     7
14     3     7     7
15     3    NA    NA
16     3     9     9

CodePudding user response:

Another option (agnostic about where the missing and non-missing values start) could be:

data %>%
 group_by(id) %>%
 mutate(rleid = with(rle(is.na(age)), rep(seq_along(lengths), lengths)),
        age2 = ifelse(rleid == min(rleid[is.na(age)]), 
                      age[rleid == (min(rleid[is.na(age)])   1)][1],
                      age))

      id   age rleid  age2
   <dbl> <dbl> <int> <dbl>
 1     1    NA     1     6
 2     1     6     2     6
 3     1    NA     3    NA
 4     1     8     4     8
 5     1    NA     5    NA
 6     1    NA     5    NA
 7     2    NA     1     3
 8     2    NA     1     3
 9     2     3     2     3
10     2     8     2     8
11     2    NA     3    NA
12     3    NA     1     7
13     3    NA     1     7
14     3     7     2     7
15     3    NA     3    NA
16     3     9     4     9
  • Related