Home > Enterprise >  Filling in NA values with a sequence by group
Filling in NA values with a sequence by group

Time:12-21

I have a data set that looks like the following:

ID  Count
1   0
1   1
1   NA 
1   2
1   NA 
1   NA 
1   NA 
1   NA 
1   NA 
2   0
2   NA
2   NA
2   3

The first row of each ID starts with 0. I want to fill the NA values with sequential values by group. If there are values before and after the NA values, I need to fill the NA values with a sequence counting up to the first value after the NA values. If there are no values after the NA values, I need to fill the NA values with a sequence counting up from the last value before the NA value. The output should look like following:

ID  Count
1   0
1   1
1   1
1   2
1   3
1   4
1   5
1   6
1   7
2   0
2   1
2   2
2   3

CodePudding user response:

This is a little complicated, but I think this does what you want. I left all my helper columns in so you can see what's happening, but the non-needed columns can all be dropped at the end.

library(dplyr)
library(vctrs)
df %>%
  group_by(ID, na_group = cumsum(!is.na(Count))) %>%
  mutate(n_til_non_na = ifelse(is.na(Count), rev(row_number()), 0L)) %>%
  group_by(ID) %>%
  mutate(
    fill_down = vec_fill_missing(Count, direction = "down"),
    fill_up = vec_fill_missing(Count, direction = "up"),
    result = case_when(
      is.na(fill_up) ~ fill_down   cumsum(is.na(fill_up)),
      is.na(Count) ~ fill_up - n_til_non_na,
      TRUE ~ Count
    )
  ) %>%
  ungroup()
# # A tibble: 13 × 7
#       ID Count na_group n_til_non_na fill_down fill_up result
#    <int> <int>    <int>        <int>     <int>   <int>  <int>
#  1     1     0        1            0         0       0      0
#  2     1     1        2            0         1       1      1
#  3     1    NA        2            1         1       2      1
#  4     1     2        3            0         2       2      2
#  5     1    NA        3            5         2      NA      3
#  6     1    NA        3            4         2      NA      4
#  7     1    NA        3            3         2      NA      5
#  8     1    NA        3            2         2      NA      6
#  9     1    NA        3            1         2      NA      7
# 10     2     0        4            0         0       0      0
# 11     2    NA        4            2         0       3      1
# 12     2    NA        4            1         0       3      2
# 13     2     3        5            0         3       3      3

Using this sample data:

df = read.table(text = 'ID  Count
1   0
1   1
1   NA 
1   2
1   NA 
1   NA 
1   NA 
1   NA 
1   NA 
2   0
2   NA
2   NA
2   3', header = T)

CodePudding user response:

You can use purrr::accumulate(), first backwards, then forward. While going backwards, replace each missing value with the previous value - 1 to count down; then while moving forwards, replace remaining missing values with the previous value 1 to count up.

library(dplyr)
library(purrr)

dat %>%
  group_by(ID) %>%
  mutate(
    Count = accumulate(
      Count, 
      \(x, y) ifelse(is.na(x), y - 1, x),
      .dir = "backward"
    ),
    Count = accumulate(
      Count, 
      \(x, y) ifelse(is.na(y), x   1, y)
    )
  ) %>%
  ungroup()
# A tibble: 13 × 2
      ID Count
   <dbl> <dbl>
 1     1     0
 2     1     1
 3     1     1
 4     1     2
 5     1     3
 6     1     4
 7     1     5
 8     1     6
 9     1     7
10     2     0
11     2     1
12     2     2
13     2     3
  • Related