Home > Net >  Fill NA with condition based on the same column
Fill NA with condition based on the same column

Time:04-19

I am using R and I need to fill NA in one column based on the value from the same column and another Date column. For example, the initial data frame looks like the following:

df <- data.frame( ID = c("A","A","A","A","A",
                         "B","B","B","B","B",
                         "C","C","C","C","C"), 
                  Year = c(2013,2015,2019,2020,2021,
                         2001,2005,2009,2010,2016,
                         2010,2011,2014,2015,2018),
                  value = c(NA,NA,1,NA,2,NA,1,2,NA,3,1,NA,NA,2,NA))    
df
   ID Year value
1   A 2013    NA
2   A 2015    NA
3   A 2019     1
4   A 2020    NA
5   A 2021     2
6   B 2001    NA
7   B 2005     1
8   B 2009     2
9   B 2010    NA
10  B 2016     3
11  C 2010     1
12  C 2011    NA
13  C 2014    NA
14  C 2015     2
15  C 2018    NA

Here is the rule I need to follow to replace NA's:

For each ID, if the data starts with value = NA, choose the next non-NA value. If there is non-NA value in the previous year(s), choose that value.

To do so, I am trying to follow these two steps:

  • (i) For each ID, if the value of the oldest year (e.g., ID A: year = 2013) is NA, then fill the NA with the first non-NA value (e.g., because the value of ID A in year 2019 is 1, the value of ID A in year 2013 should be 1)
  • (ii) then fill the NA with the previous non-NA value (e.g., ID A: value of year = 2015 needs to be 1 since the value in year 2013 is 1).

The desired output looks like this:

   ID Year value
1   A 2013     1
2   A 2015     1
3   A 2019     1
4   A 2020     1
5   A 2021     2
6   B 2001     1
7   B 2005     1
8   B 2009     2
9   B 2010     2
10  B 2016     3
11  C 2010     1
12  C 2011     1
13  C 2014     1
14  C 2015     2
15  C 2018     2

However, I can't think of a way to accomplish this.

CodePudding user response:

This matches the updated desired output.

 df %>% 
    group_by(ID) %>% 
    tidyr::fill(value, .direction = "downup") %>% 
    ungroup()

Result

# A tibble: 15 × 3
   ID     Year value
   <chr> <dbl> <dbl>
 1 A      2013     1
 2 A      2015     1
 3 A      2019     1
 4 A      2020     1
 5 A      2021     2
 6 B      2001     1
 7 B      2005     1
 8 B      2009     2
 9 B      2010     2
10 B      2016     3
11 C      2010     1
12 C      2011     1
13 C      2014     1
14 C      2015     2
15 C      2018     2

CodePudding user response:

This is a step by step verification of Jon Springs solution: So his answer is correct:

df %>% 
  group_by(ID) %>% 
  transmute(helper = value) %>% 
  arrange(helper, .by_group = TRUE) %>% 
  bind_cols(df) %>% 
  group_by(ID...1) %>% 
  mutate(jonSprings_value = ifelse(row_number()==1, helper, value)) %>% 
  tidyr::fill(jonSprings_value, .direction = "down") %>% 
  ungroup()
   ID...1 helper ID...3  Year value jonSprings_value
   <chr>   <dbl> <chr>  <dbl> <dbl>            <dbl>
 1 A           1 A       2013    NA                1
 2 A           2 A       2015    NA                1
 3 A          NA A       2019     1                1
 4 A          NA A       2020    NA                1
 5 A          NA A       2021     2                2
 6 B           1 B       2001    NA                1
 7 B           2 B       2005     1                1
 8 B           3 B       2009     2                2
 9 B          NA B       2010    NA                2
10 B          NA B       2016     3                3
11 C           1 C       2010     1                1
12 C           2 C       2011    NA                1
13 C          NA C       2014    NA                1
14 C          NA C       2015     2                2
15 C          NA C       2018    NA                2
  • Related