I am using R and I need to fill NA in one column based on the value from the same column and another Date column. For example, the initial data frame looks like the following:
df <- data.frame( ID = c("A","A","A","A","A",
"B","B","B","B","B",
"C","C","C","C","C"),
Year = c(2013,2015,2019,2020,2021,
2001,2005,2009,2010,2016,
2010,2011,2014,2015,2018),
value = c(NA,NA,1,NA,2,NA,1,2,NA,3,1,NA,NA,2,NA))
df
ID Year value
1 A 2013 NA
2 A 2015 NA
3 A 2019 1
4 A 2020 NA
5 A 2021 2
6 B 2001 NA
7 B 2005 1
8 B 2009 2
9 B 2010 NA
10 B 2016 3
11 C 2010 1
12 C 2011 NA
13 C 2014 NA
14 C 2015 2
15 C 2018 NA
Here is the rule I need to follow to replace NA
's:
For each ID, if the data starts with value = NA, choose the next non-NA value. If there is non-NA value in the previous year(s), choose that value.
To do so, I am trying to follow these two steps:
- (i) For each ID, if the value of the oldest year (e.g., ID A: year = 2013) is NA, then fill the NA with the first non-NA value (e.g., because the value of ID A in year 2019 is 1, the value of ID A in year 2013 should be 1)
- (ii) then fill the NA with the previous non-NA value (e.g., ID A: value of year = 2015 needs to be 1 since the value in year 2013 is 1).
The desired output looks like this:
ID Year value
1 A 2013 1
2 A 2015 1
3 A 2019 1
4 A 2020 1
5 A 2021 2
6 B 2001 1
7 B 2005 1
8 B 2009 2
9 B 2010 2
10 B 2016 3
11 C 2010 1
12 C 2011 1
13 C 2014 1
14 C 2015 2
15 C 2018 2
However, I can't think of a way to accomplish this.
CodePudding user response:
This matches the updated desired output.
df %>%
group_by(ID) %>%
tidyr::fill(value, .direction = "downup") %>%
ungroup()
Result
# A tibble: 15 × 3
ID Year value
<chr> <dbl> <dbl>
1 A 2013 1
2 A 2015 1
3 A 2019 1
4 A 2020 1
5 A 2021 2
6 B 2001 1
7 B 2005 1
8 B 2009 2
9 B 2010 2
10 B 2016 3
11 C 2010 1
12 C 2011 1
13 C 2014 1
14 C 2015 2
15 C 2018 2
CodePudding user response:
This is a step by step verification of Jon Springs solution: So his answer is correct:
df %>%
group_by(ID) %>%
transmute(helper = value) %>%
arrange(helper, .by_group = TRUE) %>%
bind_cols(df) %>%
group_by(ID...1) %>%
mutate(jonSprings_value = ifelse(row_number()==1, helper, value)) %>%
tidyr::fill(jonSprings_value, .direction = "down") %>%
ungroup()
ID...1 helper ID...3 Year value jonSprings_value
<chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 A 1 A 2013 NA 1
2 A 2 A 2015 NA 1
3 A NA A 2019 1 1
4 A NA A 2020 NA 1
5 A NA A 2021 2 2
6 B 1 B 2001 NA 1
7 B 2 B 2005 1 1
8 B 3 B 2009 2 2
9 B NA B 2010 NA 2
10 B NA B 2016 3 3
11 C 1 C 2010 1 1
12 C 2 C 2011 NA 1
13 C NA C 2014 NA 1
14 C NA C 2015 2 2
15 C NA C 2018 NA 2