This question is related to "Is there an R function for imputing missing year values, consecutively, by group?"
There, the OP asks how to impute the missing year values by group in this data frame (that question already has sufficient answers):
df <- data.frame(ID = c("A", "A", "A", "A",
                        "B", "B", "B", "B",
                        "C", "C", "C", "C",
                        "D", "D", "D", "D"),
                 grade = c("KG", "01", "02", "03",
                           "KG", "01", "02", "03",
                           "KG", "01", "02", "03",
                           "KG", "01", "02", "03"),
                 year = c(2002, 2003, NA, 2005,
                          2007, NA, NA, 2010,
                          NA, 2005, 2006, NA,
                          2009, 2010, NA, NA))
I tried to use ifelse with lag() or lead().
The idea in words: if a row's year is NA, take the year from the row above and add 1. This works fine if a group has only one NA in a row, but with two consecutive NAs it gets clumsy.
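To make the limitation concrete, here is a small sketch using only the year values of group B from the data above. It suggests why a single ifelse/lag() step cannot fill two consecutive NAs: lag() is evaluated on the vector as it was before the step, so the second NA still has an NA above it. The names pass1 and pass2 are made up for this illustration.

```r
library(dplyr)

# Years of group B from the example data: two consecutive NAs
year <- c(2007, NA, NA, 2010)

# One pass: lag(year) is c(NA, 2007, NA, NA), so only the first NA gets a value
pass1 <- ifelse(is.na(year), lag(year) + 1, year)
pass1
# 2007 2008 NA 2010

# A second pass sees the 2008 that the first pass filled in
pass2 <- ifelse(is.na(pass1), lag(pass1) + 1, pass1)
pass2
# 2007 2008 2009 2010
```

In general, a run of k consecutive NAs would need k such passes, which is why the chained approach gets clumsy.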
My question: how can I make ifelse run in a single call until all NAs are replaced?
My try:
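library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(year = ifelse(is.na(year), lag(year) + 1, year),   # fill from the row above
         year = ifelse(is.na(year), lag(year) + 1, year),   # repeat for a second consecutive NA
         year = ifelse(is.na(year), lead(year) - 1, year))  # fill a leading NA from the row below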
gives:
   ID    grade  year
   <chr> <chr> <dbl>
 1 A     KG     2002
 2 A     01     2003
 3 A     02     2004
 4 A     03     2005
 5 B     KG     2007
 6 B     01     2008
 7 B     02     2009
 8 B     03     2010
 9 C     KG     2004
10 C     01     2005
11 C     02     2006
12 C     03     2007
13 D     KG     2009
14 D     01     2010
15 D     02     2011
16 D     03     2012
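One way to read "run until all NAs are replaced" literally is a loop that repeats the same two ifelse steps until a full pass fills nothing new. The following is only a sketch in base R: fill_years is a hypothetical helper name, not a dplyr function, and prev/nxt are hand-written stand-ins for lag() and lead().

```r
# Hypothetical helper: repeat the "row above + 1" and "row below - 1"
# steps until a full pass no longer reduces the number of NAs.
fill_years <- function(year) {
  repeat {
    before <- sum(is.na(year))
    prev <- c(NA, head(year, -1))  # value one row above (stand-in for lag())
    nxt  <- c(tail(year, -1), NA)  # value one row below (stand-in for lead())
    year <- ifelse(is.na(year), prev + 1, year)
    year <- ifelse(is.na(year), nxt - 1, year)
    # stop when nothing was filled in this pass (all done, or nothing to start from)
    if (sum(is.na(year)) == before) break
  }
  year
}

fill_years(c(NA, NA, 2006, NA))
# 2004 2005 2006 2007
```

Because it takes a vector and returns a vector of the same length, such a helper should be usable per group, for example as mutate(year = fill_years(year)) after group_by(ID). A group whose years are all NA is returned unchanged.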
CodePudding user response:
We may use accumulate from purrr:
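library(dplyr)
library(purrr)

df %>%
  group_by(ID) %>%
  mutate(year = accumulate(
    # forward pass: an NA becomes the previous value plus 1
    accumulate(year, ~ if (is.na(.y)) .x + 1 else .y),
    # backward pass: a remaining leading NA becomes the next value minus 1
    ~ if (is.na(.x)) .y - 1 else .x,
    .dir = "backward"
  )) %>%
  ungroup()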
-output
# A tibble: 16 × 3
   ID    grade  year
   <chr> <chr> <dbl>
 1 A     KG     2002
 2 A     01     2003
 3 A     02     2004
 4 A     03     2005
 5 B     KG     2007
 6 B     01     2008
 7 B     02     2009
 8 B     03     2010
 9 C     KG     2004
10 C     01     2005
11 C     02     2006
12 C     03     2007
13 D     KG     2009
14 D     01     2010
15 D     02     2011
16 D     03     2012
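To see what the two nested accumulate calls do, it may help to run them one at a time on a single group. The sketch below uses the year values of group C, which has an NA at both ends; the intermediate names forward and backward are introduced here only for illustration.

```r
library(purrr)

# Years of group C from the example data: NA at both ends
year <- c(NA, 2005, 2006, NA)

# Forward pass (left to right): .x is the running value, .y the next element.
# The leading NA stays NA because there is nothing before it to add 1 to.
forward <- accumulate(year, ~ if (is.na(.y)) .x + 1 else .y)
forward
# NA 2005 2006 2007

# Backward pass (right to left): here .x is the current element and
# .y the running value coming from the right.
backward <- accumulate(forward, ~ if (is.na(.x)) .y - 1 else .x, .dir = "backward")
backward
# 2004 2005 2006 2007
```

Each pass carries the running value along, so a run of consecutive NAs is filled in one call, which is the part the chained ifelse/lag() approach needs repeated steps for.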