Here I have a data that looks like this:
year <- c(2000,2001,2002,2003,2005,2006,2007,2008,2009,2010)
x <- c(1,2,3,NA,5,NA,NA,NA,9,10)
dat <- data.frame(year, x)
- I want to replace
NA
with the nearest neighbor according to the year variable.
For example, The fourth place of the data (the first NA
) takes the value from its left neighbor rather than its right neighbor because its year "2003" is closer to "2002" instead of "2005"
- I want to leave the
NA
there when it does not have nearest nonNA neighbor.
For example, the seventh place of the data (the third NA) will still be NA because it does not have non-NA neighbor.
After imputing, the resulting x should be 1, 2, 3, 3, 5, 5, NA, 9, 9, 10
CodePudding user response:
One option would be to make use of case_when
from tidyverse
. Essentially, if the previous row has a closer year and is not NA
, then return x
from that row. If not, then choose the row below. Or if the year is closer above but there is an NA
, then return the row below. Then, same for if the row below has a closer year, but has an NA
, then return the row above. If a row does not have an NA
, then just return x
.
library(tidyverse)
dat %>%
mutate(x = case_when(is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
is.na(x) & is.na(lag(x)) ~ lead(x),
is.na(x) & is.na(lead(x)) ~ lag(x),
TRUE ~ x))
Output
year x
1 2000 1
2 2001 2
3 2002 3
4 2003 3
5 2005 5
6 2006 5
7 2007 NA
8 2008 9
9 2009 9
10 2010 10
CodePudding user response:
A method using imap()
:
library(tidyverse)
dat %>%
mutate(new = imap_dbl(x, ~ {
if(is.na(.x)) {
dist <- abs(year[-.y] - year[.y])
res <- x[-.y][dist == min(dist, na.rm = TRUE)]
if(all(is.na(res))) NA else na.omit(res)
} else .x
}))
# year x new
# 1 2000 1 1
# 2 2001 2 2
# 3 2002 3 3
# 4 2003 NA 3
# 5 2005 5 5
# 6 2006 NA 5
# 7 2007 NA NA
# 8 2008 NA 9
# 9 2009 9 9
# 10 2010 10 10