Home > Enterprise >  Replace NA with the nearest value based on another variable, while keeping NA for observation which
Replace NA with the nearest value based on another variable, while keeping NA for observation which

Time:04-17

Here I have a data that looks like this:

year <- c(2000,2001,2002,2003,2005,2006,2007,2008,2009,2010)
x <- c(1,2,3,NA,5,NA,NA,NA,9,10)
dat <- data.frame(year, x)
  1. I want to replace NA with the nearest neighbor according to the year variable.

For example, The fourth place of the data (the first NA) takes the value from its left neighbor rather than its right neighbor because its year "2003" is closer to "2002" instead of "2005"

  1. I want to leave the NA there when it does not have nearest nonNA neighbor.

For example, the seventh place of the data (the third NA) will still be NA because it does not have non-NA neighbor.

After imputing, the resulting x should be 1, 2, 3, 3, 5, 5, NA, 9, 9, 10

CodePudding user response:

One option would be to make use of case_when from tidyverse. Essentially, if the previous row has a closer year and is not NA, then return x from that row. If not, then choose the row below. Or if the year is closer above but there is an NA, then return the row below. Then, same for if the row below has a closer year, but has an NA, then return the row above. If a row does not have an NA, then just return x.

library(tidyverse)

dat %>%
  mutate(x = case_when(is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
                       is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
                       is.na(x) & is.na(lag(x)) ~ lead(x),
                       is.na(x) & is.na(lead(x)) ~ lag(x),
                       TRUE ~ x))

Output

   year  x
1  2000  1
2  2001  2
3  2002  3
4  2003  3
5  2005  5
6  2006  5
7  2007 NA
8  2008  9
9  2009  9
10 2010 10

CodePudding user response:

A method using imap():

library(tidyverse)

dat %>%
  mutate(new = imap_dbl(x, ~ {
    if(is.na(.x)) {
      dist <- abs(year[-.y] - year[.y])
      res <- x[-.y][dist == min(dist, na.rm = TRUE)]
      if(all(is.na(res))) NA else na.omit(res)
    } else .x
  }))

#    year  x new
# 1  2000  1   1
# 2  2001  2   2
# 3  2002  3   3
# 4  2003 NA   3
# 5  2005  5   5
# 6  2006 NA   5
# 7  2007 NA  NA
# 8  2008 NA   9
# 9  2009  9   9
# 10 2010 10  10
  • Related