Home > OS >  How to fill NAs based on values near them
How to fill NAs based on values near them

Time:10-20

I have a dataframe in which a column (x) has some missing values and I would like to create a new column (y) replacing these NAs with the nearest value already existing. Also I would like that if the distance is the same both ways, I get the mean of both.

Here is an example of what I mean:

data <- data.frame(x = c(2, 3, 3, NA, NA, 4, NA, 3, 2, NA, NA, NA, 4))

The dataframe I would like to obtain would be that:

x y
2 2
3 3
3 3
NA 3
NA 4
4 4
NA 3.5
3 3
2 2
NA 2
NA 3
NA 4
4 4

Thanks in advance

CodePudding user response:

Your case is slightly different from the usual cases, where interpolation is needed from the first NA. Here, interpolation is necessary only when the NA sequence is 1, 3, or more, and for the first and last value in a sequence of NA, you need the closest non-NA.

You can use lead and lag to do so, and then apply interpolation with zoo::na.approx.

library(dplyr)
library(zoo)
data %>% 
  mutate(y = case_when(is.na(x) & !is.na(lag(x)) & is.na(lead(x)) ~ lag(x),
                       is.na(x) & is.na(lag(x)) & !is.na(lead(x)) ~ lead(x),
                       TRUE ~ x) %>% 
           na.approx(.))

output

    x   y
1   2 2.0
2   3 3.0
3   3 3.0
4  NA 3.0
5  NA 4.0
6   4 4.0
7  NA 3.5
8   3 3.0
9   2 2.0
10 NA 2.0
11 NA 3.0
12 NA 4.0
13  4 4.0
  •  Tags:  
  • r
  • Related