Filling NA values with last non-NA's if between repeated identical non-NA values

Time:10-04

I would like to replace the NA values in my dataset with the previous non-NA value, but only if the NAs lie between two identical values.

To illustrate, here's a small sample of the data:

      date        1     2     3
1  2004-12-27     NA    NA    NA
2  2004-12-28  2.299 2.349 2.348
3  2004-12-29     NA    NA    NA
4  2005-01-03     NA    NA    NA
5  2005-01-04     NA    NA    NA
6  2005-01-05  2.299    NA    NA
7  2005-01-06     NA    NA    NA
8  2005-01-10     NA    NA    NA
9  2005-01-11  2.299 2.349 2.348
10 2005-01-12     NA    NA    NA
11 2005-01-17     NA    NA    NA
12 2005-01-18  2.299    NA    NA
13 2005-01-19     NA    NA    NA
14 2005-01-24     NA    NA    NA
15 2005-01-25     NA 2.369 2.368
16 2005-01-26  2.299    NA    NA
17 2005-01-31  2.299    NA    NA
18 2005-02-01     NA    NA    NA
19 2005-02-02     NA    NA    NA
20 2005-02-08     NA    NA    NA

The ideal output would be:

      date        1     2     3
1  2004-12-27     NA    NA    NA
2  2004-12-28  2.299 2.349 2.348
3  2004-12-29  2.299 2.349 2.348
4  2005-01-03  2.299 2.349 2.348
5  2005-01-04  2.299 2.349 2.348
6  2005-01-05  2.299 2.349 2.348
7  2005-01-06  2.299 2.349 2.348
8  2005-01-10  2.299 2.349 2.348
9  2005-01-11  2.299 2.349 2.348
10 2005-01-12  2.299    NA    NA
11 2005-01-17  2.299    NA    NA
12 2005-01-18  2.299    NA    NA
13 2005-01-19  2.299    NA    NA
14 2005-01-24  2.299    NA    NA
15 2005-01-25  2.299 2.369 2.368
16 2005-01-26  2.299    NA    NA
17 2005-01-31  2.299    NA    NA

Here's a reproducible sample of the dataset using dput:

structure(list(data_gas = structure(c(12779, 12780, 12781, 12786, 
12787, 12788, 12789, 12793, 12794, 12795, 12800, 12801, 12802, 
12807, 12808, 12809, 12814, 12815, 12816, 12822), class = "Date"), 
    `1` = c(NA, 2.299, NA, NA, NA, 2.299, NA, NA, 2.299, NA, 
    NA, 2.299, NA, NA, NA, 2.299, 2.299, NA, NA, NA), `3` = c(NA, 
    2.349, NA, NA, NA, NA, NA, NA, 2.349, NA, NA, NA, NA, NA, 
    2.369, NA, NA, NA, NA, NA), `4` = c(NA, 2.348, NA, NA, NA, 
    NA, NA, NA, 2.348, NA, NA, NA, NA, NA, 2.368, NA, NA, NA, 
    NA, NA)), row.names = c(NA, 20L), class = "data.frame")

I've tried a few for loops without success.

Any help will be greatly appreciated.

CodePudding user response:

Here is a base R for loop solution.

Write a function that compares each pair of consecutive non-NA values and, if they are equal, fills the NA values between them with that value.

fill_NA_values <- function(x) {
  # Indices of the non-NA values
  non_na_values <- which(!is.na(x))
  # Loop over each pair of consecutive non-NA indices
  for(i in seq_along(non_na_values[-1])) {
    # If two consecutive non-NA values are the same
    if(x[non_na_values[i]] == x[non_na_values[i + 1]]) {
      # Fill the NA values in between with that value
      x[(non_na_values[i] + 1):(non_na_values[i + 1] - 1)] <- x[non_na_values[i]]
    }
  }
  x
}

Apply this to every column except the first (the date) using lapply.

df[-1] <- lapply(df[-1], fill_NA_values)
df

#         date    X1    X3    X4
#1  2004-12-27    NA    NA    NA
#2  2004-12-28 2.299 2.349 2.348
#3  2004-12-29 2.299 2.349 2.348
#4  2005-01-03 2.299 2.349 2.348
#5  2005-01-04 2.299 2.349 2.348
#6  2005-01-05 2.299 2.349 2.348
#7  2005-01-06 2.299 2.349 2.348
#8  2005-01-10 2.299 2.349 2.348
#9  2005-01-11 2.299 2.349 2.348
#10 2005-01-12 2.299    NA    NA
#11 2005-01-17 2.299    NA    NA
#12 2005-01-18 2.299    NA    NA
#13 2005-01-19 2.299    NA    NA
#14 2005-01-24 2.299    NA    NA
#15 2005-01-25 2.299 2.369 2.368
#16 2005-01-26 2.299    NA    NA
#17 2005-01-31 2.299    NA    NA
#18 2005-02-01    NA    NA    NA
#19 2005-02-02    NA    NA    NA
#20 2005-02-08    NA    NA    NA
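
As a possible alternative to the loop, the same rule can be sketched without explicit iteration: carry the last non-NA value forward, carry the next non-NA value backward, and fill only where the two agree. The helper names locf and fill_between_equal are made up for this sketch and are not part of the answer above. Treat it as an illustration under the same assumptions (exact equality of the numeric values), not a tested replacement.

```r
# Hypothetical helper: carry the last non-NA value forward.
# Positions before the first non-NA value stay NA.
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # count of non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # count 0 maps to the leading NA
}

# Hypothetical helper: fill NAs only when the nearest non-NA values
# on both sides are equal.
fill_between_equal <- function(x) {
  fwd <- locf(x)                  # last non-NA value at or before each position
  bwd <- rev(locf(rev(x)))        # next non-NA value at or after each position
  same <- !is.na(fwd) & !is.na(bwd) & fwd == bwd
  x[same] <- fwd[same]
  x
}

x <- c(NA, 1, NA, NA, 1, NA, 2, NA, 1, NA)
fill_between_equal(x)
# NA  1  1  1  1 NA  2 NA  1 NA
```

It would be applied the same way as the loop version, for example df[-1] <- lapply(df[-1], fill_between_equal). Leading and trailing NAs are left alone because one of the two fills is NA there.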