I would like to replace the NA values in my dataset with the previous non-NA value, but only if the NAs lie between two identical values.
To illustrate, here's a small sample of the data:
date 1 2 3
1 2004-12-27 NA NA NA
2 2004-12-28 2.299 2.349 2.348
3 2004-12-29 NA NA NA
4 2005-01-03 NA NA NA
5 2005-01-04 NA NA NA
6 2005-01-05 2.299 NA NA
7 2005-01-06 NA NA NA
8 2005-01-10 NA NA NA
9 2005-01-11 2.299 2.349 2.348
10 2005-01-12 NA NA NA
11 2005-01-17 NA NA NA
12 2005-01-18 2.299 NA NA
13 2005-01-19 NA NA NA
14 2005-01-24 NA NA NA
15 2005-01-25 NA 2.369 2.368
16 2005-01-26 2.299 NA NA
17 2005-01-31 2.299 NA NA
18 2005-02-01 NA NA NA
19 2005-02-02 NA NA NA
20 2005-02-08 NA NA NA
The ideal output would be:
date 1 2 3
1 2004-12-27 NA NA NA
2 2004-12-28 2.299 2.349 2.348
3 2004-12-29 2.299 2.349 2.348
4 2005-01-03 2.299 2.349 2.348
5 2005-01-04 2.299 2.349 2.348
6 2005-01-05 2.299 2.349 2.348
7 2005-01-06 2.299 2.349 2.348
8 2005-01-10 2.299 2.349 2.348
9 2005-01-11 2.299 2.349 2.348
10 2005-01-12 2.299 NA NA
11 2005-01-17 2.299 NA NA
12 2005-01-18 2.299 NA NA
13 2005-01-19 2.299 NA NA
14 2005-01-24 2.299 NA NA
15 2005-01-25 2.299 2.369 2.368
16 2005-01-26 2.299 NA NA
17 2005-01-31 2.299 NA NA
Here's a reproducible sample of the dataset using dput:
structure(list(data_gas = structure(c(12779, 12780, 12781, 12786,
12787, 12788, 12789, 12793, 12794, 12795, 12800, 12801, 12802,
12807, 12808, 12809, 12814, 12815, 12816, 12822), class = "Date"),
`1` = c(NA, 2.299, NA, NA, NA, 2.299, NA, NA, 2.299, NA,
NA, 2.299, NA, NA, NA, 2.299, 2.299, NA, NA, NA), `3` = c(NA,
2.349, NA, NA, NA, NA, NA, NA, 2.349, NA, NA, NA, NA, NA,
2.369, NA, NA, NA, NA, NA), `4` = c(NA, 2.348, NA, NA, NA,
NA, NA, NA, 2.348, NA, NA, NA, NA, NA, 2.368, NA, NA, NA,
NA, NA)), row.names = c(NA, 20L), class = "data.frame")
I've tried a few for loops without success.
Any help will be greatly appreciated.
CodePudding user response:
Here is a base R for loop solution.
Write a function that compares each pair of consecutive non-NA values and, if they are the same, fills the NA values between them with that value.
fill_NA_values <- function(x) {
  # Indices of the non-NA values
  non_na_values <- which(!is.na(x))
  # Loop over each pair of consecutive non-NA indices
  for (i in seq_along(non_na_values[-1])) {
    # If two consecutive non-NA values are the same
    if (x[non_na_values[i]] == x[non_na_values[i + 1]]) {
      # Fill the NA values in between with that value
      # (for adjacent indices the range only covers the pair itself, so nothing changes)
      x[(non_na_values[i] + 1):(non_na_values[i + 1] - 1)] <- x[non_na_values[i]]
    }
  }
  x
}
Apply this to multiple columns using lapply.
df[-1] <- lapply(df[-1], fill_NA_values)
df
# date X1 X3 X4
#1 2004-12-27 NA NA NA
#2 2004-12-28 2.299 2.349 2.348
#3 2004-12-29 2.299 2.349 2.348
#4 2005-01-03 2.299 2.349 2.348
#5 2005-01-04 2.299 2.349 2.348
#6 2005-01-05 2.299 2.349 2.348
#7 2005-01-06 2.299 2.349 2.348
#8 2005-01-10 2.299 2.349 2.348
#9 2005-01-11 2.299 2.349 2.348
#10 2005-01-12 2.299 NA NA
#11 2005-01-17 2.299 NA NA
#12 2005-01-18 2.299 NA NA
#13 2005-01-19 2.299 NA NA
#14 2005-01-24 2.299 NA NA
#15 2005-01-25 2.299 2.369 2.368
#16 2005-01-26 2.299 NA NA
#17 2005-01-31 2.299 NA NA
#18 2005-02-01 NA NA NA
#19 2005-02-02 NA NA NA
#20 2005-02-08 NA NA NA
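The same rule can probably also be written without an explicit loop. The following is only a minimal sketch in base R: carry the previous non-NA value forward, carry the next non-NA value backward, and fill an NA only where the two agree. The names locf and fill_between_equal are hypothetical helpers made up for this illustration; they are not part of the answer above or of any package.

```r
# Hypothetical helper: last observation carried forward, base R only.
locf <- function(v) {
  # Position of the most recent non-NA element (0 if none seen yet)
  idx <- cummax(seq_along(v) * as.integer(!is.na(v)))
  # Prepend NA so that position 0 maps to NA
  c(NA, v)[idx + 1]
}

# Hypothetical helper: fill NAs only when the previous and the next
# non-NA values are identical.
fill_between_equal <- function(x) {
  fwd <- locf(x)            # previous non-NA value at each position
  bwd <- rev(locf(rev(x)))  # next non-NA value at each position
  fill <- is.na(x) & !is.na(fwd) & !is.na(bwd) & fwd == bwd
  x[fill] <- fwd[fill]
  x
}

# Column `1` from the dput output in the question
x1 <- c(NA, 2.299, NA, NA, NA, 2.299, NA, NA, 2.299, NA,
        NA, 2.299, NA, NA, NA, 2.299, 2.299, NA, NA, NA)
filled <- fill_between_equal(x1)
print(filled)
```

Assuming the data frame is called df as above, it would be applied the same way: df[-1] <- lapply(df[-1], fill_between_equal). Leading and trailing NAs are left alone because one of the two carried values is missing there.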