R - How to fill in NA values, but only when the ending value is the same as the beginning value?

I have the following example data:

Example <- data.frame(col1 =c(1, NA, NA, 4, NA, NA, 6, NA, NA, NA, 6, 8, NA, 2, NA))

col1
1
NA
NA
4
NA
NA
6
NA
NA
NA
6
8
NA
2
NA

I want to fill the NAs with the value from above, but only if the NAs lie between two identical values. In this example, the first NA gap (from 1 to 4) should not be filled with 1s, but the gap between the first 6 and the second 6 should be filled with 6s. All other values should stay NA. Afterwards it should look like this:

col1
1
NA
NA
4
NA
NA
6
6
6
6
6
8
NA
2
NA

In reality I have over 50000 observations, not just 15, so I need an efficient solution, which is more difficult than I thought. I tried to use the fill function but was not able to come up with a solution.

CodePudding user response：

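One dplyr and zoo option could be:

library(dplyr)
library(zoo)

# Where the forward fill and the backward fill agree, the row lies between two identical values.
Example %>%
    mutate(cond = na.locf0(col1) == na.locf0(col1, fromLast = TRUE),
           col1 = ifelse(cond, na.locf0(col1), col1)) %>%
    select(-cond)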

   col1
1     1
2    NA
3    NA
4     4
5    NA
6    NA
7     6
8     6
9     6
10    6
11    6
12    8
13   NA
14    2
15   NA
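
The same comparison can also be written without zoo. The following is a minimal base R sketch, not part of the answer above; the helper names fill_forward and fill_between_equal are made up for illustration. It fills a gap only when the last non-NA value before it equals the next non-NA value after it.

```r
Example <- data.frame(col1 = c(1, NA, NA, 4, NA, NA, 6, NA, NA, NA, 6, 8, NA, 2, NA))

# Hypothetical helper: carry the last non-NA value forward (NA before the first value).
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the most recent non-NA value, 0 if none yet
  c(NA, x[!is.na(x)])[idx + 1]
}

# Hypothetical helper: fill NAs only where the values on both sides of the gap match.
fill_between_equal <- function(x) {
  prev_val <- fill_forward(x)
  next_val <- rev(fill_forward(rev(x)))   # backward fill via a reversed forward fill
  same <- is.na(x) & !is.na(prev_val) & !is.na(next_val) & prev_val == next_val
  x[same] <- prev_val[same]
  x
}

result <- fill_between_equal(Example$col1)
result
#  [1]  1 NA NA  4 NA NA  6  6  6  6  6  8 NA  2 NA
```

Both helpers are fully vectorised, so this should cope with 50000 rows without trouble, although that has not been benchmarked here.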

CodePudding user response：

Here is a dplyr solution:

First I create the data in tibble format:

library(dplyr)  # dplyr also re-exports tibble()

df <- tibble(
  x = c(1, NA_real_, NA_real_, 
        4, NA_real_, NA_real_,
        6, NA_real_, NA_real_, NA_real_, 
        6, 8, NA_real_, 2, NA_real_)
)

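Next, I create two grouping variables that help identify the non-NA values on either side of each gap. gr1 groups every non-NA value with the NAs that follow it, so the first value of each group is the value before the gap. gr2 is gr1 shifted down by one row, so each of its groups ends on the next non-NA value. I save these reference values to ref_start and ref_end, and in the end I overwrite the values of x: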

df %>%
  mutate(gr1 = cumsum(!is.na(x))) %>%
  group_by(gr1) %>%
  mutate(ref_start = first(x)) %>%
  ungroup() %>%
  mutate(gr2 = lag(gr1, default = 1)) %>%
  group_by(gr2) %>%
  mutate(ref_end = last(x)) %>%
  ungroup() %>%
  mutate(x = if_else(is.na(x) & ref_start == ref_end, ref_start, x)) %>%
  select(x)  # drop the helper columns

# A tibble: 15 x 1
       x
   <dbl>
 1     1
 2    NA
 3    NA
 4     4
 5    NA
 6    NA
 7     6
 8     6
 9     6
10     6
11     6
12     8
13    NA
14     2
15    NA

CodePudding user response：

df <- data.frame(col1 =c(1, NA, NA, 4, NA, NA, 6, NA, NA, NA, 6, 8, NA, 2, NA))

library(data.table)
library(magrittr)

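# Count how often each non-NA value occurs and carry that count into the NA rows below it.
# Then fill forward only within the rows whose count is 2, i.e. values that appear exactly twice.
setDT(df)[!is.na(col1), n := .N, by = col1] %>% 
  .[, n := nafill(n, type = "locf")] %>% 
  .[n == 2, col1 := nafill(col1, type = "locf")] %>% 
  .[, n := NULL] %>% 
  .[]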
#>     col1
#>  1:    1
#>  2:   NA
#>  3:   NA
#>  4:    4
#>  5:   NA
#>  6:   NA
#>  7:    6
#>  8:    6
#>  9:    6
#> 10:    6
#> 11:    6
#> 12:    8
#> 13:   NA
#> 14:    2
#> 15:   NA

Created on 2021-10-11 by the reprex package (v2.0.1)
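
The count-based check above works for the sample data, where the repeated value occurs exactly twice and the two occurrences are neighbours. As a hedged alternative, not part of the original answer, the gap can be tested directly by comparing the nearest non-NA value above and below each row. The column names prev_val and next_val are made up for this sketch.

```r
library(data.table)

dt <- data.table(col1 = c(1, NA, NA, 4, NA, NA, 6, NA, NA, NA, 6, 8, NA, 2, NA))

# Nearest non-NA value above (locf) and below (nocb) each row.
dt[, prev_val := nafill(col1, type = "locf")]
dt[, next_val := nafill(col1, type = "nocb")]

# Fill only the NA rows whose neighbours on both sides agree.
# data.table treats an NA condition in i as FALSE, so the trailing NA is left alone.
dt[is.na(col1) & prev_val == next_val, col1 := prev_val]

# Drop the helper columns and print the result.
dt[, c("prev_val", "next_val") := NULL]
dt[]
```

Because the comparison is made per gap, this variant does not depend on how many times a value occurs in the column.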

CodePudding user response：

Here is a tidyverse approach using dplyr and tidyr.

Logic:

1. Create an id column
2. Remove all NA rows
3. Flag if the next value is the same
4. right_join with the first Example df
5. fill down the flag and the corresponding col1.y
6. mutate with an ifelse

library(dplyr)
library(tidyr)

Example <- Example %>% 
  mutate(id=row_number())

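Example %>% 
  na.omit() %>% 
  # TRUE where a non-NA value equals the next non-NA value
  mutate(flag = col1 == lead(col1)) %>% 
  # bring the NA rows back: col1.x keeps the gaps, col1.y comes from the full data
  right_join(Example, by="id") %>% 
  arrange(id) %>% 
  fill(col1.y, .direction="down") %>% 
  fill(flag, .direction="down") %>% 
  # take the filled value only where the flag is TRUE
  mutate(col1.x = ifelse(flag, col1.y, col1.x), .keep="unused") %>% 
  select(col1 = col1.x)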

Output: