Conditionally update R tibble values based on other values in the row-CodePudding

I have a tibble with NA and "1" values, and I need to add in a "1" for all values in a row that are between two "1" values less than 4 columns apart (i.e. 3 or fewer columns apart). For example, take this example tibble:

# Example Tibble
ex_input <- tibble( "A" = c(1, NA, NA, NA), 
             "B" = c(NA, NA, 1, 1), 
             "C" = c(1, 1, NA, NA),
             "D" = c(1, NA, NA, NA),
             "E" = c(1, NA, NA, NA),
             "F" = c(1, NA, NA, NA),
             "G" = c(1, 1, NA, NA),
             "H" = c(1, NA, NA, 1),
             "I" = c(1, NA, NA, NA),
             "J" = c(1, NA, 1, 1))

Which looks like:

> print(ex_input)
# A tibble: 4 x 10
      A     B     C     D     E     F     G     H     I     J
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    NA     1     1     1     1     1     1     1     1
2    NA    NA     1    NA    NA    NA     1    NA    NA    NA
3    NA     1    NA    NA    NA    NA    NA    NA    NA     1
4    NA     1    NA    NA    NA    NA    NA     1    NA     1

What I need at the end is for an output in which a "1" is added - in this example - to B1, D2, C2, E2, and I2 because those are all in a row between two "1" values less than 4 columns apart. Like so:

> print(output)
# A tibble: 4 x 10
      A     B     C     D     E     F     G     H     I     J
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     1     1     1     1     1     1
2    NA    NA     1     1     1     1     1    NA    NA    NA
3    NA     1    NA    NA    NA    NA    NA    NA    NA     1
4    NA     1    NA    NA    NA    NA    NA     1     1     1

Thank you in advance for your help!

CodePudding user response：

We can identify the position of run lengths of NA that are less or equal to 4, not at the beginning or end of a row, and assign a "1" to ex_input based on those. First I change base rle a bit to return a data.frame which is a bit easier to work with.

rlen <- function (x) {
  if (!is.vector(x) && !is.list(x)) stop("'x' must be a vector of an atomic type")
  n <- length(x)
  if (n == 0L) return(data.frame(lengths = integer(), values = x))
  y <- x[-1L] != x[-n]
  i <- c(which(y | is.na(y)), n)
  within(
    data.frame(
      lengths = diff(c(0L, i)),
      values = x[i]), {
        end = cumsum(lengths)
        start = c(1, end)[1:length(end)]
      })
}

is.na(ex_input) converts the data.frame to TRUE and FALSE, circumventing some annoyances with rle. After the apply step, we have vectors of positions to be replaced, which can be NULL. Using imap, we can access the list index, and insert that in the row slot of [, returning invisibly because we are after the side effects.

library(tidyverse)
y <- apply(is.na(ex_input), 1, function(x){
  ids <- rlen(x) %>%
    mutate(rnum = seq_along(lengths)) %>%
    filter(rnum != nrow(.) & rnum != 1 & values & end-start <= 4)
  if(nrow(ids) != 0) ids$start:ids$end
})

invisible(imap(y, ~ if(!is.null(.x)) ex_input[.y, .x] <<- 1))
ex_input

# A tibble: 4 x 10
      A     B     C     D     E     F     G     H     I     J
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     1     1     1     1     1     1
2    NA    NA     1     1     1     1     1    NA    NA    NA
3    NA     1    NA    NA    NA    NA    NA    NA    NA     1
4    NA     1    NA    NA    NA    NA    NA     1     1     1

CodePudding user response：

Here is one possible solution (though not so elegant in places). I first put the data in a long format. Next, I split the data into a list of dataframes by group (i.e., each row), then I use shift from data.table to get values from up to 2 rows ahead and 2 rows behind, which then is bound together, then I get the sum of the values. The logic is that if a given cell is NA, then there should be at least a sum of 2 from both directions to be able to fill that cell in with a 1. Then, I unlist all of the sums, then bind back to the long form of the data, df_long. The first case_when is used to find any sums that is greater than 2, if so, then change value to 1. The second case_when is to deal with the D2 and F2, as these would not meet your first criteria of having a 1 in each direction within 4 columns. But we can use lag and lead to look ahead and behind and if there is a 1 on both sides, then we can change to 1.

library(tidyverse)
library(data.table)

df_long <- ex_input %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row)

df <- df_long %>%
  group_split(row) %>%
  map(., function(x) rowSums(do.call(cbind, shift(x$value, -2:2)), na.rm = TRUE)) %>%
  unlist() %>%
  bind_cols(df_long, sums = .) %>%
  group_by(row) %>%
  mutate(value = case_when(value == 1 ~ value,
                           sums >= 2 ~ 1,
                           TRUE ~ NA_real_),
         value = case_when(value == 1 ~ value,
                           lead(value) == 1 & lag(value) == 1 ~ 1,
                           TRUE ~ NA_real_)) %>%
  ungroup() %>%
  select(-sums) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-row)

Output

      A     B     C     D     E     F     G     H     I     J
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     1     1     1     1     1     1
2    NA    NA     1     1     1     1     1    NA    NA    NA
3    NA     1    NA    NA    NA    NA    NA    NA    NA     1
4    NA     1    NA    NA    NA    NA    NA     1     1     1

CodePudding user response：

This works in base-R (not counting everything being a tibble).

for(i in seq(nrow(ex_input))){
  r <- ex_input[i,]
  for(cl in seq(ncol(r))){
    
    if(cl 4 > ncol(r)){break()}
    r2 <- r[cl:c(cl 4)]  
    if(sum(r2, na.rm = T) >= 2){

      colms <- which(colnames(r2) %in% names(r2[which(!is.na(r2))]))
      r[seq(min(colms cl-1), max(colms cl-1))] <- 1
      
      ex_input[i,] <- r
    }
  }
}  

ex_input
    
# A tibble: 4 x 10
      A     B     C     D     E     F     G     H     I     J
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     1     1     1     1     1     1     1
2    NA    NA     1     1     1     1     1    NA    NA    NA
3    NA     1    NA    NA    NA    NA    NA    NA    NA     1
4    NA     1    NA    NA    NA    NA    NA     1     1     1