Filling a row from a certain column under certain condition in r

Time:06-02

I want to fill a row with "-3" from some specific columns (B and E in this example) to the end when these columns contain "-3" in that row. I figured out a solution, but it is extremely slow in my original dataset (2435 x 431 cells) and with 15 columns to check for values == "-3".

In this example, the rows to fill with "-3" are 4 and 10 (starting from column "B") and 3 (starting from column "E"). Note that rows 4 and 10 also contain "-3" in column "E", but they were already filled when the loop reached column "B".

library(tidyverse)

values <- as.character(-3:3)

set.seed(123)

data <- tibble(
  A = sample(values, 10, replace = T),
  B = sample(values, 10, replace = T),
  C = sample(values, 10, replace = T),
  D = sample(values, 10, replace = T),
  E = sample(values, 10, replace = T),
  F = sample(values, 10, replace = T)
)

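# x is one row as a named character vector (apply() passes rows this way)
fill_minus_three <- function(x){
  for (i in seq_along(x)){
    # only columns B and E can trigger the fill
    if ((names(x)[i] %in% c("B", "E")) && x[i] == "-3"){
      # overwrite from the trigger column to the end of the row
      x[i:length(x)] <- "-3"
      break
    }
  }
  return(x)
}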

t(apply(data, 1, fill_minus_three)) %>% 
  as_tibble()

#> # A tibble: 10 x 6
#> A     B     C     D     E     F    
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1  3    0     0    -1     1     1    
#> 2  3    2    -3     0     3    -1   
#> 3 -1    2    -3     2    -3    -3   
#> 4  2   -3    -3    -3    -3    -3   
#> 5 -1   -2    -1    -1    -2    -2   
#> 6 -2   -1    -2     3     3     1    
#> 7 -2    1     3     1    -1     1    
#> 8  2   -1    -2     0     0     0    
#> 9 -1   -1    -3     3     1     3    
#> 10 1   -3    -3    -3    -3    -3  

In addition, I would like to use the map_* family, since the rest of my scripts follow the tidyverse approach (this is optional, though).

CodePudding user response:

If I understand correctly, you're trying to change the values in several columns depending on the values in columns B and E.

No need for for loops or map/apply functions, you can just use mutate and pair it with across:

library(dplyr)

data |> 
  mutate(across(C:F, ~ if_else(B == "-3", "-3", .x)),
         F = if_else(E == "-3", "-3", F))

Output

#> # A tibble: 10 × 6
#>    A     B     C     D     E     F    
#>    <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 3     0     0     -1    1     1    
#>  2 3     2     -3    0     3     -1   
#>  3 -1    2     -3    2     -3    -3   
#>  4 2     -3    -3    -3    -3    -3   
#>  5 -1    -2    -1    -1    -2    -2   
#>  6 -2    -1    -2    3     3     1    
#>  7 -2    1     3     1     -1    1    
#>  8 2     -1    -2    0     0     0    
#>  9 -1    -1    -3    3     1     3    
#> 10 1     -3    -3    -3    -3    -3

Created on 2022-06-02 by the reprex package (v2.0.1)
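
The mutate/across call above names B and E by hand, which gets tedious with the 15 trigger columns mentioned in the question. The same column-wise idea can be looped over any set of trigger columns. What follows is only a minimal base R sketch, not part of dplyr or of the original answer: `fill_right`, `trigger_cols` and `value` are hypothetical names chosen for illustration, and it assumes every column is character and that the trigger columns exist in the data.

```r
# Hypothetical helper: for each trigger column, find the rows holding `value`
# and overwrite those rows from that column to the last column.
fill_right <- function(df, trigger_cols, value = "-3") {
  for (col in trigger_cols) {
    from <- match(col, names(df))        # position of the trigger column
    rows <- which(df[[col]] == value)    # which() drops NA comparisons
    if (length(rows) > 0) {
      df[rows, from:ncol(df)] <- value
    }
  }
  df
}

# Small made-up example (not the data from the question)
toy <- data.frame(
  A = c("1", "2", "-3"),
  B = c("-3", "0", "1"),
  C = c("2", "2", "2"),
  D = c("0", "-3", "1"),
  stringsAsFactors = FALSE
)

# Row 1 is filled from B, row 2 from D; row 3 is untouched because A is not a trigger
out <- fill_right(toy, c("B", "D"))
```

The loop runs once per trigger column rather than once per row, so each step is a vectorised assignment over all matching rows.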

CodePudding user response:

I gave it some thought and realized that I do not need to iterate over all columns, only the ones I need to check for "-3". So I changed the function a bit, and it works much faster (0.3 s on the entire dataset, while the previous version took two minutes). Yet, it is not tidy-friendly. :/

positions <- which(names(data) %in% c("B", "E"))

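# x is one row; positions holds the indices of the columns to check, in column order
fill_minus_three <- function(x, positions){
  for (i in positions){
    if (x[i] == "-3"){
      # fill from the first trigger column found to the end of the row
      x[i:length(x)] <- "-3"
      break
    }
  }
  return(x)
}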

t(apply(data, 1, function(x) fill_minus_three(x, positions))) %>% 
  as_tibble()
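
The row-wise apply() call can also be avoided altogether by computing, for every row, the first trigger position that holds "-3" and then filling everything at or to the right of it in one step. This is only a sketch under the same assumptions as the function above (all columns are character); `fill_right_mask`, `trigger_cols` and `value` are hypothetical names, and I have not timed it on the full dataset.

```r
# Hypothetical helper: build a logical mask of cells to overwrite, then fill once.
fill_right_mask <- function(df, trigger_cols, value = "-3") {
  m <- as.matrix(df)
  pos <- match(trigger_cols, colnames(m))

  # start[i] = leftmost trigger position holding `value` in row i (Inf if none).
  # Going from right to left lets earlier columns overwrite later ones.
  start <- rep(Inf, nrow(m))
  for (p in sort(pos, decreasing = TRUE)) {
    start[which(m[, p] == value)] <- p
  }

  # fill[i, j] is TRUE when column j is at or to the right of start[i]
  fill <- outer(start, seq_len(ncol(m)), "<=")
  m[fill] <- value

  as.data.frame(m, stringsAsFactors = FALSE)
}

# Same call shape as before, using the data and columns from the question:
# fill_right_mask(data, c("B", "E"))
```

The result is a plain data frame, so it would need as_tibble() at the end to fit into a tidyverse pipeline.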