Home > database >  Conditional mutate across multiple columns using values from other columns - looking for an efficien
Conditional mutate across multiple columns using values from other columns - looking for an efficien

Time:09-12

I like to replace the values of one column with the values of another column if a certain condition is met. Below, you find a toy example where I am first achieving this goal manually and then draft a somewhat clumsy programmatic solution to the problem. (Of course, my real data contain much more variables and also require a more complex conditional replacement of values)

library(dplyr)
library(purrr)

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## example data
set.seed(123)
df <- tibble(
  x = sample(0:1, 10, replace = T),
  x_99 = sample(3:4, 10, replace = T),
  y = sample(0:1, 10, replace = T),
  y_99 = sample(3:4, 10, replace = T)
)

df
#> # A tibble: 10 × 4
#>        x  x_99     y  y_99
#>    <int> <int> <int> <int>
#>  1     0     4     0     3
#>  2     0     4     1     4
#>  3     0     4     0     3
#>  4     1     3     0     4
#>  5     0     4     0     4
#>  6     1     3     0     3
#>  7     1     4     1     3
#>  8     1     3     1     3
#>  9     0     3     0     3
#> 10     0     3     1     4

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# manual mutate

df |>
  transmute(
    x = ifelse(x == 0, x_99, x),
    y = ifelse(y == 0, y_99, y)
  )
#> # A tibble: 10 × 2
#>        x     y
#>    <int> <int>
#>  1     4     3
#>  2     4     1
#>  3     4     3
#>  4     1     4
#>  5     4     4
#>  6     1     3
#>  7     1     1
#>  8     1     1
#>  9     3     3
#> 10     3     1

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# verbose programmatic solution

helper <- function(df, x, xnew) {
  df[df[x] == 0, ][, x] <- df[df[x] == 0, ][, xnew]
  return(tibble(df[x]))
}

col.vec1 <- c("x", "y")
col.vec2 <- paste0(col.vec1, "_99")

map2(
  col.vec1, col.vec2,
  ~ helper(df, .x, .y)
) |>
  bind_cols()
#> # A tibble: 10 × 2
#>        x     y
#>    <int> <int>
#>  1     4     3
#>  2     4     1
#>  3     4     3
#>  4     1     4
#>  5     4     4
#>  6     1     3
#>  7     1     1
#>  8     1     1
#>  9     3     3
#> 10     3     1

Created on 2022-09-11 by the reprex package (v2.0.1)

I am actually doing my data cleaning mainly with tidyverse tools. But in this case, this was definitely beyond my skills. I still found a solution, but I am pretty sure that there is a much more elegant and less verbose one. I very much appreciate your suggestions.

CodePudding user response:

I think across with a peek at cur_column() will work for you. The only thing you need to really change is the c(x, y) to some clear tidy way of identifying the columns.

df %>%
  mutate(
    across(c(x, y), ~ if_else(. == 0L, get(paste0(cur_column(), "_99")), .))
  )
# # A tibble: 10 x 4
#        x  x_99     y  y_99
#    <int> <int> <int> <int>
#  1     4     4     3     3
#  2     4     4     1     4
#  3     4     4     3     3
#  4     1     3     4     4
#  5     4     4     4     4
#  6     1     3     3     3
#  7     1     4     1     3
#  8     1     3     1     3
#  9     3     3     3     3
# 10     3     3     1     4
  • Related