I like to replace the values of one column with the values of another column if a certain condition is met. Below, you find a toy example where I am first achieving this goal manually and then draft a somewhat clumsy programmatic solution to the problem. (Of course, my real data contain much more variables and also require a more complex conditional replacement of values)
library(dplyr)
library(purrr)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## example data
set.seed(123)
df <- tibble(
x = sample(0:1, 10, replace = T),
x_99 = sample(3:4, 10, replace = T),
y = sample(0:1, 10, replace = T),
y_99 = sample(3:4, 10, replace = T)
)
df
#> # A tibble: 10 × 4
#> x x_99 y y_99
#> <int> <int> <int> <int>
#> 1 0 4 0 3
#> 2 0 4 1 4
#> 3 0 4 0 3
#> 4 1 3 0 4
#> 5 0 4 0 4
#> 6 1 3 0 3
#> 7 1 4 1 3
#> 8 1 3 1 3
#> 9 0 3 0 3
#> 10 0 3 1 4
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# manual mutate
df |>
transmute(
x = ifelse(x == 0, x_99, x),
y = ifelse(y == 0, y_99, y)
)
#> # A tibble: 10 × 2
#> x y
#> <int> <int>
#> 1 4 3
#> 2 4 1
#> 3 4 3
#> 4 1 4
#> 5 4 4
#> 6 1 3
#> 7 1 1
#> 8 1 1
#> 9 3 3
#> 10 3 1
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# verbose programmatic solution
helper <- function(df, x, xnew) {
df[df[x] == 0, ][, x] <- df[df[x] == 0, ][, xnew]
return(tibble(df[x]))
}
col.vec1 <- c("x", "y")
col.vec2 <- paste0(col.vec1, "_99")
map2(
col.vec1, col.vec2,
~ helper(df, .x, .y)
) |>
bind_cols()
#> # A tibble: 10 × 2
#> x y
#> <int> <int>
#> 1 4 3
#> 2 4 1
#> 3 4 3
#> 4 1 4
#> 5 4 4
#> 6 1 3
#> 7 1 1
#> 8 1 1
#> 9 3 3
#> 10 3 1
Created on 2022-09-11 by the reprex package (v2.0.1)
I am actually doing my data cleaning mainly with tidyverse tools. But in this case, this was definitely beyond my skills. I still found a solution, but I am pretty sure that there is a much more elegant and less verbose one. I very much appreciate your suggestions.
CodePudding user response:
I think across
with a peek at cur_column()
will work for you. The only thing you need to really change is the c(x, y)
to some clear tidy way of identifying the columns.
df %>%
mutate(
across(c(x, y), ~ if_else(. == 0L, get(paste0(cur_column(), "_99")), .))
)
# # A tibble: 10 x 4
# x x_99 y y_99
# <int> <int> <int> <int>
# 1 4 4 3 3
# 2 4 4 1 4
# 3 4 4 3 3
# 4 1 3 4 4
# 5 4 4 4 4
# 6 1 3 3 3
# 7 1 4 1 3
# 8 1 3 1 3
# 9 3 3 3 3
# 10 3 3 1 4