Replace values in multiple columns based on a second set of columns-CodePudding

I have two 'sets' of binary variables in a R data frame (X and Y). The X variables contain the information if the value in Y is valid (1) or not (0). I would like to use the X variables as a mask and replace all invalid cells of the Y variables with NA, where the corresponding X variable is 0.

Simplified example:

input <- tibble(X1 = c(1,1,0,1), X2 = c(1,1,1,0), X3 = c(0,0,0,1), Y1 = c(1,0,1,1), Y2 = c(1,1,0,0), Y3 = c(0,0,0,0))
output <- tibble(Z1 = c(1,0,NA,1), Z2 = c(1,1,0,NA), Z3 = c(NA,NA,NA,0))

CodePudding user response：

Here’s a base R solution:

output <- input[, 4:6]
output[input[, 1:3] == 0] <- NA
names(output) <- paste0("Z", 1:3)

Result:

# A tibble: 4 × 3
     Z1    Z2    Z3
  <dbl> <dbl> <dbl>
1     1     1    NA
2     0     1    NA
3    NA     0    NA
4     1    NA     0

CodePudding user response：

We can use the tidyverse.

library(stringr)
library(purrr)
library(dplyr)

input %>%
    split.default(str_remove_all(names(.), '\\d ')) %>%
    pmap_dfc(~ifelse(.x==1, .y, NA)) %>%
    rename_with(~paste0('Z', seq_len(length(.))))

# A tibble: 4 × 3
     Z1    Z2    Z3
  <dbl> <dbl> <dbl>
1     1     1    NA
2     0     1    NA
3    NA     0    NA
4     1    NA     0