I have two 'sets' of binary variables in a R data frame (X and Y). The X variables contain the information if the value in Y is valid (1) or not (0). I would like to use the X variables as a mask and replace all invalid cells of the Y variables with NA, where the corresponding X variable is 0.
Simplified example:
input <- tibble(X1 = c(1,1,0,1), X2 = c(1,1,1,0), X3 = c(0,0,0,1), Y1 = c(1,0,1,1), Y2 = c(1,1,0,0), Y3 = c(0,0,0,0))
output <- tibble(Z1 = c(1,0,NA,1), Z2 = c(1,1,0,NA), Z3 = c(NA,NA,NA,0))
CodePudding user response:
Here’s a base R solution:
output <- input[, 4:6]
output[input[, 1:3] == 0] <- NA
names(output) <- paste0("Z", 1:3)
Result:
# A tibble: 4 × 3
Z1 Z2 Z3
<dbl> <dbl> <dbl>
1 1 1 NA
2 0 1 NA
3 NA 0 NA
4 1 NA 0
CodePudding user response:
We can use the tidyverse.
library(stringr)
library(purrr)
library(dplyr)
input %>%
split.default(str_remove_all(names(.), '\\d ')) %>%
pmap_dfc(~ifelse(.x==1, .y, NA)) %>%
rename_with(~paste0('Z', seq_len(length(.))))
# A tibble: 4 × 3
Z1 Z2 Z3
<dbl> <dbl> <dbl>
1 1 1 NA
2 0 1 NA
3 NA 0 NA
4 1 NA 0