I am working with survey data and would like to fill in missing values in one survey item (column) from another survey item, while keeping any values that are already present. For example: replace Q2_1.x with Q2_1.y wherever Q2_1.x is missing.
Here is an example of my data:
library(dplyr)
library(readr)
org_dat <- read_table('ID Q2_1.x Q2_2.x Q2_1.y Q2_2.y Q14_1.x Q14_1.y Q15
1 Yes NA NA NA Sometimes NA NA
2 -99 NA No NA NA Always Yes
3 NA NA NA NA NA NA NA
4 NA NA NA No NA NA No
5 NA NA NA NA NA Always NA
6 NA NA NA No NA NA NA') %>% mutate_all(as.character)
Here is my desired output:
dat_out <- read_table('ID Q2_1 Q2_2 Q14_1 Q15
1 Yes NA Sometimes NA
2 No NA Always Yes
3 NA NA NA NA
4 NA No NA No
5 NA NA Always NA
6 NA No NA NA')
Current solution: I know that I can replace each of these columns individually, as in the code below, but I have a lot of columns to deal with and would like a smarter dplyr/grepl way of solving this. Any ideas? It is always the case that a Q*.x column is filled from the matching Q*.y column.
org_dat %>%
  mutate(Q2_1.x = case_when(is.na(Q2_1.x) ~ Q2_1.y,
                            TRUE ~ Q2_1.x)) %>%
  mutate(Q2_2.x = case_when(is.na(Q2_2.x) ~ Q2_2.y,
                            TRUE ~ Q2_2.x)) %>%
  mutate(Q14_1.x = case_when(is.na(Q14_1.x) ~ Q14_1.y,
                             TRUE ~ Q14_1.x)) %>%
  rename(Q2_1 = Q2_1.x,
         Q2_2 = Q2_2.x,
         Q14_1 = Q14_1.x) %>%
  select(-matches("x|y"))
CodePudding user response:
Here is an option with across() and coalesce(). Loop across the columns whose names end with 'x' (ends_with), replace the 'x' in the current column name (cur_column()) with 'y' using str_replace(), fetch that column's values with get(), and coalesce() them with the column being looped over. Finally, remove the suffix from the new column name in .names.
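As a minimal illustration of the building block used here, this sketch shows what coalesce() does on two plain vectors. The vectors x and y are made up for the example and are not taken from org_dat.

library(dplyr)

# Hypothetical stand-ins for one .x column and its matching .y column
x <- c("Yes", NA, NA)
y <- c(NA, "No", NA)

# Element-wise: keep x where it is not NA, otherwise fall back to y.
# Positions that are NA in both inputs stay NA.
filled <- coalesce(x, y)
filled

The across() version applies this same fallback once per .x/.y pair, so no column has to be named by hand.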
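library(dplyr)
library(stringr)
org_dat %>%
  mutate(across(ends_with("x"),
                # fill NAs in each .x column from the matching .y column
                ~ coalesce(., get(str_replace(cur_column(), "x", "y"))),
                # name the result after the item, dropping the '.x' suffix
                .names = "{str_remove(.col, '.x')}"),
         # drop the source .x/.y columns; put the new ones right after ID
         .keep = "unused", .before = 2)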
-output
# A tibble: 6 × 5
  ID    Q2_1  Q2_2  Q14_1     Q15
  <chr> <chr> <chr> <chr>     <chr>
1 1     Yes   <NA>  Sometimes <NA>
2 2     No    <NA>  Always    Yes
3 3     <NA>  <NA>  <NA>      <NA>
4 4     <NA>  No    <NA>      No
5 5     <NA>  <NA>  Always    <NA>
6 6     <NA>  No    <NA>      <NA>
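One caveat about the sample data: ID 2 has -99 in Q2_1.x, while the desired output shows No for that cell. coalesce() only skips NA, so a literal "-99" counts as a real value and is kept. The sketch below assumes that "-99" is a missing-value code (the question does not say so explicitly) and converts it to NA with na_if() before coalescing. It also matches the suffix literally with fixed(".x"), so an 'x' elsewhere in a column name is left alone. The tibble toy is a hypothetical stand-in for org_dat, and the explicit select() is just one way to drop the source columns.

library(dplyr)
library(stringr)

# Hypothetical miniature of org_dat with the same naming scheme
toy <- tibble(
  ID     = c("1", "2", "3"),
  Q2_1.x = c("Yes", "-99", NA),
  Q2_1.y = c(NA, "No", NA),
  Q15    = c(NA, "Yes", NA)
)

res <- toy %>%
  # Assumption: "-99" means "missing" in the .x columns
  mutate(across(ends_with(".x"), ~ na_if(.x, "-99"))) %>%
  # Fill each .x column from its .y partner; name the result without the suffix
  mutate(across(
    ends_with(".x"),
    ~ coalesce(.x, get(str_replace(cur_column(), fixed(".x"), ".y"))),
    .names = "{str_remove(.col, fixed('.x'))}"
  )) %>%
  # Drop the source columns explicitly
  select(-ends_with(".x"), -ends_with(".y"))

res

If -99 is a legitimate answer in some items, restrict the na_if() step to the columns where it really is a missing code.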