Replace one column with another using regex matching in R

Time:01-27

I am working with survey data and would like to fill one survey item (column) from another wherever the first is missing, while keeping its existing non-missing values. For example, replace Q2_1.x with Q2_1.y if Q2_1.x is missing.

Here is an example of my data:

library(readr)
library(dplyr)

org_dat <- read_table('ID   Q2_1.x  Q2_2.x  Q2_1.y  Q2_2.y  Q14_1.x Q14_1.y Q15
1   Yes NA  NA  NA  Sometimes   NA  NA
2   -99 NA  No  NA  NA  Always  Yes
3   NA  NA  NA  NA  NA  NA  NA
4   NA  NA  NA  No  NA  NA  No
5   NA  NA  NA  NA  NA  Always  NA
6   NA  NA  NA  No  NA  NA  NA') %>% mutate(across(everything(), as.character))

Here is my desired output:

dat_out <- read_table('ID   Q2_1    Q2_2    Q14_1   Q15
1   Yes NA  Sometimes   NA
2   No  NA  Always  Yes
3   NA  NA  NA  NA
4   NA  No  NA  No
5   NA  NA  Always  NA
6   NA  No  NA  NA')

Current solution

I know that I can replace each of these columns individually, but I have many columns to deal with and would like a smarter dplyr/regex-based way of solving this. Any ideas? I am always replacing Q*.x with Q*.y.

org_dat %>%
  mutate(Q2_1.x = case_when(is.na(Q2_1.x) ~ Q2_1.y,
                            TRUE ~ Q2_1.x)) %>%
  mutate(Q2_2.x = case_when(is.na(Q2_2.x) ~ Q2_2.y,
                            TRUE ~ Q2_2.x)) %>%
  mutate(Q14_1.x = case_when(is.na(Q14_1.x) ~ Q14_1.y,
                             TRUE ~ Q14_1.x)) %>%
  rename(Q2_1 = Q2_1.x,
         Q2_2 = Q2_2.x,
         Q14_1 = Q14_1.x) %>%
  select(-matches("x|y"))

Answer:

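Here is an option with across() and coalesce(). Loop over the .x columns (selected with ends_with()). For each one, build the name of its partner column by replacing the x suffix in cur_column() with y via str_replace(), fetch that column with get(), and coalesce() the two so that missing values in the .x column are filled from the .y column. The .names argument then strips the suffix from the new column names, .keep = "unused" drops the original .x and .y columns, and .before = 2 places the results right after ID.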

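library(dplyr)
library(stringr)

org_dat %>%
  mutate(
    # For each ".x" column, fill its NAs from the matching ".y" column
    across(ends_with(".x"),
           ~ coalesce(.x, get(str_replace(cur_column(), "[.]x$", ".y"))),
           # Name the result after the stem, e.g. Q2_1.x -> Q2_1
           .names = "{str_remove(.col, '[.]x$')}"),
    # Drop the source .x/.y columns and place the results after ID
    .keep = "unused", .before = 2)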

-output

# A tibble: 6 × 5
  ID    Q2_1  Q2_2  Q14_1     Q15  
  <chr> <chr> <chr> <chr>     <chr>
1 1     Yes   <NA>  Sometimes <NA> 
2 2     No    <NA>  Always    Yes  
3 3     <NA>  <NA>  <NA>      <NA> 
4 4     <NA>  No    <NA>      No   
5 5     <NA>  <NA>  Always    <NA> 
6 6     <NA>  No    <NA>      <NA> 
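
One caveat: coalesce() only fills NA, so as far as I can tell the -99 in row 2 of Q2_1.x would be kept unless it is recoded first, whereas the desired output shows No in that cell. Below is a minimal sketch of one way to handle that, assuming -99 is a "missing" code (an inference from the desired output, not something the question states). The small tibble is a hypothetical stand-in for org_dat, and the patterns are anchored with [.]x$ so that only a trailing ".x" is matched.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the first three rows of org_dat
org_dat <- tibble(
  ID      = c("1", "2", "3"),
  Q2_1.x  = c("Yes", "-99", NA),
  Q2_1.y  = c(NA, "No", NA),
  Q14_1.x = c("Sometimes", NA, NA),
  Q14_1.y = c(NA, "Always", NA),
  Q15     = c(NA, "Yes", NA)
)

dat_out <- org_dat %>%
  # Assumption: "-99" means missing, so recode it to NA in the .x columns
  mutate(across(ends_with(".x"), ~ na_if(.x, "-99"))) %>%
  mutate(
    # Fill each .x column from its .y partner, naming the result by the stem
    across(ends_with(".x"),
           ~ coalesce(.x, get(str_replace(cur_column(), "[.]x$", ".y"))),
           .names = "{str_remove(.col, '[.]x$')}"),
    # Drop the .x/.y source columns; put the merged columns after ID
    .keep = "unused", .before = 2
  )

print(dat_out)
```

If other sentinel values are in use (or if -99 is a legitimate answer for some items), the na_if() step would need adjusting accordingly.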