Extract a string from multiple columns into a new column

Time:09-17

I want to find a word across several columns and put it in a new column.

"data" is an example input and "goal" is the output I want. I have tried a lot, but I couldn't get it to work.

 library(dplyr)
 library(stringr)

 data <- tibble(
    component1 = c(NA, NA, "Word", NA, NA, "Word"),
    component2 = c(NA, "Word", "different_word", NA, NA, "not_this")
    )

 goal <- tibble(
    component1 = c(NA, NA, "Word", NA, NA, "Word"),
    component2 = c(NA, "Word", "different_word", NA, NA, "not_this"),
    component = c(NA, "Word", "Word", NA, NA, "Word")
    )


not_working <- data %>%
     mutate(component = across(starts_with("component"), ~ str_extract(.x, "Word")))

CodePudding user response:

For your provided data structure we could use coalesce:

library(dplyr)

data %>% 
  mutate(component = coalesce(component1, component2))
  component1 component2     component
  <chr>      <chr>          <chr>    
1 NA         NA             NA       
2 NA         Word           Word     
3 Word       different_word Word     
4 NA         NA             NA       
5 NA         NA             NA       
6 Word       not_this       Word     
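
Note that coalesce() simply returns the first non-missing value in each row, so the approach above relies on component1 only ever holding NA or "Word". A small sketch of that caveat, using a made-up tibble called other (hypothetical, not from the question):

```r
library(dplyr)

# Hypothetical data: component1 holds a non-matching string in row 1
other <- tibble(
  component1 = c("not_this", NA),
  component2 = c("Word", "Word")
)

res <- other %>%
  mutate(component = coalesce(component1, component2))

# Row 1 keeps "not_this" because it is the first non-NA value in that row
res$component
```

If your real data can contain other strings in the first column, you would probably want to extract or filter for "Word" before coalescing.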

CodePudding user response:

With if_any and str_detect:

library(dplyr)
library(stringr)
data %>% 
  mutate(component = ifelse(if_any(starts_with("component"), ~ str_detect(.x, "Word")), "Word", NA))

output

  component1 component2     component
  <chr>      <chr>          <chr>    
1 NA         NA             NA       
2 NA         Word           Word     
3 Word       different_word Word     
4 NA         NA             NA       
5 NA         NA             NA       
6 Word       not_this       Word   

If you want to stick with str_extract, this is one way to do it:

data %>%
  mutate(across(starts_with("component"), ~ str_extract(.x, "Word"),
         .names = "{.col}_extract")) %>% 
  mutate(component = coalesce(component1_extract, component2_extract),
         .keep = "unused")
# A tibble: 6 × 3
  component1     component2     component
  <chr>          <chr>          <chr>    
1 NA             NA             NA       
2 NA             Word           Word     
3 Word           different_word Word     
4 NA             NA             NA       
5 NA             NA             NA       
6 Word           not_this       Word     
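
With more than two component columns, naming every *_extract column by hand gets tedious. One possible sketch, assuming the same data as in the question, passes all extracted columns to coalesce() via do.call(). The unname() call is only a precaution so the columns are handed over positionally:

```r
library(dplyr)
library(stringr)

data <- tibble(
  component1 = c(NA, NA, "Word", NA, NA, "Word"),
  component2 = c(NA, "Word", "different_word", NA, NA, "not_this")
)

result <- data %>%
  mutate(
    component = do.call(
      coalesce,
      # across() returns a tibble of extracted values, one column per input column
      unname(as.list(across(starts_with("component"), ~ str_extract(.x, "Word"))))
    )
  )

result$component
```

This avoids the intermediate *_extract columns, so no .keep = "unused" cleanup is needed.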