Return similar characters of two strings in R

I have the following dataframe called df (dput below):

  string1 string2
1     bDe     lDC
2      gj      iE
3     Plm     DOl
4    QWVe    dVtQ

I would like to return the similar character(s), i.e. the characters that occur in both string1 and string2 of the same row. If there are multiple similar characters, it should return them all. The desired output should look like this:

  string1 string2 similar
1     bDe     lDC       D
2      gj      iE    <NA>
3     Plm     DOl       l
4    QWVe    dVtQ      VQ

So I was wondering if anyone knows how to return similar characters of strings in R?

dput of df:

df <- structure(list(string1 = c("bDe", "gj", "Plm", "QWVe"), string2 = c("lDC", 
"iE", "DOl", "dVtQ")), class = "data.frame", row.names = c(NA, 
-4L))

Answer:

One way would be to use str_extract_all() and turn each string2 into a regex character class (e.g. "[lDC]"), so that every character of string1 that also occurs in string2 is extracted:

library(stringr)
library(dplyr)
library(purrr)

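df %>%
  # sprintf() wraps string2 in a character class such as "[lDC]";
  # str_extract_all() pulls out the matching characters of string1 and
  # str_flatten() pastes them into a single string per row
  mutate(similar = map_chr(str_extract_all(string1, sprintf("[%s]", string2)), str_flatten))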

  string1 string2 similar
1     bDe     lDC       D
2      gj      iE        
3     Plm     DOl       l
4    QWVe    dVtQ      QV
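Two small caveats with this approach. The desired output has <NA> for row 2, whereas str_flatten() returns an empty string when nothing matched; dplyr's na_if() can convert that afterwards. Also, because string2 is pasted into a character class, characters such as ], [, ^, - or \ in string2 would be read as regex syntax rather than literally, so this is only safe for plain letters like the ones in this example. A sketch of the NA conversion, assuming the df from the question (rebuilt here so the snippet runs on its own):

```r
library(stringr)
library(dplyr, warn.conflicts = FALSE)
library(purrr)

df <- data.frame(string1 = c("bDe", "gj", "Plm", "QWVe"),
                 string2 = c("lDC", "iE", "DOl", "dVtQ"))

out <- df %>%
  mutate(
    # same extraction as above: string2 becomes a class such as "[lDC]"
    similar = map_chr(str_extract_all(string1, sprintf("[%s]", string2)), str_flatten),
    # rows without any shared character give "", which is turned into NA here
    similar = na_if(similar, "")
  )
out
```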

Answer:

A second option would be to split both strings into single characters with strsplit() and keep their intersect():

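library(dplyr, warn.conflicts = FALSE)
library(purrr)

df |>
  mutate(similar = purrr::map2_chr(
    strsplit(string1, split = ""),  # string1 as a vector of single characters
    strsplit(string2, split = ""),  # same for string2
    # keep the characters present in both (in string1's order) and paste them back
    ~ paste(intersect(.x, .y), collapse = "")))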
#>   string1 string2 similar
#> 1     bDe     lDC       D
#> 2      gj      iE        
#> 3     Plm     DOl       l
#> 4    QWVe    dVtQ      QV
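If you would rather not depend on dplyr and purrr, the same strsplit()/intersect() idea can be written in base R with mapply(). This is a sketch: the helper name shared_chars() is just an illustrative choice, and it returns NA instead of an empty string to match the desired output.

```r
df <- data.frame(string1 = c("bDe", "gj", "Plm", "QWVe"),
                 string2 = c("lDC", "iE", "DOl", "dVtQ"))

# Hypothetical helper: characters of a that also occur in b (in a's order,
# without duplicates), or NA when the two strings share nothing
shared_chars <- function(a, b) {
  common <- intersect(strsplit(a, "")[[1]], strsplit(b, "")[[1]])
  if (length(common) == 0) NA_character_ else paste(common, collapse = "")
}

# Apply row by row; USE.NAMES = FALSE keeps string1 out of the result's names
df$similar <- mapply(shared_chars, df$string1, df$string2, USE.NAMES = FALSE)
df
```

One behavioural difference between the two answers: intersect() drops duplicates, so "aab" against "a" gives "a", whereas the str_extract_all() version keeps every matching occurrence and gives "aa".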