I have the following dataframe called df (dput below):
string1 string2
1 bDe lDC
2 gj iE
3 Plm DOl
4 QWVe dVtQ
I would like to return the character(s) that the columns string1 and string2 have in common. If there are multiple common characters, it should return them all. The desired output should look like this:
string1 string2 similar
1 bDe lDC D
2 gj iE <NA>
3 Plm DOl l
4 QWVe dVtQ VQ
So I was wondering if anyone knows how to return similar characters of strings in R?
dput of df:
df <- structure(list(string1 = c("bDe", "gj", "Plm", "QWVe"), string2 = c("lDC",
"iE", "DOl", "dVtQ")), class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
One way would be to use str_extract_all() and use the second string as a regex pattern:
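library(stringr)
library(dplyr)
library(purrr)
# Wrap string2 in [] so it acts as a regex character class,
# extract every matching character from string1, then glue them together
df %>%
  mutate(similar = map_chr(str_extract_all(string1, sprintf("[%s]", string2)), str_flatten))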
string1 string2 similar
1 bDe lDC D
2 gj iE
3 Plm DOl l
4 QWVe dVtQ QV
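Because string2 is pasted into a character class, the pattern above could misbehave if string2 ever contained characters that are special inside [], such as ], ^, - or \. A minimal regex-free sketch in base R that gives the same result for this df is shown here; common_chars_keep_order is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: keep every character of x that also occurs in y,
# in the order (and with any repeats) found in x. No regex is involved,
# so characters such as "]" or "^" in y need no escaping.
common_chars_keep_order <- function(x, y) {
  mapply(function(a, b) {
    chars <- strsplit(a, "")[[1]]
    paste(chars[chars %in% strsplit(b, "")[[1]]], collapse = "")
  }, x, y, USE.NAMES = FALSE)
}

df <- data.frame(
  string1 = c("bDe", "gj", "Plm", "QWVe"),
  string2 = c("lDC", "iE", "DOl", "dVtQ")
)
df$similar <- common_chars_keep_order(df$string1, df$string2)
df
```

Like the str_extract_all() version, this returns an empty string when the two strings share no characters.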
CodePudding user response:
A second option would be to use strsplit and intersect:
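library(dplyr, warn.conflicts = FALSE)
library(purrr)
# Split both strings into single characters, then keep the
# unique characters that occur in both (in string1 order)
df |>
  mutate(similar = purrr::map2_chr(
    strsplit(string1, split = ""),
    strsplit(string2, split = ""),
    ~ paste(intersect(.x, .y), collapse = "")))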
#> string1 string2 similar
#> 1 bDe lDC D
#> 2 gj iE
#> 3 Plm DOl l
#> 4 QWVe dVtQ QV
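Both answers give an empty string for row 2, whereas the desired output in the question shows <NA> there. A minimal base R sketch of the strsplit/intersect idea that turns empty results into NA could look like this (the intermediate name similar is only for illustration):

```r
df <- data.frame(
  string1 = c("bDe", "gj", "Plm", "QWVe"),
  string2 = c("lDC", "iE", "DOl", "dVtQ")
)

# Characters common to both strings, one result per row
similar <- mapply(
  function(a, b) paste(intersect(a, b), collapse = ""),
  strsplit(df$string1, ""),
  strsplit(df$string2, "")
)

# Replace empty results with NA so that "no common character" is explicit
similar[!nzchar(similar)] <- NA_character_

df$similar <- similar
df
```

Note that intersect() drops repeated characters, and the result follows the character order of string1, so row 4 comes out as QV rather than VQ.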