Search for string in one column using strings from another column in another dataframe in R

Time:04-11

I have two dataframes, each with a single column. For every row of the second dataframe's column (Description), I want to check whether it contains any of the strings in the first dataframe's column (SName). If it does, the matched string should go in a new column ("String"), and a second new column ("Match") should say Yes or No. I tried grepl and a few stringr functions but could not make it work. Thanks!

Sample below:

1st Dataframe

SName
svc1
svc123
svc567

2nd Dataframe

Description
- ls svc368 -@#@#
mkdir test svc #*-/
mkdir df2 svc123 #*-/
mkdir random svc1 #*-/
mkdir test svc1 *&%^$%$
mkdir fr svc567 *&%@
mkdir 82 svc56 *&??//
mkdir kol svc *&

Result desired:

Description                Match   String
- ls svc368 -@#@#          No
mkdir test svc #*-/        No
mkdir df2 svc123 #*-/      Yes     svc123
mkdir random svc1 #*-/     Yes     svc1
mkdir test svc1 *&%^$%$    Yes     svc1
mkdir fr svc567 *&%@       Yes     svc567
mkdir 82 svc56 *&??//      No
mkdir kol svc *&           No

Answer:

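One approach is to build a regex alternation from the terms in the first dataframe and wrap it in word boundaries (\b), so that a name such as svc1 is not matched inside a longer token like svc123. Then use grepl to fill the Yes/No column and sub to pull the matched term out into the String column.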

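# Alternation of all names, anchored by word boundaries
regex <- paste0("\\b(", paste(df1$SName, collapse="|"), ")\\b")
# Evaluate the match once and reuse it for both columns
found <- grepl(regex, df2$Description)
df2$Match <- ifelse(found, "Yes", "No")
# Keep only the captured name where a match exists; empty string otherwise
df2$String <- ifelse(found,
                     sub(paste0(".*", regex, ".*"), "\\1", df2$Description),
                     "")
df2

            Description Match String
1     - ls svc368 -@#@#    No       
2   mkdir test svc #*-/    No       
3 mkdir df2 svc123 #*-/   Yes svc123
...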
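
For anyone who wants to run this end to end, here is a minimal self-contained sketch in base R. It rebuilds the two sample dataframes from the question and uses regexpr with regmatches instead of sub, which is a swapped-in variation rather than the method shown above: it extracts the first matching name per row directly. It assumes the names in SName contain no regex metacharacters (true for the sample data); names containing characters such as "." or "+" would need escaping first. The helper variables (pattern, hits, found) are illustrative names only.

```r
# Sample data copied from the question.
df1 <- data.frame(SName = c("svc1", "svc123", "svc567"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Description = c(
  "- ls svc368 -@#@#",
  "mkdir test svc #*-/",
  "mkdir df2 svc123 #*-/",
  "mkdir random svc1 #*-/",
  "mkdir test svc1 *&%^$%$",
  "mkdir fr svc567 *&%@",
  "mkdir 82 svc56 *&??//",
  "mkdir kol svc *&"
), stringsAsFactors = FALSE)

# Assumption: SName values contain no regex metacharacters.
pattern <- paste0("\\b(", paste(df1$SName, collapse = "|"), ")\\b")

# regexpr() gives the start of the first match in each row, or -1 if none.
hits <- regexpr(pattern, df2$Description)
found <- as.vector(hits) > 0

df2$Match <- ifelse(found, "Yes", "No")

# regmatches() returns the matched substrings only for rows that matched,
# in row order, so they can be assigned back through the logical index.
df2$String <- ""
df2$String[found] <- regmatches(df2$Description, hits)

print(df2)
```

If a row could contain more than one name and all of them are needed, gregexpr could replace regexpr, at the cost of getting a list back instead of a plain vector.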