Is there a way to show the matching element of a specific case using the grepl function in R?

I checked whether the brands of the data frame "df1"

 brands
1 Nike   
2 Adidas  
3 D&G

are to be found in the elements of the following column of the data frame "df2"

statements 
1 I love Nike   
2 I don't like Adidas   
3 I hate Puma

For this I use the code:

subset_df2 <- df2[grepl(paste(df1$brands, collapse = "|"), df2$statements, ignore.case = TRUE), ]

The code works and I get a subset of df2 containing only the lines with the desired brands:

 statements
1 I love Nike   
2 I don't like Adidas

Is there also a way to display which part of each cell in df2$statements matched one of the df1$brands? For instance, a vector like [Nike, Adidas]. I only want the matched brands (Nike and Adidas) as my output, not the whole statements.

Many thanks in advance!

User response:

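brands <- c("nike", "adidas", "d&g")  # lower-case here; matching below ignores case
text <- c("I love Nike", "I love Adidas")
ptns <- paste(brands, collapse = "|")  # one pattern: any of the brands
ptns
# [1] "nike|adidas|d&g"
text2 <- text[NA]  # all-NA character vector, same length as text
hit <- grepl(ptns, text, ignore.case = TRUE)
# run gsub only on the matching elements so both sides of the assignment have the same length
text2[hit] <- gsub(paste0(".*(", ptns, ").*"), "\\1", text[hit], ignore.case = TRUE)
text2
# [1] "Nike"   "Adidas"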

text2 is pre-assigned as all NA because gsub returns its input unchanged when the pattern is not found, so a non-matching element would otherwise keep its full text instead of becoming NA. I'm using text[NA], but rep(NA_character_, length(text)) has the same effect.
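As an alternative that avoids the pre-assignment, here is a minimal sketch using regexpr() with regmatches(), which returns only the matched substrings and drops the elements without a match. The data frames are rebuilt from the question; the column name brand and the variables m and matched are my own and only illustrative.

```r
# Data frames as described in the question
df1 <- data.frame(brands = c("Nike", "Adidas", "D&G"), stringsAsFactors = FALSE)
df2 <- data.frame(statements = c("I love Nike", "I don't like Adidas", "I hate Puma"),
                  stringsAsFactors = FALSE)

ptns <- paste(df1$brands, collapse = "|")

# regexpr() gives the position of the first match per element (-1 if none)
m <- regexpr(ptns, df2$statements, ignore.case = TRUE)

# regmatches() extracts the matched text and skips non-matching elements
matched <- regmatches(df2$statements, m)
matched
# [1] "Nike"   "Adidas"

# Optional: keep the result aligned with the rows of df2 (NA where nothing matched)
df2$brand <- NA_character_
df2$brand[m > 0] <- matched
```

Note that regexpr() reports only the first match in each statement, so this fits the one-brand-per-statement case.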

If you need multiple matches per text, then perhaps use gregexpr with regmatches:

brands <- c("Nike", "Adidas", "d&g")
text <- c("I love nike", "I love Adidas and Nike")
ptns <- paste(brands, collapse = "|") 
gre <- gregexpr(ptns, text, ignore.case = TRUE)
sapply(regmatches(text, gre), paste, collapse = ";")
# [1] "nike"        "Adidas;Nike"
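The result above keeps the spelling found in the text ("nike" rather than "Nike"), and the brands are used as regular expressions, which could misbehave for a brand containing characters such as "." or "+". A possible variation, sketched under those assumptions, checks each brand as a literal string and reports the spelling from the brands vector. The third text and the names found and is_hit are added here for illustration.

```r
brands <- c("Nike", "Adidas", "D&G")
text <- c("I love nike", "I love Adidas and Nike", "I hate Puma")

# For each text, test every brand as a literal (fixed = TRUE) substring.
# tolower() on both sides makes the comparison case-insensitive, since
# ignore.case cannot be combined with fixed = TRUE.
found <- lapply(text, function(t) {
  is_hit <- vapply(brands,
                   function(b) grepl(tolower(b), tolower(t), fixed = TRUE),
                   logical(1))
  brands[is_hit]  # brands appear in the order of the brands vector
})

sapply(found, paste, collapse = ";")
# [1] "Nike"        "Nike;Adidas" ""
```

Keeping found as a list leaves the matches as separate elements; the paste step is only needed if one string per text is wanted. This is plain substring matching, so a brand would also be found inside a longer word.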