How to extract all matches of a pattern and combine distinct matches using R?

I would like to extract all the matches from a string using a regex pattern, and then combine only the distinct matches into a single string.

I want to extract all the words that precede the word "films" and then combine only the distinct words. I'm trying the following script, which combines all the matches, duplicates included:

library(stringr)  # str_extract_all()
library(purrr)    # map_chr()

text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

map_chr(str_extract_all(text1, pattern), paste, collapse = " | ")

> 'Korean | Japanese | Korean'

Desired output:

'Korean | Japanese'

CodePudding user response：

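Try this, which takes the matches for the single input string and keeps only the unique ones before pasting: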

library(stringr)  # str_extract_all()

text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

paste(unique((str_extract_all(text1, pattern)[[1]])), collapse = " | ")

We get:

"Korean | Japanese"
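For readers without stringr installed, the same idea can be sketched in base R. This is an alternative technique, not the code from the answer above: it swaps `str_extract_all()` for `regmatches()` with `gregexpr()`, and the shortened `sample_text` is a made-up stand-in for the original string.

```r
# Shortened, ASCII-only sample text (an assumption for this sketch)
sample_text <- "34 new Korean films in 2023, along with 12 Japanese films and more Korean films."
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# perl = TRUE is required because the lookahead (?=...) is PCRE syntax
matches <- regmatches(sample_text, gregexpr(pattern, sample_text, perl = TRUE))[[1]]

# unique() keeps the first occurrence of each word, preserving order
result <- paste(unique(matches), collapse = " | ")
result
```

As with the stringr version, `[[1]]` picks out the matches for the first (and only) input string before `unique()` is applied.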

CodePudding user response：

Unlist the result, as in the code below, and then keep only the unique elements:

library(stringr)  # str_extract_all()

text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."

pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

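# paste() collapses the whole vector into one string; map_chr() would
# instead handle each unique word separately and return them unjoined
paste(unique(unlist(str_extract_all(text1, pattern))), collapse = " | ")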
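One caveat with `unlist()`: it pools the matches from every input string, which is fine here because there is only one string. If the input were a vector of several strings and the distinct words were wanted per string, something like the following sketch could work. The `texts` vector and the name `distinct_per_string` are made up for illustration.

```r
library(stringr)
library(purrr)

# Hypothetical input: two strings instead of one
texts <- c(
  "34 new Korean films and 12 Japanese films, plus more Korean films.",
  "A festival of French films and Italian films."
)
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# str_extract_all() returns one character vector per input string;
# deduplicate and collapse each vector separately
distinct_per_string <- map_chr(
  str_extract_all(texts, pattern),
  function(m) paste(unique(m), collapse = " | ")
)
distinct_per_string
```

Here `map_chr()` is a good fit because it iterates over the list of per-string matches and returns exactly one combined string for each input.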