I would like to extract all the matches from a string using a regex pattern, and then combine only the distinct
matches into a single string.
Specifically, I want to extract every word that precedes the word "films" and then combine only the distinct words. I'm trying the following script, but it combines all the matches, including duplicates:
library(stringr)
library(purrr)
text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"
map_chr(str_extract_all(text1, pattern), paste, collapse = " | ")
> 'Korean | Japanese | Korean'
Desired output:
'Korean | Japanese'
Answer:
Try this: take the matches for the first (and only) string with [[1]], keep the unique ones, and paste them together:
library(stringr)
text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"
paste(unique(str_extract_all(text1, pattern)[[1]]), collapse = " | ")
We get
"Korean | Japanese"
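The [[1]] above only keeps the matches from the first string. If you have a vector of several texts, a sketch that keeps the question's map_chr() approach and removes duplicates within each text might look like this. The example strings are hypothetical, and it assumes stringr and purrr are installed.

```r
library(stringr)
library(purrr)

# Hypothetical input: a vector of several texts instead of a single string
texts <- c(
  "Netflix announced 34 new Korean films, along with 12 Japanese films and more Korean films.",
  "Some French films and other French films.",
  "No matching word here."
)

# One or more letters, followed (lookahead) by whitespace and "films"
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# str_extract_all() returns one character vector of matches per text;
# unique() drops repeats within each text before paste() joins them
result <- map_chr(
  str_extract_all(texts, pattern),
  ~ paste(unique(.x), collapse = " | ")
)

result
```

For the three texts above this gives "Korean | Japanese", "French", and an empty string for the text with no match.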
Answer:
Unlist the result of str_extract_all() as in the code below, keep only the unique elements, and then paste them together:
library(stringr)
text1 <- "Netflix announced 34 new Korean films to hit the streaming platform in 2023, along with 12 Japanese films. The upcoming titles, which Netflix calls their “biggest-ever lineup of Korean films and series."
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"
paste(unique(unlist(str_extract_all(text1, pattern))), collapse = " | ")
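If you would rather avoid the stringr dependency, base R can do the same with gregexpr() and regmatches(). This is a sketch using a shortened, hypothetical version of text1. Note that perl = TRUE is needed, because base R's default regex engine does not support the (?=...) lookahead.

```r
# Hypothetical shortened version of text1
text1 <- "Netflix announced 34 new Korean films, along with 12 Japanese films and more Korean films."
pattern <- "\\b[[:alpha:]]+\\b(?=\\sfilms)"

# gregexpr() finds every match; perl = TRUE enables the (?=...) lookahead
matches <- regmatches(text1, gregexpr(pattern, text1, perl = TRUE))[[1]]

# Keep the first occurrence of each word, then join with " | "
result <- paste(unique(matches), collapse = " | ")
result
```

unique() keeps words in order of first appearance, so this returns "Korean | Japanese".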