I have a list of texts, and within these texts I want to retrieve all occurrences of a word (here, hospital) together with the n words (in this case, 4) after it. Here is my example:
library(stringr)
all_terms <- c("hospital santa clara bla bla bla bla bla hospital san francisco",
" blablabla ",
"hospital holy mary, bla bla bla hospital 9 de julho")
all_terms %>%
str_extract_all("hospital.\\w+") %>%
unlist()
[1] "hospital santa" "hospital san" "hospital holy" "hospital 9"
What I wanted:
[1] "hospital santa clara bla" "hospital san francisco" "hospital holy mary" "hospital 9 de julho"
CodePudding user response:
Try this, which matches hospital followed by one to three whitespace-separated words:
str_extract_all(all_terms, "hospital(\\s\\w+){1,3}")
[[1]]
[1] "hospital santa clara bla" "hospital san francisco"
[[2]]
character(0)
[[3]]
[1] "hospital holy mary" "hospital 9 de julho"
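If the number of trailing words needs to vary, the same pattern can be built from a parameter. The following is only a sketch: the helper name extract_with_following and its arguments are hypothetical (they are not part of stringr or of the answer above), and it assumes the keyword contains no regex metacharacters.

```r
library(stringr)

# Hypothetical helper: extract `keyword` plus up to `n` of the
# whitespace-separated words that follow it.
# Assumes `keyword` contains no regex metacharacters.
extract_with_following <- function(text, keyword, n) {
  # Builds e.g. "hospital(\\s\\w+){1,3}" for keyword = "hospital", n = 3
  pattern <- paste0(keyword, "(\\s\\w+){1,", n, "}")
  unlist(str_extract_all(text, pattern))
}

all_terms <- c("hospital santa clara bla bla bla bla bla hospital san francisco",
               " blablabla ",
               "hospital holy mary, bla bla bla hospital 9 de julho")

result <- extract_with_following(all_terms, "hospital", 3)
print(result)
#> [1] "hospital santa clara bla" "hospital san francisco"
#> [3] "hospital holy mary"       "hospital 9 de julho"
```

Because \\w does not match punctuation, a match stops at the comma after mary without any extra cleanup step.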
CodePudding user response:
library(stringr)
all_terms <- c("hospital santa clara bla bla bla bla bla hospital san francisco",
" blablabla ",
"hospital holy mary, bla bla bla hospital 9 de julho")
all_terms %>%
str_extract_all("hospital\\s\\S*\\s\\S*\\s*\\S*\\s*\\S*") %>%
unlist() %>%
str_replace_all(pattern=",.*", replacement = "")
#> [1] "hospital santa clara bla bla" "hospital san francisco"
#> [3] "hospital holy mary" "hospital 9 de julho"
Created on 2022-04-19 by the reprex package (v2.0.1)
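The str_replace_all() step above is needed because \\S matches any non-whitespace character, including the comma after mary. A small sketch of the difference on a single string follows; the variable names are illustrative, and the \\w+ pattern with {1,4} is a swapped-in alternative, not the answerer's code.

```r
library(stringr)

x <- "hospital holy mary, bla bla bla hospital 9 de julho"

# \\S* also consumes punctuation, so the first match runs past the comma
with_S <- str_extract(x, "hospital\\s\\S*\\s\\S*\\s*\\S*\\s*\\S*")
print(with_S)
#> [1] "hospital holy mary, bla bla"

# \\w+ matches only word characters, so the match stops at the comma
with_w <- str_extract(x, "hospital(\\s\\w+){1,4}")
print(with_w)
#> [1] "hospital holy mary"
```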