I'm trying to get all the "banana word" occurrences in a given object, but str_extract returns only the first occurrence. My code:
library(stringr)
all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
"x y z",
"banana ice cream")
banana_terms <- all_terms %>%
str_extract("banana.+") %>%
word(1,2)
banana_terms
Out: [1] "banana word2" NA "banana ice"
What I wanted:
Out: [1] "banana word2" "banana split" "banana ice"
CodePudding user response:
Use str_extract_all with \\w+ to get "banana" together with the word that follows it.
all_terms %>%
str_extract_all("banana.\\w+") %>%
unlist()
# [1] "banana word2" "banana split" "banana ice"
Without unlist, you get a list:
str_extract_all(all_terms, "banana.\\w+")
[[1]]
[1] "banana word2" "banana split"
[[2]]
character(0)
[[3]]
[1] "banana ice"
CodePudding user response:
If you want to use str_extract, you need to make sure each "banana word" is an individual element in a vector. str_split splits the strings at every space that is followed by "banana" (the pattern " (?=banana)"), so each piece starts with at most one "banana". Then use the regex (banana.\\w+) provided by @Maël in str_extract. Finally, remove the NA values from the vector.
library(stringr)
all_banana <- str_extract(str_split(all_terms, " (?=banana)", simplify = TRUE), "banana.\\w+")
all_banana <- all_banana[!is.na(all_banana)]
all_banana
[1] "banana word2" "banana ice" "banana split"
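The order of that result ("banana ice" before "banana split") is easier to follow by looking at the intermediate split. The following is only a minimal sketch that reuses all_terms from the question; the names pieces and hits are illustrative, and as.vector is added here only to make the column-by-column traversal explicit.

```r
library(stringr)

all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

# Split at every space that is followed by "banana"; the lookahead does not
# consume "banana", so it stays at the start of the next piece.
# simplify = TRUE returns a character matrix padded with empty strings.
pieces <- str_split(all_terms, " (?=banana)", simplify = TRUE)
dim(pieces)  # one row per input string, one column per piece

# Flatten the matrix column by column, then extract from each piece.
# All first-column matches therefore come before the second-column ones.
hits <- str_extract(as.vector(pieces), "banana.\\w+")
hits <- hits[!is.na(hits)]
hits
```

If the original order of occurrence matters, str_extract_all followed by unlist keeps it, since it processes each input string in turn.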
CodePudding user response:
In base R, we can use regmatches/gregexpr:
unlist(regmatches(all_terms, gregexpr("banana\\s+\\S+", all_terms)))
[1] "banana word2" "banana split" "banana ice"
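As with str_extract_all, dropping unlist keeps the matches grouped by input string, which may be useful if you need to know which element each match came from. A small sketch under that assumption (the variable name matches is illustrative):

```r
all_terms <- c("banana word2 word3 word4 banana split word2 word3 word4",
               "x y z",
               "banana ice cream")

# gregexpr finds every match position in each string;
# regmatches turns those positions into a list with one element per input.
matches <- regmatches(all_terms, gregexpr("banana\\s+\\S+", all_terms))

# Number of "banana word" pairs found in each input string
lengths(matches)
# [1] 2 0 1
```

Strings without a match yield character(0) rather than NA, so they disappear once the list is passed through unlist.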